Find pattern in vector while ignoring/skipping certain indices

Hello,
Is there an efficient way to search for a specific pattern in a mat vector while ignoring some indices in the pattern?
For example, I need to search for a 9-element pattern [0 4 X 0 6 Y 0 8 Z] in a mat vector, where X, Y, Z can be any values.
I currently have a loop-based approach, but is there a faster, vectorized one?
Thank you.
  3 Comments
Haider Ali on 12 Jun 2022
I am afraid I did not post the complete scenario in my question.
The data vector vec contains ID pairs and their associated data in the format [ID1 ID2 data ID1 ID2 data ...]. The goal is to find the data associated with each ID pair. The data in vec is expected to contain both noise and missing entries, so each ID pair (e.g. [0 6]) is searched for first, and then the previous ([0 4]) and next ([0 8]) ID pairs are also checked to reliably get the data value for that ID pair.
pattern = ID pairs
vec = data to be searched
out = first 2 columns are ID pairs, 3rd column is associated data
I have tried all of your methods but the following seems to be the fastest. Please have a look at the following code and suggest if it can be executed any faster.
load vec;
load pattern;
load out;
len = length(out);
no_of_IDs_to_search = 1000;
tic
for j = 2:no_of_IDs_to_search % skip searching for ID pairs at 1st location
    ind = strfind(vec,pattern(j*2-1:j*2)); % firstly, find all indices of a single ID pair e.g. [0 2] or [0 6]
    ind(ind<4 | ind>((len-1)*3)) = []; % remove indices to avoid out-of-range errors
    if (~isempty(ind))
        for k = 1:length(ind) % search through all indices and determine if previous and next IDs are a match
            if (isequal(vec(ind(k)-3:ind(k)-3+1), pattern((j-1)*2-1:(j-1)*2)) && isequal(vec(ind(k)+3:ind(k)+3+1), pattern((j+1)*2-1:(j+1)*2)))
                out(j,3) = vec(ind(k)+2); % update the corresponding index in output vector
                break; % break if previous, current and next IDs are a match
            end
        end
    end
end
toc
I have attached the data files.
Thank you.
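As a rough sketch of how the inner loop over candidates could be replaced by a single vectorized comparison: the toy vec and pattern below are made-up values in the [ID1 ID2 data ...] layout described above, not the attached data, and the variable names cur, prev, next, ok, first and value are introduced here only for illustration. Whether this is actually faster on the real data would need to be timed.

```matlab
% Toy data in the [ID1 ID2 data ...] layout (values are made up).
vec     = [0 2 11, 0 4 12, 0 6 13, 0 8 14, 0 10 15];
pattern = [0 2, 0 4, 0 6, 0 8, 0 10];        % consecutive ID pairs
j = 3;                                        % look up the third ID pair, [0 6]

cur  = pattern(j*2-1 : j*2);                  % current ID pair
prev = pattern((j-1)*2-1 : (j-1)*2);          % previous ID pair
next = pattern((j+1)*2-1 : (j+1)*2);          % next ID pair

ind = strfind(vec, cur);                      % candidate positions of the current pair
ind = ind(ind >= 4 & ind + 4 <= numel(vec));  % keep candidates with both neighbours in range

% Check the neighbours of all candidates at once instead of looping over them.
ok = vec(ind-3) == prev(1) & vec(ind-2) == prev(2) & ...
     vec(ind+3) == next(1) & vec(ind+4) == next(2);

first = ind(find(ok, 1));                     % first candidate whose neighbours match
value = [];
if ~isempty(first)
    value = vec(first + 2);                   % data value stored after the ID pair
end
```

Here the only occurrence of [0 6] starts at index 7, its neighbours [0 4] and [0 8] match, and value is the element stored right after the pair.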


Answers (4)

Image Analyst on 11 Jun 2022
I think this should work, but with your pattern and a vector of 100 million random values I never saw a match, even after running it several times. Hopefully your real data is expected to contain the pattern and you're not just using random integers like I did.
% Create sample data.
% Note: randi(8, ...) draws integers from 1 to 8, so this sample never contains
% the zeros the pattern requires; use randi([0 8], ...) to make matches possible.
vec = randi(8, 1, 100000000);
% Define the pattern. NaN = "don't care".
pattern = [0 4 nan 0 6 nan 0 8 nan]
% Define a mask for what values we want to check.
mask = ~isnan(pattern)
% Last valid window start (+1 so the final window is also checked).
lastIndex = length(vec) - length(pattern) + 1;
% Scan along the vector looking for matches.
for k = 1 : lastIndex
    % Print out progress every 100 thousand window locations.
    if mod(k, 100000) == 0
        fprintf('k = %d of %d (%.1f%%)\n', k, lastIndex, 100*k/lastIndex);
    end
    % Extract the window.
    thisWindow = vec(k : k+length(pattern)-1);
    % Compare this window to our pattern but only at the mask = true locations.
    if isequal(pattern(mask), thisWindow(mask))
        % Found a match. Report where it was.
        fprintf('Match at k = %d where vec = [%d, %d, %d, %d, %d, %d, %d, %d, %d]\n', k, thisWindow);
    end
end
fprintf('Done!\n');
  1 Comment
Image Analyst on 11 Jun 2022
If there is a match, it will find it quickly, just like the other solutions since it's basically the same algorithm.



Matt J on 11 Jun 2022
Edited: Matt J on 11 Jun 2022
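vec = [0 4 1 0 6 5 0 8 7, 3 3 3, 0 4 2 0 6 4 0 8 6]; % patterns start at i=1 and i=13
pat = [0 4 nan 0 6 nan 0 8 nan];  % NaN marks the "don't care" positions
pat = pat(:); vec = vec(:)';      % pattern as a column, data as a row
m = numel(vec); n = numel(pat);
include = find(~isnan(pat));      % pattern positions that must match
idx = 0:m-n;                      % zero-based offsets of every window start
% Row r holds the data values that line up with pattern position include(r).
sequences = cell2mat(arrayfun(@(i)vec(i+idx),include,'uni',0));
matchlocations = find(all(sequences==pat(include),1))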
matchlocations = 1×2
1 13
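To see what the intermediate matrix looks like, here is a small sketch of the same idea on a made-up 7-element vector with a 3-element pattern (both are illustrative values, not from the question). Each row of sequences is a shifted copy of the data aligned with one non-NaN pattern position, so a column where every row matches marks a window start.

```matlab
vec = [5 0 7 4 0 9 4];             % toy data
pat = [0 NaN 4].';                 % toy pattern as a column; NaN = "don't care"
include = find(~isnan(pat));       % pattern positions to compare: [1; 3]
idx = 0:numel(vec)-numel(pat);     % window start offsets: 0:4

% One row per compared pattern position, one column per window start.
sequences = [vec(include(1)+idx); vec(include(2)+idx)];

% A window matches when every row in its column equals the pattern value.
matchlocations = find(all(sequences == pat(include), 1));
```

For this input, sequences is [5 0 7 4 0; 7 4 0 9 4], and the columns that equal [0; 4] are the 2nd and 5th, which correspond to the windows [0 7 4] and [0 9 4]. The comparison relies on implicit expansion, which needs R2016b or later.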

per isakson on 11 Jun 2022
I assume it's a vector of integers.
Steve Amphlett showed this trick at comp.soft-sys.matlab twenty years ago.
%% Create sample data
pat = [0,4,nan,0,6,nan,0,8,nan];
msk = true(1,numel(pat));
msk(isnan(pat)) = false;
pat(not(msk)) = 0;
vec = randi([-8,8],1,1e6);
vec(101:109) = [0,4,11,0,6,12,0,8,13];
vec(701:709) = [0,4,14,0,6,15,0,8,16];
%
%% Search matches
tic
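z = conv(vec,pat(end:-1:1)); % cross-correlation of vec with pat
hit = find(z==sum(pat.^2))-numel(pat)+1; % candidate start indices
hit = hit(hit>=1 & hit<=numel(vec)-numel(pat)+1); % drop candidates whose window would fall outside vec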
%%
% hit may contain false hits.
for ix = hit
v9 = vec(ix:ix+8);
if all( v9(msk) == pat(msk) )
disp(ix)
end
end
101 701
toc
Elapsed time is 0.044468 seconds.
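The reason the verification loop is needed: a window whose dot product with the pattern equals sum(pat.^2) is not necessarily equal to the pattern. A minimal sketch with made-up two-element values shows one false hit and one real hit (confirmed is a name introduced here for illustration).

```matlab
pat = [1 2];                       % toy pattern, no wildcards
vec = [3 1 1 2];                   % [3 1] gives 3*1 + 1*2 = 5, same as 1^2 + 2^2
z   = conv(vec, pat(end:-1:1));    % sliding dot product of vec with pat
hit = find(z == sum(pat.^2)) - numel(pat) + 1;   % candidate window starts

% Keep only the candidates whose window really equals the pattern.
confirmed = hit(arrayfun(@(ix) isequal(vec(ix:ix+numel(pat)-1), pat), hit));
```

Here z is [6 5 3 5 2], so both window starts 1 and 3 are candidates, but only the window starting at 3 is a true match.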

Voss on 11 Jun 2022
Edited: Voss on 11 Jun 2022
% the pattern:
pat = [0 4 NaN 0 6 NaN 0 8 NaN];
% create some data containing the pattern:
data = randn(1,10000);
idx = find(~isnan(pat));
for ii = 100:100:9900
data(ii+idx-1) = pat(idx);
end
% find the pattern in the data:
idx = find(~isnan(pat));
result = find(all(data((0:numel(data)-numel(pat)).'+idx) == pat(idx),2));
% display the result:
disp(result);
100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000 4100 4200 4300 4400 4500 4600 4700 4800 4900 5000 5100 5200 5300 5400 5500 5600 5700 5800 5900 6000 6100 6200 6300 6400 6500 6600 6700 6800 6900 7000 7100 7200 7300 7400 7500 7600 7700 7800 7900 8000 8100 8200 8300 8400 8500 8600 8700 8800 8900 9000 9100 9200 9300 9400 9500 9600 9700 9800 9900
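The index matrix in the one-liner above has one row per window and one column per compared pattern position, which can get large for long vectors. As a possible alternative, and only as a sketch on made-up data, the same test can be accumulated one pattern position at a time, so that only a logical vector of window candidates is kept in memory (nWin and match are names introduced here for illustration).

```matlab
pat  = [0 4 NaN 0 6 NaN 0 8 NaN];
data = [9 0 4 1 0 6 2 0 8 3 9];          % toy data; the pattern starts at index 2
idx  = find(~isnan(pat));                % pattern positions that must match
nWin = numel(data) - numel(pat) + 1;     % number of candidate window starts

match = true(1, nWin);                   % every window is a candidate at first
for k = idx
    % Window s survives only if data(s+k-1) equals pat(k).
    match = match & (data(k : k+nWin-1) == pat(k));
end
result = find(match);
```

The loop runs once per non-NaN pattern element (six times here), not once per window, so it stays cheap even when the data vector is long.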


Release

R2018b
