Speeding up lookup in large tables
8 visualizzazioni (ultimi 30 giorni)
Mostra commenti meno recenti
Hi, I hope you can help me speed up my process by making it smarter because it works but it takes a very long time.
Situation on hand:
1) Table A has a column Time that contains a linspace of one day in seconds, thus 86400 entries. --> strictly ascending, could be important later
2) Table B contains columns StartTime and EndTime of some hundred to thousands of periods with an according value B.Value. So e.g. from 1:30-2:00 a.m., the B.Value = 13, at a different given time it is 16 or whatever. Very simplified here of course. --> also strictly ascending.
So I'm checking for every entry of A.Time if it's within B.StartTime and B.EndTime to get the relevant value :
for i=1:1:height(A)
for j=1:1:height(B)
if A.Time(i) >= B.StartTime(j) && A.Time(i) <= B.EndTime(j)
A.Value(i)=B.Value(j);
break
end
end
end
However, this becomes a lengthy process depending on the height of A and B. I had the idea that because both tables are strictly ascending timewise and the hit is at j=1230, there is no need to start for j=1 again for the next iteration of i.
But I learned that you cannot manipulate a for loop from within so I did some tries with while loops but this led to no success. Do you guys have an idea for a smart solution? I did search on this platform but didn't find a matching discussion so I'm sorry if this question has been asked before.
Thanks so much in advance already!
0 Commenti
Risposta accettata
Voss
il 27 Mar 2024
Here's one way that could work, if you have enough memory to generate the idx matrix, which is of size [size(A,1) x size(B,1)]
% crating some tables
rng(1)
Time = dateshift(datetime('now'),'start','day')+seconds(0:86399).';
StartTime = Time(1)+hours(0:23).';
n = numel(StartTime);
EndTime = StartTime+minutes(randi([1,120],n,1));
Value = rand(n,1);
A = table(Time);
B = table(StartTime,EndTime,Value);
A,B
% initialize A.Value
A{:,'Value'} = NaN;
% get the index of the row in B whose interval covers the entry in A.Time
idx = A.Time >= B.StartTime.' & A.Time <= B.EndTime.';
% r is row index in A, c is row index in B
[r,c] = find(idx);
In case intervals in B can overlap, if you want to take the first B.Value for A.Value do this:
A.Value(r(end:-1:1)) = B.Value(c(end:-1:1));
Or if you want to take the last B.Value in overlapping intervals for A.Value (or if you don't care, or if there are no overlaps - in which case it doesn't matter) do this:
A.Value(r) = B.Value(c);
For example, two intervals in this B overlap from 10:00 to 10:05
% taking the B.Value from the first overlapping interval (9:00 to 10:05)
% for A.Value from 10:00 to 10:05
A.Value(r(end:-1:1)) = B.Value(c(end:-1:1));
A(10*3600:10*3600+3,:)
A(10*3600+300:10*3600+300+3,:)
% taking the B.Value from the last overlapping interval (10:00 to 10:51)
% for A.Value from 10:00 to 10:05
A.Value(r) = B.Value(c);
A(10*3600:10*3600+3,:)
A(10*3600+300:10*3600+300+3,:)
Più risposte (0)
Vedere anche
Categorie
Scopri di più su Surfaces, Volumes, and Polygons in Help Center e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!