Speeding up lookup in large tables

8 visualizzazioni (ultimi 30 giorni)
RR
RR il 27 Mar 2024
Commentato: Voss il 28 Mar 2024 alle 12:57
Hi, I hope you can help me speed up my process by making it smarter because it works but it takes a very long time.
Situation on hand:
1) Table A has a column Time that contains a linspace of one day in seconds, thus 86400 entries. --> strictly ascending, could be important later
2) Table B contains columns StartTime and EndTime of some hundred to thousands of periods with an according value B.Value. So e.g. from 1:30-2:00 a.m., the B.Value = 13, at a different given time it is 16 or whatever. Very simplified here of course. --> also strictly ascending.
So I'm checking for every entry of A.Time if it's within B.StartTime and B.EndTime to get the relevant value :
for i=1:1:height(A)
for j=1:1:height(B)
if A.Time(i) >= B.StartTime(j) && A.Time(i) <= B.EndTime(j)
A.Value(i)=B.Value(j);
break
end
end
end
However, this becomes a lengthy process depending on the height of A and B. I had the idea that because both tables are strictly ascending timewise and the hit is at j=1230, there is no need to start for j=1 again for the next iteration of i.
But I learned that you cannot manipulate a for loop from within so I did some tries with while loops but this led to no success. Do you guys have an idea for a smart solution? I did search on this platform but didn't find a matching discussion so I'm sorry if this question has been asked before.
Thanks so much in advance already!

Risposta accettata

Voss
Voss il 27 Mar 2024
Here's one way that could work, if you have enough memory to generate the idx matrix, which is of size [size(A,1) x size(B,1)]
% crating some tables
rng(1)
Time = dateshift(datetime('now'),'start','day')+seconds(0:86399).';
StartTime = Time(1)+hours(0:23).';
n = numel(StartTime);
EndTime = StartTime+minutes(randi([1,120],n,1));
Value = rand(n,1);
A = table(Time);
B = table(StartTime,EndTime,Value);
A,B
A = 86400x1 table
Time ____________________ 27-Mar-2024 00:00:00 27-Mar-2024 00:00:01 27-Mar-2024 00:00:02 27-Mar-2024 00:00:03 27-Mar-2024 00:00:04 27-Mar-2024 00:00:05 27-Mar-2024 00:00:06 27-Mar-2024 00:00:07 27-Mar-2024 00:00:08 27-Mar-2024 00:00:09 27-Mar-2024 00:00:10 27-Mar-2024 00:00:11 27-Mar-2024 00:00:12 27-Mar-2024 00:00:13 27-Mar-2024 00:00:14 27-Mar-2024 00:00:15
B = 24x3 table
StartTime EndTime Value ____________________ ____________________ ________ 27-Mar-2024 00:00:00 27-Mar-2024 00:51:00 0.87639 27-Mar-2024 01:00:00 27-Mar-2024 02:27:00 0.89461 27-Mar-2024 02:00:00 27-Mar-2024 02:01:00 0.085044 27-Mar-2024 03:00:00 27-Mar-2024 03:37:00 0.039055 27-Mar-2024 04:00:00 27-Mar-2024 04:18:00 0.16983 27-Mar-2024 05:00:00 27-Mar-2024 05:12:00 0.87814 27-Mar-2024 06:00:00 27-Mar-2024 06:23:00 0.098347 27-Mar-2024 07:00:00 27-Mar-2024 07:42:00 0.42111 27-Mar-2024 08:00:00 27-Mar-2024 08:48:00 0.95789 27-Mar-2024 09:00:00 27-Mar-2024 10:05:00 0.53317 27-Mar-2024 10:00:00 27-Mar-2024 10:51:00 0.69188 27-Mar-2024 11:00:00 27-Mar-2024 12:23:00 0.31552 27-Mar-2024 12:00:00 27-Mar-2024 12:25:00 0.6865 27-Mar-2024 13:00:00 27-Mar-2024 14:46:00 0.83463 27-Mar-2024 14:00:00 27-Mar-2024 14:04:00 0.018288 27-Mar-2024 15:00:00 27-Mar-2024 16:21:00 0.75014
% initialize A.Value
A{:,'Value'} = NaN;
% get the index of the row in B whose interval covers the entry in A.Time
idx = A.Time >= B.StartTime.' & A.Time <= B.EndTime.';
% r is row index in A, c is row index in B
[r,c] = find(idx);
In case intervals in B can overlap, if you want to take the first B.Value for A.Value do this:
A.Value(r(end:-1:1)) = B.Value(c(end:-1:1));
Or if you want to take the last B.Value in overlapping intervals for A.Value (or if you don't care, or if there are no overlaps - in which case it doesn't matter) do this:
A.Value(r) = B.Value(c);
For example, two intervals in this B overlap from 10:00 to 10:05
% taking the B.Value from the first overlapping interval (9:00 to 10:05)
% for A.Value from 10:00 to 10:05
A.Value(r(end:-1:1)) = B.Value(c(end:-1:1));
A(10*3600:10*3600+3,:)
ans = 4x2 table
Time Value ____________________ _______ 27-Mar-2024 09:59:59 0.53317 27-Mar-2024 10:00:00 0.53317 27-Mar-2024 10:00:01 0.53317 27-Mar-2024 10:00:02 0.53317
A(10*3600+300:10*3600+300+3,:)
ans = 4x2 table
Time Value ____________________ _______ 27-Mar-2024 10:04:59 0.53317 27-Mar-2024 10:05:00 0.53317 27-Mar-2024 10:05:01 0.69188 27-Mar-2024 10:05:02 0.69188
% taking the B.Value from the last overlapping interval (10:00 to 10:51)
% for A.Value from 10:00 to 10:05
A.Value(r) = B.Value(c);
A(10*3600:10*3600+3,:)
ans = 4x2 table
Time Value ____________________ _______ 27-Mar-2024 09:59:59 0.53317 27-Mar-2024 10:00:00 0.69188 27-Mar-2024 10:00:01 0.69188 27-Mar-2024 10:00:02 0.69188
A(10*3600+300:10*3600+300+3,:)
ans = 4x2 table
Time Value ____________________ _______ 27-Mar-2024 10:04:59 0.69188 27-Mar-2024 10:05:00 0.69188 27-Mar-2024 10:05:01 0.69188 27-Mar-2024 10:05:02 0.69188
  2 Commenti
RR
RR il 28 Mar 2024
My gosh, this works like a charm and sped up the process from something like 1.5 hours to 30 seconds, thank you so much!
Voss
Voss il 28 Mar 2024 alle 12:57
You're welcome!

Accedi per commentare.

Più risposte (0)

Categorie

Scopri di più su Surfaces, Volumes, and Polygons in Help Center e File Exchange

Prodotti


Release

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by