Speeding up lookup in large tables

Question

RR il 27 Mar 2024

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/2099701-speeding-up-lookup-in-large-tables

Commentato: Voss il 28 Mar 2024 alle 12:57

Risposta accettata: Voss

Apri in MATLAB Online

Hi, I hope you can help me speed up my process by making it smarter because it works but it takes a very long time.

Situation on hand:

1) Table A has a column Time that contains a linspace of one day in seconds, thus 86400 entries. --> strictly ascending, could be important later

2) Table B contains columns StartTime and EndTime of some hundred to thousands of periods with an according value B.Value. So e.g. from 1:30-2:00 a.m., the B.Value = 13, at a different given time it is 16 or whatever. Very simplified here of course. --> also strictly ascending.

So I'm checking for every entry of A.Time if it's within B.StartTime and B.EndTime to get the relevant value :

for i=1:1:height(A)
    for j=1:1:height(B)
        if A.Time(i) >= B.StartTime(j) && A.Time(i) <= B.EndTime(j)
            A.Value(i)=B.Value(j);
            break
        end
    end
end

However, this becomes a lengthy process depending on the height of A and B. I had the idea that because both tables are strictly ascending timewise and the hit is at j=1230, there is no need to start for j=1 again for the next iteration of i.

But I learned that you cannot manipulate a for loop from within so I did some tries with while loops but this led to no success. Do you guys have an idea for a smart solution? I did search on this platform but didn't find a matching discussion so I'm sorry if this question has been asked before.

Thanks so much in advance already!

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Voss il 27 Mar 2024

2
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/2099701-speeding-up-lookup-in-large-tables#answer_1432171

Apri in MATLAB Online

Here's one way that could work, if you have enough memory to generate the idx matrix, which is of size [size(A,1) x size(B,1)]

% crating some tables
rng(1)
Time = dateshift(datetime('now'),'start','day')+seconds(0:86399).';
StartTime = Time(1)+hours(0:23).';
n = numel(StartTime);
EndTime = StartTime+minutes(randi([1,120],n,1));
Value = rand(n,1);
A = table(Time);
B = table(StartTime,EndTime,Value);
A,B
A = 86400x1 table
            Time        
    ____________________

    27-Mar-2024 00:00:00
    27-Mar-2024 00:00:01
    27-Mar-2024 00:00:02
    27-Mar-2024 00:00:03
    27-Mar-2024 00:00:04
    27-Mar-2024 00:00:05
    27-Mar-2024 00:00:06
    27-Mar-2024 00:00:07
    27-Mar-2024 00:00:08
    27-Mar-2024 00:00:09
    27-Mar-2024 00:00:10
    27-Mar-2024 00:00:11
    27-Mar-2024 00:00:12
    27-Mar-2024 00:00:13
    27-Mar-2024 00:00:14
    27-Mar-2024 00:00:15
B = 24x3 table
         StartTime                EndTime            Value  
    ____________________    ____________________    ________

    27-Mar-2024 00:00:00    27-Mar-2024 00:51:00     0.87639
    27-Mar-2024 01:00:00    27-Mar-2024 02:27:00     0.89461
    27-Mar-2024 02:00:00    27-Mar-2024 02:01:00    0.085044
    27-Mar-2024 03:00:00    27-Mar-2024 03:37:00    0.039055
    27-Mar-2024 04:00:00    27-Mar-2024 04:18:00     0.16983
    27-Mar-2024 05:00:00    27-Mar-2024 05:12:00     0.87814
    27-Mar-2024 06:00:00    27-Mar-2024 06:23:00    0.098347
    27-Mar-2024 07:00:00    27-Mar-2024 07:42:00     0.42111
    27-Mar-2024 08:00:00    27-Mar-2024 08:48:00     0.95789
    27-Mar-2024 09:00:00    27-Mar-2024 10:05:00     0.53317
    27-Mar-2024 10:00:00    27-Mar-2024 10:51:00     0.69188
    27-Mar-2024 11:00:00    27-Mar-2024 12:23:00     0.31552
    27-Mar-2024 12:00:00    27-Mar-2024 12:25:00      0.6865
    27-Mar-2024 13:00:00    27-Mar-2024 14:46:00     0.83463
    27-Mar-2024 14:00:00    27-Mar-2024 14:04:00    0.018288
    27-Mar-2024 15:00:00    27-Mar-2024 16:21:00     0.75014
% initialize A.Value
A{:,'Value'} = NaN;
% get the index of the row in B whose interval covers the entry in A.Time
idx = A.Time >= B.StartTime.' & A.Time <= B.EndTime.';
% r is row index in A, c is row index in B
[r,c] = find(idx);

In case intervals in B can overlap, if you want to take the first B.Value for A.Value do this:

A.Value(r(end:-1:1)) = B.Value(c(end:-1:1));

Or if you want to take the last B.Value in overlapping intervals for A.Value (or if you don't care, or if there are no overlaps - in which case it doesn't matter) do this:

A.Value(r) = B.Value(c);

For example, two intervals in this B overlap from 10:00 to 10:05

% taking the B.Value from the first overlapping interval (9:00 to 10:05)
% for A.Value from 10:00 to 10:05
A.Value(r(end:-1:1)) = B.Value(c(end:-1:1));
A(10*3600:10*3600+3,:)
ans = 4x2 table
            Time             Value 
    ____________________    _______

    27-Mar-2024 09:59:59    0.53317
    27-Mar-2024 10:00:00    0.53317
    27-Mar-2024 10:00:01    0.53317
    27-Mar-2024 10:00:02    0.53317
A(10*3600+300:10*3600+300+3,:)
ans = 4x2 table
            Time             Value 
    ____________________    _______

    27-Mar-2024 10:04:59    0.53317
    27-Mar-2024 10:05:00    0.53317
    27-Mar-2024 10:05:01    0.69188
    27-Mar-2024 10:05:02    0.69188
% taking the B.Value from the last overlapping interval (10:00 to 10:51)
% for A.Value from 10:00 to 10:05
A.Value(r) = B.Value(c);
A(10*3600:10*3600+3,:)
ans = 4x2 table
            Time             Value 
    ____________________    _______

    27-Mar-2024 09:59:59    0.53317
    27-Mar-2024 10:00:00    0.69188
    27-Mar-2024 10:00:01    0.69188
    27-Mar-2024 10:00:02    0.69188
A(10*3600+300:10*3600+300+3,:)
ans = 4x2 table
            Time             Value 
    ____________________    _______

    27-Mar-2024 10:04:59    0.69188
    27-Mar-2024 10:05:00    0.69188
    27-Mar-2024 10:05:01    0.69188
    27-Mar-2024 10:05:02    0.69188

2 Commenti
Mostra NessunoNascondi Nessuno

RR il 28 Mar 2024

My gosh, this works like a charm and sped up the process from something like 1.5 hours to 30 seconds, thank you so much!

Voss il 28 Mar 2024 alle 12:57

You're welcome!

Accedi per commentare.

Speeding up lookup in large tables

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

2 Commenti
Mostra NessunoNascondi Nessuno

Più risposte (0)

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

Speeding up lookup in large tables

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

2 Commenti Mostra NessunoNascondi Nessuno

Più risposte (0)

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

2 Commenti
Mostra NessunoNascondi Nessuno