Optimizing for speed. Moving skewness finder. Cumulative sum proving to be bottleneck.

Question

atharva aalok il 25 Ago 2023

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/2013192-optimizing-for-speed-moving-skewness-finder-cumulative-sum-proving-to-be-bottleneck

Modificato: Bruno Luong il 30 Ago 2023

Risposta accettata: Bruno Luong

Trial.m

Apri in MATLAB Online

Hello,

I have written the below provided function to calculate the moving window skewness of a time series.

To find out where the bottleneck is I used tic/toc and it seems the part where the cumulative sum (T2) is calculated takes the most amount of time.

The fraction of time taken (for each section) depends on N, the size of the time series. This can be found out using the attached script (Trial.m). For N = 1000000 the cumsum takes around 83% of the time.

The concern:

How can I speed up this code? Or has it reached its limit?

function sk_timeseries = moving_skewness(x, window_size, window_step)
    
    % T1 start
    X1 = x;
    X2 = X1 .* x;
    X3 = X2 .* x;
    % T1 end
    
    % T2 start
    S1 = cumsum([0, X1]);
    S2 = cumsum([0, X2]);
    S3 = cumsum([0, X3]);
    % T2 end
    
    % T3 start
    % Get the indices at which to calculate skewness values
    window_ends_idx = window_size: window_step: length(x);
    
    % Preallocate array for storing skewness time series
    sk_timeseries = zeros(1, length(window_ends_idx));
    
    N = window_size;
    % T3 end
    
    % T4 start
    % Generate skewness time series
    for k = 1: length(window_ends_idx)
        j = window_ends_idx(k) + 1;
        i = j - window_size;
        
        % Calculate M1_ji
        M1_ji = (S1(j) - S1(i)) / N;
        
        % Calculate necessary central moments
        M2_ji = (S2(j) - S2(i)) - window_size * M1_ji * M1_ji;
        M3_ji = (S3(j) - S3(i)) - 3 * M1_ji * (S2(j) - S2(i)) + 2 * N * M1_ji * M1_ji * M1_ji;
        % Calculate skewness
        sk_timeseries(k) = sqrt(N) * M3_ji / (M2_ji^1.5);
    end
    % T4 end
end

2 Commenti
Mostra NessunoNascondi Nessuno

Bruno Luong il 25 Ago 2023

Apri in MATLAB Online

The concatenation with 0 takes as much as the cumsum

N = 1000000;
X1 = rand(1,N);
tic
X1 = [0 X1];
toc
Elapsed time is 0.004564 seconds.
tic
S1 = cumsum(X1);
toc
Elapsed time is 0.004470 seconds.

dpb il 25 Ago 2023

Apri in MATLAB Online

Elapsed time is 0.005967 seconds.
Elapsed time is 0.001201 seconds.

function sk_timeseries = moving_skewness(x, window_size, window_step)
    
    % T1 start
    X1 = [0 x];
    ...
end

would at least remove two of those catenation operations...and

N = 1000000;
X1 = rand(1,N);
tic
X1 = [0 X1];
toc
X1 = rand(1,N+1);
tic
X1(1)=0;
toc

would eliminate needing either if could generate x with the extra position to begin with. Clearing the one location is much cheaper than needing to allocate the new array [] requires.

Of course, if generating the initial array is just adding a [0 ...] somewhere else it doesn't help unless it can be moved out of the function and only done once't.

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Bruno Luong il 26 Ago 2023

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/2013192-optimizing-for-speed-moving-skewness-finder-cumulative-sum-proving-to-be-bottleneck#answer_1294112

Modificato: Bruno Luong il 26 Ago 2023

Apri in MATLAB Online

Speed optimized code changes are:

Vectorize for-loop
Avoid padding 0 by treating the first "iteration" in separate code.
Mutualize the quantities that are repeatly computed several times
Replace power 1.5 (using costly exponential) with x*sqrt(x)

The optimized code doesn't look very maintainable friendly. Not spectacular but surely faster.

clear

N = 1000000;

window_size = floor(N * 0.05);

window_step = floor(window_size * (1 / 100));

x = randi([1e3 1e4],[1 N]);

r1 = moving_skewness(x, window_size, window_step);

r2 = moving_skewness_BLU(x, window_size, window_step);

close all

plot(r1)

hold on

plot(r2)

t_org = timeit(@()moving_skewness(x, window_size, window_step),1); % 7.93 ms

t_BLU = timeit(@()moving_skewness_BLU(x, window_size, window_step),1); % 5.48 ms

t_org_ms = 1000*t_org,

t_org_ms = 17.7467

t_BLU_ms = 1000*t_BLU,

t_BLU_ms = 4.7287

function sk_timeseries = moving_skewness(x, window_size, window_step)

% T1 start

X1 = x;

X2 = X1 .* x;

X3 = X2 .* x;

% T1 end

% T2 start

S1 = cumsum([0, X1]);

S2 = cumsum([0, X2]);

S3 = cumsum([0, X3]);

% T2 end

% T3 start

% Get the indices at which to calculate skewness values

window_ends_idx = window_size: window_step: length(x);

% Preallocate array for storing skewness time series

sk_timeseries = zeros(1, length(window_ends_idx));

N = window_size;

% T3 end

% T4 start

% Generate skewness time series

for k = 1: length(window_ends_idx)

j = window_ends_idx(k) + 1;

i = j - window_size;

% Calculate M1_ji

M1_ji = (S1(j) - S1(i)) / N;

% Calculate necessary central moments

M2_ji = (S2(j) - S2(i)) - window_size * M1_ji * M1_ji;

M3_ji = (S3(j) - S3(i)) - 3 * M1_ji * (S2(j) - S2(i)) + 2 * N * M1_ji * M1_ji * M1_ji;

% Calculate skewness

sk_timeseries(k) = sqrt(N) * M3_ji / (M2_ji^1.5);

end

% T4 end

end

function sk_timeseries = moving_skewness_BLU(x, window_size, window_step)

X2 = x .* x;

S1 = cumsum(x,2);

S2 = cumsum(X2,2);

S3 = cumsum(X2.*x,2);

N = window_size;

j = N: window_step: length(x);

i = j - N;

i(1) = j(1); % dummy

M1_ji = S1(j) - S1(i);

M1_ji2 = M1_ji .* M1_ji;

dS2 = S2(j) - S2(i);

M2_ji = dS2 - 1/N * M1_ji2;

M3_ji = (S3(j) - S3(i)) + M1_ji .* ((-3/N)*dS2 + (2/(N*N))*M1_ji2);

sk_timeseries = sqrt(N) * M3_ji ./ (M2_ji.*sqrt(M2_ji));

M1_ji = S1(N);

M1_ji2 = M1_ji .* M1_ji;

dS2 = S2(N);

M2_ji = dS2 - 1/N * M1_ji2;

M3_ji = S3(N) + M1_ji .* ((-3/N)*dS2 + (2/(N*N))*M1_ji2);

sk_timeseries(1) = sqrt(N) * M3_ji ./ (M2_ji.*sqrt(M2_ji));

end

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Bruno Luong il 30 Ago 2023

Modificato: Bruno Luong il 30 Ago 2023

Apri in MATLAB Online

You can replace

X2 = x .* x;
S1 = cumsum(x,2);
S2 = cumsum(X2,2);
S3 = cumsum(X2.*x,2);

% by

[S1, S2, S3] = cumsum3(x);

with this mexfile (needed to be compiled). It is slighly faster according to my test.

/**************************************************************************
 * Matlab Mex file cumsum3.c
 *
 * [S1, S2, S3] = cumsum(x)
 * for x double vector array performs
 *
 * S1 = cumsum(x);
 * S2 = cumsum(x.^2);
 * S3 = cumsum(x.^3);
 *
 * [S1, S2, S3] = cumsum(x, true) pads 0 in the head of S1, S2 and S3
 *
 * mex -O -R2018a cumsum3.c
 *************************************************************************/
#include "mex.h"
#include "matrix.h"
#define X prhs[0]
#define PAD0 prhs[1]
#define S1 plhs[0]
#define S2 plhs[1]
#define S3 plhs[2]
/* Gateway of SUM1 */
void mexFunction(int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[]) {
    
    mwSize i, M , N, MS, NS;
    int Pad0;
    double *xptr, *s1ptr, *s2ptr, *s3ptr;
    double x, x2, s1, s2, s3;
    
    M = mxGetM(X);
    N = mxGetN(X);
    xptr = mxGetPr(X);
    Pad0 = nrhs >= 2 ? (int)mxGetScalar(PAD0) : 0;
    if (N == 1) 
    {
        NS = N;
        MS = Pad0 ? M+1 : M;
    }
    else
    {
        MS = M;
        NS = Pad0 ? N+1 : N;
    }        
    S1 = mxCreateDoubleMatrix(MS, NS, mxREAL);
    S2 = mxCreateDoubleMatrix(MS, NS, mxREAL);
    S3 = mxCreateDoubleMatrix(MS, NS, mxREAL);
    if (S1 == NULL || S2 == NULL || S3 == NULL)
    {
        mexErrMsgTxt("cumsum3: Out of memory");
    }
    s1ptr = mxGetDoubles(S1);
    s2ptr = mxGetDoubles(S2);
    s3ptr = mxGetDoubles(S3);
    M *= N;
    s1 = s2 = s3 = 0;
    if (Pad0)
    {
        *s1ptr = *s2ptr = *s2ptr = 0;
        s1ptr++;
        s2ptr++;
        s3ptr++;
    }
    for (i=0; i<M; i++)
    {
        *s1ptr++ = (s1 += (x = *xptr++));
        *s2ptr++ = (s2 += (x2 = x*x));
        *s3ptr++ = (s3 += (x2*x));
    }
}

Such function is useful for various scenarios of running stats with 3rd order moment.

The MATLAB function is

function sk_timeseries = moving_skewness_BLU2(x, window_size, window_step)
[S1, S2, S3] = cumsum3(x,1);
N = window_size;
j =  N+1: window_step: length(x)+1;
i = j - N;
M1_ji = S1(j) - S1(i);
M1_ji2 = M1_ji .* M1_ji;
dS2 = S2(j) - S2(i);
M2_ji = dS2 - 1/N * M1_ji2;
M3_ji = (S3(j) - S3(i)) +  M1_ji .* ((-3/N)*dS2 + (2/(N*N))*M1_ji2);
sk_timeseries = sqrt(N) * M3_ji ./ (M2_ji.*sqrt(M2_ji));
end

Accedi per commentare.

Optimizing for speed. Moving skewness finder. Cumulative sum proving to be bottleneck.

2 Commenti
Mostra NessunoNascondi Nessuno

Risposta accettata

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Community Treasure Hunt

Optimizing for speed. Moving skewness finder. Cumulative sum proving to be bottleneck.

2 Commenti Mostra NessunoNascondi Nessuno

Risposta accettata

1 Commento Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Community Treasure Hunt

2 Commenti
Mostra NessunoNascondi Nessuno

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti