How to vectorize a specific for-loop

Question

Paolo Binetti il 9 Dic 2016

1
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/316129-how-to-vectorize-a-specific-for-loop

Commentato: Jan il 19 Dic 2016

I am trying to vectorize the for-loop hereafter. Would you have any hint? Thank you

for i = 1 : numel(text)-k+1  % "text" is a string
  pattern(i,:) = text(i:i+k-1);
end

2 Commenti
Mostra NessunoNascondi Nessuno

Jan il 9 Dic 2016

I've formatted the code for you. Please use the "{} Code" button the next time. Thanks.

Paolo Binetti il 9 Dic 2016

Thank you

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Jan il 9 Dic 2016

2
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/316129-how-to-vectorize-a-specific-for-loop#answer_246552

Modificato: Jan il 9 Dic 2016

Apri in MATLAB Online

Before you start a vectorization, care for a pre-allocation:

n = numel(str) - k + 1;
pattern = repmat(' ', n, k);   % <-- added
for ii = 1:n
  pattern(ii,:) = str(ii:ii+k-1);   % [EDITED, was: pattern(k, :)]
end

For a string with 10'000 characters and k=6 this 8 times faster already. But you can get much faster with a loop:

n = numel(str) - k + 1;
pattern = repmat(' ', n, k);
for ii = 1:k                         % <-- k instead of n
   pattern(:, ii) = str(ii:ii+n-1);  % <-- (:, ii) instead of (ii, :)
end

Now the values are copied to a continuous block of memory, which is much faster. In addition the loop is much shorter assumed that k << n.

[EDITED] For your amusement a vectorized version:

index   = bsxfun(@plus, (1:numel(str) - k + 1).', 0:k-1);
pattern = str(index);

[EDITED 2] And if we are on the way, a C-MEX for completeness:

#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  mwSize k, n, f;
  mxChar *in, *out;
  mwSize dims[2];
    // Get inputs:
    in = (mxChar *) mxGetData(prhs[0]);
    k  = (mwSize) mxGetScalar(prhs[1]);
    // Create outputs:
    n       = mxGetNumberOfElements(prhs[0]) - k + 1;
    dims[0] = n;
    dims[1] = k;
    plhs[0] = mxCreateCharArray(2, dims);
    out     = (mxChar *) mxGetData(plhs[0]);
    // Copy data:
    f = out + n * k;
    while (out < f) {
      memcpy(out, in++, n * sizeof(mxChar));
      out += n;
    }
  }

Timings for Matlab 2015b, Win64, mean of 100 iterations:

str = repmat(' ', 1, 10000);
k   = 6;
Original:        0.134 sec
Pre-allocation:  0.0178 sec
Vectorized:      0.000806 sec
Continuous:      0.000348 sec
C-MEX:           0.0000529 sec

Now use the profiler to find out, if this piece of code is still the bottleneck of the program. If this piece of code takes 1% of the processing time only, accelerating it by a factor of 2 let the total program run only 0.5% faster. Therefore it is most likely a waste of time.

[EDITED 2] Conclusion:

The continuous copying is the fastest M-version I can imagine.
Vectorizing wastes time with creating the large index. Do not overestimate vectorization, when you do not operate on complete matrices.
The C-MEX is 7 times faster than the continuous copy version.
Avoiding the creation of the large redundant array would be much more efficient. Better use str(ii:ii+k-1) in the code instead of the fluffy pattern matrix.

5 Commenti
Mostra 3 commenti meno recentiNascondi 3 commenti meno recenti

Jan il 9 Dic 2016

Modificato: Jan il 9 Dic 2016

My first version adds a pre-allocation only. This is 8 times faster already for a string of size 10'000 and k=6.
My second version creates the output columnwise and not rowwise: pattern(:, ii) instead of pattern(ii, :). Because Matlab's arrays are stored in columnwise order, neighboring memory elements are accessed in this case, which is much faster than hopping across columns.
The 2nd version is efficient, because it spends the most time with the main work: copying. If you try a vectorized version, you have to create a large index array at first. And this will waste so much time, that the total runtime will be longer. At least I'm convinced that this is the case and therefore I refuse to program it. The speedup factor of 350 for the 2nd version compared to the original is such high, that this piece of code might not be the bottleneck anymore. Then further improvements (if possible at all) are not useful anymore: If this piece of code needs 1% of the processing time of the total programm, a further acceleration of the factor 1000 with make your program less then 1% faster.

Note that vectorization is not faster in general. It depends how expensive the creation of the intermediate index arrays is. If you calculate sin(X) for a matrix, there is no such index matrix, but in your problem this is not possible. But in pattern(:, ii) = str(ii:ii+n-1) the indexed copy is a vectorized operation already compared with the elementwise copy. Fortunately this is implemented fast in Matlab, because the index vector ii:ii+n-1 is not created explicitely.

I've added a vectorized version for fun. As expected, it is slower. See [EDITED]

And a C-MEX also, see [EDITED 2]

PS. Wow, what a long answer to a short question. This is an interesting example for writing efficient code and the limited power of vectorization. I create a tutorial from this topic.

Paolo Binetti il 17 Dic 2016

Modificato: Paolo Binetti il 17 Dic 2016

Thank you very much! Amazing how much you can learn from a simple question, how many ways exist to code a simple operation, and how different is the performance.

Too bad my email provider originally classed the notification of your answer as spam, so I only saw your answer late. In the meantime I had independently came up with the column-wise version, and was happy because indeed the improvement was significant and good enough. I had not tried pre-allocation, because I did not understand its influence.

I will now try adding pre-allocation, and not only in that piece of code, because I understand that it is a good thing in general. I will not try the vectorized version, since you proved it's slower. I don't know what a C-MEX, but having seen the performance, I need to find out, and thern I will try your code.

I see what you mean by "Avoiding the creation of the large redundant array would be much more efficient". But do not understand what exactly you mean by "Better use str(ii:ii+k-1) in the code instead of the fluffy pattern matrix."

Jan il 19 Dic 2016

Apri in MATLAB Online

Pre-allocation is very important in all software projects. Example:

v = [];
for k = 1:1000
  v(k) = k;
end

Now in each iteration Matlab creates a new array and copied the old data. Therefore Matlab has to reserve and write sum(1:1000)*8 bytes, which is >4MB, although the results has 8kB only.

C-Mex is an interface from Matlab to C-code. C is usually much slower during the processing, but extremely tedious during writing and debugging.

If you do not use the large array "pattern" fianlly, but stay at the dynamic indexing and use str(ii:ii+k-1), the code might run faster. But this depends on what you do with the pattern array later on.

Accedi per commentare.

Answer 2

Roger Stafford il 9 Dic 2016

2
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/316129-how-to-vectorize-a-specific-for-loop#answer_246563

Apri in MATLAB Online

You might try the ‘hankel’ function:

   n = numel(text);
   nk = n-k+1;
   pattern = hankel(text(1:nk),text(nk:n));

2 Commenti
Mostra NessunoNascondi Nessuno

Jan il 9 Dic 2016

Apri in MATLAB Online

The vectorized version I've posted:

bsxfun(@plus, (1:numel(str) - k + 1).', 0:k-1)

is the core of the hankel function.

Paolo Binetti il 17 Dic 2016

Thank you!

Accedi per commentare.

How to vectorize a specific for-loop

2 Commenti
Mostra NessunoNascondi Nessuno

Risposta accettata

5 Commenti
Mostra 3 commenti meno recentiNascondi 3 commenti meno recenti

Più risposte (1)

2 Commenti
Mostra NessunoNascondi Nessuno

Vedere anche

Categorie

Tag

Community Treasure Hunt

How to vectorize a specific for-loop

2 Commenti Mostra NessunoNascondi Nessuno

Risposta accettata

5 Commenti Mostra 3 commenti meno recentiNascondi 3 commenti meno recenti

Più risposte (1)

2 Commenti Mostra NessunoNascondi Nessuno

Vedere anche

Categorie

Tag

Community Treasure Hunt

2 Commenti
Mostra NessunoNascondi Nessuno

5 Commenti
Mostra 3 commenti meno recentiNascondi 3 commenti meno recenti

2 Commenti
Mostra NessunoNascondi Nessuno