Replace multiple intervals in array with NaN - No loops

Question

Michael Madelaire il 14 Feb 2017

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/324955-replace-multiple-intervals-in-array-with-nan-no-loops

Commentato: Jan il 15 Feb 2017

Hi

I am working on a project with a lot of data. In the area of 10.000.000 rows. I therefore cannot use for loops. The data has to be cleaned, with respect to certain parameters. These parameters are found independently and are not on measured with the same timesteps. Below is a simplified example that explains the idea of the problem. :)

The main dataset, let's call it:

dataset = 1:30;

Two arrays are now given. They indicate intervals within which measurements has to replaced with NaN.

upperbound = [2,5,10,18,27];
lowerbound = [0,3,8,15,25];

With a for loop I would normally say:

for i = 1:length(upperbound);
  dataset(lowerbound(i):upperbound(i)) = NaN;
end

But with so much data this is not possible.

Do any of you have a good idea to solve this ?

2 Commenti
Mostra NessunoNascondi Nessuno

Jan il 14 Feb 2017

The smallest lowerbound index is zero. Is this a typo, because indices must be >= 1?

Michael Madelaire il 15 Feb 2017

Yes, that is a typo!

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Jan il 14 Feb 2017

1
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/324955-replace-multiple-intervals-in-array-with-nan-no-loops#answer_254735

Modificato: Jan il 15 Feb 2017

Apri in MATLAB Online

With logical indexing:

data  = 1:30;
upper = [2,5,10,18,27];
lower = [1,3,8,15,25];   % Not 0 as 1st element
% FAILING for upper==lower or overlapping intervals:
index        = zeros(1, numel(data));
index(lower) = 1;
index(upper) = -1;
toNaN        = (cumsum(index) == 1) | (index == -1);  % Thanks Adam!
data(toNaN)  = NaN;

Alternatively use Bruno's FEX: mcolon:

v       = mcolon(lower, upper);
data(v) = NaN;

If this is the bottleneck of your code, use a C-Mex function.

// File: BlockCopy.c
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
 double *low, *up, *out, NaN = mxGetNaN(), *p, *pend;
 size_t n, lowi, upi, maxi;
   maxi = mxGetNumberOfElements(prhs[0]);
   low  = mxGetPr(prhs[1]);
   up   = mxGetPr(prhs[2]);
   n    = mxGetNumberOfElements(prhs[1]);
   if (n != mxGetNumberOfElements(prhs[1])) {
      mexErrMsgTxt("*** BlockCopy[mex]: Index vectors must have the same length.");
   }
   plhs[0] = mxDuplicateArray(prhs[0]);
   out     = mxGetPr(plhs[0]) - 1;  // For 1-based Matlab indexing
   while (n--) {
      lowi = (size_t) *low++;       // Check validity of indices
      upi  = (size_t) *up++;
      if (lowi < 1 || upi > maxi) {
         mexErrMsgTxt("*** BlockCopy[mex]: Index out of bounds.");
      }
      p    = out + lowi;
      pend = out + upi;
      while (p <= pend) {
         *p++ = NaN;
      }
   }
   return;
  }

This works with overlapping intervals also.

Attention: The C-code will crash if any index is <= 0 or > then the length of the input array or if the inputs are no DOUBLE arrays. Either care for proper inputs or add checks of each index.

7 Commenti
Mostra 5 commenti meno recentiNascondi 5 commenti meno recenti

Jan il 15 Feb 2017

Modificato: Jan il 15 Feb 2017

Apri in MATLAB Online

@Michael Madelaire: I'm impressed. The vectorized method cannot handle intervals with a single element and overlapping intervals. mcolon is slower than expected. And The speedup of the MEX is not overwhelming:

function test  
dataset = 1:1e7;
lower = sort(randi([1, 1e7 - 1000], 1, 1e5));
upper = lower + randi([1,1000], size(lower));
tic; v1 = method1(dataset, lower, upper); toc;
tic; v2 = method2(dataset, lower, upper); toc;
tic; v3 = method3(dataset, lower, upper); toc;
tic; v4 = method4(dataset, lower, upper); toc;
tic; v5 = method5(dataset, lower, upper); toc;
function v = method1(v, lower, upper)  % Original loop
for i = 1:length(upper)
  v(lower(i):upper(i)) = NaN;
end
function v = method2(v, lower, upper)  % Vectorized 1
index        = zeros(1, numel(v));
index(lower) = 1;
index(upper) = -1;
toNaN        = (cumsum(index) == 1) | (index == -1);
v(toNaN)     = NaN;
function v = method3(v, lower, upper)  % Vectorized 2
index          = zeros(1, numel(v)+1);
index(lower)   = 1;
index(upper+1) = index(upper+1) + 1;
toNaN          = (mod(cumsum(index), 2) == 1);
v(toNaN(1:end-1)) = NaN;
function v = method4(v, lower, upper)  % mcolon
idx    = mcolon(lower, upper);
v(idx) = NaN;
function v = method5(v, lower, upper)   % C-Mex
v = BlockCopy(v, lower, upper);
 Elapsed time is 0.328137 seconds.  Original loop
 Elapsed time is 0.286607 seconds.  Vectorized 1
 Elapsed time is 0.878544 seconds.  Vectorized 2
 Elapsed time is 1.516311 seconds.  mcolon
 Elapsed time is 0.149635 seconds.  C-mex

Matlab 2016b/64, Win, SDK7.1 compiler, dual core. Note that the vectorized versions do not handle the overlapping intervals correctly. A data set without overlaps:

lower = 1:1e3:(1e7-1000);
upper = lower + randi([1, 999], size(lower));
 Elapsed time is 0.153022 seconds.  Original loops
 Elapsed time is 0.332387 seconds.  Vectorized 1
 Elapsed time is 0.787915 seconds.  Vectorized 2
 Elapsed time is 0.252076 seconds.  mcolon
 Elapsed time is 0.109385 seconds.  C-mex

Summary: The naive loops are fine! The vectorized version can almost compete, but is less powerful with scalar or overlapping intervals (perhaps somebody finds a better solution). The C-mex is 30% to 55% faster only. Omitting the boundary checks saves some percent only, but let the Matlab session crash in case of unexpected input.

The Matlab loop is efficient, because the main work of writing NaNs to the memory performs well, while the loop itself and the check of the boudaries is not very expensive.

Michael Madelaire il 15 Feb 2017

Thank you so much for the thorough answer answer! The elapsed time table is very cool. I think i will stick to the loops then. But really had expected the vectorized solution to be much faster.

Jan il 15 Feb 2017

BlockCopy.zip

@Michael: Feel free to use the C-mex. I've attached the commented C-code and a pre-compiled function for Win/64.

Accedi per commentare.

Replace multiple intervals in array with NaN - No loops

2 Commenti
Mostra NessunoNascondi Nessuno

Risposta accettata

7 Commenti
Mostra 5 commenti meno recentiNascondi 5 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Community Treasure Hunt

Replace multiple intervals in array with NaN - No loops

2 Commenti Mostra NessunoNascondi Nessuno

Risposta accettata

7 Commenti Mostra 5 commenti meno recentiNascondi 5 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Community Treasure Hunt

2 Commenti
Mostra NessunoNascondi Nessuno

7 Commenti
Mostra 5 commenti meno recentiNascondi 5 commenti meno recenti