Split cell array rows by delimiter (2016b)

Question

Hau Kit Yong on 27 Jun 2019

Commented: Jan on 27 Jun 2019

I have a vertical cell array of char vectors that I want to split into smaller vertical cell arrays based on rows in the array that serve as delimiters. For example,

x = ...
    {'LINE1'; ...
    '* THIS IS A COMMENT LINE'; ...
    '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
    'LINE2'; ...
    'LINE3'};

should be split into

x_split = ...
    {{'LINE1'}; ...
    {'LINE2';'LINE3'}};

where lines starting with '* ' are comment lines that act as delimiters.

I would like the operation to be as fast as possible, so I am looking for a vectorized approach, perhaps involving cellfun/arrayfun. I can get the indices of the comment lines easily enough using cellfun and strncmp, but I'm not sure how to proceed with the splitting.

2 Comments

Jan on 27 Jun 2019

You did not mention why the first line is stored as a scalar cell array while the other two form a cell vector. Do you want to group the char vectors, treating each block of comment lines as a separator?

Hau Kit Yong on 27 Jun 2019

Thank you for spotting that. I have edited the desired output. I want every element of the output cell to be a cell vector.

Answer 1

Jan on 27 Jun 2019

Edited: Jan on 27 Jun 2019

Let's start with a loop approach to clarify exactly what you want:

C = {'LINE1'; ...
    '* THIS IS A COMMENT LINE'; ...
    '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
    'LINE2'; ...
    'LINE3'};
limit  = [true, strncmp(C, '*', 1).', true];  % no need for the slow cellfun here!
ini    = strfind(limit, [true, false]);
fin    = strfind(limit, [false, true]) - 1;
n      = numel(ini);
Result = cell(n, 1);
for k = 1:n
    Result{k} = C(ini(k):fin(k));
end
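
To see why the padded vector finds the block boundaries, here is a small trace of the intermediate values for the 5-line example. It is only a sketch: the name isCmt is introduced here for readability, and it relies, like the loop above, on strfind accepting non-text vectors.

```matlab
C = {'LINE1'; ...
    '* THIS IS A COMMENT LINE'; ...
    '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
    'LINE2'; ...
    'LINE3'};
isCmt = strncmp(C, '*', 1).';              % 1x5 logical: [0 1 1 0 0]
limit = [true, isCmt, true];               % padded:      [1 0 1 1 0 0 1]
% limit(k+1) belongs to C{k}. The padding treats both ends of C as
% delimiters, so a block at the very start or end is detected as well.
ini = strfind(limit, [true, false]);       % [1 4]: first line of each block
fin = strfind(limit, [false, true]) - 1;   % [1 5]: last line of each block
```

So the first block is C(1:1) and the second is C(4:5).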

Do you expect a vectorized approach or cellfun to be faster? I do not think so.

Maybe find(diff()) is faster than calling strfind twice:

limit  = [true, strncmp(C, '*', 1).', true];  % no need for the slow cellfun here!
index  = find(diff(limit));
n      = numel(index) / 2;
Result = cell(n, 1);
for k = 1:n
    Result{k} = C(index(2*k-1):index(2*k)-1);
end

Well, let's try splitapply:

isComment = strncmp(C, '*', 1);
index     = zeros(size(C));
index(strfind([true, isComment.'], [true, false])) = 1;  % transpose: isComment is a column
index     = cumsum(index);
index(isComment) = NaN;
Result = splitapply(@(x) {x}, C, index);
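
The group vector that splitapply consumes can be traced step by step. The sketch below is a variant, not the code above: it builds the block starts with logical operations instead of strfind, and it passes only the non-comment rows to splitapply rather than marking comments with NaN. The names starts, group and keep are made up for this illustration.

```matlab
C = {'LINE1'; ...
    '* THIS IS A COMMENT LINE'; ...
    '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
    'LINE2'; ...
    'LINE3'};
isComment = strncmp(C, '*', 1);                     % [0 1 1 0 0].'
% A block starts at a non-comment line that follows a comment line
% (or that is the first line of C).
starts = [true; isComment(1:end-1)] & ~isComment;   % [1 0 0 1 0].'
group  = cumsum(starts);                            % [1 1 1 2 2].'
keep   = ~isComment;                                % rows that hold data
% Each group of rows is wrapped in a cell, giving one cell vector per block.
Result = splitapply(@(x) {x}, C(keep), group(keep));
```

For this input, Result{1} is {'LINE1'} and Result{2} is {'LINE2'; 'LINE3'}.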

This seems to be too complex. mat2cell is more direct:

isCmt  = strncmp(C, '*', 1);
limit  = [true, isCmt.', true];
ini    = strfind(limit, [true, false]);
fin    = strfind(limit, [false, true]) - 1;
Result = mat2cell(C(~isCmt), (fin - ini + 1).');

Some timings:

C = repmat(C, 10000, 1);  % A larger input
% With tic/toc, MATLAB R2019a Online:
% STRFIND:    0.084 sec
% FIND(DIFF): 0.091 sec
% SPLITAPPLY: 0.235 sec
% MAT2CELL:   0.046 sec

Timings measured on the MATLAB Online machine may not be representative, so repeat the test locally.
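
A local run could look like the following sketch. It compares only the strfind loop and the mat2cell version, repeats each a few times and keeps the fastest run to reduce noise. The repeat count nRep and the names C0, R1, R2, tLoop and tM2C are assumptions made for this example, not part of the measurements above.

```matlab
C0 = {'LINE1'; ...
    '* THIS IS A COMMENT LINE'; ...
    '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
    'LINE2'; ...
    'LINE3'};
C     = repmat(C0, 10000, 1);   % 50000 lines
nRep  = 5;                      % illustrative repeat count
tLoop = Inf;
tM2C  = Inf;
for r = 1:nRep
    % Loop over the blocks located with strfind
    tic;
    limit = [true, strncmp(C, '*', 1).', true];
    ini   = strfind(limit, [true, false]);
    fin   = strfind(limit, [false, true]) - 1;
    R1    = cell(numel(ini), 1);
    for k = 1:numel(ini)
        R1{k} = C(ini(k):fin(k));
    end
    tLoop = min(tLoop, toc);

    % mat2cell on the non-comment lines
    tic;
    isCmt = strncmp(C, '*', 1);
    limit = [true, isCmt.', true];
    ini   = strfind(limit, [true, false]);
    fin   = strfind(limit, [false, true]) - 1;
    R2    = mat2cell(C(~isCmt), (fin - ini + 1).');
    tM2C  = min(tM2C, toc);
end
fprintf('loop: %.4f s, mat2cell: %.4f s\n', tLoop, tM2C);
```

Both variants should return the same cell array, which is worth checking with isequal(R1, R2) before comparing the times.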

2 Comments

Hau Kit Yong on 27 Jun 2019

I was asking for a vectorized approach as I thought it was generally faster than for loops, but your method is speedy enough! It took about 0.05s to split 150k lines. Thanks for the reminder that strncmp works for cells without cellfun as well.
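
For reference, a minimal check that the two ways of flagging comment lines agree on the example input. The variable names viaCellfun and direct are made up for this sketch.

```matlab
C = {'LINE1'; ...
    '* THIS IS A COMMENT LINE'; ...
    '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
    'LINE2'; ...
    'LINE3'};
% One strncmp call per cell, dispatched through cellfun
viaCellfun = cellfun(@(s) strncmp(s, '*', 1), C);
% A single strncmp call on the whole cell array
direct = strncmp(C, '*', 1);
sameFlags = isequal(viaCellfun, direct);   % true for this input
```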

Jan on 27 Jun 2019

I've edited the answer and added a splitapply and a mat2cell approach, which might be considered "vectorized".
