Textscan function ignores final delimiter when token is empty

Question

Derek Wolfe on 15 May 2023

https://it.mathworks.com/matlabcentral/answers/1964294-textscan-function-ignores-final-delimiter-when-token-is-empty

Edited: Rik on 17 May 2023

%str = 'a,b,c';  % works, yields 3 tokens
%str = 'a,,c';  % works, yields 3 tokens
str = 'a,b,';  % doesn't work, yields 2 tokens
line = textscan(str, '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);
disp(line{1})
    {'a'}
    {'b'}

The textscan function seems to ignore the final delimiter when the last token is empty: 'a,b,' should yield 3 tokens, but it only generates 2.

Answer 1

Rik on 15 May 2023

https://it.mathworks.com/matlabcentral/answers/1964294-textscan-function-ignores-final-delimiter-when-token-is-empty#answer_1235939

Edited: Rik on 17 May 2023

From what I can tell, this is intended behavior: textscan cannot match the format specification to the empty remainder after the last delimiter, so nothing is returned for it.

If you want the trailing empty token, you could use either the split() function or regexp() with the 'split' option.

Edit: here are the examples I meant.

str{1,1} = 'a,b,c';
str{2,1} = 'a,,c';
str{3,1} = 'a,b,';
regexp(str,',','split') % three tokens each
ans = 3×1 cell array
    {1×3 cell}
    {1×3 cell}
    {1×3 cell}
split(str,',')
ans = 3×3 cell array
    {'a'}    {'b'     }    {'c'     }
    {'a'}    {0×0 char}    {'c'     }
    {'a'}    {'b'     }    {0×0 char}
% To show your solution works as well
cellfun(@(x)disp(textscan([x ','], '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0)),str)
    {3×1 cell}

    {3×1 cell}

    {3×1 cell}

As you can see, each solution returns 3 tokens, but each in a slightly different format.
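A small sketch (the variable names are illustrative) showing how the three results can be brought to the same column shape for a single character vector: regexp with 'split' returns a row cell array, so it is transposed, while split on a single character vector already returns a column, as does the first output cell of textscan.

```matlab
% Tokenize one character vector three ways and compare the shapes.
str = 'a,b,';

% regexp with the 'split' option returns a 1-by-N row cell array;
% transpose it to get a column like the other two methods.
t1 = regexp(str, ',', 'split').';

% split on a single character vector returns an N-by-1 cell array.
t2 = split(str, ',');

% textscan drops the trailing empty field unless one extra delimiter
% is appended; its first output cell holds an N-by-1 column.
c  = textscan([str ','], '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);
t3 = c{1};

% Print the token counts for the three methods.
disp([numel(t1), numel(t2), numel(t3)])
```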

Derek Wolfe on 16 May 2023

I don't think it should work this way, so I submitted a bug report. Thanks for the ideas, but it is easier to work around this by adding an extra delimiter to the end of the string before scanning it:

str = 'a,b,';
str = [str, ','];
line = textscan(str, '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);
disp(line{1})
    {'a'     }
    {'b'     }
    {0×0 char}

Rik on 16 May 2023

That works too.

Regarding your bug report: since the current behavior matches the description in the documentation, I don't see how this could be classified as a bug. You could call this an enhancement request.

Either way, your main problem is solved. Just be aware that textscan tries to match your format specification; it does not split the input into tokens. If you want tokens, you need something like regexp.
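One concrete case where "format versus tokens" matters is quoting. The %q conversion reads a double-quoted field as one value and strips the quotes, so a delimiter inside the quotes does not start a new field, whereas regexp splits on every delimiter. A minimal sketch with a made-up input, reusing the extra-delimiter workaround from this thread:

```matlab
str = 'a,"b,c",';

% regexp splits on every comma, including the one inside the quotes.
tok = regexp(str, ',', 'split');

% textscan with %q reads "b,c" as one field and strips the quotes;
% the appended comma keeps the final empty field.
c = textscan([str ','], '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);
fld = c{1};

fprintf('regexp: %d pieces, textscan: %d fields\n', numel(tok), numel(fld));
```

Which behaviour is preferable depends on whether your data can contain quoted delimiters.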

Answer 2

Walter Roberson on 16 May 2023

https://it.mathworks.com/matlabcentral/answers/1964294-textscan-function-ignores-final-delimiter-when-token-is-empty#answer_1237219

filename = tempname() + ".txt";
str = {'12,453   ,  c'};
writelines(str, filename);
fid = fopen(filename, 'r');
datacell = textscan(fid, '%f%f', 1, 'Delimiter', ',', 'MultipleDelimsAsOne', 0);
rest = char(fread(fid, [1 inf], '*uchar'));
fclose(fid);
datacell
datacell = 1×2 cell array
    {[12]}    {[453]}
rest
rest = 
    '  c
     '

What does this tell us? After textscan finishes processing the format, it examines the input stream, consumes whitespace up to and including the first delimiter, and then stops.

So in your example, when processing 'a,b,' with %q, textscan first reads the a as part of the %q format and then consumes the delimiter that follows it, leaving 'b,' in the input stream. No count is specified, so it tries processing again: it reads the b with the %q format and consumes the delimiter after it. No count is specified, so it tries processing once more. The input stream is now empty, so it stops. Yes, this is exactly what would happen for 'a,b' with no final delimiter.
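A quick check of that claim, as a sketch using inline character vectors instead of a file: with or without the final delimiter, textscan returns the same two tokens.

```matlab
% With and without the final delimiter, textscan yields the same tokens.
c1 = textscan('a,b',  '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);
c2 = textscan('a,b,', '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);

fprintf('a,b: %d tokens, a,b,: %d tokens, identical: %d\n', ...
    numel(c1{1}), numel(c2{1}), isequal(c1{1}, c2{1}));
```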

A potential alternative would have been to leave the delimiter in the buffer and make the processing of the next field responsible for skipping one leading delimiter. But in that case the input ',b,c' would be treated the same as 'b,c', and it seems unlikely to me that you would want that.
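For comparison, a sketch of how the current design distinguishes those two inputs, assuming (as the paragraph above implies) that a leading delimiter is reported as an empty first field:

```matlab
% Under the current design a leading delimiter marks an empty first field,
% so these two inputs produce different numbers of fields.
cLead  = textscan(',b,c', '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);
cPlain = textscan('b,c',  '%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);

fprintf(',b,c: %d fields, b,c: %d fields\n', numel(cLead{1}), numel(cPlain{1}));
```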

Yes, textscan could have been designed to set an internal flag indicating that a delimiter had been seen just before the current position, but I am not sure that would meet expectations. For example, if the input were 123,abc,<newline>456,def, and the format were %f%q, then the current behaviour would generate {[123;456], {'abc'; 'def'}}. If trailing delimiters were to invoke the missing-contents behaviour (your interpretation), then the 123 would be accepted the first time %f is processed and the abc the first time %q is processed; the empty field at the end of the line would then be accepted and turned into NaN the second time %f is processed, the 456 would be accepted and turned into a character vector the second time %q is processed, and the def would trigger a format mismatch the third time %f is processed, stopping the scan and leaving def, in the buffer. That might potentially be a consistent way of processing, but is it what would be expected, and is it useful?
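The input from that example can be tried directly. This sketch builds it with sprintf; the check is based on the answer's description of the current behaviour (numbers in the first output, text in the second), not on an independent specification.

```matlab
% Two lines, each ending in a trailing delimiter, as in the example above.
txt = sprintf('123,abc,\n456,def,');
c = textscan(txt, '%f%q', 'Delimiter', ',', 'MultipleDelimsAsOne', 0);

nums  = c{1};   % numeric column read by %f
texts = c{2};   % cell column of character vectors read by %q
disp(nums);
disp(texts);
```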
