read empty line by textscan
Hi Everyone,
I am trying to organize a txt file with 12000 lines, which is too large for readtable, so I chose to use textscan.
The problem is that textscan skips all the empty lines, but I need the exact line numbers of certain elements in the original file.
I searched a lot online but found nothing that helped. I tried code like this, with the whitespace set to empty, but it doesn't help.
default = textscan(fid,'%s%s','Delimiter','=','whitespace', '')
Thank you for your help!
2 Comments
Rik
on 11 Apr 2019
Did you try either suggested solution? If you still have issues, we'll be happy to help.
Jeremy Hughes
on 11 Apr 2019
I know someone has already added a solution, and it's a fine solution for what you're doing. But I'm surprised that READTABLE has a problem. Can you attach a sample?
12,000 lines isn't all that large especially if there are only two columns.
If you have R2019a or later, you might also try:
M = readmatrix(filename,'OutputType','string','Delimiter','=','Whitespace','')
Accepted Answer
Rik
on 10 Apr 2019
Edited: Rik
on 10 Apr 2019
If your file doesn't contain any special characters, you could try fileread (which reads a file as one long char array), then split it with regexp. If you aren't sure about the encoding of special characters, you may consider my readfile function (which returns a cell array with 1 element per line, also for empty lines).
default = fileread(filename);
default = regexp(default,'\n','split');
%or:
default = readfile(filename);
The output of those two methods is equivalent if there are no special characters encoded in the file. The allowed characters are shown below. (readfile doesn't have this restriction)
% $%&'()*+,-./0123456789:;<=>?@
% ABCDEFGHIJKLMNOPQRSTUVWXYZ
% [\]^_`abcdefghijklmnopqrstuvwxyz{|}~
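As a minimal sketch of how the fileread approach keeps the original line numbers: the file contents and the key 'gamma=' below are made up for illustration, and the file is written to a temporary location so the snippet is self-contained.

```matlab
% Hypothetical sample file: three 'key=value' lines separated by empty lines.
filename = [tempname '.txt'];
fid = fopen(filename,'w');
fprintf(fid,'alpha=1\n\nbeta=2\n\n\ngamma=3');
fclose(fid);

txt   = fileread(filename);         % whole file as one char row vector
lines = regexp(txt,'\n','split');   % one cell per line; empty lines become empty cells

% Line number (in the original file) of every line that starts with 'gamma='
lineNo = find(strncmp(lines,'gamma=',6));

delete(filename);                   % clean up the temporary file
```

Because the empty lines are kept as empty cells, the index into lines is the line number in the file.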
5 Comments
Jeremy Hughes
on 11 Apr 2019
Edited: Jeremy Hughes
on 11 Apr 2019
default = regexp(default,'\n','split');
This won't work cleanly if the file uses Windows newlines (\r\n): you'll be left with trailing \r characters.
If you're using R2016b or later, try:
default = splitlines(default);
It's a little more robust, and since it has only one job to do, probably slightly faster than regexp.
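A small sketch of the difference, using a made-up string with \r\n line endings instead of a real file:

```matlab
txt = sprintf('a=1\r\n\r\nb=2');     % made-up CRLF sample: line 2 is empty

byLF    = regexp(txt,'\n','split');  % splits on LF only, so the CR stays at the end of each line
byLines = splitlines(txt);           % R2016b+: treats \r\n as a single line break

hasCR    = any(byLF{1} == 13);       % is there a stray carriage return after 'a=1'?
isEmpty2 = isempty(byLines{2});      % does the empty line survive as an empty element?
```

Note that for a char vector splitlines returns a column cell array, while regexp with 'split' returns a row, so code that depends on the orientation may need a transpose.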
Rik
on 11 Apr 2019
Edited: Rik
on 11 Apr 2019
To make the regexp splitting more robust (which will be in my next version of readfile):
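CRLF=char([13 10]); % char, so it can be passed to regexp as the pattern
CRLF=CRLF([any(default==13) any(default==10)]); % keep only the line-ending characters that occur in the text
if isempty(CRLF),CRLF=char(10);end % no line breaks found: fall back to \n
default = regexp(default,CRLF,'split');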
splitlines will probably be faster, while the code I showed here is backwards compatible to R14 (v7.0, which was when regexp was expanded to support outkeys).
Edit:
I just noticed I had this line already in my function:
str(str==13)='';
So readfile already splits it correctly for \r\n files.
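For illustration, here is that idea applied to a made-up \r\n string. This is only a sketch of the strip-then-split step, not the actual readfile code:

```matlab
str = sprintf('a=1\r\n\r\nb=2\r\n');  % made-up CRLF sample that ends with a line break
str(str==13) = '';                    % drop every carriage return (char 13)
lines = regexp(str,'\n','split');     % a plain LF split is now enough
```

Two things to keep in mind: a line break at the very end of the text produces one extra empty cell at the end, and a file that uses only \r as its line ending would collapse into a single line with this approach.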
More Answers (1)
Bob Thompson
on 10 Apr 2019
Edited: Rik
on 10 Apr 2019
I'm going to guess that the extra lines are not consistent?
Generally, I would suggest reading the entire file in as one string, then splitting it at the new line characters. The exact coding may be a bit off from the below example, but it should put you on the right track.
default = fread(fid,'*char').'; % Read the whole file as one char row vector (textscan with '%s' would split on whitespace)
default = regexp(default,'\r?\n','split'); % Split into one cell per line; empty lines become empty cells
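Once the text is split into one cell per line, the two '='-separated columns from the question can be recovered without losing the line numbers. A rough sketch, assuming every non-empty line has the form key=value (the sample lines are made up):

```matlab
lines = {'alpha=1','','beta=2'};     % stand-in for the cell array produced by the split

% Tokens: everything before the first '=' and everything after it.
tok = regexp(lines,'^([^=]*)=(.*)$','tokens','once');

hasPair = ~cellfun('isempty',tok);   % false for empty lines and lines without '='
lineNo  = find(hasPair);             % original line numbers of the key=value lines
keys = cellfun(@(t) t{1}, tok(hasPair), 'UniformOutput', false);
vals = cellfun(@(t) t{2}, tok(hasPair), 'UniformOutput', false);
```

Here lineNo(k) is the line in the original file that keys{k} and vals{k} came from.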
3 Comments
Bob Thompson
on 10 Apr 2019
Yes, I do. Thank you for catching that, I was using repmat for other things recently.