
Limit to Textscan?

Ying on 8 May 2012
Hi all, I have been importing multiple data files (typically hundreds of files) quite successfully into MATLAB using the textscan function.
Recently, my raw file format has changed (due to a different data acquisition setup). Previously, I had one time column and 20 data columns, and all columns were of the same length. But now, each data column has its own time column (which does not line up with the other data), and the length of each data column is different from the others. I've made additions to my script so that it also reads in all the corresponding times for each data column, but I've now discovered that, for some reason, it doesn't read the whole file. It will read the file until about row 123, even though some columns go up to row 247, and some go up to 641. So I'm just curious whether this is a limitation of the textscan function, or whether the new code I added is at fault.
1 Comment
Oleg Komarov on 9 May 2012
Next time, please do not create additional answers, since it has become impossible to follow who answered what and to collect all the info you supplied. Please use comments and/or edit your original post.


Accepted Answer

Geoff on 9 May 2012
Thanks for clarifying what your data looks like.
I assume that comma immediately after the '4' is a mistake. You could probably do this with a regexp... Because each comma denotes a pair of values. I take it that if the value before the comma is missing then the value after is also missing.
Do you have a fixed number of columns? If so, are the commas always there?
If at least the second condition above is true, then this isn't so bad... You can read pairs of values using regexp:
lines = {'1, 2 3, 4 5, 6'
         '1, 2 3, 4 5, 6'
         '1, 2 3, 4 , '
         ' , 3, 4 , '};
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');
This extracts word-like strings with optional spaces and the obligatory comma.
What you end up with is one cell per row, and within that one cell per pairing. You can manipulate this data as you see fit, convert empty strings or non-numbers to NaN, etc...
I dunno, that's the kind of solution I come up with when I don't want to spend too much time thinking up more complicated clever stuff.
[EDIT]
The above regexp fails on the fourth line because there's no logic that says if you have the first value you must have the second (and vice versa)... So try this:
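% Either both values of a pair are present, or both are empty.
toks = regexp(lines, '\s*(\w+)\s*,\s*(\w+)|\s*()\s*,\s*()', 'tokens');
rows = cell(size(toks));
for r = 1:numel(toks)
    % Flatten the pairs of row r and convert them; empty strings become NaN.
    rows(r) = { str2double([toks{r}{:}]) };
end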
Now you have a cell with one row per line, containing a vector of doubles...
This won't work with other rubbish in your data like % signs (and note that \w matches only letters, digits and underscores, so decimal points and minus signs are not captured either), but you can either filter that out or allow for it in the regular expression.
And of course, if you know that all your rows are the same length (or force them to be after processing), you can convert the whole rows array to a matrix with cell2mat.
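To illustrate the last point, here is a minimal sketch of forcing the rows to a common length and then calling cell2mat. It assumes rows is a column cell array of numeric row vectors, as produced above, and that short rows should be padded on the right with NaN. The sample values and the names width and M are made up for the example.

```matlab
% Hypothetical sample: three parsed rows, one of them shorter than the rest.
rows = {[1 2 3 4 5 6]; [1 2 3 4]; [NaN NaN 3 4 NaN NaN]};

% Pad every row on the right with NaN up to the longest row.
width = max(cellfun(@numel, rows));
for r = 1:numel(rows)
    rows{r}(end+1:width) = NaN;   % no-op when the row is already full width
end

% All rows now have the same length, so cell2mat stacks them into a matrix.
M = cell2mat(rows);
```

Whether right-padding is appropriate depends on where the missing values actually sit in your lines, so check a few rows by hand first.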

More Answers (7)

Geoff on 8 May 2012
I doubt there is a limit for the tiny numbers you're talking about.
What I expect has happened is that textread encountered some text that did not fit the format and was not listed as a possible delimiter.
Check your data file near the last line that you think was successfully parsed.
3 Comments
Geoff on 9 May 2012
From your descriptions it's hard to envisage what your data looks like, and you haven't shown your textread() call. If you want your data in a matrix, then it has to be the width/height of your largest column and row number. If you want a variable width, you'll need to read into a cell array. I'd recommend using fgetl() with textread() on a per-line basis... Other functions worth checking out are sscanf(), regexp() or textscan().
Walter Roberson on 9 May 2012
textread() is not recommended; it will be removed from MATLAB.
textscan() is its replacement.



Walter Roberson on 9 May 2012
MATLAB does not provide any facilities that can deal with reading field-wise from blocks of text with an inconsistent number of fields, unless all of the fields have the same numeric format and everything is to be read as one continuous stream, ignoring line boundaries.
To read row-wise with inconsistent number of fields, one must read entire lines and parse them afterwards.
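A rough sketch of that two-step approach, assuming comma-delimited fields. The sample file contents, the temporary file name and the variable names (lines, parsed) are invented for illustration and are not from the original data.

```matlab
% Write a small ragged sample file (hypothetical data, for illustration only).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '11,12,13\n21,,23\n,,33,34\n');
fclose(fid);

% Step 1: read every line as text, without interpreting the fields.
fid = fopen(fname, 'r');
lines = {};
while true
    txt = fgetl(fid);
    if ~ischar(txt)          % fgetl returns -1 at end of file
        break
    end
    lines{end+1} = txt;      %#ok<AGROW>
end
fclose(fid);
delete(fname);

% Step 2: parse each line on its own; rows may have different lengths.
parsed = cell(size(lines));
for k = 1:numel(lines)
    fields = regexp(lines{k}, ',', 'split');
    parsed{k} = str2double(fields);   % empty fields become NaN
end
```

Keeping the result in a cell array avoids having to decide up front how wide the widest row is.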

Ken Atwell on 9 May 2012
That is an unusual file format. If I read you correctly, you have a file I would describe as "ragged down"... a consistent number of columns, but the number of rows per column is variable. Is that right? I'm assuming the columns are delimited with commas, tabs or such; something like (whitespace added):
11, 12, 13
21, , 23
, , 33
In this trivial example, textscan would stop processing at the first missing value (in the second row here). You can call textscan again with the same file handle and it will continue where it left off, but I imagine you will find it difficult to recover from the missing value.
Depending on the version of MATLAB you are using, I would try importing the file into MATLAB with the import tool... it may just do the right thing, and you can then generate a script from there to create a programmatic solution.
If that doesn't work out, another solution would be to read the file line-by-line, splitting on the delimiter (comma here). And, in this case, I want to convert from strings to doubles. Here is some code to import the data I've included here:
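f = fopen('input.dat');
A = [];
while ~feof(f)
    l = fgetl(f);                    % read one line, without the newline
    r = regexp(l, ',', 'split');     % split on the comma delimiter
    % Every line must yield the same number of fields for this assignment to work.
    A(end+1,:) = str2double(r);      % empty or non-numeric fields become NaN
end
fclose(f);
A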
Missing values are represented by NaNs in A.
3 Comments
iffi on 27 Dec 2012
f = fopen('input.dat');
A=[];
while ~feof(f)
l = fgetl(f);
r = regexp(l, ',', 'split');
A(end+1,:) = str2double(r);
end
fclose (f);
A
This code reads the file well, but I also have some data in this form, e.g. V567, V1528, ...
For all such entries this code also gives me NaN, in addition to the missing values.
Walter Roberson on 27 Dec 2012
It appears you are starting a new topic. Please create a new Question for this. You can refer to this existing topic as giving ideas.
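For what it's worth, one possible way to handle entries such as V567 from the comment above. This is a hedged sketch, assuming each such entry is a run of letters followed by a plain number and that only the numeric part is wanted. The sample fields values are invented for illustration.

```matlab
% Hypothetical fields from one split line: prefixed IDs, a missing value, a plain number.
fields = {'V567', 'V1528', '', '42'};

% Strip a leading run of letters, then convert; empty strings still become NaN.
stripped = regexprep(fields, '^[A-Za-z]+', '');
vals = str2double(stripped);
```

If the letter prefix carries meaning, keep the original fields cell array alongside vals rather than discarding it.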



Ying on 9 May 2012
Thanks for the responses. Ken, as for trying to import the file into MATLAB, I could not do it successfully, as I have multiple delimiters in my data. The data looks like the following:
1, 2 3, 4 5, 6
1, 2 3, 4 5, 6
1, 2 3, 4 ,
, 3, 4, ,
The data is weird in that commas separate the time and data column for one variable, and a space separates it from the next set of time/data. So in this example, columns 1, 3, and 5 are times, and 2, 4, and 6 are the respective data that the times correspond to. And each set ends at a different time. Right now my textscan always ends at the shortest set (5, 6 in this example). Is it possible to just change my delimiters so that it reads the whole file? Or should I try the line-by-line read option?
2 Comments
Walter Roberson on 9 May 2012
Are the columns fixed width? If they are not, there is logical difficulty in distinguishing between " 3" and "3 ".
Ying on 9 May 2012
I don't know. I do know that it reads everything fine up to the end of the shortest column, though.



Ying on 9 May 2012
That's correct, Geoff, the comma after the 4 is a typo. The number of columns is somewhat fixed. What I mean is that it's controllable: I can choose how many variables to track, but if I want more or fewer variables then I have to change the script to match. The commas are always there, between each time and the data value it belongs to.
Oh, and since you asked earlier, this is my textscan line:
datanew = textscan(fid,'%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f','Delimiter','\t%,','HeaderLines',2);
So as you can see, I have around 52 columns. Not the prettiest or most ideal way to do it, I know. I wanted to use import, but textscan seems to be the only way I've gotten it to work.
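As an aside, the long format string can be built rather than typed out. A small sketch, assuming 52 numeric columns (the "around 52" mentioned above; adjust nCols to the real count). The names nCols and fmt are illustrative only.

```matlab
nCols = 52;                      % assumed number of numeric columns
fmt = repmat('%f', 1, nCols);    % '%f%f...%f', one specifier per column
```

fmt can then be passed to textscan in place of the literal string, which makes it easier to change the number of tracked variables in one place.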
5 Comments
Ying on 9 May 2012
How would I account for header lines and column names?
Geoff on 10 May 2012
Read the first line and process it the same way. Are the headers separated by the same "comma-sometimes" strategy? You could use the same regexp code I gave you as long as a single header does not contain a space.
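A minimal sketch of reading the header lines before the data, assuming two header lines (matching the 'HeaderLines',2 setting above) and column names that contain no spaces or commas. The sample file contents and the names titleLine, nameLine, names and firstData are made up for illustration, and the name extraction uses a simple 'match' pattern rather than the token expression from the answer.

```matlab
% Write a small sample file with two header lines (hypothetical content).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'Run 42\nt1, v1 t2, v2\n1, 2 3, 4\n');
fclose(fid);

fid = fopen(fname, 'r');
titleLine = fgetl(fid);                         % first header line, kept as text
nameLine  = fgetl(fid);                         % second header line: column names
names = regexp(nameLine, '[^\s,]+', 'match');   % runs of non-space, non-comma characters
firstData = fgetl(fid);                         % data starts on the third line
fclose(fid);
delete(fname);
```

After the header lines have been consumed, the same file handle can be used for the line-by-line data loop.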



per isakson on 9 May 2012
Does the data block of the file have a format something like this?
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space , time_stamp, value
Is "space" only char(32)? There isn't a tab, char(9)? Does the "time_stamp" have a special format that can be distinguished from "value"? Do the columns have a fixed width, as in my example above?
If you know how many header lines there are, you can read them with fgetl or textscan.

Ying on 9 May 2012
I think I was able to make it work by reading in all values as strings instead of floating-point numbers, then making them all the same length, and then using str2num to convert the strings back to numbers. Now I just have to get it to work with the rest of the script.
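The actual script was not posted, so the following is only a guess at the general shape of that approach. It assumes each column has already been read as a cell array of strings, with columns of different lengths, and it uses str2double rather than str2num for the conversion. The sample cols data and the names nRows and data are hypothetical.

```matlab
% Hypothetical ragged columns of strings: the last two are shorter than the first two.
cols = {{'1'; '1'; '1'}, {'2'; '2'; '2'}, {'5'; '5'}, {'6'; '6'}};

% Allocate a NaN matrix as tall as the longest column, then fill column by column.
nRows = max(cellfun(@numel, cols));
data = NaN(nRows, numel(cols));
for c = 1:numel(cols)
    data(1:numel(cols{c}), c) = str2double(cols{c});
end
```

Rows beyond the end of a short column stay NaN, which keeps the time/data pairs aligned by row index.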
