How to use textscan to read my 2nd column and ignore the string or non numerical values?

Hello, I have a huge data file and I was wondering if anyone could help me use textscan to only read the 2nd column but also ignore the strings. The file has this sort of format and the data keeps going.
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01
energy
1.0000E-01 5.44426E-10 0.4254
1.2475E-01 1.71665E-10 0.8055
1.4950E-01 8.51003E-11 0.8817
1.7426E-01 2.09602E-10 0.6570
1.9901E-01 2.62823E-10 0.4473
2.2376E-01 3.18821E-11 0.7145
2.4851E-01 4.37107E-11 0.6539
2.7327E-01 1.34258E-10 0.6703
2.9802E-01 1.31857E-10 0.6663
3.2277E-01 4.53330E-11 0.9459
3.4752E-01 9.37144E-13 0.9914
3.7228E-01 2.99698E-10 0.9950
3.9703E-01 7.03990E-18 0.9950
4.2178E-01 5.24669E-16 0.9950
4.4653E-01 4.01338E-29 0.9950
4.7129E-01 0.00000E+00 0.0000
4.9604E-01 0.00000E+00 0.0000
5.2079E-01 0.00000E+00 0.0000
5.4554E-01 0.00000E+00 0.0000
5.7030E-01 0.00000E+00 0.0000
5.9505E-01 0.00000E+00 0.0000
6.1980E-01 9.12419E-29 0.9950
6.4455E-01 0.00000E+00 0.0000
6.6931E-01 2.43906E-25 0.9950
6.9406E-01 0.00000E+00 0.0000
7.1881E-01 0.00000E+00 0.0000
7.4356E-01 0.00000E+00 0.0000
7.6832E-01 1.86537E-12 0.9950
7.9307E-01 0.00000E+00 0.0000
8.1782E-01 2.11385E-11 0.9950
8.4257E-01 0.00000E+00 0.0000
8.6733E-01 0.00000E+00 0.0000
8.9208E-01 0.00000E+00 0.0000
9.1683E-01 0.00000E+00 0.0000
9.4158E-01 9.73682E-13 0.9950
9.6634E-01 0.00000E+00 0.0000
9.9109E-01 0.00000E+00 0.0000
1.0158E+00 0.00000E+00 0.0000
1.0406E+00 0.00000E+00 0.0000
1.0653E+00 4.94059E-40 0.9950
1.0901E+00 0.00000E+00 0.0000
1.1149E+00 0.00000E+00 0.0000
1.1396E+00 0.00000E+00 0.0000
1.1644E+00 0.00000E+00 0.0000
1.1891E+00 0.00000E+00 0.0000
1.2139E+00 0.00000E+00 0.0000
1.2386E+00 0.00000E+00 0.0000
1.2634E+00 0.00000E+00 0.0000
1.2881E+00 0.00000E+00 0.0000
1.3129E+00 0.00000E+00 0.0000
1.3376E+00 6.03842E-10 0.9950
1.3624E+00 0.00000E+00 0.0000
1.3871E+00 0.00000E+00 0.0000
1.4119E+00 0.00000E+00 0.0000
1.4366E+00 0.00000E+00 0.0000
1.4614E+00 0.00000E+00 0.0000
1.4861E+00 0.00000E+00 0.0000
1.5109E+00 0.00000E+00 0.0000
1.5356E+00 0.00000E+00 0.0000
1.5604E+00 0.00000E+00 0.0000
1.5851E+00 0.00000E+00 0.0000
1.6099E+00 0.00000E+00 0.0000
1.6347E+00 0.00000E+00 0.0000
1.6594E+00 0.00000E+00 0.0000
1.6842E+00 0.00000E+00 0.0000
1.7089E+00 0.00000E+00 0.0000
1.7337E+00 0.00000E+00 0.0000
1.7584E+00 0.00000E+00 0.0000
1.7832E+00 0.00000E+00 0.0000
1.8079E+00 0.00000E+00 0.0000
1.8327E+00 0.00000E+00 0.0000
1.8574E+00 0.00000E+00 0.0000
1.8822E+00 0.00000E+00 0.0000
1.9069E+00 0.00000E+00 0.0000
1.9317E+00 0.00000E+00 0.0000
1.9564E+00 0.00000E+00 0.0000
1.9812E+00 0.00000E+00 0.0000
2.0059E+00 0.00000E+00 0.0000
2.0307E+00 0.00000E+00 0.0000
2.0554E+00 0.00000E+00 0.0000
2.0802E+00 0.00000E+00 0.0000
2.1050E+00 0.00000E+00 0.0000
2.1297E+00 0.00000E+00 0.0000
2.1545E+00 0.00000E+00 0.0000
2.1792E+00 0.00000E+00 0.0000
2.2040E+00 0.00000E+00 0.0000
2.2287E+00 0.00000E+00 0.0000
2.2535E+00 0.00000E+00 0.0000
2.2782E+00 0.00000E+00 0.0000
2.3030E+00 0.00000E+00 0.0000
2.3277E+00 0.00000E+00 0.0000
2.3525E+00 0.00000E+00 0.0000
2.3772E+00 0.00000E+00 0.0000
2.4020E+00 0.00000E+00 0.0000
2.4267E+00 0.00000E+00 0.0000
2.4515E+00 0.00000E+00 0.0000
2.4762E+00 0.00000E+00 0.0000
2.5010E+00 0.00000E+00 0.0000
2.5257E+00 0.00000E+00 0.0000
2.5505E+00 0.00000E+00 0.0000
2.5752E+00 0.00000E+00 0.0000
2.6000E+00 0.00000E+00 0.0000
total 2.58911E-09 0.3011
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01
uncollided photon flux
energy
1.0000E-01 7.06645E-15 0.9950
1.2475E-01 0.00000E+00 0.0000
1.4950E-01 0.00000E+00 0.0000
4 Comments
John Vargas on 22 Aug 2018
I am sorry, in this file, I had already removed the first two lines of strings which are:
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01 energy


Accepted Answer

Star Strider on 22 Aug 2018
Your file is not easy to import. I’ve been working on this for a while.
Try this:
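fidi = fopen('M1output1.txt','rt');
D = {};                                 % one cell per block of numeric rows
k1 = 1;
while ~feof(fidi)
    % '%*f%f%*f' skips columns 1 and 3 and keeps column 2.
    % The two-element CommentStyle ignores text from ' total' through ' energy'.
    C = textscan(fidi, '%*f%f%*f', 'HeaderLines',2, 'CollectOutput',true, 'CommentStyle',{' total', ' energy'});
    M = cell2mat(C);
    if isempty(M)                       % nothing was read: stop
        break
    end
    D{k1,1} = M;
    fseek(fidi, 0, 0);                  % seek 0 bytes from the current position before the next read
    k1 = k1 + 1;
end
fclose(fidi);
Column2 = cell2mat(D);                  % concatenate the blocks into a single column vector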
The textscan format descriptor reads only Column 2, ignoring the other two columns.
The loop is necessary because several text lines interrupt the ordinary file reading process. Every time textscan encounters text where it expects a numeric value, it stops, and it is necessary to use fseek to restart it and read the next block of numbers. Your file also does not have a valid ‘end-of-file’ indicator, so to keep the loop from becoming infinite, it is necessary to test whether the input is empty. If it is, the loop breaks and file reading stops. Since this does not occur until all the data have been read, no data are lost.
The cell2mat call at the end concatenates the cells in ‘D’ into a single vector.
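For comparison, here is a minimal sketch of a different technique (not the textscan loop above): read the file line by line with fgetl and keep the second value of every line from which sscanf reads three numbers. It is likely slower on a huge file, but it does not depend on where the text lines fall. The sample lines are adapted from the question, and the temporary file and the variable names are assumptions made so the sketch runs on its own.

```matlab
% Write a small sample file (lines adapted from the question) so the sketch is self-contained.
sampleLines = { ...
    'detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01', ...
    ' energy', ...
    ' 1.0000E-01 5.44426E-10 0.4254', ...
    ' 1.2475E-01 1.71665E-10 0.8055', ...
    ' total 2.58911E-09 0.3011', ...
    'detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01', ...
    ' uncollided photon flux', ...
    ' energy', ...
    ' 1.0000E-01 7.06645E-15 0.9950'};
fname = [tempname '.txt'];              % hypothetical temporary file
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sampleLines{:});
fclose(fid);

% Read line by line and keep column 2 of the purely numeric rows.
fid = fopen(fname, 'rt');
col2 = [];
tline = fgetl(fid);
while ischar(tline)                     % fgetl returns -1 (not char) at end of file
    [vals, n] = sscanf(tline, '%f');    % stops at the first non-numeric token
    if n == 3                           % header and 'total' lines yield 0 values
        col2(end+1, 1) = vals(2);
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

With the sample above, col2 holds the three second-column values of the numeric rows, and the 'total' line is skipped because it starts with text.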
3 Comments
Star Strider on 22 Aug 2018
As always, my pleasure.
Yes. The asterisk between the ‘%’ and the type descriptor (here ‘f’) tells textscan to ignore that column. So to read all columns:
'%f%f%f'
to read only the first column and ignore the last two:
'%f%*f%*f'
or in a more readable (and still valid) form:
'%f %*f %*f'
and so on for any other combination of columns you want to import or exclude.
Walter Roberson on 22 Aug 2018
Star Strider used a format of
'%*f%f%*f'
that says to skip the first number, read and record the second, and skip the third number.
So use %*f for any column you want to skip, and %f for any column you want to read in.
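To check the skip syntax without touching the data file, here is a minimal sketch that applies the formats to two rows copied from the question, passed to textscan as text rather than as a file identifier. The variable names are illustrative only.

```matlab
% Two sample rows from the question, each terminated by a newline.
txt = sprintf('%s\n', '1.0000E-01 5.44426E-10 0.4254', ...
                      '1.2475E-01 1.71665E-10 0.8055');

% Keep only the second column.
C = textscan(txt, '%*f %f %*f');
col2 = C{1};

% Keep the first and third columns, skip the second.
C13 = textscan(txt, '%f %*f %f');
col1 = C13{1};
col3 = C13{2};
```

textscan returns one cell per field that is not skipped, so C has a single cell here and C13 has two.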


More Answers (4)

Jeremy Hughes on 22 Aug 2018
Edited: Jeremy Hughes on 22 Aug 2018
I suggest trying
opts = detectImportOptions(filename)
T = readtable(filename,opts)
Also, if you want to ignore those lines:
opts = detectImportOptions(filename)
opts.ImportErrorRule = 'omitrow'
T = readtable(filename,opts)

Yuvaraj Venkataswamy on 22 Aug 2018

Walter Roberson on 22 Aug 2018
You can use the CommentStyle option of textscan, specifying {'detector', 'energy'}. Everything from 'detector' through 'energy' is then treated as a comment, so the x, y, z coordinates on those lines are ignored.

jonas on 22 Aug 2018
Here's another approach with regexprep and textscan:
%% Read the file and remove the intermediate headers
str = fileread('File.txt');
str = regexprep(str, '(total|detector).*?energy', '');
%% Read the 2nd column
num = textscan(str, '%*f%f%*f');
out = cell2mat(num)
