Fixed-width data import with textread()

Nikolay Rodionov on 21 Dec 2013
Edited: dpb on 21 Dec 2013
Hi guys,
Thanks for your help in advance. I'm having issues importing some data using textread(). I am trying to use it to import a fixed-width txt dataset, but the function is blending some data columns when there is no whitespace between string and numerical datatypes.
Specifically, I am trying to import a data extract from a gromacs .gro file. You can find information about the data structure here: http://manual.gromacs.org/online/gro.html
My code:
[resnum resname atomname atomnr x y z a b c] ...
= textread('conf-mod.gro','%5d%5s%-5s%5d%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f');
It works for data lines like:
1ALA CA 3 56.249 52.119 83.467 0.0000 0.0000 0.0000
But it blends 'atomname' and 'atomnr' data into the 'atomname' column for lines like this:
14195ASP OD119731 55.954 54.890 95.494 0.0000 0.0000 0.0000
Note: 'atomname' should equal OD1 here (and 'atomnr' 19731), and 'resname' comes out fine, strangely enough. I don't understand why this is happening, because I've clearly specified the fixed-width format of the dataset. I've tried converting %-5s to %-5c, but it did not help.
Any suggestions?

Accepted Answer

dpb on 21 Dec 2013
Edited: dpb on 21 Dec 2013
Unfortunately, you can't do it with standard MATLAB I/O format strings; they just don't honor a fixed field width unless adjacent fields are separated by at least one blank. Sad and pathetic and imo utterly unacceptable, but that's just the way it is.
If you can't write the data files in another format that has delimiters, you've one of several choices...
a) read the whole file as character array and do character substitution to insert delimiters and then parse the modified array (textscan, say),
b) read a line at a time and parse individual fields w/ sscanf or the like. Something like
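l=fgetl(fid); % read a line (fid comes from an earlier fopen)
resnum=[resnum;sscanf(l(1:5),'%d')]; % columns 1-5: residue number
resname=[resname;{strtrim(l(6:10))}]; % columns 6-10: residue name; cell array, since names differ in length
...etc., ...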
When you get past the character data you can then use an array and parse the six numeric fields together. Or, lastly,
c) see if regular expressions will actually honor a field width; I'm not conversant enough with them to know off the top of my head...
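On option (c): as far as I can tell, regexp does respect widths if each field is captured with a counted quantifier such as (.{5}), because '.' matches blanks as well as other characters. A minimal sketch, assuming the column widths from the .gro description (5,5,5,5 followed by six 8-wide fields); the sample line is built here with sprintf purely for illustration and is not read from the real file:

```matlab
% Hypothetical sample line laid out with the .gro column widths
l = sprintf('%5d%-5s%5s%5d%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f', ...
            14195, 'ASP', 'OD1', 19731, 55.954, 54.890, 95.494, 0, 0, 0);

% One capture group per field; the counted quantifier fixes the width
pat = '^(.{5})(.{5})(.{5})(.{5})(.{8})(.{8})(.{8})(.{8})(.{8})(.{8})';
tok = regexp(l, pat, 'tokens', 'once');   % 1x10 cell array of char vectors

resnum   = str2double(tok{1});            % residue number
resname  = strtrim(tok{2});               % residue name without padding
atomname = strtrim(tok{3});               % atom name without padding
atomnr   = str2double(tok{4});            % atom number
vals     = str2double(tok(5:10));         % [x y z a b c] as a 1x6 row
```

Because the split is purely positional, it does not matter whether 'OD1' and '19731' touch. Lines shorter than 68 characters (for example files written without velocities) would not match this pattern and would need a shorter one.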
Finally, complain to TMW through official support that they need to find a solution for fixed-width input parsing... although they seem not to want to admit it, such files do exist and aren't going away regardless, and it's absurd one can't read them easily in MATLAB.
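For completeness, option (b) might look roughly like the following when carried through a whole file. This is only a sketch under some assumptions: the file follows the usual .gro layout (a title line, an atom-count line, then one line per atom), every atom line carries velocities (68 characters), and the numeric fields after column 20 are separated by at least one blank so sscanf can read them together. The temporary file and its two atoms are made up so the example runs on its own.

```matlab
% Write a tiny two-atom .gro-style file (hypothetical demo data)
fname = [tempname '.gro'];
fid = fopen(fname, 'w');
fprintf(fid, 'demo\n%5d\n', 2);                     % title and atom count
fmt = '%5d%-5s%5s%5d%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f\n';
fprintf(fid, fmt, 1, 'ALA', 'CA', 3, 56.249, 52.119, 83.467, 0, 0, 0);
fprintf(fid, fmt, 14195, 'ASP', 'OD1', 19731, 55.954, 54.890, 95.494, 0, 0, 0);
fprintf(fid, '%10.5f%10.5f%10.5f\n', 10, 10, 10);   % box line (not parsed)
fclose(fid);

% Read it back, slicing each atom line by column position
fid = fopen(fname, 'r');
fgetl(fid);                                         % skip the title line
natoms   = sscanf(fgetl(fid), '%d');                % atom count
resnum   = zeros(natoms, 1);
atomnr   = zeros(natoms, 1);
resname  = cell(natoms, 1);
atomname = cell(natoms, 1);
xyzv     = zeros(natoms, 6);                        % columns: x y z a b c
for k = 1:natoms
    l = fgetl(fid);
    resnum(k)   = sscanf(l(1:5), '%d');
    resname{k}  = strtrim(l(6:10));
    atomname{k} = strtrim(l(11:15));
    atomnr(k)   = sscanf(l(16:20), '%d');
    xyzv(k, :)  = sscanf(l(21:68), '%f').';         % six floats in one call
end
fclose(fid);
delete(fname);                                      % remove the demo file
```

Preallocating from the atom count avoids growing the arrays on every line, which matters for files with tens of thousands of atoms. If coordinates can fill all 8 columns, slice each numeric field separately instead of reading l(21:68) in one sscanf call.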
