Extract data from a non-rectangular text file, efficiently
1 view (last 30 days)
In spite of going through Matlab help and several attempts, I still do not get how to use "textscan" or other relevant functions to read from a simple non-rectangular file like the attached sample.
All I want is to efficiently extract two integers from the first line and two cell arrays (or char arrays) each containing the strings respectively to the left and two the right of | in the subsequent lines. Here is the code I tried, but it does not work:
fid = fopen('dataset.txt');
coeffs = textscan(fid, '%s%s', 1, 'Delimiter', ' ');
sequences = textscan(fid, '%s%s', 'Delimiter', '|');
Using "fileread" and "regexp" also seem to be options, but "regexp" seems slower than "textscan".
BTW if anybody can point me to a link where I can find how to extract data from files using Matlab, with examples providing ample coverage of use cases, that would be great.
dpb on 5 Mar 2017
Edited: dpb on 5 Mar 2017
Close, but the first record is numeric, not string...
As for the lament, textscan has a fair number of examples of different types of files to study and the forum here is replete with special cases. It's not possible to have examples that cover all possible issues; one needs must look for the general principles underlying the examples and consider how they relate to the file at hand.
For the most part, the issue is simply building a format string that matches the record structure and then applying that to the file in the proper sequence for the proper number of times. The most difficult issue in general does have to do with processing undelimited strings or fixed-width files as the C format rules on field width are based on the concept of fields separated by white space as opposed to actual firm character column counts; hence when one use '%s' on a field that isn't really wanted to be treated as such, havoc can often ensue...