Problem only reading in select data

Hello all,
I am trying to read this data file into MATLAB, but I am having trouble grabbing only the data I want. The file is formatted as follows:
*Sale Item Price Profit
1200 00213 12.21 3.26*
Date Salesperson Cost Sold At Net Money
1/10/11 12 13.45 16.45 3
1/14/11 14 3.98 3.48 -0.5
1/24/11 03 4.60 14.60 10
*Sale Item Price Profit
65 01452 13.78 6.12*
Date Salesperson Cost Sold At Net Money
1/04/11 11 20.10 40.10 20
1/06/11 11 20.11 16.11 4
*Sale Item Price Profit*
...
And so on.
I only want MATLAB to read in the data within the asterisks. Any thoughts on how to do this?
Thanks

4 Comments

Just to clarify: the asterisks are actually in the file?
Zach on 6 Apr 2011
The asterisks are not in the file. I added them only to show exactly which pieces of data I need to read in.
(To clarify the clarification: or are you looking to read data in any block with a certain headerline? ie "Sale Item Price Profit")
Zach on 6 Apr 2011
If I'm following you correctly, the answer is yes: I want to read only the data associated with Sale, Item, Price, and Profit.


Accepted Answer

On the off-chance Walter's approach doesn't work (e.g., there are more than two block formats in the file), here's a more brute-force approach:
fid = fopen('asterisk.txt','rt');
data = [];
while ~feof(fid)
    thisline = fgetl(fid);              % read the next line as text
    if strncmpi('sale',thisline,4)      % header of a block we want?
        % textscan reads numeric rows until it hits the next header line
        thisdata = textscan(fid,'%f %f %f %f','collectoutput',true);
        data = [data;thisdata{1}];      % append this block's rows
    end
end
fclose(fid);
You can modify the if statement to match whatever specific pattern you want.

8 Comments

Nasty. That relies on the property of textscan() that it stops reading when the next available data does not match the first format element. With the information given, specifying that you only wanted to repeat the format once would avoid that problem, but then you might as well use fscanf() instead of textscan().
I don't understand the objection. What do you mean by "specifying that you only wanted to repeat the format once"? I agree that you could parse line-by-line, but I'm assuming:
1) you want to read all blocks that start with a headerline "Sale Item Price Profit"
2) you don't know a priori how many lines are in each of those blocks
3) every block in the file starts with a headerline
4) as I said above, there are multiple block formats, not just the two shown
Under those assumptions, I don't see why you shouldn't read each "Sale Item Price Profit" block with textscan, knowing that it will stop at the next headerline.
Zach on 6 Apr 2011
Well, I also learned that MATLAB 6.5 doesn't have textscan as a built-in function.
Matt, we weren't shown any examples of there being more than one line of data in a Sale block, so to match what was shown, a textscan() repeat count of 1 could be used without depending upon textscan to "back up" when it figures out something is unparsable.
But that doesn't help Zach, who doesn't have textscan() and thus should probably be using fscanf().
Zach on 6 Apr 2011
Is it even possible to parse data with varying blocks using fscanf? Also, I know that to ignore a field you put an asterisk in the format specifier, but will that be able to handle the header string we were matching earlier?
In Matt's code example, replace the lines
thisdata = textscan(fid,'%f %f %f %f','collectoutput',true);
data = [data;thisdata{1}];
with
thisdata = fscanf(fid, '%f%f%f%f');
data = [data;thisdata];
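Note that fscanf without a size argument returns a single column vector, so appending it directly stacks the four values vertically instead of producing one row per block. A minimal sketch of one way around that, assuming every "Sale" data line holds exactly four numbers: pass the size [4 Inf] and transpose. The temporary file, its contents, and the variable name block are made up for illustration, and the ischar check is only a defensive guard for the case where fgetl returns -1 at end of file.

```matlab
% Write a small sample file in the layout from the question (illustrative data).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Sale Item Price Profit\n');
fprintf(fid, '1200 00213 12.21 3.26\n');
fprintf(fid, 'Date Salesperson Cost Sold At Net Money\n');
fprintf(fid, '1/10/11 12 13.45 16.45 3\n');
fprintf(fid, 'Sale Item Price Profit\n');
fprintf(fid, '65 01452 13.78 6.12\n');
fprintf(fid, 'Date Salesperson Cost Sold At Net Money\n');
fprintf(fid, '1/04/11 11 20.10 40.10 20\n');
fclose(fid);

% Read only the numeric rows that follow a "Sale ..." header line.
fid = fopen(fname, 'rt');
data = [];
while ~feof(fid)
    thisline = fgetl(fid);
    if ischar(thisline) && strncmpi('sale', thisline, 4)
        % fscanf fills a 4-by-N matrix column by column and stops at the
        % first text it cannot parse as a number (the next header line).
        block = fscanf(fid, '%f', [4 Inf]);
        data = [data; block.'];          % transpose so each row is one record
    end
end
fclose(fid);
delete(fname);
disp(data)
```

Because every field is read with %f, an item code such as 00213 comes back as the number 213. If the leading zeros matter, that column would have to be read as text instead.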
Zach on 6 Apr 2011
Thank you all for your help. If it isn't too much trouble, I have one final question for my own understanding: what exactly does the thisline portion do, and what does the 4 represent in the strncmpi call?
Walter, that makes sense. Thanks for the non-textscan version.
Zach, fgetl reads a single line of text. Then strncmpi compares the first 4 characters of that string (that's what the 4 does) with the string 'sale', ignoring case. You can adapt this if, for example, you had other blocks that also started with "sale" (but then had something else after).
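To make the role of the 4 concrete, here is a small sketch of how the comparison behaves on the two header lines from the question. The "Sale Total Tax" header is hypothetical and only there to show how a longer prefix narrows the match.

```matlab
header = 'Sale Item Price Profit';
other  = 'Date Salesperson Cost Sold At Net Money';

% Only the first 4 characters are compared, and case is ignored.
isSaleBlock   = strncmpi('sale', header, 4);   % 'sale' vs 'Sale'
isOtherBlock  = strncmpi('sale', other, 4);    % 'sale' vs 'Date'

% strncmp is the case-sensitive variant, so 'sale' does not match 'Sale'.
caseSensitive = strncmp('sale', header, 4);

% A longer prefix tells apart headers that both start with "Sale".
isSaleItem  = strncmpi('sale item', header, 9);
isSaleTotal = strncmpi('sale item', 'Sale Total Tax', 9);  % hypothetical header
```

Here isSaleBlock and isSaleItem come out true, while isOtherBlock, caseSensitive, and isSaleTotal come out false.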


More Answers (1)

Walter Roberson on 6 Apr 2011

1 vote

textread() with 'CommentStyle', {'Date', 'Profit'}

5 Comments

Grah! Scooped by Walter Quickdraw Roberson while I was fiddling about with clarifications. Anyway, yes:
fid = fopen('asterisk.txt','rt');
data = textscan(fid,'%f %f %f %f','CommentStyle', {'Date', 'Profit'},'headerlines',1);
fclose(fid);
Zach on 6 Apr 2011
I just tried applying this solution and unfortunately got an error telling me that CommentStyle must be a string. I am confused because I thought that is what {'Date','Profit'} was.
Can you cut/paste the exact code you used?
Zach: Which version of MATLAB are you using? Using a cell array of a pair of strings has been supported since at least 2007b, but there was probably a time when it wasn't supported.
Matt: You snooze, you loze! ;-)
Zach on 6 Apr 2011
Sorry, I went out to lunch. I am using MATLAB 6.5, so it probably wasn't supported in this version. I will try to use Matt's code instead.
