Get every nth row of a tall array

I have a tall array and would like to collect every 26th row of one variable into an array. I tried:
U = tall(udata);
hhws = [];
udata.ReadSize = 26*500; % data is in 26 row chunks, so sizing so below works
while hasdata(udata)
U = read(udata);
hhws = [hhws;U.Var13(14:26:end)]; % want every 26th row starting with the 14th row
end
This produced the error:
Error using matlab.io.datastore.TabularTextDatastore/readData (line 78)
Unable to parse a "Numeric" field when reading row 10765, field 1.
Actual Text: "******** 7.909"
Expected: A number or literal "NaN", "Inf". (possibly signed, case insensitive)
Error in matlab.io.datastore.TabularDatastore/read (line 174)
[t, info] = ds.readData();
Caused by:
Reading the variable name 'Var1' using format '%f' from file: '<file path and file name>' starting at offset 1011702139.
Seems like maybe there's a problem with how I'm reading the file in? Is the method above viable assuming I get through this error? Thanks!
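On the viability question: a per-chunk index like `14:26:end` only stays aligned if every call to `read` returns a multiple of 26 rows. A minimal sketch of an alternative that tracks a global row counter is below. It simulates the chunks with an in-memory vector instead of a datastore; `data`, `chunkSizes`, `firstRow` and `period` are illustrative names, not from the original post.

```matlab
% Sketch: pick every 26th row starting at row 14, independent of chunk size.
% "data" and "chunkSizes" stand in for what read(udata) would return (assumption).
period   = 26;            % rows per block
firstRow = 14;            % first row wanted
data = (1:200)';          % stand-in for one column of the file
chunkSizes = [50 33 70 47];   % deliberately not multiples of 26

picked = [];
offset = 0;               % number of rows consumed so far
for n = chunkSizes
    chunk = data(offset + (1:n));        % with a datastore: the column from read()
    globalRow = offset + (1:n)';         % row numbers relative to the whole file
    keep = mod(globalRow - firstRow, period) == 0;
    picked = [picked; chunk(keep)];      %#ok<AGROW>
    offset = offset + n;                 % with a datastore: offset + height(U)
end
```

With a real datastore, `n` would be the height of the table returned by each `read`, so the selection no longer depends on `ReadSize` being honoured exactly.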

Accepted Answer

dpb on 18 Aug 2022
Edited: dpb on 20 Aug 2022
Actual Text: "******** 7.909"
The problem is in the data file itself: a numeric field contains the overflow indicator "********", and a formatted read fails because it cannot convert that text to a numeric value.
You would need to add
'TreatAsMissing',{'********',''}
to the datastore when you create it.
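As a rough illustration of where that option goes, the sketch below writes a small hypothetical file containing one overflow field and reads it back through a datastore. The file contents, the explicit `'Delimiter'` and `'TextscanFormats'` settings, and the variable names are assumptions made for the example. It passes the single character vector `'********'`, which is the form that ended up working later in this thread.

```matlab
% Sketch: treat the overflow marker as missing when creating the datastore.
% The sample file below is made up for illustration (assumption).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '1.0 2.0', '******** 7.909', '3.0 4.0');
fclose(fid);

ds = tabularTextDatastore(fname, ...
    'Delimiter', ' ', ...
    'ReadVariableNames', false, ...
    'TextscanFormats', {'%f', '%f'}, ...   % force numeric columns
    'TreatAsMissing', '********');         % overflow marker becomes NaN

T = read(ds);                  % Var1 should hold NaN where the asterisks were
nOverflow = sum(isnan(T.Var1));
delete(fname);
```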
I've not really used the datastore much and didn't see an equivalent there, but with detectImportOptions and the resulting text import options object there's also an 'ImportErrorRule' parameter that can be used to substitute a 'FillValue'. In that case the fill value could be set to Inf instead of NaN, to identify the specific instances as overflow and leave the missing values just as empty. It seems an oversight, unless I just missed it in the doc, but I surely didn't find it; the options available for the datastore aren't as extensive, it seems.
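A minimal sketch of that import-options route, assuming a small made-up file and that the data fit in memory for `readtable`. The file contents and the explicit `setvartype` call are assumptions for the example. Note that `FillValue` is also what the default `MissingRule` of `'fill'` substitutes, so separating overflow from genuinely empty fields may need further adjustment.

```matlab
% Sketch: flag overflow fields with Inf via import options.
% The sample file below is made up for illustration (assumption).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '1.0 2.0', '******** 7.909', '3.0 4.0', '5.0 6.0');
fclose(fid);

opts = detectImportOptions(fname, 'FileType', 'text', ...
    'Delimiter', ' ', 'ReadVariableNames', false);
opts = setvartype(opts, 'double');             % read every column as numeric
opts = setvaropts(opts, 1, 'FillValue', Inf);  % value used when conversion fails
opts.ImportErrorRule = 'fill';                 % substitute FillValue on errors

T = readtable(fname, opts);
overflowRows = find(isinf(T{:, 1}));           % rows where the asterisks were
delete(fname);
```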

4 Comments

Thanks! That appears to have fixed the error, but I'm not convinced that the code is doing what I want yet. I'm expecting 'hhws' to be 43,200,000 long minus any rows with errors, but I'm only getting 191,972 elements in it. I doubt I can upload a file this big (I can't even open it). In principle, should this code work?
Actually, I suspect it's more complicated...
If I only add the recommended 'TreatAsMissing' to the 'tabularTextDatastore' command, I get this error: Cannot interpret data in the file '<file>'. Found 2 variable names but 25 data columns. You may need to specify a different format, delimiter, or number of header lines.
This suggests that 'TreatAsMissing' is changing how it reads in some of the header lines and I need a different number of headerlines, I think. I've tried a bunch of different numbers. Most of the rest produce this error at the read function: The value for "TreatAsEmpty" must be non-empty character vectors or cell arrays of character vectors.
So what does this mean?
I attached a file that's similar to the one I'm working with. The main difference in format is that mine has 25 columns of data in 25-row "chunks", while the sample has 15 of each. The header lines should be the same. Each chunk of data starts with a line that has only two values, and I want the one in the second column from each chunk. My latest code is below:
udata = tabularTextDatastore('Path\Sample.txt','FileExtensions','.txt','NumHeaderLines',13,'TreatAsMissing',{'********',''});
hhws = [];
time = [];
count = [];
udata.ReadSize = 20000;
while hasdata(udata)
Ut = read(udata);
ts = isnan(Ut.Var3); % the blank entries are read in as NaN, so I'm using those to find this line in each chunk
time = [time;Ut.Var1(ts)];
hhws = [hhws;Ut.Var2(ts)];
count(end+1) = length(Ut.Var1);
end
Thanks for your continued help!
Got it to work! Just had to change 'TreatAsMissing',{'********',''} to just 'TreatAsMissing','********', though I don't understand why that made the difference.
dpb on 22 Aug 2022
That does seem peculiar; the empty record is the default, and it's supposed to accept either.
That might be worth a support question to TMW, to ask whether that is the expected result.

More Answers (0)

Release: R2021b
Asked: 18 Aug 2022
Commented: dpb on 22 Aug 2022
