Get every nth row of a tall array

5 visualizzazioni (ultimi 30 giorni)
Dan Houck
Dan Houck il 18 Ago 2022
Commentato: dpb il 22 Ago 2022
I have a tall array and would like to collect every 26th row of one variable into an array. I tried:
U = tall(udata);
hhws = [];
udata.ReadSize = 26*500; % data is in 26 row chunks, so sizing so below works
while hasdata(udata)
U = read(udata);
hhws = [hhws;U.Var13(14:26:end)]; % want every 26th row starting with the 14th row
end
This produced the error:
Error using matlab.io.datastore.TabularTextDatastore/readData (line 78)
Unable to parse a "Numeric" field when reading row 10765, field 1.
Actual Text: "******** 7.909"
Expected: A number or literal "NaN", "Inf". (possibly signed, case insensitive)
Error in matlab.io.datastore.TabularDatastore/read (line 174)
[t, info] = ds.readData();
Caused by:
Reading the variable name 'Var1' using format '%f' from file: '<file path and file name>' starting at offset 1011702139.
Seems like maybe there's a problem with how I'm reading the file in? Is the method above viable assuming I get through this error? Thanks!

Risposta accettata

dpb
dpb il 18 Ago 2022
Modificato: dpb il 20 Ago 2022
Actual Text: "******** 7.909"
The problem is in the data file itself -- there's an oveflow field indicator of "*" in a numeric field that fails because can't be converted to a numeric value by a formatted read.
You would need to add
'TreatAsMissing',{'********',''}
to the datastore when create it.
I've not really used the datastore much; I didn't see it there, but with detectImportOptions and the resulting text import object, there's also an 'ImportErrorRule' parameter that can be used to substitute a 'FillValue' which in that case could be made to return inf instead of nan to identify the specific instances as being the overflow and leave the missing just as empty. Seems an oversight unless I just missed it in the doc, but surely didn't find it; the options available aren't as extensive for the datastore, it seems.
  4 Commenti
Dan Houck
Dan Houck il 22 Ago 2022
Got it to work! Just had to change 'TreatAsMissing',{'********',''} to just 'TreatAsMissing','********', though I don't understand why that made the difference.
dpb
dpb il 22 Ago 2022
That does seem peculiar; the empty record is default; it's supposed to use either.
That might be worth a support Q? to TMW to ask if that is an expected result.

Accedi per commentare.

Più risposte (0)

Categorie

Scopri di più su Large Files and Big Data in Help Center e File Exchange

Prodotti


Release

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by