Unpredictable behavior of importdata

Question


I have a data file consisting of 5 header lines:

% Model:              Test
% Version:            Software
% Date:               May 31 2022, 17:12
% Table:              Table 2 - Global Evaluation 5
% Time (s)            gesamte Druckänderung im Wasser (Pa)

followed by data lines, each consisting of two numbers separated by spaces, like this:

                      -2.5760650371409015E-5
1                      -0.015952065077219565
2                      -0.004477893850765524
30000000000000004      -0.011908521809613106
4                      -0.0073951747614778305
.....

If I use the command

A = importdata(filename);

I usually automatically get a struct A with the fields A.data and A.textdata.

But in some cases the import just creates a plain variable A (no A.data), containing the numbers and ignoring the header lines.

importdata always works correctly, though, if I use the full syntax with the header-line count set to 5:

A = importdata(filename,' ',5);

What I do not understand is:

Why does importdata usually create A.data and A.textdata automatically, but for some import files with an identical structure put only the numbers into the variable A?

Of course, that leads to errors when importing a number of data files in a loop: after 16 files are imported correctly, the import of data file number 17 suddenly puts the data into a variable A instead of A.data.

12 Comments

dpb on 31 May 2022

Well, the answer is to always give importdata the help it needs to guarantee the results you want.

When you leave things to chance, you have to live with the consequences; that's the corollary of using the free-format form of the function, which lets it try to determine the file structure on its own.

There is something different between the files; we obviously cannot tell what that might be without a sample file to compare, but it's bound to be there.

I generally recommend against using importdata for anything except interactive use at the command line, for just such reasons. Production code needs to be built more robustly to handle the issues that (almost) inevitably arise when input files are created: somebody did something slightly differently, or opened and changed something, or there's a missing value, or ... The possibilities for what caused the auto-detect routine to parse the files differently are almost endless.

I've fallen into the habit of using detectImportOptions almost routinely in production code, and using the import options object it creates with one of the newer input routines, precisely to have the control needed to ensure the expected behavior.
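A minimal sketch of that pattern, assuming R2019a or later. It first writes a hypothetical sample file in the layout from the question (header text simplified to ASCII, time stamps 0 to 4 s assumed, pressures copied from the excerpt), then tells detectImportOptions the header-line count explicitly instead of letting it guess, and reads with readmatrix:

```matlab
% Hypothetical sample file mimicking the question's layout:
% 5 '%'-prefixed header lines, then whitespace-separated time/pressure pairs.
hdr = {'% Model:              Test'
       '% Version:            Software'
       '% Date:               May 31 2022, 17:12'
       '% Table:              Table 2 - Global Evaluation 5'
       '% Time (s)            Pressure (Pa)'};
t = (0:4)';                                   % assumed time stamps (s)
p = [-2.5760650371409015e-5; -0.015952065077219565; -0.004477893850765524; ...
     -0.011908521809613106; -0.0073951747614778305];
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '%s\n', hdr{:});                 % '%s' writes the '%' signs literally
fprintf(fid, '%g      %.17g\n', [t p]');      % one "time   pressure" pair per line
fclose(fid);

% State the header-line count explicitly instead of relying on auto-detection.
opts = detectImportOptions(fn, 'NumHeaderLines', 5);
A = readmatrix(fn, opts);                     % always a plain double array, never a struct
delete(fn);
```

The result class no longer depends on what the detector happens to conclude about a particular file.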

dpb on 31 May 2022

It can be, but it doesn't have to be. Recognize that you don't need a different import object for every file, just a model of the given/expected form of the specific structure to be read. Then, as long as the files follow the format contract, the read routine used with that options object will have the bread crumbs it needs to ensure a consistent interpretation.
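As a sketch of "one model, many files": the options object below is built once by hand with delimitedTextImportOptions (a swapped-in alternative to detecting it from a file) and reused for every file in the loop. The three files and their contents are hypothetical, and the layout (5 header lines, two whitespace-separated numeric columns) is the assumed format contract:

```matlab
% Three hypothetical files that all follow the same format contract.
nFiles = 3;
files = cell(nFiles, 1);
for k = 1:nFiles
    files{k} = [tempname '.txt'];
    fid = fopen(files{k}, 'w');
    fprintf(fid, '%% header line %d\n', 1:5);      % 5 dummy header lines
    fprintf(fid, '%g   %g\n', [(0:3); k*(1:4)]);   % 4 data rows per file
    fclose(fid);
end

% Build ONE options object describing the expected layout...
opts = delimitedTextImportOptions('NumVariables', 2);
opts.DataLines = [6 Inf];                  % data start after the 5 header lines
opts.Delimiter = ' ';
opts.ConsecutiveDelimitersRule = 'join';   % runs of spaces count as one delimiter
opts.LeadingDelimitersRule = 'ignore';     % tolerate leading blanks on a line
opts.VariableNames = {'Time', 'Pressure'};
opts.VariableTypes = {'double', 'double'};

% ...and reuse it for every file, so each read returns the same class and shape.
data = cell(nFiles, 1);
for k = 1:nFiles
    data{k} = readmatrix(files{k}, opts);
    delete(files{k});
end
```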

Of course, if a file breaks the assumptions radically, it can/will still lead to unexpected results.

If the file structure is quite simple, it may be overkill; but it's the generic solution.

If your specific issue is with a file structure like the one you show, I'd use readmatrix with the fixed header-line count to return the data in an array. If you need the header info, and you've already got code written around the importdata data structure, then there's nothing wrong with just setting the 'headerlinesIn' parameter and keeping the existing code.

I just tend to dislike having to dereference the structure to get to the data, so I rarely use importdata; that doesn't mean nobody else should, either! :)

It is useful when there are multiple headerlines and you may need to parse some of them for metadata later, like the time here.
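For instance, the Date header line could be turned into a datetime. A minimal sketch, assuming the header line is already in hand as text (in practice it would come from the header text returned alongside the data, e.g. A.textdata); it is hard-coded here, and the format string assumes English month names:

```matlab
% Assumption: this line was taken from the header text of the imported file.
hdrLine  = '% Date:               May 31 2022, 17:12';
dateText = strtrim(extractAfter(hdrLine, 'Date:'));    % 'May 31 2022, 17:12'
runDate  = datetime(dateText, 'InputFormat', 'MMM d yyyy, HH:mm', ...
                    'Locale', 'en_US');                % parse with English month names
```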

dpb on 31 May 2022

Edited: dpb on 31 May 2022


In that case I'd switch tactics for these files and use

A=readmatrix(filename,'NumHeaderLines',5);

and get the data, and only the data, in a native double array ready to use, without any struct in the way.

This does require R2019a or later...

Or, given the nature of the data shown (it looks like these may be time-stamped traces), using a timetable with the first column as the duration in seconds and the second as the pressure might be very conducive to later processing, depending on the need.

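tP=readtable(filename,'HeaderLines',5,'ReadVariableNames',false);  % skip the 5 header lines; no names row
tP.Properties.VariableNames={'Time','Pressure'};
tP.Time=seconds(tP.Time);      % numeric seconds -> duration
tP=table2timetable(tP);        % the first duration variable (Time) becomes the row times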

This is a little more involved on the front end, but may have advantages in use, depending on just what you are doing.

Here's where the import options object can come into its own: it lets you preset the variable names, the start of the data of interest, etc. inside it and then simply pass it to the read function. Of course, having the code inline doesn't cost much once you've written it, either...
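A sketch of the timetable route with the settings preset in an options object, assuming a hypothetical sample file (made-up values) and a hand-built delimitedTextImportOptions; the call site then only passes opts:

```matlab
% Hypothetical sample file: 5 header lines, then time/pressure pairs.
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '%% header line %d\n', 1:5);
fprintf(fid, '%g      %.6g\n', [0:4; -0.01*(1:5)]);
fclose(fid);

% Preset the data start, names, and types once in the options object...
opts = delimitedTextImportOptions('NumVariables', 2);
opts.DataLines = [6 Inf];
opts.Delimiter = ' ';
opts.ConsecutiveDelimitersRule = 'join';
opts.LeadingDelimitersRule = 'ignore';
opts.VariableNames = {'Time', 'Pressure'};
opts.VariableTypes = {'double', 'double'};

% ...so reading is a single call, followed by the conversion to a timetable.
tP = readtable(fn, opts);
tP.Time = seconds(tP.Time);      % numeric seconds -> duration
tP = table2timetable(tP);        % first duration variable becomes the row times
delete(fn);
```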

dpb on 1 Jun 2022

Besides the venerable textscan and so on, there are also these newer functions (at least; I won't guarantee the list to be exhaustive):

readcell ... returns a cell array
readlines ... returns a string array
readstruct ... returns a structure
readtimetable ... returns a timetable
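To illustrate the point that each of these returns one fixed class, here is a small sketch on a hypothetical sample file, with readmatrix added as a baseline; readlines assumes R2020b or later:

```matlab
% Hypothetical sample file: 5 header lines, then 3 rows of two numbers.
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '%% header line %d\n', 1:5);
fprintf(fid, '%g   %g\n', [0:2; 10*(0:2)]);
fclose(fid);

M = readmatrix(fn, 'NumHeaderLines', 5);   % double array
C = readcell(fn, 'NumHeaderLines', 5);     % cell array
S = readlines(fn);                         % string array, one element per line
delete(fn);

fprintf('%s %s %s\n', class(M), class(C), class(S));   % prints: double cell string
```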

With the exception of importdata, each returns one given data class; some of those classes are composite, others are a single data type.

importdata is an attempt at being user-friendly by sparing the user from having to know anything about the data, but that convenience comes at a cost, as you've discovered. TMW could have designed it to always return the struct instead of sometimes returning only an array, at the cost of occasionally having an empty text field. Walter and I both agree that the fact it is amorphous(*) makes it unsuitable for use in code, owing to the need to handle both alternatives.

The one to choose is dependent upon the data and what is to be done with it subsequently (tempered by one's personal coding preferences, perhaps, as well). But, the key is that no matter which one chooses, other than the one aberration, one knows the data class and can write subsequent code for such.

(*) I liken importdata to the use of free-format (*) I/O in Fortran, where the syntax write(*,*) x lets one dump debug statements or simple output to the console very easily without worrying about writing an explicit FORMAT for whatever variable x is. Folks use it, then complain that they would like the spacing or precision to be different from that chosen by their compiler vendor; the answer is, "If the format is important, then don't use free format!" That's the caveat with importdata: if you want to be assured of a given data structure, then don't use the free-format version of input in MATLAB; use the specific function for the format you want.

Walter Roberson on 1 Jun 2022

Though we do put up with uigetfile returning three different datatypes:

numeric if user canceled
cell array of character vectors if multiselect and user selected multiple files
character vector otherwise

Also, some of the lower level i/o operations can return a different data type at end of file.

There are also a number of functions that can return [], the empty double precision array, instead of an expected datatype.

... but all of those are relatively easy to deal with, whereas importdata can return valid data in different datatypes, and when it returns a struct, fields are missing when they do not apply instead of being present but empty.
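If existing importdata code has to be kept, both forms can be normalized right after the call. A minimal sketch; the two "results" below are simulated (hypothetical) stand-ins for the struct form and the bare-array form, not actual importdata output:

```matlab
% Simulated importdata outputs: struct form and bare numeric-array form.
asStruct = struct('data', [0 -0.1; 1 -0.2], 'textdata', {{'% Model: Test'}});
asArray  = [0 -0.1; 1 -0.2];

results = {asStruct, asArray};
dataOut = cell(size(results));
hdrOut  = cell(size(results));
for k = 1:numel(results)
    A = results{k};
    if isstruct(A) && isfield(A, 'data')
        dataOut{k} = A.data;          % struct form: the numbers live in A.data
    elseif isnumeric(A)
        dataOut{k} = A;               % bare-array form: A itself is the data
    else
        error('Unexpected importdata output of class %s.', class(A));
    end
    if isstruct(A) && isfield(A, 'textdata')
        hdrOut{k} = A.textdata;       % header text when the field exists
    else
        hdrOut{k} = {};               % field absent: supply an empty default
    end
end
```

Passing the header-line count explicitly, as in the question, is still the simpler fix; this only guards code that cannot change the call.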

dpb on 1 Jun 2022

Edited: dpb on 1 Jun 2022

"Though we do put up with uigetfile..."

That's thanks to Bill G. and company's FileOpen dialog, isn't it, Walter? TMW has made the MATLAB interface match on all platforms; I don't know the other OSes' native APIs well enough to know whether their returns were written to be more regular or not, and there's really no alternative short of creating something totally independently(*). With file i/o in MATLAB, one now has pretty good control over what one's going to get, albeit at the cost of a "veritable plethora" of choices that makes getting started tougher for the new user.

find and its empty return is one often-overlooked "gotcha!"
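A short illustration of that gotcha: when nothing matches, find returns an empty array rather than 0 or NaN, so the result should be checked with isempty before it is used as an index value:

```matlab
x = [3 7 2];
idx = find(x > 10, 1);    % no element qualifies: idx is empty (1x0), not 0
if isempty(idx)
    disp('no match');     % handle the empty case explicitly
else
    firstBig = x(idx);
end
```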

(*) ADDENDUM: One could, of course, write oneself a wrapper function around the builtin dialog function that preprocesses the output to regularize it for the calling code. I haven't really thought about what that might actually look like to be simpler to use...
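One possible core for such a wrapper, as a hedged sketch: it maps the three uigetfile return forms (0 on cancel, a character vector for one file, a cell array for several) onto a single cell array of full paths. The dialog itself is not called here; the sample returns and the folder are simulated so the logic can run unattended:

```matlab
% Simulated first outputs of [f, p] = uigetfile(..., 'MultiSelect', 'on').
samples  = {0, 'run1.txt', {'run1.txt', 'run2.txt'}};
pathName = tempdir;                            % hypothetical folder
for k = 1:numel(samples)
    fileOut = samples{k};
    if isequal(fileOut, 0)
        fullNames = {};                        % user canceled: no files
    else
        % cellstr turns a char vector into a 1x1 cell and leaves a cell as is,
        % so the caller always gets a cell array of full paths.
        fullNames = fullfile(pathName, cellstr(fileOut));
    end
    fprintf('%d file(s)\n', numel(fullNames));
end
```

The calling code can then always loop over numel(fullNames), which is zero on cancel.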
