Unpredictable behavior of importdata

I have a data file consisting of 5 header lines:
% Model: Test
% Version: Software
% Date: May 31 2022, 17:12
% Table: Table 2 - Global Evaluation 5
% Time (s) gesamte Druckänderung im Wasser (Pa)
followed by datalines, each dataline consisting of two numbers separated by spaces, looking like
0 -2.5760650371409015E-5
0.1 -0.015952065077219565
0.2 -0.004477893850765524
0.30000000000000004 -0.011908521809613106
0.4 -0.0073951747614778305
.....
If I use the command
A = importdata(filename);
I usually get the fields A.data and A.textdata automatically.
But in some cases the import creates only the variable A (no A.data), with A containing the numbers and the header lines ignored.
importdata always works correctly if I use the full command with the delimiter and a header line count of 5:
A = importdata(filename,' ',5);
What I do not understand is:
Why does importdata usually create A.data and A.textdata automatically, but for some import files of identical structure put only the numbers into the variable A?
Of course that leads to error messages when importing a number of data files in a loop: after 16 files are imported correctly, the import of data file number 17 suddenly puts the data into a variable A instead of A.data.

12 Comments

dpb on 31 May 2022
Well, the answer is to always give importdata the help it needs to guarantee the results wanted.
When you leave things to chance, you have to live with the consequences; that's the corollary of using the free-format form of the function and letting it try to determine the file structure on its own.
There is something different between the files; we obviously cannot tell what that might be without a sample file to compare, but it's bound to be there.
I generally recommend against using importdata for anything except interactive use at the command line for just such reasons. Production code needs to be built more robustly to handle the issues that (almost) inevitably arise in the creation of input files -- somebody did something slightly differently, or opened and changed something, or there's a missing value, or ... the possibilities are almost endless as to what caused the auto-detect routine to parse the files differently.
I've fallen into the habit of using detectImportOptions almost routinely in production code, passing the import options object it creates to one of the newer input routines, for just such reasons: it gives the control needed to ensure the expected behavior.
Thank you for the comment. OK, so it seems this is a common situation: the recognition of the import file's structure with importdata doesn't work perfectly all the time. I never knew about the command detectImportOptions.
That command seems to me to be an interactive tool. Does it require user interaction to correct the settings for each import file?
dpb on 31 May 2022
It can be, but doesn't have to be -- recognize that you don't need a different import object for every file, just a model for the given/expected form of the specific structure to be read. Then, as long as the files follow the format contract, the read routine used with that options object will have the bread crumbs it needs to ensure consistent interpretation.
Of course, if a file breaks the assumptions radically, it can/will still lead to unexpected results.
If the file structure is quite simple, it may be overkill; but it's the generic solution.
If your specific issue is with a file structure as you show, I'd use readmatrix with the fixed header line count to return the data in an array. If you need the header info, and you've already got code written using the importdata data structure, then there's nothing wrong with just passing the delimiter and the headerlinesIn argument and keeping the existing code.
I just tend to dislike having to dereference the structure to get to the data, so I rarely use importdata; that doesn't mean nobody else should use it! :)
The struct is useful when there are multiple header lines and you may need to parse some of them for metadata later, like the time here.
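The "one options object, many files" idea described above could be sketched roughly as follows. This is only an illustration, assuming every file has the same number of header lines and purely numeric columns; the function name readTraces and its inputs are hypothetical, not something from the thread.

```matlab
function data = readTraces(fileList, numHeaderLines)
% READTRACES  Read several same-format text files with one shared options object.
%   fileList        cell array of file names (hypothetical input)
%   numHeaderLines  number of header lines to skip, e.g. 5

    % Build the options object once, using the first file as the model.
    opts = detectImportOptions(fileList{1}, 'FileType', 'text', ...
                               'NumHeaderLines', numHeaderLines);

    data = cell(numel(fileList), 1);
    for k = 1:numel(fileList)
        % The same options are applied to every file, so each result is a
        % numeric matrix rather than "sometimes a struct, sometimes an array".
        data{k} = readmatrix(fileList{k}, opts);
    end
end
```

A call such as data = readTraces({'run01.txt','run02.txt'}, 5) (file names made up) would then return one matrix per file, with no user interaction involved.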
I do not need the headerlines, just want the datalines
dpb on 31 May 2022
Edited: dpb on 31 May 2022
In that case I'd switch tactics for these files and use
A=readmatrix(filename,'NumHeaderLines',5);
and get the data, and only the data, in an array, ready to use as a native double array without any struct in the way.
This does require R2019a or later...
Or, given the nature of the data shown, it looks like these may be time-stamped traces. In that case a timetable, with the first column as the duration in seconds and the second as the pressure, might be very conducive to later processing, depending on the need.
tP=readtable(filename,'HeaderLines',5,'ReadVariableNames',0);
tP.Properties.VariableNames={'Time','Pressure'};
tP.Time=seconds(tP.Time);                   % numeric seconds -> duration
tP=table2timetable(tP,'RowTimes','Time');   % use Time as the row times
This is a little more involved on the front end, but may have advantages in use, depending on just what you are doing.
Here's where the import options object can come into its own -- it lets you preset the variable names, the start of the data of interest, etc. inside it and simply pass it along. Of course, having the code in line doesn't cost much once you have written it, either...
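As a rough sketch of presetting things in the options object: the two helper functions below are hypothetical names, and the sketch assumes detectImportOptions finds exactly two numeric columns in the model file (otherwise assigning the two variable names would fail).

```matlab
function opts = makeTraceOptions(modelFile)
% Create the options once from a representative file and preset the names.
    opts = detectImportOptions(modelFile, 'FileType', 'text', 'NumHeaderLines', 5);
    opts.VariableNames = {'Time', 'Pressure'};   % assumes two detected columns
end

function tP = readPressureTrace(filename, opts)
% Read one file with the shared options and return a timetable.
    tP = readtable(filename, opts);
    tP.Time = seconds(tP.Time);                       % numeric seconds -> duration
    tP = table2timetable(tP, 'RowTimes', 'Time');     % Time becomes the row times
end
```

With these saved as function files (or as local functions in one file), opts = makeTraceOptions(firstFile) is created once and then passed to readPressureTrace for every file in the loop.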
I handle the situation with importdata by not using importdata. In my opinion, the fundamental data type should not depend upon the fine details of the file contents.
dpb on 31 May 2022
Another case where the attempt to make everything easy actually creates difficulties...
I have to admit, I get a bit lost among the options for importing data:
  • importdata ... might return a plain variable or a struct; the contained data is numeric
  • readtable ... returns the data type 'table'
  • readmatrix ... returns a numeric array
The table data type seems to be handled with different MATLAB commands than the numeric data type, so data imported with readtable needs to be manipulated with different code. That might be an advantage or a disadvantage ... either way, it is irritating.
table() is a composite datatype, in which each column might be a different datatype. It is similar in that way to struct or cell. One would not expect to use the same operations.
For example, suppose you had a table with datetime objects in the first column and rainfall in the second. If you were able to use mathematical functions the same way as on numeric arrays, then you would expect to be able to take the mean() along the second dimension, and to take the var() along the first dimension. But what does it mean to take the mean of a date and mm of rain? What does it mean to take the variance of dates?
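To make that concrete, here is a small sketch with made-up values (the table and its variable names are purely illustrative): numeric functions are applied to a column extracted from the table, not to the table as a whole.

```matlab
% Hypothetical example data: dates in one column, rainfall (mm) in the other.
T = table(datetime(2022, 5, [29; 30; 31]), [1.2; 0; 3.4], ...
          'VariableNames', {'Date', 'Rain_mm'});

% Dot indexing returns the column in its own type (here: double).
meanRain = mean(T.Rain_mm);

% Brace indexing extracts the contents of the selected variables as an array.
rainCol = T{:, 'Rain_mm'};

% Parenthesis indexing, by contrast, returns a smaller table.
subT = T(:, 'Rain_mm');
```

Once a column has been pulled out with dot or brace indexing, the same numeric code that works on readmatrix output works on it too.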
Among the newer functions, besides the venerable textscan and so on, there are also (I won't guarantee this list is exhaustive):
  • readcell ... returns a cell array
  • readlines ... returns a string array
  • readstruct ... returns a structure
  • readtimetable ... returns a timetable
With the exception of importdata, all return a given data class; some of those are composite, some are a single data type.
importdata is an attempt at user-friendliness, to avoid the user having to know anything about the data -- this convenience comes at a cost, as you've discovered. TMW could have designed it to always return the struct, instead of sometimes returning only an array, at the cost of an occasionally empty text field. Walter and I agree that the fact it is amorphous(*) makes it unsuitable for use in code, owing to the need to handle both alternatives.
The one to choose depends upon the data and what is to be done with it subsequently (tempered by one's personal coding preferences, perhaps, as well). But the key is that no matter which one chooses, other than the one aberration, one knows the data class and can write subsequent code for it.
(*) I liken importdata to the use of the free format (*) in Fortran, where the syntax write(*,*) x lets one dump in debug statements or simple output to the console very easily without worrying about writing an explicit FORMAT for whatever variable x is. Folks use it, then complain they would like the spacing or precision to be different from that chosen by their compiler vendor -- the answer is, "If the format is important, then don't use free format!" That's the caveat with importdata -- if you want to be assured of a given data structure, then don't take the free-format version of input in MATLAB; use the specific function for the format you want.
Though we do put up with uigetfile returning three different datatypes:
  • numeric if user canceled
  • cell array of character vectors if multiselect and user selected multiple files
  • character vector otherwise
Also, some of the lower level i/o operations can return a different data type at end of file.
There are also a number of functions that can return [], the empty double precision array, instead of an expected datatype.
... but all of those are relatively easy to deal with, whereas importdata can return valid data in different datatypes, and when it returns a struct, fields are missing when they do not apply, instead of being present but empty.
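If existing code has to keep using importdata, one way to cope with both return forms is a small normalizing wrapper. The following is only a sketch under that assumption; the name importNumeric is hypothetical, and the more robust route remains one of the fixed-type readers.

```matlab
function [data, textdata] = importNumeric(filename, varargin)
% IMPORTNUMERIC  Call importdata and always return a numeric array plus text.
%   Extra arguments (delimiter, header line count) are passed through.
    A = importdata(filename, varargin{:});
    textdata = {};                       % default when no text was returned
    if isstruct(A)
        if ~isfield(A, 'data')
            error('importNumeric:noData', 'No numeric data found in %s.', filename);
        end
        data = A.data;
        if isfield(A, 'textdata')        % field may be missing, not just empty
            textdata = A.textdata;
        end
    elseif isnumeric(A)
        data = A;                        % importdata returned a bare array
    else
        error('importNumeric:unexpectedType', ...
              'importdata returned a %s for %s.', class(A), filename);
    end
end
```

The calling loop can then always use the first output directly, whichever form importdata happened to choose for a particular file.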
dpb on 1 June 2022
Edited: dpb on 1 June 2022
"Though we do put up with uigetfile..."
That's thanks to Bill G and company's FileOpen dialog, isn't it, Walter? TMW has made the MATLAB interface match on all platforms; I don't know the other OS native APIs well enough to know whether their returns were written to be more regular or not... and there's really no alternative but to create something totally independent(*). With file i/o in MATLAB, one now has pretty good control over what one's going to get, albeit that comes with a "veritable plethora" of choices that makes the startup tougher for the new user.
find and the empty return is one often overlooked "gotcha!"
(*) ADDENDUM: One could, of course, write oneself a wrapper function around the builtin dialog function that preprocesses the output to regularize it for the calling code. Haven't really thought about how that might actually look to be simpler to use...
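One possible shape for such a wrapper, purely as a sketch (the names pickFiles and normalizeSelection are made up here), is to always return a cell array of full paths, which is empty if the user cancels:

```matlab
function files = pickFiles(varargin)
% PICKFILES  uigetfile wrapper that always returns a cell array of full paths.
%   All arguments are passed straight through to uigetfile.
    [name, folder] = uigetfile(varargin{:});
    files = normalizeSelection(name, folder);
end

function files = normalizeSelection(name, folder)
% Map the three possible uigetfile return types onto one cell array.
    if isequal(name, 0)
        files = {};                                   % user canceled
    elseif ischar(name)
        files = {fullfile(folder, name)};             % single selection
    else
        files = cellfun(@(n) fullfile(folder, n), name, ...
                        'UniformOutput', false);      % multiple selection
    end
end
```

The calling code can then loop with for k = 1:numel(files) regardless of how many files were chosen; on cancel the loop simply runs zero times.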

Release: R2021b

Asked: 31 May 2022

Edited: dpb on 1 June 2022