Reading multiple excel files with data store when the columns of the files change position
1 visualizzazione (ultimi 30 giorni)
Mostra commenti meno recenti
Juan Carlos Pozuelos Buezo
il 30 Ott 2019
Commentato: Juan Carlos Pozuelos Buezo
il 31 Ott 2019
Hi. I am trying to read hundreds of excel files with data store, but the files vary the position of the columns. Data Store reads the position of the columns in the first file and assumes that it is the same for the rest. when reviewing the data of the last excel files does not match, because its position was different. how do i solve it?
thanks.
2 Commenti
Guillaume
il 31 Ott 2019
Unfortunately, it's not something that's supported by the SpreadsheetDatastore. In theory, you would have to write your own custom datastore to handle this but it would be a lot of work.
However, have all the files got the exact same columns just in different position? If so, you could handle it with a TransformedDatastore. If you can attach two sample files (actual content can be rubbish as long as they've got the correct format), I can show you how to do that.
Risposta accettata
Guillaume
il 31 Ott 2019
This works with the very simple test case I built (of two excel files with columns in different order and the 2nd file with extra columns).
First create your datastore as normal, e.g:
ds = datastore('C:\somewhere\*.xlsx');
ds.SelectedVariableNames = {'Something', 'SomethingElse', 'Whatever'};
Then create a TransformedDatastore with that underlying datastore:
tds = transform(ds, @reorderds, 'IncludeInfo', true);
with the following reorderds.m file:
function [tout, infoin] = reorderds(tin, infoin)
%reorder the variables of a table read by datastore.read so that they match what is actually in the file
%works with a spreadsheetdatastore only
currentvars = tin.Properties.VariableNames; %these names come from the first file in the datastore and may not match the actual names in the file
opts = detectImportOptions(infoin.Filename, 'Sheet', infoin.SheetNames{1}); %so get the actual names with detectimportoptions
actualvars = opts.VariableNames;
[found, where] = ismember(currentvars, actualvars); %find actual location of the variables in the file
assert(all(found), 'Some expected variables not found in %s', infoin.Filename);
tout = tin(:, where); %reorder columns
tout.Properties.VariableNames = tin.Properties.VariableNames; %and set the name back to what they should be
end
and use tds instead of ds. Any read or readall operation with tds will automatically reorder the columns to their true location.
There will be a performance penalty to this since in effect each file is parsed twice, once by the internal datastore read and another time in reorderds by the detectImportOptions.
Più risposte (1)
Roofus Milton
il 31 Ott 2019
I would use a cell array populated by readtable then simply reference the colums by name.
0 Commenti
Vedere anche
Categorie
Scopri di più su Spreadsheets in Help Center e File Exchange
Prodotti
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!