applying time range to multiple txt files very slow

minomi on 20 Aug 2018
Edited: dpb on 22 Aug 2018
Hi there,
I have a large set of ".txt" data files. I then apply timerange to extract data between specific dates and times. My script looks something like this:
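warning off
% build a datastore over all text files in the folder
ds_loc = 'Z:\data\*.txt';
ds = datastore(ds_loc);
ds.ReadSize = 1000000;
ds.Delimiter = ' ';
ds.MultipleDelimitersAsOne = 1;
% first column holds the timestamp, written as day/month/year
ds.SelectedFormats(1) = {'%{dd/MM/yyyy HH:mm:ss}D'};
warning on
% create tall timetable
tt = tall(ds);
ttab = table2timetable(tt);
% window limits; the explicit InputFormat (month/day/year) avoids ambiguous parsing
strt_time = datetime('03/24/2018 10:00:00','InputFormat','MM/dd/yyyy HH:mm:ss');
end_time = datetime('03/25/2018 00:00:00','InputFormat','MM/dd/yyyy HH:mm:ss');
warning off
S1 = timerange(strt_time,end_time);
warning on
% select the rows that fall inside the window
ttab(S1,:)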
The above script takes a long time to execute, depending on the number of files in the datastore location, i.e. "Z:\data". Is there a better way to do this?
  7 Comments
minomi on 22 Aug 2018
I'm sorry, I don't quite understand what you've written. What are you suggesting as the way to do this?
dpb on 22 Aug 2018
Edited: dpb on 22 Aug 2018
I was just commenting on the problem with the sequence of defining the date format. It seems that datastore reads some of the data (how much, I have no idea) to infer the format when the object is created, but you can't tell it the date format a priori; you have to set it afterwards through a property of the object. That would seem to mean that if it gets the format wrong, it has to recompute or reread all that information, which is a waste of time. If it gets it right, at least that part is OK, but historically processing took significantly longer when a format wasn't given than when one was; I don't know whether that effect applies here.
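For what it's worth, the documentation for tabularTextDatastore lists TextscanFormats among the name-value arguments accepted at construction, so it may be possible to give the format up front after all. The sketch below tries that on one synthetic file; the comma delimiter, the two-column layout (Time, Value) and the temporary folder are assumptions for illustration, not the real files, and whether this shortens the run on the actual data is untested.

```matlab
% Sketch only: one synthetic comma-delimited file stands in for the real data.
% The two-column layout (Time, Value) is an assumption for illustration.
tmpDir = tempname;
mkdir(tmpDir);
Time  = datetime(2018,3,24,0,0,0) + hours(0:6:18)';
Time.Format = 'dd/MM/yyyy HH:mm:ss';   % written to file as day/month/year
Value = (1:4)';
writetable(table(Time, Value), fullfile(tmpDir, 'sample.txt'));

% Give the column formats when the datastore is created instead of
% patching SelectedFormats afterwards.
ds = tabularTextDatastore(fullfile(tmpDir, '*.txt'), ...
    'Delimiter', ',', ...
    'TextscanFormats', {'%{dd/MM/yyyy HH:mm:ss}D', '%f'});

T = read(ds);           % first block, returned as a table
disp(class(T.Time))     % shows the type the first column was read as
```

For the real files, the Delimiter and the list of formats would have to be changed to match the actual columns.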
As far as speeding up the retrieval goes, I don't have any real suggestions, as I've not had the opportunity to use any of the large-data tools "in anger" and so don't know their idiosyncrasies at all.
Just how big are the files, and how many are there? Might it turn out to be faster to simply loop through them explicitly rather than incurring the overhead of the behind-the-scenes magic of the datastore object?
Are they all the same form or does the index vector have to be updated for each file? It appears that timerange makes the assumption of a fixed index across the population.
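The idea of looping through the files explicitly could look roughly like the sketch below. It is only an illustration under stated assumptions: three small synthetic files in a temporary folder stand in for Z:\data\*.txt, and the comma delimiter and two-column layout (Time, Value) are made up, so the readtable format would need adjusting for the real files. Whether this is actually faster than the datastore has not been measured.

```matlab
% Sketch only: synthetic files stand in for the real data folder.
% File names, delimiter and the (Time, Value) layout are assumptions.
tmpDir = tempname;
mkdir(tmpDir);
for k = 1:3
    Time  = datetime(2018,3,22+k,0,0,0) + hours(0:6:18)';  % one day per file
    Time.Format = 'dd/MM/yyyy HH:mm:ss';
    Value = 10*k + (1:4)';
    writetable(table(Time, Value), fullfile(tmpDir, sprintf('file%d.txt', k)));
end

% Window of interest, built from datetime values to avoid ambiguous parsing.
S = timerange(datetime(2018,3,24,10,0,0), datetime(2018,3,25,0,0,0));

% Read each file in turn and keep only the rows inside the window.
files = dir(fullfile(tmpDir, '*.txt'));
parts = cell(numel(files), 1);
for k = 1:numel(files)
    T = readtable(fullfile(files(k).folder, files(k).name), ...
        'Delimiter', ',', 'Format', '%{dd/MM/yyyy HH:mm:ss}D%f');
    TT = table2timetable(T);     % first datetime variable becomes the row times
    parts{k} = TT(S, :);         % may be empty for files outside the window
end
result = vertcat(parts{:});      % one timetable with the selected rows
```

Because each file is filtered before concatenation, only the rows inside the window are held in memory at the end. If the file names encode the date, whole files could be skipped without being read at all, but that depends on a naming scheme the thread does not describe.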


Answers (0)
