Extracting a wide table from a very tall datastore array, also trying KDB+

Question

Michael il 28 Ago 2019

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/477985-extracting-a-wide-table-from-a-very-tall-datastore-array-also-trying-kdb

Commentato: Michael il 12 Set 2019

Hello,

I am currently storing about 40GB of data in a 4 column tall array with about 1B rows. I'm using a Matlab Tall Array Datastore, but I also plan on experimenting with KDB+.

Could someone comment on the best way to extract a wide table, with about 350,000 rows and about 20 columns by selecting some of the items in the first three columns and the corresponding values,e.g DATE = {29-Jul-1983:31-Aug-1983} & STOCK = AAPL US EQUITY & FIELD= {MOV_AVG_50D & PRICE}?

Thank You,

Michael

Input

DATE,STOCK,FIELD,VALUE
29-Jul-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.8979
31-Aug-1983 00:00:00,BHP AT EQUITY,PRICE,0.9029
29-Jul-1983 00:00:00,IBM US EQUITY,MOV_AVG_50D,0.9106
31-Aug-1983 00:00:00,IBM US EQUITY,PRICE_50D,0.9154
29-Jul-1983 00:00:00,AAPL US EQUITY,MOV_AVG_50D,0.9227
31-Aug-1983 00:00:00,AAPL US EQUITY,PRICE,0.9311

Output:

DATE,MOV_AVG_50D,PRICE
29-Jul-1983 00:00:00,0.9227,NaN
31-Aug-1983 00:00:00,NaN,0.9311

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Michael il 29 Ago 2019

Apri in MATLAB Online

This is what I tried so far.

ds = datastore('bigtall.csv','DatetimeType','text');
ds.SelectedFormats={'%q','%C','%C','%f'};
tds = tall(ds);tm.cputime(1)=cputime;
tic;
t = tds(tds.TICKER=='UNIT US EQUITY' & tds.FIELD=='EBITDA_MARGIN',:);
T = gather(t);
tm.cputime(2)=cputime;
tm.tictoc(2)= toc;
fprintf('%d minutes CPU, %d minutes\n',[diff(tm.cputime) tm.tictoc(2)]./60);
fprintf('Table size [%dx%d]\n',size(T));
T.DATE = datetime(T.DATE);
tb = unstack(T(:,{'DATE','VALUE','FIELD'}), 'VALUE', 'FIELD');
fprintf('Table size [%dx%d]\n',size(tb));

The output took 23 minutes.

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 20 min 44 sec
Evaluation completed in 21 min 26 sec
5 minutes CPU, 23 minutes
Table size [762x4]
Table size [762x2]

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Michael il 12 Set 2019

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/477985-extracting-a-wide-table-from-a-very-tall-datastore-array-also-trying-kdb#answer_391638

Hello,

Here's an update. Unfortunatly, or fortuntely depending on how you look at it, KDB+ is about 45x times faster (31 seconds vs. 23 minutes) and can be accessed through Matlab's datafeed toolbox.

Best,

Michael

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Michael il 12 Set 2019

Apri in MATLAB Online

This took 1 second instead of 23 minutes!

select from f where TICKER=`$("AAPL US EQUITY"), FIELD in (`MOV_AVG_50D,`HIGH), DATE within 1983.07.29 1983.08.31

Accedi per commentare.

Extracting a wide table from a very tall datastore array, also trying KDB+

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Risposta accettata

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

Extracting a wide table from a very tall datastore array, also trying KDB+

1 Commento Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Risposta accettata

1 Commento Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Prodotti

Release

Community Treasure Hunt

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti

1 Commento
Mostra -1 commenti meno recentiNascondi -1 commenti meno recenti