In-memory calculations with tall arrays from different datastores
Suppose I have two datastores (each a table of double values):
ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');
Suppose also that I created my tall arrays as:
X = tall(ds_1);
Y = tall(ds_2);
Now suppose that I trained a model, mdl, with fitlm, and I want to use this model to predict from X and Y as:
Answer = predict(mdl, [X,Y]);
The error I receive is:
Error using tall/horzcat (line 23)
Incompatible tall array arguments. The tall arrays must be based on the
same datastore.
How can I solve this problem without gathering the data, using only the tall arrays' capabilities?
Answers (1)
ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');
ds_combined = combine(ds_1, ds_2);
Answer = predict(mdl, tall(ds_combined));
In versions before R2019a, where the datastore combine function is not available, I'm not sure that there's a way to do it other than creating your own custom datastore that would keep track of both datastores (essentially recreating the R2019a CombinedDatastore).
6 Comments
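For R2019a or later, the combine approach can be tried end to end with a small sketch like the one below. The file contents and the variable names a, b, c, d are made up for illustration, and the two files deliberately use different variable names, since the combined datastore concatenates the tables horizontally. The call to mapreducer(0) is an assumption on my part: it forces local, serial evaluation in case a parallel pool causes problems with the combined datastore.

```matlab
% Hypothetical example data: two small text files with distinct variable names.
T1 = table([1;2;3], [4;5;6], 'VariableNames', {'a', 'b'});
T2 = table([7;8;9], [10;11;12], 'VariableNames', {'c', 'd'});
writetable(T1, 'file_1.txt');
writetable(T2, 'file_2.txt');

ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');

% combine (R2019a or later) reads from both datastores in step and
% horizontally concatenates what each read returns.
ds_combined = combine(ds_1, ds_2);

% Assumption: evaluate tall arrays locally and serially, without a parallel pool.
mapreducer(0);

% One tall table backed by a single datastore, so no tall/horzcat is needed.
XY = tall(ds_combined);

% With a trained model mdl whose predictors match the variables in XY:
% Answer = predict(mdl, XY);
```

Both files need the same number of rows for the reads to line up, and the result stays unevaluated until gather is called on it.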
Guillaume
on 16 Jul 2019
Most likely, you're picking up the combine function from the Symbolic Math Toolbox which of course is not the same thing at all.
The datastore combine function was introduced in R2019a. It does not exist in R2017a. Unfortunately for you, I think the only way you can do what you want is to write your own custom datastore, which would store both datastores as member variables and delegate all datastore operations to them, combining the results. You would effectively be recreating the R2019a CombinedDatastore. While it's not particularly complicated if you know how to create custom datastores, I can't really give an implementation, as I would be plagiarising MathWorks' copyrighted code.
TOSA2016
on 16 Jul 2019
TOSA2016
on 17 Jul 2019
TOSA2016
on 17 Jul 2019
Guillaume
on 18 Jul 2019
"The tall array generation from combined datastores is not compatible with parallel computation"
I would recommend raising a service request with MathWorks then, as they should make it possible to create a combined datastore that has the exact same properties as the source datastores (if they are compatible). I don't have the parallel toolbox, so I'm not sure what these properties are. Since you now have access to the source code of CombinedDatastore (in fullfile(matlabroot, 'toolbox\matlab\datastoreio\+matlab\+io\+datastore')), you could also copy it and make the required modifications.
I'm not sure you will be able to concatenate two tall arrays from the same datastore, since by necessity they will have the same variable names, so horizontal concatenation will indeed create duplicate variable names, which is not allowed. The only way this could work is if you are allowed to modify the variable names of the tall array. See if this works:
DS = tabularTextDatastore({'file_1.txt', 'file_2.txt'});
X1 = tall(datastore(DS.Files{1}));
X2 = tall(datastore(DS.Files{2}));
X2.Properties.VariableNames = compose('X2Var%d', 1:width(X2));