In-memory calculations with tall arrays from different datastores

Suppose I have two datastores (tabular data containing double values):
ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');
Suppose also that I created my tall arrays as:
X = tall(ds_1);
Y = tall(ds_2);
Now suppose I have trained a model, mdl, with fitlm, and I want to use it to predict from X and Y together:
Answer = predict(mdl, [X,Y]);
The error I receive is:
Error using tall/horzcat (line 23)
Incompatible tall array arguments. The tall arrays must be based on the
same datastore.
How can I solve this problem without gathering the data, using only in-memory capabilities?

Answers (1)

Guillaume on 16 Jul 2019
Edited: Guillaume on 16 Jul 2019
If you have R2019a, you can combine your two datastores.
ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');
ds_combined = combine(ds_1, ds_2);
Answer = predict(mdl, tall(ds_combined));
In previous versions, I'm not sure that there's a way to do it other than creating your own custom datastore that would keep track of both datastores (essentially recreating the R2019a CombinedDatastore).

6 Comments

Hi Guillaume. I do not have the 2019 version, but when I tried to combine the two datastores in R2017a, I got this error:
>> DSS = combine(ds_1,ds_2);
Warning: combine will be removed in a future release.
> In combine (line 21)
The following error occurred converting from matlab.io.datastore.TabularTextDatastore to double:
Conversion to double from matlab.io.datastore.TabularTextDatastore is not possible.
Error in combine (line 53)
eval(evalStr);
Most likely, you're picking up the combine function from the Symbolic Math Toolbox which of course is not the same thing at all.
The datastore combine function was introduced in R2019a. It does not exist in R2017a. Unfortunately for you, I think the only way you can do what you want is to write your own custom datastore, which would basically store both datastores as member variables and delegate and combine all datastore operations. You would effectively be recreating the R2019a CombinedDatastore. While it's not particularly complicated if you know how to create custom datastores, I can't really give an implementation, as I would be plagiarising MathWorks copyrighted code.
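To check which combine is actually being picked up, and whether the installed release is recent enough, something along these lines could be run. This is only a quick diagnostic sketch; it relies on R2019a corresponding to MATLAB version 9.6.

```matlab
% List every combine found on the path. On releases before R2019a the
% datastore version is expected to be missing from this list.
which combine -all

% R2019a corresponds to MATLAB version 9.6.
if verLessThan('matlab', '9.6')
    disp('This release predates R2019a, so the datastore combine is not available.');
else
    disp('This release is R2019a or later, so the datastore combine should be available.');
end
```

If the only entry listed belongs to the Symbolic Math Toolbox, that would explain the conversion-to-double error shown above.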
Thanks. I will work on it to see if I can figure it out.
I upgraded my MATLAB today. Tall array generation from combined datastores is not compatible with parallel computation. It asks me to run the calculations serially, which is not what I need.
I also noticed that there might be an easier way. What if I create the datastore as
DS = tabularTextDatastore({'file_1.txt', 'file_2.txt'});
X1 = tall(datastore(DS.Files{1}));
X2 = tall(datastore(DS.Files{2}));
I still cannot make a tall array from X1 and X2 as
X_NEW = tall([X1, X2]);
It gives me the following error:
Error using tall/horzcat (line 21)
Duplicate table variable name: 'Var1'.
"Tall array generation from combined datastores is not compatible with parallel computation"
I would recommend raising a service request with MathWorks then, as they should make it possible to create a combined datastore that has the exact same properties as the source datastores (if they are compatible). I don't have the Parallel Computing Toolbox, so I'm not sure what these properties are. Since you now have access to the source code of CombinedDatastore (in fullfile(matlabroot, 'toolbox\matlab\datastoreio\+matlab\+io\+datastore')), you could also copy it and make the required modifications.
I'm not sure you will be able to concatenate two tall arrays from the same datastore, since by necessity they will have the same variable names, so horizontal concatenation will indeed create duplicate variable names, which is not allowed. The only way this could work is if you are allowed to modify the variable names of the tall array. See if this works:
DS = tabularTextDatastore({'file_1.txt', 'file_2.txt'});
X1 = tall(datastore(DS.Files{1}));
X2 = tall(datastore(DS.Files{2}));
X2.Properties.VariableNames = compose('X2Var%d', 1:width(X2));
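The effect of the renaming can be illustrated on small ordinary (non-tall) tables. This is a minimal sketch: T1 and T2 and their values are made up for illustration, and whether the same concatenation then succeeds for tall tables built from separate datastores is not confirmed here.

```matlab
% Hypothetical stand-ins for the two files. Both tables get the default
% variable names Var1 and Var2, so [T1, T2] would fail with a
% duplicate variable name error.
T1 = table([1; 2; 3], [4; 5; 6]);
T2 = table([7; 8; 9], [10; 11; 12]);

% Give the second table unique variable names, as in the snippet above.
T2.Properties.VariableNames = compose('X2Var%d', 1:width(T2));

% Horizontal concatenation now succeeds because no names collide.
T = [T1, T2];
disp(T.Properties.VariableNames)
```

The same pattern works with any naming scheme, as long as the names in the second table do not clash with those in the first.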


Asked: 16 Jul 2019

Commented: 18 Jul 2019
