How to send big data (loaded into a datastore object) to a classifier in MATLAB?
This is my first experience working with datastores in `Matlab`, and I am hoping to get some guidance here. I have a large dataset whose features and corresponding labels are saved, row by row, in two `txt` files: `data.txt` and `label.txt`. Each file has `264e6 rows`. I did the following steps:
%creating datastore objects
datafile='data.txt';
ds=datastore(datafile,'TreatAsMissing','NA');
labelfile='label.txt';
ds_lbl=datastore(labelfile,'TreatAsMissing','NA');
When I pass the data to the classifier, I get the following error:
Mdl=fitcnb(read(ds),read(ds_lbl));
Error using classreg.learning.FullClassificationRegressionModel.prepareDataCR (line 201)
X and Y do not have the same number of observations.
Error in classreg.learning.classif.FullClassificationModel.prepareData (line 487)
classreg.learning.FullClassificationRegressionModel.prepareDataCR(...
Error in ClassificationNaiveBayes.prepareData (line 143)
prepareData@classreg.learning.classif.FullClassificationModel(X,Y,varargin{:},'OrdinalIsCategorical',true);
Error in classreg.learning.FitTemplate/fit (line 213)
this.PrepareData(X,Y,this.BaseFitObjectArgs{:});
Error in ClassificationNaiveBayes.fit (line 132)
this = fit(temp,X,Y);
Error in fitcnb (line 307)
this = ClassificationNaiveBayes.fit(X,Y,RemainingArgs{:});
With the default `ReadSize`, which is `20000`, the classifier works. But whenever I change the `ReadSize` to `1e6`, it shows the same error. The other issue is that with the default `ReadSize`, the classifier only gets `20000` records, while I have `264e6 records`.
I would really appreciate it if you could suggest a solution. How can I pass a datastore to the classifier?
Answers (1)
Don Mathis
on 30 May 2017
I think you need to pass tall arrays or a tall table to fitcnb. See the documentation here: http://www.mathworks.com/help/stats/fitcnb.html?searchHighlight=fitcnb&s_tid=doc_srchtitle#bvnjlgv
and here:
You can get a tall table from a datastore like this:
tt = tall(ds)
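To make the `tall(ds)` step concrete, here is a minimal sketch that runs on a tiny stand-in file instead of the real `data.txt`. The temporary file, the variable names `f1` to `f3`, and the 100-row size are all assumptions made for illustration only.

```matlab
% Write a tiny stand-in file; in practice the real data.txt would be used.
demoFile = [tempname '.txt'];                 % hypothetical temporary file
T = array2table(rand(100, 3), 'VariableNames', {'f1', 'f2', 'f3'});
writetable(T, demoFile);                      % delimited text with a header row

% Same datastore call as in the question, then wrap it in a tall table.
ds = datastore(demoFile, 'TreatAsMissing', 'NA');
tt = tall(ds);                                % lazy: nothing is read yet

% Operations on tt are deferred until gather is called.
numRows = gather(size(tt, 1));                % reads through the file in chunks
disp(numRows)
```

Unlike `read(ds)`, which returns only one chunk, the tall table stands for every row in the datastore, and the work is done chunk by chunk when `gather` runs.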
3 Comments
Don Mathis
on 5 Jun 2017
Edited: Don Mathis
on 5 Jun 2017
I have not tried to do this myself, but from the error message it looks like you need to create your two tall arrays from the same datastore. So you'll need to put your labels in the same datastore as your features. I guess you could concatenate your two txt files "side by side", and then create your single datastore. After that, I think you would create a single tall array from that datastore, and then pass the 'features' columns of that as X and the 'label' column as Y, using the syntax fitcnb(X,Y).
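The single-datastore idea above could look roughly like the sketch below. It is untested at the 264e6-row scale and uses a small synthetic file as a stand-in for the concatenated one. The file itself, the column names `f1`, `f2`, `label`, and the two-class synthetic data are assumptions, not taken from the original files.

```matlab
% Stand-in for the side-by-side concatenated file: two features plus a label.
rng(0);                                        % reproducible demo data
features = [randn(100, 2); randn(100, 2) + 3]; % two well-separated groups
label    = [zeros(100, 1); ones(100, 1)];
T = array2table([features, label], 'VariableNames', {'f1', 'f2', 'label'});
combinedFile = [tempname '.txt'];              % hypothetical combined file
writetable(T, combinedFile);

% One datastore and one tall table, so X and Y share the same row count.
ds = datastore(combinedFile, 'TreatAsMissing', 'NA');
tt = tall(ds);

X = [tt.f1, tt.f2];   % tall matrix built from the feature columns
Y = tt.label;         % tall column vector of labels

% Train on the tall arrays with the fitcnb(X,Y) syntax.
Mdl = fitcnb(X, Y);

% Predict on a couple of in-memory points as a sanity check.
predicted = predict(Mdl, [0 0; 3 3]);
disp(predicted)
```

Because `X` and `Y` come from the same tall table, they cannot disagree on the number of observations, which is what the original `read(ds)` / `read(ds_lbl)` call ran into.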