Training a classifier in batches of data

How can I run machine learning analysis (Classification Learner) on datasets that are gigabytes in size and, due to memory constraints, cannot be loaded into the MATLAB workspace? Is there a way to train a classifier in batches, for example by using a datastore to read n rows at a time and feeding them to the classifier? More generally, is there any way to work with big data that cannot be loaded into memory all at once?

Accepted Answer

MathWorks Support Team on 3 Mar 2021
Edited: MathWorks Support Team on 3 Mar 2021
There is no easy way to train a classifier in batches in MATLAB R2016a. However, the latest MATLAB prerelease, R2016b, introduces a new feature, 'tall arrays', which lets you perform operations on big data without explicitly loading it into the MATLAB workspace. That said, because the feature is new, the functionality it supports is still limited.
The new feature can be used in the following ways:
1. Use a 'tall' table to work with data too large to fit into memory. As of the R2016b Prerelease, the only classifier with tall table support is 'fitcdiscr', a discriminant analysis classifier; see the 'fitcdiscr' documentation for details.
Use a tall table in this function just as you would a standard table. However, there are some limitations on the options that can be set for the classifier. To list the functions that support tall arrays (by type), run the following command at the MATLAB R2016b command prompt:
>> web(fullfile(docroot, 'matlab/import_export/functions-that-support-tall-arrays-by-type.html'))
2. Use the Classification Learner app to generate code for the discriminant analysis classifier (both linear and quadratic), then modify the generated code to use a tall table rather than a standard table.
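The first approach can be sketched roughly as follows. This is a minimal illustration, not an official workflow: the file name tall_demo.csv, the variable names x1, x2 and label, and the synthetic data are all made up so the snippet is self-contained. In practice the datastore would point at the real files that do not fit in memory. It assumes R2016b or later with Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical demo data: write a small CSV so the sketch runs on its own.
% With real data, skip this part and point the datastore at your files.
rng(0);                                    % reproducible synthetic data
n = 1000;
X = [randn(n/2, 2); randn(n/2, 2) + 3];    % two well-separated classes
label = [ones(n/2, 1); 2 * ones(n/2, 1)];  % numeric class labels 1 and 2
T = table(X(:, 1), X(:, 2), label, 'VariableNames', {'x1', 'x2', 'label'});
csvFile = fullfile(tempdir, 'tall_demo.csv');   % hypothetical file name
writetable(T, csvFile);

% Evaluate tall arrays in the local MATLAB session (no parallel pool).
mapreducer(0);

% A datastore reads the file in chunks instead of loading it all at once.
ds = datastore(csvFile);

% Wrap the datastore in a tall table; operations on it are deferred.
tt = tall(ds);

% Train a discriminant analysis classifier directly on the tall table,
% naming the response variable just as with a standard table.
mdl = fitcdiscr(tt, 'label');

% Predict on the tall predictors and gather the small result into memory.
predicted = gather(predict(mdl, tt(:, {'x1', 'x2'})));
```

Only the call to gather brings results into memory; until then MATLAB works through the datastore one chunk at a time.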
To use neural networks instead, the 'trainb' function, available from R2016a onwards, can train the weights and biases in batches; see the 'trainb' documentation for details.
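As a rough sketch of selecting 'trainb' as the training function: the network type (patternnet), the hidden layer size, the epoch count and the iris_dataset sample data are assumptions chosen only for illustration, and the sample data here is small and held in memory. It assumes the neural network toolbox is installed.

```matlab
% Sample data shipped with the toolbox: 4 features x 150 samples,
% with one-hot targets for 3 classes.
[x, t] = iris_dataset;

% A small pattern-recognition network (hidden layer size is arbitrary).
net = patternnet(10);

% Select batch training with the weight and bias learning rules.
net.trainFcn = 'trainb';
net.trainParam.epochs = 200;          % illustrative value, not tuned
net.trainParam.showWindow = false;    % suppress the training GUI

% Train: weights and biases are updated once per pass over the inputs.
net = train(net, x, t);

% Network outputs for the training inputs (one column per sample).
y = net(x);
```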
2 Comments
Greg Heath on 24 Jun 2016
For more examples, search BOTH the NEWSGROUP and ANSWERS (approximately 35 examples each) using
greg neural batch
Hope this helps.
Greg
Bernhard Suhm on 28 Feb 2019
You have probably moved on to other work long ago, but by now "tall arrays" make training on large datasets fairly straightforward. However, they aren't supported in the Learner apps; you'll need to call the desired fit* functions directly.




Release

R2016a
