Big Data Processing
Analyze big data sets in parallel using distributed arrays, tall arrays, datastores, or mapreduce, on Spark® and Hadoop® clusters

You can use Parallel Computing Toolbox™ to distribute large arrays in parallel across multiple MATLAB® workers, so that you can run big-data applications that use the combined memory of your cluster. You operate on the entire array as a single entity; however, workers operate only on their part of the array and automatically transfer data between themselves when necessary.

Parallel Computing Toolbox also enables you to execute MATLAB tall array and datastore calculations in parallel, so that you can analyze big data sets that do not fit in the memory of your cluster. You can use MATLAB Parallel Server™ to run tall array and datastore calculations in parallel on Spark-enabled Hadoop clusters. Doing so can significantly reduce the execution time of very large data calculations.
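As a minimal sketch of the distributed-array workflow described above, the following assumes a default cluster profile is configured so that gcp can open a parallel pool. The 8-by-8 matrix is only an illustrative stand-in for a large data set.

```matlab
% Open a parallel pool, or reuse the current one.
% Assumes a default cluster profile is available.
pool = gcp;

% Illustrative data only; a real application would use a much larger array.
A = magic(8);

% Partition the array across the workers in the pool.
D = distributed(A);

% Operate on the whole array as a single entity. Each worker computes on
% its own part, and the result is itself a distributed array.
colSums = sum(D, 1);

% Bring the result back into the memory of the client session.
result = gather(colSums);

% Inspect the size of the part of D that each worker holds.
spmd
    localSize = size(getLocalPart(D));
end
```

Here gather is the step that moves data back to the client, so it is typically called only on results small enough to fit in client memory.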
Categories
- Distributed Arrays
Analyze big data sets in parallel using distributed arrays and simultaneous execution
- Tall Arrays and mapreduce
Analyze big data sets in parallel using MATLAB tall arrays and datastores or mapreduce on Spark and Hadoop clusters, and parallel pools
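The tall array and datastore workflow can be sketched in a similar way. This example assumes the sample file airlinesmall.csv that ships with MATLAB is on the path and that a parallel pool is available. It runs on the current pool rather than on a Spark-enabled Hadoop cluster, because cluster configuration is site-specific and is omitted here.

```matlab
% Use the current parallel pool as the execution environment for
% tall array calculations (assumes a pool can be opened with gcp).
mapreducer(gcp);

% Create a datastore over the sample file. Assumption: airlinesmall.csv
% is on the MATLAB path, as in a standard installation.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay'});

% Wrap the datastore in a tall table. No data is read at this point.
tt = tall(ds);

% Build a deferred calculation, ignoring missing values.
meanDelay = mean(tt.ArrDelay, 'omitnan');

% Trigger evaluation and bring the scalar result into client memory.
meanDelay = gather(meanDelay);
```

Evaluation is deferred until gather is called, which lets MATLAB read the file in chunks instead of loading it all at once.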