coalesce

Class: matlab.compiler.mlspark.RDD
Namespace: matlab.compiler.mlspark

Reduce the number of partitions in an RDD

Syntax

result = coalesce(obj,numPartitions,doShuffle)

Description

result = coalesce(obj,numPartitions,doShuffle) reduces the number of partitions in an RDD to a number specified by numPartitions.

Input Arguments

obj: Input RDD, specified as an RDD object.

numPartitions: Number of partitions to create, specified as a scalar value.

Data Types: double

doShuffle: Whether to perform a shuffle, specified as a logical value. By default, doShuffle is false.

Data Types: logical

Output Arguments

result: RDD with a reduced number of partitions, returned as an RDD object.

Examples

%% Connect to Spark
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf('AppName','myApp', ...
                        'Master','local[1]','SparkProperties',sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

%% coalesce
inputRDD = sc.parallelize({'A','B','C','A','B'},2);
redRDD = inputRDD.map(@(x)({x,1})).reduceByKey(@(x,y)(x+y),3);
coaRDD = redRDD.coalesce(2); % reduce the 3 partitions of redRDD to 2
viewRes = coaRDD.glom.collect() % {{{'B',2}},{{'C',1},{'A',2}}}
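
The example above relies on the default value of doShuffle. As a minimal sketch of the three-argument syntax, the following passes doShuffle explicitly as true and counts the resulting partitions with glom and collect. The variable names shuffledRDD, parts, and numParts are illustrative only, and the initial partition count of 4 is an arbitrary choice. Which elements land in which partition is not shown, because that depends on how Spark distributes the data.

```matlab
%% Connect to Spark (skip this step if a SparkContext named sc already exists)
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf('AppName','myApp', ...
                        'Master','local[1]','SparkProperties',sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

%% coalesce with an explicit shuffle
inputRDD = sc.parallelize({'A','B','C','A','B'},4);

% The third argument is doShuffle; true requests a shuffle while
% the number of partitions is reduced from 4 to 2
shuffledRDD = coalesce(inputRDD,2,true);

% glom groups the elements of each partition into one cell, so the
% number of outer cells is the number of partitions after coalescing
parts = shuffledRDD.glom.collect();
numParts = numel(parts)
```

Counting the cells returned by glom.collect is used here only because both methods already appear in the example above. It brings all data back to the client, so it is suitable for small RDDs only.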

Version History

Introduced in R2016b