# Compute Mean Value with MapReduce

This example shows how to compute the mean of a single variable in a data set using `mapreduce`. It demonstrates a simple use of `mapreduce` with one key, minimal computation, and an intermediate state (accumulating intermediate sum and count).

### Prepare Data

Create a datastore using the `airlinesmall.csv` data set. This 12-megabyte data set contains 29 columns of flight information for several airline carriers, including arrival and departure times. In this example, select `ArrDelay` (flight arrival delay) as the variable of interest.

```
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';
```

The datastore treats `'NA'` values as missing, and replaces the missing values with `NaN` values by default. Additionally, the `SelectedVariableNames` property allows you to work with only the selected variable of interest, which you can verify using `preview`.

```
preview(ds)
```

```
ans=8×1 table
    ArrDelay
    ________

        8
        8
       21
       13
        4
       59
        3
       11
```
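The preview shows only the first few rows, which may not contain any missing entries. As a minimal sketch, assuming `airlinesmall.csv` is on the MATLAB path, you can read one block from the datastore and count how many `ArrDelay` entries arrived as `NaN`. The variable names `block`, `numRows`, and `numMissing` are illustrative and not part of the original example.

```matlab
% Rebuild the datastore exactly as in the step above.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';

% Read one block of data (returned as a table) and count the NaN entries.
block = read(ds);
numRows = height(block);
numMissing = sum(isnan(block.ArrDelay));
fprintf('%d of %d rows in the first block are missing.\n', numMissing, numRows);

% Rewind the datastore so that later reads start from the first row again.
reset(ds);
```

Calling `reset` afterward keeps this check from affecting any later incremental reads from `ds`.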

### Run MapReduce

The `mapreduce` function requires a map function and a reduce function as inputs. The mapper receives blocks of data and outputs intermediate results. The reducer reads the intermediate results and produces a final result.

In this example, the mapper finds the count and sum of the arrival delays in each block of data. The mapper then stores these values as the intermediate values associated with the key `"PartialCountSumDelay"`.

Display the map function file.

```
function meanArrivalDelayMapper (data, info, intermKVStore)
% Data is an n-by-1 table of the ArrDelay. Remove missing values first:
data(isnan(data.ArrDelay),:) = [];

% Record the partial counts and sums and the reducer will accumulate them.
partCountSum = [length(data.ArrDelay), sum(data.ArrDelay)];
add(intermKVStore, "PartialCountSumDelay",partCountSum);
end
```

The reducer accepts the count and sum for each block stored by the mapper. It adds these values to obtain the total count and total sum, and then computes the overall mean arrival delay as the total sum divided by the total count. `mapreduce` calls this reducer only once, because the mapper adds only a single unique key. The reducer uses `add` to add a single key-value pair to the output.

Display the reduce function file.

```
function meanArrivalDelayReducer(intermKey, intermValIter, outKVStore)
count = 0;
sum = 0;
while hasnext(intermValIter)
    countSum = getnext(intermValIter);
    count = count + countSum(1);
    sum = sum + countSum(2);
end
meanDelay = sum/count;

% The key-value pair added to outKVStore will become the output of mapreduce
add(outKVStore,"MeanArrivalDelay",meanDelay);
end
```
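The mapper and reducer rely on the fact that a mean can be recovered from per-block counts and sums. The following in-memory sketch illustrates that idea without calling `mapreduce`. The sample values, the split into two blocks, and all variable names are assumptions made for illustration only.

```matlab
% Illustrative data: a few delays with one missing value (NaN).
delays = [8; 8; 21; NaN; 13; 4; 59; 3; 11];

% Split the data into two blocks, mimicking how the mapper sees the data.
blocks = {delays(1:4), delays(5:9)};

% "Map" step: one [count, sum] row per block, ignoring missing values.
partials = zeros(numel(blocks), 2);
for k = 1:numel(blocks)
    block = blocks{k};
    block = block(~isnan(block));
    partials(k, :) = [numel(block), sum(block)];
end

% "Reduce" step: accumulate the partial results, then divide once.
totals = sum(partials, 1);
meanFromPartials = totals(2) / totals(1);

% Reference value computed directly on the full vector.
meanDirect = mean(delays, 'omitnan');
fprintf('From partials: %g, direct: %g\n', meanFromPartials, meanDirect);
```

Averaging the per-block means instead would give a different answer whenever blocks contain different numbers of valid rows, which is why the mapper passes along the count and the sum rather than a block mean.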

Use `mapreduce` to apply the map and reduce functions to the datastore, `ds`.

```
meanDelay = mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer);
```

```
********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map  16% Reduce   0%
Map  32% Reduce   0%
Map  48% Reduce   0%
Map  65% Reduce   0%
Map  81% Reduce   0%
Map  97% Reduce   0%
Map 100% Reduce   0%
Map 100% Reduce 100%
```

`mapreduce` returns a datastore, `meanDelay`, with files in the current folder.
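If you prefer not to write the output files to the current folder, `mapreduce` accepts an `'OutputFolder'` name-value argument. The sketch below assumes that the map and reduce functions from this example are on the path. The temporary folder `outFolder` is a hypothetical location chosen only for illustration.

```matlab
% Rebuild the input datastore as in the Prepare Data step.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';

% Hypothetical output location: a fresh folder under the system temp directory.
outFolder = tempname;
mkdir(outFolder);

% Run mapreduce and direct its result files to outFolder.
meanDelay = mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer, ...
    'OutputFolder', outFolder);

% List the files that back the output datastore.
disp(meanDelay.Files)
```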

Read the final result from the output datastore, `meanDelay`.

```
readall(meanDelay)
```

```
ans=1×2 table
            Key               Value   
    ____________________    __________

    {'MeanArrivalDelay'}    {[7.1201]}
```
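The result table stores each value in a cell, so the numeric mean must be extracted with brace indexing. The following sketch assumes that `meanDelay` (the output datastore) and `ds` (the input datastore) still exist in the workspace. It also cross-checks the result against a direct in-memory computation, which is practical here only because the data set is small. The names `result`, `meanValue`, `allData`, and `directMean` are illustrative.

```matlab
% Extract the numeric mean from the 1-by-2 result table.
result = readall(meanDelay);
meanValue = result.Value{1};

% Cross-check: load the selected variable into memory and average it,
% skipping NaN entries just as the mapper does.
allData = readall(ds);
directMean = mean(allData.ArrDelay, 'omitnan');

fprintf('mapreduce: %.4f, direct: %.4f\n', meanValue, directMean);
```

For data sets that do not fit in memory, the direct computation is not an option, which is the situation `mapreduce` is designed for.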

### Local Functions

Listed here are the map and reduce functions that `mapreduce` applies to the data.

```
function meanArrivalDelayMapper (data, info, intermKVStore)
% Data is an n-by-1 table of the ArrDelay. Remove missing values first:
data(isnan(data.ArrDelay),:) = [];

% Record the partial counts and sums and the reducer will accumulate them.
partCountSum = [length(data.ArrDelay), sum(data.ArrDelay)];
add(intermKVStore, "PartialCountSumDelay",partCountSum);
end
%-------------------------------------------------------------------------
function meanArrivalDelayReducer(intermKey, intermValIter, outKVStore)
count = 0;
sum = 0;
while hasnext(intermValIter)
    countSum = getnext(intermValIter);
    count = count + countSum(1);
    sum = sum + countSum(2);
end
meanDelay = sum/count;

% The key-value pair added to outKVStore will become the output of mapreduce
add(outKVStore,"MeanArrivalDelay",meanDelay);
end
%-------------------------------------------------------------------------
```