# DaviesBouldinEvaluation

Package: clustering.evaluation
Superclasses: `ClusterCriterion`

Davies-Bouldin criterion clustering evaluation object

## Description

`DaviesBouldinEvaluation` is an object consisting of sample data, clustering data, and Davies-Bouldin criterion values used to evaluate the optimal number of clusters. Create a Davies-Bouldin criterion clustering evaluation object using `evalclusters`.
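For reference, the Davies-Bouldin index for a K-cluster partition is the average, over clusters, of the worst-case ratio (s_i + s_j)/d_ij, where s_i is the mean distance from the points in cluster i to their centroid and d_ij is the distance between the centroids of clusters i and j. Lower values indicate more compact, better-separated clusters. The following is a minimal NumPy sketch of this computation, not MATLAB's implementation; the function name is illustrative:

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index for data X (n-by-p) and integer labels (length n)."""
    ks = np.unique(labels)
    # Centroid and mean within-cluster distance s_i for each cluster.
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    s = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, centroids)])
    # For each cluster, take the worst-case similarity ratio to any other
    # cluster, then average over clusters. Lower is better.
    db = 0.0
    for i in range(len(ks)):
        ratios = [(s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(len(ks)) if j != i]
        db += max(ratios)
    return db / len(ks)
```

Note that the index is undefined for a single cluster, which is why `evalclusters` reports `NaN` for K = 1.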

## Construction

`eva = evalclusters(x,clust,'DaviesBouldin')` creates a Davies-Bouldin criterion clustering evaluation object.

`eva = evalclusters(x,clust,'DaviesBouldin',Name,Value)` creates a Davies-Bouldin criterion clustering evaluation object using additional options specified by one or more name-value pair arguments.

### Input Arguments


`x` — Input data, specified as an N-by-P matrix. N is the number of observations, and P is the number of variables.

Data Types: `single` | `double`

`clust` — Clustering algorithm, specified as one of the following.

• `'kmeans'`: Cluster the data in `x` using the `kmeans` clustering algorithm, with `'EmptyAction'` set to `'singleton'` and `'Replicates'` set to `5`.

• `'linkage'`: Cluster the data in `x` using the `clusterdata` agglomerative clustering algorithm, with `'Linkage'` set to `'ward'`.

• `'gmdistribution'`: Cluster the data in `x` using the `gmdistribution` Gaussian mixture distribution algorithm, with `'SharedCov'` set to `true` and `'Replicates'` set to `5`.

If `criterion` is `'CalinskiHarabasz'`, `'DaviesBouldin'`, or `'silhouette'`, you can specify a clustering algorithm using a function handle. The function must be of the form `C = clustfun(DATA,K)`, where `DATA` is the data to be clustered, and `K` is the number of clusters. The output of `clustfun` must be one of the following:

• A vector of integers representing the cluster index for each observation in `DATA`. There must be `K` unique values in this vector.

• A numeric n-by-K matrix of scores for n observations and K classes. In this case, the cluster index for each observation is determined by taking the largest score value in each row.

If `criterion` is `'CalinskiHarabasz'`, `'DaviesBouldin'`, or `'silhouette'`, you can also specify `clust` as an n-by-K matrix containing the proposed clustering solutions. n is the number of observations in the sample data, and K is the number of proposed clustering solutions. Column j contains the cluster indices for each of the n points in the jth clustering solution.

Data Types: `single` | `double` | `char` | `string` | `function_handle`
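The score-matrix form of the `clustfun` output is resolved by a row-wise maximum. A NumPy illustration of that rule (the score values are hypothetical, and this is not the MATLAB internals):

```python
import numpy as np

# Hypothetical scores for 4 observations and K = 3 classes; the cluster
# index for each observation is the column with the largest score.
scores = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1]])
labels = scores.argmax(axis=1) + 1   # 1-based indices, as in MATLAB
print(labels)  # [1 2 3 1]
```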

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'KList',[1:5]` specifies to test 1, 2, 3, 4, and 5 clusters to find the optimal number.

List of number of clusters to evaluate, specified as the comma-separated pair consisting of `'KList'` and a vector of positive integer values. You must specify `KList` when `clust` is a clustering algorithm name or a function handle. When `criterion` is `'gap'`, `clust` must be a character vector, a string scalar, or a function handle, and you must specify `KList`.

Example: `'KList',[1:6]`

Data Types: `single` | `double`

## Properties

• `ClusteringFunction`: Clustering algorithm used to cluster the input data, stored as a valid clustering algorithm name or function handle. If the clustering solutions are provided in the input, `ClusteringFunction` is empty.

• `CriterionName`: Name of the criterion used for clustering evaluation, stored as a valid criterion name.

• `CriterionValues`: Criterion values corresponding to each proposed number of clusters in `InspectedK`, stored as a vector of numerical values.

• `InspectedK`: List of the number of proposed clusters for which to compute criterion values, stored as a vector of positive integer values.

• `Missing`: Logical flag for excluded data, stored as a column vector of logical values. If `Missing` equals `true`, then the corresponding value in the data matrix `X` is not used in the clustering solution.

• `NumObservations`: Number of observations in the data matrix `X`, minus the number of missing (`NaN`) values in `X`, stored as a positive integer value.

• `OptimalK`: Optimal number of clusters, stored as a positive integer value.

• `OptimalY`: Optimal clustering solution corresponding to `OptimalK`, stored as a column vector of positive integer values. If the clustering solutions are provided in the input, `OptimalY` is empty.

• `X`: Data used for clustering, stored as a matrix of numerical values.
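To make the relationship between `Missing` and `NumObservations` concrete, here is a small NumPy illustration; the data and variable names are hypothetical, and `evalclusters` performs this bookkeeping internally:

```python
import numpy as np

# Hypothetical 3-observation data matrix with one missing (NaN) value.
X = np.array([[1.0, 2.0],
              [np.nan, 0.5],
              [3.0, 4.0]])

# Analogue of the Missing property: one logical flag per observation,
# true when that row contains a NaN and is excluded from clustering.
missing = np.isnan(X).any(axis=1)

# Analogue of NumObservations: rows actually used in the clustering solution.
num_observations = len(X) - missing.sum()
print(missing.tolist(), num_observations)  # [False, True, False] 2
```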

## Methods

### Inherited Methods

• `addK`: Evaluate additional numbers of clusters

• `compact`: Compact clustering evaluation object

• `plot`: Plot clustering evaluation object criterion values

## Examples


Evaluate the optimal number of clusters using the Davies-Bouldin clustering evaluation criterion.

Generate sample data containing random numbers from three multivariate distributions with different parameter values.

```
rng('default'); % For reproducibility
mu1 = [2 2];
sigma1 = [0.9 -0.0255; -0.0255 0.9];
mu2 = [5 5];
sigma2 = [0.5 0; 0 0.3];
mu3 = [-2 -2];
sigma3 = [1 0; 0 0.9];
N = 200;
X = [mvnrnd(mu1,sigma1,N); mvnrnd(mu2,sigma2,N); mvnrnd(mu3,sigma3,N)];
```

Evaluate the optimal number of clusters using the Davies-Bouldin criterion. Cluster the data using `kmeans`.

`E = evalclusters(X,'kmeans','DaviesBouldin','KList',[1:6])`

```
E = 
  DaviesBouldinEvaluation with properties:

    NumObservations: 600
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 0.4663 0.4454 0.8316 1.0444 0.9236]
           OptimalK: 3
```

The `OptimalK` value indicates that, based on the Davies-Bouldin criterion, the optimal number of clusters is three.

Plot the Davies-Bouldin criterion values for each number of clusters tested.

```
figure;
plot(E)
```

The plot shows that the lowest Davies-Bouldin value occurs at three clusters, suggesting that the optimal number of clusters is three.
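The selection rule behind `OptimalK` can be reproduced from the reported criterion values: the Davies-Bouldin index is undefined for a single cluster (`NaN`), and the optimal number of clusters minimizes the index over the remaining candidates. A NumPy illustration using the values from the example output above:

```python
import numpy as np

# InspectedK and CriterionValues from the example output above.
inspected_k = np.array([1, 2, 3, 4, 5, 6])
criterion_values = np.array([np.nan, 0.4663, 0.4454, 0.8316, 1.0444, 0.9236])

# OptimalK minimizes the Davies-Bouldin index; nanargmin skips the K = 1 NaN.
optimal_k = inspected_k[np.nanargmin(criterion_values)]
print(optimal_k)  # 3
```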

Create a grouped scatter plot to visually examine the suggested clusters.

```
figure;
gscatter(X(:,1),X(:,2),E.OptimalY,'rbg','xod')
```

The plot shows three distinct clusters within the data: Cluster 1 is in the lower-left corner, cluster 2 is in the upper-right corner, and cluster 3 is near the center of the plot.


## References

[1] Davies, D. L., and D. W. Bouldin. “A Cluster Separation Measure.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. PAMI-1, No. 2, 1979, pp. 224–227.