Measures of Central Tendency

Measures of central tendency locate a distribution of data along an appropriate scale.

The following table lists the functions that calculate the measures of central tendency.

Function Name    Description
geomean          Geometric mean
harmmean         Harmonic mean
mean             Arithmetic average
median           50th percentile
mode             Most frequent value
trimmean         Trimmed mean
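For reference, the three means in the table can be written as follows for a sample x_1, ..., x_n (the geometric and harmonic means assume every x_i > 0). The worked values use the sample of six ones and a single 100; they are plain arithmetic on these definitions, not output copied from MATLAB.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
G = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n} = \exp\Bigl(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\Bigr), \qquad
H = \frac{n}{\sum_{i=1}^{n} 1/x_i}

% Worked values for x = (1, 1, 1, 1, 1, 1, 100), n = 7:
\bar{x} = \frac{106}{7} \approx 15.1429, \qquad
G = 100^{1/7} \approx 1.9307, \qquad
H = \frac{7}{6.01} \approx 1.1647
```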

The average is a simple and popular estimate of location. If the data sample comes from a normal distribution, then the sample mean is also optimal: it is the minimum variance unbiased estimator (MVUE) of the population mean µ.

Unfortunately, outliers, data entry errors, or glitches exist in almost all real data. The sample mean is sensitive to these problems. One bad data value can move the average away from the center of the rest of the data by an arbitrarily large distance.
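A minimal sketch of this sensitivity, using only base MATLAB functions. The outlier sizes in bigValues are arbitrary choices for illustration. With six ones and one value b, the mean is (6 + b)/7, so it grows without bound as b grows, while the median of the same samples stays at 1.

```matlab
% Grow a single bad value and compare how the mean and median respond.
bigValues = [100 1e3 1e6];           % hypothetical outlier sizes
means   = zeros(size(bigValues));
medians = zeros(size(bigValues));
for i = 1:numel(bigValues)
    x = [ones(1,6), bigValues(i)];   % six typical values plus one outlier
    means(i)   = mean(x);            % equals (6 + bigValues(i))/7
    medians(i) = median(x);          % middle sorted value, which is 1
end
disp([bigValues; means; medians])    % rows: outlier, mean, median
```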

The median and trimmed mean are two measures that are resistant (robust) to outliers. The median is the 50th percentile of the sample, which changes only slightly when a large perturbation is added to any single value. The idea behind the trimmed mean is to ignore a small percentage of the highest and lowest values of a sample when determining its center.
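The trimming idea can be sketched by hand in base MATLAB. This assumes the convention used by trimmean with its default rounding: for a trim percentage p, drop k = n*p/200 values from each end (p/2 percent per side), with k rounded to the nearest integer. It is an illustration of the technique, not a reimplementation of every trimmean option.

```matlab
% Hand-rolled trimmed mean on a sample with one outlier.
x = [ones(1,6), 100];        % six typical values and one outlier
percent = 25;                % total percentage of values to trim
xs = sort(x);                % order the sample from lowest to highest
n = numel(xs);
k = round(n*percent/200);    % values dropped from EACH end (0.875 rounds to 1)
tm = mean(xs(k+1:n-k))       % mean of the remaining middle values
```

Here the single lowest and single highest values are discarded, so the outlier 100 never enters the average; comparing tm with trimmean(x,25) is a quick way to check the assumed convention.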

The geometric mean and harmonic mean, like the average, are not robust to outliers. They are useful when the sample follows a lognormal distribution or is heavily skewed.
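One way to see why the geometric mean suits lognormal data: it equals exp of the arithmetic mean of log(y), so it estimates exp(mu), the median of a lognormal distribution, while the arithmetic mean estimates the larger exp(mu + sigma^2/2). The sketch below uses base MATLAB only; the simulated sample and the parameters mu = 0 and sigma = 1 are assumptions for illustration.

```matlab
% Compare the geometric mean, median, and mean on a simulated lognormal sample.
rng(0)                               % fix the random seed for reproducibility
mu = 0; sigma = 1;                   % assumed parameters of the underlying normal
y = exp(mu + sigma*randn(1,1000));   % lognormal sample built from normal draws

g = exp(mean(log(y)));               % geometric mean, same quantity as geomean(y)
fprintf('geometric mean = %.4f, median = %.4f, mean = %.4f\n', ...
        g, median(y), mean(y));
```

For these parameters the geometric mean and median should both land near exp(0) = 1, and the arithmetic mean nearer exp(1/2), about 1.65.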

This example shows how to compute and compare measures of location for sample data that contains one outlier.

Generate sample data that contains one outlier.

x = [ones(1,6),100]
x = 1×7

     1     1     1     1     1     1   100

Compute the geometric mean, harmonic mean, mean, median, and trimmed mean for the sample data.

locate = [geomean(x) harmmean(x) mean(x) median(x)... 
          trimmean(x,25)]
locate = 1×5

    1.9307    1.1647   15.1429    1.0000    1.0000

The mean (mean) is far from any data value because of the influence of the outlier. The geometric mean (geomean) and the harmonic mean (harmmean) are influenced by the outlier, but not as significantly. The median (median) and trimmed mean (trimmean) ignore the outlier value and describe the location of the rest of the data values.
