histcounts2

Bivariate histogram bin counts

Description

[N,Xedges,Yedges] = histcounts2(X,Y) partitions the values in X and Y into 2-D bins, and returns the bin counts, as well as the bin edges in each dimension. The histcounts2 function uses an automatic binning algorithm that returns uniform bins chosen to cover the range of values in X and Y and reveal the underlying shape of the distribution.

[N,Xedges,Yedges] = histcounts2(X,Y,nbins) specifies the number of bins to use in each dimension.

[N,Xedges,Yedges] = histcounts2(X,Y,Xedges,Yedges) partitions X and Y into bins with the bin edges specified by Xedges and Yedges.

N(i,j) counts the value [X(k),Y(k)] if Xedges(i) ≤ X(k) < Xedges(i+1) and Yedges(j) ≤ Y(k) < Yedges(j+1). The last bins in each dimension also include the last (outer) edge. For example, [X(k),Y(k)] falls into the ith bin in the last row if Xedges(end-1) ≤ X(k) ≤ Xedges(end) and Yedges(i) ≤ Y(k) < Yedges(i+1).
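The edge rule above can be checked on a tiny data set. This is a minimal sketch; the data and edge values are made up for illustration.

```matlab
% Three points that sit exactly on bin edges (illustrative values).
X = [0 1 2];
Y = [0 1 2];
Xedges = [0 1 2];   % x-bins: [0,1) and [1,2]
Yedges = [0 1 2];   % y-bins: [0,1) and [1,2]

N = histcounts2(X,Y,Xedges,Yedges);

% (0,0) lies on the leading edges, so it is counted in bin (1,1).
% (1,1) lies on the shared interior edges, so it is counted in bin (2,2).
% (2,2) lies on the outer edges, which the last bins include.
disp(N)
```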

[N,Xedges,Yedges] = histcounts2(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments using any of the input arguments in previous syntaxes. For example, you can specify 'BinWidth' and a two-element vector to adjust the width of the bins in each dimension.

[N,Xedges,Yedges,binX,binY] = histcounts2(___) also returns index arrays binX and binY, using any of the previous syntaxes. binX and binY are arrays of the same size as X and Y whose elements are the bin indices for the corresponding elements in X and Y. The number of elements in the (i,j)th bin is equal to nnz(binX==i & binY==j), which is the same as N(i,j) if Normalization is 'count'.
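The relationship between the index arrays and the counts can be sketched as follows. The data and edges are illustrative, and the default 'count' normalization is assumed.

```matlab
% Illustrative data; any same-sized X and Y work.
x = [0.2 0.4 1.5 1.7 2.5 2.9];
y = [0.1 1.2 1.3 0.4 2.5 2.6];
edges = 0:3;   % same edges in both dimensions

[N,~,~,binX,binY] = histcounts2(x,y,edges,edges);

% Count the points assigned to bin (i,j) directly from the index arrays.
i = 2;
j = 2;
countFromIndices = nnz(binX==i & binY==j);

% With 'count' normalization this matches the corresponding entry of N.
matches = (countFromIndices == N(i,j));
```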

Examples

Distribute 100 pairs of random numbers into bins. histcounts2 automatically chooses an appropriate bin width to reveal the underlying distribution of the data.

x = randn(100,1);
y = randn(100,1);
[N,Xedges,Yedges] = histcounts2(x,y)
N = 7×6

0     0     0     2     0     0
1     2    10     4     0     0
1     4     9     9     5     0
1     4    10    11     5     1
1     4     6     3     1     1
0     0     1     2     0     0
0     0     1     0     1     0

Xedges = 1×8

-3    -2    -1     0     1     2     3     4

Yedges = 1×7

-3    -2    -1     0     1     2     3
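The sizes in the output above are related: each dimension has one more edge than it has bins. A small sketch of that check, using deterministic stand-in data rather than random numbers:

```matlab
% Deterministic stand-in for the random data used above.
x = linspace(-2,2,100);
y = sin(3*x);

[N,Xedges,Yedges] = histcounts2(x,y);

% N has one row per x-bin and one column per y-bin.
sizeMatches = isequal(size(N),[numel(Xedges)-1, numel(Yedges)-1]);

% Automatic binning covers the data range, so every point is counted.
allCounted = (sum(N(:)) == numel(x));
```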

Distribute 10 pairs of numbers into 12 bins. Specify 3 bins in the x-dimension, and 4 bins in the y-dimension.

x = [1 1 2 3 2 2 1 1 2 3];
y = [5 6 3 8 9 1 2 7 5 1];
nbins = [3 4];
[N,Xedges,Yedges] = histcounts2(x,y,nbins)
N = 3×4

1     0     2     1
1     1     1     1
1     0     0     1

Xedges = 1×4

0.6000    1.4000    2.2000    3.0000

Yedges = 1×5

0    2.3000    4.6000    6.9000    9.2000

Distribute 1,000 pairs of random numbers into bins. Define the bin edges with two vectors: one each for the x and y dimensions. The first element in each vector specifies the first edge of the first bin, and the last element is the last edge of the last bin.

x = randn(1000,1);
y = randn(1000,1);
Xedges = -5:5;
Yedges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
N = histcounts2(x,y,Xedges,Yedges)
N = 10×10

0     0     0     0     0     0     0     0     0     0
0     0     0     0     1     1     1     0     0     0
0     0     5     5     3     5     1     2     0     0
0     2    19    23    29    25    26    20     5     0
0    10    36    51    59    71    54    46    10     0
0     7    43    46    79    64    60    46     9     0
0     3    12    18    21    23    19     9     6     0
0     0     5     3     2     8     2     2     0     0
0     0     0     1     1     1     0     0     0     0
0     0     0     0     0     0     0     0     0     0

Distribute 1,000 pairs of random numbers into bins. Specify Normalization as 'probability' to normalize the bin counts such that sum(N(:)) is 1. That is, each bin count represents the probability that an observation falls within that bin.

x = randn(1000,1);
y = randn(1000,1);
[N,Xedges,Yedges] = histcounts2(x,y,6,'Normalization','probability')
N = 6×6

0         0    0.0020    0.0020         0         0
0    0.0110    0.0320    0.0260    0.0070    0.0010
0.0010    0.0260    0.1410    0.1750    0.0430    0.0060
0    0.0360    0.1620    0.1940    0.0370    0.0040
0    0.0040    0.0300    0.0370    0.0100    0.0010
0    0.0030    0.0040    0.0040    0.0010         0

Xedges = 1×7

-4.0000   -2.7000   -1.4000   -0.1000    1.2000    2.5000    3.8000

Yedges = 1×7

-4.0000   -2.7000   -1.4000   -0.1000    1.2000    2.5000    3.8000
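As a rough check of what 'probability' normalization does, the normalized values can be compared against the raw counts divided by the number of input elements. The data below is a deterministic stand-in chosen for illustration.

```matlab
% Deterministic stand-in data.
x = linspace(-3,3,200);
y = cos(x);

counts = histcounts2(x,y,6);                                  % raw counts
prob = histcounts2(x,y,6,'Normalization','probability');      % normalized

% Each probability value should equal count / numel(x).
maxDiff = max(abs(prob(:) - counts(:)/numel(x)));

% All points fall inside the bins here, so the values sum to 1.
total = sum(prob(:));
```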

Distribute 1,000 random integer pairs between -10 and 10 into bins, and specify BinMethod as 'integers' to use unit-width bins centered on integers. Specify five outputs for histcounts2 to return vectors representing the bin placement of the data.

x = randi([-10,10],1000,1);
y = randi([-10,10],1000,1);
[N,Xedges,Yedges,binX,binY] = histcounts2(x,y,'BinMethod','integers');

Determine which bin the value (x(3),y(3)) falls into.

[x(3),y(3)]
ans = 1×2

-8    10

bin = [binX(3) binY(3)]
bin = 1×2

3    21
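The bin indices can also be used to look up the edges that bound a given point. The following sketch uses a small hand-picked data set (not the random data above) so that the result is reproducible.

```matlab
% Hand-picked integer data spanning -10 to 10 in both dimensions.
x = [-10 0 -8  10  3];
y = [ 10 0 10 -10 -3];

[~,Xedges,Yedges,binX,binY] = histcounts2(x,y,'BinMethod','integers');

k = 3;                               % inspect the point (x(3),y(3)) = (-8,10)
xRange = Xedges(binX(k) + [0 1]);    % edges bounding x(k)
yRange = Yedges(binY(k) + [0 1]);    % edges bounding y(k)
```

Because the 'integers' method places edges halfway between integers, xRange brackets -8 and yRange brackets 10.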

Input Arguments

X,Y

Data to distribute among bins, specified as separate arguments of vectors, matrices, or multidimensional arrays. X and Y must have the same size.

Corresponding elements in X and Y specify the x and y coordinates of 2-D data points, [X(k),Y(k)]. The data types of X and Y can be different.

histcounts2 ignores all NaN values. Similarly, histcounts2 ignores Inf and -Inf values unless the bin edges explicitly specify Inf or -Inf as a bin edge.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical
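The handling of NaN and Inf values described above can be illustrated with a short sketch. The data and edges are made up for the purpose.

```matlab
% One ordinary point, one NaN, one Inf, and another ordinary point.
X = [1 NaN Inf 3];
Y = [1 1   1   3];

% Finite edges: the NaN and Inf points are not counted.
Nfinite = histcounts2(X,Y,[0 2 4],[0 2 4]);

% An explicit Inf edge in x: the Inf point is now counted in the last x-bin.
Ninf = histcounts2(X,Y,[0 2 Inf],[0 2 4]);

numFinite = sum(Nfinite(:));   % points counted with finite edges
numInf = sum(Ninf(:));         % points counted when Inf is an edge
```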

nbins

Number of bins in each dimension, specified as a positive scalar integer or two-element vector of positive integers. If you do not specify nbins, then histcounts2 automatically calculates how many bins to use based on the values in X and Y. If you do specify nbins:

• If nbins is a scalar, then histcounts2 uses that many bins in each dimension.

• If nbins is a vector, then nbins(1) specifies the number of bins in the x-dimension and nbins(2) specifies the number of bins in the y-dimension.

Example: [N,Xedges,Yedges] = histcounts2(X,Y,[15 20]) uses 15 bins in the x-dimension and 20 bins in the y-dimension.
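A brief sketch of the scalar and vector forms of nbins, using illustrative data:

```matlab
% Illustrative data.
x = linspace(0,1,50);
y = x.^2;

Nscalar = histcounts2(x,y,5);         % 5 bins in each dimension
Nvector = histcounts2(x,y,[15 20]);   % 15 x-bins and 20 y-bins

% Rows of N correspond to x-bins and columns to y-bins.
sizeScalar = size(Nscalar);
sizeVector = size(Nvector);
```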

Xedges

Bin edges in x-dimension, specified as a vector. Xedges(1) is the first edge of the first bin in the x-dimension, and Xedges(end) is the outer edge of the last bin.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical

Yedges

Bin edges in y-dimension, specified as a vector. Yedges(1) is the first edge of the first bin in the y-dimension, and Yedges(end) is the outer edge of the last bin.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: [N,Xedges,Yedges] = histcounts2(X,Y,'Normalization','probability') normalizes the bin counts in N, such that sum(N(:)) is 1.

BinMethod

Binning algorithm, specified as one of the values in this table.

Value    Description
'auto'

The default 'auto' algorithm chooses a bin width to cover the data range and reveal the shape of the underlying distribution.

'scott'

Scott’s rule is optimal if the data is close to being jointly normally distributed. This rule is appropriate for most other distributions, as well. It uses a bin size of [3.5*std(X(:))*numel(X)^(-1/4), 3.5*std(Y(:))*numel(Y)^(-1/4)].

'fd'

The Freedman-Diaconis rule is less sensitive to outliers in the data, and might be more suitable for data with heavy-tailed distributions. It uses a bin size of [2*IQR(X(:))*numel(X)^(-1/4), 2*IQR(Y(:))*numel(Y)^(-1/4)], where IQR is the interquartile range.

'integers'

The integer rule is useful with integer data, as it creates bins centered on pairs of integers. It uses a bin width of 1 for each dimension and places bin edges halfway between integers.

To avoid accidentally creating too many bins, you can use this rule to create a limit of 1024 bins (2^10). If the data range for either dimension is greater than 1024, then the integer rule uses wider bins instead.

histcounts2 does not always choose the number of bins using these exact formulas. Sometimes the number of bins is adjusted slightly so that the bin edges fall on "nice" numbers.

Example: [N,Xedges,Yedges] = histcounts2(X,Y,'BinMethod','integers') uses 2-D bins centered on each pair of integers.
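To make the Scott's rule formula concrete, the sketch below evaluates it by hand and, separately, reads the bin widths that histcounts2 actually used. The two are not asserted to be equal, because the edges may be adjusted to fall on "nice" numbers. The data is illustrative.

```matlab
% Illustrative data.
x = linspace(-3,3,500);
y = 2*sin(x);
n = numel(x);

% Bin widths given by the Scott's rule formula quoted in the table.
scottWidth = 3.5 * [std(x(:)) std(y(:))] * n^(-1/4);

% Bin widths that histcounts2 actually chose with 'BinMethod','scott'.
[~,Xedges,Yedges] = histcounts2(x,y,'BinMethod','scott');
usedWidth = [Xedges(2)-Xedges(1), Yedges(2)-Yedges(1)];

% Display both for comparison; they can differ after edge adjustment.
disp([scottWidth; usedWidth])
```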

BinWidth

Width of bins in each dimension, specified as a two-element vector of positive integers, [xWidth yWidth].

If you specify BinWidth, then histcounts2 can use a maximum of 1024 bins (2^10) along each dimension. If the specified bin width requires more bins, then histcounts2 uses a larger bin width corresponding to the maximum number of bins.

Example: [N,Xedges,Yedges] = histcounts2(X,Y,'BinWidth',[5 10]) uses bins with size 5 in the x-dimension and size 10 in the y-dimension.
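A short sketch of the 'BinWidth' option, with illustrative data whose range is small enough to stay well under the 1024-bin limit:

```matlab
% Illustrative data: x spans 0..99 and y spans 0..198.
x = 0:99;
y = 2*(0:99);

[N,Xedges,Yedges] = histcounts2(x,y,'BinWidth',[5 10]);

% Spacing between consecutive edges in each dimension.
xWidths = diff(Xedges);
yWidths = diff(Yedges);
```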

XBinLimits

Bin limits in x-dimension, specified as a two-element vector, [xbmin,xbmax]. The vector indicates the first and last bin edges in the x-dimension.

This option only bins data that falls within the bin limits inclusively, X>=xbmin & X<=xbmax.

YBinLimits

Bin limits in y-dimension, specified as a two-element vector, [ybmin,ybmax]. The vector indicates the first and last bin edges in the y-dimension.

This option only bins data that falls within the bin limits inclusively, Y>=ybmin & Y<=ybmax.
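The effect of the bin limits can be sketched with the 'XBinLimits' and 'YBinLimits' options. The data is made up, and only the points inside the unit square are expected to be counted.

```matlab
% Illustrative data on a coarse grid from -1 to 2.
x = -1:0.25:2;
y = -1:0.25:2;

[N,Xedges,Yedges] = histcounts2(x,y,'XBinLimits',[0 1],'YBinLimits',[0 1]);

% Points that lie inside both limits, inclusively.
inside = x>=0 & x<=1 & y>=0 & y<=1;

numBinned = sum(N(:));     % points that histcounts2 counted
numInside = nnz(inside);   % points expected to be counted
```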

Normalization

Type of normalization, specified as one of the values in this table. For each bin i:

• ${v}_{i}$ is the bin value.

• ${c}_{i}$ is the number of elements in the bin.

• ${A}_{i}={w}_{xi}\cdot {w}_{yi}$ is the area of each bin, computed using the x and y bin widths.

• $N$ is the number of elements in the input data. This value can be greater than the binned data if the data contains NaN values, or if some of the data lies outside the bin limits.

Value    Bin Values    Notes
'count' (default)

${v}_{i}={c}_{i}$

• Count or frequency of observations.

• Sum of bin values is less than or equal to numel(X) and numel(Y). The sum is less than numel(X) only when some of the input data is not included in the bins.

'countdensity'

${v}_{i}=\frac{{c}_{i}}{{A}_{i}}$

• Count or frequency scaled by area of bin.

• The sum of the bin volumes, (N value * Area of bin), is less than or equal to numel(X) and numel(Y).

'cumcount'

${v}_{i}=\sum _{j=1}^{i}{c}_{j}$

• Cumulative count. Each bin value is the cumulative number of observations in each bin and all previous bins in both the x and y dimensions.

• N(end,end) is less than or equal to numel(X) and numel(Y).

'probability'

${v}_{i}=\frac{{c}_{i}}{N}$

• Relative probability.

• sum(N(:)) is less than or equal to 1.

'pdf'

${v}_{i}=\frac{{c}_{i}}{N\cdot {A}_{i}}$

• Probability density function estimate.

• The sum of the bin volumes, (N value * Area of bin), is less than or equal to 1.

'cdf'

${v}_{i}=\sum _{j=1}^{i}\frac{{c}_{j}}{N}$

• Cumulative distribution function estimate.

• N(end,end) is less than or equal to 1.

Example: [N,Xedges,Yedges] = histcounts2(X,Y,'Normalization','pdf') bins the data using the probability density function estimate for X and Y.
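The formulas in the table can be reproduced by hand from the raw counts. The sketch below does this for 'pdf' and 'cumcount' with nonuniform, illustrative edges; the cumulative count is assumed to accumulate over the x-dimension and then the y-dimension, as the table notes describe.

```matlab
% Illustrative data and nonuniform edges; all points fall inside the bins.
x = [0.5 1.5 1.5 2.5 3.5 3.5];
y = [0.5 0.5 2.5 2.5 1.0 3.5];
Xedges = [0 1 2 4];
Yedges = [0 2 4];

C = histcounts2(x,y,Xedges,Yedges);          % raw counts, c_i
A = diff(Xedges(:)) * diff(Yedges(:)).';     % bin areas A_i, same size as C

% 'pdf': v_i = c_i / (N * A_i), where N is the number of input elements.
pdfManual = C ./ (numel(x) * A);
pdfBuiltin = histcounts2(x,y,Xedges,Yedges,'Normalization','pdf');

% 'cumcount': accumulate counts down the rows, then across the columns.
cumManual = cumsum(cumsum(C,1),2);
cumBuiltin = histcounts2(x,y,Xedges,Yedges,'Normalization','cumcount');

% Sum of the bin volumes for the pdf estimate.
pdfVolume = sum(pdfBuiltin(:) .* A(:));
```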

Output Arguments

N

Bin counts, returned as a numeric array.

Rows of N correspond to bins in the x-dimension, and columns correspond to bins in the y-dimension. Each bin includes its first (leading) edge in each dimension, and the last bin in each dimension also includes its outer edge. For example, the (1,1) bin includes values that fall on the first edge in each dimension, and the last bin in the bottom right includes values that fall on any of its edges.

Xedges

Bin edges in x-dimension, returned as a vector. Xedges(1) is the first bin edge in the x-dimension and Xedges(end) is the last bin edge.

Yedges

Bin edges in y-dimension, returned as a vector. Yedges(1) is the first bin edge in the y-dimension and Yedges(end) is the last bin edge.

binX

Bin index in x-dimension, returned as a numeric array of the same size as X. Corresponding elements in binX and binY describe which numbered bin contains the corresponding values in X and Y. A value of 0 in binX or binY indicates an element that does not belong to any of the bins (such as a NaN value).

For example, binX(1) and binY(1) describe the bin placement for the value [X(1),Y(1)].

binY

Bin index in y-dimension, returned as a numeric array of the same size as Y. Corresponding elements in binX and binY describe which numbered bin contains the corresponding values in X and Y. A value of 0 in binX or binY indicates an element that does not belong to any of the bins (such as a NaN value).

For example, binX(1) and binY(1) describe the bin placement for the value [X(1),Y(1)].
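Because unbinned elements get an index of 0, they need to be masked out before the index arrays are used as subscripts. The following sketch rebuilds the count matrix from binX and binY with accumarray; the data is illustrative and includes one NaN and one out-of-range value.

```matlab
% Column vectors with one NaN and one x-value outside the edges.
x = [0.5; NaN; 1.5; 9.0; 1.5];
y = [0.5; 0.5; 1.5; 0.5; 1.5];
edges = [0 1 2];

[N,~,~,binX,binY] = histcounts2(x,y,edges,edges);

% Keep only the points that were assigned to a bin in both dimensions.
binned = binX > 0 & binY > 0;

% Rebuild the count matrix from the (row,column) bin subscripts.
Nrebuilt = accumarray([binX(binned) binY(binned)], 1, size(N));
```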