crosstab

Cross-tabulation

Syntax

tbl = crosstab(x1,x2)
tbl = crosstab(x1,...,xn)
[tbl,chi2,p] = crosstab(___)
[tbl,chi2,p,labels] = crosstab(___)

Description

tbl = crosstab(x1,x2) returns a cross-tabulation, tbl, of two vectors of the same length, x1 and x2.

tbl = crosstab(x1,...,xn) returns a multi-dimensional cross-tabulation, tbl, of data for multiple input vectors, x1, x2, ..., xn.

[tbl,chi2,p] = crosstab(___) also returns the chi-square statistic, chi2, and its p-value, p, for a test that tbl is independent in each dimension. You can use any of the previous syntaxes.

[tbl,chi2,p,labels] = crosstab(___) also returns a cell array, labels, which contains one column of labels for each input argument, x1,...,xn.

Examples

Create two sample data vectors, containing three and four distinct values, respectively.

x = [1 1 2 3 1];
y = [1 2 5 3 1];

Cross-tabulate x and y.

table = crosstab(x,y)
table = 3×4

     2     1     0     0
     0     0     0     1
     0     0     1     0

The rows in table correspond to the three distinct values in x, and the columns correspond to the four distinct values in y.

Generate two independent vectors, x1 and x2, each containing 50 discrete uniform random numbers in the range 1:3.

rng default;  % for reproducibility
x1 = unidrnd(3,50,1);
x2 = unidrnd(3,50,1);

Cross-tabulate x1 and x2.

[table,chi2,p] = crosstab(x1,x2)
table = 3×3

     1     6     7
     5     5     2
    11     7     6

chi2 = 7.5449
p = 0.1097

The returned p-value of 0.1097 indicates that, at the 5% significance level, the test fails to reject the null hypothesis that table is independent in each dimension.
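
The reported statistic can be checked by hand. The following is a minimal sketch, assuming the test is Pearson's chi-square test with (r-1)(c-1) degrees of freedom for an r-by-c table. The counts are copied from the output above, and the variable names (counts, expected, chi2Manual, pManual) are illustrative only.

```matlab
% Counts copied from the crosstab output above.
counts = [1 6 7; 5 5 2; 11 7 6];

n = sum(counts(:));            % total number of observations (50)
rowTotals = sum(counts,2);     % 3-by-1 column of row sums
colTotals = sum(counts,1);     % 1-by-3 row of column sums

% Expected counts under independence: (row total * column total) / n.
expected = rowTotals * colTotals / n;

% Pearson chi-square statistic and its upper-tail p-value.
chi2Manual = sum((counts(:) - expected(:)).^2 ./ expected(:));
df = (size(counts,1) - 1) * (size(counts,2) - 1);
pManual = chi2cdf(chi2Manual,df,'upper');
```

Under these assumptions, chi2Manual and pManual agree with the displayed chi2 = 7.5449 and p = 0.1097 to the displayed precision.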

Load the sample data, which contains measurements of large model cars during the years 1970-1982.

load carbig

Cross-tabulate the data of four-cylinder cars (cyl4) based on model year (when) and country of origin (org).

[table,chi2,p,labels] = crosstab(cyl4,when,org);

Use labels to determine the index location in table for the number of four-cylinder cars made in the USA during the late period of the data.

labels
labels = 3x3 cell array
    {'Other'   }    {'Early'}    {'USA'   }
    {'Four'    }    {'Mid'  }    {'Europe'}
    {0x0 double}    {'Late' }    {'Japan' }

The first column of labels corresponds to the data in cyl4, and indicates that row 2 of table contains data on cars with four cylinders. The second column of labels corresponds to the data in when, and indicates that column 3 of table contains data on cars made during the late period. The third column of labels corresponds to the data in org, and indicates that location 1 of the third dimension of table contains data on cars made in the USA.

Therefore, table(2,3,1) contains the number of four-cylinder cars made in the USA during the late period.

table(2,3,1)
ans = 38

The data contains 38 four-cylinder cars made in the USA during the late period.
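
Instead of reading the index positions off labels by eye, you can look them up programmatically. This is a hedged sketch that assumes the label text matches exactly as displayed above ('Four', 'Late', 'USA'). The names iCyl, iWhen, iOrg, and nFourLateUSA are illustrative.

```matlab
load carbig
[table,~,~,labels] = crosstab(cyl4,when,org);

% Find the position of each label within its own column of labels.
% strcmp returns false for the empty padding cell, so it is ignored.
iCyl  = find(strcmp(labels(:,1),'Four'));
iWhen = find(strcmp(labels(:,2),'Late'));
iOrg  = find(strcmp(labels(:,3),'USA'));

% Index into the three-dimensional table with the recovered positions.
nFourLateUSA = table(iCyl,iWhen,iOrg);
```

With the labels shown above, the lookup resolves to table(2,3,1), the same entry indexed manually.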

Load the hospital data.

load hospital

The hospital dataset array contains data on 100 hospital patients, including last name, gender, age, weight, smoking status, and systolic and diastolic blood pressure measurements.

To determine whether smoking status is independent of gender, use crosstab to create a 2-by-2 contingency table of smokers and nonsmokers, grouped by gender.

[tbl,chi2,p,labels] = crosstab(hospital.Sex, hospital.Smoker)
tbl = 2×2

    40    13
    26    21

chi2 = 4.5083
p = 0.0337
labels = 2x2 cell array
    {'Female'}    {'0'}
    {'Male'  }    {'1'}

The rows of the resulting contingency table tbl correspond to the patient’s gender, with row 1 containing data for females and row 2 containing data for males. The columns correspond to the patient’s smoking status, with column 1 containing data for nonsmokers and column 2 containing data for smokers. The returned result chi2 = 4.5083 is the value of the chi-squared test statistic for a Pearson's chi-squared test of independence. The returned value p = 0.0337 is an approximate p-value based on the chi-squared distribution.
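
To see what drives the test result, it can help to convert the counts to within-row proportions. A minimal sketch, with the counts copied from the output above and illustrative variable names (counts, rowProps, smokerRateFemale, smokerRateMale):

```matlab
% Counts copied from the crosstab output above
% (rows: Female, Male; columns: nonsmoker, smoker).
counts = [40 13; 26 21];

% Divide each row by its total to get within-gender proportions.
rowTotals = sum(counts,2);
rowProps = bsxfun(@rdivide,counts,rowTotals);

smokerRateFemale = rowProps(1,2);   % 13/53
smokerRateMale   = rowProps(2,2);   % 21/47
```

In this sample, about 25% of the female patients and about 45% of the male patients are smokers. The test measures this difference against what independence would predict.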

Input Arguments

x1: Input vector, specified as a vector of grouping variables. All input vectors, including x1, x2, ..., xn, must be the same length.

Data Types: single | double | char | string | logical

x2: Input vector, specified as a vector of grouping variables. All input vectors, including x1, x2, ..., xn, must be the same length.

Data Types: single | double | char | string | logical

x1,...,xn: Input vectors, specified as vectors of grouping variables. If you use this syntax to specify more than two input vectors, then crosstab generates a multi-dimensional cross-tabulation table. All input vectors, including x1, x2, ..., xn, must be the same length.

Data Types: single | double | char | string | logical

Output Arguments

tbl: Cross-tabulation table, returned as a matrix of integer values.

If you specify two input vectors, x1 and x2, then tbl is an m-by-n matrix, where m is the number of distinct values in x1 and n is the number of distinct values in x2.

If you specify three or more input vectors, then tbl(i,j,k,...) is a count of indices where grp2idx(x1) is i, grp2idx(x2) is j, grp2idx(x3) is k, and so on.

chi2: Chi-square statistic, returned as a nonnegative scalar value. The null hypothesis is that the proportion in any entry of tbl is the product of the proportions in each dimension.

p: p-value for the chi-square test statistic, returned as a scalar value in the range [0,1]. crosstab tests that tbl is independent in each dimension.

labels: Data labels, returned as a cell array. The entries in the first column are labels for the rows of tbl, the entries in the second column are labels for the columns, and so on, for a multi-dimensional tbl.

Algorithms

crosstab uses grp2idx to assign a positive integer to each distinct value. For two input vectors, tbl(i,j) is a count of indices where grp2idx(x1) is i and grp2idx(x2) is j. The numerical order of the grp2idx(x1) and grp2idx(x2) values determines the order of the rows and columns of tbl, respectively.

For three or more input vectors, tbl(i,j,k,...) is a count of indices where grp2idx(x1) is i, grp2idx(x2) is j, grp2idx(x3) is k, and so on.
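
The counting rule described here can be sketched with grp2idx and accumarray. This illustrates the rule and is not the actual crosstab implementation. It assumes the inputs are column vectors with no missing values, because grp2idx returns NaN for missing entries and accumarray does not accept NaN subscripts. The names gx, gy, counts, and isSame are illustrative.

```matlab
% Same data as the first example, as column vectors.
x = [1 1 2 3 1]';
y = [1 2 5 3 1]';

% Map each distinct value to a positive integer group index.
gx = grp2idx(x);
gy = grp2idx(y);

% Count how often each (gx,gy) index pair occurs.
counts = accumarray([gx gy],1);

% Compare with the table that crosstab returns.
isSame = isequal(counts,crosstab(x,y));
```

For more than two inputs, the same idea extends by adding one column of group indices per input vector to the subscript matrix passed to accumarray.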

Introduced before R2006a