groupcounts
Number of group elements
Syntax
Description
G = groupcounts(T,groupvars) computes the number of elements in each group of data in a table or timetable, and returns a table containing the groups, their counts, and the percentage (0 to 100) each count represents. Each group is defined by a unique combination of grouping variables in groupvars. For example, G = groupcounts(T,'Gender') returns a table showing the number of Male elements, the number of Female elements, and so on for any other categories in the variable Gender.
G = groupcounts(___,Name,Value) specifies additional grouping properties using one or more name-value arguments for any of the previous syntaxes. For example, G = groupcounts(T,'Category1','IncludeMissingGroups',false) excludes the group made from missing categorical data indicated by <undefined>.
B = groupcounts(___,Name,Value) specifies additional grouping properties using one or more name-value arguments.
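As a rough illustration of the IncludeMissingGroups behavior described above, the following sketch counts groups with and without the <undefined> group. The table contents are made up for this example; only the variable name Category1 comes from the text above.

```matlab
% Hypothetical data: an empty character vector becomes an <undefined> element
Category1 = categorical({'red';'blue';'';'red'});
T = table(Category1);

% Default behavior: the <undefined> group is reported with the other groups
Gall = groupcounts(T,'Category1');

% Exclude the group made from missing categorical data
Gnomiss = groupcounts(T,'Category1','IncludeMissingGroups',false);
```

Comparing height(Gall) with height(Gnomiss) shows whether a missing group was present in the data.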
Examples
Group Table Variables
Compute the number of group elements from table data.
Create a table containing information about five individuals.
Gender = ["male";"female";"male";"female";"male"];
Smoker = logical([1;0;1;0;1]);
Weight = [176;163;131;133;119];
T = table(Gender,Smoker,Weight)
T=5×3 table
Gender Smoker Weight
________ ______ ______
"male" true 176
"female" false 163
"male" true 131
"female" false 133
"male" true 119
Count the number of elements in each group by gender.
G1 = groupcounts(T,'Gender')
G1=2×3 table
Gender GroupCount Percent
________ __________ _______
"female" 2 40
"male" 3 60
Count the number of elements in each group by gender and smoker status. By default, groupcounts
suppresses groups with zero elements, so no groups are returned for female smokers or male nonsmokers.
G2 = groupcounts(T,{'Gender','Smoker'})
G2=2×4 table
Gender Smoker GroupCount Percent
________ ______ __________ _______
"female" false 2 40
"male" true 3 60
To count all groups, including those with zero elements, specify the 'IncludeEmptyGroups' parameter with value true.
G3 = groupcounts(T,{'Gender','Smoker'},'IncludeEmptyGroups',true)
G3=4×4 table
Gender Smoker GroupCount Percent
________ ______ __________ _______
"female" false 2 40
"female" true 0 0
"male" false 0 0
"male" true 3 60
Specify Group Bins
Group data according to specified bins.
Create a timetable containing sales information for days within a single month.
TimeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
Profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
TotalItemsSold = [14 13 8 5 10 16 8 6 7 11]';
TT = timetable(TimeStamps,Profit,TotalItemsSold)
TT=10×2 timetable
TimeStamps Profit TotalItemsSold
___________ ______ ______________
04-Mar-2017 2032 14
02-Mar-2017 3071 13
15-Mar-2017 1185 8
10-Mar-2017 2587 5
14-Mar-2017 1998 10
31-Mar-2017 2899 16
25-Mar-2017 3112 8
29-Mar-2017 909 6
21-Mar-2017 2619 7
18-Mar-2017 3085 11
Compute the group counts by the total items sold, binning the groups into intervals of item numbers.
G = groupcounts(TT,'TotalItemsSold',[0 4 8 12 16])
G=3×3 table
disc_TotalItemsSold GroupCount Percent
___________________ __________ _______
[4, 8) 3 30
[8, 12) 4 40
[12, 16] 3 30
Compute the group counts grouped by day of the week.
G = groupcounts(TT,'TimeStamps','dayname')
G=5×3 table
dayname_TimeStamps GroupCount Percent
__________________ __________ _______
Tuesday 2 20
Wednesday 2 20
Thursday 1 10
Friday 2 20
Saturday 3 30
Find Duplicate Array Elements
Determine which elements in a vector appear more than once.
Create a column vector with values between 1 and 5.
v = [1 1 2 2 3 5 3 3 1 4]';
Use groupcounts to determine the unique groups in the vector and count the group members.
[gc,grps] = groupcounts(v)
gc = 5×1
3
2
3
1
1
grps = 5×1
1
2
3
4
5
Determine which elements in the vector appear more than once by creating a logical index for the groups with a count larger than 1. Index into the groups to return the vector elements that are duplicated.
duplicates = grps(gc > 1)
duplicates = 3×1
1
2
3
Multiple Grouping Vectors for Vector Input
Compute the group counts for four groups based on their gender and smoker status.
Store patient information as three vectors of different types.
Gender = ["male";"female";"male";"female";"male"];
Smoker = logical([1;0;1;0;1]);
Weight = [176;163;131;133;119];
Grouping by gender and smoker status, compute the group counts. Specify three outputs to also return the groups BG and percentages BP. The B output contains the counts for each group, and BP contains the percentages represented by those counts.
[B,BG,BP] = groupcounts({Gender,Smoker},'IncludeEmptyGroups',true)
B = 4×1
2
0
0
3
BG=1×2 cell array
{4x1 string} {4x1 logical}
BP = 4×1
40
0
0
60
BG is a cell array containing two vectors that describe the groups as you look at their elements rowwise. For instance, the first row of BG{1} says that the patients in the first group are female, and the first row of BG{2} says that they are nonsmokers. The count for that group is 2, found in the corresponding row of B.
BG{1}
ans = 4x1 string
"female"
"female"
"male"
"male"
BG{2}
ans = 4x1 logical array
0
1
0
1
Input Arguments
T
— Input data
table | timetable
Input data, specified as a table or timetable.
A
— Input vectors
column vector | matrix | cell array of column vectors
Input vectors, specified as a column vector, matrix, or cell array of column vectors representing grouping vectors. When A is a matrix, the grouping vectors are columnwise.
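The columnwise interpretation of a matrix input can be sketched as follows. The matrix values are hypothetical, and the comparison assumes that passing the columns as a cell array of column vectors defines the same groups.

```matlab
% Hypothetical matrix: each column acts as one grouping vector
A = [1 1; 1 2; 1 1; 2 2];

% Groups are the unique combinations found across the columns
[B,BG] = groupcounts(A);

% Equivalent call with the columns passed as a cell array of column vectors
[Bcell,BGcell] = groupcounts({A(:,1),A(:,2)});
```

Here the combination (1,1) appears twice, while (1,2) and (2,2) each appear once.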
groupvars
— Grouping variables or vectors
scalar | vector | matrix | cell array | function handle | table vartype subscript
Grouping variables or vectors, specified as one of the options in this table. For table or timetable input data, groupvars indicates which variables to use to compute groups in the data. Other variables not specified by groupvars are not operated on and do not pass through to the output.
Option | Description | Examples |
---|---|---|
Variable name | A character vector or scalar string specifying a single table variable name | 'Var1', "Var1" |
Vector of variable names | A cell array of character vectors or string array where each element is a table variable name | {'Var1','Var2'}, ["Var1" "Var2"] |
Scalar or vector of variable indices | A scalar or vector of table variable indices | 1, [1 3] |
Logical vector | A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it | [true false true] |
Function handle | A function handle that takes a table variable as input and returns a logical scalar | @isnumeric |
vartype subscript | A table subscript generated by the vartype function | vartype("numeric") |
Example: groupcounts(T,"Var3")
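The options in the table above can be compared side by side. This sketch reuses the five-person table from the Examples section and selects the Smoker variable in five different ways; each call is expected to produce the same output table.

```matlab
Gender = ["male";"female";"male";"female";"male"];
Smoker = logical([1;0;1;0;1]);
Weight = [176;163;131;133;119];
T = table(Gender,Smoker,Weight);

Gname    = groupcounts(T,"Smoker");              % variable name
Gindex   = groupcounts(T,2);                     % variable index
Glogical = groupcounts(T,[false true false]);    % logical vector
Gfun     = groupcounts(T,@islogical);            % function handle
Gtype    = groupcounts(T,vartype("logical"));    % vartype subscript
```

The function handle and vartype forms select every variable that satisfies the condition, so they match the other calls only because Smoker is the single logical variable in this table.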
groupbins
— Binning scheme
'none'
(default) | character vector | scalar | vector | cell array
Binning scheme, specified as one of the following options:
'none', indicating the groups are returned according to the specified grouping variables only
A list of bin edges, specified as a numeric vector for numeric grouping variables, or as a datetime vector for datetime grouping variables
A number of bins, specified as an integer scalar
A time duration, specified as a scalar of type duration or calendarDuration, indicating bin widths (for datetime or duration grouping variables or vectors only)
A cell array listing binning methods for each grouping variable or vector
A time bin for datetime and duration grouping variables or vectors only, specified as one of the following character vectors:
Value | Description | Data Type |
---|---|---|
'second' | Each bin is 1 second. | datetime and duration |
'minute' | Each bin is 1 minute. | datetime and duration |
'hour' | Each bin is 1 hour. | datetime and duration |
'day' | Each bin is 1 calendar day. This value accounts for Daylight Saving Time shifts. | datetime and duration |
'week' | Each bin is 1 calendar week. | datetime only |
'month' | Each bin is 1 calendar month. | datetime only |
'quarter' | Each bin is 1 calendar quarter. | datetime only |
'year' | Each bin is 1 calendar year. This value accounts for leap days. | datetime and duration |
'decade' | Each bin is 1 decade (10 calendar years). | datetime only |
'century' | Each bin is 1 century (100 calendar years). | datetime only |
'secondofminute' | Bins are seconds from 0 to 59. | datetime only |
'minuteofhour' | Bins are minutes from 0 to 59. | datetime only |
'hourofday' | Bins are hours from 0 to 23. | datetime only |
'dayofweek' | Bins are days from 1 to 7. The first day of the week is Sunday. | datetime only |
'dayname' | Bins are full day names such as 'Sunday'. | datetime only |
'dayofmonth' | Bins are days from 1 to 31. | datetime only |
'dayofyear' | Bins are days from 1 to 366. | datetime only |
'weekofmonth' | Bins are weeks from 1 to 6. | datetime only |
'weekofyear' | Bins are weeks from 1 to 54. | datetime only |
'monthname' | Bins are full month names such as 'January'. | datetime only |
'monthofyear' | Bins are months from 1 to 12. | datetime only |
'quarterofyear' | Bins are quarters from 1 to 4. | datetime only |
When multiple grouping variables or vectors are specified, you can provide a single binning method that is applied to all grouping variables, or a cell array containing a binning method for each grouping variable, such as {'none',[0 2 4 Inf]}.
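A minimal sketch of the cell-array form follows, assuming a small table with hypothetical Smoker and Weight values and illustrative bin edges [100 150 200]. Smoker is left unbinned with 'none', while Weight is discretized by the edges.

```matlab
% Hypothetical data reused from the table example
Smoker = logical([1;0;1;0;1]);
Weight = [176;163;131;133;119];
T = table(Smoker,Weight);

% One binning method per grouping variable: none for Smoker, edges for Weight
G = groupcounts(T,{'Smoker','Weight'},{'none',[100 150 200]});
```

Each row of G then corresponds to one combination of smoker status and weight interval that contains at least one element.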
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: G = groupcounts(T,groupvars,groupbins,'IncludedEdge','right')
IncludedEdge
— Included bin edge
'left'
(default) | 'right'
Included bin edge, specified as either 'left' or 'right', indicating which end of the bin interval is inclusive.
This name-value argument can only be specified when groupbins is specified, and the value is applied to all binning schemes for all grouping variables or vectors.
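The effect of the included edge is easiest to see with values that fall exactly on bin edges. The vector and edges below are hypothetical and chosen for that purpose.

```matlab
% Hypothetical values that sit exactly on the bin edges
x = [4;8;8;12];
edges = [0 4 8 12];

% Default 'left': bins are [0,4), [4,8), [8,12]
[Bleft,BGleft] = groupcounts(x,edges);

% 'right': bins are [0,4], (4,8], (8,12]
[Bright,BGright] = groupcounts(x,edges,'IncludedEdge','right');
```

With 'left', the value 4 falls in the second bin and the first bin is empty, so only two groups are returned. With 'right', the value 4 falls in the first bin and three groups are returned.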
IncludeMissingGroups
— Missing groups indicator
true
or 1
(default) | false
or 0
Missing groups indicator, specified as a numeric or logical 1 (true) or 0 (false). If the parameter value is true, then groupcounts displays groups made up of missing values, such as NaN. If the parameter value is false, then groupcounts does not display the missing value groups.
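For numeric data, the missing group is the one made up of NaN values. The following sketch uses a hypothetical vector to show both settings.

```matlab
% Hypothetical vector containing two NaN values
v = [1;NaN;1;2;NaN];

% Default: the NaN values form their own group
[Ball,BGall] = groupcounts(v);

% Drop the group made up of missing values
[B,BG] = groupcounts(v,'IncludeMissingGroups',false);
```

In the default call the NaN group appears as an extra row in the outputs. In the second call only the groups 1 and 2 remain.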
IncludeEmptyGroups
— Empty groups indicator
false
or 0
(default) | true
or 1
Empty groups indicator, specified as a numeric or logical 0 (false) or 1 (true). If the parameter value is false, then groupcounts does not display groups with zero elements. If the parameter value is true, then groupcounts displays the empty groups.
Output Arguments
G
— Output table
table
Output table, returned as a table containing the computed groups, number of elements in each group, and percentages represented by each group count. For a single grouping variable, the output groups are sorted according to the order returned by the unique function with the 'sorted' option.
B
— Group counts
column vector
Group counts for non-table input data, returned as a column vector containing the number of elements in each group.
BG
— Groups
column vector | cell array of column vectors
Groups for non-table input data, returned as a column vector or cell array of column vectors. For a single grouping vector, the output groups are sorted according to the order returned by the unique function with the 'sorted' option.
When you provide more than one input vector, BG is a cell array containing column vectors of equal length. The group information can be found by looking at the elements rowwise across all vectors in BG. The count for each group is contained in the corresponding row of the first output argument B.
BP
— Group count percentages
column vector
Group count percentages for non-table input data, returned as a column vector containing the percentage each group count in B represents. The percentages are in the range [0 100].
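The relationship between B and BP can be checked by hand. This sketch reuses the vector from the duplicate-elements example and assumes, for the default options, that each percentage is the group count divided by the total count, scaled to 100.

```matlab
v = [1 1 2 2 3 5 3 3 1 4]';
[B,BG,BP] = groupcounts(v);

% Assumed relationship for the default options: percentage of the total count
BPmanual = 100*B/sum(B);
```

For this vector the counts are 3, 2, 3, 1, and 1 out of 10 elements, so the percentages sum to 100.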
Tips
When making many calls to groupcounts, consider converting grouping variables to type categorical or logical when possible for improved performance. For example, if you have a grouping variable of type char (such as Gender with elements 'Male' and 'Female'), you can convert it to a categorical variable using the command categorical(Gender).
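The conversion suggested in this tip might look like the following sketch. The table and its values are hypothetical, and no timing is shown because any performance gain depends on the data.

```matlab
% Hypothetical table with a text grouping variable stored as character vectors
Gender = {'Male';'Female';'Male';'Female';'Male'};
T = table(Gender);

% Convert once, before making repeated calls to groupcounts
T.Gender = categorical(T.Gender);

G = groupcounts(T,'Gender');
```

The grouping variable in the output G is also categorical, with one row per category that occurs in the data.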
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations:
The first input argument does not support cell arrays.
The groupvars argument does not support function handles.
The IncludeEmptyGroups name-value argument is not supported.
The order of the groups might be different compared to in-memory groupcounts calculations.
When grouping by discretized datetime arrays, the categorical group names are different compared to in-memory groupcounts calculations.
For more information, see Tall Arrays.
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
All input arguments except for the input data must be constant.
Sparse inputs are not supported.
Binning scheme is not supported for datetime or duration data.
If the number of group variables can change at runtime, the second output BG is a cell array.
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
Version History
Introduced in R2019a
R2021a: Percentages automatically included in table outputs
Behavior changed in R2021a
Starting in R2021a, when groupcounts operates on data in a table or timetable, the output contains an additional table variable for the percentages. The percentages are in the range [0 100] and are included in the table variable Percent.
Any code that references specific table variables is unaffected. However, you might need to update code that depends on the number of variables in the output table.
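As a sketch of the distinction above, the following hypothetical snippet reads the counts by variable name, which works the same before and after the change. The positional alternative is shown only in a comment.

```matlab
% Hypothetical table with a single grouping variable named Key
Key = ["a";"b";"a"];
T = table(Key);
G = groupcounts(T,'Key');

% Robust: reference the counts by variable name
counts = G.GroupCount;

% Fragile: G{:,end} returns GroupCount before R2021a but Percent from R2021a on
hasPercent = ismember("Percent",G.Properties.VariableNames);
```

Checking for the Percent variable by name, as in the last line, is one way to write code that runs on releases both before and after R2021a.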