summary
Summarize cross-validation partition with stratification or grouping variable
Since R2025a
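Syntax
summaryTbl = summary(c)
Description
summaryTbl = summary(c) returns a table that summarizes the validation partition c, which must have been created using a stratification or grouping variable.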
Examples
Create a cvpartition object using a grouping variable. Display a summary of the cross-validation.
Load data on tsunami occurrences, and create a table from the data. Display the first eight observations in the table.
Tbl = readtable("tsunamis.xlsx");
head(Tbl)
Latitude  Longitude  Year  Month  Day  Hour  Minute  Second  ValidityCode  Validity                   CauseCode  Cause               EarthquakeMagnitude  Country              Location                    MaxHeight  IidaMagnitude  Intensity  NumDeaths  DescDeaths
________  _________  ____  _____  ___  ____  ______  ______  ____________  _________________________  _________  __________________  ___________________  ___________________  __________________________  _________  _____________  _________  _________  __________
-3.8      128.3      1950  10     8    3     23      NaN     2             {'questionable tsunami' }  1          {'Earthquake'    }  7.6                  {'INDONESIA'      }  {'JAVA TRENCH, INDONESIA'}  2.8        1.5            1.5        NaN        NaN
19.5      -156       1951  8      21   10    57      NaN     4             {'definite tsunami'     }  1          {'Earthquake'    }  6.9                  {'USA'            }  {'HAWAII'                }  3.6        1.8            NaN        NaN        NaN
-9.02     157.95     1951  12     22   NaN   NaN     NaN     2             {'questionable tsunami' }  6          {'Volcano'       }  NaN                  {'SOLOMON ISLANDS'}  {'KAVACHI'               }  6          2.6            NaN        NaN        NaN
42.15     143.85     1952  3      4    1     22      41      4             {'definite tsunami'     }  1          {'Earthquake'    }  8.1                  {'JAPAN'          }  {'SE. HOKKAIDO ISLAND'   }  6.5        2.7            2          33         1
19.1      -155       1952  3      17   3     58      NaN     4             {'definite tsunami'     }  1          {'Earthquake'    }  4.5                  {'USA'            }  {'HAWAII'                }  1          NaN            NaN        NaN        NaN
43.1      -82.4      1952  5      6    NaN   NaN     NaN     1             {'very doubtful tsunami'}  9          {'Meteorological'}  NaN                  {'USA'            }  {'LAKE HURON, MI'        }  1.52       NaN            NaN        NaN        NaN
52.75     159.5      1952  11     4    16    58      NaN     4             {'definite tsunami'     }  1          {'Earthquake'    }  9                    {'RUSSIA'         }  {'KAMCHATKA'             }  18         4.2            4          2236       3
50        156.5      1953  3      18   NaN   NaN     NaN     3             {'probable tsunami'     }  1          {'Earthquake'    }  5.8                  {'RUSSIA'         }  {'N. KURIL ISLANDS'      }  1.5        0.6            NaN        NaN        NaN
Create a random nonstratified partition for 5-fold cross-validation on the observations in Tbl. Ensure that observations with the same Country value are in the same fold by using the GroupingVariables name-value argument.
rng(0,"twister") % For reproducibility
c = cvpartition(size(Tbl,1),KFold=5, ...
    GroupingVariables=Tbl.Country)
c = 
Group k-fold cross validation partition
   NumObservations: 162
       NumTestSets: 5
         TrainSize: [126 130 130 131 131]
          TestSize: [36 32 32 31 31]
          IsCustom: 0
         IsGrouped: 1
      IsStratified: 0
c is a cvpartition object. The IsGrouped property value is 1 (true), indicating that at least one grouping variable was used to create the object.
Display a summary of the cvpartition object c.
summaryTbl = summary(c)
summaryTbl=150×5 table
Set SetSize GroupLabel GroupCount PercentInSet
________ _______ ___________________ __________ ____________
"train1" 126 {'INDONESIA' } 25 19.841
"train1" 126 {'USA' } 15 11.905
"train1" 126 {'SOLOMON ISLANDS'} 10 7.9365
"train1" 126 {'JAPAN' } 19 15.079
"train1" 126 {'RUSSIA' } 19 15.079
"train1" 126 {'FIJI' } 1 0.79365
"train1" 126 {'GREENLAND' } 1 0.79365
"train1" 126 {'CHILE' } 6 4.7619
"train1" 126 {'GREECE' } 5 3.9683
"train1" 126 {'ECUADOR' } 1 0.79365
"train1" 126 {'VANUATU' } 5 3.9683
"train1" 126 {'TONGA' } 1 0.79365
"train1" 126 {'PHILIPPINES' } 7 5.5556
"train1" 126 {'CANADA' } 1 0.79365
"train1" 126 {'ATLANTIC OCEAN' } 1 0.79365
"train1" 126 {'FRANCE' } 1 0.79365
⋮
The first row in summaryTbl shows that 25 of the 126 observations in the first training set Tbl(training(c,1),:) (approximately 20%) have the Country value INDONESIA. The software ensures that the first test set Tbl(test(c,1),:) does not contain any observations with this value.
Check the Country values for the observations in the first test set.
summaryTest1 = summaryTbl(summaryTbl.Set=="test1",:)
summaryTest1=6×5 table
Set SetSize GroupLabel GroupCount PercentInSet
_______ _______ ____________________ __________ ____________
"test1" 36 {'PAPUA NEW GUINEA'} 13 36.111
"test1" 36 {'MEXICO' } 8 22.222
"test1" 36 {'PERU' } 9 25
"test1" 36 {'JAPAN SEA' } 1 2.7778
"test1" 36 {'MONTSERRAT' } 4 11.111
"test1" 36 {'TURKEY' } 1 2.7778
As expected, the first test set does not contain any observations with the Country value INDONESIA.
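The same separation can also be checked directly on the data, without reading the summary table row by row. The following is a minimal sketch rather than part of the original example: it repeats the partition setup from above and uses the training and test functions to compare the Country values in the first training and test sets. The variable names trainCountries, testCountries, and overlap are chosen here for illustration only.

```matlab
% Recreate the grouped partition from the example above
Tbl = readtable("tsunamis.xlsx");
rng(0,"twister") % For reproducibility
c = cvpartition(size(Tbl,1),KFold=5, ...
    GroupingVariables=Tbl.Country);

% Country values that occur in the first training and test sets
trainCountries = unique(Tbl.Country(training(c,1)));
testCountries = unique(Tbl.Country(test(c,1)));

% Countries present in both sets (illustrative variable name).
% Because the partition is grouped by Country, no overlap is expected.
overlap = intersect(trainCountries,testCountries);
noSharedCountries = isempty(overlap)
```

If the grouping works as described, noSharedCountries is true for every fold, not only the first. Replacing the fold index 1 with a loop over 1:c.NumTestSets extends the check to all folds.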
Create a cvpartition object using a stratification variable. Display a summary of the cross-validation, and then modify the summary display.
Load the fisheriris data set. The matrix meas contains flower measurements for 150 different flowers. The variable species lists the species for each flower.
load fisheriris
Create a random stratified partition for 3-fold cross-validation. Use the species variable as the stratification variable.
rng(0,"twister") % For reproducibility
c = cvpartition(species,KFold=3)
c = 
K-fold cross validation partition
   NumObservations: 150
       NumTestSets: 3
         TrainSize: [100 100 100]
          TestSize: [50 50 50]
          IsCustom: 0
         IsGrouped: 0
      IsStratified: 1
c is a cvpartition object. The IsStratified property value is 1 (true), indicating that a stratification variable was used to create the object.
Display a summary of the cvpartition object c.
summaryTbl = summary(c)
summaryTbl=21×5 table
Set SetSize StratificationLabel StratificationCount PercentInSet
________ _______ ___________________ ___________________ ____________
"all" 150 {'setosa' } 50 33.333
"all" 150 {'versicolor'} 50 33.333
"all" 150 {'virginica' } 50 33.333
"train1" 100 {'setosa' } 34 34
"train1" 100 {'versicolor'} 33 33
"train1" 100 {'virginica' } 33 33
"test1" 50 {'setosa' } 16 32
"test1" 50 {'versicolor'} 17 34
"test1" 50 {'virginica' } 17 34
"train2" 100 {'setosa' } 33 33
"train2" 100 {'versicolor'} 33 33
"train2" 100 {'virginica' } 34 34
"test2" 50 {'setosa' } 17 34
"test2" 50 {'versicolor'} 17 34
"test2" 50 {'virginica' } 16 32
"train3" 100 {'setosa' } 33 33
⋮
The first row in summaryTbl shows that 50 of the 150 flowers in the data set (approximately 33%) are setosa flowers.
Modify the summary display to include test set information only.
testSummaryTbl = summaryTbl(contains(summaryTbl.Set,"test"),:)
testSummaryTbl=9×5 table
Set SetSize StratificationLabel StratificationCount PercentInSet
_______ _______ ___________________ ___________________ ____________
"test1" 50 {'setosa' } 16 32
"test1" 50 {'versicolor'} 17 34
"test1" 50 {'virginica' } 17 34
"test2" 50 {'setosa' } 17 34
"test2" 50 {'versicolor'} 17 34
"test2" 50 {'virginica' } 16 32
"test3" 50 {'setosa' } 17 34
"test3" 50 {'versicolor'} 16 32
"test3" 50 {'virginica' } 17 34
The first row in testSummaryTbl shows that 16 of the 50 flowers in the first test set (32%) are setosa flowers.
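The displayed values are consistent with PercentInSet being the count expressed as a percentage of the set size (for example, 16 of 50 is 32%). The sketch below illustrates that arithmetic on a small hand-built table that copies the "test1" rows shown above; the table T is an assumption made for illustration and is not produced by summary.

```matlab
% Hand-built copy of the "test1" rows from the summary output above
Set = ["test1";"test1";"test1"];
SetSize = [50;50;50];
StratificationLabel = {'setosa';'versicolor';'virginica'};
StratificationCount = [16;17;17];
T = table(Set,SetSize,StratificationLabel,StratificationCount);

% Percentage of each set that carries the given label
T.PercentInSet = 100*T.StratificationCount./T.SetSize
% PercentInSet is [32; 34; 34], matching the values displayed above
```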
Modify summaryTbl to include setosa information only.
setosaSummaryTbl = summaryTbl(summaryTbl.StratificationLabel=="setosa",:)
setosaSummaryTbl=7×5 table
Set SetSize StratificationLabel StratificationCount PercentInSet
________ _______ ___________________ ___________________ ____________
"all" 150 {'setosa'} 50 33.333
"train1" 100 {'setosa'} 34 34
"test1" 50 {'setosa'} 16 32
"train2" 100 {'setosa'} 33 33
"test2" 50 {'setosa'} 17 34
"train3" 100 {'setosa'} 33 33
"test3" 50 {'setosa'} 17 34
The second row in setosaSummaryTbl shows that 34 of the 100 flowers in the first training set are setosa flowers.
Display summary information with a separate column for each of the three flower species.
speciesSummaryTbl = unstack(summaryTbl(:,1:4), ...
    "StratificationCount","StratificationLabel")
speciesSummaryTbl=7×5 table
Set SetSize setosa versicolor virginica
________ _______ ______ __________ _________
"all" 150 50 50 50
"train1" 100 34 33 33
"test1" 50 16 17 17
"train2" 100 33 33 34
"test2" 50 17 17 16
"train3" 100 33 34 33
"test3" 50 17 16 17
The second row in speciesSummaryTbl shows that of the 100 flowers in the first training set, 34 are setosa flowers, 33 are versicolor flowers, and 33 are virginica flowers.
Input Arguments
Validation partition, specified as a cvpartition object. The validation partition type of c, c.Type, must be 'kfold' or 'holdout'. The IsGrouped or IsStratified property of c must be 1 (true).
summary does not support validation partitions created using tall arrays.
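These requirements can be checked on the partition object before calling summary. The following is a minimal sketch under stated assumptions, not a documented workflow: the partition is deliberately created without a stratification or grouping variable so that the check fails, and the variable names hasSupportedType, hasLabels, and canSummarize are hypothetical.

```matlab
% Nonstratified, nongrouped 3-fold partition of 150 observations
c = cvpartition(150,KFold=3);

% summary requires a k-fold or holdout partition ...
hasSupportedType = any(strcmp(c.Type,{'kfold','holdout'}));
% ... that was created with a grouping or stratification variable
hasLabels = c.IsGrouped || c.IsStratified;

canSummarize = hasSupportedType && hasLabels; % hypothetical helper variable
if canSummarize
    summaryTbl = summary(c);
else
    disp("summary(c) is not supported for this partition.")
end
```

Here the partition type is supported, but neither IsGrouped nor IsStratified is true, so the sketch skips the call to summary.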
Output Arguments
Summary table describing the validation partition c, returned as a table.
- The first column, Set, describes the specific data set for which information is displayed. Possible values include "all" (the full data set), "train1" (the first training set), "test1" (the first test set), and so on.
- The second column, SetSize, describes the size of each data set listed in Set.
- The remaining columns depend on the properties of c.
  - If c.IsStratified is 1 (true), then the remaining columns are StratificationLabel, StratificationCount, and PercentInSet.
    - StratificationLabel describes the label of interest in the stratification variable.
    - StratificationCount describes the number of observations in the data set Set with the label StratificationLabel.
    - PercentInSet describes the percentage of observations in the data set Set with the label StratificationLabel.
  - If c.IsGrouped is 1 (true), then the number of remaining columns varies based on the number of grouping variables.
    - For two or more grouping variables, GroupLabel1 describes the label in the first grouping variable, GroupLabel2 describes the label in the second grouping variable, and so on. GroupCount describes the number of observations in the data set Set with the combination of labels in GroupLabel1, GroupLabel2, and so on. PercentInSet is the percentage of observations in the data set Set with the combination of labels in GroupLabel1, GroupLabel2, and so on.
    - For one grouping variable, the columns are similar, with only one GroupLabel column.
Version History
Introduced in R2025a
See Also
cvpartition | test | training