
summary

Summarize cross-validation partition with stratification or grouping variable

Since R2025a

    Description

    Tbl = summary(c) returns a summary table Tbl of the validation partition contained in the cvpartition object c. The validation partition must include a stratification or grouping variable. Because the summary is an ordinary table, you can filter or reshape it to show only the information you need.


    Examples


    Create a cvpartition object using a grouping variable. Display a summary of the cross-validation.

    Load data on tsunami occurrences, and create a table from the data. Display the first eight observations in the table.

    Tbl = readtable("tsunamis.xlsx");
    head(Tbl)
        Latitude    Longitude    Year    Month    Day    Hour    Minute    Second    ValidityCode            Validity             CauseCode          Cause           EarthquakeMagnitude          Country                   Location             MaxHeight    IidaMagnitude    Intensity    NumDeaths    DescDeaths
        ________    _________    ____    _____    ___    ____    ______    ______    ____________    _________________________    _________    __________________    ___________________    ___________________    __________________________    _________    _____________    _________    _________    __________
    
          -3.8        128.3      1950     10       8       3       23       NaN           2          {'questionable tsunami' }        1        {'Earthquake'    }            7.6            {'INDONESIA'      }    {'JAVA TRENCH, INDONESIA'}       2.8            1.5            1.5          NaN          NaN    
          19.5         -156      1951      8      21      10       57       NaN           4          {'definite tsunami'     }        1        {'Earthquake'    }            6.9            {'USA'            }    {'HAWAII'                }       3.6            1.8            NaN          NaN          NaN    
         -9.02       157.95      1951     12      22     NaN      NaN       NaN           2          {'questionable tsunami' }        6        {'Volcano'       }            NaN            {'SOLOMON ISLANDS'}    {'KAVACHI'               }         6            2.6            NaN          NaN          NaN    
         42.15       143.85      1952      3       4       1       22        41           4          {'definite tsunami'     }        1        {'Earthquake'    }            8.1            {'JAPAN'          }    {'SE. HOKKAIDO ISLAND'   }       6.5            2.7              2           33            1    
          19.1         -155      1952      3      17       3       58       NaN           4          {'definite tsunami'     }        1        {'Earthquake'    }            4.5            {'USA'            }    {'HAWAII'                }         1            NaN            NaN          NaN          NaN    
          43.1        -82.4      1952      5       6     NaN      NaN       NaN           1          {'very doubtful tsunami'}        9        {'Meteorological'}            NaN            {'USA'            }    {'LAKE HURON, MI'        }      1.52            NaN            NaN          NaN          NaN    
         52.75        159.5      1952     11       4      16       58       NaN           4          {'definite tsunami'     }        1        {'Earthquake'    }              9            {'RUSSIA'         }    {'KAMCHATKA'             }        18            4.2              4         2236            3    
            50        156.5      1953      3      18     NaN      NaN       NaN           3          {'probable tsunami'     }        1        {'Earthquake'    }            5.8            {'RUSSIA'         }    {'N. KURIL ISLANDS'      }       1.5            0.6            NaN          NaN          NaN    
    

    Create a random nonstratified partition for 5-fold cross-validation on the observations in Tbl. Ensure that observations with the same Country value are in the same fold by using the GroupingVariables name-value argument.

    rng(0,"twister") % For reproducibility
    c = cvpartition(size(Tbl,1),KFold=5, ...
        GroupingVariables=Tbl.Country)
    c = 
    Group k-fold cross validation partition
        NumObservations: 162
            NumTestSets: 5
              TrainSize: [126 130 130 131 131]
               TestSize: [36 32 32 31 31]
               IsCustom: 0
              IsGrouped: 1
           IsStratified: 0
    
    
      Properties, Methods
    
    

    c is a cvpartition object. The IsGrouped property value is 1 (true), indicating that at least one grouping variable was used to create the object.

    Display a summary of the cvpartition object c.

    summaryTbl = summary(c)
    summaryTbl=150×5 table
          Set       SetSize        GroupLabel         GroupCount    PercentInSet
        ________    _______    ___________________    __________    ____________
    
        "train1"      126      {'INDONESIA'      }        25           19.841   
        "train1"      126      {'USA'            }        15           11.905   
        "train1"      126      {'SOLOMON ISLANDS'}        10           7.9365   
        "train1"      126      {'JAPAN'          }        19           15.079   
        "train1"      126      {'RUSSIA'         }        19           15.079   
        "train1"      126      {'FIJI'           }         1          0.79365   
        "train1"      126      {'GREENLAND'      }         1          0.79365   
        "train1"      126      {'CHILE'          }         6           4.7619   
        "train1"      126      {'GREECE'         }         5           3.9683   
        "train1"      126      {'ECUADOR'        }         1          0.79365   
        "train1"      126      {'VANUATU'        }         5           3.9683   
        "train1"      126      {'TONGA'          }         1          0.79365   
        "train1"      126      {'PHILIPPINES'    }         7           5.5556   
        "train1"      126      {'CANADA'         }         1          0.79365   
        "train1"      126      {'ATLANTIC OCEAN' }         1          0.79365   
        "train1"      126      {'FRANCE'         }         1          0.79365   
          ⋮
    
    

    The first row in summaryTbl shows that 25 of the 126 observations in the first training set Tbl(training(c,1),:) (approximately 20%) have the Country value INDONESIA. The software ensures that the first test set Tbl(test(c,1),:) does not contain any observations with this value.

    Check the Country values for the observations in the first test set.

    summaryTest1 = summaryTbl(summaryTbl.Set=="test1",:)
    summaryTest1=6×5 table
          Set      SetSize         GroupLabel         GroupCount    PercentInSet
        _______    _______    ____________________    __________    ____________
    
        "test1"      36       {'PAPUA NEW GUINEA'}        13           36.111   
        "test1"      36       {'MEXICO'          }         8           22.222   
        "test1"      36       {'PERU'            }         9               25   
        "test1"      36       {'JAPAN SEA'       }         1           2.7778   
        "test1"      36       {'MONTSERRAT'      }         4           11.111   
        "test1"      36       {'TURKEY'          }         1           2.7778   
    
    

    As expected, the first test set does not contain any observations with the Country value INDONESIA.
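The same check can be made programmatically rather than by reading the table. The following is a minimal sketch, assuming Tbl, c, and summaryTbl from the preceding steps are still in the workspace; the variable names sharedLabels, sharedCountries, and noLeakage are illustrative only. The GroupCount > 0 filter is a precaution in case the summary lists labels with zero observations.

```matlab
% Labels that appear with a nonzero count in the first training set
% and in the first test set, according to the summary table.
trainRows = summaryTbl.Set=="train1" & summaryTbl.GroupCount > 0;
testRows  = summaryTbl.Set=="test1"  & summaryTbl.GroupCount > 0;
sharedLabels = intersect(summaryTbl.GroupLabel(trainRows), ...
    summaryTbl.GroupLabel(testRows));

% Cross-check against the raw data by using the training and test indices.
sharedCountries = intersect(Tbl.Country(training(c,1)), ...
    Tbl.Country(test(c,1)));

% true if no Country value appears on both sides of the first fold
noLeakage = isempty(sharedLabels) && isempty(sharedCountries)
```

If the grouping behaves as described above, both intersections should be empty. To check the other folds, replace "train1", "test1", and the fold index 1 accordingly.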

    Create a cvpartition object using a stratification variable. Display a summary of the cross-validation, and then modify the summary display.

    Load the fisheriris data set. The matrix meas contains flower measurements for 150 different flowers. The variable species lists the species for each flower.

    load fisheriris

    Create a random stratified partition for 3-fold cross-validation. Use the species variable as the stratification variable.

    rng(0,"twister") % For reproducibility
    c = cvpartition(species,KFold=3)
    c = 
    K-fold cross validation partition
        NumObservations: 150
            NumTestSets: 3
              TrainSize: [100 100 100]
               TestSize: [50 50 50]
               IsCustom: 0
              IsGrouped: 0
           IsStratified: 1
    
    
      Properties, Methods
    
    

    c is a cvpartition object. The IsStratified property value is 1 (true), indicating that a stratification variable was used to create the object.

    Display a summary of the cvpartition object c.

    summaryTbl = summary(c)
    summaryTbl=21×5 table
          Set       SetSize    StratificationLabel    StratificationCount    PercentInSet
        ________    _______    ___________________    ___________________    ____________
    
        "all"         150        {'setosa'    }               50                33.333   
        "all"         150        {'versicolor'}               50                33.333   
        "all"         150        {'virginica' }               50                33.333   
        "train1"      100        {'setosa'    }               34                    34   
        "train1"      100        {'versicolor'}               33                    33   
        "train1"      100        {'virginica' }               33                    33   
        "test1"        50        {'setosa'    }               16                    32   
        "test1"        50        {'versicolor'}               17                    34   
        "test1"        50        {'virginica' }               17                    34   
        "train2"      100        {'setosa'    }               33                    33   
        "train2"      100        {'versicolor'}               33                    33   
        "train2"      100        {'virginica' }               34                    34   
        "test2"        50        {'setosa'    }               17                    34   
        "test2"        50        {'versicolor'}               17                    34   
        "test2"        50        {'virginica' }               16                    32   
        "train3"      100        {'setosa'    }               33                    33   
          ⋮
    
    

    The first row in summaryTbl shows that 50 of the 150 flowers in the data set (approximately 33%) are setosa flowers.

    Modify the summary display to include test set information only.

    testSummaryTbl = summaryTbl(contains(summaryTbl.Set,"test"),:)
    testSummaryTbl=9×5 table
          Set      SetSize    StratificationLabel    StratificationCount    PercentInSet
        _______    _______    ___________________    ___________________    ____________
    
        "test1"      50         {'setosa'    }               16                  32     
        "test1"      50         {'versicolor'}               17                  34     
        "test1"      50         {'virginica' }               17                  34     
        "test2"      50         {'setosa'    }               17                  34     
        "test2"      50         {'versicolor'}               17                  34     
        "test2"      50         {'virginica' }               16                  32     
        "test3"      50         {'setosa'    }               17                  34     
        "test3"      50         {'versicolor'}               16                  32     
        "test3"      50         {'virginica' }               17                  34     
    
    

    The first row in testSummaryTbl shows that 16 of the 50 flowers in the first test set (approximately 32%) are setosa flowers.

    Modify summaryTbl to include setosa information only.

    setosaSummaryTbl = summaryTbl(summaryTbl.StratificationLabel=="setosa",:)
    setosaSummaryTbl=7×5 table
          Set       SetSize    StratificationLabel    StratificationCount    PercentInSet
        ________    _______    ___________________    ___________________    ____________
    
        "all"         150          {'setosa'}                 50                33.333   
        "train1"      100          {'setosa'}                 34                    34   
        "test1"        50          {'setosa'}                 16                    32   
        "train2"      100          {'setosa'}                 33                    33   
        "test2"        50          {'setosa'}                 17                    34   
        "train3"      100          {'setosa'}                 33                    33   
        "test3"        50          {'setosa'}                 17                    34   
    
    

    The second row in setosaSummaryTbl shows that 34 of the 100 flowers in the first training set are setosa flowers.

    Display summary information with a separate column for each of the three flower species.

    speciesSummaryTbl = unstack(summaryTbl(:,1:4), ...
        "StratificationCount","StratificationLabel")
    speciesSummaryTbl=7×5 table
          Set       SetSize    setosa    versicolor    virginica
        ________    _______    ______    __________    _________
    
        "all"         150        50          50           50    
        "train1"      100        34          33           33    
        "test1"        50        16          17           17    
        "train2"      100        33          33           34    
        "test2"        50        17          17           16    
        "train3"      100        33          34           33    
        "test3"        50        17          16           17    
    
    

    The second row in speciesSummaryTbl shows that of the 100 flowers in the first training set, 34 are setosa flowers, 33 are versicolor flowers, and 33 are virginica flowers.
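The same reshaping can be applied to the percentages instead of the counts. This is a sketch that assumes summaryTbl from the stratified example is in the workspace; percentSummaryTbl is an illustrative name. The columns are selected by name so that StratificationCount is left out before unstacking.

```matlab
% Keep the set identifiers, the label, and the percentage column only,
% then spread PercentInSet into one column per species.
percentSummaryTbl = unstack( ...
    summaryTbl(:,["Set","SetSize","StratificationLabel","PercentInSet"]), ...
    "PercentInSet","StratificationLabel")
```

Each row of the result then shows how closely the class proportions in one training or test set match the proportions in the full data set.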

    Input Arguments


    Validation partition, specified as a cvpartition object. The validation partition type of c, c.Type, must be 'kfold' or 'holdout'. At least one of the IsGrouped and IsStratified properties of c must be 1 (true).

    summary does not support validation partitions created using tall arrays.
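These requirements can be tested before calling summary. The sketch below is one possible guard, not part of the documented interface: it builds one partition that meets the requirements and one that does not, and calls summary only where the stated conditions hold. The names cStrat, cPlain, and canSummarize are hypothetical.

```matlab
load fisheriris

% Stratified holdout partition: cvpartition stratifies by default when
% the first argument is a grouping vector such as species.
cStrat = cvpartition(species,Holdout=0.2);

% Plain k-fold partition on the observation count: neither stratified
% nor grouped.
cPlain = cvpartition(numel(species),KFold=5);

partitions = {cStrat,cPlain};
for k = 1:numel(partitions)
    c = partitions{k};
    % canSummarize mirrors the two conditions stated for the input c.
    canSummarize = ismember(c.Type,["kfold","holdout"]) && ...
        (c.IsStratified || c.IsGrouped);
    if canSummarize
        disp(summary(c))
    else
        fprintf("Skipping %s partition: no stratification or grouping variable.\n", ...
            c.Type)
    end
end
```

The guard does not cover partitions created from tall arrays, which summary also does not support.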

    Output Arguments


    Summary table describing the validation partition c, returned as a table.

    • The first column Set describes the specific data set for which information is displayed. Possible values include "all" (the full data set), "train1" (the first training set), "test1" (the first test set), and so on.

    • The second column SetSize describes the size of each data set listed in Set.

    • The remaining columns depend on the properties of c.

      • If c.IsStratified is 1 (true), then the remaining columns are StratificationLabel, StratificationCount, and PercentInSet. StratificationLabel describes the label of interest in the stratification variable. StratificationCount describes the number of observations in the data set Set with the label StratificationLabel. PercentInSet describes the percentage of observations in the data set Set with the label StratificationLabel.

      • If c.IsGrouped is 1 (true), then the number of remaining columns varies based on the number of grouping variables.

        For two or more grouping variables, GroupLabel1 describes the label in the first grouping variable, GroupLabel2 describes the label in the second grouping variable, and so on. GroupCount describes the number of observations in the data set Set with the combination of labels in GroupLabel1, GroupLabel2, and so on. PercentInSet is the percentage of observations in the data set Set with the combination of labels in GroupLabel1, GroupLabel2, and so on.

        For one grouping variable, the remaining columns are GroupLabel, GroupCount, and PercentInSet, with the same meanings applied to the single grouping variable.
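The relationship between the count, size, and percentage columns can be checked directly. The sketch below assumes that PercentInSet equals 100 times the count divided by SetSize, which is consistent with the values displayed in the examples (for instance, 50 of 150 gives 33.333) but is an inference rather than a documented formula. The names recomputedPercent and maxDifference are illustrative.

```matlab
load fisheriris
rng(0,"twister") % For reproducibility
c = cvpartition(species,KFold=3);
summaryTbl = summary(c);

% Recompute the percentage from the count and set size columns.
recomputedPercent = 100*summaryTbl.StratificationCount./summaryTbl.SetSize;

% Largest absolute disagreement with the reported percentages.
maxDifference = max(abs(recomputedPercent - summaryTbl.PercentInSet))
```

Under that assumption, maxDifference should be zero or within floating-point round-off. For a grouped partition, the same check applies with GroupCount in place of StratificationCount.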

    Version History

    Introduced in R2025a