
crossvalidate

Cross-validate pipeline

Since R2026a

    Description

    cvloss = crossvalidate(pipeline,input1,...,inputN) returns the five-fold cross-validation loss for the pipeline pipeline.

    [cvloss,output1,...,outputN] = crossvalidate(pipeline,input1,...,inputN) returns additional outputs corresponding to the pipeline outputs in the Outputs property. You can see the identity and order of the outputs using pipeline.Outputs.

    [cvloss,output1,...,outputN,foldPipelines] = crossvalidate(pipeline,input1,...,inputN,ReturnFoldPipelines=returnFoldPipelines) also returns the pipelines learned during k-fold cross-validation.

    [___] = crossvalidate(___,Name=Value) returns any of the output combinations in the previous syntaxes with additional options specified by one or more name-value arguments. For example, Holdout=0.2 specifies to perform holdout validation with a test set fraction of 0.2.


    Examples


    Create a pipeline with four components to impute missing observations, normalize data, perform principal component analysis, and perform ECOC classification.

    impute = observationImputerComponent;
    normalize = normalizerComponent;
    pca = pcaComponent(NumComponents=3);
    ecoc = classificationECOCComponent;
    pipe = series(impute,normalize,pca,ecoc);

    Load the carbig data set. Store the acceleration, displacement, and horsepower data as predictor data in the table X. Update the response variable Origin to categorize the cars based on whether they were made in the USA, and store this variable in the table Y.

    load carbig
    X = table(Acceleration,Displacement,Horsepower);
    Origin = categorical(cellstr(Origin));
    Origin = mergecats(Origin,["France","Japan","Germany", ...
        "Sweden","Italy","England"],"NotUSA");
    Y = table(Origin);

    Cross-validate the pipeline using ten-fold cross-validation.

    rng("default")
    cvLoss = crossvalidate(pipe,X,Y,KFold=10)
    cvLoss =
    
       0.1232
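
    As a variation on this example, the fold pipelines and the pipeline outputs can be requested as well. This is a hedged sketch: it assumes the pipeline produces a single output (the predicted labels); inspect pipe.Outputs to confirm the actual identity and order of outputs before relying on the output argument list.

    ```matlab
    % Cross-validate again, also returning the pipeline output and the
    % ten pipelines learned during cross-validation. The single output
    % argument "labels" is an assumption; check pipe.Outputs to see how
    % many outputs this pipeline actually produces.
    rng("default")
    [cvLoss,labels,foldPipes] = crossvalidate(pipe,X,Y,KFold=10, ...
        ReturnFoldPipelines=true);
    class(foldPipes)   % foldPipes is a cell array of LearningPipeline objects
    ```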

    Input Arguments


    pipeline

    Pipeline to cross-validate, specified as a LearningPipeline object. pipeline must contain one of the following supervised learning components.

    Classification Model Components

    Component                               Purpose
    classificationDiscriminantComponent     Discriminant analysis classification
    classificationECOCComponent             Multiclass classification using error-correcting output codes (ECOC) model
    classificationEnsembleComponent         Ensemble classification
    classificationGAMComponent              Binary classification using generalized additive model (GAM)
    classificationKernelComponent           Classification using Gaussian kernel with random feature expansion
    classificationKNNComponent              Classification using k-nearest neighbor model
    classificationLinearComponent           Binary classification of high-dimensional data using a linear model
    classificationNaiveBayesComponent       Multiclass classification using a naive Bayes model
    classificationNeuralNetworkComponent    Classification using a neural network model
    classificationSVMComponent              One-class and binary classification using a support vector machine (SVM) classifier
    classificationTreeComponent             Decision tree classifier

    Regression Model Components

    Component                          Purpose
    regressionEnsembleComponent        Ensemble regression
    regressionGAMComponent             Regression using generalized additive model (GAM)
    regressionGPComponent              Gaussian process regression
    regressionKernelComponent          Kernel regression using explicit feature expansion
    regressionLinearComponent          Linear regression
    regressionNeuralNetworkComponent   Neural network regression
    regressionSVMComponent             Regression using a support vector machine (SVM)
    regressionTreeComponent            Decision tree regression

    input1,...,inputN

    Input data required by pipeline, specified as a table. Input data can be predictor data, response values, observation weights, and so on. The order of the inputs 1, …, N must match the order of the pipeline inputs, as listed in the Inputs property. You can see the identity and the order of the inputs using pipeline.Inputs.

    Data Types: table

    Name-Value Arguments


    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: crossvalidate(pipe,X,Y,KFold=10) specifies to cross-validate the pipeline using ten-fold cross-validation.

    KFold

    Number of folds for k-fold cross-validation, specified as a positive integer scalar. The default is 5.

    When you specify a KFold value of k, crossvalidate randomly partitions the data into k sets. For each set, the function reserves that set as test data, and learns the pipeline using the other k-1 sets. crossvalidate runs the k pipelines with the corresponding test sets, and uses the results to compute cvloss.

    You can specify only one of these three name-value arguments: Holdout, KFold, and Partition.

    Example: KFold=10

    Data Types: single | double

    Partition

    Cross-validation partition, specified as a cvpartition object. The partition object specifies the type of cross-validation and the indexing for the training and test sets.

    You cannot specify both Partition and Stratify. Instead, directly specify a stratified partition when you create the partition object.

    You can specify only one of these three name-value arguments: Holdout, KFold, and Partition.

    Example: Partition=cvpartition(size(X,1),Holdout=0.2)
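
    A hedged sketch of passing a partition, reusing the pipe, X, and Y variables from the earlier example. The stratified variant builds the partition from the class labels, since Partition cannot be combined with Stratify:

    ```matlab
    % Build a 5-fold partition over the observations and pass it in.
    n = height(X);                  % number of observations
    cvp = cvpartition(n,KFold=5);   % unstratified 5-fold partition
    % cvp = cvpartition(Y.Origin,KFold=5);   % stratified alternative
    cvLoss = crossvalidate(pipe,X,Y,Partition=cvp);
    ```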

    Holdout

    Fraction of data for holdout validation, specified as a scalar value in the range (0,1).

    crossvalidate randomly selects and reserves the proportion of observations specified by Holdout as test data, then learns the pipeline using the remaining data. Finally, the function uses the test data along with the learned pipeline to compute cvloss.

    You can specify only one of these three name-value arguments: Holdout, KFold, and Partition.

    Example: Holdout=0.1

    Data Types: single | double

    LossFun

    Loss function, specified as a function handle or one of the values in this table.

    Value                   Description
    "binodeviance"          Binomial deviance
    "classifcost"           Observed misclassification cost
    "classiferror"          Misclassified rate in decimal
    "crossentropy"          Cross-entropy loss
    "epsiloninsensitive"    Epsilon-insensitive loss
    "exponential"           Exponential loss
    "hinge"                 Hinge loss
    "logit"                 Logistic loss
    "mincost"               Minimal expected misclassification cost (for classification scores that are posterior probabilities)
    "mse"                   Mean squared error
    "quadratic"             Quadratic loss

    To specify a custom loss function, use function handle notation.

    If pipeline contains a classification component and you specify a value for the Prior property of that component, crossvalidate normalizes the observation weights used to compute loss so that they sum to the corresponding prior class probability. Otherwise, crossvalidate does not normalize observation weights.

    LossFun must be a value accepted by the LossFun property of the supervised learning component in pipeline. By default, crossvalidate uses the loss function specified in the LossFun property of the supervised learning component in pipeline.

    Example: LossFun="classiferror"

    Data Types: char | string | function_handle
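
    A hedged sketch of a custom loss passed as a function handle. The signature below follows the common convention for classification model loss functions, where C is an n-by-K class indicator matrix, S is the n-by-K score matrix, W holds the observation weights, and Cost is the misclassification cost matrix; verify the signature your pipeline's supervised learning component expects before using it.

    ```matlab
    % Weighted misclassification rate as a custom loss (sketch only).
    % Assumes a unique maximum score per row; ties would inflate the loss.
    customLoss = @(C,S,W,Cost) ...
        sum(W .* ~all(C == (S == max(S,[],2)),2)) / sum(W);
    cvLoss = crossvalidate(pipe,X,Y,KFold=10,LossFun=customLoss);
    ```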

    Stratify

    Indicator for stratification, specified as 1 (true) or 0 (false).

    Stratification is supported only when pipeline contains a classification learning component. When Stratify is true, each cross-validation set maintains the same proportion of classes as the original data set.

    The default value is true if pipeline contains a classification component, and false if pipeline contains a regression component.

    Example: Stratify=false

    Data Types: logical

    StratificationInput

    Input tag of the data used for stratification, specified as a positive numeric scalar. crossvalidate uses the data specified by StratificationInput to divide the data into a stratified partition.

    Stratification is supported only when pipeline contains a classification learning component.

    Example: StratificationInput=3

    Data Types: single | double

    ReturnFoldPipelines

    Indicator to return the pipelines learned during k-fold cross-validation, specified as 0 (false) or 1 (true).

    If ReturnFoldPipelines is true, crossvalidate returns the learned pipelines as foldPipelines.

    Example: ReturnFoldPipelines=true

    Data Types: logical

    Output Arguments


    cvloss

    Cross-validation loss, returned as a numeric scalar. The function determines cvloss by computing the aggregate loss from all test data in the partition.

    For k-fold cross-validation, crossvalidate combines the test data from each fold to compute cvloss. For holdout cross-validation, crossvalidate uses the test set to compute cvloss.

    output1,...,outputN

    Output data computed by pipeline based on the input data, returned as separate variables.

    foldPipelines

    Pipelines learned during k-fold cross-validation, returned as a cell array of LearningPipeline objects.

    Version History

    Introduced in R2026a