
crossvalidate

Cross-validate pipeline

Since R2026a

    Description

    cvloss = crossvalidate(pipeline,input1,...,inputN) returns the five-fold cross-validation loss for the pipeline pipeline.

    [cvloss,output1,...,outputN] = crossvalidate(pipeline,input1,...,inputN) returns additional outputs corresponding to the pipeline outputs in the Outputs property. You can see the identity and order of the outputs using pipeline.Outputs.

    [cvloss,output1,...,outputN,foldPipelines] = crossvalidate(pipeline,input1,...,inputN,ReturnFoldPipelines=returnFoldPipelines) also returns the pipelines learned during k-fold cross-validation.

    [___] = crossvalidate(___,Name=Value) returns any of the output combinations in the previous syntaxes with additional options specified by one or more name-value arguments. For example, Holdout=0.2 specifies to perform holdout validation with a test set fraction of 0.2.


    Examples


    Create a pipeline with four components to impute missing observations, normalize data, perform principal component analysis, and perform ECOC classification.

    impute = observationImputerComponent;
    normalize = normalizerComponent;
    pca = pcaComponent(NumComponents=3);
    ecoc = classificationECOCComponent;
    pipe = series(impute,normalize,pca,ecoc);

    Load the carbig data set. Store the acceleration, displacement, and horsepower data as predictor data in the table X. Update the response variable Origin to categorize the cars based on whether they were made in the USA, and store this variable in the table Y.

    load carbig
    X = table(Acceleration,Displacement,Horsepower);
    Origin = categorical(cellstr(Origin));
    Origin = mergecats(Origin,["France","Japan","Germany", ...
        "Sweden","Italy","England"],"NotUSA");
    Y = table(Origin);

    Cross-validate the pipeline using ten-fold cross-validation.

    rng("default")
    cvLoss = crossvalidate(pipe,X,Y,KFold=10)
    cvLoss =
    
       0.1232
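
    As a variation on this example, the fold pipelines and the pipeline outputs can be requested as well. This is a hedged sketch: it assumes the pipeline produces a single output (the predicted labels); inspect pipe.Outputs to confirm the actual identity and order of outputs before relying on the output argument list.

    ```matlab
    % Cross-validate again, also returning the pipeline output and the
    % ten pipelines learned during cross-validation. The single output
    % argument "labels" is an assumption; check pipe.Outputs to see how
    % many outputs this pipeline actually produces.
    rng("default")
    [cvLoss,labels,foldPipes] = crossvalidate(pipe,X,Y,KFold=10, ...
        ReturnFoldPipelines=true);
    class(foldPipes)   % foldPipes is a cell array of LearningPipeline objects
    ```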

    Input Arguments


    pipeline

    Pipeline to cross-validate, specified as a LearningPipeline object. pipeline must contain one of the following supervised learning components.

    Classification Model Components

    Component                               Purpose
    classificationDiscriminantComponent     Discriminant analysis classification
    classificationECOCComponent             Multiclass classification using error-correcting output codes (ECOC) model
    classificationEnsembleComponent         Ensemble classification
    classificationGAMComponent              Binary classification using generalized additive model (GAM)
    classificationKernelComponent           Classification using Gaussian kernel with random feature expansion
    classificationKNNComponent              Classification using k-nearest neighbor model
    classificationLinearComponent           Binary classification of high-dimensional data using a linear model
    classificationNaiveBayesComponent       Multiclass classification using a naive Bayes model
    classificationNeuralNetworkComponent    Classification using a neural network model
    classificationSVMComponent              One-class and binary classification using a support vector machine (SVM) classifier
    classificationTreeComponent             Decision tree classifier

    Regression Model Components

    Component                          Purpose
    regressionEnsembleComponent        Ensemble regression
    regressionGAMComponent             Regression using generalized additive model (GAM)
    regressionGPComponent              Gaussian process regression
    regressionKernelComponent          Kernel regression using explicit feature expansion
    regressionLinearComponent          Linear regression
    regressionNeuralNetworkComponent   Neural network regression
    regressionSVMComponent             Regression using a support vector machine (SVM)
    regressionTreeComponent            Decision tree regression

    input1,...,inputN

    Input data required by pipeline, specified as a table. Input data can be predictor data, response values, observation weights, and so on. The order of the inputs 1, …, N must match the order of the pipeline inputs, as listed in the Inputs property. You can see the identity and the order of the inputs using pipeline.Inputs.

    Data Types: table

    Name-Value Arguments


    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: crossvalidate(pipe,X,Y,KFold=10) specifies to cross-validate the pipeline using ten-fold cross-validation.

    KFold

    Number of folds for k-fold cross-validation, specified as a positive integer scalar. The default is 5.

    When you specify a KFold value of k, crossvalidate randomly partitions the data into k sets. For each set, the function reserves that set as test data, and learns the pipeline using the other k-1 sets. crossvalidate runs the k pipelines with the corresponding test sets, and uses the results to compute cvloss.

    You can specify only one of these three name-value arguments: Holdout, KFold, and Partition.

    Example: KFold=10

    Data Types: single | double

    Partition

    Cross-validation partition, specified as a cvpartition object. The partition object specifies the type of cross-validation and the indexing for the training and test sets.

    You cannot specify both Partition and Stratify. Instead, directly specify a stratified partition when you create the partition object.

    You can specify only one of these three name-value arguments: Holdout, KFold, and Partition.

    Example: Partition=cvpartition(size(X,1),Holdout=0.2)
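
    A hedged sketch of passing a partition, reusing the pipe, X, and Y variables from the earlier example. The stratified variant builds the partition from the class labels, since Partition cannot be combined with Stratify:

    ```matlab
    % Build a 5-fold partition over the observations and pass it in.
    n = height(X);                  % number of observations
    cvp = cvpartition(n,KFold=5);   % unstratified 5-fold partition
    % cvp = cvpartition(Y.Origin,KFold=5);   % stratified alternative
    cvLoss = crossvalidate(pipe,X,Y,Partition=cvp);
    ```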

    Holdout

    Fraction of data for holdout validation, specified as a scalar value in the range (0,1).

    crossvalidate randomly selects and reserves the proportion of observations specified by Holdout as test data, then learns the pipeline using the remaining data. Finally, the function uses the test data along with the learned pipeline to compute cvloss.

    You can specify only one of these three name-value arguments: Holdout, KFold, and Partition.

    Example: Holdout=0.1

    Data Types: single | double

    LossFun

    Loss function, specified as a function handle or one of the values in this table.

    Value                   Description
    "binodeviance"          Binomial deviance
    "classifcost"           Observed misclassification cost
    "classiferror"          Misclassified rate in decimal
    "crossentropy"          Cross-entropy loss
    "epsiloninsensitive"    Epsilon-insensitive loss
    "exponential"           Exponential loss
    "hinge"                 Hinge loss
    "logit"                 Logistic loss
    "mincost"               Minimal expected misclassification cost (for classification scores that are posterior probabilities)
    "mse"                   Mean squared error
    "quadratic"             Quadratic loss

    To specify a custom loss function, use function handle notation.

    If pipeline contains a classification component and you specify a value for the Prior property of that component, crossvalidate normalizes the observation weights used to compute loss so that they sum to the corresponding prior class probability. Otherwise, crossvalidate does not normalize observation weights.

    LossFun must be a value accepted by the LossFun property of the supervised learning component in pipeline. By default, crossvalidate uses the loss function specified in the LossFun property of the supervised learning component in pipeline.

    Example: LossFun="classiferror"

    Data Types: char | string | function_handle
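
    A hedged sketch of a custom loss passed as a function handle. The signature below follows the common convention for classification model loss functions, where C is an n-by-K class indicator matrix, S is the n-by-K score matrix, W holds the observation weights, and Cost is the misclassification cost matrix; verify the signature your pipeline's supervised learning component expects before using it.

    ```matlab
    % Weighted misclassification rate as a custom loss (sketch only).
    % Assumes a unique maximum score per row; ties would inflate the loss.
    customLoss = @(C,S,W,Cost) ...
        sum(W .* ~all(C == (S == max(S,[],2)),2)) / sum(W);
    cvLoss = crossvalidate(pipe,X,Y,KFold=10,LossFun=customLoss);
    ```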

    Stratify

    Indicator for stratification, specified as 1 (true) or 0 (false).

    Stratification is supported only when pipeline contains a classification learning component. When Stratify is true, each cross-validation set maintains the same proportion of classes as the original data set.

    The default value is true if pipeline contains a classification component, and false if pipeline contains a regression component.

    Example: Stratify=false

    Data Types: logical

    StratificationInput

    Input tag of the data used for stratification, specified as a positive numeric scalar. crossvalidate uses the data specified by StratificationInput to divide the data into a stratified partition.

    Stratification is supported only when pipeline contains a classification learning component.

    Example: StratificationInput=3

    Data Types: single | double

    ReturnFoldPipelines

    Indicator to return the pipelines learned during k-fold cross-validation, specified as 0 (false) or 1 (true).

    If ReturnFoldPipelines is true, crossvalidate returns the learned pipelines as foldPipelines.

    Example: ReturnFoldPipelines=true

    Data Types: logical

    Output Arguments


    cvloss

    Cross-validation loss, returned as a numeric scalar. The function determines cvloss by computing the aggregate loss from all test data in the partition.

    For k-fold cross-validation, crossvalidate combines the test data from each fold to compute cvloss. For holdout cross-validation, crossvalidate uses the test set to compute cvloss.

    output1,...,outputN

    Output data computed by pipeline based on the input data, returned as separate variables.

    foldPipelines

    Pipelines learned during k-fold cross-validation, returned as a cell array of LearningPipeline objects.

    Version History

    Introduced in R2026a