
ClassificationPartitionedEnsemble

Cross-validated classification ensemble

Description

ClassificationPartitionedEnsemble is a set of classification ensembles trained on cross-validated folds. Estimate the quality of classification by cross-validation using one or more “kfold” methods: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.

Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose you cross-validate using five folds. In this case, every training fold contains roughly 4/5 of the data and every test fold contains roughly 1/5 of the data. The first model stored in Trained{1} was trained on X and Y with the first 1/5 excluded, the second model stored in Trained{2} was trained on X and Y with the second 1/5 excluded, and so on. When you call kfoldPredict, it computes predictions for the first 1/5 of the data using the first model, for the second 1/5 of the data using the second model, and so on. In short, kfoldPredict computes the response for every observation using the model trained without that observation.
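
For illustration, the following sketch (not part of the original page; it assumes the fisheriris sample data and five folds) obtains out-of-fold predictions with kfoldPredict:

load fisheriris
t = templateTree('MaxNumSplits',1);     % Weak learner template tree object
cvens = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t,'KFold',5);
labels = kfoldPredict(cvens);           % Each observation is predicted by the model trained without it
accuracy = mean(strcmp(labels,species)) % Rough out-of-fold accuracy estimate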

Creation

Description


cvens = crossval(ens) creates a cross-validated ensemble from ens, a classification ensemble. For syntax details, see the crossval method reference page.

cvens = fitcensemble(X,Y,Name,Value) creates a cross-validated ensemble when Name is one of 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. For syntax details, see the fitcensemble function reference page.
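
For example, either of the following creates a ten-fold cross-validated ensemble (a minimal sketch; ens, X, Y, and the name-value choices are placeholders for your own data and settings):

cvens = crossval(ens,'KFold',10);                     % From an existing classification ensemble
cvens = fitcensemble(X,Y,'Method','Bag','KFold',10);  % Directly at training time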

Properties


BinEdges

This property is read-only.

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
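
As a sketch of how BinEdges gets populated (the fisheriris data and the 'NumBins' value of 50 are arbitrary choices for illustration):

load fisheriris
cvens = fitcensemble(meas,species,'Method','Bag','NumBins',50,'KFold',5);
edges = cvens.BinEdges    % One vector of bin edges per numeric predictor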

CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

ClassNames

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order.

Data Types: categorical | char | logical | single | double | cell

Cost

Cost of classifying a point into class j when its true class is i, returned as a square matrix. The rows of Cost correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

Data Types: double
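
For example (illustrative, assuming a cross-validated ensemble cvens), you can inspect the class order and the cost matrix together:

cvens.ClassNames   % Row and column order of Cost follows this class order
cvens.Cost         % By default, 0 on the diagonal and 1 off the diagonal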

CrossValidatedModel

Name of the cross-validated model, returned as a character vector.

Data Types: char

KFold

Number of folds in the cross-validated ensemble, returned as a positive integer.

Data Types: double

ModelParameters

Parameters of the cross-validated ensemble, returned as an object.

NumObservations

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

NumTrainedPerFold

Number of weak learners used in training each fold of the ensemble, returned as a positive integer.

Data Types: double

Partition

Partition used in cross-validation, returned as a cvpartition object.

PredictorNames

Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns in X.

Data Types: cell

Prior

Prior probabilities for each class, returned as an m-element vector, where m is the number of unique classes in the response. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

Data Types: double

ResponseName

Response variable name, specified as a character vector.

Data Types: char

ScoreTransform

Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores.

To change the score transformation function to function, for example, use dot notation.

  • For a built-in function, enter a character vector.

    Mdl.ScoreTransform = 'function';

    This table describes the available built-in functions.

    Value                   Description
    'doublelogit'           1/(1 + e^(–2x))
    'invlogit'              log(x / (1 – x))
    'ismax'                 Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
    'logit'                 1/(1 + e^(–x))
    'none' or 'identity'    x (no transformation)
    'sign'                  –1 for x < 0; 0 for x = 0; 1 for x > 0
    'symmetric'             2x – 1
    'symmetricismax'        Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
    'symmetriclogit'        2/(1 + e^(–x)) – 1

  • For a MATLAB® function or a function that you define, enter its function handle.

    Mdl.ScoreTransform = @function;

    function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores); see the sketch after this property description.

Data Types: char | string | function_handle
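
A minimal sketch of both forms (the anonymous function is only an illustration of a valid handle):

cvens.ScoreTransform = 'logit';                % Built-in transformation
cvens.ScoreTransform = @(x) 1./(1 + exp(-x));  % Custom handle: takes a score matrix, returns one of the same size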

Trainable

Full ensembles trained on cross-validation folds, returned as a cell array. Every ensemble is full, meaning it contains its training data and weights.

Data Types: cell

Trained

Compact ensembles trained on cross-validation folds, returned as a cell array.

Data Types: cell
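
For example (a sketch; fold 1 is an arbitrary choice), you can apply the first fold's compact ensemble to the observations that fold held out:

testIdx = test(cvens.Partition,1);                          % Logical index of fold 1's held-out rows
foldLabels = predict(cvens.Trained{1},cvens.X(testIdx,:));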

W

This property is read-only.

Scaled weights in the model, returned as a numeric vector. W has length n, the number of rows in the training data.

Data Types: double

X

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

Y

This property is read-only.

Row classifications corresponding to the rows of X, returned as a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. Each row of Y represents the classification of the corresponding row of X.

Data Types: single | double | logical | char | string | cell | categorical

Object Functions

gather          Gather properties of Statistics and Machine Learning Toolbox object from GPU
kfoldEdge       Classification edge for cross-validated classification model
kfoldLoss       Classification loss for cross-validated classification model
kfoldMargin     Classification margins for cross-validated classification model
kfoldPredict    Classify observations in cross-validated classification model
kfoldfun        Cross-validate function for classification
resume          Resume training of cross-validated classification ensemble model

Examples


Evaluate the k-fold cross-validation error for a classification ensemble that models the Fisher iris data.

Load the sample data set.

load fisheriris

Train an ensemble of 100 boosted classification trees using AdaBoostM2.

t = templateTree('MaxNumSplits',1); % Weak learner template tree object
ens = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t);

Create a cross-validated ensemble from ens and find the k-fold cross-validation error.

rng(10,'twister') % For reproducibility
cvens = crossval(ens);
L = kfoldLoss(cvens)
L = 0.0533
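
As a further check (not part of the original example), you can compute the loss of each fold individually rather than the average:

perFoldLoss = kfoldLoss(cvens,'Mode','individual')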

Extended Capabilities

Version History

Introduced in R2011a
