
ClassificationBaggedEnsemble

Classification ensemble grown by resampling

Description

ClassificationBaggedEnsemble combines a set of trained weak learner models and data on which these learners were trained. It can predict ensemble response for new data by aggregating predictions from its weak learners.

Creation

Create a bagged classification ensemble object (ens) using fitcensemble. Set the name-value argument 'Method' of fitcensemble to 'Bag' to use bootstrap aggregation (bagging), for example, to grow a random forest.

For a description of bagged classification ensembles, see Bootstrap Aggregation (Bagging) and Random Forest.

Properties


This property is read-only.

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end
Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
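As a minimal sketch of how BinEdges gets populated, the following trains a bagged ensemble with the 'NumBins' name-value argument on the ionosphere data set used in the Examples section. The value of 50 bins and the variable names (mdlBinned, numEdgesPerPredictor) are illustrative assumptions, not recommendations.

```matlab
load ionosphere                        % X is a numeric matrix, Y is a cell array of labels
rng('default')                         % For reproducibility
t = templateTree('Reproducible',true);
% Bin every numeric predictor into at most 50 bins (illustrative value).
mdlBinned = fitcensemble(X,Y,'Method','Bag','Learners',t,'NumBins',50);
edges = mdlBinned.BinEdges;            % Cell array with one entry per predictor
numEdgesPerPredictor = cellfun(@numel,edges);  % Number of bin edges stored for each predictor
```

Without 'NumBins', the same call leaves BinEdges empty, so code that reproduces Xbinned should check isempty(mdl.BinEdges) first.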

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

This property is read-only.

List of the elements in Y with duplicates removed, returned as a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

Data Types: double | logical | char | cell | categorical

This property is read-only.

How the ensemble combines weak learner weights (the CombineWeights property), returned as either 'WeightedAverage' or 'WeightedSum'.

Data Types: char

Cost of classifying a point into class j when its true class is i, returned as a square matrix. The rows of Cost correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

Data Types: double
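The row and column convention of Cost can be checked with a short sketch. It assumes the ionosphere data set from the Examples section and a model trained without a 'Cost' argument, in which case the default cost is 0 for a correct classification and 1 for a misclassification. The choice of 20 learning cycles only keeps the example fast.

```matlab
load ionosphere
rng('default')                          % For reproducibility
Mdl = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',20);
classNames = Mdl.ClassNames;            % Defines the order of the rows and columns of Cost
C = Mdl.Cost;                           % Rows: true class, columns: predicted class
% Look up the cost of predicting 'g' when the true class is 'b'.
iTrue = find(strcmp(classNames,'b'));
jPred = find(strcmp(classNames,'g'));
costBG = C(iTrue,jPred);
```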

This property is read-only.

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

Fit information, returned as a numeric array. The FitInfoDescription property describes the content of this array.

Data Types: double

Description of the information in FitInfo, returned as a character vector or cell array of character vectors.

Data Types: char | cell

Fraction of the training data that fitcensemble resamples at random for every weak learner when constructing the ensemble (the FResample property), returned as a numeric scalar between 0 and 1.

Data Types: double
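A hedged sketch of how the resampling fraction shows up in a trained model: the settings below ('FResample' of 0.5, sampling without replacement, 20 learning cycles) are illustrative assumptions chosen so that the effect is easy to see in UseObsForLearner.

```matlab
load ionosphere
rng('default')                          % For reproducibility
% Draw half of the observations for each tree, without replacement.
Mdl = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',20, ...
    'FResample',0.5,'Replace','off');
fraction = Mdl.FResample;               % Resampling fraction stored in the model
withReplacement = Mdl.Replace;          % false, because 'Replace' is 'off'
% Number of distinct training observations that each weak learner saw.
nUsedPerLearner = sum(Mdl.UseObsForLearner,1);
```

Because sampling is without replacement here, each entry of nUsedPerLearner should be close to half the number of training observations.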

This property is read-only.

Description of the cross-validation optimization of hyperparameters (the HyperparameterOptimizationResults property), returned as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the 'OptimizeHyperparameters' name-value argument is nonempty at creation. The value depends on the Optimizer setting of the 'HyperparameterOptimizationOptions' name-value argument at creation:

  • 'bayesopt' (default) — Object of class BayesianOptimization

  • 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

This property is read-only.

Names of weak learners in ensemble, returned as a cell array of character vectors. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, LearnerNames is {'Tree'}.

Data Types: cell

Method that fitcensemble uses to create the ensemble (the Method property), returned as a character vector. For a bagged ensemble, the value is 'Bag'.

Example: 'Bag'

Data Types: char

Parameters used in training the ensemble, returned as an EnsembleParams object. The properties of ModelParameters include the type of ensemble, either 'classification' or 'regression', the Method used to create the ensemble, and other parameters, depending on the ensemble.

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

This property is read-only.

Number of trained weak learners in the ensemble (the NumTrained property), returned as a positive integer.

Data Types: double

This property is read-only.

Predictor names, specified as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.

Data Types: cell

Prior probabilities for each class, returned as an m-element vector, where m is the number of unique classes in the response. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

Data Types: double
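The ordering of Prior relative to ClassNames can be illustrated with a small sketch. It assumes the model is trained without a 'Prior' argument, so that the prior probabilities are the empirical class frequencies in Y. The variable names are illustrative.

```matlab
load ionosphere
rng('default')                          % For reproducibility
Mdl = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',20);
% Empirical class frequencies, computed in the order given by ClassNames.
classFreq = cellfun(@(c) mean(strcmp(Y,c)), Mdl.ClassNames);
priorFromModel = Mdl.Prior;             % One element per class, summing to 1
maxDifference = max(abs(classFreq(:) - priorFromModel(:)));
```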

Reason that fitcensemble stopped adding weak learners to the ensemble (the ReasonForTermination property), returned as a character vector.

Example: 'Terminated normally after completing the requested number of training cycles.'

Data Types: char

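Flag indicating whether fitcensemble sampled the training data with replacement when training the weak learners (the Replace property), returned as true (with replacement) or false (without replacement).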

Data Types: logical

This property is read-only.

Name of the response variable (the ResponseName property), returned as a character vector.

Data Types: char

This property is read-only.

Rows of the original predictor data X used for fitting, returned as an n-element logical vector, where n is the number of rows of X. If the software uses all rows of X for constructing the object, then RowsUsed is an empty array ([]).

Data Types: logical

Function for transforming scores, specified as a function handle or the name of a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree.

Add or change a ScoreTransform function using dot notation:

ens.ScoreTransform = 'function'   % Name of a built-in transformation function
% or
ens.ScoreTransform = @function    % Handle to a custom transformation function

Data Types: char | string | function_handle
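As a rough illustration of the dot-notation assignment, the sketch below attaches a custom transformation to a trained ensemble and compares scores before and after. The transformation @(s) 2*s - 1 is a hypothetical example that rescales scores from [0,1] to [-1,1]; it is not a built-in function.

```matlab
load ionosphere
rng('default')                           % For reproducibility
ens = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',20);
[~,rawScore] = predict(ens,X(1:5,:));    % Scores with the default ScoreTransform, 'none'
% Hypothetical custom transformation applied to the score matrix.
ens.ScoreTransform = @(s) 2*s - 1;
[~,newScore] = predict(ens,X(1:5,:));    % Scores after the transformation
```

The predicted labels are computed from the untransformed scores together with the cost matrix, so changing ScoreTransform is expected to affect only the returned scores.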

Trained weak learners (the Trained property), returned as a cell vector. Each entry of the cell vector contains the corresponding compact classification model.

Data Types: cell

This property is read-only.

Trained weights for the weak learners in the ensemble, returned as a numeric vector. TrainedWeights has T elements, where T is the number of weak learners in the ensemble. The ensemble computes the predicted response by aggregating weighted predictions from its learners.

Data Types: double

Indicator that observation was used to train learner, returned as a logical matrix of size N-by-NumTrained, where N is the number of rows of training data and NumTrained is the number of trained weak learners. UseObsForLearner(I,J) is true if observation I was used for training learner J, and is false otherwise.

Data Types: logical
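UseObsForLearner makes it possible to see which observations are in-bag or out-of-bag for each learner. The following is a minimal sketch under default bagging settings (sampling with replacement, resampling fraction of 1); the variable names are illustrative.

```matlab
load ionosphere
rng('default')                           % For reproducibility
Mdl = fitcensemble(X,Y,'Method','Bag');  % 100 trees by default
inBag = Mdl.UseObsForLearner;            % N-by-NumTrained logical matrix
% With a bootstrap sample of size N, the share of distinct observations
% seen by a learner is expected to be near 1 - exp(-1), about 0.63.
inBagFraction = mean(inBag(:));
% Number of learners for which each observation is out-of-bag.
oobCount = sum(~inBag,2);
```

Observations with a small oobCount contribute few trees to out-of-bag estimates such as oobLoss and oobPredict.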

This property is read-only.

Scaled weights in the ensemble, returned as a numeric vector. W has length n, the number of rows in the training data. The sum of the elements of W is 1.

Data Types: double

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

This property is read-only.

Row classifications corresponding to the rows of X, returned as a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. Each row of Y represents the classification of the corresponding row of X.

Data Types: single | double | logical | char | string | cell | categorical

Object Functions

compact                         Reduce size of classification ensemble model
compareHoldout                  Compare accuracies of two classification models using new data
crossval                        Cross-validate machine learning model
edge                            Classification edge for classification ensemble model
gather                          Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime                            Local interpretable model-agnostic explanations (LIME)
loss                            Classification loss for classification ensemble model
margin                          Classification margins for classification ensemble model
oobEdge                         Out-of-bag classification edge for bagged classification ensemble model
oobLoss                         Out-of-bag classification loss for bagged classification ensemble model
oobMargin                       Out-of-bag classification margins of bagged classification ensemble
oobPermutedPredictorImportance  Out-of-bag predictor importance estimates for random forest of classification trees by permutation
oobPredict                      Predict out-of-bag labels and scores of bagged classification ensemble
partialDependence               Compute partial dependence
plotPartialDependence           Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                         Predict labels using classification ensemble model
predictorImportance             Estimates of predictor importance for classification ensemble of decision trees
removeLearners                  Remove members of compact classification ensemble
resubEdge                       Resubstitution classification edge for classification ensemble model
resubLoss                       Resubstitution classification loss for classification ensemble model
resubMargin                     Resubstitution classification margins for classification ensemble model
resubPredict                    Classify observations in classification ensemble by resubstitution
resume                          Resume training of classification ensemble model
shapley                         Shapley values
testckfold                      Compare accuracies of two classification models by repeated cross-validation
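A brief sketch of how a few of these object functions fit together, assuming the ionosphere data set from the Examples section. Keeping the first 50 trees is an arbitrary choice for illustration.

```matlab
load ionosphere
rng('default')                                    % For reproducibility
Mdl = fitcensemble(X,Y,'Method','Bag');           % 100 trees by default
cMdl = compact(Mdl);                              % Compact model without the training data
cMdl = removeLearners(cMdl,51:cMdl.NumTrained);   % Keep only the first 50 trees
labels = predict(cMdl,X(1:5,:));                  % Predicted labels for five observations
L = loss(cMdl,X,Y);                               % Loss on the supplied data (here, the training data)
```

The compact model no longer stores X and Y, so functions that rely on the training data, such as resubLoss and oobLoss, apply to the full model Mdl rather than to cMdl.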

Examples


Load the ionosphere data set.

load ionosphere

You can train a bagged ensemble of 100 classification trees using all measurements.

Mdl = fitcensemble(X,Y,'Method','Bag')

fitcensemble uses a default template tree object templateTree() as a weak learner when 'Method' is 'Bag'. In this example, for reproducibility, specify 'Reproducible',true when you create a tree template object, and then use the object as a weak learner.

rng('default') % For reproducibility
t = templateTree('Reproducible',true); % For reproducibility of random predictor selections
Mdl = fitcensemble(X,Y,'Method','Bag','Learners',t)
Mdl = 
  ClassificationBaggedEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
               NumTrained: 100
                   Method: 'Bag'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: []
       FitInfoDescription: 'None'
                FResample: 1
                  Replace: 1
         UseObsForLearner: [351x100 logical]


Mdl is a ClassificationBaggedEnsemble model object.

Mdl.Trained is the property that stores a 100-by-1 cell vector of the trained classification trees (CompactClassificationTree model objects) that compose the ensemble.

Plot a graph of the first trained classification tree.

view(Mdl.Trained{1},'Mode','graph')

By default, fitcensemble grows deep decision trees for bagged ensembles.

Estimate the in-sample misclassification rate.

L = resubLoss(Mdl)
L = 0

L is 0, which indicates that Mdl is perfect at classifying the training data.
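A resubstitution loss of 0 is an optimistic estimate, because each tree is evaluated on data that it may have been trained on. For a bagged ensemble, the out-of-bag loss is a common complement. The sketch below retrains the same model so that it is self-contained; the variable names resubL and oobL are illustrative.

```matlab
load ionosphere
rng('default')                           % For reproducibility
t = templateTree('Reproducible',true);
Mdl = fitcensemble(X,Y,'Method','Bag','Learners',t);
resubL = resubLoss(Mdl);                 % In-sample misclassification rate
% Each observation is scored only by trees that did not see it in training.
oobL = oobLoss(Mdl);                     % Out-of-bag misclassification rate
```

The out-of-bag estimate is typically larger than the resubstitution loss and is usually closer to the error expected on new data.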

Tips

For a bagged ensemble of classification trees, the Trained property of ens stores a cell vector of ens.NumTrained CompactClassificationTree model objects. For a textual or graphical display of tree t in the cell vector, enter

view(ens.Trained{t})

Alternative Functionality

Bootstrap Aggregation Methods

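For classification or regression, you can choose between two approaches for bagging: a bagged ensemble created by fitcensemble (classification) or fitrensemble (regression) with 'Method' set to 'Bag', or a bag of decision trees created by TreeBagger.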

For help choosing between these approaches, see Ensemble Algorithms and Suggestions for Choosing an Appropriate Ensemble Algorithm.


Version History

Introduced in R2011a
