Train Naive Bayes Classifiers Using Classification Learner App

This example shows how to create and compare different naive Bayes classifiers using the Classification Learner app, and export trained models to the workspace to make predictions for new data.

Naive Bayes classifiers leverage Bayes' theorem and assume that the predictors are conditionally independent of one another within each class. In practice, however, the classifiers often work well even when that independence assumption does not hold. You can use naive Bayes with two or more classes in Classification Learner. The app allows you to train a Gaussian naive Bayes model or a kernel naive Bayes model, either individually or simultaneously.

This table lists the available naive Bayes models in Classification Learner and the probability distributions used by each model to fit predictors.

Model                 | Numerical Predictor                                  | Categorical Predictor
Gaussian naive Bayes  | Gaussian distribution (or normal distribution)       | multivariate multinomial distribution
Kernel naive Bayes    | Kernel distribution. You can specify the kernel type and support. Classification Learner automatically determines the kernel width using the underlying fitcnb function. | multivariate multinomial distribution
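As a point of reference, the two model types in the table correspond to different DistributionNames settings of the underlying fitcnb function. The following is a minimal command-line sketch (it assumes the Statistics and Machine Learning Toolbox and the fisheriris.csv file used later in this example):

```matlab
% Load the example data into a table
fishertable = readtable('fisheriris.csv');

% Gaussian naive Bayes: numeric predictors are modeled with normal
% distributions (the fitcnb default for numeric variables)
gaussMdl = fitcnb(fishertable,'Species');

% Kernel naive Bayes: numeric predictors are modeled with kernel
% densities; fitcnb determines the kernel width automatically
kernelMdl = fitcnb(fishertable,'Species','DistributionNames','kernel');
```

In both cases, fitcnb models any categorical predictors with the multivariate multinomial (mvmn) distribution by default.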

This example uses Fisher's iris data set, which contains measurements of flowers (petal length, petal width, sepal length, and sepal width) for specimens from three species. Train naive Bayes classifiers to predict the species based on the predictor measurements.

  1. In the MATLAB® Command Window, load the Fisher iris data set and create a table of measurement predictors (or features) using variables from the data set.

    fishertable = readtable('fisheriris.csv');
  2. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.

  3. On the Classification Learner tab, in the File section, select New Session > From Workspace.

    Classification Learner tab

  4. In the New Session from Workspace dialog box, select the table fishertable from the Data Set Variable list (if necessary).

    As shown in the dialog box, the app selects the response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.

    New Session from Workspace dialog box

  5. To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting.

    Classification Learner creates a scatter plot of the data.

    Scatter plot of the Fisher iris data.

  6. Use the scatter plot to investigate which variables are useful for predicting the response. Select different options on the X and Y lists under Predictors to visualize the distribution of species and measurements. Observe which variables separate the species colors most clearly.

The setosa species (blue points) is easy to separate from the other two species with all four predictors. The versicolor and virginica species are much closer together in all predictor measurements and overlap, especially when you plot sepal length and width. The setosa species is therefore easier to predict than the other two.

  7. Create a naive Bayes model. On the Classification Learner tab, in the Model Type section, click the arrow to open the gallery. In the Naive Bayes Classifiers group, click Gaussian Naive Bayes. Note that Classification Learner disables the Advanced button in the Model Type section, because this type of model has no advanced settings.

    Gaussian Naive Bayes model type selected

    In the Training section, click Train.

    The app creates a Gaussian naive Bayes model, and plots the results.

The app displays the Gaussian Naive Bayes model in the Models pane. Check the model validation score in the Accuracy (Validation) box. The score shows that the model performs well.

    For the Gaussian Naive Bayes model, by default, the app models the distribution of numerical predictors using the Gaussian distribution, and models the distribution of categorical predictors using the multivariate multinomial distribution (MVMN).

    Scatter plot of the Fisher iris data modeled by a Gaussian Naive Bayes classifier. Correctly classified points are marked with an O. Incorrectly classified points are marked with an X.

    Note

    Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.

  8. Examine the scatter plot. An X indicates misclassified points. The blue points (setosa species) are all correctly classified, but the other two species have misclassified points. Under Plot, switch between the Data and Model predictions options. Observe the color of the incorrect (X) points. Or, to view only the incorrect points, clear the Correct check box.

  9. Train a kernel naive Bayes model for comparison. On the Classification Learner tab, in the Model Type section, click Kernel Naive Bayes. Note that Classification Learner enables the Advanced button, because this type of model has advanced settings.

    Kernel Naive Bayes model type selected

    The app displays a draft kernel naive Bayes model in the Models pane.

    In the Model Type section, click Advanced to change settings in the Advanced Naive Bayes Options dialog box. Select Triangle from the Kernel Type list, and select Positive from the Support list.

    Advanced naive Bayes options selected

    Note

    The settings in the Advanced Naive Bayes Options dialog box are available for continuous data only. Pointing to Kernel Type displays the tooltip "Specify Kernel smoothing function for continuous variables," and pointing to Support displays the tooltip "Specify Kernel smoothing density support for continuous variables."

    In the Training section, click Train to train the new model.

    Scatter plot of the Fisher iris data modeled by a Kernel Naive Bayes classifier. Correctly classified points are marked with an O. Incorrectly classified points are marked with an X.

    The Models pane now includes the new kernel naive Bayes model. Its model validation score is better than the score for the Gaussian naive Bayes model. The app highlights the Accuracy (Validation) score of the best model by outlining it in a box.

  10. In the Models pane, click each model to view and compare the results.

  11. Train a Gaussian naive Bayes model and a kernel naive Bayes model simultaneously. On the Classification Learner tab, in the Model Type section, click All Naive Bayes. Classification Learner disables the Advanced button. In the Training section, click Train.

    All Naive Bayes model type selected

    The app trains one of each naive Bayes model type and highlights the Accuracy (Validation) score of the best model or models.

    Scatter plot of the Fisher iris data modeled by a Gaussian Naive Bayes classifier. The Models pane on the left shows the accuracy for each model.

  12. In the Models pane, click a model to view the results. Examine the scatter plot for the trained model and try plotting different predictors. Misclassified points appear as an X.

  13. To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix and select Validation Data. The app displays a matrix of true class and predicted class results.

    Confusion matrix plot for the Kernel Naive Bayes model

    Note

    Validation introduces some randomness into the results. Your confusion matrix results can vary from the results shown in this example.

  14. In the Models pane, click the other models and compare their results.

  15. In the Models pane, click the model with the highest Accuracy (Validation) score. To improve the model, try modifying its features. For example, see if you can improve the model by removing features with low predictive power.

    On the Classification Learner tab, in the Features section, click Feature Selection.

    In the Feature Selection dialog box, clear the check boxes for PetalLength and PetalWidth to exclude them from the predictors. A new draft model (model 4) appears in the Models pane with the new settings (2/4 features), based on the kernel naive Bayes model (model 3.2).

    Feature Selection menu with SepalLength and SepalWidth selected, and PetalLength and PetalWidth cleared

    In the Training section, click Train to train a new kernel naive Bayes model using the new predictor options.

    The Models pane now includes model 4. It is also a kernel naive Bayes model, trained using only 2 of 4 predictors.

  16. To determine which predictors are included, click a model in the Models pane, then click Feature Selection in the Features section and note which check boxes are selected. The model with only sepal measurements (model 4) has a much lower Accuracy (Validation) score than the models containing all predictors.

  17. Train another kernel naive Bayes model including only the petal measurements. Change the selections in the Feature Selection dialog box and click Train.

    Confusion matrix plot for the Kernel Naive Bayes model with two of the four features selected. The Models pane on the left shows the accuracy for each model.

The model trained using only petal measurements (model 5) performs comparably to the model containing all predictors. The models predict no better with all four measurements than with the petal measurements alone. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.

  18. To investigate features to include or exclude, use the parallel coordinates plot. On the Classification Learner tab, in the Plots section, click Parallel Coordinates.

  19. In the Models pane, click the model with the highest Accuracy (Validation) score. To improve the model further, try changing naive Bayes settings (if available). On the Classification Learner tab, in the Model Type section, click Advanced. Recall that the Advanced button is enabled only for some models. Change a setting, then train the new model by clicking Train.

  20. Export the trained model to the workspace. On the Classification Learner tab, in the Export section, select Export Model > Export Model. See Export Classification Model to Predict New Data.

  21. Examine the code for training this classifier. In the Export section, click Generate Function.

Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner.
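For comparison, the main steps of the workflow above can be sketched at the command line. This is a rough, hedged counterpart, not the code that Generate Function produces; the option values mirror the choices made in the app (triangle kernel, positive support, petal measurements only), and it assumes the Statistics and Machine Learning Toolbox:

```matlab
% Load the Fisher iris data into a table
fishertable = readtable('fisheriris.csv');

% Train a kernel naive Bayes model using only the petal measurements,
% with the triangle kernel and positive support chosen in the app
kernelMdl = fitcnb(fishertable,'Species', ...
    'PredictorNames',{'PetalLength','PetalWidth'}, ...
    'DistributionNames','kernel', ...
    'Kernel','triangle', ...
    'Support','positive');

% Cross-validate (10-fold by default at the command line; the app
% defaults to 5-fold) and compute the validation accuracy
cvMdl = crossval(kernelMdl);
valAccuracy = 1 - kfoldLoss(cvMdl);

% Use the trained model to predict the species for new measurements
predictedSpecies = predict(kernelMdl,fishertable);
```

Because cross-validation partitions the data randomly, the accuracy you compute this way can differ slightly from the Accuracy (Validation) score shown in the app.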

To try all the nonoptimizable classifier model presets available for your data set:

  1. Click the arrow in the Model Type section to open the gallery of classifiers.

  2. In the Get Started group, click All, then click Train in the Training section.

    Option selected for training all available classifier types

For information about other classifier types, see Train Classification Models in Classification Learner App.