This example shows how to create and compare various regression trees using the Regression Learner app, and export trained models to the workspace to make predictions for new data.
You can train regression trees to predict responses to given input data. To predict the response of a regression tree, follow the tree from the root (beginning) node down to a leaf node. At each node, decide which branch to follow using the rule associated with that node. Continue until you arrive at a leaf node. The predicted response is the value associated with that leaf node.
Statistics and Machine Learning Toolbox™ trees are binary. Each step in a prediction involves checking the value of one predictor variable. For example, here is a simple regression tree:
This tree predicts the response based on two predictors, x1 and x2. To predict, start at the top node. At each node, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the response is set to the value corresponding to that node.
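The traversal described above can be sketched in a few lines of base MATLAB. This is only an illustration: the node table, the cut points, and the leaf values below are made up, and the row layout is an assumption chosen for readability, not the way the toolbox stores trees.

```matlab
% Hypothetical two-predictor tree, one node per row (all values invented).
% Columns: predictor index, cut point, left child, right child, leaf value.
% A leaf is marked by predictor index 0.
nodes = [ 1   0.5  2  3  NaN;    % node 1: x1 < 0.5 ? go to 2 : go to 3
          0   NaN  0  0  3.2;    % node 2: leaf
          2   1.0  4  5  NaN;    % node 3: x2 < 1.0 ? go to 4 : go to 5
          0   NaN  0  0  7.1;    % node 4: leaf
          0   NaN  0  0  9.4 ];  % node 5: leaf

x = [0.8 1.7];          % one observation: x1 = 0.8, x2 = 1.7
k = 1;                  % start at the root node
while nodes(k,1) ~= 0   % keep going until a leaf is reached
    if x(nodes(k,1)) < nodes(k,2)
        k = nodes(k,3); % rule satisfied: follow the left branch
    else
        k = nodes(k,4); % otherwise follow the right branch
    end
end
yhat = nodes(k,5);      % predicted response is the value stored at the leaf
```

Each pass through the loop checks exactly one predictor, which mirrors the binary, one-variable-per-step structure described in the text.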
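This example uses the carbig data set. This data set contains characteristics of different car models produced from 1970 through 1982, including:
Acceleration
Number of cylinders (Cylinders)
Engine displacement (Displacement)
Engine power (Horsepower)
Model year (Model_Year)
Weight
Country of origin (Origin)
Miles per gallon (MPG)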
Train regression trees to predict the fuel economy in miles per gallon of a car model, given the other variables as inputs.
In MATLAB®, load the carbig data set and create a table containing the different variables:
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
On the Apps tab, in the Machine Learning and Deep Learning group, click Regression Learner.
On the Regression Learner tab, in the File section, select New Session > From Workspace.
Under Data Set Variable in the New Session from Workspace dialog box, select cartable from the list of tables and matrices in your workspace.
Observe that the app has preselected response and predictor variables.
MPG is chosen as the response, and all the other
variables as predictors. For this example, do not change the selections.
To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting.
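To see why cross-validation guards against overfitting, it can help to compare the error on the training data with the cross-validated error at the command line. The following is a minimal sketch, not what the app runs internally: the use of 5 folds, the seed, the conversion of Origin to categorical, and the removal of rows with a missing response are all assumptions made for this illustration.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin)); % char matrix -> categorical
cartable = cartable(~isnan(cartable.MPG),:);             % drop rows with no response

rng(0)                                  % arbitrary seed so the folds are repeatable
tree   = fitrtree(cartable,'MPG');      % tree with default settings
cvtree = crossval(tree,'KFold',5);      % 5-fold cross-validated version (assumed fold count)

resubRMSE = sqrt(resubLoss(tree));      % error on the data the tree was trained on
cvRMSE    = sqrt(kfoldLoss(cvtree));    % error on held-out folds
fprintf('Resubstitution RMSE: %.2f   Cross-validated RMSE: %.2f\n', ...
    resubRMSE, cvRMSE)
```

The resubstitution error is typically optimistic because the tree has already seen those observations; the cross-validated value is the more honest estimate of performance on new data.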
Regression Learner creates a plot of the response with the record number on the x-axis.
Use the response plot to investigate which variables are useful for predicting the response. To visualize the relation between different predictors and the response, select different variables in the X list under X-axis.
Observe which variables are correlated most clearly with the response. Displacement, Horsepower, and Weight all have a clearly visible impact on the response and all show a negative association with the response.
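The visual impression from the response plot can be cross-checked numerically. The sketch below computes the Pearson correlation between a few numeric predictors and MPG; the choice of predictors and the decision to skip rows with missing values are assumptions made for this illustration.

```matlab
load carbig
names = {'Displacement';'Horsepower';'Weight';'Acceleration'};
X = [Displacement Horsepower Weight Acceleration];

% Pearson correlation of each column of X with MPG,
% using only the rows that have no missing values.
r = corr(X, MPG, 'Rows','complete');

disp(table(names, r, 'VariableNames', {'Predictor','CorrWithMPG'}))
```

A negative value means that MPG tends to fall as the predictor increases. Correlation captures only linear association, so it complements the plots rather than replacing them.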
Select the variable Origin under X-axis. A box plot is automatically displayed. A box plot shows the typical values of the response and any possible outliers. The box plot is useful when plotting markers results in many points overlapping. To show a box plot when the variable on the x-axis has few unique values, under Style, select Box plot.
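A comparable box plot can be drawn at the command line. This is a rough equivalent for illustration, not the app's own plotting code; grouping by Origin and the axis labels are choices made here.

```matlab
load carbig

% One box per country of origin; cellstr turns the character matrix
% into a cell array of labels that boxplot accepts as a grouping variable.
boxplot(MPG, cellstr(Origin))
xlabel('Country of origin')
ylabel('MPG')
```

Each box summarizes the typical MPG values for one group, and points drawn beyond the whiskers are candidate outliers.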
Create a selection of regression trees. On the Regression Learner tab, in the Model Type section, click All Trees.
Then click Train.
If you have Parallel Computing Toolbox™, you can train all the models (All Trees) simultaneously by selecting the Use Parallel button in the Training section before clicking Train. After you click Train, the Opening Parallel Pool dialog box opens and remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, the app trains the models simultaneously.
Regression Learner creates and trains three regression trees: a Fine Tree, a Medium Tree, and a Coarse Tree.
The three models appear in the Models pane. Check the RMSE (Validation) (validation root mean squared error) of the models. The best score is highlighted in a box.
The Fine Tree and the Medium Tree have similar RMSEs, while the Coarse Tree is less accurate.
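A similar comparison can be sketched programmatically. The code below assumes that the Fine, Medium, and Coarse presets correspond to minimum leaf sizes of 4, 12, and 36; treat those numbers, the 5-fold partition, and the preprocessing steps as assumptions, and expect scores that differ somewhat from the ones the app reports.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin)); % char matrix -> categorical
cartable = cartable(~isnan(cartable.MPG),:);             % drop rows with no response

rng(0)                                          % arbitrary seed
cvp = cvpartition(height(cartable),'KFold',5);  % same folds for every tree

presets = {'Fine';'Medium';'Coarse'};
minLeaf = [4; 12; 36];                          % assumed preset leaf sizes
rmse = zeros(3,1);
for i = 1:3
    cvtree  = fitrtree(cartable,'MPG', ...
        'MinLeafSize',minLeaf(i),'CVPartition',cvp);
    rmse(i) = sqrt(kfoldLoss(cvtree));          % validation RMSE
end
disp(table(presets, minLeaf, rmse, ...
    'VariableNames', {'Preset','MinLeafSize','ValidationRMSE'}))
```

Reusing one partition for all three trees keeps the comparison fair, because every model is validated on the same held-out observations.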
Regression Learner plots both the true training response and the predicted response of the first model (model 1.1).
If you are using validation, the results include some randomness. So, your model validation score can differ from the results shown.
Choose a model in the Models pane to view the results of
that model. For example, select the Medium Tree model
(model 1.2). On the Regression Learner tab, in the
Plots section, click the arrow to open the gallery, and
then click Response in the Validation
Results group. Under X-axis, select
Horsepower and examine the response plot. Both the true
and predicted responses are now plotted. Show the prediction errors, drawn as
vertical lines between the predicted and true responses, by selecting the
Errors check box.
See more details on the currently selected model in the Current Model Summary pane. Check and compare additional model characteristics, such as R-squared (coefficient of determination), MAE (mean absolute error), and prediction speed. To learn more, see View and Compare Model Statistics. In the Current Model Summary pane, you also can find details on the currently selected model type, such as options used for training the model.
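The statistics named above can also be computed by hand from cross-validated predictions, which makes their definitions concrete. This is a sketch under assumptions: a minimum leaf size of 12 stands in for the Medium Tree, 5 folds are used, and the values will not match the app exactly.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin)); % char matrix -> categorical
cartable = cartable(~isnan(cartable.MPG),:);             % drop rows with no response

rng(0)                                                   % arbitrary seed
cvtree = fitrtree(cartable,'MPG','MinLeafSize',12,'KFold',5);
y    = cartable.MPG;
yhat = kfoldPredict(cvtree);     % each prediction comes from a held-out fold

rmse = sqrt(mean((y - yhat).^2));                        % root mean squared error
mae  = mean(abs(y - yhat));                              % mean absolute error
r2   = 1 - sum((y - yhat).^2) / sum((y - mean(y)).^2);   % coefficient of determination
fprintf('RMSE %.2f   MAE %.2f   R-squared %.2f\n', rmse, mae, r2)
```

MAE is never larger than RMSE, and a wide gap between the two suggests that a few large errors dominate.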
Plot the predicted response versus true response. On the Regression Learner tab, in the Plots section, click the arrow to open the gallery, and then click Predicted vs. Actual (Validation) in the Validation Results group. Use this plot to understand how well the regression model makes predictions for different response values.
A perfect regression model has predicted response equal to true response, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, so the predictions are scattered near the line. Usually a good model has points scattered roughly symmetrically around the diagonal line. If you can see any clear patterns in the plot, it is likely that you can improve your model.
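A predicted-versus-actual plot of this kind can be reproduced outside the app. The following sketch assumes a tree with a minimum leaf size of 12 and 5-fold cross-validation; the styling is arbitrary.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin)); % char matrix -> categorical
cartable = cartable(~isnan(cartable.MPG),:);             % drop rows with no response

rng(0)                                                   % arbitrary seed
cvtree = fitrtree(cartable,'MPG','MinLeafSize',12,'KFold',5);
y    = cartable.MPG;
yhat = kfoldPredict(cvtree);          % validation predictions

scatter(y, yhat, 'filled')
hold on
lims = [min(y) max(y)];
plot(lims, lims, 'k-')                % diagonal: predicted equals true
hold off
xlabel('True response (MPG)')
ylabel('Predicted response (MPG)')
```

Because a tree predicts one constant per leaf, the points form horizontal bands; the vertical distance of each point from the diagonal is its prediction error.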
Select the other models in the Models pane, open the
predicted versus actual plot for each of the models, and then compare the
results. Rearrange the layout of the plots to better compare the plots. Click
the Document Actions arrow located to the far right of the model plot
tabs. Select the
Tile All option and specify a
1-by-3 layout. Click the Hide plot options button in the top right of the plots to make more
room for the plots.
To return to the original layout, you can click the Layout button in the Plots section and select Single model (Default).
In the Model Type gallery, select All Trees again. To try to improve the models, see whether removing features with low predictive power helps. On the Regression Learner tab, in the Features section, click Feature Selection.
In the Feature Selection dialog box, clear the check boxes for Acceleration and Cylinders to exclude them from the predictors. Click OK.
Click Train to train new regression trees using the new predictor settings.
Observe the new models in the Models pane. These models are the same regression trees as before, but trained using only five of seven predictors. The app displays how many predictors are used. To check which predictors are used, click a model in the Models pane and observe the check boxes in the Feature Selection dialog box.
The models with the two features removed perform comparably to the models that use all predictors; including Acceleration and Cylinders does not improve the predictions. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.
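The same experiment can be sketched at the command line by training on a reduced table. The minimum leaf size of 12, the 5-fold partition, and the preprocessing steps are assumptions for this illustration.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin)); % char matrix -> categorical
cartable = cartable(~isnan(cartable.MPG),:);             % drop rows with no response

% Same table without Acceleration and Cylinders.
subset = cartable(:,{'Displacement','Horsepower','Model_Year', ...
    'Weight','Origin','MPG'});

rng(0)                                          % arbitrary seed
cvp = cvpartition(height(cartable),'KFold',5);  % identical folds for both models
cvAll = fitrtree(cartable,'MPG','MinLeafSize',12,'CVPartition',cvp);
cvSub = fitrtree(subset,  'MPG','MinLeafSize',12,'CVPartition',cvp);

rmseAll = sqrt(kfoldLoss(cvAll));
rmseSub = sqrt(kfoldLoss(cvSub));
fprintf('7 predictors: %.2f   5 predictors: %.2f\n', rmseAll, rmseSub)
```

If the two scores are close, the dropped predictors contribute little that the remaining ones do not already provide.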
Train the three regression tree presets using only Horsepower as a predictor. Change the selections in the Feature Selection dialog box and click OK. Then click Train.
Using only the engine power as a predictor results in models with lower accuracy. However, the models perform well given that they are using only a single predictor. With this simple one-dimensional predictor space, the coarse tree now performs as well as the medium and fine trees.
Select the best model in the Models pane and view the residuals plot. On the Regression Learner tab, in the Plots section, click the arrow to open the gallery, and then click Residuals (Validation) in the Validation Results group. The residuals plot displays the difference between the predicted and true responses. To display the residuals as a line graph, in the Style section, choose Lines.
Under X-axis, select the variable to plot on the x-axis. Choose the true response, predicted response, record number, or one of your predictors.
Usually a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, it is likely that you can improve your model.
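A residuals plot can be built by hand from validation predictions. In this sketch the residual is taken as true minus predicted, which is an assumed sign convention, and the model settings (minimum leaf size 12, 5 folds) are also assumptions.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin)); % char matrix -> categorical
cartable = cartable(~isnan(cartable.MPG),:);             % drop rows with no response

rng(0)                                                   % arbitrary seed
cvtree = fitrtree(cartable,'MPG','MinLeafSize',12,'KFold',5);
yhat = kfoldPredict(cvtree);
res  = cartable.MPG - yhat;         % assumed convention: true minus predicted

plot(yhat, res, '.')
hold on
plot([min(yhat) max(yhat)], [0 0], 'k-')   % zero line for reference
hold off
xlabel('Predicted response (MPG)')
ylabel('Residual (true - predicted)')
```

A funnel shape, a curve, or a one-sided cloud around the zero line is the kind of pattern that suggests room for improvement.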
To learn about model settings, choose the best model in the Models pane and view the advanced settings. The nonoptimizable model options in the Model Type gallery are preset starting points, and you can change additional settings. On the Regression Learner tab, in the Model Type section, click Advanced. Compare the different regression tree models in the Models pane, and observe the differences in the Advanced Regression Tree Options dialog box. The Minimum leaf size setting controls the size of the tree leaves, and through that the size and depth of the regression tree.
To try to improve the model further, change the Minimum leaf size setting to 8 and click OK. Then train a new model by clicking Train.
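The effect of the minimum leaf size on tree size can be checked directly. The sketch below trains on the full data for several leaf sizes (the particular values are chosen for illustration) and reports the number of nodes in each tree.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin)); % char matrix -> categorical
cartable = cartable(~isnan(cartable.MPG),:);             % drop rows with no response

minLeaf  = [4; 8; 12; 36];          % leaf sizes to compare (illustrative)
numNodes = zeros(size(minLeaf));
for i = 1:numel(minLeaf)
    tree = fitrtree(cartable,'MPG','MinLeafSize',minLeaf(i));
    numNodes(i) = tree.NumNodes;    % total nodes, branch and leaf
end
disp(table(minLeaf, numNodes, 'VariableNames', {'MinLeafSize','NumNodes'}))
```

Requiring more observations per leaf leaves fewer ways to split the data, so larger values generally yield smaller, shallower trees.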
View the settings for the selected trained model in the Current Model Summary pane or in the Advanced Regression Tree Options dialog box.
To learn more about regression tree settings, see Regression Trees.
You can export a full or compact version of the selected model to the workspace. On the Regression Learner tab, in the Export section, click Export Model and select either Export Model or Export Compact Model. In the Export Model dialog box, click OK to accept the default variable name trainedModel.
To see information about the results, look in the Command Window.
Use the exported model to make predictions on new data. For example, to make predictions for the cartable data in your workspace, enter:
yfit = trainedModel.predictFcn(cartable)
The output yfit contains the predicted response for each data point.
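For readers without an exported model at hand, the prediction step can be imitated with a tree trained at the command line. This sketch does not use the exported trainedModel structure; it trains a tree directly, and the minimum leaf size of 8 and the hypothetical car's weight and horsepower are invented for illustration.

```matlab
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
cartable.Origin = categorical(cellstr(cartable.Origin)); % char matrix -> categorical
cartable = cartable(~isnan(cartable.MPG),:);             % drop rows with no response

tree  = fitrtree(cartable,'MPG','MinLeafSize',8);
ctree = compact(tree);             % compact model: no training data stored

% Hypothetical new car: start from an existing row so that every
% predictor has the right type, then overwrite two values.
newCar = cartable(1, 1:end-1);     % all predictors, without MPG
newCar.Weight     = 2500;
newCar.Horsepower = 90;

mpgHat = predict(ctree, newCar);
fprintf('Predicted MPG for the hypothetical car: %.1f\n', mpgHat)
```

New data must be a table with the same predictor names and types as the training data, which is why the example copies a row from the original table instead of building one from scratch.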
If you want to automate training the same model with new data or learn how to programmatically train regression models, you can generate code from the app. To generate code for the best trained model, on the Regression Learner tab, in the Export section, click Generate Function.
The app generates code from your model and displays the file in the MATLAB Editor. To learn more, see Generate MATLAB Code to Train Model with New Data.
Use the same workflow as in this example to evaluate and compare the other regression model types you can train in Regression Learner.
Train all the nonoptimizable regression model presets available:
On the far right of the Model Type section, click the arrow to expand the list of regression models.
Click All, and then click Train.
To learn about other regression model types, see Train Regression Models in Regression Learner App.