Machine Learning Made Easy: Finding a Model for Categorical Data
In this example, use supervised learning to find a classification model for fitness tracker data. Using the Classification Learner app and functions in Statistics and Machine Learning Toolbox™, perform common machine learning tasks such as:
- Selecting and transforming features
- Specifying cross-validation schemes
- Training a range of classification models, including support vector machines (SVMs), boosted and bagged decision trees, k-nearest neighbors, and discriminant analysis
- Performing model assessment and model comparisons using confusion matrices and ROC curves to help choose the best model for your data
Published: 29 Jul 2024
Let's say we have a fitness tracking app on our phone. This app tracks a bunch of different fitness activities, like walking, running, and dancing, but also time spent sitting and standing. It doesn't get this information by making us log every time we stand up; that would be incredibly annoying. Instead, it uses the phone's built-in sensors to categorize our activity: namely, the accelerometer, which measures how the phone's motion is changing, and the gyroscope, which measures how fast the phone is rotating.
And machine learning helps us create a model to translate those sensor readings into those particular activity categories. In this example, we have a data set of prelabeled observations ready to go.
Importing and preprocessing the data
Importing data in MATLAB is relatively straightforward. You either use the importdata function or the Import Data button on the Home tab. With this demo, it's even simpler because we're using a data set that comes with MATLAB.
So we just type openExample stats/HARDeploymentExample, and we get the data set. To label our observations with activity names rather than numeric IDs, we run the corresponding code section in the example. Now we have a set of labeled observations, and we're ready to start finding a model.
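In MATLAB, this relabeling step amounts to turning numeric IDs into named category labels. As a language-neutral illustration only (this is Python, not the example's MATLAB code), the idea looks like the sketch below. The name list and its ID order are assumptions based on the five activities mentioned above, not values read from the example.

```python
# Assumption: activity IDs 1..5 map to these names in this order.
ACTIVITY_NAMES = ["Sitting", "Standing", "Walking", "Running", "Dancing"]

def label_activities(activity_ids, names=ACTIVITY_NAMES):
    """Replace 1-based numeric activity IDs with readable category names."""
    labels = []
    for act_id in activity_ids:
        if not 1 <= act_id <= len(names):
            raise ValueError(f"unknown activity ID: {act_id}")
        labels.append(names[act_id - 1])
    return labels

print(label_activities([1, 4, 5, 3]))  # ['Sitting', 'Running', 'Dancing', 'Walking']
```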
Setting up the data and Classification Learner
To find the Classification Learner app, go to the Apps tab at the top of the MATLAB window. Click the down arrow to expand the app gallery, and under Machine Learning and Deep Learning, select Classification Learner.
Once in Classification Learner, start a new session. We're then prompted to select our data; in our case, it's the Tbl variable. For the response, we choose From data set variable, so the activity labels come from a column of Tbl itself.
Now we want to set aside a test data set so we can estimate how well the model we eventually choose performs on data it never saw during training. To do so, we select Set aside a test data set and reserve 10% of our data. Also, make sure a validation scheme is selected. I'm going with the default, 5-fold cross-validation. Once we start the session, the app loads the data and shows a scatter plot of it. We can see this is a good candidate for machine learning, as there's no hard and fast rule separating the activity classes.
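The app handles both splits for you, but the idea behind them fits in a few lines. The Python sketch below is an illustration under simplifying assumptions, not the app's implementation: it reserves a random 10% of observations as a test set, then divides the remaining training observations into 5 folds, each used once for validation while the model is fit on the other four. It does not stratify by class, which the app may handle differently.

```python
import random

def holdout_split(n, test_fraction=0.10, seed=0):
    """Shuffle indices 0..n-1 and reserve test_fraction of them as a test set."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (training indices, test indices)

def k_fold_indices(train_idx, k=5):
    """Yield (fit, validate) index lists; each training index is validated exactly once."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]

train, test = holdout_split(1000)
print(len(train), len(test))  # 900 100
for fit, val in k_fold_indices(train):
    print(len(fit), len(val))  # 720 180 for each of the 5 folds
```

The test indices never enter the cross-validation loop, which is what lets the test set give an independent check later.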
Using Classification Learner
Now that our data set is all set up in Classification Learner, let's try out some models. From the model dropdown, we can see all the available model types, from decision trees to neural network classifiers. Because of the size of the data set and how much time I have, I'm just going to select All and then click Train All. Then I'm going to go get myself a coffee while my models train.
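Train All boils down to one loop: fit every candidate model, score each by validation accuracy, and keep the best. The Python sketch below uses two deliberately simple stand-in classifiers (nearest centroid and 1-nearest neighbor) on a tiny made-up data set; these substitutes are for illustration only and are not the SVMs, ensembles, or other models the app trains.

```python
import math
from collections import defaultdict

def nearest_centroid_fit(X, y):
    """Stand-in model 1: predict the class whose mean feature vector is closest."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            s[d] += v
        counts[label] += 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))

def one_nn_fit(X, y):
    """Stand-in model 2: predict the label of the single closest training point."""
    train = list(zip(X, y))
    return lambda x: min(train, key=lambda pair: math.dist(x, pair[0]))[1]

def cv_accuracy(fit_fn, X, y, k=5):
    """Fraction of observations predicted correctly when each is held out in one of k folds."""
    correct = 0
    for i in range(k):
        val = set(range(i, len(X), k))  # every k-th observation forms fold i
        fit_X = [x for j, x in enumerate(X) if j not in val]
        fit_y = [t for j, t in enumerate(y) if j not in val]
        predict = fit_fn(fit_X, fit_y)
        correct += sum(predict(X[j]) == y[j] for j in val)
    return correct / len(X)

MODELS = {"Nearest centroid": nearest_centroid_fit, "1-NN": one_nn_fit}

def train_all(X, y):
    """Score every candidate; return the best name (ties go to the first listed) and all scores."""
    scores = {name: cv_accuracy(fit, X, y) for name, fit in MODELS.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Tiny made-up, well-separated data: class "A" near (0, 0), class "B" near (10, 10).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = ["A"] * 5 + ["B"] * 5
print(train_all(X, y))
```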
Once all of the models have been trained, I can see the results. Each model is listed with its validation accuracy, and with 99.5% accuracy, the winning model is Cubic SVM. We can examine this model by selecting it and then picking Confusion Matrix from the plot dropdown. We can view the confusion matrix as numbers of observations or as percentages.
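A confusion matrix just counts (true class, predicted class) pairs, and a percentage view can divide each row by the number of observations in that true class. A minimal Python sketch with made-up labels (not results from this example):

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return [[pairs[(t, p)] for p in classes] for t in classes]

def row_percentages(matrix):
    """Each row as percentages of that true class's observations."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * c / total if total else 0.0 for c in row])
    return result

def accuracy(matrix):
    """Diagonal (correct predictions) divided by all observations."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

classes = ["Running", "Dancing", "Walking"]
true_labels = ["Running"] * 4 + ["Dancing"] * 4 + ["Walking"] * 2
predicted = ["Running", "Running", "Running", "Dancing",
             "Dancing", "Dancing", "Dancing", "Running",
             "Walking", "Walking"]
counts = confusion_matrix(true_labels, predicted, classes)
print(counts)                   # [[3, 1, 0], [1, 3, 0], [0, 0, 2]]
print(row_percentages(counts))  # [[75.0, 25.0, 0.0], [25.0, 75.0, 0.0], [0.0, 0.0, 100.0]]
print(accuracy(counts))         # 0.8
```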
As you can see, the model is highly accurate, but it mixes up some Running and Dancing observations. Overall, though, this seems like a strong model. Let's try it out on our test data. We go to the Test tab and select Confusion Matrix (Test), and the test confusion matrix shows similar results. This model is all well and good, but what if we want to use it outside the Classification Learner app?
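Spotting a weak spot like Running vs. Dancing means looking at the off-diagonal cells and adding up mix-ups in both directions. The Python sketch below does that for an invented 5-class count matrix; the numbers are hypothetical and are not the example's results.

```python
def most_confused_pair(matrix, classes):
    """Pair of distinct classes with the most mix-ups, counting both directions."""
    best_pair, best_count = None, -1
    n = len(classes)
    for i in range(n):
        for j in range(i + 1, n):
            count = matrix[i][j] + matrix[j][i]
            if count > best_count:
                best_pair, best_count = (classes[i], classes[j]), count
    return best_pair, best_count

def per_class_recall(matrix, classes):
    """Share of each true class that was predicted correctly (diagonal / row total)."""
    return {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes)}

# Hypothetical counts; rows are true classes, columns are predicted classes.
classes = ["Sitting", "Standing", "Walking", "Running", "Dancing"]
counts = [
    [50, 1, 0, 0, 0],
    [2, 48, 0, 0, 0],
    [0, 0, 49, 0, 1],
    [0, 0, 0, 45, 5],
    [0, 0, 1, 4, 45],
]
print(most_confused_pair(counts, classes))           # (('Running', 'Dancing'), 9)
print(per_class_recall(counts, classes)["Running"])  # 0.9
```

Running the same check on the validation and test matrices is one quick way to confirm they agree on where the model struggles.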
Exporting the model from Classification Learner
If you open the Export Model dropdown, you have four options: exporting the model to the MATLAB workspace, exporting it to Simulink, exporting it for deployment, and exporting it for Experiment Manager. We're just going to select Export Model. That gives us a struct containing a prediction function we can call to run this model again on new data. But let's say we don't want this information as a struct. In that case, we can click Generate Function and get a MATLAB function that performs the steps we did in the GUI programmatically.
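Conceptually, the exported struct bundles what the trained model needs with a function you can call to predict on new observations. The Python analogy below is hypothetical: ExportedModel, export_threshold_model, and the accel_std feature are invented names, and a one-feature threshold rule stands in for the real trained classifier.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ExportedModel:
    """Hypothetical bundle: which inputs the model needs plus a ready-to-call predictor."""
    required_variables: tuple
    predict_fcn: Callable[[dict], str]

def export_threshold_model(feature, threshold, low_label, high_label):
    """Package a toy one-feature threshold rule (stand-in for a trained classifier)."""
    def predict(observation):
        # Observations are dicts mapping feature names to values.
        return high_label if observation[feature] > threshold else low_label
    return ExportedModel(required_variables=(feature,), predict_fcn=predict)

model = export_threshold_model("accel_std", 0.5, "Sitting", "Walking")
print(model.predict_fcn({"accel_std": 0.9}))  # Walking
print(model.predict_fcn({"accel_std": 0.1}))  # Sitting
```

The point of the bundle is that callers only need the exported object, not the training steps; generating a function instead reproduces those training steps as code.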
Conclusion
As you can see, finding a machine learning model that works for your data doesn't have to be a scary, painful process. It can be as simple as clicking a few buttons. Check out the links in the description for more information about this demo and machine learning with MATLAB. As always, if you enjoyed this video, like, share, and subscribe. Thanks for watching, and happy coding.