Applied Machine Learning, Part 1: Feature Engineering

From the series: Applied Machine Learning

Adam Filion, MathWorks

Explore how to perform feature engineering, a technique for transforming raw data into features that are suitable for a machine learning algorithm. 

Feature engineering starts with your best guess about what features might influence the action you’re trying to predict. After that, it’s an iterative process where you create new features, add them to your model, and see if your results have improved.   

This video provides a high-level overview of the topic, and it uses several examples to illustrate basic principles behind feature engineering and established ways for extracting features from signals, text, and images. 

Machine learning algorithms don’t always work so well on raw data. Part of our job as engineers and scientists is to transform the raw data to make the behavior of the system more obvious to the machine learning algorithm. This is called feature engineering.

Feature engineering starts with your best guess about what features might influence the thing you’re trying to predict.  After that, it’s an iterative process where you create new features, add them to your model, and see if the result improved. 

Let’s take a simple example where we want to predict whether a flight is going to be delayed or not. 

In the raw data, we have information such as the month of the flight, the destination, and the day of the week.  

If I fit a decision tree just to this data, I’ll get an accuracy of 70%. What else could we calculate from this data that might help improve our predictions?

Well, how about the number of flights per day?  There are more flights on some days than others, which may mean they’re more likely to be delayed. 

I already have this feature from my dataset in the app, so let’s add it and retrain the model. You can see the model accuracy improved to 74%. Not bad for just adding a feature.
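As a rough illustration of how a feature like this could be derived, here is a minimal Python sketch. The flight records, the tuple layout, and the function name `add_flights_per_day` are all hypothetical and are not taken from the dataset or app shown in the video.

```python
from collections import Counter

# Hypothetical flight records as (date, destination); not data from the video.
flights = [
    ("2021-03-01", "BOS"),
    ("2021-03-01", "SFO"),
    ("2021-03-01", "ORD"),
    ("2021-03-02", "BOS"),
    ("2021-03-02", "JFK"),
    ("2021-03-03", "SFO"),
]


def add_flights_per_day(records):
    """Append the number of flights scheduled on the same date to each record."""
    per_day = Counter(date for date, _ in records)
    return [(date, dest, per_day[date]) for date, dest in records]


# Each record now carries an extra column a model could train on.
augmented = add_flights_per_day(flights)
```

Whether the new column actually helps can only be judged by retraining the model with it and comparing the accuracy.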

Feature engineering is often referred to as a creative process, more of an art than a science.  There’s no single correct way to do it, but if you have domain expertise and a solid understanding of the data, you’ll be in a good position to perform feature engineering.  As you’ll see later, the techniques used for feature engineering are things you may already be familiar with, but you might not have thought about them in this context before.

Let’s see another example that’s a bit more interesting.  Here, we’re trying to predict whether a heart is behaving normally or abnormally by classifying the sounds it makes.

The sounds come in the form of audio signals.  Rather than training on the raw signals, we can engineer features and then use those values to train a model.  

Recently, deep learning approaches have become popular because they require less manual feature engineering. Instead, the features are learned as part of the training process.  While this has often shown very promising results, deep learning models require more data and take longer to train, and the resulting model is typically less interpretable than one built on manually engineered features.

The features we used to classify heart sounds come from the signal processing field.  We calculated things such as skewness, kurtosis, and dominant frequencies.  These calculations extract characteristics that make it easier for the model to distinguish between an abnormal heart sound and a normal one.
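To make these three quantities concrete, here is a minimal Python sketch that computes them for a synthetic sine wave. It is not the code used in the video: the signal, the sample rate, and the function names are assumptions, and the slow direct DFT stands in for the FFT a real pipeline would use.

```python
import cmath
import math


def _central_moment(x, order):
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x):
    """Third standardized moment: asymmetry of the sample distribution."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5


def kurtosis(x):
    """Fourth standardized moment (not excess kurtosis): tail heaviness."""
    return _central_moment(x, 4) / _central_moment(x, 2) ** 2


def dominant_frequency(x, sample_rate):
    """Frequency in Hz of the largest-magnitude DFT bin, ignoring the DC bin."""
    n = len(x)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Synthetic stand-in for an audio recording: a 5 Hz sine sampled at 100 Hz for 1 s.
sample_rate = 100
signal = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(sample_rate)]

features = {
    "skewness": skewness(signal),
    "kurtosis": kurtosis(signal),
    "dominant_frequency_hz": dominant_frequency(signal, sample_rate),
}
```

Each recording is reduced to a handful of numbers like these, and the model is trained on those numbers rather than on the raw samples.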

So what other features do people use?  Many use traditional statistical techniques like mean, median, and mode, as well as basic things like counting the number of times something happens.

Lots of data has a timestamp associated with it. There are a number of features you can extract from a timestamp that might improve model performance.  What was the month, or day of week, or hour of the day?  Was it a weekend or a holiday?  Such features often track human behavior closely, which matters if, for example, you are trying to predict how much electricity people use.
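The questions above map directly onto a few fields of a timestamp. Here is a minimal Python sketch using the standard `datetime` module; the `HOLIDAYS` set and the function name `timestamp_features` are hypothetical placeholders, and a real project would load a proper holiday calendar.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; replace with real data for your region.
HOLIDAYS = {date(2021, 1, 1), date(2021, 12, 25)}


def timestamp_features(ts, holidays=HOLIDAYS):
    """Derive calendar features from a single datetime."""
    return {
        "month": ts.month,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,
        "is_holiday": ts.date() in holidays,
    }


example = timestamp_features(datetime(2021, 12, 25, 18, 30))
```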

Another class of feature engineering has to do with text data.  Counting the number of times certain words occur in a text is one technique, which is often combined with normalization techniques like term frequency-inverse document frequency (TF-IDF).  Word2vec, in which words are converted to a high-dimensional vector representation, is another popular feature engineering technique for text.
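As a sketch of the word-counting idea, the Python snippet below computes one common, unsmoothed variant of TF-IDF: term frequency is the word count divided by the document length, and inverse document frequency is the log of the number of documents divided by the number of documents containing the word. The example sentences and the function name `tf_idf` are made up for illustration, and library implementations typically add smoothing and normalization on top of this.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Made-up example sentences.
docs = [
    "the flight was delayed",
    "the flight was on time",
    "the heart sounds normal",
]
weights = tf_idf(docs)
```

Words that appear in every document, such as "the" here, get a weight of zero, while words that are specific to one document get the largest weights.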

The last class of techniques I’ll talk about has to do with images.  Images contain lots of information, so you often need to extract the important parts. Traditional techniques calculate the histogram of colors or apply transforms such as the Haar wavelet.  More recently, researchers have started using convolutional neural networks to extract features from images.
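To show what a histogram of colors looks like as a feature vector, here is a minimal Python sketch that quantizes each RGB channel into a few bins and counts pixels per bin. The four-pixel "image", the bin count, and the function name `color_histogram` are assumptions for illustration; real images would be loaded with an imaging library.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over quantized (r, g, b) bins for 8-bit pixels."""
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins_per_channel).
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        hist[(r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin] += 1
    total = len(pixels)
    return [count / total for count in hist]


# A made-up four-pixel image: two reddish pixels, one blue, one green.
image = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 255, 0)]
hist = color_histogram(image, bins_per_channel=2)
```

The resulting vector has a fixed length regardless of image size, which is what makes it usable as model input.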

Depending on the type of data you’re working with, it may make sense to combine several of the techniques we’ve discussed. Feature engineering is a trial-and-error process.  The only way to know if a feature is any good is to add it to a model and check if it improves the results.

To wrap up, that was a brief explanation of feature engineering. We have many more examples on our site, so check them out.

 


© 1994-2021 The MathWorks, Inc.
