audioFeatureExtractor

Streamline audio feature extraction

Description

audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.

Creation

Description

aFE = audioFeatureExtractor() creates an audio feature extractor with default property values.

aFE = audioFeatureExtractor(Name,Value) specifies nondefault properties for aFE using one or more name-value pair arguments.

Properties

Main Properties

Window –– Analysis window, specified as a real vector.

Data Types: single | double

OverlapLength –– Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).

Data Types: single | double

FFTLength –– FFT length, specified as an integer. The default, [], means that the FFT length is equal to the window length (numel(Window)).

Data Types: single | double

SampleRate –– Input sample rate in Hz, specified as a nonnegative scalar.

Data Types: single | double

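SpectralDescriptorInput –– Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Spectral descriptors affected by this property are:

  • spectralCentroid

  • spectralCrest

  • spectralDecrease

  • spectralEntropy

  • spectralFlatness

  • spectralFlux

  • spectralKurtosis

  • spectralRolloffPoint

  • spectralSkewness

  • spectralSlope

  • spectralSpread

The spectrum input to the spectral descriptors is the same as the output from the corresponding feature:

  • linearSpectrum

  • melSpectrum

  • barkSpectrum

  • erbSpectrum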

For example, if you set "SpectralDescriptorInput" to "barkSpectrum", and "spectralCentroid" to true, then aFE returns the centroid of the default Bark spectrum.

[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
aFE = audioFeatureExtractor("SampleRate",fs, ...
                            "SpectralDescriptorInput","barkSpectrum", ...
                            "spectralCentroid",true);
barkSpectralCentroid = extract(aFE,audioIn);
If you specify a nondefault barkSpectrum using setExtractorParams, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParams(aFE,"barkSpectrum","NumBands",40), then aFE returns the centroid of a 40-band Bark spectrum.

setExtractorParams(aFE,"barkSpectrum","NumBands",40)
bark40SpectralCentroid = extract(aFE,audioIn);

Data Types: char | string

Features to Extract

linearSpectrum –– Extract the one-sided linear spectrum, specified as true or false.

To set parameters of the linear spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"linearSpectrum","Name",Value)
Settable parameters for the linear spectrum extraction are:

  • "FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • "SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

Data Types: logical
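
As a minimal sketch of the call pattern above, the following restricts the linear spectrum to a sub-band and switches it to a magnitude spectrum. The sample rate, the noise input, and the 0 to 4000 Hz band are illustrative assumptions, not values from this page.

```matlab
% Illustrative values: fs, the noise input, and the 0-4 kHz band are assumptions.
fs = 44100;
audioIn = randn(fs,1);                 % one second of noise as a stand-in signal

aFE = audioFeatureExtractor("SampleRate",fs,"linearSpectrum",true);

% Restrict the spectrum to 0-4 kHz and request magnitude instead of power.
setExtractorParams(aFE,"linearSpectrum", ...
    "FrequencyRange",[0 4000], ...
    "SpectrumType","magnitude")

features = extract(aFE,audioIn);       % one row per hop, one column per bin
idx = info(aFE);                       % idx.linearSpectrum lists the output columns
```

Calling info after changing the parameters is a convenient way to check how many spectrum columns the new frequency range produces.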

melSpectrum –– Extract the one-sided mel spectrum, specified as true or false.

To set parameters of the mel spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"melSpectrum","Name",Value)
Settable parameters for the mel spectrum extraction are:

  • "FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • "SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • "NumBands" –– Number of mel bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.

  • "Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth" or "area". If unspecified, Normalization defaults to "bandwidth".

Data Types: logical

barkSpectrum –– Extract the one-sided Bark spectrum, specified as true or false.

To set parameters of the Bark spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"barkSpectrum","Name",Value)
Settable parameters for the Bark spectrum extraction are:

  • "FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • "SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • "NumBands" –– Number of Bark bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.

  • "Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth" or "area". If unspecified, Normalization defaults to "bandwidth".

Data Types: logical

erbSpectrum –– Extract the one-sided ERB spectrum, specified as true or false.

To set parameters of the ERB spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"erbSpectrum","Name",Value)
Settable parameters for the ERB spectrum extraction are:

  • "FrequencyRange" –– Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • "SpectrumType" –– Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • "NumBands" –– Number of ERB bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).

  • "Normalization" –– Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "Normalization" and "bandwidth" or "area". If unspecified, Normalization defaults to "bandwidth".

Data Types: logical
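
The default NumBands expression above can be evaluated directly. This sketch assumes a 44.1 kHz sample rate and the default FrequencyRange, and compares the result of the formula with the number of columns that info reports for erbSpectrum.

```matlab
fs = 44100;                            % assumed sample rate
frequencyRange = [0 fs/2];             % default FrequencyRange

% Default number of ERB bands, per the formula in the NumBands description.
numBandsDefault = ceil(hz2erb(frequencyRange(2)) - hz2erb(frequencyRange(1)));

% Compare against the number of columns the extractor reports for erbSpectrum.
aFE = audioFeatureExtractor("SampleRate",fs,"erbSpectrum",true);
idx = info(aFE);
fprintf("Formula: %d bands, extractor: %d columns\n", ...
    numBandsDefault,numel(idx.erbSpectrum))
```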

mfcc –– Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.

To set parameters of the MFCC extraction, use setExtractorParams:

setExtractorParams(aFE,"mfcc","Name",Value)
Settable parameters for the MFCC extraction are:

  • "NumCoeffs" –– Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.

  • "DeltaWindowLength" –– Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and 2 or an odd integer. If unspecified, DeltaWindowLength defaults to 2. This parameter affects the mfccDelta and mfccDeltaDelta features.

The mel-frequency cepstral coefficients are calculated using the melSpectrum.

Data Types: logical
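
Because parameters set on mfcc also govern the delta features, a single setExtractorParams call configures all three. The sketch below uses assumed values (20 coefficients, a delta window of 5, and a noise input) purely for illustration.

```matlab
% Illustrative values: fs, the noise input, NumCoeffs, and DeltaWindowLength
% are assumptions chosen for this sketch.
fs = 44100;
audioIn = randn(2*fs,1);               % two seconds of noise as a stand-in signal

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "mfcc",true, ...
    "mfccDelta",true, ...
    "mfccDeltaDelta",true);

% One call on "mfcc" configures the coefficients and both delta features.
setExtractorParams(aFE,"mfcc","NumCoeffs",20,"DeltaWindowLength",5)

features = extract(aFE,audioIn);
idx = info(aFE);                       % column indices for mfcc, mfccDelta, mfccDeltaDelta
```

Inspecting idx.mfcc, idx.mfccDelta, and idx.mfccDeltaDelta shows how many columns each feature occupies after the change.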

mfccDelta –– Extract delta of MFCC, specified as true or false.

The delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDelta.

Data Types: logical

mfccDeltaDelta –– Extract delta-delta of MFCC, specified as true or false.

The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.

Data Types: logical

gtcc –– Extract gammatone cepstral coefficients (GTCC), specified as true or false.

To set parameters of the GTCC extraction, use setExtractorParams:

setExtractorParams(aFE,"gtcc","Name",Value)
Settable parameters for the GTCC extraction are:

  • "NumCoeffs" –– Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.

  • "DeltaWindowLength" –– Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and 2 or an odd integer. If unspecified, DeltaWindowLength defaults to 2. This parameter affects the gtccDelta and gtccDeltaDelta features.

The gammatone cepstral coefficients are calculated using the erbSpectrum.

Data Types: logical

gtccDelta –– Extract delta of GTCC, specified as true or false.

The delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDelta.

Data Types: logical

gtccDeltaDelta –– Extract delta-delta of GTCC, specified as true or false.

The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.

Data Types: logical

spectralCentroid –– Extract spectral centroid, specified as true or false.

The spectral centroid is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

spectralCrest –– Extract spectral crest, specified as true or false.

The spectral crest is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

spectralDecrease –– Extract spectral decrease, specified as true or false.

The spectral decrease is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

spectralEntropy –– Extract spectral entropy, specified as true or false.

The spectral entropy is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

spectralFlatness –– Extract spectral flatness, specified as true or false.

The spectral flatness is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

spectralFlux –– Extract spectral flux, specified as true or false.

The spectral flux is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

To set parameters of the spectral flux extraction, use setExtractorParams:

setExtractorParams(aFE,"spectralFlux","Name",Value)
Settable parameters for the spectral flux extraction are:

  • "NormType" –– Norm type used to calculate the spectral flux, specified as the comma-separated pair consisting of "NormType" and 1 or 2. If unspecified, NormType defaults to 2.

Data Types: logical

spectralKurtosis –– Extract spectral kurtosis, specified as true or false.

The spectral kurtosis is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

spectralRolloffPoint –– Extract spectral rolloff point, specified as true or false.

The spectral rolloff point is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

To set parameters of the spectral rolloff point extraction, use setExtractorParams:

setExtractorParams(aFE,"spectralRolloffPoint","Name",Value)
Settable parameters for the spectral rolloff point extraction are:

  • "Threshold" –– Threshold of the rolloff point, specified as the comma-separated pair consisting of "Threshold" and a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.

Data Types: logical
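
To see the effect of Threshold, one option is to extract the rolloff point twice from the same signal, once with the default and once with a lower value. The 0.85 threshold below is an illustrative assumption; the audio file is the one used elsewhere on this page.

```matlab
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

aFE = audioFeatureExtractor("SampleRate",fs,"spectralRolloffPoint",true);
rolloff95 = extract(aFE,audioIn);      % default Threshold of 0.95

% Lower the threshold (0.85 is an arbitrary illustrative value) and re-extract.
setExtractorParams(aFE,"spectralRolloffPoint","Threshold",0.85)
rolloff85 = extract(aFE,audioIn);

% Plot both estimates against the hop index for comparison.
plot([rolloff95 rolloff85])
legend("Threshold = 0.95","Threshold = 0.85")
xlabel("Hop")
ylabel("Rolloff point (Hz)")
```

A lower threshold captures a smaller share of the spectral energy, so its rolloff point should not exceed the one obtained with the default.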

spectralSkewness –– Extract spectral skewness, specified as true or false.

The spectral skewness is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

spectralSlope –– Extract spectral slope, specified as true or false.

The spectral slope is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

spectralSpread –– Extract spectral spread, specified as true or false.

The spectral spread is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

pitch –– Extract pitch, specified as true or false.

To set parameters of the pitch extraction, use setExtractorParams:

setExtractorParams(aFE,"pitch","Name",Value)
Settable parameters for the pitch extraction are:

  • "Method" –– Method used to calculate the pitch, specified as the comma-separated pair consisting of "Method" and "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see pitch.

  • "Range" –– Range within which to search for the pitch in Hz, specified as the comma-separated pair consisting of "Range" and a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].

  • "MedianFilterLength" –– Median filter length used to smooth pitch estimates over time, specified as the comma-separated pair consisting of "MedianFilterLength" and a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).

Data Types: logical
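
The three pitch parameters can be combined in one call. The following sketch assumes a speech recording (the file used elsewhere on this page) and picks the PEF method, a 60 to 300 Hz search range, and a three-point median filter purely as illustrative values.

```matlab
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

% 30 ms analysis window with 20 ms overlap, as in the example on this page.
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "Window",hamming(round(0.03*fs),"periodic"), ...
    "OverlapLength",round(0.02*fs), ...
    "pitch",true);

% Illustrative settings: method, search range, and smoothing are assumptions.
setExtractorParams(aFE,"pitch", ...
    "Method","PEF", ...
    "Range",[60 300], ...
    "MedianFilterLength",3)

f0 = extract(aFE,audioIn);             % one pitch estimate (Hz) per hop
```

Narrowing Range to the expected pitch of the speaker and applying a short median filter are common ways to reduce outliers in the estimate, at the cost of some time resolution.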

harmonicRatio –– Extract harmonic ratio, specified as true or false.

Data Types: logical

Object Functions

extract               Extract audio features
setExtractorParams    Set nondefault parameter values for individual feature extractors
info                  Output mapping and individual feature extractor parameters

Examples

Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, and spectral centroid of an audio signal. Use a 30 ms analysis window with 20 ms overlap.

aFE = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",hamming(round(0.03*fs),"periodic"), ...
    "OverlapLength",round(0.02*fs), ...
    "mfcc",true, ...
    "mfccDelta",true, ...
    "mfccDeltaDelta",true, ...
    "pitch",true, ...
    "spectralCentroid",true);

Call extract to extract the audio features from the audio signal.

features = extract(aFE,audioIn);

Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.

idx = info(aFE)
idx = struct with fields:
                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41

Plot the detected pitch over time.

t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title('Pitch')
xlabel('Time (s)')
ylabel('Frequency (Hz)')

Create an audio datastore that points to audio samples included with Audio Toolbox®.

folder = fullfile(matlabroot,'toolbox','audio','samples');
ads = audioDatastore(folder);

Find all files that correspond to a sample rate of 44.1 kHz and then subset the datastore.

keepFile = cellfun(@(x)contains(x,'44p1'),ads.Files);
ads = subset(ads,keepFile);

Convert the data to a tall array. Tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple machines. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.

adsTall = tall(ads)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).

adsTall =

  Mx1 tall cell array

    { 539648x1 double}
    { 227497x1 double}
    {   8000x1 double}
    { 685056x1 double}
    { 882688x2 double}
    {1116283x2 double}
    { 505726x2 double}
    {3195904x2 double}
        :         :
        :         :

Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.

aFE = audioFeatureExtractor('SampleRate',44.1e3, ...
    'melSpectrum',true, ...
    'barkSpectrum',true, ...
    'erbSpectrum',true, ...
    'linearSpectrum',true);

Define a cellfun function so that audio features are extracted from each cell of the tall array. Call gather to evaluate the tall array.

specsTall = cellfun(@(x)extract(aFE,x),adsTall,"UniformOutput",false);
specs = gather(specsTall);
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 14 sec
Evaluation completed in 14 sec

The specs variable returned from gather is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the requested number of features from the audio data.

numFiles = numel(specs)
numFiles = 12
[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})
numHops1 = 1053
numFeaturesFile1 = 620
numChannelsFile1 = 1
[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})
numHops2 = 443
numFeaturesFile2 = 620
numChannelsFile2 = 1
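
The reported sizes can be checked by hand. The sketch below takes the signal lengths, window length, and overlap length from the output shown on this page, and assumes that the extractor produces one hop per complete analysis window with no padding of the final partial window. That assumption is not stated here, but it reproduces the reported hop counts.

```matlab
% Values below are copied from the example output on this page.
winLength     = 1024;                  % length of the default Window
overlapLength = 512;                   % default OverlapLength
numSamples    = [539648 227497];       % lengths of the first two files
fs            = 44.1e3;

% Assumption: one hop per complete analysis window, no padding at the end.
hopLength = winLength - overlapLength;
numHops = floor((numSamples - winLength)/hopLength) + 1;

% Columns contributed by each enabled spectrum.
numLinear = winLength/2 + 1;                 % one-sided bins when FFTLength is []
numMel    = 32;                              % default NumBands
numBark   = 32;                              % default NumBands
numERB    = ceil(hz2erb(fs/2) - hz2erb(0));  % default NumBands for [0, fs/2]
numFeatures = numLinear + numMel + numBark + numERB;

disp(numHops)        % compare with the reported 1053 and 443
disp(numFeatures)    % compare with the reported 620
```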

Algorithms

The audioFeatureExtractor creates a feature extraction pipeline based on your selected features. To reduce computations, audioFeatureExtractor reuses intermediary representations. Some intermediary representations, such as the mel spectrum used to calculate the MFCC and the ERB spectrum used to calculate the GTCC, can also be output as features.

For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify the audioFeatureExtractor as:

 aFE = audioFeatureExtractor( ...
     "SpectralDescriptorInput","barkSpectrum", ...
     "spectralCentroid",true, ...
     "spectralFlux",true, ...
     "pitch",true, ...
     "harmonicRatio",true, ...
     "mfccDeltaDelta",true)
aFE = 

  audioFeatureExtractor with properties:

   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'

   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread


   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.
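In this configuration, the feature extraction pipeline computes the Bark spectrum as the input to the spectral centroid and spectral flux, and computes the mel spectrum and the MFCC as intermediary steps toward the delta-delta MFCC, even though barkSpectrum, melSpectrum, and mfcc are not enabled as output features.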

Note

Because audioFeatureExtractor reuses intermediary representations, the features output from audioFeatureExtractor may not correspond with the default configuration of features output by corresponding individual feature extractors.

Introduced in R2019b