audioFeatureExtractor

Streamline audio feature extraction

Description

audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.

Creation

Syntax

aFE = audioFeatureExtractor()

aFE = audioFeatureExtractor(Name=Value)

Description

aFE = audioFeatureExtractor() creates an audio feature extractor with default property values.

aFE = audioFeatureExtractor(Name=Value) specifies nondefault properties for aFE using one or more name-value arguments.

Properties

Main Properties

`Window` — Analysis window
`hamming(1024,"periodic")` (default) | real vector

Analysis window, specified as a real vector.

Data Types: single | double

`OverlapLength` — Overlap length of adjacent analysis windows
`512` (default) | integer in the range [0, `numel(Window)`)

Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).

Data Types: single | double
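
The relationship between Window, OverlapLength, and the number of analysis frames can be sketched as follows. This is a minimal illustration on a synthetic noise signal; the hop-count formula assumes that only complete windows are analyzed (no padding of the final partial frame), which is an assumption rather than something this page states.

```matlab
% Sketch: relate Window and OverlapLength to the number of analysis frames.
fs = 44100;                          % sample rate in Hz
audioIn = randn(2*fs,1);             % 2 s of synthetic noise (illustrative input)
win = hamming(1024,"periodic");      % same as the default analysis window
overlapLength = 512;                 % same as the default overlap length

aFE = audioFeatureExtractor(SampleRate=fs, ...
    Window=win, ...
    OverlapLength=overlapLength, ...
    shortTimeEnergy=true);
features = extract(aFE,audioIn);

% Samples between the starts of adjacent windows.
hopLength = numel(win) - overlapLength;

% Expected frame count, assuming no padding of a trailing partial frame.
expectedHops = floor((size(audioIn,1) - numel(win))/hopLength) + 1;
actualHops = size(features,1);
```

Comparing actualHops with expectedHops is a quick way to check the assumption for your release.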

`FFTLength` — FFT length
`[]` (default) | positive integer

FFT length, specified as a positive integer. The default value, [], sets the FFT length equal to the window length, numel(Window).

Data Types: single | double

`SampleRate` — Input sample rate (Hz)
`44100` (default) | positive scalar

Input sample rate in Hz, specified as a positive scalar.

Data Types: single | double

`SpectralDescriptorInput` — Input to spectral descriptors
`"linearSpectrum"` (default) | `"melSpectrum"` | `"barkSpectrum"` | `"erbSpectrum"`

Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

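Spectral descriptors affected by this property are:

spectralCentroid
spectralCrest
spectralDecrease
spectralEntropy
spectralFlatness
spectralFlux
spectralKurtosis
spectralRolloffPoint
spectralSkewness
spectralSlope
spectralSpread

The spectrum input to the spectral descriptors is the same as the output from the corresponding feature:

linearSpectrum
melSpectrum
barkSpectrum
erbSpectrum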

For example, if you set SpectralDescriptorInput to "barkSpectrum", and spectralCentroid to true, then aFE returns the centroid of the default Bark spectrum.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
aFE = audioFeatureExtractor(SampleRate=fs, ...
                            SpectralDescriptorInput="barkSpectrum", ...
                            spectralCentroid=true);
barkSpectralCentroid = extract(aFE,audioIn);

If you specify a nondefault barkSpectrum using setExtractorParameters, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParameters(aFE,"barkSpectrum",NumBands=40), then aFE returns the centroid of a 40-band Bark spectrum.

setExtractorParameters(aFE,"barkSpectrum",NumBands=40)
bark40SpectralCentroid = extract(aFE,audioIn);

Data Types: char | string

`FeatureVectorLength` — Number of features output from extract
Read-only: positive integer

This property is read-only.

Total number of features output from extract for the current object configuration, specified as a positive integer. FeatureVectorLength is equal to the second dimension of the output from the extract function.

Data Types: single | double
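
As a rough illustration of how FeatureVectorLength tracks the enabled features, the following sketch enables the MFCC and spectral centroid features with their default parameters. The synthetic noise input is an assumption made only to keep the example self-contained.

```matlab
% Sketch: FeatureVectorLength matches the second dimension of the extract output.
aFE = audioFeatureExtractor(mfcc=true,spectralCentroid=true);

% 13 MFCCs (default NumCoeffs) plus 1 spectral centroid value per frame.
numFeatures = aFE.FeatureVectorLength;

audioIn = randn(44100,1);            % 1 s of synthetic noise at the default sample rate
features = extract(aFE,audioIn);
numColumns = size(features,2);       % should equal numFeatures
```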

Features to Extract

`linearSpectrum` — Extract linear spectrum
`false` (default) | `true`

Extract the one-sided linear spectrum, specified as true or false.

To set parameters of the linear spectrum extraction, use setExtractorParameters:

setExtractorParameters(aFE,"linearSpectrum",Name=Value)

Settable parameters for the linear spectrum extraction are:

FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
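
A minimal sketch of setting these parameters is shown below. The sample rate and frequency range are illustrative values, not recommendations.

```matlab
% Sketch: restrict the linear spectrum to a band and return magnitudes.
fs = 16000;                                      % illustrative sample rate in Hz
aFE = audioFeatureExtractor(SampleRate=fs,linearSpectrum=true);

numBinsFull = aFE.FeatureVectorLength;           % bins over the default range [0, fs/2]

setExtractorParameters(aFE,"linearSpectrum", ...
    FrequencyRange=[100 4000], ...               % keep only 100 Hz to 4 kHz
    SpectrumType="magnitude", ...
    WindowNormalization=false)

numBinsBand = aFE.FeatureVectorLength;           % fewer bins after band-limiting
```

Because the extracted spectrum is limited to FrequencyRange, narrowing the range reduces the number of features per frame.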

Data Types: logical

`melSpectrum` — Extract mel spectrum
`false` (default) | `true`

Extract the one-sided mel spectrum, specified as true or false.

To set parameters of the mel spectrum extraction, use setExtractorParameters:

setExtractorParameters(aFE,"melSpectrum",Name=Value)

Settable parameters for the mel spectrum extraction are:

FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands –– Number of mel bands, specified as an integer. If unspecified, NumBands defaults to 32.
FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
FilterBankDesignDomain –– Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
MelStyle –– Style of the mel scale used, specified as either "oshaughnessy" or "slaney". If unspecified, MelStyle defaults to "oshaughnessy".
ApplyLog –– Apply base 10 logarithm to the auditory spectrum, specified as true or false. If unspecified, ApplyLog defaults to false.
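
For instance, a log-scaled 64-band mel spectrum on the Slaney-style scale could be configured as in this sketch. The specific values are illustrative; MelStyle and ApplyLog require the releases noted in the version history.

```matlab
% Sketch: configure a nondefault mel spectrum.
aFE = audioFeatureExtractor(SampleRate=16000,melSpectrum=true);

setExtractorParameters(aFE,"melSpectrum", ...
    NumBands=64, ...                 % 64 mel bands instead of the default 32
    MelStyle="slaney", ...           % Slaney-style mel scale
    ApplyLog=true)                   % base 10 logarithm of the mel spectrum

numBands = aFE.FeatureVectorLength;  % one feature per mel band

audioIn = randn(16000,1);            % 1 s of synthetic noise (illustrative input)
logMel = extract(aFE,audioIn);       % numHops-by-64 matrix
```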

Data Types: logical

`barkSpectrum` — Extract Bark spectrum
`false` (default) | `true`

Extract the one-sided Bark spectrum, specified as true or false.

To set parameters of the Bark spectrum extraction, use setExtractorParameters:

setExtractorParameters(aFE,"barkSpectrum",Name=Value)

Settable parameters for the Bark spectrum extraction are:

FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands –– Number of Bark bands, specified as an integer. If unspecified, NumBands defaults to 32.
FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
FilterBankDesignDomain –– Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
ApplyLog –– Apply base 10 logarithm to the auditory spectrum, specified as true or false. If unspecified, ApplyLog defaults to false.

Data Types: logical

`erbSpectrum` — Extract ERB spectrum
`false` (default) | `true`

Extract the one-sided ERB spectrum, specified as true or false.

To set parameters of the ERB spectrum extraction, use setExtractorParameters:

setExtractorParameters(aFE,"erbSpectrum",Name=Value)

Settable parameters for the ERB spectrum extraction are:

FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands –– Number of ERB bands, specified as an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).
FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
ApplyLog –– Apply base 10 logarithm to the auditory spectrum, specified as true or false. If unspecified, ApplyLog defaults to false.
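
The default NumBands expression can be evaluated directly, as in this sketch for the default sample rate. For a 44.1 kHz sample rate the expression evaluates to 43, which is consistent with the 620 features (32 mel + 32 Bark + 43 ERB + 513 linear) reported in the data set example on this page.

```matlab
% Sketch: evaluate the default number of ERB bands for the default frequency range.
fs = 44100;
frequencyRange = [0 fs/2];           % default FrequencyRange

defaultNumBands = ceil(hz2erb(frequencyRange(2)) - hz2erb(frequencyRange(1)));

aFE = audioFeatureExtractor(SampleRate=fs,erbSpectrum=true);
numFeatures = aFE.FeatureVectorLength;   % one feature per ERB band
```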

Data Types: logical

`mfcc` — Extract mel-frequency cepstral coefficients (MFCC)
`false` (default) | `true`

Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.

To set parameters of the MFCC extraction, use setExtractorParameters:

setExtractorParameters(aFE,"mfcc",Name=Value)

Settable parameters for the MFCC extraction are:

NumCoeffs –– Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
DeltaWindowLength –– Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the mfccDelta and mfccDeltaDelta features.
Rectification –– Type of nonlinear rectification, specified as "log" or "cubic-root".

The mel-frequency cepstral coefficients are calculated using the melSpectrum.
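
As a sketch of how parameters set on mfcc carry over to the delta features, the following configuration requests 20 coefficients and a shorter delta window. The values are illustrative.

```matlab
% Sketch: parameters set on "mfcc" also shape mfccDelta and mfccDeltaDelta.
aFE = audioFeatureExtractor(mfcc=true,mfccDelta=true,mfccDeltaDelta=true);

setExtractorParameters(aFE,"mfcc", ...
    NumCoeffs=20, ...                % 20 coefficients per frame
    DeltaWindowLength=5)             % odd integer greater than 2

idx = info(aFE);                     % column indices of each feature
numMfcc = numel(idx.mfcc);
numDelta = numel(idx.mfccDelta);
numDeltaDelta = numel(idx.mfccDeltaDelta);
```

Each of the three features then occupies 20 columns of the output of extract.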

Data Types: logical

`mfccDelta` — Extract delta of MFCC
`false` (default) | `true`

Extract delta of MFCC, specified as true or false.

The delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDelta.

Data Types: logical

`mfccDeltaDelta` — Extract delta-delta of MFCC
`false` (default) | `true`

Extract delta-delta of MFCC, specified as true or false.

The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.

Data Types: logical

`gtcc` — Extract gammatone cepstral coefficients (GTCC)
`false` (default) | `true`

Extract gammatone cepstral coefficients (GTCC), specified as true or false.

To set parameters of the GTCC extraction, use setExtractorParameters:

setExtractorParameters(aFE,"gtcc",Name=Value)

Settable parameters for the GTCC extraction are:

NumCoeffs –– Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
DeltaWindowLength –– Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the gtccDelta and gtccDeltaDelta features.
Rectification –– Type of nonlinear rectification, specified as "log" or "cubic-root".

The gammatone cepstral coefficients are calculated using the erbSpectrum.

Data Types: logical

`gtccDelta` — Extract delta of GTCC
`false` (default) | `true`

Extract delta of GTCC, specified as true or false.

The delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDelta.

Data Types: logical

`gtccDeltaDelta` — Extract delta-delta of GTCC
`false` (default) | `true`

Extract delta-delta of GTCC, specified as true or false.

The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.

Data Types: logical

`spectralCentroid` — Extract spectral centroid
`false` (default) | `true`

Extract spectral centroid, specified as true or false.

The spectral centroid is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralCrest` — Extract spectral crest
`false` (default) | `true`

Extract spectral crest, specified as true or false.

The spectral crest is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralDecrease` — Extract spectral decrease
`false` (default) | `true`

Extract spectral decrease, specified as true or false.

The spectral decrease is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralEntropy` — Extract spectral entropy
`false` (default) | `true`

Extract spectral entropy, specified as true or false.

The spectral entropy is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralFlatness` — Extract spectral flatness
`false` (default) | `true`

Extract spectral flatness, specified as true or false.

The spectral flatness is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralFlux` — Extract spectral flux
`false` (default) | `true`

Extract spectral flux, specified as true or false.

The spectral flux is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

To set parameters of the spectral flux extraction, use setExtractorParameters:

setExtractorParameters(aFE,"spectralFlux",Name=Value)

Settable parameters for the spectral flux extraction are:

NormType –– Norm type used to calculate the spectral flux, specified as 1 or 2. If unspecified, NormType defaults to 2.
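
A minimal sketch of switching the spectral flux to the 1-norm follows. The synthetic input is an assumption used only to make the example self-contained.

```matlab
% Sketch: compute spectral flux with the 1-norm instead of the default 2-norm.
aFE = audioFeatureExtractor(SampleRate=16000,spectralFlux=true);
setExtractorParameters(aFE,"spectralFlux",NormType=1)

audioIn = randn(16000,1);            % 1 s of synthetic noise (illustrative input)
flux = extract(aFE,audioIn);         % one flux value per analysis frame
numFluxColumns = size(flux,2);
```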

Data Types: logical

`spectralKurtosis` — Extract spectral kurtosis
`false` (default) | `true`

Extract spectral kurtosis, specified as true or false.

The spectral kurtosis is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralRolloffPoint` — Extract spectral rolloff point
`false` (default) | `true`

Extract spectral rolloff point, specified as true or false.

The spectral rolloff point is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

To set parameters of the spectral rolloff point extraction, use setExtractorParameters:

setExtractorParameters(aFE,"spectralRolloffPoint",Name=Value)

Settable parameters for the spectral rolloff point extraction are:

Threshold –– Threshold of the rolloff point, specified as a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.
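
The following sketch lowers the rolloff threshold from the default 0.95 to 0.85. The threshold value and the synthetic input are illustrative assumptions.

```matlab
% Sketch: use a nondefault rolloff threshold.
fs = 16000;
aFE = audioFeatureExtractor(SampleRate=fs,spectralRolloffPoint=true);
setExtractorParameters(aFE,"spectralRolloffPoint",Threshold=0.85)

audioIn = randn(fs,1);               % 1 s of synthetic noise (illustrative input)
rolloff = extract(aFE,audioIn);      % one rolloff value per analysis frame
numRolloffColumns = size(rolloff,2);
```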

Data Types: logical

`spectralSkewness` — Extract spectral skewness
`false` (default) | `true`

Extract spectral skewness, specified as true or false.

The spectral skewness is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralSlope` — Extract spectral slope
`false` (default) | `true`

Extract spectral slope, specified as true or false.

The spectral slope is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`spectralSpread` — Extract spectral spread
`false` (default) | `true`

Extract spectral spread, specified as true or false.

The spectral spread is calculated on the linear spectrum, mel spectrum, Bark spectrum, or ERB spectrum, as specified by the SpectralDescriptorInput property.

Data Types: logical

`pitch` — Extract pitch
`false` (default) | `true`

Extract pitch, specified as true or false.

To set parameters of the pitch extraction, use setExtractorParameters:

setExtractorParameters(aFE,"pitch",Name=Value)

Settable parameters for the pitch extraction are:

Method –– Method used to calculate the pitch, specified as "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see pitch.
Range –– Range within which to search for the pitch, in Hz, specified as a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].
MedianFilterLength –– Median filter length used to smooth pitch estimates over time, specified as a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).
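
As a sketch, the pitch search range can be narrowed and the estimates smoothed as follows. The 30 ms window with 20 ms overlap mirrors the first example on this page; the synthetic 200 Hz harmonic tone and the parameter values are illustrative assumptions.

```matlab
% Sketch: narrow the pitch search range and smooth the estimates.
fs = 44100;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*200*t) + 0.5*sin(2*pi*400*t);   % synthetic 200 Hz harmonic tone

aFE = audioFeatureExtractor(SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    pitch=true);

setExtractorParameters(aFE,"pitch", ...
    Range=[60 350], ...              % search between 60 Hz and 350 Hz
    MedianFilterLength=3)            % median filter over 3 consecutive estimates

f0 = extract(aFE,audioIn);           % one pitch estimate per analysis frame
numPitchColumns = size(f0,2);
```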

Data Types: logical

`harmonicRatio` — Extract harmonic ratio
`false` (default) | `true`

Extract harmonic ratio, specified as true or false.

Data Types: logical

`zerocrossrate` — Extract zero-crossing rate
`false` (default) | `true`

Extract zero-crossing rate, specified as true or false.

To set parameters of the zero-crossing rate extraction, use setExtractorParameters:

setExtractorParameters(aFE,"zerocrossrate",Name=Value)

Settable parameters for the zero-crossing rate extraction are:

Method –– Method for computing the zero-crossing rate, specified as "difference" or "comparison". If unspecified, Method defaults to "difference". For more information, see zerocrossrate.
Level –– Signal level for which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor subtracts the Level value from the signal and then finds the zero crossings. If unspecified, Level defaults to 0.
Threshold –– Threshold above and below the Level value over which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor sets all the values of the input in the range [–Threshold, Threshold] to 0 and then finds the zero crossings. If unspecified, Threshold defaults to 0.
TransitionEdge –– Transitions to include when counting zero crossings, specified as "falling", "rising", or "both". If you specify "falling", only negative-going transitions are counted. If you specify "rising", only positive-going transitions are counted. If unspecified, TransitionEdge defaults to "both".
ZeroPositive –– Sign convention, specified as a logical scalar. If you specify ZeroPositive as true, then 0 is considered positive. If you specify ZeroPositive as false, then audioFeatureExtractor considers 0, –1, and +1 to have distinct signs following the convention of the sign function. If unspecified, ZeroPositive defaults to false.
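
A minimal sketch of counting only rising crossings, while ignoring small fluctuations around zero, is shown below. The threshold value and the synthetic input are illustrative assumptions.

```matlab
% Sketch: count only positive-going crossings and ignore small fluctuations.
fs = 16000;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*100*t) + 0.001*randn(fs,1);   % 100 Hz tone with low-level noise

aFE = audioFeatureExtractor(SampleRate=fs,zerocrossrate=true);
setExtractorParameters(aFE,"zerocrossrate", ...
    Threshold=0.01, ...              % values in [-0.01, 0.01] are treated as 0
    TransitionEdge="rising")         % count only positive-going transitions

zcr = extract(aFE,audioIn);          % one rate value per analysis frame
numZcrColumns = size(zcr,2);
```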

Data Types: logical

`shortTimeEnergy` — Extract short-time energy
`false` (default) | `true`

Extract short-time energy, specified as true or false. The short-time energy is computed using

sTE = sum(xbw.^2,1),

where xbw is the buffered and windowed signal.
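
The formula can be worked through by hand, as in the following sketch. The framing loop is a hypothetical stand-in for the internal buffering of audioFeatureExtractor: it assumes that only complete frames are used, so the result may differ from the output of extract at the signal boundaries.

```matlab
% Sketch: compute short-time energy manually from buffered, windowed frames.
fs = 16000;
x = sin(2*pi*440*(0:fs-1)'/fs);      % 1 s, 440 Hz tone (illustrative input)
win = hamming(512,"periodic");       % analysis window
overlapLength = 256;
hopLength = numel(win) - overlapLength;

% Number of complete frames (assumes no padding of a trailing partial frame).
numHops = floor((numel(x) - numel(win))/hopLength) + 1;

sTE = zeros(numHops,1);
for k = 1:numHops
    frameIdx = (k-1)*hopLength + (1:numel(win));
    xbw = x(frameIdx).*win;          % buffered and windowed frame
    sTE(k) = sum(xbw.^2,1);          % short-time energy of frame k
end
```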

Data Types: logical

Object Functions

`extract`	Extract audio features
`setExtractorParameters`	Set nondefault parameter values for individual feature extractors
`info`	Output mapping and individual feature extractor parameters
`generateMATLABFunction`	Create MATLAB function compatible with C/C++ code generation
`plotFeatures`	Plot extracted audio features

Examples

Extract Multiple Audio Features

Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, spectral centroid, zero-crossing rate, and short-time energy of the signal. Use a 30 ms analysis window with 20 ms overlap.

aFE = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true, ...
    pitch=true, ...
    spectralCentroid=true, ...
    zerocrossrate=true, ...
    shortTimeEnergy=true);

Call extract to extract the audio features from the audio signal.

features = extract(aFE,audioIn);

Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.

idx = info(aFE)

idx = struct with fields:
                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41
       zerocrossrate: 42
     shortTimeEnergy: 43

Plot the detected pitch over time.

t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title("Pitch")
xlabel("Time (s)")
ylabel("Frequency (Hz)")

[Figure: line plot titled "Pitch"; x-axis Time (s), y-axis Frequency (Hz).]

Plot the zero-crossing rate over time.

plot(t,features(:,idx.zerocrossrate))
title("Zero-Crossing Rate")
xlabel("Time (s)")

[Figure: line plot titled "Zero-Crossing Rate"; x-axis Time (s).]

Plot the short-time energy over time.

plot(t,features(:,idx.shortTimeEnergy))
title("Short-Time Energy")
xlabel("Time (s)")

[Figure: line plot titled "Short-Time Energy"; x-axis Time (s).]

Extract Features from Data Set

Create an audio datastore that points to audio samples included with Audio Toolbox™.

folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);

Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.

aFE = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);

Call extract to extract the features from each audio file in the datastore. Specify SampleRateMismatchRule as "resample" to resample the audio files in the datastore if they do not match 44.1 kHz, the sample rate of the audioFeatureExtractor object. If you have Parallel Computing Toolbox™, specify UseParallel as true to read the files and extract the features in parallel.

specs = extract(aFE,ads,SampleRateMismatchRule="resample",UseParallel=true);

Starting parallel pool (parpool) using the 'Processes' profile ...
17-Dec-2024 09:28:59: Job Queued. Waiting for parallel pool job with ID 3 to start ...
Connected to parallel pool with 4 workers.

The specs variable is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array. The number of hops and the number of channels depend on the length and channel count of the audio file, and the number of features is the total number of features requested from the audio data.

numFiles = numel(specs)

numFiles = 
39

[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})

numHops1 = 
1053

numFeaturesFile1 = 
620

numChannelsFile1 = 
1

[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})

numHops2 = 
1724

numFeaturesFile2 = 
620

numChannelsFile2 = 
4

Visualize Extracted Audio Features

Use plotFeatures to visualize audio features extracted with an audioFeatureExtractor object.

Read in an audio signal from a file.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the gammatone cepstral coefficients (GTCCs) and the delta of the GTCCs. Set the SampleRate property to the sample rate of the audio signal, and use the default values for the other properties.

afe = audioFeatureExtractor(SampleRate=fs,gtcc=true,gtccDelta=true);

Plot the features extracted from the audio signal.

plotFeatures(afe,audioIn)

[Figure: two image plots titled "GTCC" and "GTCC Delta"; x-axis Time (s), y-axis Coefficient.]

Algorithms

audioFeatureExtractor creates a feature extraction pipeline based on your selected features. To reduce computation, it reuses intermediate representations and outputs some of them as features.

For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify the audioFeatureExtractor as follows.

aFE = audioFeatureExtractor( ...
     SpectralDescriptorInput="barkSpectrum", ...
     spectralCentroid=true, ...
     spectralFlux=true, ...
     pitch=true, ...
     harmonicRatio=true, ...
     mfccDeltaDelta=true)

aFE = 

  audioFeatureExtractor with properties:

   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'

   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread


   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

This configuration corresponds to the highlighted feature extraction pipeline.

Note

Because audioFeatureExtractor reuses intermediary representations, the features output from audioFeatureExtractor might not correspond with the default configuration of features output by corresponding individual feature extractors.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

You cannot generate code directly from audioFeatureExtractor. You can generate C/C++ code from the function returned by generateMATLABFunction.
Functions returned by generateMATLABFunction that compute an auditory spectrum (mel, Bark, ERB) support optimized code generation using single instruction, multiple data (SIMD) instructions. For more information about SIMD code generation, see Generate SIMD Code from MATLAB Functions for Intel Platforms (MATLAB Coder).
zerocrossrate code generation does not support disabling dynamic memory allocation when the input is multichannel.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2019b

R2024b: `setExtractorParams` object function and `Normalization` parameter have been removed

The setExtractorParams object function has been removed. Use setExtractorParameters instead.

The Normalization parameter of the melSpectrum, barkSpectrum, and erbSpectrum features has been removed. Use the FilterBankNormalization parameter for these features instead.

R2024a: Apply logarithm to auditory spectrum

Use setExtractorParameters to set the ApplyLog parameter of the melSpectrum, barkSpectrum, and erbSpectrum features to true to apply a base 10 logarithm to the auditory spectrum.

R2024a: `Normalization` parameter of auditory spectrum features will be removed

Using the Normalization parameter of the melSpectrum, barkSpectrum, and erbSpectrum issues a warning that it will be removed in a future release. Use the FilterBankNormalization parameter for these features instead.

R2023b: Support for Slaney-style mel scale

Use setExtractorParameters to set the MelStyle parameter of the melSpectrum feature to "slaney" to use the Slaney-style mel scale.

R2023a: Generate optimized C/C++ code for computing auditory spectrum

Functions returned by generateMATLABFunction that compute an auditory spectrum (mel, Bark, ERB) support optimized C/C++ code generation using single instruction, multiple data (SIMD) instructions.

R2022b: Visualize extracted features

Use the plotFeatures object function to visualize extracted audio features.

R2020b: Computation of deltas and delta-deltas

The audioDelta function is now used to compute mfccDelta, mfccDeltaDelta, gtccDelta, and gtccDeltaDelta. The audioDelta algorithm has a different startup behavior than the previous algorithm. The default window length used to compute the deltas has changed from 2 to 9. A delta window length of 2 is no longer supported.
