Extract gammatone cepstral coefficients, log-energy, delta, and delta-delta

`coeffs = gtcc(___,Name,Value)` specifies options using one or more `Name,Value` pair arguments.

`[coeffs,delta,deltaDelta,loc] = gtcc(___)` returns the delta, delta-delta, and location in samples corresponding to each window of data. This output syntax can be used with any of the previous input syntaxes.

Get the gammatone cepstral coefficients for an audio file using default settings. Plot the results.

    [audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
    [coeffs,~,~,loc] = gtcc(audioIn,fs);

    t = loc./fs;
    plot(t,coeffs)
    xlabel('Time (s)')
    title('Gammatone Cepstral Coefficients')
    legend('logE','0','1','2','3','4','5','6','7','8','9','10','11','12', ...
        'Location','northeastoutside')

Read in an audio file.

`[audioIn,fs] = audioread('Turbine-16-44p1-mono-22secs.wav');`

Calculate 20 GTCC using filters equally spaced on the ERB scale between `hz2erb(62.5)` and `hz2erb(12000)`. Calculate the coefficients using 50 ms windows with 25 ms overlap. Replace the 0th coefficient with the log-energy. Use time-domain filtering.

    [coeffs,~,~,loc] = gtcc(audioIn,fs, ...
        'NumCoeffs',20, ...
        'FrequencyRange',[62.5,12000], ...
        'WindowLength',round(0.05*fs), ...
        'OverlapLength',round(0.025*fs), ...
        'LogEnergy','Replace', ...
        'FilterDomain','Time');

Plot the results.

    t = loc./fs;
    plot(t,coeffs)
    xlabel('Time (s)')
    title('Gammatone Cepstral Coefficients')
    legend('logE','1','2','3','4','5','6','7','8','9','10','11','12','13', ...
        '14','15','16','17','18','19','Location','northeastoutside');

Read in an audio file and convert it to a frequency representation.

    [audioIn,fs] = audioread("Rainbow-16-8-mono-114secs.wav");
    win = hann(1024,"periodic");
    S = stft(audioIn,"Window",win,"OverlapLength",512,"Centered",false);

To extract the gammatone cepstral coefficients, call `gtcc` with the frequency-domain audio. Ignore the log-energy.

coeffs = gtcc(S,fs,"LogEnergy","Ignore");

In many applications, GTCC observations are converted to summary statistics for use in classification tasks. Plot probability density functions of each of the gammatone cepstral coefficients to observe their distributions.

    nbins = 60;
    for i = 1:size(coeffs,2)
        figure
        histogram(coeffs(:,i),nbins,'Normalization','pdf')
        title(sprintf("Coefficient %d",i-1))
    end

`audioIn` – Input signal

vector | matrix | 3-D array

Input signal, specified as a vector, matrix, or 3-D array.

- If `'FilterDomain'` is set to `'Frequency'` (default), then `audioIn` can be real or complex.
  - If `audioIn` is real, it is interpreted as a time-domain signal and must be a column vector or a matrix. Columns of the matrix are treated as independent audio channels.
  - If `audioIn` is complex, it is interpreted as a frequency-domain signal. In this case, `audioIn` must be an *L*-by-*M*-by-*N* array, where *L* is the number of DFT points, *M* is the number of individual spectrums, and *N* is the number of individual channels.
- If `'FilterDomain'` is set to `'Time'`, then `audioIn` must be a real column vector or matrix. Columns of the matrix are treated as independent audio channels.

**Data Types:** `single` | `double`

**Complex Number Support:** Yes

`fs` – Sample rate (Hz)

positive scalar

Sample rate of the input signal in Hz, specified as a positive scalar.

**Data Types:** `single` | `double`

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `coeffs = gtcc(audioIn,fs,'LogEnergy','Replace')` returns gammatone cepstral coefficients for the audio input signal sampled at `fs` Hz. For each analysis window, the first coefficient in the `coeffs` vector is replaced with the log energy of the input signal.

`'WindowLength'` – Number of samples in analysis window

`round(0.03*fs)` (default) | positive scalar integer

Number of samples in the analysis window, specified as the comma-separated pair consisting of `'WindowLength'` and a positive scalar integer. If unspecified, `WindowLength` defaults to `round(0.03*fs)`.

`'OverlapLength'` – Number of samples overlapped between adjacent windows

`round(0.02*fs)` (default) | non-negative scalar

Number of samples overlapped between adjacent windows, specified as the comma-separated pair consisting of `'OverlapLength'` and an integer in the range [0, `WindowLength`). If unspecified, `OverlapLength` defaults to `round(0.02*fs)`.

**Data Types:** `single` | `double`

`'NumCoeffs'` – Number of coefficients returned

`13` (default) | positive scalar integer

Number of coefficients returned for each window of data, specified as the comma-separated pair consisting of `'NumCoeffs'` and an integer in the range [2, *v*], where *v* is the number of valid passbands. If unspecified, `NumCoeffs` defaults to `13`.

The number of valid passbands is defined as the number of ERB steps (ERB<sub>N</sub>) in the frequency range of the filter bank. The frequency range of the filter bank is specified by `FrequencyRange`.

**Data Types:** `single` | `double`

`'FilterDomain'` – Domain in which to apply filtering

`'Frequency'` (default) | `'Time'`

Domain in which to apply filtering, specified as the comma-separated pair consisting of `'FilterDomain'` and `'Frequency'` or `'Time'`. If unspecified, `FilterDomain` defaults to `'Frequency'`.

**Data Types:** `string` | `char`

`'FrequencyRange'` – Frequency range of gammatone filter bank (Hz)

`[50 fs/2]` (default) | two-element row vector

Frequency range of the gammatone filter bank in Hz, specified as the comma-separated pair consisting of `'FrequencyRange'` and a two-element row vector of increasing values in the range [0, `fs`/2]. If unspecified, `FrequencyRange` defaults to `[50, fs/2]`.

**Data Types:** `single` | `double`

`'FFTLength'` – Number of bins in DFT

`WindowLength` (default) | positive scalar integer

Number of bins used to calculate the DFT of windowed input samples, specified as the comma-separated pair consisting of `'FFTLength'` and a positive scalar integer. If unspecified, `FFTLength` defaults to `WindowLength`.

**Data Types:** `single` | `double`

`'DeltaWindowLength'` – Number of coefficients used to calculate delta and delta-delta

`2` (default) | odd integer greater than two

Number of coefficients used to calculate the delta and the delta-delta values, specified as the comma-separated pair consisting of `'DeltaWindowLength'` and either `2` or an odd integer greater than two. If unspecified, `DeltaWindowLength` defaults to `2`.

If `DeltaWindowLength` is set to `2`, the `delta` is given by the difference between the current coefficients and the previous coefficients.

If `DeltaWindowLength` is set to an odd integer greater than `2`, the function uses a least-squares approximation of the local slope over a region around the coefficients of the current analysis window. The delta cepstral values are computed by fitting the cepstral coefficients of neighboring analysis windows (*M* analysis windows before the current analysis window and *M* analysis windows after the current analysis window) to a straight line. For details, see [3].

**Data Types:** `single` | `double`
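As a rough illustration of the least-squares case, the slope of the best-fit line through 2*M* + 1 equally spaced points has a closed form. The sketch below assumes *M* = floor(`DeltaWindowLength`/2) and uses a toy coefficient track with a known slope. The variable names are hypothetical, and the way `gtcc` treats the first and last analysis windows is not modeled.

```matlab
% Sketch: least-squares slope over 2*M+1 neighboring analysis windows.
% Assumption: M = floor(DeltaWindowLength/2); edge handling is not modeled.
deltaWindowLength = 5;               % odd integer greater than 2
M = floor(deltaWindowLength/2);      % windows on each side of the current one
k = (-M:M).';                        % window offsets relative to the current one

c = 3*(1:9).' + 1;                   % toy coefficient track with a known slope of 3
t = 5;                               % index of the current analysis window

% Closed-form slope of the line fitted to c(t-M), ..., c(t+M)
slope = sum(k .* c(t+k)) / sum(k.^2);

% Cross-check against a generic first-order polynomial fit
p = polyfit(k, c(t+k), 1);
fprintf('closed form: %g, polyfit: %g\n', slope, p(1));
```

Because the offsets `k` are symmetric around zero, the intercept drops out of the fit and only the weighted sum of neighboring coefficients remains.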

`'LogEnergy'` – Log energy usage

`'Append'` (default) | `'Replace'` | `'Ignore'`

Log energy usage, specified as the comma-separated pair consisting of `'LogEnergy'` and `'Append'`, `'Replace'`, or `'Ignore'`. If unspecified, `LogEnergy` defaults to `'Append'`.

- `'Append'` – The function prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + `NumCoeffs`.
- `'Replace'` – The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is `NumCoeffs`.
- `'Ignore'` – The function does not calculate or return the log energy.

**Data Types:** `char` | `string`
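To see how `LogEnergy` changes the output width, the following sketch calls `gtcc` on a synthetic noise signal with the default `NumCoeffs` of 13. The signal and the 16 kHz sample rate are assumptions chosen only for illustration, and the code requires Audio Toolbox.

```matlab
% Sketch: compare the number of columns returned for each LogEnergy option.
fs = 16000;                 % assumed sample rate (Hz)
audioIn = randn(fs,1);      % one second of synthetic noise, not real audio

cAppend  = gtcc(audioIn,fs,'LogEnergy','Append');
cReplace = gtcc(audioIn,fs,'LogEnergy','Replace');
cIgnore  = gtcc(audioIn,fs,'LogEnergy','Ignore');

% With the default NumCoeffs of 13, the option descriptions imply
% 14, 13, and 13 columns respectively.
disp([size(cAppend,2) size(cReplace,2) size(cIgnore,2)])
```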

`coeffs` – Gammatone cepstral coefficients

matrix | array

Gammatone cepstral coefficients, returned as an *L*-by-*M* matrix or an *L*-by-*M*-by-*N* array, where:

- *L* – Number of analysis windows the audio signal is partitioned into. The input size, `WindowLength`, and `OverlapLength` control this dimension: `L = floor((size(audioIn,1) - WindowLength)/(WindowLength - OverlapLength)) + 1`.
- *M* – Number of coefficients returned per frame. This value is determined by `NumCoeffs` and `LogEnergy`. When `LogEnergy` is set to:
  - `'Append'` – The function prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + `NumCoeffs`.
  - `'Replace'` – The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is `NumCoeffs`.
  - `'Ignore'` – The function does not calculate or return the log energy. The length of the coefficients vector is `NumCoeffs`.
- *N* – Number of input channels (columns). This value is `size(audioIn,2)`.

**Data Types:** `single` | `double`
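The window count can be checked by hand. This sketch assumes a one-second mono signal at 44.1 kHz with the default `WindowLength` and `OverlapLength`. The variable names are illustrative, and the `lastSample` line assumes that the first window starts at sample 1 and that each window advances by the hop length.

```matlab
% Sketch: number of analysis windows for a time-domain input.
fs = 44100;
numSamples = fs;                          % one second of audio (assumed)
windowLength  = round(0.03*fs);           % default WindowLength
overlapLength = round(0.02*fs);           % default OverlapLength
hopLength = windowLength - overlapLength; % samples between window starts

L = floor((numSamples - windowLength)/hopLength) + 1;

% Index of the last sample in each window (assumption: first window starts at 1)
lastSample = windowLength + (0:L-1).'*hopLength;
fprintf('L = %d, first window ends at %d, last window ends at %d\n', ...
    L, lastSample(1), lastSample(end));
```

Here the window is 1323 samples and the hop is 441 samples, so *L* = floor(42777/441) + 1 = 98.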

`delta` – Change in coefficients

matrix | array

Change in coefficients from one analysis window to another, returned as an *L*-by-*M* matrix or an *L*-by-*M*-by-*N* array. The `delta` array is the same size and data type as the `coeffs` array. See `coeffs` for the definitions of *L*, *M*, and *N*.

The function uses a least-squares approximation of the local slope over a region around the current time sample. For details, see [3].

**Data Types:** `single` | `double`

`deltaDelta` – Change in delta values

matrix | array

Change in `delta` values, returned as an *L*-by-*M* matrix or an *L*-by-*M*-by-*N* array. The `deltaDelta` array is the same size and data type as the `coeffs` and `delta` arrays. See `coeffs` for the definitions of *L*, *M*, and *N*.

The function uses a least-squares approximation of the local slope over a region around the current time sample. For details, see [3].

**Data Types:** `single` | `double`

`loc` – Location of the last sample in each analysis window

column vector

Location of the last sample in each analysis window, returned as a column vector with the same number of rows as `coeffs`.

**Data Types:** `single` | `double`

The `gtcc` function splits the entire data into overlapping segments. The length of each analysis window is determined by `WindowLength`. The length of overlap between analysis windows is determined by `OverlapLength`. The algorithm to determine the gammatone cepstral coefficients depends on the filter domain, specified by `FilterDomain`. The default filter domain is frequency.

`gtcc` computes the gammatone cepstral coefficients, log energy values, delta, and delta-delta values for each analysis window as per the algorithm described in `cepstralFeatureExtractor`.

If `FilterDomain` is specified as `'Time'`, the `gtcc` function uses the `gammatoneFilterBank` to apply time-domain filtering. The basic steps of the `gtcc` algorithm are outlined by the diagram.

The `FrequencyRange` and sample rate (`fs`) parameters are set on the filter bank using the name-value pairs input to the `gtcc` function. The number of filters in the gammatone filter bank is defined as `hz2erb(FrequencyRange(2)) - hz2erb(FrequencyRange(1))`. This roughly corresponds to placing a gammatone filter every 0.9 mm in the cochlea.
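For a sense of scale, the ERB span of the default frequency range can be estimated without the toolbox. The anonymous function below is an assumption: it uses the Glasberg and Moore ERB-rate formula and is only presumed to match `hz2erb`, so treat the result as approximate. How `gtcc` rounds this difference to a whole number of filters is not stated here.

```matlab
% Assumed ERB-rate mapping (Glasberg and Moore); a stand-in for hz2erb.
A = 1000*log(10)/(24.7*4.37);
hz2erbSketch = @(hz) A*log10(1 + hz*0.00437);

fs = 44100;
frequencyRange = [50 fs/2];      % default FrequencyRange for this sample rate

% Difference of the two band edges on the ERB scale
erbSpan = hz2erbSketch(frequencyRange(2)) - hz2erbSketch(frequencyRange(1));
fprintf('ERB span: %.2f\n', erbSpan);
```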

The output from the gammatone filter bank is a multichannel signal. Each channel output from the gammatone filter bank is buffered into overlapped analysis windows, as specified by `WindowLength` and `OverlapLength`. Then a periodic Hamming window is applied to each analysis window. The short-time energy (STE) of each analysis window is calculated, and the STE values of the channels are concatenated. The concatenated signal is then passed through a logarithm function and transformed to the cepstral domain using a discrete cosine transform (DCT).

The log-energy is calculated on the original audio signal using the same buffering scheme applied to the gammatone filter bank output.
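The time-domain steps described above can be sketched in base MATLAB. This is not the `gtcc` implementation: random data stands in for the `gammatoneFilterBank` output, and the logarithm base (`log10`) and the unnormalized DCT-II are assumptions made for illustration. All variable names are hypothetical.

```matlab
% Sketch: buffer, window, short-time energy, log, and DCT on a
% stand-in for the multichannel filter bank output.
rng(0);
numFilters = 8;
windowLength = 64;
overlapLength = 32;
y = randn(512,numFilters);            % placeholder data, one column per filter

hopLength = windowLength - overlapLength;
numWindows = floor((size(y,1) - windowLength)/hopLength) + 1;

% Periodic Hamming window written out explicitly
win = 0.54 - 0.46*cos(2*pi*(0:windowLength-1).'/windowLength);

ste = zeros(numWindows,numFilters);
for w = 1:numWindows
    idx = (w-1)*hopLength + (1:windowLength);
    frame = y(idx,:) .* win;          % window every channel of this frame
    ste(w,:) = sum(frame.^2,1);       % short-time energy per channel
end

% Unnormalized DCT-II across the filter channels (assumed scaling)
n = 0:numFilters-1;
k = (0:numFilters-1).';
dctMatrix = cos(pi*k.*(2*n+1)/(2*numFilters));

cepstra = log10(ste) * dctMatrix.';   % numWindows-by-numFilters
disp(size(cepstra))
```

In practice `gtcc` keeps only the first `NumCoeffs` cepstral values per window; the sketch keeps all of them to stay short.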

[1] Shao, Yang, Zhaozhang Jin, Deliang Wang, and Soundararajan Srinivasan. "An Auditory-Based Feature for Robust Speech Recognition." *IEEE International Conference on Acoustics, Speech and Signal Processing*. 2009.

[2] Valero, X., and F. Alias. "Gammatone Cepstral Coefficients: Biologically Inspired Features for Non-Speech Audio Classification." *IEEE Transactions on Multimedia*. Vol. 14, Issue 6, 2012, pp. 1684–1689.

[3] Rabiner, Lawrence R., and Ronald W. Schafer. *Theory and Applications of Digital Speech Processing*. Upper Saddle River, NJ: Pearson, 2010.

Generate C and C++ code using MATLAB® Coder™.

`cepstralFeatureExtractor` | `mfcc` | `pitch` | `voiceActivityDetector`
