Missing IDCT in MFCC computation of "vggishPreprocess" function
Roberto Andreotti
on 10 Dec 2024 at 17:09
Edited: jibrahim
on 10 Dec 2024 at 22:31
Hello all
I am trying to go deeper into audio signal feature extraction and, in the literature, I read about Mel Frequency Cepstral Coefficients (MFCCs). Skipping the longer explanation, MFCCs are obtained by performing the inverse cosine transform of the log spectrum of the audio signal (including frequency warping to the log scale). By assembling these coefficients, one obtains the Mel spectrum.
I was very happy to notice that MATLAB includes the "vggish" function, which automatically obtains the Mel spectra from a signal. However, going through the "vggish" function code, I noticed that no inverse cosine transform is included. I was expecting to find the "idct" function, or an analogous one, but the procedure seems to stop after taking the log of the Fourier transform. Also, the Fourier transform is not squared to obtain the spectrum.
Is this a different procedure for obtaining MFCCs? Are there any references that describe such a procedure?
Thanks in advance
Roberto
Accepted Answer
jibrahim
on 10 Dec 2024 at 22:29
Edited: jibrahim
on 10 Dec 2024 at 22:31
Hi Roberto,
The VGGish-related functions do not generate MFCC coefficients. VGGish is a deep neural network that extracts feature embeddings from audio signals. These embeddings may be used as features to train AI networks. They are not identical to MFCC coefficients.
To extract VGGish features directly from audio signals, use vggishEmbeddings. This function first generates Mel spectrograms (not MFCC coefficients) from the audio signal, using parameters (window length, overlap, scaling, etc.) that match the original VGGish implementation. These Mel spectrograms are then fed to the pretrained VGGish network. The outputs of the network are the deep embeddings.
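To give a feel for the "window length, overlap" parameters mentioned above, here is a minimal Python sketch of the framing arithmetic. The constants (16 kHz sample rate, 25 ms window, 10 ms hop, 64 mel bands, 96-frame patches) are the values I recall from the open-source VGGish reference code, so treat them as assumptions; the MATLAB functions may pad the signal or overlap the patches differently. The helper names `num_frames` and `patch_shapes` are made up for this illustration.

```python
# Assumed VGGish-style parameters (from the open-source reference code, not
# read out of the MATLAB implementation).
FS = 16000                        # sample rate in Hz
WINDOW = int(round(0.025 * FS))   # 25 ms analysis window -> 400 samples
HOP = int(round(0.010 * FS))      # 10 ms hop -> 160 samples
NUM_BANDS = 64                    # mel bands per spectrogram frame
PATCH_FRAMES = 96                 # spectrogram frames per network input


def num_frames(length, window, hop):
    """Number of full windows that fit into `length` items (no padding)."""
    if length < window:
        return 0
    return 1 + (length - window) // hop


def patch_shapes(num_samples, patch_hop=PATCH_FRAMES):
    """Return (spectrogram frames, list of patch shapes) for a signal length."""
    frames = num_frames(num_samples, WINDOW, HOP)
    patches = num_frames(frames, PATCH_FRAMES, patch_hop)
    return frames, [(PATCH_FRAMES, NUM_BANDS)] * patches


# Two seconds of audio, non-overlapping patches vs. 50% overlapping patches.
frames, shapes = patch_shapes(2 * FS)
frames_ov, shapes_ov = patch_shapes(2 * FS, patch_hop=PATCH_FRAMES // 2)
print(frames, len(shapes), len(shapes_ov))
```

Each patch is a 96-by-64 block of log-mel values under these assumptions, and the network produces one embedding per patch.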
The function vggishPreprocess is used inside vggishEmbeddings to generate the Mel spectrograms I just mentioned. Again, the intent of the function is to generate Mel spectrograms that are ready to be consumed by the VGGish network. If you want to generate Mel spectrograms, you can use the melSpectrogram function.
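To make the difference between a Mel spectrogram and MFCCs concrete, here is a small, stdlib-only Python sketch for a single frame. It is not the code inside vggishPreprocess; it only illustrates the usual textbook pipeline. The mel formula (HTK-style), the 0.01 log offset, the use of the magnitude rather than the squared magnitude, and all function names are assumptions chosen for illustration. The log-mel vector is where a Mel spectrogram stops; MFCCs need one more step, a discrete cosine transform (DCT-II in most implementations) of that vector.

```python
import cmath
import math


def hz_to_mel(f):
    # HTK-style mel scale; other conventions exist (assumption).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """Magnitude of the one-sided DFT of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(num_bands, n_fft, fs, f_min, f_max):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [mel_to_hz(m_lo + i * (m_hi - m_lo) / (num_bands + 1))
             for i in range(num_bands + 2)]
    bin_hz = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for b in range(num_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank


def log_mel(frame, bank, offset=0.01):
    """One column of a log-mel spectrogram: log(filterbank * |DFT| + offset)."""
    spec = magnitude_spectrum(frame)
    return [math.log(sum(w * s for w, s in zip(row, spec)) + offset)
            for row in bank]


def dct2(x):
    """Unnormalized DCT-II, the extra step that turns log-mel into cepstra."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]


# One Hann-windowed frame of a 1 kHz tone sampled at 16 kHz.
fs, n = 16000, 256
frame = [(0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)))
         * math.sin(2 * math.pi * 1000.0 * t / fs) for t in range(n)]
bank = mel_filterbank(20, n, fs, 125.0, 7500.0)

log_mel_frame = log_mel(frame, bank)     # what a Mel spectrogram contains
mfcc_frame = dct2(log_mel_frame)[:13]    # MFCCs: DCT of the log-mel values
```

Stacking `log_mel_frame` over successive frames gives a log-mel spectrogram, which is the kind of input a network such as VGGish consumes. Stacking `mfcc_frame` gives MFCCs, which a Mel spectrogram routine never computes, so the absence of a cosine transform there is expected.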