How to extract VGGish features from audio files shorter than 1s?

Question

Robert-Valentin Bencze on 7 Mar 2022

Commented: jibrahim on 10 Mar 2022

feature_extraction_VGGish.m

I used the code from the example at https://www.mathworks.com/help/audio/ref/vggish.html, replacing the line [audioIn,fs0] = audioread('Ambiance-16-44p1-mono-12secs.wav'); with [audioIn,fs0] = audioread('1340-a_h.wav');

The file's original sample rate is 50 kHz and its length is 43501 samples. After resampling to 16 kHz, its length becomes 13921 samples.

The attempt to run the attached file returned the following errors:

To reproduce the bug: 1340-a_h.wav is a vocal recording from the Saarbrücken Voice Dataset that can be downloaded here. If the first link does not work, try here: click "Databankanfrage" (Database Request) and select the "Cyste" pathology from the list on the right. Click the blue "Exportieren" (Export) button at the bottom right. Tick the blue "Alle" (All) checkbox and the WAV checkbox to the right of "Sprach-Signal" (Speech signal). Click the blue "Übernehmen" (Apply) button, then click "Herunterladen" (Download).

Answer 1

jibrahim on 9 Mar 2022

Hi Robert-Valentin,

The VGGish network accepts auditory spectrograms that each correspond to roughly one second of audio (0.975 s, i.e. 15,600 samples at 16 kHz). Your resampled signal has only 13,921 samples, so there is not enough audio to generate a set of embeddings.
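As a rough check of the numbers in the question, the sketch below compares the resampled length with the 0.975 s minimum. The variable names are illustrative, and the formula for the resampled length (the ceiling of N times the rate ratio) is an assumption that happens to reproduce the 13921 samples reported above.

```matlab
% Numbers taken from the question; variable names are illustrative.
fsOriginal = 50e3;      % original sample rate (Hz)
fsTarget   = 16e3;      % sample rate VGGish expects (Hz)
numSamples = 43501;     % length of 1340-a_h.wav before resampling

% Assumption: the resampled length is ceil(N * fsTarget / fsOriginal).
numResampled = ceil(numSamples * fsTarget / fsOriginal);

% One VGGish input frame covers 0.975 s of audio at 16 kHz.
numRequired = round(0.975 * fsTarget);

durationSec = numResampled / fsTarget;               % duration after resampling
numMissing  = max(0, numRequired - numResampled);    % zeros needed to fill one frame

fprintf('%d samples (%.3f s), %d more needed for one frame\n', ...
    numResampled, durationSec, numMissing);
```

With these inputs the signal lasts about 0.87 s, so it falls short of a single frame by 1679 samples.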

One way around this is to pad your input with zeros. For example (after you've resampled to 16 kHz):

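% Zero-pad at the end to 15600 samples (0.975 s at 16 kHz); no-op if already long enough
audioIn = [audioIn; zeros(round(0.975*16e3) - size(audioIn,1), size(audioIn,2))];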

Also, note that these two functions should make your life easier:

vggishPreprocess: Accepts the audio signal and creates the Mel spectrogram for you, including resampling to the right sample rate, so there is no need to do that yourself.
vggishFeatures: Combines Mel spectrogram generation and network inference. You feed the function the audio signal, and it does everything for you and gives you the embeddings.
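A minimal sketch of how padding and the two helper functions might fit together. It uses a synthetic noise signal as a stand-in for 1340-a_h.wav, and it assumes the two-argument forms vggishPreprocess(audioIn,fs) and vggishFeatures(audioIn,fs). Both need Audio Toolbox and the downloaded VGGish model, so the calls are wrapped in try/catch, and the exact behavior may differ between releases.

```matlab
fs = 50e3;                           % sample rate of the recording in the question
audioIn = 0.1 * randn(43501, 1);     % synthetic stand-in for 1340-a_h.wav

% Pad at the original rate so the signal lasts at least 0.975 s.
minSamples  = ceil(0.975 * fs);
padLength   = max(0, minSamples - size(audioIn, 1));
audioPadded = [audioIn; zeros(padLength, size(audioIn, 2))];

% The helper functions resample internally, so the 50 kHz signal is passed as-is.
try
    melSpect   = vggishPreprocess(audioPadded, fs);   % spectrogram generation only
    embeddings = vggishFeatures(audioPadded, fs);     % spectrograms plus inference
    disp(size(embeddings));
catch err
    % Reached when Audio Toolbox or the VGGish model is not installed.
    fprintf('VGGish helpers unavailable: %s\n', err.message);
end
```

Padding before the call keeps the example independent of how the helpers treat inputs that are too short, which the thread does not specify.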

Robert-Valentin Bencze on 10 Mar 2022

Edited: Robert-Valentin Bencze on 10 Mar 2022

Thank you @jibrahim.

However, I expect that if I apply the zero-padding strategy to 20 ms signal windows, the extracted features will be of poor quality (i.e. they will not yield good prediction accuracy if used for a voice pathology classifier based on windowed signals). Am I right?

jibrahim on 10 Mar 2022

Hi Robert-Valentin,

I guess it depends. If you're padding a small amount compared to the length of the audio, the spectrogram will probably still have enough valuable info to give good results. VGGish essentially expects spectrograms that correspond to roughly one second of audio (975 ms), so there is no way around this if your entire signal is shorter than that.
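To put "a small amount compared to the length of the audio" into numbers, the sketch below computes how much of one 975 ms frame is real signal for a few durations. The durations are illustrative: a 20 ms window, roughly the 0.87 s file from the question, and a full frame. It says nothing about classifier accuracy, only about the share of zeros.

```matlab
fs = 16e3;                            % VGGish sample rate (Hz)
numRequired = round(0.975 * fs);      % samples in one VGGish frame

% Illustrative durations: 20 ms window, ~0.87 s file, full 0.975 s frame.
windowSec = [0.020 0.870 0.975];
numSignal = min(round(windowSec * fs), numRequired);
signalFraction = numSignal / numRequired;

for k = 1:numel(windowSec)
    fprintf('%4.0f ms of audio: %5.1f%% signal, %5.1f%% zeros\n', ...
        1e3 * windowSec(k), 100 * signalFraction(k), 100 * (1 - signalFraction(k)));
end
```

A 20 ms window fills only about 2% of the frame, whereas the 0.87 s recording fills about 89% of it.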

Note that in some of our examples we apply similar zero-padding when the signal is too short (see this example), and the results are fine. I think we pad zeros on each side rather than putting all the zeros at one end. That might help too.
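Padding on each side could be sketched as follows. This is an illustration with a synthetic signal of the same length as the resampled file, not the code from the example mentioned above.

```matlab
fs = 16e3;                               % signal already resampled to 16 kHz
audioIn = 0.1 * randn(13921, 1);         % synthetic stand-in for the resampled file

numRequired = round(0.975 * fs);         % samples in one VGGish frame
padTotal    = max(0, numRequired - size(audioIn, 1));

% Split the padding as evenly as possible; any odd sample goes at the end.
padBefore = floor(padTotal / 2);
padAfter  = padTotal - padBefore;

numChannels   = size(audioIn, 2);
audioCentered = [zeros(padBefore, numChannels); ...
                 audioIn; ...
                 zeros(padAfter, numChannels)];
```

The original samples end up centered in the frame, with 839 zeros before and 840 after for this input length.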
