Word comparison using the frequency domain
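Good day, I've managed to get the following spectrum from a sound wave using the fft function:

[Figure: FFT magnitude spectrum of the spoken word, with the peaks found by findpeaks() marked by red circles]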

The red circles are the peak values found using the findpeaks() function (I use both the descending sort option, 'SortStr','descend', and 'MinPeakProminence'). This gives me their x- and y-coordinates in descending order. I then take only the first 6 x and y elements, concatenate them horizontally, and save them to a .mat file so they can be compared with the other words (which have their own .mat files too). I do this for 5 different words and then compare the .mat files using the mean squared error function (immse). The word with the lowest error then corresponds to a value k, i.e. word 1, word 2, etc.

The problem, however, is that it doesn't produce the right result. I say "Five" and the algorithm says I said "Two", because my recording more closely resembles the .mat file of the word "Two" (i.e. its peaks are closer).
Does anyone have any hints on where I could be going wrong, or what other step I might need to take to go from the graph above to detecting which word has been said? My code is quite a mess and I don't want to discourage help by posting it. It follows the exact method I described above for figuring out which word has been said, but if you'd like to help and ask me to post it, I will. Thanks in advance!
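A minimal Python sketch of the matching scheme described above, to make the pipeline concrete. It is not the poster's MATLAB code: `top_peaks` is a simplified stand-in for findpeaks with a descending sort (it applies no prominence filter), `feature_vector`, `mse` and `classify` are hypothetical helper names, the zero-padding for short peak lists is an assumption, and the per-word .mat files are replaced by an in-memory dictionary of templates.

```python
def top_peaks(mag, k=6):
    """Return (indices, heights) of the k tallest local maxima in `mag`,
    tallest first. Rough stand-in for findpeaks(..., 'SortStr', 'descend');
    no 'MinPeakProminence' filtering. Plateaus report their left edge."""
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]]
    peaks.sort(key=lambda i: mag[i], reverse=True)
    peaks = peaks[:k]
    return peaks, [mag[i] for i in peaks]


def feature_vector(mag, freqs, k=6):
    """Horizontally concatenate the peak frequencies and the peak heights.
    Zero-pads (an assumption) so every vector has length 2*k."""
    idx, heights = top_peaks(mag, k)
    locs = [freqs[i] for i in idx] + [0.0] * (k - len(idx))
    heights = heights + [0.0] * (k - len(heights))
    return locs + heights


def mse(a, b):
    """Mean squared error between two equal-length vectors (like immse)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def classify(feature, templates):
    """Return the label whose stored template has the lowest MSE."""
    return min(templates, key=lambda label: mse(feature, templates[label]))
```

Because each vector concatenates peak frequencies (in Hz) with peak magnitudes, the MSE mixes two different units, so whichever part has the larger numeric range dominates the distance. The comparison also ignores when each frequency occurs within the word, which is the limitation Star Strider points out.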
13 Comments
Star Strider
on 12 Nov 2021
The fft function keeps the frequency information but discards the associated time information, which is necessary for the classification.
Experiment with pspectrum, using the 'spectrogram' option. (I prefer it to the spectrogram function for such tasks, since the desired output here is the frequency data as a function of time, rather than the normalised power in dB/Hz that spectrogram provides. Both have their uses, but pspectrum is preferable here.)
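A rough Python sketch of what a time-frequency representation adds, assuming a plain short-time Fourier transform (periodic Hann window, fixed hop) as a stand-in for pspectrum's 'spectrogram' option. It is not pspectrum's algorithm (pspectrum returns a power estimate with one column per time, whereas this returns STFT magnitudes with one row per frame), and the function names and the frame_len/hop values are illustrative. It also checks the point about fft: a real signal and its time-reversed copy have identical whole-signal magnitude spectra, so the order of the sounds is lost, while the spectrogram still shows it.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..n//2 (naive O(n^2) DFT, fine for a demo)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, fs, frame_len=256, hop=128):
    """Short-time Fourier transform magnitude.

    Returns (times, freqs, S), where S[m][k] is the magnitude at freqs[k]
    in the frame centred at times[m]: one row per moment in time."""
    w = hann(frame_len)
    times, S = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * w[i] for i in range(frame_len)]
        S.append(dft_magnitude(frame))
        times.append((start + frame_len / 2) / fs)
    freqs = [k * fs / frame_len for k in range(frame_len // 2 + 1)]
    return times, freqs, S


def peak_freq(row, freqs):
    """Frequency of the strongest bin in one spectrum."""
    return freqs[max(range(len(row)), key=row.__getitem__)]


if __name__ == "__main__":
    fs = 8000
    # A 1000 Hz tone followed by a 2000 Hz tone, and the same signal reversed.
    sig = [math.sin(2 * math.pi * (1000 if t < 512 else 2000) * t / fs)
           for t in range(1024)]
    rev = sig[::-1]
    # Whole-signal magnitude spectra match up to rounding: order is lost.
    diff = max(abs(a - b) for a, b in
               zip(dft_magnitude(sig), dft_magnitude(rev)))
    print("max whole-spectrum difference:", diff)
    # The spectrograms differ: the first frame shows which tone came first.
    _, freqs, S = spectrogram(sig, fs, frame_len=64, hop=32)
    _, _, S_rev = spectrogram(rev, fs, frame_len=64, hop=32)
    print(peak_freq(S[0], freqs), peak_freq(S_rev[0], freqs))  # 1000.0 2000.0
```

Each row of S is a short spectrum at one moment, so a spoken word becomes a sequence of spectra instead of a single set of peaks. In MATLAB the corresponding call would be `[p, f, t] = pspectrum(x, fs, 'spectrogram')`.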
Leon Ellis
on 12 Nov 2021
Star Strider
on 12 Nov 2021
My pleasure!
Leon Ellis
on 12 Nov 2021
Edited: Leon Ellis
on 12 Nov 2021
Star Strider
on 12 Nov 2021
Use the pspectrum 'spectrogram' option to produce a time-frequency plot, which is necessary to correctly characterise the data.
Leon Ellis
on 12 Nov 2021
Star Strider
on 12 Nov 2021
No worries!
(I’m thus far not posting this as an Answer since I don’t have the signal to work with.)
Leon Ellis
on 13 Nov 2021
Leon Ellis
on 13 Nov 2021
Edited: Leon Ellis
on 13 Nov 2021
Leon Ellis
on 13 Nov 2021
Leon Ellis
on 14 Nov 2021
Salman Ahmed
on 17 Nov 2021
Hi Leon,
From my understanding, you wish to develop an algorithm for word classification. If you have a dataset of different spoken instances of each word, you could extract time-frequency features and train a neural network. Also, have a look at a similar example here. You could customize that code by replacing its target words with the words you wish to detect. Hope it helps.
Leon Ellis
on 17 Nov 2021
Answers (0)

