Good day,

I've managed to get the following from a sound wave using the fft function. The red circles are the peak values found using the findpeaks() function (I use both the descending sort order and MinPeakProminence options), so I have their x- and y-coordinates in descending order. I then take only the first 6 x and y elements, horizontally concatenate them, and save them to a .mat file so that they can be compared with the .mat files of the other words. I do this for 5 different words and then compare their .mat files using the mean squared error function (immse). The word with the lowest error then corresponds to a value k, i.e. word 1, word 2, etc.

The problem, however, is that it doesn't produce the right result. I say "Five" and the algorithm says I said "Two", because my recording more closely resembles the .mat file of the word "Two" (so the peaks are closer). Does anyone have any hints on where I could be going wrong, or what other step I might need to take to go from the graph above to detecting which word has been said?

My code is quite a mess and I don't want to discourage help by posting it. It follows the exact method I described above for figuring out which word has been said, but if you'd like to help and request that I post it, I will. Thanks in advance!

Word comparison using frequency domain.

Star Strider on 12 Nov 2021

The fft function keeps the frequency information but discards the associated time information, which is necessary for the classification.
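The point about lost time information can be seen with a small synthetic example. This is only a sketch: the two tone frequencies are arbitrary and stand in for real speech. Two signals that contain the same tones in opposite order have identical FFT magnitudes, so a feature taken from abs(fft(x)) alone cannot tell them apart.

```matlab
% Two synthetic "words" built from the same two tones in opposite order.
fs = 8000;                       % sample rate in Hz (same as in the question)
n  = 0:fs/2-1;                   % half a second of samples per tone
toneA = sin(2*pi*400*n/fs);      % 400 Hz segment (arbitrary choice)
toneB = sin(2*pi*1000*n/fs);     % 1000 Hz segment (arbitrary choice)

x1 = [toneA, toneB];             % A then B
x2 = [toneB, toneA];             % B then A: same content, different order

% x2 is a circular shift of x1, so the FFT magnitudes coincide.
mag1 = abs(fft(x1));
mag2 = abs(fft(x2));
maxMagDiff  = max(abs(mag1 - mag2));   % round-off level only
maxTimeDiff = max(abs(x1 - x2));       % clearly non-zero

fprintf('max |FFT| difference: %.3g, max waveform difference: %.3g\n', ...
    maxMagDiff, maxTimeDiff);
```

The waveforms differ sample by sample, yet their magnitude spectra agree to within round-off, which is why a time-frequency representation is suggested next.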

Experiment with pspectrum, using the 'spectrogram' option. (I prefer it to the spectrogram function for such tasks, since the desired output is the frequency data as a function of time, rather than normalised power in dB/Hz that spectrogram provides. They both have their uses, however pspectrum is preferable here.)


Leon Ellis on 12 Nov 2021

OK, thank you very much! Will look into that.

Star Strider on 12 Nov 2021

My pleasure!


Leon Ellis on 12 Nov 2021

Edited: Leon Ellis on 12 Nov 2021

Hi, so still no luck. I used the pspectrum function, plotted the result and took its peak points to compare with those of the other words (whose peak points I got the same way). I still get inaccurate results using the MSE, such as "Five" being identified as "Three". Any other advice?

Star Strider on 12 Nov 2021

Use the pspectrum 'spectrogram' option to produce a time-frequency plot, which is necessary to correctly characterise the data.
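A minimal sketch of that call, assuming the Signal Processing Toolbox is available. A synthetic two-tone signal stands in for a recorded word, and the tone frequencies and the 50 ms time resolution are arbitrary choices, not values from this thread.

```matlab
fs = 8000;                              % sample rate in Hz, as in the question
n  = 0:fs/2-1;                          % half a second of samples per segment
x  = [sin(2*pi*400*n/fs), sin(2*pi*1000*n/fs)];   % 400 Hz, then 1000 Hz

% With output arguments, pspectrum returns the power matrix instead of plotting.
% Passing fs makes f come back in Hz and t in seconds.
[p, f, t] = pspectrum(x, fs, 'spectrogram', ...
    'FrequencyLimits', [0 fs/2], 'TimeResolution', 0.05);

% p has one row per frequency in f and one column per time instant in t.
[~, iFirst] = max(p(:, 1));             % strongest bin in the first time slice
[~, iLast]  = max(p(:, end));           % strongest bin in the last time slice
fprintf('p is %d-by-%d; dominant frequency near %.0f Hz at the start and %.0f Hz at the end\n', ...
    size(p, 1), size(p, 2), f(iFirst), f(iLast));

% Show the matrix as an image (power in dB) with time on the x-axis.
imagesc(t, f, 10*log10(p + eps));
axis xy; xlabel('Time (s)'); ylabel('Frequency (Hz)');
```

Unlike a single FFT of the whole recording, each column of p describes the spectrum around one instant, so the order in which the sounds occur is preserved.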

Leon Ellis on 12 Nov 2021

Thanks, I misread, sorry! Will get back to you tomorrow.

Star Strider on 12 Nov 2021

No worries!

(I’m thus far not posting this as an Answer since I don’t have the signal to work with.)

Leon Ellis on 13 Nov 2021

The code I used to produce these .mat files is below; I just changed the file names when saving the different words. I want to compare "Word3.mat" to itself, then to "Word4.mat" and "Word5.mat", and use the immse() function to find which word it is most likely to be.

Fs=8000;
rs = audiorecorder(Fs,24,1);      %Initialize variable for Sound (20) to be saved in.
recordblocking(rs,2);             %Give the Sound within the given 2 seconds
Sound = getaudiodata(rs,"double");
audiowrite("C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Sounds\CWord.wav",Sound,Fs);

[CompareWord, Fs] = audioread("C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Sounds\CWord.wav");
Ts=1/Fs;
dt=(0:length(CompareWord)-1)*Ts;

nfft=length(CompareWord);
nfft2=2.^nextpow2(nfft);

ff=fft(CompareWord,nfft2);
ff=ff(1:nfft2/2);
ffm=movmax(ff,50);
xfft=Fs*(0:nfft2/2-1)/nfft2;

cut_off=1.2e3/Fs/2;
order=32;

h=fir1(order,cut_off);

fh=fft(h,nfft2);
fh=fh(1:nfft2/2);

mul=conv(fh,ff);
con=conv(CompareWord,h);
hold off;

plot(dt,CompareWord);
plot(xfft,abs(ff/max(ff))); %#ok<ADPROPLC>
hold on;
%pks=findpeaks(abs(ffm));
%%Gets center x-coordinates of local maximum values.
TF2=islocalmax(abs(ffm),'FlatSelection',"center");
x=1:length(xfft);
hold on;
plot(x,abs(ff)/max(abs(ff)),x(TF2),abs(ff(TF2)/max(ff)),'r*');
hold off;

stem(h);
plot(abs(fh/max(fh))); %#ok<ADPROPLC>
sound(con);
plot(con);
CompareWord=con;

save("CompareWord.mat","CompareWord","-mat");

plot(abs(mul));
TF3=islocalmax(abs(ffm),'FlatSelection',"center");
x=1:length(mul);
hold on;
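Two details in the script above may be worth checking; the following is a sketch under assumptions rather than a diagnosis. It assumes the intended cut-off is 1.2 kHz, and it uses a random stand-in signal and a moving-average stand-in filter (both hypothetical) so that it runs without any toolbox. First, fir1 expects the cut-off normalised to the Nyquist frequency Fs/2, and 1.2e3/Fs/2 is evaluated left to right. Second, filtering by convolution in time corresponds to element-wise multiplication of the spectra, not to conv of the spectra.

```matlab
Fs = 8000;                 % sample rate from the script above
fc = 1.2e3;                % assumed intended cut-off in Hz

% Division is evaluated left to right, so the two expressions differ.
wnAsWritten = fc/Fs/2;     % (fc/Fs)/2 = 0.075, i.e. 0.075*(Fs/2) = 300 Hz
wnNyquist   = fc/(Fs/2);   % 0.3, i.e. 1200 Hz relative to the Nyquist frequency

% Convolution in time equals multiplication (not conv) in frequency.
x = randn(1, 64);                      % stand-in signal (hypothetical data)
h = ones(1, 5)/5;                      % stand-in FIR filter (moving average)
N = numel(x) + numel(h) - 1;           % FFT length that avoids circular wrap-around
yTime = conv(x, h);                                % time-domain filtering
yFreq = real(ifft(fft(x, N) .* fft(h, N)));        % same result via the spectra
maxErr = max(abs(yTime - yFreq));                  % round-off level

fprintf('Wn as written: %.3f, Wn relative to Nyquist: %.3f, max error: %.2g\n', ...
    wnAsWritten, wnNyquist, maxErr);
```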
            

This is what I got when plotting the time-amplitude domain using:

[con,f,t]=pspectrum(con,'spectrogram');

plot(app.UIAxes,t,con);

But I'm not sure how to use it or if findpeaks() will even work here.

Thanks again for your time!
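On the question of how to use the pspectrum output: with output arguments it returns a frequency-by-time power matrix, so one option is to compare whole matrices rather than peaks. The following is only a sketch under several assumptions: the Signal Processing Toolbox is available, all recordings have the same length and sample rate (so the matrices have equal size), and synthetic tone sequences stand in for real words. The unit-norm scaling and the sum-of-squares distance are illustrative choices, not something prescribed in this thread.

```matlab
rng(0);                                       % reproducible noise
fs = 8000;                                    % sample rate in Hz
n  = 0:fs/2-1;                                % half a second per tone
toneA = sin(2*pi*400*n/fs);
toneB = sin(2*pi*1000*n/fs);

template  = [toneA, toneB];                           % stored reference "word"
sameWord  = 0.5*[toneA, toneB] + 0.01*randn(1, fs);   % quieter, noisy repeat
otherWord = [toneB, toneA];                           % same tones, reversed order

% Spectrogram power matrix, scaled to unit norm so loudness does not matter.
spec     = @(x) pspectrum(x, fs, 'spectrogram', ...
                'FrequencyLimits', [0 fs/2], 'TimeResolution', 0.05);
unitNorm = @(P) P / norm(P(:));
feat     = @(x) unitNorm(spec(x));
dist     = @(A, B) sum((A(:) - B(:)).^2);     % squared distance between matrices

fTemplate = feat(template);
dSame  = dist(fTemplate, feat(sameWord));     % repeat of the same "word"
dOther = dist(fTemplate, feat(otherWord));    % different "word"
fprintf('distance to repeat: %.4f, distance to reversed word: %.4f\n', dSame, dOther);
```

Real recordings differ in length and in when the word starts, so trimming silence or aligning the recordings in time would probably be needed before a comparison like this is meaningful.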

Leon Ellis on 13 Nov 2021

Edited: Leon Ellis on 13 Nov 2021

Please try to help with the main code! This was me trying to solve it differently, so you can just ignore this.

I've also tried using mscohere on the imported sound waves and then finding the curve closest to the straight line y=1 (y=1 means there's no difference between the sound waves). Although the code runs, it still gives the wrong output many times. I'm feeling quite hopeless at the moment. Please keep to the main question with the .mat files if possible; this was just me trying to classify the words in a different way. My code was:

clc;
clear;
clf;

Fs=8000;
M=100000;
flag='0';
int=1;
M2=0;

for k=1:10
    k2=num2str(k);

    [x,Fs]=audioread("C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Sounds\WordL1_"+k2+".wav");
    psdx=psd(spectrum.periodogram,x,'Fs',8000,'NFFT',length(x));

    [z,Fs]=audioread("C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Sounds\WordR1_"+k2+".wav");
    psdx=psd(spectrum.periodogram,z,'Fs',8000,'NFFT',length(z));

    [y,Fs]=audioread("C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Sounds\CWord.wav");
    psdy=psd(spectrum.periodogram,y,'Fs',8000,'NFFT',length(y));

    L=100000;

    %%Make sure signals are same length%%
    if (length(y)<length(x))
        L=length(y);
    end
    if(length(x)<L)
        L=length(x);
    end
    if(length(z)<L)
        L=length(z);
    end
    x=x(1:L);
    y=y(1:L);
    z=z(1:L);

    %%Get the amplitudes and their corresponding frequency points
    [amp,fs]=mscohere(x,y,hanning(1e3),800,1024,8000);
    [amp2,fs]=mscohere(z,y,hanning(1e3),800,1024,8000);

    %%Get the area under the plotted amplitude and frequency points and compare
    %%it to the area under a y=1 line over the same interval
    amp=trapz(amp,fs);
    amp2=trapz(amp2,fs);

    %%Area under y=1 line.
    L1 = trapz(ones(length(amp),1),fs);
    L2 = trapz(ones(length(amp2),1),fs);

    if(immse(amp,L1)<M)
        M=immse(amp,L1)
        flag=k2;
    end
    if(immse(amp2,L2)<M)
        M=immse(amp2,L2)
        flag=k2;
    end
end

%%Check to see which word has been said based on the value of flag
if(flag=='1')
    Title ='One';
end
if(flag=='2')
    Title ='Two';
end
if(flag=='3')
    Title ='Three';
end
if(flag=='4')
    Title ='Four';
end
if(flag=='5')
    Title ='Five';
end
if(flag=='6')
    Title ='Six';
end
if(flag=='7')
    Title ='Seven';
end
if(flag=='8')
    Title ='Eight';
end
if(flag=='9')
    Title ='Nine';
end
if(flag=='10')
    Title ='Ten';
end
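For the area comparison described in this script, note that trapz(X,Y) expects the x-coordinates first. A small sketch with a made-up, constant coherence curve (hypothetical values, not real mscohere output) shows the difference, and one way to turn the area into a mean coherence between 0 and 1:

```matlab
% Frequency grid like the one mscohere returns for NFFT = 1024 and Fs = 8000.
f   = linspace(0, 4000, 513).';
cxy = 0.9 * ones(size(f));               % made-up coherence curve (constant 0.9)

areaCxy   = trapz(f, cxy);               % integrates cxy over frequency
areaIdeal = trapz(f, ones(size(f)));     % area under the y = 1 line
meanCoherence = areaCxy / areaIdeal;     % 1 would mean fully coherent signals

% With the arguments swapped, the "x" values never change, so the area is 0.
areaSwapped = trapz(ones(size(f)), f);

fprintf('mean coherence %.2f, swapped-argument area %g\n', meanCoherence, areaSwapped);
```

Dividing by the area under y=1 gives a number that is comparable across words without needing immse on two scalars.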

Leon Ellis on 13 Nov 2021

Sorry to bother you so much; it's just that the due date for figuring this out is quite close. I've taken the time and power values of the spoken words and compared them via the immse function. It still gives the wrong outputs.

Leon Ellis on 14 Nov 2021

I've also compared the time-frequency plots that I got from the pspectrum function, and that doesn't identify the word correctly for me either.

Salman Ahmed on 17 Nov 2021

Hi Leon,

From my understanding, you wish to develop an algorithm for word classification. If you have a dataset of different spoken instances of each word, you could extract time-frequency features and train a neural network. Also, have a look at a similar example here. You could customize this code by replacing the words you wish to detect. Hope it helps.

Leon Ellis on 17 Nov 2021

Thank you very much. Unfortunately it's a bit too late and I wasn't able to get it to work. I also don't think we're supposed to train an algorithm for word identification (we're just supposed to work with the audio file characteristics for identification). But thanks a lot for replying!
