Deep Learning for speech proficiency scoring?

Joel Bluhme

2 Apr 2021

Updated 3 May 2021

Hi!

I have a dataset of more than 100 recordings, each approximately 10 minutes long, of patients undergoing a speech evaluation interview with a speech therapist. I also have the resulting score for each interview, on a scale from 1 to 10. I want to train an initial deep learning network that predicts a patient's score from this dataset. My question is this: is it better to label the entire interview with a single score (for example, 9 if that is the score the patient received), or to run some sort of speech2text function over the entire interview, so that each interview yields a set of pairs, each consisting of a word and the score for the whole interview? When the network is then asked to score a new interview, it would run speech2text on that file and match each word with its closest matches.

Best,

Joel

1 Comment

Brian Hemmat on 3 May 2021

Hi Joel,

When you suggest pairs of words and scores, by 'word' do you mean either the raw audio or some set of acoustic features, and not the text? speech2text might be useful for segmenting the audio, but I don't think the text will retain valuable information such as whether stuttering is present.

At what time scale or part of speech does whatever you are evaluating show up? Are you looking for articulation disorders? Fluency disorders? The time scale you need to feed to your system will depend on what you are evaluating. Whatever the rating represents, my suspicion is that you will want to segment the audio into clips of 5 to 20 seconds, with each segment given the same label as the whole recording.
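
The segmentation idea can be sketched roughly as follows. This is only a minimal illustration in plain Python, not a tested pipeline: the function names `segment_with_label` and `interview_score`, the 10-second default clip length, the choice to drop a short final clip, and the averaging of clip predictions back into one interview score are all assumptions made for the example, and the toy signal stands in for real audio samples.

```python
from typing import List, Sequence, Tuple


def segment_with_label(
    samples: Sequence[float],
    sample_rate: int,
    score: int,
    clip_seconds: float = 10.0,
) -> List[Tuple[Sequence[float], int]]:
    """Split one recording into fixed-length clips that all inherit its score."""
    clip_len = int(clip_seconds * sample_rate)
    if clip_len <= 0:
        raise ValueError("clip_seconds * sample_rate must be at least 1 sample")
    clips = []
    # Step through the signal in non-overlapping windows.
    # A final clip shorter than clip_len is dropped (an assumption of this sketch).
    for start in range(0, len(samples) - clip_len + 1, clip_len):
        clips.append((samples[start:start + clip_len], score))
    return clips


def interview_score(clip_predictions: Sequence[float]) -> float:
    """Average per-clip predictions into one interview-level score.

    Plain averaging is an assumption; other aggregation rules are possible.
    """
    if not clip_predictions:
        raise ValueError("need at least one clip prediction")
    return sum(clip_predictions) / len(clip_predictions)


# Toy example: 100 samples at a hypothetical 4 Hz rate, i.e. 25 "seconds" of silence.
toy_audio = [0.0] * 100
toy_clips = segment_with_label(toy_audio, sample_rate=4, score=9, clip_seconds=10.0)
```

With recordings of about 10 minutes, 10-second clips would give roughly 60 labelled clips per recording. Clips from the same patient should probably stay together in either the training or the validation split, since they share a label and a speaker.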
