
Deep Learning for speech proficiency scoring?

Joel Bluhme on 2 Apr 2021
Commented: Brian Hemmat on 3 May 2021
Hi!
I have a dataset of more than 100 recordings, each approx. 10 minutes long, of patients undergoing a speech evaluation interview with a speech therapist. I also have access to the resulting score of each interview, on a scale from 1 to 10. I want to train an initial deep learning network that predicts a patient's score from this dataset. My question is this: is it better to label the entire interview with a 9 if that's the score given to that patient, or would you rather run some sort of speech2text function over the entire interview, so that each interview yields a whole set of pairs, each consisting of a word and the score for the entire interview? Then, when the network is asked to score a new interview, it would run speech2text on that file and match each word with its closest matches.
Best,
Joel
1 Comment
Brian Hemmat on 3 May 2021
Hi Joel,
When you suggest pairs of words and scores, you mean by 'word' either the raw audio or some set of acoustic features, and not the text, correct? speech2text might be useful for segmenting the audio, but I don't think the transcript will retain valuable information such as whether or not stuttering is present.
At what time scale or part of speech does whatever you are evaluating show up? Are you looking for articulation disorders? Fluency disorders? The time scale you need to feed to your system will depend on what you are evaluating. Whatever the rating represents, my suspicion is that you will want to segment the audio into 5-20 second clips, with each segment having the same label as the whole recording.
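To make the segmentation idea concrete, here is a minimal sketch in plain Python of cutting one recording into fixed-length clips that all inherit the interview-level score. The function name `segment_with_labels`, the 10-second default, and the choice to drop a trailing partial clip are assumptions made for illustration, not something prescribed in this thread; the right clip length depends on what the rating actually measures.

```python
def segment_with_labels(samples, sample_rate, interview_score, clip_seconds=10.0):
    """Split one recording into fixed-length clips that share the interview score.

    samples         -- sequence of audio samples for one interview (mono)
    sample_rate     -- samples per second
    interview_score -- the single score (1 to 10) given to the whole interview
    clip_seconds    -- clip length in seconds (hypothetical default; 5-20 s suggested above)
    """
    clip_len = int(round(clip_seconds * sample_rate))
    if clip_len <= 0:
        raise ValueError("clip_seconds * sample_rate must be at least one sample")

    clips = []
    # Step through the recording in non-overlapping windows.
    # A trailing remainder shorter than clip_len is dropped (an assumption).
    for start in range(0, len(samples) - clip_len + 1, clip_len):
        clip = samples[start:start + clip_len]
        clips.append((clip, interview_score))
    return clips


# Toy usage: a 65-second "recording" at 100 samples per second, scored 9.
toy_samples = [0.0] * (65 * 100)
toy_clips = segment_with_labels(toy_samples, sample_rate=100, interview_score=9)
print(len(toy_clips))  # number of 10-second clips obtained
```

Each `(clip, score)` pair can then be turned into raw-audio or acoustic-feature inputs for training. Since clips from one interview share a label, it is probably safer to split training and validation data by patient rather than by clip.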


Answers (0)
