Main Content

unenroll

Unenroll labels

Since R2021a

    Description

    unenroll(ivs) unenrolls all labels and corresponding i-vectors from the i-vector system ivs.

    example

    unenroll(ivs,labels) unenrolls the specified labels and corresponding i-vectors from the i-vector system ivs.

    Examples

    collapse all

    Download the Berlin Database of Emotional Speech [1]. The database contains 535 utterances spoken by 10 actors intended to convey one of the following emotions: anger, boredom, disgust, anxiety/fear, happiness, sadness, or neutral. The emotions are text independent.

    url = "http://emodb.bilderbar.info/download/download.zip";
    downloadFolder = tempdir;
    datasetFolder = fullfile(downloadFolder,"Emo-DB");
    
    if ~exist(datasetFolder,"dir")
        disp("Downloading Emo-DB (40.5 MB) ...")
        unzip(url,datasetFolder)
    end

    Create an audioDatastore that points to the audio files.

    ads = audioDatastore(fullfile(datasetFolder,"wav"));

    The file names are codes indicating the speaker id, text spoken, emotion, and version. The website contains a key for interpreting the code and additional information about the speakers such as gender and age. Create a table with the variables Speaker and Emotion. Decode the file names into the table.

    filepaths = ads.Files;
    emotionCodes = cellfun(@(x)x(end-5),filepaths,"UniformOutput",false);
    emotions = replace(emotionCodes,{'W','L','E','A','F','T','N'}, ...
        {'Anger','Boredom','Disgust','Anxiety','Happiness','Sadness','Neutral'});
    
    speakerCodes = cellfun(@(x)x(end-10:end-9),filepaths,"UniformOutput",false);
    labelTable = table(categorical(speakerCodes),categorical(emotions),VariableNames=["Speaker","Emotion"]);
    summary(labelTable)
    Variables:
    
        Speaker: 535×1 categorical
    
            Values:
    
                03       49   
                08       58   
                09       43   
                10       38   
                11       55   
                12       35   
                13       61   
                14       69   
                15       56   
                16       71   
    
        Emotion: 535×1 categorical
    
            Values:
    
                Anger          127   
                Anxiety         69   
                Boredom         81   
                Disgust         46   
                Happiness       71   
                Neutral         79   
                Sadness         62   
    

    labelTable is in the same order as the files in audioDatastore. Set the Labels property of the audioDatastore to labelTable.

    ads.Labels = labelTable;

    Read a signal from the datastore and listen to it. Display the speaker ID and emotion of the audio signal.

    [audioIn,audioInfo] = read(ads);
    fs = audioInfo.SampleRate;
    sound(audioIn,fs)
    audioInfo.Label
    ans=1×2 table
        Speaker     Emotion 
        _______    _________
    
          03       Happiness
    
    

    Split the datastore into a training set and a test set. Assign two speakers to the test set and the remaining to the training set.

    testSpeakerIdx = ads.Labels.Speaker=="12" | ads.Labels.Speaker=="13";
    adsTrain = subset(ads,~testSpeakerIdx);
    adsTest = subset(ads,testSpeakerIdx);

    Read all the training and testing audio data into cell arrays. If your data can fit in memory, training is usually faster to input cell arrays to an i-vector system rather than datastores.

    trainSet = readall(adsTrain);
    trainLabels = adsTrain.Labels.Emotion;
    testSet = readall(adsTest);
    testLabels = adsTest.Labels.Emotion;

    Create an i-vector system that does not apply speech detection. When DetectSpeech is set to true (the default), only regions of detected speech are used to train the i-vector system. When DetectSpeech is set to false, the entire input audio is used to train the i-vector system. The usefulness of applying speech detection depends on the data input to the system.

    emotionRecognizer = ivectorSystem(SampleRate=fs,DetectSpeech=false)
    emotionRecognizer = 
      ivectorSystem with properties:
    
             InputType: 'audio'
            SampleRate: 16000
          DetectSpeech: 0
               Verbose: 1
        EnrolledLabels: [0×2 table]
    
    

    Call trainExtractor using the training set.

    rng default
    trainExtractor(emotionRecognizer,trainSet, ...
        UBMNumComponents =256, ...
        UBMNumIterations =5, ...
        ...
        TVSRank = 128, ...
        TVSNumIterations =5);
    Calculating standardization factors .....done.
    Training universal background model ........done.
    Training total variability space ........done.
    i-vector extractor training complete.
    

    Copy the emotion recognition system for use later in the example.

    sentimentRecognizer = copy(emotionRecognizer);

    Call trainClassifier using the training set.

    rng default
    trainClassifier(emotionRecognizer,trainSet,trainLabels, ...
        NumEigenvectors =32, ...
        ...
        PLDANumDimensions =16, ...
        PLDANumIterations =10);
    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model .............done.
    i-vector classifier training complete.
    

    Call calibrate using the training set. In practice, the calibration set should be different than the training set.

    calibrate(emotionRecognizer,trainSet,trainLabels)
    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibrating PLDA scorer ...done.
    Calibration complete.
    

    Enroll the training labels into the i-vector system.

    enroll(emotionRecognizer,trainSet,trainLabels)
    Extracting i-vectors ...done.
    Enrolling i-vectors ..........done.
    Enrollment complete.
    

    You can use detectionErrorTradeoff as a quick sanity check on the performance of a multilabel closed-set classification system. However, detectionErrorTradeoff provides information more suitable to open-set binary classification problems, for example, speaker verification tasks.

    detectionErrorTradeoff(emotionRecognizer,testSet,testLabels)
    Extracting i-vectors ...done.
    Scoring i-vector pairs ...done.
    Detection error tradeoff evaluation complete.
    

    For a more detailed view of the i-vector system's performance in a multilabel closed set application, you can use the identify function and create a confusion matrix. The confusion matrix enables you to identify which emotions are misidentified and what they are misidentified as. Use the supporting function plotConfusion to display the results.

    trueLabels = testLabels;
    predictedLabels = trueLabels;
    scorer = "plda";
    for ii = 1:numel(testSet)
        tableOut = identify(emotionRecognizer,testSet{ii},scorer);
        predictedLabels(ii) = tableOut.Label(1);
    end
    
    plotConfusion(trueLabels,predictedLabels)

    Call info to inspect how emotionRecognizer was trained and evaluated.

    info(emotionRecognizer)
    i-vector system input
      Input feature vector length: 60
      Input data type: double
    
    trainExtractor
      Train signals: 439
      UBMNumComponents: 256
      UBMNumIterations: 5
      TVSRank: 128
      TVSNumIterations: 5
    
    trainClassifier
      Train signals: 439
      Train labels: Anger (103), Anxiety (56) ... and 5 more
      NumEigenvectors: 32
      PLDANumDimensions: 16
      PLDANumIterations: 10
    
    calibrate
      Calibration signals: 439
      Calibration labels: Anger (103), Anxiety (56) ... and 5 more
    
    detectionErrorTradeoff
      Evaluation signals: 96
      Evaluation labels: Anger (24), Anxiety (13) ... and 5 more
    

    Next, modify the i-vector system to recognize emotions as positive, neutral, or negative. Update the labels to only include the categories negative, positive, and categorical.

    trainLabelsSentiment = trainLabels;
    trainLabelsSentiment(ismember(trainLabels,categorical(["Anger","Anxiety","Boredom","Sadness","Disgust"]))) = categorical("Negative");
    trainLabelsSentiment(ismember(trainLabels,categorical("Happiness"))) = categorical("Positive");
    trainLabelsSentiment = removecats(trainLabelsSentiment);
    
    testLabelsSentiment = testLabels;
    testLabelsSentiment(ismember(testLabels,categorical(["Anger","Anxiety","Boredom","Sadness","Disgust"]))) = categorical("Negative");
    testLabelsSentiment(ismember(testLabels,categorical("Happiness"))) = categorical("Positive");
    testLabelsSentiment = removecats(testLabelsSentiment);

    Train the i-vector system classifier using the updated labels. You do not need to retrain the extractor. Recalibrate the system.

    rng default
    trainClassifier(sentimentRecognizer,trainSet,trainLabelsSentiment, ...
        NumEigenvectors =64, ...
        ...
        PLDANumDimensions =32, ...
        PLDANumIterations =10);
    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model .............done.
    i-vector classifier training complete.
    
    calibrate(sentimentRecognizer,trainSet,trainLabels)
    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibrating PLDA scorer ...done.
    Calibration complete.
    

    Enroll the training labels into the system and then plot the confusion matrix for the test set.

    enroll(sentimentRecognizer,trainSet,trainLabelsSentiment)
    Extracting i-vectors ...done.
    Enrolling i-vectors ......done.
    Enrollment complete.
    
    trueLabels = testLabelsSentiment;
    predictedLabels = trueLabels;
    scorer = "plda";
    for ii = 1:numel(testSet)
        tableOut = identify(sentimentRecognizer,testSet{ii},scorer);
        predictedLabels(ii) = tableOut.Label(1);
    end
    
    plotConfusion(trueLabels,predictedLabels)

    An i-vector system does not require the labels used to train the classifier to be equal to the enrolled labels.

    Unenroll the sentiment labels from the system and then enroll the original emotion categories in the system. Analyze the system's classification performance.

    unenroll(sentimentRecognizer)
    enroll(sentimentRecognizer,trainSet,trainLabels)
    Extracting i-vectors ...done.
    Enrolling i-vectors ..........done.
    Enrollment complete.
    
    trueLabels = testLabels;
    predictedLabels = trueLabels;
    scorer = "plda";
    for ii = 1:numel(testSet)
        tableOut = identify(sentimentRecognizer,testSet{ii},scorer);
        predictedLabels(ii) = tableOut.Label(1);
    end
    
    plotConfusion(trueLabels,predictedLabels)

    Supporting Functions

    function plotConfusion(trueLabels,predictedLabels)
    uniqueLabels = unique(trueLabels);
    cm = zeros(numel(uniqueLabels),numel(uniqueLabels));
    for ii = 1:numel(uniqueLabels)
        for jj = 1:numel(uniqueLabels)
            cm(ii,jj) = sum((trueLabels==uniqueLabels(ii)) & (predictedLabels==uniqueLabels(jj)));
        end
    end
    
    heatmap(uniqueLabels,uniqueLabels,cm)
    colorbar off
    ylabel('True Labels')
    xlabel('Predicted Labels')
    accuracy = mean(trueLabels==predictedLabels);
    title(sprintf("Accuracy = %0.2f %%",accuracy*100))
    end

    References

    [1] Burkhardt, F., A. Paeschke, M. Rolfes, W.F. Sendlmeier, and B. Weiss, "A Database of German Emotional Speech." In Proceedings Interspeech 2005. Lisbon, Portugal: International Speech Communication Association, 2005.

    Input Arguments

    collapse all

    i-vector system, specified as an object of type ivectorSystem.

    Classification labels used by an i-vector system, specified as one of these:

    • A categorical array

    • A cell array of character vectors

    • A string array

    Data Types: categorical | cell | string

    Version History

    Introduced in R2021a