Label Audio Using Audio Labeler

The Audio Labeler app enables you to interactively define and visualize ground-truth labels for audio datasets. This example shows how you can create label definitions and then interactively label a set of audio files. The example also shows how to export the labeled ground truth data, which you can then use with audioDatastore to train a machine learning system.

Load Unlabeled Data

  1. To open the Audio Labeler, at the MATLAB® command prompt, enter:

    audioLabeler

  2. This example uses the audio files included with Audio Toolbox™. To locate the file path on your system, at the MATLAB command prompt, enter:

    fullfile(matlabroot,'toolbox','audio','samples')

    To load the audio files, click Load > Audio Folders and select the folder containing the audio files you want to label.

Create Label Definitions

Define File-Level Labels

The audio samples include music, speech, and ambience. To create a file-level label that defines the contents of the audio file as music, speech, ambience, or unknown, click the button for adding a label definition in the File Labels panel. Specify the Label Name as Content, the Data Type as categorical, and the Categories as music, speech, ambience, and unknown. Set the Default Value of the label definition to unknown.

All audio files in the Data Browser are now associated with the Content label name. To listen to the audio file selected in the Data Browser and confirm that it is a music file, click the play button. To set the value of the Content label, click unknown in the File Labels panel and select music from the drop-down menu.

The selected audio file now has the label name Content with value music assigned to it. You can continue setting the Content value for each file by selecting a file in the Data Browser and then selecting a value from the File Labels panel.
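
The Content definition described above can also be written out programmatically. The following is a minimal sketch, assuming signalLabelDefinition from Signal Processing Toolbox is available; the variable name contentDef is hypothetical, and the app may store its definitions differently.

```matlab
% Sketch: programmatic counterpart of the Content label definition.
% Assumes Signal Processing Toolbox; contentDef is a hypothetical name.
contentDef = signalLabelDefinition('Content', ...
    'LabelType','attribute', ...          % attribute = one value per file
    'LabelDataType','categorical', ...
    'Categories',["music" "speech" "ambience" "unknown"], ...
    'DefaultValue',"unknown");

% Show the allowed category names.
disp(contentDef.Categories)
```

An attribute label holds a single value for the whole file, which matches the file-level labels created in the app.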

Define Region-Level Labels

You can define region-level labels by clicking the button for adding a label definition in the ROI Labels panel. Create a region-level label that indicates if speech is present. Specify the Label Name as SpeechActivity, the Data Type as logical, and the Default Value as true.

Create another region-level label, this time with Label Name set to VUV, Data Type set to categorical, and categories specified as voiced and unvoiced.
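
As a rough programmatic counterpart, the two region-level definitions might look like the sketch below. It assumes signalLabelDefinition from Signal Processing Toolbox; the variable names speechDef, vuvDef, and roiDefs are hypothetical.

```matlab
% Sketch: region-of-interest (ROI) label definitions matching the app settings.
% Assumes Signal Processing Toolbox; variable names are hypothetical.
speechDef = signalLabelDefinition('SpeechActivity', ...
    'LabelType','roi', ...                % roi = value applies to a time region
    'LabelDataType','logical', ...
    'DefaultValue',true);

vuvDef = signalLabelDefinition('VUV', ...
    'LabelType','roi', ...
    'LabelDataType','categorical', ...
    'Categories',["voiced" "unvoiced"]);

% Collect both definitions in one array.
roiDefs = [speechDef; vuvDef];
```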

Select Rainbow-16-8-mono-114secs.wav from the Data Browser. The file is 114 seconds long. By default, the waveform viewer shows the entire contents of the file. To display tools for zooming and panning, hover over the top right corner of the plot. Zoom in on the first five seconds of the audio file.

Click the zoom control button again to return the cursor to labeling mode. Then, select the part of the signal that corresponds to the first word on the waveform viewer. Hover the cursor over the ROI bar, which is directly to the right of the ROI label. The ROI bar has a one-to-one correspondence with the waveform viewer. When you select a region in the plot and then place your mouse in the ROI bar, the shadow of the region appears. To assign the region the default true value for the SpeechActivity label name, click the shadow. Label the first three regions of speech activity.

Zoom in on the third speech activity region. Label the regions of speech as voiced and unvoiced.
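
Each labeled region is a pair of start and end times on the waveform. The sketch below shows how such limits could be mapped to sample indices. The region times are hypothetical, and the 8 kHz sample rate is an assumption based on the "-8-" in the file name; read the actual rate from the file before relying on it.

```matlab
% Sketch: convert ROI time limits (seconds) to inclusive sample indices.
fs = 8000;                        % assumed sample rate in Hz
roiLimits = [0.50 1.00;           % hypothetical [start end] times, one row per region
             1.25 2.00];

startIdx   = round(roiLimits(:,1)*fs) + 1;   % MATLAB indices start at 1
endIdx     = round(roiLimits(:,2)*fs);
numSamples = endIdx - startIdx + 1;          % samples covered by each region

disp(table(startIdx,endIdx,numSamples))
```

With these indices, a region of an audio vector x could be extracted as x(startIdx(k):endIdx(k)).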

Export Label Definitions

You can export label definitions as a MAT file or as a MATLAB script. Maintaining label definitions enables consistent labeling between users and sessions. Select Export > Label Definitions > To File.

The labels are saved as an array of signalLabelDefinition objects. In your next session, you can import the label definitions by selecting Import > Label Definitions > From File.
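
To illustrate how an array of signalLabelDefinition objects survives a round trip through a MAT file, here is a minimal sketch. It saves and reloads the definitions with save and load rather than through the app, so the variable name defs and the temporary file name are assumptions; a file exported from the app may use a different variable name.

```matlab
% Sketch: save an array of label definitions to a MAT file and load it back.
% Assumes Signal Processing Toolbox; defs and defFile are hypothetical names.
defs = [ ...
    signalLabelDefinition('SpeechActivity','LabelType','roi', ...
        'LabelDataType','logical','DefaultValue',true); ...
    signalLabelDefinition('VUV','LabelType','roi', ...
        'LabelDataType','categorical','Categories',["voiced" "unvoiced"])];

defFile = [tempname '.mat'];      % temporary file for the illustration
save(defFile,'defs');

restored = load(defFile);         % struct with one field per saved variable
disp([restored.defs.Name])        % names of the restored definitions

delete(defFile)                   % clean up the temporary file
```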

Export Labeled Audio Data

You can export the labeled signal set to a file or to your workspace. Select Export > Labels > To Workspace.

The Audio Labeler creates a labeledSignalSet object named labeledSet_HHMMSS, where HHMMSS is the time the object is created in hours, minutes, and seconds.

labeledSet_142356 = 

  labeledSignalSet with properties:

             Source: {29×1 cell}
         NumMembers: 29
    TimeInformation: "inherent"
             Labels: [29×3 table]
        Description: ""

 Use labelDefinitionsHierarchy to see a list of labels and sublabels.
 Use setLabelValue to add data to the set.

The labels you created are saved as a table to the Labels property. To display the table, enter:

labeledSet_142356.Labels

ans =

  29×3 table

                                                                                                         Content     SpeechActivity        VUV    
                                                                                                         ________    ______________    ___________

    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav                ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav           unknown      [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav    unknown      [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Click-16-44p1-mono-0.2secs.wav                  unknown      [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Counting-16-44p1-mono-15secs.wav                speech       [10×2 table]     [7×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Engine-16-44p1-stereo-20sec.wav                 ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\FemaleSpeech-16-8-mono-3secs.wav                speech       [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\FunkyDrums-44p1-stereo-25secs.mp3               music        [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\FunkyDrums-48-stereo-25secs.mp3                 music        [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Heli_16ch_ACN_SN3D.wav                          unknown      [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\JetAirplane-16-11p025-mono-16secs.wav           ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Laughter-16-8-mono-4secs.wav                    speech       [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\MainStreetOne-24-96-stereo-63secs.wav           ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\NoisySpeech-16-22p5-mono-5secs.wav              speech       [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Rainbow-16-8-mono-114secs.wav                   speech       [ 3×2 table]     [2×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RainbowNoisy-16-8-mono-114secs.wav              speech       [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RandomOscThree-24-96-stereo-13secs.aif          music        [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RockDrums-44p1-stereo-11secs.mp3                music        [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RockDrums-48-stereo-11secs.mp3                  music        [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RockGuitar-16-44p1-stereo-72secs.wav            music        [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RockGuitar-16-96-stereo-72secs.flac             unknown      [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\SoftGuitar-44p1_mono-10mins.ogg                 music        [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\SpeechDFT-16-8-mono-5secs.wav                   speech       [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\TrainWhistle-16-44p1-mono-9secs.wav             ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Turbine-16-44p1-mono-22secs.wav                 ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\WashingMachine-16-44p1-stereo-10secs.wav        ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\WashingMachine-16-8-mono-1000secs.wav           ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\WashingMachine-16-8-mono-200secs.wav            ambience     [ 0×2 table]     [0×2 table]
    C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\WaveGuideLoopOne-24-96-stereo-10secs.aif        music        [ 0×2 table]     [0×2 table]
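
Each cell in the SpeechActivity and VUV columns holds a nested table of regions. The sketch below builds a small synthetic labeledSignalSet as a stand-in for the exported set and reads one nested table. The zero-valued signal, the region limits, and the variable names are assumptions for illustration only.

```matlab
% Sketch: read the nested ROI table for one member of a labeled signal set.
% Assumes Signal Processing Toolbox; the data here is synthetic.
speechDef = signalLabelDefinition('SpeechActivity','LabelType','roi', ...
    'LabelDataType','logical','DefaultValue',true);

% One 2-second member sampled at 8 kHz (placeholder signal).
lss = labeledSignalSet({zeros(16000,1)},speechDef,'SampleRate',8000);

% Label a hypothetical region from 0.5 s to 1.0 s as speech.
setLabelValue(lss,1,'SpeechActivity',[0.5 1.0],true);

labelTbl   = lss.Labels;                  % one row per member
speechROIs = labelTbl.SpeechActivity{1};  % nested table of regions
disp(speechROIs)
```

The same indexing pattern applies to the Labels table of the exported set, using the row of the file you are interested in.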

The file names associated with the labels are saved as a cell array to the Source property. To display the file names, enter:

labeledSet_142356.Source

ans =

  29×1 cell array

    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav'            }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav'       }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav'}
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Click-16-44p1-mono-0.2secs.wav'              }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Counting-16-44p1-mono-15secs.wav'            }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Engine-16-44p1-stereo-20sec.wav'             }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\FemaleSpeech-16-8-mono-3secs.wav'            }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\FunkyDrums-44p1-stereo-25secs.mp3'           }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\FunkyDrums-48-stereo-25secs.mp3'             }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Heli_16ch_ACN_SN3D.wav'                      }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\JetAirplane-16-11p025-mono-16secs.wav'       }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Laughter-16-8-mono-4secs.wav'                }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\MainStreetOne-24-96-stereo-63secs.wav'       }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\NoisySpeech-16-22p5-mono-5secs.wav'          }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Rainbow-16-8-mono-114secs.wav'               }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RainbowNoisy-16-8-mono-114secs.wav'          }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RandomOscThree-24-96-stereo-13secs.aif'      }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RockDrums-44p1-stereo-11secs.mp3'            }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RockDrums-48-stereo-11secs.mp3'              }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RockGuitar-16-44p1-stereo-72secs.wav'        }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\RockGuitar-16-96-stereo-72secs.flac'         }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\SoftGuitar-44p1_mono-10mins.ogg'             }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\SpeechDFT-16-8-mono-5secs.wav'               }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\TrainWhistle-16-44p1-mono-9secs.wav'         }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\Turbine-16-44p1-mono-22secs.wav'             }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\WashingMachine-16-44p1-stereo-10secs.wav'    }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\WashingMachine-16-8-mono-1000secs.wav'       }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\WashingMachine-16-8-mono-200secs.wav'        }
    {'C:\Program Files\MATLAB\R2018b\toolbox\audio\samples\WaveGuideLoopOne-24-96-stereo-10secs.aif'    }

Prepare Audio Datastore for Deep Learning Workflow

To continue on to a deep learning or machine learning workflow, use audioDatastore. Using an audio datastore enables you to apply capabilities that are common to machine learning applications, such as splitEachLabel, which enables you to split your data into train and test sets.

Create an audio datastore for your labeled signal set. Specify the location of the audio files as the first argument of audioDatastore and set the Labels property of audioDatastore to the Labels property of the labeled signal set.

ADS = audioDatastore(labeledSet_142356.Source,'Labels',labeledSet_142356.Labels)

ADS = 

  audioDatastore with properties:

                       Files: {
                              ' ...\matlab\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav';
                              ' ...\matlab\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav';
                              ' ...\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav'
                               ... and 26 more
                              }
                      Labels: 29-by-3 table
    AlternateFileSystemRoots: {}
              OutputDataType: 'double'

Call countEachLabel and specify the Content table variable to count the number of files that are labeled as ambience, music, speech, or unknown.

countEachLabel(ADS,'TableVariableName','Content')

ans =

  4×2 table

    Content     Count
    ________    _____

    ambience     10  
    music         9  
    speech        7  
    unknown       3  
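
The counting and splitting steps can be tried on a small synthetic dataset. The following sketch writes eight short silent WAV files to a temporary folder, labels them, and then calls countEachLabel and splitEachLabel. The file names, labels, and 75% split are hypothetical, and the use of the 'TableVariableName' option with splitEachLabel is an assumption to confirm against the audioDatastore documentation for your release.

```matlab
% Sketch: count and split labeled files with audioDatastore (Audio Toolbox).
% All files and labels below are synthetic placeholders.
tmpDir = tempname;
mkdir(tmpDir);

content = ["music" "music" "music" "music" "speech" "speech" "speech" "speech"];
files = cell(numel(content),1);
for k = 1:numel(content)
    files{k} = fullfile(tmpDir,sprintf('clip%02d.wav',k));
    audiowrite(files{k},zeros(800,1),8000);   % 0.1 s of silence at 8 kHz
end

% Labels as a table, mirroring the Labels property of a labeled signal set.
labelTbl = table(categorical(content(:)),'VariableNames',{'Content'});
ads = audioDatastore(files,'Labels',labelTbl);

% Count the files per Content value.
counts = countEachLabel(ads,'TableVariableName','Content');
disp(counts)

% Put 75% of each Content value in the training set, the rest in the test set.
[adsTrain,adsTest] = splitEachLabel(ads,0.75,'TableVariableName','Content');
numTrain = numel(adsTrain.Files);
numTest  = numel(adsTest.Files);

rmdir(tmpDir,'s')                             % remove the temporary files
```

Splitting per label value keeps the class proportions similar in the train and test sets, which is usually preferable to a plain random split on small datasets.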
