

Graph of YAMNet AudioSet ontology



    ygraph = yamnetGraph returns a directed graph of the AudioSet ontology.


    [ygraph,classes] = yamnetGraph also returns a string array of classes supported by YAMNet.

    This function requires both Audio Toolbox™ and Deep Learning Toolbox™.


Examples

    Download and unzip the Audio Toolbox™ support for YAMNet.

    Type yamnetGraph at the Command Window. If the Audio Toolbox support for YAMNet is not installed, then the function provides a link to the download location. To download the model, click the link. Unzip the file to a location on the MATLAB path.

    Alternatively, execute the following commands to download and unzip the YAMNet model to your temporary directory.

    downloadFolder = fullfile(tempdir,'YAMNetDownload');
    loc = websave(downloadFolder,''); % supply the download URL from the link described above
    YAMNetLocation = tempdir;
    unzip(loc,YAMNetLocation)
    addpath(fullfile(YAMNetLocation,'yamnet'))

    Check that the installation is successful by typing yamnetGraph at the Command Window. If the network is installed, then the function returns a digraph object.


    Create a digraph object that describes the AudioSet ontology.

    ygraph = yamnetGraph
    ygraph = 
      digraph with properties:
        Edges: [670×1 table]
        Nodes: [632×1 table]

    Visualize the ontology. The ontology consists of 632 separate classes with 670 connections.

    p = plot(ygraph);

    Get the name of each sound class. If the sound class has no predecessors, identify it as a major category of the ontology.

    nodeNames = ygraph.Nodes.Name;
    topCategories = {};
    for index = 1:numel(nodeNames)
        pre = predecessors(ygraph,nodeNames{index});
        if isempty(pre)
            topCategories{end+1} = nodeNames{index}; %#ok<AGROW>
        end
    end

    Display the categories as an array of strings.

    topCategories = string(topCategories)
    topCategories = 1×7 string
        "Human sounds"    "Animal"    "Music"    "Natural sounds"    "Sounds of things"    "Source-ambiguous sounds"    "Channel, environment and background"

    Highlight and label the top categories on the digraph plot.
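    One way to do this is with the highlight and labelnode functions of the graph plot object. The node color and marker size below are arbitrary choices, not part of the original example.

    highlight(p,topCategories,'NodeColor','red','MarkerSize',8) % mark the root classes
    labelnode(p,topCategories,topCategories)                    % label them with their names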


    Create a digraph object that represents the AudioSet ontology.

    ygraph = yamnetGraph;

    Use dfsearch to perform a depth-first graph search to identify all audio classes under the class Animal.

    animalNodes = dfsearch(ygraph,"Animal");

    Use subgraph to create a new digraph object that contains only the identified audio classes. Plot the resulting directed graph.

    animalGraph = subgraph(ygraph,animalNodes);
    p = plot(animalGraph);
    p.NodeFontSize = 12;
    graphFigure = gcf;
    old = graphFigure.Position;
    graphFigure.Position = [old(1),old(2),old(3)*2,old(4)*2]; % enlarge the figure for readability

    Use predecessors to determine all predecessors to the Growling sound. Highlight the predecessors on the plot.

    preIDs = predecessors(animalGraph,"Growling")
    preIDs = 4×1 string
        "Roaring cats (lions, tigers)"
        "Canidae, dogs, wolves"

    Use highlight to highlight the Growling node and the predecessors on the plot.
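    A minimal sketch, assuming p is the plot handle from above; the colors and marker size are arbitrary choices.

    highlight(p,"Growling",'NodeColor','green','MarkerSize',8) % the class itself
    highlight(p,preIDs,'NodeColor','red','MarkerSize',8)       % its predecessors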


    Create a digraph object that describes the AudioSet ontology. Also return the classes supported by YAMNet. Plot the directed graph.

    [ygraph,classes] = yamnetGraph;
    p = plot(ygraph);

    YAMNet predicts a subset of the full AudioSet ontology. Display the sound classes that are in the AudioSet ontology but are not possible outputs from the YAMNet network.

    audiosetClasses = ygraph.Nodes.Name;
    classDiff = setdiff(audiosetClasses,classes)
    classDiff = 111×1 string
        "Acoustic environment"
        "Alto saxophone"
        "Background noise"
        "Bass (frequency range)"
        "Bass (instrument role)"
        "Battle cry"
        "Birthday music"
        "Brief tone"
        "Cat communication"
        "Cellphone buzz, vibrating alert"
        "Channel, environment and background"
        "Compact disc"
        "Crash cymbal"
        "Deformable shell"
        "Domestic sounds, home sounds"
        "Donkey, ass"

    Highlight the classes that are not possible outputs from YAMNet.
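    For example, assuming p is the plot handle created above (the color choice is arbitrary):

    highlight(p,classDiff,'NodeColor','red') % classes YAMNet cannot output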


    Analyze one of the major categories.

    categoryToAnalyze = "Channel, environment and background";
    subsetNodes = dfsearch(ygraph,categoryToAnalyze);
    ygraphSubset = subgraph(ygraph,subsetNodes);
    classToHighlight = intersect(classDiff,ygraphSubset.Nodes.Name);
    pSub = plot(ygraphSubset);
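    The classToHighlight classes computed above can then be marked on the subset plot, for example:

    highlight(pSub,classToHighlight,'NodeColor','red') % unsupported classes within this category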

    Create a digraph object that describes the AudioSet ontology.

    ygraph = yamnetGraph;

    Specify a sound class to visualize, and specify the number of predecessors and successors to include. The available sound classes are those supported as outputs from YAMNet. If you request more predecessors or successors than exist in the ontology, only those present in the ontology are shown.

    soundClass = "Growling";
    numPredecessors = 3;
    numSuccessors = 0;
    pred = nearest(ygraph,soundClass,numPredecessors,'Direction','incoming');
    suc = nearest(ygraph,soundClass,numSuccessors,'Direction','outgoing');
    subClasses = [soundClass;pred;suc];
    ygraphSub = subgraph(ygraph,unique(subClasses));
    p = plot(ygraphSub);
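    To emphasize the chosen sound class on the plot, one option is the following (color and marker size are arbitrary choices):

    highlight(p,soundClass,'NodeColor','red','MarkerSize',8) % mark the class of interest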

    Output Arguments


    ygraph
    AudioSet ontology graph with directed edges, returned as a digraph object.

    classes
    Classes supported by YAMNet, returned as a string array. The classes supported by YAMNet are a subset of the AudioSet ontology.


    Google® provides a website where you can explore the AudioSet ontology and the corresponding data set.

    References


    [1] Gemmeke, Jort F., et al. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 776–780. doi:10.1109/ICASSP.2017.7952261.

    [2] Hershey, Shawn, et al. “CNN Architectures for Large-Scale Audio Classification.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 131–135. doi:10.1109/ICASSP.2017.7952132.

    Introduced in R2020b