predict

Predict entities using named entity recognition (NER) model

Since R2023a

    Description

    The predict function detects named entities in text using an hmmEntityModel object.

    To add entity details to documents using a custom NER model, use addEntityDetails and set the Model option to the custom model.

    tbl = predict(mdl,documents) predicts the named entities of the tokens in the specified documents using the NER model mdl.

    Examples

    Load the trained example hmmEntityModel object.

    load exampleEntityModel
    mdl
    mdl = 
      hmmEntityModel with properties:
    
        Entities: [3x1 categorical]
    
    

    Create a tokenized document object of text data.

    str = "MathWorks develops MATLAB and Simulink.";
    document = tokenizedDocument(str);

    Make predictions using the predict function.

    tbl = predict(mdl,document)
    tbl=6×2 table
           Token           Entity    
        ___________    ______________
    
        "MathWorks"    B-organization
        "develops"     non-entity    
        "MATLAB"       B-product     
        "and"          non-entity    
        "Simulink"     B-product     
        "."            non-entity    
    
    

    Input Arguments

    mdl

    Custom NER model, specified as an hmmEntityModel object. To train a custom NER model, use the trainHMMEntityModel function.

    For an example, see Train Custom Named Entity Recognition Model.

    documents

    Input documents, specified as a tokenizedDocument array.

    Output Arguments

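    tbl

    Predicted entities, returned as a table. Each row corresponds to a token: the Token variable contains the token text and the Entity variable contains the predicted entity tag.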

    Algorithms

    Inside, Outside, Beginning (IOB) Labeling Schemes

    The inside, outside (IO) labeling scheme tags non-entity tokens with "O" (outside) and prefixes the tag of each entity token with "I-" (inside), which denotes that the token is part of an entity.

    A limitation of the IO labeling scheme is that it cannot mark the boundary between adjacent entities of the same type. The inside, outside, beginning (IOB) labeling scheme, also known as the beginning, inside, outside (BIO) labeling scheme, addresses this limitation by introducing a "beginning" prefix.
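    To see the limitation concretely, here is a minimal sketch in plain MATLAB (no toolbox functions) that decodes hypothetical IO tags for the phrase "the Paris London route", in which "Paris" and "London" are two separate location entities. The tokens, tags, and variable names are made up for illustration, and grouping runs of identical "I-" tags is just one straightforward way to read IO tags.

```matlab
% Hypothetical tokens for "the Paris London route", where "Paris" and
% "London" are two separate location entities.
tokens = {'the', 'Paris', 'London', 'route'};
% Under IO, both entity tokens get the same tag, so nothing marks
% where "Paris" ends and "London" begins.
ioTags = {'O', 'I-location', 'I-location', 'O'};

% Decode IO tags: consecutive tokens that share the same "I-" tag are
% joined into one entity.
ioEntities = {};
for i = 1:numel(ioTags)
    if ~strncmp(ioTags{i}, 'I-', 2)
        continue                         % "O" token: not part of an entity
    end
    if i > 1 && strcmp(ioTags{i}, ioTags{i-1})
        % Same tag as the previous token: extend the current entity
        ioEntities{end} = [ioEntities{end} ' ' tokens{i}];
    else
        % Previous token is "O" or has another tag: start a new entity
        ioEntities{end+1} = tokens{i};
    end
end
disp(ioEntities)
```

    Because both entity tokens carry the tag I-location, the decoder returns the single entity 'Paris London'. Tagging "London" as B-location instead would keep the two entities apart.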

    There are two variants of the IOB labeling scheme: IOB1 and IOB2.

    IOB2 Labeling Scheme

    For each token in an entity, the tag is prefixed with one of these values:

    • "B-" (beginning) — The token is a single-token entity or the first token of a multi-token entity.

    • "I-" (inside) — The token is a subsequent token of a multi-token entity.

    For a list of tokens Token with entity tags Entity, the IOB2 labeling scheme identifies the boundaries between entities, including adjacent entities of the same type, using this logic:

    • If Entity(i) has prefix "B-" and either Entity(i+1) is "O" or has prefix "B-", or Token(i) is the last token, then Token(i) is a single-token entity.

    • If Entity(i) has prefix "B-", each of Entity(i+1), ..., Entity(N) has prefix "I-", and either Entity(N+1) is "O" or has prefix "B-", or Token(N) is the last token, then the phrase Token(i:N) is a multi-token entity.
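    The rules above can be sketched as a short loop. This is an illustrative decoder in plain MATLAB, not the implementation that predict uses. The tokens, tags, and variable names are hypothetical, and the sketch skips any "I-" tag that does not continue the previous entity, a case the rules above do not cover.

```matlab
% Hypothetical tokens and IOB2 tags for "the Paris London route to New York"
tokens = {'the', 'Paris', 'London', 'route', 'to', 'New', 'York'};
tags   = {'O', 'B-location', 'B-location', 'O', 'O', 'B-location', 'I-location'};

entities = {};      % text of each decoded entity
entityTypes = {};   % type of each decoded entity, e.g. 'location'
inEntity = false;   % true while the previous token belongs to entities{end}
for i = 1:numel(tags)
    tag = tags{i};
    if strncmp(tag, 'B-', 2)
        % "B-" always starts a new entity, even directly after an entity
        % of the same type
        entities{end+1} = tokens{i};
        entityTypes{end+1} = tag(3:end);
        inEntity = true;
    elseif strncmp(tag, 'I-', 2) && inEntity && strcmp(entityTypes{end}, tag(3:end))
        % "I-" of the same type continues the current entity
        entities{end} = [entities{end} ' ' tokens{i}];
    else
        % "O", or an "I-" tag that does not continue the previous entity
        % (ill-formed under IOB2; this sketch skips it)
        inEntity = false;
    end
end
disp(entities)
```

    For these tags, entities contains 'Paris', 'London', and 'New York': the two adjacent B-location tags produce two separate entities, and B-location followed by I-location produces one two-token entity.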

    IOB1 Labeling Scheme

    The IOB1 labeling scheme does not use the prefix "B-" when an entity token follows an "O" tag. In this case, an entity token that is the first token in a list or that follows a non-entity token is implied to be the first token of an entity. That is, if Entity(i) has prefix "I-" and either i is equal to 1 or Entity(i-1) is "O", then Token(i) is a single-token entity or the first token of a multi-token entity.
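    As a minimal sketch, the rule above can be applied to convert IOB1 tags to IOB2 by rewriting each "I-" tag that begins an entity as "B-". The tags are hypothetical. The sketch also treats an "I-" tag that follows an entity of a different type as a beginning; that extra case is an assumption based on the common definition of IOB1 and is not stated above.

```matlab
% Hypothetical IOB1 tags for "the Paris London route to New York".
% "London" needs "B-" because it directly follows another location,
% while "Paris" and "New" use "I-" because they follow "O".
iob1 = {'O', 'I-location', 'B-location', 'O', 'O', 'I-location', 'I-location'};

% Convert to IOB2 by rewriting every "I-" tag that begins an entity as "B-".
iob2 = iob1;
for i = 1:numel(iob1)
    tag = iob1{i};
    if ~strncmp(tag, 'I-', 2)
        continue                       % "O" and "B-" tags are unchanged
    end
    if i == 1
        startsEntity = true;           % first token in the list
    else
        prev = iob1{i-1};
        % Starts an entity if the previous token is "O" or (assumption,
        % common IOB1 behavior) belongs to an entity of a different type
        startsEntity = strcmp(prev, 'O') || ~strcmp(prev(3:end), tag(3:end));
    end
    if startsEntity
        iob2{i} = ['B-' tag(3:end)];
    end
end
disp(iob2)
```

    Here iob2 becomes {'O', 'B-location', 'B-location', 'O', 'O', 'B-location', 'I-location'}: "Paris" and "New" gain the "B-" prefix, and "York" keeps "I-" because it continues the entity that starts at "New".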

    Alternative Functionality

    To add entity details to documents using a custom NER model, use addEntityDetails and set the Model option to the custom model.

    Version History

    Introduced in R2023a