The inside, outside (IO) labeling scheme tags each token either with "O" or with a tag that has the prefix "I-". The tag "O" (outside) denotes non-entities. For each token in an entity, the tag is prefixed with "I-" (inside), which denotes that the token is part of an entity.
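As an illustration, IO tagging can be sketched in a few lines of Python. The function name `io_tags`, the `(start, end, entity_type)` span format with an exclusive end, and the entity types "PER" and "LOC" are assumptions made for this example; they do not come from a particular library.

```python
def io_tags(num_tokens, spans):
    """Return IO tags for a sequence of num_tokens tokens.

    spans is a list of (start, end, entity_type) tuples with 0-based
    start and exclusive end (a hypothetical format for this sketch).
    """
    tags = ["O"] * num_tokens          # every token starts as a non-entity
    for start, end, entity_type in spans:
        for i in range(start, end):
            tags[i] = "I-" + entity_type  # token is inside an entity
    return tags


tokens = ["Mary", "moved", "to", "New", "York"]
spans = [(0, 1, "PER"), (3, 5, "LOC")]
tags = io_tags(len(tokens), spans)
print(list(zip(tokens, tags)))
```

Every token inside an entity receives the same "I-" tag, regardless of its position within the entity.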
A limitation of the IO labeling scheme is that it does not specify entity boundaries between
adjacent entities of the same type. The inside, outside, beginning
(IOB) labeling scheme, also known as the beginning, inside, outside
(BIO) labeling scheme, addresses this limitation by introducing a "beginning" prefix.
There are two variants of the IOB labeling scheme: IOB1 and IOB2.
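The boundary limitation of the IO scheme can be seen in a short sketch. The decoder below is a hypothetical helper, not a library function: it groups each maximal run of identical "I-" tags into one entity, which is all the information that IO tags provide.

```python
from itertools import groupby


def decode_io(tokens, tags):
    """Group each maximal run of identical non-"O" tags into one entity."""
    entities = []
    for tag, run in groupby(zip(tags, tokens), key=lambda pair: pair[0]):
        words = [token for _, token in run]
        if tag != "O":
            entities.append((tag[2:], " ".join(words)))  # strip the "I-" prefix
    return entities


# "Alice" and "Bob" are two different people, but their tags are identical.
tokens = ["tell", "Alice", "Bob", "called"]
tags = ["O", "I-PER", "I-PER", "O"]
print(decode_io(tokens, tags))  # [('PER', 'Alice Bob')]
```

Because the two adjacent entities share one type, the decoder cannot separate them and reports a single two-token entity.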
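IOB2 Labeling Scheme
For each token in an entity, the tag is prefixed with one of these values: "B-" (beginning), which denotes a single-token entity or the first token of a multi-token entity, or "I-" (inside), which denotes a subsequent token of a multi-token entity.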
For a list of entity tags Entity, the IOB labeling scheme helps identify boundaries between adjacent entities of the same type by using this logic:
If Entity(i) has prefix "B-" and Entity(i+1) is "O" or has prefix "B-", then Token(i) is a single-token entity.
If Entity(i) has prefix "B-", Entity(i+1), ..., Entity(N) have prefix "I-", and Entity(N+1) is "O" or has prefix "B-", then the phrase Token(i:N) is a multi-token entity.
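The two rules above can be sketched as a small decoder. This is a minimal illustration under stated assumptions: `decode_iob2` is a hypothetical name, indices are 0-based (the text uses 1-based indices), the end of the list is treated like an "O" tag, and the tags are assumed to be well formed, so the sketch does not check that an "I-" tag has the same entity type as the preceding "B-" tag.

```python
def decode_iob2(tokens, tags):
    """Extract (entity_type, text) pairs from IOB2 tags."""
    entities = []
    for i, tag in enumerate(tags):
        if not tag.startswith("B-"):
            continue  # only a "B-" tag opens an entity
        n = i
        # Extend the entity while the next tag has prefix "I-".
        while n + 1 < len(tags) and tags[n + 1].startswith("I-"):
            n += 1
        # n == i gives a single-token entity, n > i a multi-token entity.
        entities.append((tag[2:], " ".join(tokens[i:n + 1])))
    return entities


tokens = ["tell", "Alice", "Bob", "Smith", "called"]
tags = ["O", "B-PER", "B-PER", "I-PER", "O"]
print(decode_iob2(tokens, tags))  # [('PER', 'Alice'), ('PER', 'Bob Smith')]
```

Unlike IO tags, the second "B-PER" tag marks where one person entity ends and the next begins.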
IOB1 Labeling Scheme
The IOB1 labeling scheme does not use the prefix "B-" when an entity token follows an "O" tag. In this case, an entity token that is the first token in a list or follows a non-entity token implies that the entity token is the first token of an entity. That is, if Entity(i) has prefix "I-" and i is equal to 1 or Entity(i-1) is "O", then Token(i) is a single-token entity or the first token of a multi-token entity.
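One way to make the implied entity starts explicit is to convert IOB1 tags to IOB2 tags. The sketch below is a hypothetical helper, not a library function. In addition to the rule stated above, it assumes that an "I-" tag following a tag of a different entity type also starts a new entity; this follows the common IOB1 convention and is not stated in the text above.

```python
def iob1_to_iob2(tags):
    """Convert IOB1 tags to IOB2 tags by marking every entity start with "B-"."""
    converted = []
    previous = "O"  # treat the start of the list like a non-entity tag
    for tag in tags:
        if tag.startswith("I-"):
            # An "I-" tag starts an entity after "O" (or at the list start).
            # Assumption: it also starts one after a different entity type.
            starts_entity = previous == "O" or previous[2:] != tag[2:]
            converted.append(("B-" if starts_entity else "I-") + tag[2:])
        else:
            converted.append(tag)  # "O" and explicit "B-" tags are unchanged
        previous = tag
    return converted


iob1 = ["I-PER", "B-PER", "I-PER", "O", "I-LOC"]
print(iob1_to_iob2(iob1))  # ['B-PER', 'B-PER', 'I-PER', 'O', 'B-LOC']
```

After conversion, the tags can be read with the IOB2 logic, where every entity begins with a "B-" tag.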