
embed

Map document to embedding vector

Since R2024a

    Description


    M = embed(emb,documents) returns the embedding vectors of documents in the embedding emb.

    M = embed(emb,documents,Name=Value) returns the embedding vectors with additional options specified by one or more name-value arguments.

    Examples


    Load the pretrained document embedding all-MiniLM-L6-v2 using the documentEmbedding function. This model requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package. If this support package is not installed, then the function provides a download link.

    emb = documentEmbedding;

    Create an array of input documents.

    documents = [
        "the quick brown fox jumped over the lazy dog"
        "the fast brown fox jumped over the lazy dog"
        "the lazy dog sat there and did nothing"];

    Map the input documents to vectors using the embed function.

    embeddedDocuments = embed(emb,documents);

    To estimate how similar the documents are, compute the pairwise cosine similarities using cosineSimilarity.

    similarities = cosineSimilarity(embeddedDocuments)
    similarities = 3×3
    
        1.0000    0.9840    0.5505
        0.9840    1.0000    0.5524
        0.5505    0.5524    1.0000
    
    

    Input Arguments


    emb — Input document embedding
    documentEmbedding object

    Input document embedding, specified as a documentEmbedding object.

    documents — Input documents
    tokenizedDocument array | string array | cell array of character vectors

    Input documents, specified as a tokenizedDocument array, a string array of documents, or a cell array of character vectors. If documents is a string array, then each string represents a document. If documents is a cell array of character vectors, then each character vector represents a document.
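    For example, this minimal sketch (assuming the all-MiniLM-L6-v2 support package is installed) embeds a tokenizedDocument array; string array input works the same way.

    % Tokenize two short documents and map them to embedding vectors.
    emb = documentEmbedding;
    str = [
        "an example of a short document"
        "a second short document"];
    documents = tokenizedDocument(str);
    M = embed(emb,documents);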

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: embed(emb,documents,MiniBatchSize=64) embeds the specified documents using mini-batches of size 64.

    MiniBatchSize — Mini-batch size
    positive integer

    Mini-batch size to use for embedding, specified as a positive integer. Larger mini-batch sizes require more memory, but can lead to faster results.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
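    For example, this sketch (reusing emb and documents from the earlier example) embeds the documents with a larger mini-batch; the value 128 is illustrative, and a suitable choice depends on available memory.

    % Trade memory for throughput by increasing the mini-batch size.
    embeddedDocuments = embed(emb,documents,MiniBatchSize=128);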

    Acceleration — Performance optimization
    "auto" | "mex" | "none"

    Performance optimization, specified as one of these values:

    • "auto" — Automatically apply a number of optimizations that are suitable for the input network and hardware resources.

    • "mex" — Compile and execute a MEX function. This option is available only when you use a GPU. Using a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information about supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). If Parallel Computing Toolbox or a suitable GPU is not available, then the software returns an error.

    • "none" — Disable all acceleration.

    When you use the "auto" or "mex" option, the software can offer performance benefits at the expense of an increased initial run time. Subsequent calls to the function are typically faster. Use performance optimization when you call the function multiple times using new input data.

    When Acceleration is "mex", the software generates and executes a MEX function based on the model and parameters you specify in the function call. A single model can have several associated MEX functions at one time. Clearing the model variable also clears any MEX functions associated with that model.

    When Acceleration is "auto", the software does not generate a MEX function.

    The "mex" option is available only when you use a GPU. You must have a C/C++ compiler installed and the GPU Coder™ Interface for Deep Learning support package. Install the support package using the Add-On Explorer in MATLAB®. For setup instructions, see MEX Setup (GPU Coder). GPU Coder is not required.

    MATLAB Compiler™ software does not support compiling models when you use the "mex" option.
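    For example, this sketch requests MEX acceleration; it assumes a supported GPU, a C/C++ compiler, and the GPU Coder Interface for Deep Learning support package are available.

    % Compile and execute a MEX function for repeated calls on a GPU.
    embeddedDocuments = embed(emb,documents,Acceleration="mex");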

    Hardware resource, specified as one of these values:

    • "auto" — Use a GPU if one is available. Otherwise, use the CPU.

    • "gpu" — Use the GPU. Using a GPU requires a Parallel Computing Toolbox license and a supported GPU device. For information about supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). If Parallel Computing Toolbox or a suitable GPU is not available, then the software returns an error.

    • "cpu" — Use the CPU.

    Output Arguments


    M — Document embedding vectors
    matrix

    Document embedding vectors, returned as an N1-by-N2 matrix, where N1 is the number of documents in documents, N2 is the dimension of the embedding, and M(i,:) is the embedding vector for the ith document in documents.
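    As a quick check, continuing the earlier example with three input documents, each row of the returned matrix is one document's embedding vector:

    [numDocuments,embeddingDimension] = size(embeddedDocuments);
    % numDocuments is 3 for the earlier example; embeddingDimension is the
    % length of each document's embedding vector.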

    Version History

    Introduced in R2024a