distanceProfile

Compute the distance profile between a query subsequence and all other subsequences of a time series

Since R2024b

Description

Return Distance Profile

DP = distanceProfile(X,len,loc) returns the distance profile of a specified query subsequence within the time series X. The distance profile is a vector or matrix of z-normalized Euclidean distances between the query and every subsequence in X with the same length len. A small distance indicates that the query and the subsequence are a close match. A large distance indicates that the query and the subsequence are very different.

  • If X is a vector, then distanceProfile treats it as a single channel.

  • If X is a matrix, then you can specify whether distanceProfile computes the distance profile for each time series channel individually or cumulatively by specifying Type. For more information, see Type.

The query begins at the time series position loc. The definition of the query subsequence depends on whether X is a vector or a matrix:

  • Vector — X(loc:loc+len-1)

  • Matrix with k columns — X(loc:loc+len-1,k).
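
To make the definition concrete, the following is a minimal base-MATLAB sketch of a z-normalized Euclidean distance profile for a single channel. It is an illustration under stated assumptions, not the toolbox implementation: the synthetic signal, the helper znorm, the variable DPsketch, and the use of the population standard deviation are all choices made for this sketch.

```matlab
% Illustrative sketch only; distanceProfile may compute this differently.
rng(0)                                        % reproducible synthetic data
X = sin(2*pi*(1:200)'/25) + 0.1*randn(200,1); % hypothetical single-channel series
len = 25;                                     % subsequence length
loc = 40;                                     % query starting position

% z-normalize a column vector (population standard deviation is an assumption)
znorm = @(v) (v - mean(v)) ./ std(v,1);

q = znorm(X(loc:loc+len-1));                  % normalized query subsequence
m = numel(X) - len + 1;                       % number of full-length subsequences
DPsketch = zeros(m,1);
for i = 1:m
    s = znorm(X(i:i+len-1));                  % normalized candidate subsequence
    DPsketch(i) = norm(q - s);                % Euclidean distance to the query
end
```

In this sketch, DPsketch(loc) is zero because the query matches itself exactly, which is the trivial match that an exclusion zone is meant to suppress.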

[DP,DPI] = distanceProfile(___) also returns the vector DPI, which contains the starting indices of the subsequences that best match the query subsequence.

[___] = distanceProfile(___,Name=Value) specifies additional options using one or more name-value arguments. For example, to exclude matches that are near the query starting position in the time series, set ExcludeTrivialMatch to true.

Plot Distance Profile

distanceProfile(___), called with no output arguments, creates an interactive plot of the distance profile, with overlays for the query, the motif (the best match to the query), and the discord (the worst match to the query). You can move the vertical selection lines in the plot to find the top motif and discord of any other subsequence in the time series.

You can use this syntax with any of the previous input argument combinations.

Examples

Load the data, which consists of T1, a timetable containing armature current measurements on a degrading DC motor.

load matrix_profile_data T1

T1 contains a known anomalous segment with length 100, starting at location 9797. Use this segment as the query subsequence.

X = T1.MotorCurrent;
len = 100;
loc = 9797;

Calculate the distance profile.

[DP,DPI] = distanceProfile(X,len,loc);

Display the first two elements of the index vector DPI and the corresponding distances in DP.

DPI(1:2)
ans = 2×1

        2617
        9368

DP(DPI(1)),DP(DPI(2))
ans = 
8.3894
ans = 
8.4532

For comparison, display the value of the largest distance.

max(DP)
ans = 
18.4828

Plot the distance profile.

distanceProfile(X,len,loc);

[Figure: three axes. "Time Series" (x: Time, y: Data) shows the data for channel 1 with the query (i=9797), motif (i=2617), and discord (i=567) marked. "Distance Profile" (x: Time, y: Distance) shows the distance for channel 1 and the exclusion zone. "Subsequences" (x: Time, y: Data) overlays the query, motif, and discord.]

The function generates three plots.

  • The top plot shows the time series data. The query appears at location 9797. The motif, or best match to the query, occurs at location 2617. The discord, or worst match to the query, occurs at location 567.

  • The middle plot shows the distance profile of the time series, with an exclusion zone around the query location.

  • The bottom plot shows an isolated comparison of the query, motif, and discord subsequences together. The query and motif subsequences are hard to distinguish from each other. The discord subsequence is significantly different.

Move the vertical selection lines to find the top motif and discord of any other data segments in the time series.

The distanceProfile plot displays only the top match. To view more matches, you can extract, plot, and compare subsequences using the values in X and DPI.
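
One possible way to compare several matches is sketched here. It repeats the setup from this example so that it can run on its own, and it assumes that X is a column vector and that DPI is a column vector of starting indices. The number of matches k and the variables idx, S, and Q are illustrative choices, not part of the function interface.

```matlab
load matrix_profile_data T1              % data set used in this example
X = T1.MotorCurrent;
len = 100;
loc = 9797;
[~,DPI] = distanceProfile(X,len,loc);

k = 3;                                   % number of matches to compare (arbitrary)
idx = DPI(1:k).' + (0:len-1).';          % len-by-k matrix of sample indices
S = X(idx);                              % each column is one matching subsequence
Q = X(loc:loc+len-1);                    % query subsequence

plot(normalize([Q S]))                   % z-score each column so shapes are comparable
legend(["Query" "Match " + (1:k)])
```

Normalizing each column before plotting mirrors the z-normalized distance, so subsequences with similar shape but different offset or scale appear close together.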

Input Arguments

X: Time series to evaluate, specified as a numeric vector of length n or a numeric matrix containing multiple columns of length n. X must not have any missing data.

len: Length of the query subsequence, specified as an integer. len must be less than the length n of the time series.

loc: Starting position of the query subsequence, specified as an integer. loc must be less than the length n of the time series.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: DP = distanceProfile(X,10,20,ExcludeTrivialMatch=true) excludes subsequence matches near the query subsequence starting position of 20.

ExcludeTrivialMatch: Option to set an exclusion zone around the starting position loc of the query subsequence, specified as true or false. Setting this option to true excludes matches of the query subsequence with itself.

ExclusionZoneLength: Length of the exclusion zone on either side of the query starting position loc, specified as the number of data points to exclude. When ExcludeTrivialMatch is true, the function sets the values of DP within the exclusion zone to NaN.
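
The masking described above can be pictured with a short base-MATLAB sketch. The distance values, the zone length zoneLen, and the choice to treat both zone boundaries as inclusive are assumptions for illustration; the exact boundaries that distanceProfile uses are not stated here.

```matlab
% Illustrative sketch only; values and boundary handling are assumptions.
DP  = [4.1 3.0 0.9 0 1.1 2.7 3.8 4.4]';  % hypothetical distance profile values
loc = 4;                                 % query starting position
zoneLen = 2;                             % hypothetical exclusion zone length

first = max(1, loc - zoneLen);           % clamp the zone to the start of DP
last  = min(numel(DP), loc + zoneLen);   % clamp the zone to the end of DP
DP(first:last) = NaN;                    % mask trivial matches around the query
```

After masking, the zero distance at loc and its near neighbors no longer compete for the best match.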

EndPoints: Method for handling query windows near the endpoints of X, specified as one of these options:

  • "discard" — Truncate the length of the output vectors DP and DPI to n – len + 1, where n is the length of X.

  • "fill" — Extend the length of DP and DPI to n by padding DP with len – 1 NaNs. The software sets the last len – 1 elements of the vector DPI to the sequence n-len+2:n.
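
The two output lengths can be checked with a small base-MATLAB sketch. The values of n and len and the placeholder distances are hypothetical, and the padding is applied by hand here only to illustrate the sizes described above.

```matlab
% Illustrative sketch only; placeholder values, padding applied manually.
n = 10;                                  % hypothetical time series length
len = 4;                                 % hypothetical subsequence length

DPdiscard = (1:n-len+1)'/10;             % placeholder distances, length n-len+1
DPfill = [DPdiscard; NaN(len-1,1)];      % pad with len-1 NaN values to length n
tailIdx = (n-len+2:n)';                  % indices assigned to the padded tail of DPI
```

With these numbers, the "discard" form has 7 elements, the "fill" form has 10, and the padded tail covers positions 8 through 10.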

Type: Computation type when X is a matrix, specified as one of these options:

  • "individual" — Compute the distance profile of each channel separately.

  • "cumulative" — Combine the distance profiles of each channel using the cumulative average of sorted distance profile values.
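
One plausible reading of the "cumulative" option is sketched below in base MATLAB: at each subsequence position, sort the per-channel distances in ascending order and take the running mean. This interpretation, the matrix D, and the variables Ds and C are assumptions for illustration; the function may combine channels differently.

```matlab
% Illustrative sketch only; this interpretation of "cumulative" is an assumption.
D = [0.2 1.5 0.9;                        % hypothetical per-channel distances,
     2.0 0.4 1.2;                        % one row per subsequence position,
     1.1 1.0 3.0];                       % one column per channel

Ds = sort(D,2);                          % ascending across channels in each row
C  = cumsum(Ds,2) ./ (1:size(D,2));      % column k: mean of the k smallest distances
```

Under this reading, the first column of C is the best single-channel distance at each position, and the last column is the mean over all channels.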

Output Arguments

DP: Distance profile containing the z-normalized Euclidean distances between the query subsequence of time series X and each subsequence of X with the same length len, returned as a numeric vector.

  • The length of DP is equal to n or n – len + 1, depending on the setting for EndPoints. Here, n is the length of X.

When ExcludeTrivialMatch is true, elements of DP near the query starting location loc are set to NaN, with the number of elements determined by the value of ExclusionZoneLength.

DPI: Starting indices for the subsequences X(DPI(k):DPI(k)+len-1) of X that best match the query subsequence X(loc:loc+len-1), returned as an integer vector.

The elements of DPI are ordered so that DP(DPI) is sorted in ascending order of distance, that is, from the best match (smallest distance) to the worst match (largest distance). The best match therefore starts at DPI(1) and has distance DP(DPI(1)), and the worst match starts at DPI(n-len+1) and has distance DP(DPI(n-len+1)).
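
The relationship between DP and DPI resembles the index output of the MATLAB sort function, as this base-MATLAB sketch shows. The distance values are hypothetical, and the placement of NaN entries at the end follows the default behavior of sort; whether distanceProfile orders NaN entries the same way is an assumption.

```matlab
% Illustrative sketch only; DP values are hypothetical.
DP = [2.5 NaN 0.7 3.1 1.9]';             % NaN marks an excluded position
[~,DPI] = sort(DP);                      % ascending order; sort places NaN last

bestStart = DPI(1);                      % starting index of the best match
bestDist  = DP(bestStart);               % distance of the best match
```

Here DPI is [3;5;1;4;2], so the best match starts at index 3 with a distance of 0.7.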

References

[1] Yeh, Chin-Chia Michael, et al. “Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets.” 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 1317–22. https://doi.org/10.1109/ICDM.2016.0179.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

Version History

Introduced in R2024b