outlierRemoverComponent

Pipeline component for removing outlier values

Since R2026a

    Description

    outlierRemoverComponent is a pipeline component that removes outliers. The pipeline component uses the functionality of the rmoutliers function during the learn phase to identify and remove outlier values for a set of observations. During the run phase, the component uses the values learned during the learn phase to remove outlier values in a new data set.
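    The split between the learn phase and the run phase can be illustrated without the component itself. The following is a minimal sketch, not the component's internal implementation: it assumes that "learning" amounts to storing the thresholds returned by rmoutliers and that "running" amounts to applying those stored thresholds to new data. The variable names are illustrative only.

    ```matlab
    % Learn phase (sketch): remove outliers and keep the thresholds.
    trainData = [10; 11; 9; 10; 12; 10; 11; 100];
    [cleanTrain,~,~,lowerThr,upperThr,centerVal] = rmoutliers(trainData);  % default "median" method

    % Run phase (sketch): reuse the stored thresholds on new data
    % instead of recomputing them from the new data.
    newData = [10; 50; 11; -3];
    isOut = newData < lowerThr | newData > upperThr;
    cleanNew = newData(~isOut);
    ```

    In this sketch, the value 100 is removed from trainData, and the values 50 and -3 are removed from newData because they fall outside the thresholds derived from trainData.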

    Creation

    Description

    component = outlierRemoverComponent creates a pipeline component for removing outlier values.

    component = outlierRemoverComponent(Name=Value) sets writable Properties using one or more name-value arguments. For example, you can specify the outlier detection method by using the Method name-value argument.

    Properties

    Structural Parameters

    The software sets structural parameters when you create the component. You cannot modify structural parameters after the component is created.

    NumDataFlow

    This property is read-only after the component is created.

    Number of data flow tags to include in the component, specified as a positive integer scalar. NumDataFlow determines the number of nonzero elements in InputTags and OutputTags. For example, if NumDataFlow=3, then InputTags=[1 2 3] and OutputTags=[1 2 3 0]. The 0 output tag corresponds to the logical output argument that indicates which observations have outlier values.

    Example: c = outlierRemoverComponent(NumDataFlow=1)

    Data Types: single | double

    ReferenceInput

    This property is read-only after the component is created.

    Index of the data argument passed to learn that is used to detect outliers, specified as a positive integer scalar. For example, if ReferenceInput=3, then the software finds outliers in the third data argument.

    Example: c = outlierRemoverComponent(ReferenceInput=2)

    Data Types: single | double

    Learn Parameters

    The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.

    Method

    Outlier detection method, specified as one of the following values.

    Value            Description
    "gesd"           For each variable, find outliers by using the generalized extreme Studentized deviate test for outliers. Use ThresholdFactor to specify the alpha value for the test.
    "grubbs"         For each variable, find outliers by using Grubbs’ test, which removes one outlier per iteration based on hypothesis testing. Use ThresholdFactor to specify the alpha value for the test.
    "mean"           For each variable, outliers are values more than a certain number of standard deviations from the mean. Use ThresholdFactor to specify the number of standard deviations.
    "median"         For each variable, outliers are values more than a certain number of scaled median absolute deviations (MAD) from the median. Use ThresholdFactor to specify the number of scaled MAD.
    "percentiles"    For each variable, outliers are values below the lower threshold or above the upper threshold, as specified by Threshold.
    "quartiles"      For each variable, outliers are values more than a certain number of interquartile ranges below the lower quartile (25 percent) or above the upper quartile (75 percent). Use ThresholdFactor to specify the number of interquartile ranges.

    For more information, see the method argument of the rmoutliers function.

    Example: c = outlierRemoverComponent(Method="mean")

    Example: c.Method = "quartiles"

    Data Types: char | string
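    To get a feel for how the methods differ, you can compare them on a small vector by using the isoutlier function, which accepts the same method names as rmoutliers. This sketch assumes that the component flags the same values as these functions do with default threshold factors; the data is made up for illustration.

    ```matlab
    % One extreme value (50) among otherwise evenly spaced values.
    x = [1 2 3 4 5 6 7 8 9 50]';

    nMedian    = nnz(isoutlier(x,"median"));     % more than 3 scaled MAD from the median
    nMean      = nnz(isoutlier(x,"mean"));       % more than 3 standard deviations from the mean
    nQuartiles = nnz(isoutlier(x,"quartiles"));  % more than 1.5 IQR beyond the quartiles

    fprintf("median: %d, mean: %d, quartiles: %d\n",nMedian,nMean,nQuartiles)
    ```

    For this vector, the "median" and "quartiles" methods flag the value 50, but the "mean" method does not, because the extreme value inflates the standard deviation used to set the thresholds.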

    ThresholdFactor

    Outlier detection threshold factor, specified as a nonnegative scalar.

    • When Method is "median", the outlier detection threshold factor is the number of scaled MAD, which is 3 by default.

    • When Method is "mean", the outlier detection threshold factor is the number of standard deviations from the mean, which is 3 by default.

    • When Method is "grubbs" or "gesd", the outlier detection threshold factor is a scalar in the interval (0,1), which represents the alpha value of the hypothesis test. Values close to 0 result in a smaller number of outliers, and values close to 1 result in a larger number of outliers. The default value is 0.05.

    • When Method is "quartiles", the outlier detection threshold factor is the number of interquartile ranges, which is 1.5 by default.

    You cannot specify ThresholdFactor when the outlier detection method is "percentiles".

    Example: c = outlierRemoverComponent(ThresholdFactor=2.5)

    Example: c.ThresholdFactor = 0.01

    Data Types: single | double
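    The effect of the threshold factor can be sketched with the isoutlier function, which takes a ThresholdFactor name-value argument with the same meaning. This is an illustration on made-up data, assuming the component behaves like isoutlier for the "mean" method.

    ```matlab
    x = [1 2 3 4 5 6 7 8 9 50]';

    % Default factor: values more than 3 standard deviations from the mean.
    tfDefault = isoutlier(x,"mean");

    % Smaller factor: values more than 2 standard deviations from the mean.
    tfTighter = isoutlier(x,"mean",ThresholdFactor=2);

    numDefault = nnz(tfDefault);
    numTighter = nnz(tfTighter);
    ```

    With the default factor, no value is flagged. With a factor of 2, the thresholds narrow enough that the value 50 is flagged.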

    Threshold

    Lower and upper percentile thresholds, specified as a nonnegative vector with two elements in the interval [0,100]. The first element indicates the lower percentile threshold, and the second element indicates the upper percentile threshold. The first element must be less than the second element.

    You must specify Threshold when the outlier detection method (Method) is "percentiles". You cannot specify Threshold for any other outlier detection method.

    Example: c = outlierRemoverComponent(Threshold=[10 90])

    Example: c.Threshold=[5 95]

    Data Types: single | double
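    As a sketch of what percentile thresholds do, the following uses rmoutliers directly with the "percentiles" method and requests the computed lower and upper thresholds. The data is illustrative, and the sketch assumes the component applies the same rule as rmoutliers.

    ```matlab
    x = (1:100)';

    % Remove values below the 10th percentile or above the 90th percentile,
    % and return the threshold values that were used.
    [kept,removedRows,~,lowerThr,upperThr] = rmoutliers(x,"percentiles",[10 90]);

    numRemoved = nnz(removedRows);
    ```

    Every value in kept lies between lowerThr and upperThr, and numRemoved equals the difference in length between x and kept.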

    MaxNumOutliers

    Maximum number of outliers to remove, specified as a positive integer scalar.

    If you do not specify the MaxNumOutliers value, the software uses the integer nearest to 10 percent of n, where n is the number of observations in the data arguments of learn.

    You can specify MaxNumOutliers only when the outlier detection method (Method) is "gesd".

    Example: c = outlierRemoverComponent(MaxNumOutliers=20)

    Example: c.MaxNumOutliers = 5

    Data Types: single | double
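    The cap imposed by MaxNumOutliers can be sketched with rmoutliers, which accepts a MaxNumOutliers name-value argument for the "gesd" method. The data below is made up and contains several extreme values; the sketch assumes the component applies the same cap.

    ```matlab
    x = [10 11 9 10 12 10 11 9 10 11 10 12 9 10 11 10 60 -40 75]';

    % Allow the GESD test to remove at most one value,
    % even though several values look extreme.
    [~,tfRemoved] = rmoutliers(x,"gesd",MaxNumOutliers=1);

    numRemoved = nnz(tfRemoved);
    ```

    Regardless of how many values the test would otherwise flag, numRemoved cannot exceed the specified maximum.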

    Run Parameters

    The software sets run parameters when you create the component. You can modify the run parameters at any time. Any unset run parameters use the corresponding default values.

    RunRemoval

    Flag for removing outliers during the run phase, specified as 0 (false) or 1 (true). If RunRemoval is true, the software removes observations with outlier values when you use the run function. If RunRemoval is false, the software does not remove any observations from the data arguments passed to run.

    Example: c = outlierRemoverComponent(RunRemoval=true)

    Example: c.RunRemoval = false

    Data Types: logical
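    The following sketch shows one way RunRemoval might be used. It requires R2026a or later. The calling form of run is an assumption here: this page does not show it, so the sketch assumes that run returns one output per data argument, mirroring the learn call shown in the Examples section.

    ```matlab
    % Assumes R2026a or later, where outlierRemoverComponent is available.
    load carbig
    cars = table(Acceleration,Displacement,Horsepower);
    y = table(MPG);

    % Enable removal during the run phase, then learn the thresholds.
    c = outlierRemoverComponent(RunRemoval=true);
    c = learn(c,cars,y);

    % Assumed calling form: one output per data argument passed to run.
    [runCars,runY] = run(c,cars,y);
    ```

    If the assumed calling form holds, runCars and runY have the same number of rows, and that number is no larger than the number of rows in cars.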

    Component Properties

    The software sets component properties when you create the component. You can modify the component properties (excluding HasLearnables and HasLearned) using dot notation at any time. You cannot modify the HasLearnables and HasLearned properties directly.

    Name

    Component identifier, specified as a character vector or string scalar.

    Example: c = outlierRemoverComponent(Name="OutlierRemoval")

    Example: c.Name = "Removal"

    Data Types: char | string

    Inputs

    Names of the input ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = outlierRemoverComponent(Inputs=["X","Y"])

    Example: c.Inputs = ["X1","Y1"]

    Data Types: char | string | cell

    Outputs

    Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = outlierRemoverComponent(Outputs=["newX","newY","indices"])

    Example: c.Outputs = ["X1","X2","Idx"]

    Data Types: char | string | cell

    InputTags

    Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, the number of tags must match the number of inputs in Inputs.

    Example: c = outlierRemoverComponent(InputTags=[1 0])

    Example: c.InputTags = [1 2]

    Data Types: single | double

    OutputTags

    Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, the number of tags must match the number of outputs in Outputs.

    Example: c = outlierRemoverComponent(OutputTags=[1 0 0])

    Example: c.OutputTags = [1 2 0]

    Data Types: single | double

    HasLearnables

    This property is read-only.

    Indicator for the learnables, returned as 1 (true). A value of 1 indicates that the component contains Learnables.

    Data Types: logical

    HasLearned

    This property is read-only.

    Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component, and the Learnables are nonempty.

    Data Types: logical

    Learnables

    The software sets learnables when you use the learn object function. You cannot modify learnables directly.

    LowerThreshold

    This property is read-only.

    Lower threshold for identifying outliers, returned as a table. Each value corresponds to a variable in VariablesWithOutliers.

    UpperThreshold

    This property is read-only.

    Upper threshold for identifying outliers, returned as a table. Each value corresponds to a variable in VariablesWithOutliers.

    Center

    This property is read-only.

    Center value for identifying outliers, returned as a table. Each value corresponds to a variable in VariablesWithOutliers.

    VariablesWithOutliers

    This property is read-only.

    Names of the variables used by the component to derive the LowerThreshold, UpperThreshold, and Center values. By default, the variables correspond to columns in the first data argument of learn. You can use ReferenceInput to specify which data argument to use.
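    To see what per-variable thresholds in table form look like, you can call rmoutliers on a small table and request its threshold outputs. This is a sketch on made-up data; it assumes the learnables of the component correspond to the lower threshold, upper threshold, and center outputs of rmoutliers.

    ```matlab
    % Two variables; only A contains an extreme value.
    T = table([1;2;3;4;100],[10;20;30;40;50],VariableNames=["A","B"]);

    % L, U, and C are one-row tables with one value per variable.
    [cleanT,~,~,L,U,C] = rmoutliers(T);   % default "median" method

    centerA = C.A;   % median of A
    centerB = C.B;   % median of B
    ```

    With the default "median" method, the center values are the medians of the variables (3 for A and 30 for B), and the row that contains 100 is removed from cleanT.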

    Object Functions

    learn       Initialize and evaluate pipeline or component
    run         Execute pipeline or component for inference after learning
    reset       Reset pipeline or component
    series      Connect components in series to create pipeline
    parallel    Connect components or pipelines in parallel to create pipeline
    view        View diagram of pipeline inputs, outputs, components, and connections

    Examples

    Create a pipeline component that removes outlier values in observations.

    component = outlierRemoverComponent
    component = 
    
      outlierRemoverComponent with properties:
    
                         Name: "OutlierRemover"
                       Inputs: ["DataIn1"    "DataIn2"]
                    InputTags: [1 2]
                      Outputs: [1×3 string]
                   OutputTags: [1 2 0]
    
       
    Learnables (HasLearned = false)
               LowerThreshold: []
               UpperThreshold: []
                       Center: []
        VariablesWithOutliers: []
    
       
    Structural Parameters (locked)
                  NumDataFlow: 2
               ReferenceInput: 1
    
       
    Run Parameters (unlocked)
                   RunRemoval: 0
    
    
    Show all parameters

    component is an outlierRemoverComponent object that contains four learnables: LowerThreshold, UpperThreshold, Center, and VariablesWithOutliers. These properties remain empty until you pass data to the component during the learn phase.

    Load the carbig data set. Create a table containing the predictor variables Acceleration, Displacement, and Horsepower, and create another table containing the response variable MPG.

    load carbig
    cars = table(Acceleration,Displacement,Horsepower);
    y = table(MPG);

    Use the learn object function to remove observations with outlier values in cars. The software removes the observations from both cars and y.

    [component,newcars,newy] = learn(component,cars,y);
    component
    component = 
    
      outlierRemoverComponent with properties:
    
                         Name: "OutlierRemover"
                       Inputs: ["DataIn1"    "DataIn2"]
                    InputTags: [1 2]
                      Outputs: ["DataOut1"    "DataOut2"    "IsOutlier"]
                   OutputTags: [1 2 0]
    
       
    Learnables (HasLearned = true)
               LowerThreshold: [1×3 table]
               UpperThreshold: [1×3 table]
                       Center: [1×3 table]
        VariablesWithOutliers: ["Acceleration"    "Displacement"    "Horsepower"]
    
       
    Structural Parameters (locked)
                  NumDataFlow: 2
               ReferenceInput: 1
    
       
    Run Parameters (unlocked)
                   RunRemoval: 0
    
    
    Show all parameters

    The LowerThreshold, UpperThreshold, Center, and VariablesWithOutliers properties are nonempty, and the HasLearned property is set to true.

    Notice that newcars and newy have fewer observations than cars and y.

    newNumObservations = size([newcars newy],1)
    originalNumObservations = size([cars y],1)
    newNumObservations =
    
       385
    
    
    originalNumObservations =
    
       406

    Version History

    Introduced in R2026a

    See Also