optimalDOE

D-optimal design of experiments (DOE)

Since R2024b

    Description

    An optimalDOE object contains a D-optimal design for an experiment, and additional information about the design, factors, model, and algorithm used to create the design. The design runs in a D-optimal design minimize the covariance of the model coefficient estimates. Use a D-optimal design when you have a limited number of experimental runs, or factor constraints that are not suitable for full factorial or mixture designs.
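    As a brief sketch of what this criterion means, in standard linear-model notation: the symbols $X$ (design matrix), $\hat{b}$ (least-squares coefficient estimates), $\sigma^{2}$ (error variance), and $p$ (number of model coefficients) are introduced here for illustration and assume ordinary least squares with uncorrelated, equal-variance errors. Under those assumptions, maximizing the determinant $|X^{T}X|$ is equivalent to minimizing the determinant of the coefficient covariance matrix.

```latex
\operatorname{Cov}(\hat{b}) = \sigma^{2}\,(X^{T}X)^{-1},
\qquad
\left|\operatorname{Cov}(\hat{b})\right| = \frac{\sigma^{2p}}{\left|X^{T}X\right|}
```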

    Creation

    Description

    dopt = optimalDOE(n,nruns) generates a D-optimal design with nruns runs and n factors for a linear experiment model, and returns the design in an optimalDOE design object dopt.

    dopt = optimalDOE(bounds,nruns) specifies the factor bounds for the design runs.

    dopt = optimalDOE(levels1,levels2,...,levelsN,nruns) specifies the number and levels for the factors in the design.

    dopt = optimalDOE(candset,nruns) specifies a candidate set of design runs from which optimalDOE generates runs.

    dopt = optimalDOE(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify fixed factors and the experiment model.

    Input Arguments

    n

    Number of factors in the design, specified as a positive integer.

    If you do not also specify the NumLevelsPerFactor name-value argument when you pass n to optimalDOE, each factor has two levels. The default range for each factor is [-1,1].

    Data Types: single | double

    nruns

    Number of design runs, specified as a positive integer.

    Example: 100

    Data Types: single | double

    bounds

    Factor bounds, specified as a 2-by-n matrix, where n is the number of factors in the design. Each column of bounds corresponds to a factor. The first row of bounds contains the lower bounds for the factors, and the second row contains the upper bounds.

    This argument sets the Levels property.

    Example: [0 0.1 10; 5 0.7 50]

    Data Types: single | double

    levels1,levels2,...,levelsN

    Factor levels, specified as a numeric, logical, or categorical vector, or a cell array. Specify one levels argument for each factor in the design.

    This argument sets the Levels property.

    Example: ["cohorta","cohortb"],[0,0.25,0.5,0.75],["drug1","drug2","drug3"]

    Data Types: single | double | logical | char | string | cell | categorical

    candset

    Candidate set for the design runs, specified as a numeric matrix or a table.

    This argument sets the CandidateSet property.

    Data Types: single | double | table

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: optimalDOE(4,100,FixedFactors=[ones(50,1);zeros(50,1)],ModelSpecification="scheffe-quad") specifies a fixed factor and the quadratic Scheffe model for a design with four factors and 100 design runs.

    AvoidDuplicates

    Flag to avoid generating duplicate design runs, specified as a numeric or logical 1 (true) or 0 (false). If AvoidDuplicates is true and optimalDOE can calculate nonduplicate runs, the rows of dopt.Design are unique. If AvoidDuplicates is false, the function does not attempt to avoid duplicate design runs, and the rows are not necessarily unique.

    Example: AvoidDuplicates=true

    Data Types: logical
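
    A minimal sketch of checking the effect of this flag, using only the optimalDOE(n,nruns,Name=Value) syntax described on this page. The variable names nUnique and isAllUnique are illustrative. With four two-level factors there are 16 distinct runs available, so a 10-run design without duplicates should be attainable.

```matlab
% Request a 10-run design with four two-level factors and no repeated runs.
dopt = optimalDOE(4,10,AvoidDuplicates=true);

% Count the distinct rows of the design table and compare with the total.
nUnique = height(unique(dopt.Design));
isAllUnique = (nUnique == height(dopt.Design))
```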

    CategoricalFactors

    Categorical factors, specified as one of the values in this table.

    Value | Description
    Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding factor is categorical. The index values are between 1 and n, where n is the number of factors in the design.
    Logical vector | A true entry means that the corresponding factor is categorical. The length of the vector is n.
    String vector or cell array of character vectors | Each element in the array is the name of a factor. The names must match the entries in FactorNames.
    "all" | All factors are categorical.

    By default, optimalDOE treats all nonnumeric factors as categorical.

    This argument sets the CategoricalFactors property.

    Example: CategoricalFactors="all"

    Data Types: single | double | logical | char | string | cell
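
    The following sketch shows the three ways of identifying the same categorical factor. The factors dose and site and their levels are hypothetical; the example assumes that, as described for FactorNames, the workspace variable names are used as factor names when levels are passed as variables.

```matlab
% Hypothetical factor levels; the variable names become the factor names.
dose = [0 5 10];
site = [1 2 3];

% Treat the numeric factor site as categorical, identified by name.
doptByName = optimalDOE(dose,site,12,CategoricalFactors="site");

% The same request, identified by index and by logical vector.
doptByIndex = optimalDOE(dose,site,12,CategoricalFactors=2);
doptByMask  = optimalDOE(dose,site,12,CategoricalFactors=[false true]);

% The property stores the indices of the categorical factors.
doptByName.CategoricalFactors
```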

    ExchangeMethod

    Algorithm for generating the D-optimal design, specified as "coordinate" or "row".

    • "coordinate" — Use the coordinate-exchange algorithm to generate a D-optimal design. This value is the default for ExchangeMethod when you do not specify a candidate set for the design. You cannot specify the exchange method as "coordinate" when you specify candset.

    • "row" — Use the row-exchange algorithm to generate a D-optimal design. This value is the default for ExchangeMethod when you specify a candidate set for the design. You cannot specify the exchange method as "row" when you specify FixedFactors.

    This argument sets the ExchangeMethod property.

    For more information about the coordinate-exchange and row-exchange algorithms, see the Algorithms section of cordexch and rowexch.

    Example: ExchangeMethod="row"

    Data Types: char | string
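
    As an illustration of the defaults described above, this sketch generates the same size of design with each algorithm and inspects the resulting properties. The variable names are illustrative, and no particular output values are implied.

```matlab
% No candidate set is given, so the coordinate-exchange algorithm is the default.
doptCoord = optimalDOE(3,8);
doptCoord.ExchangeMethod

% Request the row-exchange algorithm instead.
doptRow = optimalDOE(3,8,ExchangeMethod="row");
doptRow.ExchangeMethod

% With "row" and no candset input, a candidate set is generated automatically.
size(doptRow.CandidateSet)
```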

    ExcludeFcn

    Validation function, specified as a function handle. The function must accept a table of design runs and return a logical vector indicating which rows of the table to exclude from the design.

    • The table input must have n variables, where n is the number of factors in the design. The names of the variables must be the names of the factors. You can specify the factor names by using FactorNames.

    • The logical vector output must contain the same number of elements as the number of rows in the table input.

    When calculating the design, optimalDOE excludes runs corresponding to true values in the vector output for the validation function.

    This argument sets the ExcludeFcn property.

    Data Types: function_handle
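
    A minimal sketch of a validation function, assuming the default factor names Factor1, Factor2, and Factor3 and the default levels -1 and 1. The handle name excludeHighHigh is illustrative. The function returns true for each run to exclude, here any run in which the first two factors are both at their high level.

```matlab
% The function receives a table whose variables are named after the factors
% and returns a logical vector with true for each row to exclude.
excludeHighHigh = @(runs) runs.Factor1 == 1 & runs.Factor2 == 1;

dopt = optimalDOE(3,8,ExcludeFcn=excludeHighHigh);

% Check that no excluded combination appears in the generated design.
anyExcluded = any(excludeHighHigh(dopt.Design))
```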

    FactorNames

    Factor names, specified as a string vector or a cell array of character vectors. The number of unique values in FactorNames must equal the number of factors in the design. The default value for FactorNames is ["Factor1","Factor2",...,"FactorN"].

    If you specify ModelSpecification as a character vector or string scalar formula in Wilkinson Notation, then FactorNames must contain only valid variable names.

    If you pass levels for a factor using variable names in the input argument levels1,levels2,...,levelsN and do not specify FactorNames, then optimalDOE assigns the workspace variable name to the corresponding factor.

    Example: FactorNames=["compound","quantity"]

    Data Types: char | string | cell

    FixedFactors

    Fixed factor values, specified as a numeric matrix or a table.

    Fixed factors are held constant while the function varies other factors, which can be useful when you create a blocked design. A blocked design orders design runs by the values of a factor.

    optimalDOE uses all factors, including fixed factors, to calculate design runs. The last columns of the design contain the values specified in FixedFactors. FixedFactors must have nruns rows.

    This argument sets the FixedFactors property.

    Example: FixedFactors=[zeros(100,1);ones(100,1)]

    Data Types: single | double | table

    InitialDesign

    Initial design to use for the coordinate-exchange or row-exchange algorithm, specified as an nruns-by-n numeric matrix or a table. You can specify which algorithm to use by setting the ExchangeMethod name-value argument. If any of the factors are nonnumeric, InitialDesign must be a table.

    Data Types: single | double | table
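
    The following sketch supplies a starting design as an nruns-by-n numeric matrix. The starting matrix init is a hypothetical random choice at the default levels -1 and 1; in practice you might start from a design you already have.

```matlab
nruns = 8;
n = 3;

% Hypothetical starting design: random runs at the default levels -1 and 1.
rng(1)                                % fix the seed so the start is repeatable
init = 2*randi([0 1],nruns,n) - 1;

% Improve the starting design with the default exchange algorithm.
dopt = optimalDOE(n,nruns,InitialDesign=init);
dopt.OptimalityValue
```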

    IterationLimit

    Maximum number of iterations for the algorithm that generates the design runs, specified as a positive integer. To specify the algorithm, use the ExchangeMethod name-value argument.

    Example: IterationLimit=20

    Data Types: single | double

    ModelSpecification

    Experiment model, specified as one of the following values.

    • A character vector or string scalar with the model name.

      Value | Model Description
      "linear" (default) | The model contains an intercept and a linear term for each factor.
      "constant" | The model contains only a constant (intercept) term.
      "interactions" | The model contains an intercept, a linear term for each factor, and all products of pairs of distinct factors (no squared terms).
      "purequadratic" | The model contains an intercept term, and linear and squared terms for each factor.
      "quadratic" | The model contains an intercept term, linear and squared terms for each factor, and all products of pairs of distinct factors.
      "scheffe-linear" | The model contains a linear term for each factor and does not include an intercept term.
      "scheffe-quad" | The model is given by the formula $\sum_{i=1}^{n} b_i x_i + \sum_{i=1}^{n} \sum_{j<i}^{n-1} b_{ij} x_i x_j$.
      "scheffe-special-cubic" | The model is given by the formula $\sum_{i=1}^{n} b_i x_i + \sum_{i=1}^{n} \sum_{j<i}^{n-1} b_{ij} x_i x_j + \sum_{i=1}^{n} \sum_{j<i}^{n-1} \sum_{k<j}^{n-2} b_{ijk} x_i x_j x_k$.
      "polyijk" | The model is a polynomial with all terms up to degree i in the first factor, degree j in the second factor, and so on. Specify the maximum degree for each factor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, "poly13" has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second factors, respectively.

      In the above table, each $x_i$ corresponds to the ith factor in the D-optimal design, and $b_i$, $b_{ij}$, and $b_{ijk}$ are coefficients for the model terms.

    • A character vector or string scalar formula in Wilkinson notation. The factor names in the formula must be valid variable names specified by FactorNames.

    • A t-by-n terms matrix, where t is the number of terms and n is the number of factors in the design. A terms matrix is convenient when the number of factors is large and you want to generate the terms programmatically. For more information about terms matrices, see Terms Matrix.

    ModelSpecification does not include the response variable. optimalDOE generates a design that minimizes the covariance between the estimated coefficients for ModelSpecification.

    This argument sets the ModelSpecification property.

    Example: ModelSpecification="quadratic"

    Data Types: single | double | char | string
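
    As a sketch of the terms-matrix form, the example below builds a 4-by-2 matrix for two factors, using the usual MATLAB convention that each row is a term and each entry is the power of the corresponding factor in that term. The matrix T is illustrative; it lists the same terms that the description of "interactions" gives for two factors.

```matlab
% Terms matrix for two factors: one row per term, one column per factor.
T = [0 0     % intercept
     1 0     % Factor1
     0 1     % Factor2
     1 1];   % Factor1*Factor2

doptTerms = optimalDOE(2,6,ModelSpecification=T);

% The named model "interactions" describes the same set of terms.
doptNamed = optimalDOE(2,6,ModelSpecification="interactions");

% Compare how each model is stored in the ModelSpecification property.
doptTerms.ModelSpecification
doptNamed.ModelSpecification
```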

    NumLevelsPerFactor

    Number of levels for each factor, specified as a vector of positive integers. NumLevelsPerFactor must have an element for each factor in the design.

    Note

    If you specify AvoidDuplicates=true, the software adds additional levels for any noncategorical factors as needed to avoid duplicate rows in the design.

    This argument sets the Levels property.

    Example: NumLevelsPerFactor=[3,4,10]

    Data Types: single | double
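
    A short sketch of how this argument is reflected in the Levels property, assuming AvoidDuplicates is left at its default so that no extra levels are added. The variable name numLevels is illustrative.

```matlab
% Three factors with 3, 4, and 10 levels, respectively.
dopt = optimalDOE(3,12,NumLevelsPerFactor=[3,4,10]);

% Number of levels stored for each factor.
numLevels = cellfun(@numel,dopt.Levels)

% Levels of the first factor; the Levels property describes these as
% equally spaced in the default range.
dopt.Levels{1}
```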

    NumTries

    Maximum number of start runs for generating the design, specified as a positive integer. If NumTries > 1, then optimalDOE generates NumTries designs from different start runs. The function returns the design with the least amount of covariance between the coefficient estimates for the experiment model.

    Example: NumTries=5

    Data Types: single | double
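
    To see the effect of multiple starts, a sketch like the following compares the OptimalityValue property of a single-start design with that of a multistart design. Because the design is chosen to maximize the determinant reported in OptimalityValue, a larger value indicates a better design. The results depend on the random starts, so no specific values are shown.

```matlab
rng("default")   % make the random starts repeatable

doptOne  = optimalDOE(4,10,NumTries=1);
doptMany = optimalDOE(4,10,NumTries=10);

% Compare the determinant criterion of the two designs.
[doptOne.OptimalityValue doptMany.OptimalityValue]
```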

    Options

    Options for controlling the iterative algorithm that generates the design, specified as a structure array returned by statset.

    This table summarizes the supported fields, which require Parallel Computing Toolbox™.

    Field | Description
    Streams | A RandStream object or cell array of such objects. If you do not specify Streams, optimalDOE uses the default stream or streams. If you specify Streams, use a single object except when all of the following apply: you have an open parallel pool, UseParallel is true, and UseSubstreams is false. In this case, use a cell array the same size as the parallel pool. If a parallel pool is not open, then Streams must supply a single random number stream.
    UseParallel | If true and NumTries > 1, then optimalDOE generates the design runs in parallel. If Parallel Computing Toolbox is not installed, then computation occurs in serial mode. The default is false, indicating serial computation.
    UseSubstreams | Set to true to compute in a reproducible fashion. The default is false. To compute reproducibly, set Streams to a type allowing substreams: "mlfg6331_64" or "mrg32k3a".

    To ensure more predictable results, use parpool (Parallel Computing Toolbox) and explicitly create a parallel pool before calling optimalDOE with Options=statset(UseParallel=1).

    Example: Options=statset(UseParallel=1)

    Data Types: struct
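
    The fields above can be combined as in the following sketch, which assumes that Parallel Computing Toolbox is installed. It opens a pool explicitly, as recommended above, and uses a single stream of a type that supports substreams so that the parallel tries are reproducible. The variable names are illustrative.

```matlab
% Open a parallel pool explicitly if one is not already running.
pool = gcp("nocreate");
if isempty(pool)
    pool = parpool;
end

% Use a stream type that supports substreams so results are reproducible.
stream = RandStream("mlfg6331_64");
opts = statset(UseParallel=true,UseSubstreams=true,Streams=stream);

% Parallel computation applies only when NumTries > 1.
dopt = optimalDOE(4,20,NumTries=8,Options=opts);
dopt.OptimalityValue
```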

    Properties

    CandidateSet

    This property is read-only.

    Candidate set for the design runs, represented as a table. If ExchangeMethod is "row" and you do not specify a candidate set, optimalDOE automatically generates a candidate set.

    Data Types: table

    CategoricalFactors

    This property is read-only.

    Categorical factors, represented as a vector of indices indicating which factors are categorical.

    Data Types: double

    Design

    This property is read-only.

    Generated design runs, represented as a table. Each column of Design corresponds to a factor in the design, and each row corresponds to a run.

    ExchangeMethod

    This property is read-only.

    Algorithm for generating the design, represented as "coordinate" or "row".

    Data Types: string

    ExcludeFcn

    This property is read-only.

    Validation function, represented as a function handle.

    Data Types: function_handle

    FixedFactors

    This property is read-only.

    Fixed factors, represented as a vector of indices indicating which factors are fixed.

    Data Types: single | double

    Levels

    This property is read-only.

    Factor levels, represented as a cell array with one element per factor. If you specify bounds or levels1,levels2,...,levelsN, the software uses those values to set Levels. Otherwise, the software sets each element of Levels to contain n equally spaced levels in the range [-1,1], where n is determined as follows:

    • If you do not specify ModelSpecification or NumLevelsPerFactor, then n equals 2.

    • If you specify NumLevelsPerFactor, then n equals the corresponding element of NumLevelsPerFactor.

    • If you specify ModelSpecification and do not specify NumLevelsPerFactor, then n equals 1 + the maximum order of the ModelSpecification model.

    Data Types: cell

    ModelSpecification

    This property is read-only.

    Experiment model, represented as a formula in Wilkinson notation. ModelSpecification indicates the model you want to fit with the specified design. ModelSpecification does not include the response variable.

    Data Types: string

    OptimalityValue

    This property is read-only.

    Optimal value for the determinant $D = |X^{T}X|$, where $X$ is the design matrix, represented as a numeric scalar. For more information, see the Algorithms section of cordexch.

    Data Types: single | double
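
    For a linear model with numeric factors, the determinant can be reproduced by hand, as in the sketch below. The design matrix X is assumed to consist of a column of ones for the intercept followed by one column per factor; the variable names F, X, and D are illustrative. Applying this calculation to the 10-run, four-factor design table printed in the Examples section gives 86016, which agrees with the displayed value of 8.6016e+04.

```matlab
dopt = optimalDOE(4,10);          % default linear model with an intercept

% Build the design matrix: a column of ones for the intercept,
% followed by one column per factor.
F = table2array(dopt.Design);
X = [ones(size(F,1),1) F];

% Determinant criterion computed by hand, next to the stored value.
D = det(X'*X);
[D dopt.OptimalityValue]
```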

    Object Functions

    fitlm | Fit linear regression model using design runs
    addruns | Add runs to D-optimal design

    Examples

    Generate a D-optimal design with 10 runs and four factors.

    dopt = optimalDOE(4,10)
    dopt = 
      optimalDOE with properties:
    
                    Design: [10×4 table]
        ModelSpecification: "1 + Factor1 + Factor2 + Factor3 + Factor4"
           OptimalityValue: 8.6016e+04
                    Levels: {[-1 1]  [-1 1]  [-1 1]  [-1 1]}
        CategoricalFactors: []
              FixedFactors: []
            ExchangeMethod: "coordinate"
    
    

    dopt is an optimalDOE object that contains information about the generated D-optimal design. The output includes the size of the table containing the design runs, the model for the design, the factor levels, and the method used to generate the design runs. By default, the levels for each factor are -1 and 1. The output also displays the optimal value for the determinant $|X^{T}X|$, where $X$ is the design matrix.

    Display the design table.

    dopt.Design
    ans=10×4 table
        Factor1    Factor2    Factor3    Factor4
        _______    _______    _______    _______
    
           1         -1         -1         -1   
          -1          1         -1          1   
          -1          1          1         -1   
           1         -1          1          1   
          -1         -1         -1         -1   
          -1         -1          1         -1   
          -1         -1         -1          1   
           1          1         -1         -1   
           1          1          1          1   
           1          1         -1         -1   
    
    

    The design table displays the values for the 10 runs in the optimal design.

    Generate a D-optimal design and specify the factor bounds for the design runs.

    bounds = [10 20 30; 20 30 40];
    dopt = optimalDOE(bounds,5)
    dopt = 
      optimalDOE with properties:
    
                    Design: [5×3 table]
        ModelSpecification: "1 + Factor1 + Factor2 + Factor3"
           OptimalityValue: 8.0000e+06
                    Levels: {[10 20]  [20 30]  [30 40]}
        CategoricalFactors: []
              FixedFactors: []
            ExchangeMethod: "coordinate"
    
    

    dopt is an optimalDOE object that contains information about the generated D-optimal design. By default, the levels for the factors are the same as the specified bounds.

    Generate some response data for the design runs.

    runs = dopt.Design;
    h = height(runs);
    response = 2*runs.Factor1+3*runs.Factor2+runs.Factor3+0.01*randn(h,1);

    Fit a linear model using the design runs in dopt as the predictor data and response as the response data.

    mdl = fitlm(dopt,response)
    mdl = 
    Linear regression model:
        y ~ 1 + Factor1 + Factor2 + Factor3
    
    Estimated Coefficients:
                        Estimate         SE         tStat        pValue  
                       __________    __________    ________    __________
    
        (Intercept)    -0.0085086      0.046507    -0.18295        0.8848
        Factor1            1.9998    0.00095215      2100.3    0.00030311
        Factor2            3.0006    0.00095215      3151.4    0.00020201
        Factor3            1.0001    0.00095215      1050.3    0.00060612
    
    
    Number of observations: 5, Error degrees of freedom: 1
    Root Mean Squared Error: 0.0102
    R-squared: 1,  Adjusted R-Squared: 1
    F-statistic vs. constant model: 5.53e+06, p-value = 0.000312
    

    mdl is a LinearModel object that contains the results of fitting a linear model to the data. The model display includes the model formula, estimated coefficients, and model summary statistics.

    Generate data for patient weight using the randi function. Create variables containing levels for patient age and smoking status.

    weight = randi([120 200], 50, 1);
    age = [20 30 40 50];
    smoker = ["Y", "N"];

    Generate a D-optimal design with 20 runs, using the unique values in age, weight, and smoker as the factor levels.

    dopt = optimalDOE(age,weight,smoker,20)
    dopt = 
      optimalDOE with properties:
    
                    Design: [20×3 table]
        ModelSpecification: "1 + age + weight + smoker"
           OptimalityValue: 1.2996e+10
                    Levels: {[20 30 40 50]  [122 123 127 130 131 132 133 135 142 145 150 151 154 155 156 159 164 171 172 173 174 176 177 180 181 182 184 185 186 188 193 194 195 196 197 198]  ["N"    "Y"]}
        CategoricalFactors: 3
              FixedFactors: []
            ExchangeMethod: "coordinate"
    
    

    dopt is an optimalDOE object that contains information about the generated D-optimal design.

    Display the design table.

    dopt.Design
    ans=20×3 table
        age    weight    smoker
        ___    ______    ______
    
        20      122       "N"  
        20      198       "N"  
        50      198       "Y"  
        20      122       "Y"  
        20      198       "Y"  
        50      122       "N"  
        50      122       "Y"  
        20      122       "N"  
        50      198       "N"  
        20      198       "N"  
        50      122       "N"  
        20      122       "Y"  
        50      122       "N"  
        50      198       "Y"  
        50      198       "N"  
        50      122       "Y"  
          ⋮
    
    

    The design table displays the values for the 20 runs in the D-optimal design.

    Generate a candidate set for the D-optimal design by using the combinations function. Use the categorical and randi functions to create the factor values.

    compound = categorical(["compound1","compound2","compound3"]);
    age = 17 + randi(83,1,10);
    
    candset = combinations(compound,age)
    candset=30×2 table
        compound     age
        _________    ___
    
        compound1    85 
        compound1    93 
        compound1    28 
        compound1    93 
        compound1    70 
        compound1    26 
        compound1    41 
        compound1    63 
        compound1    97 
        compound1    98 
        compound2    85 
        compound2    93 
        compound2    28 
        compound2    93 
        compound2    70 
        compound2    26 
          ⋮
    
    

    candset is a table that contains every possible combination of the values in compound and age.

    Generate a D-optimal design with 15 runs using candset as the candidate set.

    dopt = optimalDOE(candset,15)
    dopt = 
      optimalDOE with properties:
    
                    Design: [15×2 table]
        ModelSpecification: "1 + compound + age"
           OptimalityValue: 2.3328e+06
                    Levels: {[compound1    compound2    compound3]  [26 28 41 63 70 85 93 97 98]}
        CategoricalFactors: 1
              FixedFactors: []
            ExchangeMethod: "row"
              CandidateSet: [30×2 table]
    
    

    dopt is an optimalDOE object that contains information about the generated D-optimal design. The levels for the design factors are the same as the unique values in the columns of candset.

    Determine if the set of design runs is a subset of the candidate set by using the ismember and all functions.

    idx = ismember(dopt.Design,candset,"rows");
    all(idx)
    ans = logical
       1
    
    

    The output shows that dopt.Design is a subset of candset.

    Generate levels for three factors by using the categorical, randi, ones, and zeros functions.

    compound = categorical(["compound1","compound2","compound3"]);
    age = 17 +randi(83,1,10);
    smoker = [ones(10,1);zeros(5,1)];

    Generate a D-optimal design with 15 runs using compound and age as factors and smoker as a fixed factor. Specify the model for the design, and avoid calculating duplicate runs.

    dopt = optimalDOE(compound,age,15,FixedFactors=smoker,AvoidDuplicates=true,ModelSpecification="scheffe-linear")
    dopt = 
      optimalDOE with properties:
    
                    Design: [15×3 table]
        ModelSpecification: "compound + age + Factor3"
           OptimalityValue: 7.5620e+06
                    Levels: {[compound1    compound2    compound3]  [26 28 41 93 97 98]  [0 1]}
        CategoricalFactors: 1
              FixedFactors: 3
            ExchangeMethod: "coordinate"
    
    

    dopt is an optimalDOE object that contains information about the generated D-optimal design. The design table contains 15 runs.

    Determine if the design runs are unique by using the unique function.

    upts = unique(dopt.Design)
    upts=15×3 table
        compound     age    Factor3
        _________    ___    _______
    
        compound1    26        1   
        compound1    28        1   
        compound1    41        1   
        compound1    97        0   
        compound1    98        0   
        compound2    26        1   
        compound2    28        1   
        compound2    41        1   
        compound2    97        0   
        compound2    98        0   
        compound2    98        1   
        compound3    93        1   
        compound3    97        1   
        compound3    98        0   
        compound3    98        1   
    
    

    upts is a 15-by-3 table containing the unique runs in dopt.Design. Because upts has the same number of rows as dopt.Design, the design runs for dopt are unique.

    More About

    Version History

    Introduced in R2024b