fitglme

Fit generalized linear mixed-effects model

Syntax

glme = fitglme(tbl,formula)

glme = fitglme(tbl,formula,Name,Value)

Description

glme = fitglme(tbl,formula) returns a generalized linear mixed-effects model, glme. The model is specified by formula and fitted to the predictor variables in the table or dataset array, tbl.

glme = fitglme(tbl,formula,Name,Value) returns a generalized linear mixed-effects model using additional options specified by one or more Name,Value pair arguments. For example, you can specify the distribution of the response, the link function, or the covariance pattern of the random-effects terms.

Examples

Fit a Generalized Linear Mixed-Effects Model

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

Flag to indicate whether the batch used the new process (newprocess)
Processing time for each batch, in hours (time)
Temperature of the batch, in degrees Celsius (temp)
Categorical variable indicating the supplier of the chemical used in the batch (supplier)
Number of defects in the batch (defects)

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution:

$\text{defects}_{ij} \sim \text{Poisson}(\mu_{ij}).$

This corresponds to the generalized linear mixed-effects model

$\log(\mu_{ij}) = \beta_{0} + \beta_{1} \text{newprocess}_{ij} + \beta_{2} \text{time\_dev}_{ij} + \beta_{3} \text{temp\_dev}_{ij} + \beta_{4} \text{supplier\_C}_{ij} + \beta_{5} \text{supplier\_B}_{ij} + b_{i},$

where

$\text{defects}_{ij}$ is the number of defects observed in the batch produced by factory $i$ during batch $j$.
$\mu_{ij}$ is the mean number of defects corresponding to factory $i$ (where $i = 1, 2, \ldots, 20$) during batch $j$ (where $j = 1, 2, \ldots, 5$).
$\text{newprocess}_{ij}$, $\text{time\_dev}_{ij}$, and $\text{temp\_dev}_{ij}$ are the measurements for each variable that correspond to factory $i$ during batch $j$. For example, $\text{newprocess}_{ij}$ indicates whether the batch produced by factory $i$ during batch $j$ used the new process.
$\text{supplier\_C}_{ij}$ and $\text{supplier\_B}_{ij}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory $i$ during batch $j$.
$b_{i} \sim N(0, \sigma_{b}^{2})$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

Display the model.

disp(glme)

Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations             100
    Fixed effects coefficients           6
    Random effects coefficients         20
    Covariance parameters                1
    Distribution                    Poisson
    Link                            Log   
    FitMethod                       Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35  

Fixed effects coefficients (95% CIs):
    Name                   Estimate     SE          tStat       DF    pValue        Lower        Upper    
    {'(Intercept)'}           1.4689     0.15988      9.1875    94    9.8194e-15       1.1515       1.7864
    {'newprocess' }         -0.36766     0.17755     -2.0708    94      0.041122     -0.72019    -0.015134
    {'time_dev'   }        -0.094521     0.82849    -0.11409    94       0.90941      -1.7395       1.5505
    {'temp_dev'   }         -0.28317      0.9617    -0.29444    94       0.76907      -2.1926       1.6263
    {'supplier_C' }        -0.071868    0.078024     -0.9211    94       0.35936     -0.22679     0.083051
    {'supplier_B' }         0.071072     0.07739     0.91836    94       0.36078    -0.082588      0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1                  Name2                  Type           Estimate
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.31381 

Group: Error
    Name                        Estimate
    {'sqrt(Dispersion)'}        1

The Model information table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson distribution, the link function is Log, and the fit method is Laplace.

Formula indicates the model specification using Wilkinson’s notation.

The Model fit statistics table displays statistics used to assess the goodness of fit of the model: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the log likelihood (LogLikelihood), and the deviance (Deviance).

The Fixed effects coefficients table indicates that fitglme returned 95% confidence intervals. It contains one row for each fixed-effects predictor, and each column contains statistics corresponding to that predictor. Column 1 (Name) contains the name of each fixed-effects coefficient, column 2 (Estimate) contains its estimated value, and column 3 (SE) contains the standard error of the coefficient. Column 4 (tStat) contains the $t$ -statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF) and column 6 (pValue) contain the degrees of freedom and $p$ -value that correspond to the $t$ -statistic, respectively. The last two columns (Lower and Upper) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.
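Because this model uses a log link, each fixed-effects estimate acts multiplicatively on the conditional mean. As a rough worked reading of the newprocess row, assuming newprocess is coded 0/1 and holding the other predictors and the factory effect $b_{i}$ fixed:

$\frac{\mu_{ij} \mid \text{newprocess} = 1}{\mu_{ij} \mid \text{newprocess} = 0} = \exp(\beta_{1}) \approx \exp(-0.36766) \approx 0.69$

Under these assumptions, the fitted model associates the new process with roughly 31% fewer expected defects per batch. Exponentiating the confidence limits in the same way gives a ratio between about 0.49 and 0.98, so the size of the reduction is estimated imprecisely.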

Random effects covariance parameters displays a table for each grouping variable (here, only factory), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std indicates that fitglme returns the standard deviation of the random effect associated with the factory predictor, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.

The standard display generated by fitglme does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters.
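A minimal sketch of that follow-up step, assuming the mfr sample data is on the path and refitting the same model so that the block is self-contained. The variable names psi, dispersion, and stats are illustrative.

```matlab
% Refit the model from this example.
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

% The third output is a cell array of tables: one per grouping variable,
% followed by one for the dispersion parameter.
[psi,dispersion,stats] = covarianceParameters(glme);

% Show the table for the factory grouping variable, which includes the
% Lower and Upper confidence limits for the covariance parameter.
disp(stats{1})
```

The first output, psi, holds the estimated covariance matrix for each grouping variable, and dispersion holds the dispersion parameter.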

Fit Generalized Mixed-Effects Model to Binary Data

Load the carbig sample data set.

load carbig

The variables Acceleration, Model_Year, and Cylinders contain data for car acceleration, year of manufacture, and number of engine cylinders, respectively. The data was collected from cars built between 1970 and 1982.

Create a variable named CylinderCats that indicates whether a car has more than four cylinders. Use the table function to create a table from the data in Acceleration, Model_Year, and CylinderCats.

CylinderCats = Cylinders>4;
tbl = table(Acceleration,Model_Year,CylinderCats);

Fit a generalized mixed-effects model to the data, using CylinderCats as the response variable and Model_Year as a random effect. Specify the response data distribution as binomial.

glme = fitglme(tbl,"CylinderCats~Acceleration+(Acceleration|Model_Year)",Distribution="Binomial");

glme is a GeneralizedLinearMixedModel object that contains information about the fitted model.

Inspect the statistics for the fixed effect Acceleration by using the fixedEffects object function with the default 95% confidence level.

[~,~,statsFixed] = fixedEffects(glme)

statsFixed = 
    Fixed effect coefficients: DFMethod = 'residual', Alpha = 0.05

    Name                    Estimate    SE          tStat      DF     pValue        Lower       Upper  
    {'(Intercept)' }          4.3838      1.2374     3.5428    404    0.00044213      1.9513     6.8163
    {'Acceleration'}        -0.29673    0.077896    -3.8093    404    0.00016104    -0.44986    -0.1436

The small p-value for the Acceleration term indicates that car acceleration has a statistically significant effect on whether a car has more than four cylinders.

Inspect the statistics for the random effect Model_Year by using the randomEffects object function with the default 95% confidence level.

[~,~,statsRandom] = randomEffects(glme)

statsRandom = 
    Random effect coefficients: DFMethod = 'residual', Alpha = 0.05

    Group                 Level         Name                    Estimate    SEPred     tStat       DF     pValue      Lower        Upper   
    {'Model_Year'}        {'70'}        {'(Intercept)' }           3.041     2.1322      1.4262    404     0.15457      -1.1506      7.2326
    {'Model_Year'}        {'70'}        {'Acceleration'}        -0.16836    0.13906     -1.2107    404     0.22672     -0.44173     0.10501
    {'Model_Year'}        {'71'}        {'(Intercept)' }          3.4715     2.3452      1.4802    404     0.13959      -1.1389      8.0818
    {'Model_Year'}        {'71'}        {'Acceleration'}        -0.21721    0.15106     -1.4378    404     0.15125     -0.51418    0.079764
    {'Model_Year'}        {'72'}        {'(Intercept)' }          4.2634     2.4382      1.7486    404    0.081124     -0.52977      9.0566
    {'Model_Year'}        {'72'}        {'Acceleration'}        -0.28827    0.15892     -1.8139    404    0.070435      -0.6007    0.024149
    {'Model_Year'}        {'73'}        {'(Intercept)' }          3.7951     2.1976      1.7269    404    0.084949     -0.52512      8.1153
    {'Model_Year'}        {'73'}        {'Acceleration'}        -0.21079    0.14182     -1.4864    404     0.13796     -0.48958    0.067996
    {'Model_Year'}        {'74'}        {'(Intercept)' }        -0.77693     2.6678    -0.29123    404     0.77103      -6.0214      4.4675
    {'Model_Year'}        {'74'}        {'Acceleration'}        0.056863    0.16571     0.34314    404     0.73167      -0.2689     0.38263
    {'Model_Year'}        {'75'}        {'(Intercept)' }         -3.2681     2.1531     -1.5178    404     0.12984      -7.5008     0.96463
    {'Model_Year'}        {'75'}        {'Acceleration'}         0.24151    0.13346      1.8096    404    0.071093    -0.020847     0.50387
    {'Model_Year'}        {'76'}        {'(Intercept)' }        -0.28228     2.0922    -0.13492    404     0.89274      -4.3952      3.8306
    {'Model_Year'}        {'76'}        {'Acceleration'}        0.045966    0.13069     0.35171    404     0.72524     -0.21096     0.30289
    {'Model_Year'}        {'77'}        {'(Intercept)' }        -0.78239     2.2806    -0.34305    404     0.73174      -5.2658       3.701
    {'Model_Year'}        {'77'}        {'Acceleration'}        0.052519    0.14498     0.36226    404     0.71735     -0.23249     0.33752
    {'Model_Year'}        {'78'}        {'(Intercept)' }        -0.46307     2.2693    -0.20406    404     0.83841      -4.9242      3.9981
    {'Model_Year'}        {'78'}        {'Acceleration'}        0.050014    0.14243     0.35114    404     0.72567     -0.22999     0.33002
    {'Model_Year'}        {'79'}        {'(Intercept)' }         -2.5181     2.0134     -1.2507    404     0.21178      -6.4762        1.44
    {'Model_Year'}        {'79'}        {'Acceleration'}         0.19051     0.1257      1.5156    404      0.1304    -0.056591     0.43761
    {'Model_Year'}        {'80'}        {'(Intercept)' }         -2.6168     2.4053     -1.0879    404     0.27728      -7.3452      2.1117
    {'Model_Year'}        {'80'}        {'Acceleration'}         0.10117    0.14903     0.67883    404     0.49763     -0.19181     0.39414
    {'Model_Year'}        {'81'}        {'(Intercept)' }         -1.8396     2.4268    -0.75801    404     0.44888      -6.6103      2.9312
    {'Model_Year'}        {'81'}        {'Acceleration'}         0.08723    0.15145     0.57596    404     0.56497      -0.2105     0.38496
    {'Model_Year'}        {'82'}        {'(Intercept)' }         -2.0238     2.5531    -0.79267    404     0.42843      -7.0428      2.9953
    {'Model_Year'}        {'82'}        {'Acceleration'}        0.058853    0.15948     0.36903    404      0.7123     -0.25467     0.37237

The large p-values in the table output indicate that not enough evidence exists to conclude that any of the random effect terms have a statistically significant effect on whether a car has more than four cylinders.

Input Arguments

`tbl` — Input data
table | dataset array

Input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables (see Grouping Variables). The response variable must be numeric or logical. You must specify the model for the variables using formula.

`formula` — Formula for model specification
character vector or string scalar of the form `'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'`

Formula for model specification, specified as a character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'. The formula is case sensitive. For a full description, see Formula.

Example: 'y ~ treatment + (1|block)'

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects' specifies the response variable distribution as Poisson, the link function as log, the fit method as Laplace, and dummy variable coding where the coefficients sum to 0.

`BinomialSize` — Number of trials for binomial distribution
1 (default) | scalar value | vector | variable name

Number of trials for the binomial distribution (that is, the sample size), specified as the comma-separated pair consisting of 'BinomialSize' and a scalar value, a vector of the same length as the response, or the name of a variable in the input table. If you specify the name of a variable, then the variable must be of the same length as the response. BinomialSize applies only when the Distribution parameter is 'binomial'.

If BinomialSize is a scalar value, then all observations have the same number of trials.

Data Types: single | double
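The following sketch shows one way to pass a per-observation number of trials. The data is simulated, and every variable name (clinic, dose, trials, successes) is hypothetical. It assumes that the response holds the number of successes out of the corresponding number of trials.

```matlab
rng(0)                                    % reproducible simulated data
nGroups = 8;
nPerGroup = 10;
n = nGroups*nPerGroup;

groupIdx = repelem((1:nGroups)',nPerGroup);   % group index for each row
clinic = categorical(groupIdx);               % hypothetical grouping variable
dose = rand(n,1);                             % hypothetical predictor
trials = randi([5 20],n,1);                   % number of trials per observation

b = 0.5*randn(nGroups,1);                     % simulated random intercepts
p = 1./(1 + exp(-(-0.5 + 1.2*dose + b(groupIdx))));
successes = binornd(trials,p);                % successes out of trials

tbl = table(successes,dose,clinic);

% Pass the trial counts as a vector of the same length as the response.
glme = fitglme(tbl,'successes ~ 1 + dose + (1|clinic)', ...
    'Distribution','Binomial','BinomialSize',trials);
disp(glme)
```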

`CheckHessian` — Indicator to check positive definiteness of Hessian
`false` (default) | `true`

Indicator to check the positive definiteness of the Hessian of the objective function with respect to unconstrained parameters at convergence, specified as the comma-separated pair consisting of 'CheckHessian' and either false or true. Default is false.

Specify 'CheckHessian' as true to verify optimality of the solution or to determine if the model is overparameterized in the number of covariance parameters.

If you specify 'FitMethod' as 'MPL' or 'REMPL', then the covariance of the fixed effects and the covariance parameters is based on the fitted linear mixed-effects model from the final pseudo likelihood iteration.

Example: 'CheckHessian',true

`CovarianceMethod` — Method to compute covariance of estimated parameters
`'conditional'` (default) | `'JointHessian'`

Method to compute covariance of estimated parameters, specified as the comma-separated pair consisting of 'CovarianceMethod' and either 'conditional' or 'JointHessian'. If you specify 'conditional', then fitglme computes a fast approximation to the covariance of fixed effects given the estimated covariance parameters. It does not compute the covariance of covariance parameters. If you specify 'JointHessian', then fitglme computes the joint covariance of fixed effects and covariance parameters via the observed information matrix using the Laplacian loglikelihood.

Example: 'CovarianceMethod','JointHessian'

`CovariancePattern` — Pattern of covariance matrix
`'FullCholesky'` | `'Isotropic'` | `'Full'` | `'Diagonal'` | `'CompSymm'` | square symmetric logical matrix | string array | cell array of character vectors or logical matrices

Pattern of the covariance matrix of the random effects, specified as the comma-separated pair consisting of 'CovariancePattern' and 'FullCholesky', 'Isotropic', 'Full', 'Diagonal', 'CompSymm', a square symmetric logical matrix, a string array, or a cell array containing character vectors or logical matrices.

If there are R random-effects terms, then the value of 'CovariancePattern' must be a string array or cell array of length R, where each element r of the array specifies the pattern of the covariance matrix of the random-effects vector associated with the rth random-effects term. The options for each element follow.

Value	Description
`'FullCholesky'`	Full covariance matrix using the Cholesky parameterization. `fitglme` estimates all elements of the covariance matrix.
`'Isotropic'`	Diagonal covariance matrix with equal variances. That is, off-diagonal elements of the covariance matrix are constrained to be 0, and the diagonal elements are constrained to be equal. For example, if there are three random-effects terms with an isotropic covariance structure, this covariance matrix looks like $(\begin{matrix} σ_{b}^{2} & 0 & 0 \\ 0 & σ_{b}^{2} & 0 \\ 0 & 0 & σ_{b}^{2} \end{matrix})$ where $σ_{b}^{2}$ is the common variance of the random-effects terms.
`'Full'`	Full covariance matrix, using the log-Cholesky parameterization. `fitglme` estimates all elements of the covariance matrix.
`'Diagonal'`	Diagonal covariance matrix. That is, off-diagonal elements of the covariance matrix are constrained to be 0. $(\begin{matrix} σ_{b 1}^{2} & 0 & 0 \\ 0 & σ_{b 2}^{2} & 0 \\ 0 & 0 & σ_{b 3}^{2} \end{matrix})$
`'CompSymm'`	Compound symmetry structure. That is, common variance along the diagonal and equal correlation between all random effects. For example, if there are three random-effects terms with a covariance matrix having a compound symmetry structure, this covariance matrix looks like $(\begin{matrix} σ_{b 1}^{2} & σ_{b 1, b 2} & σ_{b 1, b 2} \\ σ_{b 1, b 2} & σ_{b 1}^{2} & σ_{b 1, b 2} \\ σ_{b 1, b 2} & σ_{b 1, b 2} & σ_{b 1}^{2} \end{matrix})$ where $σ_{b 1}^{2}$ is the common variance of the random-effects terms and $σ_{b 1, b 2}$ is the common covariance between any two random-effects terms.
`PAT`	Square symmetric logical matrix. If `'CovariancePattern'` is defined by the matrix `PAT`, and if `PAT(a,b) = false`, then the `(a,b)` element of the corresponding covariance matrix is constrained to be 0.

For scalar random-effects terms, the default is 'Isotropic'. Otherwise, the default is 'FullCholesky'.

Example: 'CovariancePattern','Diagonal'

Example: 'CovariancePattern',{'Full','Diagonal'}

Data Types: char | string | logical | cell
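As a hedged illustration of how a pattern constrains the estimate, the sketch below fits a random intercept and a random time_dev slope for each factory with a 'Diagonal' pattern. It assumes the mfr sample data is available. The model is chosen only to show the option, not because it suits the data, and the fit might issue convergence warnings.

```matlab
load mfr   % sample data used in the examples above

% Random intercept and random time_dev slope per factory, with the
% covariance between the two constrained to 0.
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + (1 + time_dev|factory)', ...
    'Distribution','Poisson','CovariancePattern','Diagonal');

% psi{1} is the estimated 2-by-2 covariance matrix of the random effects
% for the factory grouping variable.
psi = covarianceParameters(glme);
disp(psi{1})
```

With this pattern, the off-diagonal elements of psi{1} are expected to be 0, while the two diagonal elements are estimated separately.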

`DispersionFlag` — Indicator to compute dispersion parameter
`false` for `'binomial'` and `'poisson'` distributions (default) | `true`

Indicator to compute dispersion parameter for 'binomial' and 'poisson' distributions, specified as the comma-separated pair consisting of 'DispersionFlag' and one of the following.

Value	Description
`true`	Estimate a dispersion parameter when computing standard errors
`false`	Use the theoretical value of `1.0` when computing standard errors

'DispersionFlag' only applies if 'FitMethod' is 'MPL' or 'REMPL'.

The fitting function always estimates the dispersion for other distributions.

Example: 'DispersionFlag',true

`Distribution` — Distribution of the response variable
`'Normal'` (default) | `'Binomial'` | `'Poisson'` | `'Gamma'` | `'InverseGaussian'`

Distribution of the response variable, specified as the comma-separated pair consisting of 'Distribution' and one of the following.

Value	Description
`'Normal'`	Normal distribution
`'Binomial'`	Binomial distribution
`'Poisson'`	Poisson distribution
`'Gamma'`	Gamma distribution
`'InverseGaussian'`	Inverse Gaussian distribution

Example: 'Distribution','Binomial'

`DummyVarCoding` — Coding to use for dummy variables
`'reference'` (default) | `'effects'` | `'full'`

Coding to use for dummy variables created from the categorical variables, specified as the comma-separated pair consisting of 'DummyVarCoding' and one of the values in this table.

Value	Description
`'reference'` (default)	`fitglme` creates dummy variables with a reference group. This scheme treats the first category as a reference group and creates one fewer dummy variable than the number of categories. You can check the category order of a categorical variable by using the `categories` function, and change the order by using the `reordercats` function.
`'effects'`	`fitglme` creates dummy variables using effects coding. This scheme uses -1 to represent the last category and creates one fewer dummy variable than the number of categories.
`'full'`	`fitglme` creates full dummy variables. This scheme creates one dummy variable for each category.

For more details about creating dummy variables, see Automatic Creation of Dummy Variables.

Example: 'DummyVarCoding','effects'
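The three schemes can be illustrated by building the dummy columns by hand for a small categorical variable. This is only a sketch of the coding rules described in the table, not the code that fitglme runs internally. The variable g and its categories are made up, and the subtraction relies on implicit expansion (R2016b or later).

```matlab
% Hypothetical categorical variable with categories A, B, and C.
g = categorical({'A';'B';'C';'A'});

% 'full': one indicator column per category.
Dfull = dummyvar(g);

% 'reference': drop the column for the first category (the reference group).
Dref = Dfull(:,2:end);

% 'effects': drop the column for the last category and code its rows as -1
% in every remaining column.
Deff = Dfull(:,1:end-1) - Dfull(:,end);

disp(Dfull)
disp(Dref)
disp(Deff)
```

In the effects-coded matrix, the row for the last category (C) contains -1 in both columns, which is what makes the implied category effects sum to 0.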

`EBMethod` — Method used to approximate empirical Bayes estimates of random effects
`'Auto'` (default) | `'LineSearchNewton'` | `'TrustRegion2D'` | `'fsolve'`

Method used to approximate empirical Bayes estimates of random effects, specified as the comma-separated pair consisting of 'EBMethod' and one of the following.

'Auto'
'LineSearchNewton'
'TrustRegion2D'
'fsolve'

'Auto' is similar to 'LineSearchNewton' but uses a different convergence criterion and does not display iterative progress. 'Auto' and 'LineSearchNewton' may fail for non-canonical link functions. For non-canonical link functions, 'TrustRegion2D' or 'fsolve' are recommended. You must have Optimization Toolbox™ to use 'fsolve'.

Example: 'EBMethod','LineSearchNewton'

`EBOptions` — Options for empirical Bayes optimization
structure

Options for empirical Bayes optimization, specified as the comma-separated pair consisting of 'EBOptions' and a structure containing the following.

Value	Description
`'TolFun'`	Relative tolerance on the gradient norm. Default is 1e-6.
`'TolX'`	Absolute tolerance on the step size. Default is 1e-8.
`'MaxIter'`	Maximum number of iterations. Default is 100.
`'Display'`	`'off'`, `'iter'`, or `'final'`. Default is `'off'`.

If EBMethod is 'Auto' and 'FitMethod' is 'Laplace', TolFun is the relative tolerance on the linear predictor of the model, and the 'Display' option does not apply.

If 'EBMethod' is 'fsolve', then 'EBOptions' must be specified as an object created by optimoptions('fsolve').

Data Types: struct

`Exclude` — Indices for rows to exclude
use all rows without `NaNs` (default) | vector of integer or logical values

Indices for rows in the data to exclude from the generalized linear mixed-effects model fit, specified as the comma-separated pair consisting of 'Exclude' and a vector of integer or logical values.

For example, you can exclude the 13th and 67th rows from the fit as follows.

Example: 'Exclude',[13,67]

Data Types: single | double | logical
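A short sketch of the effect of 'Exclude', assuming the mfr sample data is available. The reduced formula is used only to keep the example brief.

```matlab
load mfr   % sample data used in the examples above

% Fit once with all rows and once without rows 13 and 67.
glmeAll = fitglme(mfr,'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution','Poisson');
glmeSub = fitglme(mfr,'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution','Poisson','Exclude',[13,67]);

% Compare the number of observations used in each fit.
disp([glmeAll.NumObservations glmeSub.NumObservations])
```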

`FitMethod` — Method for estimating model parameters
`'MPL'` (default) | `'REMPL'` | `'Laplace'` | `'ApproximateLaplace'`

Method for estimating model parameters, specified as the comma-separated pair consisting of 'FitMethod' and one of the following.

'MPL' — Maximum pseudo likelihood
'REMPL' — Restricted maximum pseudo likelihood
'Laplace' — Maximum likelihood using Laplace approximation
'ApproximateLaplace' — Maximum likelihood using approximate Laplace approximation with fixed effects profiled out

Example: 'FitMethod','REMPL'

`InitPLIterations` — Initial number of pseudo likelihood iterations
10 (default) | integer value in the range [1,∞)

Initial number of pseudo likelihood iterations used to initialize parameters for ApproximateLaplace and Laplace fit methods, specified as the comma-separated pair consisting of 'InitPLIterations' and an integer value greater than or equal to 1.

Data Types: single | double

`Link` — Link function
`'identity'` | `'log'` | `'logit'` | `'loglog'` | `'probit'` | `'comploglog'` | `'reciprocal'` | scalar value | structure

Link function, specified as the comma-separated pair consisting of 'Link' and one of the following.

Value	Description
`'identity'`	`g(mu) = mu`. This is the default for the normal distribution.
`'log'`	`g(mu) = log(mu)`. This is the default for the Poisson distribution.
`'logit'`	`g(mu) = log(mu/(1-mu))`. This is the default for the binomial distribution.
`'loglog'`	`g(mu) = log(-log(mu))`
`'probit'`	`g(mu) = norminv(mu)`
`'comploglog'`	`g(mu) = log(-log(1-mu))`
`'reciprocal'`	`g(mu) = mu.^(-1)`
Scalar value `P`	`g(mu) = mu.^P`
Structure `S`	A structure containing four fields whose values are function handles with the following names: `S.Link` (link function), `S.Derivative` (derivative), `S.SecondDerivative` (second derivative), and `S.Inverse` (inverse of link). You can omit `S.SecondDerivative` if `FitMethod` is `'MPL'` or `'REMPL'`, or if `S` is the canonical link for the specified distribution.

The default link function used by fitglme is the canonical link that depends on the distribution of the response.

Response Distribution	Canonical Link Function
`'Normal'`	`'identity'`
`'Binomial'`	`'logit'`
`'Poisson'`	`'log'`
`'Gamma'`	`-1`
`'InverseGaussian'`	`-2`

Example: 'Link','log'

Data Types: char | string | single | double | struct
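To make the structure form concrete, the sketch below writes the log link as four function handles and compares the fit with the built-in 'log' option. It assumes the mfr sample data is available. The reduced formula and the names glmeS and glmeL are illustrative only.

```matlab
load mfr   % sample data used in the examples above

% Log link written out as a structure of function handles.
S.Link             = @(mu) log(mu);       % g(mu)
S.Derivative       = @(mu) 1./mu;         % g'(mu)
S.SecondDerivative = @(mu) -1./mu.^2;     % g''(mu)
S.Inverse          = @(eta) exp(eta);     % inverse of g

glmeS = fitglme(mfr,'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution','Poisson','Link',S,'FitMethod','Laplace');
glmeL = fitglme(mfr,'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace');

% If the handles are specified correctly, the fixed-effects estimates
% from the two fits should agree closely.
disp([fixedEffects(glmeS) fixedEffects(glmeL)])
```

Comparing against a built-in link in this way is a simple check that the derivative and inverse handles are consistent with the link function before moving on to a genuinely custom link.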

`MuStart` — Starting values for conditional mean
n-by-1 numeric vector

Starting values for conditional mean, specified as an n-by-1 vector, where n is the number of rows in tbl. Valid values are as follows.

Response Distribution	Valid Values
`"Normal"`	`(-Inf,Inf)`
`"Binomial"`	`(0,1)`
`"Poisson"`	`(0,Inf)`
`"Gamma"`	`(0,Inf)`
`"InverseGaussian"`	`(0,Inf)`

Data Types: single | double

`Offset` — Offset
`zeros(n,1)` (default) | n-by-1 vector of scalar values

Offset, specified as the comma-separated pair consisting of 'Offset' and an n-by-1 vector of scalar values, where n is the length of the response vector. You can also specify the variable name of an n-by-1 vector of scalar values. 'Offset' is used as an additional predictor that has a coefficient value fixed at 1.0.

Data Types: single | double
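A common use of an offset is a Poisson model for counts observed over unequal exposure times, where the log of the exposure enters the linear predictor with a coefficient of 1. The sketch below uses simulated data, and all variable names (site, x, exposure, counts) are hypothetical.

```matlab
rng(1)                                    % reproducible simulated data
nGroups = 10;
nPerGroup = 12;
n = nGroups*nPerGroup;

siteIdx = repelem((1:nGroups)',nPerGroup);    % group index for each row
site = categorical(siteIdx);                  % hypothetical grouping variable
x = randn(n,1);                               % hypothetical predictor
exposure = 1 + 9*rand(n,1);                   % e.g., hours of observation

b = 0.3*randn(nGroups,1);                     % simulated random intercepts
mu = exposure .* exp(0.2 + 0.5*x + b(siteIdx));
counts = poissrnd(mu);

tbl = table(counts,x,site);

% log(exposure) is added to the linear predictor with coefficient 1, so
% the remaining coefficients describe the rate per unit of exposure.
glme = fitglme(tbl,'counts ~ 1 + x + (1|site)', ...
    'Distribution','Poisson','Link','log','Offset',log(exposure));
disp(glme)
```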

`Optimizer` — Optimization algorithm
`'quasinewton'` (default) | `'fminsearch'` | `'fminunc'`

Optimization algorithm, specified as the comma-separated pair consisting of 'Optimizer' and one of the following.

Value	Description
`'quasinewton'`	Uses a trust region based quasi-Newton optimizer. You can change the options of the algorithm using `statset('fitglme')`. If you do not specify the options, then `fitglme` uses the default options of `statset('fitglme')`.
`'fminsearch'`	Uses a derivative-free Nelder-Mead method. You can change the options of the algorithm using `optimset('fminsearch')`. If you do not specify the options, then `fitglme` uses the default options of `optimset('fminsearch')`.
`'fminunc'`	Uses a line search-based quasi-Newton method. You must have Optimization Toolbox to specify this option. You can change the options of the algorithm using `optimoptions('fminunc')`. If you do not specify the options, then `fitglme` uses the default options of `optimoptions('fminunc')` with `'Algorithm'` set to `'quasi-newton'`.

Example: 'Optimizer','fminsearch'

`OptimizerOptions` — Options for optimization algorithm
structure returned by `statset` | structure returned by `optimset` | object returned by `optimoptions`

Options for the optimization algorithm, specified as the comma-separated pair consisting of 'OptimizerOptions' and a structure returned by statset('fitglme'), a structure created by optimset('fminsearch'), or an object returned by optimoptions('fminunc').

If 'Optimizer' is 'fminsearch', then use optimset('fminsearch') to change the options of the algorithm. If 'Optimizer' is 'fminsearch' and you do not supply 'OptimizerOptions', then the defaults used in fitglme are the default options created by optimset('fminsearch').
If 'Optimizer' is 'fminunc', then use optimoptions('fminunc') to change the options of the optimization algorithm. See optimoptions for the options 'fminunc' uses. If 'Optimizer' is 'fminunc' and you do not supply 'OptimizerOptions', then the defaults used in fitglme are the default options created by optimoptions('fminunc') with 'Algorithm' set to 'quasi-newton'.
If 'Optimizer' is 'quasinewton', then use statset('fitglme') to change the optimization parameters. If 'Optimizer' is 'quasinewton' and you do not change the optimization parameters using statset, then fitglme uses the default options created by statset('fitglme').

The 'quasinewton' optimizer uses the following fields in the structure created by statset('fitglme').

`TolFun` — Relative tolerance on gradient of objective function
`1e-6` (default) | positive scalar value

Relative tolerance on the gradient of the objective function, specified as a positive scalar value.

`TolX` — Absolute tolerance on step size
`1e-12` (default) | positive scalar value

Absolute tolerance on the step size, specified as a positive scalar value.

`MaxIter` — Maximum number of iterations allowed
`10000` (default) | positive scalar value

Maximum number of iterations allowed, specified as a positive scalar value.

`Display` — Level of display
`'off'` (default) | `'iter'` | `'final'`

Level of display, specified as one of 'off', 'iter', or 'final'.
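As a sketch of how these fields are typically set, the following code starts from the default structure, changes two fields, and passes the result to fitglme. It assumes the mfr sample data is available. The specific values of MaxIter and TolFun are arbitrary choices for illustration.

```matlab
load mfr   % sample data used in the examples above

% Start from the default options for fitglme and adjust two fields.
opts = statset('fitglme');
opts.MaxIter = 500;      % illustrative value
opts.TolFun = 1e-8;      % illustrative value

glme = fitglme(mfr,'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution','Poisson','FitMethod','Laplace', ...
    'Optimizer','quasinewton','OptimizerOptions',opts);
disp(glme)
```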

`PLIterations` — Maximum number of pseudo likelihood iterations
`100` (default) | positive integer value

Maximum number of pseudo likelihood (PL) iterations, specified as the comma-separated pair consisting of 'PLIterations' and a positive integer value. PL is used for fitting the model if 'FitMethod' is 'MPL' or 'REMPL'. For other 'FitMethod' values, PL iterations are used to initialize parameters for subsequent optimization.

Example: 'PLIterations',200

Data Types: single | double

`PLTolerance` — Relative tolerance factor for pseudo likelihood iterations
`1e-08` (default) | positive scalar value

Relative tolerance factor for pseudo likelihood iterations, specified as the comma-separated pair consisting of 'PLTolerance' and a positive scalar value.

Example: 'PLTolerance',1e-06

Data Types: single | double

`StartMethod` — Method to start iterative optimization
`'default'` (default) | `'random'`

Method to start iterative optimization, specified as the comma-separated pair consisting of 'StartMethod' and either of the following.

Value	Description
`'default'`	An internally defined default value
`'random'`	A random initial value

Example: 'StartMethod','random'

`UseSequentialFitting` — Initial fitting type
`false` (default) | `true`

Initial fitting type, specified as the comma-separated pair consisting of 'UseSequentialFitting' and either false or true. If 'UseSequentialFitting' is false, all maximum likelihood methods are initialized using one or more pseudo likelihood iterations. If 'UseSequentialFitting' is true, the initial values from pseudo likelihood iterations are refined using 'ApproximateLaplace' for 'Laplace' fitting.

Example: 'UseSequentialFitting',true
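A minimal sketch of sequential fitting with the 'Laplace' method follows. The data are simulated and the variable names (y, x, g, gid, b) are assumptions for illustration; whether sequential fitting helps depends on the model and data.

```matlab
% Simulated Poisson counts with a random intercept per group (illustrative data).
rng(3);
gid = repelem((1:10)', 20);              % 10 groups, 20 observations each
x   = randn(200, 1);
b   = 0.3 * randn(10, 1);
y   = poissrnd(exp(0.5 + 0.4*x + b(gid)));
tbl = table(y, x, categorical(gid), 'VariableNames', {'y', 'x', 'g'});

% Refine the PL starting values with 'ApproximateLaplace' before the Laplace fit.
glme = fitglme(tbl, 'y ~ 1 + x + (1|g)', ...
    'Distribution', 'Poisson', 'Link', 'log', ...
    'FitMethod', 'Laplace', 'UseSequentialFitting', true);
```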

`Verbose` — Indicator to display optimization process on screen
`0` (default) | `1` | `2`

Indicator to display the optimization process on screen, specified as the comma-separated pair consisting of 'Verbose' and 0, 1, or 2. If 'Verbose' is specified as 1 or 2, then fitglme displays the progress of the iterative model-fitting process. Specifying 'Verbose' as 2 displays iterative optimization information from the individual pseudo likelihood iterations. Specifying 'Verbose' as 1 omits this display.

The setting for 'Verbose' overrides the field 'Display' in 'OptimizerOptions'.

Example: 'Verbose',1

`Weights` — Observation weights
vector of nonnegative scalar values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and an n-by-1 vector of nonnegative scalar values, where n is the number of observations. If the response distribution is binomial or Poisson, then 'Weights' must be a vector of positive integers.

Data Types: single | double
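The following sketch shows one way to pass observation weights for a Poisson response, where the weights must be positive integers. The data, the variable names, and the choice of which observations receive a weight of 2 are all hypothetical.

```matlab
% Simulated Poisson counts with a random intercept per group (illustrative data).
rng(4);
gid = repelem((1:10)', 20);              % 10 groups, 20 observations each
x   = randn(200, 1);
b   = 0.3 * randn(10, 1);
y   = poissrnd(exp(0.5 + 0.4*x + b(gid)));
tbl = table(y, x, categorical(gid), 'VariableNames', {'y', 'x', 'g'});

% n-by-1 vector of positive integer weights; the first 50 rows count twice.
w = ones(200, 1);
w(1:50) = 2;

glme = fitglme(tbl, 'y ~ 1 + x + (1|g)', ...
    'Distribution', 'Poisson', 'Link', 'log', 'Weights', w);
```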

Output Arguments

`glme` — Generalized linear mixed-effects model
`GeneralizedLinearMixedModel` object

Generalized linear mixed-effects model, returned as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel.
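As a brief sketch of working with the returned object, the example below fits a model to simulated data and then extracts the fixed-effects and random-effects estimates with the fixedEffects and randomEffects object functions. The data and variable names (y, x, g, gid, b) are assumptions for illustration.

```matlab
% Simulated Poisson counts with a random intercept per group (illustrative data).
rng(5);
gid = repelem((1:10)', 20);              % 10 groups, 20 observations each
x   = randn(200, 1);
b   = 0.3 * randn(10, 1);
y   = poissrnd(exp(0.5 + 0.4*x + b(gid)));
tbl = table(y, x, categorical(gid), 'VariableNames', {'y', 'x', 'g'});

glme = fitglme(tbl, 'y ~ 1 + x + (1|g)', ...
    'Distribution', 'Poisson', 'Link', 'log');

beta = fixedEffects(glme);     % intercept and slope for x
B    = randomEffects(glme);    % one estimated intercept shift per level of g
disp(glme.Coefficients)        % fixed-effects estimates with standard errors
```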

More About

Formula

In general, a formula for model specification is a character vector or string scalar of the form 'y ~ terms'. For generalized linear mixed-effects models, the formula has the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms, respectively.

Suppose a table tbl contains the following:

A response variable, y
Predictor variables, X_j, which can be continuous or grouping variables
Grouping variables, g₁, g₂, ..., g_R,

where the grouping variables in X_j and g_r can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.

Then, in a formula of the form 'y ~ fixed + (random₁|g₁) + ... + (random_R|g_R)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random₁ is a specification of the random-effects design matrix Z₁ corresponding to grouping variable g₁, and similarly random_R is a specification of the random-effects design matrix Z_R corresponding to grouping variable g_R. You can express the fixed and random terms using Wilkinson notation.

Wilkinson notation describes the factors present in models, not the multipliers (coefficients) of those factors.

Wilkinson Notation	Factors in Standard Notation
`1`	Constant (intercept) term
`X^k`, where `k` is a positive integer	`X`, `X²`, ..., `X^k`
`X1 + X2`	`X1`, `X2`
`X1*X2`	`X1`, `X2`, `X1.*X2` (elementwise multiplication of `X1` and `X2`)
`X1:X2`	`X1.*X2` only
`- X2`	Do not include `X2`
`X1*X2 + X3`	`X1`, `X2`, `X3`, `X1*X2`
`X1 + X2 + X3 + X1:X2`	`X1`, `X2`, `X3`, `X1*X2`
`X1*X2*X3 - X1:X2:X3`	`X1`, `X2`, `X3`, `X1*X2`, `X1*X3`, `X2*X3`
`X1*(X2 + X3)`	`X1`, `X2`, `X3`, `X1*X2`, `X1*X3`

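One way to check how a Wilkinson term expands is to fit two formulas that the table lists as equivalent and compare the fixed-effects coefficient names. The sketch below assumes simulated data with hypothetical variables X1, X2, y, and g; the two name lists are expected to match.

```matlab
% Simulated Poisson counts with two predictors and a grouping variable.
rng(6);
gid = repelem((1:10)', 20);              % 10 groups, 20 observations each
X1  = randn(200, 1);
X2  = randn(200, 1);
b   = 0.3 * randn(10, 1);
y   = poissrnd(exp(0.2 + 0.3*X1 - 0.2*X2 + 0.1*X1.*X2 + b(gid)));
tbl = table(y, X1, X2, categorical(gid), ...
    'VariableNames', {'y', 'X1', 'X2', 'g'});

% 'X1*X2' is shorthand for 'X1 + X2 + X1:X2'.
short = fitglme(tbl, 'y ~ X1*X2 + (1|g)', 'Distribution', 'Poisson');
long  = fitglme(tbl, 'y ~ X1 + X2 + X1:X2 + (1|g)', 'Distribution', 'Poisson');

disp(short.CoefficientNames)
sameTerms = isequal(short.CoefficientNames, long.CoefficientNames);
```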
Statistics and Machine Learning Toolbox™ notation always includes a constant term unless you explicitly remove the term using -1. Here are some examples for generalized linear mixed-effects model specification.

Examples:

Formula	Description
`'y ~ X1 + X2'`	Fixed effects for the intercept, `X1` and `X2`. This is equivalent to `'y ~ 1 + X1 + X2'`.
`'y ~ -1 + X1 + X2'`	No intercept and fixed effects for `X1` and `X2`. The implicit intercept term is suppressed by including `-1`.
`'y ~ 1 + (1 | g1)'`	Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable `g1`.
`'y ~ X1 + (1 | g1)'`	Random intercept model with a fixed slope.
`'y ~ X1 + (X1 | g1)'`	Random intercept and slope, with possible correlation between them. This is equivalent to `'y ~ 1 + X1 + (1 + X1|g1)'`.
`'y ~ X1 + (1 | g1) + (-1 + X1 | g1)'`	Independent random effects terms for intercept and slope.
`'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)'`	Random intercept model with independent main effects for `g1` and `g2`, plus an independent interaction effect.
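To illustrate the difference between correlated and independent random-effects terms, the sketch below fits both forms to simulated data that contain a random intercept and a random slope for each group. The data and the names X1, y, g1, gid, b0, and b1 are assumptions for illustration. Both models estimate an intercept and a slope effect for every level of g1, but only the first allows the two to be correlated.

```matlab
% Simulated Poisson counts with a random intercept and slope per group.
rng(7);
gid = repelem((1:10)', 30);              % 10 groups, 30 observations each
X1  = randn(300, 1);
b0  = 0.3 * randn(10, 1);                % group-level intercept shifts
b1  = 0.2 * randn(10, 1);                % group-level slope shifts
y   = poissrnd(exp(0.5 + b0(gid) + (0.4 + b1(gid)).*X1));
tbl = table(y, X1, categorical(gid), 'VariableNames', {'y', 'X1', 'g1'});

% Random intercept and slope with possible correlation between them.
corrModel = fitglme(tbl, 'y ~ X1 + (X1|g1)', 'Distribution', 'Poisson');

% Independent random-effects terms for intercept and slope.
indepModel = fitglme(tbl, 'y ~ X1 + (1|g1) + (-1 + X1|g1)', ...
    'Distribution', 'Poisson');

nCorr  = numel(randomEffects(corrModel));    % 2 effects for each of 10 groups
nIndep = numel(randomEffects(indepModel));   % 10 intercepts plus 10 slopes
```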

Version History

Introduced in R2014b
