CoxModel

Cox proportional hazards model

Since R2021a

Description

A Cox proportional hazards model relates to lifetime or failure time data. The basic Cox model includes a hazard function h₀(t) and model coefficients b such that, for predictor X, the hazard rate at time t is

$h (X_{i}, t) = h_{0} (t) \exp [\sum_{j = 1}^{p} x_{i j} b_{j}],$

where the b coefficients do not depend on time. The creation function fitcox infers both the model coefficients b and the hazard rate h₀(t), and stores them as properties in the resulting CoxModel object.

The full Cox model includes extensions to the basic model, such as hazards with respect to different baselines or the inclusion of stratification variables. See Extension of Cox Proportional Hazards Model.

Creation

Create a CoxModel object using fitcox.

Properties

expand all

`Baseline` — Baseline hazard
`mean(X)` (default) | real scalar

Baseline hazard specified when model was fitted, specified as a real scalar. The Cox model is a relative hazard model, so it requires a baseline at which to compare hazards of given data, relative to the baseline. The default is mean(X) (and the mean within each stratification for stratified models), so the hazard rate at X is h(t)*exp((X-mean(X))*b). Enter 0 to compute the baseline relative to 0, so the hazard rate at X is h(t)*exp(X*b). Changing the baseline does not affect the coefficients.

Data Types: double

`CoefficientCovariance` — Covariance matrix for coefficient estimates
square matrix

Covariance matrix for coefficient estimates, specified as a square matrix with the number of rows equal to the number of predictors.

Data Types: double

`Coefficients` — Coefficients and related statistics
table

Coefficients and related statistics, specified as a table with four columns:

Beta — Coefficient estimate
SE — Standard error of the coefficient estimate
zStat — z statistic
pValue — p-value for the coefficient (compared to a zero Beta)

Each row of the table corresponds to one predictor.

To obtain any of these columns as a vector, index into the property using dot notation. For example, in the coxMdl object, the estimated coefficient vector is

beta = coxMdl.Coefficients.Beta

To perform other tests on the coefficients, use linhyptest.

Data Types: table

`Formula` — Representation of the model used in fit
formula in Wilkinson notation

Representation of the model used in the fit, specified as a formula in Wilkinson notation. See Wilkinson Notation. For example, to include several predictors, use

'X ~ a + b + … + c'

where each of the variables a, b, c represents one predictor. These variables are column names for the table X.

`Hazard` — Estimated baseline cumulative hazard
matrix double

Estimated baseline cumulative hazard, specified as a matrix double. The cumulative hazard is evaluated at time points defined in training.

Hazard has at least two columns. The first column contains the time values, and the rest of the columns contain the cumulative hazard at each listed time.

For nonstratified models, Hazard has two columns.
For stratified models, Hazard has an additional column for each unique combination of the stratification levels. Distinct time values in Hazard(:,1) for each stratification are separated by 0 entries in Hazard(:,2). A stratified model is a model trained using the 'Stratification' name-value argument.

Theoretically, the cumulative hazard at time t is –log(1 – cdf(t)). The empirical cumulative hazard is

$\hat{H_{0}} (t) = \sum_{t_{i} \leq t} \hat{h_{0}} (t_{i}) = \sum_{t_{i} \leq t} \frac{1}{\sum_{j \in R_{i}} \exp (β \cdot x_{j})},$

where R_i is the risk set at time t_i, meaning the observations that are at risk of failing. See Partial Likelihood Function.

Data Types: double

`LikelihoodRatioTestPValue` — P-value indicating model is significant relative to null model
real scalar

P-value indicating if the model is significant relative to the null model, specified as a real scalar.

This property contains the p-value of performing the likelihood ratio test against the null model, that is, a model with all coefficients equal to 0.

The likelihood ratio test compares the likelihood function of the data at the coefficient estimates, and at all the coefficients being 0. The comparison yields a test statistic that can be used to determine if the trained model is significant, relative to a model with all coefficients equal to 0. The null hypothesis is that there is no difference between the null model and the trained model, so a significant p-value implies the trained model is significant.

Data Types: double

`LogLikelihood` — Log of likelihood function at coefficient estimates
real scalar

Log of the likelihood function at the coefficient estimates, specified as a real scalar.

Data Types: double

`NumPredictors` — Number of predictors
positive integer

Number of predictors (coefficients) in the model, specified as a positive integer.

Data Types: double

`PredictorNames` — Names of predictors
cell array of character vectors

Names of the predictors used to fit the model, specified as a cell array of character vectors. If the model is trained on data in a table, the predictor names are the names of the table columns. Otherwise, the predictor names are X1, X2, and so on.

Data Types: cell

`ProportionalHazardsPValue` — P-value indicating covariates satisfies the proportional hazards assumption
real vector

P-value indicating if each covariate satisfies the proportional hazards assumption, specified as a real vector, with one entry for each predictor.

The Cox model relies on the assumption of proportional hazards, that is, for any two data points X1 and X2, hazard(X1)/hazard(X2) is constant. This assumption might be violated if the predictors depend on time. For example, if a predictor corresponds to age, it generally becomes more hazardous as age increases.

The test of this assumption uses the scaled Schoenfeld residuals and was derived by Grambsch and Therneau in [1].

The null hypothesis is that each coefficient satisfies the proportional hazards assumption. A significant p-value implies that a specific coefficient violates the proportional hazards assumption. The test is performed on each coefficient, so this property is a vector with as many elements as the number of coefficients.

Data Types: double

`ProportionalHazardsPValueGlobal` — P-value indicating model satisfies proportional hazards assumption
real scalar

P-value indicating if the whole model satisfies the proportional hazards assumption, specified as a real scalar.

The test of this assumption uses the scaled Schoenfeld residuals and was derived by Grambsch and Therneau in [1].

The null hypothesis is that the model, as a whole, satisfies the proportional hazards assumption. A significant p-value implies the whole model does not satisfy the proportional hazards assumption.

Data Types: double

`Residuals` — Residuals of various types
table

Residuals of various types, specified as a table with seven columns, one for each residual:

'CoxSnell' — The Cox-Snell residuals for an observation X(i) are defined as the cumulative hazard at time i (cumHazard(i)) multiplied by the hazard of X(i): csres(i) = cumHazard(i) * exp(X(i) * Beta). Beta is the fitted Beta vector stored in Coefficients.
'Deviance' — The deviance residual is defined using the martingale residual as follows: D(i) = sign(M(i))*sqrt(–2*[M(i) + delta(i)*log(delta(i)–M(i)))], where D(i) is the ith deviance residual, M(i) is the ith martingale residual, and delta(i) indicates if the data point i died or not.
'Martingale' — The martingale residual for a point X(i) is delta(i) – CoxSnell(i), where delta(i) indicates if X(i) died, and CoxSnell(i) is the Cox-Snell residual at i. The martingale residual can be viewed as the difference between the true number of deaths for X(i) minus the expected number of deaths based on the model.
'Schoenfeld' — The Schoenfeld residuals are defined as: scres(i,j) = X(i,j) – M(Beta,i,j), where X(i,j) is the jth element of observation i, and M(Beta,i,j) is the expected value of X(i,j), given the number of living observations left at time i. The Schoenfeld residuals can be viewed as the difference between a true dead observation at time i and how the model expects a dead observation to look at time i, given the remaining living observations. The residuals are calculated for each covariate, so they have as many columns as the number of learned parameters. The residuals are valid only for times and observations at which there were deaths. For any censored observations, the corresponding residual is NaN.
'ScaledSchoenfeld' — The scaled Schoenfeld residuals are the Schoenfeld residuals scaled by the variance of the learned coefficients. Like the Schoenfeld residuals, the scaled residuals are not defined for observations and times at which there were no deaths; a residual at such a point is NaN.
'Score' — The score residuals are defined as: scores(i,t) = integral_{0}^{t}(X(i,u) – Xbar(u)) dScres(i,u), where Schres(i) is the Schoenfeld residual at observation i, and Xbar is the mean of the observations still alive at time u. The residuals are calculated for each covariate, so they have as many columns as the number of learned parameters.
'ScaledScore' — The scaled score residuals are the score residuals scaled by the covariance of the fitted coefficients.

Residuals has the same number of rows as the training data.

Data Types: table

`ResponseName` — Response variable name
character vector

Response variable name, specified as a character vector. For models where the response value is in a table, the response variable name is the name of the relevant table column. Otherwise, ResponseName is 'T'.

Data Types: char

`StandardError` — Standard errors of coefficient estimates
real vector

Standard errors of coefficient estimates, specified as a real vector. StandardError is the square root of the diagonal of the CoefficientCovariance matrix.

Data Types: double

`Stratification` — Array of unique combinations of input stratification
numeric array | string array | cell array of strings | categorical array | cell array

Array of unique combinations of input stratification during training, specified as one of the following data types.

Numeric array — All stratification variables are numeric.
String array — All variables are strings.
Cell array of strings — All variables are cell strings.
Categorical array — All variables are categorical.
Cell array — Variables are mixed types.

Given some data X and T, the following table shows examples of what Stratification contains.

Input Data	Example	Resulting Stratification
Double	`mdl = fitcox(X,T,'Stratification',[1 2; 1 2; 2 2; 2 2]);`	`[1 2; 2 2]`
String	`mdl = fitcox(X,T,'Stratification',["cat";"dog";"dog";"bird"]);`	`["cat"; "dog"; "bird"]`
Cell string	`mdl = fitcox(X,T,'Stratification',{'cat';'dog';'dog';'bird'});`	`{'cat';'dog';'bird'}`
Categorical	`mdl = fitcox(X,T,'Stratification',categorical([1;2;3;4]));`	`categorical([1;2;3;4])`
Mixed types	`data = table(X,T,[1;2;1;3],{'cat';'cat';'dog';'dog'},'VariableNames',{'X','T','S1','S2');` `mdl = fitcox(data,'T','Stratification',{'S1','S2'});`	`{1 'cat'; 2 'cat'; 1 'dog'; 3 'dog'}`

Data Types: double | char | string | cell | categorical

`VariableInfo` — Information about fitting data
table

Information about fitting data, specified as a table with four columns:

Class — The class of the predictor.
Range — The minimum and maximum of the predictor if it is not categorical, or the list of all the categories if the predictor is categorical.
InModel — A logical indicating if the predictor is used in the model. The response variable is not in the model. Predictor variables used for training are in the model.
IsCategorical — A logical indicating if the predictor was treated as categorical during training.

If the model has no categorical predictors, and no formula was used to fit the model, the number of rows of VariableInfo is the number of model predictors. Otherwise, the number of rows is the same as the number of elements in PredictorNames.

Data Types: table

Object Functions

`coefci`	Confidence interval for Cox proportional hazards model coefficients
`discardResiduals`	Remove residuals from Cox model
`hazardratio`	Estimate Cox model hazard relative to baseline
`linhyptest`	Linear hypothesis tests on Cox model coefficients
`plotSurvival`	Plot survival function of Cox proportional hazards model
`survival`	Calculate survival of Cox proportional hazards model

Examples

collapse all

Estimate Cox Proportional Hazard Regression

Open Live Script

Weibull random variables with the same shape parameter have proportional hazard rates; see Weibull Distribution. The hazard rate with scale parameter $a$ and shape parameter $b$ at time $t$ is

$\frac{b}{a^{b}} t^{b - 1}$ .

Generate pseudorandom samples from the Weibull distribution with scale parameters 1, 5, and 1/3, and with the same shape parameter B.

rng default % For reproducibility
B = 2;
A = ones(100,1);
data1 = wblrnd(A,B);
A2 = 5*A;
data2 = wblrnd(A2,B);
A3 = A/3;
data3 = wblrnd(A3,B);

Create a table of data. The predictors are the three variable types, 1, 2, or 3.

predictors = categorical([A;2*A;3*A]);
data = table(predictors,[data1;data2;data3],'VariableNames',["Predictors" "Times"]);

Fit a Cox regression to the data.

mdl = fitcox(data,"Times")

mdl = 
Cox Proportional Hazards regression model

                     Beta        SE        zStat       pValue  
                    _______    _______    _______    __________

    Predictors_2    -3.5834    0.33187    -10.798    3.5299e-27
    Predictors_3     2.1668    0.20802     10.416    2.0899e-25


Log-likelihood: -1197.917

rates = exp(mdl.Coefficients.Beta)

rates = 2×1

    0.0278
    8.7301

Fit Cox Proportional Hazards Model to Lifetime Data

Open Live Script

Perform a Cox proportional hazards regression on the lightbulb data set, which contains simulated lifetimes of light bulbs. The first column of the light bulb data contains the lifetime (in hours) of two different types of bulbs. The second column contains a binary variable indicating whether the bulb is fluorescent or incandescent; 0 indicates the bulb is fluorescent, and 1 indicates it is incandescent. The third column contains the censoring information, where 0 indicates the bulb was observed until failure, and 1 indicates the observation was censored.

Load the lightbulb data set.

load lightbulb

Fit a Cox proportional hazards model for the lifetime of the light bulbs, accounting for censoring. The predictor variable is the type of bulb, which is categorical.

coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    Censoring=lightbulb(:,3),CategoricalPredictors="all")

coxMdl = 
Cox Proportional Hazards regression model

             Beta       SE      zStat       pValue  
            ______    ______    ______    __________

    X1_1    4.7262    1.0372    4.5568    5.1936e-06


Log-likelihood: -212.638

Find the hazard rate of incandescent bulbs compared to fluorescent bulbs by evaluating $\exp (B e t a)$ .

hr = exp(coxMdl.Coefficients.Beta)

hr = 
112.8646

The estimate of the hazard ratio is $e^{B e t a}$ = 112.8646, which means that the estimated hazard for the incandescent bulbs is 112.86 times the hazard for the fluorescent bulbs. The small value of coxMdl.Coefficients.pValue indicates there is a negligible chance that the two types of light bulbs have identical hazard rates, which would mean Beta = 0.

References

[1] Grambsch, Patricia M., and Terry M. Therneau. Proportional Hazards Tests and Diagnostics Based on Weighted Residuals. Biometrika, vol. 81, no. 3, 1994, pp. 515–526. JSTOR, https://www.jstor.org/stable/2337123.

Version History

Introduced in R2021a

CoxModel

Description

Creation

Properties

`Baseline` — Baseline hazard
`mean(X)` (default) | real scalar

`CoefficientCovariance` — Covariance matrix for coefficient estimates
square matrix

`Coefficients` — Coefficients and related statistics
table

`Formula` — Representation of the model used in fit
formula in Wilkinson notation

`Hazard` — Estimated baseline cumulative hazard
matrix double

`LikelihoodRatioTestPValue` — P-value indicating model is significant relative to null model
real scalar

`LogLikelihood` — Log of likelihood function at coefficient estimates
real scalar

`NumPredictors` — Number of predictors
positive integer

`PredictorNames` — Names of predictors
cell array of character vectors

`ProportionalHazardsPValue` — P-value indicating covariates satisfies the proportional hazards assumption
real vector

`ProportionalHazardsPValueGlobal` — P-value indicating model satisfies proportional hazards assumption
real scalar

`Residuals` — Residuals of various types
table

`ResponseName` — Response variable name
character vector

`StandardError` — Standard errors of coefficient estimates
real vector

`Stratification` — Array of unique combinations of input stratification
numeric array | string array | cell array of strings | categorical array | cell array

`VariableInfo` — Information about fitting data
table

Object Functions

Examples

Estimate Cox Proportional Hazard Regression

Fit Cox Proportional Hazards Model to Lifetime Data

References

Version History

See Also

Topics

CoxModel

Description

Creation

Properties

Baseline — Baseline hazard mean(X) (default) | real scalar

CoefficientCovariance — Covariance matrix for coefficient estimates square matrix

Coefficients — Coefficients and related statistics table

Formula — Representation of the model used in fit formula in Wilkinson notation

Hazard — Estimated baseline cumulative hazard matrix double

LikelihoodRatioTestPValue — P-value indicating model is significant relative to null model real scalar

LogLikelihood — Log of likelihood function at coefficient estimates real scalar

NumPredictors — Number of predictors positive integer

PredictorNames — Names of predictors cell array of character vectors

ProportionalHazardsPValue — P-value indicating covariates satisfies the proportional hazards assumption real vector

ProportionalHazardsPValueGlobal — P-value indicating model satisfies proportional hazards assumption real scalar

Residuals — Residuals of various types table

ResponseName — Response variable name character vector

StandardError — Standard errors of coefficient estimates real vector

Stratification — Array of unique combinations of input stratification numeric array | string array | cell array of strings | categorical array | cell array

VariableInfo — Information about fitting data table

Object Functions

Examples

Estimate Cox Proportional Hazard Regression

Fit Cox Proportional Hazards Model to Lifetime Data

References

Version History

See Also

Topics

`Baseline` — Baseline hazard
`mean(X)` (default) | real scalar

`CoefficientCovariance` — Covariance matrix for coefficient estimates
square matrix

`Coefficients` — Coefficients and related statistics
table

`Formula` — Representation of the model used in fit
formula in Wilkinson notation

`Hazard` — Estimated baseline cumulative hazard
matrix double

`LikelihoodRatioTestPValue` — P-value indicating model is significant relative to null model
real scalar

`LogLikelihood` — Log of likelihood function at coefficient estimates
real scalar

`NumPredictors` — Number of predictors
positive integer

`PredictorNames` — Names of predictors
cell array of character vectors

`ProportionalHazardsPValue` — P-value indicating covariates satisfies the proportional hazards assumption
real vector

`ProportionalHazardsPValueGlobal` — P-value indicating model satisfies proportional hazards assumption
real scalar

`Residuals` — Residuals of various types
table

`ResponseName` — Response variable name
character vector

`StandardError` — Standard errors of coefficient estimates
real vector

`Stratification` — Array of unique combinations of input stratification
numeric array | string array | cell array of strings | categorical array | cell array

`VariableInfo` — Information about fitting data
table