fitrepanel
Syntax
Description
EstMdl = fitrepanel(X,Y) returns the random effects panel data regression
model EstMdl from fitting the model to the input panel data in wide
format. X is a
T-by-n-by-p array of predictor
data and Y is a T-by-n matrix of
response data, where T is the greatest number of sampling time points
among subjects, n is the number of sampled subjects, and
p is the number of predictor variables. A data set in wide format must
be organized as follows:
Rows correspond to time points in the sample. In other words, row t contains all p measurements for all n subjects at time t.
Columns correspond to sampled subjects. In other words, column c contains all p measurements over all T time points of subject c.
For X, pages correspond to predictor variables. In other words, page k contains measurements of predictor k for all n subjects and T time points in the sample. None of the predictors can represent an intercept. Among subjects, sampled time points correspond (fitrepanel assumes all subjects are measured simultaneously).
EstMdl is a panel data regression model PanelModel.
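The wide-format layout described above can be sketched with a toy data set. This is a minimal illustration using only base MATLAB; the dimensions and numeric values are made up, and the variable x32 is a hypothetical name used only here.

```matlab
% Toy dimensions (hypothetical values chosen only for illustration)
T = 3;  % number of time points
n = 2;  % number of subjects
p = 2;  % number of predictors

% Page k holds predictor k; rows are time points, columns are subjects
X = zeros(T,n,p);
X(:,:,1) = [10 20; 11 21; 12 22];        % predictor 1
X(:,:,2) = [0.5 0.7; 0.6 0.8; 0.7 0.9];  % predictor 2

% Responses: one column per subject, one row per time point
Y = [1.0 2.0; 1.1 2.1; 1.2 2.2];

% All p predictor measurements of subject 2 at time 3, as a 1-by-p row
x32 = squeeze(X(3,2,:))';
```

Arrays with this shape are what the X and Y inputs of the wide-format syntax expect.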
EstMdl = fitrepanel(X,Y,groups) fits a random effects panel data regression model to the panel data in long
format. X is an
m-by-p matrix of predictor data and
Y is an m-by-1 vector of response data, where
m is the number of observations (for example, m = Tn
for a balanced panel data set). Each row is an observation (all
measurements) associated with a particular subject at a particular time, and each column is
a variable. The groups input specifies to which subject the observation
belongs. For a subject, larger row indices indicate measurements taken later in the
sample.
EstMdl = fitrepanel(Tbl,PredictorVariables=predictorVariables,GroupVariable=groupVariable) fits a random effects panel data regression model to the predictor, response, and
subject-assignment data in the table or timetable Tbl. Panel data in a table is in
long format. The Tbl input argument has
m rows; each row is an observation. The
predictorVariables input specifies which table variables are
predictor variables. groupVariable specifies to which subject the
measurements in the rows of the data belong. The last table variable is the response
variable.
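Because larger row indices within a subject must correspond to later measurements, a long-format table whose rows arrive in arbitrary order needs sorting before fitting. The sketch below shows one way to do that with base MATLAB's sortrows. The table and its variable names (Year, Group, x1, y) are hypothetical, and sorting by subject first is a convenience rather than a stated requirement.

```matlab
% Hypothetical long-format data with rows in arbitrary order
Year  = [2008; 2006; 2007; 2007; 2006];
Group = ["B"; "A"; "B"; "A"; "B"];
x1    = [5; 1; 4; 2; 3];
y     = [0.5; 0.1; 0.4; 0.2; 0.3];

% The response is placed last, matching the default response selection
Tbl = table(Year,Group,x1,y);

% Sort by subject, then by time, so that within each subject
% later rows hold later measurements
TblSorted = sortrows(Tbl,["Group" "Year"]);
```

After sorting, the rows of each subject are contiguous and in increasing time order.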
EstMdl = fitrepanel(___,Name=Value) uses additional options specified by name-value arguments and any input-argument combination
in the previous syntaxes. For example,
fitrepanel(Tbl,PredictorVariables=predictors,GroupVariable="Country",ResponseVariable="LogGDP",FitEffects=false,Method="ssm")
specifies that the table variable LogGDP contains the response data,
the table variable Country contains the subject identifiers, and the
arbitrary string vector predictors contains the predictor variable names
in the table. This syntax skips fitting the unobserved effects and estimates the parameters
by using maximum likelihood in the state-space model framework.
Examples
Fit a random effects panel data regression model to data using default options. The data is in wide format.
Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.
load Data_SimulatedBalancedPanel
For details on the data set, enter Description at the command line.
The variable Data is a 3-D numeric array containing the predictor and response variables. Each row is a time point in the sampling period, each column is a subject in the sample, and each page is a variable. The final variable in Data is the response variable (log wages series), while all other variables are predictors.
Create separate variables for the predictor and response data.
X = Data(:,:,1:(end-1)); Y = Data(:,:,end);
X is a 15-by-1000-by-11 numeric array of predictor data and Y is a 15-by-1000 numeric matrix. For example, X(10,501,3) is the education experience of subject 501 in 2015.
Create a binary numeric variable for whether the subject is female (coded as 1), by using predictor 2, and a binary numeric variable for whether the subject is married (coded as 1), by using predictor 9.
X(:,:,2) = double(X(:,:,2) == 1); X(:,:,9) = double(X(:,:,9) == 1);
Assume that the subject effect (heterogeneity) is not associated with the predictor variables. Fit a random effects panel data regression model to the data. Use default options.
EstMdl = fitrepanel(X,Y);
Panel data information:
Number of cross-sectional units (N): 1000
Number of periods (T): 15
Number of observations: 15000
Method of estimation: random effects (GLS)
| Estimator SE tStat pValue
-----------------------------------------------------------
x1 | 0.0485 0.0003 154.4327 0
x2 | -0.3381 0.0324 -10.4265 0.0000
x3 | 0.1003 0.0036 28.0246 0.0000
x4 | -0.1370 0.0389 -3.5231 0.0004
x5 | -0.0620 0.0047 -13.1530 0.0000
x6 | 0.0087 0.0053 1.6637 0.0962
x7 | -0.0390 0.0112 -3.4902 0.0005
x8 | -0.0191 0.0071 -2.6853 0.0072
x9 | -0.0516 0.0107 -4.8406 0.0000
x10 | 0.0741 0.0051 14.6575 0.0000
x11 | 0.0016 0.0003 5.7319 0.0000
DisturbanceVariance | 0.0302
EffectVariance | 0.0947
fitrepanel displays an estimation summary to the command line. Row xj contains, for predictor j, the coefficient estimate, standard error, and t-statistic for a two-tailed test that the coefficient is 0, with its p-value. At the 5% significance level, all predictor variables are significant except for x6.
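As a rough check on how the columns of the summary relate, the following sketch recomputes the test statistic and two-tailed p-value for x6 from the displayed values. It assumes a standard normal reference distribution, which is not stated in this page; with 15000 observations a t distribution would give nearly the same result. The variable names are illustrative.

```matlab
% Displayed values for x6 (rounded to four decimals in the summary)
est = 0.0087;         % coefficient estimate
se  = 0.0053;         % standard error
tDisplayed = 1.6637;  % reported test statistic

% Ratio of the rounded values; differs slightly from the reported
% statistic because the display rounds the estimate and standard error
tFromRounded = est/se;

% Two-tailed p-value under an assumed standard normal reference:
% 2*(1 - Phi(|t|)) equals erfc(|t|/sqrt(2))
pValue = erfc(abs(tDisplayed)/sqrt(2));
```

The computed p-value agrees with the displayed 0.0962 to the precision shown, which exceeds 0.05.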
Display the fitted model.
EstMdl
EstMdl =
PanelModel with properties:
Coefficients: [11×1 double]
CoefficientCovariance: [11×11 double]
DisturbanceVariance: 0.0302
EffectVariance: 0.0947
Effects: [4.5484 4.3304 4.6605 4.9304 4.9176 4.5704 3.8183 4.7566 4.4485 4.1725 4.6417 4.5086 4.7519 4.4794 4.2617 4.7618 4.5002 4.3778 4.1206 3.8693 3.9290 4.5513 4.5366 4.2080 4.6091 4.2711 4.6690 3.7448 3.9863 … ] (1×1000 double)
LogLikelihood: 4.9624e+03
Summary: [13×4 table]
Type: "RandomEffects"
EstMdl is a PanelModel object. You can access its properties using dot notation.
Plot the empirical distribution of the heterogeneity.
effects = EstMdl.Effects;
histogram(effects,Normalization="probability")
Fit a random effects panel data regression model to data using default options. The data is in long format.
Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.
Load and Extract Data
load Data_SimulatedBalancedPanel
For details on the data set, enter Description at the command line.
The variable Data is a 3-D numeric array containing the predictor and response variables. Each row is a time point in the sampling period, each column is a subject in the sample, and each page is a variable. The final variable in Data is the response variable (log wages series), while all other variables are predictors. This data format is wide.
Create separate variables for the predictor and response data.
X = Data(:,:,1:(end-1)); Y = Data(:,:,end); [T,n,p] = size(X)
T = 15
n = 1000
p = 11
X is a 15-by-1000-by-11 numeric array of predictor data and Y is a 15-by-1000 numeric matrix. For example, X(10,501,3) is the education experience of subject 501 in 2015.
Convert Data to Long Format
Data in long format must have the following characteristics:
The response data is a Tn-by-1 vector, where T is the number of periods in the time base and n is the number of subjects. In this example, the long-format response data is a 15000-by-1 vector.
The predictor data is a Tn-by-p matrix, where p is the number of predictors. In this example, the long-format predictor data is a 15000-by-11 matrix.
The software must be able to identify to which subject each observation belongs by a Tn-by-1 vector of subject IDs.
For each subject, the software assumes that observations in higher rows were sampled later.
Convert the response data to long format by stacking the columns of Y using linear indexing with a single colon.
YLong = Y(:); size(YLong)
ans = 1×2
15000 1
For selected subjects, verify that the responses are arranged by blocks of subjects, increasing by sampling time within each block. To check a different subject, change the value of j.
j = 3; % Subject index
YSubj = Y(1:T,j);
YLongSubj = YLong((T*(j-1)+1):(T*j));
sum(YSubj - YLongSubj)
ans = 0
Convert the predictor data to long format by stacking the columns of X and setting its pages to columns using reshape.
XLong = reshape(X,T*n,11); size(XLong)
ans = 1×2
15000 11
XLong is arranged such that all subject-specified measurements are blocked together and stacked, and within-subject blocks of observations are arranged in increasing order by sampling time.
For selected subjects, verify that the predictor data are arranged by blocks of subjects, increasing by sampling time within each block. To check a different subject, change the value of j.
j = 6; % Subject index
XSubj = squeeze(X(1:T,j,:));
XLongSubj = XLong((T*(j-1)+1):(T*j),:);
sum(sum(XSubj - XLongSubj))
ans = 0
Observations are arranged by blocks of subjects. Create a numeric vector, which identifies each subject, by repeating each integer in the interval [1,n] T times, and then stacking the results.
Groups = repmat(1:n,T,1); Groups = Groups(:);
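For balanced data stacked in subject blocks as above, the conversion is reversible with reshape. The following sketch uses small made-up arrays (not the example data set) to show the round trip and the index mapping between the two formats. The names XWide and YWide are illustrative.

```matlab
% Small made-up balanced panel in wide format
T = 4; n = 3; p = 2;
X = reshape(1:T*n*p,T,n,p);   % T-by-n-by-p predictors
Y = reshape(1:T*n,T,n)/10;    % T-by-n responses

% Wide to long: stack subject columns, turn pages into columns
XLong = reshape(X,T*n,p);
YLong = Y(:);

% Long to wide: valid only for balanced data in subject blocks
XWide = reshape(XLong,T,n,p);
YWide = reshape(YLong,T,n);

% Observation of subject c at time t sits in long-format row (c-1)*T + t
t = 2; c = 3; k = 1;
sameValue = XLong((c-1)*T + t,k) == X(t,c,k);
```

The round trip recovers the original arrays exactly because reshape only reinterprets the column-major storage order.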
Preprocess Data
Create a binary numeric variable for whether the subject is female (coded as 1), by using predictor 2, and a binary numeric variable for whether the subject is married (coded as 1), by using predictor 9.
XLong(:,2) = double(XLong(:,2) == 1); XLong(:,9) = double(XLong(:,9) == 1);
Fit the Model to Data
Assume that the heterogeneity is not associated with the predictor variables. Fit a random effects panel data regression model to the long-format data. Specify the grouping variable. Use default options.
EstMdl = fitrepanel(XLong,YLong,Groups);
Panel data information:
Number of cross-sectional units (N): 1000
Number of periods (T): 15
Number of observations: 15000
Method of estimation: random effects (GLS)
| Estimator SE tStat pValue
-----------------------------------------------------------
x1 | 0.0485 0.0003 154.4327 0
x2 | -0.3381 0.0324 -10.4265 0.0000
x3 | 0.1003 0.0036 28.0246 0.0000
x4 | -0.1370 0.0389 -3.5231 0.0004
x5 | -0.0620 0.0047 -13.1530 0.0000
x6 | 0.0087 0.0053 1.6637 0.0962
x7 | -0.0390 0.0112 -3.4902 0.0005
x8 | -0.0191 0.0071 -2.6853 0.0072
x9 | -0.0516 0.0107 -4.8406 0.0000
x10 | 0.0741 0.0051 14.6575 0.0000
x11 | 0.0016 0.0003 5.7319 0.0000
DisturbanceVariance | 0.0302
EffectVariance | 0.0947
The results are the same as the results from the model fit to data in wide format.
Fit a random effects panel data regression model to data using default options. The data is in a timetable.
Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.
load Data_SimulatedBalancedPanel
For details on the data set, enter Description at the command line.
The variable DataTimeTable is a timetable containing the data. LogWage is the response variable, Group is the subject ID (grouping) variable, and all other variables are predictors. Each row is an observation for a subject at a time point in the sampling period (in other words, this data format is long).
Display the head and size of the timetable of data.
head(DataTimeTable)
Time WorkExperience Gender Education Ethnicity IsBlueCollar IsManufacturing IsSouth IsCity MaritalStatus IsUnion WeeksWorked Group LogWage
____ ______________ ______ _________ _________ ____________ _______________ _______ ______ _____________ _______ ___________ _____ _______
2006 29 female 12 0 1 0 0 0 nevermarried 0 48 1 6.7956
2007 30 female 12 0 1 0 0 0 nevermarried 0 49 1 6.6592
2008 31 female 12 0 1 0 0 0 nevermarried 0 51 1 6.9801
2009 32 female 12 0 0 0 0 0 nevermarried 0 45 1 7.2397
2010 33 female 12 0 0 0 0 0 nevermarried 0 25 1 7.123
2011 34 female 12 0 0 0 0 0 nevermarried 0 42 1 6.9183
2012 35 female 12 0 0 0 0 0 nevermarried 0 48 1 7.1639
2013 36 female 12 0 0 0 0 0 nevermarried 0 49 1 7.0534
size(DataTimeTable)
ans = 1×2
15000 13
Create a new timetable TT containing a binary numeric variable for whether the subject is female, by using Gender, and a binary numeric variable for whether the subject is married, by using MaritalStatus. Then, remove the corresponding variables from TT.
TT = DataTimeTable; TT.IsFemale = double(TT.Gender == "female"); TT = movevars(TT,"IsFemale","Before","Gender"); TT.Gender = []; TT.IsMarried = double(TT.MaritalStatus == "married"); TT = movevars(TT,"IsMarried","Before","MaritalStatus"); TT.MaritalStatus = [];
Fit a random effects panel data regression model of the log wage series (LogWage) to all other variables in the timetable except the subject ID (Group). Specify the predictor and grouping variable names; fitrepanel assumes the final variable is the response variable.
prednames = TT.Properties.VariableNames(1:end-2);
EstMdl = fitrepanel(TT,PredictorVariables=prednames,GroupVariable="Group");
Panel data information:
Number of cross-sectional units (N): 1000
Number of periods (T): 15
Number of observations: 15000
Method of estimation: random effects (GLS)
| Estimator SE tStat pValue
-----------------------------------------------------------
WorkExperience | 0.0485 0.0003 154.4327 0
IsFemale | -0.3381 0.0324 -10.4265 0.0000
Education | 0.1003 0.0036 28.0246 0.0000
Ethnicity | -0.1370 0.0389 -3.5231 0.0004
IsBlueCollar | -0.0620 0.0047 -13.1530 0.0000
IsManufacturing | 0.0087 0.0053 1.6637 0.0962
IsSouth | -0.0390 0.0112 -3.4902 0.0005
IsCity | -0.0191 0.0071 -2.6853 0.0072
IsMarried | -0.0516 0.0107 -4.8406 0.0000
IsUnion | 0.0741 0.0051 14.6575 0.0000
WeeksWorked | 0.0016 0.0003 5.7319 0.0000
DisturbanceVariance | 0.0302
EffectVariance | 0.0947
The results are the same as the results from the model fit to data in wide format.
Estimate a random effects panel data regression model of log wages as a function of a set of predictors by viewing the model as a linear state-space model.
By default, fitrepanel uses GLS to estimate the model. Alternatively, you can specify that fitrepanel view the model as a linear state-space model and apply maximum likelihood to estimate the parameters.
Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.
load Data_SimulatedBalancedPanel
For details on the data set, enter Description at the command line.
Create a new timetable TT containing a binary numeric variable for whether the subject is female, by using Gender, and a binary numeric variable for whether the subject is married, by using MaritalStatus. Then, remove the corresponding variables from TT.
TT = DataTimeTable; TT.IsFemale = double(TT.Gender == "female"); TT = movevars(TT,"IsFemale","Before","Gender"); TT.Gender = []; TT.IsMarried = double(TT.MaritalStatus == "married"); TT = movevars(TT,"IsMarried","Before","MaritalStatus"); TT.MaritalStatus = [];
Fit a random effects panel data regression model of the log wage series (LogWage) to all other variables in the processed timetable TT except the subject ID. Specify the state-space model estimation method.
varnames = TT.Properties.VariableNames; prednames = varnames(~ismember(varnames,["Group" "LogWage"])); EstMdl = fitrepanel(TT,PredictorVariables=prednames,GroupVariable="Group",Method="ssm");
Panel data information:
Number of cross-sectional units (N): 1000
Number of periods (T): 15
Number of observations: 15000
Method of estimation: random effects (SSM)
| Estimator SE tStat pValue
-----------------------------------------------------------
WorkExperience | 0.0485 0.0003 154.4004 0
IsFemale | -0.3381 0.0325 -10.3976 0.0000
Education | 0.1003 0.0036 27.9469 0.0000
Ethnicity | -0.1370 0.0390 -3.5132 0.0004
IsBlueCollar | -0.0620 0.0047 -13.1543 0.0000
IsManufacturing | 0.0088 0.0053 1.6647 0.0960
IsSouth | -0.0389 0.0112 -3.4838 0.0005
IsCity | -0.0191 0.0071 -2.6810 0.0073
IsMarried | -0.0516 0.0107 -4.8464 0.0000
IsUnion | 0.0741 0.0051 14.6588 0.0000
WeeksWorked | 0.0016 0.0003 5.7332 0.0000
DisturbanceVariance | 0.0302 0.0004 84.7569 0
EffectVariance | 0.0953 0.0040 23.7771 0.0000
The results are nearly the same as the results from the model fit using GLS. This similarity occurs because, in this problem, maximum likelihood and GLS are asymptotically equivalent.
Fit a random effects panel data regression model to data; fix the random effects variance to a known value.
Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.
load Data_SimulatedBalancedPanel
For details on the data set, enter Description at the command line.
Create a new timetable TT containing a binary numeric variable for whether the subject is female, by using Gender, and a binary numeric variable for whether the subject is married, by using MaritalStatus. Then, remove the corresponding variables from TT.
TT = DataTimeTable; TT.IsFemale = double(TT.Gender == "female"); TT = movevars(TT,"IsFemale","Before","Gender"); TT.Gender = []; TT.IsMarried = double(TT.MaritalStatus == "married"); TT = movevars(TT,"IsMarried","Before","MaritalStatus"); TT.MaritalStatus = [];
Fit a random effects panel data regression model of the log wage series (LogWage) to all other variables in the timetable except the subject ID (Group). Specify the predictor and grouping variable names. Fix the random effects variance to 0.05, 0.1, 0.5, and then 1. Suppress the estimation display.
prednames = TT.Properties.VariableNames(1:end-2);
sigma2alpha = [0.05 0.1 0.5 1];
m = numel(sigma2alpha);
Coefficients = cell(m,1);         % One coefficient vector per variance setting
DisturbanceVariance = zeros(m,1);
for j = 1:m
    EstMdl = fitrepanel(TT,PredictorVariables=prednames,GroupVariable="Group", ...
        EffectVariance=sigma2alpha(j),Display=false);
    Coefficients{j} = EstMdl.Coefficients;
    DisturbanceVariance(j) = EstMdl.DisturbanceVariance;
end
cell2mat(Coefficients')
ans = 11×4
0.0486 0.0485 0.0484 0.0483
-0.3376 -0.3381 -0.3387 -0.3388
0.1003 0.1004 0.1004 0.1004
-0.1378 -0.1370 -0.1359 -0.1357
-0.0621 -0.0620 -0.0618 -0.0618
0.0082 0.0088 0.0094 0.0096
-0.0451 -0.0385 -0.0294 -0.0278
-0.0231 -0.0189 -0.0148 -0.0143
-0.0413 -0.0523 -0.0653 -0.0673
0.0744 0.0741 0.0740 0.0740
0.0016 0.0016 0.0016 0.0016
DisturbanceVariance
DisturbanceVariance = 4×1
0.0302
0.0302
0.0302
0.0302
The estimates are nearly the same among the effects variance settings, which suggests that, for this data set, the estimators are robust to moderate to extreme levels of the effects variance.
Fit a random effects panel data regression model to data and obtain robust estimates.
Load the simulated, balanced panel data set Data_SimulatedBalancedPanel.mat, which is available when you open this example. The data set contains 12 microeconomic measurements of 1000 randomly selected people taken yearly from 2006 through 2020. The response variable in the model is a series of log wages, while all other variables in the data are predictors.
load Data_SimulatedBalancedPanel
For details on the data set, enter Description at the command line.
Create separate variables for the predictor and response data.
X = Data(:,:,1:(end-1)); [T,n,p] = size(X); Y = Data(:,:,end);
Create a binary numeric variable for whether the subject is female (coded as 1), by using predictor 2, and a binary numeric variable for whether the subject is married (coded as 1), by using predictor 9.
X(:,:,2) = double(X(:,:,2) == 1); X(:,:,9) = double(X(:,:,9) == 1);
Simulate heteroscedasticity in the system by using unmeasured, subject-specific predictor variables $Z$ such that, for each subject $j$, $\lambda_j$ is drawn uniformly from the integers 1 through 50 and $z_{jt} \sim \mathrm{Poisson}(\lambda_j)$. For each subject, simulate $T$ values.
rng(1,"twister")
Z = zeros(T,n);
for j = 1:n
    lambda = randi(50);             % Subject-specific Poisson mean
    Z(:,j) = poissrnd(lambda,T,1);  % T draws for subject j
end
Add the simulated predictor data to the response data with coefficient 2.
YSim = Y + 2*Z;
Assume that the heterogeneity is not associated with the predictor variables. Fit a random effects panel data regression model to the predictor data without Z and the simulated response data. Use default options.
EstMdl = fitrepanel(X,YSim);
Panel data information:
Number of cross-sectional units (N): 1000
Number of periods (T): 15
Number of observations: 15000
Method of estimation: random effects (GLS)
| Estimator SE tStat pValue
----------------------------------------------------------
x1 | 0.0196 0.0190 1.0279 0.3040
x2 | 1.4279 3.0659 0.4657 0.6414
x3 | -0.1095 0.3385 -0.3236 0.7462
x4 | -2.0201 3.6775 -0.5493 0.5828
x5 | 0.3700 0.2763 1.3392 0.1805
x6 | -0.2306 0.3087 -0.7472 0.4550
x7 | 0.9527 0.6997 1.3616 0.1733
x8 | 0.1847 0.4256 0.4341 0.6642
x9 | 0.3463 0.6632 0.5223 0.6015
x10 | -0.0337 0.2965 -0.1137 0.9094
x11 | 0.0070 0.0161 0.4345 0.6640
DisturbanceVariance | 101.9401
EffectVariance | 859.3624
Compute the model residuals (the simulated responses minus the fitted responses), and plot them against the fitted responses. Color the residuals according to subject ID.
betahat = reshape(EstMdl.Coefficients,1,1,p);  % One coefficient per page of X
alphahat = EstMdl.Effects;
Yhat = sum(X.*betahat,3) + alphahat;
Residuals = YSim - Yhat;
figure
plot(Yhat,Residuals,'.')
hold on
yline(0,"--")
hold off
title("Residuals by Subject")
ylabel("Residual")
xlabel("Fitted Value")

The residuals scatter more widely as the fitted values increase. This behavior is indicative of heteroscedasticity. Also, residuals appear clustered by groups.
Refit the model; compute robust covariance estimates.
EstMdlRobust = fitrepanel(X,YSim,RobustCovariance=true);
Panel data information:
Number of cross-sectional units (N): 1000
Number of periods (T): 15
Number of observations: 15000
Method of estimation: random effects (GLS)
| Estimator SE tStat pValue
----------------------------------------------------------
x1 | 0.0196 0.0191 1.0222 0.3067
x2 | 1.4279 2.8975 0.4928 0.6222
x3 | -0.1095 0.3417 -0.3205 0.7486
x4 | -2.0201 3.5807 -0.5642 0.5726
x5 | 0.3700 0.2755 1.3431 0.1792
x6 | -0.2306 0.3215 -0.7174 0.4731
x7 | 0.9527 0.6701 1.4218 0.1551
x8 | 0.1847 0.4344 0.4253 0.6706
x9 | 0.3463 0.6501 0.5327 0.5942
x10 | -0.0337 0.2976 -0.1133 0.9098
x11 | 0.0070 0.0159 0.4414 0.6589
DisturbanceVariance | 101.9401
EffectVariance | 859.3624
The coefficient estimates between the regular and robust runs are the same; the difference between the runs is in the inferences.
Plot a heatmap of the relative difference between the robust and non-robust estimated coefficient covariance matrices.
seriesSim = series(1:p); heatmap(seriesSim,seriesSim,(EstMdlRobust.CoefficientCovariance-EstMdl.CoefficientCovariance)./EstMdl.CoefficientCovariance)

The estimated covariance of the coefficients of Education and WeeksWorked shows the greatest relative difference between the robust and non-robust analyses.
Input Arguments
Predictor data X, specified as an m-by-p numeric matrix or a T-by-n-by-p numeric 3-D array, where m is the total number of observations, p is the number of predictor variables, n is the number of sampled subjects (groups), and T is the largest number of sampling time points among subjects.
When X is a matrix, the data sets are in long
format and the following conditions apply:
You must provide the subject identifiers input groups.
Each row is an observation taken at a particular time from a particular subject. Put differently, row j contains measurements for all predictors at time t for the subject g, where groups(j) is g.
Suppose Xg = X(groups == g,:) identifies the observations of subject g. Thus, row t1 < row t2 implies Xg(t1,:) was observed earlier than Xg(t2,:).
For each sampling time t, fitrepanel assumes all subjects were measured simultaneously.
Column k contains measurements of predictor variable k.
You must provide the response data Y as an m-by-1 vector.
When X is a 3-D array, the data sets are in wide
format and the following conditions apply:
Row t contains all measurements taken at time t. Row t1 < row t2 implies the sampling time t1 < sampling time t2.
Column c contains all measurements of subject c.
Page k contains measurements of predictor variable k.
You must provide the response data Y as a T-by-n matrix.
Do not include a predictor variable entirely composed of ones in
X to represent the model intercept.
NaN values in X indicate missing measurements.
For unbalanced data in wide format, you must insert rows of NaN
values for unmeasured time points among all pages.
For more details, see Panel Data.
Data Types: double
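The NaN padding for unbalanced wide-format data can be sketched as follows. This is a minimal illustration in base MATLAB, assuming a hypothetical period vector that records the time index of each long-format observation. That vector is not a fitrepanel input, and all numeric values are made up.

```matlab
% Hypothetical unbalanced long-format data:
% subject 1 has 3 periods, subject 2 has only the first 2
groups = [1; 1; 1; 2; 2];
period = [1; 2; 3; 1; 2];   % assumed time index of each observation
XLong  = [1 10; 2 20; 3 30; 4 40; 5 50];
YLong  = [0.1; 0.2; 0.3; 0.4; 0.5];

% Map subject identifiers to column positions
ids = unique(groups);
[~,col] = ismember(groups,ids);
T = max(period);
n = numel(ids);
p = size(XLong,2);

% Start from all-NaN arrays so unmeasured time points stay missing
X = NaN(T,n,p);
Y = NaN(T,n);
Y(sub2ind([T n],period,col)) = YLong;
for k = 1:p
    X(sub2ind([T n p],period,col,repmat(k,size(period)))) = XLong(:,k);
end
```

Here row 3 of column 2 remains NaN on every page of X and in Y, marking the time point at which subject 2 was not measured.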
Response data Y, specified as an m-by-1 numeric vector or a T-by-n numeric matrix.
When Y is a vector, the data sets are in long
format and the following conditions apply:
You must provide the subject identifiers input groups.
Row j contains the response at time t for the subject g, where input groups(j) is g.
Suppose Yg = Y(groups == g) identifies the responses of subject g. Thus, fitrepanel assumes that, if Yg(j) was observed at time t, Yg(j + 1) was observed at time t + f, where f is the sampling frequency.
For each sampling time t, fitrepanel assumes all subjects were measured simultaneously.
You must provide the predictor data X as an m-by-p matrix.
When Y is a matrix, the data sets are in wide
format and the following statements apply:
Row t contains all measurements taken at time t. Row t1 < row t2 implies the sampling time t1 < sampling time t2.
Column c contains all measurements of subject c.
You must provide the predictor data X as a T-by-n-by-p 3-D array.
NaN values in Y indicate missing responses.
For unbalanced data in wide format, you must insert rows of NaN
values for unmeasured time points.
For more details, see Panel Data.
Data Types: double
Subject (group) identifiers for unobserved effects, specified as an
m-by-1 vector. The unique values in groups
identify sampled subjects.
When you specify data in long format, you must specify
groups.
NaN values in groups indicate missing group
identifiers for the corresponding observations. fitrepanel removes
entire observations from the data when it cannot assign them to a group. Such data
removal can cause unbalanced panel data.
For more details, see Panel Data.
Data Types: double | categorical | cell | char | string
Panel data in long format, to which
fitrepanel fits the model, specified as a table or timetable
with numvars variables and m rows.
When you specify Tbl, the following conditions apply:
Each row is an observation taken at a particular time from a particular subject. Put differently, row j contains measurements for all variables at time t for the subject g.
Suppose Tblg = Tbl(Tbl.groupVariable == g,:) identifies the observations of subject g. Thus, fitrepanel assumes that, if Tblg(j,:) was observed at time t, Tblg(j + 1,:) was observed at time t + f.
For each sampling time t, fitrepanel assumes all subjects were measured simultaneously.
Specify the predictor variables in the model X by setting PredictorVariables=predictorVariables. Each selected predictor variable must be a numeric vector.
Specify the subject (group) identifier variable by setting GroupVariable=groupVariable. The selected subject identifier variable can be a numeric, categorical, or text vector.
By default, the last variable is the response variable Y, but you can specify a different variable by setting ResponseVariable=responseVariable. The selected response variable must be a numeric vector.
Do not include a predictor variable entirely composed of ones in
Tbl to represent the model intercept.
NaN values in Tbl indicate missing
measurements. NaN values in groups indicate
missing group identifiers for the corresponding observations.
fitrepanel removes entire observations from the data when it
cannot assign them to a group. Such data removal can cause unbalanced panel data.
For more details, see Panel Data.
Predictor variables X to select from Tbl,
which contain predictor data, specified as one of the following data types:
String vector or cell vector of character vectors containing p variable names in Tbl.Properties.VariableNames
A length p vector of unique indices (positive integers) of variables to select from Tbl.Properties.VariableNames
A logical vector of length numvars = width(Tbl), where PredictorVariables(j) = true selects variable j from Tbl.Properties.VariableNames, and sum(PredictorVariables) is p
Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]
Example: PredictorVariables=[true false true false] or
PredictorVariables=[1 3] selects the first and third table variables
to supply the predictor data.
Data Types: double | logical | char | cell | string
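The three selection forms are interchangeable ways of pointing at the same table variables. The sketch below illustrates the equivalence with ordinary table indexing in base MATLAB on a small made-up table. It does not call fitrepanel, and the variable names are hypothetical.

```matlab
% Small made-up table with four variables
Tbl = table([1;2;3],[4;5;6],[7;8;9],[0.1;0.2;0.3], ...
    VariableNames=["M1SL" "Group" "TB3MS" "Y"]);

% Three ways to designate the first and third variables
byName  = ["M1SL" "TB3MS"];          % variable names
byIndex = [1 3];                     % positions in VariableNames
byMask  = [true false true false];   % logical vector of length width(Tbl)

% Each form selects the same columns of the table
selName  = Tbl(:,byName);
selIndex = Tbl(:,byIndex);
selMask  = Tbl(:,byMask);
```

Names are usually the safest choice because they remain valid if variables are later reordered.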
Subject (group) variable to select from Tbl, which contains
subject identifier data for the unobserved effects, specified as one of the following
data types:
String scalar or character vector containing the variable name to select from Tbl.Properties.VariableNames
Variable index (positive integer) to select from Tbl.Properties.VariableNames
A logical vector, where GroupVariable(j) = true selects variable j from Tbl.Properties.VariableNames
Example: GroupVariable="Country"
Example: GroupVariable=[false false true false] or
GroupVariable=3 selects the third table variable as the subject
variable.
Data Types: double | logical | char | cell | string
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: fitrepanel(Tbl,PredictorVariables=predictors,GroupVariable="Subjects",ResponseVariable="Response",FitEffects=false,Method="ssm")
specifies that the table variable "Response" contains the response
data, the table variable "Subjects" contains the subject identifiers, and
the arbitrary string vector predictors contains the predictor variable
names in the table. This syntax skips fitting the unobserved effects and estimates the
parameters using maximum likelihood in the state-space model framework.
Disturbance variance $\sigma_\varepsilon^2$, specified as a nonnegative numeric scalar.
When you specify DisturbanceVariance=sigma2eps,
fitrepanel fixes the value of the disturbance variance to
sigma2eps during estimation. When
DisturbanceVariance=NaN, the default,
fitrepanel estimates the disturbance variance with all other
estimable parameters.
Example: DisturbanceVariance=5
Data Types: double
Variance of unobserved, subject-specific effects $\sigma_\alpha^2$, specified as a nonnegative numeric scalar.
When you specify EffectVariance=sigma2alpha,
fitrepanel fixes the value of the effects variance to
sigma2alpha during estimation. When
EffectVariance=NaN, the default,
fitrepanel estimates the effects variance with all other
estimable parameters.
Example: EffectVariance=1
Data Types: double
Parameter estimation method, specified as a value in this table:
| Value | Description |
|---|---|
| "gls" | Generalized least squares |
| "ssm" | Maximum likelihood estimation by linear state-space model formulation of model |
For more details, see Estimation Method Descriptions.
Example: Method="ssm"
Data Types: char | string
Robust covariance estimation flag, specified as a value in this table:
| Value | Description |
|---|---|
| false | fitrepanel does not compute cluster-robust covariance estimates. |
| true | fitrepanel computes cluster-robust covariance estimates. |
Coefficient estimates between the non-robust-covariance and robust-covariance estimation methods are equal. Coefficient covariance estimates, and therefore inferences, between the non-robust and robust-covariance estimation methods are not necessarily equal.
Tip
Although you should set RobustCovariance=true when
residuals show evidence of heteroscedasticity or serial correlation, [1] suggests this setting whenever it is
feasible.
Example: RobustCovariance=true
Data Types: logical
Unobserved effects ɑ estimation flag, specified as a value in this table:
| Value | Description |
|---|---|
| false | fitrepanel does not estimate ɑ. |
| true | fitrepanel estimates ɑ and reports its estimates. |
To fit the model using fewer computational resources and obtain only coefficient
and covariance estimates, and inferences, set
FitEffects=false.
For details on how fitrepanel estimates the random effects,
see Latent Effects Estimation.
Example: FitEffects=false
Data Types: logical
Predictor variable names for displays when you specify X,
specified as a string vector or cell vector of character vectors.
VarNames must contain NumPredictors
elements. VarNames(j) is the name of
variable j in the predictor data
X.
The default is ["x1" "x2" ... "xp"], where p is the number of predictor variables.
Example: VarNames=["UnemploymentRate"; "CPI"]
Data Types: string | cell | char
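A brief sketch of naming predictors for wide-format input follows. The data are synthetic and the coefficient values are arbitrary assumptions; the names reuse those from the example above. The fitrepanel call needs a MATLAB release that provides the function.

```matlab
% Synthetic wide-format panel: T times, n subjects, p predictors.
rng(3);
T = 8;  n = 5;  p = 2;
X = randn(T, n, p);                   % T-by-n-by-p predictor array
a = randn(1, n);                      % one simulated effect per subject (column)
Y = a + 0.3*X(:,:,1) + 1.5*X(:,:,2) + 0.5*randn(T, n);   % T-by-n responses

% One name per page of X, so VarNames has p elements.
names = ["UnemploymentRate" "CPI"];
EstMdl = fitrepanel(X, Y, VarNames=names);
```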
Estimation display flag, specified as a value in this table.
| Value | Description |
|---|---|
| false | fitrepanel does not display estimation results to the command line. |
| true | fitrepanel displays estimation results to the command line. |
Example: Display=false
Data Types: logical
Response variable y to select from Tbl
containing the response data, specified as one of the following data types:
String scalar or character vector containing a variable name in Tbl.Properties.VariableNames
Variable index (integer) to select from Tbl.Properties.VariableNames
A length numvars logical vector, where ResponseVariable(j) = true selects variable j from Tbl.Properties.VariableNames, and sum(ResponseVariable) is 1
Example: ResponseVariable="Wages"
Example: ResponseVariable=[false false true false] or
ResponseVariable=3 selects the third table variable as the
response variable.
Data Types: double | logical | char | cell | string
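The three selector forms can be checked against each other in base MATLAB before they are passed to fitrepanel. This sketch uses a hypothetical four-variable table in which "Wages" happens to be the third variable; the table contents and the other variable names are invented for the illustration.

```matlab
% Hypothetical table with four variables; "Wages" is the third.
Tbl = table([1;1;2;2], [0.2;0.4;0.1;0.3], [10;11;9;12], [5;6;7;8], ...
    VariableNames=["Subject" "Hours" "Wages" "Tenure"]);
names = string(Tbl.Properties.VariableNames);

byName  = "Wages";                    % variable name
byIndex = 3;                          % variable index
byMask  = names == "Wages";           % logical vector with exactly one true

% All three selectors identify the same table variable.
assert(names(byIndex) == byName)
assert(sum(byMask) == 1 && names(byMask) == byName)
```

Any one of byName, byIndex, or byMask could then be supplied as the ResponseVariable value.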
Output Arguments
Estimated panel model, returned as a PanelModel
object. EstMdl contains properties that store the estimation
results from fitting the random effects panel data regression model to the data. You can
access its properties by using dot notation.
More About
A panel data set contains measurements of
n subjects measured at most T times over a sampling
time frame. Panel data is a type of longitudinal data resulting from an observational study,
rather than a controlled experiment. This distinction impacts regression procedures used to
analyze these types of data sets. (To analyze longitudinal data, see fitlme, fitlmematrix, and fitrm.)
You can format panel data sets in two ways: wide and long. In what follows, an observation is all measurements (predictors xi, i = 1,…,p and response y data) of a subject (gk, k = 1,...,n) at a particular time (tj, j = 1,…,T).
In wide format, the predictor data set X
(input X) is a
T-by-n-by-p 3-D numeric array,
where rows correspond to contemporaneous sampling times in increasing order by row, columns
correspond to individual subjects, and pages correspond to predictor variables. The response
data set Y (input Y) in wide format is a
T-by-n matrix. This figure illustrates the predictor
and response data in wide format.

For a data set in this format, you can infer the sampling time and subject of a measurement from
its row and column, respectively. An observation of subject
gk at time
tj is the set
{X(j,k,:), Y(j,k)}. For
example, in the figure, the boxed values comprise the observation of subject
g2 at time
t1.
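As a small illustration of wide-format indexing, the following sketch builds a tiny data set with made-up values (T = 3, n = 2, p = 2) and extracts the observation of subject g2 at time t1.

```matlab
% Tiny wide-format data set with invented values.
T = 3;  n = 2;  p = 2;
X = zeros(T, n, p);
X(:,:,1) = [1 4; 2 5; 3 6];           % predictor 1: rows are times, columns are subjects
X(:,:,2) = [10 40; 20 50; 30 60];     % predictor 2
Y = [100 400; 200 500; 300 600];      % T-by-n responses

j = 1;  k = 2;                        % time t1, subject g2
xObs = squeeze(X(j,k,:))';            % 1-by-p predictor measurements: [4 40]
yObs = Y(j,k);                        % response: 400
```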
In long format, the predictor data set X is an m-by-p matrix, where m is the total sample size. Each row contains all predictor measurements of a particular subject at a particular time, and each column is a predictor variable. The response data set y is an m-by-1 vector, where each row is the response of the corresponding subject at the corresponding time. For data in this format, you cannot infer from position alone the subject and sampling time to which each observation belongs. A variable of subject identifiers (group variable), an m-by-1 vector, is required. For each subject, observations are recorded in increasing time order by row. This figure illustrates the predictor and response data in long format.

Row j of the subject identifier vector
sj is in the set
{g1,g2,…,gn}.
This figure illustrates the variables for all observations of subject
g2 (s =
g2, coded as g2). In the
figure, tj =
t(j). Because only those observations belonging to
subject g2 are displayed, the row indices are not
clear, but the sampling times are clear and, therefore, labeled.

An observation of subject gk at time
tj is the set
{Xgk(j,:), Ygk(j)}, where Xgk = X(groups == gk,:) and Ygk =
Y(groups == gk). For example,
in the figure, the boxed values comprise the observation of subject
g2 at time
t1.
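The following sketch converts a tiny wide-format data set (made-up values) to long format by stacking subjects, and then recovers the observation of subject g2 at time t1 with the groups == gk indexing described above. Coding the subjects as the integers 1 and 2 is an assumption made for the illustration.

```matlab
% Tiny wide-format data set with invented values.
T = 3;  n = 2;  p = 2;
X = cat(3, [1 4; 2 5; 3 6], [10 40; 20 50; 30 60]);   % T-by-n-by-p
Y = [100 400; 200 500; 300 600];                      % T-by-n

% Stack subjects one after another; time increases within each subject.
Xlong  = reshape(X, T*n, p);          % m-by-p, m = T*n
Ylong  = Y(:);                        % m-by-1
groups = repelem((1:n)', T);          % m-by-1 subject identifiers

k = 2;  j = 1;                        % subject g2, time t1
Xgk = Xlong(groups == k, :);          % all predictor rows of subject g2
Ygk = Ylong(groups == k);             % all responses of subject g2
obs = [Xgk(j,:) Ygk(j)];              % [4 40 400]
```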
Panel data functionality accepts data in matrices, tables, and timetables. A panel data
set in a table or timetable is in long format; the Time variable of a
timetable specifies the sampling times of the observations.
Regardless of format, when the data set contains measurements for all subjects and sampling times, the data set is balanced. Otherwise, the data set is unbalanced.
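For long-format data, a quick necessary check for balance is that every subject has the same number of observations. The sketch below uses hypothetical subject identifiers; note that equal counts alone do not confirm that the sampling times coincide across subjects.

```matlab
% Hypothetical subject identifiers for eight observations.
groups = ["g1";"g1";"g1";"g2";"g2";"g3";"g3";"g3"];
[subjects, ~, idx] = unique(groups);  % idx maps each row to 1..n
Tj = accumarray(idx, 1);              % number of observations per subject
hasEqualCounts = all(Tj == Tj(1));    % false here: g2 has only two rows
```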
A random effects panel data regression model is a linear regression model for panel data that includes a term, called the random effect, representing the influence of subjects on the response variable. The goals of a random effects panel data regression model are to study the impact of predictor variables on a response, while controlling for latent, subject-specific effects, called heterogeneity, and to study the influence of the heterogeneity itself.
A random effects panel data regression model resembles a typical linear random effects model. The differences between the two models include:
Panel data is the result of an observational study, whereas data analyzed by a typical linear random effects model is the result of an experimental study.
Panel data analysis includes a study of the values of the heterogeneity; this goal is atypical for experimental studies, where an ANOVA is typically the goal.
Symbolically, the random effects panel data regression model is
$$y_{tj} = \alpha_j + x_{tj}^\top \beta + \varepsilon_{tj},$$
where:
ytj is the scalar response (dependent variable) of subject (entity or group) j at time t, j = 1,…,n and t = 1,…,T.
xtj is a p-by-1 vector of measurements of predictor variables (independent variables) of subject j at time t. xtj does not contain a constant term for the intercept.
β is a p-by-1 vector of fixed linear regression coefficients associated with the predictor variables. For each k = 1,…,p, coefficient k represents the influence of predictor k averaged over all subjects and sampling times.
ɑj is the scalar time-invariant heterogeneity of subject j, which is treated as a random effect. For each j = 1,…,n, ɑj represents the latent influence of subject j; ɑj is randomly distributed across subjects with mean μɑ and variance σ2ɑ.
εtj is the disturbance, or idiosyncratic error, of subject j at time t. The conditional distribution of εtj, given the predictors and heterogeneity, has a mean of 0 and variance σ2ε.
In addition to the usual classical linear model assumptions, random effects panel data regression models assume that the heterogeneity is uncorrelated with the predictor variables, and that the heterogeneity is exogenous.
Ordinary least squares (OLS), with dummy-variable subject coding, produces consistent, but inefficient, coefficient estimates. Therefore, generalized least squares (GLS) or maximum likelihood, via a linear state-space model view of the model, are used to estimate coefficients and the heterogeneity terms, and to perform inference.
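To see why the latent effect matters for efficiency, it can help to write out the covariance structure it induces. The following is a sketch under the additional assumptions, standard but not stated explicitly above, that ɑj and εtj are uncorrelated and that disturbances are uncorrelated across times and subjects. The composite error symbol v is introduced here only for illustration.

```latex
v_{tj} = (\alpha_j - \mu_\alpha) + \varepsilon_{tj}, \qquad
\operatorname{Var}(v_{tj}) = \sigma_\alpha^2 + \sigma_\varepsilon^2, \qquad
\operatorname{Cov}(v_{tj}, v_{sj}) = \sigma_\alpha^2 \quad (t \neq s), \qquad
\operatorname{Corr}(v_{tj}, v_{sj}) = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2} \quad (t \neq s).
```

Observations of the same subject are therefore correlated whenever the effects variance is positive, which is the structure that GLS and the state-space formulation exploit.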
Algorithms
fitrepanel estimates the parameters of the model as follows:
When Method is "gls", fitrepanel uses generalized least squares (GLS) to estimate coefficients. This algorithm summarizes the estimation procedure:
1. Estimate the idiosyncratic error variance σε2 by performing a within-subject regression (when that variance is not fixed by DisturbanceVariance). This procedure demeans the data with respect to time, concatenates all variables to obscure the subject, and regresses the transformed response data (an m-by-1 vector; for balanced data, m = Tn) onto the transformed predictor data (an m-by-p matrix). The estimator of σε2 is the mean squared error (MSE) from the regression.
2. Estimate the random-effects variance σɑ2 by performing a between-subject regression and transforming the resulting MSE (when that variance is not fixed by EffectVariance). This procedure aggregates the data by subject by averaging over the time dimension, and then regresses the transformed response data (an n-by-1 vector) onto the transformed predictor data (an n-by-(p+1) matrix, which includes a column of ones for the intercept). To obtain the random-effects variance estimator, transform the resulting MSE: [equation], where [equation] and Tj is the number of times subject j is sampled (fitrepanel sets estimates of 0 to 1e-6). For a balanced panel data set, [equation].
3. Estimate the coefficients by performing GLS on the quasi-within transformation of the data. For each subject j, the transformation is the vector ũj such that [equation], where [equation], ūj is the mean of the observations of subject j over time, and u represents each variable in the data, with uj being the vector of observations of subject j.
When Method is "ssm", fitrepanel views the model as a linear state-space model to estimate the parameters. This algorithm summarizes the estimation procedure:
1. Obtain initial values for the idiosyncratic and random-effects variances, required by the maximum likelihood procedure, by applying the same within- and between-subject regressions as in the "gls" method (for those variances not being fixed by DisturbanceVariance or EffectVariance).
2. Create this state-space model, in which the latent state is the subject-specific random effect and the observations are deflated responses (the responses adjusted by fitted values from a regression): [equation] In the system, j = 1,…,n, tj = 1,…,Tj, uj and εtj are standard Gaussian random variables, and the coefficient vector is the vector of estimated coefficients resulting from GLS and the quasi-within transformation.
3. Estimate σɑ2 and σε2 by fitting the state-space model to the data. The estimation procedure applies the Kalman filter and maximum likelihood.
4. Estimate the coefficients by likelihood profiling: Perform GLS on the quasi-within transformed data, and use the MLEs of the variances to compute θj.
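The three "gls" steps can be sketched in base MATLAB for a balanced, synthetic panel. This is not the fitrepanel implementation: the degrees of freedom in the MSEs, the variance transform MSE minus σε2/T, the floor applied to the effects variance, and the weight θj = 1 − sqrt(σε2/(σε2 + Tj σɑ2)) are the usual textbook choices and are assumptions here. The sketch also assumes subjects are coded 1,…,n.

```matlab
% Synthetic balanced panel in long format (true slopes are 2 and -1).
rng(5);
n = 40;  T = 5;  p = 2;
groups = repelem((1:n)', T);                   % subjects coded 1..n (assumption)
X = randn(n*T, p);
a = randn(n, 1);                               % simulated subject effects
y = 1 + a(groups) + X*[2; -1] + 0.5*randn(n*T, 1);

% Subject-level means over time.
Tj   = accumarray(groups, 1);                  % times sampled per subject
ybar = accumarray(groups, y) ./ Tj;
Xbar = zeros(n, p);
for k = 1:p
    Xbar(:,k) = accumarray(groups, X(:,k)) ./ Tj;
end

% Step 1: within-subject regression on time-demeaned data.
yW = y - ybar(groups);
XW = X - Xbar(groups,:);
bW = XW \ yW;
s2eps = sum((yW - XW*bW).^2) / (numel(y) - n - p);   % assumed d.o.f.: m - n - p

% Step 2: between-subject regression on subject means, with intercept.
ZB   = [ones(n,1) Xbar];
bB   = ZB \ ybar;
mseB = sum((ybar - ZB*bB).^2) / (n - p - 1);
s2alpha = max(mseB - s2eps/T, 1e-6);           % assumed balanced-data transform and floor

% Step 3: quasi-within transformation, then least squares on the result.
theta = 1 - sqrt(s2eps ./ (s2eps + Tj*s2alpha));     % one weight per subject
th    = theta(groups);
yQ    = y - th.*ybar(groups);
XQ    = [1 - th, X - th.*Xbar(groups,:)];      % transformed intercept column and predictors
bGLS  = XQ \ yQ;                               % [intercept; slopes]
```

With θj = 0 the transformation leaves the data unchanged (pooled OLS), and as θj approaches 1 it approaches the within transformation of step 1.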
A Bayesian estimate of the latent random effect ɑj of subject j is [equation], where [equation] is the estimated intercept from GLS and the weight qj is [equation].
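As a point of reference, one common form of such an estimate is the Gaussian posterior mean, which shrinks each subject's average residual toward the estimated intercept. The sketch below assumes that form; the exact expression that fitrepanel uses may differ, and the symbols with hats and bars are introduced here for illustration.

```latex
\hat{\alpha}_j = \hat{\mu} + q_j \left( \bar{y}_j - \bar{x}_j^\top \hat{\beta} - \hat{\mu} \right),
\qquad
q_j = \frac{T_j \hat{\sigma}_\alpha^2}{T_j \hat{\sigma}_\alpha^2 + \hat{\sigma}_\varepsilon^2}
```

Under this form, qj lies between 0 and 1 and grows with Tj, so subjects observed more often are shrunk less toward the intercept.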
When you set RobustCovariance to true,
fitrepanel uses an expression proportional to this sandwich
estimator for the robust covariance of the coefficients
[equation]
where:
[equation], which is an n-by-p matrix representing quasi-within-transformed predictor data (see Estimation Method Descriptions).
[equation], the residuals from GLS using the quasi-within transformed data.
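For orientation, the usual cluster-robust sandwich form for panel GLS is sketched below. It is an assumption that the estimator referenced above takes this shape; the symbols for the transformed predictors of subject j and the corresponding residual vector are introduced here for illustration.

```latex
\widehat{\operatorname{Cov}}(\hat{\beta}) \;\propto\;
\left( \sum_{j=1}^{n} \tilde{X}_j^\top \tilde{X}_j \right)^{-1}
\left( \sum_{j=1}^{n} \tilde{X}_j^\top \hat{e}_j \hat{e}_j^\top \tilde{X}_j \right)
\left( \sum_{j=1}^{n} \tilde{X}_j^\top \tilde{X}_j \right)^{-1}
```

The outer factors are the usual least-squares "bread", and the middle factor sums outer products of subject-level scores, which allows arbitrary heteroscedasticity and serial correlation within each subject.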
References
[1] Wooldridge, Jeffrey M. Econometric Analysis of Cross Section and Panel Data, Second edition. Cambridge, MA: The MIT Press, 2010.
[2] Greene, William H. Econometric Analysis, Eighth edition. New York: Pearson, 2018.
Version History
Introduced in R2026a