Contenuto principale

bj

Estimate Box-Jenkins polynomial model using time-domain data

Description

Box-Jenkins (BJ) models are a special configuration of polynomial models that provide completely independent parameterization for the dynamics and noise using rational polynomial functions. BJ models, which are always discrete-time models, can be estimated only from time-domain data. Use BJ models when the noise is primarily a measurement disturbance rather than an input disturbance. The BJ structure provides additional flexibility for modeling the noise.

Estimate Box-Jenkins Model

sys = bj(tt,[nb nc nd nf nk]) estimates a Box-Jenkins polynomial model sys using the data contained in the variables of timetable tt. The software uses the first Nu variables as inputs and the next Ny variables as outputs, where Nu and Ny are determined from the dimensions of the specified polynomial orders.

sys is represented by the equation

y(t)=i=1nuBi(q)Fi(q)ui(tnki)+C(q)D(q)e(t)

Here, y(t) is the output, u(t) is the input, and e(t) is the error.

The components of [nb nc nd nf nk] define the orders of the polynomials used for estimation. For more information about the Box-Jenkins model structure, see Box-Jenkins Model Structure.

To select specific input and output channels from tt, use name-value syntax to set 'InputName' and 'OutputName' to the corresponding timetable variable names.

sys = bj(u,y,[nb nc nd nf nk]) uses the time-domain input and output signals in the comma-separated matrices u,y. The software assumes that the data sample time is 1 second. To change the sample time, set Ts using name-value syntax.

sys = bj(data,[nb nc nd nf nk]) uses the time-domain data in the iddata object data. Use this syntax especially when you want to take advantage of the additional information, such as data sample time or experiment labeling, that data objects provide.

example

sys = bj(___, Name,Value) specifies model structure attributes using additional options specified by one or more name-value arguments. You can use this syntax with any of the previous input-argument combinations.

Configure Initial Parameters

sys = bj(tt, init_sys) uses the polynomial model init_sys to configure the initial parameterization of sys for estimation using the timetable tt.

sys = bj(u,y,init_sys) uses the matrix data u,y for estimation.

sys = bj(u,y,init_sys) uses the data object data, for estimation.

Specify Additional Estimation Options

sys = bj(___, opt) incorporates an option set opt that specifies options such as handling of initial conditions, regularization, and numerical search method to use for estimation.

example

Return Estimated Initial Conditions

[sys,ic] = bj(___) returns the estimated initial conditions as an initialCondition object. Use this syntax if you plan to simulate or predict the model response using the same estimation input data and then compare the response with the same estimation output data. Incorporating the initial conditions yields a better match during the first part of the simulation.

Examples

collapse all

Estimate the parameters of a single-input, single-output Box-Jenkins model from measured data.

load sdata1 tt1;
nb = 2;
nc = 2;
nd = 2;
nf = 2;
nk = 1;
sys = bj(tt1,[nb nc nd nf nk]);

sys is a discrete-time idpoly model with estimated coefficients. The order of sys is as described by nb, nc, nd, nf, and nk.

Compare the model output to the estimation data.

compare(sys,tt1)

Figure contains an axes object. The axes object with ylabel y contains 2 objects of type line. These objects represent Validation data (y), sys: 70.75%.

Use getpvec to obtain the estimated parameters and getcov to obtain the covariance associated with the estimated parameters.

Estimate the parameters of a multi-input, single-output Box-Jenkins model from measured data.

Load the data tt8, which is in the form of a timetable containing three input channels and one output channel. Separate tt8 into estimation and validation portions.

load sdata8.mat tt8
tt8e = tt8(1:250,:);
tt8v = tt8(251:end,:);

Specify the orders and delays. Then, estimate the model sys.

nb = [2 1 1];
nc = 1;
nd = 1;
nf = [2 1 2] + 1;
nk = [1 0 2];
sys=bj(tt8e,[nb nc nd nf nk]);

Compare the output of sys with the validation data.

compare(tt8v,sys)

Figure contains an axes object. The axes object with ylabel y contains 2 objects of type line. These objects represent Validation data (y), sys: 81.23%.

Estimate a regularized BJ model by converting a regularized ARX model.

Load data.

load regularizationExampleData.mat m0simdata;

Estimate an unregularized BJ model of order 30.

m1 = bj(m0simdata(1:150),[15 15 15 15 1]);

Estimate a regularized BJ model by determining Lambda value by trial and error.

opt = bjOptions;
opt.Regularization.Lambda = 1;
m2 = bj(m0simdata(1:150),[15 15 15 15 1],opt);

Obtain a lower-order BJ model by converting a regularized ARX model followed by order reduction.

opt1 = arxOptions;
[L,R] = arxRegul(m0simdata(1:150),[30 30 1]);
opt1.Regularization.Lambda = L;
opt1.Regularization.R = R;
m0 = arx(m0simdata(1:150),[30 30 1],opt1);
mr = idpoly(balred(idss(m0),7));

Compare the model outputs against data.

opt2 = compareOptions('InitialCondition','z');
compare(m0simdata(150:end),m1,m2,mr,opt2);

Figure contains an axes object. The axes object with ylabel y1 contains 4 objects of type line. These objects represent Validation data (y1), m1: 52.83%, m2: 57.59%, mr: 64.91%.

Estimate the parameters of a single-input, single-output Box-Jenkins model while configuring some estimation options.

Generate estimation data.

B = [0 1 0.5];
C = [1 -1 0.2];
D = [1 1.5 0.7];
F = [1 -1.5 0.7];
sys0 = idpoly(1,B,C,D,F,0.1);
e = randn(200,1);
u = idinput(200); 
y = sim(sys0,[u e]);

data is a single-input, single-output data set created by simulating a known model.

Estimate initial Box-Jenkins model.

nb = 2;
nc = 2;
nd = 2;
nf = 2;
nk = 1;
init_sys = bj(u,y,[2 2 2 2 1]);

Create an estimation option set to refine the parameters of the estimated model.

opt = bjOptions;
opt.Display = 'on';
opt.SearchOptions.MaxIterations = 50;

opt is an estimation option set that configures the estimation to iterate 50 times at most and display the estimation progress.

Re-estimate the model parameters using the estimation option set.

sys = bj(u,y,init_sys,opt);

sys is estimated using init_sys for the initial parameterization for the polynomial coefficients.

To view the estimation result, enter sys.Report.

Estimate a multi-input, multi-output Box-Jenkins model from estimated data.

Load measured data.

load iddata1 z1
load iddata2 z2
data = [z1 z2(1:300)];

data contains the measured data for two inputs and two outputs.

Estimate the model.

 nb = [2 2; 3 4];  
 nc = [2;2]; 
 nd = [2;2]; 
 nf = [1 0; 2 2];
 nk = [1 1; 0 0];
 sys = bj(data,[nb nc nd nf nk]);

The polynomial order coefficients contain one row for each output.

sys is a discrete-time idpoly model with two inputs and two outputs.

Load the data.

load iddata1ic z1i

Estimate a second-order Box-Jenkins model sys and return the initial conditions in ic.

nb = 2;
nc = 2;
nd = 2;
nf = 2;
nk = 1;
[sys,ic] = bj(z1i,[nb nc nd nf nk]);
ic
ic = 
  initialCondition with properties:

     A: [4×4 double]
    X0: [4×1 double]
     C: [0.8744 0.5426 0.4647 -0.5285]
    Ts: 0.1000

ic is an initialCondition object that encapsulates the free response of sys, in state-space form, to the initial state vector in X0. You can incorporate ic when you simulate sys with the z1i input signal and compare the response with the z1i output signal.

Input Arguments

collapse all

Estimation data, specified as a timetable that uses a regularly spaced time vector. tt contains variables representing input and output channels. For multiexperiment data, tt is a cell array of timetables of length Ne, where Ne is the number of experiments.

The software determines the number of input and output channels to use for estimation from the dimensions of the specified polynomial orders. The input/output channel selection depends on whether the 'InputName' and 'OutputName' name-value arguments are specified.

  • If 'InputName' and 'OutputName' are not specified, then the software uses the first Nu variables of tt as inputs and the next Ny variables of tt as outputs.

  • If 'InputName' and 'OutputName' are specified, then the software uses the specified variables. The number of specified input and output names must be consistent with Nu and Ny.

  • For functions that can estimate a time series model, where there are no inputs, 'InputName' does not need to be specified.

For more information about working with estimation data types, see Data Domains and Data Types in System Identification Toolbox.

Estimation data, specified for SISO systems as a comma-separated pair of Ns-by-1 real-valued matrices that contain uniformly sampled input and output time-domain signal values. Here, Ns is the number of samples.

For MIMO systems, specify u,y as an input/output matrix pair with the following dimensions:

  • uNs-by-Nu, where Nu is the number of inputs.

  • yNs-by-Ny, where Ny is the number of outputs.

For multiexperiment data, specify u,y as a pair of 1-by-Ne cell arrays, where Ne is the number of experiments. The sample times of all the experiments must match.

For time series data, which contains only outputs and no inputs, specify [],y.

For more information about working with estimation data types, see Data Domains and Data Types in System Identification Toolbox.

Estimation data, specified as an iddata object that contains time-domain input and output signal values.

You cannot use frequency-domain data for estimating bj models.

Vector of matrices with nonnegative integers that contain the orders and delays of the Box-Jenkins model, as the following list describes.

  • nb — Order of the B polynomial + 1, specified as an Ny-by-Nu matrix, where Ny is the number of outputs and Nu is the number of inputs.

  • nc — Order of the C polynomial + 1, specified as an Ny-by-1 matrix.

  • nd — Order of the D polynomial + 1, specified as an Ny-by-1 matrix.

  • nf — Order of the F polynomial + 1, specified as an Ny-by-Nu matrix.

  • nk — Input delay in units of samples, specified as an Nu-by-Ny matrix.

Polynomial model that configures the initial parameterization of sys, specified as an idpoly model with the Box-Jenkins structure that has only B, C, D and F polynomials active.

bj uses the parameters and constraints defined in init_sys as the initial guess for estimating sys.

Use the Structure property of init_sys to configure initial guesses and constraints for B(q), F(q), C(q) and D(q).

To specify an initial guess for, say, the C(q) term of init_sys, set init_sys.Structure.C.Value to the guess value.

To specify constraints for, say, the B(q) term of init_sys:

  • set init_sys.Structure.B.Minimum to the minimum B(q) coefficient values

  • set init_sys.Structure.B.Maximum to the maximum B(q) coefficient values

  • set init_sys.Structure.B.Free to indicate which B(q) coefficients are free for estimation

You can similarly specify the initial guess and constraints for the other polynomials.

Estimation options, specified as an bjOptions option set. Options specified by opt include:

  • Estimation objective

  • Handling of initial conditions

  • Numerical search method and the associated options

Name-Value Arguments

collapse all

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'InputDelay',1

Input channel names, specified as a string, character vector, string array, or cell array of character vectors.

If you are using a timetable for the data source, the names in InputName must be a subset of the timetable variables.

Example: sys = bj(tt,__,'InputName',["u1" "u2"]) selects the variables u1 and u2 as the input channels from the timetable tt to use for the estimation.

Output channel names, specified as a string, character vector, string array, or cell array of character vectors.

If you are using a timetable for the data source, the names in OutputName must be a subset of the timetable variables.

Example: sys = bj(tt,__,'OutputName',["y1" "y3"]) selects the variables y1 and y3 as the output channels from the timetable tt to use for the estimation.

Sample time, specified as the comma-separated pair consisting of 'Ts' and the sample time in the units specified by TimeUnit. When you use matrix-based data (u,y), you must specify Ts if you require a sample time other than the assumed sample time of 1 second.

To obtain the data sample time for a timetable tt, use the timetable property tt.Properties.Timestep.

Example: bj(umat1,ymat1,___,'Ts',0.08) computes a model with sample time of 0.08 seconds.

Input delays for each input channel, specified as the comma-separated pair consisting of 'InputDelay' and a numeric vector.

  • For continuous-time models, specify 'InputDelay' in the time units stored in the TimeUnit property.

  • For discrete-time models, specify 'InputDelay' in integer multiples of the sample time Ts. For example, setting 'InputDelay' to 3 specifies a delay of three sampling periods.

For a system with Nu inputs, set InputDelay to an Nu-by-1 vector. Each entry of this vector is a numerical value that represents the input delay for the corresponding input channel.

To apply the same delay to all channels, specify 'InputDelay' as a scalar.

Transport delays for each input/output pair, specified as the comma-separated pair consisting of 'IODelay' and a numeric array.

  • For continuous-time models, specify 'IODelay' in the time units stored in the TimeUnit property.

  • For discrete-time models, specify 'IODelay' in integer multiples of the sample time Ts. For example, setting 'IODelay' to 4 specifies a transport delay of four sampling periods.

For a system with Nu inputs and Ny outputs, set 'IODelay' to an Ny-by-Nu matrix. Each entry is an integer value representing the transport delay for the corresponding input/output pair.

To apply the same delay to all channels, specify 'IODelay' as a scalar.

You can specify 'IODelay' as an alternative to the nk value. Doing so simplifies the model structure by reducing the number of leading zeros in the B polynomial. In particular, you can represent max(nk-1,0) leading zeros as input/output delays using 'IODelay' instead.

Flag to use integrators in the noise channels, specified as false(Ny,1), a logical scalar, or a logical vector of length Ny, where Ny is the number of outputs.

Setting IntegrateNoise to true for a particular output results in the model:

y(t)=B(q)F(q)u(tnk)+C(q)D(q)e(t)1q1

Here, 11q1 is the integrator in the noise channel,e(t).

Output Arguments

collapse all

Box-Jenkins polynomial model that fits the estimation data, returned as a discrete-time idpoly object. This model is created using the specified model orders, delays, and estimation options.

Information about the estimation results and options used is stored in the Report property of the model. Report has the following fields:

Report FieldDescription
Status

Summary of the model status, which indicates whether the model was created by construction or obtained by estimation

Method

Estimation command used

InitialCondition

Handling of initial conditions during model estimation, returned as one of the following values:

  • 'zero' — The initial conditions were set to zero.

  • 'estimate' — The initial conditions were treated as independent estimation parameters.

  • 'backcast' — The initial conditions were estimated using the best least squares fit.

This field is especially useful to view how the initial conditions were handled when the InitialCondition option in the estimation option set is 'auto'.

Fit

Quantitative assessment of the estimation, returned as a structure. See Loss Function and Model Quality Metrics for more information on these quality metrics. The structure has these fields.

  • FitPercent — Normalized root mean squared error (NRMSE) measure of how well the response of the model fits the estimation data, expressed as the percentage fitpercent = 100(1-NRMSE)

  • LossFcn — Value of the loss function when the estimation completes

  • MSE — Mean squared error (MSE) measure of how well the response of the model fits the estimation data

  • FPE — Final prediction error for the model

  • AIC — Raw Akaike Information Criteria (AIC) measure of model quality

  • AICc — Small-sample-size corrected AIC

  • nAIC — Normalized AIC

  • BIC — Bayesian Information Criteria (BIC)

Parameters

Estimated values of model parameters

OptionsUsed

Option set used for estimation. If no custom options were configured, this is a set of default options. See bjOptions for more information.

RandState

State of the random number stream at the start of estimation. Empty, [], if randomization was not used during estimation. For more information, see rng.

DataUsed

Attributes of the data used for estimation, returned as a structure with the following fields.

  • Name — Name of the data set

  • Type — Data type

  • Length — Number of data samples

  • Ts — Sample time

  • InterSample — Input intersample behavior, returned as one of the following values:

    • 'zoh' — A zero-order hold maintains a piecewise-constant input signal between samples.

    • 'foh' — A first-order hold maintains a piecewise-linear input signal between samples.

    • 'bl' — Band-limited behavior specifies that the continuous-time input signal has zero power above the Nyquist frequency.

  • InputOffset — Offset removed from time-domain input data during estimation. For nonlinear models, it is [].

  • OutputOffset — Offset removed from time-domain output data during estimation. For nonlinear models, it is [].

Termination

Termination conditions for the iterative search used for prediction error minimization, returned as a structure with these fields.

  • WhyStop — Reason for terminating the numerical search

  • Iterations — Number of search iterations performed by the estimation algorithm

  • FirstOrderOptimality-norm of the gradient search vector when the search algorithm terminates

  • FcnCount — Number of times the objective function was called

  • UpdateNorm — Norm of the gradient search vector in the last iteration. Omitted when the search method is 'lsqnonlin' or 'fmincon'.

  • LastImprovement — Criterion improvement in the last iteration, expressed as a percentage. Omitted when the search method is 'lsqnonlin' or 'fmincon'.

  • Algorithm — Algorithm used by 'lsqnonlin' or 'fmincon' search method. Omitted when other search methods are used.

For estimation methods that do not require numerical search optimization, the Termination field is omitted.

For more information on using Report, see Estimation Report.

Estimated initial conditions, returned as an initialCondition object or an object array of initialCondition values.

  • For a single-experiment data set, ic represents, in state-space form, the free response of the transfer function model (A and C matrices) to the estimated initial states (x0).

  • For a multiple-experiment data set with Ne experiments, ic is an object array of length Ne that contains one set of initialCondition values for each experiment.

If bj returns ic values of 0 and the you know that you have non-zero initial conditions, set the 'InitialCondition' option in bjOptions to 'estimate' and pass the updated option set to bj. For example:

opt = bjOptions('InitialCondition','estimate')
[sys,ic] = bj(data,[nb nc nd nf nk],opt)
The default 'auto' setting of 'InitialCondition' uses the 'zero' method when the initial conditions have a negligible effect on the overall estimation-error minimization process. Specifying 'estimate' ensures that the software estimates values for ic.

For more information, see initialCondition. For an example of using this argument, see Obtain Initial Conditions.

More About

collapse all

Alternatives

To estimate a continuous-time model, use:

  • tfest — returns a transfer function model

  • ssest — returns a state-space model

  • bj to first estimate a discrete-time model and convert it a continuous-time model using d2c.

References

[1] Ljung, L. System Identification: Theory for the User, Upper Saddle River, NJ, Prentice-Hall PTR, 1999.

Version History

Introduced before R2006a

expand all