
Two-Stage Models for Engines

Overview of the Mathematics of Two-Stage Models

This section contains an overview of the mathematics of two-stage models. A comprehensive reference for two-stage modeling is Davidian and Giltinan [3].

Lindstrom and Bates [6] define repeated measurements as data generated by observing a number of individuals repeatedly under various experimental conditions, where the individuals are assumed to constitute a random sample from a population of interest. An important class of repeated measurements is longitudinal data where the observations are ordered by time or position in space. More generally, longitudinal data is defined as repeated measurements where the observations on a single individual are not, or cannot be, randomly assigned to the levels of a treatment of interest.

Modeling data of this kind usually involves the characterization of the relationship between the measured response, y, and the repeated measurement factor, or covariate x. Frequently, the underlying systematic relationship between y and x is nonlinear. In some cases the relevant nonlinear model can be derived on physical or mechanistic grounds. However, in other contexts a nonlinear relationship might be imposed simply to provide a convenient empirical description for the data. The presence of repeated observations on an individual requires particular care in characterizing the variation in the experimental data. In particular, it is important to represent two sources of variation explicitly: random variation among measurements within a given individual (intraindividual) and random variation among individuals (interindividual). Inferential procedures accommodate these different variance components within the framework of an appropriate hierarchical statistical model. This is the fundamental idea behind the analysis of repeated measurement data.

Holliday [1,2] was perhaps the first to apply nonlinear repeated measurements analysis procedures to spark ignition engine data. The focus of Holliday's work was the modeling of data taken from engine mapping experiments. In these experiments, engine speed, load, and air/fuel ratio were held constant while spark was varied. Various engine response characteristics, for example, torque or emission quantities, were measured at each spark setting. Holliday modeled the response characteristics for each sweep as a function of spark advance. Variations in the individual sweep parameters were then modeled as a function of the global engine operating variables speed, load, and air/fuel ratio. Conceptually, variations in the measurements taken within a sweep represent the intraindividual component of variance. Similarly, variation in the sweep-specific parameters between sweeps represents the interindividual component of variance. You can generalize these principles to other steady-state engine modeling exercises where the nature of data collection usually involves sweeping a single engine control variable while the remainder are held at fixed values. These points suggest that nonlinear repeated measurements analysis represents a general approach to the parameterization of mean value engine models for controls-oriented development.

Another application for models of this form is the flow equations for a throttle body. Assuming the flow equations are based upon the usual one-dimensional isentropic flow principle, then they must be modified by an effective area term, Ae, which accounts for the fact that the true flow is multidimensional and irreversible. You can map the throttle flow characteristics by sweeping the throttle position at fixed engine speed. This data collection methodology naturally imposes a hierarchy, the analysis of which is consistent with the application of nonlinear repeated measures. Experience in modeling effective area suggests that free knot spline or biological growth models provide good local predictions. The global phase of the modeling procedure is concerned with predicting the systematic variation in the response features across engine speed. A free knot spline model has proven useful for this purpose.

Local Models

The local model describes responses within a sweep as a function of the swept (independent) variable only. That is,

yij = fi(sij, θi) + εij     for j = 1, 2, ..., mi     (1)

where the subscript i refers to individual tests and j to data within a test, sij is the jth value of the independent variable, θi is an (rx1) parameter vector, yij is the jth response, and εij is a normally distributed random variable with zero mean and variance σ2. Note that equation (1) can be either a linear or a nonlinear function of the curve fit parameters. The assumption of independently normally distributed errors implies that the least squares estimates of θi are also maximum likelihood estimates.
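As a concrete illustration (not the toolbox implementation), the local fit of equation (1) can be sketched with a quadratic in spark advance standing in for fi; the sweep data below are synthetic.

```python
import numpy as np

# Hedged sketch: fitting a local model to a single spark sweep by least
# squares, y_ij = f_i(s_ij, theta_i) + eps_ij. A quadratic in spark advance
# stands in for f_i; the sweep data are synthetic.
rng = np.random.default_rng(0)
spark = np.linspace(10.0, 50.0, 9)               # spark advance settings for one sweep
theta_true = np.array([-0.05, 3.0, 40.0])        # quadratic coefficients (a, b, c)
torque = np.polyval(theta_true, spark) + rng.normal(0.0, 0.5, spark.size)

# Under iid normal errors, the least squares estimates are also the
# maximum likelihood estimates, as noted in the text.
theta_hat = np.polyfit(spark, torque, 2)
print(theta_hat)
```

With nine points per sweep and modest noise, the recovered coefficients sit close to the generating values.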

Local Covariance Modeling

The local model describes both the systematic and random variation associated with measurements taken during the ith test. Systematic variation is characterized through the function fi, while random variation is characterized via the distributional assumptions made on the vector of random errors ei. Hence, specification of a model for the distribution of ei completes the description of the intratest model. The Model-Based Calibration Toolbox™ product allows a very general specification of the local covariance:

ei ~ N(0, σ2Ci(θi, ξi))     (2)

where Ci is an (ni x ni) covariance matrix, σ is the coefficient of variation, and ξi is a (q-by-1) vector of dispersion parameters that account for heterogeneity of variance and the possibility of serially correlated data. This specification is very general and affords considerable flexibility in terms of specifying a covariance model that adequately describes the random component of the intratest variation.

The Model-Based Calibration Toolbox product supports the following covariance models:

  • Power Variance Model:

    Ci = diag{f(sij, θi)^(2ξ1)}     (3)
  • Exponential Variance Model:

    Ci = diag{exp(2ξ1 f(sij, θi))}     (4)
  • Mixed Variance Model:

    Ci = diag{(ξ1 + f(sij, θi)^ξ2)^2}     (5)

where diag{x} is a diagonal matrix with the elements of x on its diagonal.
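The structure of these models can be sketched as follows; this is an illustrative stand-in, not the toolbox's exact parameterization, using a power-type model in which the intratest variances grow with the fitted mean.

```python
import numpy as np

# Hedged sketch: the structure of a power-type variance model,
# C_i = diag{|f(s_ij, theta_i)|^(2*xi)}, so variances track the fitted
# local model values. The fitted values and xi below are illustrative.
def power_variance(fitted, xi):
    """Diagonal covariance structure with variances |f|^(2*xi)."""
    return np.diag(np.abs(fitted) ** (2.0 * xi))

fitted = np.array([40.0, 55.0, 60.0, 55.0, 40.0])   # f(s_ij, theta_i) along a sweep
C = power_variance(fitted, xi=0.5)
print(np.diag(C))
```

With xi = 0.5 the variances equal the fitted means themselves; other xi values strengthen or weaken the dependence.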

Correlation models are only available for equispaced data in the Model-Based Calibration Toolbox product. It is possible to combine correlation models with variance models such as the power model.

One of the simplest structures that can be used to account for serially correlated errors is the AR(m) model (autoregressive model with lag m). The general form of the AR(m) model is

ej = φ1ej-1 + φ2ej-2 + ... + φmej-m + vj     (6)

where φk is the kth lag coefficient and vj is an exogenous stochastic input, identically and independently distributed as N(0, σv2). First- and second-order autoregressive models are implemented in the Model-Based Calibration Toolbox product.
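The serial correlation an AR model induces can be seen in a short simulation; the coefficients below are illustrative values chosen to give a stationary AR(2) process.

```python
import numpy as np

# Hedged sketch: simulating serially correlated errors with an AR(2) model,
# e_j = phi_1*e_(j-1) + phi_2*e_(j-2) + v_j, with v_j iid N(0, sigma_v^2).
# The coefficients are illustrative, chosen to give a stationary process.
rng = np.random.default_rng(1)
phi1, phi2 = 0.6, -0.2
n = 500
v = rng.normal(0.0, 1.0, n)        # exogenous iid input
e = np.zeros(n)
for j in range(2, n):
    e[j] = phi1 * e[j - 1] + phi2 * e[j - 2] + v[j]

# By the Yule-Walker equations, the lag-1 autocorrelation of an AR(2)
# process is phi1/(1 - phi2) = 0.5 for these coefficients.
r1 = np.corrcoef(e[:-1], e[1:])[0, 1]
print(r1)
```

The sample lag-1 autocorrelation lands near the theoretical value of 0.5, in contrast to the near-zero value an independent-errors model would give.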

Another possibility is a moving average model (MA). The general structure is

ej = vj + ψ1vj-1 + ψ2vj-2 + ... + ψmvj-m     (7)

where ψk is the kth lag coefficient and vj is an exogenous stochastic input, identically and independently distributed as N(0, σv2). Only a first-order moving average model is implemented in the Model-Based Calibration Toolbox product.

Response Features

From an engineering perspective, the curve fit parameters do not usually have an intuitive interpretation. Rather, characteristic geometric features of the curve are of interest. The terminology “response features”, after Crowder and Hand [8], is used to describe these geometric features of interest. In general, the response feature vector pi for the ith sweep is a nonlinear function (g) of the corresponding curve fit parameter vector θi, such that

pi = g(θi)     (8)
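For the quadratic torque/spark local model used as an example above, a natural g maps the curve fit parameters to MBT spark and peak torque; the feature choice here is illustrative.

```python
import numpy as np

# Hedged sketch: computing response features p_i = g(theta_i) for a quadratic
# torque/spark local model f(s) = a*s^2 + b*s + c with a < 0. The features
# chosen here (MBT spark and peak torque) are illustrative.
def response_features(theta):
    a, b, c = theta
    mbt = -b / (2.0 * a)              # spark at maximum torque (vertex of the parabola)
    peak = c - b * b / (4.0 * a)      # torque value at MBT
    return np.array([mbt, peak])

theta = np.array([-0.05, 3.0, 40.0])
p = response_features(theta)
print(p)    # MBT = 30.0 deg, peak torque = 85.0
```

Unlike the raw coefficients (a, b, c), both features have a direct engineering reading, which is the point of carrying them to the second stage.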

Global Models

The global models describe the variation in the response features as a function of the global variables. The response features are carried through to the second stage of the modeling procedure rather than the curve fit parameters because they have an engineering interpretation. This ensures that the second stage of the modeling process remains relatively intuitive. It is much more likely that an engineer will have better knowledge of how a response feature such as MBT behaves throughout the engine operating range (at least on a main effects basis) than of an esoteric curve fit parameter estimate.

The global relationship is represented by one of the global models available in the Model-Based Calibration Toolbox product. In this section we only consider linear models that can be represented as

pi = Xiβ + γi     for i = 1, 2, ..., r     (9)

where Xi contains the information about the engine operating conditions at the ith spark sweep, β is the vector of global parameters that must be estimated by the fitting procedure, and γi is a vector of normally distributed random errors. It is necessary to make some assumption about the error distribution for γi; this is typically a normal distribution with

γi ~ N(0, D)     (10)

where r is the number of response features. The dimensions of D are (rxr) and, being a variance-covariance matrix, D is both symmetric and positive definite. Terms on the leading diagonal of D represent the test-to-test variance associated with the estimate of the individual response features. Off-diagonal terms represent the covariance between pairs of response features. The estimation of these additional covariance terms in a multivariate analysis improves the precision of the parameter estimates.

Two-Stage Models

To unite the two models, it is first necessary to review the distributional assumptions pertaining to the response feature vector pi. The variance of pi (Var(pi)) is given by

Var(pi) = (∂gi/∂θi) Var(θi) (∂gi/∂θi)'     (11)

For the sake of simplicity, the notation σ2Ci is used to denote Var(pi). Thus, conditional on the ith sweep, pi is distributed as

pi ~ N(gi(θi), σ2Ci)     (12)

where Ci depends on fi through the variance of θi and also on gi through the conversion of θi to the response features pi. Two standard assumptions are used in determining Ci: the asymptotic approximation for the variance of maximum likelihood estimates, and the approximation for the variance of functions of maximum likelihood estimates, which is based on a Taylor series expansion of gi. In addition, for nonlinear fi or gi, Ci depends on the unknown θi; therefore, the estimate of θi is used in its place. These approximations are likely to be good when σ2 is small or the number of points per sweep (mi) is large. In what follows, these approximations are assumed to be valid.
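The Taylor-series (delta method) approximation referred to above can be sketched numerically; the feature function and the covariance of the curve fit estimates below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the Taylor-series approximation behind C_i: for
# p_i = g(theta_i), Var(p_i) is approximated by G V G', where G is the
# Jacobian of g at the estimated theta_i (computed here by central
# differences) and V is the covariance of the curve fit estimates.
# The feature function and covariance values are illustrative.
def g(theta):
    a, b, c = theta
    return np.array([-b / (2.0 * a), c - b * b / (4.0 * a)])   # [MBT, peak torque]

def jacobian(fun, x, h=1e-6):
    m = fun(x).size
    J = np.zeros((m, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        J[:, k] = (fun(x + step) - fun(x - step)) / (2.0 * h)
    return J

theta_hat = np.array([-0.05, 3.0, 40.0])
V = np.diag([1e-6, 1e-3, 0.25])        # assumed covariance of the theta estimates
G = jacobian(g, theta_hat)
var_p = G @ V @ G.T                    # approximate Var(p_i)
print(var_p)
```

The result is a symmetric matrix whose diagonal gives approximate variances for the MBT and peak-torque features.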

We now return to the issue of parameter estimation. Assume that the γi are independent of the intratest errors ei. Then, allowing for the additive replication error in the response features, the response features are distributed as

pi ~ N(Xiβ, σ2Ci + D)     (13)

When all the tests are considered simultaneously, equation (13) can be written in the compact form

P ~ N(Zβ, W(ω))     (14)

where P is the vector formed by stacking the n vectors pi on top of each other, Z is the matrix formed by stacking the n Xi matrices, W is the block diagonal weighting matrix with the matrices σ2Ci + D on the diagonal, and ω is a vector of dispersion parameters. For the multivariate normal distribution (14), the negative log likelihood function can be written

-log L(β, ω) = (1/2) log|W(ω)| + (1/2) (P - Zβ)' W(ω)^-1 (P - Zβ) + constant     (15)

Thus, the maximum likelihood estimates are the vectors βML and ωML that minimize the negative log likelihood. Usually there are many more fit parameters than dispersion parameters; that is, the dimension of β is much larger than that of ω. As such, it is advantageous to reduce the number of parameters involved in the minimization. The key is to realize that equation (15) is conditionally linear with respect to β. Hence, given an estimate of ω, equation (15) can be differentiated directly with respect to β and the resulting expression set to zero. This equation can be solved directly for β as follows:

β = (Z' W(ω)^-1 Z)^-1 Z' W(ω)^-1 P     (16)

The key point is that the likelihood now depends only upon the dispersion parameter vector ω, which, as already discussed, has only modest dimensions. Once the likelihood is minimized to yield ωML, then, since W(ωML) is known, equation (16) can subsequently be used to determine βML.
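The conditionally linear step is an ordinary generalized least squares solve; a minimal sketch with synthetic data and a diagonal stand-in for W follows (not the toolbox implementation).

```python
import numpy as np

# Hedged sketch (not the toolbox implementation): the conditionally linear
# step of equation (16). Given the dispersion parameters, W is known and
# beta follows from a generalized least squares solve,
# beta = (Z' W^-1 Z)^-1 Z' W^-1 P. All data below are synthetic, and W is
# taken as diagonal for simplicity (the real W is block diagonal).
rng = np.random.default_rng(2)
n = 400
Z = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, (n, 2))])
beta_true = np.array([2.0, -1.0, 0.5])
w = rng.uniform(0.5, 2.0, n)                 # stand-in for the diagonal of W
P = Z @ beta_true + rng.normal(0.0, np.sqrt(w))

W_inv = np.diag(1.0 / w)                     # trivial inverse for a diagonal W
beta_hat = np.linalg.solve(Z.T @ W_inv @ Z, Z.T @ W_inv @ P)
print(beta_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse of Z'W⁻¹Z explicitly.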

Global Model Selection

Before undertaking the minimization of Equation 15 (see Two-Stage Models) it is first necessary to establish the form of the Xi matrix. This is equivalent to establishing a global expression for each of the response features a priori. Univariate stepwise regression is used to select the form of the global model for each response feature. Minimization of the appropriate PRESS statistic is used as a model building principle, as specified in Stepwise in the Model Building Process. The underlying principle is that having used univariate methods to establish possible models, maximum likelihood methods are subsequently used to estimate their parameters.

Initial Values for Covariances

An initial estimate of the global covariance is obtained using the standard two-stage estimate of Steimer et al. [10],

D0 = (1/n) Σi (pi - Xiβ̂)(pi - Xiβ̂)'     (17)

where β̂ contains the estimates from the univariate global models. This estimate is biased.
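In the spirit of this starting value, D0 can be sketched as the sample covariance of the global-model residuals of the response features, one residual vector per test; the residuals below are synthetic stand-ins.

```python
import numpy as np

# Hedged sketch, in the spirit of the standard two-stage starting value:
# D0 is formed from the outer products of the global-model residuals of the
# response features, one (r x 1) residual vector per test. The residuals
# below are synthetic stand-ins for p_i - X_i*beta.
rng = np.random.default_rng(4)
n = 50
true_D = np.array([[1.0, 0.3],
                   [0.3, 0.5]])
resid = rng.multivariate_normal([0.0, 0.0], true_D, n)   # stand-in residuals, one row per test
D0 = resid.T @ resid / n                                 # initial covariance estimate
print(D0)
```

The result is symmetric positive definite and close to the generating covariance, which makes it a serviceable starting point for the likelihood optimization.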

Quasi-Newton Algorithm

Implicit in the minimization of the negative log likelihood, equation (15), is that D is positive definite. It is a simple matter to ensure this by noting that D is positive definite if and only if there is a nonsingular upper triangular matrix, G, say, such that

D = G'G     (18)

This factorization is used in the Quasi-Newton algorithm. Primarily, the advantage of this approach is that the resulting search in G, as opposed to D, is unconstrained.
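The effect of the factorization can be seen directly: any upper triangular G with nonzero diagonal yields a symmetric positive definite D, so the optimizer can vary the entries of G freely. The values below are illustrative.

```python
import numpy as np

# Hedged sketch: searching over an upper triangular G instead of D itself.
# For any G with nonzero diagonal entries, D = G'G is symmetric positive
# definite, so the search space in G is unconstrained. Values are illustrative.
G = np.array([[1.2, 0.3],
              [0.0, -0.4]])          # upper triangular, nonzero diagonal
D = G.T @ G
eigvals = np.linalg.eigvalsh(D)
print(eigvals)                       # all eigenvalues are positive
```

A direct search over the entries of D would instead have to enforce symmetry and positive definiteness as explicit constraints at every step.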

Expectation Maximization Algorithm

The expectation maximization algorithm is an iterative method that converges toward a maximum of the likelihood function. Each iteration has two steps:

  1. Expectation Step — Produce refined estimates of the response features given the current parameter estimates.

  2. Maximization Step — Obtain new estimates of the parameters (global model parameters and covariance matrix) for the new response features.

These steps are repeated until the improvement in value of the log likelihood function is less than the tolerance. Details of the algorithm can be found in [3, Chapter 5].

References

  1. Holliday, T., The Design and Analysis of Engine Mapping Experiments: A Two-Stage Approach, Ph.D. thesis, University of Birmingham, 1995.

  2. Holliday, T., Lawrance, A. J., Davis, T. P., Engine-Mapping Experiments: A Two-Stage Regression Approach, Technometrics, 1998, Vol. 40, pp 120-126.

  3. Davidian, M., Giltinan, D. M., Nonlinear Models for Repeated Measurement Data, Chapman & Hall, First Edition, 1995.

  4. Davidian, M., Giltinan, D. M., Analysis of repeated measurement data using the nonlinear mixed effects model, Chemometrics and Intelligent Laboratory Systems, 1993, Vol. 20, pp 1-24.

  5. Davidian, M., Giltinan, D. M., Analysis of repeated measurement data using the nonlinear mixed effects model, Journal of Biopharmaceutical Statistics, 1993, Vol. 3, part 1, pp 23-55.

  6. Lindstrom, M. J., Bates, D. M., Nonlinear Mixed Effects Models for Repeated Measures Data, Biometrics, 1990, Vol. 46, pp 673-687.

  7. Davidian, M., Giltinan, D. M., Some Simple Methods for Estimating Intraindividual Variability in Nonlinear Mixed Effects Models, Biometrics, 1993, Vol. 49, pp 59-73.

  8. Hand, D. J., Crowder, M. J., Practical Longitudinal Data Analysis, Chapman and Hall, First Edition, 1996.

  9. Franklin, G.F., Powell, J.D., Workman, M.L., Digital Control of Dynamic Systems, Addison-Wesley, Second Edition, 1990.

  10. Steimer, J.-L., Mallet, A., Golmard, J.L., Boisvieux, J.F., Alternative approaches to estimation of population pharmacokinetic parameters: Comparison with the nonlinear mixed effect model, Drug Metabolism Reviews, 1984, Vol. 15, pp 265-292.

Linear Regression

Building a regression model that includes only a subset of the total number of available terms involves a tradeoff between two conflicting objectives:

  • Increasing the number of model terms never increases, and typically reduces, the Error Sum of Squares (SSE).

  • However, you do not want so many model terms that you overfit by chasing points and fitting the model to noise in the signal. This reduces the predictive value of the model.

The best regression equation is the one that provides a satisfactory tradeoff between these conflicting goals, at least in the mind of the analyst. It is well known that there is no unique definition of best. Different model building criteria (for example, forward selection, backward selection, PRESS search, stepwise search, and the Mallows Cp statistic) yield different models. In addition, even if the optimal value of the model building statistic is found, there is no guarantee that the resulting model will be optimal in any other of the accepted senses.

Principally, the purpose of building the regression model for calibration is to predict future observations of the mean value of the response feature. Therefore the aim is to select the subset of regression terms such that PRESS is minimized. Minimizing PRESS is consistent with the goal of obtaining a regression model that provides good predictive capability over the experimental factor space. This approach can be applied to both polynomial and spline models; in either case the model building process is identical.

  1. The regression matrix can be viewed in the Design Evaluation Tool. Terms in this matrix define the full model. In general, the stepwise model is a subset of this full term set.

  2. All regressions are carried out with the factors represented on their coded scales (-1,1).
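For linear regression, PRESS need not be computed by refitting the model N times; a minimal sketch of the standard leverage-based shortcut follows, with synthetic data on the coded (-1, 1) scale.

```python
import numpy as np

# Hedged sketch: PRESS for a linear regression computed without refitting
# N times, using the standard identity e_(i) = e_i / (1 - h_ii), where the
# h_ii are the leverages (diagonal of the hat matrix). Data are synthetic.
rng = np.random.default_rng(3)
N = 30
X = np.column_stack([np.ones(N), rng.uniform(-1.0, 1.0, N)])   # coded factor scale
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 0.1, N)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                                   # ordinary residuals
H = X @ np.linalg.solve(X.T @ X, X.T)              # hat matrix
press_resid = e / (1.0 - np.diag(H))               # PRESS residuals e_(i)
press = press_resid @ press_resid
sse = e @ e
print(sse, press)                                  # PRESS is never below SSE
```

Because each PRESS residual inflates the ordinary residual by 1/(1 - h_ii), PRESS penalizes high-leverage points and is always at least as large as SSE.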

Toolbox Terms and Statistics Definitions

Definitions

Symbol | Definition
N | Number of data points
p | Number of terms currently included in the model
q | Total number of possible model parameters (q = p + r)
r | Number of terms not currently included in the model
y | (Nx1) response vector
X | Regression matrix, with dimensions (Nxq)
Xp | (Nxp) model matrix corresponding to terms currently included in the model
Xr | (Nxr) matrix corresponding to terms currently excluded from the model
β̂ | (px1) vector of model coefficients
PEV | Prediction Error Variance
 | User-defined threshold criteria for automatically rejecting terms
ŷ | (Nx1) vector of predicted responses
e | (Nx1) residual vector
e(i) | (Nx1) vector of PRESS residuals
H | Hat matrix
L | (Nx1) vector of leverage values
VIF | Variance Inflation Factors
SSE | Error Sum of Squares, SSE = e'e
SSR | Regression Sum of Squares, SSR = SST - SSE
SST | Total Sum of Squares
MSE | Mean Square Error, MSE = SSE/(N - p)
MSR | Mean Square of Regression, MSR = SSR/p
F | F-statistic, F = MSR/MSE
MSE(i) | MSE calculated with the ith point removed from the data set
RMSE | Root Mean Squared Error: the standard deviation of the regression
si | ith R-Student or Externally Scaled Studentized Residual
ri | ith Standardized or Internally Scaled Studentized Residual
D | Cook's D Influence Diagnostic
SEBETA | (px1) vector of model coefficient standard errors
PRESS | Predicted Error Sum of Squares, PRESS = e(i)'e(i)

For more on PRESS and other displayed statistics, see PRESS statistic, Guidelines for Selecting the Best Model Fit, and Pooled Statistics.