MATLAB Examples

Medium / Long Term Energy Forecasting with Econometrics

Goal - Produce a reliable med term forecasting model for Energy Demand

Challenge - current med/long term models in Australia are regression based. These models are now incorrect as demand has been decreasing. An econometrics based model is dynamic and hence can capture the up & down swings that may occur in the future.

Current model is Univariate, ARIMA/GARCH

Data - Data used is historical Monthly Demand data for NSW along with Sydney Temperature.

David Willingham david.willingham@mathworks.com.au May, 2014

Contents

```% Uncomment if you wish to show date pre-processing, which aggregates the data to monthly % takes about 100s % ds = '31-Jan-1999'; % dE = '30-Jun-2014'; % % tic % TEcon = data_prep(ds,dE); % toc % Loading in historical monthly Load Data load T T(end,:) = []; %removing last row as it's incomplete F = T.MonthlyTotalDemand; D = [num2str(T.Months),repmat('/',length(T.Months),1),num2str(T.Years)]; D = datenum(D,'mm/yyyy'); plot(D,F) dynamicDateTicks ylabel('Demand') xlabel('Dates') title('Historical Monthly Demand for NSW') % Split data into training and test data set. Fin = F(1:end-12); Fout = F(end-11:end); Datein = D(1:end-12); Dateout = D(end-11:end); ```

Fit the trend

try and fit using Curve Fitting toolbox highlights it flaws with the downturn in demand

```% show cftool fitting 6th order sum of sin % cftool(Datein,Fin) %6th order sum of sin cftool('cfit.sfit') %presaved 6th order sum of sin mdl = createFit(Datein,Fin) %6th order sum of sin % try again but only using data up to the downturn % then see what happens when it's applied to the rest of the data set. % cftool(Datein(1:110),Fin(1:110)) %6th order sum of sin mdl = createFit1(Datein(1:110),Fin(1:110)); %6th order sum of sin figure % plot(D,F) Fin2 = mdl(Datein); plot(Datein,Fin,'ro') hold on plot(Datein,Fin2) dynamicDateTicks ylabel('Demand') xlabel('Dates') legend('actual demand','forecast demand','Location','best') title('Demand Forecast based on 6th order sum of sin') ```
```mdl = General model Sin6: mdl(x) = a1*sin(b1*x+c1) + a2*sin(b2*x+c2) + a3*sin(b3*x+c3) + a4*sin(b4*x+c4) + a5*sin(b5*x+c5) + a6*sin(b6*x+c6) where x is normalized by mean 7.327e+05 and std 1524 Coefficients (with 95% confidence bounds): a1 = 1.74e+08 (-1.565e+13, 1.565e+13) b1 = 0.7564 (-1312, 1313) c1 = 1.735 (-596, 599.4) a2 = 1.614e+08 (-1.565e+13, 1.565e+13) b2 = 0.7851 (-1371, 1372) c2 = -1.393 (-620.7, 617.9) a3 = 1.296e+05 (-1.124e+06, 1.384e+06) b3 = 3.8 (-22, 29.6) c3 = 1.157 (-13.91, 16.22) a4 = 5.957e+04 (-6.299e+05, 7.49e+05) b4 = 6.233 (-6.758, 19.23) c4 = -1.19 (-5.115, 2.736) a5 = 6.154e+05 (5.319e+05, 6.989e+05) b5 = 52.46 (52.32, 52.6) c5 = -2.496 (-2.631, -2.361) a6 = 6.684e+05 (5.846e+05, 7.521e+05) b6 = 26.18 (26.05, 26.31) c6 = -0.4139 (-0.5404, -0.2875) ```

Unit Root Tests for Stationarity

Augmented Dickey-Fuller test for unit root H0: Non-Stationary process H1: Stationary process

Note, the example here uses the ADFTEST function in its simplest form using the default number of lags (0), the default model (AR) and the default test statistic (t1). Besides customizing this test, there exists a variety of alternative tests:

• pptest: Phillips-Perron test
• kpsstest: KPSS test for trend stationarity
```[H,pValue] = adftest(diff(F)); if (H == false) disp(['The null hypothesis cannot be rejected with a pValue of ', ... num2str(pValue)]) disp('The data does not show enough evidence of stationarity') else disp(['The null hypothesis can be rejected with a pValue of ', ... num2str(pValue)]) disp('The data shows strong evidence of stationarity') end ```
```Warning: Test statistic #1 below tabulated critical values: minimum p-value = 0.001 reported. The null hypothesis can be rejected with a pValue of 0.001 The data shows strong evidence of stationarity ```

Test for ARCH Effects (Residual Autocorrelation)

Ljung-Box Q-test test for residual autocorrelation H0: series of residuals exhibits no autocorrelation H1: series of residuals exhibits some autocorrelation

Note, the example here uses the LBQTEST on the squared residuals. An alternative may be archtest, the Engle test for residual heteroscedasticity, for which you have to determine a suitable number of lags to draw valid inferences from it.

```[H,pValue] = lbqtest(diff(F)); if (H == false) disp(['The null hypothesis cannot be rejected with a pValue of ', ... num2str(pValue)]) disp('The data does not show enough evidence of autocorrelation') else disp(['The null hypothesis can be rejected with a pValue of ', ... num2str(pValue)]) disp('The data shows strong evidence of autocorrelation') end ```
```The null hypothesis can be rejected with a pValue of 0 The data shows strong evidence of autocorrelation ```

Identifying a suitable model

Now that we have verified to have a stationary, autocorrelated time series, we still have to identify a suitable model structure.

By computing the ACF and PACF of the stationary time series and of its squares, one can identify the orders of a first candidate model. The following rules of thumb may be useful:

• If the PACF of the series displays a sharp cutoff and/or the lag-1 autocorrelation is positive, i.e. if the series appears slightly "underdifferenced", then consider adding an AR term to the model. The lag at which the PACF cuts off is the indicated number of AR terms.
• If the ACF of the series displays a sharp cutoff and/or the lag-1 autocorrelation is negative, i.e. if the series appears slightly "overdifferenced", then consider adding an MA term to the model. The lag at which the ACF cuts off is the indicated number of MA terms.
• It is possible for an AR term and an MA term to cancel each other's effects, so if a mixed AR-MA model seems to fit the data, also try a model with one fewer AR term and one fewer MA term.
```figure subplot(2,2,1) autocorr(diff(F),12); subplot(2,2,2) parcorr(diff(F),12); subplot(2,2,3) autocorr(diff(F).^2,12); subplot(2,2,4) parcorr(diff(F).^2,12); % There is difficulty looking for ARIMA/GARCH effects, lets consider adding % a seasonal filter and look at the underlying trend ```

Seasonality Filter

Do the same process again, adding a 12 pt seasonal filter

```%12 pt seasonal filter pts = 12; win = ones(pts,1)./pts; F2 = convn(F,win,'same'); %applying filter F3 = F2(6:end-6); %removing moving avg window edges figure plot(D,F) hold on plot(D,F2,'r') dynamicDateTicks ylabel('Demand') xlabel('Dates') title('Historical Monthly Demand for NSW') legend('Original','Filtered Data') ```

ARIMA/GARCH effects in filtered (de seaonsonlised data)

Let's now repeat the process and look for ARIMA/GARCH effects

```% Show There is evidence of GARCH effects [H,pValue] = adftest(diff(F3)); if (H == false) disp(['The null hypothesis cannot be rejected with a pValue of ', ... num2str(pValue)]) disp('The data does not show enough evidence of stationarity') else disp(['The null hypothesis can be rejected with a pValue of ', ... num2str(pValue)]) disp('The data shows strong evidence of stationarity') end [H,pValue] = lbqtest(diff(F3)); if (H == false) disp(['The null hypothesis cannot be rejected with a pValue of ', ... num2str(pValue)]) disp('The data does not show enough evidence of autocorrelation') else disp(['The null hypothesis can be rejected with a pValue of ', ... num2str(pValue)]) disp('The data shows strong evidence of autocorrelation') end % taking 1st difference for 1 integrated term (I) figure subplot(2,2,1) autocorr(diff(F3),12); %AR term difficult to determine, lets use 1 for now subplot(2,2,2) parcorr(diff(F3),12); %shows MA term of 1 subplot(2,2,3) autocorr(diff(F3).^2,12); % shows P Garch term of lag 1 subplot(2,2,4) parcorr(diff(F3).^2,12); % Shows Q Garch term of lag 1 % It shows there is an ARIMA model of at least 1,1,1 % It shows there is a GARCH model of 1,1 ```
```Warning: Test statistic #1 below tabulated critical values: minimum p-value = 0.001 reported. The null hypothesis can be rejected with a pValue of 0.001 The data shows strong evidence of stationarity The null hypothesis can be rejected with a pValue of 0 The data shows strong evidence of autocorrelation ```

Build and forecast ARIMA/GARCH model

```garchMdl = garch(1,1); % garchMdl = garch(0,1); %show even with Garch terms, not much difference % in error mdl2 = arima('D',1,'Seasonality',12,'MALags',1,'SMALags',1,'Variance',garchMdl); %there is a montly trend fit = estimate(mdl2,Fin); [yf,ymse] = forecast(fit,12,'Y0',Fin); [ysim] = simulate(fit,12,'Y0',Fin,'NumPaths',5) figure plot(D,F) hold on plot(Dateout,Fout,'m') plot(Dateout,yf,'r'); plot(Dateout,yf+1.96*sqrt(ymse),'r:') plot(Dateout,yf-1.96*sqrt(ymse),'r:') % plot(Dateout,ysim) %show montecarl simulations if desired dynamicDateTicks % Error calculation looking 1 year forward error = yf - Fout; errorpct =abs(error)./Fout*100; MAE = mean(abs(error)); MAPE = mean(errorpct(~isinf(errorpct))); title(['Long Term Energy Forecast NSW - Mean Error ',num2str(MAPE),'%']) xlabel('Date') ylabel('Total Monthly Demand') legend('Historical Demand','Actual Demand','Predicted Demand','95% conf bound','95% conf bound') dynamicDateTicks % observations - model captures seasonal aspect well % misses overall declining trend ```
``` ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(1): --------------------------------------------------------------- Conditional Probability Distribution: Gaussian Standard t Parameter Value Error Statistic ----------- ----------- ------------ ----------- Constant -3205.9 15601.5 -0.205486 MA{1} -0.272396 0 0 SMA{1} -0.272396 0.0576038 -4.72878 GARCH(1,1) Conditional Variance Model: ---------------------------------------- Conditional Probability Distribution: Gaussian Standard t Parameter Value Error Statistic ----------- ----------- ------------ ----------- Constant 7.55219e+09 0.00220816 3.42013e+12 GARCH{1} 0.913404 0.0139327 65.5582 ARCH{1} 0.0324951 0.0143506 2.26436 ysim = 1.0e+07 * 1.2257 1.2355 1.2986 1.2741 1.2038 1.2514 1.2091 1.2439 1.2700 1.2224 1.1931 1.2169 1.2229 1.2539 1.2254 0.9624 1.0561 1.0654 1.0634 0.9060 1.0283 0.9893 1.0229 1.0616 0.9764 1.0087 0.9854 1.0741 1.0551 0.9508 1.0891 1.0082 1.0156 1.0882 1.0277 1.2056 1.1002 1.1298 1.2153 1.1230 1.0377 0.9424 0.9912 1.0334 0.9233 1.0990 1.0681 1.0955 1.1184 1.0623 1.1052 0.9738 1.0214 1.1046 0.9750 1.1331 1.0843 1.1171 1.2382 1.0679 ```

Do the same again, this time stop just before the downturn in energy

```% Split data into training and test data set. F2 = F(1:128); D2 = D(1:128); Fin = F2(1:end-12); Fout = F2(end-11:end); Datein = D2(1:end-12); Dateout = D2(end-11:end); % garchMdl = garch(1,1); % need to do different Garch model as Garch(1,1) errors out garchMdl = garch(0,1); % in error mdl2 = arima('D',1,'Seasonality',12,'MALags',1,'SMALags',1,'Variance',garchMdl); %there is a montly trend fit = estimate(mdl2,Fin); [yf,ymse] = forecast(fit,12,'Y0',Fin); figure plot(D2,F2) hold on plot(Dateout,Fout,'m') plot(Dateout,yf,'r'); plot(Dateout,yf+1.96*sqrt(ymse),'r:') plot(Dateout,yf-1.96*sqrt(ymse),'r:') dynamicDateTicks % Error calculation looking 1 year forward error = yf - Fout; errorpct =abs(error)./Fout*100; MAE = mean(abs(error)); MAPE = mean(errorpct(~isinf(errorpct))); title(['Long Term Energy Forecast NSW - Mean Error ',num2str(MAPE),'%']) xlabel('Date') ylabel('Total Monthly Demand') legend('Historical Demand','Actual Demand','Predicted Demand','95% conf bound','95% conf bound') dynamicDateTicks % observations - model captures seasonal aspect well % misses overal declining trend and error goes up ```
```Warning: Rank deficient, rank = 1, tol = 1.348921e+01. Warning: Lower bound constraints are active; standard errors may be inaccurate. ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(1): --------------------------------------------------------------- Conditional Probability Distribution: Gaussian Standard t Parameter Value Error Statistic ----------- ----------- ------------ ----------- Constant 2817.66 14549.1 0.193665 MA{1} -0.388308 0.0550042 -7.05961 SMA{1} -0.388308 0.0550451 -7.05436 GARCH(0,0) Conditional Variance Model: ---------------------------------------- Conditional Probability Distribution: Gaussian Standard t Parameter Value Error Statistic ----------- ----------- ------------ ----------- Constant 1.33401e+11 0.0032421 4.11466e+13 ```

Backtesting results

```% Creating BackTest that: % 1. Rebuild's the model everymonth % 2. Forecasts ahead 12 months. % 3. Compares to actual results % 4. Averages the 12 month monthly error's steps = length([108:1:length(D)]); count = 1; h = waitbar(count/steps,'Please Wait') ; tic for i = [108:1:length(D)] waitbar( count/steps,h) Ttemp = F(1:i,:); Tdates = D(1:i,:); % Split data into training and test data set. Fin = Ttemp(1:end-12,:); Find = Tdates(1:end-12,:); Fout = Ttemp(end-11:end,:); Foutd = Tdates(end-11:end,:); garchMdl = garch(0,1); mdl3 = arima('D',1,'Seasonality',12,'MALags',1,'SMALags',1,'Variance',garchMdl); %there is a montly trend fit = estimate(mdl3,Fin,'display','off'); [yf,ymse] = forecast(fit,12,'Y0',Fin); error = yf - Fout; errorpct =abs(error)./Fout*100; MAEbtest(i,:) = mean(abs(error)); MAPEbtest(i,:) = mean(errorpct(~isinf(errorpct))); count = count+1; end toc close(h) ```
```Warning: Rank deficient, rank = 1, tol = 5.777667e+01. Warning: Rank deficient, rank = 1, tol = 5.549453e+01. Warning: Rank deficient, rank = 1, tol = 5.809548e+01. Warning: Rank deficient, rank = 1, tol = 5.885973e+01. Warning: Rank deficient, rank = 1, tol = 2.300511e+01. Warning: Rank deficient, rank = 1, tol = 5.948340e+01. Warning: Rank deficient, rank = 1, tol = 1.951760e+01. Warning: Rank deficient, rank = 1, tol = 6.046812e+01. Warning: Rank deficient, rank = 1, tol = 4.080914e+01. Warning: Rank deficient, rank = 1, tol = 6.158290e+01. Warning: Rank deficient, rank = 1, tol = 9.615351e+01. Warning: Rank deficient, rank = 1, tol = 6.190033e+01. Warning: Rank deficient, rank = 1, tol = 5.726661e+01. Warning: Rank deficient, rank = 1, tol = 6.194501e+01. Warning: Rank deficient, rank = 1, tol = 6.546999e+01. Warning: Rank deficient, rank = 1, tol = 5.112819e+01. Warning: Rank deficient, rank = 1, tol = 3.937122e+01. Warning: Rank deficient, rank = 1, tol = 4.876619e+01. Warning: Rank deficient, rank = 1, tol = 4.737291e+01. Warning: Rank deficient, rank = 1, tol = 5.888880e+01. Warning: Rank deficient, rank = 1, tol = 8.298769e+01. Warning: Rank deficient, rank = 1, tol = 6.774759e+01. Warning: Rank deficient, rank = 1, tol = 6.747853e+01. Warning: Rank deficient, rank = 1, tol = 6.811806e+01. Warning: Rank deficient, rank = 1, tol = 2.030700e+02. Warning: Rank deficient, rank = 1, tol = 7.843788e+01. Warning: Rank deficient, rank = 1, tol = 1.348921e+01. Warning: Rank deficient, rank = 1, tol = 3.050921e+01. Warning: Rank deficient, rank = 1, tol = 6.778595e+01. Warning: Rank deficient, rank = 1, tol = 1.247101e+02. Warning: Rank deficient, rank = 1, tol = 9.592531e+01. Warning: Rank deficient, rank = 1, tol = 4.407777e+01. Warning: Rank deficient, rank = 1, tol = 5.302340e+01. Warning: Rank deficient, rank = 1, tol = 8.969668e+01. Warning: Rank deficient, rank = 1, tol = 1.992132e+02. Warning: Rank deficient, rank = 1, tol = 1.915360e+02. Warning: Rank deficient, rank = 1, tol = 5.442397e+01. Warning: Rank deficient, rank = 1, tol = 2.843012e+02. Warning: Rank deficient, rank = 1, tol = 6.690010e+02. Warning: Rank deficient, rank = 1, tol = 3.140100e+02. Warning: Rank deficient, rank = 1, tol = 1.477979e+02. Warning: Rank deficient, rank = 1, tol = 2.769302e+01. Warning: Rank deficient, rank = 1, tol = 4.423201e+01. Warning: Rank deficient, rank = 1, tol = 1.604503e+02. Warning: Rank deficient, rank = 1, tol = 1.028534e+02. Warning: Rank deficient, rank = 1, tol = 7.439498e+01. Warning: Rank deficient, rank = 1, tol = 1.251831e+02. Warning: Rank deficient, rank = 1, tol = 9.861621e+01. Warning: Rank deficient, rank = 1, tol = 1.108615e+02. Warning: Rank deficient, rank = 1, tol = 8.482719e+01. Warning: Rank deficient, rank = 1, tol = 5.269826e+01. Warning: Rank deficient, rank = 1, tol = 1.005678e+02. Warning: Rank deficient, rank = 1, tol = 2.156968e+02. Warning: Rank deficient, rank = 1, tol = 4.760483e+02. Warning: Rank deficient, rank = 1, tol = 3.652522e+02. Warning: Rank deficient, rank = 1, tol = 1.875755e+02. Warning: Rank deficient, rank = 1, tol = 1.097258e+02. Warning: Rank deficient, rank = 1, tol = 1.900944e+02. Warning: Rank deficient, rank = 1, tol = 1.455059e+02. Warning: Rank deficient, rank = 1, tol = 6.520350e+01. Warning: Rank deficient, rank = 1, tol = 1.731317e+02. Warning: Rank deficient, rank = 1, tol = 2.088591e+02. Warning: Rank deficient, rank = 1, tol = 2.816248e+02. Warning: Rank deficient, rank = 1, tol = 1.922926e+02. Warning: Rank deficient, rank = 1, tol = 1.816824e+02. Warning: Rank deficient, rank = 1, tol = 1.484791e+02. Warning: Rank deficient, rank = 1, tol = 4.686741e+02. Warning: Rank deficient, rank = 1, tol = 5.389766e+02. Warning: Rank deficient, rank = 1, tol = 5.483263e+02. Warning: Rank deficient, rank = 1, tol = 5.631838e+02. Warning: Rank deficient, rank = 1, tol = 3.212220e+02. Warning: Rank deficient, rank = 1, tol = 4.718760e+02. Warning: Rank deficient, rank = 1, tol = 3.922035e+02. Warning: Rank deficient, rank = 1, tol = 6.800673e+02. Warning: Rank deficient, rank = 1, tol = 4.395927e+02. Warning: Rank deficient, rank = 1, tol = 8.096288e+02. Warning: Rank deficient, rank = 1, tol = 7.132119e+02. Warning: Rank deficient, rank = 1, tol = 7.740225e+02. Warning: Rank deficient, rank = 1, tol = 2.622484e+02. Warning: Rank deficient, rank = 1, tol = 1.783237e+02. Warning: Rank deficient, rank = 1, tol = 6.341032e+02. Warning: Rank deficient, rank = 1, tol = 3.138847e+02. Warning: Rank deficient, rank = 1, tol = 4.849516e+02. Warning: Rank deficient, rank = 1, tol = 7.822079e+02. Elapsed time is 139.340660 seconds. ```

Plotting Error

```I = find(MAPEbtest ~= 0); derror_btest = D(I); mape_btest = MAPEbtest(I); figure plot(derror_btest,mape_btest) dynamicDateTicks title(['Backtesting Results Mean Abs Perc Error Pct - ',num2str(mean(mape_btest)),'%']) ylabel('Percentage') xlabel('Date') ```