Calculate the test set mean squared error (MSE) of a direct forecasting model.
Load the sample file TemperatureData.csv, which contains average daily temperatures from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.
Year Month Day TemperatureF
____ ___________ ___ ____________
2015 {'January'} 1 23
2015 {'January'} 2 31
2015 {'January'} 3 25
2015 {'January'} 4 39
2015 {'January'} 5 29
2015 {'January'} 6 12
2015 {'January'} 7 10
2015 {'January'} 8 4
For this example, use a subset of the temperature data that omits the first 100 observations.
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl. Then, use t to convert Tbl into a timetable.
Plot the temperature values in Tbl over time.
Partition the temperature data into training and test sets by using tspartition. Reserve 20% of the observations for testing.
Create a full direct forecasting model by using the data in trainingTbl. Train the model using a decision tree learner. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.
Mdl =
DirectForecaster
Horizon: 1
ResponseLags: [1 2 3 4 5 6 7]
LeadingPredictors: [1 2 3]
LeadingPredictorLags: {[0 1] [0 1] [0 1 2 3 4 5 6 7]}
ResponseName: 'TemperatureF'
PredictorNames: {'Year' 'Month' 'Day'}
CategoricalPredictors: 2
Learners: {[1×1 classreg.learning.regr.CompactRegressionTree]}
MaxLag: 7
NumObservations: 372
Properties, Methods
Mdl is a DirectForecaster model object. By default, the horizon is one step ahead. That is, Mdl predicts a value that is one step into the future.
Calculate the test set MSE. Smaller MSE values indicate better performance.