Forecasting is predicting $X_{T+h}$ given the available information $X_1, \dots, X_T$, for horizons $h \geq 1$
We have several sources of error, like parameter uncertainty, model uncertainty, and ignorance about future errors
Our 1-step ahead forecast is defined as a function of the observed sample, $\hat{X}_{T+1|T} = g(X_1, \dots, X_T)$
How do we get an optimal forecast? In practice, we minimize the mean squared error $\mathbb{E}\big[(X_{T+1} - \hat{X}_{T+1|T})^2\big]$
We propose this will look like $\hat{X}_{T+1|T} = \mathbb{E}[X_{T+1} \mid X_1, \dots, X_T]$
We can do some algebra with expectations to show that $\mathbb{E}\big[(X_{T+1} - \mathbb{E}[X_{T+1} \mid X_1, \dots, X_T])^2\big]$ is a lower bound for any forecaster (with regards to MSE)
The same result holds for the $h$-step ahead forecast for general $h \geq 1$, which is $\hat{X}_{T+h|T} = \mathbb{E}[X_{T+h} \mid X_1, \dots, X_T]$
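To see why the conditional expectation attains this bound, take any forecaster $g = g(X_1, \dots, X_T)$ and decompose the MSE:

$$
\mathbb{E}\big[(X_{T+h} - g)^2\big]
= \mathbb{E}\big[(X_{T+h} - \hat{X}_{T+h|T})^2\big] + \mathbb{E}\big[(\hat{X}_{T+h|T} - g)^2\big]
\;\geq\; \mathbb{E}\big[(X_{T+h} - \hat{X}_{T+h|T})^2\big]
$$

since the cross term $\mathbb{E}\big[(X_{T+h} - \hat{X}_{T+h|T})(\hat{X}_{T+h|T} - g)\big]$ vanishes once we condition on $X_1, \dots, X_T$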
Example:
For an AR(1) process $X_t = \phi X_{t-1} + \varepsilon_t$, we get $\hat{X}_{T+h|T} = \phi^h X_T$
The forecast error is $X_{T+h} - \hat{X}_{T+h|T} = \sum_{j=0}^{h-1} \phi^j \varepsilon_{T+h-j}$; if $\varepsilon_t$ is normal then our full error term is also normally distributed, which means we can build confidence intervals
With an intercept, $X_t = \alpha + \phi X_{t-1} + \varepsilon_t$, our forecasts look like $\hat{X}_{T+h|T} = \alpha(1 + \phi + \cdots + \phi^{h-1}) + \phi^h X_T$
For the forecast error variance, we can derive $\operatorname{Var}\big(X_{T+h} - \hat{X}_{T+h|T}\big) = \sigma^2 \sum_{j=0}^{h-1} \phi^{2j}$
Example:
Let $X_t$ follow $X_t = \alpha + \phi X_{t-1} + \varepsilon_t$; as $h \to \infty$, the forecast $\hat{X}_{T+h|T}$ converges to the unconditional mean $\alpha/(1-\phi)$ and the forecast error variance converges to $\sigma^2/(1-\phi^2)$
(assuming $|\phi| < 1$)
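As a rough numerical sketch of these AR(1) formulas (the parameter values, seed, and horizons below are invented for illustration):

```python
import numpy as np

# Illustrative AR(1) with intercept: X_t = alpha + phi * X_{t-1} + eps_t
alpha, phi, sigma = 1.0, 0.6, 0.5            # assumed values, not from the notes
rng = np.random.default_rng(0)

# Simulate a sample path of length T, starting at the unconditional mean
T = 200
X = np.empty(T)
X[0] = alpha / (1 - phi)
for t in range(1, T):
    X[t] = alpha + phi * X[t - 1] + sigma * rng.standard_normal()

# h-step ahead forecasts and 95% prediction intervals made at time T
for h in (1, 2, 5, 10, 50):
    # E[X_{T+h} | X_1..X_T] = alpha*(1 + phi + ... + phi^{h-1}) + phi^h * X_T
    mean_h = alpha * (1 - phi**h) / (1 - phi) + phi**h * X[-1]
    # Var of the forecast error = sigma^2 * (1 + phi^2 + ... + phi^{2(h-1)})
    var_h = sigma**2 * (1 - phi**(2 * h)) / (1 - phi**2)
    print(h, mean_h, mean_h - 1.96 * var_h**0.5, mean_h + 1.96 * var_h**0.5)

# As h grows, the forecast approaches alpha/(1-phi) and the error variance
# approaches sigma^2/(1-phi^2), matching the limits above
print(alpha / (1 - phi), sigma**2 / (1 - phi**2))
```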
Definition: The least squares estimator of a parameter minimizes the sum of squared residuals
Suppose we have a sample of size $T$ from an AR($p$) process,
$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t$, with $\mathbb{E}[\varepsilon_t] = 0$ and $\operatorname{Var}(\varepsilon_t) = \sigma^2$
In matrix form, we can write $\mathbf{y} = \mathbf{Z}\boldsymbol{\phi} + \boldsymbol{\varepsilon}$, where row $t$ of $\mathbf{Z}$ holds the lags $(X_{t-1}, \dots, X_{t-p})$, so that $\hat{\boldsymbol{\phi}}_{LS} = (\mathbf{Z}^\top \mathbf{Z})^{-1} \mathbf{Z}^\top \mathbf{y}$
For $p = 1$, this reduces to $\hat{\phi} = \dfrac{\sum_{t=2}^{T} X_t X_{t-1}}{\sum_{t=2}^{T} X_{t-1}^2}$
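A sketch of the matrix least squares formula in NumPy; the AR(2) coefficients and sample size are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an illustrative AR(2): X_t = 0.5 X_{t-1} + 0.2 X_{t-2} + eps_t
T = 500
X = np.zeros(T)
for t in range(2, T):
    X[t] = 0.5 * X[t - 1] + 0.2 * X[t - 2] + rng.standard_normal()

# Stack the regression X_t = phi_1 X_{t-1} + phi_2 X_{t-2} + eps_t in matrix form
p = 2
y = X[p:]                                                        # responses
Z = np.column_stack([X[p - j - 1:T - j - 1] for j in range(p)])  # lagged columns
phi_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)                      # (Z'Z)^{-1} Z'y
print(phi_hat)   # should land near [0.5, 0.2]
```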
With some reasonable assumptions, it can be shown that $\sqrt{T}\,(\hat{\boldsymbol{\phi}}_{LS} - \boldsymbol{\phi}) \xrightarrow{d} N\big(0, \sigma^2 \Gamma_p^{-1}\big)$ as $T \to \infty$, with asymptotic variance $\sigma^2 \Gamma_p^{-1}$, where $\Gamma_p$ is the autocovariance matrix of $(X_{t-1}, \dots, X_{t-p})$; the assumptions are
- $\varepsilon_t$ is i.i.d. with mean $0$ and variance $\sigma^2$
- $X_t$ is stationary
For $p = 1$, this reduces to $\sqrt{T}\,(\hat{\phi} - \phi) \xrightarrow{d} N(0, 1 - \phi^2)$
Normality of the errors is not needed for this result, which means this is a robust estimator!
For large $T$, we have, approximately, $\hat{\boldsymbol{\phi}}_{LS} \sim N\big(\boldsymbol{\phi},\, \hat{\sigma}^2 (\mathbf{Z}^\top \mathbf{Z})^{-1}\big)$
where $\hat{\sigma}^2 = \frac{1}{T - p}\sum_t \hat{\varepsilon}_t^{\,2}$ and $\hat{\varepsilon}_t = X_t - \hat{\phi}_1 X_{t-1} - \cdots - \hat{\phi}_p X_{t-p}$
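A quick Monte Carlo sketch of the $p = 1$ result (all settings invented); the errors are deliberately non-normal to illustrate that only the i.i.d. assumptions above are used:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, T, n_rep = 0.7, 500, 2000
estimates = np.empty(n_rep)

for r in range(n_rep):
    # AR(1) driven by uniform errors with mean 0 and variance 1
    eps = rng.uniform(-np.sqrt(3), np.sqrt(3), size=T)
    X = np.zeros(T)
    for t in range(1, T):
        X[t] = phi * X[t - 1] + eps[t]
    # Least squares estimate: sum X_t X_{t-1} / sum X_{t-1}^2
    estimates[r] = (X[1:] @ X[:-1]) / (X[:-1] @ X[:-1])

print("empirical variance of sqrt(T)(phi_hat - phi):", T * estimates.var())
print("theoretical 1 - phi^2:", 1 - phi**2)
```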
Definition: For a sample $X_1, \dots, X_T$ with joint density $f_{\theta}(x_1, \dots, x_T)$, the maximum likelihood estimator is $\hat{\theta}_{ML} = \arg\max_{\theta} f_{\theta}(X_1, \dots, X_T)$
In practice, we maximize the log likelihood $\ell(\theta) = \log f_{\theta}(X_1, \dots, X_T) = \log f_{\theta}(X_1) + \sum_{t=2}^{T} \log f_{\theta}(X_t \mid X_{t-1}, \dots, X_1)$
This writes a $T$-dimensional joint density as a sum of univariate (conditional) log densities
Example:
Now take $X_1, \dots, X_T$ i.i.d. $N(\mu, \sigma^2)$, so that $\ell(\mu, \sigma^2) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(X_t - \mu)^2$
We then derive the estimator by setting the derivative to $0$: $\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{t=1}^{T}(X_t - \mu) = 0$ gives $\hat{\mu}_{ML} = \bar{X}$
For any $\sigma^2$, the value of $\mu$ that maximizes the above function is the one that minimizes $\sum_{t=1}^{T}(X_t - \mu)^2$. So effectively, we've reached the LS estimator!
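A small numerical check of this point, using scipy to maximize the Gaussian log likelihood over $\mu$ with $\sigma^2$ held fixed (the sample below is invented):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=100)   # illustrative i.i.d. N(2, 1.5^2) sample

def neg_log_lik(mu, sigma2=1.0):
    # Negative Gaussian log likelihood as a function of mu; sigma2 fixed arbitrarily
    return len(x) / 2 * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

res = minimize_scalar(neg_log_lik)
print("ML estimate of mu :", res.x)
print("LS estimate (mean):", x.mean())          # agrees up to numerical tolerance
```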
For ARMA models, we use a similar approach to derive an ML estimator
Example:
Consider an AR(1), $X_t = \phi X_{t-1} + \varepsilon_t$, with $\varepsilon_t$ i.i.d. $N(0, \sigma^2)$
Conditioning on $X_1$, we get $\ell(\phi, \sigma^2) = -\frac{T-1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(X_t - \phi X_{t-1})^2$
Again, the ML estimate of $\phi$ equals the LS estimate, essentially because the $\varepsilon_t$ are assumed to be normal
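A sketch of this AR(1) fit: maximize the conditional log likelihood numerically and compare with least squares (simulated data with an invented $\phi = 0.6$):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Simulate an AR(1) with illustrative phi = 0.6 and sigma = 1
T, phi_true = 400, 0.6
X = np.zeros(T)
for t in range(1, T):
    X[t] = phi_true * X[t - 1] + rng.standard_normal()

def neg_cond_log_lik(params):
    # Negative Gaussian log likelihood of X_2, ..., X_T conditional on X_1
    phi, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)                  # keeps sigma^2 positive
    resid = X[1:] - phi * X[:-1]
    return 0.5 * (T - 1) * np.log(2 * np.pi * sigma2) + resid @ resid / (2 * sigma2)

ml = minimize(neg_cond_log_lik, x0=[0.0, 0.0])
ls = (X[1:] @ X[:-1]) / (X[:-1] @ X[:-1])
print("ML phi:", ml.x[0], "  LS phi:", ls)       # agree up to optimizer tolerance
```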
Example:
Consider an MA(1), $X_t = \varepsilon_t + \theta \varepsilon_{t-1}$, with $\varepsilon_t$ i.i.d. $N(0, \sigma^2)$
Assume $\varepsilon_0 = 0$
Then the innovations can be recovered recursively: $\varepsilon_1 = X_1$ for $t = 1$ and $\varepsilon_t(\theta) = X_t - \theta\, \varepsilon_{t-1}(\theta)$ for $t = 2, \dots, T$
So we get the conditional log likelihood $\ell(\theta, \sigma^2) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t(\theta)^2$
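A sketch of this recursion in code: for fixed $\sigma^2$ (and after profiling out $\sigma^2$), maximizing the conditional likelihood in $\theta$ amounts to minimizing the sum of squared recursive residuals. The true $\theta = 0.5$ below is invented:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)

# Simulate an illustrative MA(1): X_t = eps_t + theta * eps_{t-1}
T, theta_true = 500, 0.5
eps = rng.standard_normal(T + 1)
X = eps[1:] + theta_true * eps[:-1]

def sum_sq_resid(theta):
    # Recover eps_t(theta) = X_t - theta * eps_{t-1}(theta) recursively, with eps_0 = 0
    e_prev, total = 0.0, 0.0
    for x in X:
        e = x - theta * e_prev
        total += e ** 2
        e_prev = e
    return total

res = minimize_scalar(sum_sq_resid, bounds=(-0.99, 0.99), method="bounded")
print("conditional ML estimate of theta:", res.x)   # close to 0.5
```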
Model Selection
How do we select the order $p$, i.e. the number of parameters?
If we select $p$ too large then we risk higher variance for our model
If we select $p$ too small then our model cannot possibly capture all the data's nuances
We can strike a balance by minimizing an information criterion, $\mathrm{IC}(p) = \log\big(\mathrm{SSR}(p)/T\big) + p\, C_T / T$, where
- $\mathrm{SSR}(p)$ is the sum of squared residuals based on $p$ parameters
- For an AR($p$) with intercept, the residuals are $\hat{\varepsilon}_t = X_t - \hat{\alpha} - \hat{\phi}_1 X_{t-1} - \cdots - \hat{\phi}_p X_{t-p}$
- $C_T = 2$ gives the Akaike information criterion (AIC) and $C_T = \log T$ gives the Bayesian information criterion (BIC)
We can use this by choosing a maximum order $p_{\max}$ and testing each model $p = 0, 1, \dots, p_{\max}$
These can also be computed with the log likelihood: $\mathrm{AIC} = -2\ell(\hat{\theta}) + 2k$ and $\mathrm{BIC} = -2\ell(\hat{\theta}) + k\log T$, where $k$ is the number of estimated parameters
As the sample size goes to infinity, BIC correctly estimates the true order (it is consistent), but AIC can outperform it in finite samples
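A sketch of order selection by AIC/BIC using the SSR-based criterion above; the AR(2) data-generating process and settings are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate an illustrative AR(2): X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + eps_t
T = 500
X = np.zeros(T)
for t in range(2, T):
    X[t] = 0.5 * X[t - 1] - 0.3 * X[t - 2] + rng.standard_normal()

def ssr_ar(p):
    """Sum of squared residuals from a least squares AR(p) fit (no intercept)."""
    if p == 0:
        return float(np.sum(X ** 2))
    y = X[p:]
    Z = np.column_stack([X[p - j - 1:T - j - 1] for j in range(p)])
    phi = np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.sum((y - Z @ phi) ** 2))

p_max = 8
for name, C_T in [("AIC", 2.0), ("BIC", np.log(T))]:
    ic = [np.log(ssr_ar(p) / T) + p * C_T / T for p in range(p_max + 1)]
    print(name, "selects p =", int(np.argmin(ic)))
```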