Suppose we run a linear regression of $y$ on $X$, producing a vector of
coefficients, $b$. Now think about setting the explanatory variables to new
values ($X_p$) and making a prediction under this new situation about the next
value of the random variable $Y$, which we will call $Y_p$. $Y_p$ has one
element for each predicted value (not for each observation).
The prediction $Y_p$ has two forms of uncertainty associated with it. The
first is that its expected value $E(Y_p)$ is estimated (with $X_p b$) and thus
depends on the estimated value $b$. This estimation uncertainty can be
systematically reduced by increasing the sample size. The second is
fundamental variability in the dependent variable $Y_p$ around this expected
value. Thus, even if we knew the true coefficients $\beta$ (as in the case
where we had an infinite number of observations), and were thus able to
observe $E(Y_p) = X_p \beta$, we would not expect that $X_p \beta$ would
provide perfect predictions.
In the linear case, we can think about the components of the variance in
$Y_p$ simply by writing down the prediction as $Y_p = X_p b + \epsilon_p$ and
calculating its variance:

$$V(Y_p) = V(X_p b) + V(\epsilon_p) = X_p V(b) X_p' + \sigma^2 I \qquad (1)$$
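The decomposition in Equation 1 can be computed directly. The following is a minimal sketch, not a definitive implementation: it assumes least squares with a known error variance, so that $V(b) = \sigma^2 (X'X)^{-1}$, and the function name `prediction_variance`, the simulated design matrix, and the new values in `X_p` are all hypothetical choices made for illustration.

```python
import numpy as np

def prediction_variance(X, X_p, sigma2):
    """Split V(Y_p) into estimation and fundamental parts (sigma2 treated as known)."""
    V_b = sigma2 * np.linalg.inv(X.T @ X)        # V(b) for least squares (assumption)
    estimation = X_p @ V_b @ X_p.T               # X_p V(b) X_p'
    fundamental = sigma2 * np.eye(X_p.shape[0])  # sigma^2 I
    return estimation, fundamental

rng = np.random.default_rng(0)
X_p = np.array([[1.0, 0.5], [1.0, -1.0]])  # two hypothetical new settings

diag_est = {}
for n in (50, 5000):
    # Simulated design: an intercept and one standard Normal regressor
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    est, fund = prediction_variance(X, X_p, sigma2=4.0)
    total = est + fund                     # V(Y_p) as in Equation 1
    diag_est[n] = np.diag(est)
    print(n, np.diag(est).round(4), np.diag(total).round(4))
```

Under these assumptions, the estimation part shrinks as the sample grows, while the fundamental part stays fixed at $\sigma^2$ for every predicted value.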
To make the transition from the linear case to the more general case, we write
down and then analyze the full likelihood model (although in the linear
case, one can make less restrictive assumptions). Thus, we assume a Normal
stochastic component:

$$Y_p \sim N(\mu_p, \sigma^2 I) \qquad (2)$$

$$\mu_p = X_p \beta \qquad (3)$$
Under a properly specified likelihood model, such as this, the maximum
likelihood estimate has an asymptotic Normal sampling distribution (see King,
1989, for a review of likelihood models and methods). In this
case, the estimator $b$ of $\beta$ is distributed as follows:

$$b \sim N(\beta, V(b)) \qquad (4)$$
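The sampling distribution in Equation 4 can be checked by simulation. This is a rough sketch under stated assumptions: the true coefficients `beta`, the error standard deviation `sigma`, and the design matrix are invented for illustration, the design is held fixed across replications, and $V(b) = \sigma^2 (X'X)^{-1}$ is the least squares form with $\sigma$ known.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200, 2.0
beta = np.array([1.0, -0.5])                           # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design fixed across replications

# Theoretical covariance of b when sigma is known (assumption)
V_b = sigma**2 * np.linalg.inv(X.T @ X)

# Re-draw y many times and re-estimate b on each replication
reps = 5000
Y = X @ beta + sigma * rng.normal(size=(reps, n))      # one row per replication
B = np.linalg.lstsq(X, Y.T, rcond=None)[0].T           # shape (reps, 2)

emp_mean = B.mean(axis=0)           # should be close to beta
emp_cov = np.cov(B, rowvar=False)   # should be close to V_b
print(emp_mean.round(3), np.diag(emp_cov).round(4), np.diag(V_b).round(4))
```

The empirical mean and covariance of the replicated estimates should sit close to $\beta$ and $V(b)$, which is what Equation 4 describes.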
Since these distributions, representing estimation variability and fundamental
variability respectively, are independent, we can combine them to derive the
unconditional distribution for $Y_p$.
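One way to see how the two sources combine is to simulate them in sequence. This is an illustrative sketch rather than a prescribed procedure: it draws coefficients from a Normal distribution centered on the estimate $b$ with covariance $V(b)$ (estimation variability), then adds a Normal draw with variance $\sigma^2$ (fundamental variability). The data, the new values `x_p`, and the treatment of $\sigma$ as known are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 1.5
beta = np.array([2.0, 0.7])                            # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta + sigma * rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]               # least squares estimate
V_b = sigma**2 * np.linalg.inv(X.T @ X)                # sigma treated as known (assumption)

x_p = np.array([1.0, 2.0])                             # hypothetical new values
sims = 200_000
b_draws = rng.multivariate_normal(b, V_b, size=sims)   # estimation variability
mu_draws = b_draws @ x_p                               # simulated expected values
y_draws = mu_draws + sigma * rng.normal(size=sims)     # add fundamental variability

analytic_var = x_p @ V_b @ x_p + sigma**2              # x_p V(b) x_p' + sigma^2
print(round(mu_draws.var(), 4), round(y_draws.var(), 4), round(analytic_var, 4))
```

The variance of `mu_draws` reflects estimation variability alone, while the variance of `y_draws` should be close to the sum of the two components.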
If we are really interested in knowing where the expected value is, then we
would focus on the estimation variability in Equation 4. At other
times we might be interested only in the variability around the expected
value, that is, variability in the world beyond that due to
estimation; this can be found in the fundamental variability of Equation
2. However, in most cases, we are interested in making predictions
of where $Y_p$ is likely to be on the basis of some specified explanatory
variables $X_p$, as can be found in the total variability represented in
Equation 1.