
The Linear Case

Suppose we run a linear regression of $y$ on $X$ producing a vector of coefficients, $b=(X'X)^{-1}X'y$. Now think about setting the explanatory variables to new values ($X^p$) and making a prediction under this new situation about the next value of the random variable $Y$, which we will call $Y^p$. $Y^p$ has $N$ elements, which is the number of predicted values (not the number of observations).

The prediction $Y^p$ has two forms of uncertainty associated with it: The first is that its expected value $E(Y^p)$ is estimated (with $\hat y^p=X^pb$) and thus depends on the estimated value $b$. This estimation uncertainty can be systematically reduced by increasing the sample size. The second is fundamental variability in the dependent variable around this expected value. Thus, even if we knew $\beta$ (as in the case where we had an infinite number of observations), and were thus able to observe $E(Y^p)=\mu=X^p\beta$, we would not expect that $\mu$ would provide perfect predictions.

In the linear case, we can think about the components of the variance in $Y^p$ simply by writing down the prediction as $Y^p = X^pb + \epsilon^p$ and calculating its variance:

\begin{displaymath}
\begin{array}{rcl}
V(Y^p) &=& V(X^pb) + V(\epsilon^p) \\
 &=& X^pV(b){X^p}' + \sigma^2I \\
 &=& \sigma^2X^p(X'X)^{-1}{X^p}' + \sigma^2I
\end{array}
\end{displaymath} (1)

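where the two terms on the right side of this equation correspond to the estimation uncertainty and the fundamental variability, respectively. The first line holds because the new disturbance $\epsilon^p$ is independent of $b$, so the variance of the sum is the sum of the variances. This is a simpler form of the proof in Johnston (1984: 199).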
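The decomposition in Equation 1 can be computed directly. The following is a minimal sketch, not a definitive implementation: the data are simulated, the names `beta_true`, `sigma_true`, `Xp`, and `s2` are assumptions introduced for illustration, and the residual-based estimate `s2` stands in for the unknown $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations, k regressors (including a constant).
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])   # assumed for illustration only
sigma_true = 1.5                         # assumed for illustration only
y = X @ beta_true + rng.normal(scale=sigma_true, size=n)

# Least squares coefficients b = (X'X)^{-1} X'y, via a linear solve.
XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ y)

# Residual-based estimate of sigma^2 (divides by n - k).
resid = y - X @ b
s2 = resid @ resid / (n - k)

# N = 2 new settings of the explanatory variables (hypothetical values).
Xp = np.array([[1.0, 0.0, 0.0],
               [1.0, 2.0, -1.0]])
N = Xp.shape[0]

# The two terms of Equation 1, with s2 in place of sigma^2.
V_estimation = s2 * Xp @ np.linalg.solve(XtX, Xp.T)   # X^p V(b) X^p'
V_fundamental = s2 * np.eye(N)                        # sigma^2 I
V_total = V_estimation + V_fundamental

y_hat_p = Xp @ b   # point predictions
```

With a sample of this size the estimation term is small relative to the fundamental term, which illustrates the point that only the first source of uncertainty shrinks as observations accumulate.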

To make the transition from the linear case to the more general case, we now write down and analyze the full likelihood model (although in the linear case, one can make less restrictive assumptions). Thus, we assume a Normal stochastic component:

\begin{displaymath}
Y \sim N(\mu,\sigma^2I)
\end{displaymath} (2)

and a linear systematic component, $\mu=X\beta$. The likelihood function is then:
\begin{displaymath}
L(\beta,\sigma^2\vert y) = \prod_{i=1}^n N(y_i\vert X_i\beta,\sigma^2)
\end{displaymath} (3)

Under a properly specified likelihood model, such as this one, the maximum likelihood estimator has an asymptotically Normal sampling distribution (see King, 1989, for a review of likelihood models and methods). In this case, the estimator $b$ of $\beta$ is distributed as follows:

\begin{displaymath}
\begin{array}{rcl}
b &\sim& N(\beta,V(b)) \\
 &=& N\left(\beta,\sigma^2(X'X)^{-1}\right)
\end{array}
\end{displaymath} (4)
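The sampling distribution in Equation 4 can be checked by simulation. The sketch below is illustrative only: it holds a hypothetical design matrix `X` fixed, assumes values for `beta` and `sigma`, redraws the data many times, and compares the empirical mean and covariance of the re-estimated coefficients with $\beta$ and $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: fixed X, with beta and sigma assumed for illustration.
n, k, reps = 100, 2, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, -1.0])
sigma = 1.0

XtX = X.T @ X
V_b = sigma**2 * np.linalg.inv(XtX)      # theoretical V(b) from Equation 4

# Draw `reps` data sets with the same X and re-estimate b on each one.
eps = rng.normal(scale=sigma, size=(reps, n))
Y = X @ beta + eps                        # shape (reps, n)
B = np.linalg.solve(XtX, X.T @ Y.T).T     # shape (reps, k), one b per row

b_mean = B.mean(axis=0)                   # should be close to beta
b_cov = np.cov(B, rowvar=False)           # should be close to V_b
```

Comparing `b_cov` with `V_b` entry by entry shows how closely the simulated estimates follow the stated covariance; the agreement tightens as `reps` grows.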

We can now specify the distribution of the two quantities of interest in this analysis, $\hat Y^p$ and $Y^p$.
\begin{displaymath}
\hat Y^p \sim N(X^p\beta,\thinspace X^pV(b){X^p}')
\end{displaymath} (5)

and
\begin{displaymath}
Y^p\vert\beta \sim N(X^p\beta,\thinspace \sigma^2I)
\end{displaymath} (6)

Since these distributions, representing estimation variability and fundamental variability respectively, are independent, we can combine them to derive the unconditional distribution for $Y^p$.

\begin{displaymath}
Y^p \sim N(X^p\beta,\thinspace X^pV(b){X^p}'+\sigma^2I)
\end{displaymath} (7)
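The way the two independent sources combine can also be seen by simulation: first draw a coefficient vector from its sampling distribution, then draw an outcome around the implied expected value. This is a minimal sketch under stated assumptions. The values of `b`, `V_b`, `sigma2`, and `x_p` are hypothetical, and the estimate `b` stands in for the unknown $\beta$ as the center of the draws.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical estimates for a two-coefficient model (illustration only).
b = np.array([1.0, 2.0])
V_b = np.array([[0.04, 0.01],
                [0.01, 0.09]])
sigma2 = 0.5
x_p = np.array([1.0, 3.0])     # one row of X^p

M = 200_000

# Step 1: estimation variability. Draw coefficients, form expected values.
b_draws = rng.multivariate_normal(b, V_b, size=M)    # shape (M, 2)
mu_draws = b_draws @ x_p

# Step 2: fundamental variability. Add an independent Normal disturbance.
y_draws = mu_draws + rng.normal(scale=np.sqrt(sigma2), size=M)

# Analytic counterparts for this single prediction.
var_estimation = x_p @ V_b @ x_p          # x^p V(b) x^p'
var_total = var_estimation + sigma2       # adds sigma^2
```

The sample variance of `mu_draws` approximates the estimation term alone, while the sample variance of `y_draws` approximates the total, so the difference between them is approximately `sigma2`.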

If we are really interested in knowing where the expected value is, then we would focus on the estimation variability in Equation 5. At other times we might be interested only in the variability around the expected value, that is, variability in the world beyond that due to estimation; this can be found in the fundamental variability of Equation 6. However, in most cases, we are interested in predicting where $Y^p$ is likely to be on the basis of some specified explanatory variables $X^p$; the relevant uncertainty is then the total variability represented in Equation 7.
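To make the distinction concrete, the sketch below builds two 95% intervals around the same point prediction: one for the expected value, using only the estimation variance, and one for the predicted value, using the total variance. The numbers are hypothetical and chosen only for illustration; `interval` is a helper defined here, not part of any library.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical quantities for a single prediction (illustration only).
y_hat_p = 7.0           # point prediction x^p b
var_estimation = 0.91   # x^p V(b) x^p'
sigma2 = 0.5            # fundamental variance

z = NormalDist().inv_cdf(0.975)   # two-sided 95% Normal quantile

def interval(center, variance):
    """Return a symmetric Normal interval (lower, upper) around `center`."""
    half_width = z * np.sqrt(variance)
    return center - half_width, center + half_width

# Interval for the expected value: estimation variability only.
ci_expected = interval(y_hat_p, var_estimation)

# Interval for the predicted value: estimation plus fundamental variability.
pi_prediction = interval(y_hat_p, var_estimation + sigma2)
```

The interval for the predicted value is always at least as wide as the one for the expected value, and it does not shrink to zero width even when the estimation variance does.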


Gary King 2005-03-28