
All Other Cases

To begin, I write down a general likelihood specification, with a stochastic component

\begin{displaymath}
Y \sim f(y\vert\theta)
\end{displaymath} (8)

and a (linear or nonlinear) systematic component, which for simplicity we assume is the expected value of the random variable:
\begin{displaymath}
E(Y) = \theta = g(X,\beta) = g(\beta)
\end{displaymath} (9)

The likelihood function is then written as usual
\begin{displaymath}
L(\beta\vert y) = \prod_{i=1}^n f(y_i\vert\theta)
\end{displaymath} (10)

The estimate of $E(Y)$ is the predicted value

\begin{displaymath}
\hat y^p = g(X^p,b) = g(b)
\end{displaymath} (11)

where $b$ is the ML estimator of $\beta$. As in the linear case, the explanatory variable matrix $X^p$ has one row for each predicted value, and so has dimensions $(N\times k)$.

Some examples of nonlinear functional forms include the exponential, logistic, or probit functions, respectively:

\begin{displaymath}
E(Y_1) = \exp(X\beta)
\end{displaymath} (12)


\begin{displaymath}
E(Y_2) = \frac{1}{1+\exp(-X\beta)}
\end{displaymath} (13)


\begin{displaymath}
E(Y_3) = \Phi(X\beta)
\end{displaymath} (14)

where $\Phi$ is the standard Normal cumulative distribution function.
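The three functional forms can be sketched in a few lines of Python. This is only an illustration: the function names, the covariate row `x_row`, and the coefficient values are hypothetical, and the probit mean is taken to be the standard Normal CDF evaluated at $X\beta$.

```python
import math

def linear_predictor(x_row, beta):
    """Inner product of one row of X with the coefficient vector beta."""
    return sum(x * b for x, b in zip(x_row, beta))

def exp_mean(x_row, beta):
    """Exponential functional form: E(Y) = exp(X beta)."""
    return math.exp(linear_predictor(x_row, beta))

def logistic_mean(x_row, beta):
    """Logistic functional form: E(Y) = 1 / (1 + exp(-X beta))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x_row, beta)))

def probit_mean(x_row, beta):
    """Probit functional form: E(Y) = Phi(X beta), the standard Normal CDF."""
    z = linear_predictor(x_row, beta)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs: a constant plus one covariate, chosen so X beta = 0.
x_row = [1.0, 2.0]
beta = [0.5, -0.25]

means = (exp_mean(x_row, beta), logistic_mean(x_row, beta), probit_mean(x_row, beta))
```

With a linear predictor of zero, the exponential form returns 1 and both the logistic and probit forms return 0.5, which is a quick sanity check on any implementation.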

The fundamental variability can be calculated, conditional on knowing $\beta$, by the usual methods from probability theory:

\begin{displaymath}
V(Y) = E(Y^2)-E(Y)^2
\end{displaymath} (15)

where
\begin{displaymath}
E(Y^2) = \int_{-\infty}^\infty y^2 f(y\vert\theta) dy
\end{displaymath} (16)

and
\begin{displaymath}
E(Y) = \int_{-\infty}^\infty y f(y\vert\theta) dy
\end{displaymath} (17)

In practice, one usually need not use Equations 15, 16, and 17, since the results for most popular distributions are widely available in books on probability theory. Thus, the fundamental variability is $\lambda$ in the Poisson distribution, $\lambda\sigma^2$ for the negative binomial (depending on parameterization), and $\sigma^2$ for the Normal.
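As a check on the Poisson result, the moment definitions can be evaluated numerically. The sketch below is illustrative only: because the Poisson is discrete, the integrals become sums, and the sums are truncated at a hypothetical upper limit (`upper=100`) that is ample for the small $\lambda$ used here. The function names and the value of `lam` are assumptions made for the example.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of y, computed in log space to avoid overflow."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def poisson_moments(lam, upper=100):
    """Return (E(Y), E(Y^2)) from sums truncated at `upper` terms."""
    e_y = sum(y * poisson_pmf(y, lam) for y in range(upper))
    e_y2 = sum(y * y * poisson_pmf(y, lam) for y in range(upper))
    return e_y, e_y2

lam = 3.0  # hypothetical Poisson mean
e_y, e_y2 = poisson_moments(lam)

# Fundamental variability: V(Y) = E(Y^2) - E(Y)^2
variance = e_y2 - e_y ** 2
```

Up to floating-point error, `variance` agrees with `lam`, matching the textbook result that the Poisson variance equals its mean.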

Calculating the estimation variability will usually require more effort, since these calculations are not as widely reported. Because the variance of a linear function of $b$ has a simple closed form, a general method is to replace the arbitrary nonlinear function $g$ with its linear approximation, the first-order Taylor series (an approach often called the delta method). The Taylor series approximation of $\hat y^p=g(b)$, expanded around $\beta$, is as follows:

\begin{displaymath}
\hat y^p = g(b) = g(\beta) + g'(\beta)(b-\beta) + \cdots
\end{displaymath} (18)

where $g'(\beta)$ is the first derivative of the functional form $g(\beta)$ (from Equation 9) with respect to $\beta$. If there are $k$ elements of $\beta$, and $N$ predicted values, $g(\beta)$ is $(N\times 1)$ and $g'(\beta)$ is $(N\times k)$.
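
To make these dimensions concrete, the derivatives for the exponential, logistic, and probit forms can be written row by row. The notation is an assumption added for illustration: $x_i$ denotes row $i$ of $X^p$ (a $1\times k$ vector), $\pi_i = 1/[1+\exp(-x_i\beta)]$ is the logistic mean for that row, the probit mean is taken to be $\Phi(x_i\beta)$ with $\Phi$ the standard Normal CDF, and $\phi$ is the standard Normal density.

\begin{displaymath}
\frac{\partial \exp(x_i\beta)}{\partial \beta'} = \exp(x_i\beta)\, x_i, \qquad
\frac{\partial \pi_i}{\partial \beta'} = \pi_i(1-\pi_i)\, x_i, \qquad
\frac{\partial \Phi(x_i\beta)}{\partial \beta'} = \phi(x_i\beta)\, x_i
\end{displaymath}

In each case row $i$ of $g'(\beta)$ is a scalar multiple of $x_i$, so stacking the $N$ rows gives the $(N\times k)$ matrix described above.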

We now drop all but the first two terms in Equation 18 (making the equality in that equation an approximation), and apply the variance operator:

\begin{displaymath}
\begin{array}{rcl}
V(\hat Y^p) & \approx & V[g(\beta)] + V[g'(\beta)(b-\beta)] \\
 & = & g'(\beta)V(b)g'(\beta)'
\end{array}
\end{displaymath} (19)

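(No covariance term is necessary because $g(\beta)$ is a constant: its variance is zero, and so is its covariance with the second term. The second line then follows because $g'(\beta)$ and $\beta$ are also constants.)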

The matrix on the right side of Equation 19 is an $(N\times N)$ matrix, whereas $V(b)$ is $(k\times k)$. $V(\hat Y^p)$ can be consistently estimated by substituting the estimated parameter vector and covariance matrix calculated by all standard ML routines for $\beta$ and $V(b)$, respectively, in this equation. The standard errors of the elements of $\hat Y^p$ (based on estimation variability only) are the square roots of the diagonal elements of this matrix. Off-diagonal elements of this matrix are covariances, useful for calculating the variances of constructions such as $(\hat Y^p_1 - \hat Y^p_2)$.
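
The calculation just described can be sketched for the logistic functional form. This is a minimal illustration, not output from any particular ML routine: the prediction matrix `X_p`, the estimates `b`, and the covariance matrix `V_b` are hypothetical values, and the helper names are invented for the example. The derivative used is $p_i(1-p_i)x_i$ for each row $x_i$ of $X^p$.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def jacobian_logistic(X, b):
    """Rows of g'(b): p_i * (1 - p_i) * x_i for each prediction row x_i."""
    rows = []
    for x in X:
        p = logistic(sum(xj * bj for xj, bj in zip(x, b)))
        rows.append([p * (1.0 - p) * xj for xj in x])
    return rows

def matmul(A, B):
    """Plain-list matrix product A B."""
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def estimation_cov(X, b, V_b):
    """(N x N) matrix g'(b) V(b) g'(b)' with estimates substituted for beta."""
    G = jacobian_logistic(X, b)
    return matmul(matmul(G, V_b), transpose(G))

# Hypothetical inputs: N = 2 predictions, k = 2 coefficients.
X_p = [[1.0, 0.0], [1.0, 2.0]]
b = [0.0, 0.5]
V_b = [[0.04, 0.0], [0.0, 0.01]]

cov = estimation_cov(X_p, b, V_b)

# Standard errors: square roots of the diagonal elements.
std_errors = [math.sqrt(cov[i][i]) for i in range(len(cov))]

# Variance of the difference between the two predictions uses the covariance.
var_diff = cov[0][0] + cov[1][1] - 2.0 * cov[0][1]
```

For the first row the linear predictor is zero, so $p(1-p)=0.25$ and the standard error is $0.25\times\sqrt{0.04}=0.05$; the second row has no such shortcut and is best read off the computed matrix.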

The total variability is again merely the sum of the estimation and fundamental variabilities.
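
For a single prediction from a Poisson model with the exponential functional form, the sum might be computed as in the sketch below. The inputs are hypothetical, the function name is invented for illustration, and evaluating the fundamental variability at the estimated mean $\hat\lambda$ is a plug-in assumption of this example.

```python
import math

def total_variability_poisson(x, b, V_b):
    """Return (estimation, fundamental, total) variability for one prediction."""
    lam_hat = math.exp(sum(xj * bj for xj, bj in zip(x, b)))
    # Derivative of exp(x b) with respect to b is lam_hat * x.
    grad = [lam_hat * xj for xj in x]
    k = len(x)
    # Estimation variability: grad V(b) grad'
    est_var = sum(grad[i] * V_b[i][j] * grad[j] for i in range(k) for j in range(k))
    # Fundamental variability: the Poisson variance equals its mean.
    fund_var = lam_hat
    return est_var, fund_var, est_var + fund_var

# Hypothetical inputs chosen so that lam_hat = exp(0) = 1.
x = [1.0, 1.0]
b = [0.0, 0.0]
V_b = [[0.01, 0.0], [0.0, 0.01]]

est_var, fund_var, total_var = total_variability_poisson(x, b, V_b)
```

Here the estimation variability is 0.02 and the fundamental variability is 1, so the fundamental component dominates the total of 1.02.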


Gary King 2005-03-28