Recent Papers
(Papers appearing here that have not yet been
published will likely change frequently; any comments you might have would be
greatly appreciated.)
- Matching for Causal
Inference Without Balance Checking by Stefano M. Iacus, Gary
King, and Giuseppe Porro, Version: 7/1/08. (Paper: PDF)
We address a major discrepancy in matching methods for causal
inference in observational data. Since these data are typically
plentiful, the goal of matching is to reduce bias and only
secondarily to keep variance low. However, most matching methods
seem designed for the opposite problem, guaranteeing sample size ex
ante but limiting bias by controlling for covariates through
reductions in the imbalance between treated and control groups only
ex post and only sometimes. (The resulting practical difficulty may
explain why most published applications do not check whether
imbalance was reduced and so may not even be decreasing bias.) We
introduce a new class of "Monotonic Imbalance Bounding" (MIB)
matching methods that enables one to choose a fixed level of maximum
imbalance, or to reduce maximum imbalance for one variable without
changing the maximum imbalance for the others. We then discuss a
specific MIB method called "Coarsened Exact Matching" (CEM) which,
unlike most existing approaches, also explicitly bounds through ex
ante user choice both the degree of model dependence and the
treatment effect estimation error, eliminates the need for a
separate procedure to restrict data to common support, meets the
congruence principle, is robust to measurement error, works well
with modern methods of imputation for missing data, is
computationally efficient even with massive data sets, and is easy
to understand and use. This method can improve causal inferences in
a wide range of applications, and may be preferred for simplicity of
use even when it is possible to design superior methods for
particular problems. We also make available open
source software which implements all our suggestions.
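As a rough illustration of the coarsening idea described in this abstract (and not the authors' released software), a minimal Python sketch might look like the following. The data layout, the function names `coarsen` and `cem_match`, and the age cutpoints are all assumptions made for the example: each covariate is cut into user-chosen bins, units are matched exactly on their bin signature, and strata lacking either a treated or a control unit are dropped.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into (hypothetical helper)."""
    return sum(value >= c for c in cutpoints)


def cem_match(units, cutpoints):
    """Group units by coarsened covariates; keep strata with both groups present.

    units: list of dicts with keys "treated" (bool) and "x" (covariate dict).
    cutpoints: dict mapping covariate name -> sorted list of bin boundaries,
               chosen by the user before matching (ex ante).
    """
    strata = defaultdict(list)
    names = sorted(cutpoints)
    for unit in units:
        key = tuple(coarsen(unit["x"][n], cutpoints[n]) for n in names)
        strata[key].append(unit)
    # A stratum is usable only if it contains treated AND control units.
    return {
        key: members
        for key, members in strata.items()
        if {m["treated"] for m in members} == {True, False}
    }


# Illustrative data only; the cutpoints 30 and 50 are an assumption.
example_units = [
    {"treated": True, "x": {"age": 25}},
    {"treated": False, "x": {"age": 27}},
    {"treated": True, "x": {"age": 60}},
    {"treated": False, "x": {"age": 41}},
]
matched = cem_match(example_units, {"age": [30, 50]})
```

Here only the under-30 stratum survives, because the other two strata each contain a single group. Choosing finer cutpoints tightens the imbalance bound for that covariate at the cost of fewer matched units.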
- Ordinary
Economic Voting Behavior in the Extraordinary Election of Adolf
Hitler by Gary King, Ori Rosen, Martin Tanner, and
Alexander F. Wagner, Version: 8/1/2007. (Paper: PDF)
The enormous Nazi voting literature rarely builds on modern
statistical or economic research. By adding these approaches, we
find that the most widely accepted existing theories of this era
cannot distinguish the Weimar elections from almost any others in
any country. Via a retrospective voting account, we show that
voters most hurt by the depression, and most likely to oppose the
government, fall into separate groups with divergent interests.
This explains why some turned to the Nazis and others turned away.
The consequences of Hitler's election were extraordinary, but the
voting behavior that led to it was not. Forthcoming in Journal of Economic
History 68, 4 (December, 2008) (The methods used here
were developed in this
article.)
-
The Changing Evidence Base of Political Science
Research by Gary King. (Paper: PDF)
An essay to appear in Gary King, Norman Nie, and Kay
Schlozman, eds., The Future of Political Science: 100
Perspectives, New York: Routledge Press, 2009. (a volume in
honor of Sidney Verba).
-
The Essential Role of Pair
Matching in Cluster-Randomized Experiments, with Application to the
Mexican Universal Health Insurance Evaluation by Kosuke Imai, Gary
King, and Clayton
Nall. Version: 4/9/08. (Paper: PDF)
A basic feature of many field experiments is that investigators are
only able to randomize clusters of individuals --- such as
households, communities, firms, medical practices, schools, or
classrooms --- even when the individual is the unit of interest. To
recoup some of the resulting efficiency loss, many studies pair
similar clusters and randomize treatment within pairs. Other
studies (including almost all published political science field
experiments) avoid pairing, in part because some prominent
methodological articles claim to have identified serious problems
with this "matched-pair cluster-randomized" design. We prove that
all such claims about problems with this design are unfounded. We
then show that the estimator for matched-pair designs favored in the
literature is appropriate only in situations where matching is not
needed. To address this problem without modeling assumptions, we
generalize Neyman's (1923) approach and propose a simple new
estimator with much improved statistical properties. We also
introduce methods to cope with individual-level noncompliance, which
most existing approaches often assume away. We show that from the
perspective of, among other things, bias, efficiency, power, or
robustness, pairing should be used in cluster-randomized experiments
whenever feasible; failing to do so is equivalent to discarding a
considerable fraction of one's data. We develop these techniques in
the context of a randomized evaluation we are conducting of the
Mexican Universal Health Insurance Program.
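The design step described in this abstract, pairing similar clusters and then randomizing treatment within each pair, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: similarity is judged on a single hypothetical `covariate` field, adjacent clusters in sorted order are paired, and the estimator proposed in the paper is not implemented.

```python
import random


def pair_and_randomize(clusters, seed=0):
    """Pair clusters with similar covariate values; randomize within pairs.

    clusters: list of dicts with keys "name" and "covariate" (both assumed).
    Returns a list of {"treated": name, "control": name} dicts.
    If the number of clusters is odd, the last one is left unpaired.
    """
    rng = random.Random(seed)
    ordered = sorted(clusters, key=lambda c: c["covariate"])
    assignments = []
    for i in range(0, len(ordered) - 1, 2):
        first, second = ordered[i], ordered[i + 1]
        # A fair coin flip decides which member of the pair is treated.
        if rng.random() < 0.5:
            first, second = second, first
        assignments.append({"treated": first["name"], "control": second["name"]})
    return assignments


# Illustrative clusters only.
example_clusters = [
    {"name": "C", "covariate": 50},
    {"name": "A", "covariate": 10},
    {"name": "D", "covariate": 55},
    {"name": "B", "covariate": 12},
]
pairs = pair_and_randomize(example_clusters, seed=1)
```

Whatever the seed, every pair contributes exactly one treated and one control cluster, so the two arms are balanced on the pairing covariate by construction.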
-
Extracting Systematic Social Science Meaning from
Text by Daniel
Hopkins and Gary King. Version:
4/15/08. (Paper: PDF)
We develop a method of automated content analysis that gives
approximately unbiased estimates of quantities of theoretical
interest to social scientists. With a small sample of documents
hand coded into investigator-chosen categories, our method can give
accurate estimates of the proportion of documents in each category
in a larger population. Existing methods allow for the possibility
of substantial bias in estimating the category proportions that are
often of interest to social scientists. We first show how to
correct the bias for any existing classifier, and then go further to
estimate the proportions without the intermediate step of individual
document classification and with greatly reduced assumptions. We
also introduce a statistical correction for the less-than-perfect
levels of inter-coder reliability that typically characterize human
document classification. These methods allow us to measure the
classical conception of public opinion as those views that are
actively and publicly expressed, rather than the attitudes or
non-attitudes of the populace as a whole. Specifically, we track
the daily opinions of thousands of people about President Bush using
a massive data set of online blogs we develop and make available
with this article. We also offer easy-to-use software that
implements our methods, and we demonstrate its effectiveness with
several other text categorization problems.
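To make the idea of estimating category proportions without classifying individual documents more concrete, here is a deliberately simplified two-category Python sketch. It assumes that the distribution of some text features in the population is a mixture of the feature distributions within each hand-coded category, and solves for the mixing proportion by least squares. The feature profiles, the function name `estimate_proportion`, and the restriction to two categories are assumptions for illustration, not the authors' software.

```python
def estimate_proportion(profile_cat1, profile_cat2, profile_population):
    """Least-squares estimate of the share of category 1 in the population.

    Each profile is a list of feature frequencies (same length, same order):
    profile_cat1 / profile_cat2 come from the hand-coded sample,
    profile_population from the unlabeled population.
    Model assumed: population = p * cat1 + (1 - p) * cat2.
    """
    diffs = [a - b for a, b in zip(profile_cat1, profile_cat2)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("the two category profiles are identical")
    numer = sum(
        d * (y - b) for d, y, b in zip(diffs, profile_population, profile_cat2)
    )
    # Clip to the unit interval so the result is a valid proportion.
    return min(1.0, max(0.0, numer / denom))


# Illustrative numbers only: the population is built as a 30/70 mixture.
cat1 = [0.6, 0.3, 0.1]
cat2 = [0.1, 0.3, 0.6]
population = [0.25, 0.30, 0.45]
share_cat1 = estimate_proportion(cat1, cat2, population)
```

No document is assigned to a category at any point; only aggregate feature frequencies are used, which is the property the abstract emphasizes.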
-
An Introduction to
the Dataverse Network as an Infrastructure for Data
Sharing by Gary King. Version: 7/8/07. (Paper: PDF) We introduce a set of integrated
developments in web application software, networking, data
citation standards, and statistical methods designed to put some
of the universe of data and data sharing practices on somewhat
firmer ground. We have focused on social science data, but
aspects of what we have developed may apply more widely. The
idea is to facilitate the public distribution of persistent,
authorized, and verifiable data, with powerful but easy-to-use
technology, even when the data are confidential or proprietary.
We intend to solve some of the sociological problems of data
sharing via technological means, with the result intended to
benefit both the scientific community and the sometimes apparently
contradictory goals of individual researchers. Published in Sociological Methods and
Research.
- Misunderstandings among Experimentalists and
Observationalists about Causal
Inference by Kosuke Imai, Gary King, and Elizabeth
Stuart. Version: 9/19/07
(Paper: PDF)
We attempt to clarify, and suggest how to avoid, several serious
misunderstandings about and fallacies of causal inference in
experimental and observational research. These issues concern some
of the most basic advantages and disadvantages of each basic
research design. Problems include improper use of hypothesis tests
for covariate balance between the treated and control groups, and
the consequences of using randomization, blocking before
randomization, and matching after treatment assignment to achieve
covariate balance. Applied researchers in a wide range of
scientific disciplines seem to fall prey to one or more of these
fallacies, and as a result make suboptimal design or analysis
choices. To clarify these points, we derive a new four-part
decomposition of the key estimation errors in making causal
inferences. We then show how this decomposition can help scholars
from different experimental and observational research traditions
better understand each other's inferential problems and attempted
solutions. Journal of the
Royal Statistical Society, Series A, Vol. 171, Part 2
(2008): 481-502.
-
A 'Politically Robust'
Experimental Design for Public Policy Evaluation, with
Application to the Mexican Universal Health Insurance
Program, by Gary King, Emmanuela Gakidou, Nirmala
Ravishankar, Ryan T. Moore, Jason Lakin, Manett Vargas, Martha
María Téllez-Rojo, Juan Eugenio Hernández
Ávila, Mauricio Hernández Ávila, and
Héctor Hernández Llamas. Version: 1/24/2007
(Paper: PDF). We develop an approach to
conducting large scale randomized public
policy experiments intended to be more robust to the political
interventions that have ruined some or all parts of many similar
previous efforts. Our proposed design is insulated from
selection bias in some circumstances even if we lose observations;
our inferences can still be unbiased even if politics disrupts any
two of the three steps in our analytical procedures; and other
empirical checks are available to validate the overall design. We
illustrate with a design and empirical validation of an evaluation
of the Mexican Seguro Popular de Salud (Universal Health
Insurance) program we are conducting. Seguro Popular, which is
intended to grow to provide medical care, drugs, preventative
services, and financial health protection to the 50 million Mexicans
without health insurance, is one of the largest health reforms of
any country in the last two decades. The evaluation is also large
scale, constituting one of the largest policy experiments to date
and what may be the largest randomized health policy experiment
ever. Journal of Policy Analysis and
Management, Vol. 26, Issue 3 (2007): 479-506.
- What to do about Missing
Values in Time Series Cross-Section Data by James Honaker
and Gary King. Version: 9/21/07 (Paper:
PDF)
Applications of modern methods for analyzing data with missing
values, based primarily on multiple imputation, have in the last
half-decade become common in American politics and political
behavior. Scholars in these fields have thus increasingly avoided
the biases and inefficiencies caused by ad hoc methods like listwise
deletion and best guess imputation. However, researchers in much of
comparative politics and international relations, and others with
similar data, have been unable to do the same because the best
available imputation methods work poorly with the time-series
cross-section data structures common in these fields. We attempt to
rectify this situation. First, we build a multiple imputation model
that allows smooth time trends, shifts across cross-sectional units,
and correlations over time and space, resulting in far more accurate
imputations. Second, we build nonignorable missingness models by
enabling analysts to incorporate knowledge from area studies experts
via priors on individual missing cell values, rather than on
difficult-to-interpret model parameters. Third, because these tasks
could not be accomplished within existing imputation algorithms, in
that they cannot handle as many variables as needed even in the
simpler cross-sectional data for which they were designed, we also
develop a new algorithm that substantially expands the range of
computationally feasible data types and sizes for which multiple
imputation can be used. These developments also made it possible to
implement the methods introduced here in freely available open
source software that is considerably more
reliable than existing
algorithms.
- Verbal
Autopsy Methods with Multiple Causes of Death by Gary
King and Ying Lu. Version: published. (Paper: PDF)
Verbal autopsy procedures are widely used for estimating
cause-specific mortality in areas without medical death
certification. Data on symptoms reported by caregivers along with
the cause of death are collected from a medical facility, and the
cause-of-death distribution is estimated in the population where
only symptom data are available. Current approaches analyze only
one cause at a time, involve assumptions judged difficult or
impossible to satisfy, and require expensive, time consuming, or
unreliable physician reviews, expert algorithms, or parametric
statistical models. By generalizing current approaches to analyze
multiple causes, we show how most of the difficult assumptions
underlying existing methods can be dropped. These generalizations
also make physician review, expert algorithms, and parametric
statistical assumptions unnecessary. With theoretical results, and
empirical analyses in data from China and Tanzania, we illustrate
the accuracy of this approach. While no method of analyzing verbal
autopsy data, including the more computationally intensive approach
offered here, can give accurate estimates in all circumstances, the
procedure offered is conceptually simpler, less expensive, more
general, as or more replicable, and easier to use in practice than
existing approaches. We also show how our focus on estimating
aggregate proportions, which are the quantities of primary interest
in verbal autopsy studies, may also greatly reduce the assumptions
necessary, and thus improve the performance of, many individual
classifiers in this and other areas. As a companion to this paper,
we also offer easy-to-use software that implements
the methods
discussed herein.
Forthcoming in Statistical Science,
Vol. 23, No. 1 (February 2008).
- Toward A Common
Framework for Statistical Analysis and Development by
Kosuke Imai, Gary King, and Olivia Lau. Version: 4/9/08. (Paper: PDF) We
describe some progress toward a common framework for statistical
analysis and software development built on and within the R
language, including R's numerous existing packages. The framework
we have developed offers a simple unified structure and syntax that
can encompass a large fraction of statistical procedures already
implemented in R, without requiring any changes in existing
approaches. We conjecture that it can be used to encompass and
present simply a vast majority of existing statistical methods,
regardless of the theory of inference on which they are based,
notation with which they were developed, and programming syntax with
which they have been implemented. This development enabled us, and
should enable others, to design statistical software with a single,
simple, and unified user interface that helps overcome the
conflicting notation, syntax, jargon, and statistical methods
existing across the methods subfields of numerous academic
disciplines. The approach also enables one to build a graphical
user interface that automatically includes any method encompassed
within the framework. We hope that the result of this line of
research will greatly reduce the time from the creation of a new
statistical innovation to its widespread use by applied researchers
whether or not they use or program in R. Forthcoming in the Journal of Computational and
Graphical Statistics.
- The Future of Partisan
Symmetry as a Judicial Test for Partisan Gerrymandering after LULAC
v. Perry by Bernard Grofman and Gary King. Version: proofs.
(Paper: PDF) While the Supreme Court in Bandemer
v. Davis found partisan gerrymandering to be justiciable, no
challenged redistricting plan in the subsequent 20 years has been
held unconstitutional on partisan grounds. Then, in Vieth
v. Jubelirer, five justices concluded that some standard
might be adopted in a future case, if a manageable rule could be found.
When gerrymandering next came before the Court, in LULAC
v. Perry, we along with our colleagues filed an Amicus Brief
(King et al., 2005), proposing the test be based in part on the
partisan symmetry standard. Although the issue was not
resolved, our proposal was discussed and positively evaluated in
three of the opinions, including the plurality judgment, and for
the first time for any proposal the Court gave a clear indication
that a future legal test for partisan gerrymandering will likely
include partisan symmetry. A majority of Justices now appear to
endorse the view that the measurement of partisan symmetry may be
used in partisan gerrymandering claims as “a helpful (though
certainly not talismanic) tool” (Justice Stevens, joined by
Justice Breyer), provided one recognizes that “asymmetry alone is
not a reliable measure of unconstitutional partisanship” and
possibly that the standard would be applied only after at least
one election has been held under the redistricting plan at issue
(Justice Kennedy, joined by Justices Souter and Ginsburg). We use
this essay to respond to the request of Justices Souter and
Ginsburg that “further attention … be devoted to the
administrability of such a criterion at all levels of
redistricting and its review.” Building on our previous scholarly
work, our Amicus Brief, the observations of these five Justices,
and a supporting consensus in the academic literature, we offer
here a social science perspective on the conceptualization and
measurement of partisan gerrymandering and the development of
relevant legal rules based on what is effectively the Supreme
Court’s open invitation to lower courts to revisit these issues
in the light of LULAC v. Perry. Published in the Election Law
Journal, January 2007.
- A Proposed
Standard for the Scholarly Citation of Quantitative Data
by Micah Altman and Gary King. Version: published.
(Paper: PDF) An essential aspect of science is a
community of scholars cooperating and competing in the pursuit
of common goals. A critical component of this community is the
common language of and the universal standards for scholarly
citation, credit attribution, and the location and retrieval of
articles and books. We propose a similar universal standard for
citing quantitative data that retains the advantages of print
citations, adds other components made possible by, and needed
due to, the digital form and systematic nature of quantitative
data sets, and is consistent with most existing
subfield-specific approaches. Although the digital library
field includes numerous creative ideas, we limit ourselves to
only those elements that appear ready for easy practical use by
scientists, journal editors, publishers, librarians, and
archivists. Published in
D-Lib Magazine, March, 2007
- Matching
as Nonparametric Preprocessing for Reducing Model Dependence in
Parametric Causal Inference by Daniel E. Ho, Kosuke
Imai, Gary King, and Elizabeth A. Stuart, Version: 5/4/2006.
(Paper: PDF) Although political
science articles rarely include causal estimates from more than
a few model specifications, authors usually choose these from
numerous trial runs readers never see. Given the typically
large variation in estimates across choices of control
variables, functional forms, and other modeling assumptions,
how can researchers ensure that the few estimates presented are
accurate or representative? How do readers know that
publications are not merely demonstrations that the author
found it possible to find a specification that fits
his or her favorite hypothesis? Matching methods, which offer
the promise of causal inference with fewer assumptions, are one
possible way forward, but the literature suffers from
conflicting approaches to estimation, uncertainty, theoretical
results, and practical advice. We propose a unified approach
that makes it possible for researchers to preprocess data (see
our easy-to-use software, MatchIt) and
then to apply whatever familiar parametric techniques they
would have used anyway. Instead of replacing existing methods,
we use matching to make parametric models work better by giving
more accurate and considerably less model-dependent causal
inferences. Published in Political
Analysis.
- Demographic Forecasting by Federico Girosi and Gary
King, (Book manuscript: PDFs) We introduce a
framework for forecasting age-sex-country-cause-specific
mortality rates that incorporates considerably more
information, and thus has the potential to forecast much
better, than any existing approach. Mortality forecasts are
used in a wide variety of academic fields, and for global and
national health policy making, medical and pharmaceutical
research, and social security and retirement planning.
As it turns out, the tools we developed in pursuit
of this goal also have broader statistical implications, in
addition to their use for forecasting mortality or other
variables with similar statistical properties. First, our
methods make it possible to include different explanatory
variables in a time series regression for each cross-section,
while still borrowing strength from one regression to improve
the estimation of all. Second, we show that many existing
Bayesian (hierarchical and spatial) models with explanatory
variables use prior densities that incorrectly formalize prior
knowledge. Many demographers and public health researchers have
fortuitously avoided this problem so prevalent in other fields
by using prior knowledge only as an ex post check on empirical
results, but this approach excludes considerable information
from their models. We show how to incorporate this demographic
knowledge into a model in a statistically appropriate way.
Finally, we develop a set of tools useful for developing models
with Bayesian priors in the presence of partial prior
ignorance. This approach also provides many of the attractive
features claimed by the empirical Bayes approach, but fully
within the standard Bayesian theory of inference. YourCast software is available to implement all
the methods in the book. Forthcoming,
Princeton University Press
- The
Dangers of Extreme Counterfactuals by Gary King and
Langche Zeng. (Article: PDF) We address the problem that
occurs when inferences about counterfactuals — predictions,
"what if" questions, and causal effects — are attempted far
from the available data. The danger of these extreme
counterfactuals is that substantive conclusions drawn from
statistical models that fit the data well turn out to be based
largely on speculation hidden in convenient modeling
assumptions that few would be willing to defend. Yet existing
statistical strategies provide few reliable means of
identifying extreme counterfactuals. We offer a proof that
inferences farther from the data are more model-dependent, and
then develop easy-to-apply methods to evaluate how
model-dependent our answers would be to specified
counterfactuals. These methods require neither sensitivity
testing over specified classes of models nor evaluating any
specific modeling assumptions. If an analysis fails the simple
tests we offer, then we know that substantive results are
sensitive to at least some modeling choices that are not based
on empirical evidence. Published in
Political Analysis. Whatif
software is available to implement all procedures described in
this paper. For a less technical version of this paper, see the
next paper.
- When Can History be Our
Guide? The Pitfalls of Counterfactual Inference by Gary
King and Langche Zeng, Version: published. (Paper: PDF)
Inferences about counterfactuals are essential for prediction,
answering "what if" questions, and estimating causal effects.
However, when the counterfactuals posed are too far from the
data at hand, conclusions drawn from well-specified statistical
analyses become based on speculation and convenient but
indefensible model assumptions rather than empirical evidence.
Unfortunately, standard statistical approaches assume the
veracity of the model rather than revealing the degree of
model-dependence, and so this problem can be hard to detect. We
develop easy-to-apply methods to evaluate counterfactuals that
do not require sensitivity testing over specified classes of
models. If an analysis fails the tests we offer, then we know
that substantive results are sensitive to at least some
modeling choices that are not based on genuine prior knowledge
or empirical evidence. We use these methods to evaluate the
extensive scholarly literatures on the effects of changes in
the degree of democracy in a country (on any dependent
variable) and separate analyses of the effects of UN
peacebuilding efforts, and find evidence that many scholars are
inadvertently drawing conclusions based more on modeling
hypotheses than on their empirical evidence. For some research
questions, history contains insufficient information to be our
guide. International
Studies Quarterly. Whatif
software is available to implement all procedures described in
this paper. For a more technical version of this paper, see the
previous paper.
- Death
by Survey: Estimating Adult Mortality without Selection Bias
from Sibling Survival Data by Emmanuela Gakidou and Gary
King, Version: 2/21/06. (Paper: PDF). The widely used methods for
estimating adult mortality rates from sample survey responses
about the survival of siblings, parents, spouses, and others
depend crucially on an assumption that we demonstrate does not
hold in real data. We show that when this assumption is
violated — so that the mortality rate varies with sibship size
— mortality estimates can be massively biased. By using
insights from work on the statistical analysis of selection
bias, survey weighting, and extrapolation problems, we propose
a new and relatively simple method of recovering the mortality
rate with both greatly reduced potential for bias and increased
clarity about the source of necessary assumptions. Demography.
- Understanding
the Lee-Carter Mortality Forecasting Method by Federico
Girosi and Gary King. (Paper: PDF) We demonstrate here several previously
unrecognized or insufficiently appreciated properties of the
Lee-Carter mortality forecasting approach, the dominant method
used in both the academic literature and practical
applications. We show that this model is a special case of a
considerably simpler, and less often biased, random walk with
drift model, and prove that the age profile forecast from both
approaches will always become less smooth and unrealistic after
a point (when forecasting forward or backwards in time) and
will eventually deviate from any given baseline. We use these
and other properties we demonstrate to suggest when the model
would be most applicable in practice.
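For readers unfamiliar with the random walk with drift model mentioned in this abstract, a minimal Python sketch of its point forecast for one age group's log-mortality series is given below. The function name and the example numbers are assumptions; the drift is estimated as the average one-period change, which equals the last value minus the first divided by the number of steps.

```python
def rw_drift_forecast(series, horizon):
    """Point forecasts from a random walk with drift.

    series: observed values (e.g., log-mortality for one age group by year).
    horizon: number of periods ahead to forecast.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to estimate drift")
    # Mean of first differences, which telescopes to (last - first) / steps.
    drift = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + h * drift for h in range(1, horizon + 1)]


# Illustrative log-mortality values only.
log_mortality = [-4.0, -4.1, -4.2, -4.3]
forecast = rw_drift_forecast(log_mortality, horizon=2)
```

Because each age group is extrapolated along its own straight line, age groups with different drifts move apart over a long horizon, which is one way to see how the forecast age profile can lose its smooth shape.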
- Publication, Publication by Gary King, Version:
published. (Article: PDF) I show herein how to write a
publishable paper by beginning with the replication of a
published article. This strategy seems to work well for class
projects in producing papers that ultimately get published,
helping to professionalize students into the discipline, and
teaching them the scientific norms of the free exchange of
academic information. I begin by briefly revisiting the
prominent debate on replication our discipline had a decade ago
and some of the progress made in data sharing since.
Published in PS: Political Science and
Politics. See also, Data
Sharing and Replication.
- The
Supreme Court During Crisis: How War Affects only Non-War
Cases by Lee Epstein, Daniel E. Ho, Gary King, and
Jeffrey A. Segal, Version: Published. (Article: PDF) Does
the U.S. Supreme Court curtail rights and liberties when the
nation's security is under threat? In hundreds of articles and
books, and with renewed fervor since September 11, 2001,
members of the legal community have warred over this question.
Yet, not a single large-scale, quantitative study exists on the
subject. Using the best data available on the causes and
outcomes of every civil rights and liberties case decided by
the Supreme Court over the past six decades and employing
methods chosen and tuned especially for this problem, our
analyses demonstrate that when crises threaten the nation's
security, the justices are substantially more likely to curtail
rights and liberties than when peace prevails. Yet
paradoxically, and in contradiction to virtually every theory
of crisis jurisprudence, war appears to affect only cases that
are unrelated to the war. For these cases, the effect of war
and other international crises is so substantial, persistent,
and consistent that it may surprise even those commentators who
long have argued that the Court rallies around the flag in
times of crisis. On the other hand, we find no evidence that
cases most directly related to the war are affected.
We attempt to explain this seemingly paradoxical
evidence with one unifying conjecture: Instead of balancing
rights and security in high stakes cases directly related to
the war, the justices retreat to ensuring the institutional
checks of the democratic branches. Since rights-oriented and
process-oriented dimensions seem to operate in different
domains and at different times, and often suggest different
outcomes, the predictive factors that work for cases unrelated
to the war fail for cases related to the war. If this
conjecture is correct, federal judges should consider giving
less weight to legal principles established during wartime for
ordinary cases, and attorneys should see it as their
responsibility to distinguish cases along these lines.
Published in New York University Law
Review. Winner of
the 2005 Pi Sigma Alpha Award, the 2005 Robert
H. Durr Award, and the 2006 McGraw-Hill Award, and
Honorable Mention for the 2005 American Judicature Society
Award and Law and Society Association Prize.
- Comparing
Incomparable Survey Responses: New Tools for Anchoring
Vignettes by Gary King and Jonathan Wand, Version:
11/3/2006. (Paper: PDF) When respondents use the ordinal response
categories of standard survey questions in different ways,
analyses based on the resulting data can be biased.
Anchoring vignettes is a survey design
technique, introduced by King,
Murray, Salomon, and Tandon (2004), intended to correct for
some of these problems. We develop new methods both for
evaluating and choosing anchoring vignettes, and for analyzing the
resulting data. With surveys on a diverse range of topics in a
range of countries, we illustrate how our proposed methods can
improve the ability of anchoring vignettes to extract information
from survey data, as well as reduce survey administration
costs. Political
Analysis
- Enhancing the
Validity and Cross-cultural Comparability of Survey
Research by Gary King, Christopher J.L. Murray, Joshua
A. Salomon, and Ajay Tandon, Version: Published. (Article: PDF) We
address two long-standing problems in survey research:
measuring complicated concepts, such as political freedom or
efficacy, that researchers define best with reference to
examples; and what to do when respondents interpret identical
survey questions in different ways. Scholars have long
addressed these problems with approaches to reduce
incomparability, such as writing more concrete questions --
with uneven success. Our alternative is to measure directly
whatever incomparability exists and to correct for it. We
measure incomparability via respondents' assessments, on the
same scale as the self-assessments to be corrected, of
hypothetical individuals described in short vignettes. Since
actual levels of the vignettes are invariant over respondents,
variability in vignette answers reveals incomparability. Our
corrections require either simple recodes or a statistical
model designed to save survey administration costs. With
analysis, simulations, and cross-national surveys, we show how
these problems can drastically mislead survey researchers and
how our approach can fix them. You may also be interested in
the Anchoring Vignettes website, which
includes information about conferences on the subject, a FAQ,
software, example vignettes, and other materials. Published in American Political Science
Review.
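The "simple recodes" mentioned in the abstract above can be illustrated with a small sketch. It places a respondent's self-assessment relative to that same respondent's ratings of the vignettes, so that two people who use the response scale differently become comparable. The function name, the 1 to 2J+1 output scale, and the requirement that vignette ratings be strictly increasing are assumptions of this sketch, not details taken from the article.

```python
from typing import Sequence


def vignette_recode(self_rating: int, vignette_ratings: Sequence[int]) -> int:
    """Recode a self-assessment relative to the respondent's own vignette ratings.

    vignette_ratings lists the respondent's answers from the vignette that
    describes the lowest level to the one that describes the highest. This
    sketch only handles strictly increasing ratings (no ties, no reversals).
    With J vignettes the result runs from 1 (below every vignette) to
    2*J + 1 (above every vignette); even values mean "equal to a vignette".
    """
    ratings = list(vignette_ratings)
    if any(a >= b for a, b in zip(ratings, ratings[1:])):
        raise ValueError("vignette ratings must be strictly increasing")

    position = 1
    for rating in ratings:
        if self_rating < rating:
            return position          # strictly below this vignette
        if self_rating == rating:
            return position + 1      # tied with this vignette
        position += 2                # strictly above; move past it
    return position                  # above every vignette


# Two hypothetical respondents give the same raw self-rating (3 on a 1-5 scale)
# but rate the same two vignettes very differently.
respondent_a = vignette_recode(3, [1, 2])  # places self above both vignettes
respondent_b = vignette_recode(3, [4, 5])  # places self below both vignettes
```

Here the identical raw answers recode to 5 and 1, which is the kind of incomparability the vignettes are meant to expose.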
- Finding New
Information for Ecological Inference Models: A Comment on Jon
Wakefield, "Ecological Inference in 2x2 Tables."
(Paper: PDF) To
appear in the Journal of the Royal Statistical Society,
Series A.
- Theory and Evidence in International Conflict: A Response
to de Marchi, Gelpi, and Grynaviski by Nathaniel Beck,
Gary King, and Langche Zeng, Version: Corrected Proofs.
(Article: PDF) Published
in the American Political Science
Review.
- A Consensus
on Second Stage Analyses in Ecological Inference Models
by Christopher Adolph and Gary King, with Michael C. Herron and
Kenneth W. Shotts. (Article: PDF) Published in
Political Analysis.
- What to do
When Your Hessian is Not Invertible: Alternatives to Model
Respecification in Nonlinear Estimation by Jeff Gill and Gary King. (Article: PDF) What should a researcher do when
statistical analysis software terminates before completion with
a message that the Hessian is not invertible? The standard
textbook advice is to respecify the model, but this is another
way of saying that the researcher should change the question
being asked. Obviously, however, computer programs should not
be in the business of deciding what questions are worthy of
study. Although noninvertible Hessians are sometimes signals of
poorly posed questions, nonsensical models, or inappropriate
estimators, they also frequently occur when information about
the quantities of interest does exist in the data, through the
likelihood function. We explain the problem in some detail and
lay out two preliminary proposals for ways of dealing with
noninvertible Hessians without changing the question asked.
Published in Sociological Methods and
Research. (Software implementing the ideas in the paper is also
available in Gauss or R.)
- The
Future of Replication. (Article:
PDF) Since I proposed the
replication standard for political science research, more
journals have required or encouraged authors to make data
available, and more authors have shared their data. The calls
for continuing this trend are more persistent than ever, and
the agreement among the journal editors in this Symposium to
adopt a unified replication policy continues this trend. In
this article, I offer a vision of a possible future for the
replication movement. The plan is to implement this vision via
the Virtual Data Center project, which — by automating the
process of citing, verifying, finding, sharing, archiving,
subsetting, converting, analyzing, and distributing data — may
greatly facilitate adherence to the replication standard.
Published in a symposium on the subject in
International Studies Perspectives; this
article begins at p.100 in the pdf of the symposium. (You may
be interested in my "Replication,
Replication" article, where the original proposal appeared,
and related information.)
- Armed
Conflict as a Public Health Problem by Christopher J.L.
Murray, Gary King, Alan D. Lopez, Niels Tomijima, and Etienne
Krug. (Article: PDF) Armed conflict is a major cause of
injury and death worldwide, but we need much better methods of
quantification before we can accurately assess its effect.
Armed conflict between warring states and groups
within states has been a major cause of ill health and
mortality for most of human history. Conflict obviously causes
deaths and injuries on the battlefield, but also health
consequences from the displacement of populations, the
breakdown of health and social services, and the heightened
risk of disease transmission. Despite the size of the health
consequences, military conflict has not received the same
attention from public health research and policy as many other
causes of illness and death. In contrast, political scientists
have long studied the causes of war but have primarily been
interested in the decision of elite groups to go to war, not in
human death and misery.
We review the limited knowledge on the health
consequences of conflict, suggest ways to improve measurement,
and discuss the potential for risk assessment and for
preventing and ameliorating the consequences of conflict.
Published in the BMJ. Note: The BMJ
used to be called the British Medical Journal. Also available
is the BMJ's
Interactive version.
- Did
Illegally Counted Overseas Absentee Ballots Decide the 2000
U.S. Presidential Election? by Kosuke Imai and Gary
King, Revised Proofs: 7/20/2004. (Article: PDF)
Although not widely known until much later, Al Gore received
202 more votes than George W. Bush on election day in Florida.
George W. Bush is president because he overcame his election
day deficit with overseas absentee ballots that arrived and
were counted after election day. In the final official tally,
Bush received 537 more votes than Gore. These numbers are taken
from the official results released by the Florida Secretary of
State's office and so do not reflect overvotes, undervotes,
unsuccessful litigation, butterfly ballot problems, recounts
that might have been allowed but were not, or any other
hypothetical divergence between voter preferences and counted
votes. After the election, the New York Times
conducted a six month long investigation and found that 680 of
the overseas absentee ballots were illegally counted, and no
partisan, pundit, or academic has publicly disagreed with their
assessment. In this paper, we describe the statistical
procedures we developed and implemented for the Times
to ascertain whether disqualifying these 680 ballots would have
changed the outcome of the election. The methods involve adding
formal Bayesian model averaging procedures to King's (1997) ecological inference model. Formal
Bayesian model averaging has not been used in political science
but is especially useful when substantive conclusions depend
heavily on apparently minor but indefensible model choices,
when model generalization is not feasible, and when potential
critics are more partisan than academic. We show how we derived
the results for the Times so that other scholars can
use these methods to make ecological inferences for other
purposes. We also present a variety of new empirical results
that delineate the precise conditions under which Al Gore would
have been elected president, and offer new evidence of the
striking effectiveness of the Republican effort to convince
local election officials to count invalid ballots in Bush
counties and not count them in Gore counties. (You might also
be interested in the slides (PDF) we use to present this
material.) Published in Perspectives
on Politics.
- An
Automated Information Extraction Tool For International
Conflict Data with Performance as Good as Human Coders: A Rare
Events Evaluation Design by Gary King and Will Lowe,
Version: 11/1/01. (Article: PDF) Despite widespread recognition that
aggregated summary statistics on international conflict and
cooperation miss most of the complex interactions among
nations, the vast majority of scholars continue to employ
annual, quarterly, or occasionally monthly observations. Daily
events data, coded from some of the huge volume of news stories
produced by journalists, have not been used much for the last
two decades. We offer some reason to change this view, which we
feel should lead to considerably increased use of these data.
We address advances in event categorization schemes and
software programs that automatically produce data by "reading"
news stories without human coders. We design a method that
makes it feasible for the first time to evaluate these programs
when they are applied in areas with the particular
characteristics of international conflict and cooperation data,
namely event categories with highly unequal prevalences, and
where rare events (such as highly conflictual actions) are of
special interest. We use this rare events design to evaluate
one existing program, and find it to be as good as trained
human coders, but obviously far less expensive to use. For
large scale data collections, the program dominates human
coding. We believe our new evaluative method will be of use in
international relations, as well as more generally in the field
of computational linguistics, for evaluating other automated
information extraction tools. We also believe that the data
created by programs similar to the one we evaluated should see
dramatically increased use in international relations research.
To facilitate this process, we are also releasing with this
article 3.7 million coded international events, covering the
entire world for the last decade. Published in International
Organization.
- Improving
Forecasts of State Failure by Gary King and Langche
Zeng, Version: Published. (Article:
PDF) We offer the first
independent scholarly evaluation of the claims, forecasts, and
causal inferences of the State Failure Task Force and their
efforts to forecast when states will fail. State failure refers
to the collapse of the authority of the central government to
impose order, as in civil wars, revolutionary wars, genocides,
politicides, and adverse or disruptive regime transitions.
States that sponsor international terrorism or allow it to be
organized from within their borders are all failed states. This
task force, set up at the behest of Vice President Gore in
1994, has been led by a group of distinguished academics
working as consultants to the U.S. Central Intelligence Agency.
State Failure Task Force reports and publications have received
attention in the media, in academia, and from public policy
decision-makers. In this article, we identify several
methodological errors in the task force work that cause their
reported forecast probabilities of conflict to be too large,
their causal inferences to be biased in unpredictable
directions, and their claims of forecasting performance to be
exaggerated. However, we also find that the task force has
amassed the best and most carefully collected data on state
failure in existence, and the required corrections which we
provide, although very large in effect, are easy to implement.
We also reanalyze their data with better statistical procedures
and demonstrate how to improve forecasting performance to
levels significantly greater than even corrected versions of
their models. Although still a highly uncertain endeavor, we
are as a consequence able to offer the first accurate forecasts
of state failure, along with procedures and results that may be
of practical use in informing foreign policy decision making.
We also describe a number of strong empirical regularities that
may help in ascertaining the causes of state failure.
Published in World
Politics. You may also be interested in the last
state failure task force report (PDF), the state failure data we used to write this paper, or the
related paper, Nathaniel Beck, Gary King, and Langche Zeng.
"Improving Quantitative Studies of International Conflict: A
Conjecture," American Political Science Review, Vol.
94, No. 1 (March, 2000): 21-36. Winner of the Gosnell
Prize. (Article: PDF | Abstract:
HTML)
- The Rules
of Inference by Lee Epstein and Gary King, Version:
Published. (Article: PDF) Although the term "empirical
research" has become commonplace in legal scholarship over the
past two decades, law professors have, in fact, been conducting
research that is empirical -- that is, learning about the world
using quantitative data or qualitative information -- for
almost as long as they have been conducting research. For just
as long, however, they have been proceeding with little
awareness of, much less compliance with, the rules of
inference, and without paying heed to the key lessons of the
revolution in empirical analysis that has been taking place
over the last century in other disciplines. The tradition of
including some articles devoted exclusively to the
methodology of empirical analysis -- so well represented in
journals in traditional academic fields -- is virtually
nonexistent in the nation's law reviews. As a result, readers
learn considerably less accurate information about the
empirical world than the studies' stridently stated, but
overconfident, conclusions suggest. To remedy this situation
both for the producers and consumers of empirical work, this
Article adapts the rules of inference used in the natural and
social sciences to the special needs, theories, and data in
legal scholarship, and explicates them with extensive
illustrations from existing research. The Article also offers
suggestions for how the infrastructure of teaching and research
at law schools might be reorganized so that it can better
support the creation of first-rate empirical research without
compromising other important objectives. Published in The University of Chicago Law
Review 69, 1 (Winter, 2002): 1--209, with comments
from six scholars and a response (PDF) by us. (An abbreviated
version of our discussion of infrastructure (PDF) (Section IX, pp. 114-133) is
published in the Journal of Legal
Education, along with comments from five law school
deans.)
- Measuring Total
Health Inequality: Adding Individual Variation to Group-Level
Differences by Emmanuela Gakidou and Gary King, Version:
Published. (Article: PDF) Background:
Studies have revealed large variations in average health status
across social, economic, and other groups. No study
exists on the distribution of the risk of ill-health across
individuals, either within groups or across all people
in a society, and as such a crucial piece of total health
inequality has been overlooked. Some of the reason for this
neglect has been that the risk of death, which forms the basis
for most measures, is impossible to observe directly and
difficult to estimate. Methods: We
develop a measure of total health inequality -
encompassing all inequalities among people in a society,
including variation between and within groups - by adapting a
beta-binomial regression model. We apply it to children under
age two in 50 low- and middle-income countries. Our method has
been adopted by the World Health Organization and is being
implemented in surveys around the world; preliminary estimates
have appeared in the World Health Report (2000). Results: Countries with similar average child
mortality differ considerably in total health inequality.
Liberia and Mozambique have the largest inequalities in child
survival, while Colombia, the Philippines and Kazakhstan have
the lowest levels among the countries measured. Conclusions: Total health inequality estimates
should be routinely reported alongside average levels of health
in populations and groups, as they reveal important
policy-related information not otherwise knowable. This
approach enables meaningful comparisons of inequality across
countries and future analyses of the determinants of
inequality. Published in International
Journal for Equity in Health.
- A Fast, Easy,
and Efficient Estimator for Multiparty Electoral Data by
James Honaker, Jonathan Katz, and Gary King, Version: 10/2001.
(Article: PS, PDF) Katz
and King (1999) develop a model for predicting or explaining
aggregate electoral results in multiparty democracies. This
model is, in principle, analogous to what least squares
regression provides American politics researchers in that
two-party system. Katz and King applied this model to
three-party elections in England and revealed a variety of new
features of incumbency advantage and where each party pulls
support from. Although the mathematics of their statistical
model covers any number of political parties, it is
computationally very demanding, and hence slow and numerically
imprecise, with more than three. The original goal of our work
was to produce an approximate method that works quicker in
practice with many parties without making too many theoretical
compromises. As it turns out, the method we offer here improves
on Katz and King's (in bias, variance, numerical stability, and
computational speed) even when the latter is computationally
feasible. We also offer easy-to-use software (Amelia) that implements our
suggestions. Published in Political
Analysis. You may also be interested in the two
predecessors to this paper, Katz and King's "A Statistical Model for Multiparty
Electoral Data," American Political Science
Review, 93, 1 (March, 1999): 15-32 (Replication Dataset: ICPSR 1190 | Abstract: HTML), winner of the
Pi Sigma Alpha Award, and King, Honaker, Joseph, and
Scheve's "Analyzing Incomplete
Political Science Data," American Political Science
Review, 95, 1 (March, 2001): 49-69 (Abstract: HTML), named
most cited by ISI's Essential Science Indicators. (To draw
ternary diagrams, see the Stata module TRIPLOT,
by Nicholas Cox, or our Gauss
procedures.)
- Proper Nouns
and Methodological Propriety: Pooling Dyads in International
Relations Data, Version: Published. (Article: PDF) This
article is the concluding comment in the symposium on the
analysis of dyadic international conflict data, to be published
in International
Organization. The participants included Donald
Green, Soo Yeon Kim, and David Yoon, and their original paper,
"Dirty Pool," and responses by John Oneal and Bruce Russett,
"Clear and Clean: The Fixed Effects of the Liberal Peace," and
Nathaniel Beck and Jonathan Katz, "Throwing Out the Baby with
the Bath Water: A Comment on Green, Kim, and Yoon."
- Bayesian and
Frequentist Inference for Ecological Inference: The R x C
Case by Ori Rosen, Wenxin Jiang, Gary King, and Martin
A. Tanner, Version: Published. (Article: PDF) In this
paper we propose Bayesian and Frequentist approaches to
ecological inference, based on R x C contingency tables,
including a covariate. The proposed Bayesian model extends the
binomial-beta hierarchical model developed by King, Rosen and
Tanner (1999) from the 2 x 2 case to the R x C case. As in the
2 x 2 case, the inferential procedure employs Markov chain
Monte Carlo (MCMC) methods. As such, the resulting MCMC
analysis is rich but computationally intensive. The Frequentist
approach, based on first moments rather than on the entire
likelihood, provides quick inference via nonlinear
least-squares, while retaining good Frequentist properties. The
two approaches are illustrated with simulated data, as well as
with real data on voting patterns in Weimar Germany. In the
final section of the paper we provide an overview of a range of
alternative inferential approaches that trade off
computational intensity for statistical efficiency.
Published in Statistica
Neerlandica. You may also be interested in Gary
King, Ori Rosen, and Martin Tanner's "Binomial-Beta Hierarchical Models
for Ecological Inference" Sociological Methods and
Research Vol. 28, No. 1 (August, 1999): 61-90 (Abstract:
HTML), which was our
first effort in this line of research, or my book A Solution to the Ecological
Inference Problem: Reconstructing Individual Behavior from
Aggregate Data.
- Rethinking Human
Security by Gary King and Christopher J.L. Murray,
Version: Published. (Article: PDF) In the last two decades, the
international community has begun to conclude that attempts to
ensure the territorial security of nation-states through
military power have failed to improve the human condition.
Despite astronomical levels of military spending, deaths due to
military conflict have not declined. Moreover, even when the
borders of some states are secure from foreign threats, the
people within those states do not necessarily have freedom from
crime, enough food, proper health care, education, or political
freedom. In response to these developments, the international
community has gradually moved to combine economic development
with military security and other basic human rights to form a
new concept of "human security". Unfortunately, by common
assent the concept lacks both a clear definition, consistent
with the aims of the international community, and any agreed
upon measure of it. In this paper, we propose a simple,
rigorous, and measurable definition of human security: the
expected number of years of future life spent outside the state
of "generalized poverty". Generalized poverty occurs when an
individual falls below the threshold in any key domain of human
well-being. We consider improvements in data collection and
methods of forecasting that are necessary to measure human
security and then introduce an agenda for research and action
to enhance human security that follows logically in the areas
of risk assessment, prevention, protection, and compensation.
This paper has appeared in Political
Science Quarterly.
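The definition proposed in the abstract above can be made concrete with a rough discrete-time sketch: an individual is in generalized poverty when any key domain falls below its threshold, and human security is the expected number of future years lived outside that state. The domain names, the yearly time step, and both function names are hypothetical choices for illustration; the paper's own measurement and forecasting procedures are not reproduced here.

```python
from typing import Mapping, Sequence


def in_generalized_poverty(levels: Mapping[str, float],
                           thresholds: Mapping[str, float]) -> bool:
    """True if the individual is below the threshold in ANY key domain."""
    return any(levels[domain] < thresholds[domain] for domain in thresholds)


def expected_years_outside_poverty(survival: Sequence[float],
                                   outside_poverty: Sequence[float]) -> float:
    """Expected future years lived outside generalized poverty.

    survival[t]        : probability of being alive in future year t
    outside_poverty[t] : probability of being outside generalized poverty
                         in year t, given alive in that year
    Each future year contributes survival[t] * outside_poverty[t] years.
    """
    if len(survival) != len(outside_poverty):
        raise ValueError("both sequences must cover the same future years")
    return sum(s * p for s, p in zip(survival, outside_poverty))


# Hypothetical domains and thresholds, for illustration only.
thresholds = {"income": 1.0, "health": 0.5}
secure = in_generalized_poverty({"income": 2.0, "health": 0.9}, thresholds)
insecure = in_generalized_poverty({"income": 2.0, "health": 0.4}, thresholds)

# A two-year horizon with made-up probabilities.
years = expected_years_outside_poverty([1.0, 0.5], [0.5, 1.0])
```

In the example the second individual counts as being in generalized poverty even with adequate income, because the "any domain" rule is triggered by health alone.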
- Logistic Regression in
Rare Events Data by Gary King and Langche Zeng.
(Article: PDF) Rare events are binary dependent
variables with dozens to thousands of times fewer ones (events,
such as wars, vetoes, cases of political activism, or
epidemiological infections) than zeros ("nonevents"). In many
literatures, rare events have proven difficult to explain and
predict, a problem that seems to have at least two sources.
First, popular statistical procedures, such as logistic
regression, can sharply underestimate the probability of rare
events. We recommend corrections that outperform existing
methods and change the estimates of absolute and relative risks
by as much as some estimated effects reported in the
literature. Second, commonly used data collection strategies
are grossly inefficient for rare events data. The fear of
collecting data with too few events has led to data collections
with huge numbers of observations but relatively few, and
poorly measured, explanatory variables, such as in
international conflict data with more than a quarter million
dyads, only a few of which are at war. As it turns out, more
efficient sampling designs exist for making valid inferences,
such as sampling all available events (e.g., wars) and a tiny
fraction of non-events (peace). This enables scholars to save
as much as 99% of their (non-fixed) data collection costs, or
to collect much more meaningful explanatory variables. We
provide methods that link these two results, enabling both
types of corrections to work simultaneously, and software that implements the methods
developed. You may also be interested in the slides used in giving talks on this paper
and part of the previous one. This paper has appeared in
Political Analysis. We
have also written a companion piece to this article for
International
Organization, "Explaining Rare Events in International
Relations" (PDF), which overlaps this one, excludes most
technical material, and includes empirical examples and
pedagogical material. The next paper is related too.
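The abstract above does not spell out its corrections. As a hedged illustration of why sampling all events and only a fraction of nonevents requires an adjustment, the sketch below applies a standard prior correction to a logit intercept, using the population fraction of events (tau) and the sample fraction of events (ybar). Treating this as the paper's recommended estimator is an assumption; the function and parameter names are hypothetical.

```python
import math


def prior_corrected_intercept(intercept: float,
                              sample_event_share: float,
                              population_event_share: float) -> float:
    """Adjust a logit intercept estimated on an event-enriched sample.

    Subtracts log[((1 - tau) / tau) * (ybar / (1 - ybar))], where tau is the
    share of events in the population and ybar the share in the sample.
    Slope coefficients are left unchanged by this correction.
    """
    tau, ybar = population_event_share, sample_event_share
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("event shares must lie strictly between 0 and 1")
    return intercept - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def event_probability(intercept: float, slope: float, x: float) -> float:
    """Logistic probability of an event for a single covariate value x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Made-up numbers: the sample is half events, but events are 1% of the population.
raw_intercept, slope = 0.0, 0.8
corrected = prior_corrected_intercept(raw_intercept, 0.5, 0.01)

naive_p = event_probability(raw_intercept, slope, x=0.0)   # treats sample as population
corrected_p = event_probability(corrected, slope, x=0.0)   # rescaled to the population
```

With these numbers the uncorrected model would report a probability of one half at x = 0, while the corrected intercept brings it back to the one percent population rate.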
- Estimating Risk and Rate
Levels, Ratios, and Differences in Case-Control Studies
by Gary King and Langche Zeng. (Article: PDF) Classic,
or "cumulative," case-control sampling designs do not admit
inferences about quantities of interest other than risk ratios,
and then only by making the rare events assumption.
Probabilities, risk differences, and other quantities cannot be
computed without knowledge of the population incidence
fraction. Similarly, density, or "risk set," case-control
sampling designs do not allow inferences about quantities other
than the rate ratio. Rates, rate differences, cumulative rates,
risks, and other quantities cannot be estimated unless
auxiliary information about the underlying cohort such as the
number of controls in each full risk set is available. Most
scholars who have considered the issue recommend reporting more
than just the relative risks and rates, but auxiliary
population information needed to do this is not usually
available. We address this problem by developing methods that
allow valid inferences about all relevant quantities of
interest from either type of case-control study when completely
ignorant of or only partially knowledgeable about relevant
auxiliary population information. Published in Statistics in
Medicine.
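The role of the rare events assumption mentioned above can be seen in a short numeric sketch. A cumulative case-control sample identifies the odds ratio, which is close to the risk ratio only when the underlying risks are small. The risks used here are made-up values, and the function names are hypothetical; this illustrates the approximation, not the methods developed in the paper.

```python
def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of the two event probabilities."""
    return risk_exposed / risk_unexposed


def odds_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of the two odds p / (1 - p)."""
    odds_exposed = risk_exposed / (1.0 - risk_exposed)
    odds_unexposed = risk_unexposed / (1.0 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Rare events: the odds ratio is nearly the same as the risk ratio of 2.
rare_rr = risk_ratio(0.002, 0.001)
rare_or = odds_ratio(0.002, 0.001)

# Common events: the same risk ratio of 2 corresponds to a clearly larger odds ratio.
common_rr = risk_ratio(0.4, 0.2)
common_or = odds_ratio(0.4, 0.2)
```

The gap between the two measures in the second case is why quantities such as probabilities and risk differences need auxiliary information about the population.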