Data, Models and the Search for Exchangeability


Presentation Transcript


  1. Data, Models and the Search for Exchangeability Mark Hopkins, Department of Economics. Math Department Colloquium, Gettysburg College, April 14, 2005

  2. “Torture the data, and they will confess…” • Theory: • Is data mining a dirty word? • Statistics vs. econometrics and the role of the ex ante theory • Information extraction amounts to a conditioning problem • Conditioning: bias vs. variance, or a search for exchangeability…? • Propagating “model uncertainty” into our parameter estimates • Using new Bayesian statistical methods in econometrics • What do economists have to learn from statisticians? • Application: • Why do some countries become rich faster than others?

  3. Preliminaries: Recalling Bayes’ Rule • Bayes’ Rule tells us how we can update our beliefs (about event A) given some data (knowledge that event B happened) • Example: What is the probability that Saddam had weapons of mass destruction (WMD), given that none have been found (NF)? • The answer depends both on the “strength of the data” p(NF|WMD) and one’s own (subjective) prior beliefs about p(WMD) • The statistician's job is (should be) to help you update your own personal beliefs… all truth is “subjective” in a Bayesian world
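Written out for reference, Bayes' Rule applied to the slide's example takes the following form (the notation simply mirrors the slide's WMD/NF labels):

```latex
% Posterior belief in WMD after observing that none were found (NF)
P(\mathrm{WMD} \mid \mathrm{NF})
  = \frac{P(\mathrm{NF} \mid \mathrm{WMD})\, P(\mathrm{WMD})}
         {P(\mathrm{NF} \mid \mathrm{WMD})\, P(\mathrm{WMD})
          + P(\mathrm{NF} \mid \neg\mathrm{WMD})\, P(\neg\mathrm{WMD})}
```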

  4. Prior beliefs modify our view of the information contained in “data”
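The point of this slide can be made numerically. A minimal sketch, with purely hypothetical probabilities, of how two observers holding different priors update on the same evidence in the WMD example:

```python
# Illustrative only: the probabilities below are invented to show how the
# same evidence (NF = "no weapons found") is read differently under
# different prior beliefs about WMD.

def posterior(prior_wmd, p_nf_given_wmd, p_nf_given_no_wmd):
    """Posterior P(WMD | NF) via Bayes' Rule."""
    numerator = p_nf_given_wmd * prior_wmd
    denominator = numerator + p_nf_given_no_wmd * (1.0 - prior_wmd)
    return numerator / denominator

# Same "strength of the data" for both observers:
# P(NF | WMD) = 0.2 and P(NF | no WMD) = 0.95 (hypothetical values).
for prior in (0.9, 0.3):  # a strong believer vs. a skeptic
    post = posterior(prior, 0.2, 0.95)
    print(f"prior P(WMD) = {prior:.1f} -> posterior P(WMD | NF) = {post:.2f}")
```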

  5. Statistical Inference: A Review • The goal: observe the world (gather data, D) and then draw conclusions and/or make predictions • This requires a theory (or model, M) to organize relationships • Mathematics (Probability Theory) • A statistical model is simply a probability distribution, p(D|M), where M = {θ, A} consists of • A set of structural assumptions (A), and • some vector (θ) parameterizing the probability distribution. This usually represents the "question of interest": e.g. θ = {μ, σ²} • Statistical inference: • "Drawing conclusions" refers to p(θ|D,A) • "Making predictions" refers to p(Dnew|θ,A)

  6. Estimating p(θ|D,A): Two Practical (& Related) Problems #1: Inference about θ is conditional on model assumptions • In practice, we don't know the true structural assumptions (A) • What do we know? Bayes Rule: p(M|D) ∝ p(D|M)p(M) • Hypothesis testing can reject a model, but it can neither confirm it nor tell you the correct alternative! • Statistics vs. econometrics: what role does the prior p(M) play? • Traditional statistics recognizes uncertainty about θ but not A. • Result: run a specification search for A, but pretend you didn't! #2: What if data are not drawn from the same distribution? • Inference about θ is based on averaging repeated draws • A fundamental statistical issue: "We are each a population of 1!" • A methodological guide: conditional exchangeability

  7. The Conditioning Problem: A Familiar Example • Data D = {X,Y}; we want to know the "effect of X on Y" • We are interested in the regression (or C.E.F.): E[Y|X] • Define the residual or "error" as: ε ≡ Y – E[Y|X] • Familiar Linear Example: model M is E[Y|X] = β0 + β1X • so Y = β0 + β1X + ε • Estimation / inference: • Estimation: find {β0, β1} that minimize some loss function L(ε) • Inference: conditional on our information set, the residuals ε must be exchangeable
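For the linear example, a minimal sketch of the estimation step under squared-error loss; the simulated data and the coefficient values (1.0 and 2.0) are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for the linear example Y = b0 + b1*X + e
# (the true values 1.0 and 2.0 are arbitrary illustration choices).
n = 200
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(scale=0.5, size=n)

# Estimation: choose {b0, b1} to minimize the squared-error loss sum(e^2).
design = np.column_stack([np.ones(n), X])
b_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)

# Inference rests on the residuals e = Y - E[Y|X] being exchangeable
# once we have conditioned on X.
residuals = Y - design @ b_hat
print("estimates (b0, b1):", b_hat)
print("residual mean:", residuals.mean())
```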

  8. The Benefits of Using the Bayesian Approach of "Exchangeability" • Classical (Frequentist) "i.i.d." vs. Bayesian "exchangeability" • A foundation for statistical inference on population data • De Finetti's Representation Theorem states… • If a sample {X1, X2,…,Xn} is a subset of an infinite exchangeable sequence {X}, then it is "as if" p(D|θ,A) exists, where θ ~ p(θ) • Clarifies the goal of the conditioning / model search process • We are trying to achieve "anonymity" of regression residuals • Clarifies the relationship between model search and prediction • What is the basis for using the past to make predictions of the future? … when the past and future are part of an exchangeable sequence!
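The formal statement behind the "as if" step is, roughly, that the joint distribution of an infinitely exchangeable sequence is a mixture of i.i.d. models:

```latex
% de Finetti's representation: exchangeable observations behave "as if"
% they are i.i.d. draws given some parameter theta with prior p(theta)
p(x_1, \ldots, x_n) \;=\; \int \prod_{i=1}^{n} p(x_i \mid \theta)\, p(\theta)\, d\theta
```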

  9. Example of a Conditioning Problem: The Sources of Economic Growth • Why have some countries grown richer faster than others? • Data (D): growth rates (g) & assorted country characteristics (X) • Observations are countries (n ≈ 100) • Ex ante theory: The Solow Model of Capital Accumulation • The Problem: What about other variables that may affect g? • Omitted variable bias & "robustness" problems • D.o.F. problem: # Theories > # Observations … (plus multicollinearity!) • Specifying functional forms for variables like democracy, ethnic diversity • Population heterogeneity… Are France, Taiwan, and Sudan really all "draws from the same distribution"? Inference about σ²…?

  10. Exchangeability in Cross-Country Growth Regressions • Inference requires conditional exchangeability • France, Taiwan, and Sudan are not exchangeable, but can we find an appropriate vector X such that the residuals ε ≡ g – E[g|X] are exchangeable? • Conditioning just boils down to a problem of model selection! • The classical approach to model selection is "hypothesis testing" • However, the D.o.F. problem has led to upward "specification search"! • In summary: • Two types of uncertainty: sampling (variance), model (bias) • Model selection usually involves an artful trade-off of bias vs. variance • However, classical methods do not propagate our model uncertainty into coefficient estimates • Can Bayesian statistics help us bring science to the art of selection?

  11. The Growth Literature, Take 1: OLS estimates w/ controls & dummies
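A sketch of what a "Take 1" regression of this kind looks like in code. The variable names (growth, lny60, inv, school, region) and the data file are hypothetical stand-ins, not the specification actually estimated in the talk:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical variable names: 'growth' is average GDP growth, 'lny60' log
# initial income, 'inv' investment share, 'school' schooling, 'region' a
# categorical variable expanded into regional dummies.
df = pd.read_csv("growth_data.csv")  # placeholder file name

# OLS with controls and regional dummies, in the spirit of "Take 1".
model = smf.ols("growth ~ lny60 + inv + school + C(region)", data=df)
result = model.fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
print(result.summary())
```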

  12. The Growth Literature, Take 2: "Explaining" Parameter Heterogeneity • Tree Regressions • Local Linear Regressions (Spline models) • Varying Coefficient / Hierarchical Models

  13. A Tree Regression [Regression-tree diagram: successive splits on variables including s60, EQINV, NONEQINV, laam, DEMOC65, FRAC, and lny60, with predicted growth rates (roughly −0.007 to 0.053) at the terminal nodes]
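A sketch of how such a regression tree could be fit. The column names echo the split variables shown in the diagram, but the data file, tuning parameters, and software are stand-ins rather than the ones used in the talk:

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical column names echoing the slide's split variables
# (s60 = schooling in 1960, EQINV = equipment investment, FRAC = ethnic
# fractionalization, lny60 = log initial income, etc.).
df = pd.read_csv("growth_data.csv")  # placeholder file name
features = ["s60", "EQINV", "NONEQINV", "laam", "DEMOC65", "FRAC", "lny60"]

tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=5)
tree.fit(df[features], df["growth"])

# Print the fitted splits, analogous to the diagram on the slide.
print(export_text(tree, feature_names=features))
```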

  14. An Additive Spline Model: Investment

  15. An Additive Spline Model: Schooling

  16. An Additive Spline Model: Population Growth

  17. Using splines to reveal non-linearities: Solow + s(FRAC)
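One way to sketch the "Solow + s(FRAC)" idea in code is to keep the Solow variables linear and give FRAC a flexible spline basis. The column names and data file are hypothetical, and this uses scikit-learn's SplineTransformer (scikit-learn 1.0+) rather than whatever spline/GAM software produced the slides:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

# Hypothetical column names: linear terms for the Solow variables plus a
# flexible cubic-spline basis in FRAC (ethnic fractionalization).
df = pd.read_csv("growth_data.csv")  # placeholder file name
solow_vars = ["lny60", "inv", "school", "popgrowth"]

preprocess = ColumnTransformer(
    [("spline_frac", SplineTransformer(degree=3, n_knots=5), ["FRAC"])],
    remainder="passthrough",  # the Solow variables enter linearly
)
model = make_pipeline(preprocess, LinearRegression())
model.fit(df[solow_vars + ["FRAC"]], df["growth"])
print("R^2:", model.score(df[solow_vars + ["FRAC"]], df["growth"]))
```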

  18. Does democracy modify effects of investment and schooling?

  19. A Varying Coefficient Model
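The exact varying-coefficient specification is not given in the transcript. One simple sketch of the idea, in which the investment and schooling coefficients are allowed to vary linearly with democracy through interaction terms (hypothetical column names again):

```python
import pandas as pd
import statsmodels.formula.api as smf

# With the interactions below, the investment effect is (b_inv + b_inv_dem *
# DEMOC65), i.e. a coefficient that varies with democracy. The talk's model
# may instead be specified hierarchically; this is only one simple version.
df = pd.read_csv("growth_data.csv")  # placeholder file name
formula = "growth ~ lny60 + inv * DEMOC65 + school * DEMOC65"
result = smf.ols(formula, data=df).fit()
print(result.params)
```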

  20. Specification Searches A specification search is a search for the mode of p(M|D)… • Bayes Rule: p(M|D) ∝ p(D|M)p(M) • Problem #1: How strong is your prior belief about M? • Problem #2: Can you characterize your prior beliefs? • Problem #3: Using the same data to find M and to estimate θ? • Danger! Why? • Problem #4: By conditioning on a single M [not on p(M)], you are understating uncertainty about coefficient estimates!

  21. Bayesian Model Averaging (BMA) • An alternative to trying to find the single best model (i.e., the mode of p(M|D)) is to consider the entire distribution of specifications… • Suppose you assign probability p(Ak) to each of K specifications; then p(θ|D) = Σk p(θ|D,Ak) p(Ak|D) • Averaging over model space improves statistical inference • Coefficient estimates tend to have better predictive ability • Standard errors reflect model, as well as parametric, uncertainty
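A minimal sketch of the averaging step, assuming equal prior probability on each candidate specification and using the common exp(−BIC/2) approximation to relative model probabilities; the variable names and data file are hypothetical:

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data and variable names; equal model priors assumed, and
# exp(-BIC/2) used as the usual rough approximation to p(D|Ak).
df = pd.read_csv("growth_data.csv")  # placeholder file name
y = df["growth"]
candidates = ["lny60", "inv", "school", "FRAC", "DEMOC65"]

# Fit every non-empty subset of the candidate regressors.
models = []
for k in range(1, len(candidates) + 1):
    for subset in itertools.combinations(candidates, k):
        X = sm.add_constant(df[list(subset)])
        models.append((subset, sm.OLS(y, X).fit()))

# Convert BICs into (approximate) posterior model probabilities.
bics = np.array([fit.bic for _, fit in models])
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()

# Posterior-mean coefficient for one variable, averaging over all models
# (the coefficient is treated as 0 in models that exclude the variable).
var = "inv"
post_mean = sum(w * fit.params.get(var, 0.0)
                for w, (_, fit) in zip(weights, models))
print(f"BMA posterior mean for {var}: {post_mean:.4f}")
```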

  22. Some nasty theoretical details • Choosing the space of models and model priors • Managing the summation in BMA can be tricky… with 12 possible covariates, there are 2¹² = 4,096 different models to combine! • "Occam's Window" suggested by Raftery (1994): eliminate larger and/or less probable models • MC3 techniques transit across model space. Compute p(θ,A|D) from p(θ|D,A) and p(A|D) • Computing the integral p(D|A) = ∫ p(D|θ,A) p(θ|A) dθ • This is done directly in MC3 techniques for BMA, otherwise… • Can approximate using p(D|θMLE,A)
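The last bullet refers to the usual Laplace/BIC-style approximation of that integral; in log form, with k parameters and n observations, it is roughly:

```latex
% Laplace / BIC-style approximation to the marginal likelihood of model A
\log p(D \mid A) \;\approx\; \log p(D \mid \hat{\theta}_{\mathrm{MLE}}, A) \;-\; \frac{k}{2}\log n
```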

  23. Bayesian Model Selection Results

  24. Bayesian Model Averaging Results

  25. Conclusions • Standard statistical inference is conditional on the chosen model • A data-driven model search is usually an unavoidable fact of life • Model must include appropriate vector of controls (bias vs. variance) • Model should address parameter heterogeneity and functional form • A methodological guide for conditioning is exchangeability • Of course, the very fact that we are searching for a model means we are really less certain about our estimates than we are stating… • BMA techniques help to "propagate model uncertainty" into coefficient estimates and standard errors
