200 likes | 672 Vues
Causality and Identification in Structural Econometrics. Causality and Probability in the Sciences University of Kent, Canterbury September 10 2008 Damien Fennell CPNSS, London School of Economics. Introducing Econometrics. What is econometrics?
E N D
Causality and Identification in Structural Econometrics Causality and Probability in the Sciences University of Kent, Canterbury September 10 2008 Damien Fennell CPNSS, London School of Economics
Introducing Econometrics What is econometrics? Very crudely, it is a sub-discipline of economics that attempts to use observational economic data to predict, to measure and to identify economic structures. Why is it interesting? (1) Methodology of structural econometrics is philosophically rich; it offers a particular approach to structural equation modelling (2) Econometrics analysis influences policy that influences everyone.
Simultaneous Equation Models Economics makes widespread use of simultaneous equation models to model equilibrium relations e.g. in supply and demand models. As a result, conventional econometrics incorporated simultaneous equation models into structural models in econometrics. So, in contrast to many other forms of structural equation modelling, econometrics centrally relies on non-recursive models. This leads to different methodological issues, I focus on just one of these in this talk: identification.
Avoiding causes In the early days of econometrics (30s & 40s) founders wrote in depth on how structural equation models should be interpreted e.g. Frisch, Haavelmo, Koopmans and Simon. However, as Kevin Hoover notes (2001), under the influence of the then dominant logical positivism, causal talk was frowned upon and econometrics developed structural modelling methods avoiding causal terminology. Hoover proposes two reasons for this: Wold lost the debate with Haavelmo to restrict models that represented causal chains. ‘Simon [1952]showed … that a linear system was identified if and only if it was causally ordered’ I focus on the second reason here: does Simon’s (1952) analysis do this i.e. licence dropping causal talk?
Identification and causal order Herbert Simon’s paper does claim an important equivalence between identification and causal order. But Does it licence an equivalence between identifiability rather than causal order? And I also ask: What does identifiability require of causal order?
Background: Simon’s causal order in brief Herbert Simon defines a causal order for a linear set of equations. STEP 1: Formal definition As the order in which we solve for the variables in a set of equations, when we solve the equations using the fewest equations. STEP 2: Causal semantics Intervention-based interpretation of structural equations.
Simon’s causal order – example Consider the following simple abstract supply and demand model. x1 = a + u1 … (det1) x2 = b + u2 … (det2) q = c + dp + dx1 + u3 … (supply) q = e + fp + gx2 + u4 … (demand) x1 – a determinant of demand, x2 – determinant of supply, p - equilib price, q – equilib quantity, u’s errors, a,b,c...e - parameters Solving for the variables block-recursively gives the causal order: Variable Ordering Equation Ordering {x1} {x2} {det1} {det2} {p, q} {supply, demand}
Simon’s semantics in brief To give substance to this formal ordering, Simon asks us to imagine there is an experimenter (can also be nature) that can control/change certain factors (philosophical position manipulability view, cf Woodward) We are to interpret the structural equations as follows. Formal Term Interpretation equationmechanism parameterdirectly controllable factor variableindirectly controllable factor error termOmitted factors treated ‘as if’ a directly controllable factor
More semantics Interpretation of: Minimal set of equations denotes: a set of mechanisms that together (just) jointly determines the values of a set of indirectly controllable factors (denoted by the minimal set of variables) given directly controlled factors (parameters) and previously (causally precedent) determined indirectly controllable factors (other previously solved variables in the equations). The causal order: x causally precedes y iff x must be determined for y to be and if there is a chain of mechanisms from x to y (represented by a series of equations used when solving for y using x).
The Identification problem In econometrics one assumes a general structural model relating variables (some observable) and attempts to infer the values of structural parameters by regressing on the data of observable variables. However, if the structural model is too general then the data may not be sufficient to infer a unique structure The identification problem.
Identification problem in simult. equation models Assume form of the equations correctly represents the ‘true’ causal model, and all variables are observable, but parameters are unknown. How to infer them from data? Not always possible Simple Supply-Demand Model (cf. Supply – Demand situation faced by early econometricians, Morgan 1990) Price, p Demand1 Demand2 Supply1 Supply2 Mistaken Regression Line (q1, p1) (q2, p2) Quantity, q Figure 1 – The Identification Problem – Causes change in both mechanisms
Simon: Identification and Causal Order In his paper, Simon proves the following: ‘ In order to permit the determination of coefficients of an identifiable complete subset of equations we need to relax at most the equations that are [causally] precedent to this subset’ (p. 33) Interpreted using his semantics this means: An experimenter can infer the structural parameters in an identifiable complete subset of equations by ‘relaxing’ the causally precedent mechanisms to create observations constrained only by the mechanisms in the set to be identified.
Price, p Demand1 Demand2 Supply1/Regression Line (q2, p2) (q1, p1) Quantity, q Figure 2 – No Identification Problem – Income changes for Demand mechanism Example1: Identifiable supply and demand model Model: Causal Order: {i} {p,q} {income} {supply, demand} The experimenter varies income to ‘relax’ the demand equation. The changes in income causes changes in the demand mechanism. The resulting observations are only constrained by the supply mechanism, allowing it to be identified.
The ‘experiment’ to identify supply {i} {p,q} {income} {supply, demand} Price, p Supply/Regression Line Demand Quantity, q
Example 2: Unidentifiable Supply and Demand Price, p Demand Supply Mistaken regression line Quantity, q Figure 1 – The Identification Problem – change in both mechanisms Problem: Here varying income causes shifts in both mechanisms: There is no cause allowing one to relax one mechanism but not the other.
An extension In earlier work I prove: A mechanism is identifiable if and only if for any two factors in that mechanism, x and y, x and y can be varied while holding all other factors in the mechanism fixed. In turn this holds if and only if x or y has a cause, z , that is a directly controllable factor such that. z does not cause any other factor in the mechanism. OR z does cause some of the other factors in the mechanism, but it is possible to change it to vary either x and/or y while not changing any other factor in that mechanism (this may require changing some other directly controllable factors to cancel out the impact of z on other factors in the mechanism).
What this means Shows that identifiability puts limits on how causally ‘connected’ the system is i.e. Examples above: In the first income example, supply equation identifiable because income only impacts demand. Allows identification of supply equation. In second case income factor is too ‘connected’, it impacts both mechanisms. As a result it is not possible to vary quantity and price as the only two factors in either the demand or supply mechanism, so these cannot be identified.
Return to my earlier question: Have attempted to clarify what identifiability requires of a set of structural equations (interpreted using Simon’s semantics) Does this rationalise (following Hoover) econometricians focusing on identification in place of causal order? From analysis of identification above, nothing suggests this. Simon’s identification result does not show equivalence of identification and causal order. Unsurprising perhaps! Easy to see non-equivalence of concepts. NOT NECESSARY: causal order intelligible in non-identifiable systems NOT SUFFICIENT: Can have spurious but identifiable equations.
Another perspective - Identification in its conventional form The Rank condition Given a system of m linear equations in m endogenous variables and n exogenous variables in which all variables are observable. A necessary and sufficient condition for the coefficients of an equation to be identified is that the submatrix formed from the columns of the coefficients of the variables (endogenous and exogenous) omitted from that equation has rank m-1. Satisfaction of rank condition: exclusions of variables from equations – but on what basis? Need causal content here to justify exclusions. Restriction to mathematics obscures this. Identification isn’t enough!
So what work does identification do? Identification in structural equations permits one to infer certain structural parameters How? Because when we assert identifiability, we assert strong constraints on the causal structure generating the observations. These strong constraints allow us to ‘reconstruct’ the impact of the different variables on each other even when we cannot perform experiments on them. It is condition on the causal structure that permits inference. BUT this does not license replacing causal order with it!