Three Basic Principles of Social Science Research

Three Basic Principles of Social Science Research. Yu Xie University of Michigan. Conceptual versus Technical Knowledge. Technical knowledge is important once you understand how to conduct empirical research.

Presentation Transcript


  1. Three Basic Principles of Social Science Research Yu Xie University of Michigan

  2. Conceptual versus Technical Knowledge • Technical knowledge is important once you understand how to conduct empirical research. • More often than not, sociologists don’t know how to conceptualize a research problem that is empirically testable. • The most difficult, and the most important, part of methodological training in sociology is conceptual rather than technical. • Be a thinker, not merely a technician.

  3. Inspired by Otis Dudley Duncan • “But sociology is not like physics. Nothing but physics is like physics, because any understanding of the world that is like the physicist’s understanding becomes part of physics…” • (Otis Dudley Duncan. 1984. Notes on Social Measurement. p.169)

  4. Definition of Terms • By “social science research,” I mean quantitative social science research. • By “basic principles,” I mean general concepts that can be used in actual research, not generalizations from research.

  5. Lessons from the History of Science • This field is mostly dominated by the history of physical science. • Plato has a long-lasting influence in science and western philosophy in general. • “The safest general characterization of the European philosophical tradition is that it consists in a series of footnotes to Plato.” (Whitehead) (Mayr 1982, p.38)

  6. What Made Plato so Important in the History of Science? • The separation between the “world of being” and the “world of becoming.” • The scientist’s (philosopher’s) task is to go beyond observables (world of becoming) to gain understanding of the world of being. [I.e., need for abstract thinking] • True knowledge lies in universal, unchanging laws, not in concrete objects. • Laws are assumed to exist, created rationally by the Creator. Thus the word “discovery.” This is the teleological aspect of science.

  7. Plato’s Typological Thinking • Plato’s account of variation: poor replicas of the world of being. • The world of being consists of discontinuous, abstract ideas (or forms). • Great success story of following Plato’s typological thinking in physics. • It also resolved the potential conflict between science and religion (sufficient, physical, or immediate causes versus “final causes”). • Examples: Copernicus, Galileo, and Newton.

  8. Deviations • According to typological thinking, deviations are nothing more than undesirable aberrations. • We attain true knowledge after getting rid of apparent deviations through abstract thinking. • Bernoulli’s law of large numbers and Laplace’s central limit theorem provided the mathematical solution to the measurement of uncertainty. • Remove uncertainty through repeated observations and assess uncertainty through a probability (normal) distribution. • Deviation is “error,” undesirable but manageable with repeated observations (due to “statistical compensation”)
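The idea of "statistical compensation" on the Deviations slide can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the true value, the error standard deviation, and the function names are all hypothetical. It shows that the spread of the sample mean shrinks roughly as 1/sqrt(n) when the same quantity is measured repeatedly with independent error.

```python
import random
import statistics


def mean_of_measurements(true_value, error_sd, n, rng):
    """Average n noisy measurements of the same true value."""
    return statistics.fmean([true_value + rng.gauss(0, error_sd) for _ in range(n)])


def spread_of_means(n, replications=500, true_value=10.0, error_sd=2.0, seed=0):
    """Standard deviation of the sample mean across repeated experiments."""
    rng = random.Random(seed)
    means = [
        mean_of_measurements(true_value, error_sd, n, rng)
        for _ in range(replications)
    ]
    return statistics.stdev(means)


# Hypothetical settings: error_sd = 2, so theory predicts 2/sqrt(4) = 1.0
# for n = 4 and 2/sqrt(400) = 0.1 for n = 400.
spread_small = spread_of_means(n=4)
spread_large = spread_of_means(n=400)
print(f"n=4:   spread of the mean = {spread_small:.3f}")
print(f"n=400: spread of the mean = {spread_large:.3f}")
```

Under typological thinking the shrinking spread is the whole point: the deviations are error, and more observations bring the mean closer to the "true" value.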

  9. Difficulties in Social Science and Quetelet’s Solution • Plato’s typological thinking never worked well for the study of humans. • There is simply too much variation and uncertainty. • Measurement theory provided a possible solution: attaining reliable measurements in the social world. • Quetelet's social physics was premised on the “average man,” which seems to satisfy Plato’s criteria as “true knowledge.”

  10. Quetelet’s Social Physics • Measurement theory, when applied to social phenomena, became the “law of accidental causes” because they also follow normal distributions. • “The law of accidental causes is a general law that applies to individuals as well as peoples and that rules our moral and intellectual qualities no less than our physical qualities.”(Kruger 1987, p.76) • He paid attention to variations in averages, such as by nation, location, age, and race. • Regularities in averages => constant causes => laws. • Moral standard: “The average man…would represent all that is great, good, or beautiful.”(Stigler, 1986, p.171)

  11. Darwin’s Population Thinking • Variation is reality, not undesirable error on the part of the observer. • What is important is the individual, not the type. • Offspring of the same parents are different from each other. • Variation is inheritable from generation to generation. • Variation is fundamental to natural selection: abundant genetic variation is produced in every generation, but only relatively few individuals survive and reproduce.

  12. Population Thinking and Statistics • In typological thinking, deviations from the mean are nothing but “errors,” with the mean approaching the true cause. (Example: measurement of the speed of sound.) • In population thinking, deviations are the reality of substantive importance; the mean is a property of a population. • Distinction between “mean” and “average” by Jevons, and that between “mean of observations” and “mean of statistics” by Edgeworth. (Duncan, 1984, p. 108)

  13. Galton and Social Science • Francis Galton, Darwin’s cousin, introduced population thinking to social science. • To Galton, the value of averages is limited. “Individual differences… were almost the only thing of interest.”(Hilts, 1973, p.221) • Thus, Quetelet’s social physics is of little value. • Scientific inquiry should focus on variations and covariations.

  14. What is Unique about Variability in Social Science? • More variability, since the unit of analysis is not the individual, but an individual’s act at a given time. • Variability in human behavior does not necessarily have a physical agent and is (largely) not inheritable. • Humans can and do change surrounding conditions that affect them. • “Men make their own history, but they do not make it as they please” -- Karl Marx. • Humans are rational in the sense that they may base behaviors on anticipated consequences. • Past events, even those due to chance, affect future events (path-dependency).

  15. First Principle • Variability is the very essence of social science research. • “Variability Principle.”

  16. Second Principle • Social grouping reduces such variability. • “Social Grouping Principle.”

  17. What is a Social Group? • I do not take a stand between a nominalist view and a realist view. • Social grouping is meaningful only in terms of a social outcome. • Thus, social grouping may have different meanings when applied to different social outcomes. • Social grouping reduces variability in a social outcome. The greater the reduction, the more significant the social grouping. • There is always within-group variation -- variability not explained by social grouping. • There is a tradeoff between parsimony (of social grouping) and accuracy (reduced variability): a more detailed grouping scheme results in a larger reduction of variability.
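The tradeoff between parsimony and accuracy can be made concrete with a sum-of-squares decomposition. The sketch below is only an illustration: the outcome values and both grouping schemes are invented, and `within_group_share` is a hypothetical helper name. It computes the share of total variability that remains within groups under a coarse and a finer (nested) grouping.

```python
import statistics
from collections import defaultdict


def within_group_share(outcomes, labels):
    """Fraction of the total sum of squares left unexplained within groups."""
    grand_mean = statistics.fmean(outcomes)
    total_ss = sum((y - grand_mean) ** 2 for y in outcomes)
    groups = defaultdict(list)
    for y, label in zip(outcomes, labels):
        groups[label].append(y)
    within_ss = sum(
        sum((y - statistics.fmean(ys)) ** 2 for y in ys)
        for ys in groups.values()
    )
    return within_ss / total_ss


# Hypothetical outcome (say, income in thousands) for eight individuals.
income = [20, 24, 30, 34, 50, 54, 70, 78]
# Coarse scheme: two groups. Fine scheme: four groups nested in the coarse one.
coarse = ["low", "low", "low", "low", "high", "high", "high", "high"]
fine = ["a", "a", "b", "b", "c", "c", "d", "d"]

share_coarse = within_group_share(income, coarse)
share_fine = within_group_share(income, fine)
print(f"within-group share, 2 groups: {share_coarse:.3f}")
print(f"within-group share, 4 groups: {share_fine:.3f}")
```

The finer scheme leaves less within-group variability but uses twice as many categories, which is the parsimony cost the slide refers to.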

  18. Third Principle • Patterns of population variability vary with social context, which is often defined by time and space. • “Social Context Principle”

  19. Different “Regimes” of Variability • Social contexts are different from social groups in that the former are self-contained social systems with natural boundaries, for example by time and space. • Patterns of individual variability may be governed by “relationships” between individuals that are not reducible to individuals’ attributes. • Patterns of individual variability may be governed by macro-level conditions such as “social structure,” “political structure,” or “culture,” which may be discontinuous and fixed. • Collective action may lead to changes in macro-level conditions and human relationships, which are major sources of social change. [Premise of Marxism.]

  20. A Detailed Look: Implications for Regression Analysis with Survey Data • Setup: • A population with N individuals. • There is an outcome of interest, say Y, that is measured on the real line. • There is an independent variable of interest, say D. For simplicity, let us assume that D is a binary “treatment,” D=1 (T), D=0 (C). This is the simplest case. Let us call it the “canonical case.”

  21. Canonical Case Examined • What is the causal effect of treatment D? • It is the counterfactual effect for the ith individual: $Y_i^T - Y_i^C$. However, we observe either $Y_i^T$ (when $D_i = 1$) or $Y_i^C$ (when $D_i = 0$), never both. • Conclusion: it is not possible to identify the individual-level causal effect without assumptions.

  22. At Another Extreme • If we impose the strong, unrealistic assumption that all individuals are homogeneous (an assumption often made in physical science), then $Y_i^T = Y^T$ and $Y_i^C = Y^C$. We then need only two observations to identify the causal effect: $Y^T$ when D=1 and $Y^C$ when D=0. • Implication: it is population variability that makes “scientific sampling” necessary.

  23. Now Consider the Usual Case • The population is divided into two subpopulations: $P_1$ if $D_i = 1$, $P_0$ if $D_i = 0$. • Use the following notation: • $q$ = proportion of $P_0$ in $P$ • $E(Y_1^T) = E(Y^T \mid D=1)$, $E(Y_1^C) = E(Y^C \mid D=1)$ • $E(Y_0^T) = E(Y^T \mid D=0)$, $E(Y_0^C) = E(Y^C \mid D=0)$ • By the total expectation rule: • $E(Y^T - Y^C) = E(Y_1^T - Y_1^C)(1-q) + E(Y_0^T - Y_0^C)\,q = E(Y_1^T - Y_0^C) - E(Y_1^C - Y_0^C) - (d_1 - d_0)\,q$, where $d_1 = E(Y_1^T - Y_1^C)$ and $d_0 = E(Y_0^T - Y_0^C)$. • Or: • $E(Y_1^T - Y_0^C) = E(Y^T - Y^C) + E(Y_1^C - Y_0^C) + (d_1 - d_0)\,q$.

  24. In Other Words • The “simple” estimator $E(Y_1^T - Y_0^C)$ contains two sources of bias: • The average difference between $P_1$ and $P_0$ in the absence of treatment (“heterogeneity bias”). • The difference in the average treatment effect between $P_1$ and $P_0$ (“endogeneity bias”). • Both sources of bias average to zero under randomized assignment.
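The decomposition of the simple estimator into the average treatment effect plus the two bias terms can be checked on a toy population in which both potential outcomes are known. This is a sketch under invented numbers: the seven individuals and their potential outcomes are hypothetical, and in real data only one of the two outcomes is ever observed for each person.

```python
from statistics import fmean

# Hypothetical finite population: (D, Y^T, Y^C) for each individual.
# Both potential outcomes are listed only because this is a toy example.
population = [
    (1, 12.0, 8.0),
    (1, 15.0, 9.0),
    (1, 11.0, 7.0),
    (0, 6.0, 5.0),
    (0, 7.0, 4.0),
    (0, 5.0, 3.0),
    (0, 8.0, 6.0),
]

p1 = [(yt, yc) for d, yt, yc in population if d == 1]  # treated subpopulation
p0 = [(yt, yc) for d, yt, yc in population if d == 0]  # control subpopulation
q = len(p0) / len(population)                          # proportion of P0 in P

# Average treatment effect in the whole population: E(Y^T - Y^C).
ate = fmean([yt - yc for _, yt, yc in population])

# "Simple" estimator: treated mean of Y^T minus control mean of Y^C.
simple = fmean([yt for yt, _ in p1]) - fmean([yc for _, yc in p0])

# Heterogeneity bias: difference between P1 and P0 without treatment.
heterogeneity_bias = fmean([yc for _, yc in p1]) - fmean([yc for _, yc in p0])

# Endogeneity bias: (d1 - d0) * q, the gap in average treatment effects.
d1 = fmean([yt - yc for yt, yc in p1])
d0 = fmean([yt - yc for yt, yc in p0])
endogeneity_bias = (d1 - d0) * q

print(f"simple estimator      = {simple:.4f}")
print(f"ATE + both bias terms = {ate + heterogeneity_bias + endogeneity_bias:.4f}")
```

In this toy population the treated group is better off even without treatment and also benefits more from it, so the simple estimator overstates the average effect on both counts.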

  25. In Regression Language • $Y_i = a + d_i D_i + e_i$ • Two types of variability: • Heterogeneity bias: $e_i$. If $\mathrm{corr}(e, D) \neq 0$, there is heterogeneity bias. • Endogeneity bias: $d_i$. If $\mathrm{corr}(d, D) \neq 0$, there is endogeneity bias.

  26. Comments • Comment 1: D is a random variable containing heterogeneity. • Comment 2: Heterogeneity bias may result from “omitted variable biases.” • Comment 3: Endogeneity bias may result from rational “anticipatory behavior.” • Comment 4: Endogeneity means that variability in Y could be enhanced (or reduced) by treatment D. • Comment 5: This model is not estimable. It needs to be “constrained” (by assumptions). • Common assumptions: • $\mathrm{corr}(e, D) = 0$ • $d_i = d$.

  27. A General Lesson • In reviewing Manski's book, I stated (1996, AJS):“When observed data are thin, it takes strong assumptions to yield sharp results. There is no free information in statistics. Either you collect it, or you assume it.”

  28. Using Social Grouping to Control for Heterogeneity • Assuming no endogeneity bias (which is more difficult to handle). • Social grouping always reduces variability, which implies more within-group homogeneity. • We may assume that meaningful heterogeneity can be captured by social grouping. • Assumption of conditional independence: $e \perp D \mid X$ • Change the regression to: $Y_i = a + d D_i + b' X_i + e_i$ • Estimable (via OLS or ML)
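To see how conditioning on a grouping variable X removes heterogeneity bias, consider the following simulation. It is a sketch under stated assumptions, none of which come from the slides: X is a single binary grouping variable, the coefficients (a = 1, d = 2, b = 3) and the treatment probabilities are invented, and the treatment effect is homogeneous. The adjusted coefficient is obtained by partialling X out of both Y and D, which by the Frisch-Waugh-Lovell theorem equals the OLS coefficient on D in the regression of Y on D and X.

```python
import random
from statistics import fmean


def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, x):
    """Residuals of y after a simple regression on x (with intercept)."""
    b = slope(x, y)
    a = fmean(y) - b * fmean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


rng = random.Random(0)
n = 20000
# Hypothetical data-generating process: X raises both the chance of
# treatment and the outcome, so it confounds the D-Y relationship.
X = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
D = [1 if rng.random() < (0.8 if x else 0.2) else 0 for x in X]
Y = [1.0 + 2.0 * d + 3.0 * x + rng.gauss(0, 1) for d, x in zip(D, X)]

# Naive estimate: regress Y on D alone (difference in group means).
naive = slope(D, Y)
# Adjusted estimate: coefficient on D controlling for X, via partialling out.
adjusted = slope(residualize(D, X), residualize(Y, X))

print(f"naive estimate of d:    {naive:.3f}")
print(f"adjusted estimate of d: {adjusted:.3f}")
```

The adjustment works here only because the simulated error is independent of D given X by construction; with real survey data that conditional independence is an assumption, not something the regression can verify.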

  29. Comments • Comment 1: For X to do this, it needs to be correlated with D (“correlation condition”) and to affect Y (“relevance condition”). • Comment 2: X should be pre-treatment, determining both D and Y structurally. • Comment 3: There are other research designs: • Propensity score. • Instrumental variable estimation. • Fixed effects model. • Heckman-type selection models (which also handle endogeneity bias). • Quasi-experiments. • They all depend on strong, untestable assumptions.

  30. Accounting for Heterogeneous Responses • More difficult to handle, a degree of freedom problem. • Possible with nested data, assuming that patterns of relationships are homogeneous (or following a distribution) within social contexts (by time or space). • $d_k$ is allowed to vary across social contexts k (k = 1, …, K), but is fixed within k. For example: • $Y_{ik} = a_k + d_k D_{ik} + e_{ik}$ • $a_k = l + f z_k + m_k$ • $d_k = g_1 + s z_k + n_k$ • Application of the Social Context Principle.
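The nested specification can be illustrated with a simplified two-step ("slopes as outcomes") procedure: estimate $d_k$ separately within each context, then regress the estimates on the context-level variable $z_k$. This is a sketch, not a full multi-level estimator, and every numeric value below is hypothetical (the number of contexts, the coefficients, the error scales). Treatment is alternated by index within each context so that it is unrelated to the errors.

```python
import random
from statistics import fmean


def slope_and_intercept(x, y):
    """OLS slope and intercept from a simple regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return b, my - b * mx


rng = random.Random(1)
K, n_per_context = 200, 200   # hypothetical number and size of contexts
g1, s = 1.0, 0.5              # hypothetical values for d_k = g1 + s*z_k + n_k

z, d_hat = [], []
for k in range(K):
    z_k = rng.uniform(-1, 1)
    a_k = 2.0 + 1.0 * z_k + rng.gauss(0, 0.5)   # a_k = l + f*z_k + m_k
    d_k = g1 + s * z_k + rng.gauss(0, 0.2)      # d_k = g1 + s*z_k + n_k
    treated, control = [], []
    for i in range(n_per_context):
        D_ik = i % 2  # alternate treatment: unrelated to the error term
        y_ik = a_k + d_k * D_ik + rng.gauss(0, 1)
        (treated if D_ik else control).append(y_ik)
    # Step 1: within-context estimate of d_k (difference in means).
    z.append(z_k)
    d_hat.append(fmean(treated) - fmean(control))

# Step 2: regress the estimated d_k on the context-level variable z_k.
s_hat, g1_hat = slope_and_intercept(z, d_hat)
print(f"estimated g1 = {g1_hat:.3f}, estimated s = {s_hat:.3f}")
```

A dedicated multi-level model would estimate both levels jointly and weight contexts by their precision; the two-step version is used here only because it makes the role of $z_k$ easy to see.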

  31. Comments • Comment 1: It is possible to impose a parametric assumption on individual-level di, but the results are dependent on the assumption. (Bayesian approach.) • Comment 2: Nested (or hierarchical) structure could be used for studying time variation or spatial variation. A key assumption: there are common features that are shared by different elements in a common social context. • Comment 3: If variations across social contexts are systematic, they can be modeled (i.e., multi-level models, or hierarchical linear models, random-coefficient models, and growth-curve models). If such variations are left as observed (or saturated), we have the fixed-effects model.

  32. Concluding Remarks • (1) Sampling is important, since we can only discuss aggregated properties of a population. • (2) Descriptive studies are informative and arguably the only thing we can achieve without strong assumptions. • (3) Randomized experiments can’t solve our problems entirely, because it’s hard to generalize experimental results to the population. • (4) Statistics, while imperfect, are the only tools in social science to characterize heterogeneity. • I.e., counterfactual effects are inherently not estimable at the individual level.

  33. Concluding Remarks (Continued) • (5) Statistical results are meaningful only when interpreted in reference to a population of interest. • I.e., aggregate results are essentially weighted. • (6) Causality is always probabilistic. • (7) The distinction between the “effects of causes”, and the “causes of effects.” (Identification problem). • (8) There is an asymmetry between causes and consequences. • Effects are always populational attributes • Causes may not be social – thus non-populational. • (9) Theory is important because we always need to make assumptions in statistical analysis (e.g., econ).
