Martin Biewen (Goethe University Frankfurt) Stephen Jenkins (University of Essex)

Variance estimation for Generalized Entropy and Atkinson inequality indices: the complex survey data case
Martin Biewen (Goethe University Frankfurt)
Stephen Jenkins (University of Essex)
Presentation at 4th German Stata User Group Meeting, Mannheim, 31 March 2006

Inequality indices: measures of the dispersion of a distribution
• Imposition of a small number of axioms substantially restricts the functional form that indices may have
• Axioms for an inequality index:
  • Anonymity
  • Scale invariance
  • Replication invariance
  • Normalization
  • Principle of Transfers: a mean-preserving spread in the distribution increases the index

Classes of inequality measures satisfying the axioms
• Generalized Entropy class, GE(α); the parameter α sets the transfer sensitivity
  • Advantage: subgroup decomposability
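The formula on this slide did not survive in the transcript. For reference, the standard textbook definition of the GE class with sample weights is sketched below; the notation ($w_i$ for weights, $y_i$ for incomes, $\bar{y}$ for the weighted mean) is an assumption made here and may differ from the authors' slide.

```latex
\begin{align*}
\bar{y} &= \frac{\sum_i w_i y_i}{\sum_i w_i},\\
GE(\alpha) &= \frac{1}{\alpha(\alpha-1)}
  \left[\frac{\sum_i w_i \,(y_i/\bar{y})^{\alpha}}{\sum_i w_i} - 1\right],
  \qquad \alpha \neq 0, 1,\\
GE(0) &= \frac{\sum_i w_i \ln(\bar{y}/y_i)}{\sum_i w_i}
  \quad\text{(mean logarithmic deviation, MLD)},\\
GE(1) &= \frac{\sum_i w_i \,(y_i/\bar{y}) \ln(y_i/\bar{y})}{\sum_i w_i}
  \quad\text{(Theil index)}.
\end{align*}
```

Lower values of $\alpha$ make the index more sensitive to differences at the bottom of the distribution, higher values to differences at the top; $GE(2)$ is half the squared coefficient of variation.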

Classes of inequality measures satisfying the axioms
• Atkinson index, A(ε); the parameter ε sets the inequality aversion
  • Advantage: welfare interpretation
• Gini coefficient
  • Advantage: most well-known inequality index
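The Atkinson formula is also missing from the transcript. The standard weighted definition is sketched below, again with assumed notation ($w_i$, $y_i$, $\bar{y}$ as the weighted mean) rather than the authors' own.

```latex
\begin{align*}
A(\varepsilon) &= 1 - \left[\frac{\sum_i w_i \,(y_i/\bar{y})^{1-\varepsilon}}{\sum_i w_i}\right]^{1/(1-\varepsilon)},
  \qquad \varepsilon > 0,\ \varepsilon \neq 1,\\
A(1) &= 1 - \exp\!\left(\frac{\sum_i w_i \ln(y_i/\bar{y})}{\sum_i w_i}\right)
      = 1 - \exp\bigl(-GE(0)\bigr).
\end{align*}
```

The last identity gives a quick consistency check on the estimates reported later in the talk: with MLD = .1020797, $1 - \exp(-0.1020797) \approx 0.09704$, which agrees with the reported A(1) = .0970424.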

Estimation of inequality indices
• These indices are routinely calculated by many analysts …
• The most commonly used programs among Stata users are ineqdeco and inequal7 (available using ssc)
• But analysts only rarely report estimates of the associated sampling variances (or SEs) of the estimates!

Estimation of inequality indices
• Analytical derivations to date have omitted some important situations (and indices)
• Most derivations assume i.i.d. observations (cf. survey clustering or other sample dependencies!), and don't consider probability weighting (cf. stratification!)
• The methods that do exist are not ‘well known’
• Lack of available software
• But cf. geivars (Cowell (1989), linearization methods; i.i.d. assumptions) and ineqerr (bootstrap), both available using ssc

What we provide
• Estimates of indices and associated sampling variances for all members of the GE and Atkinson classes, while also …
• Accounting for clustering and stratification, and for the i.i.d. case
• Analytical results (see our paper) and new Stata programs (version 8.2): svygei and svyatk
• Based on Taylor-series linearization methods combined with a result from Woodruff (JASA, 1971)
• Results don't apply to the Gini coefficient

Overview of analytical derivation
• Write the estimator of each index as a function of population totals (involves sums over clusters, weights etc.)
• (Taylor-series approximation) The variance of each estimator can be approximated by the variance of the first-order ‘residual’
• As is, each expression is not easily calculated …
• But (Woodruff): reversing the order of summation in the ‘residual’ → estimation is equivalent to deriving the sampling variance of a total estimator, for which one can apply standard svy methods
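To make these steps concrete, here is a sketch for one index, the mean logarithmic deviation GE(0). The symbols $U_0$, $U_1$, $T_0$ and $z_i$ are notation assumed for this illustration; the paper gives the results for all GE and Atkinson indices.

```latex
\begin{align*}
\hat{U}_0 &= \sum_i w_i, \qquad
\hat{U}_1 = \sum_i w_i y_i, \qquad
\hat{T}_0 = \sum_i w_i \ln y_i,\\
\widehat{GE}(0) &= \ln\!\left(\frac{\hat{U}_1}{\hat{U}_0}\right) - \frac{\hat{T}_0}{\hat{U}_0},\\
\widehat{GE}(0) - GE(0) &\approx
  \left(\frac{T_0}{U_0^{2}} - \frac{1}{U_0}\right)(\hat{U}_0 - U_0)
  + \frac{1}{U_1}(\hat{U}_1 - U_1)
  - \frac{1}{U_0}(\hat{T}_0 - T_0)\\
 &= \sum_i w_i z_i,
 \qquad
 z_i = \left(\frac{T_0}{U_0^{2}} - \frac{1}{U_0}\right) + \frac{y_i}{U_1} - \frac{\ln y_i}{U_0}.
\end{align*}
```

The second equality is the reversal of the order of summation: instead of summing over the three totals, each of which is a sum over observations, one sums over observations a single variable $z_i$ that collects the three contributions. The constant terms cancel because the derivatives, weighted by the population totals, sum to zero. The variance of $\widehat{GE}(0)$ is then approximated by the variance of the estimated total of $z_i$, with the unknown totals in $z_i$ replaced by their estimates.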

The programs: svygei and svyatk
svygei varname [if exp] [in range] [, alpha(#) subpop(varname) level(#)]
svyatk varname [if exp] [in range] [, epsilon(#) subpop(varname) level(#)]
• Where, of course, the data have first been svyset.
• How the data are organised, and described using svyset, is of crucial importance …
• svygei: calculations for α = -1, 0, 1, 2, 3 (use the alpha(#) option to choose one other than these)
• svyatk: calculations for ε = 0.5, 1, 1.5, 2, 2.5 (use the epsilon(#) option to choose one other than these)
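A minimal usage sketch of the options in the syntax diagrams follows. The variable names (eq, xwgt, psu, strata) are those of the illustration later in the talk; the parameter values 0.5, 0.75 and 90 are arbitrary choices made here, and how the extra index is displayed alongside the defaults is not stated on the slides.

```stata
* Declare the survey design first (Stata 8 syntax, as used in the talk)
svyset [pweight=xwgt], psu(psu) strata(strata)

* GE index for a user-chosen alpha (0.5 is an arbitrary example value)
svygei eq, alpha(0.5)

* Atkinson index for a user-chosen epsilon, with 90% confidence intervals
svyatk eq, epsilon(0.75) level(90)
```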

Survey data set-up for estimation of inequality among individuals
1) Observation unit is person; sampling unit is household; all persons in each household attributed with the equivalised income of the household to which they belong; individual sample weight available (‘xwgt’) but no information about PSU or strata:
   svyset [pw=xwgt], psu(hh_id)
2) As 1), except also know PSU and strata information (includes allowance for within-household correlation):
   svyset [pw=xwgt], psu(PSU_id) strata(STRATA_id)
3) Observation unit is household; sampling unit is household; weight (‘xhhwgt’) = household sample weight × household size; no information about PSU or strata:
   svyset [pw=xhhwgt] → i.i.d. case

Illustration
• German Socio-Economic Panel (GSOEP), wave 18 data (2001) used as a cross-section
• 12,939 individuals in 5,195 households; 1,004 PSUs (‘psu’), 169 strata (‘strata’)
• Equivalized (‘square-root equivalence scale’) post-tax post-benefit household income (‘eq’)
• Each individual attributed with the equivalised income of her household (→ ‘clustering’ within households)
• Even if the survey does not include PSU and strata identifiers, you should account for this clustering (use the household identifier as the PSU variable)

Generalized Entropy indices
. ssc install svygei_svyatk
. version 8.2
. svyset [pweight=xwgt], psu(psu) strata(strata)
. svygei eq

Complex survey estimates of Generalized Entropy inequality indices

pweight: xwgt                               Number of obs    = 12939
Strata:  strata                             Number of strata = 169
PSU:     psu                                Number of PSUs   = 1004
                                            Population size  = 31487411
---------------------------------------------------------------------------
   Index |  Estimate   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+-----------------------------------------------------------------
  GE(-1) |  .1179647   .00614786   19.19    0.000    .1059151    .1300143
     MLD |  .1020797   .00495919   20.58    0.000    .0923599    .1117996
   Theil |  .1027892   .0058706    17.51    0.000    .091283     .1142954
   GE(2) |  .1201693   .00962991   12.48    0.000    .101295     .1390436
   GE(3) |  .1713159   .02301064    7.45    0.000    .1262159    .2164159
---------------------------------------------------------------------------
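The linearization idea from the derivation overview can be tried by hand for the MLD row. The sketch below is not the svygei code. It assumes that eq is strictly positive and non-missing for every observation and that the data have been svyset as above; the names lneq, z, U0, U1 and T0 are hypothetical and introduced here.

```stata
* Estimated totals: U0 = sum of weights, U1 = weighted sum of income,
* T0 = weighted sum of log income
quietly summarize eq [aweight=xwgt]
scalar U0 = r(sum_w)
scalar U1 = r(mean)*r(sum_w)
generate double lneq = ln(eq)
quietly summarize lneq [aweight=xwgt]
scalar T0 = r(mean)*r(sum_w)

* Point estimate: MLD = ln(mean income) - mean of log income
display "MLD = " ln(U1/U0) - T0/U0

* Linearized variable: derivative of the MLD with respect to each total,
* multiplied by that total's per-observation contribution (1, eq, lneq)
generate double z = (T0/U0^2 - 1/U0) + eq/U1 - lneq/U0

* Design-based variance of the total of z (Stata 8 command;
* in Stata 9 or later the equivalent is: svy: total z)
svytotal z
```

The estimated total of z is zero by construction; the quantity of interest is its standard error, which under this approach serves as the linearized standard error of the MLD. If svygei follows the same linearization, the figure should be close to the Std. Err. reported in the MLD row, though that has not been checked here.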

Atkinson indices
. svyset [pweight=xwgt], psu(psu) strata(strata)
. svyatk eq

Complex survey estimates of Atkinson inequality indices

pweight: xwgt                               Number of obs    = 12939
Strata:  strata                             Number of strata = 169
PSU:     psu                                Number of PSUs   = 1004
                                            Population size  = 31487411
---------------------------------------------------------------------------
   Index |  Estimate   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+-----------------------------------------------------------------
  A(0.5) |  .0496963   .0025263    19.67    0.000    .0447448    .0546477
    A(1) |  .0970424   .00447794   21.67    0.000    .0882658    .105819
  A(1.5) |  .1434968   .00616915   23.26    0.000    .1314055    .1555881
    A(2) |  .1908923   .00804946   23.71    0.000    .1751157    .206669
  A(2.5) |  .2432834   .01237288   19.66    0.000    .219033     .2675338
---------------------------------------------------------------------------

Subpopulation option
. gen female = sex==2
. svygei eq, subpop(female)

Complex survey estimates of Generalized Entropy inequality indices

pweight: xwgt                               Number of obs    = 12939
Strata:  strata                             Number of strata = 169
PSU:     psu                                Number of PSUs   = 1004
                                            Population size  = 31487411
Subpop:  female, subpop. size = 16499055
---------------------------------------------------------------------------
   Index |  Estimate   Std. Err.       z    P>|z|    [95% Conf. Interval]
---------+-----------------------------------------------------------------
  GE(-1) |  .112828    .00573308   19.68    0.000    .1015914    .1240646
     MLD |  .0994741   .00471331   21.10    0.000    .0902362    .1087121
   Theil |  .0998958   .00543287   18.39    0.000    .0892476    .110544
   GE(2) |  .1151464   .00877057   13.13    0.000    .0979564    .1323364
   GE(3) |  .1596125   .02029283    7.87    0.000    .1198392    .1993857
---------------------------------------------------------------------------

Empirical illustration in our paper
• GSOEP income data for 2001 (same as used here)
• British Household Panel Survey for 2001 (9,979 individuals in 4,058 households; 250 PSUs, 75 strata)
• Results:
  • Inequality larger in Britain than in Germany, for all indices, and the difference is statistically significant
  • z-ratios (index ÷ SE) vary from 7.5 to 23.9 (DE) and 5.1 to 31.9 (GB), being smallest for top-sensitive indices and largest for middle-sensitive indices
  • Although the sample is larger in Germany, z-ratios are not always smaller (→ different sample designs)
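As a worked example of the z-ratio, take the Theil index from the GSOEP svygei output shown earlier in the talk (estimate .1027892, standard error .0058706). The 1.96 multiplier is the usual normal critical value for a 95% interval and is an assumption about how the interval is formed.

```latex
\begin{align*}
z &= \frac{0.1027892}{0.0058706} \approx 17.51,\\
\text{95\% CI} &= 0.1027892 \pm 1.96 \times 0.0058706
  \approx (0.0913,\ 0.1143).
\end{align*}
```

Both figures agree with the Theil row of the output (z = 17.51, interval .091283 to .1142954).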

Empirical illustration (ctd.)
• Effects of different assumptions about survey design on sampling variance estimates?
• For each index, the estimated standard error is larger if one accounts for survey clustering and stratification (unsurprising), but …
• Results suggest that accounting for survey design features per se has little (additional) effect on variance estimates as long as the replication of incomes within multi-person households is accounted for

Conclusions
• Researchers now have the means to estimate sampling variances for most of the inequality indices in common use, accommodating a range of potential assumptions about design effects
Topics for future research:
• GE indices are additively decomposable by population subgroup (→ ineqdeco): extend results here to the components of decompositions
• Extend results to the Gini coefficient and other measures based on order statistics (Lorenz curves etc.)

Selected references
• Biewen, M. and Jenkins, S.P. (2006): Estimation of Generalized Entropy and Atkinson indices from complex survey data, forthcoming in: Oxford Bulletin of Economics and Statistics
• Cowell, F.A. (2000): Measurement of inequality, in A.B. Atkinson and F. Bourguignon (eds), Handbook of Income Distribution, Vol. 1, Elsevier, Amsterdam
• Woodruff, R.S. (1971): A simple method for approximating the variance of a complicated estimate, Journal of the American Statistical Association, 66, 411-4
