
Inequality: Empirical Issues


Presentation Transcript


  1. Inequality: Empirical Issues Inequality and Poverty Measurement Technical University of Lisbon Frank Cowell http://darp.lse.ac.uk/lisbon2006 July 2006

  2. Motivation • Interested in sensitivity to extreme values for a number of reasons • Welfare properties of income distribution • Robustness in estimation • Intrinsic interest in the very rich, the very poor.

  3. Sensitivity? • How to define a “sensitive” inequality measure? • Ad hoc discussion of individual measures • empirical performance on actual data (Braulke 1983) • not satisfactory for characterising general properties • Welfare-theoretical approaches • focus on transfer sensitivity (Shorrocks-Foster 1987) • But these do not provide a guide to the way measures may respond to extreme values. • Need a general and empirically applicable tool.

  4. Preliminaries • A large class of inequality measures: • Define two moments: • Can be written as:
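The formulas on this slide did not survive transcription. In the standard notation of Cowell and Flachaire's work on extreme values (the symbols below are a hedged reconstruction, consistent with the GE measures on slide 6), the two moments and the class of measures can be sketched as:

```latex
% Two moments of the income distribution F:
\nu_\alpha = \int z^{\alpha}\, dF(z), \qquad
\mu \equiv \nu_1 = \int z\, dF(z)
% The class of (Generalised Entropy) measures written from these moments:
I^{GE}_\alpha(F) = \frac{1}{\alpha(\alpha - 1)}
  \left[ \frac{\nu_\alpha}{\mu^{\alpha}} - 1 \right],
  \qquad \alpha \neq 0, 1
```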

  5. The Influence Function • Mixture distribution: • Influence function: • For the class of inequality measures: • which yields:
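A sketch of the missing formulas, using the moments ν_α and μ from slide 4; the final expression for the GE class is a standard delta-method result:

```latex
% Mixture ("contaminated") distribution, point mass at z:
F_\varepsilon = (1 - \varepsilon)\, F + \varepsilon\, \delta_z
% Influence function:
IF(z; I, F) = \lim_{\varepsilon \to 0^{+}}
  \frac{I(F_\varepsilon) - I(F)}{\varepsilon}
% which, for the GE class, yields:
IF(z; I^{GE}_\alpha, F) = \frac{1}{\alpha(\alpha - 1)}
  \left[ \frac{z^{\alpha} - \nu_\alpha}{\mu^{\alpha}}
       - \alpha\, \frac{\nu_\alpha\,(z - \mu)}{\mu^{\alpha + 1}} \right]
```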

  6. Some Standard Measures • GE: • Theil: • MLD: • Atkinson: • Log var:
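The definitions themselves were images in the slides; hedged reconstructions of the standard forms (z is income, μ its mean, consistent with the GE class above):

```latex
I^{GE}_\alpha = \frac{1}{\alpha(\alpha - 1)}
  \left[ \int \Big(\frac{z}{\mu}\Big)^{\alpha} dF(z) - 1 \right],
  \quad \alpha \neq 0, 1
I^{Theil} = \int \frac{z}{\mu} \log\frac{z}{\mu}\, dF(z)
  % GE limit as alpha -> 1
I^{MLD} = \int \log\frac{\mu}{z}\, dF(z)
  % GE limit as alpha -> 0
I^{Atk}_\epsilon = 1 - \left[ \int \Big(\frac{z}{\mu}\Big)^{1 - \epsilon}
  dF(z) \right]^{1/(1 - \epsilon)}, \quad \epsilon \neq 1
I^{LogVar} = \int \left[ \log\frac{z}{\mu} \right]^{2} dF(z)
```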

  7. …and their IFs • GE: • Theil: • MLD: • Atkinson: • Log var:

  8. Special case • The Gini coeff: • The IF: • where:
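One standard way of writing the Gini coefficient (a hedged reconstruction; the full IF expression on the slide is not recoverable from the transcript, but its tail behaviour appears in the table on slide 9):

```latex
% Gini coefficient written from F and its mean mu:
I^{Gini}(F) = \frac{2}{\mu} \int z\, F(z)\, dF(z) - 1
% Its IF grows only linearly in z as z -> infinity,
% in contrast with z^alpha for GE measures with alpha > 1.
```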

  9. Tail behaviour of the IF

      Measure           z → 0        z → ∞
      GE (a > 1)        –            z^a
      GE (0 < a ≤ 1)    –            z
      GE (a = 0)        log z        z
      GE (a < 0)        z^a          z
      Log Var           [log z]^2    z
      Gini              –            z

  10. Implications • Generalised Entropy measures with a > 1 are very sensitive to high incomes in the data. • GE (a < 0) measures are very sensitive to low incomes. • We can’t compare the speed of increase of the IF for different values of 0 < a < 1. • If we don’t know the income distribution, we can’t compare the IFs of different classes of measures. • So, let’s take a standard model…

  11. Singh-Maddala [figure: density functions for c = 1.7, c = 1.2, c = 0.7]

  12. Using S-M to get the IFs • The S-M is a good model of the income distribution of German households. • Use it to get true values of inequality measures, obtained from the moments. • Take parameter values a = 100, b = 2.8, c = 1.7. • Normalise the IFs: use the relative influence function.
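This step can be sketched in code. Assuming the parameterisation F(z) = 1 − (1 + a·z^b)^(−c), which matches the quoted parameter values, S-M incomes can be simulated by inverting the CDF and a plug-in inequality estimate computed; the function names are illustrative:

```python
import numpy as np

def singh_maddala_sample(n, a=100.0, b=2.8, c=1.7, seed=None):
    """Inverse-CDF draws from F(z) = 1 - (1 + a*z**b)**(-c)."""
    u = np.random.default_rng(seed).uniform(size=n)
    return (((1.0 - u) ** (-1.0 / c) - 1.0) / a) ** (1.0 / b)

def mld(z):
    """Mean logarithmic deviation, i.e. GE with a = 0."""
    z = np.asarray(z, dtype=float)
    return np.log(z.mean()) - np.log(z).mean()

incomes = singh_maddala_sample(200, seed=1)
estimate = mld(incomes)   # plug-in estimate for one simulated sample
```

The same sampler can feed the contamination experiments on the later slides.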

  13. IFs based on S-M [figure: relative influence functions, with the Gini index shown in each panel]

  14. IF using S-M: conclusions • When z increases, the IF increases faster for high values of a. • When z tends to 0, the IF increases faster for small values of a. • The IF of the Gini index increases more slowly than the others but is larger for moderate values of z. • Comparison of the Gini index with GE or Log Variance does not lead to clear conclusions.

  15. A simulation approach • Use a simulation study to evaluate the impact of contamination in extreme observations. • Simulate 100 samples of 200 observations from the S-M distribution. • Contaminate just one randomly chosen observation by multiplying it by 10. • Contaminate just one randomly chosen observation by dividing it by 10. • Compute the relative change RC(I) = [I(contaminated distribution) – I(empirical distribution)] / I(empirical distribution).
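The experiment above can be sketched as follows (a minimal version assuming the S-M parameterisation F(z) = 1 − (1 + a·z^b)^(−c) and using the MLD as the example index; function names are illustrative):

```python
import numpy as np

def sm_sample(n, a=100.0, b=2.8, c=1.7, rng=None):
    """Inverse-CDF draws from F(z) = 1 - (1 + a*z**b)**(-c)."""
    u = np.random.default_rng(rng).uniform(size=n)
    return (((1.0 - u) ** (-1.0 / c) - 1.0) / a) ** (1.0 / b)

def mld(z):
    return np.log(np.mean(z)) - np.mean(np.log(z))

def rc(index, sample, i, factor):
    """RC(I) = [I(contaminated) - I(empirical)] / I(empirical),
    contaminating observation i by the given factor."""
    contaminated = sample.copy()
    contaminated[i] *= factor
    return (index(contaminated) - index(sample)) / index(sample)

rng = np.random.default_rng(0)
rcs = []
for _ in range(100):                 # 100 samples of 200 observations
    z = sm_sample(200, rng=rng)
    i = rng.integers(len(z))         # one randomly chosen observation
    rcs.append(rc(mld, z, i, 10.0))  # multiply it by 10
rcs.sort()                           # sorted, as in the figures that follow
```

Dividing by 10 instead of multiplying gives the low-value contamination experiment.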

  16. Contamination in high values • RC(I) for 100 different samples, sorted so that Gini realisations are increasing. • Gini is less affected by contamination than GE. • The impact on Log Var and GE (0 < a ≤ 1) is relatively small compared to GE (a < 0) or GE (a > 1) • GE (0 < a ≤ 1) is less sensitive if a is smaller • Log Var is slightly more sensitive than Gini

  17. Contamination in low values • RC(I) for 100 different samples, sorted so that Gini realisations are increasing. • Gini is less affected by contamination than GE. • The impact on Log Var and GE (0 < a ≤ 1) is relatively small compared to GE (a < 0) or GE (a > 1) • GE (0 < a ≤ 1) is less sensitive if a is larger • Log Var is more sensitive than Gini

  18. Influential Observations • Drop the i-th observation from the sample • Call the resulting inequality estimate Î(i) • Compare I(F) with Î(i) • Use a statistic based on their difference • Take a sorted sample of 5000 • Examine 10 observations from the bottom, middle and top
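A sketch of this leave-one-out exercise, using the MLD as the example index. The exact statistic on the slide is not recoverable, so the relative change below is one natural choice, and the lognormal sample is a stand-in for the real data (both are assumptions):

```python
import numpy as np

def mld(z):
    return np.log(np.mean(z)) - np.mean(np.log(z))

def relative_change_when_dropped(index, sample, i):
    """(I_hat - I_hat_(i)) / I_hat, where I_hat_(i) drops observation i."""
    full = index(sample)
    return (full - index(np.delete(sample, i))) / full

# Lognormal stand-in for a sorted sample of 5000 incomes (an assumption).
z = np.sort(np.random.default_rng(0).lognormal(0.0, 0.7, size=5000))
bottom = relative_change_when_dropped(mld, z, 0)            # smallest obs
middle = relative_change_when_dropped(mld, z, len(z) // 2)  # median obs
top = relative_change_when_dropped(mld, z, len(z) - 1)      # largest obs
```

Comparing the three values illustrates the next slide's point: the extremes move the estimate far more than observations in the middle.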

  19. Influential observations: summary • Observations in the middle of the sorted sample don’t affect estimates compared to smallest or highest observations. • Highest values are more influential than smallest values. • Highest value is very influential for GE (a = 2) • Its estimate should be modified by nearly 0.018 if we remove it. • GE (a = –1) strongly influenced by the smallest observation.

  20. Extreme values • An extreme value is not necessarily an error or some sort of contamination • Could be an observation belonging to the true distribution • Could convey important information. • Observation is extreme in the sense that its influence on the inequality measure estimate is important. • Call this a high-leverage observation.

  21. High-leverage observations • The term leaves open the question of whether such observations “belong” to the distribution • But they can have important consequences for the statistical performance of the measure. • Can use this performance to characterise the properties of inequality measures under certain conditions. • Focus on the Error in Rejection Probability as a criterion.

  22. Davidson-Flachaire (1) • Even in very large samples, the ERP of an asymptotic or bootstrap test based on the Theil index can be significant • Tests are therefore not reliable. • Three main possible causes: • Nonlinearity • Noise • Nature of the tails.

  23. Davidson-Flachaire (2) • Three main possible causes: • Indices are nonlinear functions of sample moments. This induces biases and non-normality in estimates. • Estimates of the covariances of the sample moments used to construct indices are often noisy. • Indices are often sensitive to the exact nature of the tails. A bootstrap sample with nothing resampled from the tail can have properties different from those of the population. • Simulation experiments show that cause 3 is often quantitatively the most important. • Statistical performance should be better with MLD and GE (0 < a < 1) than with Theil.

  24. Empirical methods • The empirical distribution, written with the indicator function • Empirical moments • Inequality estimate
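The formulas were images in the slides; a hedged reconstruction of the plug-in estimator, matching the moment notation of slide 4:

```latex
% Empirical distribution built from the sample (z_1, ..., z_N),
% with iota(.) the indicator function:
\hat{F}(z) = \frac{1}{N} \sum_{i=1}^{N} \iota(z_i \le z)
% Empirical moments:
\hat{\nu}_\alpha = \frac{1}{N} \sum_{i=1}^{N} z_i^{\alpha},
  \qquad \hat{\mu} = \hat{\nu}_1
% Plug-in inequality estimate:
\hat{I} = \frac{1}{\alpha(\alpha - 1)}
  \left[ \frac{\hat{\nu}_\alpha}{\hat{\mu}^{\alpha}} - 1 \right]
```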

  25. Testing • Variance estimate • For a given value I0, test H0: I = I0 • Test statistic
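The statistic itself is not shown in the transcript; it is presumably the usual studentised form:

```latex
% Asymptotic test of H0: I = I_0, compared with N(0,1) critical values:
W = \frac{\hat{I} - I_0}{\sqrt{\widehat{\mathrm{Var}}(\hat{I})}}
```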

  26. Bootstrap • To construct a bootstrap test, resample from the original data. • Bootstrap inference should be superior • For bootstrap sample j, j = 1,…,B, a bootstrap statistic W*j is computed in almost the same way as W from the original data • But I0 in the numerator is replaced by the index Î estimated from the original data. • Then the bootstrap P-value is the proportion of the W*j more extreme than W.
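The procedure can be sketched for the MLD. The standard-error function below is a delta-method sketch (the per-observation influence of the MLD is z/μ − log z up to a constant), not necessarily the variance estimate used in the original study:

```python
import numpy as np

def mld(z):
    return np.log(np.mean(z)) - np.mean(np.log(z))

def mld_se(z):
    """Delta-method standard error of the MLD estimate (a sketch)."""
    g = z / np.mean(z) - np.log(z)   # per-observation influence, up to a constant
    return np.std(g, ddof=1) / np.sqrt(len(z))

def bootstrap_pvalue(z, i0, B=199, seed=0):
    """Studentised bootstrap test of H0: I = i0 for the MLD."""
    rng = np.random.default_rng(seed)
    i_hat = mld(z)
    w = (i_hat - i0) / mld_se(z)     # statistic from the original data
    w_star = np.empty(B)
    for j in range(B):               # resample from the original data
        zs = rng.choice(z, size=len(z), replace=True)
        # same statistic, but i0 replaced by the original-sample estimate
        w_star[j] = (mld(zs) - i_hat) / mld_se(zs)
    return np.mean(np.abs(w_star) > np.abs(w))
```

Rejecting when the returned P-value falls below the nominal level gives the bootstrap test whose ERP is examined on the next slides.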

  27. Error in Rejection Probability: A • ERPs of asymptotic tests at the nominal level 0.05 • Difference between the actual and nominal probabilities of rejection • Example: • N = 2,000 observations • ERP of GE (a = 2) is 0.11 • The asymptotic test over-rejects the null hypothesis • The actual level is 16%, when the nominal level is 5%.

  28. Error in Rejection Probability: B • ERPs of bootstrap tests. • Distortions are reduced for all measures • But the ERP of GE (a = 2) is still very large, even in large samples • ERPs of GE (a = 0.5, –1) are small only for large samples. • GE (a = 0) (MLD) performs better than the others: ERP is small for 500 or more observations.

  29. More on ERP for GE: what would happen in very large samples?

      a      N = 50,000   N = 100,000
      2      0.0492       0.0415
      1      0.0096       0.0096
      0.5    0.0054       0.0052
      0      0.0024       0.0043
      –1     0.0113       0.0125

  30. ERP: conclusions • Rate of convergence to zero of ERP of asymptotic tests is very slow. • Same applies to bootstrap • Tests based on GE measures can be unreliable even in large samples.

  31. Sensitivity: a broader perspective • Results so far are for a specific Singh-Maddala distribution. • It is realistic, but – obviously – special. • Consider alternative parameter values • Particular focus on behaviour in the upper tail • Consider alternative distributions • Use other familiar and “realistic” functional forms • Focus on lognormal and Pareto

  32. Alternative distributions • First consider comparative contamination performance for alternative distributions with the same inequality index • Use the same diagrammatic tool as before • x-axis: the 100 different samples, sorted so that inequality realisations are increasing • y-axis: RC(I) for the MLD index

  33. Singh-Maddala • Distribution function • Inequality found from the moments [figure: densities for c = 0.7 (“heavy” upper tail), c = 1.2, c = 1.7]
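The missing formulas, assuming the parameterisation consistent with the parameter values quoted on slide 12; the moment expression is the standard Burr XII result under this form:

```latex
% Singh-Maddala distribution function:
F(z) = 1 - \left( 1 + a z^{b} \right)^{-c}
% Moments, from which the "true" inequality values are obtained:
\nu_\alpha = E[z^{\alpha}] = a^{-\alpha/b}\,
  \frac{\Gamma(1 + \alpha/b)\, \Gamma(c - \alpha/b)}{\Gamma(c)},
  \qquad -b < \alpha < bc
```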

  34. Contamination S-M

  35. Lognormal • Distribution function • Inequality [figure: densities for s = 0.5, s = 0.7, s = 1.0 (“heavy” upper tail)]
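The missing formulas, in standard closed form (m and s are the log-scale location and dispersion parameters, Φ the standard normal CDF):

```latex
% Lognormal distribution function:
F(z) = \Phi\!\left( \frac{\log z - m}{s} \right)
% Closed-form inequality values:
I^{Theil} = I^{MLD} = \frac{s^{2}}{2}, \qquad
I^{Gini} = 2\,\Phi\!\left( \frac{s}{\sqrt{2}} \right) - 1
```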

  36. Contamination: Lognormal

  37. Pareto [figure: densities for a = 1.5 (“heavy” upper tail), a = 2.0, a = 2.5]
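For completeness, the standard Pareto (type I) forms, with lower bound z_0 (a hedged reconstruction, matching the tail parameter a used on the slide):

```latex
% Pareto distribution function:
F(z) = 1 - \left( \frac{z_0}{z} \right)^{a}, \qquad z \ge z_0,\; a > 1
% Closed-form inequality values:
I^{Gini} = \frac{1}{2a - 1}, \qquad
I^{MLD} = \log\frac{a}{a - 1} - \frac{1}{a}
```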

  38. MLD Contamination Pareto

  39. ERP at nominal 5%: MLD • Asymptotic tests • Bootstrap tests

  40. ERP at nominal 5%: Theil • Asymptotic tests • Bootstrap tests

  41. Comparing Distributions • Bootstrap tests usually improve numerical performance. • MLD is more sensitive to contamination in high incomes when the underlying distribution’s upper tail is heavy. • The ERP of an asymptotic or bootstrap test based on the MLD or Theil index is larger when the underlying distribution’s upper tail is heavy.

  42. Why the Gini…? • Why use the Gini coefficient? • Obvious intuitive appeal • Sometimes suggested that Gini is less prone to the influence of outliers • Less sensitive to contamination in high incomes than GE indices. • But little to choose between… • the Gini coefficient and MLD • Gini and the logarithmic variance

  43. The Bootstrap…? • Does the bootstrap “get you out of trouble”? • The bootstrap performs better than asymptotic methods, • but does it perform well enough? • In terms of the ERP, the bootstrap does well only for the Gini, MLD and logarithmic variance. • If we use a distribution with a heavy upper tail, the bootstrap performs poorly in the case of a = 0 • even in large samples.
