Distances...

University of Warwick, Department of Sociology, 2014/15 • SO 201: SSAASS (Surveys and Statistics) (Richard Lampard) • Clustering and Scaling (Week 19)

Distances... • Some quantitative techniques derive and/or use distances between variables, or distances between categories within variables, as the basis for the construction of maps or the division of items into sets of similar items. • These include multidimensional scaling, correspondence analysis, and cluster analysis.

Multidimensional scaling (MDS) • MDS is applied to a set of distances between all pairs of categories within a set of categories. • See Coxon (1982); Kruskal and Wish (1978)
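As a rough illustration of what MDS is trying to achieve, the sketch below scores how well a candidate map reproduces a set of input distances. It uses Kruskal's stress-1 formula with the input dissimilarities taken directly as targets (the metric case, without the monotone regression step of non-metric MDS). The three categories, their dissimilarities and both candidate maps are hypothetical, made up for the example.

```python
from itertools import combinations

def stress1(dissimilarities, coords):
    """Stress-1: 0 means the map reproduces the input distances exactly."""
    num = den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        # Euclidean distance between points i and j on the candidate map
        d = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) ** 0.5
        num += (d - dissimilarities[i][j]) ** 2
        den += d ** 2
    return (num / den) ** 0.5

# Hypothetical dissimilarities between three categories A, B and C
delta = [[0, 3, 5],
         [3, 0, 4],
         [5, 4, 0]]

two_dim_map = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]  # a 3-4-5 triangle
one_dim_map = [(0.0,), (3.0,), (7.0,)]              # categories forced onto a line

print(stress1(delta, two_dim_map))  # perfect fit
print(stress1(delta, one_dim_map))  # worse fit: A and C end up too far apart
```

An MDS routine searches for the coordinates that make this kind of badness-of-fit measure as small as possible for a chosen number of dimensions.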

Cluster analysis • In cluster analysis, distances between items (cases and/or variables) are generated from the raw data, and then used to generate a categorisation of the items. • See Everitt (1993; see also later editions)
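The two steps described above (distances from raw data, then a categorisation) can be sketched as follows. This is a minimal single-linkage agglomerative procedure, one of many clustering methods and not necessarily the one used in the studies cited; the five two-variable cases are hypothetical.

```python
from itertools import combinations
from math import dist

def single_linkage(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per item
    while len(clusters) > n_clusters:
        def gap(pair):
            # Single linkage: cluster distance = smallest member-to-member distance
            a, b = pair
            return min(dist(points[i], points[j])
                       for i in clusters[a] for j in clusters[b])
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]

# Hypothetical raw data: five cases measured on two variables
items = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.8), (9.0, 1.0)]
groups = single_linkage(items, 3)
print(groups)
```

Other linkage rules (complete, average, Ward's) differ only in how the distance between two clusters is defined, but can produce quite different categorisations.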

Classifying women’s occupations • Dale et al. (1985: see handout) used cluster analysis to develop an ‘alternative’ set of categories for women’s occupations.

The Cambridge Scale • The Cambridge Social Stratification scale was originally derived via the application of multidimensional scaling to occupation-based cross-tabulations matching the occupations of individuals and their ‘associates’. • It subsequently moved in the direction of using correspondence analysis (Prandy 1990; Prandy and Bottero 1998: 2.6 - see handout).

‘Marriage and the Social Order’ • Prandy and Bottero (1998: handout) applied correspondence analysis to occupation-based cross-tabulations to locate occupations on a number of (highly correlated) occupational scales.

Correspondence analysis • Correspondence analysis in effect partitions the relationship in a cross-tabulation (and more specifically the chi-square statistic) into components reflecting a number of underlying dimensions (see Greenacre 2007). • More specifically, the difference between the distributions of values for two categories is split into components reflecting different underlying dimensions.
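In symbols, one standard way of writing this partition is given below. The notation is an assumption for illustration and is not taken from the slides: n_ij is the observed count in row i and column j, n the total, r_i and c_j the row and column proportions, x_ik and y_jk the standardised row and column scores on dimension k, and λ_k the singular value (canonical correlation) for that dimension, for a table with R rows and C columns.

```latex
n_{ij} = n\, r_i c_j \Bigl( 1 + \sum_{k=1}^{K} \lambda_k\, x_{ik}\, y_{jk} \Bigr),
\qquad
\chi^2 = n \sum_{k=1}^{K} \lambda_k^2 ,
\qquad
K = \min(R, C) - 1 .
```

The term for dimension k, n r_i c_j λ_k x_ik y_jk, is that dimension's contribution to the residual n_ij - n r_i c_j, and n λ_k² is its share of the chi-square statistic.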

Association models • More recently the Cambridge scale and international equivalents have tended to use ‘association models’, which are a form of statistical model that echoes aspects of correspondence analysis. • See Goodman, L.A. 1986. ‘Some useful extensions to the usual correspondence analysis approach and the usual loglinear approach in the analysis of contingency tables (with comments)’, Int. Statist. Rev. 54: 243-309. • See also: http://www.camsis.stir.ac.uk/

Evaluating the NS-SEC • In Rose and Pevalin (2003), various chapters (by Mills and Evans [see extract in handout], Coxon and Fisher, and Fisher) involved the application of cluster analysis, multidimensional scaling, and association models to the relationship between employment relations measures and occupational categories.

More references… • Cluster analysis: Hair, J.F. Jr. and Black, W.C. 2000. ‘Cluster Analysis’. In L. Grimm and P. R. Yarnold (eds) Reading and Understanding More Multivariate Statistics. Washington, DC: APA Press. • Multidimensional scaling: Stalans, L.J. 1995. ‘Multidimensional scaling’. In L. Grimm and P. R. Yarnold (eds) Reading and Understanding Multivariate Statistics. Washington, DC: APA Press. • Correspondence analysis: Phillips, D. 1995. ‘Correspondence Analysis’, Social Research Update 7. (http://sru.soc.surrey.ac.uk/SRU7.html)

Row and column scores in correspondence analysis • These are chosen in such a way that each successive dimension explains as much of the cross-tabulation’s chi-square statistic as possible. • Each dimension does so by contributing a contingency hierarchy (see next slide) which is as small a chi-square ‘distance’ as possible from the residuals of the independence model applied to the original cross-tabulation (i.e. from the departures of the observed counts from the expected values used in calculating the chi-square statistic).

Table 2/5: First contingency hierarchy (from Lampard 1992: 30; residuals in brackets) • Calculation of one of the entries: 35.66 = -0.96 x -0.93 x 0.25 x 0.20 x 774 (n = 774)
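The calculation of that entry can be checked with a few lines of Python. The slide does not label the factors, so reading the first two as the row and column scores on the first dimension and the next two as the row and column proportions is an assumption. With the two-decimal figures as printed, the product comes to about 34.55 rather than 35.66; the gap is of a size that rounding of the displayed factors could produce, though the slide does not confirm this.

```python
# Factors exactly as printed on the slide (each rounded to two decimal places).
row_score = -0.96   # assumed: row category's score on the first dimension
col_score = -0.93   # assumed: column category's score on the first dimension
row_share = 0.25    # assumed: row category's proportion of all cases
col_share = 0.20    # assumed: column category's proportion of all cases
n = 774             # total number of cases

entry = row_score * col_score * row_share * col_share * n
print(round(entry, 2))
```

Two negative scores multiply to a positive entry: categories lying at the same end of a dimension are associated more often than independence would predict.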

So what’s left? • Note that the five biggest discrepancies between the residuals and the contingency hierarchy are in the third row and/or third column; these are consequently the focus of the second contingency hierarchy. • However, the first contingency hierarchy accounts for 131.6 of the original chi-square statistic of 153.2 (i.e. 85.9%), leaving only 21.6 for the subsequent contingency hierarchies.
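The share of the chi-square statistic attributed to the first contingency hierarchy is simple arithmetic on the two figures quoted above; the variable names are just labels for this sketch.

```python
chi_square_total = 153.2      # chi-square for the original cross-tabulation
chi_square_first = 131.6      # part accounted for by the first contingency hierarchy

share_first = chi_square_first / chi_square_total
remainder = chi_square_total - chi_square_first

print(f"{share_first:.1%} explained, {remainder:.1f} left for later hierarchies")
```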
