Describing Childhood Diet with Cluster Analysis: Insights from Young Statisticians' Meeting 2011

Andrew Smith • Describing childhood diet with cluster analysis • Young Statisticians’ Meeting • 12th April 2011

Describing diet with cluster analysis • Pauline M. Emmett • P. Kirstin Newby • Kate Northstone • World Cancer Research Fund • MRC, Wellcome Trust, University of Bristol

Outline • Introductions • ALSPAC • Food frequency questionnaires • Dietary patterns • Cluster analysis • k-means cluster analysis • Results • 3 cluster solution • Associations with socio-demographic variables

ALSPAC • Avon Longitudinal Study of Parents and Children • Birth cohort study • 14,541 pregnant women and their children • www.bris.ac.uk/alspac

Food frequency questionnaires

Dietary patterns • Examine diet as a whole • Analyse multivariate FFQ data • Use correlations between foods • Principal components analysis (PCA) • Cluster analysis • Image: Paul / FreeDigitalPhotos.net

Cluster analysis • Separate subjects into non-overlapping groups • Based on ‘distances’ between individuals • Unsupervised learning • Image: Boaz Yiftach / FreeDigitalPhotos.net

k-means cluster analysis • Most widely used clustering method for dietary patterns • Number of clusters, k, is specified beforehand • Minimises the distance from each subject to his/her cluster mean, summed over all subjects in that cluster and then over all clusters
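The quantity being minimised can be written out as a formula. This is a sketch of the usual k-means criterion, not notation taken from the slides: it assumes the ‘distance’ is squared Euclidean distance, that x_i is the vector of FFQ variables for subject i, that C_j is the set of subjects in cluster j, and that \bar{x}_j is the mean of cluster j.

```latex
W(C_1, \dots, C_k) \;=\; \sum_{j=1}^{k} \; \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^2 ,
\qquad
\bar{x}_j \;=\; \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i
```

The inner sum runs over the subjects in one cluster and the outer sum over the k clusters, matching the two ‘summed over’ steps on the slide.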

k-means cluster analysis

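Problems with the standard algorithm • Short-sighted: each step only improves on the current solution • Tends to converge to a local minimum that depends on the starting values • So run the algorithm 100 times from different starting values and keep the solution with the smallest of all the minima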
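The restart strategy can be sketched in a few lines of plain Python. This is an illustration only, not the code used in the study: it assumes squared Euclidean distance, random subjects as starting means, and the hypothetical function names `kmeans_once` and `kmeans_restarts`.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of the standard algorithm from a random start.

    Returns (cost, labels, centres), where cost is the summed squared
    distance from every point to its cluster mean.
    """
    centres = rng.sample(points, k)  # k randomly chosen subjects as starting means
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:  # nothing moved, so a (local) minimum is reached
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its old centre
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    cost = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return cost, labels, centres


def kmeans_restarts(points, k, n_starts=100, seed=0):
    """Run kmeans_once n_starts times and keep the lowest-cost solution."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[0])
```

Here `points` is a list of tuples, one per child. Library implementations offer the same idea directly, for example the `n_init` argument of scikit-learn's `KMeans`.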

Standardising the input variables
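Standardising matters because k-means works on distances, so a food recorded on a wider numeric scale would otherwise dominate the solution. The slide does not say which standardisation was applied; the sketch below assumes two common choices, dividing each mean-centred variable by its standard deviation or by its range. The function name `standardise` and its `scale` argument are hypothetical.

```python
import statistics


def standardise(rows, scale="sd"):
    """Centre each column on its mean, then divide by its SD or its range.

    rows is a list of equal-length tuples (one per subject).
    Constant columns are mapped to 0.0 to avoid dividing by zero.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    if scale == "sd":
        spreads = [statistics.pstdev(c) for c in cols]
    elif scale == "range":
        spreads = [max(c) - min(c) for c in cols]
    else:
        raise ValueError("scale must be 'sd' or 'range'")
    return [tuple((x - m) / s if s else 0.0
                  for x, m, s in zip(row, means, spreads))
            for row in rows]
```

After this step every food variable contributes on a comparable scale to the distances between children.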

Reliability of the cluster solution • Split sample in half • Perform separate analyses on each half • See how many children change clusters • Repeat 5 times • 32 out of 8,279 children changed cluster (0.4%)
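One way to count the children who ‘change cluster’ is sketched below. It rests on assumptions the slide does not spell out: each half-sample solution is compared with the full-sample solution, and cluster labels are matched by trying every relabelling, since cluster numbers are arbitrary. The names `n_changed` and `split_half_changes` are hypothetical, and `cluster` stands for any function that takes `(points, k)` and returns one label per point.

```python
import random
from itertools import permutations


def n_changed(full_labels, half_labels, k):
    """Fewest disagreements over all relabellings of the half-sample clusters."""
    return min(
        sum(perm[h] != f for f, h in zip(full_labels, half_labels))
        for perm in permutations(range(k))
    )


def split_half_changes(points, k, cluster, seed=0):
    """Split the sample in half at random, cluster each half separately,
    and count subjects whose cluster differs from the full-sample solution."""
    rng = random.Random(seed)
    full = cluster(points, k)          # solution on the whole sample
    idx = list(range(len(points)))
    rng.shuffle(idx)
    mid = len(idx) // 2
    changed = 0
    for part in (idx[:mid], idx[mid:]):
        half = cluster([points[i] for i in part], k)
        changed += n_changed([full[i] for i in part], half, k)
    return changed
```

Calling `split_half_changes` with five different seeds would mirror the ‘repeat 5 times’ step. Trying all relabellings is cheap for k = 3 (six permutations) but grows quickly with k.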

Processed • 4,177 children • Image: Suat Eman, Rawich, Master Isolated Images / FreeDigitalPhotos.net

Plant-based • 2,065 children • Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net

Traditional British • 2,037 children • Image: Suat Eman, Filomena Scalise, Maggie Smith / FreeDigitalPhotos.net

Associations with socio-demographic variables

Summary • Multivariate methods to compress FFQ data into dietary patterns • k-means cluster analysis is widespread but must be applied carefully • Processed, Plant-based and Traditional British clusters in 7-year-old children • Associated with various socio-demographic variables
