CoDaPack: A tool for Compositional Data Analysis
M. Comas-Cufí & S. Thió-Henestrosa (marc.comas@udg.edu)
Dept. Computer Sciences and Applied Mathematics, University of Girona (UdG), Catalonia-Spain
What’s coda?
We are interested in relative values; the total sum is not informative.
• Vector x = [x1, x2, …, xD]
  • Adds up to a constant: 100, 1, 10^6, 10^9, …
  • Units: percentage, parts per one, ppm, ppb, …
  • Has positive elements
  • Carries only relative information
• Examples
  • Production (pieces): [Ok, NonOk, Rework] = [87, 1, 12]
  • Household budget (€): [Food, Serv., Other] = [1150, 623, 351]
  • Daily activities (h): [Work, Sleep, Other] = [7.5, 7.5, 9]
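Since only the ratios between parts matter, each example above can be rescaled to any constant sum without losing information. A minimal sketch of this rescaling (usually called the closure operation) follows; the function name `closure` and its `total` parameter are illustrative choices, not part of CoDaPack.

```python
def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they add up to `total`."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    s = sum(parts)
    return [total * p / s for p in parts]


# The three examples from the slide, expressed as parts per one.
production = closure([87, 1, 12])       # [Ok, NonOk, Rework]
budget = closure([1150, 623, 351])      # [Food, Serv., Other]
daily = closure([7.5, 7.5, 9])          # [Work, Sleep, Other]

# The same daily activities as percentages: different total, same ratios.
daily_pct = closure([7.5, 7.5, 9], total=100)

print(production)
print(budget)
print(daily)
print(daily_pct)
```

The ratio between any two parts (for instance Food/Serv.) is the same before and after closure, which is why the total carries no information.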
Sample space of coda: simplex
• Compositional data live in the simplex (S), represented in a ternary (D=3), quaternary (D=4), … diagram.
[Figure: ternary diagram (D=3, S^3) showing x = [0.45, 0.35, 0.2]; quaternary diagram (D=4, S^4) showing x = [0.2, 0.25, 0.2, 0.35]]
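For reference, the simplex is usually written as the set below. The symbol κ for the constant sum (1, 100, 10^6, …) is an assumption of notation, not taken from the slides. Both example vectors on this slide add up to κ = 1.

```latex
\mathcal{S}^D = \left\{ \mathbf{x} = [x_1, x_2, \ldots, x_D] \;:\;
  x_i > 0,\ i = 1, \ldots, D;\ \sum_{i=1}^{D} x_i = \kappa \right\}
```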
Euclidean distance appropriate?
[Figure: ternary diagram with vertices STOP PROD., HALF PROD. and NON-STOP PROD., showing A and B]
A2009 = [0.2, 0.1, 0.7]    A2010 = [0.1, 0.2, 0.7]
B2009 = [0.4, 0.3, 0.3]    B2010 = [0.3, 0.4, 0.3]
A2010 - A2009 = B2010 - B2009 = [-0.1, 0.1, 0]
de(A) = de(B) = 0.14: the Euclidean distance measures the absolute difference.
Euclidean distance appropriate?
[Figure: bar charts of A and B in 2009 and 2010 over the parts STOP PROD., HALF PROD. and NON-STOP PROD.]
     2009               2010               Relative change
A    [0.2, 0.1, 0.7]    [0.1, 0.2, 0.7]    [-50%, +100%, 0%]
B    [0.4, 0.3, 0.3]    [0.3, 0.4, 0.3]    [-25%, +33.3%, 0%]
Euclidean distance appropriate?
Our interest lies in relative values:
A2010/A2009 = [1/2, 2, 1]
B2010/B2009 = [3/4, 4/3, 1]
Euclidean distance: de(A) = de(B) = 0.14
Aitchison distance: da(A) = 0.6276, da(B) = 0.3970
[Figure: ternary diagram with vertices STOP PROD., HALF PROD. and NON-STOP PROD., showing the points A2009, A2010, B2009 and B2010]
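The comparison can be sketched in a few lines of Python. This assumes the common definition of the Aitchison distance as the Euclidean distance between centred log-ratio (clr) vectors, with natural logarithms; the function names are illustrative. Absolute values depend on the convention used, so the sketch may not reproduce the figures quoted on the slide. It checks only the qualitative point: the two Euclidean distances are equal, while A moves farther than B in relative terms.

```python
import math


def clr(x):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def aitchison_distance(x, y):
    # Assumed definition: Euclidean distance between clr-transformed vectors.
    return euclidean_distance(clr(x), clr(y))


A2009, A2010 = [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]
B2009, B2010 = [0.4, 0.3, 0.3], [0.3, 0.4, 0.3]

de_A = euclidean_distance(A2009, A2010)
de_B = euclidean_distance(B2009, B2010)
da_A = aitchison_distance(A2009, A2010)
da_B = aitchison_distance(B2009, B2010)

print(f"de(A) = {de_A:.2f}, de(B) = {de_B:.2f}")
print(f"da(A) > da(B): {da_A > da_B}")
```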
Log-ratio methodology
• Applying Aitchison geometry to coda is equivalent to applying classical Euclidean geometry to the log-ratio values.
Simplex (restricted space): [x1, …, xD]
Real space (unrestricted): log(xi/xj), i, j = 1, …, D, j ≠ i
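One way to see this equivalence numerically is to compute a distance directly from the pairwise log-ratios log(xi/xj) and compare it with the ordinary Euclidean distance between clr-transformed vectors. The sketch below assumes natural logarithms and a 1/D scaling of the pairwise sum; the function names are illustrative and nothing here is CoDaPack's own code.

```python
import math
from itertools import combinations


def clr(x):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def clr_distance(x, y):
    """Ordinary Euclidean distance, taken in clr (real) space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def pairwise_logratio_distance(x, y):
    """Distance built from all pairwise log-ratios log(xi/xj), i < j."""
    D = len(x)
    total = sum(
        (math.log(x[i] / x[j]) - math.log(y[i] / y[j])) ** 2
        for i, j in combinations(range(D), 2)
    )
    return math.sqrt(total / D)


x = [0.2, 0.1, 0.7]
y = [0.1, 0.2, 0.7]

d_clr = clr_distance(x, y)
d_pairs = pairwise_logratio_distance(x, y)
print(math.isclose(d_clr, d_pairs))

# Log-ratios ignore the total: the same data in percent give the same clr.
x_pct = [100 * v for v in x]
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(clr(x), clr(x_pct))))
```

Both checks print `True`: the two formulations agree, and rescaling a composition leaves its log-ratio coordinates unchanged.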
Software
• CoDaPack: software developed by the Department of Computer Science and Applied Mathematics at the Universitat de Girona. Easy and intuitive. http://ima.udg.edu/codapack (marc.comas@udg.edu)
• compositions (R package): analysis of compositional and positive data using different approaches. http://cran.r-project.org/ (raimon.tolosana@upc.edu)
• robCompositions (R package): robust estimation for compositional data. http://cran.r-project.org/ (templ@tuwien.ac.at)
References
• Aitchison, J., 1986. The Statistical Analysis of Compositional Data. Chapman & Hall, London. Reprinted in 2003 with additional material by Blackburn Press.
• Proceedings of CoDaWork, 2003, 2005, 2008 and 2011: available at http://dugi-doc.udg.edu/handle/10256/150.
• CoDaWeb: Compositional Data Analysis Web Site: http://www.compositionaldata.com/