Explaining Statistical Models through Metadata Andrew Westlake Survey & Statistical Computing & Imperial College London WWW.SASC.CO.UK
OPUS - objectives
• Data Integration – the Holy Grail
  • Can we bring together information from multiple datasets?
  • Can we combine what we already know with evidence from new datasets?
  • In ways that are
    • Coherent
    • Formalised
    • Transparent
• Central role of the Statistical Model
  • Model + Knowledge_i + Evidence_i → Model + Knowledge_{i+1}

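The cycle in which existing knowledge plus new evidence yields updated knowledge can be illustrated with a minimal sketch. It assumes, purely for illustration, that the unknown quantity is a proportion, that knowledge is held as a Beta distribution, and that each dataset supplies binomial evidence. The class `BetaKnowledge`, the function `update`, and the evidence values are hypothetical and are not taken from Opus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaKnowledge:
    """Knowledge about a proportion, held as a Beta(alpha, beta) distribution."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(knowledge: BetaKnowledge, successes: int, trials: int) -> BetaKnowledge:
    """Combine current knowledge with binomial evidence (conjugate update)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return BetaKnowledge(knowledge.alpha + successes,
                         knowledge.beta + (trials - successes))


# Hypothetical evidence from two datasets, each as (successes, trials).
evidence = [(12, 40), (30, 100)]

knowledge = BetaKnowledge(1.0, 1.0)  # Knowledge_0: a uniform prior (assumption)
for successes, trials in evidence:
    # Model + Knowledge_i + Evidence_i -> Model + Knowledge_{i+1}
    knowledge = update(knowledge, successes, trials)

print(knowledge, round(knowledge.mean, 3))
```

Each pass through the loop uses the previous knowledge state as the prior for the next dataset, which is the sense in which the model stays fixed while knowledge accumulates.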
Examples of Multiple Data Sources
• UK Crime Statistics
  • Household Surveys – British Crime Survey
  • Police Statistics – Reported Crime
  • Different selection mechanisms for who responds and what is reported
• Transport for London
  • Household Surveys (LATS)
  • On-Board surveys (RODS, BODS)
  • Road-side counting and interviews
  • Automatic sensors – ticket gates, road loops
  • In-car tracking
  • Census
• Different sources give different partial but overlapping views of the same underlying system

Information from Multiple Datasets

Meta-data about Statistical Models
• Generic Issue
  • How to record information about the Specification and Processing associated with Statistical Models
• Essential for Opus
• Relevant whenever users are distant from modellers
  • Official Statistics
  • Archives
  • Statistical Dissemination in general
• Purpose is to support end users
  • Confidence in Statistical Results
  • Understanding of model form
  • Reuse or extension
• Structure and Functionality

Statistical Models
• Interest in measures which summarise the underlying system
  • Can be derived - may even be stochastic
• Explicit representation of variability
  • Errors in measurement processes, surrogates for intended measures, variability of respondents
  • Stochastic (distribution) components in Influence relationships
• Relationships between measured (and unobserved) variables
  • Linear Regression
  • Generalised Linear Models
  • Conditional Independence, …
• Mathematical forms, deterministic and stochastic
  • Parameters for Distributions, Relationships and Measures
• Fit model using Data
  • Methods usually tied to model form
  • Yields estimates of parameters, with precision (uncertainty)

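As a small illustration of fitting a model to data and obtaining parameter estimates with their precision, the sketch below fits a simple linear regression by least squares and reports a standard error for each parameter. The function name `fit_line` and the data values are hypothetical; this is one textbook method, not the fitting procedure used in Opus.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (estimate, standard error) pairs."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance, estimated with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return {"intercept": (intercept, se_intercept), "slope": (slope, se_slope)}


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = fit_line(x, y)
for name, (estimate, std_error) in fit.items():
    print(f"{name}: {estimate:.3f} (s.e. {std_error:.3f})")
```

The standard errors are the "precision" attached to the estimates; the method is tied to the model form, since a different form (for example a generalised linear model) would need a different fitting procedure.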
Meta-data in Opus
• Analyse Requirements
  • Generic representation of Statistical Models (not just Opus)
  • Variables, Parameters, Distributions, Relationships (mathematics)
  • No new wheels – assume DDI, etc. or equivalent
• Design Structures
  • StatModel – Object design in UML
  • Descriptions of actual models in XML
• Implement Functionality
  • Presentation Tools
  • Templates and Applets
  • Use R service for statistical displays
  • Demonstration web site
• Technical details
  • Separate discussion

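To give a flavour of what an XML description of a model might contain, the sketch below builds a small document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`StatModel`, `Variables`, `Variable`, `Parameters`, `Parameter`, `name`, `role`) and the example model are assumptions made for illustration; they are not the actual StatModel schema.

```python
import xml.etree.ElementTree as ET


def describe_model(name, variables, parameters):
    """Build a small XML description of a statistical model (hypothetical layout)."""
    root = ET.Element("StatModel", {"name": name})
    vars_el = ET.SubElement(root, "Variables")
    for var_name, role in variables:
        ET.SubElement(vars_el, "Variable", {"name": var_name, "role": role})
    pars_el = ET.SubElement(root, "Parameters")
    for par_name, meaning in parameters:
        par = ET.SubElement(pars_el, "Parameter", {"name": par_name})
        par.text = meaning  # free-text explanation aimed at end users
    return root


# Hypothetical model, for illustration only.
model = describe_model(
    "trip_rate",
    variables=[("trips", "response"), ("household_size", "explanatory")],
    parameters=[("beta0", "baseline trip rate"),
                ("beta1", "effect of household size")],
)
xml_text = ET.tostring(model, encoding="unicode")
print(xml_text)
```

Keeping the description in XML means presentation tools such as stylesheets can render the same source as listings or diagrams without touching the model itself.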
StatModel Components
• Multiple Statistical Models of any system
  • Focus on different sub-systems
  • Different levels of abstraction
• Functional Form of Model Specification
  • Variables, Parameters
  • Derivations, constraints and stochastic relationships
• Fitting Steps
  • Links to datasets – how are Data variables linked to Model ones
  • Methods used and outcomes
• Knowledge States
  • Knowledge (uncertainty distributions) for Parameters
  • Each Fit produces a new State

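The relationship between fitting steps and knowledge states can be sketched as a small object model. Everything here is hypothetical: the class and field names, the use of a (mean, standard deviation) pair to summarise uncertainty, and the example values are assumptions for illustration, not the StatModel design itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Uncertainty = Tuple[float, float]  # (mean, standard deviation), an assumed summary


@dataclass(frozen=True)
class KnowledgeState:
    """Knowledge about the parameters at one point in the fitting sequence."""
    label: str
    parameters: Dict[str, Uncertainty]


@dataclass(frozen=True)
class FittingStep:
    """One fit: the dataset, the method, and how data variables map to model ones."""
    dataset: str
    method: str
    variable_links: Dict[str, str]  # data variable name -> model variable name


@dataclass
class StatModel:
    name: str
    variables: List[str]
    states: List[KnowledgeState]
    steps: List[FittingStep] = field(default_factory=list)

    def record_fit(self, step: FittingStep,
                   estimates: Dict[str, Uncertainty]) -> KnowledgeState:
        """Store a fitting step and append the knowledge state it produces."""
        unlinked = set(step.variable_links.values()) - set(self.variables)
        if unlinked:
            raise KeyError(f"links to unknown model variables: {sorted(unlinked)}")
        current = self.states[-1].parameters
        unknown = set(estimates) - set(current)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        self.steps.append(step)
        state = KnowledgeState(f"after fit {len(self.steps)}", {**current, **estimates})
        self.states.append(state)
        return state


# Hypothetical model and fit, for illustration only.
model = StatModel(
    name="trip_rate",
    variables=["trips", "household_size"],
    states=[KnowledgeState("prior", {"beta0": (0.0, 10.0), "beta1": (0.0, 10.0)})],
)
step = FittingStep("household_survey", "least squares",
                   {"n_trips": "trips", "hh_size": "household_size"})
model.record_fit(step, {"beta0": (1.2, 0.3), "beta1": (0.8, 0.1)})
print([s.label for s in model.states])
```

Earlier states are never overwritten, so the full sequence of knowledge states remains available to explain how the current estimates were reached.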
Demonstration
• Public Demonstration Site
  • http://155.198.92.106/
• London 2
  • Stylesheet – listings, mathematics
  • Alternative Model Forms
  • Model Diagram, Process diagram
• WP08
  • Model Sequence
• WP11
  • Understanding complex WinBUGS
  • Show Doodle and Script in WinBUGS
  • Documentation in StatModel
  • Influence Diagram in StatModel

Conclusions
• Propose structure for storing information about Statistical Models
  • Seems to work well for us
  • Refinement and application outside Opus needed
• For end users, so must address presentation
  • Some basic tools demonstrated
  • Specialised solutions for application domain usually needed
• Meta-data capture is a difficult issue
  • Integration into modelling applications
  • Encourage modellers to document and explain their choices
• Scope for further development
www.opus-project.org – www.sasc.co.uk

Acknowledgements
• Rajesh Krishnan, Imperial College London
  • Implementation of the web application
• Miles Logie, Minnerva
• Saikumar Chalisani, ETH Zürich
  • Contribution to initial ideas about StatModel
• Software Used
  • hyperModel – XMLModeling.com, David Carlson
    • UML modelling for XML Schema
  • Formulator - www.hermitech.ic.zt.ua
    • MathML editor, integrates with XML Spy
  • XML Spy – www.altova.com
    • XML Editor and associated applications
  • JUNG - jung.sourceforge.net/
    • Java Graph Visualization and Layout
