Promoting Good Statistical Practices. Roger Stern, SSC, Reading. WMO/FAO training workshop, November 2005
Contents • Understanding the present situation: • The need for (basic) training in statistics • Past training in statistics • Developments in statistical computing • And in statistical analyses • Possibilities for the future • Resources • statistical software (freely available in Africa) • materials to promote good statistical practices • training materials • Spatial analysis • In conclusion • These are exciting times - let’s look forwards not backwards PROMOTING GOOD STATISTICAL PRACTICE
Training in statistics • It is difficult to practise good statistics • unless we have had appropriate training • For example, seasonal forecasting uses PCA • Spatial methods mentioned in this workshop include: • Kriging and co-kriging • PCA and clustering • Yet many staff find more basic concepts difficult • Percentiles and return periods (show CAST as preview) • Standard errors, etc. • So they have to accept (advanced) methods in an unquestioning way
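As a small illustration of the "basic" concepts on this slide, here is a minimal Python sketch that computes an empirical percentile, an exceedance probability and the corresponding return period (1 / probability) from a series of annual values. The rainfall figures are hypothetical, made up for the example, and the percentile rule (linear interpolation between ordered values) is only one of several common definitions; this is not the method used in CAST or Instat.

```python
# Hypothetical annual maximum daily rainfall (mm) for 20 years at one station.
annual_max_mm = [42, 55, 38, 61, 47, 72, 50, 44, 66, 39,
                 58, 81, 46, 53, 49, 62, 40, 57, 68, 45]


def exceedance_probability(values, threshold):
    """Fraction of years in which the value is above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def return_period(values, threshold):
    """Average interval, in years, between exceedances: 1 / probability."""
    p = exceedance_probability(values, threshold)
    return float("inf") if p == 0 else 1 / p


def percentile(values, q):
    """Empirical percentile (0 <= q <= 1), interpolating between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


p80 = percentile(annual_max_mm, 0.8)
print(f"80th percentile: {p80:.1f} mm")
print(f"P(max > 65 mm) = {exceedance_probability(annual_max_mm, 65):.2f}")
print(f"Return period of 65 mm: {return_period(annual_max_mm, 65):.1f} years")
```

With these 20 hypothetical years, 65 mm is exceeded in 4 years (probability 0.2, a return period of 5 years), and the 80th percentile, about 62.8 mm, is the value exceeded 1 year in 5. Other percentile definitions would give slightly different values.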
Past training in statistics • Training for (non-statistician) users in the past has been problematical • consequently they fear statistics • and hence also statisticians • Similarly, statisticians have had insufficient training in soft skills • consequently they sometimes lack communication skills • and marketing skills • and are often side-lined in important development and research projects • just like Met staff perhaps???
Common training problems for non-statisticians • Training is dominated by analysis • with little on data management • or on design • A recipe-book approach is used • hence, e.g., overuse of irrelevant significance tests • and little understanding of principles • Training emphasises hand computation • for understanding (which they don’t get!) • but this is not needed later • and gives little experience of computers for statistical work • Presentation is too mathematical • not conceptual • And courses are often taught by someone who has little interest in the students’ main subject areas
RESULT! • Users with a near-universal dislike of statistics • and of statisticians? • Strong demand for relevant in-service training in statistics • Most of these past weaknesses in training • also applied to the training of statisticians • who can be too pedantic and inflexible in their advice • and are then feared, and ignored where possible, by potential clients • We see later how this can now easily change • for statisticians • and for others who need to generate and use statistics
Advances in statistical computing • History • 1960s: SAS and SPSS started • A long way back in computer terms • By the early 1980s • Statistics packages were well established • Microcomputers appeared, too small for these packages • So many other statistics packages were written • that made the same mistakes as SAS and SPSS a generation earlier • It is easy to write statistical software, but difficult to write good software
Statistics packages: THEN • In the 1990s • Standard statistics packages were dominant again • (compare other types of software) • With some additions, e.g. Stata • All command-driven • So you had to learn the language (for SPSS or SAS) • So people and training courses used just one package • Data transfer between packages was difficult • Training courses often confused • learning the package with learning statistics • cf. data management: learning the concepts or learning Access
A big advance… Windows appeared, and Excel ruled the world, for better or for worse!
Statistics packages: NOW • All common packages now run under Windows • With a very similar interface • Like other Windows software • So they are very easy to learn • And to add to Excel • so you can still keep your “security blanket” • And it is easy to add another package • hence it is not so critical which package is used for statistics training • Data transfer has also become easy • We hardly need a training course • for the software • so we can concentrate on training in statistics again!
Advances in statistical analysis • The “estuary model” • ever-increasing unity in the methods • this makes training much easier • if we build a solid foundation • special methods are then seen as such
Start in the 1960s • In the mountains there were little streams • Regression • Analysis of variance • These were for normally distributed data • In another valley • parameter estimation for other distributions, like Poisson and binomial • And leading to another valley • chi-square analysis for categorical data
Then • In the late 1960s • Chi-square tests joined with other ways of looking at multidimensional contingency tables • to become log-linear models • In the early 1970s • log-linear models • joined probit analysis • in the general stream of generalized linear models • that also included ANOVA and regression • for normal and non-normal data
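To make this joining of streams concrete: for a two-way contingency table, the classical chi-square test of independence is the Pearson statistic for the log-linear model log(mu_ij) = constant + row effect + column effect, which is a Poisson generalized linear model. For this particular model the maximum-likelihood fitted values have a closed form (row total × column total / grand total), so the minimal sketch below uses that formula rather than a general GLM fitting routine. The 2 × 2 counts are hypothetical.

```python
import math

# Hypothetical 2 x 2 contingency table of counts.
table = [[20, 10],
         [10, 20]]


def independence_fit(counts):
    """Fitted counts for the log-linear model log(mu_ij) = const + row_i + col_j.

    For this main-effects Poisson model the maximum-likelihood fitted values
    have the closed form row total * column total / grand total.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_x2(counts, fitted):
    """Classical chi-square statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e
               for obs_row, fit_row in zip(counts, fitted)
               for o, e in zip(obs_row, fit_row))


def deviance_g2(counts, fitted):
    """GLM deviance (likelihood-ratio statistic) of the same Poisson model.

    The usual -(o - e) terms are omitted: they sum to zero here because
    the fitted and observed totals agree.
    """
    return 2 * sum(o * math.log(o / e)
                   for obs_row, fit_row in zip(counts, fitted)
                   for o, e in zip(obs_row, fit_row) if o > 0)


fitted = independence_fit(table)
x2 = pearson_x2(table, fitted)
g2 = deviance_g2(table, fitted)
df = (len(table) - 1) * (len(table[0]) - 1)
print(f"X2 = {x2:.2f}, G2 = {g2:.2f}, df = {df}")
```

Both statistics have 1 degree of freedom here; X² ≈ 6.67 and the deviance G² ≈ 6.80 agree closely, as they usually do when expected counts are not small.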
And finally for us here • In the 1980s • REML came into wider use • for data at multiple levels • By the 1990s it had joined the mainstream • and included powerful methods for spatial modelling • So now • the same modelling ideas are used for a wide range of problems • making both training and analysis • simpler and more coherent • as long as the trainers know this • But some are still up in the mountains!
So where are we now? • Statistical software has developed • and so have users’ computing skills • Statistical methods have developed • and are easier to use • And the resources to bring the two together • are now being made available • and are becoming accessible throughout Africa • We describe some of these resources • First generally • And then look briefly at methods for spatial modelling
Software includes: • SSC-Stat • an add-in for Excel to encourage good statistical use of Excel • with a tutorial guide • and guides for good tables and good graphs • for example, it provides boxplots • Instat+ • a simple first statistics package for ‘Excel-lers’ • supports good teaching of statistics • a stepping stone to other statistics packages • tutorial guide, introductory guide • and climatic guide, now updated for Instat Version 3 • for example, for data summary or training • Genstat • One of the major statistics packages (like SPSS, Systat) • For modern statistical modelling, like GLMs and REML • And good facilities for spatial modelling
Genstat • Especially for agricultural applications • And now with added climatic features • Such as extremes and circular plots • Plus a climatic guide
Resources for good statistical practice • Good practice guides • Mini-guides for statistical sceptics • designed originally to promote good statistical practice in DFID projects • covering design, data management, analysis and presentation • a book is now available • And much more: • Participatory (QQA) approaches, important for Met services • A book is now available, based on Malawi’s “starter pack” • Data management, where Met services can support other groups
Training resources include • Statistical games to help teach statistics • Reading and BUCS • For example PADDY, the rice survey game • Materials for distance learning • CAST, currently in a general version • Which can now be adapted for African needs • With support from the Rockefeller Foundation
Interesting ways of learning: training software • Statistics concepts through CAST
Interesting ways of learning: statistical games • Simulating a survey based on a real crop-cutting survey in Sri Lanka
And in climatology • Providing the basic statistical skills • Now through a facilitated e-learning course • Tested in 2005, and provided from 2006 • For staff in HQ and (hopefully) in outstation offices • Because decentralisation is important • Using a specially adapted version of CAST • That can be provided to African Services • You have seen this earlier • Also software (Instat) plus Genstat • Each with its own climatic guide
Spatial ideas • There is more to spatial analysis than just maps • Remember the data: at which time scale will you map? • Daily: many “layers” • Annually (e.g. date of the start of the season) • Averages: take care when stations have records for different years • Example where a map does not give the full answer • Southern Zambia: risky for maize • To suggest a strategy, say farmers overall have a 20% (1 year in 5) risk of having to replant • How much seed should be stocked? • The map is very simple: 20% everywhere. Does it answer the question? • We need spatial correlations: why?
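Why spatial correlation matters for the seed question can be shown with a minimal Monte Carlo sketch. It assumes a hypothetical equicorrelation model (every pair of farms equally correlated, a simplification: real correlations usually decay with distance), 100 farms and 2000 simulated seasons, with every farm having the same 20% replanting risk. The mean fraction of farms replanting is then the same whatever the correlation, but the fraction needing seed in a bad year is not.

```python
import random
from statistics import NormalDist, mean

RISK = 0.2  # every site has a 1-in-5 chance of needing to replant


def replant_fractions(rho, n_sites=100, n_years=2000, risk=RISK, seed=1):
    """Simulate, year by year, the fraction of sites that must replant.

    Hypothetical equicorrelation model: each site's latent "season score" is
    sqrt(rho) * regional + sqrt(1 - rho) * local, both standard normal, so any
    two sites have latent correlation rho. A site replants when its score
    exceeds the threshold giving marginal probability `risk` at every site.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - risk)
    a, b = rho ** 0.5, (1 - rho) ** 0.5
    fractions = []
    for _ in range(n_years):
        regional = rng.gauss(0, 1)  # shared effect of the season
        failures = sum(a * regional + b * rng.gauss(0, 1) > threshold
                       for _ in range(n_sites))
        fractions.append(failures / n_sites)
    return fractions


def upper_percentile(values, q):
    """Simple empirical percentile (order statistic, no interpolation)."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


results = {}
for rho in (0.0, 0.5, 0.9):
    f = replant_fractions(rho)
    results[rho] = (mean(f), upper_percentile(f, 0.9))
    print(f"rho = {rho:.1f}: mean fraction replanting {results[rho][0]:.2f}, "
          f"90th percentile {results[rho][1]:.2f}")
```

The mean stays near 0.2 for every rho, which is all the map shows, but the 90th percentile rises from roughly a quarter of the farms when rho = 0 to most of them when rho = 0.9: with strongly correlated seasons, seed for most farmers may be needed in the bad years.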
GIS and mapping • Many problems can be mapped effectively • Then much “spatial analysis” is descriptive statistics • Selection of subsets • Transformations to provide new layers • Logical calculations • Etc. • This is non-controversial • Simple smoothing to provide contours is the same • As long as the spatial “averaging” (e.g. splines, inverse distance) is recognised as such • But kriging, etc. moves into inferential ideas • And statistical packages could also be used for such operations
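Inverse-distance weighting is a good example of spatial "averaging" that is descriptive rather than inferential: each estimate is simply a weighted average of the station values, with weights 1 / distance^p. A minimal sketch follows; the station coordinates and values are hypothetical, and p = 2 is a common but arbitrary choice. It returns no standard error, and the estimate can never lie outside the range of the observed values.

```python
import math

# Hypothetical stations: (x km, y km, seasonal rainfall mm).
stations = [(0.0, 0.0, 600.0), (10.0, 0.0, 800.0), (0.0, 10.0, 700.0)]


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    A weighted average of the observed values with weights 1 / d**power.
    At a station it returns that station's value; elsewhere it always lies
    between the smallest and largest observed values, and it gives no
    measure of uncertainty.
    """
    num = den = 0.0
    for sx, sy, value in points:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


for x, y in [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0)]:
    print(f"({x:.0f}, {y:.0f}): {idw(stations, x, y):.1f} mm")
```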
Spatial statistics with statistical software • Many statistical packages, e.g. Genstat • Provide some facilities for spatial analysis • For example kriging • And REML – for the future
Demonstration • Show two examples in Genstat • The first is a simple contour plot • Shows the value of a log file of commands • The second is an example of kriging • Shows more facilities in fitting and plotting • Other facilities include • Co-kriging • REML for “proper” spatial modelling • Within which kriging is a special case • More “research” and case studies are needed
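For contrast with inverse-distance averaging, here is a minimal ordinary kriging sketch. It assumes an exponential covariance model with a fixed sill and scale, rather than estimating them from the data through a fitted variogram as a real analysis would, and it solves the kriging equations with a small hand-written Gaussian elimination so that only the Python standard library is needed. The stations are hypothetical. Unlike inverse-distance weighting, each estimate comes with a kriging variance, which is where the inferential ideas enter.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cov(h, sill=1.0, scale=10.0):
    """Assumed exponential covariance model C(h) = sill * exp(-h / scale)."""
    return sill * math.exp(-h / scale)


def ordinary_kriging(points, x, y, sill=1.0, scale=10.0):
    """Ordinary kriging estimate and kriging variance at (x, y).

    Solves  C w + mu 1 = c0,  sum(w) = 1  for the weights w and the
    Lagrange multiplier mu; the variance is C(0) - w . c0 - mu.
    """
    locs = [(px, py) for px, py, _ in points]
    n = len(locs)
    a = [[cov(math.dist(p, q), sill, scale) for q in locs] + [1.0] for p in locs]
    a.append([1.0] * n + [0.0])
    c0 = [cov(math.dist(p, (x, y)), sill, scale) for p in locs]
    sol = solve(a, c0 + [1.0])
    w, mu = sol[:n], sol[n]
    estimate = sum(wi * v for wi, (_, _, v) in zip(w, points))
    variance = cov(0.0, sill, scale) - sum(wi * ci for wi, ci in zip(w, c0)) - mu
    return estimate, variance


# Hypothetical stations: (x km, y km, value).
stations = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 4.0, 30.0)]
for x, y in [(0.0, 0.0), (2.0, 2.0), (20.0, 20.0)]:
    est, var = ordinary_kriging(stations, x, y)
    print(f"({x:.0f}, {y:.0f}): estimate {est:.2f}, kriging variance {var:.3f}")
```

At a station the estimate reproduces the observed value with (numerically) zero variance, and the variance grows as the target point moves away from the stations.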
In conclusion • The time is right: • Statistics has changed • Training methods can change • The resources are here • And in Africa: • Evidence-based decision making is (more) encouraged • Met Services are key organisations • Because climatic data are needed in so many applications • Challenge: How will you proceed?