
Designing and Creating applications built on R

Richard Pugh, Andy Nicholls & Chris Campbell, 23rd October 2012.





Presentation Transcript


  1. Designing and Creating applications built on R Richard Pugh, Andy Nicholls & Chris Campbell 23rd October 2012

  2. Thank you for the invitation to speak tonight

  3. Richard Pugh, Principal R Consultant & Co-Founder • Andy Nicholls, Senior R Consultant • Chris Campbell, Senior R Consultant

  4. Agenda • Who are Mango Solutions? • Why Build Analytic Applications on R? • Formal R Application Development • Case Studies • The R Community • Discussion

  5. Who are Mango Solutions?

  6. Overview of Mango Solutions • Private Company formed in 2002 • Global Team of ~70 • Cross-Sector Software and Services • ISO 9001 Accredited

  7. Located here: Bath, UK • London, UK • Shanghai, CN • Basel, CH

  8. Spend a lot of time here ...

  9. The Beginning: October 2002 • Started by 2 ex-Insightful colleagues • Sales Guy (BO, Cognos etc.) • Techy Guy (S+, SAS, R etc.) • Idea to deploy predictive analytics to business users

  10. Why Mango? • Early awful ideas: • DataStatz • Stats Entertainment • VizUStat • Stats2U • In the end, named after my colleague’s cat

  11. What we do: R Training • Code Creation • Consultants • Validation • Support

  12. What we do: Developers • Consultants • Analytic Application Development

  13. Mango Key Industries • Mango work across sectors: • Pharmaceuticals • Mango Imaging • Finance • Energy • Sensory

  14. Why Build Analytic Applications on R?

  15. Why Analytics? • Analytics can help people answer all sorts of questions • I believe there is no company in the world today that cannot benefit from analytics in some way • More and more people are realising it

  16. Who is a good driver? What bonus should I pay? How do we win more games? Will someone like this? When might this break? What are they likely to want?

  17. Why build Analytic Applications? • 3 key reasons we see: • To deploy analytical tools to decision makers • To make an analyst’s life more efficient • To add rigour to an analyst’s workflow

  18. Deploying Analytics • Adding analytics into a business process means more informed decisions can be made • Complex analytics shouldn’t be attempted by non-analysts • This requires communication between the decision maker and the analyst

  19. Deploying Analytics • If we build an application which … • is easy for the decision maker to use • contains the correct analysis to apply • communicates analytical results in a suitable manner • … this leads to some major benefits!

  20. Benefits for the Decision Maker: • No need to wait for information • Can perform “what if” analysis • Decision not dependent on analyst availability. Benefits for the Analyst: • Less need to perform often-repetitive tasks • Comfortable that the “right” analysis is being run • Can get on with more strategic things?

  21. Analytic App Structure (diagram): User Interface • Analytic Outputs • Analytic Engine • Data Storage (Data) • Code Management (Analytic Code)
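The structure above keeps the user interface apart from the analytic engine. Analytic code and data each sit in their own managed layer. The following is a minimal Python sketch of that separation. It does not show Mango's implementation; in the applications described here R is the analytic engine. All class and function names (`CodeRepository`, `DataStore`, `AnalyticEngine`, `ui_request`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# An analytic routine takes a dataset and returns named outputs (hypothetical shape).
Routine = Callable[[List[float]], Dict[str, float]]


@dataclass
class CodeRepository:
    """Code management layer: versioned analytic routines keyed by (name, version)."""
    routines: Dict[Tuple[str, int], Routine] = field(default_factory=dict)

    def publish(self, name: str, version: int, fn: Routine) -> None:
        self.routines[(name, version)] = fn

    def latest(self, name: str) -> Routine:
        versions = [v for (n, v) in self.routines if n == name]
        if not versions:
            raise KeyError(f"no analysis named {name!r}")
        return self.routines[(name, max(versions))]


@dataclass
class DataStore:
    """Data storage layer: named datasets."""
    tables: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class AnalyticEngine:
    """Analytic engine: runs the latest published version of an analysis on stored data."""
    code: CodeRepository
    data: DataStore

    def run(self, analysis: str, dataset: str) -> Dict[str, float]:
        return self.code.latest(analysis)(self.data.tables[dataset])


def ui_request(engine: AnalyticEngine, analysis: str, dataset: str) -> str:
    """User interface layer: knows only names, and formats the analytic outputs."""
    outputs = engine.run(analysis, dataset)
    return ", ".join(f"{k}={v:.2f}" for k, v in sorted(outputs.items()))


repo = CodeRepository()
repo.publish("summary", 1, lambda xs: {"mean": sum(xs) / len(xs)})
repo.publish("summary", 2, lambda xs: {"mean": sum(xs) / len(xs), "max": max(xs)})
engine = AnalyticEngine(repo, DataStore({"sales": [3.0, 5.0, 10.0]}))
print(ui_request(engine, "summary", "sales"))  # max=10.00, mean=6.00
```

The UI only names an analysis. Publishing version 2 of `summary` therefore changes what the decision maker sees without touching the interface.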

  22. Why build Analytic Applications on R? • Building applications requires installing an analytic engine on desktops, servers, clusters and clouds: R is licence-free • Building analytic applications involves integrating an analytic engine with other technologies (data sources, UI etc.): R’s open nature means it can be readily integrated

  23. Why build Analytic Applications on R? • We want a programmable engine that can be readily extended (i.e. no black boxes, please): R can be extended by the developer as needed • We often want to be able to deploy new algorithms and techniques as they become available: R is developed rapidly

  24. Formal R Application Development

  25. Formal R Development • Creating sophisticated analytic applications requires a formal development approach • This mostly means taking standard development practices and applying them to analytics • Mango’s formal R development procedures and structure have been evolving since their inception (~2004)

  26. Issue Tracking • Requirements • Quality Manual • Project Management • Behaviour-Driven Development • Procedures • R Coding Standards • StatET • RUnit • roxygen2 • mangoUtils • Continuous Integration • Code Review • Review Board • Knowledge Management

  27. Issue Tracking • Requirements • Quality Manual • Project Management • Behaviour-Driven Development • Procedures • Coding Standards • StatET • RUnit • roxygen2 • mangoUtils • Continuous Integration • Code Review • Review Board • Knowledge Management

  28. Issue Tracking • Requirements • Quality Manual • Project Management • Behaviour-Driven Development • Procedures • R Coding Standards • StatET • testthat • roxygen2 • mangoUtils • Continuous Integration • Code Review • Review Board • Knowledge Management

  29. Case Studies

  30. Case Studies • These are examples of applications we’ve built that use R in some way • We present a range of information about each, including: • Business reason for the application • Technical approach • Some technical detail where applicable • Things that worked well / things that didn’t

  31. Case Studies • The level of detail ranges from information we can fully disclose to cases where we can only say vague things about the customer • There is only so much we can cover today, so please see us afterwards or contact us and we can step through things in more detail: Richard Pugh = rpugh@mango-solutions.com • Andy Nicholls = anicholls@mango-solutions.com • Chris Campbell = ccampbell@mango-solutions.com

  32. Case Studies • PKPD Web Modelling Platform • M&S Workflow Platform • Non-Compartmental Analysis Application • Coffee Blend Optimisation Tool • Pipeline Corrosion Forecasting Application • Backtesting Application

  33. Case Study: PKPD Web Platform

  34. Case Study: PKPD Modelling (Overview) • Pharmacokinetics-pharmacodynamics (PKPD) is the study of how a drug moves through the body and of its effect on a target disease • PK modelling is highly complex, involving sophisticated non-linear mixed-effects modelling approaches

  35. Case Study: PKPD Modelling (Overview) • Modellers use “NONMEM” software to fit these models • Inputs and outputs to NONMEM are a mixture of structured and unstructured text files • R is often used to analyse the outputs in order to assess model fit (see the “xpose4” package)

  36. Case Study: PKPD Modelling (Overview) • PKPD is an evolving and exciting area, with modellers needing flexibility and a variety of tools • However, because this work sits within the life sciences, rigorous workflows are key to satisfying regulatory requirements

  37. Case Study: PKPD Modelling (The Challenge) • Build a modern modelling platform that provides rigour whilst allowing modellers the flexibility they need • Range of technical users, from “everything is a shell script” to “which button do I click?” • Execution of third-party tools (NONMEM, R, SAS, PsN, …) in a controlled manner • Interface to generate reproducible graphics, tables and reports

  38. Case Study: PKPD Modelling (The “R” Bit) • Where does R fit in? • Many users use R and want to be able to develop scripts and execute them on an internal grid • R is used as the graphics engine to support the model evaluation and reporting processes • Users want to be able to execute R interactively with objects in their project

  39. (Architecture diagram) App Server: The App • Execution Server(s): RPoolMgr • MIF Queue • MIF • Cloud + Others • Grid + Others

  40. Case Study: PKPD Modelling (What is a “Report Item Definition”?) • Definition of a graph or table that can be executed from Navigator • Consists of a snippet of R code, options that may be presented to the user, required columns, and a few other bits • Can be used in a number of situations in the application • Originally stored as XML, later in a database (the XML is shown on the next slide to give a feel for the structure)

  41. (Figure: example RID XML, annotated) Command Definition • Report Options • Source Data
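A Report Item Definition bundles an R snippet, user-facing options and the columns the snippet needs. The real RID XML appears only as a figure. The element and attribute names below (`reportItem`, `command`, `option`, `column`) and the helper names are therefore invented for illustration.

This Python sketch does three things:
- parses such a definition;
- checks the source data for the required columns;
- prepends option values to the R code before it would be sent to an R session.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical RID layout; the real schema is not given in the slides.
RID_XML = """
<reportItem name="conc_vs_time" type="graph">
  <command>plot(DV ~ TIME, data = data, log = if (logY) "y" else "")</command>
  <options>
    <option name="logY" default="FALSE"/>
  </options>
  <sourceData>
    <column>TIME</column>
    <column>DV</column>
  </sourceData>
</reportItem>
"""


@dataclass
class ReportItemDefinition:
    name: str
    item_type: str               # e.g. "graph" or "table"
    r_code: str                  # R snippet; assumes the source data is bound to `data`
    options: Dict[str, str]      # option name -> default value, as an R literal
    required_columns: List[str]

    def missing_columns(self, available: List[str]) -> List[str]:
        """Required columns absent from the source data, in definition order."""
        return [c for c in self.required_columns if c not in available]

    def r_script(self, overrides: Optional[Dict[str, str]] = None) -> str:
        """Prepend option assignments (defaults, then user overrides) to the R snippet."""
        overrides = overrides or {}
        unknown = set(overrides) - set(self.options)
        if unknown:
            raise ValueError(f"unknown options: {sorted(unknown)}")
        values = {**self.options, **overrides}
        return "\n".join([f"{k} <- {v}" for k, v in values.items()] + [self.r_code])


def parse_rid(xml_text: str) -> ReportItemDefinition:
    root = ET.fromstring(xml_text)
    return ReportItemDefinition(
        name=root.get("name"),
        item_type=root.get("type"),
        r_code=root.findtext("command").strip(),
        options={o.get("name"): o.get("default") for o in root.iter("option")},
        required_columns=[c.text for c in root.iter("column")],
    )


rid = parse_rid(RID_XML)
print(rid.missing_columns(["ID", "TIME"]))  # ['DV']
print(rid.r_script({"logY": "TRUE"}))
```

Keeping options as R literals means the application never has to understand the R snippet itself. It only injects assignments ahead of it.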

  42. (Diagram) The App / RPool Manager: Text, Data, Table and Graph requests, each passed as an XML method • Returned as Text Item, Data Item, Table Item and Graph Item • R object types shown: data frame, table object, character, graphics
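The RPool Manager sits between the application and a set of R processes. The Python sketch below shows one plausible shape for such a component. It assumes a fixed pool of long-running sessions that requests borrow and then return. It also assumes a lookup from the requested method type to the kind of item returned to the app.

The Text/Data/Table/Graph labels come from the slide. Everything else is hypothetical, including `RPool`, the lowercase method keys and the stand-in session callables. Real sessions would be R processes.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List

Session = Callable[[str], Any]  # stand-in for a running R process

# Method type requested by the app -> kind of item returned (labels from the slide).
ITEM_KINDS = {"text": "Text Item", "data": "Data Item",
              "table": "Table Item", "graph": "Graph Item"}


class RPool:
    """Hypothetical pool manager: pre-started sessions are borrowed per request and
    returned afterwards, so requests avoid the cost of starting a new R process."""

    def __init__(self, sessions: List[Session]) -> None:
        self._idle: queue.Queue = queue.Queue()
        for s in sessions:
            self._idle.put(s)

    @contextmanager
    def session(self, timeout: float = 5.0) -> Iterator[Session]:
        s = self._idle.get(timeout=timeout)  # waits for a free session; raises queue.Empty on timeout
        try:
            yield s
        finally:
            self._idle.put(s)  # return the session even if the R code failed

    def idle_count(self) -> int:
        return self._idle.qsize()

    def execute(self, method_type: str, code: str) -> Dict[str, Any]:
        if method_type not in ITEM_KINDS:
            raise ValueError(f"unknown method type {method_type!r}")
        with self.session() as run:
            value = run(code)
        return {"kind": ITEM_KINDS[method_type], "value": value}


def fake_session(code: str) -> str:
    """Stand-in for sending code to R and capturing its output."""
    return f"ran: {code}"


pool = RPool([fake_session, fake_session])
print(pool.execute("text", "summary(fit)"))
# {'kind': 'Text Item', 'value': 'ran: summary(fit)'}
```

Returning the session in a `finally` block matters for a shared service. Without it, a single failing R snippet would permanently shrink the pool.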

  43. (Diagram) Version Control: Command Definitions • Command Results • Execution Engine (Java)

  44. Case Study: PKPD Modelling (How are “RIDs” used?) • Created and managed by super users (under version control) • Called in a few places in the application: • Directly (create this graph with this data) • In “Run Views” (reports) • In “Comparison Views” (reports that compare models) • In “Template Reports” (tagged .docx files)

  45. Case Study: PKPD Modelling (Outcome) • The app in general was a big success • The “R” part was created as a separate service that we have since reused in a number of other applications (e.g. Lloyds Risk Platform!) • Unfortunately, regulatory rules forced some design decisions, for which we’re now building alternatives • Next: interactive graphical presentation

  46. Case Study: M&S Workflow Platform

  47. Case Study: M&S Workflow Platform (Overview) • Exciting project for a major pharmaceutical company • Possibly the closest we’ve come to deploying an analyst’s workflow in a scalable platform • Hundreds of pre-clinical (animal) studies are run by a team of ~400 scientists • Analysis performed by roughly 15 advanced modellers • Outcome: most studies were not analysed!

  48. Case Study: M&S Workflow Platform (The Challenge) • Idea to create a truly scalable platform to allow bench scientists to run their own analysis • Modeller publishes an analysis “protocol” containing analysis paths, code, and supporting documentation • Desktop application pulls from a central set of protocols and “derives” the interface which is presented to the user • Modellers can put in checks to ensure things look right (e.g. the data is in the right format, or the model fit is particularly poor but the user seems keen to create predictions from it)
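Checks of the kind described above might look like the Python sketch below. The specific checks are assumptions for illustration, not the rules used in the project:
- required columns must be present and numeric;
- predictions are flagged when R^2 falls below a threshold of 0.5.

All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def check_data(rows: List[Dict[str, object]], required: Sequence[str]) -> List[str]:
    """Flag required columns that are missing from any row or hold non-numeric values."""
    problems = []
    for col in required:
        if any(col not in row for row in rows):
            problems.append(f"column {col!r} is missing")
        elif not all(isinstance(row[col], (int, float)) for row in rows):
            problems.append(f"column {col!r} must be numeric")
    return problems


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def check_before_prediction(observed: Sequence[float], fitted: Sequence[float],
                            min_r2: float = 0.5) -> List[str]:
    """Warn, rather than silently proceed, when predictions are requested from a poor fit."""
    r2 = r_squared(observed, fitted)
    if r2 < min_r2:
        return [f"model fit is poor (R^2 = {r2:.2f} < {min_r2}); review before predicting"]
    return []


print(check_data([{"TIME": 0, "DV": 1.2}, {"TIME": 1}], ["TIME", "DV"]))
# ["column 'DV' is missing"]
print(check_before_prediction([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))
# ['model fit is poor (R^2 = -3.00 < 0.5); review before predicting']
```

Returning messages rather than raising lets the desktop application choose whether to block the user or just show a warning alongside the results.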

  49. Case Study: M&S Workflow Platform (The Solution) • Eclipse RCP application executing R and NONMEM scripts on an internal LSF grid, with protocols and code held in SVN • Generated a workflow “protocol” definition (XML) detailing the possible paths in each step, linked to R scripts and NONMEM model code with corresponding dialogs • Built a “Protocol Developer” Eclipse interface onto the repository • RCP application derives analysis paths, UI, options and commentary to guide the end user
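In the protocol approach, a modeller publishes an XML definition and the desktop application derives the interface from it. That idea can be sketched in Python as below. The sketch assumes a hypothetical XML schema: `protocol`, `step`, and `path` elements, where each path has `label` and `script` attributes and an optional `model` attribute. The helper names are also hypothetical. Running a NONMEM model before its R post-processing script is an assumed ordering. The real client was an Eclipse RCP application, not Python.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical protocol layout: each step offers alternative paths, each linked
# to an R script and, optionally, a NONMEM control stream.
PROTOCOL_XML = """
<protocol name="pk_screen" version="3">
  <step id="fit" prompt="Choose a structural model">
    <path label="One-compartment" script="fit_1cmt.R"/>
    <path label="Two-compartment" script="fit_2cmt.R" model="run2.ctl"/>
  </step>
  <step id="report" prompt="Choose an output">
    <path label="Diagnostic plots" script="diag_plots.R"/>
  </step>
</protocol>
"""


@dataclass
class AnalysisPath:
    label: str
    script: str
    model: Optional[str] = None  # NONMEM control stream, if the path uses one


@dataclass
class Step:
    step_id: str
    prompt: str
    paths: List[AnalysisPath]


def load_protocol(xml_text: str) -> List[Step]:
    root = ET.fromstring(xml_text)
    return [
        Step(s.get("id"), s.get("prompt"),
             [AnalysisPath(p.get("label"), p.get("script"), p.get("model"))
              for p in s.findall("path")])
        for s in root.findall("step")
    ]


def derive_dialog(step: Step) -> str:
    """Text stand-in for the dialog the client would build from a step."""
    options = [f"  ({i}) {p.label}" for i, p in enumerate(step.paths, start=1)]
    return "\n".join([step.prompt] + options)


def files_to_run(steps: List[Step], choices: List[int]) -> List[str]:
    """Turn one 1-based choice per step into the ordered list of files to execute.
    Assumed ordering: a path's NONMEM model runs before its R script."""
    if len(choices) != len(steps):
        raise ValueError("exactly one choice is needed per step")
    files = []
    for step, choice in zip(steps, choices):
        if not 1 <= choice <= len(step.paths):
            raise ValueError(f"step {step.step_id!r} has no option {choice}")
        path = step.paths[choice - 1]
        if path.model:
            files.append(path.model)
        files.append(path.script)
    return files


steps = load_protocol(PROTOCOL_XML)
print(derive_dialog(steps[0]))
print(files_to_run(steps, [2, 1]))  # ['run2.ctl', 'fit_2cmt.R', 'diag_plots.R']
```

The modeller only edits the XML, and the dialog and execution plan follow from it. This is what lets one protocol serve many bench scientists without bespoke UI work.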
