Data Science Incubator. This morning. Context: A Data Science Environment Data Science Studio Pilot Incubator Program Discussion. A 5-year, $37.8 million cross-institutional collaboration. Establish a virtuous cycle. 6 working groups, each with 3-6 faculty from each institution.
This morning • Context: A Data Science Environment • Data Science Studio • Pilot Incubator Program • Discussion
Establish a virtuous cycle • 6 working groups, each with • 3-6 faculty from each institution
Pilot Program Organizers • Andrew Whitaker, Research Scientist • Dan Halperin, Director of Research, Scalable Data Analytics • Jake Vanderplas, Director of Research, Physical Sciences • Bill Howe, Associate Director
The Data Science Studio • An open collaborative research space • A resident data science team • Permanent staff of ~5 data scientists – applied research and development • ~15-20 data science fellows (research scientists, visitors, postdocs, students) • How to Engage: • Drop-in open workspace • Studio “Office Hours” • Incubation Program …plus seminars, sponsored lunches, workshops, bootcamps, joint proposals...
A partnership among … • Provost • UW Libraries • Physics, Astronomy, Arts & Sciences • eScience Institute 6th floor Physics Astronomy Building
Estimated Timeline: • Design Phase Jan-June • Construction June – Sep • Target: October 1, 2014
Incubator Program Overview • Goal: Create watercooler opportunities and scale our efforts by co-locating collaborations from different fields in the studio • Protocol: ~1-page proposals for 1-quarter, on-site data science collaborations with us • What we're looking for: Projects where fruitful collaboration is possible, with potential for significant impact, and that have sustained engagement • This meeting: Pilot program for Spring Quarter to inform full launch Fall 2014. http://data.uw.edu/incubator
Spring Incubator Pilot Program Logistics • Applications due online 3/10 • Each proposal identifies a Project Lead (PL) • The person doing the work, not the thesis advisor • Incubator participants join the studio 2 days/week • Days decided collectively by participants and team • Pilot program operates out of Sieg 326 • Milestones at 3, 6, 9 weeks • blog posts + demo, visualization, IPython notebook, dataset, GitHub repo, preliminary results, etc. • Networking/poster session during 9th week
Areas of interest • scalable data management and analytics • learning and predictive models • interactive visualization • parallel algorithms • code review, publishing, and reproducibility • online teaching materials, tutorials
A Live SeaFlow Dashboard • Francois Ribalet • Ginger Armbrust • Jarred Swalwell [Figure labels: laser, lens, nozzle, microscope objective, pinhole, d1, d2, FSC (forward scatter), red fluorescence, orange fluorescence]
SeaFlow Ambitions • SeaFlow is a huge success! NSF wants one on every R/V
SeaFlow Ambitions • Underway biology should enable adaptive sampling - a sort of “holy grail” • How can remote collaborators participate? • What about citizen science? “Wait! We saw a population change between P3 and P4!” “Let’s go back!”
A Live SeaFlow Dashboard • Where is the ship? • What is it doing? • Is the instrument working? • What phytoplankton populations are we seeing?
The AscotDB Project • A multi-year collaboration between UW Astronomy and UW Computer Science researchers and students • ASCOT = the AStronomy COllaborative Toolkit • Goal: Provide an interactive and collaborative environment for analysis of astronomical data.
The AscotDB Project • Interacting browser-based widgets for generating database queries & associated visualization. • The resulting visualizations can be shared with collaborators through a browser URL
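One way to picture sharing a visualization through a browser URL is to serialize the widget settings into the query string and read them back on the other side. The sketch below is only an illustration under that assumption; it is not AscotDB's actual implementation, and the base URL, the parameter names (`table`, `ra_min`, `ra_max`), and both helper functions are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical base URL, for illustration only.
BASE_URL = "https://example.org/dashboard"


def state_to_url(state):
    """Serialize a flat dict of widget settings into a shareable URL."""
    # Sorting the items makes the URL stable for identical settings.
    return f"{BASE_URL}?{urlencode(sorted(state.items()))}"


def url_to_state(url):
    """Recover the widget settings from a shared URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; each key appears once here.
    return {key: values[0] for key, values in query.items()}


# Hypothetical widget settings (all values kept as strings).
state = {"table": "stars", "ra_min": "10.0", "ra_max": "12.5"}
shared = state_to_url(state)
restored = url_to_state(shared)
```

A collaborator who opens `shared` would get back the same settings in `restored`, so the URL alone is enough to reproduce the view.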
Pilot cohort desiderata • good clustering • alignment with sponsor and program goals • new directions, new questions • availability, engagement, commitment • “do only what we can only do together” • with apologies to Dijkstra • clarity and shovel-readiness • capacity for measurable outcomes
Spring Schedule • 3/10: Proposals due • 3/14: Follow-up requests • 3/21: Pilot participants notified • 3/31: Spring program start date • 4/21: First milestone • 5/12: Second milestone • 6/2: Third milestone • 6/6: Poster/networking event
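The three milestone dates in the schedule fall at 3, 6, and 9 weeks after the 3/31 start date. The short check below works through that arithmetic. The year 2014 is an assumption inferred from the Fall 2014 launch mentioned earlier, and `milestone_dates` is a name introduced only for this illustration.

```python
from datetime import date, timedelta

# Start date from the schedule; the year 2014 is an assumption.
PROGRAM_START = date(2014, 3, 31)


def milestone_dates(start, weeks=(3, 6, 9)):
    """Return the dates falling the given numbers of weeks after start."""
    return [start + timedelta(weeks=w) for w in weeks]


milestones = milestone_dates(PROGRAM_START)
for week, day in zip((3, 6, 9), milestones):
    print(f"Week {week} milestone: {day.month}/{day.day}")
```

The computed dates are 4/21, 5/12, and 6/2, matching the first, second, and third milestones listed above.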