
Welcome to Winter 2010 RAD Lab Retreat

Welcome to the Winter 2010 RAD Lab Retreat, presented by Armando Fox. Agenda: introductions; progress in the last 6 months; preview of the project-end demo (Jan. 2011); preview of retreat demos; breakout topics (at dinner); retreat logistics. The talk also restates the RAD Lab 5-year mission (unchanged since 2006, except blue text).


Presentation Transcript


  1. Welcome to Winter 2010 RAD Lab Retreat. Armando Fox

  2. Welcome • Introductions • Progress in last 6 months • Preview of project-end demo (Jan. 2011) • Preview of retreat demos • Breakout topics (at dinner) • Retreat logistics

  3. RAD Lab 5-year Mission (unchanged since 2006, except blue text) Enable 1 person to develop, deploy, operate next-generation Internet application • Key enabling technology: Statistical machine learning • debugging, monitoring, power management, auto-configuration, performance prediction, ... • Highly interdisciplinary faculty & students • PI’s: Patterson/Fox/Katz (systems/networks), Jordan (machine learning), Stoica (networks & P2P), Joseph (security), Shenker (networks), Franklin (DB) • 2 postdocs, ~30 PhD students, ~6 undergrads • Teaching integrated with research • Grad project courses: cloud computing; SaaS • Lower division ugrad course: intro to Web 2.0 app development • Upper division ugrad course: SaaS development & operations

  4. RAD Lab Support ?

  5. RAD Lab Prototype v2.0 [Architecture diagram; recoverable labels: Drivers; WebApp; PIQL (“I” in PIQL); SCADS; Dir; Director; NS1, NS2, NS3; Chukwa & XTrace (monitoring); Chukwa trace collection; local OS functions; VM monitor; NEXUS; SPARK, SEJITS; Ruby on Rails environment; web service APIs; Web 2.0 apps; performance & cost models; Log Mining; Automatic Workload Evaluation (AWE); Hadoop + HDFS; MPI; KCCA-based M/R scheduling. Annotations: new apps, equipment, global policies (e.g. SLA); offered load, resource utilization, etc.; SLAs, policies; training data]

  6. RAD Lab Prototype: System Architecture [Architecture diagram; recoverable labels: WebApp; PIQL (“I” in PIQL); XTrace + Chukwa (monitoring); Dir; SCADS; NS1, NS2, NS3; SLA, policies; Batch/Analytics; NEXUS; SPARK, SEJITS; log mining; Hadoop + HDFS; MPI; KCCA-based M/R scheduling]

  7. Impact of Above the Clouds • > 30K 54K downloads; ~6K IBM, MS, Cisco • “Circulated to CxOs” of major IT firms • IBM: “profound effect” on datacenter strategy • Short version to appear in March 2010 CACM • Cited by >70 papers, including MS, NIST • 20K+ visits to blog (50% USA, ~5% each Japan/UK/India), 700+ RSS followers • Ongoing dialogue with readers (~1 post/mo.) • Linked from >60 blogs/feeds • Many requests for permission to reprint, translate, include in books

  8. Invited appearances/talks • Internal conferences: Fujitsu, SAP, Google, Univ. of California CC Task Force • Conference appearances • CC panels at ISCA 2009, VMware Acad. Summit @ SOSP 2009 • Invited talks: LISA 2009, IEEE SASO (Self-adaptive, Self-organizing Systems) • High Performance Computing & Infrastructure • World Economic Forum CC panel • “Energy efficiency & CC” @ MS Faculty Summit • Nodalities podcast series

  9. What’s New

  10. What’s New: SCADS • PIQL: Performance Introspective Query Language for SCADS • Enforce performance safety • Generate query plans from primitives (get, get_range, put) and indices • Characterizing & synthesizing workload spikes and data hotspots • Director controlling data movement, replication during changing workload
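The idea of building bounded query plans from the three primitives named on this slide (get, get_range, put) plus an index can be sketched as follows. This is only an illustrative toy under stated assumptions: `ToyKeyValueStore`, `posts_by_user`, and the `(user, timestamp)` index layout are hypothetical names invented here, not the SCADS or PIQL API. The point it tries to show is that a query expressed as one limited range scan on an index followed by at most `limit` point lookups has a cost bound that does not grow with the total data size.

```python
import bisect


class ToyKeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store (not the SCADS API)."""

    def __init__(self):
        self._keys = []    # sorted list of keys, used for range scans
        self._values = {}  # key -> value

    def put(self, key, value):
        # Keep the key list sorted so get_range can use binary search.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def get_range(self, start, end, limit):
        """Return at most `limit` (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:min(hi, lo + limit)]]


def posts_by_user(index, posts, user, limit=10):
    """Bounded plan: one get_range on the index, then at most `limit` gets.

    Assumes (hypothetically) that `index` maps (user, timestamp) -> post id
    and `posts` maps post id -> post body.
    """
    entries = index.get_range((user, 0), (user, float("inf")), limit)
    return [posts.get(post_id) for _, post_id in entries]


if __name__ == "__main__":
    index, posts = ToyKeyValueStore(), ToyKeyValueStore()
    for ts in range(5):
        posts.put(f"p{ts}", f"post {ts}")
        index.put(("alice", ts), f"p{ts}")
    print(posts_by_user(index, posts, "alice", limit=3))
```

Because the `limit` argument is mandatory on `get_range`, every query written this way touches a bounded number of keys, which is one simple reading of "performance safety"; the actual PIQL mechanism may differ.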

  11. What’s New: Workload analysis & generation • Chukwa (log collection) integrated with all pieces • Online console log mining to find operational problems • How can we improve console logs? • KCCA-driven scheduling for MapReduce analytics • Introspecting the performance of SCADS (PIQL) queries

  12. What’s New: Infrastructure • Nexus, a substrate for cloud computing • Simultaneously share/schedule resources across interactive and batch • Spark, a Scala-based library for machine learning on cloud computing • Datacenter-in-a-box using RAMP • Very close to emulating 10,000-server/1,000-switch system running real SW • ½ rack FPGA boards (1 board ~ 1 container) • slowdown ~2 orders of magnitude

  13. What’s New: Labs/Projects • Tonight: AMP Lab: Algorithms, Machines, People (Mike Franklin) • current analytics scale poorly despite cloud computing & advances in SML • new ways to gather info: crowds/social networks, simulators, etc. • combine RAD Lab expertise in DB, SML, Cloud to create “cyberspace exploratorium” for large scale analytics • Tomorrow: Berkeley Wireless Research Center: large-scale-app driven wireless & chips (Jan Rabaey) • Wednesday: LoCal: computer scientists look at energy (Randy Katz)

  14. What’s new: Publications • 23 new publications • 15 with Affiliate co-authors • ~10 in first-tier conferences/journals in systems & machine learning • All available from RAD Lab website • Details in Backup Slides

  15. What’s new: Students • Dr. Archana Ganapathi (PhinisheD, on market) • Dr. Arsalan Tavakoli (at McKinsey & Co.) • Dr. Dilip Anthony Joseph (at Conviva) • New undergrads to help write apps! • Allen Chen • Amber Feng • Karl He • Sunil Pedapudi • Marcelo Velloso

  16. Engagement with Affiliates

  17. Presentations & outreach • RAD Lab PI’s at affiliate outreach events • Google Faculty Summit (Fox, Katz, Stoica, Jordan, Franklin) • Microsoft Faculty Summit (Patterson) • Microsoft ALT-TAB (Patterson) • Sun TAB (Patterson) • VMware GoVirtual spotlight (Fox) • RAD Lab faculty invited presentations • Fujitsu America: Cloud computing (Fox) • OpenCirrus summit: Cloud futures (Fox) • Sun Labs: LoCal (Katz) • VMware Academic Summit panel at SOSP 2009: Cloud computing & virtualization (Fox) • LISA 2009: Cloud computing (Fox)

  18. Students working with industry collaborators • Student research visits to affiliates • Nexus: Andy Konwinski, Ben Hindman, Matei Zaharia, Ion Stoica, Scott Shenker (Cloudera, Yahoo!, Facebook) • Console log mining: Wei Xu (HP Labs, Google) • Microsoft site visit (many students & PI’s, ~20 MSR) • Google onsite review (~10 Googlers) • Students interning/collaborating with affiliates • Gunho Lee (HP Labs, with Partha Ranganathan) • Wei Xu (Google; applying console log mining techniques to in-house data) • John Duchi (Google; 4 publications with Yoram Singer et al. on online learning & large scale optimization) • Ganesh Ananthanarayanan (MSR): improving performance of MapReduce/Dryad jobs

  19. Affiliates engagement • Research visits to RAD Lab by Affiliates • Alice Zheng, Dushyanth Narayanan (MSR) • Jesus Molina (Fujitsu America) • Devendra Jaisinghani (eBay) • Yoram Singer (Google) • Frank Steinhans (SAP) • Greg Papadopoulos & Dave Douglas (Sun) • Research visits to Affiliates by RAD Lab PI’s • Amazon: James Hamilton (Shenker) • HP Labs: Prith Banerjee (Katz, Fox) • MSR (Stoica, Shenker) • Yahoo!: Eric14, Owen O’Malley, Surendra Reddy (Shenker)

  20. Demo Plans

  21. Final demo (January 2011) Enable 1 person to develop, deploy, operate next-generation Internet application at scale • 3 SCADS-backed Web apps written by undergrads • 3 analytics jobs using Spark/SEJITS • Running on >=1000 cloud computing nodes • Managed by Nexus • Director scales SCADS storage up/down & replicates • One or more workload spikes/data hotspots • while underlying hardware fails and software crashes • App-driven decisions: relaxed consistency, log mining

  22. Demoable now Enable 1 person to develop, deploy, operate next-generation Internet application at scale • 1 SCADS-backed Web app (SCADr) written by grad • 3 analytics jobs using Spark/SEJITS • Running on >=1000 cloud computing nodes • Not managed by Nexus • Director scales SCADS storage up/down & replicates • One or more workload spikes/data hotspots • while underlying hardware fails and software crashes • App-driven decisions: relaxed consistency, log mining

  23. Breakout Topics

  24. Breakout topics & leaders • AMP Lab plans (Algorithms, Machines & People) [Mike Franklin, Mike Jordan] • Datacenter storage: RAM, flash, disk? [Dave Patterson] • SCADS—what’s next? [Michael Armbrust, Beth Trushkowsky] • Cloud Programming Beyond MapReduce [Matei Zaharia, Armando Fox] • What are logs good for? [Wei Xu] • Workload spike modeling [Peter Bodik] • What’s (technically) new about cloud security? [Anthony Joseph, Yanpei Chen] • Datacenter energy efficiency: “race to sleep”? [Randy Katz]

  25. Logistics

  26. Logistics • Wifi: local access only during sessions • Check-in; what’s covered • Next break: get keys from Kattt or Sean (NOT check-in desk) • Skiing tomorrow • Transportation, lift tickets on us • Rentals, lessons at your own expense • Show up to morning sessions in ski wear • Bag lunches will be available as you leave

  27. BACKUP SLIDES including publications details

  28. Progress: Publications • Efficient Online and Batch Learning with Forward Backward Splitting. John Duchi and Yoram Singer (Google). Journal of Machine Learning Research (JMLR), vol. 11, 2010. • Online and Batch Learning with Forward Backward Splitting. John Duchi and Yoram Singer. Neural Information Processing Systems (NIPS) 2009. • Oral Presentation. Boosting with Structural Sparsity. John Duchi and Yoram Singer. International Conference on Machine Learning (ICML) 2009. • Understanding TCP Incast Throughput Collapse in Datacenter Networks. Yanpei Chen, Rean Griffith et al. Proceedings of the 1st ACM Workshop on Research on Enterprise Networking (WREN 2009). August 2009.

  29. Progress: publications (2) • Statistics-Driven Workload Modeling for the Cloud. Archana Ganapathi, Yanpei Chen et al. Accepted to Workshop on Self-Managing Database Systems (SMDB) 2010. • The nested Chinese restaurant process and Bayesian inference of topic hierarchies. Blei, D., Griffiths, T., and Jordan, M. I. Journal of the ACM. (to appear). • Estimating divergence functionals and the likelihood ratio by convex risk minimization. Nguyen, X., Wainwright, M., and Jordan, M. I. IEEE Transactions on Information Theory. (to appear). • Joint covariate selection and joint subspace selection for multiple classification problems. Obozinski, G., Taskar, B. and Jordan, M. I. Statistics and Computing. (to appear). • Support union recovery in high-dimensional multivariate regression. Obozinski, G., Wainwright, M. and Jordan, M. I. Annals of Statistics. (to appear). • Nonparametric latent feature models for link prediction. Miller, K., Griffiths, T., and Jordan, M. I. Advances in Neural Information Processing (NIPS) 22, (2010).

  30. Publications (3) • Fast approximate spectral clustering. Yan, D., Huang, L. (Intel), and Jordan, M. I. 15th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD), Paris, France. (2009). • On surrogate loss functions and f-divergences. Nguyen, X., Wainwright, M., and Jordan, M. I. Annals of Statistics, 37, 876-904. (2009). • Kernel dimension reduction in regression. Fukumizu, K., Bach, F. R., and Jordan, M. I. Annals of Statistics, 37, 1871-1905. (2009). • Hierarchical Bayesian nonparametric models with applications. Teh, Y. W. and Jordan, M. I. In Bayesian Nonparametrics: Principles and Practice, Cambridge, UK: Cambridge University Press. (2009). • Large-Scale System Problem Detection by Mining Console Logs. W. Xu, L. Huang, A. Fox, D. Patterson, M. Jordan. Proc. SOSP 2009. • Output-Deterministic Replay for Multiprocessor Programs. G. Altekar, I. Stoica. Proc. SOSP 2009. • W. Xu, L. Huang, A. Fox, D. Patterson, M. Jordan. Online Problem Detection by Mining Console Logs. Proc. ICDM 2009.

  31. Publications (4) • P. Bodik, M. Goldszmidt (MSR), A. Fox, H. Andersen (Microsoft), Dawn Woodard. Fingerprinting the Datacenter: Automated Classification of Performance Crises. Proc. EuroSys 2010 (to appear) • A Common Substrate for Cluster Computing.  B. Hindman, A. Konwinski, M. Zaharia and I. Stoica.  HotCloud 2009, June 2009. • Macroscope: End-Point Approach to Networked Application Dependency Discovery, Lucian Popa, Byung-Gon Chun (Intel), Ion Stoica, Jaideep Chandrashekar (Intel), Nina Taft (Intel), in proceedings of the 5th ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT 2009), December 2009 • Rule-based Forwarding (RBF): improving the Internet’s flexibility and security, Lucian Popa, Ion Stoica, Sylvia Ratnasamy (Intel), in proceedings of the Eighth ACM Workshop on Hot Topics in Networks (HotNets 2009), October 2009 • DryadInc: Reusing work in large-scale computations, Lucian Popa, Mihai Budiu (MSR), Yuan Yu (MSR), Michael Isard (MSR), in proceedings of the first USENIX workshop on Hot Topics in Cloud Computing (HotCloud 2009), June 2009 • An Energy Case for Hybrid Datacenters. Byung-Gon Chun (Intel), Gianluca Iannaccone (Intel), Giuseppe Iannaccone, Randy Katz, Gunho Lee, Luca Niccolini. HotPower09, Oct 2009

  32. [Software stack diagram; recoverable labels: Web 2.0 apps; Analytics; Other cloud apps; SCADS; RoR + PIQL; Hadoop; SEJITS; Spark; Chukwa / Nexus; Cloud API’s (Eucalyptus)]