
Computer Science Research

Ian Foster, University of Chicago & Argonne National Laboratory, foster@mcs.anl.gov. GriPhyN NSF Project Review, 29-30 January 2003, Chicago.

Presentation Transcript


  1. Computer Science Research
     Ian Foster, University of Chicago & Argonne National Laboratory
     foster@mcs.anl.gov
     GriPhyN NSF Project Review, 29-30 January 2003, Chicago

  2. Computer Science Research
     • Introduction & Context (Ian Foster: 30 mins)
        • Vision: Virtual data as e-science enabler
        • Organization: Structure & interactions
        • Dissemination: Targets and mechanisms
        • The nature of future challenges
     • Computer science research
        • Virtual data (Mike Wilde: 15)
        • Scheduling, planning (Ewa Deelman: 15)
        • Execution (Mike Franklin: 15)
        • Performance (Valerie Taylor: 15)
     • Technology delivery (Miron Livny: 15)
        • Virtual Data Toolkit
     • Student presentations (60)
     Ian Foster, U.Chicago foster@mcs.anl.gov

  4. PetaScale Virtual Data Grids (1)
     Figure (architecture diagram); labels:
     • Users: production team, research group, individual investigator
     • Tools: interactive user tools; request planning & scheduling tools; request execution & management tools; virtual data tools
     • Services: resource management, security and policy, and other Grid services
     • Resources: transforms; raw data source; distributed resources (code, storage, computers, and network)
     • Scale annotations: PetaOps, Petabytes, Performance

  5. Petascale Virtual Data Grids (2)

  6. Computer Science and GriPhyN
     Figure (relationship diagram); labels:
     • Participants: Computer Science Research, Virtual Data Toolkit, Partner Physics Projects, Larger Science Community
     • Flows: requirements; prototyping & experiments; production deployment; techniques & software; tech transfer
     • Communities: Globus, Condor, NMI, EU DataGrid, PPDG
     • Other linkages: work force, CS researchers, industry

  7. Computer Science Challenges (1)
     • Virtual data: representation, discovery, & manipulation of workflows and associated data & programs
     • Planning: mapping workflows in an efficient, policy-aware manner to distributed resources
     • Execution: executing workflows, including data movements, reliably and efficiently
     • Performance: monitoring aspects of system performance for scheduling & troubleshooting
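
Planning and execution both operate on workflows represented as directed acyclic graphs of jobs. The following is a minimal Python sketch, not a GriPhyN or DAGMan interface. It assumes a hypothetical workflow format (a dictionary mapping each job name to its parent jobs) and a caller-supplied `run_job` function that returns True on success. It orders jobs so that parents always run before children, and retries each failed job a bounded number of times, similar in spirit to the RETRY directive of Condor's DAGMan.

```python
from collections import deque
from typing import Callable, Dict, List


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order jobs so every job follows all of its parents (Kahn's algorithm)."""
    # Count each job's unfinished parents and record each job's children.
    indegree = {job: len(parents) for job, parents in deps.items()}
    children: Dict[str, List[str]] = {job: [] for job in deps}
    for job, parents in deps.items():
        for parent in parents:
            children[parent].append(job)
    # Start with jobs that have no parents; release children as parents finish.
    ready = deque(sorted(job for job, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in children[job]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("workflow contains a cycle")
    return order


def run_workflow(deps: Dict[str, List[str]],
                 run_job: Callable[[str], bool],
                 max_retries: int = 2) -> List[str]:
    """Run jobs in dependency order, retrying a failed job up to max_retries times."""
    completed: List[str] = []
    for job in topological_order(deps):
        for _attempt in range(1 + max_retries):
            if run_job(job):
                completed.append(job)
                break
        else:
            # No attempt succeeded, so dependent jobs cannot run.
            raise RuntimeError(f"job {job!r} failed after {1 + max_retries} attempts")
    return completed
```

This sketch runs jobs one at a time. A real executor would instead submit every job in `ready` concurrently and release children as jobs complete.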

  8. Computer Science Challenges (2)
     • Engage meaningfully with physics groups
     • Provide educational opportunities
     • Develop, package, deliver, and support quality software
     • Achieve outreach to groups outside partner physics experiments

  10. GriPhyN Computer Science Team
     • U.Chicago: Dumitrescu, Foster, Iamnitchi, Milligan, Ranganathan, Ripeanu, Voeckler, Wilde
     • USC/ISI: Deelman, Kesselman, Mehta, Patil, Singh, Vahi
     • NWU -> TAMU: Taylor, Yin
     • UCB: Franklin, Liu
     • UCSD: Marzullo, Moore, Zhang, Jagatheesan
     • UW-Madison: Alderman, Arpaci-Dusseau, Arpaci-Dusseau, Bailey, Bent, Kosar, Livny, Roy, Stanley, Thain
     • UF: Arbee, George, Jiang, Katageri, Ranka, Rodriguez
     • UT Brownsville: Campanelli, Morris, Zamora
     • LBNL: Shoshani
     Faculty/Staff, Student/Postdoc (underlined = present)

  11. Computer Science Research: How Do We Work?
     • System architecture & Virtual Data Toolkit as two overarching organizational mechanisms
     • Project activities all defined in relationship to these organizing principles:
        • Research: explore new techniques to guide evolution of the system architecture and VDT
        • Development: construct VDT software
        • Evaluation: apply and evaluate VDT software and/or new techniques in the context of application challenges

  12. Computer Science Research: How Are We Coordinated?
     • The activities of this large, multidisciplinary group are coordinated through frequent communication over many channels:
        • Face-to-face meetings in large & small groups
        • Formal and informal documents defining requirements, challenge problems, testbeds
        • Email, phone calls, videoconferences
        • Cooperation on challenge problems and on technology and application demonstrations
        • Cooperation on software releases

  13. GriPhyN Architecture/VDT and CS Research Projects
     Figure (layered diagram; columns: VDT, Research; layers: Planning, Workflow, Execution); labels:
     • VDT components: Chimera Virtual Data System + Pegasus Planner; DAGman; Globus Toolkit, Condor, Ganglia, etc.
     • Research projects: Virtual Data Ontologies (Zhao); Partial Queries (Liu, Franklin); Virtual data language design (Voeckler, Wilde); Virtual data language applications (Milligan, Zhao); AI Planning (Deelman, Narang); Decentralized scheduling (Ranganathan); Prophesy (Taylor, Yin); Policy-aware scheduling (Dumitrescu); Fault-tolerant master-worker (Marzullo); DAGman enhancements (UW team); Scalable replica location service (UC, ISI team); NeST storage management (UW team); HP monitoring (George)
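
A scalable replica location service answers one question: where are the physical copies of a logical file? The following is a toy Python sketch of one way to split that job. Each site keeps a local catalog that maps logical file names (LFNs) to physical file names (PFNs), and a separate index records only which sites know a given LFN. The layout is loosely modeled on the local-catalog-plus-index organization of the Globus replica location service, but the class names, methods, and example URLs are hypothetical. It is not that service's API.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LocalReplicaCatalog:
    """Per-site map from logical file names (LFNs) to physical file names (PFNs)."""

    def __init__(self, site: str):
        self.site = site
        self.mappings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, lfn: str, pfn: str) -> None:
        self.mappings[lfn].add(pfn)

    def lookup(self, lfn: str) -> Set[str]:
        # .get avoids creating empty entries for unknown LFNs.
        return set(self.mappings.get(lfn, set()))


class ReplicaLocationIndex:
    """Index recording which sites' catalogs hold mappings for each LFN."""

    def __init__(self):
        self.sites_for: Dict[str, Set[str]] = defaultdict(set)
        self.catalogs: Dict[str, LocalReplicaCatalog] = {}

    def register(self, catalog: LocalReplicaCatalog) -> None:
        # Record every LFN the site currently holds. This sketch never expires
        # stale entries, which a production index would need to do.
        self.catalogs[catalog.site] = catalog
        for lfn in catalog.mappings:
            self.sites_for[lfn].add(catalog.site)

    def locate(self, lfn: str) -> List[str]:
        """Two-step lookup: find candidate sites in the index, then query each site."""
        pfns: List[str] = []
        for site in sorted(self.sites_for.get(lfn, set())):
            pfns.extend(sorted(self.catalogs[site].lookup(lfn)))
        return pfns


# Hypothetical usage: two sites hold copies of the same logical file.
uc = LocalReplicaCatalog("uchicago")
uc.add("sdss/run1.fit", "gsiftp://uc.example.org/data/run1.fit")
isi = LocalReplicaCatalog("isi")
isi.add("sdss/run1.fit", "gsiftp://isi.example.org/store/run1.fit")
rli = ReplicaLocationIndex()
rli.register(uc)
rli.register(isi)
```

In this design, the index holds only LFN-to-site information, so it stays small even when individual sites store many replicas. Each site remains the authority for its own physical locations.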

  14. GriPhyN Arch/VDT and CS Research: Degree of Coupling
     Figure: the slide 13 diagram, with research projects marked by degree of coupling (legend: Already underway, Pending)

  15. Examples of Technology Injection: Chimera R&D Timeline
     Figure (timeline, 2002-2004); labels:
     • TECH:
        • Chimera-0 (prototype): derivations only; Grid exec environment; PERL & PostgreSQL
        • Chimera-1: Java code & class model; XML VDL; TR/DV model; compound TRs; general Grid exec env; optimized DB schema
        • Chimera-2: type model; dataset catalog; metadata; hyperlinks; instance tracking; performance data
        • Chimera-3: knowledge repr.; policy-driven planners; VD browsers, composers; …
     • APPS: CMS & ATLAS analysis w/ROOT, CLARENS, JAS; CMS analysis prototype w/ROOT; Sloan cluster-finding science; Bio Grid facility; …; CMS event simulation prototyping; Sloan cluster finding; Sloan near-earth object; ATLAS events-on-demand; CMS official event simulation; LIGO pulsar search
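
Chimera's TR/DV model separates a transformation (TR), which is a program, from a derivation (DV), which is a recorded invocation of that program on particular inputs. The following is a toy Python sketch of the virtual-data idea behind that split. It assumes in-memory datasets and uses hypothetical class names, so it is not Chimera's VDL or API. A requested dataset is returned if it already exists. Otherwise it is recomputed from its recorded derivation, after any missing inputs are derived first, and each run is logged as provenance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Transformation:
    """A TR: a named program mapping input datasets to one output dataset."""
    name: str
    fn: Callable[..., object]


@dataclass
class Derivation:
    """A DV: one invocation of a TR, naming its input and output datasets."""
    transformation: Transformation
    inputs: List[str]
    output: str


class VirtualDataCatalog:
    def __init__(self):
        self.derivations: Dict[str, Derivation] = {}  # output name -> recipe
        self.materialized: Dict[str, object] = {}     # datasets that exist now
        self.runs: List[str] = []                     # provenance log of executed DVs

    def add_raw(self, name: str, value: object) -> None:
        self.materialized[name] = value

    def define(self, dv: Derivation) -> None:
        self.derivations[dv.output] = dv

    def request(self, name: str) -> object:
        """Return a dataset, deriving it and any missing inputs on demand.

        Assumes the recorded derivations form a DAG (no cycles).
        """
        if name in self.materialized:
            return self.materialized[name]
        if name not in self.derivations:
            raise KeyError(f"no data or derivation for {name!r}")
        dv = self.derivations[name]
        args = [self.request(i) for i in dv.inputs]
        value = dv.transformation.fn(*args)
        self.runs.append(f"{dv.transformation.name} -> {name}")
        self.materialized[name] = value
        return value
```

For example, after `add_raw("raw_events", [3, 1, 2])` and two derivations (`sorted` producing `sorted_events`, then `sum` producing `total`), the first `request("total")` runs both transformations. A second request reuses the stored result without rerunning anything.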

  17. Dissemination: Targets
     • Researchers and educators
        • Facilitate creation of new knowledge
     • Computer science research community
        • Contribute to knowledge
        • Engage community in solving our problems
     • Open source community
        • Contribute to open Grid technology base
     • Industry
        • Contribute to vibrant commercial technology

  18. Dissemination: Mechanisms
     • Software
        • VDT: adoption by LHC Computing Grid
        • Globus Toolkit and Condor systems
     • Publications and talks
        • XX papers, YY tech reports, ZZ talks
     • Workshops and meetings
        • E.g., “Data Derivation & Provenance”, Oct 02
     • Community activities
        • E.g., advisory committees, GGF standards

  19. Representative Publications
     • Annis, J., Zhao, Y., Voeckler, J., Wilde, M., Kent, S., Foster, I. Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey. SC'2002, 2002.
     • Bent, J., Venkataramani, V., LeRoy, N., Roy, A., Stanley, J., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H., Livny, M. Flexibility, Manageability, and Performance in a Grid Storage Appliance. HPDC'11, 2002.
     • Deelman, E., Blackburn, K., Ehrens, P., Kesselman, C., Koranda, S., Lazzarini, A., Mehta, G., Meshkat, L., Pearlman, L., Blackburn, K., Williams, R. GriPhyN and LIGO: Building a Virtual Data Grid for Gravitational Wave Scientists. HPDC'11, 2002.
     • Foster, I., Voeckler, J., Wilde, M., Zhao, Y. Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation. SSDBM, 2002.
     • Iamnitchi, A., Ripeanu, M., Foster, I. Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations. 1st Intl. Workshop on Peer-to-Peer Systems, 2002.
     • Raman, P., George, A., Radlinski, M., Subramaniyan, R. GEMS: Gossip-Enabled Monitoring Service for Heterogeneous Distributed Systems. Technical Report, UF, 2002.
     • Ranganathan, K., Foster, I. Decoupling Computation and Data Scheduling in Distributed Data Intensive Applications. HPDC'11, 2002.
     • Ripeanu, M., Foster, I., Iamnitchi, A. Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design. Internet Computing, 6(1):50-57, 2002.

  21. The Nature of Future Challenges
     • GriPhyN R&D is proving very successful
        • In terms of “new ideas”
        • In terms of interest & adoption
     • Our major challenges as we move forward are to scale and sustain the effort
        • Research scope: virtual data => KR; planning, execution => x1000 larger; …; …
        • Software support: we need NMIx10!
        • Infrastructure & application support
           • See Atkins cyberinfrastructure report!

  22. Summary
     • CS has made significant contributions both to experiments and to knowledge, e.g.:
        • Virtual data concepts and technologies
        • Scheduling in large-scale distributed systems
        • DAGman workflow management & execution
        • Scalable replica location services
     • VDT (& underlying Globus Toolkit & Condor systems) is a good technology transfer vehicle
        • Adoption by major science projects
        • Adoption of Grid concepts within industry
     • Major challenge: exploiting opportunities
