CLADE Review 2003-2008

CLADE Review 2003-2008. Nancy Wilkins-Diehr, wilkinsn@sdsc.edu. The Origin of CLADE: “The CLADE workshop began with a discussion at HPDC-11, July 24-26, 2002, at Edinburgh International Conference Center in Scotland.”


Presentation Transcript


  1. CLADE Review 2003-2008 Nancy Wilkins-Diehr wilkinsn@sdsc.edu CLADE 2008, June 23, 2008

  2. The Origin of CLADE • “The CLADE workshop began with a discussion at HPDC-11, July 24-26, 2002, at Edinburgh International Conference Center in Scotland. • Salim Hariri, C.S. Raghavendra, and I and likely a couple of others got to talking about the state of Grid applications. • At that time quite a lot of progress had been made with tools and technologies for distributed applications, but we were not seeing many applications papers at HPDC, or in other forums either. • So Salim suggested that we put together a workshop to focus attention on applications, and he asked me to help organize it.” (Ray Bair)

  3. Keys to the Success of CLADE • Complements the HPDC program • Focus on real applications that demonstrate the use of Grid approaches on a significant scale. • CLADE's association with HPDC still distinguishes it from other conferences • Bringing together cutting-edge computer science and applications • Support of the HPDC Steering Committee • Strong Program Committee chairs • Good advice from CLADE's Steering Committee • Engaged Program Committee members • Peer-review system has been important in selecting good papers that are timely and interesting • Distribution of the CLADE proceedings at the workshop increases the value and usefulness of the papers to the participants

  4. 2008 CLADE Organization • STEERING COMMITTEE • Raymond Bair, ANL • Ioana Banicescu, Mississippi State Univ. • Francine Berman, Univ. of Calif., San Diego • Jack Dongarra, Univ. of Tenn., Knoxville • Salim Hariri, University of Arizona • Manish Parashar, Rutgers University • Viktor Prasanna, Univ. of Southern Calif. • Joel Saltz, Ohio State University • Edward Seidel, Louisiana State University • Alan Sussman, University of Maryland • PROGRAM COMMITTEE • Henrique Andrade, IBM Research • David Bernholdt, ORNL • Jiannong Cao, HK PolyU • Umit Catalyurek, Ohio State U. • Kenneth Chiu, U. Binghamton • Jose Cunha, U. Nova de Lisboa • Ewa Deelman, ISI • Frederic Desprez, ENS Lyon • Hai Jin, HUST • Tevfik Kosar, Louisiana State U. • Tahsin Kurc, Ohio State U. • Jysoo Lee, Calit2 • Sang Boem Lim, KonKuk U. • David Lowenthal, U. Georgia • Malika Mahoui, IUPUI • James Myers, NCSA • Gregory Newby, Arctic Region Supercomputing Center • Jun Ni, U. Iowa • Yoonho Park, IBM Research • Marlon Pierce, Indiana U. • Ilkyun Ra, U. Colorado Denver • Thomas Rauber, U. Bayreuth • Gudula Rünger, TU Chemnitz • Edward Walker, TACC • Shaowen Wang, UIUC

  5. Today’s Talk • Overview of CLADE keynotes 2003-2007 • 2003 “Dynamic Data Driven Application Systems”, Frederica Darema • 2004 “A Grid based Diagnostics and Prognosis System for Rolls Royce Aero Engines: The DAME Project”, Jim Austin • 2005 “Enabling Science and Engineering Applications on the Grid”, Ed Seidel • 2006 “Gridcast - a Next Generation Broadcasting Infrastructure?”, Terry Harmer • 2007 “The Cancer Biomedical Informatics Grid: Connecting the Cancer Research Community”, Scott Oster • TeraGrid Science Gateways

  6. CLADE 2003, Seattle • Keynote Presentation • Frederica Darema, Senior Science and Technology Advisor and Director of the Next Generation Software Program, National Science Foundation • Dynamic Data Driven Application Systems • Highlighted the relationship between theory, simulation and experiment or field data • Dynamic feedback and control loop between simulation and experimental data • “DDDAS has potential for significant impact to science, engineering, and commercial world, akin to the transformation effected since the ‘50s by the advent of computers”

  7. Example DDDAS Applications • Generalized methodology for state estimation and prediction • Predictor-Corrector methods • Advanced Driving Assistance Systems for automobiles • Tracking algorithms for Air Traffic Control • Enhancing oil exploration methods and capabilities • Enhanced manufacturing supply chains through sensor information Source: Frederica Darema

  8. Virtual operations re-planning and control • Event-driven simulations for systems subject to unplanned outages • Earthquake tolerant buildings and bridges • Fire propagation prediction and management Source: Frederica Darema

  9. Integrated Image-Guided Interventions • Real-time, three-dimensional (3D) imaging needs of surgeons. • Biodiversity and bio-complexity • Dramatic changes due to habitat transformation, invasions of exotic species, chemical contamination, diseases and epidemics, climate change, and floods and drought Source: Frederica Darema

  10. Hydro-complexity – Weather, Water and Pollution • Design and configuration methodologies for sensor networks • The oceanographic community at large has interests in DDDAS in order to help optimize observing systems for important scientific studies. Source: Frederica Darema

  11. CLADE 2004, Honolulu • Keynote Presentation • Jim Austin, University of York • A Grid based Diagnostics and Prognosis System for Rolls Royce Aero Engines: The DAME Project • Very practical engineering application • Applying a distributed, data-intensive Grid application to the diagnosis and prognosis of Rolls-Royce aero engines

  12. Distributed Aircraft Maintenance Environment (DAME) • UK e-Science pilot project • QUOTE • Neural network–based techniques for real-time monitoring • Compare stored vibration data with instantaneous snapshots • Each flight produces 1 GB of data, TBs per year of distributed data for a fleet. • AURA • Advanced Uncertain Reasoning Architecture for Pattern Matching • Pattern matching among terascale datasets, distributed for speed • CBR • Case Based Reasoning systems for intelligent decision support • Correlates engine anomalies with root cause • Combine into scalable system using grid middleware • Utilising large amounts of vibration and performance data available from modern aero-engines for fleet based diagnostics Source: Jim Austin

  13. Fault diagnosis and prognosis integrated with predictive maintenance • Detect that engine has deviated from normal (QUOTE) • Diagnose why (AURA) • Form a prognosis (CBR) • Plan remedial actions • Common components of all fault diagnosis and prognosis systems Source: Jim Austin
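The detect, diagnose, and plan stages listed on this slide can be pictured with a small staged pipeline. The following is only a minimal sketch under loud assumptions: the sensor names, the 3-sigma threshold, and the lookup-table case base are all hypothetical, and a plain statistical check and a dictionary stand in for the neural-network monitoring, AURA pattern matching, and CBR reasoning that the DAME project actually used.

```python
from statistics import mean, stdev

# Hypothetical case base: anomaly signature -> (root cause, remedial action).
# A real case-based reasoning system would match on similarity, not exact keys.
CASE_BASE = {
    frozenset({"fan_vibration"}): ("fan imbalance", "inspect fan blades"),
    frozenset({"fan_vibration", "oil_temp"}): ("bearing wear", "schedule bearing replacement"),
}


def detect(history, snapshot, threshold=3.0):
    """Stage 1: flag sensors whose snapshot lies more than `threshold`
    standard deviations away from the stored 'normal' readings."""
    anomalies = set()
    for name, values in history.items():
        mu, sigma = mean(values), stdev(values)
        if sigma > 0 and abs(snapshot[name] - mu) / sigma > threshold:
            anomalies.add(name)
    return frozenset(anomalies)


def diagnose_and_plan(signature):
    """Stages 2-4: look up a root cause and a remedial action for the signature.
    Returns None when nothing deviated from normal."""
    if not signature:
        return None
    return CASE_BASE.get(signature, ("unknown", "refer to engineer"))


history = {
    "fan_vibration": [1.0, 1.1, 0.9, 1.0, 1.05],
    "oil_temp": [80.0, 81.0, 79.0, 80.0, 82.0],
}
snapshot = {"fan_vibration": 2.0, "oil_temp": 80.5}

signature = detect(history, snapshot)
result = diagnose_and_plan(signature)
print(sorted(signature), result)
```

Keeping the stages as separate functions mirrors the slide's point that these are common components: each stage could be replaced by a distributed service without changing the others.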

  14. Quality of Service and Security are the two most important project concerns • QoS critical for commercial deployment, SLAs will likely be a necessity • Workgroup formed to focus on security • Future directions • Base services can be used with many other apps • Put core services into a portal • More flexible workflow configurations • Current project considered a demonstration project • Commercial implementation will need high availability, reliability, data integrity, confidentiality Source: Jim Austin

  15. CLADE 2005, Research Triangle Park, NC • Keynote Presentation • Ed Seidel, Louisiana State University • Enabling Science and Engineering Applications on the Grid • Ed Seidel, recently named Office of Cyberinfrastructure director at NSF, reporting to Dr. Bement • Many years of experience with distributed applications and high performance computing

  16. Optical Networks 1000x faster than regional: What are people doing with this? • Collaboration • Distributed communities (NEES, GEON), shared CI – data, code, tools, resources, simulations • Standard things • Task farming, resource brokering, remote steering • New scenarios • Apps abstracted, dynamic apps find their own services, resources, people; distributed apps – spawned, monitored • Grids bring it all together, but worries in the US about DOE, NSF CI funding Source: Ed Seidel

  17. Distributed computation – the old way • Why? • Capacity: computers can’t keep up with needs • Throughput • Issues • Bandwidth (increasing faster than computation) • Latency • Communication needs, Topology • Communication/computation • Techniques to be developed • Overlapping communication/computation • Extra ghost zones to reduce latency • Compression • Algorithms to do this for the scientist • Gridlab.org, cactuscode.org Source: Ed Seidel
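The "extra ghost zones" technique on this slide trades a little redundant computation for fewer message exchanges: if each subdomain keeps g ghost cells copied from its neighbour, it can advance a nearest-neighbour stencil g steps before it has to communicate again. The sketch below is an illustrative assumption, not code from Cactus or GridLab: it splits a 1D diffusion problem into two halves inside a single process, uses list slicing in place of real network messages, and counts how many exchanges are needed.

```python
def step(u, alpha=0.25):
    """One explicit diffusion step; the two end cells are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def serial(u, steps):
    """Reference solution on the undivided domain."""
    for _ in range(steps):
        u = step(u)
    return u


def split_with_ghosts(u, steps, ghost):
    """Advance two subdomains, exchanging `ghost` edge cells per message.

    After one exchange each side can take up to `ghost` local steps: every
    step invalidates one more ghost cell, but the owned cells stay correct.
    Returns the reassembled solution and the number of exchanges performed.
    """
    mid = len(u) // 2
    left, right = u[:mid], u[mid:]
    exchanges = 0
    done = 0
    while done < steps:
        k = min(ghost, steps - done)
        # "Communication": each side receives k cells from its neighbour.
        left_ext = left + right[:k]
        right_ext = left[-k:] + right
        exchanges += 1
        # "Computation": k purely local steps, no messages needed.
        for _ in range(k):
            left_ext = step(left_ext)
            right_ext = step(right_ext)
        # Discard the ghost cells, keep only the owned cells.
        left = left_ext[:mid]
        right = right_ext[k:]
        done += k
    return left + right, exchanges


u0 = [0.0] * 8 + [1.0] * 8
reference = serial(u0, 12)
narrow, narrow_msgs = split_with_ghosts(u0, 12, ghost=1)
wide, wide_msgs = split_with_ghosts(u0, 12, ghost=4)
print(narrow_msgs, wide_msgs)
```

On a high-latency wide-area link, the message count is what matters most, which is why wider ghost zones can pay off even though the ghost cells are computed redundantly on both sides.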

  18. Distributed computation – the new way • Intelligent parameter surveys, Monte Carlos • May control other simulations • Dynamic staging: move to faster/cheaper/bigger machine (“Grid Worm”) • Need more memory? Need less? • Multiple universe: clone to investigate steered parameter (“Grid Virus”) • Automatic component loading • Needs of process change, discover/load/execute new component somewhere • Automatic “look ahead”, convergence testing • Spawn off and run coarser resolution to predict likely future, study convergence • Routine profiling • Best machine/queue, choose resolution parameters based on queue • Dynamic load balancing: inhomogeneous loads, multiple grids • DDDAS: injecting data into the above, feed back to experiment Source: Ed Seidel

  19. GridLab: 5M EU Project • Code/User/Infrastructure should be aware of environment • Discover resources available NOW, and their current state • What is my allocation? • What is the bandwidth/latency between sites? • Code/User/Infrastructure should be able to make decisions • A slow part of my simulation can run asynchronously…spawn it off! • New, more powerful resources just became available…migrate there! • Machine went down…reconfigure and recover! • Need more memory (or less!)…get it by adding (dropping) machines! • Code/User/Infrastructure should be able to publish to central server for tracking, monitoring, steering… • Unexpected event…notify users! • Collaborators from around the world all connect, examine simulation. • Rethink algorithms: Task farming, vectors, pipelines, etc. all apply on Grids… The Grid IS your Computer! Source: Ed Seidel

  20. Ed’s Conclusions • Optical Networks, grids promise new ways of computing • Networks need application toolkits, reasonable cost model • Standards developing • 15 years ago: parallel computing drove interconnects, HPF, MPI • Now: 2 levels...OGSA grid services, SAGA for apps • GridLab: www.gridlab.org • Grid Application Toolkit: www.gridlab.org/GAT • Documentation, publications, software download • Cactus Computational Toolkit: www.cactuscode.org • GGF “Simple API for Grid Applications” (SAGA) • Today, SAGA continues as an active research group in the Open Grid Forum (OGF) • Paper presentation on GAT/SAGA at TeraGrid 08 last week Source: Ed Seidel

  21. CLADE 2006, Paris • Keynote Presentation • Terry Harmer, Technical Director of the Belfast e-Science Centre (BeSC) • Gridcast - a Next Generation Broadcasting Infrastructure? • Media broadcasting • BBC has offices in most world capitals • Large scale, distributed, dynamic, highly reactive management of broadcast content • Prototype broadcasting grid developed has been deployed since 2004 • UK e-Science project • 50% of funding for UK e-Science centers must come from industry

  22. Broadcasting is distributed: Undergoing rapid technical change • Grid can potentially address technical challenges • Secure, wide area distribution of high volume content • Secure remote access to high value technical resources • Advanced editing suites • Integration of devices, equipment, applications • Economic challenges to deliver cost-effective, resilient, extensible infrastructure in a rapidly changing environment • BBC wanted to move to commodity infrastructure • 280 gig per hour in data movement • Grid as integration framework • Tie together various platforms • Deploy software • Not really for computing at this stage • 13 May, 2008 • BeSC awarded over £900,000 to continue its role in developing the successor to the world wide web • Use of grid via Gridcast provides greater programming autonomy among BBC sites Source: Terry Harmer

  23. CLADE 2007, Monterey, CA • Keynote Presentation • Scott Oster, Ohio State University • The Cancer Biomedical Informatics Grid: Connecting the Cancer Research Community • Goal: Relieve suffering due to cancer by 2015 • 61 cancer labs supported by the National Cancer Institute (NCI) • More than 50 of these, 30 organizations, 800 people involved in caBIG • Create scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network

  24. caBIG Motivation • This year there will be approximately 1,400,000 Americans diagnosed with cancer • More than 500,000 Americans are expected to die from cancer this year • In 2005, the NIH estimated costs for cancer at $209.9 billion, with direct medical costs of $74 billion Source: Scott Oster

  25. What is caBIG? • Common, widely distributed infrastructure that permits the cancer research community to focus on innovation • Shared, harmonized set of terminology, data elements, and data models that facilitate information exchange • Collection of interoperable applications developed to common standards • Cancer research data available for mining and integration Source: Scott Oster

  26. Driving Needs • A multitude of “legacy” information systems, most of which cannot be readily shared between institutions • Difficulty in identifying and accessing available resources • Approach: standards-based grid, WSRF web services, Introduce • But standards in Web/Grid service domain are turbulent at best • Competing interests of “big business” and multiple standards bodies • An absence of tools to connect different databases • An absence of common data formats • Approach: Adopt XML as data exchange format • Cancer Data Standards Repository (caDSR) captures logical model with annotations; facilitates reuse and formal definition • A huge and growing volume of data must be collected, analyzed, and made accessible • GridFTP, move services to data • Few common vocabularies, making it difficult, if not impossible, to interlink diverse research and clinical results Source: Scott Oster

  27. An absence of information infrastructure to share data within an institution, or among different institutions • If cancer is cured, and caBIG resources play a role, there will be much interest in knowing who contributed what (and who funded them) • Technical Approach • Single sign on, Grid Authentication and Authorization with Reliably Distributed Services (GAARDS) • Federated Identity Management (Dorian) • Authorization solutions • GridGrouper for group-based • CSM for local policy • Globus PDPs for complex rules • Institutional Review Boards (IRB) involved for any protected health information (PHI), even for de-identified data • Grid is multi-institutional, which means IRBs must reach agreements (read: separately employed lawyers working together) • Socio-Cultural Approach • Whole workspace in caBIG dedicated to it (DSIC) • NCI in a good position to “encourage” it • Large percentage of institutions’ cancer research funding comes from NCI • Hope is motivation will be value-based once initially primed Source: Scott Oster

  28. Scott’s Summary • The bad news: • Large-scale, distributed knowledge sharing is hard • The good news: • The potential rewards are large • The good news (for computer scientists): • There are lots of unsolved problems (and interest in getting them solved) • Disparate Systems • Lack of Common Data Formats • Data Interoperability • Finding Resources • Data Size • User Accounting • Data Privacy • Intellectual Capital • Complicated Trust Arrangements • Computationally Intensive • Evolving Infrastructure Source: Scott Oster

  29. TeraGrid Science Gateways

  30. Phenomenal Impact of the Internet on Worldwide Communication and Information Retrieval: Only 16 years since the release of Mosaic! • Implications on the conduct of science are still evolving • 1980s: Early gateways, National Center for Biotechnology Information BLAST server, search results sent by email, still a working portal today • 1992 Mosaic web browser developed • 1995 “International Protein Data Bank Enhanced by Computer Browser” • 2004 TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program • Simultaneous explosion of digital information • Analysis needs in a variety of scientific areas • Sensors, telescopes, satellites, digital images and video • #1 machine on Top500 today is more powerful than all combined entries on the first list in 1993

  31. 1998 Workshop Highlights Early Impact of Internet on Science • Shared access to geographically dispersed resources • Assembling the best minds to tackle the toughest problems regardless of location • Tackling the same problems differently, but also tackling different problems • Not only the scope, but the process of scientific investigation is changed • “As the chemical applications and capabilities provided by collaboratories become more familiar, researchers will move significantly beyond current practice to exciting new paradigms for scientific work” • Requirements for future success include: • Development of interdisciplinary partnerships of chemists and computer scientists • Flexible and extensible frameworks for collaboratories • Means to deploy, support, and evaluate collaboratories in the field

  32. Rapid Advances in Web Usability • First generation • Static Web pages • Second generation • Dynamic, database interfaces, cgi • Lacked the ease of use of desktop applications • Third generation • True networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web. • Remarkable new uses of the Web in the organizational workplace and on the Internet • Source: Screen Porch White Paper, The University of Western Ontario (1996)

  33. The convenience of getting scientific material on the web opens doors to better attitudes and understanding of science. Source: John B. Horrigan, Associate Director, November 20, 2006, http://www.pewinternet.org/pdfs/PIP_Exploratorium_Science.pdf

  34. NSF (my sponsor) has long recognized the importance of science and technology interactions • Interdisciplinary programs did much to facilitate application-technology integration and develop standard tools • 1997 PACI Program • “Shotgun marriages” of technologists and application scientists • A few groups served as path finders and benefited tremendously • NPACI neuroscience thrust in 1997 leads to Telescience portal and BIRN in 2001 • Information Technology Research (ITR) • NSF Middleware Initiative (NMI) • Plug and play tools so more groups can benefit

  35. NSF Continues Its Leadership Today: What Will Lead to Transformative Science? • “Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore.” • “In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet.” • Gateways are a terrific example of interfaces that can support transformative science

  36. Evolution of the Gateway Program • 2004 “TeraGrid Science Gateway” term originates • We will help them build gateway portals that leverage TeraGrid capabilities and provide web-based interfaces to community tools • 2005 Gateway requirements analysis team • Areas of identified commonality include: • Web services, auditing, community accounts, flexible allocations, scheduling, outreach • Needs of command-line supercomputing users fairly well defined • ssh to tg-login • Data transfer to and from supercomputer • Software • MPI, math libraries, domain software • Compilers • Batch queue submission • Help desk • Need to address Gateway developer needs just as efficiently

  37. Tremendous Opportunities Using the Largest Shared Resources - Challenges too! • What’s different when the resource doesn’t belong just to me? • Resource discovery • Accounting • Security • Proposal-based requests for resources (peer-reviewed access) • Code scaling and performance numbers • Justification of resources • Gateway citations • Tremendous benefits at the high end, but even more work for the developers • Potential impact on science is huge • Small number of developers can impact thousands of scientists • But need a way to train and fund those developers and provide them with appropriate tools

  38. Ongoing Work to Meet Common Needs • Web Services • GT4 deployment, identification of remaining capabilities • Information services, MDS • Registry of Gateway services • TG-specific “where can I run soonest” with QBETS • Auditing • GRAM audit to retrieve usage information for individual compute jobs • GridShib • Counting gateway users, individualized accounting, increased security • Community Accounts • Policy finalized, security approaches being tested by RPs • GridShib development, testing with gateways • Resource requests • Collaboration with reviewers to develop guidelines for Gateway PIs • Adapt to usage uncertainties, ability to assess impact, Gateway management structure • Scheduling • Metascheduling • On-demand via SPRUCE framework • Outreach • Pathways project • Gateway use by educators • Training MSI students to build Gateways • Documentation • Extensive wiki information transformed into navigable documentation • Gateway Hosting • Available at IU through peer review • Staff Support • Targeted support, general capabilities, production coordinator

  39. Variety of Gateways Available Today

  40. Easy Gateway True and False Test: Answers Provided • TeraGrid selects all gateways (F) • TeraGrid designs all gateways (F) • TeraGrid limits the number of gateways (F) • All gateways need TeraGrid funding to exist (F) • Any PI can request an allocation and use it to develop a gateway (T) • Gateway design is community-developed and that is the core strength of the program (T) • TeraGrid staff are alerted to gateway work when a proposal is reviewed or when a community account is requested (T) • Limited TeraGrid support can be provided for targeted assistance to integrate an existing gateway with TeraGrid (T)

  41. Gateway Idea Resonates with Scientists • Capabilities provided by the Web are easy to envision because we use them in everyday life • Researchers can imagine scientific capabilities provided through a familiar interface • Groups resonate with the fact that gateways are designed by communities and provide interfaces understood by those communities • But also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities • Scientists know they can undertake more complex analyses and that’s all they want to focus on • But this seamless access doesn’t come for free. It all hinges on very capable developers.

  42. Gateways Greatly Expand Access • Almost anyone can investigate scientific questions using high end resources • Not just those in the research groups of those who request allocations • Fosters new ideas, cross-disciplinary approaches • Encourages students to experiment • But used in production too • Increasing number of papers resulting from the use of gateways • Scientists can focus on challenging science problems rather than challenging infrastructure problems

  43. Highlights: NanoHub Explosive User Growth • In past 12 months • 68,975 users • 43% from U.S. • 25,187 course downloads • 8,287 podcast downloads • 371 online meetings • Full featured gateway • Simulation tools, curricula, multimedia, user contributions, collaborations

  44. Highlights: LEAD Inspires Students: Advanced capabilities regardless of location • A student gets excited about what he was able to do with LEAD • “Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy!” • Eric (email, March 2007)

  45. Highlights: GridChem Employs a Client-Server Approach…

  46. …for Production Science • Chemical Reactivity of the Biradicaloid (HO...ONO) Singlet States of Peroxynitrous Acid. The Oxidation of Hydrocarbons, Sulfides, and Selenides. Bach, R. D. et al. J. Am. Chem. Soc. 2005, 127, 3140-3155. • The "Somersault" Mechanism for the P-450 Hydroxylation of Hydrocarbons. The Intervention of Transient Inverted Metastable Hydroperoxides. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128(5), 1474-1488. • The Effect of Carbonyl Substitution on the Strain Energy of Small Ring Compounds and their Six-member Ring Reference Compounds. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128(14), 4598. • Azide Reactions for Controlling Clean Silicon Surface Chemistry: Benzylazide on Si(100)-2 × 1. Semyon Bocharov et al. J. Am. Chem. Soc. 2006, 128(29), 9300-9301. • Chemistry of Diffusion Barrier Film Formation: Adsorption and Dissociation of Tetrakis(dimethylamino)titanium on Si(100)-2 × 1. Rodriguez-Reyes, J. C. F.; Teplyakov, A. V. J. Phys. Chem. C 2007, 111(12), 4800-4808. • Computational Studies of [2+2] and [4+2] Pericyclic Reactions between Phosphinoboranes and Alkenes. Steric and Electronic Effects in Identifying a Reactive Phosphinoborane that Should Avoid Dimerization. Thomas M. Gilbert and Steven M. Bachrach. Organometallics 2007, 26(10), 2672-2678.

  47. cancer Biomedical Informatics Grid: Addressing today’s challenges in cancer research and treatment • The mission of caBIG™ is to develop a truly collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes. • The goals of caBIG™ are to: • Connect scientists and practitioners through a shareable and interoperable infrastructure • Develop standard rules and a common language to more easily share information • Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care. Source: cabig.cancer.gov

  48. caBIG and TeraGrid • caBIG conducted study of all Gateways • Pleased to discover that community accounts and web services will exactly meet their requirements • TeraGrid resources incorporated into geWorkbench • An open source platform for integrated genomics used to: • Load data from local or remote data sources. • Visualize gene expression and sequence data in a variety of ways. • Provide access to client- and server-side computational analysis tools such as t-test analysis, hierarchical clustering, self organizing maps, regulatory networks reconstruction, BLAST searches, pattern/motif discovery, etc. • Clustering is used to build groups of genes with related expression patterns which may contain functionally related proteins, such as enzymes for a specific pathway • Validate computational hypotheses through the integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis.

  49. geWorkbench Integrates TeraGrid Resources • “Although the new service is TeraGrid-aware, the perspective from geWorkbench does not change. As far as geWorkbench is concerned, it is still connecting to a Hierarchical Clustering caGrid service. The difference is now the caGrid service is a gateway service that submits a TeraGrid job on behalf of geWorkbench. geWorkbench, however, does not notice this difference.” Source: http://wiki.c2b2.columbia.edu/informatics/index.php/GeWorkbench_Example
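The quoted arrangement, where the client keeps calling the same clustering service while the service quietly becomes a gateway that submits a remote job, can be sketched as two interchangeable implementations of one interface. Everything named below is a hypothetical stand-in: the class names, the `submit_job` callable, and the toy grouping function are assumptions for illustration and are not the caGrid, geWorkbench, or TeraGrid APIs.

```python
from typing import Callable, List, Protocol, Sequence

Rows = Sequence[Sequence[float]]


def toy_grouping(rows: Rows) -> List[int]:
    """Placeholder analysis (NOT hierarchical clustering): label each row
    0 or 1 depending on whether its mean value is negative or not."""
    return [0 if sum(r) / len(r) < 0 else 1 for r in rows]


class ClusteringService(Protocol):
    """The only interface the client knows about."""

    def cluster(self, rows: Rows) -> List[int]: ...


class DirectService:
    """Original service: runs the analysis where the service lives."""

    def cluster(self, rows: Rows) -> List[int]:
        return toy_grouping(rows)


class GatewayService:
    """Gateway variant: same method, but the work is submitted as a job to a
    remote resource on the client's behalf."""

    def __init__(self, submit_job: Callable[[str, Rows], List[int]]):
        self._submit_job = submit_job

    def cluster(self, rows: Rows) -> List[int]:
        return self._submit_job("clustering", rows)


job_log: List[str] = []


def fake_remote_submit(job_name: str, rows: Rows) -> List[int]:
    """Stand-in for a remote batch submission; it only records the job name
    and runs the same toy analysis locally."""
    job_log.append(job_name)
    return toy_grouping(rows)


def client_analysis(service: ClusteringService, rows: Rows) -> List[int]:
    """Client code: identical whichever service it is handed."""
    return service.cluster(rows)


data = [[1.0, 2.0], [-3.0, -1.0], [0.5, 0.5]]
direct_result = client_analysis(DirectService(), data)
gateway_result = client_analysis(GatewayService(fake_remote_submit), data)
print(direct_result, gateway_result, job_log)
```

Because the client depends only on the `cluster` method, swapping in the gateway requires no client changes, which is the property the quote describes.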

  50. Hide the “C” in CLADE with a Gateway: When is a gateway appropriate? • Researchers using defined sets of tools in different ways • Same executables, different input • GridChem, CHARMM • Creating multi-scale or complex workflows • Datasets • Common data formats • National Virtual Observatory • Earth System Grid • Some groups have invested significant efforts here • caBIG, extensive discussions to develop common terminology and formats • BIRN, extensive data sharing agreements • Difficult to access data/advanced workflows • Sensor/radar input • LEAD, GEON
