
BioScience on the TeraGrid

BioScience on the TeraGrid. Daniel S. Katz d.katz@ieee.org Director of Science, TeraGrid GIG Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory Affiliate Faculty, Center for Computation & Technology, LSU



Presentation Transcript


  1. BioScience on the TeraGrid Daniel S. Katz d.katz@ieee.org Director of Science, TeraGrid GIG Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory Affiliate Faculty, Center for Computation & Technology, LSU Adjunct Associate Professor, Electrical and Computer Engineering, LSU

  2. What is the TeraGrid • World’s largest distributed cyberinfrastructure for open scientific research, supported by US NSF • Integrated high performance computers (>2 PF HPC & >27000 HTC CPUs), data resources (>2 PB disk, >60 PB tape, data collections), visualization, experimental facilities (VMs, GPUs, FPGAs), network at 11 Resource Provider sites • Allocated to US researchers and their collaborators through national peer-review process • DEEP: provide powerful computational resources to enable research that can’t otherwise be accomplished • WIDE: grow the community of computational science and make the resources easily accessible • OPEN: connect with new resources and institutions • Integration: Single {portal, sign-on, help desk, allocations process, advanced user support, EOT, campus champions} http://www.teragrid.org/

  3. Governance • 11 Resource Providers (RPs) funded under separate agreements with NSF • Different start and end dates • Different goals • Different agreements • Different funding models • 1 Coordinating Body – Grid Infrastructure Group (GIG) • University of Chicago/Argonne National Laboratory • Subcontracts to all RPs and six other universities • 7-8 Area Directors • Working groups with members from many RPs • TeraGrid Forum with Chair

  4. Who Uses TeraGrid (2009) (2008)

  5. How TeraGrid Is Used 2006 data

  6. How One Uses TeraGrid [Diagram. Access paths: POPS (for now), User Portal, Science Gateways, Command Line. Services: Compute Service, Viz Service, Data Service. Resource providers: RP 1, RP 2, RP 3. Common layer: TeraGrid Infrastructure (Accounting, Network, Authorization, …)]

  7. User Portal: portal.teragrid.org http://portal.teragrid.org/

  8. Science Gateways • A natural extension of Internet & Web 2.0 • Idea resonates with scientists • Researchers can imagine scientific capabilities provided through a familiar interface • Mostly a web portal, or a web or client-server program • Designed by communities; provide interfaces understood by those communities • Also provide access to greater capabilities (back end) • Without users needing to understand the details of those capabilities • Scientists know they can undertake more complex analyses and that’s all they want to focus on • TeraGrid provides tools to help developers • Seamless access doesn’t come for free • Hinges on very capable developers Nancy Wilkins-Diehr

  9. TeraGrid -> XD Future • Current RP agreements end in March 2011 • Except track 2 centers (current and future) • TeraGrid XD (eXtreme Digital) starts in April 2011 • Era of potential interoperation with OSG and others • New types of science applications? • Current TG GIG continues through July 2011 • Allows four months of overlap in coordination • Probable overlap between GIG and XD members • Blue Waters (track 1) production in 2011

  10. Grid Enabled Neurosurgical Imaging Using Simulation (GENIUS) • Model large-scale patient-specific cerebral blood flow in clinically-relevant time scale • Provide simulation support within the operating theatre for neuroradiologists • Provide new information to surgeons for patient management and therapy: • Diagnosis and risk assessment • Predictive simulation in therapy • Provide patient-specific information to help plan embolisation of arterio-venous malformations, coiling of aneurysms, etc. Clinical workflow: • Book computing resources in advance or use preemption • Shift imaging data around quickly over high-bandwidth low-latency dedicated links • Interactive simulations and real-time visualization for immediate feedback Peter Coveney, University College London

  11. OLSGW Gadgets • OLSGW integrates bioinformatics applications • BLAST, InterProScan, CLUSTALW, MUSCLE, PSIPRED, ACCPRO, VSL2 • 454 Pyrosequencing service under development • Four OLSGW gadgets have been published in the iGoogle gadget directory. Search for “TeraGrid Life Science”. Wenjun Wu, Thomas Uram, Michael Papka, ANL

  12. Multiscale Simulation of Arterial Tree • Need to combine multi-scale models: 1D (arteries), 3D Navier-Stokes (organs, arterial junctions, etc.), Dissipative Particle Dynamics (capillaries, venules, arterioles, blood cells, etc.), Molecular Dynamics (blood cells, platelets, molecular adhesion, etc.) • Arterioles/venules: 50 microns • Platelet diameter is 2-4 µm • Normal platelet concentration in blood is 300,000/mm³ • Platelet functions: activation, adhesion to injured walls and to other platelets [Figure label: activated platelets] NIH/NSF-IMAG project: George Em Karniadakis, Brown
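A small worked calculation may help relate the two numbers on this slide. The sketch below takes the stated platelet concentration (300,000/mm³) and the 50 micron vessel scale and estimates how many platelets would sit in a cube of that side length. The cube is an illustrative assumption, not a geometry from the project, and the function name is hypothetical.

```python
# Figures taken from the slide.
PLATELETS_PER_MM3 = 300_000   # normal platelet concentration in blood
VESSEL_SCALE_UM = 50.0        # arteriole/venule length scale in microns

UM_PER_MM = 1000.0


def platelets_in_cube(side_um: float, per_mm3: float = PLATELETS_PER_MM3) -> float:
    """Expected platelet count in a cube of blood with the given side length.

    Assumption: platelets are uniformly distributed at the stated concentration.
    """
    side_mm = side_um / UM_PER_MM
    return per_mm3 * side_mm ** 3


if __name__ == "__main__":
    count = platelets_in_cube(VESSEL_SCALE_UM)
    print(f"about {count:.1f} platelets in a {VESSEL_SCALE_UM:.0f} micron cube")
```

Under these assumptions the count comes out to about 37.5, i.e. a few tens of platelets at the arteriole scale, which is the regime where the slide places particle-based methods rather than continuum models.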

  13. Expressed Sequence Tag (EST) Pipeline • ESTs are a collection of random cDNA sequences, sequenced from a cDNA library or sequencing devices • Typical inputs are O(million) sequences • Newer 454 devices produce higher volumes and are relatively easy to obtain and operate • Stored using FASTA format • ESTs are clustered and assembled to form contigs • Contigs are then used to identify potential unknown genes, by BLASTing against a known protein database • Goal: Use TeraGrid for backend computing, with existing software, and a gateway frontend Initial results – a run that took 5 days on a local cluster was done in 2 days – more optimization underway A. Kulshrestha, S. L. Pallickara, K. N. Muthuram, C. Kong, Q. Dong, M. Pierce, H. Tang, IU
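Since the pipeline's inputs are stored in FASTA format, a minimal reader may make that first step concrete. This is a sketch only, not the pipeline's own code: the function name `parse_fasta` and the example records are hypothetical, and a production pipeline with millions of sequences would more likely stream records than hold them all in a dictionary.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {record id: sequence} dictionary.

    A header line starts with '>'; the id is its first whitespace-delimited
    token. Sequence lines that follow are concatenated and upper-cased.
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            # Fall back to a positional id if the header is empty.
            current_id = tokens[0] if tokens else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


# Made-up example records, not real EST data.
EXAMPLE = """>est_1 hypothetical clone A
ACGTACGT
ttga
>est_2
GGCC
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE.splitlines()).items():
        print(name, len(seq))
```

The clustering, assembly, and BLAST steps named on the slide would consume records like these; they are not sketched here because the slide does not say which tools perform them.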

  14. Multiscale Computer Simulation of the Immature HIV-1 Virion • An iterative modeling approach combining experimental imaging (cryo-electron tomography), coarse-grained (CG) simulation, and atomic-level molecular dynamics (MD) [Diagram labels: Experimental structures; Coarse-grained (CG) model development; CG simulation; CG model refinement; Key CG interactions; Atomic-level simulation; New CG interactions from MD] Wright, Schooler, Ding, Kieffer, Fillmore, Sundquist, Jensen, EMBO J., 26, 2218, 2007 G. A. Voth, U. of Chicago

  15. CIPRES Portal: A New Science Gateway for Systematics • Systematics: study of diversification of life and relationships among living things through time • CIPRES: a flexible web application that can be sustained by the community at minimal cost even beyond the funding period of the project • Tools include parallel versions of MrBayes, RAxML, GARLI • User requirements include: • Access to most or all native command line options • Add new tools quickly • Provide personal user space for storing results • Use TeraGrid resources to quickly provide results • Cited in at least 35 publications, including Nature, PNAS, Cell • Examples: New Family Tree for Arthropoda, Genome Sequence of a Transitional Eukaryote, Co-evolution of Beetles and Flowering Plants • Used routinely in at least 5 undergraduate classes • Usage: 77% US (incl. 17 EPSCoR states), 23% from 33 other countries Mark Miller, SDSC

  16. Patient-Specific HIV Drug Therapy • HIV-1 protease is a common target for HIV drug therapy • Enzyme of HIV responsible for protein maturation • Target for anti-retroviral inhibitors • Example of structure-assisted drug design • 9 FDA-approved inhibitors of HIV-1 protease • So what’s the problem? • Emergence of drug-resistant mutations in protease • These render the drug ineffective • Drug-resistant mutants have emerged for all FDA-approved inhibitors • Too many mutations to be interpreted by a clinician • Solution: build a Binding Affinity Calculator (BAC) • Provide tools that allow simulations to be used in a clinical context, including a lightweight client • User only needs to specify enzyme, mutations relative to wildtype, and drug • Other options can be specified but begin as defaults • Requires a large number of simulations to be constructed and run automatically (across distributed HPC resources) • To investigate generalisation • Automation is critical for clinical use • Turn-around time of around a week is required • Trade-off between accuracy and time-to-solution Initial results – ensemble MD calculations for lopinavir vs. wildtype & five mutants – appear promising; excellent relative ranking in binding free energies Peter Coveney, University College London
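The slide says a user specifies only the enzyme, the mutations relative to wildtype, and the drug, and that many simulations are then constructed automatically. The sketch below illustrates that idea for the lopinavir case (wildtype plus five mutants). It is not the BAC's actual interface: `SimulationJob`, `build_jobs`, the placeholder mutant labels, and the replica count of 10 are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SimulationJob:
    """One MD run in an ensemble (hypothetical structure, not the BAC's)."""
    enzyme: str
    mutations: Tuple[str, ...]  # empty tuple means wildtype
    drug: str
    replica: int


def build_jobs(enzyme: str, drug: str,
               variants: Sequence[Tuple[str, ...]],
               n_replicas: int) -> List[SimulationJob]:
    """Expand the three user inputs into one job per variant per replica."""
    return [
        SimulationJob(enzyme, tuple(variant), drug, replica)
        for variant in variants
        for replica in range(n_replicas)
    ]


# Wildtype plus five mutants, as in the slide's initial results.
# The mutant labels are placeholders; the slide does not name the mutations.
VARIANTS = [()] + [(f"mutation_set_{i}",) for i in range(1, 6)]

# Assumption: the slide does not give the ensemble size.
N_REPLICAS = 10

if __name__ == "__main__":
    jobs = build_jobs("HIV-1 protease", "lopinavir", VARIANTS, N_REPLICAS)
    print(len(jobs), "simulations to schedule")
```

Generating the job list from a short specification is what makes the automation bullet practical: the same three inputs can be expanded and dispatched without manual setup for each mutant.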

  17. Scripting Protein Structure Prediction
      …
      int nSim = 1000;
      int maxRounds = 3;
      Protein pSet[ ] <ext; exec="Protein.map">;
      float startTemp[ ] = [ 100.0, 200.0 ];
      float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];
      foreach p, pn in pSet {
        foreach t in startTemp {
          foreach d in delT {
            ItFix(p, nSim, maxRounds, t, d);
          }
        }
      }
      ItFix() {
        foreach sim in [1:nSim] {
          (structure[sim], log[sim]) = predict(p, t, d);
        }
        result = analyze(structure)
      }
      [Diagram labels: 1000 predict() calls; Analyze()]
      10 proteins x 1000 simulations x 3 rounds x 2 temps x 5 delta-T’s = 300K application runs
      T. Sosnick, K. Freed, G. Hocky, J. DeBartolo, A. Adhikari, J. Xu, W. Wilde, U. Chicago
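The run count on this slide can be checked by enumerating the sweep. The Python sketch below mirrors the nested loops of the script and counts the resulting application runs. It is a counting aid, not a port of the script: the protein names are placeholders, and the assumption that every ItFix call performs nSim simulations in each of maxRounds rounds comes from the slide's multiplication, not from the elided ItFix body.

```python
from itertools import product

# Values taken from the slide's script.
N_SIM = 1000
MAX_ROUNDS = 3
START_TEMPS = [100.0, 200.0]
DELTA_TS = [1.0, 1.5, 2.0, 5.0, 10.0]

# The slide's run-count line mentions 10 proteins; the names are placeholders.
PROTEINS = [f"protein_{i}" for i in range(10)]


def sweep_points(proteins, start_temps, delta_ts):
    """Return every (protein, start temperature, delta-T) combination."""
    return list(product(proteins, start_temps, delta_ts))


def total_runs(proteins, start_temps, delta_ts, n_sim, max_rounds):
    """Count application runs across the sweep.

    Assumption: each sweep point runs n_sim simulations per round
    for max_rounds rounds.
    """
    return len(sweep_points(proteins, start_temps, delta_ts)) * n_sim * max_rounds


if __name__ == "__main__":
    points = sweep_points(PROTEINS, START_TEMPS, DELTA_TS)
    print(len(points), "ItFix calls")
    print(total_runs(PROTEINS, START_TEMPS, DELTA_TS, N_SIM, MAX_ROUNDS),
          "application runs")
```

With these inputs the sweep has 100 points and 300,000 runs, matching the 300K figure on the slide.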
