
Superlink-Online: Harnessing the world's computers to hunt for disease-provoking genes
Mark Silberstein, CS, Technion; Dan Geiger, Computational Biology Laboratory; Assaf Schuster, Distributed Systems Laboratory


Presentation Transcript


  1. Superlink-Online: Harnessing the world's computers to hunt for disease-provoking genes. Mark Silberstein, CS, Technion; Dan Geiger, Computational Biology Laboratory; Assaf Schuster, Distributed Systems Laboratory. With genetics research institutes in Israel, the EU, and the US. MS eScience Workshop 2008.

  2. Familial onychodysplasia and dysplasia of distal phalanges (ODP). [Images of affected individuals III-15, IV-10, and IV-7.]

  3. Family Pedigree.

  4. Marker information added. [Figure: a chromosome pair carrying markers M1 and M2.]

  5. Maximum Likelihood Evaluation. [Figure: genotype data for individuals III-15 and III-16 at markers M1-M4 and disease loci D1, D2, with recombination fraction θ between loci.] The computational problem: find a value of θ maximizing Pr(data|θ). The LOD score (to quantify how confident we are): Z(θ) = log10[ Pr(data|θ) / Pr(data|θ=½) ]. A small computation sketch follows.
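
A minimal sketch of this evaluation, assuming a `likelihood(theta)` function (hypothetical here) that returns Pr(data|θ) for the pedigree:

```python
import numpy as np

def lod_curve(likelihood, thetas=np.linspace(0.01, 0.5, 50)):
    """Evaluate Z(theta) = log10[ Pr(data|theta) / Pr(data|theta=1/2) ] on a grid
    of recombination fractions and return the maximizing theta and its LOD score.

    `likelihood(theta)` is assumed to return Pr(data|theta) for the pedigree; in
    Superlink this comes from Bayesian-network inference (slides 7-8).
    """
    null = likelihood(0.5)                                   # the "no linkage" model
    z = np.array([np.log10(likelihood(t) / null) for t in thetas])
    best = int(np.argmax(z))
    return thetas[best], z[best]

# Toy illustration only (not real pedigree data): 8 non-recombinants, 2 recombinants.
theta_hat, z_max = lod_curve(lambda t: (1 - t) ** 8 * t ** 2)
```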

  6. Results of Multipoint Analysis.

  7. The Bayesian network model. [Figure: network over four loci, with locus 2 as the disease locus, relating selector variables (S), allele variables (L), genotype variables (X), and phenotype variables (Y) for each individual.] This model depicts the qualitative relations between the variables. We also need to specify the joint distribution over these variables.

  8. The Computational Task: computing Pr(data|θ) for a specific value of θ requires time and space exponential in the number of variables (five per person, per marker and gene locus), the number of values per variable (#alleles, non-typed persons), and the table dimensionality (cycles in the pedigree). Finding the best order is equivalent to finding the best order for sum-product operations on high-dimensional matrices (see the sketch below).
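
A minimal sketch of the sum-product computation this refers to, assuming the probability tables are NumPy arrays labeled with einsum-style single-letter variable names (all names and numbers are illustrative, not Superlink's actual implementation):

```python
import numpy as np

def eliminate(factors, order):
    """Sum-product variable elimination over a set of probability tables.

    `factors` is a list of (subscripts, table) pairs, where each character in
    `subscripts` names one axis of the table. Variables in `order` are summed
    out one by one; the chosen order determines the size of the intermediate
    tables, i.e. the time and space cost. Assumes `order` lists every variable,
    so the result is a scalar.
    """
    for var in order:
        touching = [(s, t) for s, t in factors if var in s]
        rest = [(s, t) for s, t in factors if var not in s]
        keep = ''.join(sorted(set(''.join(s for s, _ in touching)) - {var}))
        spec = ','.join(s for s, _ in touching) + '->' + keep
        merged = np.einsum(spec, *[t for _, t in touching])  # multiply, then sum out `var`
        factors = rest + [(keep, merged)]
    return float(np.prod([t for _, t in factors]))

# Toy chain A -> B -> C with binary variables (illustrative numbers only):
pA  = np.array([0.6, 0.4])                        # Pr(A),   axes 'a'
pBA = np.array([[0.9, 0.1], [0.2, 0.8]])          # Pr(B|A), axes 'ab'
pCB = np.array([[0.7, 0.3], [0.5, 0.5]])          # Pr(C|B), axes 'bc'
evC = np.array([1.0, 0.0])                        # evidence C=0, axes 'c'
factors = [('a', pA), ('ab', pBA), ('bc', pCB), ('c', evC)]
print(eliminate(factors, order='cba'))            # Pr(C=0) = 0.624
```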

  9. Divisible tasks through variable conditioning; the parallelization overhead is non-trivial (a sketch follows).
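
A minimal sketch of conditioning, reusing the `eliminate` helper from the previous sketch: fixing a set of variables to concrete values splits the likelihood into independent summands, each of which can be shipped as a separate job (variable names and sizes are illustrative):

```python
import numpy as np
from itertools import product

def condition_jobs(factors, cond_vars, domain_sizes, order):
    """Split the likelihood by conditioning: for every joint assignment of
    `cond_vars`, slice the tables down to that assignment and emit one
    independent job. Summing the job results recovers the full sum, so jobs
    can run on different machines; the price is duplicated per-job work
    (the parallelization overhead mentioned on the slide).
    """
    residual_order = [v for v in order if v not in cond_vars]
    for values in product(*(range(domain_sizes[v]) for v in cond_vars)):
        fixed = dict(zip(cond_vars, values))
        sliced = []
        for subs, table in factors:
            for v, val in fixed.items():
                if v in subs:
                    table = np.take(table, val, axis=subs.index(v))
                    subs = subs.replace(v, '')
            sliced.append((subs, table))
        # Each (assignment, closure) pair is a self-contained batch job.
        yield fixed, (lambda s=sliced: eliminate(s, residual_order))

# Serial driver (in the real system each job is shipped to a different machine):
# total = sum(run() for _, run in
#             condition_jobs(factors, cond_vars='a', domain_sizes={'a': 2}, order='cba'))
```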

  10. Terminology. The basic unit of execution is a batch job: non-interactive ("enqueue – wait – execute – return") and run in a self-contained execution sandbox. A linkage analysis request is a task: a bag of (possibly millions of) jobs whose turnaround time is important.

  11. Requirements. The system must be geneticist-friendly: an interactive experience, low response time for short tasks, and prompt user feedback. It must be simple, secure, reliable, stable, and overload-resistant, and support concurrent tasks and multiple users. Previously infeasible long tasks must run fast via parallel execution, harnessing all available resources (grids, clouds, clusters) and using them efficiently.

  12. Grids or Clouds? Small tasks are severely slow on grids: a task that takes 5 minutes on a 10-node dedicated cluster may take several hours on a grid. [Plots: preempted jobs and error rate on the UW Madison grid, queue waiting time in EGEE, and remaining jobs in the queue over time, with a long tail due to failures.] Should we move scientific loads to the cloud? YES!

  13. Grids or Clouds? Consider 3.2×10^6 jobs of ~40 minutes each: it took 21 days on ~6000-8000 CPUs, and it would cost about $10K on Amazon's EC2. Should we move scientific loads to the cloud? NO!

  14. Clouds or Grids? Clouds and Grids!
                                            Grids (opportunistic)   Clouds (dedicated)
  Reliability                               Low                     High
  Performance predictability                Low                     High
  Potential amount of available resources   High                    Low
  Reuse of existing infrastructure          High                    Low
  Computing style                           Throughput              "Burst"

  15. Cheap and Expensive Resources. A task's sensitivity to QoS differs between stages: a high-throughput phase followed by a high-performance tail (remaining jobs in the queue). Use cheap, unreliable resources (grids, community grids, non-dedicated clusters) for the throughput phase, and expensive, reliable resources (dedicated clusters, clouds) for the tail. Dynamically determine when the task enters tail mode and switch to expensive resources gracefully (see the sketch below).
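
A minimal sketch of one way such a policy could be expressed; the thresholds, pool objects, and statistics fields are illustrative assumptions, not the deployed Superlink-online logic:

```python
import time

TAIL_FRACTION = 0.05          # illustrative: tail mode when <5% of jobs remain
MAX_PENDING_AGE = 15 * 60     # illustrative: or when a job has waited 15 minutes

def in_tail_mode(total_jobs, remaining_jobs, oldest_pending_since):
    """Heuristic detection of the task's tail phase (thresholds are assumptions)."""
    few_left = remaining_jobs <= TAIL_FRACTION * total_jobs
    stalled = (time.time() - oldest_pending_since) > MAX_PENDING_AGE
    return few_left or stalled

def route(job, stats, cheap_pool, reliable_pool):
    """Default to cheap opportunistic pools (grids, community grids); once the
    task enters its tail, (re)submit stragglers to a dedicated cluster or cloud."""
    if in_tail_mode(stats.total, stats.remaining, stats.oldest_pending_since):
        return reliable_pool.submit(job)
    return cheap_pool.submit(job)
```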

  16. Glue pools together via an overlay. [Diagram: a scheduling server with a job queue, a scheduler, and a virtual cluster maintainer, feeding submitters to Grid 1, Grid 2, Cloud 1, and Cloud 2.] Issues: granularity, load balancing, firewalls, failed resources, scheduler scalability…

  17. Practical considerations. Overlay scalability and firewall penetration: the server may not be able to initiate a connection to the agent, so agents pull work (see the sketch below). Compatibility with community grids: the server is based on BOINC, and the agents are upgraded BOINC clients. Elimination of failed resources from scheduling: performance statistics are analyzed. Resource allocation depends on the task state, with dynamic policy updates via the Condor classad mechanism.
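
A minimal sketch of the pull model, in which the agent always initiates the (outbound, firewall-friendly) HTTP connection; the URL, endpoints, and payload fields are illustrative, not the actual BOINC protocol:

```python
import time
import requests  # any HTTP client would do; shown here for brevity

SERVER = "https://scheduler.example.org"   # hypothetical scheduling-server URL

def run_in_sandbox(job):
    """Download inputs, run the linkage kernel in a sandbox, return outputs (omitted)."""
    ...

def agent_loop(host_id):
    """Work-pulling agent. The server never initiates a connection to us, so
    plain outbound HTTP works from behind firewalls and NATs."""
    while True:
        reply = requests.post(f"{SERVER}/request_work",
                              json={"host": host_id, "cores": 4}).json()
        jobs = reply.get("jobs", [])
        if not jobs:
            time.sleep(60)     # back off while the queue is empty
            continue
        for job in jobs:
            result = run_in_sandbox(job)
            # Result reports double as per-host performance statistics, which the
            # server analyzes to drop failing or slow resources from scheduling.
            requests.post(f"{SERVER}/report_result",
                          json={"host": host_id, "job": job["id"], "result": result})
```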

  18. SUPERLINK@TECHNION. [Architecture diagram: a web portal and HTTP frontend in front of an upgraded BOINC server with a database (jobs, monitoring, system statistics), a scheduler, and a virtual cluster maintainer; submitters deploy BOINC clients on the UW Madison pool and on EGEE, and further submitters target OSG, the Technion pools, the EC2 cloud, a dedicated-cluster fallback, and, generically, any grid/cluster/cloud; the diagram also shows the task execution and monitoring workflow and the task state.]

  19. Superlink-online 1.0: http://bioinfo.cs.technion.ac.il

  20. Task Submission

  21. Superlink-online statistics. ~1720 CPU years for ~18,000 tasks during 2006-2008 (and counting); ~37 citations, with several mutations found (examples: ichthyosis and "uncomplicated" hereditary spastic paraplegia, which affects 1-9 people per 100,000). Over 250 users (and counting), Israeli and international: Soroka H. (Be'er Sheva), Galil Ma'aravi H. (Nahariya), Rabin H. (Petah Tikva), Rambam H. (Haifa), Beney Tzion H. (Haifa), Sha'arey Tzedek H. (Jerusalem), Hadassa H. (Jerusalem), Afula H.; NIH, universities and research centers in the US, France, Germany, UK, Italy, Austria, Spain, Taiwan, Australia, and others. Task example: 250 days on a single computer versus 7 hours on 300-700 computers; short tasks take a few seconds even during severe overload.

  22. Using our system in Israeli hospitals.
  • Rabin Hospital (Motti Shochat's group): new locus for mental retardation; infantile bilateral striatal necrosis.
  • Soroka Hospital (Ohad Birk's group): lethal congenital contractural syndrome; congenital cataract.
  • Rambam Hospital (Eli Shprecher's group): congenital recessive ichthyosis; CEDNIK syndrome.
  • Galil Ma'aravi Hospital (Tzipi Falik's group): familial onychodysplasia and dysplasia; familial juvenile hypertrophy.

  23. Utilizing Community Computing ~3.4 TFLOPs, ~3000 users, from 75 countries

  24. Superlink-online V2 (beta) deployment. [Diagram: the submission server connects the Technion Condor pools, the EGEE-II BIOMED VO, the UW Madison Condor pool, the OSG GLOW VO, a dedicated cluster, and the Superlink@Campus and Superlink@Technion community grids.] ~12,000 hosts were operational during the last month.

  25. 3.1 million jobs in 21 days, with only 60 dedicated CPUs.

  26. Conclusions. Our system integrates clusters, grids, clouds, community grids, etc. It is geneticist-friendly, and it minimizes the use of expensive resources while providing QoS for tasks. A generic mechanism for scheduling policy can dynamically reroute jobs from one pool to another according to a given optimization function (budget, energy, etc.), as sketched below.
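
A minimal sketch of such a pluggable policy, where the optimization function is supplied by the operator; the pool and job attributes are invented for illustration:

```python
def pick_pool(job, pools, objective):
    """Route the job to whichever pool minimizes a caller-supplied objective
    (budget, energy, expected turnaround, ...)."""
    return min(pools, key=lambda pool: objective(job, pool))

# Illustrative objectives; attribute names (price_per_cpu_hour, expected_queue_wait,
# relative_speed, estimated_hours) are assumptions, not real system fields.
def budget_objective(job, pool):
    return pool.price_per_cpu_hour * job.estimated_hours

def turnaround_objective(job, pool):
    return pool.expected_queue_wait + job.estimated_hours / pool.relative_speed

# e.g. scheduler.route = lambda job: pick_pool(job, available_pools, budget_objective)
```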

  27. NVIDIA Compute Unified Device Architecture (CUDA). [Diagram: a GPU with 16 multiprocessors (MPs) of 8 scalar processors (SPs) each; every MP has a 16 KB shared memory with ~1-cycle latency and ~TB/s aggregate bandwidth, a register file, and cached read-only memory; all MPs share the off-chip global memory.]

  28. Key ideas (joint work with John Owens, UC Davis). A software-managed cache: we implement the cache replacement policy in software to maximize data reuse and improve the compute-to-memory-access ratio. A simple model for performance bounds: yes, we are (optimal); a sketch of such a bound follows. Special function units are used for hardware-assisted execution.
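
A minimal sketch of the kind of bound such a model yields (a standard compute/memory-bandwidth bound; the numbers below are illustrative, not the authors' measurements):

```python
def kernel_time_bound(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Lower bound on kernel runtime: the kernel can finish no faster than either
    its arithmetic running at peak FLOP/s or its memory traffic at peak bytes/s."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    return max(compute_time, memory_time)

# Illustrative numbers only: a kernel doing 2e9 FLOPs over 4e8 bytes of traffic
# on a device with ~350 GFLOP/s compute and ~80 GB/s memory bandwidth peaks.
bound = kernel_time_bound(2e9, 4e8, 350e9, 80e9)
# Comparing measured runtime against this bound tells us how close to optimal we are.
```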

  29. Results summary. Experiment setup: CPU is a single-core Intel Core 2 at 2.4 GHz with 4 MB L2; GPU is an NVIDIA G80 (GTX8800) with 750 MB GDDR4, 128 SPs, and 16 KB memory per 512 threads. Only kernel runtime is included (no memory transfers, no CPU setup time). [Chart comparing hardware caching with software-managed caching; example table of ~2500 entries, 2 × 25 × 25 × 2.] Use of SFUs: expf is about 6x slower than "+" on the GPU, but ~200x slower on the CPU.

  30. Acknowledgments. Superlink-online team, alumni: Anna Tzemach, Julia Stolin, Nikolay Dovgolevsky, Maayan Fishelson, Hadar Grubman, Ophir Etzion; current: Artyom Sharov, Oren Shtark. Prof. Miron Livny (Condor pool UW Madison, OSG), EGEE BIOMED VO and OSG GLOW VO, Microsoft TCI program, NIH grant, SciDAC Institute for Ultrascale Visualization. If your grid is underutilized, let us know! Visit us at: http://bioinfo.cs.technion.ac.il/superlink-online. Superlink@TECHNION project home page: http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion

  31. QUESTIONS??? Visit us at: http://bioinfo.cs.technion.ac.il/superlink-online
