
Superlink-Online: Harnessing the world's computers to hunt for disease-provoking genes
Mark Silberstein, CS, Technion; Dan Geiger, Computational Biology Laboratory; Assaf Schuster, Distributed Systems Laboratory


Presentation Transcript


  1. Superlink-Online: Harnessing the world's computers to hunt for disease-provoking genes. Mark Silberstein, CS, Technion; Dan Geiger, Computational Biology Laboratory; Assaf Schuster, Distributed Systems Laboratory. With genetics research institutes in Israel, the EU, and the US. MS eScience Workshop 2008.

  2. Familial onychodysplasia and dysplasia of distal phalanges (ODP). [Images of affected individuals III-15, IV-10, and IV-7.]

  3. Family Pedigree.

  4. Marker information added. [Figure: a chromosome pair carrying markers M1 and M2.]

  5. Maximum Likelihood Evaluation. [Figure: genotype data for individuals III-15 and III-16 at markers M1-M4 and disease loci D1, D2, with recombination fraction θ between loci.] The computational problem: find a value of θ maximizing Pr(data|θ). The LOD score (to quantify how confident we are): Z(θ) = log10[ Pr(data|θ) / Pr(data|θ=½) ]. A small computation sketch follows.
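
A minimal sketch of this evaluation, assuming a `likelihood(theta)` function (hypothetical here) that returns Pr(data|θ) for the pedigree:

```python
import numpy as np

def lod_curve(likelihood, thetas=np.linspace(0.01, 0.5, 50)):
    """Evaluate Z(theta) = log10[ Pr(data|theta) / Pr(data|theta=1/2) ] on a grid
    of recombination fractions and return the maximizing theta and its LOD score.

    `likelihood(theta)` is assumed to return Pr(data|theta) for the pedigree; in
    Superlink this comes from Bayesian-network inference (slides 7-8).
    """
    null = likelihood(0.5)                                   # the "no linkage" model
    z = np.array([np.log10(likelihood(t) / null) for t in thetas])
    best = int(np.argmax(z))
    return thetas[best], z[best]

# Toy illustration only (not real pedigree data): 8 non-recombinants, 2 recombinants.
theta_hat, z_max = lod_curve(lambda t: (1 - t) ** 8 * t ** 2)
```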

  6. Results of Multipoint Analysis.

  7. The Bayesian network model. [Figure: network over four loci, with locus 2 as the disease locus, relating selector variables (S), allele variables (L), genotype variables (X), and phenotype variables (Y) for each individual.] This model depicts the qualitative relations between the variables. We also need to specify the joint distribution over these variables.

  8. The Computational Task: computing Pr(data|θ) for a specific value of θ requires time and space exponential in the number of variables (five per person, per marker and gene locus), the number of values per variable (#alleles, non-typed persons), and the table dimensionality (cycles in the pedigree). Finding the best order is equivalent to finding the best order for sum-product operations on high-dimensional matrices (see the sketch below).
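
A minimal sketch of the sum-product computation this refers to, assuming the probability tables are NumPy arrays labeled with einsum-style single-letter variable names (all names and numbers are illustrative, not Superlink's actual implementation):

```python
import numpy as np

def eliminate(factors, order):
    """Sum-product variable elimination over a set of probability tables.

    `factors` is a list of (subscripts, table) pairs, where each character in
    `subscripts` names one axis of the table. Variables in `order` are summed
    out one by one; the chosen order determines the size of the intermediate
    tables, i.e. the time and space cost. Assumes `order` lists every variable,
    so the result is a scalar.
    """
    for var in order:
        touching = [(s, t) for s, t in factors if var in s]
        rest = [(s, t) for s, t in factors if var not in s]
        keep = ''.join(sorted(set(''.join(s for s, _ in touching)) - {var}))
        spec = ','.join(s for s, _ in touching) + '->' + keep
        merged = np.einsum(spec, *[t for _, t in touching])  # multiply, then sum out `var`
        factors = rest + [(keep, merged)]
    return float(np.prod([t for _, t in factors]))

# Toy chain A -> B -> C with binary variables (illustrative numbers only):
pA  = np.array([0.6, 0.4])                        # Pr(A),   axes 'a'
pBA = np.array([[0.9, 0.1], [0.2, 0.8]])          # Pr(B|A), axes 'ab'
pCB = np.array([[0.7, 0.3], [0.5, 0.5]])          # Pr(C|B), axes 'bc'
evC = np.array([1.0, 0.0])                        # evidence C=0, axes 'c'
factors = [('a', pA), ('ab', pBA), ('bc', pCB), ('c', evC)]
print(eliminate(factors, order='cba'))            # Pr(C=0) = 0.624
```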

  9. Divisible tasks through variable conditioning; the parallelization overhead is non-trivial (a sketch follows).
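
A minimal sketch of conditioning, reusing the `eliminate` helper from the previous sketch: fixing a set of variables to concrete values splits the likelihood into independent summands, each of which can be shipped as a separate job (variable names and sizes are illustrative):

```python
import numpy as np
from itertools import product

def condition_jobs(factors, cond_vars, domain_sizes, order):
    """Split the likelihood by conditioning: for every joint assignment of
    `cond_vars`, slice the tables down to that assignment and emit one
    independent job. Summing the job results recovers the full sum, so jobs
    can run on different machines; the price is duplicated per-job work
    (the parallelization overhead mentioned on the slide).
    """
    residual_order = [v for v in order if v not in cond_vars]
    for values in product(*(range(domain_sizes[v]) for v in cond_vars)):
        fixed = dict(zip(cond_vars, values))
        sliced = []
        for subs, table in factors:
            for v, val in fixed.items():
                if v in subs:
                    table = np.take(table, val, axis=subs.index(v))
                    subs = subs.replace(v, '')
            sliced.append((subs, table))
        # Each (assignment, closure) pair is a self-contained batch job.
        yield fixed, (lambda s=sliced: eliminate(s, residual_order))

# Serial driver (in the real system each job is shipped to a different machine):
# total = sum(run() for _, run in
#             condition_jobs(factors, cond_vars='a', domain_sizes={'a': 2}, order='cba'))
```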

  10. Terminology. The basic unit of execution is a batch job: non-interactive ("enqueue – wait – execute – return") and run in a self-contained execution sandbox. A linkage analysis request is a task: a bag of (possibly millions of) jobs whose turnaround time is important.

  11. Requirements. The system must be geneticist-friendly: an interactive experience, low response time for short tasks, and prompt user feedback. It must be simple, secure, reliable, stable, and overload-resistant, and support concurrent tasks and multiple users. Previously infeasible long tasks must run fast via parallel execution, harnessing all available resources (grids, clouds, clusters) and using them efficiently.

  12. Grids or Clouds? Small tasks are severely slow on grids: a task that takes 5 minutes on a 10-node dedicated cluster may take several hours on a grid. [Plots: preempted jobs and error rate on the UW Madison grid, queue waiting time in EGEE, and remaining jobs in the queue over time, with a long tail due to failures.] Should we move scientific loads to the cloud? YES!

  13. Grids or Clouds? Consider 3.2×10^6 jobs of ~40 minutes each: it took 21 days on ~6000-8000 CPUs, and it would cost about $10K on Amazon's EC2. Should we move scientific loads to the cloud? NO!

  14. Clouds or Grids? Clouds and Grids!
                                            Grids (opportunistic)   Clouds (dedicated)
  Reliability                               Low                     High
  Performance predictability                Low                     High
  Potential amount of available resources   High                    Low
  Reuse of existing infrastructure          High                    Low
  Computing style                           Throughput              "Burst"

  15. Cheap and Expensive Resources. A task's sensitivity to QoS differs between stages: a high-throughput phase followed by a high-performance tail (remaining jobs in the queue). Use cheap, unreliable resources (grids, community grids, non-dedicated clusters) for the throughput phase, and expensive, reliable resources (dedicated clusters, clouds) for the tail. Dynamically determine when the task enters tail mode and switch to expensive resources gracefully (see the sketch below).
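
A minimal sketch of one way such a policy could be expressed; the thresholds, pool objects, and statistics fields are illustrative assumptions, not the deployed Superlink-online logic:

```python
import time

TAIL_FRACTION = 0.05          # illustrative: tail mode when <5% of jobs remain
MAX_PENDING_AGE = 15 * 60     # illustrative: or when a job has waited 15 minutes

def in_tail_mode(total_jobs, remaining_jobs, oldest_pending_since):
    """Heuristic detection of the task's tail phase (thresholds are assumptions)."""
    few_left = remaining_jobs <= TAIL_FRACTION * total_jobs
    stalled = (time.time() - oldest_pending_since) > MAX_PENDING_AGE
    return few_left or stalled

def route(job, stats, cheap_pool, reliable_pool):
    """Default to cheap opportunistic pools (grids, community grids); once the
    task enters its tail, (re)submit stragglers to a dedicated cluster or cloud."""
    if in_tail_mode(stats.total, stats.remaining, stats.oldest_pending_since):
        return reliable_pool.submit(job)
    return cheap_pool.submit(job)
```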

  16. Glue pools together via an overlay. [Diagram: a scheduling server with a job queue, a scheduler, and a virtual cluster maintainer, feeding submitters to Grid 1, Grid 2, Cloud 1, and Cloud 2.] Issues: granularity, load balancing, firewalls, failed resources, scheduler scalability…

  17. Practical considerations. Overlay scalability and firewall penetration: the server may not be able to initiate a connection to the agent, so agents pull work (see the sketch below). Compatibility with community grids: the server is based on BOINC, and the agents are upgraded BOINC clients. Elimination of failed resources from scheduling: performance statistics are analyzed. Resource allocation depends on the task state, with dynamic policy updates via the Condor classad mechanism.
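
A minimal sketch of the pull model, in which the agent always initiates the (outbound, firewall-friendly) HTTP connection; the URL, endpoints, and payload fields are illustrative, not the actual BOINC protocol:

```python
import time
import requests  # any HTTP client would do; shown here for brevity

SERVER = "https://scheduler.example.org"   # hypothetical scheduling-server URL

def run_in_sandbox(job):
    """Download inputs, run the linkage kernel in a sandbox, return outputs (omitted)."""
    ...

def agent_loop(host_id):
    """Work-pulling agent. The server never initiates a connection to us, so
    plain outbound HTTP works from behind firewalls and NATs."""
    while True:
        reply = requests.post(f"{SERVER}/request_work",
                              json={"host": host_id, "cores": 4}).json()
        jobs = reply.get("jobs", [])
        if not jobs:
            time.sleep(60)     # back off while the queue is empty
            continue
        for job in jobs:
            result = run_in_sandbox(job)
            # Result reports double as per-host performance statistics, which the
            # server analyzes to drop failing or slow resources from scheduling.
            requests.post(f"{SERVER}/report_result",
                          json={"host": host_id, "job": job["id"], "result": result})
```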

  18. SUPERLINK@TECHNION. [Architecture diagram: a web portal and HTTP frontend in front of an upgraded BOINC server with a database (jobs, monitoring, system statistics), a scheduler, and a virtual cluster maintainer; submitters deploy BOINC clients on the UW Madison pool and on EGEE, and further submitters target OSG, the Technion pools, the EC2 cloud, a dedicated-cluster fallback, and, generically, any grid/cluster/cloud; the diagram also shows the task execution and monitoring workflow and the task state.]

  19. Superlink-online 1.0: http://bioinfo.cs.technion.ac.il

  20. Task Submission

  21. Superlink-online statistics. ~1720 CPU years for ~18,000 tasks during 2006-2008 (and counting); ~37 citations, with several mutations found (examples: ichthyosis and "uncomplicated" hereditary spastic paraplegia, which affects 1-9 people per 100,000). Over 250 users (and counting), Israeli and international: Soroka H. (Be'er Sheva), Galil Ma'aravi H. (Nahariya), Rabin H. (Petah Tikva), Rambam H. (Haifa), Beney Tzion H. (Haifa), Sha'arey Tzedek H. (Jerusalem), Hadassa H. (Jerusalem), Afula H.; NIH, universities and research centers in the US, France, Germany, UK, Italy, Austria, Spain, Taiwan, Australia, and others. Task example: 250 days on a single computer versus 7 hours on 300-700 computers; short tasks take a few seconds even during severe overload.

  22. Using our system in Israeli hospitals.
  • Rabin Hospital (Motti Shochat's group): new locus for mental retardation; infantile bilateral striatal necrosis.
  • Soroka Hospital (Ohad Birk's group): lethal congenital contractural syndrome; congenital cataract.
  • Rambam Hospital (Eli Shprecher's group): congenital recessive ichthyosis; CEDNIK syndrome.
  • Galil Ma'aravi Hospital (Tzipi Falik's group): familial onychodysplasia and dysplasia; familial juvenile hypertrophy.

  23. Utilizing Community Computing ~3.4 TFLOPs, ~3000 users, from 75 countries

  24. Superlink-online V2 (beta) deployment. [Diagram: the submission server connects the Technion Condor pools, the EGEE-II BIOMED VO, the UW Madison Condor pool, the OSG GLOW VO, a dedicated cluster, and the Superlink@Campus and Superlink@Technion community grids.] ~12,000 hosts were operational during the last month.

  25. 3.1 million jobs in 21 days, with only 60 dedicated CPUs.

  26. Conclusions. Our system integrates clusters, grids, clouds, community grids, etc. It is geneticist-friendly, and it minimizes the use of expensive resources while providing QoS for tasks. A generic mechanism for scheduling policy can dynamically reroute jobs from one pool to another according to a given optimization function (budget, energy, etc.), as sketched below.
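
A minimal sketch of such a pluggable policy, where the optimization function is supplied by the operator; the pool and job attributes are invented for illustration:

```python
def pick_pool(job, pools, objective):
    """Route the job to whichever pool minimizes a caller-supplied objective
    (budget, energy, expected turnaround, ...)."""
    return min(pools, key=lambda pool: objective(job, pool))

# Illustrative objectives; attribute names (price_per_cpu_hour, expected_queue_wait,
# relative_speed, estimated_hours) are assumptions, not real system fields.
def budget_objective(job, pool):
    return pool.price_per_cpu_hour * job.estimated_hours

def turnaround_objective(job, pool):
    return pool.expected_queue_wait + job.estimated_hours / pool.relative_speed

# e.g. scheduler.route = lambda job: pick_pool(job, available_pools, budget_objective)
```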

  27. NVIDIA Compute Unified Device Architecture (CUDA). [Diagram: a GPU with 16 multiprocessors (MPs) of 8 scalar processors (SPs) each; every MP has a 16 KB shared memory with ~1-cycle latency and ~TB/s aggregate bandwidth, a register file, and cached read-only memory; all MPs share the off-chip global memory.]

  28. Key ideas (joint work with John Owens, UC Davis). A software-managed cache: we implement the cache replacement policy in software to maximize data reuse and improve the compute-to-memory-access ratio. A simple model for performance bounds: yes, we are (optimal); a sketch of such a bound follows. Special function units are used for hardware-assisted execution.
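
A minimal sketch of the kind of bound such a model yields (a standard compute/memory-bandwidth bound; the numbers below are illustrative, not the authors' measurements):

```python
def kernel_time_bound(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Lower bound on kernel runtime: the kernel can finish no faster than either
    its arithmetic running at peak FLOP/s or its memory traffic at peak bytes/s."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    return max(compute_time, memory_time)

# Illustrative numbers only: a kernel doing 2e9 FLOPs over 4e8 bytes of traffic
# on a device with ~350 GFLOP/s compute and ~80 GB/s memory bandwidth peaks.
bound = kernel_time_bound(2e9, 4e8, 350e9, 80e9)
# Comparing measured runtime against this bound tells us how close to optimal we are.
```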

  29. Results summary. Experiment setup: CPU is a single-core Intel Core 2 at 2.4 GHz with 4 MB L2; GPU is an NVIDIA G80 (GTX8800) with 750 MB GDDR4, 128 SPs, and 16 KB memory per 512 threads. Only kernel runtime is included (no memory transfers, no CPU setup time). [Chart comparing hardware caching with software-managed caching; example table of ~2500 entries, 2 × 25 × 25 × 2.] Use of SFUs: expf is about 6x slower than "+" on the GPU, but ~200x slower on the CPU.

  30. Acknowledgments. Superlink-online team, alumni: Anna Tzemach, Julia Stolin, Nikolay Dovgolevsky, Maayan Fishelson, Hadar Grubman, Ophir Etzion; current: Artyom Sharov, Oren Shtark. Prof. Miron Livny (Condor pool UW Madison, OSG), EGEE BIOMED VO and OSG GLOW VO, Microsoft TCI program, NIH grant, SciDAC Institute for Ultrascale Visualization. If your grid is underutilized, let us know! Visit us at: http://bioinfo.cs.technion.ac.il/superlink-online. Superlink@TECHNION project home page: http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion

  31. QUESTIONS??? Visit us at: http://bioinfo.cs.technion.ac.il/superlink-online
