Gabor Marth, Goncalo Abecasis, PIs PowerPoint Presentation

Presentation Transcript

  1. Robust Software Tools for Variant Identification and Functional Assessment (Boston College & University of Michigan). Gabor Marth, Goncalo Abecasis, PIs

  2. Informatics challenges for genomic analysis • Tool building • Widening accessibility • Facilitating analysis

  3. Intentions of the RFA

  4. Our approach • Complete toolbox including variant interpretation • Full pipelines for start-to-finish analysis • Easily accessible and well documented methods • Cloud deployment (in addition to single machine/local compute cluster) • Open development model

  5. Progress in first 6 months
     • Starting with two sets of tools and pipelines, geared toward high-quality local analysis, battle-tested in the 1000GP data and medical sequencing projects
     • The two groups follow a “divide and conquer” strategy to put critical pieces in place for making our algorithms available to the wider genomics community
     • Boston College
         • A universal tool/pipeline launcher application
         • Infrastructure for dissemination
         • Cloud access via Galaxy
     • University of Michigan
         • Integration of variant annotation/impact assessment
         • Pipeline/workflow control infrastructure
         • Adaptation for Amazon Cloud Services

  6. Functionality & Tools

  7. Scope

  8. Tools constantly evolving (as they must to remain relevant)
     • Our community toolbox to be updated with new tools as they become available
     • Include latest versions
       ref: TATAGAGAGAGAGAGAGAGCGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT
       alt: TATAGAGAGAGAGAGAGCGAGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT
     • New algorithms for complex variant detection (FreeBayes)
       ref: TATAGAGAGAGAGAGAGAGC--GAGAGAGAGAGAGAGAGGGAGAGACGGAGTT
       alt: TATAGAGAGAGAGAGAG--CGAGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT
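
The two ref/alt pairs on slide 8 show the same sequences without and with alignment gaps. As a rough illustration only (not FreeBayes code), the Python sketch below reads the gapped pair from the slide and lists the gap events it implies, then compares the same sequences position by position without gaps. The function names `gap_events` and `ungapped_mismatches` are made up for this example.

```python
# Gapped alignment copied from slide 8.
REF_ALN = "TATAGAGAGAGAGAGAGAGC--GAGAGAGAGAGAGAGAGGGAGAGACGGAGTT"
ALT_ALN = "TATAGAGAGAGAGAGAG--CGAGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT"


def gap_events(ref_aln, alt_aln):
    """Group consecutive gap columns into (kind, sequence) events."""
    events = []
    prev = None
    for r, a in zip(ref_aln, alt_aln):
        if a == "-":
            kind, base = "deletion", r      # base present in ref only
        elif r == "-":
            kind, base = "insertion", a     # base present in alt only
        else:
            kind, base = None, ""
        if kind is not None:
            if kind == prev:
                # Extend the event that is already open.
                events[-1] = (kind, events[-1][1] + base)
            else:
                events.append((kind, base))
        prev = kind
    return events


def ungapped_mismatches(ref_aln, alt_aln):
    """Compare the sequences base by base after removing the gaps."""
    ref = ref_aln.replace("-", "")
    alt = alt_aln.replace("-", "")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


if __name__ == "__main__":
    print(gap_events(REF_ALN, ALT_ALN))
    print(ungapped_mismatches(REF_ALN, ALT_ALN))
```

The point of the sketch is that one difference between two haplotypes can be written either as nearby substitutions or as a deletion next to an insertion. Callers that handle complex variants aim to report such a difference as a single event.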

  9. Include tools when ready for prime time The BC mobile element insertion caller performs best in its class

  10. EPACTS variant interpretation tools (Efficient and Parallelizable Association Container Toolbox) • Genetic analysis tool based on VCF • Fast and parallelizable access to large VCF files • Built-in widely used single variant and burden tests • R/C++ interface for extending to newer tests • Binary & quantitative phenotypes with covariates • Useful visualization tools of association results • Automated visualization
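
Slide 10 mentions burden tests. To show the general idea, here is a minimal Python sketch of a collapsing burden score: rare alternate alleles in a region are summed per sample, and the mean score is compared between cases and controls. This is not EPACTS code or its interface. The genotype matrix, the `max_af` threshold and the function names are all invented for illustration, and a real test would add a significance calculation and covariates.

```python
def burden_scores(genotypes, max_af=0.15):
    """Sum alt-allele counts per sample over variants with alt frequency <= max_af.

    genotypes maps a variant id to a list of alt-allele counts (0, 1 or 2),
    one entry per sample, all lists in the same sample order.
    """
    n_samples = len(next(iter(genotypes.values())))
    scores = [0] * n_samples
    for counts in genotypes.values():
        alt_freq = sum(counts) / (2 * len(counts))
        if alt_freq <= max_af:              # keep rare variants only
            for i, c in enumerate(counts):
                scores[i] += c
    return scores


def mean_by_status(scores, is_case):
    """Return (mean score in cases, mean score in controls)."""
    cases = [s for s, flag in zip(scores, is_case) if flag]
    controls = [s for s, flag in zip(scores, is_case) if not flag]
    return sum(cases) / len(cases), sum(controls) / len(controls)


# Toy data: 10 samples, 4 variants in one hypothetical gene.
GENOTYPES = {
    "v1": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "v2": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "v3": [1, 1, 1, 1, 1, 0, 1, 1, 0, 1],   # common, excluded by max_af
    "v4": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
}
IS_CASE = [True] * 5 + [False] * 5

if __name__ == "__main__":
    scores = burden_scores(GENOTYPES)
    print(scores, mean_by_status(scores, IS_CASE))
```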

  11. Pipelines & workflow

  12. The UM pipeline (flow diagram): BAM -> samtools -> Genotype Likelihood (one per BAM) -> glfMultiples -> Unfiltered VCF -> vcfCooker -> Hard-filtered VCF -> SVM -> Filtered VCF -> Beagle/Thunder (optional LD-aware step) -> Filtered/Phased VCF -> EPACTS

  13. UMAKE workflow system • Makefile-based approach • The Make utility is very good at representing dependencies • Picks up where it left off after a failure • Flexible deployment • Local machine • Local cluster (Mosix) • Amazon Web Services Elastic Compute Cloud (EC2) • Default options • User configurable
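
Slide 13 describes a Makefile-based workflow that can pick up where it left off. The Python sketch below imitates the rule Make applies: a target is rebuilt only if it is missing or older than one of its dependencies. It is not UMAKE itself. The file names and the `stage` stand-in for real tools are hypothetical, and the rules are assumed to be listed in dependency order.

```python
import os
import tempfile


def needs_rebuild(target, deps):
    """True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(d) > target_mtime for d in deps)


def run_rules(rules, workdir):
    """Run (target, deps, action) rules in order; return the targets rebuilt."""
    rebuilt = []
    for target, deps, action in rules:
        target_path = os.path.join(workdir, target)
        dep_paths = [os.path.join(workdir, d) for d in deps]
        if needs_rebuild(target_path, dep_paths):
            action(dep_paths, target_path)
            rebuilt.append(target)
    return rebuilt


def stage(label):
    """Stand-in for a real tool: wrap each input's text in a label."""
    def action(dep_paths, target_path):
        with open(target_path, "w") as out:
            for dep in dep_paths:
                with open(dep) as src:
                    out.write(f"{label}({src.read().strip()})\n")
    return action


# Hypothetical three-step chain, loosely modelled on slide 12.
RULES = [
    ("sample.glf", ["sample.bam"], stage("likelihood")),
    ("unfiltered.vcf", ["sample.glf"], stage("call")),
    ("filtered.vcf", ["unfiltered.vcf"], stage("filter")),
]


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "sample.bam"), "w") as f:
            f.write("reads\n")
        first = run_rules(RULES, workdir)
        # Simulate a failure that left the last output missing.
        os.remove(os.path.join(workdir, "filtered.vcf"))
        resumed = run_rules(RULES, workdir)
        return first, resumed


if __name__ == "__main__":
    print(demo())
```

On the second run only the missing final target is rebuilt. The earlier outputs are still newer than their inputs, so they are skipped.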

  14. Application of UMAKE to large-scale projects Computational cost is ~1 week / 1000 samples in a 5 node mini-cluster
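
The figure on slide 14 can be turned into a back-of-envelope estimate. The Python sketch below assumes that cost scales linearly with the number of samples and inversely with the number of nodes. The slide does not state this, so the result is only a first guess. The function name and its parameters are invented for this example.

```python
def estimated_weeks(n_samples, n_nodes,
                    ref_weeks=1.0, ref_samples=1000, ref_nodes=5):
    """Scale the slide's reference point (~1 week / 1000 samples / 5 nodes).

    Assumes linear scaling in samples and perfect scaling across nodes,
    which is an assumption of this sketch, not a claim from the slide.
    """
    node_weeks = ref_weeks * ref_nodes * (n_samples / ref_samples)
    return node_weeks / n_nodes


if __name__ == "__main__":
    print(estimated_weeks(1000, 5))    # the reference point itself
    print(estimated_weeks(2000, 10))   # twice the samples, twice the nodes
```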

  15. Accessibility

  16. The Boston College tool hub http://gkno.me (genome)

  17. Simplified installation & use • Unified launcher application (gkno) • single tools (e.g. Mosaik) • tool “macros” (e.g. map) • pipelines (e.g. exome variant calling) • Download and installation • All tools pulled in a single step from github • All tools installed • All tools tested

  18. Easily configurable pipeline system • Part of our new unified launcher system (gkno) • Pipeline types (e.g. mapping, variant calling) and instances (exome, whole-genome) • User-configurable: tools can be swapped in and out, parameters configured via config files
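
Slide 18 distinguishes pipeline types from pipeline instances and says tools and parameters can be changed through config files. One way to picture this is a type definition with per-instance overrides merged on top, as in the Python sketch below. This is not the gkno configuration format. Every key, parameter name and file name here is hypothetical.

```python
import copy

# Hypothetical pipeline type: ordered steps, a default tool per step,
# and default parameters.
PIPELINE_TYPES = {
    "variant-calling": {
        "steps": ["align", "call"],
        "tools": {"align": "Mosaik", "call": "FreeBayes"},
        "parameters": {"call": {"min_mapping_quality": 20}},
    }
}

# Hypothetical instances: each names a type and overrides part of it.
INSTANCES = {
    "exome": {
        "type": "variant-calling",
        "overrides": {
            "tools": {"align": "BWA"},   # swap one tool
            "parameters": {"call": {"regions": "targets.bed"}},
        },
    },
    "whole-genome": {"type": "variant-calling", "overrides": {}},
}


def deep_merge(base, override):
    """Return a copy of base with override applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve(instance_name):
    """Build the effective configuration for one pipeline instance."""
    instance = INSTANCES[instance_name]
    return deep_merge(PIPELINE_TYPES[instance["type"]], instance["overrides"])


if __name__ == "__main__":
    print(resolve("exome"))
```

Because the merge works on copies, resolving one instance does not change the shared type definition.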

  19. Support • Documentation • Tutorials / Blog • User forum • Bug reports

  20. Deployment / Cloud

  21. Software deployment • All software is ready for running locally on a single machine • UMAKE adds cluster support • Cloud deployment • Simple Michigan pipelines ported to Amazon • Porting of all project software is under way

  22. Cloud-based analysis – Galaxy

  23. Open & collaborative development model

  24. Integration • Our workflows leverage 3rd party tools for specific functionality • All our tools are open-source, available on github (many clones, community contributed code) • Ensemble approach (multiple tools for critical tasks)

  25. Ensemble approach • Multiple tools usually benefit analysis

  26. Ensemble approach • Our pipelines will use multiple aligners (BWA, Mosaik) and variant callers (FreeBayes, glfMultiples), developed by BC/UM
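
To make the ensemble idea concrete, the Python sketch below merges call sets from several callers with a simple support vote. This is a deliberately basic stand-in. Slide 27 names SVM and MLP approaches for call set aggregation, and those are not what is shown here. The variant records and the `min_support` parameter are invented for illustration.

```python
from collections import Counter


def consensus(callsets, min_support=2):
    """Keep variants reported by at least min_support callers.

    callsets maps a caller name to an iterable of variant keys,
    here (chrom, pos, ref, alt) tuples.
    """
    votes = Counter()
    for calls in callsets.values():
        votes.update(set(calls))    # one vote per caller per variant
    return {variant for variant, n in votes.items() if n >= min_support}


# Made-up calls from two callers named on the slide.
CALLSETS = {
    "FreeBayes": [("1", 1000, "A", "G"), ("1", 2000, "C", "T")],
    "glfMultiples": [("1", 1000, "A", "G"), ("1", 3000, "G", "A")],
}

if __name__ == "__main__":
    print(consensus(CALLSETS))                 # intersection-like
    print(consensus(CALLSETS, min_support=1))  # union
```

A high `min_support` favours precision and a low one favours sensitivity, which is the trade-off an ensemble has to settle.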

  27. In progress • Expanding pipelines to integrate all tools • Michigan tools -> gkno • BC tools -> Michigan cloud ready pipelines • Large data set analysis on the cloud • Integrate variant interpretation tools • Integrate SV tools as they become more robust • Integrate consensus analysis (SVM and MLP approaches to callset aggregation) • Minimal, functional pipeline -> Galaxy

  28. Team
     • Boston College: Alistair Ward, Derek Barnett, Chase Miller, Wan-Ping Lee, Erik Garrison, Gabor Marth
     • University of Michigan: Mary-Kate Trost, Tom Blackwell, Hyun-Min Kang, Youna Hu, Adrian Tan, Xiaowei Zhan, Dajiang Liu, Goncalo Abecasis