
The Center for Computational Genomics and Bioinformatics






Presentation Transcript


  1. The Center for Computational Genomics and Bioinformatics Christopher Dwan Mike Karo Tim Kunau

  2. Outline • Perspective • Processing tasks & requirements • Computational solutions • Interesting issues

  3. Funding chart

  4. The “Bioinformatics” component • “Pipeline” data processing and storage • ~100 KB of data per job • <5 sec processing time • 10,000+ jobs / month • The problem: Interface (batch & dependency management) • Similarity search • Search against one or more ~10 GB databases • The problem: Data movement & memory • (much easier on dedicated resources)
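The batch-and-dependency-management problem named above can be sketched with a topological sort over job dependencies. This is an illustrative outline, not the center's actual pipeline; the stage names (`fetch_sequence`, `vector_trim`, etc.) are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline stages: each job lists the jobs it depends on.
jobs = {
    "fetch_sequence": [],
    "vector_trim": ["fetch_sequence"],
    "quality_filter": ["vector_trim"],
    "similarity_search": ["quality_filter"],
    "load_results": ["similarity_search"],
}

def run_order(deps):
    """Return a batch-submission order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

if __name__ == "__main__":
    print(run_order(jobs))
```

A dependency-aware submission order is what lets an "uninformed" user fire off 10,000 small jobs a month without hand-sequencing them.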

  5. The “Bioinformatics” component • “Unigene” assembly • Traditional long-run, big-memory compute problem • Comes at the end of the other two types • The problem: algorithms • Clustering / Pattern Discovery • Conference driven • Causes us to redo the other tasks
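The clustering step above can be illustrated with a minimal greedy scheme: assign each sequence to the first cluster whose representative it matches above an identity threshold. This is a sketch for intuition only; real Unigene-style assembly uses far more sophisticated alignment, and the `identity` measure and 0.9 default threshold here are assumptions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence (a crude proxy for alignment identity)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_cluster(seqs, threshold=0.9):
    """Put each sequence in the first cluster whose representative it
    matches at or above `threshold`; otherwise start a new cluster."""
    clusters = []  # list of (representative, members) pairs
    for s in seqs:
        for rep, members in clusters:
            if identity(s, rep) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters
```

Greedy single-pass clustering is order-dependent, which is one reason such pipelines get rerun when a conference-driven method change arrives.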

  6. The “Bioinformatics” component • “Data warehouses” • Mirroring and cross-checking other public resources • Local Oracle implementation of public databases for local users (GenBank / Swiss-Prot / Medicago …)

  7. The “Bioinformatics” component • Microarray data • Image data (~1 MB per image) requires processing and storage • Unknown normalization, errors, etc. require that we simply keep all the raw data • Web-based display of results • Visualization…

  8. Computational resources • ~100 CPU Opportunistic Condor “Flock” • Not dedicated • Configuration can change without warning • No permanent local data storage • Machines sit on desks. • “flocking” with Madison, CS dept, other labs • Reciprocity can hurt a LOT. • Server farms • Intel / Alpha • Hard to find money to buy dedicated machines, esp. on single organism projects.
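A job destined for the Condor flock described above is typically described in an HTCondor submit file. The fragment below is an illustrative example only; the wrapper script and file names (`blast_wrapper.sh`, `query.*.fa`, etc.) are hypothetical, not the center's actual configuration.

```
universe   = vanilla
executable = blast_wrapper.sh
arguments  = $(Process)
input      = query.$(Process).fa
output     = hits.$(Process).out
error      = hits.$(Process).err
log        = pipeline.log
queue 100
```

The vanilla universe fits opportunistic, non-dedicated machines: jobs can be evicted and restarted elsewhere, which is exactly why there is no permanent local data storage to rely on.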

  9. Software and user issues • An intuitive interface to parallel and batch systems gives uninformed users a great deal of power. • Tools from outside: Poor scalability • Tools from inside: Poor portability

  10. Heuristic algorithms • Many bioinformatics tools are heuristic rather than complete searches. • These searches can return different results on different machines (dynamic thresholds, 32 vs. 64 bit math, …) • How do we tell “different” from “erroneous?”
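One way to operationalize the "different vs. erroneous" question is to compare hit lists while tolerating benign machine-to-machine variation: same subject IDs, scores within a tolerance, order ignored. This is a sketch under assumed conventions; the hit-list format (ID, score) pairs and the tolerance value are illustrative, not any specific tool's output.

```python
def equivalent_hits(hits_a, hits_b, score_tol=1e-3):
    """Treat two hit lists as the 'same' result if they report the same
    subject IDs and each score agrees within score_tol, regardless of
    the order in which ties were broken on each machine."""
    a = dict(hits_a)
    b = dict(hits_b)
    if a.keys() != b.keys():
        return False
    return all(abs(a[k] - b[k]) <= score_tol for k in a)
```

A missing or extra hit ID then flags a genuinely different (possibly erroneous) run, while small score drift from 32- vs. 64-bit arithmetic passes.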

  11. Thank you: • The Condor team at Madison • Sanger Center

  12. Collaborations are the key • Christopher Dwan cdwan@ahc.umn.edu • Mike Karo mek@ahc.umn.edu • Tim Kunau kunau@ahc.umn.edu
