
Unlocking Hidden Features of IMG: Clustering in Genome Biology for Enhanced Data Annotation

This presentation by Sean Hooper (Genome Biology Program, JGI) looks at the role of clustering in the Integrated Microbial Genomes (IMG) system. Gene clusters support data compression, assist annotation, and group similar functions, all of which are necessary for large datasets such as metagenomes. The talk works through a search for a gene annotated as putative or hypothetical and examines the often-overlooked gene clusters in IMG, including COG, Pfam, and IMG's own clusters. It also shows MCL clustering on sequence data and asks how the resulting clusters relate to phylogeny.


Presentation Transcript


  1. IMG clusters – the hidden features. Sean Hooper, Genome Biology Program, JGI

  2. Background: Clusters work behind the scenes in IMG. Used for: • Data compression • Annotation assistance • Grouping of similar functions. Necessary for large datasets, e.g. metagenomics

  3. Example • Search for a gene annotated as putative or hypothetical • Study the often overlooked clusters of genes in IMG

  4. Putative ribulose carboxylase

  5. COG Pfam IMG

  6. 1997: 720 COGs; 2003: 4873 COGs (Tatusov et al. 1997)

  7. COG Pfam IMG

  8. COG Pfam IMG

  9. MCL clustering on sequence

  10. Nodes = IMG genes Edges = in same cluster

  11. Alignment detail

  12. Phylogeny: How do these clusters relate to phylogeny?

  13. Conclusions: Clusters • Provide fast access to related proteins • Ease analysis and annotation (but cannot replace experimental work) • Reveal substructures in function and phylogeny

  14. Acknowledgements. Chalmers, Sweden: D Dalevi. Genome Biology: K Mavrommatis, IJ Anderson, NC Kyrpides, A Pati. IMG crew: K Palaniappan, E Szeto, VK Markowitz

  15. Cluster overview of Archaea • Spectral bipartitioning • Integrate metadata (phenotype, phylogeny) • COAL demo
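
Slides 9 and 10 mention MCL clustering on sequence and a graph view in which nodes are IMG genes and edges mean "in the same cluster". The transcript does not show how IMG runs MCL, so the following is only a minimal sketch of the general Markov clustering idea (expansion, inflation, column normalization) in plain Python. The gene names, similarity scores, inflation value, and all function names are hypothetical and chosen for illustration; a real analysis would more likely use the `mcl` program on pairwise sequence similarity scores.

```python
from itertools import combinations


def normalize_columns(m):
    """Scale each column in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering; returns the converged flow matrix."""
    n = len(similarity)
    # Copy the input and add self-loops (a common MCL preprocessing step).
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (flow along two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power, then re-normalize each column.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    return m


def extract_clusters(m, eps=1e-6):
    """Group nodes that still exchange flow in the converged matrix."""
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical genes and made-up similarity scores: two tight groups
# joined by a single weak similarity between geneC and geneD.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
scores = {("geneA", "geneB"): 1.0, ("geneA", "geneC"): 1.0,
          ("geneB", "geneC"): 1.0, ("geneD", "geneE"): 1.0,
          ("geneD", "geneF"): 1.0, ("geneE", "geneF"): 1.0,
          ("geneC", "geneD"): 0.1}
index = {g: i for i, g in enumerate(genes)}
matrix = [[0.0] * len(genes) for _ in genes]
for (a, b), s in scores.items():
    matrix[index[a]][index[b]] = s
    matrix[index[b]][index[a]] = s

clusters = [[genes[i] for i in c] for c in extract_clusters(mcl(matrix))]
# Graph view as on slide 10: one edge per pair of genes sharing a cluster.
edges = [pair for c in clusters for pair in combinations(c, 2)]
print(clusters)
print(edges)
```

The inflation parameter controls granularity: larger values tend to give more, smaller clusters. The value 2.0 here is only an illustrative default, not a setting taken from the talk.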
