Explore the role of clustering in the Integrated Microbial Genomes (IMG) system, as presented by Sean Hooper. Clustering supports data compression and annotation and groups genes with similar functions, which is critical for handling large datasets such as metagenomes. The talk walks through a search for a gene annotated as putative or hypothetical and examines the often-overlooked gene clusters in IMG, including the COG and Pfam classifications. It also shows how MCL clustering of sequence data can reveal substructure in function and phylogeny and help in analyzing complex genomes.
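The summary mentions MCL (Markov Cluster) clustering of sequence data. Below is a minimal NumPy sketch of the standard MCL iteration: expansion by squaring a column-stochastic matrix, then inflation by element-wise powering and re-normalizing. The function name `mcl`, the self-loop weight, the inflation value of 2.0 and the toy similarity matrix are all assumptions for illustration. They are not IMG's actual pipeline or parameters.

```python
import numpy as np


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Minimal Markov Cluster (MCL) sketch.

    similarity: symmetric, non-negative n x n matrix (e.g. derived from
    pairwise sequence similarity; the scoring scheme is up to the caller).
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    m = np.array(similarity, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(m, 1.0)               # add self-loops (weight 1.0 is an assumption)
    m = m / m.sum(axis=0)                  # column-normalise: each column sums to 1
    for _ in range(max_iter):
        prev = m
        m = m @ m                          # expansion: two steps of the random walk
        m = np.power(m, inflation)         # inflation: boost strong transitions
        m = m / m.sum(axis=0)              # re-normalise columns
        if np.allclose(m, prev, atol=tol):
            break                          # converged to a (near) fixed point
    # Rows that keep mass are attractors; the columns they hold form one cluster.
    clusters = set()
    for row in m:
        members = tuple(int(i) for i in np.nonzero(row > tol)[0])
        if members:
            clusters.add(members)
    return sorted(list(c) for c in clusters)


# Toy input: two groups of mutually similar sequences, no similarity between groups.
toy = np.zeros((6, 6))
toy[:3, :3] = 1.0
toy[3:, 3:] = 1.0
print(mcl(toy))  # [[0, 1, 2], [3, 4, 5]]
```

Raising the inflation value generally yields more, smaller clusters, so it acts as the granularity knob of the method.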
IMG clusters: the hidden features
Sean Hooper
Genome Biology Program, JGI
Background
Clusters work behind the scenes in IMG, where they are used for:
• Data compression
• Annotation assistance
• Grouping of similar functions
They are necessary for large datasets, e.g. metagenomics.
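Clusters are listed as a means of data compression. One plausible way this works is to store a single representative per cluster and map every member to it, so that per-cluster information is kept once rather than once per gene. The sketch below is built on assumptions: a plain gene-to-cluster dictionary as input, and the longest sequence as the representative. Both choices are illustrative and not IMG's documented scheme, and the gene ids and lengths are hypothetical.

```python
from collections import defaultdict


def compress_by_cluster(gene_to_cluster, gene_lengths):
    """Keep one representative gene per cluster.

    Returns (representatives, member_to_rep):
      representatives: cluster id -> representative gene id
      member_to_rep:   gene id -> representative gene id of its cluster
    The representative is the longest member; ties go to the smallest gene id.
    """
    members = defaultdict(list)
    for gene, cluster in gene_to_cluster.items():
        members[cluster].append(gene)
    representatives = {
        cluster: min(genes, key=lambda g: (-gene_lengths[g], g))
        for cluster, genes in members.items()
    }
    member_to_rep = {gene: representatives[cluster]
                     for gene, cluster in gene_to_cluster.items()}
    return representatives, member_to_rep


# Hypothetical genes and clusters (ids and lengths are made up).
gene_to_cluster = {"g1": "C1", "g2": "C1", "g3": "C1", "g4": "C2"}
gene_lengths = {"g1": 300, "g2": 310, "g3": 150, "g4": 200}
reps, member_to_rep = compress_by_cluster(gene_to_cluster, gene_lengths)
print(reps)  # {'C1': 'g2', 'C2': 'g4'}
```

Here four genes reduce to two representatives, so anything attached at the cluster level is stored twice instead of four times.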
Example
• Search for a gene annotated as putative or hypothetical
• Study the often-overlooked clusters of genes in IMG
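The example workflow pairs two steps: find a gene annotated as putative or hypothetical, then look at the cluster it belongs to. It can be sketched as a simple lookup. The sketch selects genes whose product name contains "putative" or "hypothetical". It then collects the informative annotations of the other genes in the same cluster as candidate hints. The keyword rule, the dictionaries and all gene ids and product names are hypothetical, and any hint found this way is only a lead for manual curation.

```python
from collections import defaultdict

# Keywords assumed to flag a weak (uninformative) annotation.
UNINFORMATIVE = ("putative", "hypothetical")


def is_uninformative(product):
    text = product.lower()
    return any(word in text for word in UNINFORMATIVE)


def cluster_hints(products, gene_to_cluster):
    """Map each putative/hypothetical gene to the informative annotations of its cluster-mates."""
    members = defaultdict(list)
    for gene, cluster in gene_to_cluster.items():
        members[cluster].append(gene)
    hints = {}
    for gene, product in products.items():
        if not is_uninformative(product) or gene not in gene_to_cluster:
            continue  # only weakly annotated, clustered genes need hints
        mates = members[gene_to_cluster[gene]]
        hints[gene] = sorted({
            products[m] for m in mates
            if m != gene and m in products and not is_uninformative(products[m])
        })
    return hints


# Hypothetical annotations and cluster assignments.
products = {
    "g1": "hypothetical protein",
    "g2": "ABC transporter ATP-binding protein",
    "g3": "putative ABC transporter",
    "g4": "hypothetical protein",
    "g5": "DNA gyrase subunit B",
}
gene_to_cluster = {"g1": "C1", "g2": "C1", "g3": "C1", "g4": "C2"}
print(cluster_hints(products, gene_to_cluster))
```

A gene alone in its cluster (g4 here) gets an empty hint list, which is itself useful: it marks a hypothetical protein with no annotated relatives in the clustering.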
COG, Pfam and IMG clusters
Number of COGs: 720 in 1997, 4873 in 2003 (Tatusov et al. 1997)
Nodes = IMG genes
Edges = genes in the same cluster
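The graph view, in which nodes are IMG genes and edges join genes in the same cluster, can be built directly from cluster memberships. The sketch below connects every pair of genes that share a cluster in any of several clustering systems. It then finds connected components with a breadth-first search. The cluster ids and gene ids are hypothetical, and linking all pairs within a cluster is an assumption about how such a graph could be drawn.

```python
from collections import deque
from itertools import combinations


def cluster_graph(memberships):
    """memberships: iterable of gene lists, one per cluster (from any system).
    Returns an adjacency dict: gene -> set of genes sharing at least one cluster."""
    adj = {}
    for genes in memberships:
        unique = sorted(set(genes))
        for g in unique:
            adj.setdefault(g, set())  # singletons still appear as nodes
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Breadth-first search; returns components as sorted gene lists."""
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(sorted(comp))
    return components


# Hypothetical clusters from three systems (ids are made up).
memberships = {
    "cog_A": ["g1", "g2"],
    "pfam_A": ["g2", "g3"],
    "img_1": ["g4", "g5"],
    "img_2": ["g6"],
}
adj = cluster_graph(memberships.values())
print(connected_components(adj))  # [['g1', 'g2', 'g3'], ['g4', 'g5'], ['g6']]
```

Because g2 sits in both a COG-style and a Pfam-style cluster, it bridges g1 and g3 into one component, which is how overlapping classifications can reveal larger substructures.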
Conclusions
Clusters:
• Provide fast access to related proteins
• Ease analysis and annotation (but cannot replace experimental work)
• Reveal substructures in function and phylogeny
Acknowledgements
Chalmers, Sweden: D Dalevi
Genome Biology: K Mavrommatis, IJ Anderson, NC Kyrpides, A Pati
IMG crew: K Palaniappan, E Szeto, VK Markowitz
Cluster overview of Archaea
• Spectral bipartitioning
• Integrate metadata (phenotype, phylogeny)
• COAL demo
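Spectral bipartitioning is listed without further detail. The common textbook version splits a graph by the sign of the Fiedler vector, which is the eigenvector for the second-smallest eigenvalue of the graph Laplacian L = D - A. Here D is the diagonal degree matrix and A is the adjacency matrix. A minimal NumPy sketch on a toy graph follows. Whether the Archaea overview or the COAL demo used exactly this variant (unnormalized Laplacian, sign split) is an assumption.

```python
import numpy as np


def spectral_bipartition(adjacency):
    """Split nodes into two groups by the sign of the Fiedler vector."""
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a        # unnormalised Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # symmetric solver, ascending eigenvalues
    fiedler = eigvecs[:, 1]                        # eigenvector of the second-smallest eigenvalue
    left = sorted(int(i) for i in np.where(fiedler >= 0)[0])
    right = sorted(int(i) for i in np.where(fiedler < 0)[0])
    return left, right


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by a single edge 2-3.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(spectral_bipartition(toy_graph))
```

The overall sign of an eigenvector is arbitrary, so which half is returned as `left` may vary between runs or platforms. The partition itself, {0, 1, 2} versus {3, 4, 5}, is what matters. Applying the split recursively to each half gives a hierarchical overview of a cluster graph.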