
Bioinformatic analysis of protein complexes



Presentation Transcript


  1. Bioinformatic analysis of protein complexes • Roland Krause, Cellzome AG, Heidelberg

  2. Overview • The proteome: proteins and their interactions • The yeast proteome project at Cellzome • Experimental data generation • Functional analysis • Obtaining protein complexes • Comparing protein complexes and interacting proteins • Shared components: Building blocks or biochemical artifacts?

  3. Proteomics • The study of the protein repertoire expressed in the cell • Protein expression levels: qualitative, quantitative • Localization • Protein interactions: a powerful tool for the elucidation of protein function (pair-wise interactions, protein complexes) • Protein complexes: visible structural bodies, important players in molecular life

  4. Acknowledgements • Cellzome AG: Dr. Giulio Superti-Furga; Dr. Gitte Neubauer • Yeast biology: Dr. Anne-Claude Gavin; Dr. Paola Grandi; Dr. Christina Rau • Mass spectrometry: Bernhard Küster, PhD; Markus Bösche • Bioinformatics: Dr. Georg Casari; Jens Rick; Julien Gagneur • EMBL: Dr. Peer Bork; Dr. Christian von Mering • Biozentrum Würzburg: Prof. Dr. Thomas Dandekar; Prof. Dr. Jörg Schultz

  5. The Yeast Proteome Project at Cellzome • Tandem Affinity Purification (TAP) • Mass Spectrometry (MS)

  6. Workflow TAP-MS • Homologous transformation (addition of TAP-tag) • Test for expression • Large scale culture • Purification • Gel separation of complexes • Mass spectrometry (MALDI)

  7. An integrated workflow • Large scale culture • Transformation • Separation • LIMS system • Mass spectrometry

  8. Handling laboratory information and annotation • Selection of protocol according to features of gene • Size, membrane association • Process information / user management • Annotation of new complexes and novel findings • Collection of information for patent • Database of known drug targets

  9. Key figures of the screen • Purifications: 589 • Proteins retrieved: 1440, ~300 novel • Complexes discovered: 232, 60% appear to be novel • Overlap with the Y2H data: 165 of ~1500 interactions • 37% of proteins in complexes are shared • [Gavin, AC., Bösche, M., Krause, R., et al. (2002) Nature]

  10. Functional analysis • Protein complexes share many components • The resulting network of complexes builds a higher order network • Highly conserved and essential proteins tend to interact with each other • All localizations are sampled well except for membrane proteins • Small proteins are underrepresented

  11. Examples of new findings • New complexes • 90S Pre-Ribosome • Gives rise to the primordial, nucleolar ribosome • Third largest complex in the yeast cell • Established functionally by Grandi et al, 2002 (Mol. Cell.), Dragon et al, 2002, (Nature) • COP9/Signalosome • “Missing” complex known in human, fly, Arabidopsis • Known to be related to the 19S regulatory part of the proteasome • Shares components with the proteasome in yeast • New interactors for known complexes • Iwr1 with RNA polymerase II • YFL049w with SWI/SNF complex • Apparent underestimate of protein complexes in the reference literature

  12. A comprehensive list of protein complexes • Cluster analysis for protein complexes

  13. Obtaining complexes • [Figure: TAP purifications; TAP-tagged protein (entry point); yTAP-complexes (232)]

  14. A comprehensive list of protein complexes • Assembly of individual interactions into “physiological” protein complexes • Allows interpretation and annotation of results • Manually performed for the publication in Nature • Used known complexes as a guide • Contains several inconsistencies • An automatic procedure would be beneficial • Cluster analysis • Should preserve features of real complexes • Possible clusters • Of proteins • Of purifications • Large number of clusters compared to clustering of transcription profiles

  15. Benchmarking protein complexes • There is no standard for comparing sets of complexes • How shall we treat the intricate structure of protein complexes? • Variant complexes: RSC complex, Lsm1-Lsm7 vs Lsm2-Lsm8, cyclin-dependent kinases • Megacomplexes: assemblies of complexes • Transient interactions • Definitions vary • Kinetics • Cell cycle • Functional associations

  16. Ribosomal biogenesis • From Schafer et al., EMBO Journal (2003)

  17. Clustering of proteins • Shared components are not preserved • Each protein is assigned to a single complex • Megacomplexes did not allow for a good separation • Very few data points: ~80% of the proteins have fewer than 3 identifications • Simpler approach: clustering of purifications • A purification should contain complexes already

  18. Clustering of purifications

  19. Similarity indices for comparing complexes or purifications • Dice index • Jaccard index • Geometric index • Simpson index • na, nb: number of components in group a or b • ni: number of components in the intersection
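
The four indices named on slide 19 can be sketched in Python as follows. The formulas are the commonly used definitions expressed through na, nb and ni; this is an assumption, because the transcript names the indices but carries no formulas. The function name `similarity_indices` is illustrative, and the example compares the Lsm1-7 and Lsm2-8 complexes mentioned on slide 15.

```python
def similarity_indices(a, b):
    """Return common set-similarity indices for two groups of protein names.

    Assumed (standard) definitions, with n_a = |a|, n_b = |b|, n_i = |a & b|:
      Dice      = 2 * n_i / (n_a + n_b)
      Jaccard   = n_i / (n_a + n_b - n_i)
      geometric = n_i**2 / (n_a * n_b)
      Simpson   = n_i / min(n_a, n_b)
    """
    a, b = set(a), set(b)
    n_a, n_b = len(a), len(b)
    n_i = len(a & b)
    if n_a == 0 or n_b == 0:
        # An empty group shares nothing with anything.
        return {"dice": 0.0, "jaccard": 0.0, "geometric": 0.0, "simpson": 0.0}
    return {
        "dice": 2 * n_i / (n_a + n_b),
        "jaccard": n_i / (n_a + n_b - n_i),
        "geometric": n_i ** 2 / (n_a * n_b),
        "simpson": n_i / min(n_a, n_b),
    }


# Two variant complexes that differ in one subunit each.
lsm1_7 = {f"Lsm{k}" for k in range(1, 8)}  # Lsm1 .. Lsm7
lsm2_8 = {f"Lsm{k}" for k in range(2, 9)}  # Lsm2 .. Lsm8

for name, value in similarity_indices(lsm1_7, lsm2_8).items():
    print(f"{name}: {value:.3f}")
```

With six of seven subunits shared, all four indices stay high, which illustrates why variant complexes are hard to separate on composition alone.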

  20. Experimental complications • Sensitive identification of background proteins • Ribosomal proteins • Heat shock proteins • Abundant enzymes • Filtering by class and detection frequency • Missing identifications • Differences in expression levels • Small proteins • Membrane associated proteins
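
The filtering by class and detection frequency listed on slide 20 could look roughly like the sketch below. It is not the procedure used in the screen: the function name `filter_background`, the 50% cutoff, and the toy protein names are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def filter_background(purifications, max_fraction, excluded=()):
    """Remove likely background proteins from every purification.

    purifications: dict mapping a bait name to the set of proteins identified.
    max_fraction:  proteins seen in more than this fraction of purifications
                   are treated as background (hypothetical cutoff).
    excluded:      proteins removed by class, e.g. a user-supplied list of
                   ribosomal or heat shock proteins (hypothetical input).
    """
    n = len(purifications)
    counts = Counter(p for proteins in purifications.values() for p in set(proteins))
    background = {p for p, c in counts.items() if c / n > max_fraction}
    background |= set(excluded)
    filtered = {bait: set(proteins) - background
                for bait, proteins in purifications.items()}
    return filtered, background


# Toy data: "bg" appears in 3 of 4 purifications, "p1" in 2 of 4.
toy = {
    "baitA": {"baitA", "p1", "bg"},
    "baitB": {"baitB", "p2", "bg"},
    "baitC": {"baitC", "p1", "bg"},
    "baitD": {"baitD", "p3"},
}
filtered, background = filter_background(toy, max_fraction=0.5)
print(sorted(background))
print({bait: sorted(proteins) for bait, proteins in filtered.items()})
```

A pure frequency cutoff cannot tell a sticky contaminant from a genuinely shared component, which is one reason the slide pairs it with filtering by protein class.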

  21. Refinements of similarity indices • Normalized Dice-like index (by column) • Normalized Simpson-like index • f: frequency of detection

  22. Comparing clustering results • Manually refined the MIPS and YPD complex sets for benchmark • Parameter exploration, comparing the results to the benchmark set • ~80 complexes are contained in the curated complex set • No increase when expanding beyond 250 complexes • 252 complexes from the TAP set using means clustering and a threshold of 0.3
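
As a minimal sketch of clustering purifications with a cutoff, the code below merges purifications by average linkage until the best remaining similarity drops below a threshold. Several things here are assumptions rather than the method of the talk: that "means clustering" refers to average-linkage hierarchical clustering, that 0.3 is a similarity cutoff, and that the plain Dice index is the measure. The names `dice` and `cluster_purifications` and the toy data are hypothetical.

```python
from itertools import combinations


def dice(a, b):
    """Plain Dice index between two sets (0.0 if either is empty)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a and b else 0.0


def cluster_purifications(purifications, threshold):
    """Agglomerative average-linkage clustering of purifications.

    purifications: dict mapping a purification name to its set of proteins.
    threshold:     merging stops once the best average similarity between
                   two clusters falls below this value (assumed meaning).
    """
    clusters = [[name] for name in purifications]

    def linkage(c1, c2):
        # Average pairwise similarity between members of the two clusters.
        sims = [dice(purifications[x], purifications[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # i < j is guaranteed by combinations, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


toy = {
    "bait1": {"A", "B", "C"},
    "bait2": {"A", "B", "D"},
    "bait3": {"X", "Y"},
    "bait4": {"X", "Y", "Z"},
}
print(cluster_purifications(toy, threshold=0.3))
```

Because whole purifications are grouped, a protein that occurs in several purifications can end up in several clusters, which keeps shared components visible.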

  23. Results and conclusions • Combined HMS-PCI and the TAP set • 494 clusters (a reasonable total number of complexes) • 46 of 94 identical entry points occur in the same cluster • Refining the distance index is crucial to the clustering • Future work • Clustering of proteins (bi-clustering) and classification of proteins • Different clustering algorithms • Including more information into distance measure • Bait protein • Refine benchmarking • [Krause R., et al. (2003) Bioinformatics.]

  24. Comparison of protein-protein interaction screens • Differences between individual methods and reference sets

  25. Comparison of different data sets • Biochemical purifications • Gavin et al. (2002) (TAP) • Ho et al. (2002) (HMS-PCI) • Yeast two-hybrid • Ito et al. (2000, 2001), Uetz et al. (2000) • mRNA co-expression • Eisen et al. (1998), Marcotte (2000) • In silico predictions • STRING (von Mering et al., 2003) • Synthetic lethals

  26. [Figure: interaction density between functional categories (axis labels E, G, M, P, T, B, F, O, A, R, D, C, U). Categories: energy production, amino acid metabolism, other metabolism, translation, transcription, transcriptional control, protein fate, cellular organization, transport and sensing, stress and defense, genome maintenance, cellular fate/organization, uncharacterized]

  27. Functional biases

  28. Comparison

  29. Conclusions • The overlap between the individual methods is surprisingly small • Different methods complement each other • Individual methods are not exhausted • Single experimental methods can be as reliable as combined sets • Integration • [Bader, G. and Hogue, C. (2002) Nat. Biot.] • [Kemmeren H., et al. (2002) Mol. Cell] • [Von Mering C., Krause, R., et al. (2002) Nature]
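
The overlap between interaction screens mentioned on slide 29 can be counted with a few lines of Python. This is only a sketch under stated assumptions: interactions are treated as unordered protein pairs, self-interactions are dropped, and the data set names and pairs below are invented toy values, not results from any of the cited screens.

```python
from itertools import combinations


def normalize(pairs):
    """Turn (protein, protein) tuples into a set of unordered pairs."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def pairwise_overlap(datasets):
    """Count interactions shared by each pair of data sets.

    datasets: dict mapping a data set name to an iterable of protein pairs.
    Returns a dict keyed by (name1, name2) with the number of shared pairs.
    """
    sets = {name: normalize(pairs) for name, pairs in datasets.items()}
    return {
        (m, n): len(sets[m] & sets[n])
        for m, n in combinations(sorted(sets), 2)
    }


# Invented toy data; ("B", "A") and ("A", "B") count as the same interaction.
toy = {
    "purification": [("A", "B"), ("B", "C"), ("C", "D")],
    "two_hybrid": [("B", "A"), ("D", "E")],
    "coexpression": [("C", "B"), ("A", "B"), ("E", "E")],
}
for names, shared in pairwise_overlap(toy).items():
    print(names, shared)
```

Treating pairs as unordered matters in practice, since bait and prey roles differ between screens and the same interaction may be reported in either direction.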

  30. Shared components of protein complexes • Biochemical artifacts or versatile building blocks?

  31. Shared components in the Cellzome screen • Co-activator of Pol II transcription • SAGA complex • Cytoskeleton • NuA4 histone acetylase • Chromatin remodelling • Histone deacetylase complex

  32. Motivation and approach • Artifacts or structural principle? • Relevance to medical target discovery • Target to several processes • Understanding side effects • Evolutionary insights • Study of known shared components

  33. Functional arrangements • Dihydrolipoamide dehydrogenase (Lpd1): 2-oxoglutarate dehydrogenase, glycine decarboxylase, pyruvate dehydrogenase • Common enzymatic function • RNA polymerases • Shared proteins are not the “business end” • Regulatory and structural roles • Cramer, P., et al. (2000) Science

  34. Structural arrangements for shared components • Example: Spt6 tethers the exosome to the RNA polymerase for surveillance • Example: Lsm1-7 complex and Lsm2-8 complex • Example: signaling networks • Manuscript in preparation

  35. Research interests

  36. Research interests • Improve clustering approaches • Find a sensible structure for protein complexes and their interactions • Benchmark set of protein complexes in yeast • Functional properties of protein complexes • Conquering the human proteome and experimental planning • Hypothesis-free research

  37. Thank you!

  38. Thank you!
