Proteomics technologies and protein-protein interaction. Michael Kerner (Lars Kiemer, Anders Hinsby), Center for Biological Sequence Analysis, The Technical University of Denmark. Advanced Bioinformatics – November 2006.
Outlining the problem Around 30% of human proteins still have no annotated function. Even when the function is known, we often know nothing about the bigger picture (regulation? multiple functions? pathogenesis? mutations? splice variants?). Individual proteins are about as interesting as bricks in a wall; what we want to understand is the system.
Example: signal transduction cascade [Figure: signalling network drawn across three compartments, EXTRACELLULAR, CYTOPLASM and NUCLEUS, ending in transcription. Labelled components: NCAM, FGFR, CB1, Frs2, Shc, Grb2, Sos, Ras, Raf, bRaf, Rap1, MEK, MAPK, PLC, PIP2, DAG, IP3, DAGL, 2-AG, Ca2+, PKC, PKA, cAMP, CaMKII, Fyn, Fak, GAP43, CREB, C-Fos.]
Obtaining data High-throughput data can provide information about interactions with other proteins, protein abundance in different tissues, transcriptional regulation, etc. High-throughput experimental techniques provide large data sets – thus no manual curation is possible. These data sets often contain false positives. They still miss many interactions. But combining several such data sets increases confidence and coverage.
Protein interactions reveal a lot! Knowing the interaction partners of a protein gives hints about its function. Guilt by association! Complexes in which none of the interaction partners have known functions are even more interesting.
Yeast two-hybrid screening • Has been widely used • Detects only binary interactions • High false positive rate • Proteins are always expressed as chimeras • Proteins must be able to enter the nucleus
Affinity purification • Large-scale • Can be done on any preparation of cells • Often complexes are purified and the order of binding is not obtained • An extra step is needed to identify purified proteins
Mass spectrometer: 3 principal components • Ion source: converts the analyte into gas-phase ions • Mass analyzer(s): separate gas-phase ions by m/z (shown in the figure: Q1, q2 + TOF) • Detector: ions are detected as they discharge on the detector
Mass spectrometry in short • Extremely sensitive • Mass precision of one atom • In principle, detection of one relatively short peptide allows for unambiguous identification of a protein (in practice: two or three peptides). • Proteins usually have to be digested to smaller peptides before analysis. • Some proteins are difficult to digest with proteases. • Some peptides are very difficult to ionize. • Due to the high sensitivity of the method, contamination is difficult to avoid. • Protein/peptide identification is still mostly qualitative only. • Relative (but not yet absolute) protein concentrations can be obtained with more sophisticated experimental setups.
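The digestion step in the list above can be tried out in silico. This is a minimal sketch under stated assumptions: the slides do not name a protease, so trypsin is assumed here, with the common simplified rule of cleaving after K or R unless the next residue is P. The function name `tryptic_digest` and the example sequence are made up for illustration.

```python
def tryptic_digest(sequence, min_length=1):
    """Split a protein sequence into peptides with a simplified trypsin rule.

    Assumption (not from the slides): cleave C-terminal to K or R,
    except when the following residue is P. No missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep whatever is left after the last cleavage site.
    if start < len(sequence):
        peptides.append(sequence[start:])
    # Very short peptides are rarely informative, so allow filtering them out.
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical toy sequence, not a real protein.
example_sequence = "MKWVTFRPLLKAAR"
all_peptides = tryptic_digest(example_sequence)
long_peptides = tryptic_digest(example_sequence, min_length=3)
```

The `min_length` filter reflects the point that only some peptides are useful for identification; a real workflow would also have to deal with missed cleavages and peptides that ionize poorly.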
Protein interaction databases: Spoke/Matrix [Figure: an affinity pulldown with a bait and its preys, interpreted under the spoke model and under the matrix model. Which one is the truth?]
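The two interpretations of a pulldown can be written down in a few lines. A minimal sketch, using the usual definitions: the spoke model records only bait-prey pairs, while the matrix model records every pair among the purified proteins. The function names and the protein labels A to D are placeholders.

```python
from itertools import combinations


def spoke_model(bait, preys):
    """Binary interactions between the bait and each prey only."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_model(bait, preys):
    """Binary interactions between all pairs of purified proteins."""
    members = {bait, *preys}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}


# Hypothetical pulldown: bait A brings down preys B, C and D.
bait = "A"
preys = ["B", "C", "D"]
spoke = spoke_model(bait, preys)
matrix = matrix_model(bait, preys)
only_in_matrix = matrix - spoke  # prey-prey pairs the spoke model never reports
```

With one bait and three preys the spoke model gives 3 pairs and the matrix model 6. The real set of direct contacts typically lies somewhere between the two, which is why the choice of model affects both false positives and coverage.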
Protein interaction databases: Overlap Protein interaction data: a total of 18,629 articles are represented in the databases (June 2005). *Approx. 10% of the protein-protein interactions in BIND are database imports.
Species bias in available data • A few select organisms are very well-studied, while others are not. • The BIND database, species distribution (Alfarano et al., NAR, 2005):
Trans-organism protein interaction network Orthologs? Orthologous genes are direct descendants of a gene in a common ancestor. [Figure: orthologous genes in S. cerevisiae, D. melanogaster and H. sapiens] (O'Brien K, Remm et al. 2005)
Trans-organism protein interaction network [Figure: experimental (Experim.) interaction data from D. melanogaster, C. elegans and S. cerevisiae feeding into a mosaic (MOSAIC) H. sapiens network.]
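Transferring interactions through orthologs could look roughly like this. It is a sketch, not the authors' pipeline: it assumes an ortholog table is already available as a dictionary, and all protein identifiers (`yA`, `hB1`, ...) are invented for the example.

```python
def transfer_interactions(interactions, orthologs):
    """Infer interactions in a target organism from a model organism.

    interactions: iterable of (protein_a, protein_b) observed in the model organism.
    orthologs:    dict mapping a model-organism protein to a set of target proteins.
    Pairs are inferred only when both partners have at least one ortholog.
    """
    inferred = set()
    for a, b in interactions:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                if target_a != target_b:
                    inferred.add(frozenset((target_a, target_b)))
    return inferred


# Hypothetical yeast interactions and yeast-to-human ortholog table.
yeast_interactions = [("yA", "yB"), ("yB", "yC")]
yeast_to_human = {
    "yA": {"hA"},
    "yB": {"hB1", "hB2"},  # one yeast gene with two human orthologs
    # yC has no known human ortholog, so yB-yC cannot be transferred
}
human_inferred = transfer_interactions(yeast_interactions, yeast_to_human)
```

Note how a one-to-many ortholog relationship multiplies the inferred pairs; this is one reason transferred interactions deserve lower confidence than directly observed ones.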
Repetition of experiments adds credibility Light blue connection – 1 experiment. Darker blue connection – >1 experiment, 1 organism. Purple connection – >1 experiment, >1 organism.
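The colour scheme amounts to counting independent observations per protein pair. A minimal sketch, assuming each observation is stored as a (protein, protein, organism, experiment id) tuple; that record layout and all identifiers below are hypothetical.

```python
from collections import defaultdict


def classify_edges(observations):
    """Map each protein pair to a colour based on how often it was observed.

    observations: iterable of (protein_a, protein_b, organism, experiment_id).
    Identical (organism, experiment_id) records for a pair are counted once.
    """
    evidence = defaultdict(set)
    for a, b, organism, experiment_id in observations:
        evidence[frozenset((a, b))].add((organism, experiment_id))

    colours = {}
    for pair, records in evidence.items():
        organisms = {organism for organism, _ in records}
        if len(records) == 1:
            colours[pair] = "light blue"   # 1 experiment
        elif len(organisms) == 1:
            colours[pair] = "dark blue"    # >1 experiment, 1 organism
        else:
            colours[pair] = "purple"       # >1 experiment, >1 organism
    return colours


# Hypothetical observations.
observations = [
    ("P1", "P2", "yeast", "exp1"),
    ("P2", "P3", "yeast", "exp1"),
    ("P3", "P2", "yeast", "exp2"),
    ("P1", "P4", "yeast", "exp1"),
    ("P1", "P4", "fly", "exp3"),
]
edge_colours = classify_edges(observations)
```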
Adding co-expression data Red connector – co-expression in 80 different tissues with a correlation coefficient above 0.7. Grey nodes – no expression data available.
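The co-expression criterion is a thresholded correlation between two expression profiles. A minimal sketch, assuming the Pearson correlation coefficient (the slide does not say which coefficient was used) and toy profiles over 4 tissues instead of 80; the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def coexpressed(profile_a, profile_b, threshold=0.7):
    """True/False for a red connector; None if expression data is missing."""
    if profile_a is None or profile_b is None:
        return None  # corresponds to a grey node
    return pearson(profile_a, profile_b) > threshold


# Hypothetical expression levels across 4 tissues.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises together with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # anti-correlated with gene_a
```

Here `coexpressed(gene_a, gene_b)` is `True` and `coexpressed(gene_a, gene_c)` is `False`; a missing profile yields `None` rather than a guess.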
Nucleolus dynamics Nodes are coloured according to level of protein in the nucleolus following transcriptional inhibition (Andersen et al., Nature, 2005). Legend: relative level of protein in the nucleolus after inhibition of transcription: increased, unchanged, decreased.
Adding up to make high quality associations Integration of various data sources builds up confidence
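One simple way to see how independent evidence adds up is a noisy-OR combination. This is an illustrative assumption, not the scoring scheme used for the slides: each data source is given a made-up probability that the interaction is real, the sources are treated as independent, and the combined confidence is one minus the probability that all of them are wrong.

```python
def combine_confidence(scores):
    """Noisy-OR combination of per-source confidence scores in [0, 1].

    Assumes the sources are independent (an idealisation).
    """
    p_all_wrong = 1.0
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie between 0 and 1")
        p_all_wrong *= 1.0 - score
    return 1.0 - p_all_wrong


# Hypothetical scores for one protein pair from three kinds of evidence:
# a pulldown, an interaction transferred from another species, co-expression.
single_source = combine_confidence([0.5])
three_sources = combine_confidence([0.5, 0.4, 0.3])
```

With these made-up numbers, one source gives 0.5 while three weaker, agreeing sources give about 0.79. Correlated sources would violate the independence assumption and overstate the combined score.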
Identifying functional complexes [Figure: network clusters labelled TFIID, DNA repair, SMARCA complex, Arp2/3 and Ribosome (predominantly 60S).]
Summary • Protein-protein interactions can reveal hints about the function of a protein (guilt by association). • Information about protein interactions is obtained with different technologies, each with its own advantages and weaknesses. • Due to the high degree of systemic conservation, interactions can be inferred from observed interactions in other species. • Data are always error-prone. Repeated observations build up confidence. • Integrating different types of data can further build up confidence.