  1. On Proteogenomics and Its Implications for the Field of Bioinformatics. Hugo Willy

  2. What is proteogenomics? • A combination of the words Proteomics and Genomics. • Proteogenomics commonly refers to studies that use proteomic information, often derived from mass spectrometry, to improve gene annotations.

  3. A brief introduction to Protein Mass Spectrometry • It is used to characterize protein sequences. • The basic idea is to ionize proteins and let the ions “fly” in a vacuum chamber. • The mass/charge (m/z) ratio of an ion can be deduced from its Time of Flight (TOF) to a detector, or from the frequency at which it circles in a magnetic field.
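
The TOF relation on this slide can be sketched numerically. This is a minimal sketch assuming an idealized linear TOF instrument: an ion of charge z·e is accelerated through a voltage U and then drifts a length L with no further field, so z·e·U = ½·m·v² and v = L/t. The function names and the instrument parameters are illustrative assumptions, not taken from the slides.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def mz_from_tof(flight_time_s, tube_length_m, accel_voltage_v):
    """Return m/z (daltons per elementary charge) for an ideal linear TOF.

    From z*e*U = 0.5*m*v**2 and v = L/t it follows that
    m/z = 2*e*U*t**2 / L**2.
    """
    m_over_z_kg = (2 * ELEMENTARY_CHARGE * accel_voltage_v
                   * flight_time_s ** 2 / tube_length_m ** 2)
    return m_over_z_kg / DALTON


def tof_from_mz(mz_da, tube_length_m, accel_voltage_v):
    """Inverse relation: flight time in seconds for a given m/z."""
    m_over_z_kg = mz_da * DALTON
    return tube_length_m * math.sqrt(
        m_over_z_kg / (2 * ELEMENTARY_CHARGE * accel_voltage_v))


if __name__ == "__main__":
    # Hypothetical instrument: 1 m flight tube, 20 kV acceleration.
    t = tof_from_mz(1000.0, 1.0, 20000.0)
    print(f"flight time: {t:.3e} s, recovered m/z: {mz_from_tof(t, 1.0, 20000.0):.3f}")
```

Heavier ions (larger m/z) arrive later, which is why arrival time alone is enough to rank ions by m/z in this idealized setting.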

  4. A brief introduction to Protein Mass Spectrometry

  5. A brief introduction to Protein Mass Spectrometry • Some mass spectrometry techniques ionize whole proteins, but the currently popular method is to chop a protein into peptides. • The peptides are separated by their masses before ionization and sequenced independently. • The peptide sequences are mapped back to known protein sequences or used for de novo sequencing (very much like genome sequencing). • The peptide lengths, according to the people I met, are around 7-15 amino acids.
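
The "chop a protein into peptides" step can be imitated in silico. The slides do not name the enzyme; this minimal sketch assumes trypsin, which is commonly described as cleaving after K or R unless the next residue is P, and it keeps only peptides in the 7-15 residue range mentioned above. The function name and defaults are illustrative.

```python
def tryptic_digest(protein, min_len=7, max_len=15):
    """Split a protein into peptides using a simplified trypsin rule.

    Assumption: cleave after K or R, but not when the next residue is P.
    Peptides outside [min_len, max_len] are discarded.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        next_is_proline = (not is_last) and protein[i + 1] == "P"
        if is_last or (residue in "KR" and not next_is_proline):
            peptide = protein[start:i + 1]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(tryptic_digest("AAAAAAAKGGGGGGGRPLLKAA"))
```

In the toy sequence the R followed by P is not a cleavage site, and the short tail "AA" is dropped by the length filter.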

  6. The pros and cons of Protein Mass Spectrometry • Pros: • It is accurate in determining mass. • Assuming an unambiguous mapping to a protein sequence, it gives direct evidence of which proteins are translated in the cell, and therefore which mRNAs get translated and which do not. • It can be used to quantify the amount of different proteins in the sample, as opposed to predicting it from mRNA levels measured by microarray.

  7. The pros and cons of Protein Mass Spectrometry • Pros: • It can identify Post Translational Modifications, e.g. • If proteins are phosphorylated (kinase related) • If proteins are methylated or acetylated (important in the histone code) • If proteins are ubiquitinated (related to protein degradation) • It can detect (ribosomal) programmed frameshifts and alternative splicing events.

  8. The pros and cons of Protein Mass Spectrometry • Cons: • It is still expensive (but some experts at the RECOMB Satellite for Computational Proteomics said it is just as expensive as RNA-Seq). • It is hard to distinguish amino acids, or sums of amino acids, with similar mass (most notably Leucine and Isoleucine). • We do not have a reliable way to amplify proteins in the sample (a serious problem).
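
The Leucine/Isoleucine problem is easy to see from residue masses. This is a small sketch using standard monoisotopic residue masses rounded to five decimals; the helper name `peptide_mass` is illustrative. Leucine and Isoleucine are isomers, so their masses are identical, and some combinations (two Glycines versus one Asparagine) also coincide.

```python
# Monoisotopic residue masses in daltons, rounded to 5 decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    print(peptide_mass("PEPTIDE") == peptide_mass("PEPTLDE"))   # L vs I
    print(abs(peptide_mass("AGGA") - peptide_mass("ANA")))      # GG vs N
    print(abs(RESIDUE_MASS["K"] - RESIDUE_MASS["Q"]))           # close, not equal
```

Lysine and Glutamine differ by only about 0.036 Da, so separating them depends on the instrument's mass accuracy, whereas Leucine and Isoleucine cannot be separated by mass at all.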

  9. What does proteogenomics offer? • Accurate prediction of Translation Start Sites. • Accurate prediction of programmed frameshifts. • Accurate prediction of post translational modifications. • Confirmation of whether a (pseudo)gene is actually translated. • Observation: most current gene prediction algorithms are not based on proteomic data (because such data were not available).

  10. What does proteogenomics struggle with? • For a novel protein, mapping the peptides from the Mass Spectrometry experiments to the exomes/genomes (a similar problem to RNA-Seq). • Currently they try to collect exomes (regions that are assumed to be exons) and translate them in 6 different frames (3 on each DNA strand). • They also build an exon splice graph, which models the different splicing alternatives of a single gene.
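
The six-frame translation mentioned above can be sketched as follows. This is a minimal illustration assuming the standard genetic code and an input containing only A, C, G and T; any codon outside the table is rendered as "X". The function names are illustrative and not taken from any tool named in the slides.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return a dict such as {'+1': ..., '-3': ...} with all six frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCAAGTAA").items():
        print(name, protein)
```

Peptides from a spectrum can then be searched against all six strings, since the true strand and reading frame of a novel gene are not known in advance.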

  11. Exon splice graph Each box represents a single exon, and the arrows represent the possible ways of combining exons in the translated protein product. They developed a program called Inspect to search for a peptide in this graph. Can be found at
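
Searching a peptide in such a graph can be sketched as a walk along the arrows. This is not Inspect's algorithm, only a toy illustration under strong assumptions: each exon is already given as an amino-acid string (so codons split across exon boundaries are ignored), the graph is acyclic, and matching is exact. All names and the example graph are hypothetical.

```python
def _match_from(exons, edges, exon_id, offset, remaining, path):
    """Yield exon paths along which `remaining` matches, starting at
    position `offset` of exon `exon_id`."""
    segment = exons[exon_id][offset:]
    if len(remaining) <= len(segment):
        # The rest of the peptide fits inside this exon.
        if segment.startswith(remaining):
            yield path
        return
    if not remaining.startswith(segment):
        return
    # The peptide continues into a downstream exon.
    for successor in edges.get(exon_id, []):
        yield from _match_from(exons, edges, successor, 0,
                               remaining[len(segment):], path + [successor])


def find_peptide(exons, edges, peptide):
    """Return (exon_path, start_offset_in_first_exon) for every exact hit."""
    hits = []
    for exon_id, sequence in exons.items():
        for offset in range(len(sequence)):
            for path in _match_from(exons, edges, exon_id, offset,
                                    peptide, [exon_id]):
                hits.append((path, offset))
    return hits


if __name__ == "__main__":
    # Toy gene: e2 can be skipped, giving the isoforms e1-e2-e3 and e1-e3.
    exons = {"e1": "MKTAY", "e2": "IAKQR", "e3": "QISFV"}
    edges = {"e1": ["e2", "e3"], "e2": ["e3"]}
    print(find_peptide(exons, edges, "AYQIS"))
```

A hit whose path skips an exon, as in the toy example, is the kind of evidence that supports an alternative splicing event.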

  12. Current work in proteogenomics • Revising gene models, and hence their annotations. • Finding novel peptides that map to non-exonic regions: novel genes?

  13. Some papers and reviews on this field • Nitin Gupta et al. Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res 2007. • Proteogenomics: Annotating Genomes using the Proteome. Natalie Castellana. Poster in RECOMB CP 2011. • Tutorial: Proteogenomics. Natalie Castellana. • Most of the work is done by Pavel Pevzner and other groups at UC San Diego. Here is their website

  14. Comparative Proteogenomics • A branch of proteogenomics that compares proteomic data from multiple related species concurrently and exploits the homology between their proteins to improve annotations with higher statistical confidence. • In a sense, this is the approximate peptide matching problem. • However, it needs to take residue conservation at different parts of the proteins into account, e.g. sites that are post translationally modified must be preserved to maintain function.
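
The approximate peptide matching idea, together with the conservation constraint, can be sketched as below. This is a deliberately simple stand-in, assuming substitutions only (no insertions or deletions) and a caller-supplied set of peptide positions that must match exactly, for example a modified site. It is not the method of any paper cited in these slides, and the names are illustrative.

```python
def approximate_matches(peptide, protein, max_mismatches=1, conserved=()):
    """Find windows of `protein` matching `peptide` with few substitutions.

    `conserved` holds 0-based peptide positions that must match exactly.
    Returns a list of (start_in_protein, number_of_mismatches).
    """
    conserved = set(conserved)
    hits = []
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        mismatched = [i for i, (a, b) in enumerate(zip(peptide, window))
                      if a != b]
        if len(mismatched) <= max_mismatches and conserved.isdisjoint(mismatched):
            hits.append((start, len(mismatched)))
    return hits


if __name__ == "__main__":
    # Toy data: the peptide differs from the protein at peptide position 2.
    print(approximate_matches("ASTKQ", "MMASSKQLL"))
    print(approximate_matches("ASTKQ", "MMASSKQLL", conserved=[2]))
```

Marking position 2 as conserved rejects the hit, which mirrors the requirement that functionally important sites be preserved across species.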

  15. Comparative Proteogenomics • Some work in comparative proteogenomics: • Nitin Gupta et al. Comparative proteogenomics: Combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res 2008. • GenoMS (Castellana et al. MCP 2010): a program to map peptides to the genome of another, related organism.

  16. Metaproteomics • Metaproteomics (also Community Proteomics, Environmental Proteomics, or Community Proteogenomics) is the study of all proteins recovered directly from environmental samples. • This involves simultaneous mapping of peptides to all known genomes and proteomes to identify the different organisms present in a sample. • An example of work in this field: Wilmes P, Bond PL. Metaproteomics: studying functional gene expression in microbial ecosystems. Trends Microbiol. 2006.
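
The "simultaneous mapping" step can be illustrated with a toy version. This sketch assumes exact substring matching against small in-memory proteomes and counts only peptides unique to one organism as evidence for it; real pipelines are far more involved. The organism labels, sequences and function name are hypothetical.

```python
from collections import defaultdict


def assign_peptides(peptides, proteomes):
    """Map each peptide to the organisms whose proteins contain it.

    `proteomes` is a dict: organism name -> list of protein sequences.
    Only peptides found in exactly one organism are kept as evidence,
    since shared peptides cannot discriminate between organisms.
    """
    support = defaultdict(list)
    for peptide in peptides:
        owners = [organism for organism, proteins in proteomes.items()
                  if any(peptide in protein for protein in proteins)]
        if len(owners) == 1:
            support[owners[0]].append(peptide)
    return dict(support)


if __name__ == "__main__":
    proteomes = {
        "org_a": ["MKTAYIAKQR"],
        "org_b": ["MSTAYLLKQR", "GGGAKQRPP"],
    }
    print(assign_peptides(["TAYIA", "AKQR", "TAYLL", "WWWW"], proteomes))
```

The peptide shared by both toy organisms and the peptide matching neither are both left out of the result.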

  17. De Novo Novel Protein Sequencing • CSPS (Bandeira et al. Nat. Biot. 2009)

  18. Mass Spectra Database • MassBank •

  19. Discussion on the problems and possible future directions • I notice that Hoang’s problem, the one which may be able to store multiple reference genomes, is going to be very relevant. • RNA-Seq - Mass Spectrometry = Non-coding RNA? • Anything else?