
Improving Genome Annotation using Proteomics

Nathan Edwards, Center for Bioinformatics and Computational Biology, University of Maryland, College Park.




  1. Improving Genome Annotation using Proteomics • Nathan Edwards • Center for Bioinformatics and Computational Biology • University of Maryland, College Park

  2. Mass Spectrometry for Proteomics • Measure mass of many (bio)molecules simultaneously • High bandwidth • Mass is an intrinsic property of all (bio)molecules • No prior knowledge required

  3. Mass Spectrometer [Diagram: Sample → Ionizer (MALDI, Electrospray Ionization (ESI)) → Mass Analyzer (Time-of-Flight (TOF), Quadrupole, Ion Trap) → Detector (Electron Multiplier (EM))]

  4. High Bandwidth [Figure: mass spectrum; x-axis m/z (250 to 1000), y-axis % Intensity (0 to 100)]

  5. Mass is fundamental!

  6. Mass Spectrometry for Proteomics • Measure mass of many molecules simultaneously • ...but not too many, abundance bias • Mass is an intrinsic property of all (bio)molecules • ...but need a reference to compare to

  7. Mass Spectrometry for Proteomics • Mass spectrometry has been around since the turn of the 20th century... • ...why is MS-based proteomics so new? • Ionization methods • MALDI, Electrospray • Protein chemistry & automation • Chromatography, Gels, Computers • Protein / genome sequences • A reference for comparison

  8. Sample Preparation for Peptide Identification • Enzymatic Digest and Fractionation

  9. Single Stage MS [Figure: MS spectrum; x-axis m/z]

  10. Tandem Mass Spectrometry (MS/MS) • Precursor selection [Figure: MS spectra; x-axis m/z]

  11. Tandem Mass Spectrometry (MS/MS) • Precursor selection + collision induced dissociation (CID) [Figure: MS and MS/MS spectra; x-axis m/z]

  12. Peptide Identification • For each (likely) peptide sequence 1. Compute fragment masses 2. Compare with spectrum 3. Retain those that match well • Peptide sequences from (any) sequence database • Swiss-Prot, IPI, NCBI’s nr, ESTs, genomes, ... • Automated, high-throughput peptide identification in complex mixtures
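The three steps on this slide can be sketched in a few lines of Python. This is only an illustration, not how Mascot, X!Tandem or OMSSA actually score: it assumes singly charged b- and y-ions, a fixed 0.5 Da tolerance, and a plain count of matched fragments. The function names and the synthetic peak list are hypothetical; the two candidate peptides are the B. anthracis N-terminal peptides that appear later in the deck.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to y-ions
PROTON = 1.00728   # charge carrier for singly charged ions


def fragment_mz(peptide):
    """Step 1: singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions + y_ions


def match_count(peptide, peaks, tolerance=0.5):
    """Step 2: count theoretical fragments with an observed peak within tolerance (Da)."""
    return sum(
        any(abs(mz - peak) <= tolerance for peak in peaks)
        for mz in fragment_mz(peptide)
    )


def best_candidates(candidates, peaks, min_matches):
    """Step 3: retain candidates that match well, best first."""
    scored = [(match_count(pep, peaks), pep) for pep in candidates]
    return sorted((s for s in scored if s[0] >= min_matches), reverse=True)


# Synthetic "spectrum": the ideal fragments of MARNR (an assumption, not real data).
peaks = fragment_mz("MARNR")
print(best_candidates(["MVMAR", "MARNR"], peaks, min_matches=5))
```

With this synthetic peak list, MARNR matches all 8 of its fragments while MVMAR matches only 2 (the shared b1 and y1 ions), so only MARNR is retained.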

  13. Peptide Identification ...can provide direct experimental evidence for the amino-acid sequence of functional proteins. Evidence for: • Functional protein isoforms • Translation start and frame • Proteins with short open-reading-frames

  14. How could this help? • Evidence for SNPs and alternative splicing stops with transcription • No genomic or transcript evidence for translation start-site. • Conservation doesn’t stop at coding bases! • Statistical gene-finders struggle with micro-exons, translation start-site, and short ORFs.

  15. What can be observed? • Known coding SNPs • Novel coding mutations • Alternative splicing isoforms • Microexons ( non-canonical splice-sites ) • Alternative translation start-sites ( codons ) • Alternative translation frames • “Dark” open-reading-frames

  16. Splice Isoform • Human Jurkat leukemia cell-line • Lipid-raft extraction protocol, targeting T cells • von Haller, et al. MCP 2003. • LIME1 gene: • LCK interacting transmembrane adaptor 1 • LCK gene: • Leukocyte-specific protein tyrosine kinase • Proto-oncogene • Chromosomal aberration involving LCK in leukemias. • Multiple significant peptide identifications

  17. Splice Isoform

  18. Novel Splice Isoform

  19. Translation Start-Site • Human erythroleukemia K562 cell-line • Depth of coverage study • Resing et al. Anal. Chem. 2004. • THOC2 gene: • Part of the heteromultimeric THO/TREX complex. • Initially believed to be a “novel” ORF • RefSeq mRNA in Jun 2007, no RefSeq protein • TrEMBL entry Feb 2005, no Swiss-Prot entry • GenBank mRNA in May 2002 (complete CDS) • Plenty of EST support • ~ 100,000 bases upstream of other isoforms

  20. Translation Start-Site

  21. Translation Start-Site

  22. Translation Start-Site

  23. Translation Start-Site

  24. Easily distinguish minor sequence variations Two B. anthracis Sterne α/β SASP annotations • RefSeq/Gb: MVMARN... (7441 Da) • CMR: MARN... (7211 Da) • Intact proteins differ by 230 Da • 7441 Da vs 7211 Da • N-terminal tryptic peptides: • MVMAR (606.3 Da), MVMARNR (876.4 Da), vs • MARNR (646.3 Da) • Very different MS/MS spectra
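The peptide masses quoted on this slide can be checked with a short calculation. The sketch below assumes the values are neutral monoisotopic masses (sum of residue masses plus one water); the helper name `peptide_mass` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per peptide (free N- and C-termini)


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide, in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in ("MVMAR", "MVMARNR", "MARNR"):
    print(pep, round(peptide_mass(pep), 1))
# MVMAR 606.3
# MVMARNR 876.4
# MARNR 646.3

# The two annotations differ by the leading Met-Val:
print(round(RESIDUE_MASS["M"] + RESIDUE_MASS["V"]))  # 230
```

The three peptide masses agree with the slide, and the extra Met and Val residues account for the 230 Da difference between the intact proteins.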

  25. Bacterial Gene-Finding • Find all the open-reading-frames... • …TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA… [Figure labels: Stop codon, Stop codon] • ...courtesy of Art Delcher

  26. Bacterial Gene-Finding • Find all the open-reading-frames... • ...but they overlap – which ones are correct? • Reverse strand: …ATCTTTTTACCGAGAAATCTATTTAAAGTACTTTTTATAACT… • Forward strand: …TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA… [Figure labels: Stop codon (reverse strand), Stop codon, Stop codon, Shifted Stop] • ...courtesy of Art Delcher
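To make the overlap problem concrete, here is a minimal sketch that lists stop-to-stop open reading frames in the three forward frames of the fragment shown on these slides. It is not Glimmer's algorithm, only the enumeration step that precedes scoring, and the function names are hypothetical. The reverse strand would be handled by running the same function on the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Forward-strand fragment from the slide.
FRAGMENT = "TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA"


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_positions(seq, frame):
    """0-based start indices of stop codons in the given frame (0, 1 or 2)."""
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def open_reading_frames(seq, min_codons=1):
    """(frame, start, end) of stop-free stretches bounded by two in-frame stops."""
    orfs = []
    for frame in range(3):
        stops = stop_positions(seq, frame)
        for left, right in zip(stops, stops[1:]):
            start, end = left + 3, right  # codons strictly between the two stops
            if (end - start) // 3 >= min_codons:
                orfs.append((frame, start, end))
    return orfs


print(open_reading_frames(FRAGMENT))
print(open_reading_frames(reverse_complement(FRAGMENT)))
```

On this fragment, frame 0 runs from the leading TAG to the trailing TGA, while the other two frames hit shifted stops part-way through. This is why a gene-finder needs a coding score to decide which overlapping candidate is real.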

  27. Coding-Sequence “Score” ...courtesy of Art Delcher

  28. Glimmer3 Performance • Glimmer3 trained & compared to RefSeq genes with annotated function • Correct STOP: 99.6% • Correct START: 84.3% • “Not all the genomes necessarily have carefully/accurately annotated start sites, so the results for number of correct starts may be suspect.”

  29. N-terminal peptides • (Protein) N-terminal peptides establish start-site of known & unexpected ORFs • Use: • Directly to annotate genomes • Evaluate and improve algorithms • Map cross-species

  30. N-terminal peptide workflows • Typical proteomics workflows sample peptides from the proteome “randomly” • Caulobacter crescentus (70%) • 3733 Proteins (RefSeq Genome annot.) • 66K tryptic peptides (600 Da to 3000 Da) • 2085 N-terminal tryptic peptides (3%)
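An in silico tryptic digest shows why N-terminal peptides are such a small share of the peptide pool: each protein yields many peptides but only the ones starting at residue 0 are N-terminal. The sketch below assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P) and allows a configurable number of missed cleavages; the function names are hypothetical. The inputs are the short B. anthracis N-terminal sequences from the earlier slide, not full proteins.

```python
def cleavage_sites(protein):
    """Indices where trypsin cuts: after K or R, unless followed by P."""
    sites = [0]
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    return sites


def tryptic_peptides(protein, missed_cleavages=1):
    """(start, sequence) pairs for all peptides with up to N missed cleavages."""
    sites = cleavage_sites(protein)
    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 2 + missed_cleavages, len(sites))
        for j in range(i + 1, last):
            peptides.append((sites[i], protein[sites[i]:sites[j]]))
    return peptides


def n_terminal_peptides(protein, missed_cleavages=1):
    """Only the peptides that begin at the protein N-terminus."""
    return [seq for start, seq in tryptic_peptides(protein, missed_cleavages) if start == 0]


print(tryptic_peptides("MVMARNR"))
print(n_terminal_peptides("MVMARNR"))  # ['MVMAR', 'MVMARNR']
print(n_terminal_peptides("MARNR"))    # ['MAR', 'MARNR']
```

A mass window such as the 600 Da to 3000 Da range on this slide would then be applied as a filter; MAR, at roughly 376 Da, would be dropped by it.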

  31. N-terminal peptide workflow • Protect protein N-terminus • Digest to peptides • Chemically modify free peptide N-term • Use chem. mod. to capture unwanted peptides • Nat Biotech, Vol. 21, pp. 566-569, 2003.

  32. Increasing N-terminal peptide coverage • Multiple (digest) enzymes: • trypsin-R: 60% (80%) • acid + lys-C + trypsin: 85% (94%) • Repeated LC-MS/MS • Precursor Exclusion / Inclusion lists • MALDI / ESI • Protein separation and/or orthogonal fractionation • Anal Chem, Vol. 76, pp. 4193-4201, 2004.

  33. Proteomics Informatics • Search spectra against: • Entire bacterial genome; • All Met initiated peptides; or • Statistically likely Met initiated peptides. • Easily consider initial Met loss PTM, too • Off-the-shelf MS/MS search engines (Mascot / X!Tandem / OMSSA)
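One way to build the "all Met initiated peptides" search space is to scan the genome for ATG codons, translate each until the first tryptic site or stop codon, and keep both the intact peptide and the variant with the initial Met removed. The sketch below is an assumption-laden illustration, not the author's pipeline: it uses the standard genetic code, considers only ATG starts on the forward strand, uses the simplified trypsin rule (cut after K or R unless followed by P), and all function names are hypothetical. The example input is the short DNA fragment from the gene-finding slides.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from(dna, start):
    """Translate from `start` up to (not including) the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_tryptic_peptide(protein):
    """N-terminal peptide: cut after the first K or R not followed by P."""
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            return protein[:i + 1]
    return protein


def met_initiated_candidates(dna):
    """Map each ATG position to (peptide, peptide with initial Met removed)."""
    candidates = {}
    for pos in range(len(dna) - 2):
        if dna[pos:pos + 3] == "ATG":
            peptide = first_tryptic_peptide(translate_from(dna, pos))
            candidates[pos] = (peptide, peptide[1:])
    return candidates


fragment = "TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA"
print(met_initiated_candidates(fragment))
# {7: ('MAL', 'AL'), 28: ('MK', 'K')}
```

The resulting peptide list could be written out as a FASTA file and searched with an off-the-shelf engine; restricting it to "statistically likely" starts would mean filtering the ATG positions with a gene-finder score first.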

  34. Other Practical Issues • Suitable for commonly available instrumentation • Only the sample prep. is (somewhat) novel. • Need living organism • Stage of life-cycle? • Bang for buck? • N-terminal peptides / $$$$

  35. Other Research Projects • Alternative splicing and coding SNPs in clinical cancer samples • MS/MS spectral matching using HMMs • Combining MS/MS search engine results using machine learning • Microorganism identification using MS (www.RMIDb.org) • Gapped/spaced seeds for inexact sequence alignment. • Applications of SBH-graphs and Eulerian paths

  36. Hidden Markov Models for Spectral Matching • Capture statistical variation and consensus in peak intensity • Capture semantics of peaks • Extrapolate model to other peptides • Good specificity with superior sensitivity for peptide detection • Assign 1000’s of additional spectra (w/ p-value < 10^-5)

  37. Peptide DLATVYVDVLK

  38. Peptide DLATVYVDVLK

  39. Acknowledgements • Catherine Fenselau, Steve Swatkoski • UMCP Biochemistry • Chau-Wen Tseng, Xue Wu • UMCP Computer Science • Cheng Lee • Calibrant Biosystems • PeptideAtlas, HUPO PPP, X!Tandem • Funding: NIH/NCI, USDA/ARS
