Peptide-assisted annotation of the Mlp genome

Peptide-assisted annotation of the Mlp genome. Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin. Objective: use peptide libraries to validate the in silico prediction of gene models. Assumption: "if a peptide (protein) is detected, then there must be a gene that encodes it".


Presentation Transcript


  1. Peptide-assisted annotation of the Mlp genome • Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin

  2. Objective • Use peptide libraries to validate the in silico prediction of gene models • Assumption: "if a peptide (protein) is detected, then there must be a gene that encodes it" • Mapping peptides onto a translated genome sequence provides the "correct frames of translation"
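The mapping idea on this slide can be illustrated with a minimal Python sketch: translate a DNA sequence in all six reading frames and report the frame in which a detected peptide occurs. The function names and the toy sequence are illustrative assumptions, not the authors' pipeline; only the standard genetic code is used.

```python
# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(dna, peptide):
    """List (frame, position in translated frame) for every frame containing the peptide."""
    return [
        (frame, protein.find(peptide))
        for frame, protein in six_frames(dna).items()
        if peptide in protein
    ]


# Toy example (hypothetical sequence): the peptide MAKW is encoded in frame +2.
toy_dna = "CATGGCTAAATGGTAA"
hits = find_peptide(toy_dna, "MAKW")
```

A peptide found in exactly one frame points to the reading frame that a gene model at that locus should use.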

  3. Methodology (hardware)
     • Workflow: urediniospores (3729) → protein extraction → 1D SDS-PAGE → gel slicing (64) → trypsin digestion → elution → LC-MS/MS → peptide MS/MS data acquisition → bioinformatics
     • Instruments: Waters MassPREP station; LTQ (ThermoElectron)

  4. Methodology (bioinformatics)
     • Protein databases built from the Gene catalog (16694 GM) and from the 6-frame translation of the genome
     • Spectral identification by sequence database searching of each database (Mascot, Sequest)
     • Statistical validation of peptide identifications
     • 1. Comparison of results from both databases
     • 2. Comparison of peptides and GM (validation/correction of genome annotations)

  5. MLP proteomic results so far
     • 691 000 MS/MS spectra obtained from the total proteins
     • Unique peptides, Gene catalog db (Mascot + Sequest): 10980
     • Unique peptides, 6-frame translation db (Mascot only): 4699 matching GM of the Gene catalog, 352 not matching
     • False discovery rate below 1.6%
     • The 352 unique peptides obtained from the 6-frame translation db that do not match any GM of the Gene catalog are candidates for correcting the annotation
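The split of the 6-frame peptides into those that match the Gene catalog and those that do not can be sketched as a simple substring test. This is a hedged illustration: the function name and toy sequences are assumptions, and a real comparison would likely also handle isobaric residues (I/L) and scoring thresholds.

```python
def split_by_catalog_match(peptides, catalog_proteins):
    """Partition unique peptides by whether they occur in any catalog protein.

    peptides: iterable of peptide strings identified against the 6-frame db.
    catalog_proteins: iterable of protein sequences from the gene catalog.
    Returns (matched, unmatched) as two sets.
    """
    catalog_proteins = list(catalog_proteins)
    matched, unmatched = set(), set()
    for peptide in set(peptides):
        if any(peptide in protein for protein in catalog_proteins):
            matched.add(peptide)
        else:
            unmatched.add(peptide)
    return matched, unmatched


# Toy data (hypothetical sequences, not from the Mlp genome).
catalog = ["MAKWLLPEPTIDER", "MSTNQK"]
six_frame_peptides = ["LLPEPTIDER", "STNQK", "GHIKLM", "STNQK"]
matched, unmatched = split_by_catalog_match(six_frame_peptides, catalog)
```

With the figures on the slide, the two groups would hold 4699 and 352 peptides respectively, 5051 in total.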

  6. Peptide frequency distribution on GM • The 10980 + 4699 peptides represent assignments for nearly 10% of the Gene catalog, i.e. 1659 GM • Mean: 9 peptides covering 134 AA per GM • [Histogram: No. gene models (y-axis, 0 to 300) against No. peptides per gene model (x-axis, 0 to 79)]
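As a rough sketch of how such a distribution could be tabulated, the Python below counts unique peptides per gene model and checks the "nearly 10%" figure from the counts on the slide (1659 of 16694 GM). The input format, a list of (peptide, gene model ID) pairs, is an assumption made for illustration.

```python
from collections import Counter


def peptides_per_gene_model(assignments):
    """Count unique peptides per gene model from (peptide, gm_id) pairs."""
    unique_pairs = set(assignments)
    return Counter(gm_id for _, gm_id in unique_pairs)


def frequency_histogram(counts):
    """Map 'number of peptides' to 'number of gene models with that many'."""
    return dict(sorted(Counter(counts.values()).items()))


def coverage_percent(n_validated, n_total):
    """Percentage of gene models with at least one peptide assigned."""
    return 100.0 * n_validated / n_total


# Toy assignments (hypothetical IDs); the duplicate pair is counted once.
toy = [("PEPA", "gm1"), ("PEPB", "gm1"), ("PEPA", "gm1"), ("PEPC", "gm2")]
counts = peptides_per_gene_model(toy)
histogram = frequency_histogram(counts)

# Figures from the slide: 1659 gene models out of 16694 in the catalog.
slide_coverage = coverage_percent(1659, 16694)
```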

  7. Automated classification of peptides with no hit (352) on the Gene catalog
     • 5’ extension of a predicted GM: if peptide(s) located within the 1000 bp upstream of the predicted GM start codon
     • 3’ extension of a predicted GM: if peptide(s) located within the 1000 bp downstream of the predicted GM stop codon
     • 5’ and 3’ extension of a predicted GM: if peptides located within the 1000 bp upstream of the start codon and within the 1000 bp downstream of the predicted GM stop codon
     • Internal extension of a predicted GM: if peptide(s) located in the GM
     • New GM: if no predicted GM in the vicinity of the peptide(s)
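The rules above can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' script: the `Feature` class, the requirement that peptide and GM lie on the same strand, the treatment of any overlap with the GM as "internal", and the precedence of flanking hits over internal ones are all assumptions. For brevity the peptides are compared with a single neighbouring GM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    start: int   # 1-based genomic coordinate, inclusive, start <= end
    end: int
    strand: str  # "+" or "-"


def relation(peptide, gm, window=1000):
    """Return 'internal', 'upstream', 'downstream' or None for one peptide."""
    if peptide.strand != gm.strand:          # assumption: same strand required
        return None
    if peptide.start <= gm.end and peptide.end >= gm.start:
        return "internal"                    # assumption: any overlap counts
    on_left = gm.start - window <= peptide.start and peptide.end < gm.start
    on_right = gm.end < peptide.start and peptide.end <= gm.end + window
    if on_left:
        # On the minus strand the lower coordinates lie past the stop codon.
        return "upstream" if gm.strand == "+" else "downstream"
    if on_right:
        return "downstream" if gm.strand == "+" else "upstream"
    return None


def classify(peptides, gm, window=1000):
    """Apply the slide's five categories to a group of peptides near one GM."""
    relations = {relation(p, gm, window) for p in peptides} - {None}
    if not relations:
        return "new GM"
    if {"upstream", "downstream"} <= relations:
        return "5' and 3' extension"
    if "upstream" in relations:
        return "5' extension"
    if "downstream" in relations:
        return "3' extension"
    return "internal extension"


# Toy coordinates (hypothetical).
gm_plus = Feature(5000, 8000, "+")
gm_minus = Feature(5000, 8000, "-")
```

For example, `classify([Feature(4500, 4530, "+")], gm_plus)` falls in the 1000 bp window before the start codon and is reported as a 5' extension, while the same coordinates on the minus-strand GM would lie after its stop codon.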

  8. Corrections/additions to the Gene catalog • Mapping onto the genome the peptides with no hit on the Gene catalog allowed the following modifications • Total: 172

  9. Manual curation – Internal extension

  10. Manual curation – Internal extension • EuGene’s prediction is OK

  11. Manual curation – New GM

  12. Manual curation – New GM

  13. Summary – Peptide-assisted genome annotation
      • With few resources (6000 $ worth of materials and services, and a few weeks' worth of labour), our proteomic analysis:
      • Validated 10% of the predicted GM
      • Corrected/found > 170 GM
      • According to the manual curation accomplished so far, it appears that EuGene had predicted most of the > 170 corrected/found GM

  14. Perspectives
      • Analysing the Sequest output obtained from the 6-frame translation: 5051 peptides identified with Mascot (352 with no hit on the Gene catalog); Sequest: ?
      • A quantitative proteomic approach (iTRAQ) will be used to compare protein complexes from urediniospores, germinated urediniospores and haustoria

  15. Available material • Our set of peptide spectra from urediniospore proteins is available to validate new GM predictions • The peptide GFF files will be made available to the Melampsora community
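For readers unfamiliar with the format, a peptide mapped to the genome could be written as a GFF3 record along the lines of the sketch below. The nine tab-separated columns follow the GFF3 layout; the scaffold name, the `source` value, the `polypeptide_region` feature type and the attribute naming are assumptions for illustration and may differ from the files the authors distribute.

```python
def peptide_to_gff3(seqid, start, end, strand, peptide,
                    source="Mascot", feature_type="polypeptide_region"):
    """Format one mapped peptide as a GFF3 line (1-based, inclusive coordinates)."""
    if start > end:
        raise ValueError("start must not exceed end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Attribute names ID and Name are standard GFF3 tags; the ID scheme is made up.
    attributes = f"ID=pep_{seqid}_{start}_{end};Name={peptide}"
    columns = [seqid, source, feature_type, str(start), str(end),
               ".", strand, ".", attributes]   # score and phase left undefined
    return "\t".join(columns)


def write_gff3(records):
    """Return GFF3 text (version header plus one line per record tuple)."""
    lines = ["##gff-version 3"]
    lines.extend(peptide_to_gff3(*record) for record in records)
    return "\n".join(lines) + "\n"


# Hypothetical example: a 14-residue peptide spanning 42 nt on scaffold_1.
line = peptide_to_gff3("scaffold_1", 120, 161, "+", "MAKWLLPEPTIDER")
text = write_gff3([("scaffold_1", 120, 161, "+", "MAKWLLPEPTIDER")])
```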

  16. Finding the peptides on the different model prediction sets • [Table columns: Model prediction set | Total GM | GM validated | %] • Do we need to perform a new spectra search on all the model prediction sets?
