
Introduction: ordering of P. knowlesi contigs v P. falciparum, methodology, progress/status




  1. Introduction
     • Ordering of P. knowlesi contigs v P. falciparum
       • methodology
       • progress/status
       • towards a synteny map – ‘true’ scaffold
     • Gene prediction
       • generating the ‘first’ proteome
       • some ‘syntenic breakpoints’ in ACT view

  2. Overview
     Contig ordering (knowlesi v falciparum): benefits and a caveat.
     Speeds:
     • Annotation via generation of pseudomolecules
     • Prefinishing
     • Dissemination of gene models via GeneDB
     • Identification of ‘syntenic breakpoints’
     • Progress towards ‘species-specific genes’
     • Generation of a predicted proteome
     • Positive impact on gene models in falciparum by identification of missed genes/exons
     CAVEAT: the current methodology assumes synteny; there is no evidence for physical linkage of contigs in the pseudomolecules. Integration of read pair data is needed to confirm linkage and generate scaffolds.

  3. [Figure: ordered contigs A against 3D7 chrA as reference; ordered contigs B against 3D7 chrB as reference]
     Read pairs can confirm or deny the physical linkage of contigs assumed by ordering.
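The read-pair check on this slide can be illustrated with a minimal sketch. It is not the pipeline used in the talk. The input format (a list of contig-name pairs, one per mate pair), the function name `check_adjacent_links`, and the `min_links` threshold are all assumptions made for illustration.

```python
from collections import Counter


def check_adjacent_links(ordered_contigs, mate_pairs, min_links=2):
    """Report read-pair support for each pair of neighbouring contigs.

    ordered_contigs: contig names in the order assumed from synteny.
    mate_pairs: (contig_of_read_1, contig_of_read_2) per read pair
                (hypothetical, pre-parsed input).
    min_links: assumed threshold for calling a join 'confirmed'.
    """
    links = Counter()
    for a, b in mate_pairs:
        if a != b:  # pairs inside one contig say nothing about joins
            links[frozenset((a, b))] += 1

    report = []
    for left, right in zip(ordered_contigs, ordered_contigs[1:]):
        n = links[frozenset((left, right))]
        report.append((left, right, n, n >= min_links))
    return report
```

A join with no supporting pairs is not necessarily wrong, since the gap may simply be longer than the insert size. It remains an assumption of the ordering until other evidence is found.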

  4. Ellen Adlam’s contig ordering script – brief methodology
     Four stages:
     1. The Pk contig set is filtered to remove contigs below 5 kb.
     2. TBLASTX is run on sections of Pk contigs against Pf chromosomes. Contigs are split into 14 groups according to the Pf chromosome of the top hit.
     3. Coordinates of hits are examined. Pk contigs are ordered relative to the ‘corresponding’ Pf chromosome.
     4. Coordinates are re-examined and Ns are inserted to represent gaps, as expected by measurement against Pf.
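The four stages can be sketched roughly as follows. This is an illustrative simplification, not the actual script. The hit tuple format, the function name `order_contigs`, and the use of a single top hit to place each contig and size each gap are assumptions. A real implementation would parse TBLASTX output and examine many hits per contig.

```python
from collections import defaultdict

MIN_CONTIG_LEN = 5_000  # stage 1 threshold from the slide (5 kb)


def order_contigs(contigs, hits):
    """Build one pseudomolecule per reference chromosome.

    contigs: dict of contig name -> sequence string.
    hits: (contig, ref_chrom, ref_start, ref_end, score) tuples standing
          in for parsed TBLASTX results (hypothetical format).
    """
    # Stage 1: drop contigs shorter than the threshold.
    kept = {n: s for n, s in contigs.items() if len(s) >= MIN_CONTIG_LEN}

    # Stage 2: keep each contig's top-scoring hit and group by chromosome.
    best = {}
    for contig, chrom, start, end, score in hits:
        if contig in kept and (contig not in best or score > best[contig][3]):
            best[contig] = (chrom, min(start, end), max(start, end), score)
    groups = defaultdict(list)
    for contig, (chrom, start, end, _score) in best.items():
        groups[chrom].append((start, end, contig))

    # Stages 3 and 4: sort by reference coordinate, pad expected gaps with N.
    pseudomolecules = {}
    for chrom, placed in groups.items():
        placed.sort()
        parts, prev_end = [], None
        for start, end, contig in placed:
            if prev_end is not None and start > prev_end:
                parts.append("N" * (start - prev_end))
            parts.append(kept[contig])
            prev_end = end if prev_end is None else max(prev_end, end)
        pseudomolecules[chrom] = "".join(parts)
    return pseudomolecules
```

Note that nothing here tests physical linkage. The order and the gap sizes are inherited entirely from the reference, which is the caveat raised in the overview.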

  5. Contigs ordered against Pf chr7. Ordering tends to fail in highly variable regions:
     • Subtelomeres
     • Internal var arrays

  6. Integration of data to inform gene models
     • Comparison of regions of synteny with falciparum
     • BLAST and FASTA
     • Gene prediction algorithms: SNAP, Projector
     • Integrate into ACT
     • EST data
     • Proteome data
     • Manual review
     • Accurate gene predictions

  7. ACT visualisation of ‘synteny’ to aid annotation

  8. Contig ordering results/estimates/next steps

     coverage   ordered    av.     gaps   gene preds
     5x         18.6 Mb    21      980    2300
     8x         23 Mb      29 kb   280    5100

     Manually reviewed models (297) for chr 6 (estimated time scale for manual review of all gene predictions: 40-70 person days, 2-3.5 months). Passed on to aid in prefinishing.
     Possible next steps:
     1. It may be possible to manually order smaller contigs into the gaps.
     2. Analyse using read pair data (sequencing and BAC end reads) to generate scaffolds (IN PROGRESS).
     3. Identify BAC clones which may be telomeric/subtelomeric by mapping end reads onto the metachromosomes.

  9. Identification of gene duplication/deletion
     [Figure: P. knowlesi v P. falciparum chr7]

  10. Gene finders: different types
      • ab initio: bases predictions on a statistical profile calculated from a training set (criteria: consensus sequence at start sites and splice junctions; sequence composition at codon and DNA level for coding, intron and non-coding sequence; intron length distribution; exon length distribution)
      • comparative: bases predictions on sequence similarity to coding sequence in a related organism and uses the statistical profile from a training set to a much lesser extent

  11. Projector
      • The precise alignment step of the algorithm needs a lot of memory, so it cannot go through an entire sequence.
      • Before we can feed it the reference and query sequence we need to:
        • align the corresponding chromosome contigs
        • identify which gene plus surrounding sequence in the annotated sequence corresponds to which section in the unannotated sequence (Ellen's script and Gene Modeller can provide some hints for this)
        • take the two linked regions in the unannotated and reference sequences and give these to Projector as input
      • It can only predict for the regions it has been told to.
      • At the moment it can only be run by the person who wrote it, but it is being calibrated and is under development for wider use.
      • It can show where it observed conservation at the sequence level for both (for untranslated, exon and intron).

  12. Exploring different gene finding tools for P. knowlesi
      • originated from the complex and slow process of manually building a training set for an unannotated organism
      • making use of an annotated relative (P. falciparum)
      • SNAP: ab initio
      • GENE MODELLER: comparative; sensitive BLAST, then tries to find start/stop/splice sites near BLAST hit ends; needs refinement
      • PROJECTOR: comparative
      • gives us a good opportunity to evaluate strengths and weaknesses of each
      • trial on an ordered contig set for knowlesi chr6 which had been annotated

  13. Sensitivity and specificity performance for single exon and multi exon genes
      [Figure panels: single exon; >1 exon]

  14. Sensitivity and specificity measured against a set of 156 manually annotated genes
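The slides do not spell out how sensitivity and specificity were scored, so the sketch below makes its own assumptions. Sensitivity is taken as the fraction of annotated genes overlapped by at least one prediction. Specificity is taken as the fraction of predictions whose coordinates exactly match an annotated gene. Genes are reduced to hypothetical `(start, end)` tuples on one contig, ignoring strand and exon structure.

```python
def overlaps(a, b):
    """True if two closed intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def gene_level_scores(annotated, predicted):
    """Return (sensitivity, specificity) under the assumptions above.

    annotated, predicted: lists of (start, end) tuples.
    """
    annotated_set = set(annotated)

    # Annotated genes found by any prediction, however imprecise.
    detected = sum(
        1 for gene in annotated if any(overlaps(gene, p) for p in predicted)
    )
    # Predictions that reproduce an annotated gene exactly.
    exact = sum(1 for p in predicted if p in annotated_set)

    sensitivity = detected / len(annotated) if annotated else 0.0
    specificity = exact / len(predicted) if predicted else 0.0
    return sensitivity, specificity
```

Under these definitions a permissive gene finder can score high on sensitivity while scoring near zero on specificity, because a loose overlap counts towards the first measure but only an exact model counts towards the second.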

  15. How well are start and stop codons predicted?

  16. Conclusions on gene prediction performance
      • Specificity: Projector (26) > SNAP (6) > Gene Modeller (0)
      • “New” Projector: 20 % (17 %) exact specificity of the gene models made
      • Sensitivity: SNAP (154) > Gene Modeller (143) > Projector (128)
      • SNAP and Gene Modeller, although not specific, are sensitive; for Gene Modeller this is due to the BLAST parameters chosen (low penalties for gap opening, extension and mismatch; word size 9)
      • Can the strengths of Gene Modeller or SNAP be combined with the specificity of Projector?

  17. Future work
      • New run of the latest contig ordering set using Projector informed with additional data as “intervals” to improve sensitivity.
