The Zebrafish Genome Sequencing Project Bioinformatics resources

Kerstin Howe, Mario Caccamo, Ian Sealy. Outline: clone mapping, sequencing and manual annotation; genome assemblies and automated annotation; integrated ZF-Models data and tools.


Presentation Transcript


  1. The Zebrafish Genome Sequencing Project: Bioinformatics resources. Kerstin Howe, Mario Caccamo, Ian Sealy

  2. Bioinformatics resources
     • outline
       • clone mapping, sequencing and manual annotation
       • genome assemblies and automated annotation
       • integrated ZF-Models data and tools

  3. Clone mapping and sequencing
     • mapping
       • 2 BAC Tuebingen libraries
       • 1 BAC and 1 cosmid library from a single Tuebingen double-haploid fish
       • end sequencing, RH mapping, fingerprinting
       • pieced together according to fingerprints, marker mapping, sequence alignment
       • currently ~2500 ctgs

  4. Clone mapping and sequencing
     • sequencing pipeline
       • select clones based on position in fpc contig
       • subcloning
       • sequencing
       • automatic assembly/pre-finishing (back to sequencing if necessary)
       • finishing
       • QC
       • automated analysis pipeline
       • manual annotation
       • submission to EMBL

  5. Manual annotation (diagram labels: unfinished sequence, finished sequence, automated analysis pipeline, manual annotation, EMBL)
     • automated analysis pipeline
       • RepeatMasker
       • CpG island prediction
       • Genscan
       • FGenesh
       • halfwise (Pfam)
       • EPCR
       • Blast (ESTs, cDNAs, proteins)
     • manual annotation
       • gene structures
       • remarks (gene names, function, similarities)
       • other features
     • otter
       • mysql database in 'ensembl style'
       • acedb or apollo front end
       • open to users from the 'outside'

  6. Manual annotation
     • annotation policy
       • follows guidelines for human annotation (havana team, Sanger Institute)
       • no "guesses", annotations solely based on supporting evidence
     • annotation of:
       • CDSs and UTRs / transcripts
       • splice variants
       • pseudogenes
       • poly A features
       • transposons
       • repeats
     • approved nomenclature (SI:clone.number)
     • collaboration with ZFIN
       • existing ZFIN records are reported
       • ZFIN provides new records for newly found genes

  7. Manual annotation (figure; track labels: DNA, CpG island, repeats, Genscan, FGenesH, proteins, ESTs, mRNAs)

  8. vega.sanger.ac.uk

  9. Vega contigview

  10. Vega geneview

  11. www.sanger.ac.uk/Projects/D_rerio

  12. www.sanger.ac.uk/Projects/D_rerio

  13. When to use what
      • go to vega.sanger.ac.uk if you need
        • highly reliable sequence
        • highly reliable annotation (with your input)
        • ‘your gene’ stable over time (TILLING)
      • go to www.ensembl.org if you need
        • the whole genome
        • comparative data
        • ZF-Models microarray or insertional mutagenesis data
        • complicated searches (BioMart)

  14. Zebrafish Genome Project (flow diagram; labels as captured: clone libraries, markers (T51), sequencing, tile path BACs, map, WGS assembly, fpc ctg, contig, supercontig, 1.63 Gb, contigs, finish clone, clones+ctgs, whole genome shotgun sequencing, clone mapping and sequencing, WGS reads, integration, (un)finished clones, assembly release (Zv5), ~8,000 finished clones (~1 Gb), automatic annotation, manual annotation)

  15. WGS assembly
      • Phusion assembler - High Performance Assembly Group (Zemin Ning et al.)
      • diagram stages: reads, group reads, contigs (phrap), supercontigs (read-pair tracker)
      • contigs A, B, C are ordered by the read-pair tracker; a gap between contigs in a supercontig is shown as NNNNNNNN
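The supercontig step on slide 15 (contigs ordered by read pairs, with gaps written as runs of N) can be illustrated with a minimal sketch. This is not the Phusion or read-pair tracker implementation, which the slides do not describe; the function names `estimate_gap` and `build_supercontig`, the coordinate convention, and the single-read-pair gap estimate are all assumptions made for illustration.

```python
def estimate_gap(insert_size, left_len, left_read_start, right_read_end):
    """Estimate the gap between two contigs from one spanning read pair.

    Assumed convention (hypothetical, for illustration only):
    - insert_size: expected distance between the outer ends of the pair
    - left_read_start: 0-based start of the forward read on the left contig
    - right_read_end: end coordinate of the reverse read on the right contig
    """
    covered_left = left_len - left_read_start   # bases of the insert inside the left contig
    covered_right = right_read_end              # bases of the insert inside the right contig
    return insert_size - covered_left - covered_right


def build_supercontig(contigs, gap_sizes, min_gap=1):
    """Join ordered contigs into one sequence, padding each gap with N.

    Gap estimates that are zero or negative are replaced by min_gap so
    that the contig boundary stays visible in the output.
    """
    if not contigs:
        raise ValueError("need at least one contig")
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * max(gap, min_gap))
        parts.append(contig)
    return "".join(parts)


if __name__ == "__main__":
    gap = estimate_gap(insert_size=2000, left_len=5000,
                       left_read_start=4200, right_read_end=700)
    print(gap)
    print(build_supercontig(["ACGT", "GGCC"], [3]))
```

A real scaffolder would combine many read pairs per contig pair and resolve contig orientation; the sketch only shows how an ordered set of contigs and gap estimates turns into an N-padded sequence.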

  16. Read grouping
      • word distribution (plot of frequency against k-mer occurrence; labels: seq. errors, repeats, ~7)
      • k-mer word hashing
        • continuous base hash - k=12
          ATGGCGTGCAGTCCATGTTCGGATCA
          ATGGCGTGCAGT
           TGGCGTGCAGTC
            GGCGTGCAGTCC
             GCGTGCAGTCCA
        • gap hash k=12 (4x3) - dealing with variation
          ATGGCGTGCAGTCCATGTTCGGATCA
          ATGGCGTGCAGTCCATGT
           TGGCGTGCAGTCCATGTT
            GGCGTGCAGTCCATGTTC
             GCGTGCAGTCCATGTTCG
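The two hashing schemes on slide 16 can be sketched in a few lines. The continuous hash follows the slide directly (overlapping 12-base words). For the gap hash the slide only gives "k=12 (4x3)" and 18-base windows, so the sampling pattern used here (four blocks of three bases, each separated by two skipped bases) is an assumption that happens to fit those numbers, not a confirmed description of Phusion. The function names and the `low`/`high` thresholds are hypothetical.

```python
from collections import Counter


def contiguous_kmers(seq, k=12):
    """All overlapping k-base words in seq (the continuous base hash)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def gapped_kmers(seq, blocks=4, block_len=3, gap_len=2):
    """Words built from `blocks` runs of `block_len` bases, skipping
    `gap_len` bases between runs (assumed pattern: 12 bases sampled
    from an 18-base window with the default arguments)."""
    offsets = [b * (block_len + gap_len) + j
               for b in range(blocks) for j in range(block_len)]
    span = offsets[-1] + 1  # window length covered by one word
    return ["".join(seq[i + o] for o in offsets)
            for i in range(len(seq) - span + 1)]


def words_in_band(reads, k=12, low=2, high=20):
    """Keep words whose occurrence count lies in [low, high].

    Very rare words are likely sequencing errors and very frequent
    words are likely repeats; the thresholds are illustrative only.
    """
    counts = Counter(w for read in reads for w in contiguous_kmers(read, k))
    return {w for w, n in counts.items() if low <= n <= high}


if __name__ == "__main__":
    read = "ATGGCGTGCAGTCCATGTTCGGATCA"  # example sequence from the slide
    print(contiguous_kmers(read)[:4])
    print(gapped_kmers(read)[:2])
```

Reads that share enough in-band words can then be placed in the same group before assembly; because a gapped word skips some positions, a single mismatch in a skipped base still yields the same word, which is one way to read the slide's "dealing with variation".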

  17. Zebrafish Genome Project (flow diagram; labels as captured: clone libraries, markers (T51), sequencing, map, WGS assembly, whole genome shotgun sequencing, clone mapping and sequencing, WGS reads, integration, (un)finished clones, assembly release (Zv5), ~7,000 finished clones (~1 Gb), automatic annotation, manual annotation)

  18. Integration (diagram)
      • evidence labels: cDNA, WGS supercontig, bacends, marker, BACs, fpc contig
      • example layout: Zv5 scaffoldn.1, BX005153, Zv5 scaffoldn.3, BX005057.8, Zv5 scaffoldn.5, BX005049.6, BX005123.6, Zv5 scaffoldn.7
      • BACs: BX005049.6, BX005123.6, BX005153, BX005057.8
      • result: Zv5 scaffoldn

  19. Assemblies

  20. Automatic Annotation (pipeline diagram; labels as captured: Zebrafish Proteins, Other Proteins, Zebrafish cDNAs, Zebrafish ESTs, Genewise, Exonerate, Exonerate, Genewise genes, Aligned cDNAs, Aligned ESTs, ClusterMerge, Genewise genes with UTRs, Supported ab initio (optional), Genebuilder, Ensembl EST genes, Final set)

  21. Ensembl

  22. Contigview

  23. Geneview

  24. Searching Ensembl

  25. BioMart: start, filter, output

  26. Do's and Don'ts
      • go elsewhere (Ensembl) if you
        • want to know about the whole genome
        • need comparative data
        • need ZF-Models microarray or insertional mutagenesis data
        • need to do complicated searches
      • go to Vega if you
        • need highly reliable sequence
        • need highly reliable annotation
        • need ‘your gene’ stable over time (TILLING)

  27. DAS (architecture diagram; labels: DAS client (genome browser, local storage), three DAS servers each with remote storage, reference sequence, XML)
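Slide 27 shows a DAS client (the genome browser) collecting XML from several DAS servers. As a rough illustration of the client side, the sketch below parses a features document into plain dictionaries. The XML is a hand-written sample modelled loosely on a DAS features response; the server URL, feature IDs and coordinates are hypothetical, and the element names should be checked against the DAS specification before being relied on. No network request is made.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT output from a real server (hypothetical values).
SAMPLE_DASGFF = """
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/hypothetical_source/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="feat1" label="example exon">
        <TYPE id="exon">exon</TYPE>
        <START>1200</START>
        <END>1350</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="feat2" label="example repeat">
        <TYPE id="repeat">repeat</TYPE>
        <START>1500</START>
        <END>1720</END>
        <ORIENTATION>-</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""


def parse_features(xml_text):
    """Return one dict per FEATURE element, tagged with its segment id."""
    root = ET.fromstring(xml_text.strip())
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "strand": feature.findtext("ORIENTATION"),
            })
    return features


if __name__ == "__main__":
    for f in parse_features(SAMPLE_DASGFF):
        print(f)
```

A client in the sense of the slide would run this kind of parsing on the response from each server and overlay the resulting features on the shared reference sequence.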

  28. SNPs and Indels

  29. Ensembl releases
