
Second Tomato Finishing Workshop Chromosome 4

Second Tomato Finishing Workshop, Chromosome 4. Tomato Project Group, Wellcome Trust Sanger Institute, 25th April 2008. Chromosome 4 introduction: data flow at WTSI; sequencing method used; finishing strategies; use of overlapping data; Chr4 sequence update; discussion points for workshop.

gerardk

Presentation Transcript


  1. Second Tomato Finishing Workshop, Chromosome 4 • Tomato Project Group • Wellcome Trust Sanger Institute • 25th April 2008

  2. Chromosome 4 Introduction • Data Flow at WTSI • Sequencing Method Used • Finishing Strategies • Use of Overlapping Data • Chr4 Sequence Update • Discussion points for Workshop • Unmapped BACs • Examples of Problem Clones • Dealing with Large Repeats

  3. UK - Chromosome 4 • Gene space estimate for Chromosome 4 is 19Mb • Mapping, sequencing and finishing at Wellcome Trust Sanger Institute (WTSI) • BAC by BAC sequencing approach • Approximately 200 BACs • Funding at WTSI ends October 31st 2008

  4. Overview of WTSI Clone Pipeline • Clone Selection and Verification: clones entered into pipeline; mapping BACs assigned to the chr4 sequencing project on the SGN BAC registry • Shotgun Sequencing, HTGS Phase 1 ("Sequencing in Progress"): clone DNA prep; digest confirmation; library construction (plasmid subcloning); plasmid prep; sequencing & processing; sequence contigs >2 Kb available on the Sanger FTP site and public databases • Finishing, HTGS Phase 2: sequence improvement; contig orientation and gap closure; confirmation of assembly (QC) • Finished Sequence, HTGS Phase 3 ("Complete Sequence"): sequences uploaded to SGN; BAC registry updated; final EMBL submission

  5. Clone Selection and Verification • BACs selected primarily from the HindIII (LE_HBa-) and MboI (SL_MboI) libraries, using seed BACs from SGN, end-sequence alignment and FPC analysis • New BACs selected from in-house overgoes for markers • 5 clones selected from the fosmid library, based on end-sequence alignments and fingerprints

  6. Plasmid Prep and Shotgun Sequencing • Optimised for 384-well prep and sequencing • Capillary sequencing • AB3730s with AB Big Dye Terminator • pUC118 double-stranded sequencing vector • 4-6 Kb inserts, double-end sequenced • Aim for 6x-8x coverage • BACs (LE_HBa- and SL_MboI- libraries): average insert ~100-150 Kb; 2x or 3x 384-well plates per BAC; ~750 paired-end read pairs, ~1500 reads in total; average 10-15 contigs • Fosmids: average insert ~35 Kb; 1x 384-well plate
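The numbers on this slide hang together through simple shotgun arithmetic: each 384-well plate gives 384 plasmid subclones, each sequenced from both ends, so two plates give 768 templates and about 1500 reads. The sketch below turns that into an estimate of fold coverage. `shotgun_plan` is a hypothetical helper, and the usable read length is an assumption, since the slide does not state it.

```python
def shotgun_plan(insert_kb: float, plates: int, read_length_bp: int,
                 wells_per_plate: int = 384) -> dict:
    """Rough shotgun arithmetic for one clone (hypothetical helper).

    Assumes one plasmid subclone per well and two reads (one from each
    end) per subclone; read_length_bp is the usable read length, which
    the slide does not state.
    """
    subclones = plates * wells_per_plate        # e.g. 2 x 384 = 768 subclones
    reads = 2 * subclones                       # double-end sequencing
    total_bp = reads * read_length_bp           # raw sequence generated
    coverage = total_bp / (insert_kb * 1000)    # fold coverage of the insert
    return {"subclones": subclones, "reads": reads, "coverage": coverage}


# Example: a 150 Kb BAC from two 384-well plates, assuming 700 bp usable reads.
plan = shotgun_plan(insert_kb=150, plates=2, read_length_bp=700)
print(plan["subclones"], plan["reads"], round(plan["coverage"], 1))  # 768 1536 7.2
```

Changing `plates` or `read_length_bp` shows how quickly the estimate moves relative to the 6x-8x target; failed wells lower the read count in practice.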

  7. Clone Finishing • Gap4 (Staden) used to view and manipulate sequence data • Sequence Improvement • Manual Finishing • QC Checking

  8. Manual Finishing of BACs: BACs Viewed in Relation to Map • BACs are viewed in relation to the mapped minimal tile path • An in-house TPF (tiling path file) visualisation tool is used, e.g. ctg503

  9. Use of Overlapping Sequences • From the minimal tile path, the region finished in each clone depends on the order in which the clones enter finishing • Finish unique sequence with a 2000 bp overlap between clones • [Diagram: clones BAC1-BAC4 along the tile path, with legend marking gap closure, total BAC insert and finished region] • The final order and orientation of finished BACs are given in the AGP file, e.g. BAC1-BAC2-BAC4-BAC3
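A minimal sketch of the overlap rule, assuming each clone's insert can be placed as an interval on the minimal tile path (the coordinates below are hypothetical). Each clone is given whatever part of its insert is still unfinished when it enters finishing, extended by the 2000 bp overlap into neighbours that are already finished. `Clone`, `subtract` and `finished_regions` are illustrative names, not the WTSI tooling.

```python
from dataclasses import dataclass

OVERLAP_BP = 2000  # overlap kept between neighbouring finished clones (slide 9)


@dataclass(frozen=True)
class Clone:
    name: str
    start: int  # insert start on the tile path (0-based, inclusive)
    end: int    # insert end on the tile path (exclusive)


def subtract(interval, covered):
    """Return the parts of `interval` not covered by any interval in `covered`."""
    pieces = [interval]
    for cs, ce in covered:
        remaining = []
        for s, e in pieces:
            if ce <= s or cs >= e:      # no overlap: keep the piece whole
                remaining.append((s, e))
                continue
            if s < cs:                  # uncovered part to the left
                remaining.append((s, cs))
            if ce < e:                  # uncovered part to the right
                remaining.append((ce, e))
        pieces = remaining
    return pieces


def finished_regions(clones, entry_order, overlap=OVERLAP_BP):
    """Assign each clone the unique sequence still unfinished when it enters
    finishing, extended by `overlap` bp into already-finished neighbours."""
    by_name = {c.name: c for c in clones}
    done = []      # tile-path intervals already finished
    result = {}
    for name in entry_order:
        clone = by_name[name]
        pieces = subtract((clone.start, clone.end), done)
        regions = []
        for s, e in pieces:
            if any(fe == s for _, fe in done):   # finished sequence to the left
                s = max(clone.start, s - overlap)
            if any(fs == e for fs, _ in done):   # finished sequence to the right
                e = min(clone.end, e + overlap)
            regions.append((s, e))
        result[name] = regions
        done.extend(pieces)
    return result


# Hypothetical tile-path coordinates for four overlapping BACs.
example_clones = [
    Clone("BAC1", 0, 120_000),
    Clone("BAC2", 100_000, 230_000),
    Clone("BAC4", 210_000, 330_000),
    Clone("BAC3", 310_000, 440_000),
]
print(finished_regions(example_clones, ["BAC1", "BAC2", "BAC3", "BAC4"]))
```

With the entry order BAC1, BAC2, BAC3, BAC4, BAC4 only closes the gap between BAC2 and BAC3 and carries a 2000 bp overlap on each side. Entering BAC4 before BAC3 instead moves the trimmed region onto BAC3, which is the order dependence the slide describes.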

  10. Summary of Clone Gap Closure Strategies • Make use of paired ends to order and orientate contigs • Identify whether gaps are spanned or unspanned (Orchid example) • Identify any repeats associated with gaps (dotter example) • Estimate gap sizes using restriction digest data • These determine the appropriate strategy for gap closure, e.g.: • primer/oligo walking into regions of low quality, or across gaps spanned by paired-end reads • PCR and direct walking on BAC DNA into regions of low quality and unspanned gaps (also attempted on unresolved spanned gaps) • use of alternative chemistries where appropriate, for structural problems and mono- and di-nucleotide runs
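Whether a gap is spanned can be read directly off the paired ends once contigs are ordered: a gap is spanned if some subclone has one end read on each side of it. The sketch below assumes contigs are already ordered and indexed along the clone; `ReadPair`, `spanning_subclones` and `first_strategy` are hypothetical names, and the strategy choice simply follows the rule of thumb on the slide (walk on a spanning subclone, otherwise PCR or direct walking on BAC DNA).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    template: str   # plasmid subclone the two reads came from
    contig_a: int   # index (in scaffold order) of the contig holding one read
    contig_b: int   # index of the contig holding the other read


def spanning_subclones(n_contigs, pairs):
    """Map each gap (gap i lies between contig i and contig i + 1) to the
    subclones whose end reads fall on either side of it."""
    spanning = {gap: [] for gap in range(n_contigs - 1)}
    for pair in pairs:
        lo, hi = sorted((pair.contig_a, pair.contig_b))
        for gap in range(lo, hi):          # the template crosses gaps lo .. hi-1
            spanning[gap].append(pair.template)
    return spanning


def first_strategy(templates):
    """First approach per the slide: walk on a spanning subclone if there is
    one, otherwise PCR and direct walking on BAC DNA."""
    if templates:
        return "primer walk on spanning subclone(s): " + ", ".join(sorted(templates))
    return "PCR and direct walking on BAC DNA"


# Hypothetical example: four ordered contigs and three read pairs.
pairs = [ReadPair("p1", 0, 1), ReadPair("p2", 1, 0), ReadPair("p3", 2, 2)]
for gap, templates in spanning_subclones(4, pairs).items():
    status = "spanned" if templates else "unspanned"
    print(f"gap {gap}: {status} -> {first_strategy(templates)}")
```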

  11. Orchid: Read Pair Visualisation Tool • Contiguous sequence with good read pair coverage
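Orchid displays read pairs along the contigs; a rough text equivalent is to check each pair's orientation and insert size, then count how many consistent subclones cover each base. This sketch assumes reads are already placed on a single contig with start, end and strand, and treats 3-7 Kb as a plausible span for the 4-6 Kb library; those thresholds and all names are assumptions.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class Read:
    start: int    # leftmost base on the contig (0-based, inclusive)
    end: int      # rightmost base (exclusive)
    strand: str   # "+" or "-"


def template_span(r1, r2):
    """Return (start, end) of the subclone if its reads face each other
    (left read on +, right read on -), else None."""
    left, right = sorted((r1, r2), key=lambda r: r.start)
    if left.strand == "+" and right.strand == "-":
        return left.start, right.end
    return None


def read_pair_coverage(pairs, contig_len, min_insert=3000, max_insert=7000):
    """Count consistent read-pair templates over each base and list pairs
    with bad orientation or an implausible insert size."""
    diff = [0] * (contig_len + 1)   # difference array keeps this linear
    bad = []
    for name, (r1, r2) in pairs.items():
        span = template_span(r1, r2)
        if span is None or not min_insert <= span[1] - span[0] <= max_insert:
            bad.append(name)
            continue
        diff[span[0]] += 1
        diff[min(span[1], contig_len)] -= 1
    coverage = list(accumulate(diff[:contig_len]))
    return coverage, bad


# Hypothetical pairs on a 10 Kb contig.
example_pairs = {
    "ok": (Read(100, 800, "+"), Read(4500, 5200, "-")),
    "flipped": (Read(100, 800, "-"), Read(4500, 5200, "+")),
    "short": (Read(100, 800, "+"), Read(1000, 1700, "-")),
}
coverage, bad = read_pair_coverage(example_pairs, contig_len=10_000)
print(bad, coverage[100], coverage[5199], coverage[5200])
```

Runs of zero in `coverage`, or clusters of names in `bad`, point to the places worth inspecting in the viewer.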

  12. Visualising Repeats Associated with Gaps • Inverted repeat • Direct repeat
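Dotter draws a dot plot of a sequence against itself: direct repeats show up as diagonals parallel to the main diagonal, and inverted repeats as diagonals at right angles to it when reverse-complement matches are plotted. The sketch below is a simpler exact word-match version of that idea (not dotter's own scoring): it reports positions where a k-mer recurs either directly or as its reverse complement. `repeat_hits` and the choice of k=12 are assumptions.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def repeat_hits(seq, k=12):
    """Word-match dot plot of a sequence against itself.

    Returns (direct, inverted): pairs (i, j) with i < j where the k-mer at j
    equals the k-mer at i (direct) or its reverse complement (inverted).
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    direct, inverted = [], []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        direct += [(i, j) for j in index.get(word, []) if j > i]
        inverted += [(i, j) for j in index.get(revcomp(word), []) if j > i]
    return direct, inverted


# Hypothetical test sequence: a unit, a direct copy, then an inverted copy.
unit = "GATTACAGGCTTAACCGTAG"
seq = "T" * 30 + unit + "C" * 25 + unit + "A" * 25 + revcomp(unit) + "G" * 10
direct, inverted = repeat_hits(seq)
print((30, 75) in direct, (30, 128) in inverted)  # True True
```

Word-match plots grow quadratically in low-complexity sequence, so this is only meant for the few kilobases around a gap.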

  13. Restriction Digests • Minimum of three restriction enzymes used to confirm the assembly • Selection depends on organism and the nature of the sequence • S. lycopersicum BACs are digested with • BamHI • EcoRI • HindIII • Comparison of real and virtual digest of entire BAC sequence
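A virtual digest is straightforward to compute: find each recognition site, cut at the enzyme's offset, and list the fragment lengths. The sketch below covers the three enzymes named on the slide; treating the sequence as linear or circular is a parameter, and a circular digest only makes sense if the vector sequence is included. Function names are hypothetical.

```python
import re

# Recognition sequence and top-strand cut offset for the three enzymes on the
# slide (BamHI G^GATCC, EcoRI G^AATTC, HindIII A^AGCTT). All three sites are
# palindromic, so scanning one strand finds every site.
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
}


def cut_positions(seq, enzyme):
    site, offset = ENZYMES[enzyme]
    # The lookahead makes finditer report overlapping occurrences too.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def virtual_digest(seq, enzyme, circular=False):
    """Fragment sizes (bp, largest first) from cutting `seq` with `enzyme`.

    In circular mode a site that spans the sequence origin is not detected.
    """
    cuts = cut_positions(seq, enzyme)
    n = len(seq)
    if not cuts:
        return [n]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(n - cuts[-1] + cuts[0])     # fragment across the origin
    else:
        bounds = [0] + cuts + [n]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)


toy = "AAAA" + "GAATTC" + "TTTT" + "GAATTC" + "CC"
print(virtual_digest(toy, "EcoRI"))                 # [10, 7, 5]
print(virtual_digest(toy, "EcoRI", circular=True))  # [12, 10]
```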

  14. Confirm: WTSI In-house Digest Visualisation Tool

  15. In-house digest visualisation tool
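The comparison of real and virtual digests can be sketched as a matching problem between predicted fragment sizes and bands read off the gel. This is an assumption about how such a check might work, not a description of Confirm itself: the 5% tolerance and 500 bp lower size limit are illustrative, and doublet bands (two fragments of similar size) are not handled.

```python
def compare_digest(predicted, observed, tolerance=0.05, min_size=500):
    """Pair virtual and gel fragment sizes (bp) within a relative tolerance.

    Returns (matched, missing, extra): matched (predicted, observed) pairs,
    predicted fragments with no band, and bands nothing predicts.
    Fragments below `min_size` are ignored as hard to resolve on a gel.
    """
    pred = sorted((s for s in predicted if s >= min_size), reverse=True)
    obs = sorted((s for s in observed if s >= min_size), reverse=True)
    matched, missing = [], []
    for p in pred:
        # Greedy: take the largest unused band within tolerance of p.
        band = next((o for o in obs if abs(o - p) <= tolerance * p), None)
        if band is None:
            missing.append(p)
        else:
            matched.append((p, band))
            obs.remove(band)
    return matched, missing, obs


# Hypothetical sizes: one predicted fragment has no band, one band is unexplained.
matched, missing, extra = compare_digest([12000, 8000, 3000, 200], [12300, 7900, 5000])
print(matched, missing, extra)  # [(12000, 12300), (8000, 7900)] [3000] [5000]
```

Any entry in `missing` or `extra` across the three enzymes would flag the assembly for a closer look before it is confirmed.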

  16. Clone Gap Closure Strategies

  17. Sequencing Chemistries and Additives used in Finishing • 4:1 mix ratio of AB Big Dye Terminator : AB dGTP Terminator, used for general finishing reactions (not problem-specific) • AB dGTP Terminator, used for di-nucleotide runs and inverted repeats • Additive A (SequenceRx Enhancer Solution A, Invitrogen) • Dimethyl sulfoxide (DMSO) • Additive A + DMSO + dGTP, used for mono-nucleotide runs and inverted repeats • Sequence Finishing Kit (SFK) (TempliPhi, Amersham), used to increase DNA yield; useful for structural problems caused by inverted repeats
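Choosing a chemistry depends first on recognising the problem type near a gap. The sketch below only detects mono- and di-nucleotide runs with regular expressions and maps them to chemistries listed on the slide; it does not detect inverted repeats or other structural problems. The run-length thresholds and function names are assumptions.

```python
import re

# Chemistry choices as listed on slide 17. The run-length thresholds below are
# assumptions for illustration, not WTSI settings.
CHEMISTRY = {
    "general": "4:1 AB Big Dye Terminator : AB dGTP Terminator",
    "dinucleotide_run": "AB dGTP Terminator",
    "mononucleotide_run": "Additive A + DMSO + dGTP",
}


def find_runs(seq, min_mono=10, min_di_repeats=8):
    """Return (mono, di) lists of (start, end) runs in `seq`."""
    seq = seq.upper()
    mono_re = re.compile(r"([ACGT])\1{%d,}" % (min_mono - 1))
    di_re = re.compile(r"([ACGT]{2})\1{%d,}" % (min_di_repeats - 1))
    mono = [(m.start(), m.end()) for m in mono_re.finditer(seq)]
    # A long mononucleotide run also matches as "AA AA ...", so keep only
    # dinucleotide runs built from two different bases.
    di = [(m.start(), m.end()) for m in di_re.finditer(seq)
          if m.group(1)[0] != m.group(1)[1]]
    return mono, di


def suggest_chemistry(seq):
    """Map the problem type found in `seq` to a chemistry from the slide."""
    mono, di = find_runs(seq)
    if mono:
        return CHEMISTRY["mononucleotide_run"]
    if di:
        return CHEMISTRY["dinucleotide_run"]
    return CHEMISTRY["general"]


print(suggest_chemistry("GG" + "CA" * 10 + "TT"))  # AB dGTP Terminator
```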

  18. Alternative Gap Closure Strategies • Specialist subcloning: • Small Insert Libraries (SIL), double-stranded pUC or single-stranded M13 • Large Insert Libraries (LIL) • Transposon Libraries (TIL) • Restriction Fragment SIL (RFSIL) • Alternative strategies for dealing with large repeats: • points for further discussion on Tuesday • what repeats have other chromosomes found?

  19. Clone Gap Closure Strategies

  20. Use of Misc_Feature Tags in EMBL/GenBank/DDBJ • Used regularly on finished sequence to identify regions of: • uni-directional chemistry where only dGTP was used • single-subclone coverage, including SIL-only and TIL-only regions • PCR only • single reads from direct walks on BAC DNA • data only from overlapping BACs • E. coli transposon insertion sites • Also used to tag the SP6 and T7 ends of overlaps (tomato) and the gap sizes of forced joins in tandem repeats
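For reference, a misc_feature in an EMBL flat file is written on FT lines, with the feature key starting in column 6, the location starting in column 22, and qualifiers such as /note on continuation lines that also start at column 22. The helper below is a hypothetical sketch of that layout; the note text is illustrative only, since the slides do not give the wording WTSI used.

```python
def misc_feature(start, end, note, complement=False):
    """Format one misc_feature with a /note qualifier as EMBL FT lines.

    Layout: feature key in columns 6-20, location from column 22, qualifiers
    from column 22 on continuation lines. Long notes are not wrapped at 80
    columns here, which a real submission would need.
    """
    location = f"{start}..{end}"
    if complement:
        location = f"complement({location})"
    escaped = note.replace('"', '""')     # embedded quotes are doubled
    return [
        f"FT   {'misc_feature':<15} {location}",
        "FT" + " " * 19 + f'/note="{escaped}"',
    ]


# Hypothetical note wording; the actual WTSI /note text is not given on the slides.
for line in misc_feature(1, 2000, "overlap with neighbouring clone"):
    print(line)
```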

  21. Misc_Feature Tag Example: Clone End Tags • [Annotated entry labelling the accession, length of sequence, whole clone finished, and both ends of clone cited]

  22. Misc_Feature Tag Example

  23. QC Check of Clone Assembly • Before submission to public databases as HTGS Phase 3 (complete), all assembled BACs undergo several QC checks: • all reasonable chemistry attempts have been made for any specific problem types • all bases are above Phred 30 • orientation of paired-end reads is checked across the assembly • the assembly is confirmed by restriction digest data • correct misc_feature tags have been used to identify regions where appropriate • This ensures high-quality contiguous sequence with a low error rate
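The Phred 30 criterion has a direct probabilistic reading: a Phred score Q corresponds to an error probability p = 10^(-Q/10), so Q30 means at most a 1 in 1000 chance that a base is wrong. The sketch below flags runs of consensus bases below Q30 and sums the error probabilities into an expected error count; it assumes per-base consensus qualities are available as a list, and the function names are hypothetical.

```python
def error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def low_quality_regions(quals, threshold=30):
    """Return (start, end) runs of consensus bases with quality below threshold."""
    regions, start = [], None
    for i, q in enumerate(quals):
        if q < threshold and start is None:
            start = i                      # a low-quality run begins
        elif q >= threshold and start is not None:
            regions.append((start, i))     # the run ends before base i
            start = None
    if start is not None:                  # run extends to the end of the contig
        regions.append((start, len(quals)))
    return regions


def expected_errors(quals):
    """Expected number of base errors implied by the quality values."""
    return sum(error_probability(q) for q in quals)


print(low_quality_regions([40, 40, 20, 25, 40, 10]))  # [(2, 4), (5, 6)]
```

An empty list from `low_quality_regions` corresponds to the "all bases are above Phred 30" check passing.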

  24. Chromosome 4 Clone Pipeline • An additional 15 BACs were finished that are not on chromosome 4 according to FISH

  25. Unmapped BACs moved from chr4 • bTH82D4 (LE_HBa082D04) moved to chr7 (on FISH map) • bTH91D14 (LE_HBa091D14) moved to chr5 (on FISH map)

  26. Points for Discussion at Workshop • What problematic sequence have other groups encountered? • What strategies for finishing repeats have other chromosome groups used? • Unmapped BACs: are any from other chromosomes?

  27. Acknowledgements • Cornell University: • Lukas Mueller • Robert Buels • Jim Giovannoni • Steve Tanksley • Colorado State University: • Stephen Stack • Suzanne Royer • Song-Bin Chang • Arizona Genomics Institute: • Rod Wing • Seunghee Lee • MIPS/IBI Institute for Bioinformatics: • Klaus Mayer • Remy Bruggmann • Wageningen University: • Rene Klein Lankhorst • Hans de Jong • Dora Szinay • Wellcome Trust Sanger Institute: • Karen McLaren • Clare Riddle • Sean Humphray • Christine Nicholson • Carol Scott • Stuart McLaren • Matt Jones • Christine Lloyd • Sarah Sims • Karen Oliver • Jane Rogers • Imperial College London: • Gerard Bishop • Daniel Buchan • James Abbott • Sarah Butcher • University of Nottingham: • Graham Seymour • Scottish Crop Research Institute: • Glenn Bryan • FUNDING
