Second Tomato Finishing Workshop: Chromosome 4 • Tomato Project Group • Wellcome Trust Sanger Institute • 25th April 2008
Chromosome 4 Introduction • Data Flow at WTSI • Sequencing Method Used • Finishing Strategies • Use of Overlapping Data • Chr4 Sequence Update • Discussion points for Workshop • Unmapped BACs • Examples of Problem Clones • Dealing with Large Repeats
UK - Chromosome 4 • Gene space estimate for Chromosome 4 is 19Mb • Mapping, sequencing and finishing at Wellcome Trust Sanger Institute (WTSI) • BAC by BAC sequencing approach • Approximately 200 BACs • Funding at WTSI ends October 31st 2008
Overview of WTSI Clone Pipeline • Mapping: clone selection and verification; BACs assigned to the chr4 sequencing project on the SGN BAC registry are entered into the pipeline • Subcloning: clone DNA prep • digest confirmation • plasmid library construction • Shotgun Sequencing (HTGS Phase 1, “Sequencing in Progress”): plasmid prep • sequencing and processing; sequence contigs >2Kb made available on the Sanger FTP site and public databases • Finishing (HTGS Phase 2): sequence improvement • contig orientation and gap closure • confirmation of assembly (QC) • Finished Sequence (HTGS Phase 3, “Complete Sequence”): final EMBL submission • sequences uploaded to SGN • BAC registry updated
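The stage order above can be modelled as an ordered list that records which HTGS phase a clone has reached at each step. This is a hypothetical Python sketch, not WTSI's pipeline software: the stage names, the `HTGSPhase` enum and the `advance` helper are illustrative, and the SGN upload and registry update are folded into the final stage.

```python
from enum import IntEnum


class HTGSPhase(IntEnum):
    """HTGS phase a clone has reached (hypothetical labels for illustration)."""
    NONE = 0     # not yet submitted
    PHASE_1 = 1  # shotgun contigs, "Sequencing in Progress"
    PHASE_2 = 2  # finishing under way
    PHASE_3 = 3  # finished, "Complete Sequence"


# Pipeline stages in order, each paired with the phase reached at that stage.
PIPELINE = [
    ("clone selection and verification", HTGSPhase.NONE),
    ("clone DNA prep", HTGSPhase.NONE),
    ("digest confirmation", HTGSPhase.NONE),
    ("library construction", HTGSPhase.NONE),
    ("plasmid prep", HTGSPhase.NONE),
    ("sequencing and processing", HTGSPhase.PHASE_1),
    ("sequence improvement", HTGSPhase.PHASE_2),
    ("contig orientation and gap closure", HTGSPhase.PHASE_2),
    ("confirmation of assembly (QC)", HTGSPhase.PHASE_2),
    ("final EMBL submission", HTGSPhase.PHASE_3),
]
STAGES = [name for name, _ in PIPELINE]
PHASE_OF = dict(PIPELINE)


def advance(stage: str) -> str:
    """Return the stage that follows `stage`; raise if it is the last one."""
    i = STAGES.index(stage)  # raises ValueError for unknown stage names
    if i == len(STAGES) - 1:
        raise ValueError(f"{stage!r} is the final stage")
    return STAGES[i + 1]
```

For example, `PHASE_OF[advance("plasmid prep")]` gives `HTGSPhase.PHASE_1`, the point at which shotgun contigs become public.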
Clone Selection and Verification • BACs selected primarily from the HindIII (LE_HBa) and MboI (SL_MboI) libraries, using seed BACs from SGN, end sequence alignment and FPC analysis • New BACs selected from in-house overgos for markers • 5 clones selected from the fosmid library, based on end sequence alignments and fingerprints
Plasmid Prep and Shotgun Sequencing • Optimised for 384-well prep and sequencing • Capillary sequencing on AB3730s with AB Big Dye Terminator • pUC118 double-stranded sequencing vector • 4-6Kb inserts, double-end sequenced • BACs (LE_HBa and SL_MboI libraries): average insert ~100-150Kb; aim for 6x-8x coverage; 2x or 3x 384-well plates per BAC; ~750 read pairs, ~1500 reads in total; average 10-15 contigs • Fosmids: average insert ~35Kb; 1x 384-well plate
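The plate counts above follow from simple coverage arithmetic: coverage is total sequenced bases divided by insert size, and each 384-well plate gives up to 768 reads when both ends of every subclone are sequenced. The sketch below assumes a mean usable read length of 700 bp and that every read passes; neither figure comes from the slide, so treat the numbers as illustrative.

```python
import math

WELLS_PER_PLATE = 384
ASSUMED_READ_LEN_BP = 700  # assumption: mean usable capillary read length


def coverage(n_reads: int, insert_bp: int, read_len_bp: int = ASSUMED_READ_LEN_BP) -> float:
    """Expected shotgun coverage: total sequenced bases / clone insert size."""
    return n_reads * read_len_bp / insert_bp


def plates_needed(insert_bp: int, target_cov: float,
                  read_len_bp: int = ASSUMED_READ_LEN_BP) -> int:
    """384-well plates needed to reach target_cov with double-end sequencing
    (two reads per subclone template; all reads assumed to pass)."""
    reads = target_cov * insert_bp / read_len_bp
    templates = reads / 2
    return math.ceil(templates / WELLS_PER_PLATE)
```

Under these assumptions a 150Kb BAC at 7x needs about 1500 reads (two plates), and a ~35Kb fosmid needs one plate, in line with the figures on the slide.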
Clone Finishing • Gap4 (Staden) used to view and manipulate sequence data • Sequence improvement • Manual finishing • QC checking
Manual Finishing of BACs: BACs Viewed in Relation to Map • BACs are viewed in relation to the mapped minimal tile path • Use in-house TPF (tiling path file) visualisation tool, e.g. ctg503
Use of Overlapping Sequences • Along the minimal tile path, the region finished in each clone depends on the order in which the clones enter finishing • Finish unique sequence with a 2000bp overlap between clones • Figure: BAC1-BAC4 on the tile path, marking total BAC insert, finished region and gap closure • Final order and orientation of finished BACs are given in the AGP file, e.g. BAC1-BAC2-BAC4-BAC3
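Why the finished region depends on entry order can be shown with a toy interval model. In this hypothetical sketch each clone has a start and end in a shared map coordinate system (1-based, inclusive), and a clone entering finishing is trimmed wherever a neighbour is already finished, keeping the 2000bp overlap. It assumes a minimal tile path in which no clone lies wholly inside a finished neighbour, and it does not write the AGP file.

```python
OVERLAP_BP = 2000  # overlap kept between finished regions of neighbouring clones


def finished_regions(tile_path, entry_order, overlap=OVERLAP_BP):
    """Region finished in each clone, given the order clones enter finishing.

    tile_path: {clone: (start, end)} in shared map coordinates (hypothetical).
    Assumes no clone is contained in an already-finished neighbour.
    """
    done = {}
    for clone in entry_order:
        lo, hi = tile_path[clone]
        for d_lo, d_hi in done.values():
            if d_lo <= lo <= d_hi:  # left end already finished by a neighbour
                lo = max(lo, d_hi - overlap + 1)
            if d_lo <= hi <= d_hi:  # right end already finished by a neighbour
                hi = min(hi, d_lo + overlap - 1)
        done[clone] = (lo, hi)
    return done
```

With two clones at 1-100,000 and 90,001-200,000, finishing BAC1 first leaves BAC2 to finish 98,001-200,000; reversing the order leaves BAC1 to finish 1-92,000. Either way the finished regions share exactly 2000bp.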
Summary of Clone Gap Closure Strategies • Make use of paired ends to order and orientate contigs • Identify whether gaps are spanned or unspanned – Orchid example • Identify any repeats associated with gaps – dotter example • Estimate gap sizes using restriction digest data • Together these determine the appropriate gap closure strategy, e.g. • primer/oligo walking into regions of low quality or gaps spanned by paired end reads • PCR and direct walking on BAC DNA into regions of low quality and unspanned gaps (also attempted on unresolved spanned gaps) • Use of alternative chemistries where appropriate • structural problems, mono- and di-nucleotide runs
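The decision logic on this slide can be written as a small rule table. This is a hypothetical Python sketch: the `Gap` fields stand in for what a finisher reads off Orchid (spanned or not), dotter (repeat type) and the trace data, and the rules paraphrase the slide rather than any WTSI software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gap:
    spanned: bool                   # a read pair crosses the gap (e.g. seen in Orchid)
    repeat: Optional[str] = None    # "direct", "inverted" or None (e.g. from dotter)
    nucleotide_run: bool = False    # mono- or di-nucleotide run at the gap
    walking_failed: bool = False    # earlier walking left the gap unresolved


def closure_strategy(gap: Gap) -> list[str]:
    """Ordered list of approaches to try, paraphrasing the slide's rules."""
    steps = []
    if gap.spanned and not gap.walking_failed:
        # Spanning subclones provide a template to walk from.
        steps.append("primer/oligo walking on spanning subclones")
    else:
        # Unspanned gaps, or spanned gaps that walking did not resolve.
        steps.append("PCR and direct walking on BAC DNA")
    if gap.repeat == "inverted" or gap.nucleotide_run:
        # Structural problems and mono-/di-nucleotide runs.
        steps.append("alternative chemistry")
    return steps
```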
Orchid: Read Pair Visualisation Tool • Contiguous sequence with good read pair coverage
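Whether a gap is spanned can be decided from read pairs alone: a gap is spanned when the two reads of a pair land in contigs on opposite sides of it. The sketch below is a simplified stand-in for what a tool like Orchid visualises; it ignores read orientation and insert-size checks, and the contig names are hypothetical.

```python
def spanned_gaps(contig_order, read_pairs):
    """Indices i of gaps (between contig_order[i] and contig_order[i + 1])
    crossed by at least one read pair.

    read_pairs: iterable of (contig_of_read1, contig_of_read2).
    """
    pos = {contig: i for i, contig in enumerate(contig_order)}
    spanned = set()
    for c1, c2 in read_pairs:
        if c1 not in pos or c2 not in pos:
            continue  # one mate lies in a contig not yet placed in the order
        a, b = sorted((pos[c1], pos[c2]))
        spanned.update(range(a, b))  # every gap between the two mates
    return spanned
```

Any gap index missing from the result is unspanned, which on the strategy slide points towards PCR and direct walking on BAC DNA rather than primer walking.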
Visualising Repeats Associated with Gaps • Examples: inverted repeat, direct repeat
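Dotter draws a dot plot of a sequence against itself: a direct repeat appears as a line parallel to the main diagonal, an inverted repeat as a line running perpendicular to it. The sketch below finds the same kind of signal with exact k-mer matches against the sequence and its reverse complement. It is a toy illustration, not dotter's scoring algorithm, and the word size `k=12` is an arbitrary choice.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string (other characters kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def repeat_hits(seq, k=12):
    """Exact k-mer matches of `seq` against itself.

    Returns (direct, inverted): lists of (i, j) with i < j, where seq[j:j+k]
    equals seq[i:i+k] (direct) or its reverse complement (inverted).
    """
    seq = seq.upper()
    index = {}
    for pos in range(len(seq) - k + 1):
        index.setdefault(seq[pos:pos + k], []).append(pos)
    direct, inverted = [], []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        direct += [(i, j) for j in index.get(word, []) if j > i]
        inverted += [(i, j) for j in index.get(revcomp(word), []) if j > i]
    return direct, inverted
```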
Restriction Digests • Minimum of three restriction enzymes used to confirm the assembly • Selection depends on organism and the nature of the sequence • S. lycopersicum BACs are digested with BamHI, EcoRI and HindIII • Comparison of real and virtual digests of the entire BAC sequence
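A virtual digest cuts the assembled sequence in silico at each enzyme's recognition site and lists the fragment sizes, which are then compared with the bands seen on the gel. The recognition sites below (BamHI G^GATCC, EcoRI G^AATTC, HindIII A^AGCTT) are standard; the rest is a simplified sketch that treats the sequence as linear (a BAC clone with its vector is circular), ignores bands too small to resolve, and uses an arbitrary 5% sizing tolerance.

```python
import re

# Recognition site and cut offset on the top strand (all three sites are palindromic).
ENZYMES = {
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def virtual_digest(seq, enzyme):
    """Fragment lengths (largest first) from cutting a linear sequence at every site."""
    site, offset = ENZYMES[enzyme]
    # Zero-width lookahead so that overlapping site occurrences are all found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]
    bounds = [0, *cuts, len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def digest_matches(virtual, observed, tol=0.05):
    """True if fragments pair up one-to-one within a relative sizing tolerance."""
    if len(virtual) != len(observed):
        return False
    return all(abs(v - o) <= tol * v
               for v, o in zip(sorted(virtual), sorted(observed)))
```

Running the same check for each of the three enzymes mirrors the slide's requirement that at least three independent digests agree with the assembly.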
Sequencing Chemistries and Additives used in Finishing • 4:1 mix ratio of AB Big Dye Terminator : AB dGTP Terminator • used for general finishing reactions, not problem specific • AB dGTP Terminator • used for di-nucleotide runs and inverted repeats • Additive A (SequenceRx Enhancer Solution A - Invitrogen) • Dimethyl sulfoxide (DMSO) • Additive A+DMSO+dGTP • used for mono-nucleotide runs, inverted repeats • Sequence Finishing Kit (SFK) (TempliPhi - Amersham) • used to increase DNA yield • useful for structural problems caused by inverted repeats
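The pairing of problem types with chemistries on this slide amounts to a lookup table. The sketch below encodes it with hypothetical problem labels; the order in which chemistries are listed for each problem is an assumption, not a documented WTSI protocol. It also shows the arithmetic of the 4:1 Big Dye : dGTP mix for an arbitrary total volume.

```python
# Problem type -> chemistries to try (hypothetical labels; order is an assumption).
CHEMISTRY_BY_PROBLEM = {
    "general": ["4:1 Big Dye:dGTP"],
    "di-nucleotide run": ["dGTP"],
    "mono-nucleotide run": ["Additive A + DMSO + dGTP"],
    "inverted repeat": ["dGTP", "Additive A + DMSO + dGTP", "SFK (TempliPhi) template"],
    "low DNA yield": ["SFK (TempliPhi) template"],
}


def chemistries_for(problems):
    """Combined suggestions for all problems at a locus, first-seen order, no duplicates."""
    plan = []
    for problem in problems or ["general"]:
        for chem in CHEMISTRY_BY_PROBLEM.get(problem, CHEMISTRY_BY_PROBLEM["general"]):
            if chem not in plan:
                plan.append(chem)
    return plan


def mix_4_to_1(total_volume):
    """Split a total volume into 4 parts Big Dye Terminator to 1 part dGTP Terminator."""
    return total_volume * 4 / 5, total_volume / 5
```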
Alternative Gap Closure Strategies • Specialist Subcloning • Small Insert Libraries (SIL) Double Stranded pUC or Single Stranded M13 • Large Insert Libraries (LIL) • Transposon Libraries (TIL) • Restriction Fragment SIL (RFSIL) • Alternative Strategies for dealing with large repeats • - points for further discussion on Tuesday • - what repeats have other chromosomes found?
Use of Misc_Feature Tags in EMBL/GenBank/DDBJ • Used regularly on finished sequence to identify regions of: • uni-directional coverage with dGTP chemistry only • single subclone coverage, including SIL-only and TIL-only regions • PCR only • single reads from direct walks on BAC DNA • data only from overlapping BACs • E. coli transposon insertion sites • Also used to tag the SP6 and T7 ends of overlaps (tomato) • and the gap sizes of forced joins in tandem repeats
Misc_Feature Tag Example • Clone end tags • Accession • Length of sequence • Whole clone finished: both ends of clone cited
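In the EMBL flat-file feature table, a misc_feature line puts the feature key in columns 6-21 and the location from column 22, with qualifiers such as /note on continuation lines. The sketch below formats one such tag; the note text is hypothetical, and it does not wrap lines longer than 80 characters, which a real submission would need.

```python
def misc_feature(start, end, note, complement=False):
    """Format one EMBL feature-table misc_feature with a /note qualifier.

    Layout: 'FT' in columns 1-2, feature key from column 6,
    location and qualifiers from column 22. No 80-column wrapping.
    """
    location = f"{start}..{end}"
    if complement:
        location = f"complement({location})"
    return "\n".join([
        f"FT   {'misc_feature':<16}{location}",
        f'FT{" " * 19}/note="{note}"',
    ])
```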
QC Check of Clone Assembly • Before submission to public databases as HTGS phase 3 (complete), all assembled BACs undergo several QC checks: • all reasonable chemistry attempts have been made for any specific problem types • all bases are above Phred 30 • orientation of paired end reads checked across the assembly • assembly is confirmed by restriction digest data • correct misc_feature tags have been used to identify any regions where appropriate • Ensures high-quality contiguous sequence with a low error rate
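The Phred 30 criterion has a direct meaning: a Phred score is Q = -10 log10(P), where P is the probability that the base call is wrong, so Q30 corresponds to an error probability of 1 in 1,000 per base. The sketch below checks per-base quality values against that threshold and sums the error probabilities to estimate the expected number of errors; reading the slide's "above Phred 30" as Q >= 30 is an assumption.

```python
def error_prob(q):
    """Phred error probability: Q = -10*log10(P)  =>  P = 10**(-Q/10)."""
    return 10 ** (-q / 10)


def passes_q30(quals, threshold=30):
    """True if every base meets the threshold (assumed to mean Q >= 30)."""
    return all(q >= threshold for q in quals)


def expected_errors(quals):
    """Expected number of miscalled bases: sum of per-base error probabilities."""
    return sum(error_prob(q) for q in quals)
```

A 1,000-base stretch at exactly Q30 therefore carries about one expected error; higher-quality bases contribute correspondingly less.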
Chromosome 4 Clone Pipeline • An additional 15 BACs finished that FISH shows are not on chromosome 4
Unmapped BACs moved from chr4 • bTH82D4 (LE_HBa082D04) moved to chr7 (on FISH map) • bTH91D14 (LE_HBa091D14) moved to chr5 (on FISH map)
Points for Discussion at Workshop • What problematic sequence have other groups encountered? • Strategies for finishing repeats used by other chromosome groups? • Unmapped BACs: any from other chromosomes?
Acknowledgements • Cornell University: • Lukas Mueller • Robert Buels • Jim Giovannoni • Steve Tanksley • Colorado State University: • Stephen Stack • Suzanne Royer • Song-Bin Chang • Arizona Genomics Institute: • Rod Wing • Seunghee Lee • MIPS/IBI Institute for Bioinformatics: • Klaus Mayer • Remy Bruggmann • Wageningen University: • Rene Klein Lankhorst • Hans de Jong • Dora Szinay • Wellcome Trust Sanger Institute: • Karen McLaren • Clare Riddle • Sean Humphray • Christine Nicholson • Carol Scott • Stuart McLaren • Matt Jones • Christine Lloyd • Sarah Sims • Karen Oliver • Jane Rogers • Imperial College London: • Gerard Bishop • Daniel Buchan • James Abbott • Sarah Butcher • University of Nottingham: • Graham Seymour • Scottish Crop Research Institute: • Glenn Bryan • FUNDING