Sequencing the Gene Space of Tomato Chromosome 4. Christine Nicholson, Mapping Core Group. Karen McLaren, Finishing Group Leader. Wellcome Trust Sanger Institute. 26th July 2006. A Summary of…. Chromosome 4 sequence FISH data Fingerprint generation Map analysis. Sequence content
Sequencing the Gene Space of Tomato Chromosome 4 • Christine Nicholson, Mapping Core Group • Karen McLaren, Finishing Group Leader • Wellcome Trust Sanger Institute • 26th July 2006
A Summary of… • Chromosome 4 sequence • FISH data • Fingerprint generation • Map analysis • Sequence content • Euchromatin & heterochromatin • Data deposition • Points for discussion
Sequence Generated 124,063 bp / clone
WTSI Pipeline – Tomato Clones • BAC verification processes • HTGS: Phase 1, Phase 2, Phase 3 • 193 BACs to sequence (projected) • Aim: 80% at prefinish (Phase 1-2) by end of 2006
FISH Analysis • [Figure panels: FISH, Tomato-EXPEN2000] • Clone order in one panel: 36C23, 114C15, 308B7, 6E18, 198L24 & 119A16, 20F17 • Clone order in the other panel: 36C23, 20F17, 114C15, 198L24 & 119A16, 308B7, 6E18 • Same order in both panels: 78E4, 132O11, 53M2, 106F7 • Observations: 1. Slight variation of marker order around centromere / heterochromatin 2. Euchromatic regions within heterochromatin
Fingerprinting – SL_MboI Library • Why fingerprint? • Increase map coverage & facilitate contiguity • Fingerprint data from > 1 library • Make clones visible in FPC • Additional resource for community • > 43,000 fingerprints generated (sizes, bands and gel files) • Assessment of fingerprints: • Replicate existing LE_HBa fingerprints • BES matches to neighbouring clones in contigs • BLAST • Colony PCR verification
bTH = LE_HBa; bTM = SL_MboI • SL_MboI clones with BES match by BLAST to sequence of LE_HBa-31H5 • Sizes/bands and gel files of the new fingerprints, as well as an assembled database, are available at: ftp://ftp.sanger.ac.uk/pub/tomato/map/
Map Coverage – Chromosome 4 • Contigs with chromosome 4 markers assessed • Map information – SGN • Chromosome 4: • 57 markers in FPC • 42 FPC contigs • 9 markers anchored to singletons in FPC
FPC Contig Merging • Relatively low marker coverage in FPC (57 markers) • Ideally more chr 4 confirmation of contigs: • Screen MboI library • Genetic mapping • Further FISH • Colony PCR with BES
Finishing Solanum lycopersicum Chromosome 4 • 2,233,136 bp of sequence • HTGS Phase 3 – 461,727 bp • Features of the sequence observed so far • Outline data deposition
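As a quick worked check of the two figures above, the following sketch computes the share of generated sequence that has reached HTGS Phase 3. It assumes the 461,727 bp Phase 3 figure is a subset of the 2,233,136 bp total; the constant and function names are illustrative only.

```python
# Figures quoted on the finishing slide (base pairs).
TOTAL_SEQUENCE_BP = 2_233_136  # all chromosome 4 sequence generated so far
PHASE3_BP = 461_727            # HTGS Phase 3 portion (assumed subset of the total)


def finished_fraction(finished_bp, total_bp):
    """Return the finished share of the total as a fraction between 0 and 1."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return finished_bp / total_bp


fraction = finished_fraction(PHASE3_BP, TOTAL_SEQUENCE_BP)
print(f"Phase 3: {PHASE3_BP / 1e6:.2f} Mb, {fraction:.1%} of sequence generated")
```

Under that assumption, roughly a fifth of the sequence generated so far is at Phase 3.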
Sequence Content • > 2 Mb of sequence at various stages • Assess sequence early in the project to identify patterns • Clone set selected from different regions
Tomato Chromosome 4 FISH Clone Locations: euchromatin • Clone: LE_HBa-78E4 • [Figure: chromosome 4 diagram; legend: euchromatic region, heterochromatic region, FISHed clone; labelled features: centromere, chromomere]
LE_HBa-78E4 Euchromatin • Phase 3 – Finished • Features include: • 6 kb tandem repeat • 10 kb direct repeat • 10 known repeats (TIGR)
Tomato Chromosome 4 FISH Clone Locations: heterochromatin/centromere border • Clone: LE_HBa-308B7 • Assess possible heterochromatic features • [Figure: chromosome 4 diagram; legend: euchromatic region, heterochromatic region, FISHed clone; labelled features: centromere, chromomere]
LE_HBa-308B7 Heterochromatin/Centromere Border • HTGS Phase 2 accession • No problematic sequence features • 10 known repeats (TIGR) • Currently assessing the latest repeat data (SGN) to compare against euchromatic sequence
Tomato Chromosome 4 FISH Clone Locations: heterochromatin close to the heterochromatic/euchromatic border • 4-BAC contig: LE_HBa-27G19, LE_HBa-198L24, LE_HBa-119A16 & LE_HBa-31H5 • Contig contains marker C2_At5g37360 • [Figure: chromosome 4 diagram; legend: euchromatic region, heterochromatic region, FISHed clone; labelled features: centromere, chromomere]
Chromosome 4 - 4-BAC contig near the Heterochromatin/Euchromatin Border • 450 kb from 3 Phase 3 + 1 Phase 2 accessions • Features include: • direct repeats • inverted repeats • Phase 3 region has been annotated – 3 gene objects found • What is the gene density in the gene space?
Finishing Conclusions • ~0.5Mb of HTGS Phase 3 Finished sequence • Assessment of euchromatin and heterochromatin • Annotation feedback for density in gene space
WTSI – Data Deposition • ftp://ftp.sanger.ac.uk/pub/sequences/tomato • Clones are finished & QC checked → stable Phase 3 accession • No submission of assembly or associated quality value files
Acknowledgements • Cornell University: • Lukas Mueller • Jim Giovannoni • Steve Tanksley • Colorado State University: • Stephen Stack • Song-Bin Chang • Arizona Genomics Institute: • Rod Wing • Seunghee Lee • Wellcome Trust Sanger Institute: • Jane Rogers • Sean Humphray • Carol Scott • Helen Beasley • Sarah Sims • Matt Jones • Ratna Shownkeen • Stuart McLaren • Christine Lloyd • Jennifer Harrow • Carol Carder • Paul Hunt • Mark Maddison • Richard Clark • Kate Fraser • Violetta Steeples • Thomas Bounford • Imperial College London: • Gerard Bishop • Daniel Buchan • James Abbott • Sarah Butcher • University of Nottingham: • Graham Seymour • Scottish Crop Research Institute: • Glenn Bryan FUNDING
Discussion Points • How do we determine that the gene space has been sequenced? • Harmonise HTGS phases with NCBI • Methods for reporting clone order and orientation, e.g. TPF and AGP
TPF File • Tile Path Format file – tab-delimited flat file
GAP        type-3            ?
?          LE_HBa-24G5       ctg145
CT990489   LE_HBa-20F17      ctg145
GAP        type-3            ?
CT990488   LE_HBa-114C15     ctg5716
?          SL_MboI-143K21    ctg5716
GAP        type-3            ?
?          LE_HBa-147F16     ctg5014
CT990558   LE_HBa-308B7      ctg5014
GAP        type-3            ?
CT990624   LE_HBa-27G19      ctg15
CT476825   LE_HBa-198L24     ctg15
CT573298   LE_HBa-119A16     ctg15
CT485992   LE_HBa-31H5       ctg15
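The tile path listing above can be read with a few lines of Python. This is a minimal sketch, not a full TPF parser: it assumes each row has three whitespace-separated fields (accession, clone name, contig), that "?" marks a clone without an accession yet, and that rows starting with "GAP" mark gaps between contigs. The function names `parse_tpf` and `clones_by_contig` are hypothetical.

```python
# Rows copied from the TPF slide. The three-column reading
# (accession, clone, contig) is an assumption based on that slide.
TPF_TEXT = """\
GAP type-3 ?
? LE_HBa-24G5 ctg145
CT990489 LE_HBa-20F17 ctg145
GAP type-3 ?
CT990488 LE_HBa-114C15 ctg5716
? SL_MboI-143K21 ctg5716
GAP type-3 ?
? LE_HBa-147F16 ctg5014
CT990558 LE_HBa-308B7 ctg5014
GAP type-3 ?
CT990624 LE_HBa-27G19 ctg15
CT476825 LE_HBa-198L24 ctg15
CT573298 LE_HBa-119A16 ctg15
CT485992 LE_HBa-31H5 ctg15
"""


def parse_tpf(text):
    """Return one dict per row; gap rows get kind='gap', clones kind='clone'."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "GAP":
            rows.append({"kind": "gap", "gap_type": fields[1]})
        else:
            accession, clone, contig = fields
            rows.append({
                "kind": "clone",
                # "?" is treated as "no accession assigned yet"
                "accession": None if accession == "?" else accession,
                "clone": clone,
                "contig": contig,
            })
    return rows


def clones_by_contig(rows):
    """Group clone rows by FPC contig, keeping tile path order."""
    grouped = {}
    for row in rows:
        if row["kind"] == "clone":
            grouped.setdefault(row["contig"], []).append(row)
    return grouped


rows = parse_tpf(TPF_TEXT)
for contig, clones in clones_by_contig(rows).items():
    accessioned = sum(1 for c in clones if c["accession"] is not None)
    print(f"{contig}: {len(clones)} clones, {accessioned} with an accession")
```

Grouping by contig makes it easy to see which parts of the tile path still lack accessions.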
AGP File • Accessioned Golden Path – tab-delimited flat file • Order and alignment of Phase 3 finished accessions
chr4   1        50000    1    N   50000        clone    no
chr4   50001    100000   2    N   50000        clone    no
chr4   100001   150000   3    N   50000        contig   no
chr4   150001   200000   4    N   50000        clone    no
chr4   200001   360432   5    F   CT476825.1   1        160432   +
chr4   360433   370113   6    F   CT573298.1   2001     11681    +
chr4   370114   532277   7    F   CT485992.1   2001     164164   +
chr4   532278   582277   8    N   50000        contig   no
chr4   582278   632277   9    N   50000        clone    no
chr4   632278   682277   10   N   50000        contig   no
Gaps and unfinished clones are entered as 50,000 bp sections to more accurately represent the chromosome in each build
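The AGP rows above can be checked for internal consistency: each row's span on chr4 should equal either its gap length (type N) or the length of the component range it places (type F), and rows should follow one another without holes. The sketch below does this for the rows from the slide. The column interpretation follows the general AGP layout and should be treated as an assumption here; `parse_agp` and `check_agp` are hypothetical names.

```python
# Rows copied from the AGP slide.
AGP_TEXT = """\
chr4 1 50000 1 N 50000 clone no
chr4 50001 100000 2 N 50000 clone no
chr4 100001 150000 3 N 50000 contig no
chr4 150001 200000 4 N 50000 clone no
chr4 200001 360432 5 F CT476825.1 1 160432 +
chr4 360433 370113 6 F CT573298.1 2001 11681 +
chr4 370114 532277 7 F CT485992.1 2001 164164 +
chr4 532278 582277 8 N 50000 contig no
chr4 582278 632277 9 N 50000 clone no
chr4 632278 682277 10 N 50000 contig no
"""


def parse_agp(text):
    """Parse whitespace-separated AGP rows into dicts (gap or component)."""
    records = []
    for line in text.splitlines():
        f = line.split()
        if not f:
            continue  # skip blank lines
        rec = {"object": f[0], "start": int(f[1]), "end": int(f[2]),
               "part": int(f[3]), "type": f[4]}
        if rec["type"] == "N":  # gap row: length, gap type, linkage
            rec.update(gap_length=int(f[5]), gap_type=f[6], linkage=f[7])
        else:  # component row: accession, range within it, orientation
            rec.update(component=f[5], comp_start=int(f[6]),
                       comp_end=int(f[7]), orientation=f[8])
        records.append(rec)
    return records


def check_agp(records):
    """Return a list of human-readable problems; empty if rows are consistent."""
    problems = []
    expected_start = 1
    for number, rec in enumerate(records, start=1):
        span = rec["end"] - rec["start"] + 1
        if rec["part"] != number:
            problems.append(f"part {rec['part']}: expected part number {number}")
        if rec["start"] != expected_start:
            problems.append(f"part {rec['part']}: starts at {rec['start']}, expected {expected_start}")
        if rec["type"] == "N":
            declared = rec["gap_length"]
        else:
            declared = rec["comp_end"] - rec["comp_start"] + 1
        if span != declared:
            problems.append(f"part {rec['part']}: object span {span} != declared length {declared}")
        expected_start = rec["end"] + 1
    return problems


records = parse_agp(AGP_TEXT)
problems = check_agp(records)
finished_bp = sum(r["end"] - r["start"] + 1 for r in records if r["type"] == "F")
print("problems:", problems)
print("finished bp placed:", finished_bp)
```

A check of this kind is useful before each build, since a single mistyped coordinate shifts every row after it.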