The progress of Glossina genomics at RIKEN GSC
Todd Taylor taylor@gsc.riken.jp
RIKEN Genomic Sciences Center, Yokohama, Japan (on behalf of Masahira Hattori)
December 15, 2006, IGGI, Sanger, UK
 
                
Background
• Sequencing and analysis of human chromosomes
  • 11, 18 and 21
  • Contributed about 4-5% of human genome sequence
• Sequencing and analysis of chimpanzee genomic regions, including
  • Whole-genome BAC-end sequence analysis
  • Chimpanzee chromosome 22
    • Found differences (mostly minor) in nearly all of the coding genes between human and chimp
  • Chimpanzee Y chromosome
• Development of novel methods for gene and promoter prediction
  • Identifying genes missed by other high-throughput methods
  • Identification of unique regulatory mechanisms
Phase III sequence-related activities
• BAC ends
• Finished BAC clones
• Full-length cDNAs
• Whole-genome shotgun
BAC end sequencing
• The first BAC library has been constructed (Yale) and 100,000 BAC end sequences are being produced (RIKEN)
  • Not yet
  • We will be able to sequence the ends of up to 50,000 BACs (100,000 reads)
  • Or possibly more if fosmid ends are used instead?
  • Can start from April 2007
  • Will take about one month
Finished BAC clone sequencing
• Five BACs have been fully sequenced (RIKEN) and no serious 'issues' have arisen.
  • VMRC29 library (CHORI)
  • 97H16, 39G22, 36N9, 31O6, 3E11
  • 759,387 bp
  • GC level: 38.89%
  • Repeat content: 6.10%
    • Using the Drosophila fruit fly genus repeat library
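The GC level quoted above is a simple base-composition ratio. As a minimal sketch (the function name `gc_percent` is hypothetical and this is not the RepeatMasker code that produced the slide's figure), it could be computed like this, counting only unambiguous A/C/G/T bases in the denominator:

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; an empty or fully
    ambiguous sequence yields 0.0 rather than raising an error.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    # Toy sequence only; the real input would be the five BAC sequences.
    print(f"{gc_percent('ATGCNNATAT'):.2f} %")
```

Applied to the concatenated BAC sequences, a calculation of this kind should land near the 38.89% reported, although the exact treatment of ambiguous bases in the original analysis is an assumption here.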
RepeatMasker
file name: gmm_clones
sequences:            5
total length:    759387 bp
GC level:         38.89 %
bases masked:     46333 bp ( 6.10 %)
=====================================================
                     number of    length   percentage
                     elements    occupied  of sequence
-----------------------------------------------------
Retroelements            56      12376 bp    1.63 %
   SINEs:                 0          0 bp    0.00 %
   Penelope              31       2872 bp    0.38 %
   LINEs:                49       7695 bp    1.01 %
     CRE/SLACS            0          0 bp    0.00 %
     L2/CR1/Rex           7       3181 bp    0.42 %
     R1/LOA/Jockey        5       1138 bp    0.15 %
     R2/R4/NeSL           1         51 bp    0.01 %
   LTR elements:          7       4681 bp    0.62 %
     BEL/Pao              2        230 bp    0.03 %
     Gypsy/DIRS1          5       4451 bp    0.59 %
DNA transposons          10       4348 bp    0.57 %
   Tc1-IS630-Pogo         8       2143 bp    0.28 %
   Other (Mirage,         1        126 bp    0.02 %
    P-element, Transib)
Total interspersed repeats:      16724 bp    2.20 %
Small RNA:                3       1357 bp    0.18 %
Simple repeats:         237      12658 bp    1.67 %
Low complexity:         366      15594 bp    2.05 %
The query species was assumed to be "Drosophila fruit fly genus".
Homo sapiens ( 4.08 %)
Anopheles genus ( 4.52 %)
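The percentages in the RepeatMasker summary follow directly from the base counts. The sketch below is an illustrative check, not part of the original analysis: it re-derives them from the numbers printed on the slide. The dictionary keys and the helper name `pct` are hypothetical.

```python
TOTAL_BP = 759_387  # total length of the five BAC clones, from the summary

# Top-level masked categories (bp), copied from the summary table.
MASKED_BP = {
    "interspersed_repeats": 16_724,
    "small_rna": 1_357,
    "simple_repeats": 12_658,
    "low_complexity": 15_594,
}


def pct(bp: int, total: int = TOTAL_BP) -> float:
    """Percentage of the total sequence, rounded to two decimals as in the table."""
    return round(100.0 * bp / total, 2)


if __name__ == "__main__":
    masked = sum(MASKED_BP.values())
    print(f"bases masked: {masked} bp ({pct(masked):.2f} %)")
    for name, bp in MASKED_BP.items():
        print(f"{name:>22}: {bp:>6} bp ({pct(bp):.2f} %)")
```

The four categories sum to the 46,333 masked bases reported, and retroelements (12,376 bp) plus DNA transposons (4,348 bp) account for the 16,724 bp of interspersed repeats.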
Full-length cDNA sequencing
• Full-length cDNA libraries for G. m. morsitans will be constructed (RIKEN), and Sanger will sequence a few hundred of these clones in full. RIKEN will do some 5' end sequencing.
  • Full-length cDNA libraries were prepared by Junichi Watanabe (Univ. Tokyo)
  • Sequencing of 9,462 cDNA clones (5' one pass) was recently completed
Whole-genome shotgun sequencing
• RIKEN has applied to Japanese sources for funding for a further 3 million shotgun sequences (~3X coverage).
  • We failed to get the funding
  • At present, we have no money for WGS or additional BAC finishing
  • Will try for more
  • Japanese-African collaborative projects looking somewhat hopeful
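The "~3X coverage" figure relates read count, read length and genome size through the usual relation coverage = reads × read length / genome size. The sketch below illustrates that arithmetic only. The 600 bp average read length is an assumption chosen for illustration; the slides give neither a read length nor a genome size, and the function names are hypothetical.

```python
def coverage(n_reads: int, read_len_bp: float, genome_size_bp: float) -> float:
    """Expected fold coverage: total sequenced bases divided by genome size."""
    return n_reads * read_len_bp / genome_size_bp


def implied_genome_size(n_reads: int, read_len_bp: float, fold: float) -> float:
    """Genome size (bp) at which n_reads of read_len_bp would give the stated fold coverage."""
    return n_reads * read_len_bp / fold


if __name__ == "__main__":
    N_READS = 3_000_000      # from the slide
    FOLD = 3.0               # from the slide (~3X)
    ASSUMED_READ_LEN = 600   # ASSUMPTION: not stated in the slides
    size = implied_genome_size(N_READS, ASSUMED_READ_LEN, FOLD)
    print(f"implied genome size: {size / 1e6:.0f} Mb")
    print(f"coverage check: {coverage(N_READS, ASSUMED_READ_LEN, size):.1f}X")
```

Under a different assumed read length the implied genome size scales proportionally, so the output should not be read as an estimate of the Glossina genome size.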
Strategy and results obtained from preliminary analysis
• 28,721 sequences were assembled into contigs and singletons (CAP3)
  • Total contigs made = 3,857; total singletons = 10,213
• Contigs and singletons were translated into six reading frames (Transeq)
• Continuous ORFs containing at least 50 amino acids were selected
  • 3,857 contigs: 57,860 ORFs
  • 10,213 singletons: 30,942 ORFs
• Homology search in the SwissProt and NR protein databases (BLAT, 33% sequence identity)
  • Annotated 2,569 ORFs out of 3,857 contigs
  • Annotated 2,783 ORFs out of 10,213 singletons
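The translation and ORF-selection steps can be sketched in a few lines. This is a minimal illustration, not the Transeq-based pipeline used for the slides. It assumes the standard genetic code, treats a "continuous ORF" as any stretch of amino acids uninterrupted by a stop codon, and uses hypothetical names (`translate`, `six_frames`, `long_orfs`).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one frame; codons containing ambiguous bases become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frames(dna: str) -> list[str]:
    """Return translations of the three forward and three reverse-complement frames."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, rev) for offset in range(3)]


def long_orfs(dna: str, min_aa: int = 50) -> list[str]:
    """Stop-free stretches of at least min_aa residues, from all six frames."""
    return [
        segment
        for frame in six_frames(dna)
        for segment in frame.split("*")
        if len(segment) >= min_aa
    ]


if __name__ == "__main__":
    # Toy sequence: start codon, 60 alanine codons, stop codon.
    toy = "ATG" + "GCT" * 60 + "TAA"
    for orf in long_orfs(toy):
        print(len(orf), orf[:20] + "...")
```

Because stop-free stretches are collected from every frame, one sequence can yield several ORFs, which is consistent with the ORF counts above exceeding the number of contigs and singletons.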
A large percentage of ORFs from tsetse fly contigs resemble those of the fruit fly
• Drosophila (84%)
• Glossina (5%)
• Aedes (3%)
• Anopheles (2%)
• Others (6%)
A large percentage of ORFs from tsetse fly singletons resemble those of the fruit fly
• Drosophila (81%)
• Glossina (3%)
• Aedes (5%)
• Anopheles (2%)
• Others (9%)
METABROWSER: a resource to analyse the metagenome
Metagenome Analysis Pipeline
• Stages: INPUT, GENE PREDICTION, FUNCTIONAL ANNOTATION, ADVANCED ANALYSIS, USER
• Data flow: Genomic Contigs & Sequences, Predicted Genes, Annotated Genes
• Gene prediction tools: GLIMMER, GENEMARK, GETORF, CRITICA, MetaGene
• Annotation tools and resources shown: BLAST, INTERPRO SCAN, PROSITE SCAN, COGs, Manatee (GO), FingerPRINTscan, JAFA ?, HT-GO-FAT, PubSearch, BLIMPS (BLOCKS), Pfam, PLHOST
• Analyses shown: Metabolic Pathways, Comparative Genomics, Phylogenetic Classification, Protein Interaction, Enzyme Classification, 16S ribosomal RNA analysis, Taxonomic Classification, Pathogenicity index, Origin of Replication, Secondary Structure Prediction, Fold Prediction, Other Analysis
• User access: Browse, Query the Metagenome Data Browser
METABROWSER: a resource to analyse the metagenome
Metagenome Data Browser: data from our internal projects
• Genes
• Proteins
• Sequence Download
• Novel Proteins
• Novel Pathways
• Novel Genomes
• Comparative Analysis
• Other Related Information
Current & Future Plans
• Sequencing
  • More if funding allows
• Analysis
  • We can contribute to the informatics of the Glossina genome, including cDNA analysis and annotation
  • But we don't want to duplicate anyone's efforts
  • Also BES mapping and comparative analysis with Drosophila, mosquito, etc.
• ???
Acknowledgements
• Informatics (RIKEN)
  • Tulika Prakash Srivastava
  • Vineet K. Sharma
  • Todd D. Taylor
• Sequencing & Data Access
  • Atsushi Toyoda (RIKEN)
  • Junichi Watanabe (Univ. Tokyo)
  • Hiroyuki Wakaguri (Univ. Tokyo)
  • Yamashita (Kitasato Univ.)
  • Serap Aksoy (Yale)
  • Geoff Attardo (Yale)
• Other
  • Masahira Hattori (Univ. Tokyo/RIKEN)
  • Yoshiyuki Sakaki (RIKEN)