
H-Invitational dataset and H-Invitational database (H-InvDB)

HGM2005 April 22nd, 2005. H-Invitational dataset and H-Invitational database (H-InvDB). Takashi Gojobori, Biological Information Research Center (BIRC), National Institute of Advanced Industrial Science and Technology (AIST).

Presentation Transcript


  1. HGM2005 April 22nd, 2005. H-Invitational dataset and H-Invitational database (H-InvDB). Takashi Gojobori, Biological Information Research Center (BIRC), National Institute of Advanced Industrial Science and Technology (AIST)

  2. Goals of the H-Invitational Project • Complete collection of high-quality human full-length cDNA clones and sequences. • Integrative annotation of these clones, especially human curation under unified criteria. • Construction of a database (H-InvDB) and tools to further facilitate transcriptome research.

  3. H-Invitational

  4. The H-Invitational Dataset • Six FLcDNA clone producers and DDBJ conducted a data freeze on July 15, 2002. • A total of 41,118 cDNAs were collected, and a number of annotation activities were carried out. • NCBI supplied their latest genome assembly (build 34). • EBI provided a non-redundant SwissProt/TrEMBL protein set.

     Organization     Entries
     KIAA/KDRI          2,000
     FLJ/Total         20,999
       FLJ/KDRI           348
       FLJ/IMSUT        4,842
       FLJ/Helix       15,809
     DKFZ/MIPS          5,555
     MGC/NIH           11,809
     CHGC                 758
     Total             41,118

  5. Two Important Steps in H-Inv Annotation 1. Pre-computing: mapping onto the genome, sequence similarity search, ORF prediction, functional motif prediction, structural prediction, etc. 2. Human curation (annotation jamboree)

  6. “Human Full-Length cDNA Annotation Invitational” Jamboree (H-Invitational), August 25 - September 3, 2002. Co-organized by JBIRC and DDBJ/NIG. Attended by more than 118 people from 40 organizations such as JBIRC, DDBJ, NCBI, EBI, Sanger Centre, NCI-MGC, DOE, NIH, DKFZ, CNHGC (Shanghai), RIKEN, Tokyo U, MIPS, CNRS, MCW, TIGR, CBRC, Murdoch U, U Iowa, Karolinska Inst., WashU, U Cincinnati, Tokyo MD U, KRIBB, South African Bioinfor Inst, U College London, Reverse Proteomics Res. Inst., Kazusa DNA Inst, Weizmann Inst, Royal Inst. Tech. Sweden, Penn State U, Osaka U, Keio U, Kyushu U, TIT, Ludwig Inst. Brazil, Kyoto U, German Can. Inst., and NIG. Supported by JBIC, METI, MEXT, NIH, and DOE.

  7. Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones. Tadashi Imanishi, Takeshi Itoh, Yutaka Suzuki, Claire O’Donovan, Satoshi Fukuchi, Kanako O. Koyanagi, Roberto A. Barrero, Takuro Tamura, Yumi Yamaguchi-Kabata, Motohiko Tanino, Kei Yura, Satoru Miyazaki, Kazuho Ikeo, Keiichi Homma, Arek Kasprzyk, Tetsuo Nishikawa, Mika Hirakawa, Jean Thierry-Mieg, Danielle Thierry-Mieg, Jennifer Ashurst, Libin Jia, Mitsuteru Nakao, Michael A. Thomas, Nicola Mulder, Youla Karavidopoulou, Lihua Jin, Sangsoo Kim, Tomohiro Yasuda, Boris Lenhard, Eric Eveno, Yoshiyuki Suzuki, Chisato Yamasaki, Jun-ichi Takeda, Craig Gough, Phillip Hilton, Yasuyuki Fujii, Hiroaki Sakai, Susumu Tanaka, Clara Amid, Matthew Bellgard, Maria de Fatima Bonaldo, Hidemasa Bono, Susan K. Bromberg, Anthony Brookes, Elspeth Bruford, Piero Carninci, Claude Chelala, Christine Couillault, Sandro J. de Souza, Marie-Anne Debily, Marie-Dominique Devignes, Inna Dubchak, Toshinori Endo, Anne Estreicher, Eduardo Eyras, Kaoru Fukami-Kobayashi, Gopal Gopinathrao, Esther Graudens, Yoonsoo Hahn, Michael Han, Ze-Guang Han, Kousuke Hanada, Hideki Hanaoka, Erimi Harada, Katsuyuki Hashimoto, Ursula Hinz, Momoki Hirai, Teruyoshi Hishiki, Ian Hopkinson, Sandrine Imbeaud, Hidetoshi Inoko, Alexander Kanapin, Yayoi Kaneko, Takeya Kasukawa, Janet Kelso, Paul Kersey, Reiko Kikuno, Kouichi Kimura, Bernhard Korn, Vladimir Kuryshev, Izabela Makalowska, Takashi Makino, Shuhei Mano, Regine Mariage-Samson, Jun Mashima, Hideo Matsuda, Hans-Werner Mewes, Shinsei Minoshima, Keiichi Nagai, Hideki Nagasaki, Naoki Nagata, Rajni Nigam, Osamu Ogasawara, Osamu Ohara, Masafumi Ohtsubo, Norihiro Okada, Toshihisa Okido, Satoshi Oota, Motonori Ota, Toshio Ota, Tetsuji Otsuki, Dominique Piatier-Tonneau, Annemarie Poustka, Shuang-Xi Ren, Naruya Saitou, Katsunaga Sakai, Shigetaka Sakamoto, Ryuichi Sakate, Ingo Schupp, Florence Servant, Stephen Sherry, Rie Shiba, Nobuyoshi Shimizu, Mary Shimoyama, Andrew J. Simpson, Bento Soares, Charles Steward, Makiko Suwa, Mami Suzuki, Aiko Takahashi, Gen Tamiya, Hiroshi Tanaka, Todd Taylor, Joseph D. Terwilliger, Per Unneberg, Vamsi Veeramachaneni, Shinya Watanabe, Laurens Wilming, Norikazu Yasuda, Hyang-Sook Yoo, Marvin Stodolsky, Wojciech Makalowski, Mitiko Go, Kenta Nakai, Toshihisa Takagi, Minoru Kanehisa, Yoshiyuki Sakaki, John Quackenbush, Yasushi Okazaki, Yoshihide Hayashizaki, Winston Hide, Ranajit Chakraborty, Ken Nishikawa, Hideaki Sugawara, Yoshio Tateno, Zhu Chen, Michio Oishi, Peter Tonellato, Rolf Apweiler, Kousaku Okubo, Lukas Wagner, Stefan Wiemann, Robert L. Strausberg, Takao Isogai, Charles Auffray, Nobuo Nomura, Takashi Gojobori, and Sumio Sugano. PLOS Biology 2: 856-875 (2004) (158 authors)

  8. H-Invitational Database - A New Human Gene Annotation Database H-Invitational Database (H-InvDB) is a human gene database with integrative annotation of 41,118 full-length cDNA clones. H-InvDB describes their gene structures, functions, domains, expression, diversity, and evolution. This is a product of the H-Invitational consortium comprised of 44 research institutes worldwide, organized by Japan Biological Information Research Center (JBIRC) and DNA Data Bank of Japan (DDBJ).

  9. H-InvDB overview http://www.h-invitational.jp

  10. HGM2005 April 20th, 2005. H-InvDB website www.h-invitational.jp H-InvDB Top page Official page • Search tools • Text/keyword search • Advanced search • BLAST search

  11. HGM2005 April 20th, 2005. Search tools (1) Text/keyword search & Advanced search

  12. Search tools (2) BLAST search BLAST search against all cDNAs and proteins in H-InvDB!!

  13. HGM2005 April 20th, 2005. H-InvDB: main viewers (1) Locus View • 21,037 predicted genetic loci annotations • Gene structures • Alternative splicing isoforms • Gene expression profiles • cDNA/ORF multiple alignments • Disease-related information • Hyperlinks to other databases

  14. HGM2005 April 20th, 2005. H-InvDB: main viewers (2) cDNA View • 41,118 cDNA annotations • Protein functions • Location on the chromosome • Open reading frame • InterPro motif • Evolutionary feature • Secondary/tertiary structure • Subcellular localization • SNPs / microsatellites • Hyperlinks to other databases

  15. H-Inv DB: Auxiliary viewers (3) DiseaseInfo Viewer • Database of known and orphan genetic diseases • H-Inv loci with LocusLink, OMIM and GenAtlas • Known disease-related gene • Co-localized orphan pathology (candidate gene is unknown)

  16. H-Inv DB: Auxiliary viewers (2) Human ANatomic Gene Expression Library (H-ANGEL) • Gene expression library • Pattern similarity search • 10 and 40 categories of organs and tissues • Analyzed by several platforms (iAFLP, SAGE, DNA array, etc.)

  17. H-Inv DB: Auxiliary viewers (4, 5) Clustering Viewer & TOPO viewer Clustering Viewer • A viewer for making comparison between different methods • Mapping-based & 6 cDNA-based clustering by 6 institutes (Ensembl, FLJ, SANBI, TIGR, UniGene, and JBIRC) TOPO Viewer • A Tool for viewing subcellular localization • Subcellular targeting signals were predicted by PSORT II and TargetP • Transmembrane helices were predicted by SOSUI and TMHMM

  18. H-InvDB: Auxiliary viewers (1) G-integra • Genome map browser • Structure of gene cluster • Mapping info with all EST, ens_gene and ref_gene by UCSC • Orthologous genes of other species • Repeat, SNPs, microsatellite, ACC#, ID search • Hyperlinks to other databases • Mouse data is included in this view too. [Figure labels: Scale, Start position, Search window, Other species, Notion, ESTs, H-Inv cDNAs (green), RefSeq & Ensembl (red), Genome (purple), Position on chromosome]

  19. HGM2005 April 22nd, 2005. HUGO nomenclatures in H-InvDB

  20. H-InvDB annotation overview: 41,118 human full-length cDNAs, 21,037 H-Inv loci. [Figure: human full-length cDNAs aligned to the human genome and grouped into Locus 1, Locus 2]
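The slide does not state the rule used to group mapped cDNAs into loci. As a rough illustration only, the sketch below assumes a simple rule: cDNAs whose mapped intervals overlap on the same chromosome and strand fall into one locus. The function name, the tuple layout, and the example coordinates are all hypothetical.

```python
from collections import defaultdict


def cluster_into_loci(mappings):
    """Group mapped cDNAs into loci by interval overlap (assumed rule).

    mappings: iterable of (cdna_id, chrom, strand, start, end) tuples.
    Returns a list of loci, each a list of cDNA ids.
    """
    by_key = defaultdict(list)
    for cdna_id, chrom, strand, start, end in mappings:
        by_key[(chrom, strand)].append((start, end, cdna_id))

    loci = []
    for key in sorted(by_key):
        current, current_end = [], None
        # Sweep left to right; a gap between intervals starts a new locus.
        for start, end, cdna_id in sorted(by_key[key]):
            if current and start > current_end:
                loci.append(current)
                current, current_end = [], None
            current.append(cdna_id)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            loci.append(current)
    return loci


# Hypothetical coordinates, for illustration only.
example = [
    ("a", "chr1", "+", 100, 500),
    ("b", "chr1", "+", 400, 900),
    ("c", "chr1", "+", 2000, 2500),
    ("d", "chr1", "-", 450, 800),
]
loci = cluster_into_loci(example)
print(loci)  # [['a', 'b'], ['c'], ['d']]
```

A real pipeline would likely also consider exon structure rather than whole-span overlap; this sketch only shows the many-cDNAs-to-one-locus grouping that the figure depicts.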

  21. Relation of HUGO approved gene symbols to H-InvDB loci • Direct relations: by DNA databank accession number • Secondary relations: by EntrezGene ID (LocusLink) => DNA databank accession number • Relations through H-InvDB annotation: by SwissProt ID, if a cDNA was annotated as a protein identical to a known human protein in SwissProt; by RefSeq ID, by location on the genome
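The three kinds of relation on this slide can be read as a lookup cascade. The sketch below is one possible reading, not the H-InvDB implementation: it assumes the relations are tried in the order listed, and every table, identifier, and symbol in it is a made-up placeholder. The RefSeq-by-genome-location relation is left out because it would need coordinate data.

```python
from typing import Optional, Tuple

# Hypothetical lookup tables; keys and values are placeholders, not real records.
ACCESSION_TO_SYMBOL = {"ACC0001": "SYMBOL1"}
ENTREZ_TO_ACCESSIONS = {"1001": ["ACC0002"]}
ENTREZ_TO_SYMBOL = {"1001": "SYMBOL2"}
SWISSPROT_TO_SYMBOL = {"P00001": "SYMBOL3"}


def resolve_symbol(locus: dict) -> Optional[Tuple[str, str]]:
    """Return (symbol, relation) for a locus, or None if nothing matches."""
    accessions = set(locus.get("accessions", []))

    # 1. Direct relation: a cDNA accession is itself linked to a symbol.
    for acc in sorted(accessions):
        if acc in ACCESSION_TO_SYMBOL:
            return ACCESSION_TO_SYMBOL[acc], "direct"

    # 2. Secondary relation: EntrezGene ID => accession number => locus.
    for entrez_id, accs in sorted(ENTREZ_TO_ACCESSIONS.items()):
        if accessions & set(accs) and entrez_id in ENTREZ_TO_SYMBOL:
            return ENTREZ_TO_SYMBOL[entrez_id], "secondary"

    # 3. Relation through annotation: identical to a known SwissProt protein.
    swissprot_id = locus.get("swissprot_id")
    if swissprot_id in SWISSPROT_TO_SYMBOL:
        return SWISSPROT_TO_SYMBOL[swissprot_id], "annotation"

    return None


print(resolve_symbol({"accessions": ["ACC0002"]}))  # ('SYMBOL2', 'secondary')
```

Returning the relation type alongside the symbol keeps the provenance of each assignment visible, which matters when direct and annotation-derived assignments are counted separately.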

  22. Initial assignment of HUGO approved gene symbols to H-InvDB loci [2002/06/02]. Datasets: H-InvDB_1.0 [2002/06/02] and HUGO [2002/06/02]. No. of loci: 381; % in H-InvDB: 1.8%; % in HUGO: 2.7%. Other figure values: 20,656 and 5,969. 381 (1.8%) of H-Inv loci had HUGO approved gene symbols through DNA databank accession numbers [2002/06/02], for initial analysis.

  23. Assignment of HUGO approved gene symbols to H-InvDB loci [2005/03/01]. Datasets: H-InvDB_1.8 [2005/03/01] and HUGO [2005/03/01]. No. of loci: 2,938; % in H-InvDB: 14.0%; % in HUGO: 21.2%. Other figure values: 18,099 and 10,948. 2,938 (14.0%) of H-Inv loci have HUGO approved gene symbols through DNA databank accession numbers [2005/03/01].
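The percentages on this slide can be checked with a few lines of arithmetic. The sketch assumes the three counts are the regions of a Venn-style figure: 18,099 loci only in H-InvDB, 10,948 symbols only in HUGO, and 2,938 shared. That reading is an assumption, although it does reproduce the 14.0% and 21.2% shown.

```python
def overlap_percentages(only_a: int, only_b: int, shared: int):
    """Return the shared count as a percentage of set A and of set B."""
    pct_a = 100.0 * shared / (only_a + shared)
    pct_b = 100.0 * shared / (only_b + shared)
    return pct_a, pct_b


# Counts from the slide, interpreted as Venn regions (assumption).
pct_hinv, pct_hugo = overlap_percentages(18_099, 10_948, 2_938)
print(f"{pct_hinv:.1f}% of H-InvDB loci, {pct_hugo:.1f}% of HUGO symbols")
# 14.0% of H-InvDB loci, 21.2% of HUGO symbols
```

Under this reading the H-InvDB denominator is 18,099 + 2,938 = 21,037, the total number of H-Inv loci.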

  24. Gene symbols relations in H-InvDB

  25. Assignment of HUGO approved gene symbols to H-InvDB loci [2005/03/01]. Datasets: H-InvDB_1.8 [2005/03/01] and HUGO [2005/03/01]. No. of loci: 9,152; % in H-InvDB: 44.12%; % in HUGO: 44.82%. Other figure values: 11,885 and 11,554. 9,152 (44.1%) of H-Inv loci have HUGO approved gene symbols through primary and secondary relations and H-InvDB annotations [2005/03/01].

  26. Gene symbols for annotated orthologs in H-InvDB

  27. Examples of HUGO symbols to mouse orthologs

  28. Possible proposals • H-InvDB annotation may be useful to assign HUGO approved gene symbols to new sequences • H-InvDB annotation may be useful to assign gene symbols to mouse genes

  29. A New Database for Comparative Genomics : G-compass URL = http://www.jbirc.aist.go.jp/g-compass/

  30. Sequence Alignment of Human and Mouse Genomes • Data source: Human Genome (NCBI, build 34) 3,070,128,059 bp; Mouse Genome (UCSC, Feb 2003) 2,577,261,074 bp • Length after RepeatMasking: Human 1,688,881,391 bp; Mouse 1,548,785,600 bp • Alignment: number of aligned fragments 830,699; total length 522,465,029 bp (gaps included) -> 17% of the human genome could be aligned
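The 17% figure follows directly from the numbers on this slide. A short check, using only those numbers (the variable names are ours):

```python
HUMAN_GENOME_BP = 3_070_128_059   # NCBI build 34
HUMAN_MASKED_BP = 1_688_881_391   # human length after RepeatMasking
ALIGNED_BP = 522_465_029          # total aligned length, gaps included
ALIGNED_FRAGMENTS = 830_699

aligned_fraction = ALIGNED_BP / HUMAN_GENOME_BP
masked_fraction = HUMAN_MASKED_BP / HUMAN_GENOME_BP
mean_fragment_bp = ALIGNED_BP / ALIGNED_FRAGMENTS

print(f"aligned: {aligned_fraction:.1%} of the human genome")      # aligned: 17.0% of the human genome
print(f"left after RepeatMasking: {masked_fraction:.1%}")          # left after RepeatMasking: 55.0%
print(f"mean aligned fragment: about {mean_fragment_bp:.0f} bp")   # mean aligned fragment: about 629 bp
```

Because the total aligned length includes gaps, the 17% is a slight overestimate of the human bases actually paired with mouse sequence.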

  31. Features of G-compass • Map viewer of genome sequence alignment and gene structures in human and mouse • Displays orthologous gene pairs • Displays percent sequence identity and gaps • Displays density of CpG and G+C contents • Downloadable genome alignments • External links to H-InvDB

  32. Coverage of alignments in human chromosomes

  33. Table 1. Summary of cDNA resources and mapping. *CHGC, DKFZ/MIPS, FLJ/HRI, FLJ/IMSUT, FLJ/KDRI, KDRI, MGC/NIH

  34. Figure 1. Procedure for mapping and clustering for human and model organisms. • cDNA sets: 56,419 full-length human cDNAs; 4,430 chimpanzee cDNAs; 2,230 crab-eating macaque cDNAs; 1,024 rhesus monkey cDNAs; 101,402 house mouse cDNAs; 8,058 Norway rat cDNAs • Mapping steps, in the order shown in the figure: human genome (NCBI build 34), identity >= 90% and length coverage >= 70%; human genome (NCBI build 34), identity >= 85% and length coverage >= 70%; human genome (NCBI build 34), identity >= 85% and length coverage >= 70%; house mouse genome (UCSC mm3), identity >= 95% and length coverage >= 90%; house mouse genome (UCSC mm3), identity >= 80% and length coverage >= 70%; human genome (NCBI build 34), identity >= 95% and length coverage >= 90% • cDNAs mapped to the human genome and to the house mouse genome then go to mapping-based clustering
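Each mapping step in this figure keeps a cDNA only if its alignment meets an identity threshold and a length-coverage threshold. The sketch below shows such a filter. The slide does not define the two measures, so the definitions here are assumptions: identity is matches divided by aligned length, and coverage is aligned length divided by cDNA length. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    query_id: str
    query_length: int    # full cDNA length in bases
    aligned_length: int  # cDNA bases covered by the alignment
    matches: int         # identical bases within the aligned region


def passes_thresholds(aln: Alignment, min_identity: float, min_coverage: float) -> bool:
    """Apply identity and length-coverage cutoffs (assumed definitions)."""
    if aln.aligned_length == 0 or aln.query_length == 0:
        return False
    identity = aln.matches / aln.aligned_length
    coverage = aln.aligned_length / aln.query_length
    return identity >= min_identity and coverage >= min_coverage


# Hypothetical alignments, for illustration only.
good = Alignment("cdna_1", query_length=1000, aligned_length=950, matches=930)
partial = Alignment("cdna_2", query_length=1000, aligned_length=750, matches=700)

print(passes_thresholds(good, 0.95, 0.90))     # True
print(passes_thresholds(partial, 0.95, 0.90))  # False
print(passes_thresholds(partial, 0.90, 0.70))  # True
```

Passing the thresholds as parameters lets one function serve all six steps, with stricter cutoffs for same-species mapping and looser ones for cross-species mapping.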

  35. Figure 2. Automatic selection of ortholog candidate pairs. Genome Alignment

  36. Figure 3. The distribution of automatically selected ortholog candidates.

  37. Figure 4. The window of the Evolution-annotation-viewer. [Panel labels: results of automatic ortholog selection; 5 kinds of phylogenetic trees; multiple alignment viewer; dN, dS; curation input form]

  38. Figure 5. The rate of curation results among five species

  39. Figure 6. The window of Evola-viewer. [Panel labels: keyword search; keyword search results; multiple alignment; human curation results; motif table]

  40. Thank you !!
