NCBI Molecular Biology Resources: Entrez

Presentation Transcript

  1. NCBI Molecular Biology Resources: Entrez. 王禄山, Mar. 2005

  2. NCBI Resources
     • About NCBI
     • NCBI Sequence Databases
       – Primary Database: GenBank
       – Derivative Databases: RefSeq
     • Entrez Databases and Text Searching
     • BLAST

  3. Bethesda, MD The National Institutes of Health

  4. The National Center for Biotechnology Information
     • Accepts submissions of primary data
     • Develops tools to analyze these data
     • Creates derivative databases based on the primary data
     • Provides free search, link, and retrieval of these data, primarily through the Entrez system

  5. The National Center for Biotechnology Information (NCBI)
     • Created as a part of the National Library of Medicine in 1988
     • Establish public databases
     • Research in computational biology
     • Develop software tools for sequence analysis
     • Disseminate biomedical information
     • Tools: Entrez (1992), BLAST (1990)
     • GenBank (1992)
     • Free MEDLINE (PubMed, 1997)
     • Other databases: dbEST, dbGSS, dbSTS, MMDB, OMIM, UniGene, GeneMap, Taxonomy, CGAP, SAGE, LocusLink, RefSeq

  6. NCBI WWW Users per Day

  7. Number of Users and Hits Per Day, 1997 to 2003 (chart; the annotated dips fall at Christmas & New Year)

  8. Homepage - accessing the data all[filter]

  9. all[filter] 1/11/2005

  10. Entrez Nucleotide
      Primary Data
      • GenBank / DDBJ / EMBL: 46,974,918 (98.86%)
      Derivative Data
      • RefSeq: 533,236 (1.12%)
      • PDB (structures): 5,484
      • Third Party Annotation (TPA): 4,516
      “Total”: 47,518,338
      GenBank
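
The percentages on this slide can be re-derived from the record counts. The following is a minimal sketch that takes the slide's quoted “Total” as the denominator; the function and variable names are illustrative and not part of any NCBI tool.

```python
# Record counts copied from the Entrez Nucleotide slide.
counts = {
    "GenBank / DDBJ / EMBL": 46_974_918,
    "RefSeq": 533_236,
    "PDB (structures)": 5_484,
    "Third Party Annotation (TPA)": 4_516,
}
stated_total = 47_518_338  # the slide's quoted "Total"


def percent_shares(counts, total):
    """Return each category's share of `total` as a percentage (2 decimals)."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


shares = percent_shares(counts, stated_total)
for name, pct in shares.items():
    print(f"{name:30s} {counts[name]:>12,d} {pct:6.2f} %")
```

Primary submissions dominate the database by two orders of magnitude, which is why the later slides restrict searches with limits and fields.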

  11. GenBank: NCBI’s Primary Sequence Database
      Release 145, Dec 2004: 40.6 × 10^6 records, 44.5 × 10^9 nucleotides, 153 gigabytes, 705 files
      • full release every two months
      • incremental and cumulative updates daily
      • available only through the internet
      • release notes: gbrel.txt
      • ftp://ftp.ncbi.nih.gov/genbank/
      • ftp://genbank.sdsc.edu/pub
      • ftp://bio-mirror.net/biomirror/genbank

  12. Molecular Databases
      • Primary Databases
        – Original submissions by experimentalists
        – Database staff organize but don’t add additional information
        – Example: GenBank
      • Derivative Databases
        – Human curated: compilation and correction of data. Example: SWISS-PROT, NCBI RefSeq mRNA
        – Computationally derived. Example: UniGene
        – Combinations. Example: NCBI Genome Assembly

  13. Primary vs. Derivative Databases (diagram)
      • Sequencing centers and labs submit to GenBank (divisions shown: EST, STS, GSS, HTG, INV, VRT, PHG, VRL, PRI, ROD, PLN, MAM, BCT); updated ONLY by submitters
      • Curators: RefSeq, UniGene, UniSTS; updated by NCBI
      • RefSeq sources: annotation pipeline; Entrez Gene and Genomes pipelines

  14. The GenBank Record

  15. Header Feature Table Sequence GenBank Records The Flatfile Format

  16. A Typical GenBank Record (Entrez)
      LOCUS       NM_019570 4279 bp mRNA linear INV 28-OCT-2004
      DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA
      ACCESSION   NM_019570
      VERSION     NM_019570.3 GI:50811869
      KEYWORDS    .
      (DEFINITION = Title)
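
To make the header layout concrete, here is a minimal sketch that splits the header lines shown on this slide into keyword/value pairs. It is a simplified illustration, not a full flatfile parser: it assumes one line per keyword and ignores continuation lines, the feature table, and the sequence. The function name `parse_genbank_header` is hypothetical.

```python
def parse_genbank_header(text):
    """Map each header keyword (LOCUS, DEFINITION, ...) to the rest of its line.

    Simplifying assumption: every field fits on a single line.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    return fields


# Header lines as shown on the slide.
record = """\
LOCUS       NM_019570 4279 bp mRNA linear INV 28-OCT-2004
DEFINITION  Mus musculus REV1-like(S. cerevisiae)(Rev1l),mRNA
ACCESSION   NM_019570
VERSION     NM_019570.3 GI:50811869
KEYWORDS    .
"""

header = parse_genbank_header(record)

# VERSION holds "accession.version" followed by the GI number.
accession_version, gi_field = header["VERSION"].split()
accession, _, version_number = accession_version.rpartition(".")
gi = gi_field.split(":")[1]

# The second token of the LOCUS line is the sequence length in base pairs.
length_bp = int(header["LOCUS"].split()[1])

print(accession, version_number, gi, length_bp)
```

For real records, an established parser is the safer choice, since it handles multi-line fields and the feature table.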

  17. GenBank Record: Feature Table Entrez

  18. GenBank Record: Feature Table Entrez GenPept identifier Blast

  19. skip GenBank Record: sequence Blast

  20. http://www.ncbi.nlm.nih.gov/ NCBI Homepage

  21. Entrez NCBI Homepage Mendelian Inheritance in Man BLAST

  22. Online Help

  23. Using Entrez An integrated database search and retrieval system

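  24. Entrez: Neighboring and Hard Links (diagram)
      • Nodes: PubMed abstracts, Nucleotide sequences, Protein sequences, 3-D Structure (MMDB), Genomes, Taxonomy
      • Neighboring methods: word weight (PubMed abstracts), BLAST (nucleotide and protein sequences), VAST (3-D structure), phylogeny (Taxonomy)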

  25. GEO (Gene Expression Omnibus): a database that collects and stores microarray gene expression data.

  26. Unigene

  27. Database Searching with Entrez Using limits and field restriction to find mouse GAPD Linking and neighboring with mouse GAPD

  28. Mouse Entrez Nucleotides

  29. Document Summaries: Mouse[All Fields] 7 million records

  30. Data Rich, Knowledge Poor: do not drown in the "ocean of data and information"; look for the "islands of knowledge".

  31. What are data, information, and knowledge? Keep in mind that the repositories bioinformatics uses for storage today are called DATABASEs.

  32. Mouse Entrez Nucleotides: Limits: Preview/Index

  33. Entrez Nucleotides: Limits (search term: Mouse)
      • Field Restriction: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title Word, Uid, Volume
      • Exclude unwanted categories of sequences
      • Gene Location: Genomic DNA/RNA, Mitochondrion, Chloroplast
      • Molecule: Genomic DNA/RNA, mRNA, rRNA
      • Only From: RefSeq, GenBank, EMBL, DDBJ
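
Field restriction amounts to appending a bracketed field tag to a search term, as in the `Mouse[All Fields]` and `Mouse[Organism]` queries on the neighboring slides. A minimal sketch of composing such terms as strings follows; the helper names `field_term` and `and_terms` are hypothetical, and only tags that appear in these slides are used.

```python
def field_term(text, field):
    """Restrict a search term to one Entrez field, e.g. Mouse[Organism]."""
    return f"{text}[{field}]"


def and_terms(*terms):
    """Combine several terms with the Boolean AND operator."""
    return " AND ".join(terms)


# An unrestricted search versus one restricted to the Organism field.
broad = field_term("Mouse", "All Fields")
narrow = field_term("Mouse", "Organism")

# Combining an organism restriction with a title restriction.
combined = and_terms(
    narrow,
    field_term("glyceraldehyde 3 phosphate dehydrogenase", "Title"),
)

print(broad)
print(narrow)
print(combined)
```

The restricted form keeps records whose source organism is mouse and drops those that merely mention the word somewhere in the record.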

  34. Entrez Nucleotides: Limits: Organism Mouse

  35. 7,247,131[All Fields] -6,850,905[Organism] 397,226 Document Summaries: Mouse[Organism]

  36. Exclude Bulk Sequences, mRNA

  37. 502497

  38. Preview/Index

  39. Adding Terms: Preview/Index Search History

  40. glyceraldehyde 3 phosphate dehydrogenase

  41. mouse AND glyceraldehyde 3 phosphate dehydrogenase[Title]
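
The slides run this query through the Entrez web pages. As a hedged aside, the same term could also be sent programmatically through NCBI's E-utilities `esearch` endpoint, which the slides do not cover. The sketch below only builds the request URL and makes no network call; the database name `nuccore` and the `retmax` value are assumptions chosen for illustration, and a live search today would not return the record count shown in the slides.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base URL of the E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nuccore", retmax=20):
    """Build an ESearch URL for `term`; `db` and `retmax` are assumed defaults."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# The query string from the slide, passed through unchanged.
term = "mouse AND glyceraldehyde 3 phosphate dehydrogenase[Title]"
url = esearch_url(term)

# Round-trip check: decoding the URL recovers the original term.
decoded = parse_qs(urlparse(url).query)

print(url)
print(decoded["term"][0])
```

Percent-encoding matters here because the square brackets and spaces in a field-tagged term are not safe to place in a URL verbatim.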

  42. 161 Mouse GAPD Records

  43. 3 19

  44. History