Medical Information Retrieval and Utilization (Medical Information Retrieval) • March 28, 2007 • Huang Lihui (黄利辉)
Overview: Information Retrieval Systems • The U.S. biomedical literature retrieval system • Medline & PubMed
Definition of Information Retrieval • Information Retrieval • Is the science and practice of identification and efficient use of recorded media • Biomedical literature • Multimedia publishing • Chemical structures • Cartographic materials • Genes and protein sequences • Video clips • Etc.
Development of MIR • 1879 Index Medicus • 1966 Medical Literature Analysis and Retrieval System (MEDLARS) • 1971 MEDLARS Online • 1980s full-text databases • 1990s World Wide Web • 1997 PubMed
The Information Retrieval Process • Indexing • Query formulation • Retrieval • Evaluation -- (Gerard Salton 1983)
The Information Retrieval Process [diagram: Information Need, Query Formulation, Query • Content, Indexing, Database (Content plus Index) • Retrieval, Results, Evaluation and Refinement]
Information Content • Content: • Media developed to communicate information or knowledge • Original content (primary literature) • Synoptic content • Bibliographic content • Full-text content • Evidence-based medicine
Original & synoptic content • Original content (primary literature) is developed through new observations and analysis of the world. • (Medical journal articles, conference proceedings, white papers, etc.) • Peer review • Authors develop synoptic content by extracting important observations and principles from sources of original content, as well as from personal experience. • Textbooks, practice guidelines, drug monographs, review articles, etc.
Bibliographic & full-text • Bibliographic content is the information abstracted from the original source • Full-text content: • Citation information • The complete body text • Multimedia content
Evidence-based medicine • Diagnosis, etiology, prognosis, treatment/prevention (randomized controlled trials) • Original literature • Review articles • Systematic reviews (Cochrane Collaboration)
Indexing • IR database = content + index • Index: • Items and item attributes (inverted index) • Item: attribute (document No.) • Aspirin: 1, 5, 6, 9 • attack: 3, 6, 7, 8 • Heart: 4, 6, 7, 10 • Prevention: 1, 6, 9
Indexing: How to Formulate a Query • Query formulation: • The process of stating information needs in terms of queries • An information need is the searcher’s expression, in her own language, of the information that she desires • “Should middle-aged men be given a daily dose of aspirin to prevent heart attack?” • Aspirin AND prevention AND heart AND attack
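The Boolean query above can be checked against the posting lists from the indexing slide. This is a minimal sketch, not how Medline itself is implemented: the function names `boolean_and` and `boolean_or` are made up for illustration, and terms are lower-cased as a simplifying assumption.

```python
# Inverted index copied from the indexing slide: term -> document numbers.
INDEX = {
    "aspirin": [1, 5, 6, 9],
    "attack": [3, 6, 7, 8],
    "heart": [4, 6, 7, 10],
    "prevention": [1, 6, 9],
}


def boolean_and(index, terms):
    """Return the documents that contain every query term (Boolean AND)."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def boolean_or(index, terms):
    """Return the documents that contain at least one query term (Boolean OR)."""
    result = set()
    for term in terms:
        result |= set(index.get(term.lower(), []))
    return sorted(result)


# Aspirin AND prevention AND heart AND attack
print(boolean_and(INDEX, ["Aspirin", "prevention", "heart", "attack"]))  # [6]
```

Only document 6 appears in all four posting lists, so it is the only document the AND query retrieves. Replacing AND with OR would widen the result set at the cost of precision.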
Building the Index • The goal of indexing is to produce the smallest, most efficient representation of the original content that will facilitate high-quality retrieval • Index items are units of information suitable for matching with a query • Index attributes describe facets of the item
Capture of content structure with indexes • Content structuring • Markup • Semantic regions • Example: “Woods” • as an author name vs. woods & forests as a subject
Indexing of bibliographic information • Medline • Information abstracted from the publication, such as the authors’ names, article title, article source, publication date, and authors’ abstract • Information added by a human indexer, such as subject headings and publication types
PMID- 16931225
IS  - 1523-6838 (Electronic)
TI  - A case of "pure" preeclampsia with nephrotic syndrome before 15 weeks of
      gestation in a patient whose renal biopsy showed glomerular capillary
      endotheliosis.
AB  - A 35-year-old Japanese woman for whom a previous health checkup showed
      normal blood pressure and urinalysis results without serological
AD  - Department of Internal Medicine and Division of Immunopathology, Clinical
      Hospital, Chuoh, Japan. timasawa@yahoo.co.jp
AU  - Joh K
LA  - eng
PT  - Case Reports
TA  - Am J Kidney Dis
JT  - American journal of kidney diseases : the official journal of the
      National Kidney Foundation.
MH  - Kidney Diseases/*etiology
MH  - Kidney Glomerulus/*pathology
MH  - Nephrotic Syndrome
MH  - *Pre-Eclampsia
MH  - Pregnancy
MH  - Pregnancy Trimester, Second
EDAT- 2006/08/26 09:00
SO  - Am J Kidney Dis. 2006 Sep;48(3):495-501.
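A tagged record like the one above can be read into fields with a few lines of code. The sketch below assumes the usual plain-text layout of such records, which the slide text may not preserve exactly: a tag padded to four characters, then "- ", with wrapped lines indented. The function names and the shortened `SAMPLE` record are illustrative only.

```python
def parse_medline(text):
    """Parse one tagged record into a dict of {tag: [values]}.

    Assumed layout: columns 0-3 hold the tag, columns 4-5 hold "- ",
    and continuation lines are indented and carry no tag.
    """
    record = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if len(line) > 5 and line[4:6] == "- ":
            # A new field starts; repeated tags (e.g. MH) accumulate.
            tag = line[:4].strip()
            record.setdefault(tag, []).append(line[6:].strip())
        elif tag is not None:
            # A wrapped line continues the most recent value.
            record[tag][-1] += " " + line.strip()
    return record


def major_topics(record):
    """Return the MeSH headings flagged with '*' (major topics)."""
    return [mh for mh in record.get("MH", []) if "*" in mh]


# Shortened copy of the record shown on the slide.
SAMPLE = """\
PMID- 16931225
TI  - A case of "pure" preeclampsia with nephrotic syndrome before 15 weeks of
      gestation in a patient whose renal biopsy showed glomerular capillary
      endotheliosis.
AU  - Joh K
LA  - eng
PT  - Case Reports
MH  - Kidney Diseases/*etiology
MH  - *Pre-Eclampsia
MH  - Pregnancy
"""

rec = parse_medline(SAMPLE)
print(rec["PMID"], rec["PT"], major_topics(rec))
```

Keeping every field as a list makes repeated tags such as MH and AU uniform to handle, which is why single-valued fields like PMID also come back as one-element lists.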
Medical Subject Headings (MeSH) • MeSH was developed by the NLM to represent important concepts in biomedicine. • 18,000 subject headings grouped into one of 15 trees • Diseases Category • Cardiovascular Diseases • Vascular Diseases • Hypertension • Hypertension, Malignant • Hypertension, Pregnancy-Induced • Hypertension, Renal • Hypertension, Renovascular
Why Medline records have manually assigned controlled-vocabulary terms: • More thorough representation of the main concepts found in the paper • Facilitates retrieval by concept • Publication type (PT): more detailed publication types since 1991
Information Needs • Query formulation is the process by which information needs are translated into queries suitable for searching. • Person’s role in the healthcare process • Medical researchers • Clinicians • Generalist physician • Specialist
Query Formulation • Semiautomated • Boolean Queries • Natural Language Queries
Field qualification: a designation of which index or field should be searched • Text-word searching • Wildcard characters * #
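Field qualification and wildcard searching can be sketched together. The slide does not define the two wildcard characters, so this example assumes a common convention: `*` stands for any run of word characters and `#` for exactly one. The records and the helper names `wildcard_to_regex` and `search_field` are invented for illustration.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed convention: '*' = zero or more word characters,
    '#' = exactly one word character.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "#":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def search_field(records, field, pattern):
    """Return ids of records whose given field contains a word matching pattern."""
    rx = wildcard_to_regex(pattern)
    hits = []
    for rec in records:
        words = re.findall(r"\w+", rec.get(field, ""))
        if any(rx.fullmatch(word) for word in words):
            hits.append(rec["id"])
    return hits


# Hypothetical records, made up for this example.
RECORDS = [
    {"id": 1, "title": "Aspirin for prevention of heart attack", "author": "Woods"},
    {"id": 2, "title": "Preventing falls in women", "author": "Smith"},
    {"id": 3, "title": "A woman with hypertension", "author": "Jones"},
]

print(search_field(RECORDS, "title", "prevent*"))  # [1, 2]
print(search_field(RECORDS, "title", "wom#n"))     # [2, 3]
print(search_field(RECORDS, "author", "woods"))    # [1]
print(search_field(RECORDS, "title", "woods"))     # []
```

The last two calls show why field qualification matters: the same word finds a record when searched in the author field and nothing when searched in the title field.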
Retrieval • Retrieval: • Matching queries against the index • Ranking or sorting the output by some criteria • Displaying the results to the user
Retrieval • Matching • Queries are compared against the index, and a result set is created • Ranking • The original result set is sorted or ranked by criteria (chronology, alphabetic ranking, relevance ranking) • Display • The final result set is shown to the user
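The three retrieval steps can be sketched as three small functions. This is only an illustration under simple assumptions: the documents are invented, matching means "the title shares at least one word with the query", and relevance is just the count of matching query terms. Real systems use more elaborate scoring.

```python
# Hypothetical documents, made up for this example.
DOCS = [
    {"id": 1, "year": 2001, "title": "Aspirin in primary prevention"},
    {"id": 2, "year": 2005, "title": "Heart attack prevention with aspirin"},
    {"id": 3, "year": 1998, "title": "Management of heart failure"},
]


def match(docs, terms):
    """Matching: keep documents sharing at least one word with the query.

    Returns (score, doc) pairs, where score counts the matching query terms.
    """
    terms = [t.lower() for t in terms]
    result = []
    for doc in docs:
        words = doc["title"].lower().split()
        score = sum(1 for t in terms if t in words)
        if score:
            result.append((score, doc))
    return result


def rank(result, criterion):
    """Ranking: order the result set by one of the criteria named on the slide."""
    if criterion == "chronology":
        key = lambda pair: -pair[1]["year"]          # newest first
    elif criterion == "alphabetic":
        key = lambda pair: pair[1]["title"].lower()  # A to Z by title
    elif criterion == "relevance":
        key = lambda pair: -pair[0]                  # most matching terms first
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return [doc for _, doc in sorted(result, key=key)]


def display(docs):
    """Display: format the final result set for the user."""
    return [f'{d["id"]}. {d["title"]} ({d["year"]})' for d in docs]


hits = match(DOCS, ["aspirin", "heart", "attack"])
for line in display(rank(hits, "relevance")):
    print(line)
```

Separating the steps makes it easy to re-sort the same result set by a different criterion without running the match again.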
Evaluation and Feedback • Evaluation • Refinement
Evaluation • Recall = (Number of documents retrieved and relevant) / (Number of relevant documents in database) • Precision = (Number of documents retrieved and relevant) / (Number of documents retrieved)
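The two evaluation measures are simple set ratios: recall divides the relevant documents retrieved by all relevant documents in the database, and precision divides them by all documents retrieved. A minimal sketch follows; the document numbers in the example are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify.

```python
def recall_precision(retrieved, relevant):
    """Compute (recall, precision) for one search.

    retrieved: ids of the documents the search returned
    relevant:  ids of all relevant documents in the database
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved AND relevant
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: 3 documents retrieved, 4 relevant ones in the database.
recall, precision = recall_precision(retrieved=[1, 6, 9], relevant=[6, 9, 12, 15])
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.50 precision=0.67
```

In this example two of the three retrieved documents are relevant (precision 2/3), but the search found only two of the four relevant documents (recall 2/4).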
Recall and precision of clinician searchers at McMaster University (1990)
I. History and Background • INDEX MEDICUS • MEDLARS ONLINE • MEDLINE CD • MEDLINE WEB
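The U.S. Medical Literature Retrieval System • National Library of Medicine (NLM) http://www.nlm.nih.gov/ • Its predecessor was the Library of the Surgeon General's Office of the U.S. Army, founded in the 1830s. In 1922 it was renamed the Army Medical Library, and in 1952 it was renamed again as the Armed Forces Medical Library. In 1956 the U.S. President signed into law an act to "establish a national library of medicine, promote the progress of medicine, and improve the nation's health and welfare", and it formally became one of the three national libraries of the United States: the Library of Congress, the National Library of Medicine, and the National Agricultural Library. It is part of the National Institutes of Health (NIH).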
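INDEX MEDICUS (IM) • John Shaw Billings, an army surgeon in 1865, combined bibliography with his medical expertise and in 1879 compiled a guide to the periodical literature of the health sciences. This is Index Medicus (IM), the world's best-known bibliographic index journal for medical literature. It has used subject indexing (MeSH) since the 1920s. IM currently covers 3,419 biomedical and medicine-related scientific journals published in 44 languages in the world's major countries and regions (57 Chinese-language journals were included in 1999), adding about 300,000 citations a year, 88% of them in English.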
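William H. Welch, a leading figure in American medicine, once said that the United States made four great contributions to medicine in the nineteenth century: the development of anesthesia, the discovery that insects transmit disease, the establishment of the modern public health laboratory, and the development of the Surgeon General's Library together with the compilation of its index catalogue. He considered the last of these the most important of the four.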
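MEDLARS • Medical Literature Analysis and Retrieval System • In 1963 the U.S. National Library of Medicine formally completed the world's first computerized medical literature retrieval system. It is the most authoritative medical literature database retrieval system in the world today.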
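MEDLINE • In 1971 MEDLARS developed into an online retrieval system, Medline. It currently indexes 4,300 journals and holds more than 10 million records from 1966 to the present • In 1997 the Clinton administration announced free searching over the Internet through PubMed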
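Features of Web Medline searching: • Free citations and abstracts • Links to websites that provide the full text • Automatic mapping of search terms • Simple and fast to use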
MEDLINE Web (FREE) • PubMed (http://www.ncbi.nlm.nih.gov/PubMed) • Internet Grateful Med (http://www.grateful.com) • BioMedNet Evaluated Medline (http://www.bmn.com) • Healthgate Medline (http://www.healthgate.com)
About NCBI: The National Center for Biotechnology Information (Bethesda) • Created in 1988 as a part of the National Library of Medicine at NIH • Establish public databases • Research in computational biology • Develop software tools for sequence analysis • Disseminate biomedical information
WWW Access • Entrez & BLAST
NCBI Web Traffic [chart: users per day, 1998 to 2005, y-axis 100,000 to 600,000; labels: World Internet Users, US Internet Users; annotation: Christmas and New Year’s Days]
The (ever expanding) Entrez System [diagram: OMIM, PubMed, PubMed Central, 3D Domains, Journals, Structure, Books, CDD/CDART, Taxonomy, Protein, Genome, GEO/GDS, UniSTS, UniGene, Nucleotide, SNP, PopSet]
Entrez: Neighboring and Hard Links [diagram: PubMed abstracts, Taxonomy, Genomes, Nucleotide sequences, Protein sequences, 3-D Structure (MMDB); link labels: Word weight, Phylogeny, VAST, BLAST]
Part 1. The Databases • Part 2. Data Flow and Processing • Part 3. Querying and Linking the Data • Part 4. User Support • A part of the NCBI Bookshelf
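PubMed • Database producer: developed by the National Center for Biotechnology Information (NCBI). • Data types: journal articles, reviews, and links to other data resources.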
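Coverage • MEDLINE • More than 4,300 biomedical journals covering medicine, nursing, dentistry, veterinary medicine, health care systems, the preclinical sciences, and related fields. These journals come from the United States and more than 70 other countries and regions. • 11 million records, going back to 1966.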
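In-process citations • Records that MEDLINE has not yet fully indexed. • Once MeSH terms have been assigned, the records are added to MEDLINE. • These records carry the tag [record in process].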
Publisher Supplied Citations • Publishers supply electronic records directly to PubMed • Includes some records not indexed in MEDLINE
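Differences between PubMed and MEDLINE • Broader coverage: includes non-medical journals (physics, astronomy, chemistry, etc.) from which MEDLINE indexes only some life-science articles • Newer records: the latest citations, before they are indexed for MEDLINE • More complete document types: links to the electronic full text (some free)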