
Evidence-Based Information Retrieval in Bioinformatics

Timothy B. Patrick, PhD, Healthcare Administration and Informatics, University of Wisconsin-Milwaukee


Presentation Transcript


  1. Evidence-Based Information Retrieval in Bioinformatics • Timothy B. Patrick, PhD • Healthcare Administration and Informatics, University of Wisconsin-Milwaukee

  2. Goal of the Project • The overall, long-term goal of this research project is to contribute to evidence-based information retrieval in post-genomic medicine • Evidence: proof of the effectiveness of the way particular information resources are used and combined in order to retrieve that information

  3. Aims • Specific Aim 1: Determine existing pitfalls in accessing literature on gene function • Specific Aim 2: Based on user warrant, determine the current state of evidence-based functional genomic retrieval • Specific Aim 3: Based on literary warrant, determine the current state of evidence-based functional genomic retrieval

  4. “Determine existing pitfalls in accessing literature on gene function” • That is the topic of my talk later today. • “Asymmetries in Retrieval of Gene Function Information”

  5. The Study • We investigated an example of different paths to the literature that may look equivalent to a user but are not, because of various features of the resources involved. • Knowing that the paths are not equivalent requires knowledge of metadata about the resources.

  6. Three Paths • Path 1: Affymetrix → Genbank Accession number → Pubmed → Pubmed ID • Path 2: Affymetrix → Genbank Accession number → Nucleotide → Pubmed links → Pubmed → Pubmed ID • Path 3: Affymetrix → Genbank Accession number → Gene → Pubmed links → Pubmed → Pubmed ID

  7. http://www.affymetrix.com/corporate/media/genechip_essentials/gene_expression/Features_and_probes.affx

  8. Three Paths • Path 1: Affymetrix → Genbank Accession number → Pubmed → Pubmed ID • Path 2: Affymetrix → Genbank Accession number → Nucleotide → Pubmed links → Pubmed → Pubmed ID • Path 3: Affymetrix → Genbank Accession number → Gene → Pubmed links → Pubmed → Pubmed ID

  9. Methods • We first collected representative DNA Accession numbers associated with genes expressed in a microarray experiment designed to identify changes in gene expression associated with skeletal muscle recovery from immobilization-induced sarcopenia.

  10. Methods • Next, we retrieved the Unique Identifiers (UIs) of the Entrez Pubmed citations that each of the three Entrez resources associated with the Accession numbers: • Directly in the case of Entrez Pubmed • Indirectly, via Pubmed links, in the case of Entrez Nucleotide and Entrez Gene • Finally, we compared the number of Pubmed IDs retrieved by the three resources for each of the Accession numbers.
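
The comparison step on this slide can be sketched in Python. This is a minimal illustration and not the authors' procedure: the accession numbers and Pubmed IDs are made-up placeholders, and the function names `count_ids_by_path` and `non_equivalent_accessions` are hypothetical.

```python
# Hypothetical data: Pubmed IDs retrieved for each accession number by each path.
# Accession numbers and IDs are placeholders, not results from the study.
PATHS = ("Pubmed", "Nucleotide", "Gene")

retrieved = {
    "ACC0001": {"Pubmed": {"101", "102"}, "Nucleotide": {"101"}, "Gene": {"101", "102", "103"}},
    "ACC0002": {"Pubmed": set(), "Nucleotide": {"201"}, "Gene": {"201"}},
    "ACC0003": {"Pubmed": {"301"}, "Nucleotide": {"301"}, "Gene": {"301"}},
}


def count_ids_by_path(retrieved):
    """Return {accession: (n_pubmed, n_nucleotide, n_gene)}."""
    return {
        acc: tuple(len(by_path[p]) for p in PATHS)
        for acc, by_path in retrieved.items()
    }


def non_equivalent_accessions(retrieved):
    """List accessions for which the three paths return different ID sets."""
    return sorted(
        acc
        for acc, by_path in retrieved.items()
        if len({frozenset(by_path[p]) for p in PATHS}) > 1
    )


counts = count_ids_by_path(retrieved)
differing = non_equivalent_accessions(retrieved)
```

Comparing sets as well as counts is a design choice of this sketch: two paths can return the same number of IDs while returning different citations.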

  11. Summary of Pubmed IDs by Accession Number [table; columns: Pubmed, Nucleotide, Gene]

  12. Methods • Compared number of Pubmed ID’s produced for each Accession number by each path. • Applied non-parametric test: Kendall’s W • Pubmed versus Nucleotide versus Gene • p < .05
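
As a rough illustration of the statistic named on this slide, the sketch below computes Kendall's W with the usual tie correction, treating each Accession number as a block that ranks the three paths by number of Pubmed IDs. The slides do not say how ties were handled or how the p-value was obtained, so both are assumptions here; the counts are invented, and `p_value_three_paths` uses the chi-square approximation, which is only valid for exactly three paths (2 degrees of freedom) and is crude for small samples.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(rows):
    """Tie-corrected Kendall's W for m rows (blocks) over k columns (paths)."""
    m, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    mean_sum = m * (k + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    denom = m ** 2 * (k ** 3 - k) - m * tie_term
    if denom == 0:
        raise ValueError("every row is fully tied; W is undefined")
    return 12 * s / denom


def p_value_three_paths(w, m):
    """Chi-square approximation with 2 degrees of freedom (k = 3 only)."""
    chi2 = m * 2 * w
    return math.exp(-chi2 / 2)


# Hypothetical counts per accession: (Pubmed, Nucleotide, Gene).
counts = [(5, 2, 1), (9, 4, 3), (7, 1, 0), (6, 3, 2)]
w = kendalls_w(counts)
p = p_value_three_paths(w, len(counts))
```

A W near 1 means the accessions agree on which path returns more Pubmed IDs, which is evidence of a systematic difference between the paths; a W near 0 means no consistent ordering.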

  13. The Three Paths Are Not Equivalent • Path 1 (Affymetrix → Genbank Accession number → Pubmed → Pubmed ID) ≠ Path 2 (Affymetrix → Genbank Accession number → Nucleotide → Pubmed links → Pubmed → Pubmed ID) ≠ Path 3 (Affymetrix → Genbank Accession number → Gene → Pubmed links → Pubmed → Pubmed ID)

  14. The SI field identifies secondary source databanks and accession numbers of outside resources discussed in MEDLINE articles. The field is composed of the source followed by a slash followed by an accession number and can be searched with one or both components, e.g., genbank [si], AF001892 [si], genbank/AF001892 [si]. The SI field and the Entrez sequence database links are not linked. The PubMed links to these databases are created from the reference field of the GenBank or GenPept flat file. These references include citations that discuss the specific sequence presented in these flat files. http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.box.pubmedhelp.Box_1_Search_Field_D#pubmedhelp.Secondary_Source_ID_
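
The two mechanisms described on this slide, searching the SI field versus following database links, can be contrasted in a short sketch. It assumes the NCBI E-utilities endpoints `esearch.fcgi` and `elink.fcgi` as a programmatic stand-in for the Entrez web pages; the slides do not say how the searches were actually run. The helper names are hypothetical, and the UID `12345` is a placeholder, not a real record.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def si_terms(source, accession):
    """The three SI-field query forms quoted on the slide."""
    return [
        f"{source} [si]",
        f"{accession} [si]",
        f"{source}/{accession} [si]",
    ]


def pubmed_si_search_url(term):
    """Direct path: search Pubmed itself on the Secondary Source ID field."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": "pubmed", "term": term})


def pubmed_links_url(dbfrom, uid):
    """Indirect path: follow Pubmed links from a sequence or gene record."""
    query = urlencode({"dbfrom": dbfrom, "db": "pubmed", "id": uid})
    return f"{EUTILS}/elink.fcgi?" + query


terms = si_terms("genbank", "AF001892")
direct_url = pubmed_si_search_url(terms[2])
linked_url = pubmed_links_url("nuccore", "12345")  # placeholder UID
```

Because the SI field and the sequence database links are populated from different sources, the two URLs can legitimately return different citation sets for the same accession number.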

  15. “Based on user warrant, determine the current state of evidence-based functional genomic retrieval” • Interviews with biologists who use microarrays to study gene expression levels • Questions concern which IR methods they use, why they consider those methods effective, what their criteria of success and failure are, and how they see the role of biomedical librarians in the process

  16. Interviews in Progress • Five interviews currently scheduled at the University of Missouri-Columbia • Interviews being scheduled at University of Wisconsin-Milwaukee • In March we interviewed two subjects at NIG in Japan

  17. “Based on literary warrant, determine the current state of evidence-based functional genomic retrieval” • We wanted to investigate how and to what extent biological science researchers reported their information retrieval methods, including details of why they used the methods they did.

  18. Methods • We searched OVID Medline on October 1, 2004 for the period 1966 to September Week 4 2004 with the query “Oligonucleotide Array Sequence Analysis/”, producing 10746 results. • We then limited the results to English (10374), excluded “review articles” (9049), and limited to the years 2003 – 2004 (4798). We next ranked journals in the results by number of articles, and selected a population of all of the articles from the 13 top journals (n=1373). We randomly sampled 150 articles from that population.
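
The selection steps on this slide (rank journals by article count, keep every article from the top journals, then draw a random sample) can be sketched as follows. The record layout, the function names, and the fixed seed are assumptions made for illustration; the toy records stand in for the 4798 filtered citations, from which the study kept the 13 top journals and sampled 150 articles.

```python
import random
from collections import Counter


def top_journal_population(records, n_journals):
    """Keep all records whose journal is among the n most frequent journals."""
    counts = Counter(r["journal"] for r in records)
    top = {journal for journal, _ in counts.most_common(n_journals)}
    return [r for r in records if r["journal"] in top]


def sample_articles(population, k, seed=0):
    """Draw k distinct articles; a fixed seed makes the draw repeatable."""
    return random.Random(seed).sample(population, k)


# Toy stand-in for the filtered Medline results (hypothetical records).
records = (
    [{"id": f"A{i}", "journal": "Journal A"} for i in range(5)]
    + [{"id": f"B{i}", "journal": "Journal B"} for i in range(3)]
    + [{"id": "C0", "journal": "Journal C"}]
)

population = top_journal_population(records, n_journals=2)
sample = sample_articles(population, k=4)
```

With the study's figures the calls would be `top_journal_population(records, 13)` followed by `sample_articles(population, 150)`.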

  19. Methods • If the authors of the paper did report gene function, we wanted to know which information sources and retrieval methods they used, as well as the reasons they had for using them. • Functional Attribution Reported • Sources of Information Reported • Retrieval Strategy Reported • Grounds for Choice of Sources Reported • Grounds for Retrieval Strategy Reported

  20. Methods • How were details of sources and retrieval methods reported? • Methods or Procedures • Results • Discussion
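
One way to record the coding scheme from the two Methods slides above is a small record per sampled article. This is a hypothetical data structure, not the instrument the authors used; the field names paraphrase the slide bullets, and the three example codings are invented.

```python
from collections import Counter
from dataclasses import dataclass

ITEMS = (
    "sources_reported",
    "retrieval_strategy_reported",
    "grounds_for_sources_reported",
    "grounds_for_strategy_reported",
)


@dataclass
class ArticleCoding:
    article_id: str
    functional_attribution: bool
    sources_reported: bool = False
    retrieval_strategy_reported: bool = False
    grounds_for_sources_reported: bool = False
    grounds_for_strategy_reported: bool = False
    # Sections where details appear: "Methods or Procedures", "Results", "Discussion"
    reported_in: tuple = ()


def tally(codings):
    """Count coded items among articles that report gene function."""
    eligible = [c for c in codings if c.functional_attribution]
    counts = Counter()
    for coding in eligible:
        for item in ITEMS:
            if getattr(coding, item):
                counts[item] += 1
        for section in coding.reported_in:
            counts["in " + section] += 1
    return len(eligible), counts


# Invented codings for illustration only.
codings = [
    ArticleCoding("art-1", True, sources_reported=True, reported_in=("Results",)),
    ArticleCoding("art-2", True, sources_reported=True,
                  retrieval_strategy_reported=True, reported_in=("Discussion",)),
    ArticleCoding("art-3", False),
]
n_eligible, counts = tally(codings)
```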

  21. Results • Typical evidence for attribution of gene function consists of literature citations. • When a literature search (e.g. Pubmed search), or a search of other knowledge sources (e.g. NCBI databases), is cited as the source of evidence to support attribution of function, details of the search are rarely reported. • Reasons for using sources and retrieval methods are not reported.

  22. Results • When information retrieval methods are described in the paper, they are typically mentioned only in the “Results” or “Discussion” sections of the paper, and not in the “Methods” section. • Wet bench methods are reported in more detail than dry bench methods.

  23. Implications for Information Practice

  24. Implications for Information Practice • There is a need to embrace a workflow concept • There is a need to develop standards for documentation in e-science • There is a need to use multidisciplinary teams to develop workflows

  25. “There is a need to embrace a workflow concept” • Call a scenario in which multiple information resources, databases, and analysis tools are used in combination a workflow • Workflows are increasingly important for information retrieval and processing in the Life Sciences
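
To make the workflow idea concrete, the sketch below records a retrieval workflow as an ordered list of documented steps and serializes it to JSON. The classes `WorkflowStep` and `Workflow` are hypothetical and do not represent any existing standard; the example steps reuse the OVID Medline search reported on slide 18.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WorkflowStep:
    resource: str      # database or tool used
    action: str        # what was done, e.g. a query or a limit
    date: str          # when the step was run (ISO 8601)
    result_count: int  # number of records after this step


@dataclass
class Workflow:
    name: str
    steps: list = field(default_factory=list)

    def add(self, step):
        self.steps.append(step)
        return self

    def to_json(self):
        """Serialize the whole workflow so it can be archived or published."""
        return json.dumps(asdict(self), indent=2)


wf = Workflow("Literary warrant sample")
wf.add(WorkflowStep("OVID Medline", "Oligonucleotide Array Sequence Analysis/",
                    "2004-10-01", 10746))
wf.add(WorkflowStep("OVID Medline", "limit to English", "2004-10-01", 10374))
wf.add(WorkflowStep("OVID Medline", "exclude review articles", "2004-10-01", 9049))
wf.add(WorkflowStep("OVID Medline", "limit to years 2003-2004", "2004-10-01", 4798))

documented = json.loads(wf.to_json())
```

Recording the result count at each step lets a reader check that a rerun of the same workflow reproduces the same narrowing of the result set.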

  26. “There is a need to develop standards for documentation in e-science” • Traditional Science • Computer-based information retrieval and processing • The Digitization of Science, or E-science

  27. Life Science Information Retrieval and Processing Workflows

  28. Life Science Information Retrieval and Processing Workflows • documentation

  29. Life Science Information Retrieval and Processing Workflows • documentation • technology to facilitate documentation

  30. Life Science Information Retrieval and Processing Workflows • documentation • editorial policy drivers • technology to facilitate documentation

  31. “There is a need to use multidisciplinary teams to develop workflows” • KNOWLEDGE-ENABLED WORKFLOWS • METADATA • TOOLS • INFORMATION ITEMS

  32. KNOWLEDGE-ENABLED WORKFLOWS • METADATA • TOOLS • INFORMATION ITEMS • domain expert (scientist)

  33. KNOWLEDGE-ENABLED WORKFLOWS • METADATA • TOOLS • INFORMATION ITEMS • domain metadata expert (information specialist) • domain expert (scientist)

  34. KNOWLEDGE-ENABLED WORKFLOWS • METADATA: domain metadata expert (information specialist) • TOOLS: domain expert (scientist) • INFORMATION ITEMS
