50 likes | 57 Vues
Explore how to efficiently manage and utilize the growing volume of unstructured supplementary data within UKPMC using a semantically-driven search approach. Enhance search capabilities, organize data, and enable contextual mapping with parent articles for better research insights.
E N D
UKPMC Supplementary Data Vic Lyte 28th April 2010
Background • Currently there is 277 GB of supplementary data within UKPMC and growing; • From 1.7M documents, 88,652 have 1 or more items of SD; • Consists of additional files that that the author has uploaded and feels add contextual richness to their article deposition process; • Individual documents are systematically marked up and tagged, supplementary data is not and exists in an unstructured form within a directory location attached to a given article; • Text & Data-mining initiatives offer cross aggregation and semantic views on document corpus but not extend to supplementary data due to its unstructured and granular nature.
Background • No plans to manually mark up this additional resource of file due to multifarious range of file format (n= 290) and idiosyncratic nature of data artefacts - wider provenance issue; • This presents a challenge in the exposing and aggregation management of these rich assets other than a direct 1 to 1 relationship with their parent article; • As this sub-corpus continues to grow there is benefit in exploring techniques offering a way to bring this potentially hidden material into an overall semantic search strategy;
Scenario • A researcher conducting a meta-analysis on RCT's related to pain management may want to identify: • what studies have been conducted in this area () • which semantic groupings occur from the document corpus in relation to 'perception from a psychological perspective' () • what questionnaires and associated data has been made available in the corresponding area of inquiry (X) • Currently not possible to cluster and group across the sub-corpus to achieve the last area due to these items being in the supplementary data layer.
R&D ActivityComplementary approach • Unstructured search approach; • Similar discovery paradigm in other knowledge sectors; • Use Autonomy IDOL to investigate how it can organise and expose SD within context (semantically-driven search); • Proven Data agnostic search and mining capability; • Contextual mapping with parent article(s) and associated data; • Machine-driven taxonomies and clustering; • Automatic metadata generation. • Development of a ‘Proof-of-Concept’ demonstrator;