UKPMC Supplementary Data

UKPMC Supplementary Data Vic Lyte 28th April 2010

Background
• There is currently 277 GB of supplementary data (SD) within UKPMC, and it is growing;
• Of 1.7M documents, 88,652 have one or more items of SD;
• SD consists of additional files that the author has uploaded and feels add contextual richness to their deposited article;
• Individual documents are systematically marked up and tagged; supplementary data is not, and exists in an unstructured form within a directory location attached to a given article;
• Text and data-mining initiatives offer cross-aggregation and semantic views on the document corpus, but these do not extend to supplementary data because of its unstructured and granular nature.

Background
• There are no plans to manually mark up this additional resource of files, owing to the multifarious range of file formats (n = 290) and the idiosyncratic nature of the data artefacts (a wider provenance issue);
• This presents a challenge in exposing these rich assets and managing their aggregation, other than through a direct one-to-one relationship with their parent article;
• As this sub-corpus continues to grow, there is benefit in exploring techniques that offer a way to bring this potentially hidden material into an overall semantic search strategy.

Scenario
• A researcher conducting a meta-analysis of RCTs related to pain management may want to identify:
  • what studies have been conducted in this area (✓)
  • which semantic groupings occur from the document corpus in relation to 'perception from a psychological perspective' (✓)
  • what questionnaires and associated data have been made available in the corresponding area of inquiry (X)
• It is currently not possible to cluster and group across the sub-corpus to achieve the last of these, because these items sit in the supplementary data layer.

R&D Activity: Complementary approach
• Unstructured search approach;
• Similar discovery paradigm in other knowledge sectors;
• Use Autonomy IDOL to investigate how it can organise and expose SD within context (semantically driven search);
• Proven data-agnostic search and mining capability;
• Contextual mapping with parent article(s) and associated data;
• Machine-driven taxonomies and clustering;
• Automatic metadata generation;
• Development of a 'proof-of-concept' demonstrator.
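As a rough illustration of the "contextual mapping with parent article(s)" idea, the sketch below inventories supplementary files per article and counts file formats by extension. It is not Autonomy IDOL and does not reflect the actual UKPMC storage layout: the one-subdirectory-per-article structure and the function name `inventory_supplementary_data` are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
from pathlib import Path


def inventory_supplementary_data(root):
    """Map each article to its supplementary files and count file formats.

    Assumption (hypothetical layout): `root` contains one subdirectory per
    article, named by article ID, holding that article's supplementary files.
    """
    files_by_article = defaultdict(list)
    format_counts = Counter()
    for article_dir in sorted(Path(root).iterdir()):
        if not article_dir.is_dir():
            continue  # ignore stray files that sit outside any article directory
        for path in sorted(article_dir.rglob("*")):
            if path.is_file():
                files_by_article[article_dir.name].append(path.name)
                # The lower-cased extension is only a rough proxy for file format.
                format_counts[path.suffix.lower() or "(none)"] += 1
    return dict(files_by_article), format_counts
```

Calling `inventory_supplementary_data("/path/to/sd_root")` (a placeholder path) returns the article-to-files mapping and a `Counter` of extensions, which gives a first view of how varied the formats are before any semantic indexing is attempted.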
