
SAIL: Documenting data content and quality, letting the computer take the strain




Presentation Transcript


  1. SAIL: Documenting data content and quality, letting the computer take the strain
     Caroline Brooks, Senior Research Analyst, College of Medicine, Swansea University
     Ann Wrightson, Lead Technical Design Architect, NHS Wales Informatics Service; Hon. Research Associate, College of Medicine, Swansea University

  2. Swansea Health Informatics Research & NWIS
     • Partners in establishing and sustaining SAIL (the Secure Anonymised Information Linkage Databank)
     • Wider collaboration in usability testing and innovation
     • Sharing skills & thinking around secondary uses of data

  3. Ideas and facts
     • General approaches in data research:
       • People have ideas and test them using the available facts
       • Ideas come from the available facts
     • But facts are not so easy to see in the data!
     • Researchers need help...
       • Which data resources contain the facts I need?
       • What do I need to know about this data to use it well?

  4. What’s in this repository, anyway?
     • Dataset level: catalogue
       • What / from where / from whom / how collected / rights to use
     • Record level: dataset entry description
       • Data model (entity-relationship model)
     • Item level: field/attribute description
       • Data types / ranges / controlled terms
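The three levels of description on this slide map naturally onto a nested data structure. The sketch below is a minimal illustration in Python, assuming hypothetical class and field names (`DatasetEntry`, `RecordDescription`, `ItemDescription`); it is not SAIL's actual catalogue schema. Because the item level carries the data type, valid range and controlled terms, a value can be checked against its own documentation.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical classes illustrating the three documentation levels;
# names and fields are assumptions, not SAIL's catalogue schema.
@dataclass
class ItemDescription:
    """Item level: one field/attribute and its documented constraints."""
    name: str
    data_type: str                                  # e.g. "integer", "date", "code"
    valid_range: Optional[tuple[Any, Any]] = None   # inclusive (min, max), if ordered
    controlled_terms: Optional[frozenset] = None    # allowed codes, if any

    def accepts(self, value: Any) -> bool:
        """True if a present value conforms to the documented range/terms."""
        if value is None:
            return True  # absence is a population question, not a validity one
        if self.controlled_terms is not None and value not in self.controlled_terms:
            return False
        if self.valid_range is not None:
            low, high = self.valid_range
            return low <= value <= high
        return True


@dataclass
class RecordDescription:
    """Record level: one entity in the dataset's entity-relationship model."""
    entity: str
    items: list[ItemDescription] = field(default_factory=list)
    related_entities: list[str] = field(default_factory=list)


@dataclass
class DatasetEntry:
    """Dataset level: one catalogue entry."""
    title: str
    provider: str            # from where / from whom
    collection_method: str   # how collected
    rights_to_use: str
    records: list[RecordDescription] = field(default_factory=list)

    def find_item(self, entity: str, item: str) -> Optional[ItemDescription]:
        """Look up the description of one field of one entity, if documented."""
        for rec in self.records:
            if rec.entity == entity:
                for it in rec.items:
                    if it.name == item:
                        return it
        return None
```

For example, an `age` item documented with `valid_range=(0, 120)` would accept 45 and reject 130, while a missing value is left to the separate question of population.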

  5. How good is this data? What can it do for me?
     • Item
       • Population of this field/attribute (how often it is filled in): why present? Why absent?
       • Significance of this field/attribute: what does it mean for me?
     • Record
       • Evidential value of the presence and/or absence of a particular record
     • Dataset
       • What work has already been done with this data?
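Item-level population is straightforward to compute automatically. The following is a minimal Python sketch, assuming records arrive as dictionaries and that a missing key, `None` or a blank string all count as absent (a real dataset may use its own missing-value codes, which would need adding to `is_present`). Breaking population down by a second field, such as year of record, is one way to start asking why a field is absent: a gap confined to some groups suggests a systematic cause rather than random omission.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable, Mapping
from typing import Any


def is_present(value: Any) -> bool:
    """Assumed rule: None and blank/whitespace-only strings count as absent."""
    return value is not None and not (isinstance(value, str) and not value.strip())


def item_population(records: Iterable[Mapping[str, Any]], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each listed field is populated."""
    present = dict.fromkeys(fields, 0)
    total = 0
    for rec in records:
        total += 1
        for f in fields:
            if is_present(rec.get(f)):
                present[f] += 1
    return {f: (present[f] / total if total else 0.0) for f in fields}


def population_by_group(
    records: Iterable[Mapping[str, Any]], field_name: str, group_field: str
) -> dict[Hashable, float]:
    """Population of one field, stratified by the value of another (e.g. year)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [present, total]
    for rec in records:
        tally = counts[rec.get(group_field)]
        tally[1] += 1
        if is_present(rec.get(field_name)):
            tally[0] += 1
    return {group: p / t for group, (p, t) in counts.items()}
```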

  6. Work already done: www.saildatabank.org
     • SAIL Databank website includes a human-readable dataset catalogue
       • Description, source, related publications, data model
     • Data Quality report (developed by the SAIL team in 2013)
       • Standardized, informative documentation for each dataset
       • Produced by automated analysis of the data, published as PDF
     • Working with Canadian colleagues (MCHP and Pop Data BC)
     • Technology refresh of the SAIL platform (CIPHER project, 2013-14)
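The idea of producing standardized documentation by automated analysis can be sketched in a few lines. The Python below is a minimal, hypothetical profiler, not the SAIL team's actual report generator: it assumes each dataset is supplied as a mapping from column name to a list of scalar values, and it emits a plain-text summary in a fixed layout rather than a PDF. Because every dataset passes through the same function, every report has the same structure, which is what makes such documentation comparable across datasets.

```python
from collections import Counter
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarize one column: counts, distinct values, and range or top values."""
    present = [v for v in values if v is not None and v != ""]
    summary: dict[str, Any] = {
        "n": len(values),
        "present": len(present),
        "distinct": len(set(present)),  # assumes hashable scalar values
    }
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if present and len(numeric) == len(present):
        # Entirely numeric column: report its observed range.
        summary["min"], summary["max"] = min(numeric), max(numeric)
    else:
        # Otherwise treat it as categorical: report the three most frequent values.
        summary["top"] = Counter(map(str, present)).most_common(3)
    return summary


def quality_report(dataset_name: str, columns: dict[str, list[Any]]) -> str:
    """Render the same fixed-layout summary line for every column of a dataset."""
    lines = [f"Data quality summary: {dataset_name}"]
    for name, values in columns.items():
        s = profile_column(values)
        pct = 100.0 * s["present"] / s["n"] if s["n"] else 0.0
        line = f"{name}: {s['present']}/{s['n']} populated ({pct:.1f}%), {s['distinct']} distinct"
        if "min" in s:
            line += f", range {s['min']} to {s['max']}"
        elif s["top"]:
            line += ", most common: " + ", ".join(f"{v} ({c})" for v, c in s["top"])
        lines.append(line)
    return "\n".join(lines)
```

Rendering the same summary to PDF, HTML or a machine-readable format is then a presentation step layered on top of one shared analysis.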

  7. Work in progress
     • Machine-readable format for catalogue and data quality information
       • Data Documentation Initiative (DDI) format
       • Initial target: publish on website as download link in catalogue
     • Making outcomes of in-depth data quality work available for reuse
       • Algorithms that instantiate clinical & social research concepts
       • Evaluation of data coverage across populations of individuals
     • Knowledge sharing with NWIS data warehouse team
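To show what a machine-readable catalogue entry might look like, the Python below builds a simplified XML fragment using element names from the DDI Codebook specification as we understand it (`codeBook`, `stdyDscr`, `citation`, `titlStmt`, `titl`, `dataDscr`, `var`, `labl`, `sumStat`). It is a sketch only: the DDI namespace, identifiers and other required elements are omitted, so the output is not schema-valid as it stands and should be checked against the official DDI schema before use. The function name and the shape of its input dictionaries are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any


def ddi_codebook_fragment(title: str, variables: list[dict[str, Any]]) -> str:
    """Build a simplified, DDI-Codebook-style XML description of a dataset.

    Each variable dict is assumed (hypothetically) to hold "name", "label",
    "valid" (count of valid values) and "invalid" (count of invalid/missing).
    Namespace, IDs and other required DDI elements are deliberately omitted.
    """
    codebook = ET.Element("codeBook")

    # Study description: here just the dataset title.
    stdy_dscr = ET.SubElement(codebook, "stdyDscr")
    citation = ET.SubElement(stdy_dscr, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title

    # Data description: one <var> per field, carrying simple quality counts.
    data_dscr = ET.SubElement(codebook, "dataDscr")
    for v in variables:
        var = ET.SubElement(data_dscr, "var", name=v["name"])
        ET.SubElement(var, "labl").text = v["label"]
        ET.SubElement(var, "sumStat", type="vald").text = str(v["valid"])
        ET.SubElement(var, "sumStat", type="invd").text = str(v["invalid"])

    return ET.tostring(codebook, encoding="unicode")
```

Generating the valid/invalid counts from the same automated analysis that produces the human-readable quality report would keep the two forms of documentation consistent with each other.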

  8. Future directions
     • Further work on characterizing concepts in data: reproducible, reusable
     • How to make good use of SNOMED CT in source data
       • New knowledge & skills needed, also issues with old/new data
       • NWIS also working on this, another good area for collaboration
     • More general use of knowledge models alongside data
       • Comprehensive & integrated metadata reference architecture
       • Data annotation, e.g. using biomedical science ontologies
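One way to make a research concept reproducible and reusable is to define it as a named, versioned, immutable code list and apply it mechanically. The Python sketch below assumes hypothetical event records carrying `person_id`, `code_system` and `code` keys, and it uses placeholder codes rather than real SNOMED CT concepts. It matches only the codes listed explicitly; a real SNOMED CT definition would usually also need to include descendant concepts from the terminology's hierarchy, which this sketch does not attempt.

```python
from collections.abc import Iterable, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptDefinition:
    """A hypothetical named, versioned research concept defined as a code list."""
    name: str
    version: str
    code_system: str        # e.g. "SNOMED CT" or a local coding scheme
    codes: frozenset[str]   # explicitly enumerated codes only (no hierarchy expansion)

    def matches(self, code: str) -> bool:
        return code in self.codes


def people_with_concept(
    events: Iterable[Mapping[str, str]], concept: ConceptDefinition
) -> set[str]:
    """IDs of people with at least one event coded to the concept.

    Each event is assumed to carry "person_id", "code_system" and "code" keys;
    codes from a different code system are ignored rather than cross-mapped.
    """
    return {
        e["person_id"]
        for e in events
        if e["code_system"] == concept.code_system and concept.matches(e["code"])
    }
```

Freezing the definition and recording its version means a published result can state exactly which code list produced it, and the same definition can be re-run unchanged on another dataset.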

  9. Thank you for your attention
