Join us for a comprehensive workshop focused on data management strategies for researchers. Held in London on June 30, 2009, and in Manchester on July 1, 2009, this session covers essential practices in data quality control, effective storage formats, versioning, and ensuring the authenticity of research data. Understand proper data handling to improve research quality, facilitate future use, and support data sharing. Gain insights into maintaining data integrity, preventing loss, and applying standard procedures throughout your research lifecycle.
Data quality control, Data formats and preservation, Versioning and authenticity, Data storage • Managing research data well workshop • London, 30 June 2009 • Manchester, 1 July 2009
Good data management • good research • high quality data • needs to be planned • specific for purpose • data can be understood and used now and in future • data can then be shared and re-used
Quality control Data quality control at various stages: • data collection • e.g. instrument calibration; expert opinion; multiple measurements; computer assisted interviews • data entry, digitisation, transcription and coding - standardised and consistent procedures • e.g. set up validation rules for data entry; use input masks; detailed variable labelling; missing value coding; use controlled vocabularies or choice lists; best structure to organise data and data files • data checking and verifying - automated and/or manual • e.g. double entry; check for out-of-range values; apply random sample validation; statistical analyses (descriptives, frequencies, means, range, clustering) to detect errors or find anomalous values; verify data completeness
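Two of the checks listed on this slide, out-of-range values and double entry, can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the function names, the `age` field, the 0 to 120 range and the sample records are all hypothetical.

```python
def find_out_of_range(records, field, low, high):
    """Return (row_index, value) for values that are missing or outside [low, high]."""
    problems = []
    for i, rec in enumerate(records):
        value = rec.get(field)
        if value is None or not (low <= value <= high):
            problems.append((i, value))
    return problems


def compare_double_entry(first_pass, second_pass):
    """Return (row_index, field, first_value, second_value) for each disagreement
    between two independent entries of the same records."""
    if len(first_pass) != len(second_pass):
        raise ValueError("the two passes contain different numbers of records")
    mismatches = []
    for i, (a, b) in enumerate(zip(first_pass, second_pass)):
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((i, field, a.get(field), b.get(field)))
    return mismatches


# Hypothetical survey records, keyed in twice by different people
entry_1 = [{"id": 1, "age": 34}, {"id": 2, "age": 212}, {"id": 3, "age": 58}]
entry_2 = [{"id": 1, "age": 34}, {"id": 2, "age": 21}, {"id": 3, "age": 58}]

print(find_out_of_range(entry_1, "age", 0, 120))  # [(1, 212)]
print(compare_double_entry(entry_1, entry_2))     # [(1, 'age', 212, 21)]
```

Here both checks flag the same record: the range check shows that 212 is implausible, and the double-entry comparison suggests the intended value was probably 21, which would still need verifying against the source.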
Data formats • choice of software format for digital data: • planned data analyses • software availability • hardware used • discipline specific standards and customs • digital data software dependent • digital data endangered by obsolescence of software/hardware • best formats for long-term preservation - standard formats, interchangeable formats, open formats • e.g. tab-delimited; comma-delimited (CSV); ASCII; OpenDocument format; SPSS portable; XML
Data format conversions • convert data for preservation or back-up, e.g. export, save as • beware of conversion errors: • loss of internal metadata • e.g. convert MS Access to tab-delimited tables • loss of editing, formatting, formulae • e.g. convert MS Word to RTF • truncation or loss of data • e.g. string variables lost in SPSS to Stata conversion • check for errors and changes after conversion • Example 1: MS Excel to tab-delimited • Example 2: Word to XML • Example 3: proprietary audio file (DVF) to WAV
Example 1 (figure): the same data shown in MS Excel format and in tab-delimited text format
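Checking for errors and changes after a conversion can be partly automated by comparing the converted file with the original, cell by cell. The sketch below uses Python's standard `csv` module to write and re-read tab-delimited text. The helper names and the sample rows are hypothetical, and the "truncated" table only simulates a converter that cuts strings to eight characters.

```python
import csv
import io


def write_tab_delimited(rows, handle):
    """Write rows as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


def read_tab_delimited(handle):
    """Read tab-delimited text back into a list of rows (all values as strings)."""
    return [row for row in csv.reader(handle, delimiter="\t")]


def conversion_differences(original_rows, converted_rows):
    """List every difference found between the original and the converted table."""
    diffs = []
    if len(original_rows) != len(converted_rows):
        diffs.append(("row count", None, len(original_rows), len(converted_rows)))
    for r, (orig, conv) in enumerate(zip(original_rows, converted_rows)):
        if len(orig) != len(conv):
            diffs.append((r, "column count", len(orig), len(conv)))
        for c, (o, v) in enumerate(zip(orig, conv)):
            if str(o) != v:
                diffs.append((r, c, o, v))
    return diffs


original = [
    ["id", "comment"],
    ["1", "Eats breakfast daily"],
    ["2", "Skips lunch on workdays"],
]

# Round trip through tab-delimited text: nothing should change
buffer = io.StringIO()
write_tab_delimited(original, buffer)
buffer.seek(0)
round_trip = read_tab_delimited(buffer)
print(conversion_differences(original, round_trip))  # []

# Simulated faulty conversion that truncates strings to 8 characters
truncated = [[cell[:8] for cell in row] for row in original]
for diff in conversion_differences(original, truncated):
    print(diff)
```

A check like this catches truncation and lost rows or columns. It does not detect lost formatting, formulae or internal metadata, which still need a manual review.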
Version control • keep track of different copies or versions of data files • choice of method depends on: • single site vs. across locations • single vs. multiple users • different versions to be stored vs. files to be synchronised • single user of data files: • file naming – unique file names with date or version number (avoid spaces!) e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTests_06-04-2008; BGHSurveyProcedures_00_04 • version control table or file history within or alongside data file • version control facility within software, e.g. MS Windows software • multiple users of data files: • same as above • control rights to file editing: read/write permissions, e.g. Windows Explorer • versioning/file sharing software: check files out/in, e.g. SVN (Subversion), VSS (Visual SourceSafe), Google Docs, Amazon S3 • manual merging of multiple entries/edits • synchronise files, e.g. MS SyncToy software
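The file-naming and version-table advice for a single user can be sketched as a small helper that builds names in the style of BGHSurveyProcedures_00_04 and records each change. The function names and the layout of the history row are hypothetical. The optional date uses ISO order (YYYY-MM-DD), which is an assumption made here so that names sort chronologically; the slide's own example uses a day-first date.

```python
import re
from datetime import date


def versioned_name(base, major, minor, extension, on=None):
    """Build a file name such as Base_00_04.ext, optionally with an ISO date."""
    if re.search(r"\s", base):
        raise ValueError("file names should not contain spaces")
    stamp = "" if on is None else "_" + on.isoformat()
    return f"{base}{stamp}_{major:02d}_{minor:02d}.{extension}"


def history_row(file_name, changed_by, change):
    """One row of a version control table kept alongside the data file."""
    return {"file": file_name, "changed_by": changed_by, "change": change}


name = versioned_name("BGHSurveyProcedures", 0, 4, "doc")
print(name)  # BGHSurveyProcedures_00_04.doc

dated = versioned_name("HealthTests", 1, 0, "csv", on=date(2008, 4, 6))
print(dated)  # HealthTests_2008-04-06_01_00.csv

history = [history_row(name, "researcher A", "added consent wording")]
print(history[0]["file"])  # BGHSurveyProcedures_00_04.doc
```

Zero-padded version numbers keep files in order when sorted by name, and rejecting whitespace enforces the "avoid spaces" rule at the point where the name is created.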
Authenticity of data • master files • assign responsibility for master files • record changes to master files
Data storage • digital storage media unreliable • file formats and physical storage media ultimately become obsolete • optical (CD, DVD) and magnetic media (hard drive, tapes) vulnerable and subject to physical degradation Best practice: • use data formats with long-term readability • storage strategy with at least two different forms of storage • copy/migrate data files to new media between two and five years after first created • check data integrity of stored data files at regular intervals (checksum) • know your back-up strategy: institutional/personal; network server/PC/laptop • maintain original copy, external local copy and external remote copy • test file recovery • Data Protection Act and data back-up – may require minimal data copies for personal data; secure storage
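The checksum-based integrity check mentioned above can be sketched with Python's standard `hashlib` module: record a checksum for each file when it is stored, then recompute and compare at regular intervals. SHA-256 is an assumption here, since the slide does not name an algorithm, and the function names, the manifest layout and the demo file are hypothetical.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest):
    """Return paths that are missing or whose checksum no longer matches the manifest."""
    return [
        path
        for path, recorded in manifest.items()
        if not os.path.exists(path) or file_checksum(path) != recorded
    ]


# Demonstration on a temporary file
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "HealthTests_06-04-2008.csv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("id\tresult\n1\t5.2\n")

    # Record the checksum when the file is first stored
    manifest = {path: file_checksum(path)}
    intact = changed_files(manifest)
    print(intact)  # []

    # Simulate an unrecorded change to the stored file
    with open(path, "a", encoding="utf-8") as handle:
        handle.write("2\t9.9\n")
    altered = changed_files(manifest)
    print(len(altered))  # 1
```

In practice the manifest would be saved separately from the data, ideally with each of the stored copies, so that a damaged copy can be identified and restored from an intact one.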
Example: data storage and preservation at UKDA • preservation copy (UKDA) • shadow copy (UKDA) • dissemination copy to reduce load on main system • near-site online copy (on campus) • off-site online copy • tape-based offline copy (UKDA) Multi-copy, multi-storage media and multi version resilience: scheduled nightly robotic 3-monthly
Good data management practice • plan data management early • assign roles and responsibilities • design data management according to needs and purpose of research • data management throughout research
Resources • ESDS (2008). Guide to good practice: micro data handling and security. http://www.esds.ac.uk/news/publications/microDataHandlingandSecurity.pdf • Finch, L. & Webster, J. (2008). Caring for CDs and DVDs. NPO Preservation Guidance. Preservation in Practice Series. London, National Preservation Office. Available at http://www.bl.uk/npo/pdf/cd.pdf • UK Data Archive (2009). Manage and Share Data. http://www.data-archive.ac.uk/sharing/ See: http://www.data-archive.ac.uk/sharing/furtherstorage.asp