Managing your research data • Ben Taylorson, Academic Liaison Librarian
Outline • Why manage data? • What is data management? • Data life cycle • Putting together a plan • Actively managing data • Metadata • Backups • Versions • Storing and sharing
What are data? • Research data, unlike other types of information, are collected, observed or created for the purpose of analysis to produce original research results. • Qualitative or quantitative • Analogue or digital: both present challenges
Reasons to manage your data • Responsible conduct of research • Funding body grant requirements • Research integrity and replication • Increase research efficiency • Save time and resources • Enhance data security • Prevent duplication of effort by enabling others to use your data
Climategate • 1,000 private emails and many other documents were stolen or leaked from the University of East Anglia's (UEA) Climatic Research Unit (CRU) in November 2009 • While a House of Commons Select Committee cleared the CRU of scientific failings, it did find room for improvement in its research practices.
Activity • What is data management? • Think about • A definition • Elements of data management • Questions/issues
A definition • “Actively managing data for as long as it continues to be of scholarly, scientific, research and/or administrative interest […] managing it from its point of creation until it is determined not to be useful, and ensuring its long-term accessibility and preservation, authenticity and integrity.” • Adapted from the Digital Curation Centre definition of digital curation • It is not just archiving or preservation
What is data management? • Planning • Creation • Processing • Describing, archiving and organisation • Analysing • Preservation and security • Access and reuse • Ethics and privacy • Disposal
Data life cycle www.data-archive.ac.uk/create-manage/life-cycle
Specific plans • ICPSR Framework www.icpsr.umich.edu/icpsrweb/content/ICPSR/dmp/framework.html • Digital Curation Centre Data Management Plan www.dcc.ac.uk/dmponline • Individual institutions, e.g. Oxford www.admin.ox.ac.uk/rdm/dmp/plans/ and MIT http://libraries.mit.edu/guides/subjects/data-management/
Creating data • Types of data, e.g. text, numerical, models, multimedia, software • Format, e.g. Word or PDF, XML or Excel? Consider longevity and choose open formats • How much data will you produce? • How will you document it? • Will the data change or be updated? How will you track changes? • Will it be reproducible? • What if it were lost?
Metadata • Accurately describing your data so • you can find and understand it again efficiently • others can reuse your data easily • Descriptive, administrative, structural • Basic – files and folders in Windows • Complex – XML, Dublin Core • Where will this be stored? With the data? Will you need additional storage/software?
Basic metadata • For your own use • A project-level descriptor, then a breakdown into useful groupings • A unique element, including the date • PhD\Primary Research\Interviews\phase 1\Government officials\Highlevel\MrSmith15062011.mp3 • PhD\Primary Research\Interviews\phase 1\Government officials\Highlevel\MrSmith15062011.docx
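The naming pattern on this slide (project-level groupings, then a unique element that includes a date) can be generated consistently rather than typed by hand. A minimal Python sketch, assuming the day-month-year date format used in the example paths; the function name `build_data_path` is hypothetical:

```python
from datetime import date
from pathlib import PureWindowsPath


def build_data_path(groupings, unique_label, recorded_on, extension):
    """Join project-level groupings into a folder path and append a file
    name made of a unique label plus the date (DDMMYYYY)."""
    # Unique element: e.g. "MrSmith" + "15062011"
    file_name = f"{unique_label}{recorded_on.strftime('%d%m%Y')}.{extension}"
    # PureWindowsPath joins with backslashes, matching the Windows example
    return PureWindowsPath(*groupings, file_name)


folders = ["PhD", "Primary Research", "Interviews", "phase 1",
           "Government officials", "Highlevel"]
print(build_data_path(folders, "MrSmith", date(2011, 6, 15), "mp3"))
# PhD\Primary Research\Interviews\phase 1\Government officials\Highlevel\MrSmith15062011.mp3
```

Calling the same function for the `.mp3` recording and its `.docx` transcript keeps the pair of files matched. If you want files to sort chronologically, an ISO 8601 order (`'%Y%m%d'`) does so where day-month-year does not.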
More complex metadata • Dates • Funders • Language • Location • Rights • List of file names and relationships • Formats • Methodology
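Several of these items map onto the fifteen elements of the Dublin Core element set (dates to `date`, location to `coverage`, and so on), one of the complex options named on the earlier metadata slide. A minimal sketch using Python's standard `xml.etree.ElementTree`; the field values and the `<metadata>` wrapper element are hypothetical, and items such as funders have no dedicated Dublin Core element, so a richer schema may be needed for them:

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set, version 1.1 namespace
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)  # serialise elements as dc:title etc.


def dublin_core_record(fields):
    """Build a simple XML record from (element, value) pairs,
    rejecting names that are not Dublin Core elements."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for illustration only
record = dublin_core_record([
    ("title", "Interviews with government officials, phase 1"),
    ("date", "2011-06-15"),
    ("language", "en"),
    ("coverage", "United Kingdom"),
    ("rights", "Access restricted to the project team"),
    ("format", "audio/mpeg"),
])
print(record)
```

Rejecting unknown element names catches typos early; the resulting XML file could be stored alongside the data it describes.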
More complex metadata • Workflows • Sources • Versions • Checksums • Explanation of codes used in file names • List of codes used in files • Store metadata in a text file (such as a readme file or codebook) in the same directory as the data
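Checksums let you (or anyone reusing the data) confirm later that files have not changed or been corrupted, and the list can live in a text file next to the data, as the last bullet suggests. A minimal sketch, assuming SHA-256 and a manifest called `checksums.txt` in the same directory as the data; both choices and the function names are illustrative, not a prescribed standard:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_manifest(data_dir, manifest_name="checksums.txt"):
    """Write one '<hash>  <relative path>' line per file under data_dir,
    skipping the manifest itself. Returns the manifest path."""
    data_dir = Path(data_dir)
    manifest = data_dir / manifest_name
    lines = [
        f"{sha256_of(path)}  {path.relative_to(data_dir).as_posix()}"
        for path in sorted(data_dir.rglob("*"))
        if path.is_file() and path != manifest
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Each line uses the same layout that GNU `sha256sum` prints (hash, two spaces, file name). Re-running the function after a change and comparing the two manifests shows which files differ.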
Version control • Will you retain originals or overwrite as you go? • Will anyone else be editing the data, and do you need to track their changes? • Consider this before deciding on naming conventions
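One way to retain originals rather than overwrite them is to save each revision under a new, numbered name. A minimal sketch of that idea, assuming a hypothetical `<name>_vNN<suffix>` convention (for example `interviews_v01.csv`); adapt the pattern to whatever naming convention you settle on:

```python
import re
from pathlib import Path


def next_version_path(directory, stem, suffix):
    """Return the path for the next numbered version of a file,
    e.g. interviews_v03.csv if v01 and v02 already exist."""
    directory = Path(directory)
    # Matches names such as "interviews_v02.csv" for stem="interviews", suffix=".csv"
    pattern = re.compile(rf"{re.escape(stem)}_v(\d+){re.escape(suffix)}")
    existing = [
        int(match.group(1))
        for entry in directory.iterdir()
        if (match := pattern.fullmatch(entry.name))
    ]
    # Start at 1 when no versions exist; otherwise go one past the highest
    return directory / f"{stem}_v{max(existing, default=0) + 1:02d}{suffix}"
```

Because the highest existing number is used, a gap (v01, v03) still yields v04 and no number is reused. This records neither who made a change nor why; if several people edit the data, a version control system or a change log in the readme file may suit better.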
Storage • Short term • Think about the volume of data • Which media will you use? Do you need something more than a DVD or portable hard drive? • Security • Cost
Backups • Make 3 geographically distributed copies (original + external/local + external/remote) • ITS will do much of this for you, but what if you are working away from Durham? • How frequently will you back up? • Analogue data • Consider digitising if unique
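The copying step can be combined with a checksum comparison so that each backup is confirmed to match the original. A minimal sketch, assuming the local and remote copies are reachable as ordinary folders (for example a mounted external drive or a synced network folder); the `back_up` function and the folder names in the usage comment are hypothetical:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_digest(path):
    """SHA-256 of a file (reads it fully into memory; fine for modest files)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination folder and confirm every copy
    has the same checksum as the original. Returns the paths of the copies."""
    source = Path(source)
    expected = sha256_digest(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_digest(target) != expected:
            raise OSError(f"Checksum mismatch after copying to {target}")
        copies.append(target)
    return copies


# Hypothetical usage: one local external drive and one remote synced folder
# back_up("data/interviews.csv", ["E:/backup", "//remote-server/project-backup"])
```

Together with the original, two destinations give the three copies suggested above. Running the function on a schedule is one answer to "How frequently?", though the right interval depends on how often your data change.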
Preservation • Long-term, more strategic • Selection criteria • Timescale: how long will it be kept? • Disposal • Is additional information needed for deposit? • Does it need to be migrated? • Where will it be deposited? Will the repository manage it for you?
Where to preserve your data • UK Data Archive • Archaeology Data Service, History Data Service, Economic and Social Data Service, Oxford Text Archive • Durham University has no single repository for data, only for research outputs; speak to Sebastian Palucha at the Main Library
Sharing • Will you share it? Are you obliged to share it? • Who will be interested in it? How might they use it? • Are there reasons not to fully disclose data? • How will it be accessed? • When will you make it available? Will there be an embargo? • Will you publish findings that rely on the data? • Consider Freedom of Information (FOI) requests: http://foiresearchdata.jiscpress.org/
Dissemination • Deposit in a specialist data centre, dedicated to archiving digital data • Submitting to a journal (may be required) • Deposit in a self-archiving system or an institutional repository • Via a project or institutional website • Informally on a peer-to-peer basis e.g. email
Activity • Thinking about the data life cycle, look at the ICPSR guidance • Try to fill in some sections of the DCC Data Management Plan • Have you identified any areas where you will need further advice?
Sources of guidance • Durham University • UK Data Archive (Social Sciences and Humanities) • Create and Manage Data • Digital Curation Centre • Research Information Network • Funders’ web sites
Conclusions • Good data management = good research practice • Data need management throughout their life cycle • Planning is helpful and may be required by funders • Deposit data for preservation and access • Slides available at www.dur.ac.uk/library/research/