Developing Data Attribution and Citation Practices and Standards

This project, SageCite, aims to develop practices and standards for data attribution and citation in the domain of disease network modelling. It includes a review of data citation issues and technology, an effort to understand the domain, documentation of processes, and work with partners to create a demonstrator. The project involves collaboration with Sage Bionetworks, a US-based non-profit organisation focused on community-based, data-intensive biological discovery.

Presentation Transcript


  1. Monica Duke (m.duke@ukoln.ac.uk), Project Manager, SageCite Project, http://blogs.ukoln.ac.uk/sagecite/ #sagecite • Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop, August 22-23, 2011 • UKOLN is supported by:

  2. Citation in the domain of disease network modelling • Funded: August 2010 – July 2011

  3. SageCite project overview • Review of data citation (issues, technology) • Understanding the domain • Sage Bionetworks partners in project • Site visit • Documenting processes (workflow tools)

  4. SageCite project overview • Demonstrator • Adding support for data citation • Using DataCite services • Working with publishers • Benefits analysis: KRDS Taxonomy
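Using DataCite services means registering datasets with an identifier plus descriptive metadata. As a minimal sketch, assuming a record that carries DataCite's core mandatory properties (Identifier, Creator, Title, Publisher, PublicationYear), a human-readable citation can be assembled in the "Creator (PublicationYear): Title. Publisher. Identifier" pattern that DataCite recommends. The `DatasetRecord` class, the `format_citation` helper and all example values are hypothetical; nothing here calls a DataCite API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical record holding the DataCite mandatory metadata properties."""
    identifier: str         # a DOI such as "10.5555/example-dataset" (placeholder)
    creators: List[str]     # "Family, Given" strings, in citation order
    title: str
    publisher: str
    publication_year: int


def format_citation(record: DatasetRecord) -> str:
    """Render 'Creator (PublicationYear): Title. Publisher. Identifier'."""
    if not record.creators:
        # Attribution is one purpose of data citation, so refuse records with no creator.
        raise ValueError("at least one creator is required")
    creators = "; ".join(record.creators)
    return (f"{creators} ({record.publication_year}): {record.title}. "
            f"{record.publisher}. https://doi.org/{record.identifier}")


if __name__ == "__main__":
    example = DatasetRecord(
        identifier="10.5555/example-dataset",  # placeholder, not a registered DOI
        creators=["Doe, Jane", "Roe, Richard"],
        title="Example expression dataset",
        publisher="Example Repository",
        publication_year=2011,
    )
    print(format_citation(example))
```

Rendering the identifier as a resolvable https://doi.org/ link keeps the citation actionable; the separator between creators is an illustrative choice, not a DataCite requirement.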

  6. www.sagebase.org • US-based non-profit organisation • Creating a resource for community-based, data-intensive biological discovery • Community-based analysis is required to build accurate models

  7. Slide by Lara Mangravite, Sage Bionetworks

  8. Sage data and processes • Idealised 7-stage process • A combination of phenotypic, genetic, and expression data is processed to determine a list of genes associated with diseases • Different people are responsible for different stages of the modelling process; one person oversees the whole process.

  9. Stage 1: Data Curation • Basic data validation to ensure integrity and completeness • Datasets include microarray data and clinical data • Ensures that the format of the data is understood and the required metadata is present
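The Stage 1 checks for integrity, completeness and required metadata can be pictured as a small validation pass. This is a minimal sketch under assumed conditions: a hypothetical tabular layout (a metadata dict plus a list of sample rows) and hypothetical required field names. The slides do not describe Sage Bionetworks' actual curation procedures.

```python
from typing import Dict, List, Optional

# Hypothetical metadata fields a curator might require before accepting a dataset.
REQUIRED_METADATA = ("platform", "tissue", "sample_count")


def validate_dataset(metadata: Dict[str, object],
                     rows: List[Dict[str, Optional[float]]]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    # Required metadata must be present and non-empty.
    for name in REQUIRED_METADATA:
        if metadata.get(name) in (None, ""):
            problems.append(f"missing metadata: {name}")
    # Integrity: the declared sample count should match the rows supplied.
    declared = metadata.get("sample_count")
    if isinstance(declared, int) and declared != len(rows):
        problems.append(f"sample_count is {declared} but {len(rows)} rows were supplied")
    # Completeness: every row should have the same columns and no missing values.
    if rows:
        expected_cols = set(rows[0])
        for i, row in enumerate(rows):
            if set(row) != expected_cols:
                problems.append(f"row {i}: unexpected columns")
            missing = [col for col, value in row.items() if value is None]
            if missing:
                problems.append(f"row {i}: missing values in {', '.join(sorted(missing))}")
    return problems


if __name__ == "__main__":
    meta = {"platform": "microarray", "tissue": "liver", "sample_count": 2}
    rows = [{"GENE1": 1.2, "GENE2": 0.4}, {"GENE1": None, "GENE2": 0.9}]
    print(validate_dataset(meta, rows))  # ['row 1: missing values in GENE1']
```

Returning a list of problems, rather than stopping at the first failure, lets a curator fix everything in one pass before the data moves to the next stage.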

  10. Agreeing standards to support sharing • Derry J et al. Developing predictive Molecular Maps of Human Disease through Community-based Modeling. • http://precedings.nature.com/documents/5883/version/1/files/npre20115883-1.pdf

  11. Workflow capture using Taverna (http://www.vimeo.com/27287109) • Documenting data processes through workflow tools: • supports better citation • makes the cited resource more re-usable • strengthens the reproducibility and validation of the research.
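Capturing each step of a workflow makes a cited result traceable to the exact inputs and software that produced it. The sketch below illustrates that idea only. It is not Taverna's provenance format or API: `ProvenanceLog`, `StepRecord` and the step and tool names are hypothetical, and SHA-256 content hashes stand in for whatever identifiers a real system would use.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


def digest(data: bytes) -> str:
    """Content hash that identifies an exact input or output."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class StepRecord:
    step: str                 # name of the processing stage (illustrative)
    software: str             # tool name and version that ran the step
    inputs: Dict[str, str]    # input name -> SHA-256 digest
    outputs: Dict[str, str]   # output name -> SHA-256 digest


@dataclass
class ProvenanceLog:
    """Hypothetical log of workflow steps; not Taverna's own provenance model."""
    steps: List[StepRecord] = field(default_factory=list)

    def record(self, step: str, software: str,
               inputs: Dict[str, bytes], outputs: Dict[str, bytes]) -> None:
        self.steps.append(StepRecord(
            step=step,
            software=software,
            inputs={name: digest(data) for name, data in inputs.items()},
            outputs={name: digest(data) for name, data in outputs.items()},
        ))

    def producer_of(self, output_digest: str) -> StepRecord:
        """Find the step that produced a given output, e.g. to reproduce or cite it."""
        for rec in self.steps:
            if output_digest in rec.outputs.values():
                return rec
        raise KeyError(output_digest)

    def to_json(self) -> str:
        """Serialise the log so it can be published alongside the cited data."""
        return json.dumps([asdict(s) for s in self.steps], indent=2)


if __name__ == "__main__":
    log = ProvenanceLog()
    raw = b"gene,value\nGENE1,1.2\n"
    clean = b"gene,value\nGENE1,1.20\n"
    log.record("data curation", "curation-script 0.1", {"raw": raw}, {"clean": clean})
    print(log.producer_of(digest(clean)).step)  # data curation
```

Hashing content rather than file names means a renamed or copied file still matches its recorded step, which is what makes the log useful for validation.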

  12. Data Citation Purposes • For attribution • Leading to credit and reward • For reproducibility • Supports validation, re-use • Eric Schadt at Sage Bionetworks Congress 2011 • http://fora.tv/2011/04/16/Eric_Schadt_Map_Building (start at 4:28)

  13. Open challenges: attribution • Preserving the link with the original data • Some discipline-based repositories have their own identifiers • Bi-directional links • Attributing data creators, including individuals? • Defining the creation of a new intellectual object, e.g. a curated dataset? • Cultural challenge in recognising non-standard contributions; microattribution • New metrics • Identification of contributors
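Bi-directional links mean that an article citing a dataset and the dataset cited by that article can each be found from the other. A minimal sketch, assuming links are stored in one index that writes the inverse relation whenever a link is added; the relation names are borrowed from DataCite's relationType vocabulary, while `LinkIndex` and the identifiers are hypothetical and the index does not talk to any repository.

```python
from collections import defaultdict
from typing import Dict, Set

# Inverse pairs taken from DataCite's relationType vocabulary.
INVERSE = {
    "Cites": "IsCitedBy",
    "IsCitedBy": "Cites",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}


class LinkIndex:
    """Hypothetical in-memory index that stores every link in both directions."""

    def __init__(self) -> None:
        # identifier -> relation -> set of related identifiers
        self._links: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, source: str, relation: str, target: str) -> None:
        if relation not in INVERSE:
            raise ValueError(f"unsupported relation: {relation}")
        self._links[source][relation].add(target)
        # Writing the inverse at the same time keeps the two directions consistent.
        self._links[target][INVERSE[relation]].add(source)

    def related(self, identifier: str, relation: str) -> Set[str]:
        if identifier not in self._links:
            return set()
        return set(self._links[identifier].get(relation, set()))


if __name__ == "__main__":
    index = LinkIndex()
    index.add("doi:article-1", "Cites", "doi:dataset-1")
    print(index.related("doi:dataset-1", "IsCitedBy"))  # {'doi:article-1'}
```

Maintaining both directions in one operation is a design choice; in practice each side may live in a different repository, which is exactly why the slide lists bi-directional linking as an open challenge.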

  14. Open challenges: reproducibility • Identification and granularity • Discipline identifiers, global identifiers • How much value has been added since the data entered the workflow? • Identifying processes and software
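Identification at a fine granularity means each intermediate dataset in a workflow gets its own identifier, linked to the data it was derived from. The sketch below assumes such links exist (modelled loosely on DataCite's IsDerivedFrom relation type) and uses them to recover the original sources of a result and count the processing steps in between. The identifiers are hypothetical, and a step count is only a crude proxy for "how much value has been added", not a measure of it.

```python
from typing import Dict, List, Set

# Hypothetical derivation links: derived identifier -> identifiers it was derived from.
DerivedFrom = Dict[str, List[str]]


def original_sources(derived_from: DerivedFrom, identifier: str) -> Set[str]:
    """Follow derivation links back to identifiers with no recorded parent."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [identifier]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = derived_from.get(current, [])
        if not parents:
            sources.add(current)
        stack.extend(parents)
    return sources


def derivation_depth(derived_from: DerivedFrom, identifier: str) -> int:
    """Longest chain of processing steps back to an original source.

    Assumes the derivation graph is acyclic, as provenance graphs should be.
    """
    parents = derived_from.get(identifier, [])
    if not parents:
        return 0
    return 1 + max(derivation_depth(derived_from, p) for p in parents)


if __name__ == "__main__":
    links = {
        "ds:network-model": ["ds:expression-normalised", "ds:genotypes-qc"],
        "ds:expression-normalised": ["ds:expression-raw"],
        "ds:genotypes-qc": ["ds:genotypes-raw"],
    }
    print(sorted(original_sources(links, "ds:network-model")))
    print(derivation_depth(links, "ds:network-model"))  # 2
```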

  15. Acknowledgements • UKOLN: Liz Lyon, Monica Duke • Nature Genetics: Myles Axton • PLoS Comp Bio: Phil Bourne • University of Manchester: Carole Goble, Peter Li • British Library: Max Wilkinson, Tom Pollard • Sage Bionetworks

More Related