In this presentation given at the DataCite meeting on June 8, 2010, IJsbrand Jan Aalbersberg discusses the evolution of data linking in ScienceDirect articles, highlighting past practices with supplementary data, manual and automatic entity linking, and the contemporary need for interoperable data access. The future vision emphasizes creating a seamless user experience by tightly integrating data and articles, fostering collaboration between publishers and data repositories, and establishing standards for data deposit and discoverability.
Linking Data from ScienceDirect Articles Presented by: IJsbrand Jan Aalbersberg Hannover, DataCite Meeting Date: June 8, 2010
Linking to & from Data from & to ScienceDirect Articles Presented by: IJsbrand Jan Aalbersberg Hannover, DataCite Meeting Date: June 8, 2010
Linking Data in ScienceDirect • The Past • Supplementary data • Entity links to databases • The Present • Some considerations • PANGAEA-type linking • A Future • Getting even closer connected
The Past (supplementary data) • Raw research data delivered as supplementary data • Available for limited number of data set types / formats • Data distributed over multiple articles and publishers • Format frozen in time – not maintained for preservation • Only available for smaller data sets (at most a few tens of MB) • Limited access due to use of existing publishing platforms • Data and article remain nicely coupled / packaged • Supplementary data always being peer-reviewed
The Past (entity linking - manual) • Authors manually identify (and tag) entities that are mentioned in articles and for which associated data is present (or registered) in databases, like GenBank, MINT, UniProt, PDB, CCDC, ... • Very accurate and unambiguous • However, requires author effort • Publisher takes care of actual linking • Reciprocal linking usually taken care of
The Past/Present (entity linking – automatic) • Sometimes automatically (e.g., NextBio and Reflect) • Easily extendable to new / other entities • Works retrospectively on older content • Does create recall / precision errors
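The recall / precision trade-off of automatic linking can be illustrated with a minimal sketch. It is not how NextBio or Reflect work; it assumes a plain dictionary lookup, and the lexicon, record identifiers, and sample sentence are all made up for illustration.

```python
import re

# Hypothetical lexicon: lowercase entity name -> made-up database record id.
LEXICON = {"insulin": "DB:0001", "cat": "DB:0002"}


def tag_entities(text, lexicon):
    """Return (lowercased surface form, record id) for each whole-word lexicon hit."""
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0).lower(), lexicon[m.group(0).lower()])
            for m in pattern.finditer(text)]


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted mentions against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


text = "Insulin levels in the cat were compared with proinsulin controls."
tags = tag_entities(text, LEXICON)
predicted = {name for name, _ in tags}

# Assumed gold annotation: what an author would have tagged by hand.
gold = {"insulin", "proinsulin"}

print(tags)
print(precision_recall(predicted, gold))  # (0.5, 0.5)
```

Here "cat" is tagged although it refers to the animal (a precision error), and "proinsulin" is missed because it is not in the lexicon (a recall error). Adding a new entity only requires a new lexicon entry, and the tagger can be rerun over older content.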
The Present (some considerations) • STM, “Brussels Declaration”, June 2006: • “... believe that, as a general principle, data sets, raw data outputs of research, and sets or subsets of that data should wherever possible be made freely accessible ...” • Data sets should be freely accessible – at publisher? • Scientists prefer independent data repositories • Need for single domain-specific coordination • Huge costs for maintenance and preservation • Proper deposit mechanism needed • Through publisher? Extra overhead vs. ease of use • Enforcing deposit prior to publication • If community-supported, surely a possibility • Data set standardization is needed for optimal use
The Present (more considerations) • Scientist needs the combination of formal publication record and the raw data sets • To get optimal interoperability, close collaboration between publisher and data set repositories needed • Publisher should “enable and support” raw data sets • Submission: enforce if supported by community • Discoverability: interconnect article with data sets • Reciprocal linking at deepest level possible • PANGAEA-type linking • Data feeds from publisher to repositories? • Managing large number of data set repositories? • DataCite as single discussion partner
The Present (PANGAEA linking) • Author submits article to publisher • Author submits data set to repository • At article publication, repository links article DOI to associated data set DOI, creating actual connection • User sees link to ScienceDirect from PANGAEA • User sees link to PANGAEA from ScienceDirect [Diagram: USER; SD Article; SD Server (articles); PANGAEA Server (data + associations); link]
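The association step on this slide can be sketched as a small two-way lookup. This is only an illustration under stated assumptions, not the actual PANGAEA or ScienceDirect implementation: the `LinkRegistry` class and its methods are hypothetical, and both DOIs are made-up placeholders.

```python
from collections import defaultdict


class LinkRegistry:
    """Hypothetical in-memory store of article <-> data set DOI associations."""

    def __init__(self):
        self._datasets_by_article = defaultdict(set)
        self._articles_by_dataset = defaultdict(set)

    @staticmethod
    def _normalize(doi):
        # DOI names are case-insensitive, so compare them in lowercase.
        return doi.strip().lower()

    def associate(self, article_doi, dataset_doi):
        """Record the link in both directions so either side can display it."""
        article = self._normalize(article_doi)
        dataset = self._normalize(dataset_doi)
        self._datasets_by_article[article].add(dataset)
        self._articles_by_dataset[dataset].add(article)

    def datasets_for(self, article_doi):
        """Data sets to show on the article page."""
        return sorted(self._datasets_by_article.get(self._normalize(article_doi), ()))

    def articles_for(self, dataset_doi):
        """Articles to show on the data set page."""
        return sorted(self._articles_by_dataset.get(self._normalize(dataset_doi), ()))


def doi_url(doi):
    """Build a resolver URL for a DOI."""
    return "https://doi.org/" + doi


registry = LinkRegistry()
# Placeholder DOIs, invented for this example.
registry.associate("10.1016/j.example.2010.01.001", "10.1594/PANGAEA.000000")

print(registry.datasets_for("10.1016/J.EXAMPLE.2010.01.001"))
print([doi_url(d) for d in registry.articles_for("10.1594/pangaea.000000")])
```

Storing the association in both directions is what makes the linking reciprocal: the article page can query `datasets_for`, and the repository page can query `articles_for`, from a single registered connection.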
A Future (tighter interoperability) • Not just a link to / from data and journal article • But provide integrated experience for scientist • Single page (environment) with data and article [Diagram: USER; SD Article; SD Server (articles); Supplementary Data Server (data sets)]
A Future (tighter interoperability) • Not just a link to / from data and journal article • But provide integrated experience for scientist • Single page (environment) with data and article • Some users prefer it the other way around; so also offer: [Diagram: USER; Data Set; Data Set Server (data sets); Article Server (articles)]
A Future (inline supplementary data) • Structures submitted as supplementary data files (MOL files) • Displayed inline through Reaxys application / service
Creating the best User Experience by integrating Data with Articles Presented by: IJsbrand Jan Aalbersberg Hannover, DataCite Meeting Date: June 8, 2010
Creating the best User Experience by integrating Data with Articles requires close collaboration between data set repositories and publishers Presented by: IJsbrand Jan Aalbersberg Hannover, DataCite Meeting Date: June 8, 2010