The Australian National Data Service and Open Access to Data
  1. THE AUSTRALIAN NATIONAL DATA SERVICE AND OPEN ACCESS TO DATA Andrew Treloar Director, ANDS Establishment Project ANDS: Building the Australian Research Data Commons

  2. Outline • Context • Blueprint • Goal • Structure • Progress • Internationally • Acknowledgements

  3. Context

  4. eResearch Co-ordinating Committee (2006) Thematic Issues • Continuing Need for a Focus • through national coordination • Human Capabilities • People, skills and understanding • Linkage of eResearch Resources • seamless access to resources • Access to Data • best practice data management and curation • Structural and Cultural Change • evolution of organisational structures and cultures • Awareness and Support • develop researchers’ ability to adopt eResearch • Service Clusters • Data • outreach, curation, data management • meta-services, location, access, movement • practice, providers and users • Computing • capability computing facilities • national computing environment • Interoperation • discipline services (tools, software) • user and operations support • collaboration services support • Access • the Australian access federation • the Australian research and education network

  5. Australian Code for the Responsible Conduct of Research • Describes the responsibilities of institutions and researchers in a range of areas, including the management of research data and primary materials • Institutions are to retain research data, provide secure data storage, identify ownership, and ensure security and confidentiality of research data • Researchers are to retain research data and primary materials, manage storage of research data and primary materials, and maintain confidentiality of research data and primary materials

  6. NCRIS Investments

  7. NCRIS Budget Breakdown

  8. Platforms for Collaboration: Major Investments 2007-2011 • Data: The Data Commons, Data Federations, ANDS - $24M • Compute: Capability Computing, Advanced models, NCI - $26M • Interoperation: Collaboration services, Research workflows, ARCS - $20M • Access: Research connectivity, Seamless reach, AAF+AREN - $6M

  9. Blueprint

  10. The ANDS Blueprint • Towards the Australian Data Commons (TADC) • Developed during 2007 by ANDS Technical Working Group • Mapped out coherent vision of what needs to be done in the data space • Available at

  11. TADC: Why Data? Why Now? • We are in an era of increasingly data-intensive research • Almost all data is now born digital • Increasing amount of data generated (semi-)automatically • “Consequently, increasing effort and therefore funding will necessarily be diverted to data and data management over time” (TADC, p. 4)

  12. TADC: Need for standardisation • Software and hardware keep getting cheaper, wetware keeps getting more expensive • Fixing data management problems is enormously labour intensive and costly • “Consequently, standardisation within forms of data and simplification in the frameworks around retention, storage, access and use of data, and the elimination of differences whose resolution requires labour, must be made, if the on-going keeping and reuse of data is to remain affordable” (TADC, p. 5)

  13. TADC: Role of data federations • With more data online, more can be done • Possible now to answer questions unrelated to reasons why data was collected originally • Increasing focus on cross-disciplinary science • “Consequently greater clarity is needed over control and access to community-funded data, and the means of aggregating, federating and accessing such data are increasingly important” (TADC, p. 5)

  14. Changing Data, Changing Research • New scientific instruments • Large Hadron Collider at CERN will generate 1.5 gigabytes of data per second • the Square Kilometre Array (1 EB/day!) • New scientific models • The mapping of the Human Genome: A billion DNA letters in a human sequence • Global climate models with ever finer resolution • New knowledge from unlocked data • Hubble data has to be shared six months after collection • Majority of published research from Hubble telescope data was not “first use” • • was free for two weeks, now isn’t

  15. Goal

  16. The ANDS Goal • “to deliver greater access to Australia’s research data assets • in forms that support easier and more effective • data use and reuse” • TADC, p. 18 • And to be a “voice for data” • RF, 24/9/08

  17. ANDS implementation assumptions • ANDS doesn’t have enough money to fund storage • And so is predicated on institutionally-supported solutions • Not all data shared by ANDS will be open • ANDS aims to leverage existing activity, and coordinate/fund new activity • ANDS will only start to build the Australian Research Data Commons • ANDS governance and management arrangements are sized for the current funding

  18. Realising the goal • Develop user and owner frameworks for data commons • Develop and operate national registries and discovery • Seed the commons by connecting existing stores/federations • Increase capabilities across sector in data mgt, integration

  19. Structure

  20. ANDS Delivery Structure • ANDS has been structured as four inter-related and co-ordinated service delivery programs: • Developing Frameworks • Providing Utilities • Seeding the Commons • Building Capabilities • Plus candidate service development activities funded through National eResearch Architecture Taskforce projects

  21. Developing Frameworks (Monash) • Influencing relevant national policies • Building common understanding of data management issues and solutions across government, research funding agencies, and research-intensive organisations • Assisting OA by encouraging moves in favour of discipline-acceptable default data sharing practices

  22. Providing Utilities (ANU) • Building and delivering national technical services to support the data commons • Initial services • Discovery • Both “you come to us” and “we come to you” flavours • Probably a two-step process for some collections • Includes surfacing of ISO2146 entities (next slide) for web harvesting • Persistent identifier minting and management • Collections registry to underpin discovery • Plus Services Roadmap for later years • Providing capability within ANDS for integration of existing systems into Australian Data Commons • Assisting OA by improving discoverability, particularly across disciplines

  23. ISO2146

  24. Seeding the Commons (Monash) • In targeted areas (because there are not enough resources to do everything), working to improve: • fabric for data management • amount of content • state of data capture and management • Selection process to identify targets • Plus, opportunistic content recruitment in first year • Assisting OA by increasing the amount of content available, much of it (hopefully!) OA

  25. Building Capabilities (ANU) • Improving level of capability for research data management and research access to data • Train-the-trainer model • Two initial target populations • Early career researchers • Research support staff (IT, data management) • NOTE: Overlapping but different messages • Building a community around data management concerns • Assisting OA by advocating to researchers for changed practices

  26. Progress

  27. ANDS: From Project to Service • Government asked Monash, ANU, CSIRO to set up ANDS • Establishment Project has met all its deliverables • DIISR has now signed contract for ANDS • First (interim) Business Plan available at • This will run until June 2009 • Next Business Plan needs to be complete by March 2009 for consideration and approval • ANDS will run until July 2011

  28. Australian Strategic Roadmap Review • Data Storage (p. 21) • National data-fabric, based on institutional nodes • Shared Data (p. 22) • More ANDS • Coordination Component (p. 23) • Integration of eResearch activities • Expertise as an enabling infrastructure (p. 23)

  29. National Innovation System Review • R7.10: A specific strategy for ensuring the scientific knowledge produced in Australia is placed in machine searchable repositories be developed and implemented using public funding agencies and universities as drivers • R7.14: To the maximum extent practicable, information, research and content funded by Australian governments including national collections should be made freely available over the internet as part of the global public commons…

  30. Internationally

  31. Wellcome Trust • Policy on data management and sharing • The Trust “wishes to ensure that the outputs of the research it funds, including research data, are managed and used in ways that maximise public benefit.” • Benefits gained from research data “will be maximised when they are made widely available to the research community as soon as feasible, so that they can be verified, built upon and used to advance knowledge.” • The Trust “expects the researchers that it funds to maximise the availability of research data with as few restrictions as possible”

  32. National Institutes of Health • Final NIH Statement On Sharing Research Data (February 26, 2003) • “Data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health” • NIH “endorses the sharing of final research data to serve these and other important scientific goals” • Investigators “are expected to include a plan for data sharing or state why data sharing is not possible”

  33. Acknowledgements ANDS Project Management Committee • Paul Bonnington, Monash • Cathrine Harboe-Ree, Monash • Alan McMeekin, Monash • David Groenewegen, Monash • Vic Elliott, ANU • Adrian Burton, ANU • Markus Buchhorn, ANU • Alex Zelinsky, CSIRO • David Toll, CSIRO • Tracey Hind, CSIRO • Clare McLaughlin/Jacqueline Cooke/Peter Nicholson, DIISR • Rhys Francis, AeRIC ANDS Organising Network • Andrew Treloar, Monash • David Groenewegen, Monash • Adrian Burton, ANU • Margaret Henty, ANU • Chris Blackall, ANU • Ross Wilkinson, CSIRO • Tracey Hind, CSIRO • John Morrissey, CSIRO Senior Representatives • Edwina Cornish, Monash • Robin Stanton, ANU • Alex Zelinsky, CSIRO