This final report details the Data Architecture for TeraGrid, covering the conceptual view of data, implementation goals, and key recommendations for software support, policy, and allocations. It emphasizes providing a comprehensive view of data infrastructure to support a wide range of needs across various communities.
Data Architecture Final Report • Chris Jordan • Assistance from Kelly Gaither, Phil Andrews, J Ray Scott, and a cast of dozens
Process • Over a year of discussion and interaction with various communities • Campus Champions • Science Gateways • Data Collections and Data WG • TeraGrid conference BoFs • Team discussions and conference calls • Kelly Gaither, Phil Andrews, J Ray Scott
What is the Data Architecture? • Providing a conceptual view of data • Understanding how data is used in individual projects and over a larger lifespan • Methodology for identifying gaps in data infrastructure • Not: • A specific set of hardware or software recommendations • A set of standards to be utilized by TeraGrid • A static view of the community and its needs
Data Architecture “Spectrum” View • Communities of Use • From individual projects to the global public • Length of Value • Scratch data (days to weeks) to irreproducible data (to infinity and beyond) • TeraGrid must support the full breadth of data needs across both axes in some fashion
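The two-axis spectrum, combined with the gap-identification methodology mentioned earlier, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the slides give only the endpoints of each axis, so the intermediate points (COLLABORATION, DISCIPLINE, PROJECT_LIFETIME, LONG_TERM), the `Service` type, the `find_gaps` helper, and the two example services are hypothetical and are not part of the report.

```python
from dataclasses import dataclass
from enum import IntEnum
from itertools import product


class Community(IntEnum):
    """Communities-of-use axis. Only the endpoints come from the slides."""
    PROJECT = 1        # individual projects (from the slides)
    COLLABORATION = 2  # assumed intermediate point
    DISCIPLINE = 3     # assumed intermediate point
    PUBLIC = 4         # the global public (from the slides)


class Lifetime(IntEnum):
    """Length-of-value axis. Only the endpoints come from the slides."""
    SCRATCH = 1           # days to weeks (from the slides)
    PROJECT_LIFETIME = 2  # assumed intermediate point
    LONG_TERM = 3         # assumed intermediate point
    IRREPRODUCIBLE = 4    # kept indefinitely (from the slides)


@dataclass(frozen=True)
class Service:
    """A hypothetical data service and the part of the spectrum it supports."""
    name: str
    communities: frozenset
    lifetimes: frozenset

    def covers(self, community, lifetime):
        return community in self.communities and lifetime in self.lifetimes


def find_gaps(services):
    """Return every (community, lifetime) cell that no service covers."""
    return [
        (c, t)
        for c, t in product(Community, Lifetime)
        if not any(s.covers(c, t) for s in services)
    ]


# Hypothetical inventory, for illustration only.
example_services = [
    Service("scratch file system",
            frozenset({Community.PROJECT}),
            frozenset({Lifetime.SCRATCH})),
    Service("archive",
            frozenset({Community.PROJECT, Community.COLLABORATION}),
            frozenset({Lifetime.LONG_TERM, Lifetime.IRREPRODUCIBLE})),
]

gaps = find_gaps(example_services)
for community, lifetime in gaps:
    print(f"no coverage: {community.name} x {lifetime.name}")
```

With this made-up inventory, most of the grid is uncovered, including public-facing irreproducible data. The sketch only shows how "support across both axes" can be checked cell by cell; the actual axis subdivisions and service inventory would come from the Data Architecture discussions.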
Implementation Goals • Minimal number of new technologies • Minimally intrusive to users • TeraGrid and RPs don’t have to provide everything directly • TeraGrid must have a strategy to support all needs
Arch Recommendations I • Software Support for Data Management and Replication • iRODS Multi-Site deployment • Global Namespace/File System Service • Distributed Storage (J-WAN) and HSM/File System integration • Policy and Documentation support for Data Collections • Improved GIG and RP support for Data Collections WG
Arch Recommendations II • Policy and Allocations Support for Data Lifecycle Management • Ideally: Data Allocations • Minimally: Requirement to express data needs in allocation requests • Coordination with Data Infrastructure Providers • Coordination group formed: one DataNet partner, REDDnet, HathiTrust members • User Services Support for Data Management • Chris working with Amit; presentation to AUS team 11/5 • Workflow and Portal tools to support expression of data lifecycle needs
Implementation Issues • Are there any major issues or concerns? • Is there anything described that RPs would NOT participate in/support through existing resources and Data-WG staff time? • How to proceed with allocations process and policy changes?
References • Wiki Page: • http://teragridforum.org/mediawiki/index.php?title=Data_Architecture • Implementation Notes: • http://teragridforum.org/mediawiki/index.php?title=Data_Architecture_Recommendations_Implementation