DCAPE Project Update

Richard Marciano (University of North Carolina, SALT), Chien-Yi Hou (University of North Carolina, SALT), Caryn Wojcik (State of Michigan, Records Management Services)


Presentation Transcript


  1. DCAPE Project Update • Richard Marciano, University of North Carolina, SALT • Chien-Yi Hou, University of North Carolina, SALT • Caryn Wojcik, State of Michigan, Records Management Services

  2. NHPRC Issued a Call… • Design a digital preservation service with a business model for the archival community • Fill the needs of archival repositories that cannot build and sustain their own electronic records archive

  3. DCAPE Project • Distributed Custodial Archival Preservation Environments • Project was funded by NHPRC in 2008 (RE10010-08) • Officially started in December 2008 • Project extended through April 2012 • http://www.dcape.org/

  4. What is Distributed Custodial Preservation? • Physical custody of archival collections is distributed outside of the archival repository to a trusted preservation service • Archival repository retains legal custody • Archival repository remains responsible for archival functions, including preservation and access • Access to collections is controlled by archival repository

  5. DCAPE Partners • 28 people across 9 institutions and 2 staff at UNC, for a total of 32 participants • Cultural Entity: Getty Research Institute • Cyberinfrastructure: West Virginia University, Carleton University (Canada) • State Archives: California, Kansas, Michigan, Kentucky, North Carolina, New York • State Library: North Carolina • University Archives: Tufts • UNC: School of Information and Library Science (SILS), Sustainable Archives and Leveraging Technologies (SALT)

  6. DCAPE Goals • Build a preservation environment that meets the needs of archival repositories for trusted archival preservation services. • Services are based on policies (rules) that are defined by the archivist • Over 250 rules have been developed for the iRODS library that can be leveraged for DCAPE • A series of rules might “look” like this: • When files are ingested, replicate them in three different locations and run a checksum on each file. Bit-check files every month. Send an alert about any changes to the files.
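The sample policy on slide 6 (replicate on ingest, checksum each file, re-check periodically, alert on change) can be illustrated outside iRODS with a minimal Python sketch. This is not DCAPE or iRODS code: the function names `ingest` and `bit_check`, the use of SHA-256, and the use of plain directories as "locations" are all assumptions made for illustration. The monthly schedule is left to an external scheduler such as cron.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source: Path, locations: List[Path]) -> Dict[Path, str]:
    """Copy `source` into every location and record a checksum per replica.

    Hypothetical stand-in for "replicate in three different locations and
    run a checksum on each file".
    """
    manifest = {}
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        replica = location / source.name
        shutil.copy2(source, replica)
        manifest[replica] = sha256_of(replica)
    return manifest


def bit_check(manifest: Dict[Path, str]) -> List[str]:
    """Recompute checksums and return one alert per missing or changed replica."""
    alerts = []
    for replica, recorded in manifest.items():
        if not replica.exists():
            alerts.append(f"MISSING: {replica}")
        elif sha256_of(replica) != recorded:
            alerts.append(f"CHANGED: {replica}")
    return alerts
```

Running `bit_check` from a monthly scheduled job and forwarding any returned messages to the archivist would cover the "bit-check every month" and "send an alert" parts of the policy.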

  7. DCAPE Goals • The trusted digital repository infrastructure will be assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services. • The software infrastructure will automate many of the administrative tasks associated with the management of archival repositories. • Tasks will include: authentication, replication, migration, obsolete file management, preservation metadata management, etc.

  8. Project Tasks • Execute service agreements between UNC and partners to govern use of the test collections. • Define rules and services (organized according to the OAIS framework) for iRODS to perform on test collections. • Ingest test collections into iRODS and validate the rules and services. • Develop business model (including costs) for sustaining a repository service based on iRODS. • Develop model service agreements that define the standard and optional services of the repository.

  9. Role of iRODS • Preservation environment provides rule-based automation of archival functions (repeatable services) • Standard and optional services will be available • Shared service should reduce costs for each archival repository compared to the cost of building in-house preservation capabilities

  10. Life Cycle of Data [diagram labels: SIP, AIP, DIP; Virtual Loading Dock, Preservation Area, Reference Room]

  11. DCAPE Framework [diagram labels: SIP, AIP, DIP; Virtual Loading Dock, Preservation Area, Reference Room; iRODS; nodes V1, V2, V3, P1, P2, P3, R1, R2]

  12. DCAPE Capabilities [framework diagram from slide 11, annotated with capability numbers between 1 and 26]

  13. DCAPE Capabilities [framework diagram from slide 11, annotated with capability numbers between 1 and 26; callout: Replication]

  14. Sample Rule
      sampleRule||delayExec(<PLUSET>1m</PLUSET><EF>2m</EF>,
        assign(*path,/samplePath)##
        msiMakeGenQuery("COLL_NAME","COLL_PARENT_NAME = '*path' AND META_COLL_ATTR_NAME = 'DCAPE_COLL_TYPE' AND META_COLL_ATTR_VALUE = 'AIP'",*GenQInp)##
        msiExecGenQuery(*GenQInp, *GenQOut)##
        forEachExec(*GenQOut,
          msiGetValByKey(*GenQOut, "COLL_NAME",*DataObj)##
          msiSplitPath(*DataObj,*p,*c)##
          assign(*newpath,SamplePath2*c)##
          msiDataObjRename(*DataObj,*newpath,1,*result)##
          acAddLog(Move_Collection,"*DataObj")##
          acCheckPolicy(*newpath,DCAPE_POLICY_REPLICA,*pResult)##
          ifExec((*pResult == Yes),
            msiCollRepl(*newpath,destRescName=resource,*status)##
            acAddLog(Replicate_Coll,"*newpath"),
            nop,nop,nop),
          nop),
        nop)|nop
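Read step by step, the sample rule appears to do the following: after a one-minute delay, and then every two minutes, it finds collections directly under `/samplePath` whose `DCAPE_COLL_TYPE` metadata is `AIP`, moves each one to a second path, logs the move, checks the `DCAPE_POLICY_REPLICA` policy, and replicates and logs the collection when the policy answers `Yes`. The Python sketch below models only that control flow on in-memory objects. It does not call iRODS; the `Collection` class, the `policies` dictionary, and the `/samplePath2` target are hypothetical stand-ins, and the scheduling hints are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical in-memory stand-in for an iRODS collection."""
    path: str
    metadata: dict = field(default_factory=dict)
    policies: dict = field(default_factory=dict)
    replicas: int = 1


def run_sample_rule(collections, source="/samplePath", target="/samplePath2"):
    """Mimic the rule body: select AIPs under `source`, move, log, maybe replicate."""
    log = []
    for coll in collections:
        parent, _, name = coll.path.rpartition("/")
        # Equivalent of the general query: parent path and metadata must match.
        if parent != source or coll.metadata.get("DCAPE_COLL_TYPE") != "AIP":
            continue
        old_path = coll.path
        coll.path = f"{target}/{name}"               # stands in for msiDataObjRename
        log.append(("Move_Collection", old_path))    # stands in for acAddLog
        # Stands in for acCheckPolicy followed by ifExec(*pResult == Yes, ...)
        if coll.policies.get("DCAPE_POLICY_REPLICA") == "Yes":
            coll.replicas += 1                       # stands in for msiCollRepl
            log.append(("Replicate_Coll", coll.path))
    return log
```

The length of this plain procedure, compared with the single-line rule, suggests why the following slides argue for an interface that hides the rule syntax from archivists.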

  15. An interface that makes the policies easy to manage! [framework diagram from slide 11, annotated with capability numbers between 1 and 26]

  16. Interface - Requirements • Hide the technical details • Show the information that archivists want to know • Be able to customize policies easily • Web-based, no installation required

  17. Demo I [framework diagram from slide 11; callouts: Checksum, Replication]

  18. Demo II [framework diagram from slide 11; callouts: Checksum & Virus Check, No Replication]

  19. DCAPE is More • More than a storage service or environment • More than a reference tool • DCAPE will provide the capability for all archival repositories to fulfill their responsibility to preserve electronic records

  20. DCAPE Interface

  21. DCAPE Metadata
      • Follow Dublin Core model
      • Allow customization
      • Encourage standardization
      • Define
          • Source: creator, system, archivist
          • Level: collection, accretion, item
          • Accessibility: internal vs. public
          • Fields: required vs. optional
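One way to picture the four "Define" dimensions on the metadata slide is as attributes of a field definition. The sketch below is an assumption about how such definitions could be modeled, not the DCAPE schema: every class and function name is hypothetical, and only the enumerated values (creator/system/archivist, collection/accretion/item, internal/public, required/optional) come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CREATOR = "creator"
    SYSTEM = "system"
    ARCHIVIST = "archivist"


class Level(Enum):
    COLLECTION = "collection"
    ACCRETION = "accretion"
    ITEM = "item"


class Accessibility(Enum):
    INTERNAL = "internal"
    PUBLIC = "public"


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical description of one metadata field."""
    name: str
    source: Source
    level: Level
    accessibility: Accessibility
    required: bool


def missing_required(definitions, record, level):
    """List required fields at `level` that are absent or empty in `record`."""
    return [
        d.name
        for d in definitions
        if d.required and d.level is level and not record.get(d.name)
    ]


# Illustrative definitions: "title" and "creator" are Dublin Core element
# names; "retention_note" is an invented custom field.
DEFINITIONS = [
    FieldDefinition("title", Source.ARCHIVIST, Level.COLLECTION, Accessibility.PUBLIC, True),
    FieldDefinition("creator", Source.CREATOR, Level.COLLECTION, Accessibility.PUBLIC, True),
    FieldDefinition("retention_note", Source.ARCHIVIST, Level.COLLECTION, Accessibility.INTERNAL, False),
]
```

For example, `missing_required(DEFINITIONS, {"title": "Board minutes"}, Level.COLLECTION)` would report that `creator` still needs a value before the collection record is complete.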

  22. DCAPE Workflow • Define functionality at each stage • Virtual Loading Dock • Pre-accessioning • Ingestion • Preservation Area • Archival storage • Data management • Administration • Preservation planning • Reference Room • Access • Common services • Management

  23. DCAPE Business Model • Non-profit • Fees for services • Fees for storage • Storage and disaster prevention services • Software maintenance • Access and connectivity

  24. MetaArchive Cooperative
      • Encourages organizations to build their own preservation infrastructures rather than outsourcing to external vendors
      • 3 levels of membership; 3-year commitment
      • Basic costs:
          • Equipment: $4.6K server purchase in the 1st year
          • Staffing: 2% of a system administrator's time + POC admin + software eng. for content ingestion preparation
          • Storage: $1.00 / GB / year for content stored in the network
      • Yearly dues:
          • Sustaining Members: $5.5K / yr
          • Preservation Members: $3K / yr
          • Collaborative Members: varies
      • Cost scenarios for 2 TB of content:
          • Sustaining Member: $27.1K / 3 yrs = ($5.5K membership + $2K space) x 3 yrs + $4.6K server
          • Preservation Member: $19.6K / 3 yrs = ($3K membership + $2K space) x 3 yrs + $4.6K server
          • Collaborative Member: $22.6K / 3 yrs = ($4K membership + $2K space) x 3 yrs + $4.6K server
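The three MetaArchive cost scenarios follow one formula: (yearly dues + yearly storage) x years + one-time server purchase. A short Python sketch reproduces the slide's figures. The function and constant names are illustrative; the $4K Collaborative dues value is taken from the slide's scenario (the dues themselves are listed as "varies"), 2 TB is counted as 2,000 GB to match the slide's $2K, and staffing costs are left out, as they are in the slide's totals.

```python
SERVER_COST = 4_600           # one-time server purchase, first year
STORAGE_PER_GB_YEAR = 1.00    # dollars per GB per year

# Yearly dues in dollars; Collaborative uses the $4K from the slide's scenario.
YEARLY_DUES = {
    "Sustaining": 5_500,
    "Preservation": 3_000,
    "Collaborative": 4_000,
}


def metaarchive_cost(level: str, content_gb: float, years: int = 3) -> float:
    """Total cost over `years`: (dues + storage) per year, plus the server."""
    storage_per_year = content_gb * STORAGE_PER_GB_YEAR
    return (YEARLY_DUES[level] + storage_per_year) * years + SERVER_COST


for level in YEARLY_DUES:
    print(f"{level}: ${metaarchive_cost(level, 2_000):,.0f} over 3 years")
```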

  25. Archive-It • Subscription service from the Internet Archive, allowing institutions to build and preserve collections of born-digital content • Allows users to crawl, scope, catalog, manage, and browse their archived collections • Collections are hosted at the IA data center and are available through URL and full-text search • A minimum of 2 copies of each collection are kept online • Cost Scenarios

  26. Storage Cost Model Scenarios
      1. Question: What is the yearly charge for a customer with 4,000 files and 1.5 TB of storage, assuming the need for two copies, one on disk and one on tape (iRODS)?
         Answer: $2,900 + $1,400 x 1.5 = $5,000
      2. Question: What is the yearly cost of 6 million files (web crawl scenario) and 1 TB of storage, assuming the need for two tape copies (using iRODS)?
         Answer: $2,900 + $550 + 6 x $870 + $5,165 = $13,835
      3. Question: What is the yearly cost of 100,000 files and 20 TB of storage with two tape copies (using iRODS)?
         Answer: $2,900 + 20 x $550 + 0.1 x $870 + $5,165 = $19,152
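The three answers can be checked with a small Python sketch. The slide gives only the arithmetic, so the interpretation of each term is an assumption: $2,900 is treated as a yearly base charge, $1,400 as the per-TB rate for one disk copy plus one tape copy, $550 as the per-TB rate for two tape copies, and $870 as a charge per million files. The slide does not say what the $5,165 term covers, so it is carried as an unexplained fixed amount that appears only in the two-tape scenarios.

```python
BASE_CHARGE = 2_900            # assumed yearly base charge
DISK_PLUS_TAPE_PER_TB = 1_400  # assumed rate: one disk copy + one tape copy
TWO_TAPES_PER_TB = 550         # assumed rate: two tape copies
PER_MILLION_FILES = 870        # assumed charge per million files
TAPE_FIXED_AMOUNT = 5_165      # on the slide, but its meaning is not stated


def disk_and_tape_cost(terabytes: float) -> float:
    """Scenario 1 arithmetic: base charge plus per-TB disk-and-tape storage."""
    return BASE_CHARGE + DISK_PLUS_TAPE_PER_TB * terabytes


def two_tape_cost(file_count: int, terabytes: float) -> float:
    """Scenarios 2 and 3 arithmetic: base, storage, per-file and fixed terms."""
    file_charge = PER_MILLION_FILES * file_count / 1_000_000
    return BASE_CHARGE + TWO_TAPES_PER_TB * terabytes + file_charge + TAPE_FIXED_AMOUNT


print(disk_and_tape_cost(1.5))
print(two_tape_cost(6_000_000, 1))
print(two_tape_cost(100_000, 20))
```

Note that scenario 1 on the slide includes no per-file term for its 4,000 files; the sketch follows the slide rather than guessing at a unified formula.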
