Learn how distributed version control systems like Git and Mercurial can streamline data and metadata management in research repositories. Explore case studies, backup strategies, and repository synchronization for efficient versioning.
Beyond code: Versioning data with Git and Mercurial
Stephanie Collett and Martin Haye
California Digital Library, University of California
Agenda
• Background
• Case Study #1: eScholarship Backup
• Case Study #2: Zephir Metadata
• Summary
Version Control Repository: Code
Version Control Repository: Data/Metadata
Case #1: eScholarship Data/Metadata Backup
XML Metadata: 10 files per work
XML Metadata: ~500,000 files total
XML Metadata: Single Mercurial Repository
Working Repository to Backup Repository: Nightly Sync (hg push)
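A minimal sketch of what a nightly hg push job could look like. The function names, the backup URL, and the commit message are hypothetical; the slides do not show the actual script. Only standard Mercurial commands (hg addremove, hg commit, hg push) are used.

```python
import subprocess


def nightly_sync_commands(backup_repo, message):
    """Return the Mercurial commands for one nightly sync, in order."""
    return [
        ["hg", "addremove"],              # track new files, forget deleted ones
        ["hg", "commit", "-m", message],  # record tonight's snapshot
        ["hg", "push", backup_repo],      # send new changesets to the backup
    ]


def run_nightly_sync(working_repo, backup_repo, message):
    """Run the sync commands inside the working repository."""
    for cmd in nightly_sync_commands(backup_repo, message):
        result = subprocess.run(cmd, cwd=working_repo)
        # hg commit and hg push exit with 1 when there is nothing to do,
        # so only higher exit codes are treated as failures here.
        if result.returncode > 1:
            raise RuntimeError(f"{' '.join(cmd)} failed ({result.returncode})")
```

Such a job would typically be started by a scheduler such as cron, for example run_nightly_sync("/path/to/working", "ssh://backup-host/repo", "Nightly backup") with placeholder paths.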
XML Metadata: Single Mercurial Repository, with .hgignore
Working Storage to Backup Storage: Nightly Sync (rsync)
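The storage-level sync could be driven the same way. This is a sketch only: the helper name and the directory arguments are hypothetical, and whether the real job uses --delete is not stated in the slides.

```python
import subprocess


def rsync_command(working_dir, backup_dir, delete=False):
    """Build an rsync command that mirrors working_dir into backup_dir."""
    # A trailing slash on the source copies its contents, not the directory itself.
    src = working_dir.rstrip("/") + "/"
    cmd = ["rsync", "-a"]        # -a: archive mode (recursive, keeps times and permissions)
    if delete:
        cmd.append("--delete")   # also remove files that vanished from the source
    cmd += [src, backup_dir]
    return cmd


def run_rsync(working_dir, backup_dir, delete=False):
    """Run the sync and raise if rsync reports an error."""
    subprocess.run(rsync_command(working_dir, backup_dir, delete), check=True)
```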
30-60 minutes for the batch job
Logs = Commit History: Date, Annotation, Change
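To illustrate how commit history can stand in for a log, the sketch below reads date, changeset, and annotation from hg log. The template keywords are standard Mercurial ones; the function names and the tab-separated layout are choices made for this example, not something shown in the slides.

```python
import subprocess

# One line per changeset: ISO date, short hash, first line of the commit message.
TEMPLATE = "{date|isodate}\t{node|short}\t{desc|firstline}\n"


def parse_log(output):
    """Turn templated hg log output into a list of dictionaries."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        date, node, annotation = line.split("\t", 2)
        entries.append({"date": date, "node": node, "annotation": annotation})
    return entries


def read_history(repo):
    """Run hg log in the given repository and parse the result."""
    result = subprocess.run(
        ["hg", "log", "--template", TEMPLATE],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```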
Case #2: Zephir Metadata Management System
File system record/
File system record/ marc.xml
File system record/ marc.xml attributes.xml summary.xml transform.xsl
File system record/ .git/ marc.xml attributes.xml summary.xml transform.xsl
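A sketch of how one record directory could be versioned in its own Git repository. The function names and commit message are hypothetical and do not come from Zephir; only standard Git commands are used.

```python
import subprocess


def commit_record_commands(record_dir, message):
    """Return the Git commands that snapshot one record directory."""
    git = ["git", "-C", record_dir]       # -C: run Git as if started in record_dir
    return [
        ["git", "init", record_dir],      # safe to repeat on an existing repository
        git + ["add", "-A"],              # stage every added, changed, or removed file
        git + ["commit", "-m", message],  # exits non-zero if nothing changed
    ]


def commit_record(record_dir, message):
    """Run the commands in order; stop at the first one that fails."""
    for cmd in commit_record_commands(record_dir, message):
        if subprocess.run(cmd).returncode != 0:
            return False
    return True
```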
...
/pairtree/ab/cd/e/record/.git
/pairtree/ab/cd/ea/record/.git
/pairtree/ab/cd/ez/record/.git
/pairtree/ab/cd/f2/record/.git
/pairtree/ab/cd/f9/record/.git
/pairtree/ab/cd/ff/record/.git
/pairtree/ab/cd/fm/record/.git
/pairtree/ab/cd/fq/record/.git
/pairtree/ab/cd/gi/record/.git
/pairtree/ab/cd/gw/record/.git
/pairtree/ab/cd/gz/record/.git
/pairtree/ab/cd/hs/record/.git
/pairtree/ab/cd/ht/record/.git
/pairtree/ab/cd/i/record/.git
...
(10 million)
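The directory layout above follows the Pairtree idea of splitting an identifier into two-character segments. A minimal sketch, assuming plain alphanumeric identifiers; the function name and example identifiers are hypothetical, and the character escaping that the full Pairtree convention defines is left out.

```python
import posixpath


def pairtree_path(identifier, root="/pairtree", leaf="record"):
    """Map an identifier to its record directory under the pairtree root."""
    if not identifier:
        raise ValueError("identifier must not be empty")
    # Split into segments of two characters; the last one may be shorter.
    segments = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return posixpath.join(root, *segments, leaf)


print(pairtree_path("abcde"))   # /pairtree/ab/cd/e/record
print(pairtree_path("abcdea"))  # /pairtree/ab/cd/ea/record
```

Short, fixed-width segments keep any single directory from holding too many entries, which matters at the scale of millions of records.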
Versioning + Audit Trail + Diffing + Debugging
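With one repository per record, the audit trail and diffs come from ordinary Git commands. The helper below is an illustrative sketch with a hypothetical name; it only assembles the commands, and the diff assumes the record already has at least two commits.

```python
def audit_commands(record_dir, filename):
    """Return Git commands for inspecting the history of one file in a record."""
    git = ["git", "-C", record_dir]
    return {
        # Who changed the file, when, and with what message.
        "audit_trail": git + ["log", "--date=iso", "--", filename],
        # What the most recent commit changed in the file.
        "last_change": git + ["diff", "HEAD~1", "HEAD", "--", filename],
    }
```

Each command list can be passed to subprocess.run as-is.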
record/ marc.xml
record/ marc.xml attributes.xml summary.xml transform.xsl
.git/
  branches/
  config
  description
  HEAD
  hooks/
  index
  info/
  objects/
  refs/
record/ + record/.git: 43 files, ~132k
record/ + record/.git: ~132k x 10 million
record/ + record/.git: 43 files x 10 million
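The multiplication can be worked through directly. The sketch assumes that "~132k" means roughly 132 kilobytes per record directory, which the slides do not state explicitly.

```python
RECORDS = 10_000_000
FILES_PER_RECORD = 43          # record/ plus record/.git
BYTES_PER_RECORD = 132 * 1000  # assumption: "~132k" is about 132 kilobytes

total_files = RECORDS * FILES_PER_RECORD
total_bytes = RECORDS * BYTES_PER_RECORD

print(f"{total_files:,} files")        # 430,000,000 files
print(f"{total_bytes / 1e12:.2f} TB")  # 1.32 TB
```

Under that assumption the per-record overhead adds up to about 430 million files and about 1.3 TB across all records.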