
Building a Distributed Institutional Repository: NCSU Libraries' Digital Repository Projects

Learn about the North Carolina State University Libraries' projects in building a distributed institutional repository, including collections on electronic theses and dissertations, technical reports, faculty publications, and more. Explore their repository planning, governance structure, and partnerships with departments and institutes.


Presentation Transcript


  1. Digital Repository Projects at the North Carolina State University Libraries • James Jackson Sanborn • Jim Tuttle • Open Repositories/DSpace User Group ’07

  2. Early Repository Planning • Digital Repository Planning Committee • What it wouldn’t be (at least to start) • Distributed community structure • Open submission • ‘Institutional’ Repository • What it would be (at least to start) • Library-managed collections • Building block for campus partnership • Learning opportunity

  3. Repository Building Blocks • NCSU Electronic Theses and Dissertations • Started 1997 • Mandatory since 2002 • Virginia Tech’s ETD-db • ~3,000 ETDs • NCSU Authors Database • Started 1995 • Access Database/ColdFusion front-end • ~22,000 citations

  4. Repository Building Blocks (cont’d) • Technical Reports Print Collection • Campus Institutes and Departments • Massive fall-off in print distribution • Special Collections Resource Center • Digitized texts and photographs • Campus Newsletters • GIS Data • Library managed/acquired data collection • Homegrown data layer database/discovery tools

  5. Repository Plan • Target ‘Research’ collections first • Technical Reports • ETDs • Faculty Publications/Citations • Treat each collection as its own project • Actively pursue common technological solutions

  6. Technical Reports • DSpace Application • Lightly Customized • Library Harvested • Local Cataloging/Metadata database • Scripted Ingest Object Creation • Batch Ingest • Mix of ongoing submission by institute/departmental personnel and Library capture.
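
The slide does not say what the scripted "ingest objects" look like. A minimal sketch, assuming they follow DSpace's Simple Archive Format (one directory per item holding `dublin_core.xml`, a `contents` file, and the files themselves); the function name, directory naming, and record layout are hypothetical:

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(archive_dir, index, record, file_path):
    """Write one Simple Archive Format item directory and return its path.

    `record` is a list of (element, qualifier, value) tuples, for example
    rows pulled from a local cataloging database (hypothetical layout).
    """
    item_dir = Path(archive_dir) / f"item_{index:04d}"
    item_dir.mkdir(parents=True)

    # dublin_core.xml: one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in record:
        node = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        node.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the report file and list it in the contents file, one name per line.
    source = Path(file_path)
    shutil.copy(source, item_dir / source.name)
    (item_dir / "contents").write_text(source.name + "\n", encoding="utf-8")
    return item_dir
```

Running this once per harvested report would yield a directory tree that a batch import step could pick up in a single pass.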

  7. Tech Rep Screenshot

  8. Technical Reports Item Detail

  9. Electronic Theses & Dissertations • Partnership with Graduate School • Hybrid System: DSpace and ETD-db • ETD-db submission/approval/management • Direct database extract for DSpace Ingest Object creation • Scheduled Batch Ingest process • DSpace Considerations/Alterations • Metadata Mapping • Author Browse (exclude contributor.advisor) • Various interface changes
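
The metadata mapping and the author-browse change can be illustrated with a small sketch. The ETD-db column names and the mapping table below are assumptions for illustration, not the mapping NCSU actually used; the point is only that advisors are stored as `contributor.advisor` and then left out of the author list:

```python
# Hypothetical ETD-db column -> (Dublin Core element, qualifier) mapping.
ETD_TO_DC = {
    "title": ("title", None),
    "author": ("contributor", "author"),
    "advisor": ("contributor", "advisor"),
    "abstract": ("description", "abstract"),
    "degree_date": ("date", "issued"),
}


def map_record(row):
    """Translate one extracted ETD-db row (a dict) into Dublin Core triples."""
    values = []
    for column, (element, qualifier) in ETD_TO_DC.items():
        value = row.get(column)
        if value:
            values.append((element, qualifier, value))
    return values


def author_browse_names(dc_values):
    """Names for an author browse: every contributor except advisors."""
    return sorted(
        {
            value
            for element, qualifier, value in dc_values
            if element == "contributor" and qualifier != "advisor"
        }
    )
```

Without the exclusion, a browse built over all `contributor` values would list advisors alongside the students who wrote the theses.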

  10. ETD-DB screenshot

  11. ETD DSpace screenshot

  12. Faculty Publications • Built on Existing Author Database • Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP • Re-modeled data • Added Functionality • OpenURL • ‘Vita-like’ citation display • Full-text or submission links • Full-text stored in DSpace • Citation metadata and file exported by script • DSpace Identifier currently manually entered
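
As an illustration of the OpenURL feature, the sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article citation. The resolver address and the citation dictionary keys are hypothetical; the slides do not describe how the Faculty Publications database constructs its links:

```python
from urllib.parse import urlencode

# Hypothetical link resolver base URL.
RESOLVER = "https://resolver.example.edu/openurl"


def openurl_for_article(citation, resolver=RESOLVER):
    """Build an OpenURL 1.0 KEV link from a citation dict (hypothetical keys)."""
    fields = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": citation.get("title"),
        "rft.jtitle": citation.get("journal"),
        "rft.aulast": citation.get("author_last"),
        "rft.date": citation.get("year"),
        "rft.volume": citation.get("volume"),
        "rft.spage": citation.get("start_page"),
        "rft.issn": citation.get("issn"),
    }
    # Leave out fields the citation record does not have.
    query = urlencode({key: value for key, value in fields.items() if value})
    return f"{resolver}?{query}"
```

A citation with only a title, journal, and year still produces a usable link, since empty fields are dropped rather than sent as blank parameters.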

  13. Faculty Publications Schematic (diagram labels) • Scholar • Submit Citations and/or Text • View full-text • S+R Citations • Web Submission Form • Web interface (PHP) • DSpace Item Display • PostgreSQL (metadata) • DSpace Java/JSP (full-text only) • Oracle Faculty Publications DB (citations) • Handle IDs • File System (files) • Access, ISI, Ann. Reps, etc. • Add/Edit data • Cataloging and Coll. Mgt.

  14. FacPubs Search Screen

  15. FacPubs result screenshot

  16. FacPubs Item screenshot

  17. Repository Governance • Internal • Digital Repository Planning Committee • Data Repository Architect • External • Faculty Repository Advisory Committee • Partnerships with departments and institutes

  18. NCGDAP: Overview • NDIIPP: National Digital Information Infrastructure and Preservation Program • Collaboration with Library of Congress • 1 of 8 three-year projects to study long-term (50+ years) digital preservation • Objective: engage existing state/federal geospatial data infrastructures in preservation • Project approaches: Technical and Social

  19. Repository Requirements • Dim archive with possible future access • minimal IR/access component • Minimal repository imprint on data • repository agnostic ingest and export • Simple digital curation functions • Periodic MD5 checksum validation • Structured metadata index • Expected archived-data exchange • Leverage existing investments • Free Software with active community
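
Periodic MD5 checksum validation can be sketched as a two-step routine: record a digest for every archived file, then recompute and compare on a schedule. This is a minimal sketch with hypothetical function names and an in-memory manifest; it is not the project's actual curation code:

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): md5_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def validate(root, manifest):
    """Recompute digests and list files that are missing or have changed."""
    root = Path(root)
    failures = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            failures.append((relative, "missing"))
        elif md5_of(target) != expected:
            failures.append((relative, "checksum mismatch"))
    return failures
```

In practice the manifest would be stored outside the asset store, and `validate` would be run from a scheduled job, with any non-empty result flagged for a curator.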

  20. Automation: Threat and format analysis, validation • Python wrappers for the following: • Anti-virus – ClamAV • Compressed files (tar, zip, gzip, bzip) • At-risk formats • Executable files (magic numbers) • JHOVE validation
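
The magic-number checks can be illustrated without any external tools. The sketch below reads the first bytes of each file and matches well-known signatures for compressed and executable formats; the signature table is deliberately short, the function names are hypothetical, and the ClamAV and JHOVE wrappers mentioned on the slide are not shown:

```python
# (offset, signature bytes, category, label) for a few common formats.
SIGNATURES = [
    (0, b"PK\x03\x04", "compressed", "zip"),
    (0, b"\x1f\x8b", "compressed", "gzip"),
    (0, b"BZh", "compressed", "bzip2"),
    (257, b"ustar", "compressed", "tar"),
    (0, b"\x7fELF", "executable", "ELF"),
    (0, b"MZ", "executable", "DOS/Windows"),
]


def sniff(path):
    """Return (category, label) for a recognized signature, else None."""
    with open(path, "rb") as handle:
        header = handle.read(512)
    for offset, magic, category, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return category, label
    return None


def triage(paths):
    """Group file paths by what their leading bytes say they are."""
    report = {"compressed": [], "executable": [], "unrecognized": []}
    for path in paths:
        match = sniff(path)
        category = match[0] if match else "unrecognized"
        report[category].append(str(path))
    return report
```

Checking content rather than file extensions means a renamed executable is still caught, which is the usual reason for sniffing magic numbers in an ingest pipeline.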

  21. Automation: Archive package organization • ESRI ArcGIS toolbar for selected formats

  22. Automation: Archive package organization • Rule-based Python logic • filestem • extension relationships (multi-file format validation) • directory structure • Manual intervention • NOID assignment
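
A filestem and extension rule can be sketched with the shapefile format, whose `.shp` file needs `.shx` and `.dbf` companions sharing the same stem in the same directory. The rule table and function name below are hypothetical, and the project's real rules presumably covered more formats:

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Primary extension -> companion extensions that must share its filestem.
REQUIRED_COMPANIONS = {
    ".shp": {".shx", ".dbf"},
}


def missing_companions(filenames):
    """Return {primary file: sorted missing extensions} for incomplete sets."""
    extensions_by_stem = defaultdict(set)
    for name in filenames:
        path = PurePosixPath(name)
        # The key keeps the directory, so equal stems in different
        # directories are validated separately.
        extensions_by_stem[str(path.with_suffix("")).lower()].add(path.suffix.lower())

    problems = {}
    for stem, extensions in extensions_by_stem.items():
        for primary, required in REQUIRED_COMPANIONS.items():
            if primary in extensions:
                missing = required - extensions
                if missing:
                    problems[stem + primary] = sorted(missing)
    return problems
```

An empty result means every multi-file dataset in the listing is complete; anything else is a candidate for the manual intervention step.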

  23. Metadata: Seed file form • 'Transfer set' metadata capture in 'Seed file' • communicates with DSpace backend, generates XML used to inform later scripts

  24. Metadata: Communities and Collections • Search by type for 100+ communities • Facilitates creation and reduces errors

  25. Curation Processing • At-risk format migration, original retained • Agency-specific XML templates in ArcCatalog with synchronization flags • Provenance and curation metadata scripted

  26. Source Metadata Translation • Repository agnostic approach • Spokes for each transformation • Facilitates export from DSpace into other repositories • Generate DSpace QDC, METS; populate Workflow database
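
One way to read the hub-and-spoke approach is as a neutral "hub" record with one small transformation function per target. The sketch below registers two spokes, a DSpace-style qualified Dublin Core document and a row for a workflow database. The hub field names, spoke names, and output layouts are all assumptions for illustration, and a METS spoke is omitted for brevity:

```python
import xml.etree.ElementTree as ET

# Registry of spokes: target name -> function that transforms a hub record.
SPOKES = {}


def spoke(name):
    """Decorator that registers a transformation under a target name."""
    def register(func):
        SPOKES[name] = func
        return func
    return register


@spoke("dspace_qdc")
def to_dspace_qdc(hub):
    """Render the hub record as a qualified Dublin Core XML string."""
    root = ET.Element("dublin_core")

    def add(element, qualifier, value):
        node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        node.text = value

    add("title", "none", hub["title"])
    add("identifier", "other", hub["noid"])
    for keyword in hub["keywords"]:
        add("subject", "none", keyword)
    return ET.tostring(root, encoding="unicode")


@spoke("workflow_db")
def to_workflow_row(hub):
    """Flatten the hub record into a row for an external tracking table."""
    return {
        "noid": hub["noid"],
        "title": hub["title"],
        "iso_keywords": "; ".join(hub["keywords"]),
    }


def translate(hub, targets=None):
    """Run the requested spokes (all of them by default) over one hub record."""
    names = list(SPOKES) if targets is None else targets
    return {name: SPOKES[name](hub) for name in names}
```

Supporting another repository then means adding one more decorated function; the hub record and the existing spokes stay unchanged, which is what keeps the approach repository agnostic.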

  27. Extra-repository AIP management • Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub • External tracking of NOID, Handle, ISO keywords, other metadata for interaction with other systems • Integrates with existing GIS Lookup tool

  28. Repository Architecture Overview (diagram labels) • PostgreSQL: one shared username, separate database for each app • repository Tomcat instance • Tomcat • DSpace Internal • Faculty Publications: PHP/DSpace hybrid • Repository (DSpace): Technical Reports, ETDs • Collections (DSpace): SCRC Course Catalogs, Green ‘N’ Growing • NDIIPP (DSpace) • SCRC (DSpace) • Asset Store/ATABeast (sub-directory for each DSpace app)

  29. Upcoming Repository Related Projects • Enhancements to current system • XTF search interface • Inter-archive exchange • Digital Collections Repository • Special Collections Research Center • Other non-faculty collections • Data Repository • Scientific data • Statistical resources

  30. For More Information: • James Jackson Sanborn • james_sanborn@ncsu.edu • Jim Tuttle • jim_tuttle@ncsu.edu
