Arctos/TACC Collaboration
Chris Jordan, Texas Advanced Computing Center
Arctos: A 15-year history
• MVZ: 1995 - Hired Stan Blum to develop relational data model (following modeling by Assoc. Systematic Collections).
• MVZ: 1997 - Hired John Wieczorek to implement model (desktop application) using Sybase and Versata. Partial implementation (e.g., no loans).
• UAM: 1998-2000 - John W. migrated mammal data to Oracle, set up Versata.
• UAM: 2002 - Dusty McDonald replaced Versata with ColdFusion, implemented full model (first web-based instance, aka Arctos).
• MSB: 2003 - Joined Arctos at UAM (first multi-hosting instance).
• MVZ and MCZ: 2005-2007 - Implemented separate instances of Arctos at Berkeley and Harvard (MVZ: first Postgres, then Oracle).
• MVZ: 2009 - Moved hosting of data to Alaska (Virtual Private Database version).
Major repositories using the Arctos database (34 collections of specimens or observations, 1.3M records)
TACC and TeraGrid
• 10-year history of Research Cyberinfrastructure
• Supercomputing, Visualization and Storage
• Supported by NSF to provide research resources
• TACC expansion of Data-focused support
  • 1 Petabyte dedicated online disk
  • 10 Petabytes offline archive
  • National network of replication resources
Data Diversity at TACC
• Image Collections (Natural History, Art, etc.)
• Structured Data (Economics, Public Health)
• BioMolecular Data (DNA, RNAseq, etc.)
• Physical Sciences/Simulation Data
• Geographic data (Climate, Disaster Preparedness)
• Integrated Infrastructure Supports Diverse Collections
Arctos is… A versatile online collections management system
• Cataloged Items (ID, attributes, parts, etc.; batch uploading, downloading, editing; encumbrances)
• Localities & Collecting Events (mapping, media, history)
• Transactions (loans, accessions, borrows, permits; email reminders)
• Usage (publications, projects, sponsors, GenBank)
• Curatorial (object tracking, parts, condition, relations, etc.)
• Determination history (identification, georef, attributes)
Breadth of Data in Arctos
• Fish, amphibians, reptiles, mammals, birds and bird eggs/nests, plants, arthropods, fossils, molluscs
• Specimens and observations
• Media (images, audio)
• Publications, fieldnotes
Arctos is constantly evolving to incorporate new kinds of data, e.g.:
• Better representation of non-publication documents (fieldnotes, correspondence)
• Cultural collections (art, anthropology...)
Nearly all that is known about an object (or observation) can be included in Arctos.
Arctos/TACC Partnership
• Arctos hosts web/database resources
• TACC hosts media collections
  • Images, Recordings, etc.
• Simple workflows for automated generation of thumbnails, JPG versions, MP3s, OCR
• Replication policies automatically replicate to various storage locations
• Images directly served from TACC to browsers
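
The derivative workflow named on this slide (thumbnails, JPG versions, MP3s, OCR) could be sketched roughly as follows. This is only an illustration under stated assumptions: the file extensions, the naming scheme, the example paths, and the function `plan_derivatives` are all hypothetical and are not taken from the actual TACC or Arctos tooling. The sketch only plans which derivative files an original would produce; it does not convert anything.

```python
from pathlib import PurePosixPath

# Assumed (hypothetical) sets of "original" formats; the real workflow's
# accepted formats are not stated in the slides.
IMAGE_EXTS = {".tif", ".tiff", ".dng", ".png"}
AUDIO_EXTS = {".wav", ".aiff", ".flac"}


def plan_derivatives(original: str, is_document: bool = False) -> list[str]:
    """Return the derivative paths to generate for one original media file.

    Images get a thumbnail and a full-size JPG; scanned documents
    (e.g. fieldnotes) additionally get an OCR text file; audio gets an MP3.
    Unknown file types yield no derivatives.
    """
    path = PurePosixPath(original)
    ext = path.suffix.lower()
    stem = path.with_suffix("")  # same path, extension removed

    derivatives = []
    if ext in IMAGE_EXTS:
        derivatives.append(f"{stem}_thumb.jpg")  # hypothetical naming scheme
        derivatives.append(f"{stem}.jpg")
        if is_document:
            derivatives.append(f"{stem}.txt")  # OCR output
    elif ext in AUDIO_EXTS:
        derivatives.append(f"{stem}.mp3")
    return derivatives


if __name__ == "__main__":
    # Example paths are invented for illustration.
    for name in ("uam/herb/sheet_0001.tif", "mvz/audio/call_17.WAV"):
        print(name, "->", plan_derivatives(name))
```

Keeping the planning step separate from the conversion step is one possible design choice: the same plan can then drive both the generation of derivatives and the replication of each resulting file.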
Arctos/TACC History
• Initial work with UAF Herbarium in 2008
• Brought on MVZ Collections in 2009
• Ongoing work on web audio, OCR
• New collections from UAF, UNM, others
• Currently >300,000 digital objects under management
• Support >100,000 downloads of original scans each year
Advantages for Collections
• Lower cost and management overhead
• Highly reliable, large-scale infrastructure
• No scalability issues
• Longer-term partnerships promote technical collaboration to add capabilities over time
• Provides built-in “Data Management Plan”
Long-Term Sustainability
• TACC plan is to be a permanent research data resource
• Arctos will evolve over time but the collections have permanent value
• Infrastructure foundation is stable
• Future agency funding is uncertain
• Develop diverse funding sources and models to support robust, long-term operation
Ongoing Efforts
• Expansion of storage resources at TACC (~10 PB online disk)
• Greater engagement in data management activities
• Working with BRC, ADBC awards and associated data
• iPlant Data/Genetic resources: link to specimen records?
Thanks for your Time
• Steffi Ickert-Bond, UAF
• Gordon Jarrell, UNM
• Carla Cicero, MVZ
• Michelle Koo, MVZ
• Dusty McDonald, Arctos