
A Very Brief Introduction to iRODS

Shu Huang, RENCI


Presentation Transcript


  1. A Very Brief Introduction to iRODS
     Shu Huang, RENCI

  2. Fact Sheet
     • Integrated Rule-Oriented Data System (iRODS), which aims to manage massive distributed data
     • Open source initiative (13+ years of development and ~$20M in NSF funding)
     • Collaboration between UNC (DICE), RENCI, and UCSD
     • Applications:
        • Data grids, institutional repositories, libraries, archives
        • Astronomy, high energy physics, earth, environment, genomics…
     • Scale: hundreds of millions of files, petabytes of data, tens of federated data grids

  3. iRODS Architecture
     • User: can search, access, add, and manage data and metadata
     • Logical namespace: users see a shared “Virtual Collection”
     • iRODS Metadata Catalog: tracks information
     • iRODS Rule Engine: tracks policies
     • iRODS Data Servers: disk, tape, etc.
     • Access distributed data with a Web-based browser, the iRODS GUI, or command-line clients.

  4. Managing Data - Virtualization
     • Map from the actions requested by the access method to a standard set of micro-services.
     • Map the standard micro-services to standard operations.
     • Map the operations to the protocol supported by the operating system.
     • Diagram labels: Access Interface; Standard Micro-services; Data Grid; Standard Operations (POSIX, ODBC…); Data obj, DB obj, Workflow
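
The three mappings on this slide can be pictured as three lookup tables applied in sequence. The following is a minimal sketch under stated assumptions: every action, micro-service, operation, and backend name here is hypothetical and chosen for illustration; none of them is an actual iRODS identifier.

```python
# Hypothetical tables; these names are illustrative, not iRODS APIs.

# Level 1: action requested by the access method -> standard micro-services
ACTION_TO_MICROSERVICES = {
    "get": ["open_object", "read_object", "close_object"],
    "put": ["create_object", "write_object", "close_object"],
}

# Level 2: standard micro-service -> standard operation
MICROSERVICE_TO_OPERATION = {
    "open_object": "open",
    "create_object": "create",
    "read_object": "read",
    "write_object": "write",
    "close_object": "close",
}

# Level 3: standard operation -> protocol call of a given storage backend
OPERATION_TO_PROTOCOL = {
    "posix": {
        "open": "open(2)",
        "create": "creat(2)",
        "read": "read(2)",
        "write": "write(2)",
        "close": "close(2)",
    },
    # A made-up second backend, to show that only level 3 changes.
    "toy_archive": {
        "open": "stage_in",
        "create": "allocate",
        "read": "fetch",
        "write": "store",
        "close": "release",
    },
}


def resolve(action: str, backend: str) -> list:
    """Translate one client action into backend protocol calls."""
    calls = []
    for microservice in ACTION_TO_MICROSERVICES[action]:
        operation = MICROSERVICE_TO_OPERATION[microservice]
        calls.append(OPERATION_TO_PROTOCOL[backend][operation])
    return calls


print(resolve("get", "posix"))
print(resolve("get", "toy_archive"))
```

In this sketch, adding a storage backend touches only the third table, which is one way to read the slide's point about virtualization: clients and micro-services stay the same while the protocol layer varies.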

  5. Managing Computation
     • Why?
        • It may be easier to move the computation to the data when the data is too large and the computation is simple
        • Reducing latency through local processing can be critical
     • How?
        • Rule: Action | Condition | MS1, …, MSn | RMS1, …, RMSn
        • Micro-services: 250+ well-defined functions
        • Rules invoked by servers to enforce policies
        • Rules invoked by clients to run workflows on servers
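
The rule shape on this slide can be illustrated with a small interpreter. This is a sketch, not the iRODS rule engine. It assumes that MS1…MSn run in order when the action matches and the condition holds, and that RMSi is a recovery micro-service paired with MSi. It also assumes that, on failure, the recoveries of the steps that already completed run in reverse order. The `Rule` class, `run_rule`, and the example micro-services are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Rule:
    action: str                                   # Action
    condition: Callable[[State], bool]            # Condition
    microservices: List[Callable[[State], None]]  # MS1, ..., MSn
    recoveries: List[Callable[[State], None]]     # RMS1, ..., RMSn


def run_rule(rule: Rule, action: str, state: State) -> bool:
    """Run the rule for an action; return True only if every step succeeded."""
    if action != rule.action or not rule.condition(state):
        return False
    completed = 0
    try:
        for microservice in rule.microservices:
            microservice(state)
            completed += 1
    except Exception:
        # Assumption of this sketch: undo only the completed steps, newest first.
        for recovery in reversed(rule.recoveries[:completed]):
            recovery(state)
        return False
    return True


# Hypothetical micro-services for a "store, then replicate" policy.
def store(state: State) -> None:
    state["log"].append("store")


def undo_store(state: State) -> None:
    state["log"].append("undo_store")


def replicate(state: State) -> None:
    if state.get("replica_offline"):
        raise RuntimeError("replica resource offline")
    state["log"].append("replicate")


def undo_replicate(state: State) -> None:
    state["log"].append("undo_replicate")


put_rule = Rule(
    action="put",
    condition=lambda s: s["size"] > 0,
    microservices=[store, replicate],
    recoveries=[undo_store, undo_replicate],
)

ok_state = {"size": 10, "log": []}
bad_state = {"size": 10, "log": [], "replica_offline": True}
print(run_rule(put_rule, "put", ok_state), ok_state["log"])
print(run_rule(put_rule, "put", bad_state), bad_state["log"])
```

The same `run_rule` call covers both uses named on the slide: a server can invoke it when a policy-relevant action occurs, and a client can invoke it to run a workflow.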

  6. Policies – Actionable Rules
     • Retention, disposition, distribution, arrangement
     • Authenticity, provenance, description
     • Integrity, replication, synchronization
     • Deletion, trash cans, versioning
     • Archiving, staging, caching
     • Authentication, authorization, redaction
     • Access, approval, IRB, audit trails, report generation
     • Assessment criteria, validation
     • Derived data product generation, format parsing
     • Federation of independent data grids

  7. Interfacing iRODS with DOA
     • iRODS and DOA have different focuses, and they can work together.
     • Question: how to map between a DOA identifier and an iRODS logical name?
        • The identifier can be an attribute in the iCAT metadata database
        • The Handle System resolves the identifier to the logical name
     • Development:
        • Micro-service to register with the Handle System and obtain a prefix from it
        • Micro-service to assign a suffix to an object and store the identifier in iCAT
        • Micro-service to sync iCAT to the Handle System: e.g., imv file1 file2 changes the logical file name, so this micro-service should be invoked to update the Handle System
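
The mapping and the three proposed micro-services can be sketched with in-memory stand-ins. Everything below is hypothetical: `ToyHandleSystem` and `ToyCatalog` are toy dictionaries, not the Handle System or iCAT APIs, and the prefix value is a placeholder. The sketch only shows the bookkeeping: the identifier is stored as a metadata attribute, the resolver maps it to the logical name, and a rename triggers a sync.

```python
import itertools


class ToyHandleSystem:
    """Hypothetical stand-in for an identifier resolver."""

    def __init__(self, prefix: str):
        self.prefix = prefix   # placeholder for a prefix obtained at registration
        self.records = {}      # identifier -> logical name

    def update(self, identifier: str, logical_name: str) -> None:
        self.records[identifier] = logical_name

    def resolve(self, identifier: str) -> str:
        return self.records[identifier]


class ToyCatalog:
    """Hypothetical stand-in for a metadata catalog keyed by logical name."""

    def __init__(self):
        self.metadata = {}     # logical name -> {attribute: value}


_suffixes = itertools.count(1)


def assign_identifier(catalog, handles, logical_name: str) -> str:
    """Assign a suffix, store the identifier as an attribute, and register it."""
    identifier = f"{handles.prefix}/{next(_suffixes)}"
    catalog.metadata.setdefault(logical_name, {})["identifier"] = identifier
    handles.update(identifier, logical_name)
    return identifier


def rename_and_sync(catalog, handles, old_name: str, new_name: str) -> None:
    """Rename in the catalog (as `imv` would), then sync the resolver."""
    attributes = catalog.metadata.pop(old_name)
    catalog.metadata[new_name] = attributes
    handles.update(attributes["identifier"], new_name)


handles = ToyHandleSystem(prefix="example-prefix")
catalog = ToyCatalog()
ident = assign_identifier(catalog, handles, "/zone/home/user/file1")
rename_and_sync(catalog, handles, "/zone/home/user/file1", "/zone/home/user/file2")
print(ident, "->", handles.resolve(ident))
```

The design point this illustrates is that the identifier stays stable across the rename; only the record it resolves to changes, which is why the sync step is needed.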

  8. Backup Slides

  9. Data Life Stages in iRODS’ View
     • Data Collection: Private; Local Policy
     • Data Grid: Shared; Distribution Policy
     • Data Processing Pipeline: Analyzed; Service Policy
     • Digital Library: Published; Description Policy
     • Reference Collection: Preserved; Representation Policy
     • Federation: Sustained; Re-purposing Policy

  10. Applications
     • International projects
        • Cyber Square Kilometer Array (radio astronomy), Cinegrid (movies)
     • National data grids
        • Australia, New Zealand, Portugal, UK, France
     • Federal agency archives
        • NASA Center for Climate Simulation, National Optical Astronomy Observatories, Ocean Observatories Initiative
     • Institutional repositories
        • French National Library, Carolina Digital Repository, Broad Institute genomics data, Sanger Institute
