
CRISP WP 17 1 / 2

CRISP WP 17 1 / 2. Proposed Metadata Catalogue Architecture Document. Work package 17 - IT & DM: Metadata Management and Data Continuum.


Presentation Transcript


  1. CRISP WP 17 1 / 2: Proposed Metadata Catalogue Architecture Document

  2. Work package 17 - IT & DM: Metadata Management and Data Continuum
     • Objectives: choose and implement data management and metadata mining services, and establish an environment permitting a data continuum from raw data to publications across the participating Research Institutes (RIs): ILL, ESRF, SLHC and EuroFEL.
     • Task plan:
       • Evaluate and adapt metadata catalogues according to the RIs' requirements.
       • Deploy and integrate a metadata catalogue.
       • Prototype data mining on metadata services.
     Bessone Nicola - ESRF

  3. Evaluate metadata catalogues: Use cases
     • Identify a list of requirements based on the ILL, ESRF and DESY use cases.
     • Select a list of the most suitable metadata catalogue systems on the market.
     • Match the requirements with the features offered by the metadata catalogues.
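The matching step on this slide can be sketched as a simple coverage check. This is an illustrative sketch only: the requirement labels are shortened from the requirement slides of this presentation, and the two candidate names and their feature sets are hypothetical placeholders, not results of the actual evaluation.

```python
# Requirement labels, abbreviated from the requirement slides.
REQUIREMENTS = {
    "authentication", "authorization", "accounting",
    "csmd_model", "web_search", "service_api", "open_source",
}

# Hypothetical candidates and feature sets, for illustration only.
CANDIDATES = {
    "catalogue_a": {
        "authentication", "authorization", "accounting",
        "csmd_model", "web_search", "service_api", "open_source",
    },
    "catalogue_b": {"authentication", "web_search", "open_source"},
}


def coverage(features, requirements=REQUIREMENTS):
    """Return (fraction of requirements covered, sorted list of missing ones)."""
    missing = sorted(requirements - features)
    covered = len(requirements) - len(missing)
    return covered / len(requirements), missing


def rank(candidates):
    """Order candidate names from best to worst requirement coverage."""
    return sorted(
        candidates,
        key=lambda name: coverage(candidates[name])[0],
        reverse=True,
    )
```

Listing the missing requirements next to the score keeps the comparison transparent: a candidate can be rejected for one missing key requirement (here, the data model) even when its overall score is high.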

  4. Evaluate metadata catalogues: Requirements
     • AAA
       • Authentication: modular integration of different authentication systems.
       • Authorization: customizable access control system.
       • Accounting: granular logging information levels.

  5. Evaluate metadata catalogues: Requirements
     • Metadata model: the Core Scientific Metadata Model (CSMD), already developed at STFC.
     [Diagram: CSMD entities Study, Investigation, Sample, Dataset, Datafile and Parameter]
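To make the entities named on this slide concrete, here is a minimal sketch of how they might nest. The containment shown (a study groups investigations, an investigation holds samples and datasets, a dataset holds datafiles, and parameters attach to samples, datasets and datafiles) is a simplification, and all field names are assumptions rather than the CSMD schema itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    value: float
    units: str = ""


@dataclass
class Datafile:
    name: str
    location: str  # assumed field: where the file is stored
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Study:
    name: str
    investigations: List[Investigation] = field(default_factory=list)


def count_datafiles(study: Study) -> int:
    """Count every datafile reachable from a study."""
    return sum(
        len(dataset.datafiles)
        for investigation in study.investigations
        for dataset in investigation.datasets
    )
```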

  6. Evaluate metadata catalogues: Requirements
     • Searching method: fulfil users' search needs while being easy to use and to access (web). Provide data mining to facilities and scientific management about data use, access, search and modification.
     • Cross-platform
     • Service API: a stable set of APIs, if possible programming-language agnostic.

  7. Evaluate metadata catalogues: Requirements
     • Sustainability
     • Open source
     • Project organization: actively maintained, release plan (documentation, update mechanism, backward compatibility), patch release process (security, bug fixes)
     • Cutting-edge technology
     • License: free of charge

  8. Evaluate metadata catalogues: Requirements
     • Data policy: dynamic authorization system.
     • Scalability & performance: ILL hosts ~2,000 experiments per year, producing ~10,000 datasets. Other facilities possibly more.
     • Data ingestion: manual and automatic, plus possible harvesting (OAI-PMH).
     • Security: protect intellectual property.
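For the harvesting option, OAI-PMH works through plain HTTP requests that carry a `verb` argument. The sketch below only builds a `ListRecords` request URL and performs no network access; the base URL in the usage note is a hypothetical placeholder, and `oai_dc` is the Dublin Core format that the protocol requires every repository to support.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    from_date (YYYY-MM-DD) and set_spec are the protocol's optional
    arguments for selective harvesting.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url("https://repository.example.org/oai", from_date="2012-11-01")` asks a (hypothetical) repository for all records changed since that date, which is how an incremental harvest would typically be driven.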

  9. Evaluate metadata catalogues: Metadata catalogue systems
     • ICAT
     • DSpace
     • Fedora
     • CKAN
     • Invenio
     • Tardis
     • ISPyB
     • iRODS
     • SRB-MCAT
     • MS Zentity

  10. Evaluate metadata catalogues: Selection result
      • Different solutions have been explored; among them, ICAT appears to be the only one that currently fits the data model requirements. This is the key element for a successful implementation in a reasonable time frame.

  11. Evaluate metadata catalogues: ICAT
      • Authentication plug-ins
      • Rule-based authorization mechanism
      • Flexible metadata model
      • Search methods: full-text, numerical and string search, and an SQL-like query syntax
      • Set of APIs (Java and Python)
      • Configurable database (Oracle, PostgreSQL and MySQL)
      • Federated search via TopCAT
      • Core Scientific Metadata Model (CSMD)
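The idea behind a rule-based authorization mechanism can be illustrated with a toy model: access is denied unless some rule grants a group the requested action on an entity type. This is a sketch of the general technique under simplifying assumptions, not ICAT's implementation; the `Rule` class, its fields and the group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    group: str    # user group the rule applies to (hypothetical names)
    entity: str   # entity type, e.g. "Dataset"
    actions: str  # granted actions, a subset of "CRUD"


def is_allowed(rules, user_groups, entity, action):
    """Default-deny check: True only if some rule grants the action."""
    return any(
        rule.group in user_groups
        and rule.entity == entity
        and action in rule.actions
        for rule in rules
    )


RULES = [
    Rule(group="instrument_scientists", entity="Dataset", actions="CRUD"),
    Rule(group="proposal_members", entity="Dataset", actions="R"),
]
```

Because permissions live in data rather than in code, a data policy change (for example, opening datasets after an embargo) becomes a matter of adding or removing rules.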

  12. Evaluate metadata catalogues: ICAT
      • Plug-in for DAWN/Mantid
      • Licence: FreeBSD
      • Web interface: TopCAT
      • In use at 11+ RIs

  13. Evaluate metadata catalogues: ICAT
      • Work in progress:
        • Improve the web interface (TopCAT)
        • Possibility to harvest (OAI-PMH)
        • Installation process
        • Synonym mechanism
        • Integration with Umbrella authentication

  14. Deploy and integrate ICAT: ESRF - Pilot
      [Diagram components: Spec sessions, TomoDB (current metadata collection structure), Tomo XML, Tomo-to-ICAT XML converter, ICAT XML, ICAT XML ingest, SMIS, SMIS-to-ICAT ingester, ICAT API, Web Service API, RDBMS; data paths numbered 1 to 3]

  15. Deploy and integrate ICAT: ESRF - future
      [Diagram components: new beamline control system (Spec sessions, new sequencer), DataManager (experiment metadata management), ICAT API, SMIS API, web interface, scientist controlling the experiment, Web Service APIs, RDBMSs]

  16. Deploy and integrate ICAT: ILL
      • Data policy published in Dec 2011
      • Implementation: Oct 2012
      • ICAT deployment: Dec 2012
      • Data ingestion under way since Nov 2012

  17. Future work
      • Complete the deployment (ingestion) at the participating facilities.
      • Data mining
      • Collect use cases from the different facilities
      • Currently all use cases are technically simple (no requests for correlation, for instance)
      • Work on the search engine (Lucene)
      • Reporting

