SemanTic Interoperability To access Cultural Heritage

SemanTic Interoperability To access Cultural Heritage. Lourens van der Meij, Antoine Isaac, Marjolein van Gendt. OLP AIO Workshop, January 27th 2006. Outline: Pilot Project introduction; Goals; Collection selection; Mapping aspect of Pilot Project; Thesauri formalisation; Mapping tools.


Presentation Transcript


  1. SemanTic Interoperability To access Cultural Heritage • Lourens van der Meij, Antoine Isaac, Marjolein van Gendt • OLP AIO Workshop, January 27th 2006

  2. STITCH Pilot Project Outline • Pilot Project introduction • Goals • Collection selection • Mapping aspect of Pilot Project • Thesauri formalisation • Mapping tools • Output of mapping task • Lessons learned

  3. STITCH Pilot Project Current Cultural Heritage (CH) Situation

  4. STITCH Pilot Project Research and development in CH • Portals for heterogeneous collections access: different databases/vocabularies/metadata (MD) schemes • Syntactic interoperability: access can be granted • Semantic interoperability: links with original vocabularies/MD structures are lost

  5. STITCH Pilot Project Current Development in CH

  6. STITCH Pilot Project Pilot Project Goals • Show in a small use case, using • Two Cultural Heritage collections • Two controlled vocabularies • Existing mapping tools • Existing SW techniques – SKOS, RDF, RDFS, Sesame • (representation, reasoning, storage, mapping) • That: Semantic links between controlled vocabularies can result in integrated access to heterogeneous Cultural Heritage collections

  7. STITCH Pilot Project STITCH ultimate goal

  8. STITCH Pilot Project Pilot Project Modules

  9. STITCH Pilot Project Collection selection (1/3) • Domain: Cultural Heritage • Collections: • Medieval Illuminated Manuscripts from KB • Masterpieces from Rijksmuseum

  10. STITCH Pilot Project Collection selection (2/3) • Controlled vocabularies: Iconclass • Illuminated Manuscripts • > 24,000 concepts • 10+ levels • Keys • Structural digits • Cross-references • Bracketed text • etc. • Example record: Lupus (wolf), Fol. 62r: column min. 50x60, Iconclass:25F23(WOLF)47I2133
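The "10+ levels" follow from the way Iconclass notations are built: in the basic hierarchy, each extra character narrows the concept. A minimal sketch in Python, assuming that simple prefix rule and ignoring keys, structural digits and cross-references. `iconclass_path` is a hypothetical helper, not part of any Iconclass tooling.

```python
def iconclass_path(notation: str) -> list[str]:
    """Return the chain of notations from the top-level division down to `notation`.

    Assumption: every prefix of the part before the bracketed text is a
    broader concept. Keys, structural digits and anything that follows
    the bracketed text are not handled.
    """
    base, bracket, _ = notation.partition("(")
    path = [base[:i] for i in range(1, len(base) + 1)]
    if bracket:
        # The bracketed text names one more, most specific level.
        path.append(notation)
    return path


print(iconclass_path("25F23(WOLF)"))
```

Under this assumption the wolf concept sits six levels deep, which is the kind of depth a mapper has to cope with when the other vocabulary has only two levels.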

  11. STITCH Pilot Project Collection selection (3/3) • Controlled vocabularies: ARIA • Masterpieces • <500 terms, some of them redundant • 2 levels • Fuzzy multi-inheritance • Top and Topia Terms • Example record: Title: The Artist Painting a Cow in a Meadow Landscape; Year: 1850; Artist: Hendrikus van de Sande Bakhuyzen; Technique: Oil on panel; Dimensions: 73,2 x 96,7 cm; Object number: SK-A-4163; Catalogue: Man, Self portraits, Cattle, Dutch landscapes, Fields and meadows

  12. STITCH Pilot Project Outline • Pilot Project introduction • Goals • Collection selection • Mapping aspect of Pilot Project • Thesauri formalisation • Mapping tools • Output of mapping task • Lessons learned

  13. STITCH Pilot Project Thesauri Formalisation • ARIA • CHIP issued a SKOS version • Only used Topia Terms • Iconclass • SKOS • Only used basic hierarchy • No keys/structural digits/keywords
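To make "SKOS, only basic hierarchy" concrete, here is a minimal sketch that serialises a small hierarchy as N-Triples using `skos:Concept`, `skos:prefLabel` and `skos:broader`. The namespace `http://example.org/iconclass/` is hypothetical, and pairing the notations 2, 25 and 25F with the labels below is an assumption based on the labels quoted later in the deck. This is not the conversion actually used in the pilot.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/iconclass/"  # hypothetical namespace


def to_skos_ntriples(concepts: dict[str, str], broader: dict[str, str]) -> list[str]:
    """Emit one N-Triples line per statement.

    concepts: notation -> preferred label
    broader:  notation -> notation of its broader concept
    Labels are assumed to need no escaping (no quotes or backslashes).
    """
    lines = []
    for notation, label in concepts.items():
        subject = f"<{BASE}{notation}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f'{subject} <{SKOS}prefLabel> "{label}"@en .')
        parent = broader.get(notation)
        if parent is not None:
            lines.append(f"{subject} <{SKOS}broader> <{BASE}{parent}> .")
    return lines


concepts = {"2": "Nature", "25": "World as celestial body", "25F": "Animals"}
broader = {"25": "2", "25F": "25"}
for line in to_skos_ntriples(concepts, broader):
    print(line)
```

Keys, structural digits and keywords have no place in this output, which mirrors the information loss listed on the slide.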

  14. STITCH Pilot Project Mapping tools • S-Match, Trento • Required input: TAB-indented trees • Tree-like structures mapper • http://dit.unitn.it/~accord/ • Falcon-AO, Nanjing • Required input: standard RDFS Class/subClassOf • Subdivision of Iconclass • Standard OWL ontology mapper • http://xobjects.seu.edu.cn/project/falcon/falcon.htm • Method • Lexical/element level matching • Oracle (e.g. WordNet) • Structure matching
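Both tools need the thesaurus reshaped before they can read it. The sketch below shows what that pre-processing could look like for one hierarchy: a TAB-indented tree of labels, and a list of (subclass, superclass) pairs that could be written out as `rdfs:subClassOf` statements. The exact file formats expected by S-Match and Falcon-AO are not given in the slides, so both functions are assumptions about the general shape only.

```python
def to_tab_tree(children: dict[str, list[str]], root: str, depth: int = 0) -> list[str]:
    """Render a hierarchy as lines indented with one TAB per level."""
    lines = ["\t" * depth + root]
    for child in children.get(root, []):
        lines.extend(to_tab_tree(children, child, depth + 1))
    return lines


def to_subclass_pairs(children: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (subclass, superclass) pairs, one per parent-child link."""
    return [(child, parent) for parent, kids in children.items() for child in kids]


# Illustrative hierarchy; labels taken from the deck, structure assumed.
children = {
    "Nature": ["World as celestial body"],
    "World as celestial body": ["Animals"],
}

print("\n".join(to_tab_tree(children, "Nature")))
print(to_subclass_pairs(children))
```

Treating every broader/narrower link as a class/subclass link is itself an interpretation, since a thesaurus hierarchy is not necessarily a strict subsumption hierarchy.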

  15. STITCH Pilot Project Output of mapping task • Output format • S-Match: • Less General (LG) • More General (MG) • Equivalence • Iconclass vs. ARIA only yields "Iconclass LG ARIA" relations • Falcon-AO: • Equivalence • Confidence measure (always 1) • Sequence of mappings might indicate usefulness • Application specific requirements • UI needs precision • Annotators might need recall
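The S-Match relation types can be modelled as a small data structure. The sketch below shows how a "less general" result can be inverted and read as a thesaurus-style relation. The reading of LG as "has broader term" is an assumption, and the example mapping is hypothetical, not taken from the pilot's output.

```python
from enum import Enum


class Relation(Enum):
    LESS_GENERAL = "LG"
    MORE_GENERAL = "MG"
    EQUIVALENCE = "EQ"


# Relation obtained when source and target are swapped.
INVERSE = {
    Relation.LESS_GENERAL: Relation.MORE_GENERAL,
    Relation.MORE_GENERAL: Relation.LESS_GENERAL,
    Relation.EQUIVALENCE: Relation.EQUIVALENCE,
}

# Assumed thesaurus-style reading of "source <relation> target".
THESAURUS_READING = {
    Relation.LESS_GENERAL: "has broader term (BT)",
    Relation.MORE_GENERAL: "has narrower term (NT)",
    Relation.EQUIVALENCE: "is equivalent to",
}


def invert(mapping):
    """Swap source and target and flip the relation accordingly."""
    source, relation, target = mapping
    return (target, INVERSE[relation], source)


def describe(mapping):
    """Render a mapping as a readable thesaurus-style statement."""
    source, relation, target = mapping
    return f"{source} {THESAURUS_READING[relation]} {target}"


# Hypothetical mapping for illustration only.
m = ("iconclass:25F23(WOLF)", Relation.LESS_GENERAL, "aria:Animals")
print(describe(m))
print(describe(invert(m)))
```

Because one vocabulary is far more fine-grained than the other, getting only LG results in one direction is what such a model would lead one to expect.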

  16. STITCH Pilot Project Output of mapping task (S-Match) – nice results

  17. STITCH Pilot Project Output of mapping task (S-Match) – awful results

  18. STITCH Pilot Project Lessons learned • Annotation of results • Lexical matching • Gloss vs. label • NLP • Inconvenient priorities are given to lexical elements • rdfs:label vs. rdf:about/ID • Oracle based matching • WordNet sense disambiguation • Structure based matching • Structure overvaluation (BT vs. NT vs. EQ) • Thesaurus simplicity makes it (almost?) useless • No attributes, fuzzy hierarchies • Differences in hierarchical structure levels • Complex structure-based algorithms are not always intuitive

  19. STITCH Pilot Project Lessons learned • Annotation of results (cont'd) • Output format • Wrong kind of relation (RT, siblings) • 1-1 mapping • Precision: • S-Match: 41% (subset of IC) • Falcon-AO: from 1 out of 1000 (subset of IC) • to 5% if data tricked • to 52% if an artificial but realistic threshold is introduced • Manual cleaning needed for use in UI • Expert mapping • Size of vocabularies • Ambiguous e.g.: is Nature/World as celestial body/Animals equal to or a subclass of Animals? • To be continued
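Precision here is the share of proposed mappings that an evaluator judges correct. The sketch below computes it with and without a cut-off. How the threshold was actually defined in the pilot is not stated in the slides, so it is modelled here as "keep only the first k mappings in output order", which is an assumption. All mapping pairs are made-up placeholders.

```python
def precision(proposed, correct) -> float:
    """Fraction of proposed mappings that appear in the set judged correct."""
    proposed = list(proposed)
    if not proposed:
        return 0.0
    hits = sum(1 for pair in proposed if pair in correct)
    return hits / len(proposed)


def keep_top(ranked, k: int):
    """Assumed threshold: keep only the first k mappings in output order."""
    return ranked[:k]


# Placeholder data: four proposed mappings, two of them judged correct.
ranked = [("ic:a", "aria:A"), ("ic:b", "aria:B"), ("ic:c", "aria:X"), ("ic:d", "aria:Y")]
judged_correct = {("ic:a", "aria:A"), ("ic:b", "aria:B")}

print(precision(ranked, judged_correct))
print(precision(keep_top(ranked, 2), judged_correct))
```

A cut-off raises precision only when the correct mappings tend to come first, and it typically costs recall, which matters for the annotator use case.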

  20. STITCH Pilot Project Lessons learned: Improvements • Lexical matching • Introduce NLP • Let only complete concepts match • … Further research (decipher black-boxes) • Oracle based matching • Stricter WordNet interpretation • Include other oracles • Structure based matching • Create thesaurus based structure mapping (RT, keywords, siblings) • … Further research (decipher black-boxes)
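One way to read "let only complete concepts match" is to contrast token-level matching with matching on the whole label. The sketch below does that with a deliberately naive normalisation. This interpretation is an assumption, and neither function reflects how S-Match or Falcon-AO work internally.

```python
def normalise(label: str) -> list[str]:
    """Lower-case a label and split it on whitespace."""
    return label.lower().split()


def token_match(a: str, b: str) -> bool:
    """True if the two labels share at least one word."""
    return bool(set(normalise(a)) & set(normalise(b)))


def complete_match(a: str, b: str) -> bool:
    """True only if the two labels consist of the same words in the same order."""
    return normalise(a) == normalise(b)


# "Dutch landscapes" is an ARIA catalogue term from the deck;
# "landscapes" stands in for a label from the other vocabulary.
print(token_match("Dutch landscapes", "landscapes"))
print(complete_match("Dutch landscapes", "landscapes"))
print(complete_match("Dutch Landscapes", "dutch landscapes"))
```

Token-level matching proposes a link as soon as one word is shared, which helps recall but produces the kind of noise that then needs manual cleaning.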

  21. STITCH Pilot Project Lessons learned: Conclusion • We have ontology mappers, not thesaurus mappers • Input: needs pre-processing from thesaurus data • Output: needs re-interpretation of mapping relations • Mapping process • Using resources that may be absent from thesauri • E.g. properties • Not (properly) using all information found in thesauri • E.g. synonyms, RT, textual descriptions • Leads to 'low-quality' thesaurus mapping

  22. STITCH Pilot Project Thanks! Any questions? • User Interface • Future work

  23. STITCH Pilot Project Collections Access: Single View • Facets based on 1 point of view and its associated concept scheme(s) • Access to objects indexed against concepts from other schemes • If mapping between their index and the concepts from single view • A single point of view on integrated data set
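The single-view idea can be sketched as a lookup: a facet value from one scheme retrieves objects indexed directly with it, plus objects indexed with a concept from another scheme that is mapped to it. The object number SK-A-4163 and the term "Cattle" come from the deck. The second object, the `iconclass:X` placeholder and the mapping itself are hypothetical.

```python
def objects_for(concept: str, index: dict[str, set[str]], mapping: dict[str, str]) -> set[str]:
    """Return ids of objects reachable from `concept` in the single view.

    index:   object id -> concepts the object is indexed with (any scheme)
    mapping: concept from another scheme -> concept in the single-view scheme
    """
    hits = set()
    for obj, terms in index.items():
        for term in terms:
            if term == concept or mapping.get(term) == concept:
                hits.add(obj)
                break
    return hits


index = {
    "SK-A-4163": {"aria:Cattle", "aria:Dutch landscapes"},
    "kb-folio-1": {"iconclass:X"},   # hypothetical manuscript illumination
    "kb-folio-2": {"iconclass:Y"},   # hypothetical, unmapped
}
mapping = {"iconclass:X": "aria:Cattle"}  # hypothetical mapping

print(sorted(objects_for("aria:Cattle", index, mapping)))
```

Objects whose index terms have no mapping stay invisible in this view, so the quality of the mapping directly bounds what the user can reach.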

  24. STITCH Pilot Project Collections Access: Combined View • Search based on 2 points of view • One facet uses 1 vocabulary from 1 point of view • Facets attached to the different points of view are presented • Simultaneous access to different points of view of the same data

  25. STITCH Pilot Project Collections Access: Merged View • Facets using a merged concept scheme • Mapping leads to hierarchical links between schemes • Making the links between vocabularies more visible during search • A way to ‘enrich’ weakly structured vocabularies

  26. STITCH Pilot Project Future work • A lot to do for the rest of STITCH! • Method • Thinking about roadmap for using ontology matching techniques for CH voc. • Taking into account MD schemes (structure) • Evaluation of mappings • Use cases • KB • Other institutions and projects • Practical • Scalability of tools • Deployment for SW data (distributed/centralized) • Implementation of thesaurus-specific (adaptations of) tools

  27. STITCH Pilot Project Future work • Concerning PP: • Mappings • Assessing criteria for proper application-specific evaluation • (Keep on) tuning tools to obtain better results for PP collections • Interface • Dynamic view switching/facet activation • Better use of all kinds of exploitable relationships • RT-like • Expert evaluation of the whole prototype • Integrating other collections

  28. STITCH Pilot Project What’s a thesaurus (Wikipedia) • A list of every important term (single-word or multi-word) in a given domain of knowledge; and • A set of related terms for each term in the list. • Possible relations and additions: • Scope Note • Related Term (RT) • Broader Term (BT) • Narrower Term (NT) • BT and NT are reciprocals • Use (USE) = non-preferred term -> preferred term • Used For (UF) = preferred term -> non-preferred term
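The reciprocity of BT/NT and of USE/UF can be enforced by a data structure that always records both directions at once. A minimal sketch, with a hypothetical `Thesaurus` class and illustrative terms that are not taken from ARIA or Iconclass as such:

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus that keeps reciprocal relations in sync."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> its BTs
        self.narrower = defaultdict(set)   # term -> its NTs
        self.use = {}                      # non-preferred -> preferred (USE)
        self.used_for = defaultdict(set)   # preferred -> non-preferred (UF)

    def add_broader(self, term: str, broader_term: str) -> None:
        # Recording BT also records the reciprocal NT.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_use(self, non_preferred: str, preferred: str) -> None:
        # Recording USE also records the reciprocal UF.
        self.use[non_preferred] = preferred
        self.used_for[preferred].add(non_preferred)


t = Thesaurus()
t.add_broader("Cattle", "Animals")   # illustrative terms
t.add_use("Cows", "Cattle")

print(t.narrower["Animals"])
print(t.used_for["Cattle"])
```

Synonym information of the USE/UF kind is the sort of thesaurus content that an ontology mapper does not necessarily exploit.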
