The Semantically Enriched Archivist: Semantics for Archives & Records Management at OECD. 45th ICA / SIO Conference, Brussels, 22 May 2019.
 
                
As archivists, we often face issues… Performance (a needle in a haystack) by Sven Sachsalber at the Palais de Tokyo: http://www.riannegroen.com/sven-sachsalber-at-palais-de-tokyo.html (…by the way, it took the artist 18 hours to find the needle…)
…because information without context… …is like a fish without water.
Solution = Context + Structure. Well, that's exactly the archivist's bread and butter…
…the fundamentals of archival description: Provenance, Business Context, Series, Dossier, Content Type, Status. So, if we embed the Principle of Provenance and the Principle of Structure as metadata in…
Yes! How? Through semantic analysis.
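To make "embedding the principles as metadata" concrete, here is a minimal sketch of a record that carries the fundamentals of archival description (provenance, business context, series, dossier, content type, status) as explicit fields. The class name, field names and values are hypothetical illustrations, not the OECD data model.

```python
from dataclasses import dataclass, asdict


@dataclass
class ArchivalRecord:
    """Hypothetical record: the fundamentals of archival description carried as metadata."""
    title: str
    provenance: str        # Principle of Provenance: the body that created or received it
    business_context: str  # the activity or function the record documents
    series: str            # Principle of Structure: place in the series...
    dossier: str           # ...and in the dossier (file) within that series
    content_type: str      # e.g. "Report", "Agenda", "Invoice"
    status: str            # e.g. "Draft", "Final"

    def context_path(self) -> str:
        """Where the record sits in its context: provenance > series > dossier."""
        return " > ".join((self.provenance, self.series, self.dossier))


# Placeholder values, not real OECD records.
record = ArchivalRecord(
    title="Meeting agenda",
    provenance="Directorate A",
    business_context="Committee meetings",
    series="Series 12",
    dossier="Dossier 3",
    content_type="Agenda",
    status="Final",
)
print(record.context_path())  # Directorate A > Series 12 > Dossier 3
print(asdict(record)["content_type"])  # Agenda
```

Once context and structure are plain fields, a search engine or a semantic robot can filter and group on them like any other metadata.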
How do we develop these robots? We develop them on a set of test documents (the test corpus). We test them on the complete corpus and put them into production using web services. We debug them to correct patterns and to disambiguate.
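The slides do not publish the robots' internals. As a hedged illustration of the "develop on a test corpus, then debug the patterns" loop, here is a minimal rule-based robot that classifies document types with regular expressions and reports which test documents it gets wrong. The patterns, labels and test documents are all invented for this sketch.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical patterns: the OECD robots' real rules are not given in the slides.
PATTERNS: Dict[str, List[str]] = {
    "Agenda": [r"\bagenda\b", r"\bitem\s+\d+\b"],
    "Invoice": [r"\binvoice\b", r"\btotal\s+amount\b", r"\bVAT\b"],
    "Report": [r"\breport\b", r"\bexecutive\s+summary\b"],
}


def classify(text: str) -> Optional[str]:
    """Return the document type whose patterns match most often, or None if none match."""
    scores = {
        doc_type: sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)
        for doc_type, patterns in PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# A tiny hand-labelled test corpus (invented examples, not OECD documents).
TEST_CORPUS: List[Tuple[str, str]] = [
    ("Draft agenda. Item 1: adoption of the agenda.", "Agenda"),
    ("Invoice no. 42. Total amount: 1 200 EUR, VAT included.", "Invoice"),
    ("Executive summary of the annual report.", "Report"),
]


def misclassified(corpus: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (text, expected, predicted) for every document the patterns get wrong."""
    results = [(text, expected, classify(text)) for text, expected in corpus]
    return [r for r in results if r[1] != r[2]]


print(misclassified(TEST_CORPUS))  # [] : every test document is classified correctly
```

In this sketch, debugging means adding a misclassified document to the test corpus and adjusting the patterns until the list is empty again; a production robot would then be wrapped behind a web service.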
Some OECD archival examples
Problem 1: We don't know what type of document it is! Solution: Document Type Classification
Problem 2: We don't have the resources to index scanned (OCR-ed) documents manually! Solution: Document Indexing
Problem 3: Full-text search gives too many results! Solution: Topics and Geographical Areas Classification
Solution 1: Document Type Classification. Is this document a report, an agenda, an invoice? Quality: 95% precision, 85% recall.
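The slides do not say exactly how the 95% and 85% figures were computed. The sketch below shows the standard per-class definitions: precision is the share of documents labelled "Report" that really are reports, and recall is the share of real reports that were found. The labels are invented toy data, not OECD results.

```python
from typing import List, Tuple


def precision_recall(expected: List[str], predicted: List[str], label: str) -> Tuple[float, float]:
    """Per-class precision and recall.

    precision = TP / (TP + FP): of everything labelled `label`, how much was right
    recall    = TP / (TP + FN): of everything that really is `label`, how much was found
    """
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == label and p == label)
    fp = sum(1 for e, p in pairs if e != label and p == label)
    fn = sum(1 for e, p in pairs if e == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy labels: one real report is missed (labelled as an agenda).
expected = ["Report", "Report", "Report", "Report", "Agenda"]
predicted = ["Report", "Report", "Report", "Agenda", "Agenda"]
print(precision_recall(expected, predicted, "Report"))  # (1.0, 0.75)
```

A classifier that only labels a document when its patterns are confident tends to show this profile, with precision higher than recall.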
Solution 2: (OCR-ed) Document Indexing… • Overall quality is remarkably good, BUT… • 100% is not possible • And OCR can be a challenge…
OCR = problems. We can normalise dates, but titles are more difficult (in French, lionceau = lion cub…).
BUT… our biggest issue is the « COLLECTION » stamp.
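Date normalisation on OCR-ed text can be sketched as: repair typical character confusions next to digits, recognise a few day-month-year formats, and reject impossible dates. The formats, the English/French month list and the O→0 and l→1 repairs are assumptions made for this illustration, not the OECD rules. Titles such as the lionceau example cannot be repaired this way.

```python
import re
from datetime import date
from typing import Optional

# Assumption: the corpus mixes English and French, so month names in both languages.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4, "may": 5, "june": 6,
    "july": 7, "august": 8, "september": 9, "october": 10, "november": 11,
    "december": 12, "janvier": 1, "février": 2, "fevrier": 2, "mars": 3,
    "avril": 4, "mai": 5, "juin": 6, "juillet": 7, "août": 8, "aout": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12, "decembre": 12,
}

NUMERIC = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")       # 22/05/2019, 22.05.2019
TEXTUAL = re.compile(r"\b(\d{1,2})(?:er)?\s+([^\W\d_]+)\s+(\d{4})\b")  # 22 May 2019, 1er mai 2019


def normalise_date(text: str) -> Optional[str]:
    """Return the first valid day-month-year date found in OCR-ed `text` as ISO 8601.

    Numeric formats are tried before textual ones; returns None if nothing valid is found.
    """
    # Common OCR confusions next to digits: letter O read instead of zero, l instead of one.
    cleaned = re.sub(r"(?<=\d)[Oo]|[Oo](?=\d)", "0", text)
    cleaned = re.sub(r"(?<=\d)l|l(?=\d)", "1", cleaned)

    candidates = [(int(y), int(m), int(d)) for d, m, y in NUMERIC.findall(cleaned)]
    for d, name, y in TEXTUAL.findall(cleaned):
        month = MONTHS.get(name.lower())
        if month:
            candidates.append((int(y), month, int(d)))

    for year, month, day in candidates:
        try:
            return date(year, month, day).isoformat()
        except ValueError:  # e.g. 31/02/2019: the OCR produced an impossible date
            continue
    return None


print(normalise_date("Date: 22/O5/2O19"))        # 2019-05-22
print(normalise_date("Paris, le 1er mai 2019"))  # 2019-05-01
```

Returning ISO 8601 strings gives every indexed document a date that sorts and filters the same way, whatever format or language the scan used.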
Solution 3: Topics and Geographical Areas Classification • Identify the 15 best topics and geographical areas using the central OECD taxonomies
Topics and Geographical Areas Classification works remarkably well… even on OCR-ed documents!
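The slides do not say how the "15 best" topics and geographical areas are scored. As one simple assumption, the sketch below counts whole-word occurrences of each concept's labels and keeps the top `n` concepts. The mini-taxonomy is a hypothetical placeholder for the central OECD taxonomies.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical mini-taxonomy (concept -> labels that signal it). The central OECD
# taxonomies are far larger; these entries are placeholders for illustration.
TAXONOMY: Dict[str, List[str]] = {
    "Education": ["education", "school", "pisa"],
    "Taxation": ["tax", "taxation", "vat"],
    "Employment": ["employment", "unemployment", "labour market"],
    "France": ["france", "french", "paris"],
    "Japan": ["japan", "japanese", "tokyo"],
}


def best_concepts(text: str, n: int = 15) -> List[Tuple[str, int]]:
    """Rank taxonomy concepts by how often their labels occur in `text`; keep the top `n`."""
    lowered = text.lower()
    scores: Counter = Counter()
    for concept, labels in TAXONOMY.items():
        for label in labels:
            # Whole-word match, so "tax" does not count inside "taxation".
            hits = len(re.findall(r"\b" + re.escape(label) + r"\b", lowered))
            if hits:
                scores[concept] += hits
    return scores.most_common(n)


print(best_concepts("Tax reform in France: VAT and taxation of Paris schools."))
# [('Taxation', 3), ('France', 2)]
```

Whole-word matching also means "schools" does not count for "school". A real robot would need stemming or lemmatisation, plus disambiguation rules, to score topics reliably.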
Architecture: Semantic Layer, Data Hub
Multi-view annotation graphs: we tag the same resource in different ways, so we can see it in context from different « semantic » viewpoints. We use several semantic robots, based on several different taxonomies (generic, innovation-oriented, etc.).
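One way to picture a multi-view annotation graph is as tagged edges from resources to concepts, where each edge also records the taxonomy ("view") of the robot that produced it. The minimal sketch below, with hypothetical class, method, document and concept names, shows how the same document gets different neighbours depending on the viewpoint chosen.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class AnnotationGraph:
    """Hypothetical multi-view annotation graph: each edge links a resource to a concept,
    labelled with the taxonomy ("view") of the robot that produced it."""

    def __init__(self) -> None:
        # (resource, view) -> concepts, and (view, concept) -> resources
        self._by_resource: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._by_concept: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def tag(self, resource: str, view: str, concept: str) -> None:
        """Record that the robot using taxonomy `view` tagged `resource` with `concept`."""
        self._by_resource[(resource, view)].add(concept)
        self._by_concept[(view, concept)].add(resource)

    def concepts(self, resource: str, view: str) -> Set[str]:
        """Concepts attached to `resource` from one semantic viewpoint."""
        return set(self._by_resource.get((resource, view), set()))

    def neighbours(self, resource: str, view: str) -> Set[str]:
        """Other resources sharing at least one concept with `resource` in the same view."""
        related: Set[str] = set()
        for concept in self.concepts(resource, view):
            related |= self._by_concept[(view, concept)]
        related.discard(resource)
        return related


# Invented documents and concepts; the view names echo the slide's taxonomy examples.
g = AnnotationGraph()
g.tag("doc-1", "generic", "Education")
g.tag("doc-1", "innovation", "Digital skills")
g.tag("doc-2", "generic", "Education")
g.tag("doc-3", "innovation", "Digital skills")
print(g.neighbours("doc-1", "generic"))     # {'doc-2'}
print(g.neighbours("doc-1", "innovation"))  # {'doc-3'}
```

Keeping the view on every edge, rather than merging all tags into one list, is what lets each taxonomy provide its own context for the same resource.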
Conclusion: by becoming semantically enriched archivists, librarians or information scientists, we really have become knowledge gardeners. Semantics are indispensable for our profession and true enablers of knowledge discovery.