
EHRI Vocabularies and Linked Open Data : An enrichment?

Annelies van Nispen, 15/05/2018. The EHRI Portal: connecting archives and users. Online inventory of institutions and collections about the Holocaust.





Presentation Transcript


  1. EHRI Vocabularies and Linked Open Data : An enrichment? CONNECTING COLLECTIONS Annelies van Nispen 15/05/2018

  2. The EHRI Portal connecting archives and users Online inventory of institutions and collections about the Holocaust • Making sources visible in a systematic fashion in order to counteract the fragmentation of the sources • Reveal interconnections (e.g. through a multilingual thesaurus; collation of authority files; relationships between originals and copies) • EHRI focuses on collection descriptions

  3. EHRI Portal https://portal.ehri-project.eu/

  4. EHRI Vocabularies • EHRI Thesaurus (subject terms) • Camps • Ghettos • Administrative districts • Places (Geonames) • Persons • Corporate bodies

  5. EHRI Vocabularies • Main tool for multilingual information • Retrieval & search functionality • Cataloguing and integration tool for incoming data • Holocaust-related knowledge base, useful for further developments, e.g. NER, LOD or …

  6. EHRI Vocabularies and Linked Open Data Experiments with EHRI Vocabularies and LOD • Places – GeoNames • Persons – VIAF • Camps & Ghettos – Wikidata Aim: Enrich EHRI's Vocabularies and, where possible, publish them as LOD

  7. EHRI Places & Geonames

  8. GeoNames Reconciliation - problematic cases • Places not listed in GeoNames (e.g. Altreich) • Places listed in GeoNames but missing spelling variants (e.g. Babyn Iar) • More than one location per place name (e.g. "Berlin" from "1(Berlin, sowjetischer Sektor)" mapped to 176 different locations) • Access points which are difficult to disambiguate without context (e.g. "Bauer" can be the German word for "peasant", a German family name, or a German town)
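
The ambiguity cases on this slide can be illustrated with a minimal sketch. It assumes that the place candidate is the text before the first comma inside an access point such as "1(Berlin, sowjetischer Sektor)"; the slides do not say how the candidate was actually extracted. The gazetteer below is a toy stand-in for GeoNames with invented identifiers, and the names `place_candidate` and `reconcile` are hypothetical.

```python
import re

# Toy gazetteer standing in for GeoNames. IDs and entries are invented
# for illustration; the real dataset returned 176 locations for "Berlin".
TOY_GAZETTEER = {
    "berlin": [
        ("toy-1", "capital city, Germany"),
        ("toy-2", "town, United States"),
    ],
}

# Assumed access point shape: a number followed by text in parentheses.
ACCESS_POINT = re.compile(r"^\s*\d+\((?P<body>.*)\)\s*$")


def place_candidate(access_point: str) -> str:
    """Return the text before the first comma inside '1(...)'.

    Falls back to the raw string when the assumed pattern does not match.
    """
    match = ACCESS_POINT.match(access_point)
    body = match.group("body") if match else access_point
    return body.split(",")[0].strip()


def reconcile(access_point: str):
    """Classify an access point as unmatched, unique, or ambiguous."""
    name = place_candidate(access_point)
    hits = TOY_GAZETTEER.get(name.lower(), [])
    if not hits:
        status = "unmatched"
    elif len(hits) == 1:
        status = "unique"
    else:
        status = "ambiguous"
    return name, status, hits


if __name__ == "__main__":
    for ap in ["1(Berlin, sowjetischer Sektor)", "Altreich"]:
        name, status, hits = reconcile(ap)
        print(f"{ap!r}: candidate={name!r}, status={status}, hits={len(hits)}")
```

Flagging "ambiguous" and "unmatched" results for manual review, rather than picking the first hit, is one way to keep cases like "Bauer" or "Artillerie" from being silently mapped to the wrong place.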

  9. GeoNames: More issues • Access points with typos not clustered by OpenRefine (e.g. "Aushwitz" instead of "Auschwitz") • Access points wrongly filtered out as person names (e.g. "Amsterdam, Landsmeer") • Common nouns sometimes give false positives, e.g. "Artillerie" from "1(Artillerie)" mapped to a part of town in New Caledonia • Problem: Historical states, such as Yugoslavia or Czechoslovakia, are not properly linked to parents / children in the GeoNames dataset
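
One possible reason a typo such as "Aushwitz" escapes clustering is that key-collision methods only group values whose normalized keys are identical. The sketch below uses a simplified version of the fingerprint idea (lowercase, strip punctuation, sort unique tokens) and contrasts it with a plain Levenshtein edit distance. Which clustering method and settings were used in the experiment is not stated on the slides, so this is an illustration of the general effect, not a reconstruction of the EHRI workflow. The threshold in `likely_same_place` is an assumption.

```python
import string
import unicodedata


def fingerprint(value: str) -> str:
    """Simplified key-collision fingerprint: lowercase, ASCII-fold,
    drop punctuation, then sort and deduplicate whitespace tokens."""
    normalized = unicodedata.normalize("NFKD", value.strip().lower())
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    no_punct = ascii_only.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(no_punct.split())))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def likely_same_place(a: str, b: str, max_distance: int = 1) -> bool:
    """Assumed rule of thumb: same place if fingerprints differ by at most
    max_distance edits. Short names can produce false merges."""
    return levenshtein(fingerprint(a), fingerprint(b)) <= max_distance


if __name__ == "__main__":
    print(fingerprint("Aushwitz") == fingerprint("Auschwitz"))  # keys differ
    print(levenshtein("aushwitz", "auschwitz"))                 # one missing letter
    print(likely_same_place("Aushwitz", "Auschwitz"))
```

A distance-based pass is slower than key collision and needs a manually checked threshold, so it is best treated as a way to propose candidate pairs for review.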

  10. EHRI Persons & VIAF

  11. EHRI Personalities and VIAF • Experiment with automatic matching to VIAF of persons data from Yad Vashem, CDEC and Cegesoma, with manual quality check on matching results • Issues: • Many people carry the same name • Not enough information on birth/death dates, places or profession to distinguish individuals • Spelling variants/mistakes

  12. Outcome of experiment • Of 100 Yad Vashem (YV) names, 68 were matched against entries in VIAF. High ambiguity in matching: a total of 234 matches, so each matched name had 3.44 matches on average • Of the 68 matched names, 31 were correct and 37 were false positives. The ambiguity in cases of a correct match was sometimes higher, e.g. the correct one in a set of 5/6 matches • Cegesoma and CDEC data give similar results, with the CDEC data showing an even higher rate of false positives
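
The figures on this slide can be checked with a few lines of arithmetic. The counts are taken from the slide; the variable names and the reading of "31 correct out of 68" as the share of matched names with a correct match are assumptions.

```python
# Counts reported on the slide for the Yad Vashem sample.
names_total = 100        # names submitted for matching
names_matched = 68       # names with at least one VIAF match
candidate_matches = 234  # total VIAF matches returned
correct = 31             # matched names judged correct on manual check
false_positives = 37     # matched names judged incorrect

match_rate = names_matched / names_total
avg_candidates = candidate_matches / names_matched
correct_share = correct / names_matched
false_positive_share = false_positives / names_matched

print(f"match rate: {match_rate:.0%}")                       # 68%
print(f"matches per matched name: {avg_candidates:.2f}")     # 3.44
print(f"correct share: {correct_share:.1%}")                 # 45.6%
print(f"false-positive share: {false_positive_share:.1%}")   # 54.4%
```

Read this way, fewer than half of the automatically matched names were correct, and fewer than a third of the 100 input names ended up with a usable match.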

  13. Ghettos, Camps and Wikidata

  14. Import Ghettos in Wikidata • Name of the ghetto in different languages • Unique EHRI identifier for the ghetto • Associated place name and its unique identifier in Wikidata • Coordinates from Yad Vashem and/or USHMM • Unique identifiers from online resources, including The Yad Vashem Encyclopedia of the Ghettos During the Holocaust and the USHMM Holocaust Encyclopedia • Added statement qualifying the entry as a “ghetto in Nazi-occupied Europe”
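
The fields listed on this slide can be pictured as one record per ghetto. The sketch below is a hypothetical data structure, not the format EHRI used for the import: the class name, field names, and all sample values are placeholders invented for illustration, and no Wikidata property identifiers are assumed.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional, Tuple


@dataclass
class GhettoRecord:
    """One ghetto prepared for import, mirroring the bullets on the slide."""
    ehri_id: str                     # unique EHRI identifier for the ghetto
    labels: Dict[str, str]           # language code -> ghetto name
    place_label: str                 # associated place name
    place_wikidata_id: str           # Wikidata identifier of that place
    coordinates: Optional[Tuple[float, float]] = None  # (lat, lon), if known
    external_ids: Dict[str, str] = field(default_factory=dict)  # source -> ID
    description: str = "ghetto in Nazi-occupied Europe"

    def missing_fields(self) -> List[str]:
        """List required fields that are empty, as a pre-import check."""
        problems = []
        if not self.ehri_id:
            problems.append("ehri_id")
        if not self.labels:
            problems.append("labels")
        if not self.place_wikidata_id:
            problems.append("place_wikidata_id")
        return problems


if __name__ == "__main__":
    # Placeholder values only; none of these are real identifiers.
    record = GhettoRecord(
        ehri_id="ehri-ghetto-placeholder",
        labels={"en": "Example Ghetto", "de": "Ghetto Beispiel"},
        place_label="Exampletown",
        place_wikidata_id="Q-placeholder",
        coordinates=(50.0, 20.0),
        external_ids={"yad_vashem_encyclopedia": "placeholder"},
    )
    print(record.missing_fields())
    print(asdict(record)["description"])
```

Keeping coordinates optional reflects the slide's "Yad Vashem and/or USHMM" wording, where a ghetto may have coordinates from one source, both, or possibly neither.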

  15. EHRI Ghettos & Wikidata

  16. Wikidata to EHRI Portal • English name of the ghetto • Place where the ghetto was located • Coordinates for the location • EHRI-assigned unique identifier for the ghetto • Associated unique identifiers from online resources • Multilingual labels generated from the names of the places
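
The return path can be sketched as reading labels and coordinates from a Wikidata place entity. The code assumes the standard Wikidata entity JSON layout (a `labels` map keyed by language and a `claims` map keyed by property, with `P625` as coordinate location). The function names, the output record shape, and the sample entity are hypothetical, and the slides do not specify how the multilingual labels were actually derived from the place names.

```python
from typing import Dict, Optional, Tuple


def labels_from_entity(entity: dict) -> Dict[str, str]:
    """Collect language -> label from a Wikidata entity JSON document."""
    return {
        lang: label["value"]
        for lang, label in entity.get("labels", {}).items()
    }


def coordinates_from_entity(entity: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from the first P625 claim with a value."""
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            value = snak["datavalue"]["value"]
            return value["latitude"], value["longitude"]
    return None


def portal_record(ehri_id: str, name_en: str, place_entity: dict) -> dict:
    """Assemble a hypothetical portal record from a place entity."""
    return {
        "ehri_id": ehri_id,
        "name_en": name_en,
        "place_labels": labels_from_entity(place_entity),
        "coordinates": coordinates_from_entity(place_entity),
    }


if __name__ == "__main__":
    # Toy entity with placeholder values, shaped like Wikidata entity JSON.
    toy_place = {
        "labels": {
            "en": {"language": "en", "value": "Exampletown"},
            "pl": {"language": "pl", "value": "Przykładowo"},
        },
        "claims": {
            "P625": [{
                "mainsnak": {
                    "snaktype": "value",
                    "datavalue": {"value": {"latitude": 50.0, "longitude": 20.0}},
                },
            }],
        },
    }
    print(portal_record("ehri-ghetto-placeholder", "Example Ghetto", toy_place))
```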

  17. EHRI Vocabularies & LOD: An enrichment? Mixed results • GeoNames set has problems, but we will use it for further development • Personalities: too many errors, and a sensitive vocabulary • Ghettos, Camps and Wikidata: a positive experience

  18. CONNECTING KNOWLEDGE • NIOD Institute for War, Holocaust and Genocide Studies (NL) • CEGESOMA Centre for Historical Research and Documentation on War and Contemporary Society (BE) • Jewish Museum in Prague (CZ) • Center for Holocaust Studies at the Institute for Contemporary History in Munich (DE) • YAD VASHEM The Holocaust Martyrs’ and Heroes’ Remembrance Authority (IL) • United States Holocaust Memorial Museum (USA) • Bundesarchiv (DE) • The Wiener Library Institute for the Study of the Holocaust & Genocide (UK) • Holocaust Documentation Centre (SK) • Polish Center for Holocaust Research (PL) • The Jewish Museum of Greece (GR) • Jewish Historical Institute (PL) • King’s College London (UK) • Ontotext AD (BG) • Elie Wiesel National Institute for the Study of Holocaust in Romania (RO) • DANS Data Archiving and Networked Services (NL) • Shoah Memorial, Museum, Center for Contemporary Jewish Documentation (FR) • ITS International Tracing Service (DE) • Hungarian Jewish Archives (HU) • INRIA Institute for Research in Computer Science and Automation (FR) • Vilna Gaon State Jewish Museum (LT) • VWI Vienna Wiesenthal Institute for Holocaust Studies (AT) • Foundation Jewish Contemporary Documentation Center (IT) EHRI is funded by the European Union
