This session presents the ad-hoc tasks in three parts. The first works with bibliographic catalogue records from The European Library (TEL): real data aimed at real users, with monolingual, bilingual, and multilingual tasks in six languages (Dutch, English, French, German, Hungarian, and Portuguese) and a domain-specific focus on the controlled-vocabulary part of the records. The second is a robust task with word sense disambiguation, run on ad-hoc collections annotated with word-sense tags, where hard topics are chosen to encourage the development of advanced techniques. The last part says goodbye to the more traditional monolingual and bilingual tasks, whose collections remain available for further experiments.
Broadening Ad-hoc Horizons
Bibliographic catalogue records from The European Library (TEL)
• a challenging task with real data, aimed at real users
• monolingual, bilingual, and multilingual tasks
• six different languages: Dutch, English, French, German, Hungarian, Portuguese
• domain-specific, focused on the controlled-vocabulary part of the records
Strengthening the Techniques
Robust Task with Word Sense Disambiguation
• ad-hoc collections annotated with “Word Sense” tags
• LA Times 1994 and GH 1995
• monolingual (en), bilingual (X → en)
• hard topics will be chosen to make it possible to develop advanced techniques for dealing with them
Saying Goodbye
• More traditional monolingual and bilingual tasks…
• … but the collections are still there, and you can continue to experiment with them