Lexical Markup Framework: Implementation experiences
Marc Kemps-Snijders
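Core model
• The model is adorned with data categories and further extensions
• The LMF specification assumes that data categories are selected from the Data Category Registry, e.g. ISOcat
[Diagram: classes Lexical Resource, Lexicon, Lexical Entry, Form, Sense and Extension, connected by associations labelled 1..1, 1..n and 0..n. A Lexical Resource contains 1..n Lexicons and a Lexicon contains 1..n Lexical Entries. Data categories shown: /Lemma/, /Part of Speech/, /Orthography/, /Gloss/]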
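The core model can be sketched as a handful of nested container classes. This is only a minimal illustration, not the Lexus implementation: the class and field names are chosen here for readability, data categories are stored as plain key/value pairs, and the cardinalities for Form (1..n) and Sense (0..n) are an assumption, since the flattened diagram does not make clear which label belongs to which association.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Form:
    # Data categories attached to the form, e.g. {"orthography": "fish"}
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class Sense:
    # Data categories attached to the sense, e.g. {"gloss": "aquatic animal"}
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class LexicalEntry:
    # Data categories on the entry itself, e.g. {"partOfSpeech": "noun"}
    features: Dict[str, str] = field(default_factory=dict)
    forms: List[Form] = field(default_factory=list)     # assumed 1..n
    senses: List[Sense] = field(default_factory=list)   # assumed 0..n


@dataclass
class Lexicon:
    entries: List[LexicalEntry] = field(default_factory=list)  # 1..n


@dataclass
class LexicalResource:
    lexicons: List[Lexicon] = field(default_factory=list)      # 1..n


def check_cardinalities(resource: LexicalResource) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not resource.lexicons:
        problems.append("LexicalResource needs at least one Lexicon")
    for i, lexicon in enumerate(resource.lexicons):
        if not lexicon.entries:
            problems.append(f"Lexicon {i} needs at least one LexicalEntry")
        for j, entry in enumerate(lexicon.entries):
            if not entry.forms:
                problems.append(f"Lexicon {i}, entry {j} needs at least one Form")
    return problems


# Illustrative content only; the values are made up for this sketch.
entry = LexicalEntry(
    features={"partOfSpeech": "noun"},
    forms=[Form({"orthography": "fish"})],
    senses=[Sense({"gloss": "aquatic animal"})],
)
resource = LexicalResource([Lexicon([entry])])
print(check_cardinalities(resource))
```

Keeping the data categories in a dictionary rather than as fixed attributes mirrors the idea that the model is "adorned" with categories selected from a registry instead of hard-coding them.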
Approach
• Different applications deal with complex access to lexicon content
• Lexus
  • Deals with lexicon content
  • Lexical information in entries is represented as a tree structure
  • Multimedia may be added to lexical entries
  • Interaction with archived materials
  • Import from various sources (Shoebox, XML)
  • Export of structure and content
• ViCos
  • Relations between (parts of) lexical entries
  • Relations between fragments (text and images)
• Annex
  • Display of annotated media files (ELAN)
• ISOcat
  • Data Category Registry (ISO 12620)
  • Contains standard linguistic concept definitions
[Figure: example of image links. Different parts of the body refer to different lexical entries]
Schema Editor
• Users can modify the lexicon structure
  • Modify extensions and data categories
  • Connect to concept registries
    • ISOcat access through a web service interface (see: http://www.isocat.org/rest/help.html)
  • Modify sort orders
  • Define view templates
Lexicon browser
• Lexical entries link directly into archived corpora, e.g. via Annex
• Different customizable views are offered to the user
ViCos
• Users can create a conceptual space
  • Between arbitrary fragments from the lexicon
  • Using their own relation types
  • Specify display options (colors, line types)
[Figure: example conceptual space in which the nodes Animals, Fish, Birds, Bats and Birds/Fish are connected by "is a kind of" relations]
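A conceptual space of this kind can be thought of as a set of typed, directed relations between fragment identifiers, with display options attached to each user-defined relation type. The sketch below is a hypothetical illustration of that idea and not the ViCos data model: the class names, the method names and the particular edges are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple


class Relation(NamedTuple):
    source: str         # identifier of a lexicon fragment (hypothetical)
    relation_type: str  # name of a user-defined relation type
    target: str         # identifier of another fragment


class ConceptualSpace:
    def __init__(self) -> None:
        self.relations: List[Relation] = []
        # relation type -> display options chosen by the user
        self.display: Dict[str, Dict[str, str]] = {}

    def define_relation_type(self, name: str, color: str = "black",
                             line: str = "solid") -> None:
        """Register a relation type together with its display options."""
        self.display[name] = {"color": color, "line": line}

    def relate(self, source: str, relation_type: str, target: str) -> None:
        """Add a relation; the relation type must have been defined first."""
        if relation_type not in self.display:
            raise ValueError(f"unknown relation type: {relation_type}")
        self.relations.append(Relation(source, relation_type, target))

    def targets(self, source: str, relation_type: str) -> List[str]:
        """Fragments reachable from `source` via one relation of the given type."""
        return [r.target for r in self.relations
                if r.source == source and r.relation_type == relation_type]


# Illustrative edges only; the slide's figure does not spell out which
# nodes are connected.
space = ConceptualSpace()
space.define_relation_type("is a kind of", color="blue", line="dashed")
for fragment in ("fish", "birds", "bats"):
    space.relate(fragment, "is a kind of", "animals")
print(space.targets("bats", "is a kind of"))
```

Storing relations outside the entries themselves matches the point that relations across lexical entry fragments are modeled separately from the entry trees.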
Implementation experiences
• Lexical entries are considered to be tree structures
• Relations across lexical entry fragments are modeled separately
• Conversion into the LMF model is not always self-evident
  • Sort orders for data categories are not represented in LMF
  • View generation is not trivial
  • Multimedia fragments (images, audio, video, archived material) are supplemented with a label
• There is NO formal interchange format for LMF
  • The LMF DTD (Annex R) is only informative in nature
  • “A user can decide to define another DTD or schema to implement LMF. It is also possible to use the XML structures that are defined in the Feature Structure Representation standard (i.e. ISO 24610-1 [33]).”
  • The proposed DTD only covers LexicalResources that combine Lexicons of the same structure
  • W3C XML Schema or RelaxNG schemas are more appropriate
  • Each lexicon is assigned its own namespace here
<!ELEMENT LexicalResource (feat*, GlobalInformation, Lexicon+, SenseAxis*, TransferAxis*, ContextAxis*)>
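To make the namespace point concrete, the following sketch builds a small LexicalResource document whose child order follows the content model quoted above (no top-level feat elements, one GlobalInformation, then one or more Lexicon elements), placing each lexicon in its own XML namespace. It uses only Python's standard xml.etree.ElementTree module. The namespace URIs, the helper names and the sample entries are hypothetical, and the `att`/`val` attribute names on `feat` as well as the Lemma element are assumed to follow the informative LMF DTD. A DTD cannot validate namespaced content of this kind, which is in line with the preference for XML Schema or RelaxNG.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per lexicon.
LEXICON_NAMESPACES = {
    "lexA": "http://example.org/lexicon/a",
    "lexB": "http://example.org/lexicon/b",
}


def feat(parent, att, val):
    """Append <feat att="..." val="..."/> (attribute names assumed from the LMF DTD)."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def build_resource(lexicons):
    """Build a LexicalResource element.

    lexicons: mapping of namespace prefix -> list of (lemma, part_of_speech).
    """
    root = ET.Element("LexicalResource")
    ET.SubElement(root, "GlobalInformation")
    for prefix, entries in lexicons.items():
        uri = LEXICON_NAMESPACES[prefix]
        # Ask ElementTree to use our prefix instead of an auto-generated one.
        ET.register_namespace(prefix, uri)
        lexicon = ET.SubElement(root, f"{{{uri}}}Lexicon")
        for lemma, pos in entries:
            entry = ET.SubElement(lexicon, f"{{{uri}}}LexicalEntry")
            feat(entry, "partOfSpeech", pos)
            lemma_el = ET.SubElement(entry, f"{{{uri}}}Lemma")
            feat(lemma_el, "writtenForm", lemma)
    return root


# Sample data made up for this sketch.
root = build_resource({
    "lexA": [("fish", "noun")],
    "lexB": [("bird", "noun")],
})
print(ET.tostring(root, encoding="unicode"))
```

In this sketch the shared `feat` element is left without a namespace while the lexicon-specific structure is namespaced; a real schema design would have to decide that question explicitly.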