1 / 31

Modeling Lexical Entries in Bilingual Dictionaries —Or— Exegeting the UML Model

Modeling Lexical Entries in Bilingual Dictionaries —Or— Exegeting the UML Model. Mike Maxwell Linguistic Data Consortium. Three Levels of Abstraction. File formats Data models Ontologies. Conceptual Structure vs. Views. Data model = Conceptual/ Underlying structure

sharvani
Télécharger la présentation

Modeling Lexical Entries in Bilingual Dictionaries —Or— Exegeting the UML Model

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modeling Lexical Entries in Bilingual Dictionaries—Or— Exegeting the UML Model Mike Maxwell Linguistic Data Consortium EMELD Workshop on Digitizing Lexical Information

  2. Three Levels of Abstraction • File formats • Data models • Ontologies EMELD Workshop on Digitizing Lexical Information

  3. Conceptual Structure vs. Views • Data model = Conceptual/ Underlying structure • View = layout, formatting • Examples of views: • Page layout • Definition numbers • Alphabetization • Filtered subsets EMELD Workshop on Digitizing Lexical Information

  4. Conceptual Structure vs. Views • Spanish-English and English-Spanish sides of bilingual dictionary: View • Spanish lexical entries, English lexical entries, and relations between them: Underlying structure EMELD Workshop on Digitizing Lexical Information

  5. UML Models • What is UML?“The Unified Modeling Language™ (UML) is the industry-standard language for specifying, visualizing, constructing, and documenting the artifacts of software systems. It simplifies the complex process of software design, making a ‘blueprint’ for construction.” (http://www.rational.com/uml/index.jsp) • Blueprint language • We’ll use small subset EMELD Workshop on Digitizing Lexical Information

  6. UML Models • Objects • Classes • Attributes • Links • Composition • Association • Class hierarchy EMELD Workshop on Digitizing Lexical Information

  7. UML Models • Normalization • Data item appears once • Attribute (‘field’) holds one type of data • Strings • MultiUnicode • MultiString EMELD Workshop on Digitizing Lexical Information

  8. SIL-developed Model • Bilingual lexicon(one-way: full information for vernacular language only) • Developed for LinguaLinks • Modified for Fieldworks • Embedded in larger model of language description (http://fieldworks.sil.org/ModelDoc/) EMELD Workshop on Digitizing Lexical Information

  9. Lexicon[see Lexicon.gif] • Front matter, appendices, … • Lexical entries • Lexemes (stems, roots, words) • Affixes • Larger constructs (idioms etc.) EMELD Workshop on Digitizing Lexical Information

  10. Lexical Entry[see LexEntries.gif] • Kinds of lexical entries • Major Entry • Subentry • Minor Entry EMELD Workshop on Digitizing Lexical Information

  11. Major Entries • LexMajorEntry • For morphemes and non-compositional word-level “things” • Stems, roots, affixes(not a theoretical statement!) • But citation forms can be words EMELD Workshop on Digitizing Lexical Information

  12. Subentries • LexSubentry • Subclass of LexMajorEntry • For multi-morphemic constructs: • Derivatives • Compounds • Idioms • Sayings • Phrasal verbs EMELD Workshop on Digitizing Lexical Information

  13. Subentries (cont’d) • Points to morphemes (etc.) of which it is composed • Does not “belong” to morphemes (LexMajorEntries) of which it is composed EMELD Workshop on Digitizing Lexical Information

  14. Minor Entries • LexMinorEntry • Subclass of LexMajorEntry(but usually much simpler) • For irregular forms (oxen, been, went) • Belong to a LexMajorEntry(but alphabetization is a view!) EMELD Workshop on Digitizing Lexical Information

  15. Parts of Lexical Entries • Lexica est omnis divisa in partes tres (plus a label): • Citation form (= the label) • Forms • Morphosyntactic information • Senses • No provision for etymology EMELD Workshop on Digitizing Lexical Information

  16. Parts of lexical entries:Citation Form • = Lemma, Headword, Canonical Form • CitationForm attribute • multiUnicode EMELD Workshop on Digitizing Lexical Information

  17. Parts of lexical entries:Forms [see MoForm.gif] • PronunciationsLexPronunciation (written form + sound) • AllomorphsMoForm (written form, morph type, phonological context…) • Underlying FormMoForm EMELD Workshop on Digitizing Lexical Information

  18. Parts of lexical entries: Morphosyntactic Information [see MSI.gif] • MoStemMsi (for Stems/ Roots, whether bound or free) • Part of speech • Inherent morphosyntactic features • Inflection class (= paradigm/ declension) • Exception features EMELD Workshop on Digitizing Lexical Information

  19. Parts of lexical entries: Morphosyntactic Information • MoInflectionalAffixMsi (for Inflectional Affixes) • Morphosyntactic features • Exception features EMELD Workshop on Digitizing Lexical Information

  20. Parts of lexical entries: Morphosyntactic Information • MoDerivationalAffixMsi (for Derivational Affixes) • From/ to POS • From/ to morphosyntactic features • From/ to inflection classes • From/ to exception features EMELD Workshop on Digitizing Lexical Information

  21. Parts of lexical entries:Senses [see LexSenses.gif] • LexSense: • Definition • Gloss • Scientific name • Pictures • Example sentences • Sub-senses (more LexSense objects) EMELD Workshop on Digitizing Lexical Information

  22. Parts of lexical entries:Senses • LexSense (cont’d): • Morphosyntactic information: points to a ‘MorphosyntaxInfo’ object • This MorphosyntaxInfo’ object can be shared among different senses of the same LexEntry:run = to jogrun = to go (to the store)(both can be nouns or intransitive verbs) EMELD Workshop on Digitizing Lexical Information

  23. Parts of lexical entries:Senses • LexSense (cont’d): • Use of shared ‘MorphosyntaxInfo’ object allows flexibility via views:The particular way in which definitions and other features of the dictionary article are presented comprise the macrostructure. Are definitions arranged by part-of-speech?… (Landau, Dictionaries: The Art and Craft of Lexicography, p. 99) • A view! EMELD Workshop on Digitizing Lexical Information

  24. Parts of lexical entries:Senses • LexSense (cont’d): • Points to set of ‘ReversalIndexEntry’ objects • Can be shared among senses belonging to the same or other LexEntries • Many-to-many relation between LexSenses and ReversalIndexEntries EMELD Workshop on Digitizing Lexical Information

  25. Parts of lexical entries:Senses • ReversalIndexEntry:Impoverished LexEntry • Name (= citation form) • POS • Sub-entriesAllows for reversal entries like: Green (adj.) to be green: yax EMELD Workshop on Digitizing Lexical Information

  26. Relationships among Senses:Synonyms [see LexSets.gif] • LexSimpleSetOne set per group of synonyms(asymmetry in model?) • ‘Members’ = LexSetItems, in turn pointing to a LexSense(LexSetItems are a throw-away class?) EMELD Workshop on Digitizing Lexical Information

  27. Relationships among Senses:Antonyms and other Binary Relations • LexPairRelations, owning sets of LexPairs • Allows: • Directed relations (e.g. individual-group) or • Undirected relations (e.g. antonyms) EMELD Workshop on Digitizing Lexical Information

  28. Relationships among Senses: Part-Whole, Generic-Specific • LexTreeRelations, owning sequence of LexTreeItems • Outline structure:(animal (mammal (dog cat)) (reptile (snake turtle))) EMELD Workshop on Digitizing Lexical Information

  29. Relationships among Senses: Scales • LexScale(relation not specified: asymmetry in model) • Negative-neutral-positive scales(tiny, small; medium; big, huge) • Positive (or neutral) scales(inch, foot, yard, furlong)(January, …December) EMELD Workshop on Digitizing Lexical Information

  30. Dialects • Q: What can vary between dialects? • A: Anything EMELD Workshop on Digitizing Lexical Information

  31. Dialects • Modeling dialects • Separate encodings • Separate lexicons • Mark objects for dialect(what level of granularity?) EMELD Workshop on Digitizing Lexical Information

More Related