Creating a dual-use pandialectal Pashto grammar AF-PAK LEARN Omaha May 17, 2010

Corey Miller (cmiller@casl.umd.edu), Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues and Tristan Purvis.

Presentation Transcript


  1. Creating a dual-use pandialectal Pashto grammar • AF-PAK LEARN, Omaha, May 17, 2010 • Corey Miller (cmiller@casl.umd.edu), Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues and Tristan Purvis

  2. Motivation • Pashto is an indispensable Afghan language critical to our nation’s security • Pashto is difficult for English speakers • Updated, comprehensive, learner-oriented Pashto materials are needed • Grammar • Easy-access dictionary

  3. What makes Pashto difficult? • Ergativity • Up to four cases: direct, oblique, ablative, and vocative • Multiple noun and adjective declension classes • Variety of adpositions: prepositions, postpositions, and circumpositions • Retroflex consonants • Variety of verbal structures

  4. Project components • Formal grammar • Descriptive grammar • Fieldwork • Dictionary • Parser (enables easy access to the dictionary)

  5. Fieldwork • Identified native speakers of Pashto from Afghanistan and Pakistan living in the US (Peshawar and Quetta, Pakistan; Kabul and Kandahar, Afghanistan) • Create and run elicitation guides highlighting a range of grammatical features • Review all paradigms and example sentences, note dialect variation • Digitally record all sessions

  6. Motivation for descriptive grammar • Existing materials suffer from liabilities: they are dated; each covers a single dialect (Tegey and Robson 1996: Kabul; Penzl 1955: Kandahar; Shafeev 1964: Kandahar); most lack Pashto script (Tegey and Robson has it)

  7. Goals for descriptive grammar • Contemporary data and presentation • Use of Pashto script and transcription throughout • Cover dialect variation wherever it applies

  8. Descriptive grammar • Pashto language, orthography, phonology • Adpositions • Pronouns • Nouns • Adjectives • Verbs • Dialectology • Miscellaneous

  9. Pashto dialects

  10. Pronoun paradigm: incorporation of dialect information

  11. Interlinear example sentences

  12. Adjective paradigm

  13. Formal grammar of inflectional affix

  14. Stem allomorphy in nouns

  15. Formal grammar of phonological rule

  16. Morphological parsing • Inputs: formal grammar; dictionary (lexicon) • Output capabilities: analysis (given an inflected form, produce possible headwords); generation (given a headword, produce possible inflected forms)
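The two output capabilities can be pictured as two directions over the same data. The sketch below is only an illustration, not the project's parser: it replaces the formal grammar with a hand-listed paradigm table, and the names `PARADIGMS`, `analyze`, and `generate` are hypothetical. The single entry is the verb form shown later in the slides; a real parser would derive forms from grammar rules rather than list them.

```python
from collections import defaultdict
from typing import NamedTuple


class Analysis(NamedTuple):
    headword: str   # citation form found in the dictionary
    features: str   # grammatical description of the inflected form


# Toy stand-in for "formal grammar + lexicon": (headword, features) -> inflected form.
# Assumption: forms are listed by hand here; the slides do not describe the real rules.
PARADIGMS = {
    ("ويشتل", "first person singular present imperfective"): "ولم",
}


def build_analysis_index(paradigms):
    """Invert the paradigm table so inflected forms point back to headwords."""
    index = defaultdict(list)
    for (headword, features), form in paradigms.items():
        index[form].append(Analysis(headword, features))
    return index


ANALYSIS_INDEX = build_analysis_index(PARADIGMS)


def analyze(form):
    """Analysis: given an inflected form, return the possible headwords."""
    return list(ANALYSIS_INDEX.get(form, []))


def generate(headword):
    """Generation: given a headword, return its known inflected forms."""
    return {
        features: form
        for (hw, features), form in PARADIGMS.items()
        if hw == headword
    }


if __name__ == "__main__":
    print(analyze("ولم"))
    print(generate("ويشتل"))
```

Because one form can belong to several paradigms, `analyze` returns a list rather than a single answer, which matches the slide's wording "possible headwords".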

  17. Uses of morphological parser • Analysis capability enables dictionary lookup of inflected forms • Generation has pedagogical uses including self-testing

  18. How morphological analysis aids lookup • Inflected forms may differ substantially from citation forms • Experts can work around this problem, but non-experts often can’t

  19. The parser maps inflected forms to citation forms (headwords) • Query: What does this Pashto word mean? ولم • Grammatical info: first person singular present imperfective • Citation form: ويشتل • Dictionary entry: ويشتل [wishtə́l] (verb) to shoot
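A minimal sketch of this lookup flow, assuming a two-step process: try the word as a headword first, then fall back to morphological analysis and look up each candidate headword. The tables `LEXICON` and `ANALYSES` and the function `lookup` are hypothetical stand-ins that hold only the example from this slide; they are not the project's data structures.

```python
# Hypothetical miniature dictionary: headword -> entry.
LEXICON = {
    "ويشتل": {"translit": "wishtə́l", "pos": "verb", "gloss": "to shoot"},
}

# Hypothetical parser output: inflected form -> [(headword, grammatical info)].
ANALYSES = {
    "ولم": [("ويشتل", "first person singular present imperfective")],
}


def lookup(word):
    """Return (headword, grammatical info, entry) for every way to read `word`."""
    results = []
    # Step 1: the word may already be a citation form.
    if word in LEXICON:
        results.append((word, "citation form", LEXICON[word]))
    # Step 2: otherwise (or additionally) map it to headwords via analysis.
    for headword, info in ANALYSES.get(word, []):
        entry = LEXICON.get(headword)
        if entry is not None:
            results.append((headword, info, entry))
    return results


def describe(word):
    """Format lookup results as one line per reading."""
    lines = []
    for headword, info, entry in lookup(word):
        lines.append(
            f"{word}: {info} of {headword} [{entry['translit']}] "
            f"({entry['pos']}) {entry['gloss']}"
        )
    return lines or [f"{word}: no entry found"]


if __name__ == "__main__":
    for line in describe("ولم"):
        print(line)
```

Running both steps, rather than stopping at the first hit, keeps the sketch usable for forms that are at once a headword and an inflected form of another word.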

  20. Conclusion • Updated descriptive grammar based on fieldwork • Formal grammar and lexicon feed parser • Parser enables simplified dictionary lookup • Faster, more informed processing of Pashto

  22. References • David, Anne and Michael Maxwell. 2008. Joint grammar development by linguists and computer scientists. Workshop on NLP for Less Privileged Languages, Third International Joint Conference on Natural Language Processing, Hyderabad, India. • Maxwell, Michael and Anne David. 2008. Interoperable Grammars. First International Conference on Global Interoperability for Language Resources, Hong Kong. • Maxwell, Michael. 2010. Standardization as a means to Sustainability. LREC (to appear).

  23. References • Penzl, Herbert. 1955. A Grammar of Pashto. Washington, DC: American Council of Learned Societies. • Tegey, Habibullah and Barbara Robson. 1996. A Reference Grammar of Pashto. Washington, DC: Center for Applied Linguistics. • Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. International Journal of American Linguistics 30.
