
Annotation for Hindi PropBank


Presentation Transcript


  1. Annotation for Hindi PropBank

  2. Outline • Introduction to the project • Basic linguistic concepts • Verb & Argument • Making information explicit • Null arguments • Tasks to be carried out • Timesheets, tips

  3. Creation of Resources • For machines rather than humans • Imagine a dictionary/thesaurus for computers • A requirement for Natural Language Processing • Large annotated resources • Annotation implies addition of linguistic information • Tailored to language-specific requirements • Needs to be as consistent as possible • Used for applications like Semantic Role Labelling, Parsing, Word Sense Disambiguation

  4. Hindi-Urdu Treebank Project • One of the first efforts to make a large-scale resource for Hindi-Urdu • Similar resources exist for Chinese, Arabic and English • Three main components • Hindi-Urdu dependency treebank • Hindi-Urdu PropBank • Hindi-Urdu phrase structure treebank [derived]

  5. PropBank • PropBank resource creation at CU Boulder • We annotate semantic information on top of syntactic information • PropBank involves annotation of predicate argument structure • Mainly concerned with verbs & their arguments • And the semantic nature of the arguments

  6. What are verbs? • Verbs are predicating elements, e.g. daud (run), pii (drink), baras (rain), etc. • Encode (very broadly) actions and states • actions: hit, run, throw; states: think, see, smell • Realization of these actions and states requires participants • Ram ne kaam kiyaa • ‘karnaa’ is realized by the doer & the thing done

  7. What are arguments? • In a sentence, e.g. Ram ate an apple / Raam ne seb khaaya: • A verb, ‘eat’ or ‘khaa’: PREDICATE • A person eating, ‘Raam’: ARGUMENT • Thing eaten, ‘apple’ / ‘seb’: ARGUMENT • Without arguments, the meaning of the verb ‘ate’ is not realized completely • Together, they make up the predicate argument structure of the sentence
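The predicate argument structure described on this slide can be sketched as a small data structure. This is a minimal illustration, not the Hindi PropBank file format: the class names `Predicate` and `Argument` and the helper `describe` are hypothetical, and the labels ARG0 (the eater) and ARG1 (the thing eaten) are used here as an assumption about how PropBank-style numbered labels would apply to this sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    text: str   # the words filling the role, e.g. "Raam"
    label: str  # illustrative PropBank-style label (an assumption)


@dataclass
class Predicate:
    lemma: str                                   # the verb, e.g. "khaa"
    arguments: list = field(default_factory=list)


def describe(pred):
    """Render a predicate and its arguments as lemma(LABEL=text, ...)."""
    parts = [f"{arg.label}={arg.text}" for arg in pred.arguments]
    return f"{pred.lemma}({', '.join(parts)})"


# Raam ne seb khaaya: 'khaa' is the predicate, 'Raam' and 'seb' its arguments.
khaa = Predicate("khaa", [Argument("Raam", "ARG0"), Argument("seb", "ARG1")])
print(describe(khaa))  # khaa(ARG0=Raam, ARG1=seb)
```

Keeping the verb and its arguments in one record mirrors the point of the slide: the verb alone does not express the full meaning until its participants are attached.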

  8. Arguments show what’s important • Raam ne jaldi se seb khaaya • Raam, seb are arguments • But ‘jaldi se’ is not • It’s all about the verb • It projects its need for certain arguments • Sift what’s mandatory from what’s optional

  9. Like Unix commands • Some commands require only one argument; others require two • cd /home/student/ashwini • cp hmwk1.txt hmwk2.txt • If the command is typed with too many or too few arguments… you get an error
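The analogy on this slide can be made concrete with a toy check that treats verbs and commands alike: each head expects a fixed number of arguments, and a mismatch is an error. This is a sketch under stated assumptions. The table `EXPECTED_ARGS` and the function `check_arity` are hypothetical names; the counts for `cd` and `cp` follow the slide's simplification (real shells are more permissive), and the entry for `khaa` reflects the two arguments named on the earlier slide.

```python
# Expected argument counts. The cd/cp counts follow the slide's simplified
# picture; the verb entry is illustrative, not taken from a real frame file.
EXPECTED_ARGS = {"cd": 1, "cp": 2, "khaa": 2}


def check_arity(head, args):
    """Return True if `head` received exactly the expected number of arguments."""
    expected = EXPECTED_ARGS[head]
    if len(args) != expected:
        kind = "few" if len(args) < expected else "many"
        raise ValueError(
            f"{head}: too {kind} arguments (got {len(args)}, expected {expected})"
        )
    return True


print(check_arity("cp", ["hmwk1.txt", "hmwk2.txt"]))  # True
print(check_arity("khaa", ["Raam", "seb"]))           # True

try:
    check_arity("khaa", ["Raam"])  # the thing eaten is missing
except ValueError as err:
    print(err)  # khaa: too few arguments (got 1, expected 2)
```

Natural language is more forgiving than a shell, since speakers routinely leave arguments unpronounced, which is why null arguments get their own annotation task.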

  10. Making information explicit • As speakers of Hindi or English, we already have knowledge of predicate argument structure • E.g. hari ___ pahuMcaa • Capturing this knowledge for the machine is essential • Ram ne seb khaayaa aur paani piyaa • Who drank the water?
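The question "Who drank the water?" can be illustrated with a toy example in which the second clause has an unpronounced subject. The sketch below is only an assumption-laden illustration, not the project's annotation strategy: the dictionary layout, the ARG0/ARG1 keys, and the function `fill_null_subjects` are hypothetical, and the rule used (copy the subject of the preceding clause) only covers simple coordinated sentences like this one.

```python
# Ram ne seb khaayaa aur paani piyaa
# Clause 2 has no overt subject; None marks the null argument.
clauses = [
    {"verb": "khaa", "ARG0": "Ram", "ARG1": "seb"},
    {"verb": "pii", "ARG0": None, "ARG1": "paani"},
]


def fill_null_subjects(clauses):
    """Return copies of the clauses with null ARG0s filled from the previous clause."""
    resolved = []
    last_subject = None
    for clause in clauses:
        clause = dict(clause)  # copy so the input is left unchanged
        if clause["ARG0"] is None:
            clause["ARG0"] = last_subject
            clause["ARG0_is_null"] = True  # record that the subject was unpronounced
        else:
            last_subject = clause["ARG0"]
        resolved.append(clause)
    return resolved


for clause in fill_null_subjects(clauses):
    print(clause["verb"], clause["ARG0"], clause["ARG1"])
# khaa Ram seb
# pii Ram paani
```

A human reader supplies "Ram" as the drinker without effort; the annotation makes that link explicit so a machine can recover it too.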

  11. Tasks to be carried out • Argument identification & annotation • Null argument identification & annotation

  12. Training • For argument id & annotation: • Learn PropBank labels • Get familiar with annotation tools (Jubilee) • Identifying and labelling correctly • For null argument id & annotation • Recognizing syntactic constructions • Getting familiar with annotation strategy • Practice with doing both arguments & nulls

  13. Training • For argument id & annotation (mid-October): • Learn PropBank labels • Get familiar with annotation tools (Jubilee) • Identifying and labelling correctly • For null argument id & annotation (mid-November): • Recognizing syntactic constructions • Getting familiar with annotation strategy • Practice with doing both arguments & nulls

  14. Training • First step: reading documentation • Followed by hands-on practice using Jubilee • Note this wiki page: http://resourcesforannotators.wikispaces.com/

  15. Timesheets & tips • Bi-weekly timesheets • I will cross check the number of hours logged in • Turn in the timesheets at my Hellems mailbox in physical form, with your signature • Hellems is located opposite the UMC. The Linguistics dept mailboxes are on the 2nd floor

  16. Timesheets & tips • To get into the payroll system at ICS: • You need to meet Catherine Latzer at CINC • Centre for Innovation and Creativity, 1777 Exposition Drive, Room 171C • Email: catherine.latzer@colorado.edu • Phone: 303/735-4282 • Please go with the following 3 items • Your identification documents, e.g. passport • Social Security Number • A voided check: tear out a check and write VOID across it diagonally
