
Natural Language Processing - Lesson Seven: Partial Parsing

Natural Language Processing - Lesson Seven: Partial Parsing. Oren Glickman, Department of Computer Science, Bar-Ilan University. Syntax: the study of grammatical relations between words and other units within the sentence (The Concise Oxford Dictionary of Linguistics).


Presentation Transcript


  1. Natural Language Processing - Lesson Seven: Partial Parsing • Oren Glickman, Department of Computer Science, Bar-Ilan University • 88-680

  2. Syntax • The study of grammatical relations between words and other units within the sentence. (The Concise Oxford Dictionary of Linguistics) • The way in which linguistic elements (as words) are put together to form constituents (as phrases or clauses). (Merriam-Webster Dictionary)

  3. Brackets • “I prefer a morning flight” • [S [NP [pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]

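  4. Parse Tree • [Figure: parse tree of "I prefer a morning flight". Root S with children NP and VP; preterminals Pronoun (I), Verb (prefer), Det (a), Noun (morning), Noun (flight); the two nouns form a Nom inside the object NP.]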
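
The bracketed notation and the parse tree are two views of the same structure. As a minimal sketch, assuming labels and words are separated by whitespace, the following Python reads a bracketed string into nested lists. The names `parse_brackets` and `leaves` are illustrative and do not come from the lecture.

```python
import re

def parse_brackets(text):
    """Parse a string such as "[NP [Det a] [N flight]]" into nested lists.

    Each constituent becomes [label, child, child, ...]; words stay strings.
    """
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def peek():
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        return tokens[pos]

    def node():
        nonlocal pos
        if peek() != "[":
            raise ValueError("expected '['")
        pos += 1
        tree = [peek()]          # constituent label
        pos += 1
        while peek() != "]":
            if peek() == "[":
                tree.append(node())   # nested constituent
            else:
                tree.append(peek())   # a word
                pos += 1
        pos += 1                 # consume the closing ']'
        return tree

    tree = node()
    if pos != len(tokens):
        raise ValueError("unbalanced brackets")
    return tree

def leaves(tree):
    """Return the words of the tree from left to right."""
    words = []
    for child in tree[1:]:
        words.extend(leaves(child) if isinstance(child, list) else [child])
    return words

SENTENCE = "[S [NP [pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]"
tree = parse_brackets(SENTENCE)
print(tree[0])        # S
print(leaves(tree))   # ['I', 'prefer', 'a', 'morning', 'flight']
```

Reading the leaves back from left to right recovers the original sentence, which is a quick check that the bracketing is balanced.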

  5. Parsing • The problem of mapping from a string of words to its parse tree is called parsing.

  6. Generative Grammar • A set of rules which indicate precisely what can be and cannot be a sentence in a language. • A grammar which precisely specifies the membership of the set of all the grammatical sentences in the language in question and therefore excludes all the ungrammatical sentences.

  7. Formal Languages • A natural language can be viewed as a formal language: the set of all its grammatical sentences. • Are natural languages regular?

  8. English is not a regular language! • The language a^n b^n is not regular • Look at the following English sentences: • John and Mary like to eat and sleep, respectively. • John, Mary, and Sue like to eat, sleep, and dance, respectively. • John, Mary, Sue, and Bob like to eat, sleep, dance, and cook, respectively. • Each sentence needs exactly as many verbs as names, the same counting dependency as in a^n b^n.
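
To see why the language of n a's followed by n b's is beyond a finite-state machine, note that a recognizer has to remember how many a's it has seen, and n has no upper bound. The sketch below, in Python, makes that memory explicit as an integer counter. The function name `is_anbn` is illustrative, and the code is only a demonstration of the counting requirement, not a proof.

```python
def is_anbn(s):
    """Return True if s is n copies of 'a' followed by n copies of 'b' (n >= 0)."""
    count = 0
    i = 0
    # Count the leading a's; the counter can grow without bound.
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    # Cancel one counted 'a' for every 'b' that follows.
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    # Accept only if the whole string was consumed and the counts match.
    return i == len(s) and count == 0

print(is_anbn("aaabbb"))  # True
print(is_anbn("aaabb"))   # False
print(is_anbn("abab"))    # False
```

In the "respectively" sentences the names play the role of the a's and the verbs the role of the b's.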

  9. Constituents • Certain groupings of words behave as constituents. • Constituents are able to occur in various sentence positions: • I saw the thin boy • I saw him talking with the thin boy • The thin boy lives across the street

  10. The Noun Phrase (NP) • Examples: • He • Ariel Sharon • The prime minister • The minister of defense during the war in Lebanon • They can all appear in a similar context: ___ was born in Kfar-Malal

  11. Prepositional Phrases • Examples: • the man in the white suit • Come and look at my paintings • Are you fond of animals? • Put that thing on the floor

  12. Verb Phrases • Examples: • Getting to school on time [was a struggle]. • He [was trying to keep his temper]. • That woman [quickly showed me the way to hide].

  13. Chunking • Text chunking is dividing sentences into non-overlapping phrases. • Noun phrase chunking deals with extracting the noun phrases from a sentence. • While NP chunking is much simpler than parsing, it is still a challenging task to build an accurate and very efficient NP chunker.

  14. What is it good for? • The importance of chunking derives from the fact that it is used in many applications: • Information retrieval and question answering • Machine translation • Preprocessing before full syntactic analysis • Text to speech • Many other applications

  15. What kind of structures should a partial parser identify? • Different structures are useful for different tasks: • Partial constituent structure: [NP I] [VP saw [NP a tall man in the park]]. • Prosodic segments: [I saw] [a tall man] [in the park]. • Content word groups: [I] [saw] [a tall man] [in the park].

  16. Chunk Parsing • Goal: divide a sentence into a sequence of chunks. • Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park]. • Chunks are non-recursive: a chunk cannot contain other chunks • Chunks are non-exhaustive: not all words are included in chunks

  17. Chunk Parsing Examples • Noun-phrase chunking: [I] saw [a tall man] in [the park]. • Verb-phrase chunking: The man who [was in the park] [saw me]. • Prosodic chunking: [I saw] [a tall man] [in the park].

  18. Chunks and Constituency • Constituents: [a tall man in [the park]]. • Chunks: [a tall man] in [the park]. • Chunks are not constituents • Constituents are recursive • Chunks are typically subsequences of constituents • Chunks do not cross constituent boundaries

  19. Chunk Parsing: Accuracy • Chunk parsing achieves higher accuracy than full parsing • Smaller solution space • Less word-order flexibility within chunks than between chunks • Better locality: fewer long-range dependencies, less context dependence • No need to resolve ambiguity • Less error propagation

  20. Chunk Parsing: Domain Specificity • Chunk parsing is less domain specific • Dependencies on lexical/semantic information tend to occur at levels "higher" than chunks: attachment, argument selection, movement • Fewer stylistic differences within chunks

  21. Chunk Parsing: Efficiency • Chunk parsing is more efficient than full parsing • Smaller solution space • Relevant context is small and local • Chunks are non-recursive • Chunk parsing can be implemented with a finite state machine
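
The finite-state claim can be sketched with a three-state machine that reads part-of-speech tags once, left to right. This Python sketch assumes Penn Treebank style tags and a simple NP shape (optional determiner `DT`, any number of adjectives `JJ`, then one noun tag beginning with `NN`). The function name `fsm_np_chunks` and the state names are illustrative, not part of the lecture.

```python
def fsm_np_chunks(tags):
    """Find NP chunks in one left-to-right pass over a list of POS tags.

    Returns (start, end) index pairs, with end exclusive.
    """
    START, AFTER_DT, AFTER_JJ = "START", "AFTER_DT", "AFTER_JJ"
    state, begin = START, None
    spans = []
    for i, tag in enumerate(tags):
        if tag == "DT":
            # A determiner can only open a chunk, so restart here.
            state, begin = AFTER_DT, i
        elif tag == "JJ":
            # An adjective opens a chunk or extends the current candidate.
            if state == START:
                begin = i
            state = AFTER_JJ
        elif tag.startswith("NN") and len(tag) <= 3:
            # A noun (NN, NNS, NNP) closes the chunk, alone or after DT/JJ.
            spans.append((i if state == START else begin, i + 1))
            state, begin = START, None
        else:
            # Any other tag abandons the current candidate.
            state, begin = START, None
    return spans

# Tags for "the little cat sat on the mat"
tags = ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"]
print(fsm_np_chunks(tags))  # [(0, 3), (5, 7)]
```

Each tag is inspected once and the machine keeps only its current state and one start index, so the running time is linear in sentence length.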

  22. Psycholinguistic Motivations • Chunk parsing is psycholinguistically motivated • Chunks as processing units: humans tend to read texts one chunk at a time (eye-movement tracking studies) • Chunks are phonologically marked: pauses, stress patterns • Chunking might be a first step in full parsing

  23. Chunk Parsing Techniques • Chunk parsers usually ignore lexical content: they only need to look at part-of-speech tags • Techniques for implementing chunk parsing: • Regular expression matching / finite state machines • Transformation-based learning • Memory-based learning • Others

  24. Regular Expression Matching • Define a regular expression that matches the sequences of tags in a chunk • A simple noun phrase chunk regexp: <DT>? <JJ>* <NN.?> • Chunk all matching subsequences: • Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN • Output: [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN] • If matching subsequences overlap, the first one gets priority
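
The tag pattern can be tried out with Python's standard `re` module. The sketch below is one possible encoding, not the lecture's implementation: it writes the tag sequence as a string such as `<DT><JJ><NN>`, matches the pattern against it, and maps character offsets back to token positions. The names `NP_PATTERN`, `regex_np_chunks` and `bracket` are illustrative.

```python
import re

# <DT>? <JJ>* <NN.?> written over a string of angle-bracketed tags.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN[^<>]?>")

def regex_np_chunks(tagged):
    """tagged: list of (word, tag) pairs. Returns (start, end) token spans."""
    encoded = "".join(f"<{tag}>" for _, tag in tagged)
    spans = []
    # finditer scans left to right and never returns overlapping matches,
    # so the leftmost candidate wins when two candidates overlap.
    for m in NP_PATTERN.finditer(encoded):
        # The number of '<' before an offset equals the token index there.
        start = encoded.count("<", 0, m.start())
        end = encoded.count("<", 0, m.end())
        spans.append((start, end))
    return spans

def bracket(tagged, spans):
    """Render the sentence as word/TAG tokens with [ ] around each chunk."""
    starts = {s for s, _ in spans}
    ends = {e - 1 for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(tagged):
        token = f"{word}/{tag}"
        if i in starts:
            token = "[" + token
        if i in ends:
            token = token + "]"
        out.append(token)
    return " ".join(out)

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
spans = regex_np_chunks(sentence)
print(spans)                     # [(0, 3), (5, 7)]
print(bracket(sentence, spans))  # [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
```

Because only tags are encoded, the words themselves never influence the match, in line with chunkers that ignore lexical content.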

  25. Chunking as Tagging • Map part-of-speech tag sequences to {I,O,B}* • I – tag is part of an NP chunk • O – tag is not part of an NP chunk • B – the first tag of an NP chunk which immediately follows another NP chunk • Example: • Input: The little cat sat on the mat • Output: I I I O O I I (no chunk immediately follows another, so B is not used)
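
A small Python sketch of the mapping from chunk spans to tags. By default it follows the definition given above, where B is used only when a chunk immediately follows another chunk. The flag `b_on_every_chunk` switches to the widespread variant in which B marks the start of every chunk. The function and flag names are illustrative assumptions.

```python
def spans_to_iob(n_tokens, spans, b_on_every_chunk=False):
    """Convert (start, end) chunk spans (end exclusive) to a list of I/O/B tags."""
    tags = ["O"] * n_tokens
    prev_end = None
    for start, end in sorted(spans):
        for i in range(start, end):
            tags[i] = "I"
        # B is needed to separate two adjacent chunks; the variant
        # scheme puts it at the start of every chunk instead.
        if b_on_every_chunk or start == prev_end:
            tags[start] = "B"
        prev_end = end
    return tags

# "The little cat sat on the mat" with chunks [The little cat] and [the mat]
spans = [(0, 3), (5, 7)]
print(" ".join(spans_to_iob(7, spans)))                         # I I I O O I I
print(" ".join(spans_to_iob(7, spans, b_on_every_chunk=True)))  # B I I O O B I
# Two adjacent chunks: B marks where the second one begins.
print(" ".join(spans_to_iob(4, [(0, 2), (2, 4)])))              # I I B I
```

Once chunks are encoded this way, chunking becomes a per-token classification problem, which is what makes tagging techniques applicable.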

  26. Chunking State of the Art • Depending on task specification and test set: 90-95%

  27. Homework

  28. Context-Free Grammars • Putting the constituents together • Next week…
