
Finding your way through the woods with GrETEL

Finding your way through the woods with GrETEL. Liesbeth Augustinus, Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde. TABU-dag - June 14, 2013. GrETEL: Greedy Extraction of Trees for Empirical Linguistics. Query engine for treebanks.


Presentation Transcript


  1. Finding your way through the woods with GrETEL • Liesbeth Augustinus, Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde • TABU-dag - June 14, 2013

  2. GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query engine for treebanks • Nederbooms project: Exploitation of Dutch treebanks for research in linguistics

  3. GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query engine for treebanks • Nederbooms project: Exploitation of Dutch treebanks for research in linguistics • Goals • User-friendly tools • Access to large data files • Fast and accurate

  4. GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query engine for treebanks • Treebank = syntactically annotated corpus, e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch)

  5. TREEBANKS

  6. GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query engine for treebanks • Treebank = syntactically annotated corpus, e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch) • Parser, e.g. Alpino (Van Noord 2006)

  7. ALPINO PARSER Dit is een zin. >> ALPINO parser >> “This is a sentence.”

  8. ALPINO PARSER Dit is een zin. >> ALPINO parser >> “This is a sentence.” • XML trees • Query language: XPath
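
To make "XML trees" concrete, here is a minimal sketch of what a tree for "Dit is een zin." might look like and how it can be read with Python's standard library. The XML is hand-written and heavily simplified (an assumption, not actual Alpino output); only the attribute names `rel`, `cat`, `pt` and `lemma` are taken from the XPath expression on the following slides, and `word` is added for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified tree for "Dit is een zin." (an assumption,
# not actual Alpino output; real trees carry many more attributes).
TREE_XML = """
<node cat="top" rel="top">
  <node cat="smain" rel="--">
    <node rel="su" pt="vnw" lemma="dit" word="Dit"/>
    <node rel="hd" pt="ww" lemma="zijn" word="is"/>
    <node rel="predc" cat="np">
      <node rel="det" pt="lid" lemma="een" word="een"/>
      <node rel="hd" pt="n" lemma="zin" word="zin"/>
    </node>
  </node>
</node>
"""

root = ET.fromstring(TREE_XML)

# In this sketch every word is a leaf <node> carrying a "word" attribute.
words = [n.get("word") for n in root.iter("node") if n.get("word")]

# ElementTree supports a small XPath subset: one attribute test per step.
heads = [n.get("lemma") for n in root.findall(".//node[@rel='hd']")]

print(words)  # the four words of the sentence, in order
print(heads)  # lemmas of all nodes with rel="hd"
```

The standard library only covers simple attribute tests; the nested `and` predicates used in the full queries need a complete XPath engine.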

  9. XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]

  10. XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]

  11. XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]

  12. XPATH

  13. GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query treebanks by example

  14. GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query treebanks by example • First version => only for LASSY treebank • New release => GrETEL for CGN treebank => update based on user reviews

  15. The user: • Example sentence • Indicate relevant items of the sentence • (Adapt XPath) • Select treebank • Inspect results. GrETEL: • Parser (Alpino) • Automatically generate XPath expression • Present results
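
The "automatically generate XPath expression" step can be pictured as serialising the selected part of the parse tree into nested predicates. The following is a minimal sketch under that assumption, not GrETEL's actual implementation; the `(attributes, children)` pattern structure and the function names `to_predicate` and `to_xpath` are hypothetical.

```python
# A pattern is a pair: (attributes to test, list of child patterns).
# This structure is an illustrative assumption, not GrETEL's data model.
PATTERN = ({"cat": "smain"}, [
    ({"rel": "su", "pt": "vnw", "lemma": "dit"}, []),
    ({"rel": "hd", "pt": "ww", "lemma": "zijn"}, []),
    ({"rel": "predc", "cat": "np"}, [
        ({"rel": "det", "pt": "lid", "lemma": "een"}, []),
        ({"rel": "hd", "pt": "n", "lemma": "zin"}, []),
    ]),
])


def to_predicate(attrs, children):
    """Join attribute tests and nested child tests with 'and'."""
    parts = [f'@{name}="{value}"' for name, value in attrs.items()]
    parts += [f"node[{to_predicate(*child)}]" for child in children]
    return " and ".join(parts)


def to_xpath(pattern):
    """Wrap the top-level predicate in a descendant search."""
    return f"//node[{to_predicate(*pattern)}]"


print(to_xpath(PATTERN))
```

With this pattern the function reproduces the expression shown on slide 9. Dropping an attribute from a dictionary (for example `lemma`) generalises the query, which corresponds to the user indicating which items of the sentence are relevant.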

  16. OUTLINE • GrETEL in a nutshell • GrETEL demo • Case study • Search options • Conclusions and future work

  17. CASE STUDY • Verbs with fixed preposition • E.g. Hij keek met een bang hartje naar de heks. ‘He was looking at the witch with a heavy heart.’ • VERB + (…+) PREP • LASSY XPath query: //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]

  18. CASE STUDY • Verbs with fixed preposition • E.g. Hij keek naar de heks. ‘He was looking at the witch.’ • Discontinuous constructions! • E.g. Hij keek met een bang hartje naar de heks. ‘He was looking at the witch with a heavy heart.’ • VERB + (…+) PREP
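
Why a tree query copes with discontinuous constructions: the slide-17 query only tests for the presence of certain child nodes and says nothing about what stands between them. The sketch below re-implements that existential matching with the standard library and applies it to two hand-written, simplified trees (assumptions, not real treebank output; in particular the `mod` label on the intervening phrase is a guess).

```python
import xml.etree.ElementTree as ET

# Pattern mirroring the slide-17 query: (attributes, child patterns).
QUERY = ({"cat": "smain"}, [
    ({"rel": "hd", "pos": "verb", "root": "kijk"}, []),
    ({"rel": "ld", "cat": "pp"}, [
        ({"rel": "hd", "pos": "prep", "root": "naar"}, []),
    ]),
])


def matches(node, pattern):
    """True if node has all attributes and, per child pattern, some matching child."""
    attrs, children = pattern
    if any(node.get(name) != value for name, value in attrs.items()):
        return False
    kids = node.findall("node")
    return all(any(matches(kid, child) for kid in kids) for child in children)


def search(xml_text, pattern):
    """Return every <node> in the tree that satisfies the pattern."""
    root = ET.fromstring(xml_text)
    return [n for n in root.iter("node") if matches(n, pattern)]


# "Hij keek naar de heks." (simplified)
SHORT = """<node cat="top"><node cat="smain" rel="--">
  <node rel="su" pos="pron" root="hij"/>
  <node rel="hd" pos="verb" root="kijk"/>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar"/>
    <node rel="obj1" cat="np"/>
  </node>
</node></node>"""

# "Hij keek met een bang hartje naar de heks." (simplified)
LONG = """<node cat="top"><node cat="smain" rel="--">
  <node rel="su" pos="pron" root="hij"/>
  <node rel="hd" pos="verb" root="kijk"/>
  <node rel="mod" cat="pp">
    <node rel="hd" pos="prep" root="met"/>
    <node rel="obj1" cat="np"/>
  </node>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar"/>
    <node rel="obj1" cat="np"/>
  </node>
</node></node>"""

print(len(search(SHORT, QUERY)), len(search(LONG, QUERY)))
```

Both trees yield one match, because the intervening phrase is simply another child that the query does not mention. A string search for "keek naar" would miss the second sentence.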

  19. GrETEL ONLINE

  20. INPUT

  21. ANNOTATION MATRIX

  22. ANNOTATION GUIDELINES

  23. XPATH GENERATOR

  24. Other treebank, other format … Hij keek met een bang hartje naar de heks • CGN //node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]] • LASSY //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]

  25. Other treebank, other format … Hij keek met een bang hartje naar de heks • CGN //node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]] • LASSY //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
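
The two queries on this slide have the same shape and differ only in attribute names and tag values. A minimal sketch of that idea follows: one template filled from a per-format table. The table keys reuse the labels from the slide, while the table itself and the function `fixed_prep_query` are hypothetical and not part of GrETEL.

```python
# Attribute names and tag values copied from the two queries on the slide.
# The table layout is an illustrative assumption.
FORMATS = {
    "CGN":   {"tag": "pt",  "lex": "lemma", "verb": "ww",   "prep": "vz"},
    "LASSY": {"tag": "pos", "lex": "root",  "verb": "verb", "prep": "prep"},
}


def fixed_prep_query(fmt, verb, prep):
    """Build the 'verb + fixed preposition' query for one format."""
    f = FORMATS[fmt]
    return (
        f'//node[@cat="smain" and '
        f'node[@rel="hd" and @{f["tag"]}="{f["verb"]}" and @{f["lex"]}="{verb}"] and '
        f'node[@rel="ld" and @cat="pp" and '
        f'node[@rel="hd" and @{f["tag"]}="{f["prep"]}" and @{f["lex"]}="{prep}"]]]'
    )


# The lexical value must be supplied in the form each format expects:
# "kijken" for the lemma attribute, "kijk" for the root attribute.
print(fixed_prep_query("CGN", "kijken", "naar"))
print(fixed_prep_query("LASSY", "kijk", "naar"))
```

The template cannot derive "kijk" from "kijken" by itself; that conversion has to come from the parser or a lexicon, which is one reason generating the query from a parsed example is convenient.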

  26. TREEBANK SELECTION

  27. RESULTS • Verb plus fixed preposition • E.g. Hij keek naar de heks. ‘He was looking at the witch.’ • VERB + (…+) PREP • 4004 matches in 3881 sentences

  28. RESULTS: table

  29. RESULTS: data

  30. RESULTS: trees

  31. OUTLINE • GrETEL in a nutshell • GrETEL demo • Case study • Search options • Conclusions and future work

  32. SEARCH OPTIONS • Below the annotation matrix

  33. SEARCH OPTIONS • Green versus red word order in Dutch • green: past participle – auxiliary, e.g. De NAVO stelt dat ze er alles aan gedaan heeft • red: auxiliary – past participle, e.g. De NAVO stelt dat ze er alles aan heeft gedaan • ‘NATO claims that it has done everything in its power.’ (deredactie.be)
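
The green and red orders have the same dependency structure and differ only in the linear position of the two verbs, so telling them apart requires word-position information. The sketch below assumes each leaf carries a `begin` attribute with its token position and that the auxiliary is the `hd` of the clause with the participle as `hd` of a `vc` child; the trees are hand-written, show only the verb cluster, and the function `verb_cluster_order` is hypothetical.

```python
import xml.etree.ElementTree as ET


def verb_cluster_order(clause):
    """Classify a clause as 'green' (participle first) or 'red' (auxiliary first)."""
    aux = clause.find("node[@rel='hd']")
    part = clause.find("node[@rel='vc']/node[@rel='hd']")
    if aux is None or part is None:
        return None  # no auxiliary + participle cluster in this clause
    if int(part.get("begin")) < int(aux.get("begin")):
        return "green"
    return "red"


# "... dat ze er alles aan gedaan heeft" (verb cluster only, simplified)
GREEN = ET.fromstring("""<node cat="ssub" rel="body">
  <node rel="hd" word="heeft" begin="5"/>
  <node rel="vc" cat="ppart">
    <node rel="hd" word="gedaan" begin="4"/>
  </node>
</node>""")

# "... dat ze er alles aan heeft gedaan" (verb cluster only, simplified)
RED = ET.fromstring("""<node cat="ssub" rel="body">
  <node rel="hd" word="heeft" begin="4"/>
  <node rel="vc" cat="ppart">
    <node rel="hd" word="gedaan" begin="5"/>
  </node>
</node>""")

print(verb_cluster_order(GREEN), verb_cluster_order(RED))
```

A query that ignores positions returns both variants together; adding a position comparison separates them.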

  34. SEARCH OPTIONS

  35. SEARCH OPTIONS

  36. SEARCH OPTIONS

  37. SEARCH OPTIONS

  38. SEARCH OPTIONS

  39. OUTLINE • GrETEL in a nutshell • GrETEL demo • Case study • Search options • Conclusions and future work

  40. CONCLUSIONS • GrETEL: search engine for Dutch treebanks • Input = natural language example • Output = sample of similar sentences • Syntactic concordancer • Available online (via Mozilla Firefox) • No installation required

  41. FUTURE WORK • GrETEL 2.0 • Include SoNaR corpus (ca 500M tokens) • More generic • AfriBooms • GrETEL for Afrikaans • Include other treebank formats

  42. CASE STUDY • Collective noun constructions • E.g. Een aantal bomen zijn omgevallen. ‘A number of trees fell down.’ • DET + NOUN + PLURAL NOUN • Discontinuous constructions! • E.g. Een groot aantal oude bomen zijn omgevallen. ‘A large number of old trees fell down.’

  43. Try it yourself at http://nederbooms.ccl.kuleuven.be/eng/gretel Thanks for your attention!

  44. Waaraan vs Waar … aan Waar denk je aan ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="adv"] and node[@rel="body" and @cat="sv1" and node[@rel="pc" and @cat="pp" and node[@rel="hd" and @pos="prep"]]]] and node[@rel="--" and @pos="punct"]] (4 results) • Waar bemoei je je mee? • Wanneer gaat een koortsstuip over in epilepsie?

  45. Waaraan denk je ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="pp"]] and node[@rel="--" and @pos="punct"]] (38 results) • Waarom werken we ? • Waartoe verbind ik mij als ouder door dit formulier in te vullen ? • Vanwaar die gulle hand van een Turkse overheid die in de schulden zwemt ?

  46. Hij klom de boom in //node[@cat="top" and node[@rel="--" and @cat="smain" and node[@rel="hd" and @pos="verb"] and node[@rel="ld" and @cat="np" and node[@rel="det" and @pos="det"] and node[@rel="hd" and @pos="noun"]] and node[@rel="svp" and @pos="part"]] and node[@rel="--" and @pos="punct"]] (37 results) • Door haar winst komt Clijsters de top-20 binnen . • In feite ging minder dan de helft van Dorsets de rivier over . • Nederland gaat de bezettingstijd in .
