GrETEL, which stands for Greedy Extraction of Trees for Empirical Linguistics, offers a powerful query engine that allows researchers to interact with syntactically annotated corpora, such as the Penn Treebank and LASSY. This tool provides user-friendly access to large data files and facilitates fast and accurate extraction of linguistic information. By employing XPath queries, users can evaluate complex constructions like verbs with fixed prepositions. GrETEL also supports ongoing user feedback for continuous improvement, making it essential for empirical linguistic research.
Finding your way through the woods with GrETEL • Liesbeth Augustinus, Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde • TABU-dag, June 14, 2013
GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query engine for treebanks • Nederbooms project: Exploitation of Dutch treebanks for research in linguistics • Goals • User-friendly tools • Access to large data files • Fast and accurate
GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query engine for treebanks • Treebank = syntactically annotated corpus, e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch) • Parser, e.g. Alpino (Van Noord 2006)
ALPINO PARSER • Input: Dit is een zin. ‘This is a sentence.’ • Output: XML trees • Query language: XPath
XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]
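To make the shape of this query concrete, here is a minimal Python sketch that checks it against a small tree. The XML is a hand-written, simplified stand-in for an Alpino parse of "Dit is een zin." (real Alpino output carries many more attributes), and the names `PATTERN`, `matches` and `search` are illustrative assumptions. Because the standard-library `xml.etree.ElementTree` module supports only a subset of XPath, the query is rewritten as nested (attributes, children) pairs and evaluated by a small recursive matcher instead of being passed to an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for an Alpino parse (assumption).
XML = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" lemma="dit" word="Dit"/>
      <node rel="hd" pt="ww" lemma="zijn" word="is"/>
      <node rel="predc" cat="np">
        <node rel="det" pt="lid" lemma="een" word="een"/>
        <node rel="hd" pt="n" lemma="zin" word="zin"/>
      </node>
    </node>
    <node rel="--" pt="let" lemma="." word="."/>
  </node>
</alpino_ds>
"""

# The XPath from the slide, rewritten as (required attributes, child patterns).
PATTERN = ({"cat": "smain"}, [
    ({"rel": "su", "pt": "vnw", "lemma": "dit"}, []),
    ({"rel": "hd", "pt": "ww", "lemma": "zijn"}, []),
    ({"rel": "predc", "cat": "np"}, [
        ({"rel": "det", "pt": "lid", "lemma": "een"}, []),
        ({"rel": "hd", "pt": "n", "lemma": "zin"}, []),
    ]),
])

def matches(node, pattern):
    """True if node has the required attributes and, for every child
    pattern, at least one direct child that matches it (order-free)."""
    attrs, children = pattern
    if any(node.get(key) != value for key, value in attrs.items()):
        return False
    return all(any(matches(child, p) for child in node.findall("node"))
               for p in children)

def search(root, pattern):
    """Emulates the leading //node: every node in the document is tested."""
    return [n for n in root.iter("node") if matches(n, pattern)]

hits = search(ET.fromstring(XML), PATTERN)
print(len(hits), hits[0].get("cat"))
```

On this toy tree the script prints `1 smain`: only the main-clause node satisfies all nested conditions.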
GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query treebanks by example • First version => only for LASSY treebank • New release => GrETEL for CGN treebank => update based on user reviews
GrETEL and the user • The user: example sentence; indicate relevant items of the sentence; (adapt XPath); select treebank; inspect results • GrETEL: parser (Alpino); automatically generate XPath expression; present results
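The step from a parsed example to an XPath expression can be sketched as follows. This is not GrETEL's actual implementation, only a minimal illustration under stated assumptions: the tree is a hand-written CGN-style parse of "Hij keek naar de heks", the function name `predicate` and the `keep` dictionary (which maps each word the user marked as relevant to the attributes to retain) are hypothetical, and phrases that contain no marked word are simply dropped from the query.

```python
import xml.etree.ElementTree as ET

# Hand-written CGN-style parse of "Hij keek naar de heks" (assumption).
XML = """<node cat="smain" rel="--">
  <node rel="su" pt="vnw" lemma="hij" word="Hij"/>
  <node rel="hd" pt="ww" lemma="kijken" word="keek"/>
  <node rel="ld" cat="pp">
    <node rel="hd" pt="vz" lemma="naar" word="naar"/>
    <node rel="obj1" cat="np">
      <node rel="det" pt="lid" lemma="de" word="de"/>
      <node rel="hd" pt="n" lemma="heks" word="heks"/>
    </node>
  </node>
</node>"""

def predicate(node, keep, is_top=False):
    """Return an XPath step for node, or None if neither the node nor any
    of its descendants was marked as relevant."""
    word = node.get("word")
    if word is not None:                       # leaf node: one token
        if word not in keep:
            return None
        names = ["rel"] + keep[word]
        tests = [f'@{a}="{node.get(a)}"' for a in names]
    else:                                      # phrase node: recurse
        kids = [p for p in (predicate(c, keep) for c in node) if p]
        if not kids:
            return None
        names = ["cat"] if is_top else ["rel", "cat"]
        tests = [f'@{a}="{node.get(a)}"' for a in names] + kids
    return "node[" + " and ".join(tests) + "]"

tree = ET.fromstring(XML)
# The user marks the verb and the preposition, keeping tag and lemma.
keep = {"keek": ["pt", "lemma"], "naar": ["pt", "lemma"]}
xpath = "//" + predicate(tree, keep, is_top=True)
print(xpath)
```

With the verb and the preposition marked, the generated string is `//node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]]`; the subject and the noun phrase are left out because no word in them was marked.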
OUTLINE • GrETEL in a nutshell • GrETEL demo • Case study • Search options • Conclusions and future work
CASE STUDY • Verbs with fixed preposition • E.g. Hij keek met een bang hartje naar de heks. ‘He was looking at the witch with a heavy heart.’ • VERB + (…+) PREP • LASSY XPath query: //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
CASE STUDY • Verbs with fixed preposition • E.g. Hij keek naar de heks. ‘He was looking at the witch.’ • Discontinuous constructions! • E.g. Hij keek met een bang hartje naar de heks. ‘He was looking at the witch with a heavy heart.’ • VERB + (…+) PREP
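The point about discontinuous constructions can be illustrated with a short Python sketch. It compares a plain string search for the adjacent words "keek naar" with a tree-based match on the verb and its prepositional complement. The two trees are hand-written LASSY-style simplifications (attribute values such as `root="hartje"` are illustrative and not taken from real Alpino output), and `parse`, `MET_PP`, `PATTERN` and `matches` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Modifier PP "met een bang hartje", inserted between verb and complement.
MET_PP = """<node rel="mod" cat="pp">
  <node rel="hd" pos="prep" root="met" word="met"/>
  <node rel="obj1" cat="np">
    <node rel="det" pos="det" root="een" word="een"/>
    <node rel="mod" pos="adj" root="bang" word="bang"/>
    <node rel="hd" pos="noun" root="hartje" word="hartje"/>
  </node>
</node>"""

def parse(middle=""):
    """Hand-written tree for 'Hij keek [middle] naar de heks' (assumption)."""
    return ET.fromstring(f"""<node cat="smain">
      <node rel="su" pos="pron" root="hij" word="Hij"/>
      <node rel="hd" pos="verb" root="kijk" word="keek"/>
      {middle}
      <node rel="ld" cat="pp">
        <node rel="hd" pos="prep" root="naar" word="naar"/>
        <node rel="obj1" cat="np">
          <node rel="det" pos="det" root="de" word="de"/>
          <node rel="hd" pos="noun" root="heks" word="heks"/>
        </node>
      </node>
    </node>""")

# VERB + (...+) PREP as (required attributes, child patterns).
PATTERN = ({"cat": "smain"}, [
    ({"rel": "hd", "pos": "verb", "root": "kijk"}, []),
    ({"rel": "ld", "cat": "pp"}, [
        ({"rel": "hd", "pos": "prep", "root": "naar"}, []),
    ]),
])

def matches(node, pattern):
    """Order-free structural match on attributes and direct children."""
    attrs, children = pattern
    if any(node.get(key) != value for key, value in attrs.items()):
        return False
    return all(any(matches(child, p) for child in node.findall("node"))
               for p in children)

results = []
for tree in (parse(), parse(MET_PP)):
    words = [n.get("word") for n in tree.iter("node") if n.get("word")]
    sentence = " ".join(words)
    results.append((sentence, "keek naar" in sentence, matches(tree, PATTERN)))
    print(results[-1])
```

The string search finds only the first sentence, while the structural match finds both, since the pattern constrains dependency relations rather than adjacency.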
Other treebank, other format … Hij keek met een bang hartje naar de heks • CGN //node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]] • LASSY //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
RESULTS Verb plus fixed preposition • E.g. Hij keek naar de heks. ‘He was looking at the witch.’ • VERB + (…+) PREP • 4004 matches in 3881 sentences
SEARCH OPTIONS Below annotation matrix
SEARCH OPTIONS Green versus red word order in Dutch • green: past participle – auxiliary: De NAVO stelt dat ze er alles aan gedaan heeft • red: auxiliary – past participle: De NAVO stelt dat ze er alles aan heeft gedaan • ‘NATO claims that it has done everything in its power’ (deredactie.be)
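The green/red distinction comes down to the relative position of the participle and the auxiliary, which can be read off the `begin` attribute (token position) that Alpino-style XML attaches to each node. The sketch below is an illustration only, not a description of how GrETEL implements its search options: the tree is a hand-written minimal subordinate clause, and `clause` and `word_order` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def clause(aux_begin, part_begin):
    """Minimal hand-written subordinate clause (assumption): the auxiliary
    heads the clause, the participle heads its verbal complement."""
    return ET.fromstring(f"""<node cat="ssub">
      <node rel="hd" pt="ww" lemma="hebben" begin="{aux_begin}"/>
      <node rel="vc" cat="ppart">
        <node rel="hd" pt="ww" lemma="doen" begin="{part_begin}"/>
      </node>
    </node>""")

def word_order(ssub):
    """'green' if the participle precedes the auxiliary, else 'red'."""
    aux = ssub.find("node[@rel='hd']")
    part = ssub.find("node[@rel='vc']/node[@rel='hd']")
    return "green" if int(part.get("begin")) < int(aux.get("begin")) else "red"

# Token positions in "De NAVO stelt dat ze er alles aan gedaan heeft":
# gedaan = 8, heeft = 9; the red variant swaps the two.
print(word_order(clause(aux_begin=9, part_begin=8)))  # gedaan heeft
print(word_order(clause(aux_begin=8, part_begin=9)))  # heeft gedaan
```

The first call prints `green` and the second `red`. A purely structural query treats both clauses alike, so separating them requires a position comparison of this kind.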
CONCLUSIONS • GrETEL: search engine for Dutch treebanks • Input = natural language example • Output = sample of similar sentences • Syntactic concordancer • Available online (via Mozilla Firefox) • No installation required
FUTURE WORK • GrETEL 2.0 • Include SoNaR corpus (ca. 500M tokens) • More generic • AfriBooms • GrETEL for Afrikaans • Include other treebank formats
CASE STUDY • Collective noun constructions • E.g. Een aantal bomen zijn omgevallen. ‘A number of trees fell down.’ • DET + NOUN + PLURAL NOUN • Discontinuous constructions! • E.g. Een groot aantal oude bomen zijn omgevallen. ‘A large number of old trees fell down.’
Try it yourself at http://nederbooms.ccl.kuleuven.be/eng/gretel Thanks for your attention!
Waaraan vs Waar … aan Waar denk je aan ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="adv"] and node[@rel="body" and @cat="sv1" and node[@rel="pc" and @cat="pp" and node[@rel="hd" and @pos="prep"]]]] and node[@rel="--" and @pos="punct"]] (4 results) • Waar bemoei je je mee? • Wanneer gaat een koortsstuip over in epilepsie?
Waaraan denk je ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="pp"]] and node[@rel="--" and @pos="punct"]] (38 results) • Waarom werken we ? • Waartoe verbind ik mij als ouder door dit formulier in te vullen ? • Vanwaar die gulle hand van een Turkse overheid die in de schulden zwemt ?
Hij klom de boom in //node[@cat="top" and node[@rel="--" and @cat="smain" and node[@rel="hd" and @pos="verb"] and node[@rel="ld" and @cat="np" and node[@rel="det" and @pos="det"] and node[@rel="hd" and @pos="noun"]] and node[@rel="svp" and @pos="part"]] and node[@rel="--" and @pos="punct"]] (37 results) • Door haar winst komt Clijsters de top-20 binnen . • In feite ging minder dan de helft van Dorsets de rivier over . • Nederland gaat de bezettingstijd in .