
En -> Cz MT system based on tectogrammatics

Zdeněk Žabokrtský, IFAL, Charles University in Prague


Presentation Transcript


  1. En->Cz MT system based on tectogrammatics
     Zdeněk Žabokrtský, IFAL, Charles University in Prague

  2. Goals
     • primary goal
       • to build a high-quality, linguistically motivated MT system using the PDT layered framework
     • secondary goal
       • to create a system for testing the true usefulness of various NLP tools within a real-life application

  3. MT triangle in terms of PDT
     [Figure: MT triangle. Labels: w-layer, m-layer, a-layer, t-layer; analysis; synthesis; transfer; "?"; source language; target language.]

  4. Building the first prototype...
     • chosen direction: English -> Czech
     • main design decisions:
       • several well-defined, linguistically relevant intermediate levels
       • modularity: decompose the task into many isolated subtasks
       • neutral w.r.t. chosen methodology (e.g. rules vs. statistics)
     • available resources
       • experience (and software tools) from PDT and PCEDT
       • data (parallel corpora, translation dictionaries)
       • freely available NLP tools for analysis on the English side
       • an existing module for sentence synthesis on the Czech side
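
The modularity decision on this slide can be pictured as a chain of small, independent blocks that each enrich a shared sentence record. The following is only a minimal sketch of that idea, not the prototype's code: the `Block` interface, the record keys, and both toy blocks are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

# A "block" is any function that takes a sentence record and returns it,
# possibly enriched. This interface is an assumption made for the sketch.
Block = Callable[[Dict], Dict]


def tokenize(sent: Dict) -> Dict:
    """Toy tokenizer: strip a trailing period and split on whitespace."""
    sent["tokens"] = sent["text"].rstrip(".").split()
    return sent


def lowercase_lemmas(sent: Dict) -> Dict:
    """Placeholder for lemmatization: it only lowercases each token."""
    sent["lemmas"] = [token.lower() for token in sent["tokens"]]
    return sent


def run_pipeline(blocks: List[Block], text: str) -> Dict:
    """Apply the blocks in order; each block sees the previous one's output."""
    sent: Dict = {"text": text}
    for block in blocks:
        sent = block(sent)
    return sent


result = run_pipeline([tokenize, lowercase_lemmas], "The girl died.")
print(result["tokens"], result["lemmas"])
```

Because every block has the same signature, a rule-based block could be swapped for a statistical one without touching the rest of the chain, which is one way to stay neutral with respect to methodology.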

  5. MT “triangle” in the prototype
     [Figure: prototype pipeline. Labels: input English text; English m-layer; English p-layer; English a-layer; English t-layer; Czech t-layer; output Czech text.]

  6. Building blocks (1)
     • EnglishW->EnglishM
       • segment the input text into sentences (Lingua::EN::Tagger from CPAN)
       • tokenize + tag the sentences (Lingua::EN::Tagger from CPAN)
       • lemmatize each token by using morpha tools and ispell
     • EnglishM->EnglishP
       • phrase-structure parsing (Lingua::CollinsParser from CPAN)
     • EnglishP->EnglishA
       • mark phrase heads (Collins’s heads + arrangements)
       • run constituency->dependency transformation
       • assign (selected) analytical functions
       • mark subject nodes
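
Once phrase heads are marked, the constituency-to-dependency step can be sketched as follows: the lexical head of each non-head child becomes a dependent of the lexical head of its parent phrase. This is a generic illustration of head-driven conversion, not the prototype's implementation; the `Node` class, its `head` index, and the use of word forms as node identifiers are assumptions made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    word: Optional[str] = None            # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)
    head: int = 0                         # index of the head child (already marked)


def lexical_head(node: Node) -> str:
    """Follow head children down to a leaf and return its word."""
    while node.word is None:
        node = node.children[node.head]
    return node.word


def to_dependencies(node: Node,
                    edges: Optional[List[Tuple[str, str]]] = None
                    ) -> List[Tuple[str, str]]:
    """Return (dependent, governor) pairs for the whole tree."""
    if edges is None:
        edges = []
    if node.word is not None:             # leaves contribute no edges
        return edges
    governor = lexical_head(node)
    for i, child in enumerate(node.children):
        if i != node.head:
            edges.append((lexical_head(child), governor))
        to_dependencies(child, edges)
    return edges


# (S (NP (DT the) (NN girl)) (VP (VBD died))), heads: S->VP, NP->NN, VP->VBD
tree = Node("S", head=1, children=[
    Node("NP", head=1, children=[Node("DT", word="the"), Node("NN", word="girl")]),
    Node("VP", head=0, children=[Node("VBD", word="died")]),
])
edges = to_dependencies(tree)
print(edges)
```

A real converter would identify nodes by position rather than by word form, since a sentence can contain the same word twice.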

  7. Building blocks (2)
     • EnglishA->EnglishT
       • determine the t-tree topology (collapsing fw. subtrees)
       • label t-nodes with t-lemmas
       • assign coordination/apposition functors
       • mark t-nodes corresponding to finite clauses
       • assign (some of) the remaining functors
       • fill the nodetype attribute
       • detect grammatical co-reference in relative clauses
       • determine the semantic part of speech
       • fill grammateme attributes (number, tense, degree...)
       • detect sentence modality
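
The first step above, determining the t-tree topology, can be illustrated under the assumption that "fw." stands for function words (auxiliaries, prepositions, articles), which have no node of their own on the t-layer. The sketch below is a simplification and not the prototype's algorithm: the dictionary-based tree encoding and the precomputed `is_function` flags are hypothetical, and it only re-attaches each content word to its nearest content-word ancestor.

```python
from typing import Dict, Optional


def collapse_function_words(parents: Dict[str, Optional[str]],
                            is_function: Dict[str, bool]
                            ) -> Dict[str, Optional[str]]:
    """Map each content word to its nearest content-word ancestor (None = root)."""

    def content_ancestor(node: str) -> Optional[str]:
        parent = parents[node]
        # Skip over function words on the way up the a-tree.
        while parent is not None and is_function[parent]:
            parent = parents[parent]
        return parent

    return {node: content_ancestor(node)
            for node in parents if not is_function[node]}


# Toy a-tree for "she has lived on a farm" (child -> parent).
a_parents = {"lived": None, "she": "lived", "has": "lived",
             "on": "lived", "farm": "on", "a": "farm"}
function_flags = {"lived": False, "she": False, "has": True,
                  "on": True, "farm": False, "a": True}

t_parents = collapse_function_words(a_parents, function_flags)
print(t_parents)
```

In a fuller version, each collapsed function word would also be recorded on the content node it belongs to, so that later steps can derive attributes such as tense from it.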

  8. Building blocks (3)
     • EnglishT->CzechT
       • transfer of formemes
       • transfer of lexemes
       • transfer of grammatemes
     • CzechT->CzechW
       • finding agreement links
       • adding auxiliary verbs (in complex verb forms)
       • adding prepositions and conjunctions
       • deriving word forms (conjugation, declension)
       • computing word order
       • adding punctuation
       • vocalization of prepositions
       • concatenation of word forms and sentences
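
The last two synthesis steps, vocalization of prepositions and concatenation of word forms, can be sketched as below. Czech non-syllabic prepositions (k, s, v, z) take a vowel (ke, se, ve, ze) before certain following words. The trigger rule used here, vocalizing only when the next word starts with the same or a closely related consonant, is a deliberately crude assumption for illustration; the actual rules also depend on consonant clusters, and the function names are hypothetical.

```python
# Vocalized variants of the non-syllabic Czech prepositions.
VOCALIZED = {"k": "ke", "s": "se", "v": "ve", "z": "ze"}

# Simplified triggers (an assumption): first letters of the following word
# that make the bare preposition hard to pronounce.
TRIGGERS = {"k": "kg", "s": "szšž", "v": "vf", "z": "zsšž"}


def vocalize(preposition: str, next_word: str) -> str:
    """Return the vocalized form if the simplified trigger rule applies."""
    first = next_word[:1].lower()
    if preposition in VOCALIZED and first and first in TRIGGERS[preposition]:
        return VOCALIZED[preposition]
    return preposition


def join_words(words: list) -> str:
    """Vocalize prepositions in context, then concatenate the word forms."""
    out = []
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else ""
        out.append(vocalize(word, next_word))
    return " ".join(out)


print(join_words(["Nemocnice", "v", "Van"]))
print(join_words(["v", "nemocnici"]))
```

The sketch treats only lowercase prepositions, so a sentence-initial "V" would pass through unchanged.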

  9. Translation sample
     • English input: A Turkish girl has died from bird flu, days after her brother and sister died from the disease. The girl, 11, who lived on a poultry farm in eastern Turkey's Van province, was being treated in hospital after her siblings became infected with bird flu. The cases are the first human deaths from bird flu outside Asia, where the virus has killed more than 70 people. The hospital in Van is treating 15 others, three of whom are in a critical condition, according to a doctor there. The latest victim, Hulya Kocyigit, died early on Friday at the hospital.
     • Czech output: Turecká dívka zemřela z ptačí chřipky dny after, že její bratr a sestra zemřeli z nemoci. Ďívka 11, kdo žilo v drůbeží farmě ve van provincii východního Turecka, jsoucno zacházet v nemocnici, že její sourozenci slušeli nakažený s ptačí chřipkou. Případy jsou přední lidské smrti z ptačí chřipky mimo Asii, kde virus zabilo than 70 lid. Nemocnice ve Van zachází 15 zbývajících, whom three of v kritické podmínce souzvuk lékaře tam. Nejpozdnější oběť Kocyigit Hulya zemřela brzy v pátku v nemocnici.
