1 / 28

COLING 20 02, Taipei , 1 September 2002 Christian BOITET & TSAI Wang-Ju

C o e dition langu age  UNL to share the r e vision across the langu age s of a multilingual document : a unifying concept. COLING 20 02, Taipei , 1 September 2002 Christian BOITET & TSAI Wang-Ju GETA, CLIPS, IMAG, Grenoble Christian.Boitet@imag.fr , Wang-Ju.Tsai@imag.fr

roger
Télécharger la présentation

COLING 20 02, Taipei , 1 September 2002 Christian BOITET & TSAI Wang-Ju

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Coedition language UNL tosharethe revision across the languagesof a multilingual document : a unifying concept COLING 2002, Taipei, 1September 2002 Christian BOITET & TSAI Wang-Ju GETA, CLIPS, IMAG, Grenoble Christian.Boitet@imag.fr, Wang-Ju.Tsai@imag.fr http://www-clips.imag.fr/geta/User/wang-ju.tsai/welcome.html

  2. Plan • Motivation : to achieve quality MT in a multi-source & multi-target environment • Why a pivot architecture and why UNL? • The prototype under construction : • principle : to divide the correspondence (graphtext) in 3 parts. • interface : necessity of modes from « naive » to « expert » • A problem : computing the 2nd correspondence • UNL-tree morpho-syntactic lattice

  3. L1 L1 L2 L2 L0 L0 P … … Ln Ln Motivation : multi-source & multi-target quality MT • intrinsic limitsof the classic procedure • 30 standard pages of 250 words, in hours (transl.+revision): • 40*nby THum, (12+24/3)*n = 20 nby THAM (6=,12≈,12≠) • 8*nwith THAM + MAT-R, and(30/3)*n = 10*nifMAT-R only • ==> (30/3)*n = 10or even(30/2)*n = 15, independently ofn ?

  4. Basic Idea : to edit the « pivot » indirectly, by editing the text • examples of successful « co-edition » • Ambassador (old), MULTIMETEO • 2 problems: • To build the correspondence • abstract stucture concret text • Without building new and costly resources • To constructa really usable interface

  5. Whyuse UNL as the pivot ? • briefreminder • UNL is • a project • an artificial langage • a format of amultilingual document • UNL language has unique features • even if it is perfectible !

  6. Language : a simpleUNL graph • Ronaldo has headed the ball into the left corner of the goal

  7. explain :01 explain :01 Isaac:01 agt obj agt obj fall fall obj Isaac :01 and obj and apple apple agt obj see see :01 Isaac:01 agt obj UNL (hyper) graph UNL tree (auxiliary) aUNL graph with recursion and its auxiliary UNL-tree Isaac sees that an apple falls and he explains it. agt(explain(icl>do).@entry,Isaac(icl>proper noun)) obj(explain(icl>do).@entry,:01) obj:01(fall(icl>occur).@entry,apple) and(explain(icl>do).@entry,see(icl>do)) agt(see(icl>do),Isaac(icl>proper noun) obj(see(icl>do),:01)

  8. What is the UNL language ? • Small ongoing controversy… • A way to look at a UNL (hyper)graph : • it corresponds to an utteranceU-L in language L • by representing the abstract structure • of an equivalent English utterance U-E • as « viewed from L » • ==> the semantic attributes not necessarily expressed in L may be absent : frequent under specification • aspect coming from French, • determination or number coming from Japanese, • etc.

  9. The reasons of using UNL inMT • (and it can be used in many other ways!) • Technical success of pivot MT does exist • (ATLAS, PIVOT, ULTRA, KANT) • UNL derives from the pivot of ATLAS-II (Fujitsu) • & was designed by the same author (H. Uchida) • Possible quality & coverage : • ATLAS-II has been the best E J system since > 10 years • It has 586.000 terms in each dictionary • Gives certainly a lesser asymptomatic quality than transfer of/via « multi-level structure» • BUT UNL can be « coedited » from any source language

  10. Theoriginal UNL-html format

  11. Theequivalent UNL-xml format • As simple as UNL-html • Open to all XML-related tools <unl:GS lang=“cn”> 我昨天在公園裡跑步</unl:GS> <unl:GS lang=“de”> Ich lief in den Park gestern. </unl:GS> <unl:GS lang=“el”> I ran in the pary yesterday.</unl:GS> <unl:GS lang=“es”> Yo corri ayer en el parque.</unl:GS> <unl:GS lang=“fr”> J’ai couru dans le parc hier. </unl:GS> </unl:S></unl:P></unl:D> <unl:Don=“WJT” dt=“04032002”> <unl:P number=“1”> <unl:S number=“1’> <unl:org: lang=“cn”> 我昨天在公園裡跑步 </unl:org> <unl:unl sn=“Ariane” pn=“WJT” dt=“04032002”> agt(run.@entry.@past,i) plc(run.@entry.@past,park.@def) tim(run.@entry.@past,yesterday) </unl:unl>

  12. Output of the UNL-viewer and display in a browser

  13. Scenario • User reads a multilingual document in language Li • User wishes to correct some errors in Li • User switches to the coedition environment • User’s corrections will be executed • later on the text • immediatelyon the graph • User asks for deconversion into Li • User iterates corrections if not satisfied, asks for deconversion into L1…Ln when OK • User returns to reading mode

  14. Example of « FB2004 » cited in the article • FB2004 = Forum Barcelona 2004 • project of « UNL usage normalisation » • Spanish, Italian, Russian, French, Hindi • 6-8 months in year 2001

  15. Example • Une cité retrouvera une zone côtière après un forum • A city will retrieve a coastal area after a forum • Some (not all) possible corrections: • La cité retrouvera une zone côtière après leForum • The city will retrieve a coastal area after the Forum

  16. The initial UNL graph agt(retrieve(icl>do).@entry.@future.@complete, city) tim(retrieve(icl>do).@entry.@future.@complete, after) obj(retrieve(icl>do).@entry.@future.@complete, zone(icl>place).@indef ) obj(after, forum) mod(zone(icl>place).@indef , coastal)

  17. The modifications needed • add".@def" on the nodes containing "city", "forum". • add « icl>proper name » as a restriction of the UW "forum".

  18. The corrected UNL graph agt(retrieve(icl>do).@entry.@future.@complete, city.@def) tim(retrieve(icl>do).@entry.@future.@complete, after) obj(retrieve(icl>do).@entry.@future.@complete, zone(icl>place).@indef ) obj(after, forum(icl>proper name).@def) mod(zone(icl>place).@indef , coastal)

  19. The interface • Accessible directly from a browser • Normal mode – without seeing the intermediate structure • Expert mode (for hackers, fans, young people…)

  20. English Show Graph Deconversion Find Lemma Find Correspondence Save Graph After a Forum, a city will retrieve a coastal zone. Une cité retrouvera une zone côtière après un forum. After the Forum, the city will have recovered a coastal zone. Spanish Ciudad recobrarà una zona de costal después Foro. La ciudad habrá recobrado una zona de costal después el Foro Italian Città ricuperarà une zona costiera dopo Forum. La città ha ricuperarà une zona costiera dopo il Forum. Japanese フォーラムの後で, 都市は沿岸水域を取り出 Original text Une cité retrouvera une zone côtière après un forum. フォーラムの後で, 都市は沿 岸水域を取り出すことを 持っている。 Possible Modifications Chinese 在博覽會之後,城市 將獲得一片海岸域 Second Deconversion 在博覽會之後,城市 將獲得 一片海岸域 La cité retrouvera une zone côtière après le Forum. Manual Insertion Simple text view Multiple text view Save Quit

  21. English Show Graph Deconversion Find Lemma Find Correspondence Save Graph After a Forum, a city will retrieve a coastal zone. Une cité retrouvera une zone côtière après un forum. After the Forum, the city will have retrieved a coastal zone. remember dormatory Spanish zone retrieve Ciudad recobrarà una zona de costal después Foro. a coastal after a Forum a area city La ciudad habrá una zona de costal después el Foro find Italian un cité retrouver un zone côtier après un Forum Città ricuperarà une zona costiera dopo Forum. La città ha ricuperarà une zona costiera dopo il Forum. Japanese フォーラムの後で, 都市は沿岸水域を取り出 フォーラムの後で, 都市は沿 岸水域を取り出すことを 持っている。 Original text Une cité retrouvera une zone côtière après un forum. Chinese Possible Modifications 在博覽會之後,城市 將獲得一片海岸域 Second Deconversion 在博覽會之後,城市 將獲得 一片海岸域 Manual Insertion Graph : lemma, lattice Save Quit

  22. Show Graph Deconversion Find Lemma Find Correspondence Save Graph English After a Forum, a city will retrieve a coastal zone. Une citéretrouvera une zonecôtièreaprès un forum. After the Forum, the city will have recovered a coastal zone. remember dormitory zone Spanish a retrieve a coastal after a Forum Ciudad recobrará una zona de costal después Foro. city area find un cité retrouver un zone côtier après un Forum La ciudad habrá recobrado una zona de costal después el Foro. indef art noun verb indef art noun adj prop indef art noun sin sin future sin sin sin sin sin Italian Città ricuperarà une zona costiera dopo Forum. La città ha ricuperato une zona costiera dopo il Forum. zone(icl>place)(.@indef, obj) coastal(mod) retrieve (icl>do)(.@entry.@future) Japanese フォーラムの後で, 都市は沿岸水域を取り出 after(tim) Forum(obj) city(agt) フォーラムの後で, 都市は沿 岸水域を取り出すことを 持っている。 Original text Une citéretrouveraunezonecôtièreaprès un forum. Chinese To Do la le Maj 在博覽會之後,城市 將獲得一片海岸域 Second Deconversion La cité retrouvera une zone côtière après le Forum. 在博覽會之後,城市 將獲得 一片海岸域 Manual Insertion Graph : correspondence Simple text view Multiple text view Save Quit

  23. Principlesof coedition • In general, it is impossible to deduce a modification on the graph from a modification on the text • For example, to replace "un" ("a") by "le" ("the") doesn’t imply that the following noun is determined (.@def) • "il aime la montagne" = "he likes mountains" • The revision is not made directly by modifying the text, but by usinga system of menus • The menu items have a « language side » and ahidden « UNL side »

  24. Construction of the correspondence • Division in three sub-correspondences • UNL graph • =1= • UNL treeobtained from unfolding the graph • =2= • SMS lattice or morpho-syntactic automaton • =3= • Text • For =2=, computation of the links by using • 1 or 2 dictionaries L0English or UNL • Known relations between L0 attributesand UNL attributes • Maximal projectivityprinciple (tree rotation)

  25. Following demo on Internet Explorer • Scenario • Views of the structures • Construction of the correspondence • Conclusion (anticipated!) • « Shared revision » seems possible • Coedition = concept unifyingMT andmultilingual generation

More Related