The Construction Of Bilingual Knowledge Bank Based On a Bitext Synchronous Parsing Technique

Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema. The Construction of a Bilingual Knowledge Bank Based on a Bitext Synchronous Parsing Technique. Computer Aided Translation Unit, School of Computer Sciences, University Science Malaysia.

Presentation Transcript


  1. Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema. The Construction of a Bilingual Knowledge Bank Based on a Bitext Synchronous Parsing Technique. Computer Aided Translation Unit, School of Computer Sciences, University Science Malaysia.

  2. Presentation Outline • Introduction • Structured String-Tree Correspondence (SSTC) • Synchronous Structured String-Tree Correspondence (synchronous SSTC) • EBMT based on synchronous SSTC • The Construction of a BKB Based on the Synchronous SSTC • Bitext Word-level Mapping (Word Alignment) • Bitext Synchronous Parsing Technique

  3. The Structured String-Tree Correspondence (SSTC). SSTC = string + arbitrary tree structure + correspondence. Correspondence = node(X/Y), where X:SNODE is the interval of the substring that corresponds to the node, and Y:STREE is the interval of the substring that corresponds to the subtree having the node as root. Example (figure). String: 0 all 1 cats 2 eat 3 mice 4. Tree: eat(2-3/0-4) is the root, with children cats(1-2/0-2) and mice(3-4/3-4); all(0-1/0-1) is the child of cats. For the node eat, SNODE = 2-3 (the substring "eat") and STREE = 0-4 (the substring "all cats eat mice").
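The SSTC definition on this slide can be sketched as a small data structure. This is a minimal illustration only, not the authors' implementation: the `Node` class, the `substring` helper and the `walk` generator are hypothetical names, and the sketch assumes every interval is a single contiguous span (later slides also use discontiguous intervals such as 1-2+4-5).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) offsets between words, as on the slide


@dataclass
class Node:
    """One tree node with its correspondence X/Y (hypothetical class name)."""
    label: str
    snode: Interval  # X: substring that corresponds to the node itself
    stree: Interval  # Y: substring covered by the subtree rooted at the node
    children: List["Node"] = field(default_factory=list)


def substring(tokens: List[str], interval: Interval) -> str:
    """Return the words lying between the two offsets of an interval."""
    start, end = interval
    return " ".join(tokens[start:end])


# String: 0 all 1 cats 2 eat 3 mice 4
tokens = "all cats eat mice".split()

# Tree from the slide: eat(2-3/0-4) -> cats(1-2/0-2) -> all(0-1/0-1), mice(3-4/3-4)
all_node = Node("all", (0, 1), (0, 1))
cats = Node("cats", (1, 2), (0, 2), [all_node])
mice = Node("mice", (3, 4), (3, 4))
eat = Node("eat", (2, 3), (0, 4), [cats, mice])


def walk(node: Node, depth: int = 0):
    """Yield (depth, node) pairs in pre-order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


for depth, node in walk(eat):
    print("  " * depth + node.label,
          "SNODE:", substring(tokens, node.snode),
          "| STREE:", substring(tokens, node.stree))
```

Keeping the intervals on the nodes, rather than deriving them from the tree shape, is what lets the tree be arbitrary with respect to the word order of the string.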

  4. SNODE and STREE of a node (figure). Tree: eat(2-3/0-4), cats(1-2/0-2), mice(3-4/3-4), all(0-1/0-1). String: 0 all 1 cats 2 eat 3 mice 4. For the node cats, SNODE = 1-2 (the substring 1 cats 2) and STREE = 0-2 (the substring 0 all 1 cats 2).

  5. English source sentence: "he picks the ball up". Malay target sentence: "dia kutip bola itu". English SSTC: pick[v] up[p] (1-2+4-5/0-5); he[n] (0-1/0-1); ball[n] (3-4/2-4); the[det] (2-3/2-3). String: 0he1pick2the3ball4up5. Malay SSTC: kutip[v] (1-2/0-4); dia[n] (0-1/0-1); bola[n] (2-3/2-4); itu[det] (3-4/3-4). String: 0dia1kutip2bola3itu4. Translation units (E, M): IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4). IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4).
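The translation units of this example can be read back into word pairs with a few lines of code. This is a hedged sketch: `parse_span`, `text_of` and `translation_units` are hypothetical helper names, and the "+" notation is assumed to join the parts of a discontiguous interval, as in 1-2+4-5 for "pick ... up".

```python
from typing import List, Tuple

Span = List[Tuple[int, int]]  # [(1, 2), (4, 5)] encodes the interval "1-2+4-5"


def parse_span(text: str) -> Span:
    """Parse interval notation such as '1-2+4-5' into a list of (start, end)."""
    span = []
    for part in text.split("+"):
        start, end = part.split("-")
        span.append((int(start), int(end)))
    return span


def text_of(tokens: List[str], span: Span) -> str:
    """Concatenate the words covered by each part of a span."""
    return " ".join(" ".join(tokens[s:e]) for s, e in span)


english = "he pick the ball up".split()   # 0he1pick2the3ball4up5
malay = "dia kutip bola itu".split()      # 0dia1kutip2bola3itu4

# (English interval, Malay interval) pairs copied from the slide.
index_snode = [("1-2+4-5", "1-2"), ("0-1", "0-1"), ("3-4", "2-3"), ("2-3", "3-4")]
index_stree = [("0-5", "0-4"), ("0-1", "0-1"), ("2-4", "2-4"), ("2-3", "3-4")]


def translation_units(pairs):
    """Turn interval pairs into (English text, Malay text) pairs."""
    return [(text_of(english, parse_span(e)), text_of(malay, parse_span(m)))
            for e, m in pairs]


print(translation_units(index_snode))  # word-level units
print(translation_units(index_stree))  # phrase-level units
```

The IndexSnode pairs give lexical units such as "pick up" and "kutip", while the IndexStree pairs give phrasal units such as "the ball" and "bola itu".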

  6. English source sentence: "I did not give it to him". French target sentence: "Je ne le lui ai pas donné". English SSTC: not[neg] (2-3/0-7); did[v] give[v] (1-2+3-4/3-7); I[n] (0-1/0-1); it[n] (4-5/4-5); to[p] (5-6/5-7); him[n] (6-7/6-7). String: 0I1did2not3give4it5to6him7. French SSTC: ne[neg] pas[neg] (1-2+5-6/0-7); ai[v] donné[v] (4-5+6-7/0-1+2-5+6-7); Je[n] (0-1/0-1); le[n] (2-3/2-3); lui[n] (3-4/3-4). String: 0Je1ne2le3lui4ai5pas6donné7. Translation units (E, F): IndexStree: (0-7,0-7), (0-2+3-7,0-1+2-5+6-7), (0-1,0-1), ... IndexSnode: (2-3,1-2+5-6), (1-2+3-4,4-5+6-7), (0-1,0-1), (4-5,2-3), (5-6,-), (6-7,3-4).

  7. English source sentence: "hopefully Kim miss Dale". French target sentence: "on espère que Dale manque à Kim". English SSTC: miss[v] (2-3/0-4); hopefully[adv] (0-1/0-1); Kim[n] (1-2/1-2); Dale[n] (3-4/3-4). String: 0hopefully1Kim2miss3Dale4. French SSTC: manque[v] à[p] (4-5+5-6/0-7); on[n] espère[v] que[c] (0-1+1-2+2-3/0-3); Dale[n] (3-4/3-4); Kim[n] (6-7/6-7). String: 0on1espère2que3Dale4manque5à6Kim7. Translation units (E, F): IndexStree: (0-4,0-7), (0-1,0-3), (1-2,6-7), (3-4,3-4). IndexSnode: (2-3,4-5+5-6), (0-1,0-1+1-2+2-3), (1-2,6-7), (3-4,3-4).

  8. Example-Based Machine Translation (EBMT). EBMT is the case-based reasoning approach to MT. EBMT uses translated examples of similar sentences to translate a given source sentence into the target sentence.

  9. The general architecture for EBMT (figure). Source sentence; find the closest related SL examples; retrieve the corresponding TL examples; combination; target sentence. The BKB holds the examples for the source language, the examples for the target language, and the correspondence between them.

  10. EBMT based on synchronous SSTC (figure). Components: source sentence; tagger; tagged source sentence; list of sub-synchronous SSTCs generated based on the source sentence; a chosen closest synchronous SSTC example from the BKB; list of sub-synchronous SSTCs constructed from the chosen example; replacement and combination; the resultant synchronous SSTC; target sentence. Different senses for the word "bank": bank 1: land beside the river; bank 2: a place to keep money. E.g.: The1 man2 keep1 his1 money1 in1 the1 bank2.

  11. Examples. (1) English sentence: He pick the ball up. Malay translation: Dia kutip bola itu. (2) English sentence: The lamp is off. Malay translation: Lampu itu padam. (3) English sentence: The green signal turn on. Malay translation: Isyarat hijau itu bertukar. (4) English sentence: The old man drink tea. Malay translation: Lelaki tua itu minum teh. Source sentence: The old man picks the green lamp up.

  12. A set of synchronous SSTCs represents the example-base. Example 1 (1E, 1M). English sentence: He pick the ball up. Malay translation: Dia kutip bola itu. English SSTC: pick(1)[v] up(1)[p] (1-2+4-5/0-5); he(1)[n] (0-1/0-1); ball(1)[n] (3-4/2-4); the(1)[det] (2-3/2-3). String: 0he1pick2the3ball4up5. Malay SSTC: kutip(1)[v] (1-2/0-4); dia(1)[n] (0-1/0-1); bola(1)[n] (2-3/2-4); itu(1)[det] (3-4/3-4). String: 0dia1kutip2bola3itu4. IndexStree: (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4). IndexSnode: (1-2+4-5,1-2), (0-1,0-1), (3-4,2-3), (2-3,3-4). Example 2 (2E, 2M). English sentence: The lamp is off. Malay translation: Lampu itu padam. English SSTC: is(2)[v] off(1)[adv] (2-3+3-4/0-4); lamp(1)[n] (1-2/0-2); the(1)[det] (0-1/0-1). String: 0the1lamp2is3off4. Malay SSTC: padam(1)[v] (2-3/0-3); lampu(1)[n] (0-1/0-2); itu(1)[det] (1-2/1-2). String: 0lampu1itu2padam3. IndexStree: (0-4,0-4), (0-2,0-2), (0-1,1-2). IndexSnode: (2-3+3-4,2-3), (1-2,0-1), (0-1,1-2).

  13. Example 3 (3E, 3M). English sentence: The green signal turn on. Malay translation: Isyarat hijau itu bertukar. English SSTC: turn(1)[v] on(1)[adv] (3-4+4-5/0-5); signal(2)[n] (2-3/0-3); the(1)[det] (0-1/0-1); green(1)[adj] (1-2/1-2). String: 0the1green2signal3turn4on5. Malay SSTC: bertukar(2)[v] (3-4/0-4); isyarat(1)[n] (0-1/0-3); hijau(1)[adj] (1-2/1-2); itu(1)[det] (2-3/2-3). String: 0Isyarat1hijau2itu3bertukar4. IndexStree: (0-5,0-4), (0-3,0-3), (0-1,2-3), (1-2,1-2). IndexSnode: (3-4+4-5,3-4), (2-3,0-1), (0-1,2-3), (1-2,1-2). Example 4 (4E, 4M). English sentence: The old man drinks tea. Malay translation: Lelaki tua itu minum teh. English SSTC: drink(1)[v] (3-4/0-5); man(1)[n] (2-3/0-3); tea(1)[n] (4-5/4-5); the(1)[det] (0-1/0-1); old(1)[adj] (1-2/1-2). String: 0the1old2man3drink4tea5. Malay SSTC: minum(1)[v] (3-4/0-5); lelaki(1)[n] (0-1/0-3); teh(1)[n] (4-5/4-5); itu(1)[det] (2-3/2-3); tua(1)[adj] (1-2/1-2). String: 0lelaki1tua2itu3minum4teh5. IndexStree: (0-5,0-5), (0-3,0-3), (0-1,2-3), (1-2,1-2), (4-5,4-5). IndexSnode: (3-4,3-4), (2-3,0-1), (0-1,2-3), (1-2,1-2), (4-5,4-5).

  14. Source sentence and example trees (figure). Source: the old man picks the green lamp up. Source tree: pick[v] (3-4/0-8); man[n] (2-3/0-3) with the[det] (0-1/0-1) and old[adj] (1-2/1-2); lamp[n] (6-7/4-7) with the[det] (4-5/4-5) and green[adj] (5-6/5-6); up[p] (7-8/-). Example trees: (1) 0the1green2signal3turn4on5: turn[v] on[adv] (3-4+4-5/0-5); signal[n] (2-3/0-3); the[det] (0-1/0-1); green[adj] (1-2/1-2). (2) 0the1boy2pick3the4ball5up6: pick[v] up[p] (2-3+5-6/0-6); boy[n] (1-2/0-2); ball[n] (4-5/3-5); the[det] (0-1/0-1); the[det] (3-4/3-4). (3) 0the1lamp2is3off4: is[v] off[adv] (2-3+3-4/0-4); lamp[n] (1-2/0-2); the[det] (0-1/0-1). (4) 0the1old2man3drink4tea5: drink[v] (3-4/0-5); man[n] (2-3/0-3); tea[n] (4-5/4-5); the[det] (0-1/0-1); old[adj] (1-2/1-2).

  15. Sub-synchronous SSTCs for the source sentence. (1) English: man(1)[n] (2-3/0-3); the(1)[det] (0-1/0-1); old(1)[adj] (1-2/1-2); string 0the1old2man3. Malay: lelaki(1)[n] (0-1/0-3); tua(1)[adj] (1-2/1-2); itu(1)[det] (2-3/2-3); string 0lelaki1tua2itu3. IndexStree: (0-3,0-3), (0-1,2-3), (1-2,1-2). IndexSnode: (2-3,0-1), (0-1,2-3), (1-2,1-2). (2) English: pick(1)[v] (3-4/3-4); string 3pick4. Malay: kutip(1)[v] (3-4/3-4); string 3kutip4. IndexStree: (3-4,3-4). IndexSnode: (3-4,3-4). (3) English: lamp(1)[n] (6-7/4-7); green(1)[adj] (5-6/5-6); the(1)[det] (4-5/4-5); string 4the5green6lamp7. Malay: lampu(1)[n] (4-5/4-7); hijau(1)[adj] (5-6/5-6); itu(1)[det] (6-7/6-7); string 4lampu5hijau6itu7. IndexStree: (4-7,4-7), (4-5,6-7), (5-6,5-6). IndexSnode: (6-7,4-5), (4-5,6-7), (5-6,5-6). (4) English: up(1)[p] (7-8/7-8); string 7up8; no Malay counterpart. IndexStree: (7-8,-). IndexSnode: (7-8,-).

  16. Selected closest example, and the sub-synchronous SSTCs derived from the example. Selected example (1E, 1M): English sentence: He pick the ball up. Malay translation: Dia kutip bola itu. (1) he(1)[n] (0-1/0-1); dia(1)[n] (0-1/0-1); strings 0he1 and 0dia1. IndexStree: (0-1,0-1). IndexSnode: (0-1,0-1). (2) pick(1)[v] (1-2/0-5); kutip(1)[v] (1-2/0-4); strings 1pick2 and 1kutip2. IndexStree: (0-5,0-4). IndexSnode: (1-2,1-2). (3) ball(1)[n] (3-4/2-4); the(1)[det] (2-3/2-3); bola(1)[n] (2-3/2-4); itu(1)[det] (3-4/3-4); strings 2the3ball4 and 2bola3itu4. IndexStree: (2-4,2-4), (2-3,3-4). IndexSnode: (2-3,0-1), (3-4,2-3). (4) up(1)[p] (4-5/-); string 4up5; no Malay counterpart. IndexStree: (-,-). IndexSnode: (4-5,-).

  17. Sub-synchronous SSTCs (figure). The sub-synchronous SSTCs of the example sentence and of the source sentence are shown side by side, paired by number. (1) Example: he / dia (0-1,0-1). Source: the old man / lelaki tua itu (0-3,0-3). (2) Example: pick / kutip (1-2,1-2). Source: pick / kutip (3-4,3-4). (3) Example: the ball / bola itu (2-4,2-4). Source: the green lamp / lampu hijau itu (4-7,4-7). (4) Example: up (4-5,-). Source: up (7-8,-).

  18. Replacement (figure), part (2). Source part: pick(1)[v] (3-4/3-4); kutip(1)[v] (3-4/3-4); strings 3pick4 and 3kutip4; IndexStree (3-4,3-4); IndexSnode (3-4,3-4). Example part: pick(1)[v] (1-2/0-5); kutip(1)[v] (1-2/0-4); strings 1pick2 and 1kutip2; IndexStree (0-5,0-4); IndexSnode (1-2,1-2). In example 1 (0he1pick2the3ball4up5, 0dia1kutip2bola3itu4), the root nodes pick(1)[v] up(1)[p] (1-2+4-5/0-5) and kutip(1)[v] (1-2/0-4) take the intervals of the source part and become pick(1)[v] up(1)[p] (3-4+4-5/3-4) and kutip(1)[v] (3-4/3-4).

  19. Replacement (figure), part (1). Source part: man(1)[n] (2-3/0-3), the(1)[det] (0-1/0-1), old(1)[adj] (1-2/1-2); lelaki(1)[n] (0-1/0-3), tua(1)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3); strings 0the1old2man3 and 0lelaki1tua2itu3; IndexStree (0-3,0-3), (0-1,2-3), (1-2,1-2); IndexSnode (2-3,0-1), (0-1,2-3), (1-2,1-2). Example part: he(1)[n] (0-1/0-1); dia(1)[n] (0-1/0-1); strings 0he1 and 0dia1; IndexStree (0-1,0-1); IndexSnode (0-1,0-1). In example 1, the subtrees he and dia are replaced by the source subtrees man and lelaki, and the root becomes pick(1)[v] up(1)[p] (3-4+7-8/3-4). Resulting strings: 0the1old2man3pick4the5ball6up7 and 0lelaki1tua2itu3kutip4bola5itu6.

  20. Generation (figure). The resultant synchronous SSTC has the roots pick(1)[v] up(1)[p] (3-4+7-8/0-8) and kutip(1)[v] (3-4/3-4), with the subtrees man / lelaki (the / itu, old / tua) and lamp / lampu (the / itu, green / hijau). Strings: the old man pick the green lamp up (0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8); lelaki tua itu kutip lampu hijau itu (0-1 1-2 2-3 3-4 4-5 5-6 6-7). The translation for the source sentence is generated from the Malay part of the synchronous SSTC, namely the string in that SSTC. The translation: lelaki tua itu kutip lampu hijau itu.
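The generation step can be illustrated by reading the target string off the Malay nodes in SNODE order. This is only a sketch under assumptions: the `generate` function is a hypothetical name, and the node intervals are taken from the target string offsets shown on the slides (0-1 to 6-7), listed here in an arbitrary tree order.

```python
# Malay nodes of the resulting SSTC as (word, SNODE interval), in tree order.
malay_nodes = [
    ("kutip", (3, 4)),
    ("lelaki", (0, 1)),
    ("itu", (2, 3)),
    ("tua", (1, 2)),
    ("lampu", (4, 5)),
    ("itu", (6, 7)),
    ("hijau", (5, 6)),
]


def generate(nodes):
    """Recover the target string by sorting the nodes on their SNODE start offset."""
    ordered = sorted(nodes, key=lambda item: item[1][0])
    return " ".join(word for word, _ in ordered)


translation = generate(malay_nodes)
print(translation)  # lelaki tua itu kutip lampu hijau itu
```

Because the word order lives in the intervals and not in the tree shape, no separate linearisation rules are needed in this sketch.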

  21. EBMT General Problems • How to utilize more than one example to translate one source sentence: the construction of well-formed target language sentences from extracted fragments of a BKB. • Lack of flexibility in representing translation relations between source and target substrings: the treatment of wild linguistic phenomena, which are non-standard, e.g. crossed dependencies. Our approach overcomes these problems.

  22. Transfer Approach to MT (figure). Source; Analysis; Transfer; Synthesis; Target.

  23. The general architecture for EBMT (figure). Source sentence; find the closest related SL examples; retrieve the corresponding TL examples; combination; target sentence. The BKB holds the examples for the source language, the examples for the target language, and the correspondence between them.

  24. How to Construct the Bilingual Knowledge Bank (BKB), or Example-Base. Substantial reservation!

  25. The Construction of a BKB Based on the Synchronous SSTC, Based on Bitext Synchronous Parsing Technique • BiText: text that is available in two languages. S: English: The basic idea of example-based parsing is very simple: it is to find the corresponding representation for an input sentence based on the representations of similar sentences in the example-base. T: Malay: Idea asas bagi penghuraian berasaskan-contoh adalah mudah: iaitu untuk mencari perwakilan yang sepadan bagi suatu ayat input berdasarkan perwakilan ayat yang serupa dalam pengkalan-contoh.

  26. Schema (figure). Components: bi-text (English source, Malay target); alignment process at sentence level, phrase level and word level; bilingual dictionary; Apple Pie Parser, producing bracketed output such as ( S ( NP . ( ..(..))) ( S ( VP …( ..(..))); SSTC; SSTC Editor; synchronous SSTC; BKB. Steps: • Parsing and POS tagging for the English source text. • Compile the APP output into the SSTC for the English source text. • Build the SSTC for the Malay target text based on the SSTC for the English source text, using the word alignment.

  27. Schema (figure, repeated). Components: bi-text (English source, Malay target); alignment process at sentence level, phrase level and word level; bilingual dictionary; Apple Pie Parser; SSTC; SSTC Editor; synchronous SSTC; BKB.

  28. Bitext Word-level Mapping (Word Alignment). Real texts are noisy: - Fertility: a single word in the source sentence may correspond to zero, one, two or more words in the target sentence, and vice versa. - Crossed dependencies (distortion): human translators change and rearrange material, so the target text does not follow the word order of the source text.

  29. ±n Context Window Word Alignment. The correspondence between the source and the target is denoted by an interval attached to each subtext according to its offset in the text. S: English: 0The1basic2idea3of4example5-6based7parsing8is9very10simple11:12It13is14to15find16the17corresponding18representation19for20an21input22sentence23based24on25the26representations27of28similar29sentences30in31the32example33-34base35.36 T: Malay: 0Idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah8mudah9:10Iaitu11untuk12mencari13perwakilan14yang15sepadan16bagi17suatu18ayat19input20berdasarkan21perwakilan22ayat23yang24serupa25dalam26pengkalan27-28contoh29.30

  30. ±n Context Window Word Alignment. Find the TPCs between the source and the target, using: • Bilingual dictionary. • Cognate words, e.g. Computer / Komputer. • Dice coefficient: Dice = 2 prob(S,T) / [prob(S) + prob(T)], where prob(S) and prob(T) are the probabilities of S and T occurring in the text, and prob(S,T) is the probability of both co-occurring in the same bitext segment.
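The Dice coefficient above can be computed directly from a segment-aligned bitext. The sketch below is illustrative only: the function name `dice` is hypothetical, the probabilities are assumed to be estimated as the fraction of bitext segments in which a word occurs, and the toy bitext reuses the four example sentence pairs from the example-base slides.

```python
def dice(source_word, target_word, bitext):
    """Dice = 2 * prob(S,T) / (prob(S) + prob(T)) over aligned segments.

    Each bitext item is a (source tokens, target tokens) pair; probabilities
    are estimated as segment frequencies (an assumption of this sketch).
    """
    n = len(bitext)
    in_source = sum(1 for s, _ in bitext if source_word in s)
    in_target = sum(1 for _, t in bitext if target_word in t)
    in_both = sum(1 for s, t in bitext if source_word in s and target_word in t)
    if in_source + in_target == 0:
        return 0.0  # neither word occurs (also covers an empty bitext)
    return 2 * (in_both / n) / (in_source / n + in_target / n)


# Toy bitext built from the four example pairs, lower-cased and tokenised.
bitext = [
    ("he pick the ball up".split(), "dia kutip bola itu".split()),
    ("the lamp is off".split(), "lampu itu padam".split()),
    ("the green signal turn on".split(), "isyarat hijau itu bertukar".split()),
    ("the old man drink tea".split(), "lelaki tua itu minum teh".split()),
]

print(dice("lamp", "lampu", bitext))  # words that always co-occur
print(dice("lamp", "itu", bitext))    # "itu" occurs in every segment
print(dice("lamp", "teh", bitext))    # words that never co-occur
```

On such a small sample a frequent word like "itu" still scores above zero against "lamp", which is one reason the slide combines the Dice score with a bilingual dictionary and cognate matching.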

  31. ±n Context Window Word Alignment. Find out the chains for all possible TPCs for a source word. Example: the source word example(4-5) has two TPCs, contoh(6-7) and contoh(28-29). Source context: basic(1-2) idea(2-3) of(3-4) example(4-5) –(5-6) based(6-7) parsing(7-8). Target chain for contoh(6-7): bagi(2-3) penghuraian(3-4) berasaskan(4-5) –(5-6) contoh(6-7). Target chain for contoh(28-29): –(27-28) contoh(28-29).

  32. ±n Context Window Word Alignment. For every chain, calculate the weight W, defined in terms of: len(seq), the length of the continuous sequence of words; len(gap), the length of the gaps between the words in the chain; len(chain), the length of the chain. Example for example(4-5): contoh(6-7) has W = 1.39; contoh(28-29) has W = 0.60.

  33. Bitext Synchronous Parsing Technique. S: English: 0The1basic2idea3of4example5-6based7parsing8is9very10simple11:12It13is14to15find16the17corresponding18representation19for20an21input22sentence23based24on25the26representations27of28similar29sentences30in31the32example33-34base35.36 T: Malay: 0Idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah8mudah9:10Iaitu11untuk12mencari13perwakilan14yang15sepadan16bagi17suatu18ayat19input20berdasarkan21perwakilan22ayat23yang24serupa25dalam26pengkalan27-28contoh29.30 Selected segment pair: "The basic idea of example-based parsing is very simple" and "Idea asas bagi penghuraian berasaskan–contoh adalah mudah".

  34. Schema (figure, repeated). Components: bi-text (English source, Malay target); alignment process at sentence level, phrase level and word level; bilingual dictionary; Apple Pie Parser; SSTC; SSTC Editor; synchronous SSTC; BKB.

  35. Apple Pie Parser (APP) • It is a bottom-up probabilistic chart parser that finds the parse tree for an input English text. • It was developed at New York University. • The parser generates a syntactic tree in Penn Treebank bracketing. • It is free and available to download with the source code. • http://cs.nyu.edu/cs/projects/proteus/sekine

  36. Apple Pie Parser (APP). Input: The basic idea of example-based parsing is very simple. APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple))). The representation structure and the POS for the English source are obtained.

  37. Schema (figure, repeated). Components: bi-text (English source, Malay target); alignment process at sentence level, phrase level and word level; bilingual dictionary; Apple Pie Parser; SSTC; SSTC Editor; synchronous SSTC; BKB.

  38. Compile the APP output to SSTC structure. APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple))). String: 0the1basic2idea3of4example5-6based7parsing8is9very10simple11. Tree: S (Ø/0-11); NP (Ø/0-8); NPL(1) (Ø/0-3) over The basic idea (0-3/0-3); PP(1) (Ø/3-8) over of (3-4/3-4) and NPL(1) (Ø/4-8); NPL(1) (Ø/4-8) over Example-based parsing (4-8/4-8); VP (Ø/8-11) over is (8-9/8-9) and ADJP(1) (Ø/9-11); ADJP(1) (Ø/9-11) over Very simple (9-11/9-11).
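One way to picture this compilation step is a small reader that walks the bracketed output and records, for each constituent, the STREE interval it covers. This is a hedged sketch, not the authors' compiler: `compile_bracketing` is a hypothetical name, the input is assumed to be well-formed bracketing, and hyphenated words are split into three tokens because the slide counts "example", "-" and "based" as separate offsets.

```python
import re


def compile_bracketing(text):
    """Return (words, nodes); each node is (label, start, end) in word offsets.

    Constituent nodes carry only an STREE interval here, matching the
    (Ø/start-end) annotation of the non-terminal nodes on the slide.
    """
    words, nodes, stack = [], [], []
    for tok in re.findall(r"\(|\)|[^\s()]+", text):
        if tok == "(":
            stack.append(None)                 # label arrives with the next token
        elif tok == ")":
            label, start = stack.pop()
            nodes.append((label, start, len(words)))
        elif stack and stack[-1] is None:
            stack[-1] = (tok, len(words))      # constituent label and left offset
        else:
            # A word: split "example-based" into "example", "-", "based".
            words.extend(p for p in re.split(r"(-)", tok) if p)
    return words, nodes


app_output = ("(S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) "
              "(VP is (ADJP very simple)))")
words, nodes = compile_bracketing(app_output)

print(" ".join(f"{i}{w}" for i, w in enumerate(words)) + f" {len(words)}")
for label, start, end in nodes:
    print(f"{label} (Ø/{start}-{end})")
```

The nodes come out in post-order (innermost constituents first), and the intervals agree with those on the slide, for instance PP at 3-8 and S at 0-11.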

  39. Lexical Transfer. English: The basic idea of example-based parsing is very simple. String: 0the1basic2idea3of4example5-6based7parsing8is9very10simple11. Tree: S (Ø/0-11); NP (Ø/0-8); NPL(1) (Ø/0-3) over The basic idea (0-3/0-3); PP(1) (Ø/3-8) over of (3-4/3-4) and NPL(1) (Ø/4-8) over Example-based parsing (4-8/4-8); VP (Ø/8-11) over is (8-9/8-9) and ADJP(1) (Ø/9-11) over Very simple (9-11/9-11). Malay: Idea asas bagi penghuraian berasaskan–contoh adalah mudah. String: 0idea1asas2bagi3penghuraian4berasaskan5-6contoh7adalah8mudah9. Tree: S (Ø/0-9); NP (Ø/0-7); NPL(1) (Ø/0-2) over Idea asas (0-2/0-2); PP(1) (Ø/2-7) over bagi (2-3/2-3) and NPL(1) (Ø/3-7) over Penghuraian berasaskan-contoh (3-7/3-7); VP (Ø/7-9) over adalah (7-8/7-8) and ADJP(1) (Ø/8-9) over mudah (8-9/8-9).
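The Malay intervals on this slide can be reproduced by projecting each English STREE interval through the alignment of the leaf chunks. The sketch below is an assumption about how such a projection might work, not a description of the authors' procedure: `project` is a hypothetical name, the alignment is entered by hand from the slide, and each projected interval is taken as the smallest contiguous span covering all aligned target chunks.

```python
# Leaf-chunk alignment read off the slide: English interval -> Malay interval.
alignment = {
    (0, 3): (0, 2),    # The basic idea         -> Idea asas
    (3, 4): (2, 3),    # of                     -> bagi
    (4, 8): (3, 7),    # example-based parsing  -> penghuraian berasaskan-contoh
    (8, 9): (7, 8),    # is                     -> adalah
    (9, 11): (8, 9),   # very simple            -> mudah
}

# English constituents and their STREE intervals (labels made unique here).
english_nodes = {
    "S": (0, 11), "NP": (0, 8), "NPL_1": (0, 3), "PP": (3, 8),
    "NPL_2": (4, 8), "VP": (8, 11), "ADJP": (9, 11),
}


def project(node_span, alignment):
    """Project an English STREE interval onto the Malay string."""
    start, end = node_span
    covered = [malay for (e_start, e_end), malay in alignment.items()
               if start <= e_start and e_end <= end]
    if not covered:
        return None  # nothing aligned under this node
    return (min(s for s, _ in covered), max(e for _, e in covered))


malay_nodes = {label: project(span, alignment)
               for label, span in english_nodes.items()}
for label, span in malay_nodes.items():
    print(label, span)
```

Taking the covering span assumes the target constituent is contiguous; crossed dependencies would need discontiguous intervals of the kind written with "+" elsewhere in the slides.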

  40. Schema (figure, repeated). Components: bi-text (English source, Malay target); alignment process at sentence level, phrase level and word level; bilingual dictionary; Apple Pie Parser; SSTC; SSTC Editor; synchronous SSTC; BKB.

  41. The synchronous SSTC editor (screenshot). Menus: File, Edit, Correspondences, Windows. The editor displays the English SSTC, rooted at S (Ø/0-11), and the Malay SSTC, rooted at S (Ø/0-9), side by side, with paired nodes NP, VP, NPL(1), PP(1) and ADJP(1), over the strings 0the1 basic2 idea3 of4 example5 –6 based7 parsing8 is9 very10 simple11 and 0Idea1 asas2 bagi3 penghuraian4 berasaskan5 –6 contoh7 adalah8 mudah9.

  42. Discussion. Thank you.
