Monotrans: Human-Computer Collaborative Translation

Crowdsourcing Translation with People Who Speak Only One Language. Chang Hu, Ben Bederson, Philip Resnik. Human-Computer Interaction Lab and Computational Linguistics and Information Processing Lab, University of Maryland.


Presentation Transcript


  1. Monotrans: Human-Computer Collaborative Translation. Crowdsourcing Translation with People Who Speak Only One Language. Chang Hu, Ben Bederson, Philip Resnik. Human-Computer Interaction Lab and Computational Linguistics and Information Processing Lab, University of Maryland

  2. Outline • Why translation by monolingual people? • How Monotrans works • Research prototype • Preliminary evaluation

  3-5. Languages on the Internet by Population (source: Global Reach, Internet World Stats)

  6. A real-world problem: the International Children's Digital Library (www.childrenslibrary.org)

  7. Machine Translation (MT) • Example of ambiguity: 餐厅 = "restaurant" or "dining hall" • Large volume, cheap, fast • Unreliable quality

  8. Professional Translators • High quality, but slow and expensive (even for common language pairs)

  9. Translation with the Crowd • Bottleneck: bilingual people

  10. Translation with the Crowd → Translation with the Monolingual Crowd • Wikipedia: 800 translators vs. 75,000 contributors

  11. [Figure: translation approaches placed on axes of affordability and quality: machine translation; monolingual human participation; amateur bilingual human participation; professional bilingual human participation]

  12. Outline • Why translation by monolingual people? • How Monotrans works • Research prototype • Preliminary evaluation

  13. Basic Idea (diagram with two columns, source language speaker and target language speaker): original source sentence → MT → inaccurate translation → fluent translation (target language speaker) → MT → inaccurate back translation → fluent, accurate source sentence (source language speaker) → MT → et cetera…
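The round trip in the diagram can be sketched as a loop. This is a minimal, hypothetical sketch rather than the Monotrans implementation: the function names, the toy word-for-word "MT" tables, and the rule the simulated source-language speaker uses to accept or rephrase are all assumptions made for illustration. Real MT systems and real people would stand in for the callables.

```python
from typing import Callable, Optional


def toy_mt(table: dict[str, str]) -> Callable[[str], str]:
    """Hypothetical word-for-word 'MT': unknown words pass through unchanged."""
    return lambda text: " ".join(table.get(word, word) for word in text.split())


def monotrans_loop(
    source: str,
    mt_forward: Callable[[str], str],                     # source -> target language
    mt_back: Callable[[str], str],                        # target -> source language
    target_speaker: Callable[[str], str],                 # makes MT output fluent
    source_speaker: Callable[[str, str], Optional[str]],  # None = accept, else rephrased source
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Alternate between the two monolingual speakers, with MT in between,
    until the source-language speaker accepts the back translation."""
    current_source = source
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        # Target-language speaker turns the (possibly inaccurate) MT output into fluent text.
        candidate = target_speaker(mt_forward(current_source))
        # MT carries the fluent candidate back into the source language.
        back_translation = mt_back(candidate)
        # Source-language speaker compares the back translation with the original.
        revision = source_speaker(source, back_translation)
        if revision is None:
            return candidate, round_no
        current_source = revision  # a clearer source sentence for the next round
    return candidate, max_rounds


# Hypothetical toy tables: "histoire" is mistranslated as "business".
FORWARD = {"histoire": "business", "conte": "story", "de": "of", "Cendrillon": "Cinderella"}
BACK = {"business": "affaire", "story": "conte", "of": "de", "Cinderella": "Cendrillon"}


def simulated_source_speaker(original: str, back_translation: str) -> Optional[str]:
    # Assumed behavior: reject a back translation that talks about an "affaire"
    # and rephrase the original with a less ambiguous word.
    if "affaire" in back_translation.split():
        return original.replace("histoire", "conte")
    return None


result = monotrans_loop(
    "histoire de Cendrillon",
    toy_mt(FORWARD),
    toy_mt(BACK),
    target_speaker=lambda text: text,  # the toy output is already "fluent"
    source_speaker=simulated_source_speaker,
)
print(result)  # ('story of Cinderella', 2)
```

One design point the sketch preserves: neither person-callable ever sees both languages. `target_speaker` only reads target-language text and `source_speaker` only reads source-language text, which is the monolingual constraint the protocol relies on.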

  14. A Richer Example

  15-23. [Animated walkthrough of the example: successive MT steps plus an MT enrichment step; text fragments visible on the slides: "Nous entendons", "En général", "In general", "Get along"]

  24. Monotrans Protocol

  25. Outline • Why translation by monolingual people? • How Monotrans works • Research prototype • Preliminary evaluation

  26. [Screenshot of the research prototype; visible annotation controls: Web link, Image, Mark OK, Mark unclear]

  27. Outline • Why translation by monolingual people? • How Monotrans works • Research prototype • Preliminary evaluation

  28. Preliminary Evaluation • Older version of the UI (same protocol) • A children's book, Russian to Chinese • 2 Russian speakers and 4 Chinese speakers formed 4 pairs* • 1 hour per pair

  29. Results • 44 sentences (6 pages) worked on • 28 sentences finished (≈ 4 pages) • Overall translation speed: 50 words per hour • Professional translator speed: 250 words per hour
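The headline figures on the Results slide can be compared directly. This is a quick arithmetic check using only the numbers given there; the slide does not say whether 50 words per hour is per pair or aggregate, so the ratio simply compares the two speeds as stated.

```python
# Figures from the Results slide.
sentences_worked_on = 44
sentences_finished = 28
monotrans_words_per_hour = 50
professional_words_per_hour = 250

# Share of attempted sentences that were finished within the sessions.
completion_rate = sentences_finished / sentences_worked_on
# How many times faster a professional translator is, per the slide.
speed_ratio = professional_words_per_hour / monotrans_words_per_hour

print(f"Finished {completion_rate:.0%} of attempted sentences")  # Finished 64% of attempted sentences
print(f"Professional speed is {speed_ratio:.0f}x Monotrans speed")  # Professional speed is 5x Monotrans speed
```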

  30. Evaluation

  31. Google Translate …

  32. … Monotrans

  33. Where to from here? • Larger and more formal validation of the protocol • Richer annotations: ✓ images, ✓ web links, ✓ marking correct spans, ✓ marking incorrect spans; paraphrase, word clouds, …?? • Large-scale crowd support (CrowdFlow talk @ 1:20 PM)

  34. Take-Away Message • Translation with monolingual people is actually feasible • Monolingual translation can help large-scale translation

  35. Sponsors

  36. Q&A Thank You

  37. Backup slides

  38. Projected Annotation [Kolak 2005] • Project information from one language to another, using word alignments as a bridge • Illustration of how this has been done for natural language annotation
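The projection idea can be illustrated in a few lines. This is a minimal sketch of the general technique (copying labels across word-alignment links), not Kolak's implementation. It assumes alignments are already available as (source index, target index) pairs; here they are written by hand for the Cinderella sentence from the backup slides, and the `gloss:story` label is a hypothetical annotation.

```python
from typing import Iterable, Optional


def project_annotations(
    source_labels: list[Optional[str]],
    alignment: Iterable[tuple[int, int]],
    target_len: int,
) -> list[set[str]]:
    """Copy each source token's label to every target token aligned with it.

    source_labels: one label (or None) per source token.
    alignment: (source_index, target_index) word-alignment links.
    Returns one set of projected labels per target token.
    """
    projected: list[set[str]] = [set() for _ in range(target_len)]
    for src, tgt in alignment:
        label = source_labels[src]
        if label is not None:
            projected[tgt].add(label)
    return projected


source = "Tout le monde doit entendre l'histoire de Cendrillon".split()
target = "Everybody has heard the business by Cinderella".split()

# Hypothetical annotation by the source-language speaker on "l'histoire".
labels: list[Optional[str]] = [None] * len(source)
labels[source.index("l'histoire")] = "gloss:story"

# Hand-written alignment links (an automatic word aligner would normally supply these).
links = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3), (5, 4), (6, 5), (7, 6)]

for word, tags in zip(target, project_annotations(labels, links, len(target))):
    print(word, sorted(tags))
```

Only "the" and "business" receive the label, so the target-language speaker is pointed at exactly the span the MT got wrong without needing to read French.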

  39. Projected annotation • Source: "Tout le monde doit entendre l'histoire de Cendrillon" • MT: "Everybody has heard the business by Cinderella" • With projected annotations: "Everybody has heard the story about Cinderella" • Pilot experiment results: projected annotations helped improve translation

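  40. One of my examples involves "rmvngllthvwlsfrmthwrdsndshwngthtthrdrcnstllndrstndthsntnc" (removing all the vowels from the words and showing that the reader can still understand the sentence).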
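The vowel-free string on this slide can be reproduced mechanically, which makes the point about redundancy concrete: deleting every vowel and every space still leaves text a reader can decode. A small sketch; the function name is hypothetical, and only a, e, i, o, u count as vowels here.

```python
VOWELS = set("aeiouAEIOU")


def strip_vowels_and_spaces(text: str) -> str:
    """Drop vowels and spaces, keeping every other character in order."""
    return "".join(ch for ch in text if ch not in VOWELS and ch != " ")


sentence = ("removing all the vowels from the words and showing "
            "that the reader can still understand the sentence")
print(strip_vowels_and_spaces(sentence))
# rmvngllthvwlsfrmthwrdsndshwngthtthrdrcnstllndrstndthsntnc
```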

  41. Three Types of Errors • Source: "Tout le monde doit entendre l'histoire de Cendrillon." • MT: "Everybody has hear story about Cinderella" • Corrected: "Everybody has heard the story about Cinderella" • I. Detectable and correctable error • Pilot experiment results: post-editing of machine translation output by monolingual people improves translation quality

  42. Three Types of Errors • Source: "Tout le monde doit entendre l'histoire de Cendrillon." • MT: "Everybody has hear story about Cinderella" • MT: "Everybody has heard the business by Cinderella" (communication needed) • Correct translation: "Everybody has heard the story about Cinderella" • II. Detectable but not correctable error

  43. Three Types of Errors • Source: "Tout le monde doit entendre l'histoire de Cendrillon." • MT: "Everybody has hear story about Cinderella" • MT: "Everybody has heard the business by Cinderella" • Correct translation: "Everybody has heard the story about Cinderella" • II. Detectable but not correctable error • Pilot experiment results: communication through the enrichment channel can improve translation

  44. Three Types of Errors • Source: "Tout le monde doit entendre l'histoire de Cendrillon." • MT: "Everybody has hear story about Cinderella" • MT: "Everybody has heard the business by Cinderella" • MT: "Everybody loves the story about Cinderella" (need more redundancy) • Correct translation: "Everybody has heard the story about Cinderella" • III. Undetectable error: add more redundancy to reduce it to type I or type II
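The three error types, and the remedy the slides pair with each, amount to a small decision table. A hypothetical sketch: the enum, dictionary, and function names are assumptions, and "detectable" and "correctable" are read here as judgments made by the target-language speaker, which is our interpretation of the slides rather than a stated definition.

```python
from enum import Enum


class ErrorType(Enum):
    DETECTABLE_AND_CORRECTABLE = "I"
    DETECTABLE_NOT_CORRECTABLE = "II"
    UNDETECTABLE = "III"


# Remedy the slides associate with each error type.
REMEDY = {
    ErrorType.DETECTABLE_AND_CORRECTABLE: "post-edit the MT output",
    ErrorType.DETECTABLE_NOT_CORRECTABLE: "communicate through the enrichment channel",
    ErrorType.UNDETECTABLE: "add redundancy to reduce it to type I or II",
}


def classify(detectable: bool, correctable: bool) -> ErrorType:
    """Map the two judgments onto the three error types."""
    if not detectable:
        # An error nobody notices cannot be knowingly corrected.
        return ErrorType.UNDETECTABLE
    if correctable:
        return ErrorType.DETECTABLE_AND_CORRECTABLE
    return ErrorType.DETECTABLE_NOT_CORRECTABLE


# The three MT outputs from the slides, with the judgments each one illustrates.
examples = {
    "Everybody has hear story about Cinderella": (True, True),
    "Everybody has heard the business by Cinderella": (True, False),
    "Everybody loves the story about Cinderella": (False, False),
}

for output, (detectable, correctable) in examples.items():
    kind = classify(detectable, correctable)
    print(f"Type {kind.value}: {output!r} -> {REMEDY[kind]}")
```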

  45. Prototype Evaluation • Rating scales: 1 (unintelligible) to 4 (very intelligible); 1 (not translated) to 5 (full meaning) • The system seems promising
