
Can Controlled Language Rules increase the value of MT?


Presentation Transcript


  1. Can Controlled Language Rules increase the value of MT? Fred Hollowood & Johann Rotourier Symantec Dublin

  2. Localisation Challenge • Databases filled with English content • Large volumes • Perishable • Technical • Fast delivery • Cost effective

  3. Goals • Reduce cost of Translation to 30% • Implement CL within the authoring community • Foster the use of editor software to police the CL rule set • Identify the most efficient MT system for each target language • Develop Post-Editing guidelines • Refine Symantec glossaries to assist in dictionary preparation

  4. Controlled Language and MT • Controlled Language: Rule Sets, Terminology, Style, Editors • MT system: Language Pairs (Jp, De, Fr, It, Es) • Post Editing • Assessment

  5. Sequence of Events • Identify a corpus • Develop a test suite • Develop terminology • Work with MT engines • Assess results

  6. Two Questions • How effective are CL rules in terms of post-editing effort? • Which CL rules provide the best results?

  7. Corpus Selection • Origin • stream of XML messages • Volume • 30,000 words • Process • Use TM technology to pre-process raw XML to provide strings for MT • Use Macros to tidy up untranslatable text
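A minimal Python sketch of the kind of pre-processing described in slide 7: pulling candidate strings out of raw XML messages and masking untranslatable tokens before they reach the MT engine. The element name ("message") and the masking patterns are illustrative assumptions, not Symantec's actual TM-based process or macros.

# Illustrative sketch only: extract candidate strings from XML messages and
# mask untranslatable tokens before MT. The <message> element name and the
# masking patterns are assumptions, not the actual Symantec pipeline.
import re
import xml.etree.ElementTree as ET

UNTRANSLATABLE = [
    re.compile(r"[A-Za-z]:\\[^\s]+"),   # Windows file paths
    re.compile(r"%[A-Za-z_]+%"),        # environment-style variables
    re.compile(r"\{\d+\}"),             # positional placeholders
]

def extract_strings(xml_text):
    """Return the text of every <message> element as a candidate MT segment."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("message") if el.text]

def mask_untranslatables(segment):
    """Replace untranslatable tokens with placeholders the MT engine can pass
    through untouched; return the masked segment and the token mapping."""
    mapping = {}
    for pattern in UNTRANSLATABLE:
        for match in pattern.findall(segment):
            key = "__TOK%d__" % len(mapping)
            mapping[key] = match
            segment = segment.replace(match, key, 1)
    return segment, mapping

sample = "<messages><message>Save the log to C:\\temp\\scan.log before closing.</message></messages>"
for s in extract_strings(sample):
    print(mask_untranslatables(s))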

  8. Terminology Extraction • Extraction • Tools: Wordsmith Tools 4 • Removal of duplicates • Spelling variants • Hyphenation variants • Capitalisation variants • Symbol/Plain • Abbreviation/Plain • Removal of synonyms
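As a rough illustration of the variant-removal step in slide 8, the sketch below collapses term candidates that differ only in capitalisation, hyphenation, or a known abbreviation onto one normalised key. The abbreviation table is invented; the Wordsmith Tools extraction and the real glossary work are not reproduced here.

# Illustrative sketch: group term candidates that are capitalisation,
# hyphenation, or abbreviation variants of each other. The abbreviation
# table is a made-up example, not the Symantec glossary.
from collections import defaultdict

ABBREVIATIONS = {"config.": "configuration", "app": "application"}

def normalise(term):
    t = term.lower()                    # capitalisation variants
    t = t.replace("-", " ")             # hyphenation variants
    words = [ABBREVIATIONS.get(w, w) for w in t.split()]   # abbreviation/plain
    return " ".join(words)

def group_variants(candidates):
    groups = defaultdict(set)
    for term in candidates:
        groups[normalise(term)].add(term)
    return groups

candidates = ["Auto-Protect", "auto protect", "config. file", "configuration file"]
for key, variants in group_variants(candidates).items():
    print(key, "<-", sorted(variants))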

  9. Custom Dictionaries • Current MT systems • Systran Premium 4.0 • Logomedia Translate Pro • Differing capabilities • Differing function • Per target language • Grammars • Styles
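The per-language dictionary work could be organised along the lines of the sketch below. The entry fields are assumptions for illustration; the real Systran and Logomedia import formats differ and are not shown.

# Illustrative shape for a per-language custom dictionary entry; the actual
# Systran and Logomedia formats differ, so these fields are assumptions.
from dataclasses import dataclass

@dataclass
class DictEntry:
    source: str                      # English term from the Symantec glossary
    target: str                      # approved translation for the target language
    pos: str                         # part of speech, to help the engine inflect correctly
    do_not_translate: bool = False   # e.g. product names

FR_DICTIONARY = [
    DictEntry("firewall", "pare-feu", "noun"),
    DictEntry("Norton AntiVirus", "Norton AntiVirus", "proper noun", do_not_translate=True),
]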

  10. Test Suite • 59 rules examined • 17 of which already encapsulated in Symantec’s writing guidelines • Classification • 8 lexical • 40 syntactic • 11 textual

  11. Controlled Language Sources

  12. Testing the Rules • Process • Find an example sentence that does not conform to the rule • Edit it to conform to all other rules under study • Minimize the linguistic complexity (single test) • Apply the CL rule • Repeat the procedure to obtain 3 test examples • Test Suite • 59 rules expressed as 177 sentences
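One way to hold such a suite, sketched here under assumed field names (not the authors' actual format): each of the 59 rules carries its category and three test sentences, each kept with the version that violates the rule and its rewritten, conforming counterpart, giving the 177 sentences mentioned above.

# Illustrative layout for the CL test suite: 59 rules, each with three test
# sentences stored as (non-conforming, conforming) pairs. Field names and the
# example rule are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class TestItem:
    uncontrolled: str   # violates only the rule under test
    controlled: str     # same sentence rewritten to conform to the rule

@dataclass
class CLRule:
    rule_id: int
    category: str       # "lexical", "syntactic", or "textual"
    description: str
    items: list = field(default_factory=list)   # three TestItem per rule

suite = [
    CLRule(
        rule_id=1,
        category="syntactic",
        description="Only use 'could' when the sentence contains 'if'.",
        items=[TestItem(
            uncontrolled="The scan could fail.",
            controlled="The scan can fail.",
        )],
    ),
]
print(sum(len(rule.items) for rule in suite), "test sentences so far (full suite: 177)")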

  13. Post Editing Guidelines • Ensure information transfer • Modify what is grammatically deviant from commercial quality • Modify what is lexically essential for understanding in the target • Avoid the use of synonyms for the sake of originality • Don't forget that all the words are probably present in the output (possibly in the wrong order) • Remember that style does not matter but information accuracy does • Don't dally: if an improvement is not obvious, move on

  14. Metrics Generation • Quality levels • Excellent (4), Good (3), Medium (2), Poor (1) • Uncontrolled source generates output A • Controlled source generates output B • Focus is on Usability • Evaluation by native speakers • Further study is being done to link into other systems of quality evaluation • Blackjack • SAE J2450
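A small sketch of how the per-rule comparison could be computed from the 4-point ratings above, comparing output A (uncontrolled source) with output B (controlled source). The rule names and scores in the example are invented, not the study's data.

# Illustrative scoring sketch: native speakers rate each output on the
# 4-point scale (Excellent=4 ... Poor=1); the per-rule gain of output B
# (controlled source) over output A (uncontrolled source) shows where CL
# helps. The ratings below are invented for illustration.
from statistics import mean

SCALE = {"Excellent": 4, "Good": 3, "Medium": 2, "Poor": 1}

def rule_gain(scores_a, scores_b):
    """Mean improvement of controlled (B) output over uncontrolled (A)."""
    return mean(scores_b) - mean(scores_a)

# per rule: (ratings of output A, ratings of output B)
ratings = {
    "could_without_if": (["Poor", "Medium", "Medium"], ["Good", "Excellent", "Good"]),
    "slashes_in_lists": (["Medium", "Medium", "Good"], ["Medium", "Good", "Good"]),
}
for rule, (a, b) in ratings.items():
    gain = rule_gain([SCALE[x] for x in a], [SCALE[x] for x in b])
    print("%s: %+.2f" % (rule, gain))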

  15. Overall evaluation (French)

  16. Overall evaluation (Japanese)

  17. Overall evaluation (German)

  18. Preliminary Results • CL has a significant impact • Benefit varies by language • Lots of scope for further study • Some rules are more effective than others (score range: 0-17) • Symantec's implied rules have mixed effectiveness • Recommend 7 additional rules

  19. Additional rules • Rules with an impact in all languages • Do not omit words within lexical items, even when the term has already been used in the sentence (12); repeat the head noun with conjoined articles or prepositions (15) • Do not use slashes to list lexical items, except for product names (14) • Always write a verb next to its particle (17) • Only use the modal ‘could’ when the sentence contains ‘if’; otherwise use ‘can’ (10) • Be very careful with -ing words: if it is a gerund, use an article in front of it (7); if it is introducing a new clause, use ‘by’ in front of it (8); if it is modifying a noun in a non-finite clause, replace it with a relative clause (5) • Make sure that every segment can stand syntactically alone (11) • Avoid footnotes in the middle of a segment; turn footnotes into independent segments (11)
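As a rough illustration of how such rules might be policed automatically (in the spirit of the checker software mentioned in the goals), the sketch below flags two of them with simple heuristics. These regexes are assumptions, not how Acrolinx or the authors actually implement the checks.

# Illustrative sketch: heuristic checks for two of the rules above. Flag
# 'could' in a sentence without 'if', and a slash used between two lexical
# items. These regex heuristics are assumptions, not a real checker.
import re

def flags_could_without_if(sentence):
    words = re.findall(r"[a-z']+", sentence.lower())
    return "could" in words and "if" not in words

def flags_slash_list(sentence):
    # a slash between two lowercase words suggests a slashed list of items
    return re.search(r"\b[a-z]+/[a-z]+\b", sentence) is not None

CHECKS = {
    "Only use 'could' when the sentence contains 'if'": flags_could_without_if,
    "Do not use slashes to list lexical items": flags_slash_list,
}

for sentence in [
    "The update could corrupt the database.",
    "Select the file/folder that you want to scan.",
]:
    for rule, check in CHECKS.items():
        if check(sentence):
            print("Flagged (%s): %s" % (rule, sentence))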

  20. Next Steps • Apply subsets of rules to a larger corpus. • Language checker Acrolinx • Increase the number of MT engines studied • Comprendium/Prompt (European languages) • Fujitsu/Nova’s PC Transer (Japanese) • Further refine Post Editing guidelines • Keep abreast of upgrades in current systems • Bugs fixed • New versions of software • Move to a production pilot project
