Rules Based Machine Translation


Presentation Transcript


  1. Rules Based Machine Translation Fred Hollowood Consultant RBMT and CL

  2. Sample Agenda
     1. Introduction
     2. Rules Based Machine Translation
     3. Post-Editing
     4. Quality Measurement
     5. Controlled Language

  3. Introduction (Technology Initiative - The Aim)
     • The Aim
       • Bring rapid, cost-effective translation to Symantec’s product and service divisions
       • Connect Symantec’s CMS to translation technologies
       • Metrics on the reduction of translation costs and time to market
     • The Approach
       • Structure source content so it accommodates MT
       • Use a language checker to monitor source grammar
       • Promote terminology as a key process and deliverable
       • Proactive rather than reactive
       • Define measures to monitor and drive productivity
         • GTM, Meteor, BLEU
       • Work with post-editors to ensure a win-win

  4. Rules Based Machine Translation
     [Figure: Flowchart of Rule-Based Machine Translation (RBMT). SL Text → Analysis → Transfer → Synthesis → TL Text. Analysis draws on the SL Lexicon & Grammars, Transfer on the SL->TL Lexical & Structural Rules, and Synthesis on the TL Lexicon & Grammars.]
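The three stages in the flowchart can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the tiny English-to-French lexicons, the single reordering rule, and the function names are hypothetical and are not taken from the presentation or from any real RBMT engine.

```python
# Toy lexicons and rules; every entry here is an illustrative assumption.
SL_LEXICON = {"the": "DET", "red": "ADJ", "green": "ADJ", "car": "NOUN", "book": "NOUN"}
TRANSFER_LEXICON = {"the": "DEF", "red": "rouge", "green": "vert", "car": "voiture", "book": "livre"}
TL_GENDER = {"voiture": "f", "livre": "m"}
TL_ADJ_FEMININE = {"vert": "verte", "rouge": "rouge"}


def analyse(text):
    """Analysis: tag each source word with the SL lexicon."""
    return [(word, SL_LEXICON[word]) for word in text.lower().split()]


def transfer(tagged):
    """Transfer: map words lexically, then reorder ADJ NOUN to NOUN ADJ."""
    mapped = [(TRANSFER_LEXICON[word], pos) for word, pos in tagged]
    out = []
    i = 0
    while i < len(mapped):
        if i + 1 < len(mapped) and mapped[i][1] == "ADJ" and mapped[i + 1][1] == "NOUN":
            out.extend([mapped[i + 1], mapped[i]])  # structural rule
            i += 2
        else:
            out.append(mapped[i])
            i += 1
    return out


def synthesise(units):
    """Synthesis: apply TL agreement using the TL lexicon (noun gender)."""
    gender = next((TL_GENDER[w] for w, pos in units if pos == "NOUN"), "m")
    words = []
    for word, pos in units:
        if pos == "DET":
            words.append("la" if gender == "f" else "le")
        elif pos == "ADJ" and gender == "f":
            words.append(TL_ADJ_FEMININE[word])
        else:
            words.append(word)
    return " ".join(words)


def translate(text):
    return synthesise(transfer(analyse(text)))


print(translate("the green car"))  # la voiture verte
```

Each stage only consults its own resource, which mirrors the separation between SL resources, transfer rules, and TL resources in the flowchart. A production system would use full grammars rather than word lists.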

  5. MT Process Overview
     [Figure: process diagram. Category labels: Remote Human Activity, Text Processing, Systran Engine, System Control. Phase labels: Controlled Language Authoring, Automated Pre-processing, User Dictionary, Translation System, Normalisation Dictionary, Automated Post-processing, Human Post-Editing.]
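One way to read the process overview is as a chain of stages that each hand text to the next. The sketch below assumes that ordering (pre-processing, translation, post-processing, then hand-off to a human post-editor). The function names are hypothetical, and `fake_engine` is only a stand-in: it does not represent the Systran engine or its API.

```python
from typing import Callable, Dict


def preprocess(text: str, normalisation: Dict[str, str]) -> str:
    """Automated pre-processing: replace known variants with canonical forms."""
    for variant, canonical in normalisation.items():
        text = text.replace(variant, canonical)
    return text


def postprocess(text: str) -> str:
    """Automated post-processing: collapse runs of whitespace."""
    return " ".join(text.split())


def run_pipeline(source: str, engine: Callable[[str], str],
                 normalisation: Dict[str, str]) -> Dict[str, str]:
    """Run the automated stages and return what the post-editor would receive."""
    prepared = preprocess(source, normalisation)
    raw_mt = engine(prepared)
    return {
        "prepared": prepared,
        "raw_mt": raw_mt,
        "for_post_editing": postprocess(raw_mt),
    }


def fake_engine(text: str) -> str:
    """Stand-in for a translation engine (assumption): upper-cases the input."""
    return text.upper() + "  "


result = run_pipeline("Turn on e-mail scanning.", fake_engine, {"e-mail": "email"})
print(result["for_post_editing"])  # TURN ON EMAIL SCANNING.
```

Passing the engine in as a callable keeps the automated text processing independent of whichever translation system is plugged in.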

  6. Post-Editing
     • Fundamentally the same relationship as with a traditional vendor
     • Increased daily throughput expected for post-edited content (6-8k vs. 2.5k per day)
     • Style requirements have been critically reviewed in the light of PE
       • E.g. stylistic inconsistencies are acceptable for post-edited content
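The throughput figures on this slide can be turned into a quick estimate. The sketch below assumes the figures are words per day per linguist, which the slide does not state explicitly. The 100,000-word project size is a hypothetical example.

```python
# Figures from the slide, assumed to be words per day.
TRADITIONAL_PER_DAY = 2_500
POST_EDIT_PER_DAY = (6_000, 8_000)  # low and high estimates


def days_needed(words: int, per_day: int) -> float:
    """Working days needed to process `words` at a given daily throughput."""
    return words / per_day


def speedup_range():
    """Throughput ratio of post-editing to traditional translation."""
    low, high = POST_EDIT_PER_DAY
    return low / TRADITIONAL_PER_DAY, high / TRADITIONAL_PER_DAY


project_words = 100_000  # hypothetical project size
print(speedup_range())                                   # (2.4, 3.2)
print(days_needed(project_words, TRADITIONAL_PER_DAY))   # 40.0
print(days_needed(project_words, POST_EDIT_PER_DAY[0]))  # about 16.7
print(days_needed(project_words, POST_EDIT_PER_DAY[1]))  # 12.5
```

On these assumptions, post-editing is roughly 2.4 to 3.2 times faster per linguist than traditional translation.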

  7. Measurement

  8. Metrics based on Comprehensibility

  9. Quality by Human Inspection

  10. GTM Scoring
      [Figure: panels labelled "From the machine" and "From the post-editor".]
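GTM scores a candidate translation against a reference using precision, recall, and their F-measure over matched words. The sketch below is a simplified unigram F-measure in that spirit, not the GTM tool itself, which also rewards longer contiguous matches. Scoring raw machine output against the post-edited version is an assumption about how the two panels relate.

```python
from collections import Counter


def unigram_f_measure(candidate: str, reference: str) -> float:
    """Simplified word-overlap F-measure between candidate and reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Clipped overlap: each reference word can be matched at most once.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


machine = "a b c d"      # placeholder tokens standing in for raw MT output
post_edited = "a b x y"  # placeholder tokens standing in for the post-edited text
print(unigram_f_measure(machine, post_edited))  # 0.5
```

A score near 1.0 would mean the post-editor changed little. A low score would mean heavy editing.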

  11. Quality Metrics by Language
      Project Scores by Language
      • French: 73%
      • Spanish: 68%
      • Italian: 59%
      • German: 57%

  12. Example Style rules
      • Avoid using a colon after a drive letter
      • Avoid “he”, “she”, “he/she”, and “s/he”
      • Use numerals for all measurements over 10
      • Use the serial comma
      • Do not use more than two adverbs or adjectives in a series
      • Keep the subject and verb close to each other early in a sentence
      • Avoid meaningless openers
      • Avoid progressive tense when describing product use
      • Do not use future when describing product use
      • Make positive statements that tell users what to do or what they need to know
      • Use sentence-style capitalization for bulleted lists
      • Use a colon at the end of a sentence to introduce a bulleted list
      • Punctuate imperative sentences in bulleted lists
      • Use number × number
      • Use a hyphen in a unit
      • Repeat the unit of measure

  13. CL rules based on CDG
      • Avoid using the passive voice
      • Do not use more than 25 words in a sentence (original recommendation was 20)
      • Use relative pronouns
      • Use complementizers (“that”)
      • Avoid unnecessary words (such as “basic” or “just”)
      • Do not use 'this' or 'that' when they are not followed by a noun
      • Place all non-translatable text on its own line (programming code snippets)
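Two of the rules above are easy to check mechanically: the 25-word sentence limit and the list of unnecessary words. The sketch below is a minimal illustration, not the language checker used in the project. The function name, the tokenisation, and the message wording are assumptions.

```python
import re

MAX_WORDS = 25                   # limit stated on the slide
UNNECESSARY = {"basic", "just"}  # examples given on the slide


def check_sentence(sentence: str) -> list:
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[A-Za-z0-9'-]+", sentence)
    issues = []
    if len(words) > MAX_WORDS:
        issues.append(f"too long: {len(words)} words (limit {MAX_WORDS})")
    for word in words:
        if word.lower() in UNNECESSARY:
            issues.append(f"unnecessary word: {word.lower()}")
    return issues


print(check_sentence("Just click the button."))  # ['unnecessary word: just']
```

Rules such as avoiding the passive voice need syntactic analysis and are outside the scope of a word-level check like this one.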

  14. CL rules for MT
      • Do not use slashes to list lexical items
      • Do not write the full name of each operating system
      • Avoid -ing words
      • Use a noun at the start of a subordinate clause
      • Repeat the head noun in ambiguous coordinated structures
      • Use a hyphen to indicate the first part of a compound
      • Use articles in specific contexts (for disambiguation)
      • Keep both parts of a two-part verb together
      • Use "could" with "if"
      • Avoid parenthetical expressions in the middle of a sentence
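Some of the MT-oriented rules can be approximated with pattern matching. The sketch below flags slash-separated word lists, -ing words, and parenthetical expressions that are followed by more text. The patterns and rule names are assumptions, and the -ing pattern is a crude heuristic that also flags nouns such as "thing" or "string".

```python
import re
from typing import List, Tuple

# Hypothetical patterns; a real checker would use part-of-speech information.
RULES = [
    ("slash list", re.compile(r"\b[A-Za-z]+/[A-Za-z]+\b")),
    ("-ing word", re.compile(r"\b[A-Za-z]{2,}ing\b")),
    ("mid-sentence parenthetical", re.compile(r"\([^)]*\)(?=\s*\w)")),
]


def flag_mt_rules(sentence: str) -> List[Tuple[str, str]]:
    """Return (rule name, matched text) pairs, grouped in rule order."""
    flags = []
    for name, pattern in RULES:
        for match in pattern.finditer(sentence):
            flags.append((name, match.group(0)))
    return flags


for flag in flag_mt_rules("Enable/disable scanning (optional) before you continue."):
    print(flag)
```

A parenthetical at the very end of a sentence is not flagged, because the rule only targets expressions in the middle of a sentence.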

  15. Examples of CL Violation
      • Keep both parts of a two-part verb together
        • This document gives directions to turn email scanning on or off.
          • Dieses Dokument gibt Richtungen zum Umdrehung E-Mail-Prüfung an oder weg.
          • Ce document donne des directions à l'analyse du courrier électronique de tour en fonction ou hors fonction.
        • This document gives directions to turn on or turn off email scanning.
          • Dieses Dokument gibt Richtungen, E-Mail-Prüfung zu aktivieren oder zu deaktivieren.
          • Ce document donne des directions pour activer ou désactiver l'analyse du courrier électronique.
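The violation in this example, a verb separated from its particle, can be detected with a small lookup. The sketch below is an illustration only: the verb list, the three-word gap limit, and the function name are assumptions, and inflected forms such as "turns" are not handled.

```python
import re
from typing import List, Tuple

# Hypothetical list of two-part verbs and their particles.
TWO_PART_VERBS = {"turn": ("on", "off"), "switch": ("on", "off"), "shut": ("down",)}
MAX_GAP = 3  # maximum number of words allowed between verb and particle


def split_two_part_verbs(sentence: str) -> List[Tuple[str, str]]:
    """Return (verb, particle) pairs where other words separate the two parts."""
    tokens = re.findall(r"[a-z-]+", sentence.lower())
    hits = []
    for i, token in enumerate(tokens):
        particles = TWO_PART_VERBS.get(token)
        if particles is None:
            continue
        following = tokens[i + 1 : i + 2 + MAX_GAP]
        # A particle directly after the verb satisfies the rule.
        if not following or following[0] in particles:
            continue
        for later in following[1:]:
            if later in particles:
                hits.append((token, later))
                break
    return hits


print(split_two_part_verbs("This document gives directions to turn email scanning on or off."))
# [('turn', 'on')]
print(split_two_part_verbs("This document gives directions to turn on or turn off email scanning."))
# []
```

The first sentence is flagged and the rewritten sentence passes, matching the two versions shown on the slide.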

  16. Lessons Learned
      • Strict implementation when there is:
        • New content
        • Little leverage
        • Time
      • Rules can be context-sensitive
      • Different results depending on client application
        • May not always flag tag problems
      • Language-specific rules
        • Probably best implemented as:
          • Pre-processing step
          • Normalization dictionaries
      • CL + MT is not sufficient
        • Terminology work to update dictionaries
        • PE when a specific quality standard is required

  17. Fred Hollowood fred@fredhollowoodconsulting.com
