[Figure: Hierarchical CIC lattice derived from Spanish. Each CIC box contains the c-suffixes comprising the CIC, the c-stem count of the CIC, and a sample of the CIC’s c-stems.]

    c-suffixes       c-stems   sample c-stems
    a.as.o.os.tro          1   cas
    a.as.o.os             43   african cas jurídic l ...
    a.as.o                59   cas citad jurídic l ...
    a.as.os               50   afectad cas jurídic l ...
    a.o.os               105   impuest indonesi italian jurídic ...
    as.o.os               54   cas implicad jurídic l ...
    a.as                 199   huelg incluid industri inundad ...
    a.os                 134   impedid impuest indonesi inundad ...
    as.os                 68   cas implicad inundad jurídic ...
    a.tro                  2   cas cen
    a.o                  214   id indi indonesi inmediat ...
    as.o                  85   intern jurídic just l ...
    o.os                 268   human implicad indici indocumentad ...
    tro                   16   catas ce cen cua ...
    a                   1237   huelg ib id iglesi ...
    as                   404   huelg huelguist incluid industri ...
    o                   1139   hub hug human huyend ...
    os                   534   humorístic human hígad impedid ...

[Figure: LETRAS Architecture (grey areas = existing components or data). Labeled components: Source Language Text, Morphological Analyzer, Lexemes and Grammatical Features, Transfer Engine, Lattice of Partial Translations, Decoder, Target Language Text, Morphology Rules, Transfer Rules, Morphology Rule Induction, Transfer Rule Induction, Rule Refinement Module, Translation Correction Tool, Additional source-language Text, Informant. The diagram also carries Task 1 through Task 4 markers.]

[Figure: MILE Architecture (provides data feeds for LETRAS). Labeled components include Feature Detection, Word-aligned Parallel Text, Readable Grammar, Elicitation Tool, Informant, Elicitation Corpus.]

[Figure: Portion of a CIC lattice consisting of the word forms: blame, blames, blamed, roams, roamed, roaming, solve, solves, solving. Legend: hierarchical c-suffix set inclusion links; morpheme boundary links.]

    c-suffixes    c-stems
    me.mes.med    bla
    e.es.ed       blam
    Ø.s.d         blame
    Ø.s           blame solve
    e.es          blam solv
    me.mes        bla
    Ø.d           blame
    me.med        bla
    e.ed          blam
    s.d           blame
    mes.med       bla
    es.ed         blam
    e             blam solv
    me            bla
    Ø             blame blames blamed roams roamed roaming solve solves solving
    es            blam solv
    mes           bla
    s             blame roam solve
    med           bla roa
    ed            blam roam
    d             blame roame
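The boxes of the small English lattice can be reproduced with a few lines of Python. This is a minimal sketch, not the poster's induction algorithm. It assumes that every non-empty word-initial string is a candidate stem (c-stem), that every word-final string, including the empty string standing in for Ø, is a candidate suffix (c-suffix), and that a box pairs a set of c-suffixes with all c-stems that occur with every one of them. The names `candidate_splits` and `stems_for` are hypothetical.

```python
from collections import defaultdict

# Word forms from the small English lattice in the figure.
WORDS = ["blame", "blames", "blamed", "roams", "roamed",
         "roaming", "solve", "solves", "solving"]


def candidate_splits(words):
    """Map each candidate stem to the set of candidate suffixes seen with it.

    Every cut position yields one (stem, suffix) pair; the stem must be
    non-empty, while the suffix may be "" (the figure's Ø).
    """
    stem_to_suffixes = defaultdict(set)
    for word in set(words):
        for cut in range(1, len(word) + 1):
            stem_to_suffixes[word[:cut]].add(word[cut:])
    return stem_to_suffixes


def stems_for(suffixes, words):
    """Return, sorted, the candidate stems that occur with all given suffixes."""
    table = candidate_splits(words)
    wanted = set(suffixes)
    return sorted(stem for stem, found in table.items() if wanted <= found)


if __name__ == "__main__":
    for suffix_set in (["", "s", "d"], ["", "s"], ["e", "es"], ["ed"]):
        label = ".".join(s or "Ø" for s in suffix_set)
        print(label, stems_for(suffix_set, WORDS))
```

Under these assumptions, `stems_for(["", "s"], WORDS)` yields the same stems as the figure's Ø.s box. Inclusion links in the lattice then correspond to subset relations between suffix sets: dropping a c-suffix can only keep or enlarge the stem list.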
AVENUE / LETRAS

Rule-based MT, whether transfer or interlingual, requires several computational-linguist decades to build an MT system from a new LCTL (less commonly taught language) into English, and comparable effort to debug and refine it. Moreover, it may prove difficult or impossible to find computational linguists skilled in each LCTL of interest.

The currently favored MT research paradigms are corpus-based methods, whether statistical or example-based, but these require voluminous quantities of professionally translated, aligned parallel text for training, typically 1 to 10 million words or more. Such quantities of high-quality parallel text are simply not available for most LCTLs.

In contrast, the LETRAS approach requires neither LCTL-versed computational linguists nor large quantities of parallel text. Instead, LETRAS requires:
- a small (10 to 20 thousand words), linguistically balanced, translated and aligned elicitation corpus;
- a modest monolingual corpus in the LCTL;
- access to a bilingual native informant, who need not have linguistic skills.

[Photo: Working with computer tools in Temuco, Chile]

Transfer Rule Format

In order to present the learning algorithms, we must first explain the learning objective, i.e. the transfer rules. The rules in the Avenue system follow a specific formalism. The transfer rule below, between Chinese and English, illustrates this formalism:

    ; Rule to transfer Chinese question sentences
    {S,3}                              ; Unique rule identifier
    ; Production rule: SL and TL type and constituent or POS sequences
    S::S : [NP VP "吗"] -> [AUX NP VP]
    (
      ; Constituent alignments
      (x1::y2)                         ; NP to NP
      (x2::y3)                         ; VP to VP
      ; Parsing (x-side) constraints, build feature structure
      ((x0 subj) = x1)                 ; Assign NP’s features to subj
      ((x0 subj case) = nom)
      ((x0 act) = quest)
      (x0 = x2)
      ; Transfer (xy) constraints
      ((y2 case) = (x0 subj case))
      ; Generation (y-side) constraints
      ; Insert AUX on target side based on value constraints
      ((y1 form) = do)
      ; Enforce value and agreement restrictions on y-side
      ((y3 vform) =c inf)              ; verb must be infinitive
      ((y1 agr) = (y2 agr))
    )

[Figure: The left side shows an example of compositionality. The right side shows a successful application of the Seeded Version Space algorithm.]

[Figure: Example of the initial screen with an incorrect translation (left), and the same screen with the sentence in the process of being corrected (right) with the TCTool.]
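The effect of the Chinese question rule in this section can be traced with a small Python sketch. It is an illustration only and not the AVENUE transfer engine. Feature structures are simplified to flat dictionaries, the NP and VP are assumed to have been translated into English already by lower-level rules, and `=c` is read as a check on an existing value rather than an assignment (an assumption based on the rule's own comment, "verb must be infinitive"). The function name `apply_question_rule` and the feature values in the example are hypothetical.

```python
QUESTION_PARTICLE = "吗"


def apply_question_rule(np, vp, particle):
    """Apply S::S : [NP VP "吗"] -> [AUX NP VP] to already-translated parts.

    np and vp are feature dictionaries for x1 and x2. Returns the target
    sequence [y1, y2, y3], or None if the rule does not apply.
    """
    # x-side: the rule only matches when the sentence-final particle is present.
    if particle != QUESTION_PARTICLE:
        return None

    # ((y3 vform) =c inf): treated here as a check, so a VP that is not
    # already infinitive blocks the rule instead of being rewritten.
    if vp.get("vform") != "inf":
        return None

    # (x1::y2) with ((x0 subj case) = nom) and ((y2 case) = (x0 subj case)).
    y2 = dict(np, case="nom")

    # ((y1 form) = do) and ((y1 agr) = (y2 agr)): inserted auxiliary.
    y1 = {"cat": "AUX", "form": "do", "agr": y2.get("agr")}

    # (x2::y3): the VP is carried over unchanged.
    y3 = dict(vp)
    return [y1, y2, y3]


if __name__ == "__main__":
    subject = {"cat": "NP", "text": "you", "agr": "2sg"}
    predicate = {"cat": "VP", "text": "like tea", "vform": "inf"}
    for constituent in apply_question_rule(subject, predicate, "吗"):
        print(constituent)
```

Keeping the check on `vform` separate from the assignments mirrors the rule's split between constraints that build structure and constraints that only filter candidate outputs.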