
Deterministic Part-of-Speech Tagging with Finite-State Transducers



Presentation Transcript


  1. Deterministic Part-of-Speech Tagging with Finite-State Transducers by Emmanuel Roche and Yves Schabes • Presented by 정유진, KLE Lab., CSE, POSTECH • October 16, 1998

  2. Introduction • Stochastic approaches to NLP have often been preferred to rule-based approaches • Eric Brill (1992): a rule-based tagger whose rules are inferred from a training corpus • rules are automatically acquired • requires drastically less space than a stochastic tagger • but is considerably slower • Proposed remedy → Deterministic Finite-State Transducer (Subsequential Transducer) CS730B Statistical NLP

  3. Overview of Brill’s Tagger • Structure of the tagger • Lexical tagger (initial tagger) • Unknown-word tagger • Contextual tagger • Inefficiency • Each rule is compared against every token of the input (Fig. 3) • Potential interaction between rules (Fig. 1) • Complexity: O(RKn) • R: number of contextual rules • n: number of input words • K: maximum number of tokens a rule’s context spans
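The O(RKn) cost of naive contextual-rule application can be made concrete with a small sketch. This is not the authors' code; the rule tuples below are a simplified stand-in for one of Brill's templates (retag a word when a nearby tag matches), chosen for illustration.

```python
# Hypothetical sketch of Brill-style contextual rule application: every rule
# is tried at every token, giving O(R * K * n) work overall.

def apply_contextual_rules(tags, rules):
    """Apply each (from_tag, to_tag, trigger_tag, offset) rule left to right.

    For example, ('vbn', 'vbd', 'np', -1) means: retag vbn as vbd when the
    previous tag is np (a simplified PREVTAG-style template).
    """
    for from_tag, to_tag, trigger, offset in rules:   # R rules
        for i in range(len(tags)):                    # n tokens
            j = i + offset                            # context probe (up to K wide)
            if tags[i] == from_tag and 0 <= j < len(tags) and tags[j] == trigger:
                tags[i] = to_tag
    return tags
```

Because every rule scans the whole input, adding rules slows tagging linearly, which is exactly the inefficiency the finite-state compilation removes.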

  4. Finite-State Transducer (1) • Finite-State Transducer T = (Σ, Q, i, F, E) • Σ: finite alphabet  Q: finite set of states • i: initial state  F: set of final states • E: set of transitions (q, a, w, q′) on Q × (Σ ∪ {ε}) × Σ* × Q • Deterministic F.S. Transducer T = (Σ, Q, i, F, ⊗, *, ρ) • ⊗: deterministic state transition func. (q ⊗ a = q′) • *: deterministic emission func. (q * a = w′) • ρ: final emission func. (ρ(q) = w for q ∈ F)
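A minimal sketch (not the paper's implementation) of a deterministic (subsequential) transducer as plain dictionaries: `trans` plays the role of the state-transition function ⊗, `emit` the emission function *, and `final_out` the final emission ρ. All states and transitions below are illustrative.

```python
# Illustrative subsequential transducer: maps a -> b and b -> c.
trans = {(0, 'a'): 1, (1, 'b'): 1}       # next state for (state, symbol)
emit = {(0, 'a'): 'b', (1, 'b'): 'c'}    # output word for (state, symbol)
final_out = {1: ''}                      # final emission for each final state
initial = 0

def transduce(word):
    """Map `word` deterministically, one transition per input symbol."""
    q, out = initial, []
    for x in word:
        out.append(emit[(q, x)])
        q = trans[(q, x)]
    return ''.join(out) + final_out[q]   # defined only if q is final
```

The key property is that each input symbol triggers exactly one transition, so the run is linear in the input length.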

  5. Finite-State Transducer (2) • state transition function • d(q, a) = {q′ ∈ Q | ∃w′ ∈ Σ* such that (q, a, w′, q′) ∈ E} • emission function • δ(q, a, q′) = {w′ ∈ Σ* | (q, a, w′, q′) ∈ E}
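These set-valued functions can be read directly off a transition set E of quadruples (q, a, w′, q′). A hedged sketch, with illustrative states, symbols, and outputs:

```python
# Transition set E: quadruples (source, input symbol, output word, target).
E = {(0, 'a', 'b', 1), (0, 'a', 'c', 2), (2, 'e', 'e', 2)}

def d(q, a):
    """d(q, a): the set of states reachable from q on input a."""
    return {q2 for (q1, x, w, q2) in E if (q1, x) == (q, a)}

def emitted(q, a, q2):
    """delta(q, a, q'): the set of words emitted on a transition q --a--> q'."""
    return {w for (q1, x, w, r) in E if (q1, x, r) == (q, a, q2)}
```

Note that d(0, 'a') has two elements here, which is exactly the non-determinism the later determinization step removes.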

  6. Construction of the Finite-State Tagger (1) • 1. Turn each contextual rule into a finite-state transducer • 2. Take the local extension of the transducer (algorithm of Fig. 17) • [Diagram: the rule “vbn → vbd PRETAG np” as a transducer 0 —np/np→ 1 —vbn/vbd→ 2, and its local extension, which adds ?/? identity transitions so the rule applies anywhere in the input]
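The behavior of the locally extended rule from this slide can be simulated directly. This is a hand-written sketch of what the extended transducer computes, not the output of the Fig. 17 algorithm: outside a match the machine copies its input (the ?/? identity transitions); after reading np it rewrites a following vbn to vbd.

```python
def local_extension_vbn_rule(tags):
    """Simulate the locally extended 'vbn -> vbd after np' transducer."""
    out, saw_np = [], False
    for t in tags:
        if saw_np and t == 'vbn':
            out.append('vbd')    # the vbn/vbd transition of the rule
        else:
            out.append(t)        # the ?/? identity transitions
        saw_np = (t == 'np')
    return out
```

The point of local extension is precisely this: the original two-transition rule matches only at the start of the input, while the extended machine applies it at every position in a single left-to-right pass.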

  7. Construction of the Finite-State Tagger (2) • 3. Combine all transducers into one single transducer (algorithm of Elgot and Mezei) • 4. Transform the obtained transducer into an equivalent subsequential (deterministic) transducer (algorithm of Fig. 21) • Advantages • Requires n steps to tag a sentence of length n, independently of the number of rules and the length of the contexts • Eliminates the inefficiencies of Brill’s tagger
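A sketch in the spirit of composing two sequential transducers (the Elgot–Mezei idea): the composed machine tracks a pair of states and feeds the output of the first transducer through the second, so a length-n input is still processed in n composed steps. The step functions below are illustrative, not the paper's rule transducers.

```python
def compose(step1, step2, init1, init2):
    """Each step_i(q, x) returns (next_state, output_string)."""
    def run(word):
        q1, q2, out = init1, init2, []
        for x in word:                 # n steps over the input
            q1, w = step1(q1, x)
            for y in w:                # feed T1's output through T2
                q2, v = step2(q2, y)
                out.append(v)
        return ''.join(out)
    return run

# Illustrative components: T1 copies its input; T2 rewrites b to c.
copy = lambda q, x: (q, x)
b_to_c = lambda q, x: (q, 'c' if x == 'b' else x)
```

Composing all rule transducers this way ahead of time is what lets the final tagger ignore R and K at run time.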

  8. Local Extension Algorithm • [Fig. 18: an example transducer with transitions a/b, b/c, and b/d] • [Fig. 19: its local extension, whose states are labelled by subsets such as {0}, {1}, {2}, {0,1}; matching positions use the original (“transd”) transitions and non-matching positions use identity (?/?) transitions such as a/a and b/b]

  9. Determinization Algorithm • [Fig. 13: a non-deterministic transducer: from state 0, input a leads to state 1 emitting b or to state 2 emitting c; then h/h from 1 to 3, and e/e looping on 2] • [Fig. 22: its determinization: states are sets of (state, delayed output) pairs such as {(1, b), (2, c)}; on a nothing is emitted yet (a/ε), and the delayed outputs surface later as h/bh and e/ce]
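A hedged sketch of the determinization idea on the Fig. 13 example: a deterministic state is a set of (source state, delayed output) pairs; on each input symbol only the longest common prefix of the candidate outputs is emitted, and the remainders are carried forward as delays. This is the core step only, not the full Fig. 21 algorithm.

```python
import os

# Transition set encoding the Fig. 13 example transducer.
E = {(0, 'a', 'b', 1), (0, 'a', 'c', 2), (1, 'h', 'h', 3), (2, 'e', 'e', 2)}

def det_step(S, x):
    """One deterministic step from state-set S on input symbol x.

    S is a frozenset of (state, delayed_output) pairs; returns the string
    emitted now and the successor state-set.
    """
    cands = [(w + w2, q2) for (q, w) in S
             for (q1, x1, w2, q2) in E if (q1, x1) == (q, x)]
    prefix = os.path.commonprefix([w for (w, _) in cands])  # emit this now
    return prefix, frozenset((q2, w[len(prefix):]) for (w, q2) in cands)
```

Run on the example: from {(0, '')} input a emits nothing and yields {(1, 'b'), (2, 'c')}; a following h then emits bh, matching the h/bh transition of Fig. 22.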

  10. Lexical Tagger • The first step of the tagging process: looking up each word in a dictionary (Fig. 9) • To achieve high speed (Fig. 10): • represent the dictionary by a deterministic finite-state automaton (algorithm of Revuz) • Advantages • fast access: 12,000 words/second • small storage space: 742 KB (ASCII form) → 360 KB • Unknown-word tagger • the same techniques are used
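Dictionary lookup through a deterministic automaton can be sketched with a plain trie; Revuz's algorithm would additionally minimize it by merging equivalent suffixes, which is omitted here. The lexicon entries and tag lists below are illustrative.

```python
def build_trie(lexicon):
    """Build a deterministic letter automaton; '#' marks end-of-word."""
    root = {}
    for word, tags in lexicon.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['#'] = tags             # accepting state carries the tag list
    return root

def lookup(trie, word):
    """Follow one transition per letter; None if the word is unknown."""
    node = trie
    for ch in word:
        if ch not in node:
            return None
        node = node[ch]
    return node.get('#')
```

Lookup time depends only on word length, never on dictionary size, which is what makes the reported 12,000 words/second plausible.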

  11. Implementation of the Finite-State Transducer • Represented by a two-dimensional table • rows: states • columns: alphabet of all possible input letters • cell contents: the output of the transition • [Diagram: the cell at row qn, column a contains the output w]
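The table representation described above can be sketched as follows; the states, symbols, and entries are illustrative (a tiny two-state machine applying the vbn→vbd-after-np rewrite), not the paper's actual table.

```python
symbols = ['np', 'vbn', 'vbd']
col = {s: j for j, s in enumerate(symbols)}   # map symbol -> column index

# table[state][column] = (next_state, output tag)
table = [
    [(1, 'np'), (0, 'vbn'), (0, 'vbd')],   # state 0: did not just see np
    [(1, 'np'), (0, 'vbd'), (0, 'vbd')],   # state 1: just saw np, so vbn => vbd
]

def tag(tags_in):
    """One table lookup per token: constant work per input symbol."""
    state, out = 0, []
    for t in tags_in:
        state, o = table[state][col[t]]
        out.append(o)
    return out
```

With this layout each tagging step is a single array access, which explains the slide's observation that low-level storage access dominates the run time.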

  12. Evaluation • Overall performance comparison (Fig. 11) • Stochastic tagger: Church’s trigram tagger (1988) • Rule-based tagger: Brill’s tagger • All taggers were trained on the Brown corpus and used the same lexicon as in Fig. 10 • Speeds of the different parts of the finite-state tagger (Fig. 12) • Low-level factors (storage access) dominate the computation
