Conditional Random Fields

Conditional Random Fields
• A form of discriminative modelling
• Has been used successfully in various domains, such as part-of-speech tagging and other Natural Language Processing tasks
• Processes evidence bottom-up
• Combines multiple features of the data
• Builds the probability P(sequence | data)

Conditional Random Fields
• CRFs are based on the idea of Markov Random Fields
• Modelled as an undirected graph connecting labels with observations
• Observations in a CRF are not modelled as random variables
[Figure: chain of label nodes /k/ /k/ /iy/ /iy/ /iy/, each connected to an observation node X. Transition functions add associations between transitions from one label to another; state functions help determine the identity of the state.]

Conditional Random Fields
[Equation: CRF conditional probability, annotated with the callouts below]
• State feature function f([x is stop], /t/): one possible state feature function for our attributes and labels
• State feature weight λ = 10: one possible weight value for this state feature (strong)
• Transition feature function g(x, /iy/, /k/): one possible transition feature function; indicates /k/ followed by /iy/
• Transition feature weight μ = 4: one possible weight value for this transition feature
• The Hammersley-Clifford theorem states that a random field is an MRF iff it can be described in the above form
• The exponent is the sum of the clique potentials of the undirected graph
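The equation these callouts annotate is not present in this text. As a sketch only, a standard linear-chain CRF written with the symbols from the callouts (state functions f weighted by λ, transition functions g weighted by μ) would take the form below. The indices t, i, j and the normaliser Z(x) are assumed notation, not taken from the slides.

```latex
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\left( \sum_{t} \left[ \sum_{i} \lambda_i \, f_i(\mathbf{x}, y_t)
      + \sum_{j} \mu_j \, g_j(\mathbf{x}, y_t, y_{t-1}) \right] \right),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\left( \sum_{t} \left[ \sum_{i} \lambda_i \, f_i(\mathbf{x}, y'_t)
      + \sum_{j} \mu_j \, g_j(\mathbf{x}, y'_t, y'_{t-1}) \right] \right)
```

Here g takes the current label first and the previous label second, matching g(x, /iy/, /k/) for /k/ followed by /iy/.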

Conditional Random Fields
• Conceptual Overview
• Each attribute of the data we are trying to model fits into a feature function that associates the attribute and a possible label
  • A positive value if the attribute appears in the data
  • A zero value if the attribute is not in the data
• Each feature function carries a weight that gives the strength of that feature function for the proposed label
  • High positive weights indicate a good association between the feature and the proposed label
  • High negative weights indicate a negative association between the feature and the proposed label
  • Weights close to zero indicate the feature has little or no impact on the identity of the label
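The overview above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the system used in the experiments: the weights 10 and 4 are the example values from the slides, while the attribute name "vowel", the extra weights, the three-label inventory and the brute-force normalisation are all hypothetical choices made to keep the example small.

```python
import math
from itertools import product

LABELS = ["/t/", "/k/", "/iy/"]  # toy label inventory (assumption)

# (attribute, label) -> weight. Only ("stop", "/t/") = 10 comes from the slides.
STATE_WEIGHTS = {
    ("stop", "/t/"): 10.0,    # slide example: strong positive association
    ("stop", "/k/"): 10.0,    # hypothetical
    ("vowel", "/iy/"): 10.0,  # hypothetical
    ("stop", "/iy/"): -10.0,  # hypothetical negative association
}

# (current label, previous label) -> weight. ("/iy/", "/k/") = 4 is the slide example.
TRANSITION_WEIGHTS = {("/iy/", "/k/"): 4.0}


def sequence_score(frames, labels):
    """Unnormalised log-score: weighted state features plus weighted transitions."""
    score = 0.0
    for t, (attrs, label) in enumerate(zip(frames, labels)):
        for attr, value in attrs.items():
            # Missing (attribute, label) pairs behave like a weight of zero.
            score += STATE_WEIGHTS.get((attr, label), 0.0) * value
        if t > 0:
            score += TRANSITION_WEIGHTS.get((label, labels[t - 1]), 0.0)
    return score


def sequence_probability(frames, labels):
    """P(sequence | data), normalised by enumerating every label sequence."""
    z = sum(
        math.exp(sequence_score(frames, cand))
        for cand in product(LABELS, repeat=len(frames))
    )
    return math.exp(sequence_score(frames, labels)) / z


frames = [{"stop": 1.0, "vowel": 0.0}, {"stop": 0.0, "vowel": 1.0}]
best = max(product(LABELS, repeat=len(frames)),
           key=lambda cand: sequence_score(frames, cand))
print(best, sequence_probability(frames, best))
```

Enumerating every label sequence is only feasible for toy inputs; it is used here because it makes the normalisation explicit.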

Experimental Setup
• Attribute Detectors
  • ICSI QuickNet Neural Networks
• Two different types of attributes
  • Phonological feature detectors
    • Place, Manner, Voicing, Vowel Height, Backness, etc.
    • Features are grouped into eight classes, with each class having a variable number of possible values based on the IPA phonetic chart
  • Phone detectors
    • Neural network outputs based on the phone labels – one output per label
• Classifiers were applied to 2960 utterances from the TIMIT training set

Experimental Setup
• Outputs from the neural nets are themselves treated as feature functions for the observed sequence – each attribute/label combination gives us a value for one feature function
• Note that this makes the feature functions non-binary (real-valued) features
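One way to picture this construction is sketched below. The function name state_features, the attribute names and the numeric outputs are made up for illustration; only the idea that each attribute/label pair yields one real-valued feature function comes from the slide.

```python
def state_features(frame_outputs, proposed_label, labels):
    """Build one feature value per (attribute, label) combination for a frame.

    frame_outputs maps an attribute name to the neural-net output for this
    frame. A feature is active (equal to the net output) only when its label
    matches the proposed label; otherwise it is zero.
    """
    return {
        (attr, label): (value if label == proposed_label else 0.0)
        for attr, value in frame_outputs.items()
        for label in labels
    }


# Hypothetical detector outputs for a single frame.
outputs = {"voiced": 0.83, "stop": 0.12}
features = state_features(outputs, "/t/", ["/t/", "/k/", "/iy/"])
print(len(features))                 # 2 attributes x 3 labels
print(features[("voiced", "/t/")])   # carries the raw detector output
```

Because the feature value is the detector output itself rather than 0 or 1, the learned weight scales a real number, which is what makes these features non-binary.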

Experiment 1
• Goal: Implement a Conditional Random Field model on ASAT-style phonological feature data
  • Perform phone recognition
  • Compare results to those obtained via a Tandem HMM system

Experiment 1 - Results
• CRF system trained on monophones with these features achieves accuracy superior to HMM on monophones
• CRF comes close to achieving HMM triphone accuracy

Experiment 2
• Goals:
  • Apply CRF model to phone classifier data
  • Apply CRF model to combined phonological feature classifier data and phone classifier data
  • Perform phone recognition
  • Compare results to those obtained via a Tandem HMM system

Experiment 2 - Results
• Note that the Tandem HMM result is the best result obtained using only the top 39 features following a principal components analysis

Experiment 3
• Goal:
  • Previous CRF experiments used phone posteriors for the CRF, and linear outputs transformed via a Karhunen-Loeve (KL) transform for the HMM system
  • This transformation is needed to improve the HMM performance through decorrelation of inputs
  • Using the same linear outputs as the HMM system, do our results change?
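To make the decorrelation step concrete, here is a minimal two-dimensional KL transform in plain Python. It is only an illustration: the actual experiments applied the transform to the full set of network outputs, and the data points and function names below are invented. In two dimensions the eigenvectors of the covariance matrix reduce to a single rotation angle, which keeps the sketch free of external libraries.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def kl_transform_2d(points):
    """Centre 2-D points and rotate them onto their principal axes."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    a = covariance(xs, xs)
    b = covariance(xs, ys)
    c = covariance(ys, ys)
    # Rotation angle that zeroes the off-diagonal covariance term.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        ((x - mx) * cos_t + (y - my) * sin_t,
         -(x - mx) * sin_t + (y - my) * cos_t)
        for x, y in points
    ]


# Invented, strongly correlated inputs.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
rotated = kl_transform_2d(points)
us = [p[0] for p in rotated]
vs = [p[1] for p in rotated]
print(covariance(us, vs))  # close to zero after the transform
```

After the rotation the two output dimensions are uncorrelated, which is the property the HMM system relies on.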

Experiment 3 - Results
• Also shown: adding both feature sets together and giving the system supposedly redundant information leads to a gain in accuracy

Experiment 4
• Goal:
  • Previous CRF experiments did not allow for realignment of the training labels
    • Boundaries for labels provided by TIMIT hand transcribers used throughout training
    • HMM systems allowed to shift boundaries during EM learning
  • If we allow for realignment in our training process, can we improve the CRF results?
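The slides do not describe how realignment was carried out. As one plausible sketch, a Viterbi-style forced alignment keeps the transcribed label order fixed and moves only the boundaries to wherever the current model scores best. The function name realign, the per-frame log-scores and the two-label example are all assumptions made for illustration.

```python
def realign(frame_scores, label_sequence):
    """Assign each frame to one label, keeping the labels in the given order.

    frame_scores[t][label] is a per-frame log-score from the current model.
    Every label in label_sequence covers at least one consecutive frame.
    """
    n_frames, n_segs = len(frame_scores), len(label_sequence)
    neg_inf = float("-inf")
    best = [[neg_inf] * n_segs for _ in range(n_frames)]
    back = [[0] * n_segs for _ in range(n_frames)]
    best[0][0] = frame_scores[0][label_sequence[0]]
    for t in range(1, n_frames):
        for s in range(n_segs):
            stay = best[t - 1][s]                           # remain in segment s
            move = best[t - 1][s - 1] if s > 0 else neg_inf  # enter from s - 1
            top = max(stay, move)
            if top > neg_inf:
                best[t][s] = top + frame_scores[t][label_sequence[s]]
                back[t][s] = s if stay >= move else s - 1
    if best[-1][-1] == neg_inf:
        raise ValueError("fewer frames than labels; no alignment exists")
    # Trace back from the last frame of the last segment.
    alignment = [None] * n_frames
    s = n_segs - 1
    for t in range(n_frames - 1, -1, -1):
        alignment[t] = label_sequence[s]
        s = back[t][s]
    return alignment


# Invented log-scores for four frames of a /k/ /iy/ transcription.
scores = [
    {"/k/": 0.0, "/iy/": -5.0},
    {"/k/": -1.0, "/iy/": -2.0},
    {"/k/": -6.0, "/iy/": -0.5},
    {"/k/": -7.0, "/iy/": 0.0},
]
print(realign(scores, ["/k/", "/iy/"]))
```

In a training loop, the realigned frame labels would replace the hand-transcribed boundaries before the next round of weight updates.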

Experiment 4 - Results
• Allowing realignment gives accuracy results for a monophone-trained CRF that are superior to those of a triphone-trained HMM, with fewer parameters
