Understanding the Probabilistic Spell Checker Based on Noisy Channel Model
This lecture note outlines the principles of a probabilistic spell checker utilizing the Noisy Channel Model. It explores the formulation of the problem, where given a potentially misspelled word (t), the goal is to find the most probable correct word (w) that maximizes P(w|t). Key concepts include the application of Bayes' rule, the construction of a confusion matrix for capturing error patterns, and the computation of probabilities for insertion, deletion, substitution, and transposition errors. Understanding these elements is crucial for developing robust spelling correction systems.
CS621/CS449 Artificial Intelligence, Lecture Notes Set 8: 27/10/2004
Outline
• Probabilistic Spell Checker (continued from the Noisy Channel Model)
• Confusion Matrix
Probabilistic Spell Checker: Noisy Channel Model
• The spell-checker problem is formulated with the Noisy Channel Model: the correct word w = (w1, w2, ..., wn) passes through a noisy channel and comes out as the wrongly spelt word t = (t1, t2, ..., tm).
• Given t, find the most probable w: the guess at the correct word is the string ŵ for which P(w|t) is maximum, i.e. ŵ = argmax_w P(w|t).
Probabilistic Spell Checker
• Applying Bayes' rule:
  ŵ = argmax_w P(w|t) = argmax_w P(t|w) P(w) / P(t) = argmax_w P(t|w) P(w),
  since P(t) is the same for every candidate word w.
• Why apply Bayes' rule? Finding P(w|t) vs. P(t|w): estimated directly, either would have to be computed by counting whole-pair occurrences c(w,t) (or c(t,w)) and normalizing; Bayes' rule lets P(t|w) be decomposed into letter-level error probabilities instead.
• Assumptions:
  • t is obtained from w by a single error (insertion, deletion, substitution, or transposition).
  • The words consist only of letters of the alphabet.
Confusion Matrix
• A 26×26 data structure that stores the error counts c(x, y) between letters.
• There are separate matrices for insertion, deletion, substitution, and transposition errors.
• Substitution: the number of instances in which x is wrongly substituted by y in the training corpus (denoted sub(x, y)).
Confusion Matrix
• Insertion: the number of times a letter y is wrongly inserted after x (denoted ins(x, y)).
• Transposition: the number of times xy is wrongly transposed to yx (denoted trans(x, y)).
• Deletion: the number of times y is wrongly deleted after x (denoted del(x, y)).
Confusion Matrix
• If x and y are letters of the alphabet:
  • sub(x, y) = # times y is written for x (substitution)
  • ins(x, y) = # times x is written as xy
  • del(x, y) = # times xy is written as x
  • trans(x, y) = # times xy is written as yx
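The four counters can be filled from a training corpus of (misspelling, correct word) pairs, each differing by exactly one error as the lecture assumes. A minimal sketch (the pair-classification logic and the '#' start-of-word marker are my own illustrative choices, not from the slides):

```python
from collections import Counter

sub = Counter()    # sub[(x, y)]: y written where x was intended
ins = Counter()    # ins[(x, y)]: y wrongly inserted after x
dele = Counter()   # del[(x, y)]; named 'dele' since 'del' is a keyword
trans = Counter()  # trans[(x, y)]: intended xy written as yx

def record_error(t, w):
    """Classify the single error turning correct w into typed t, and count it."""
    if len(t) == len(w):
        # Same length: substitution or adjacent transposition.
        diffs = [i for i, (a, b) in enumerate(zip(w, t)) if a != b]
        if len(diffs) == 1:
            i = diffs[0]
            sub[(w[i], t[i])] += 1
        elif (len(diffs) == 2 and diffs[1] == diffs[0] + 1
              and w[diffs[0]] == t[diffs[1]] and w[diffs[1]] == t[diffs[0]]):
            trans[(w[diffs[0]], w[diffs[1]])] += 1
    elif len(t) == len(w) + 1:
        # t has one extra letter: insertion.
        i = next((k for k in range(len(w)) if t[k] != w[k]), len(w))
        prev = t[i - 1] if i > 0 else '#'   # '#' marks the word start
        ins[(prev, t[i])] += 1
    elif len(t) == len(w) - 1:
        # t is one letter short: deletion.
        i = next((k for k in range(len(t)) if t[k] != w[k]), len(t))
        prev = w[i - 1] if i > 0 else '#'
        dele[(prev, w[i])] += 1

record_error("aple", "apple")  # a 'p' deleted after 'p'
record_error("teh", "the")     # 'he' transposed to 'eh'
```

The x in ins(x, y) and del(x, y) is the letter preceding the error, which is why a context character is recorded even for errors at the start of a word.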
Probabilities
• P(t|w) = P(t|w)S + P(t|w)I + P(t|w)D + P(t|w)X, where
  P(t|w)S = sub(x, y) / count of x
  P(t|w)I = ins(x, y) / count of x
  P(t|w)D = del(x, y) / count of x
  P(t|w)X = trans(x, y) / count of x
  and x, y are the letters involved in the error.
• The four error types are treated as mutually exclusive events; under the single-error assumption, at most one term is non-zero for a given (t, w) pair.
Example
• The correct document contains the ws; the wrong document contains the ts.
• Direct estimate: P(maple|aple) = #(maple was intended when aple was typed) / #(aple).
• P(apple|aple) and P(applet|aple) are calculated similarly.
• Such word-pair counts are tiny, so the direct estimate suffers from data sparsity.
• Hence, apply Bayes' rule and estimate P(t|w) from letter-level confusion counts instead.
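The sparsity problem on this slide can be made concrete with a toy direct estimate (all counts below are invented for illustration):

```python
# Direct pair-level MLE of P(w|t): #(w intended when t was typed) / #(t typed).
pair_count = {("aple", "apple"): 3, ("aple", "maple"): 1}  # hypothetical counts
t_count = {"aple": 4}

def p_w_given_t(w, t):
    """Direct estimate of P(w|t) from whole-pair counts."""
    return pair_count.get((t, w), 0) / t_count[t]

# Any (t, w) pair never seen in the corpus gets probability exactly zero,
# e.g. p_w_given_t("applet", "aple"), even though 'applet' is a plausible
# correction. Decomposing via Bayes into P(t|w) (letter-level confusion
# counts) and P(w) (unigram counts) sidesteps this sparsity.
```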