
CHARM





  1. CHARM Lecture 1: Outline of the Problem

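  2. The Problem 1: The Maltese Alphabet
     Each letter is followed by its name in parentheses.
     A a (a), B b (be), Ċ ċ (ċe), D d (de), E e (e), F f (ef), Ġ ġ (ġe), G g (ge), Għ għ (ajn), H h (akka),
     Ħ ħ (ħe), I i (i), Ie ie (ie), J j (je), K k (ke), L l (elle), M m (emme), N n (enne), O o (o), P p (pe),
     Q q (qe), R r (erre), S s (esse), T t (te), U u (u), V v (ve), W w (we), X x (exxe), Ż ż (że), Z z (zej)
     We will refer to ordinary characters that could yield special Maltese characters (for example c, g, h and z, which could yield ċ, ġ, ħ and ż) as charms.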

  3. The Problem 2: sample text from KullĦadd, printed without the special Maltese characters
     FIL-KRIZI li ghandna fit-turizmu fil-gzejjer taghna l-aghar li qed jintlaqtu huma l-lukandi tal tliet stilel. L-ahhar studju li sar mid-Deloitte ghall-Assocjazzjoni Maltija tal-Lukandi u Ristoranti jghidilna kif in-nuqqas tal turisti u z-zieda fl-ispejjez ghal dawn il-lukandi fissru li ghamlu telf tal 19.8% fir-rata tal qliegh taghhom u fosthom kien hemm min salva biss anki fl-aqwa tas-sajf permezz tal l-istudenti. L-istess studju juri li 70% tas-sidien tal dawn il-lukandi jibzghu li se jkomplu jbatu min-nuqqas tal turisti u se jkollhom hafna kmamar vojta fix-xhur li gejjin.

  4. The Problem 3
     Is there some way in which we can recover the special Maltese characters automatically? If so:
     • What is the underlying algorithmic model?
     • What knowledge must the programme bring to bear?
     • What resources are needed to build the knowledge base?

  5. Noisy Channel Model for Sentence Translation (Brown et al. 1990)
     [Diagram labelled "source sentence" and "target sentence"; diagram from Jurafsky & Martin]

  6. Algorithmic Model
     • The noisy channel model is domain independent.
     • Brown et al. applied it to the domain of translation from a source language to a target language.
     • We can use it for the domain of words.

  7. Noisy Channel at Word Level
     KullĦadd (source) -> NOISY CHANNEL -> KullHadd (target)
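The word-level picture can be written as the usual noisy channel decision rule. The derivation below is a sketch rather than something stated on the slides: the symbols follow the later slides (t for the observed target word, s for a candidate source word, S for the candidate set), and the channel probability P(t | s) is our notation.

```latex
\hat{s} = \arg\max_{s \in S} P(s \mid t)
        = \arg\max_{s \in S} \frac{P(t \mid s)\, P(s)}{P(t)}
        = \arg\max_{s \in S} P(t \mid s)\, P(s)
```

If we further assume that every candidate in S is equally likely to be corrupted into t, so that P(t | s) is the same for all s in S, the rule reduces to choosing the s with the largest P(s), which matches the form used in Step 3.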

  8. Main Algorithm: Four Steps
     • See target word t.
     • Generate the set S of all possible source words for t.
     • Pick the most probable source word s in S.
     • Output s.

  9. Step 1: See Target Word
     • Preprocessing: noise, case, punctuation, hyphen
     • Tokenisation: words, numbers, other
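The tokenisation part of Step 1 could be sketched roughly as follows. This is a minimal illustration, not the lecture's implementation: the function name `tokenise`, the regular expression, and the decision to emit hyphens and punctuation as separate "other" tokens are all assumptions made here.

```python
import re

# Hypothetical token classes following the slide: words, numbers, other.
# Alternatives are tried left to right, so numbers are matched before words.
TOKEN_RE = re.compile(
    r"(?P<number>\d+(?:[.,]\d+)*%?)"  # 70, 19.8, 19.8%
    r"|(?P<word>[^\W\d_]+)"           # runs of letters, including ċ ġ ħ ż
    r"|(?P<other>\S)"                 # any other non-space character
)

def tokenise(text):
    """Return (kind, token) pairs for the text; whitespace is dropped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

tokens = tokenise("telf tal 19.8% fir-rata")
```

Under these assumptions a hyphenated form such as fir-rata becomes three tokens (word, other, word), so only the word tokens need to be passed on to Step 2. Keeping the original case of each token is one way to deal with the "case" item, since upper-case charms such as H also need restoring.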

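  10. Step 2: Generate S
     If t contains charms, generate
     S = { s | len(s) = len(t) and, for all i with 0 < i <= len(t), s[i] = t[i] or s[i] = m(t[i]) }
     where m maps a charm to the Maltese character it could yield. If t contains no charms, S = { t }.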
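The set definition in Step 2 can be illustrated with a short sketch. The mapping `CHARM_MAP` stands in for m and is an assumption made here (c, g, h, z and their capitals mapped to ċ, ġ, ħ, ż); the name `generate_sources` is likewise hypothetical.

```python
from itertools import product

# Assumed charm mapping m: ordinary character -> special Maltese character.
CHARM_MAP = {
    "c": "ċ", "g": "ġ", "h": "ħ", "z": "ż",
    "C": "Ċ", "G": "Ġ", "H": "Ħ", "Z": "Ż",
}

def generate_sources(t):
    """Return the set S of candidate source words for target word t.

    At each position the candidate keeps t[i] or, if t[i] is a charm,
    may use m(t[i]) instead.
    """
    options = [(ch, CHARM_MAP[ch]) if ch in CHARM_MAP else (ch,) for ch in t]
    return {"".join(chars) for chars in product(*options)}

candidates = generate_sources("KullHadd")
```

For KullHadd the only charm is H, so S contains KullHadd and KullĦadd. In general a word with k charms yields 2^k candidates, which is why Step 3 is needed to choose among them.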

  11. Step 3
     • Pick the most probable source word s in S: return argmax(P(s)) for s in S
     • This is covered in lecture 2
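Until lecture 2 supplies the actual estimate of P(s), the argmax in Step 3 can be illustrated with a placeholder. The sketch below assumes P(s) is the relative frequency of s in some correctly spelled word list; the function name `pick_source` and the toy word list are invented purely for illustration and are not data from the lecture.

```python
from collections import Counter

def pick_source(candidates, counts):
    """Return the candidate with the highest relative frequency in counts.

    Candidates are sorted first so that ties are broken deterministically.
    """
    total = sum(counts.values()) or 1  # avoid division by zero on empty counts

    def prob(s):
        return counts[s] / total  # a Counter returns 0 for unseen words

    return max(sorted(candidates), key=prob)

# Invented toy word list, standing in for a real corpus of Maltese text.
toy_counts = Counter(["għandna", "għandna", "tagħna"])
best = pick_source({"ghandna", "għandna", "ġhandna", "ġħandna"}, toy_counts)
```

With this toy list the candidate għandna is selected because it is the only one that occurs at all. A real system would need a much larger source of correctly spelled text and some way of handling candidates that were never observed.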
