Spl chkng cmc txt (Spell Checking CMC text)


  1. Spl chkng cmc txt (Spell Checking CMC text)
     Christopher Johnson

  2. Introduction
     • What is Computer Mediated Communication (CMC)?
       • Short Message Service
       • Blogs (Twitter)
       • E-mail
       • Instant messages
     • Observed language during such communication
       • Lo (Microsoft Messenger)
       • Happy bday hpe u hv a gd day x (SMS)
       • Awe! Ur so welcome! Sorry I was so sleepy! Lol (Twitter)

  3. What is the problem?
     • Most people are in contact with some form of CMC
       • Children
       • Adults
     • People can hide behind any persona they create for themselves
       • For example, paedophiles
         • Lure children by pretending to be other children

  4. How can we solve it?
     • A human reading every message? No
       • Would this suffice anyway?
     • Autonomous processing of messages? Yes
       • Or at least, it is the most appropriate way.

  5. How can we do that?
     • We need an understanding of the messages
       • SMS
       • Blogs
       • E-mails
       • And others
     • We know that abbreviations are used
       • But how can we expand these abbreviations back to standard text?
     • What about misspellings?
     • How do we get a large real-world corpus to train and test on?
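
One simple way to picture the abbreviation-expansion question is a token-by-token dictionary lookup. The sketch below is only an illustrative baseline, not the method the presentation proposes: the `ABBREVIATIONS` table and the `expand` function are hypothetical names, and the entries are hand-picked from the example messages rather than taken from a real corpus.

```python
import re

# Hypothetical lookup table; the entries are illustrative assumptions only.
ABBREVIATIONS = {
    "bday": "birthday",
    "hpe": "hope",
    "u": "you",
    "hv": "have",
    "gd": "good",
    "ur": "you're",  # could equally mean "your"; a lookup cannot tell
}

def expand(message: str) -> str:
    """Replace each known abbreviation with its expansion, word by word."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Unknown words pass through unchanged.
        return ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", replace, message)

print(expand("Happy bday hpe u hv a gd day x"))
# Happy birthday hope you have a good day x
```

A plain lookup cannot resolve ambiguous forms such as "ur", and it leaves unseen abbreviations and misspellings untouched, which is why context and a large corpus matter here.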

  6. What tools already exist?
     • VARD
     • NLP techniques
       • N-grams (this project will use bigrams and trigrams)
     • Phonetic algorithms
       • Soundex
       • Metaphone
     • These tools are commonly used for spell checkers
       • But how well do these apply to CMC?
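
To make two of the listed tools concrete, here is a minimal sketch of American Soundex coding and of n-gram extraction. It is a from-scratch illustration under the standard Soundex rules, not code from VARD or from the project; the function names `soundex` and `ngrams` are chosen here for the example.

```python
def soundex(word: str) -> str:
    """Return the four-character American Soundex code for a word."""
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""

    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad with zeros or truncate to 4 characters

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(soundex("tomorrow"), soundex("tmrw"))  # T560 T560
print(ngrams(["hope", "you", "have", "a"], 2))
# [('hope', 'you'), ('you', 'have'), ('have', 'a')]
```

Because Soundex ignores vowels after the first letter, vowel-dropping abbreviations such as "tmrw" or "gd" often receive the same code as their full forms, which hints at why phonetic algorithms may carry over to CMC text.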

  7. Proposed Plan
     • Research current techniques which could be applicable
     • Create a large corpus of CMC text
     • Improve techniques for very similar languages
       • (English CMC and CMC)
     • Create a system which can distinguish between CMC text and unabridged text
       • Test the system's success rate.
     • Convert CMC to unabridged text
       • (Ambitious, therefore only if time permits)
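
As a rough picture of what a CMC detector could look like, the sketch below flags a message when a large share of its words fall outside a dictionary. This is an assumed baseline, not the system the plan describes: `VOCABULARY` is a tiny hypothetical stand-in for a real word list, and the `threshold` of 0.3 is an arbitrary illustrative value.

```python
import re

# Hypothetical stand-in for a full dictionary word list.
VOCABULARY = {"happy", "birthday", "hope", "you", "have", "a", "good", "day"}

def oov_ratio(message: str, vocabulary=VOCABULARY) -> float:
    """Fraction of word tokens that are not in the vocabulary."""
    tokens = re.findall(r"[a-z']+", message.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return unknown / len(tokens)

def looks_like_cmc(message: str, threshold: float = 0.3) -> bool:
    """Flag a message as CMC when too many tokens are out of vocabulary."""
    return oov_ratio(message) > threshold

print(looks_like_cmc("Happy bday hpe u hv a gd day x"))            # True
print(looks_like_cmc("Happy birthday, hope you have a good day"))  # False
```

Testing the success rate would then amount to running such a function over labelled messages from the corpus and counting correct decisions.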

  8. References
     • The Real World - National Education Association Health Information Network
       http://bnetsavvy.org/wp/a-teen-talks-about-texting-and-what-parentseducators-need-to-know-about-it/
     • About VARD 2 - Baron, Alistair
       http://www.comp.lancs.ac.uk/~barona/vard2/
     • Lawrence Philips' Metaphone Algorithm - Atkinson, Kevin
       http://aspell.net/metaphone/
     • The Soundex Indexing System - The National Archives
       http://www.archives.gov/publications/general-info-leaflets/55.html
