
Measurement and Classification of Humans and Bots in Internet Chat

By Steven Gianvecchio, Mengjun Xie, Zhenyu Wu, and Haining Wang, College of William and Mary.


Presentation Transcript


  1. Measurement and Classification of Humans and Bots in Internet Chat By Steven Gianvecchio, Mengjun Xie, Zhenyu Wu, and Haining Wang College of William and Mary

  2. Outline • Background • Measurement • Classification System • Experimental Evaluation • Conclusion

  3. Outline • Background • Measurement • Classification System • Experimental Evaluation • Conclusion

  4. Bots • Bots - programs that automate human tasks • web bots automate browsing the web • chat bots automate online chat • can be harmful and/or helpful

  5. Chat Bots vs. BotNets • BotNets – networks of compromised machines • some use chat systems (IRC) for C&C, others use P2P, HTTP, etc. • abuse various systems • Chat Bots – automated chat programs • some are helpful, e.g., chat loggers • can abuse chat systems and their users

  6. The Chat Bot Problem • The Problem – chat bots abuse chat services (e.g., AOL, Yahoo!, MSN) • send spam • spread malicious software • mount phishing attacks • Our focus is on the Yahoo! chat system

  7. A Typical Chat
      Alice12 entered the room.
      Alice12: Hi room.
      Bob34: hi alice
      Susie88: any guys want to let a cute girl move in with them! hehe
      Alice12: What’s up?
      Bob34: not much
      Susie88: can you guys see me on my web-cam?? (its in my profile)

  8. Yahoo! Chat • Yahoo! chat is a large commercial chat service • over 3,000 chat rooms • [diagram labels: AUTH, CHAT, IM, …]

  9. Yahoo! Chat • Yahoo! chat system • client connects to a server • servers relay messages to/from clients

  10. Outline • Background • Measurement • Classification System • Experimental Evaluation • Conclusion

  11. Measurement • August-November 2007 – we collect data • August 2007 – Yahoo! adds CAPTCHA • must pass to join a chat room • protocol update, prevents some 3rd party clients from accessing chat • October 2007 – bots are back • some bots return before 3rd party clients

  12. Measurement • September and October 2007 • very few chat bots • August and November 2007 • many chat bots • 1,440 hours of chat logs • 147 chat logs • 21 chat rooms

  13. Measurement • To create our dataset, we read and label the chat users as • human, bot, or ambiguous • In total, we recognized 14 different types of chat bots • different triggering mechanisms • different text generation techniques

  14. Triggering Mechanisms • Timer-Based • periodic timers, e.g., 40 seconds • random timers, e.g., 45-125 seconds • Response-Based • responds to other users
      Example (response-based):
      Sam77: Bob12, you’re just full of questions, aren’t you?
      Sam77: Bob12, lots of evidence for evolution can be found here http://
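The two timer-based triggers can be illustrated with a small simulation. This is a minimal sketch, assuming delays are whole seconds and that the random timer draws uniformly from its range; the function names and the seed are hypothetical and are not taken from any bot observed in the study.

```python
import random

def periodic_delays(period, count):
    """Inter-message delays for a periodic-timer bot: the same value every time."""
    return [period] * count

def random_delays(low, high, count, rng):
    """Inter-message delays for a random-timer bot: uniform integers in [low, high]."""
    return [rng.randint(low, high) for _ in range(count)]

rng = random.Random(0)  # fixed seed, chosen only so the sketch is repeatable

periodic = periodic_delays(40, 5)            # slide example: 40 seconds
randomized = random_delays(45, 125, 5, rng)  # slide example: 45-125 seconds

print(periodic)
print(randomized)
```

A periodic bot yields a single repeated delay, while a random-timer bot spreads its delays across the configured range; both are far more regular than human timing, which is what the later measurement slides examine.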

  15. Text Generation • Character Padding, e.g., "Fiona88: anyone boredjn wanna chat?uklcss" • Synonym Phrases, e.g., "Marjorie99: Hi Babes! Marjorie Here! Inspect My Site" and "Marjorie99: Mmmm Folks! Im Marjorie! View My Webpage" • Odd Line or Word Spacing • Message Replay

  16. Types of Chat Bots • Periodic Bots – send messages based on periodic timers • Random Bots – send messages based on random timers • Responder Bots – respond to messages of other users • Replay Bots – replay messages of other users

  17. Humans • inter-message delay – evidence of heavy tail • message size – well fit by Exponential (λ=0.034)
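To make the reported fit concrete: an Exponential distribution with rate λ has mean 1/λ, so λ=0.034 corresponds to a mean message size of roughly 29.4. The sketch below checks this by sampling. It treats message size as continuous, which is an approximation, and the helper names and seed are hypothetical.

```python
import math
import random

LAMBDA = 0.034  # rate reported on the slide for human message sizes

def mean_size(rate):
    """Mean of an Exponential(rate) distribution."""
    return 1.0 / rate

def tail_probability(rate, size):
    """P(message size > size) under Exponential(rate)."""
    return math.exp(-rate * size)

rng = random.Random(1)  # fixed seed, for repeatability only
samples = [rng.expovariate(LAMBDA) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print(f"theoretical mean: {mean_size(LAMBDA):.1f}")
print(f"sample mean:      {sample_mean:.1f}")
print(f"P(size > 100):    {tail_probability(LAMBDA, 100):.3f}")
```

Under this fit, long messages are rare but not impossible: the probability of a message larger than 100 is about 3%.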

  18. Periodic Bots • inter-message delay – several clusters with high probabilities • message size – messages built from templates approximate a normal distribution

  19. Random Bots • inter-message delay – Equilikely distribution at 40, 64, and 88; Uniform distribution 45-125 • message size – messages selected from a small database

  20. Responder Bots • inter-message delay – human-like timing • message size – multiple templates of different lengths

  21. Replay Bots • inter-message delay – cluster with high probabilities (replay bots are periodic) • message size – human-like size, well fit by Exponential (λ=0.028)

  22. Outline • Background • Measurement • Classification System • Experimental Evaluation • Conclusion

  23. Classification System • Entropy Classifier • detects abnormal behavior • based on message sizes and inter-message delays • accurate but slow • Machine Learning Classifier • detects “learned” patterns • based on message content • fast but must be trained

  24. Entropy Classifier • Observation – chat bots are less complex than humans, and thus, lower in entropy • exploits the low entropy of chat bots • Entropy Test (EN) – estimates first-order entropy • Corrected Conditional Entropy Test (CCE) – estimates higher-order entropy
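The two tests can be sketched as follows. The slides do not give formulas, so this uses one common formulation of corrected conditional entropy as an assumption: for pattern length m, CCE(m) = H(m) − H(m−1) + perc(m)·EN(1), where perc(m) is the fraction of length-m patterns that occur only once, and the reported value is the minimum over m. The input is assumed to be already binned into discrete symbols; the function names and `max_m` default are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(symbols):
    """First-order Shannon entropy (bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def patterns(seq, m):
    """All consecutive length-m patterns in seq, as tuples."""
    return [tuple(seq[i:i + m]) for i in range(len(seq) - m + 1)]

def corrected_conditional_entropy(seq, max_m=5):
    """Minimum over m of [H(m) - H(m-1)] + perc(m) * EN(1) (assumed formulation)."""
    en1 = entropy(seq)
    prev_h = 0.0  # H(0) is taken as 0, so CE(1) equals the first-order entropy
    best = float("inf")
    for m in range(1, max_m + 1):
        pats = patterns(seq, m)
        if not pats:
            break
        h = entropy(pats)
        counts = Counter(pats)
        unique = sum(1 for c in counts.values() if c == 1)
        perc = unique / len(pats)  # share of patterns seen only once
        best = min(best, (h - prev_h) + perc * en1)
        prev_h = h
    return best

# A strictly alternating sequence (bot-like) versus a random one (more human-like).
periodic_seq = [1, 2] * 50
rng = random.Random(0)
random_seq = [rng.randrange(5) for _ in range(500)]

en_periodic = entropy(periodic_seq)
cce_periodic = corrected_conditional_entropy(periodic_seq)
cce_random = corrected_conditional_entropy(random_seq)
print(en_periodic, cce_periodic, cce_random)
```

The alternating sequence has a first-order entropy of 1 bit, yet its corrected conditional entropy is near zero because each symbol is fully determined by the previous one. This is the kind of higher-order regularity that a first-order test alone would miss.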

  25. Machine Learning Classifier • Observation - chat spam like email spam is a text classification problem • exploits message content of chat bots • CRM114 • a powerful text classification system • several built-in classifiers: HMM, KNN/Hyperspace, OSB, SVM, Winnow, etc. • we use OSB
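For intuition about what OSB (orthogonal sparse bigrams) features look like, here is a simplified sketch. It pairs each token with each of the preceding tokens inside a sliding window, marking the gap with skip placeholders. This is not CRM114's implementation: the whitespace tokenization, the `<skip>` marker text, the default window of 5, and the function name are assumptions made for illustration, and feature hashing is omitted.

```python
def osb_features(tokens, window=5):
    """Pair each token with each earlier token in the window, marking gaps with <skip>."""
    features = []
    for i, current in enumerate(tokens):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break  # no more earlier tokens to pair with
            skips = " <skip>" * (distance - 1)
            features.append(f"{tokens[j]}{skips} {current}")
    return features

# Tokens from one of the synonym-phrase bot messages shown earlier in the deck.
message = "Hi Babes! Marjorie Here! Inspect My Site"
for feature in osb_features(message.split()):
    print(feature)
```

Because the features record word pairs at several distances, template-based bot messages that swap in synonyms still share many features with one another.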

  26. Hybrid Classification System • entropy classifier builds and maintains the bot corpus • machine learning classifier uses the bot and human corpora • [diagram labels: INPUT, ENTROPY CLASSIFIER, MACHINE LEARNING CLASSIFIER, BOT CORPUS, HUMAN CORPUS, CLASSIFY AS CHAT BOT, CLASSIFY AS HUMAN]
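The division of labor between the two classifiers can be sketched in code. Everything here is a hypothetical stand-in: the `HybridClassifier` class, the toy entropy stage (which flags users whose message sizes barely vary), and the token-overlap content stage are assumptions chosen to show the data flow, not the authors' classifiers or CRM114.

```python
from collections import Counter

def low_size_variety(messages):
    """Toy stand-in for the entropy stage: flag users whose message sizes barely vary."""
    return len({len(m) for m in messages}) <= 2

class HybridClassifier:
    def __init__(self, entropy_is_bot, human_messages):
        self.entropy_is_bot = entropy_is_bot  # slower stage, needs many messages
        self.bot_corpus = Counter()           # built and maintained by the entropy stage
        self.human_corpus = Counter(
            tok for msg in human_messages for tok in msg.split()
        )

    def learn_from_entropy(self, messages):
        """Run the entropy stage; on a bot verdict, add the messages to the bot corpus."""
        verdict = self.entropy_is_bot(messages)
        if verdict:
            for msg in messages:
                self.bot_corpus.update(msg.split())
        return verdict

    def ml_is_bot(self, messages):
        """Toy stand-in for the content stage: compare token overlap with each corpus."""
        tokens = [t for m in messages for t in m.split()]
        bot_hits = sum(1 for t in tokens if t in self.bot_corpus)
        human_hits = sum(1 for t in tokens if t in self.human_corpus)
        return bot_hits > human_hits

clf = HybridClassifier(low_size_variety, ["hi alice", "not much"])
flagged = clf.learn_from_entropy(["View My Webpage"] * 10)
known_bot = clf.ml_is_bot(["View My Webpage now"])
human = clf.ml_is_bot(["hi alice"])
print(flagged, known_bot, human)
```

The point of the arrangement is that the entropy stage needs no training and can label previously unseen bots, while the content stage, once fed those labels, can recognize the same bots again from far fewer messages.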

  27. Outline • Background • Measurement • Classification System • Experimental Evaluation • Conclusion

  28. Experimental Evaluation • Types of Chat Bots • Periodic Bots • Random Bots • Responder Bots • Replay Bots • Classifiers • entropy classifier – 100 messages • machine learning classifier – 25 messages

  29. Experimental Evaluation • Classification Tests • Ent – entropy classifier • SupML – fully-supervised ML classifier, trained on AUG BOTS • SupMLre – fully-supervised ML classifier, retrained on NOV BOTS • EntML – entropy-trained ML

  30. Entropy Classifier • EN – entropy • CCE – corrected conditional entropy • (imd) – inter-message delay • (ms) – message size

  31. EN(imd) and CCE(imd) • problems against responder bots • detect most other chat bots

  32. EN(ms) and CCE(ms) • problems against random and replay bots • detect most other chat bots

  33. OVERALL • detects all chat bots • false positive rate is ~0.01 • 100 messages

  34. Entropy and Machine Learning Classifiers • Ent – entropy classifier (from last slide) • SupML – fully-supervised machine learning • SupMLre – SupML retrained • EntML – entropy-trained machine learning

  35. Ent • OVERALL results from previous slide

  36. SupML • has problems against November bots • needs to be retrained for new bots • SupMLre • detects all bots

  37. EntML • false positive rate is ~0.0005 (Ent is ~0.01) • 25 messages

  38. Outline • Background • Measurement • Classification System • Experimental Evaluation • Conclusion

  39. Conclusion • Measurements • overall, chat bots are less complex than humans • some chat bots more human-like • Classification System • exploits benefits of both classifiers • quickly classifies known chat bots • accurately classifies unknown chat bots

  40. Conclusion (cont.) • Future Work • investigate more advanced chat bots • explore applications of entropy on other forms of bots (e.g., web bots) • explore other applications of entropy (e.g., detecting covert timing channels)

  41. Questions? Thank You!
