
SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS

Chen LIN *, Jiang-Ming YANG +, Rui CAI +, Xin-jing WANG +, Wei WANG *, Lei ZHANG +
* Fudan University, + Microsoft Research Asia


Presentation Transcript


  1. SIMULTANEOUSLY MODELING SEMANTICS AND STRUCTURE OF THREADED DISCUSSIONS: A SPARSE CODING APPROACH AND ITS APPLICATIONS Chen LIN *, Jiang-Ming YANG +, Rui CAI +, Xin-jing WANG +, Wei WANG *, Lei ZHANG + *Fudan University +Microsoft Research Asia

  2. OUTLINE • Motivation • Challenges • Model • Applications • Reply reconstruction • Junk post detection • Expert finding • Experiments • Conclusion

  3. THREADED DISCUSSIONS • Chat rooms • IMs • Mailing lists • Web forums (figure: a root post and its replies)

  4. IMPORTANT DATA SOURCES

  5. MINING SEMANTICS & STRUCTURE • Junk identification • Expert search • Measuring post quality • …

  6. CHALLENGES • Semantics & structure • Junk posts • Post quality

  7. SEMANTICS & STRUCTURE • Semantics: topics • Structure: who replies to whom

  8. CHALLENGES • Semantics & structure • Junk posts • Post quality

  9. JUNK POST

  10. CHALLENGES • Semantics & structure • Junk posts • Post quality

  11. POST QUALITY valuable post

  12. MODEL • Purpose: simultaneously model • semantics • structure • Methodology • intuitive • matrix-based • sparse coding (figure: a root post and its replies)

  13. INTUITION

  14. A THREAD HAS SEVERAL TOPICS

  15. SEMANTIC REPRESENTATION OF A THREAD • Project posts into the topic space (D, X, Θ) • Minimize:
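The minimization on this slide is not preserved in the transcript. A minimal sketch, assuming D is a term-by-post matrix, X a term-by-topic dictionary, Θ a topic-by-post weight matrix, and a squared (Frobenius) reconstruction error, measures how well X and Θ reproduce D. The helper names `matmul` and `reconstruction_loss` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A @ B."""
    inner = len(B)
    assert all(len(row) == inner for row in A), "inner dimensions must agree"
    return [[sum(A[r][k] * B[k][c] for k in range(inner)) for c in range(len(B[0]))]
            for r in range(len(A))]

def reconstruction_loss(D: Matrix, X: Matrix, Theta: Matrix) -> float:
    """Squared Frobenius norm ||D - X Theta||_F^2 (assumed loss, not from the slide)."""
    approx = matmul(X, Theta)
    return sum((D[r][c] - approx[r][c]) ** 2
               for r in range(len(D)) for c in range(len(D[0])))

# Toy thread: 3 terms, 2 topics, 2 posts (one column per post).
X = [[1.0, 0.0],   # term 0 belongs to topic 0
     [1.0, 0.0],   # term 1 belongs to topic 0
     [0.0, 1.0]]   # term 2 belongs to topic 1
Theta = [[1.0, 0.0],   # post 0 is about topic 0
         [0.0, 2.0]]   # post 1 is about topic 1
D = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 2.0]]
print(reconstruction_loss(D, X, Theta))  # 0.0: the toy D is reproduced exactly
```

A post whose words no topic covers leaves a nonzero residual, which is what the minimization pushes down.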

  16. A POST IS RELATED TO PREVIOUS POSTS • Approximate each post as a linear combination of previous posts (Θ, b) • Minimize:
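A sketch of this structural term, assuming column j of Θ is post j's topic vector and b_ij weights how much earlier post i explains post j, so the error is the sum over j of ‖θ_j − Σ_{i<j} b_ij θ_i‖². The function name `structure_loss` is hypothetical, and the root post is skipped here because nothing precedes it.

```python
from typing import List

Matrix = List[List[float]]

def structure_loss(Theta: Matrix, B: Matrix) -> float:
    """Sum over posts j >= 1 of ||theta_j - sum_{i<j} B[i][j] * theta_i||^2.

    Theta is topics x posts (column j is post j's topic vector). B is
    posts x posts; only entries with i < j are read, so a post can only be
    explained by posts written before it. Post 0 (the root) is skipped.
    """
    n_topics, n_posts = len(Theta), len(Theta[0])
    total = 0.0
    for j in range(1, n_posts):
        for t in range(n_topics):
            approx = sum(B[i][j] * Theta[t][i] for i in range(j))
            total += (Theta[t][j] - approx) ** 2
    return total

# Toy thread: post 0 (root) and post 1 are on topic 0, post 2 is on topic 1.
Theta = [[1.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
# Post 1 is fully explained by the root; post 2 by no earlier post.
B = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
print(structure_loss(Theta, B))  # 1.0, all of it from post 2
```

Reading only the upper triangle of B encodes the rule that a post can reply only to something already written.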

  17. A POST IS RELATED TO A FEW TOPICS (example topics: government, cobol)

  18. SPARSE SEMANTICS OF A POST (D, X, Θ) • Minimize:
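The slide's sparsity formula is not preserved. One standard way to make a post use only a few topics is an L1 penalty on its weights; the proximal operator of that penalty, soft-thresholding, shrinks every weight and sets the small ones to exactly zero. This sketch uses hypothetical helper names and an arbitrary λ = 0.25 to show the effect on one post.

```python
from typing import List

def l1_penalty(theta: List[float], lam: float) -> float:
    """lam * ||theta||_1: grows with the total magnitude of a post's weights."""
    return lam * sum(abs(v) for v in theta)

def soft_threshold(theta: List[float], lam: float) -> List[float]:
    """Proximal operator of lam * ||.||_1: move each weight toward zero by lam,
    and set weights with magnitude <= lam to exactly zero."""
    out = []
    for v in theta:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out

weights = [0.75, 0.125, -0.5]          # one post's weights over three topics
print(soft_threshold(weights, 0.25))  # [0.5, 0.0, -0.25]
print(l1_penalty(weights, 0.25))      # 0.34375
```

The weak topic (0.125) drops out entirely, which is the "a post is related to a few topics" behavior the slide describes.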

  19. A POST IS RELATED TO A FEW POSTS • Approximate each post as a sparse linear combination of previous posts (Θ, b) • Minimize:
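Assuming a squared-error fit plus an L1 penalty on b, sparse reply weights for one post can be found with ISTA (iterative soft-thresholding). ISTA is a generic L1 solver swapped in here for illustration, not the optimizer stated on the slides. The function `sparse_reply_weights`, the λ value and the toy topic vectors are hypothetical.

```python
from typing import List

def soft_threshold(v: float, lam: float) -> float:
    """Shrink v toward zero by lam; values within [-lam, lam] become 0."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def sparse_reply_weights(prev: List[List[float]], target: List[float],
                         lam: float, iters: int = 500) -> List[float]:
    """ISTA for min_b 0.5*||target - sum_i b[i]*prev[i]||^2 + lam*||b||_1.

    prev holds the topic vectors of the earlier posts, target is the topic
    vector of the current post; the result has one weight per earlier post.
    """
    n, dim = len(prev), len(target)
    # Step 1/L, where L = sum of squared norms bounds the largest eigenvalue
    # of the Gram matrix (the Lipschitz constant of the gradient).
    lipschitz = sum(x * x for p in prev for x in p) or 1.0
    step = 1.0 / lipschitz
    b = [0.0] * n
    for _ in range(iters):
        # Residual of the current linear combination of earlier posts.
        resid = [target[t] - sum(b[i] * prev[i][t] for i in range(n))
                 for t in range(dim)]
        # Gradient step on the squared error, then the L1 proximal step.
        b = [soft_threshold(b[i] + step * sum(prev[i][t] * resid[t] for t in range(dim)),
                            step * lam)
             for i in range(n)]
    return b

# Earlier posts: root on topic 0, a reply on topic 1, another mostly on topic 0.
prev = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
target = [0.0, 1.0]   # the new post is about topic 1
print([round(w, 2) for w in sparse_reply_weights(prev, target, lam=0.1)])  # [0.0, 0.9, 0.0]
```

Only the earlier post on the same topic gets a nonzero weight; the L1 penalty shrinks it slightly from 1.0 to 0.9.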

  20. OPTIMIZE THEM TOGETHER • Model semantics • Model structure
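The transcript does not preserve the joint objective. Assuming squared-error terms for the semantic and structural parts, L1 sparsity penalties on both Θ and B, and hypothetical trade-off weights λ1, λ2, λ3, one plausible form is:

```latex
\min_{X,\,\Theta,\,B}\;
\underbrace{\lVert D - X\Theta \rVert_F^2}_{\text{semantics}}
+ \lambda_1 \underbrace{\lVert \Theta - \Theta B \rVert_F^2}_{\text{structure}}
+ \lambda_2 \lVert \Theta \rVert_1
+ \lambda_3 \lVert B \rVert_1,
\qquad b_{ij} = 0 \ \text{for } i \ge j
```

Because Θ appears in both the reconstruction term and the structure term, a post's topic weights are pulled toward both its own words and the posts it appears to reply to.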

  21. APPLICATIONS • Reply reconstruction • Capability of recognizing structure • Junk identification • Capability of capturing semantics • Expert finding • Capability of measuring post quality

  22. REPLY RECONSTRUCTION • Document similarity • Topic similarity • Structure similarity
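The slide does not say how the three similarities are combined. A minimal sketch, assuming each is given as a post-by-post matrix and that they are summed with hypothetical weights, picks each post's parent as the earlier post with the highest combined score. `reconstruct_replies` and the toy scores are made up for illustration.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]

def reconstruct_replies(doc_sim: Matrix, topic_sim: Matrix, struct_sim: Matrix,
                        weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)
                        ) -> List[Optional[int]]:
    """Predict a parent for each post as the earlier post with the highest
    weighted sum of the three similarities; post 0 (the root) has no parent.
    Ties go to the earliest post."""
    w_doc, w_topic, w_struct = weights
    parents: List[Optional[int]] = [None]
    for j in range(1, len(doc_sim)):
        scores = [w_doc * doc_sim[i][j] + w_topic * topic_sim[i][j]
                  + w_struct * struct_sim[i][j] for i in range(j)]
        parents.append(max(range(j), key=lambda i: scores[i]))
    return parents

# Toy scores for a 3-post thread (only entries with i < j are used).
doc = [[0.0, 0.2, 0.1], [0.2, 0.0, 0.6], [0.1, 0.6, 0.0]]
topic = [[0.0, 0.5, 0.2], [0.5, 0.0, 0.7], [0.2, 0.7, 0.0]]
struct = [[0.0, 0.9, 0.0], [0.9, 0.0, 0.8], [0.0, 0.8, 0.0]]
print(reconstruct_replies(doc, topic, struct))  # [None, 0, 1]
```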

  23. DATA SETS • Slashdot • Apple discussion

  24. BASELINES • NP • Reply to Nearest Post • RR • Reply to Root • DS • Document Similarity • LDA • Latent Dirichlet Allocation • Project documents to topic space • SWB • Special Words Topic Model with Background distribution • Project documents to topic and junk topic space
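The three non-topic baselines are simple enough to sketch directly. The NP and RR rules follow their names; for DS, this sketch assumes plain bag-of-words cosine similarity over whitespace tokens, which may differ from the weighting the authors used. All function names and the toy posts are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional

def reply_to_nearest(n: int) -> List[Optional[int]]:
    """NP baseline: every post replies to the post just before it."""
    return [None] + [j - 1 for j in range(1, n)]

def reply_to_root(n: int) -> List[Optional[int]]:
    """RR baseline: every post replies to the first post of the thread."""
    return [None] + [0] * (n - 1)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def reply_by_document_similarity(posts: List[str]) -> List[Optional[int]]:
    """DS baseline (as assumed here): reply to the earlier post with the
    highest bag-of-words cosine similarity; ties go to the earliest post."""
    bags = [Counter(p.lower().split()) for p in posts]
    parents: List[Optional[int]] = [None]
    for j in range(1, len(bags)):
        parents.append(max(range(j), key=lambda i: cosine(bags[i], bags[j])))
    return parents

posts = ["cobol jobs in government", "government still runs cobol",
         "nice weather today", "still runs cobol here"]
print(reply_to_nearest(len(posts)))          # [None, 0, 1, 2]
print(reply_to_root(len(posts)))             # [None, 0, 0, 0]
print(reply_by_document_similarity(posts))   # [None, 0, 0, 1]
```

The off-topic third post shares no words with anything earlier, so DS falls back to the root for it, which is one reason a topic-space or structural signal can help.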

  25. EVALUATION

  26. JUNK IDENTIFICATION • D = • X = • Θ = • Probability of junk
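The slide's definitions of D, X and Θ for this task are not preserved. A minimal sketch, assuming (hypothetically) that one topic is reserved as a "junk" topic, scores each post by the share of its topic-weight magnitude that falls on that topic. `junk_probability` and the toy weights are made up for illustration.

```python
from typing import List

def junk_probability(Theta: List[List[float]], junk_topic: int) -> List[float]:
    """Share of each post's topic-weight magnitude on the (assumed) junk topic.

    Theta is topics x posts; posts whose weights are all zero get 0.0.
    """
    probs = []
    for j in range(len(Theta[0])):
        col = [abs(row[j]) for row in Theta]
        total = sum(col)
        probs.append(col[junk_topic] / total if total else 0.0)
    return probs

Theta = [[0.75, 0.0],   # topic 0: on-topic content
         [0.25, 0.5]]   # topic 1: hypothetical reserved junk topic
print(junk_probability(Theta, junk_topic=1))  # [0.25, 1.0]
```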

  27. DATA SETS • Slashdot • Apple discussion

  28. BASELINES • DF • SVM • Classify posts as junk posts & non-junk posts • SWB • Special Words Topic Model with Background distribution • Project documents to topic and junk topic space

  29. EVALUATION

  30. EXPERT FINDING

  31. BASELINES • LM • Formal Models for Expert Finding in Enterprise Corpora. SIGIR ’06 • Achieves stable performance on the expert-finding task using a language model • PageRank • Benchmark node-ranking method • HITS • Finds hub nodes and authority nodes • EABIF • Personalized Recommendation Driven by Information Flow. SIGIR ’06 • Finds the most influential nodes
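The slide does not say how the user graph is built for the graph-based baselines. A minimal sketch of PageRank by power iteration, assuming a hypothetical reply graph in which an edge (a, b) means user a replied to user b, so users whose posts draw replies accumulate rank. The names, edges and damping value of 0.85 are illustrative.

```python
from typing import Dict, List, Tuple

def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iters: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank over a directed user graph.

    An edge (a, b) is assumed to mean "user a replied to user b", so rank
    flows toward users whose posts attract replies.
    """
    nodes = sorted({u for e in edges for u in e})
    out: Dict[str, List[str]] = {u: [] for u in nodes}
    for a, b in edges:
        out[a].append(b)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:
                # Dangling user (never replied to anyone): spread rank evenly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

edges = [("bob", "alice"), ("carol", "alice"), ("alice", "bob"), ("dave", "carol")]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # alice
```

Note that this ranks users purely by reply links and ignores what the posts say, which is the gap a model of post quality aims to fill.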

  32. EVALUATION • Bayesian estimate

  33. DISCUSSION • Parameters vs. model complexity • Linear regression • SMSS model • Although the number of parameters increases, prior knowledge shrinks the projection space.

  34. CONCLUSION • Purpose • Mine the semantics • Mine the structure • Highlight • Simultaneously model semantics and structure • Applications designed to evaluate the model • Reply reconstruction • Junk identification • Expert finding
