A Twitter Recommend System Based on Topic Modeling
Tweetool (0. 1 100 version) Final Report
Yilei Qian
Computer Science, University of Southern California
qianyilei.usc@gmail.com
Ideas
• Users follow too many accounts on Twitter
• Too much news arrives every day
• Users cannot find the news that is interesting and valuable to them
• Users do not know the names of the accounts they would want to follow
• Someone needs to recommend who to follow
• Someone needs to recommend the hottest news
• Use topic modeling to re-rank all users
Topic Modeling
• A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.
• Widely used in natural language processing.
• Reference papers:
• Steyvers, M. and Griffiths, T., "Probabilistic Topic Models," Handbook of Latent Semantic Analysis
• Blei, D. M., Ng, A. Y., and Jordan, M. I., "Latent Dirichlet Allocation," Journal of Machine Learning Research, 2003
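For reference, the LDA paper by Blei, Ng, and Jordan cited above models each document as a mixture over topics. The joint distribution below uses that paper's notation, not anything shown on these slides: θ is the document's topic mixture, z_n is the topic of the n-th word, w_n is the n-th word, and α and β are the model parameters.

```latex
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```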
Label-based LDA
• Steps:
• Build the LDA model
• Train a model instance on the training documents
• Run LDA on all the data using the trained model instance
• Problems:
• Punctuation marks, e.g. “”,.={}() …
• Frequent words, e.g. I, you, …
• Other noise
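The noise problems listed above are usually handled by cleaning each tweet before it reaches the topic model. The following is a minimal sketch of such a step, not the Tweetool implementation: the class name `TweetCleaner`, the tiny stop-word list, and the choice to drop URLs while keeping `#` and `@` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the Tweetool code base.
class TweetCleaner {
    // Tiny illustrative stop-word list (assumption); a real list would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("i", "you", "the", "a", "an", "to", "of", "and", "is"));

    // Lower-cases the tweet, drops URLs, turns punctuation into spaces,
    // then removes stop words. Keeps '#' and '@' so hashtags and mentions survive.
    static List<String> clean(String tweet) {
        String lower = tweet.toLowerCase(Locale.ROOT);
        String noUrls = lower.replaceAll("https?://\\S+", " ");
        String noPunct = noUrls.replaceAll("[^\\p{L}\\p{N}#@]+", " ");
        List<String> tokens = new ArrayList<String>();
        for (String token : noPunct.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [love, new, #android, phone, too]
        System.out.println(clean("I love the new #Android phone!!! (you too?) http://t.co/abc"));
    }
}
```

The cleaned token list would then be written to the training file or passed to the trained model instance in place of the raw tweet text.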
Result Generation
• By angle: Value =
• By distance: Value =
13-Dimension Topics
• Art & Design
• Book
• Business
• Charity
• Entertainment
• Family
• Fashion
• Food & Drink
• Health
• Music
• News
• Science & Technology
• Sports
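The formulas for the angle and distance values on the result-generation slide did not survive in this transcript. A common reading, assumed here and not confirmed by the slides, is that each user is represented as a vector of topic weights over the 13 topics, that "angle" means cosine similarity between two such vectors, and that "distance" means Euclidean distance. The class `TopicSimilarity` below is a hypothetical sketch under those assumptions.

```java
// Hypothetical helper, not taken from the Tweetool sources.
class TopicSimilarity {
    // Cosine of the angle between two topic vectors:
    // 1.0 means same direction, 0.0 means no topics in common.
    static double cosine(double[] a, double[] b) {
        checkSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here for an all-zero vector
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Euclidean distance between two topic vectors: smaller means more similar.
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        // Toy 3-topic vectors; the real system would use 13 dimensions.
        double[] reader = {0.7, 0.2, 0.1};
        double[] candidate = {0.6, 0.3, 0.1};
        System.out.println("cosine = " + cosine(reader, candidate));
        System.out.println("distance = " + euclidean(reader, candidate));
    }
}
```

Under this reading, re-ranking amounts to sorting candidate users by descending cosine value or by ascending distance to the reader's own topic vector.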
Languages & Tools
• Web UI: HTML + AJAX (unfinished) + CSS (unfinished) + Twitter REST API
• Android UI: Java, Android 2.1 (unfinished)
• Server side: Java 1.6, Servlet 2.0, Spring 3.0, Hibernate 3.3
• Twitter API: Twitter4j 2.2.1 (300 requests per hour)
• Server: Tomcat 7.08
• Database: MySQL 5.5
• Data package: JSON
• Development platform: Eclipse 3.4
• Total code lines: 2000(+) + 2421 + 462 = 5000(+)
• Subversion: http://tweetool-yilei.googlecode.com/svn/trunk/tweetool-yilei-read-only
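A limit of 300 requests per hour works out to one request every 3600 / 300 = 12 seconds if the calls are spread evenly. The sketch below shows one way a fetcher could pace itself to stay under such a limit. The class `RequestPacer` and its methods are hypothetical and are not part of Twitter4j or Tweetool; the caller is assumed to supply the current time in milliseconds and to sleep for the returned delay before sending.

```java
// Hypothetical pacing helper; not a Twitter4j class.
class RequestPacer {
    private final long minIntervalMillis;
    private boolean hasScheduled = false;
    private long lastScheduledMillis = 0L;

    RequestPacer(int requestsPerHour) {
        if (requestsPerHour <= 0) {
            throw new IllegalArgumentException("requestsPerHour must be positive");
        }
        // 3,600,000 ms per hour divided evenly across the allowed requests.
        this.minIntervalMillis = 3600000L / requestsPerHour;
    }

    long getMinIntervalMillis() {
        return minIntervalMillis;
    }

    // Returns how many milliseconds the caller should wait before sending
    // its next request, and records that request as scheduled.
    long reserve(long nowMillis) {
        long sendAt = nowMillis;
        if (hasScheduled) {
            sendAt = Math.max(nowMillis, lastScheduledMillis + minIntervalMillis);
        }
        hasScheduled = true;
        lastScheduledMillis = sendAt;
        return sendAt - nowMillis;
    }

    public static void main(String[] args) {
        RequestPacer pacer = new RequestPacer(300);
        System.out.println(pacer.getMinIntervalMillis()); // 12000
        System.out.println(pacer.reserve(0L));            // 0: first request goes out at once
        System.out.println(pacer.reserve(1000L));         // 11000: wait until the 12 s mark
    }
}
```

Even pacing is the simplest policy; it trades burst speed for never hitting the limit in the middle of a crawl.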
Architecture
[Diagram; component labels: Mobile Device, HTML, Servlets, Application Context, Work Flow (three instances), Hibernate DAO, Twitter fetch, DB, LLDA, Tweetool]
Problems (endless T_T)
• High noise in the topic model: tweets have few words, odd symbols, and abbreviations
• Unfamiliarity with the Twitter API led to a lot of bugs
• Transaction problems
• The ugly UI
• Poor performance
• Not enough time; many functions are unfinished
• The Tweetool system should be rebuilt!
• Environment: 7,000+ users, 220,000+ tweets
Future Work
• Try to finish it
• Debug
• Build a better training file
• Add a feedback function
• Improve topic classification
Android UI
[Mockups; two screens labeled Main Menu and News Menu, with titles, news items, and function buttons]