This overview introduces data mining venues and learning resources, then summarizes three papers. It lists key conferences and workshops such as SIGMOD, KDD, SNA-KDD, and ICML, spanning machine learning, sentiment analysis, and social network analysis. The papers apply structured prediction and deep learning to web design, model crowdsourced labels when tasks may have more than one valid answer, and propose a language-independent Bayesian sentiment model for Twitter. The deck closes with pointers to further work, including the analysis of spamming behaviors on Sina Weibo.
Data Mining Reading and Sample Application • Xietao • Sept. 2013
Outline • Basic Info • Paper Glance
Basic Info • SIGMOD, VLDB, ICDE, PODS --- Databases • KDD --- Data Mining • SNA-KDD --- KDD workshop on social network mining and analysis • ICML --- Machine Learning • SIGIR --- Information Retrieval
Courses & Tools • Andrew Moore • Coursera (Bio-Datamining) • MIT OpenCourseWare • Weka --- University of Waikato (New Zealand) • RapidMiner --- originally named YALE • IlliMine --- UIUC • AlphaMiner --- HKU • Potter’s Wheel ABC --- UC Berkeley
Paper Glance • ICML: • Neural Network/PCA/SVM/Framework • “Data-Driven Web Design” • KDD: • Algorithm/Classifier/SNS/Cluster/Singular • “Learning from Crowds in the Presence of Schools of Thought” • SNA-KDD: • Twitter/Facebook/Weibo/Influence/Rumor • “Language-independent Bayesian sentiment mining on Twitter”
Data-Driven Web Design • Conf: ICML 2012 • Authors: • Ranjitha Kumar, Stanford University • Jerry O. Talton, Intel Corporation • Salman Ahmad, MIT • Scott R. Klemmer, Stanford University
Abstract • Applies machine learning methods to web design problems: • Structured prediction • Deep learning • Probabilistic program induction • Enables useful new interactions for designers
Detail • Structured prediction: rapid retargeting • Deep learning: design-based search • Probabilistic program induction: operationalizing design patterns
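"Design-based search" means retrieving pages whose visual design resembles a query page. A minimal sketch of the retrieval step only, assuming every page has already been reduced to a fixed-length design descriptor (the slide attributes such features to deep learning; here they are hypothetical hand-made vectors) and ranking by cosine similarity. The function names, page ids, and descriptor meanings are illustrative and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def design_search(query: Sequence[float],
                  index: Dict[str, Sequence[float]],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed pages by design similarity to the query and keep the top k."""
    scored = [(page, cosine(query, vec)) for page, vec in index.items()]
    # Stable sort: pages with equal scores keep their index order
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical descriptors: (layout density, image ratio, text ratio)
    index = {
        "blog_a": [0.9, 0.1, 0.8],
        "shop_b": [0.2, 0.9, 0.3],
        "news_c": [0.6, 0.3, 0.9],
    }
    for page, score in design_search([0.9, 0.1, 0.8], index, k=2):
        print(f"{page}: {score:.3f}")
```

Cosine similarity ignores vector length, so pages are compared by the proportions of their descriptors rather than their scale; at real corpus sizes the linear scan would likely be replaced by an approximate nearest-neighbour index.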
Learning from Crowds in the Presence of Schools of Thought • Conf: KDD 2012 • Authors: • Yuandong Tian, CMU • Jun Zhu, Tsinghua University (THU)
Abstract • Crowdsourcing: an effective way to collect large-scale experimental data from distributed workers • Goal: identify reliable workers as well as unambiguous tasks
Detail • Gold standard: each task is objective, with a single correct answer • Schools of thought: a task may have multiple valid answers
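To make the gold-standard assumption concrete, here is a simple heuristic baseline, not the paper's model: it takes the majority answer per task as "correct", scores each worker by agreement with that majority, and measures each task's ambiguity as the entropy of its answer distribution. The (worker, task, answer) triples and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

# (worker_id, task_id, answer) triples; identifiers are hypothetical
Label = Tuple[str, str, Hashable]


def majority_answers(labels: List[Label]) -> Dict[str, Hashable]:
    """Gold-standard view: assume one correct answer per task, take the most common."""
    per_task: Dict[str, Counter] = defaultdict(Counter)
    for _, task, answer in labels:
        per_task[task][answer] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in per_task.items()}


def worker_reliability(labels: List[Label],
                       majority: Dict[str, Hashable]) -> Dict[str, float]:
    """Fraction of each worker's answers that match the task's majority answer."""
    agree: Counter = Counter()
    total: Counter = Counter()
    for worker, task, answer in labels:
        total[worker] += 1
        if answer == majority[task]:
            agree[worker] += 1
    return {w: agree[w] / total[w] for w in total}


def task_ambiguity(labels: List[Label]) -> Dict[str, float]:
    """Entropy (bits) of each task's answers; 0.0 means every worker agreed."""
    per_task: Dict[str, Counter] = defaultdict(Counter)
    for _, task, answer in labels:
        per_task[task][answer] += 1
    result = {}
    for task, counts in per_task.items():
        n = sum(counts.values())
        result[task] = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return result


if __name__ == "__main__":
    labels = [
        ("w1", "t1", "cat"), ("w2", "t1", "cat"), ("w3", "t1", "cat"),
        ("w1", "t2", "red"), ("w2", "t2", "blue"), ("w3", "t2", "red"),
    ]
    majority = majority_answers(labels)
    print(majority)
    print(worker_reliability(labels, majority))
    print(task_ambiguity(labels))
```

Under the gold-standard view, worker `w2` is penalized on `t2`. If "blue" is a legitimate alternative answer (a school of thought), this heuristic cannot tell it apart from an error, which is the gap the slide says the paper targets.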
Language-independent Bayesian sentiment mining on Twitter • Conf: SNA-KDD 2011 • Authors: • Alex Davies, University of Cambridge • Zoubin Ghahramani, University of Cambridge
Abstract • A new language-independent model for sentiment analysis of short social-network statuses • Machine learning / Bayesian classifier
Detail • Tweets are short, so sentiment icons (emoticons) carry much of the signal • An asymmetric Dirichlet distribution over word probabilities for each sentiment • Iteratively update the distribution and recompute the probabilities
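The three bullets above can be illustrated with a minimal sketch that is explicitly not the authors' model: a two-class multinomial naive Bayes whose per-class word probabilities get an asymmetric Dirichlet prior (seed emoticons receive extra pseudo-counts in their own class), seeded with emoticon labels and refined by hard EM (re-estimate, relabel, repeat until stable). Whitespace tokenization keeps it language-independent. The emoticon list, pseudo-count values, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

CLASSES = ("pos", "neg")
# Hypothetical seed emoticons used as noisy labels; no language-specific lexicon
SEEDS = {":)": "pos", ":-)": "pos", ":(": "neg", ":-(": "neg"}
Probs = Dict[str, Dict[str, float]]


def tokenize(tweet: str) -> List[str]:
    # Plain whitespace split keeps the sketch language-independent
    return tweet.lower().split()


def seed_label(tokens: List[str]) -> Optional[str]:
    """Label a tweet by its emoticons if they agree on one class, else None."""
    found = {SEEDS[t] for t in tokens if t in SEEDS}
    return found.pop() if len(found) == 1 else None


def word_probs(docs: List[List[str]], labels: List[Optional[str]],
               vocab: List[str], alpha: Probs) -> Probs:
    """Posterior-mean word probabilities per class under a Dirichlet(alpha[c]) prior."""
    probs: Probs = {}
    for c in CLASSES:
        counts: Counter = Counter()
        for tokens, lab in zip(docs, labels):
            if lab == c:  # unlabeled (None) tweets are skipped
                counts.update(tokens)
        denom = sum(counts[w] + alpha[c][w] for w in vocab)
        probs[c] = {w: (counts[w] + alpha[c][w]) / denom for w in vocab}
    return probs


def classify(tokens: List[str], probs: Probs) -> str:
    """Highest log-likelihood class, assuming a uniform class prior; unknown words ignored."""
    scores = {c: sum(math.log(probs[c][t]) for t in tokens if t in probs[c])
              for c in CLASSES}
    return max(scores, key=scores.get)


def train(tweets: List[str], base_alpha: float = 1.0, seed_alpha: float = 5.0,
          iters: int = 10) -> Tuple[Probs, List[Optional[str]]]:
    docs = [tokenize(t) for t in tweets]
    vocab = sorted({w for d in docs for w in d})
    # Asymmetric prior: each seed emoticon gets extra mass in its own class
    alpha = {c: {w: base_alpha + (seed_alpha if SEEDS.get(w) == c else 0.0)
                 for w in vocab} for c in CLASSES}
    labels: List[Optional[str]] = [seed_label(d) for d in docs]
    probs = word_probs(docs, labels, vocab, alpha)
    for _ in range(iters):
        new_labels: List[Optional[str]] = [classify(d, probs) for d in docs]
        if new_labels == labels:  # converged: labels no longer change
            break
        labels = new_labels
        probs = word_probs(docs, labels, vocab, alpha)
    return probs, labels


if __name__ == "__main__":
    tweets = ["great day :)", "love it :)", "awful day :(",
              "hate it :(", "great love", "awful hate"]
    probs, labels = train(tweets)
    print(labels)
```

The tweets without emoticons start unlabeled and pick up a class from words that co-occur with the seeds; hard EM is used here for brevity, whereas a fuller Bayesian treatment would keep soft class responsibilities.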
More • “Joint Optimization of Bid and Budget Allocation in Sponsored Search” • KDD 2012, SJTU • “Analysis and identification of spamming behaviors in Sina Weibo microblog” • SNA-KDD 2013, SJTU