
Data Mining in Practice: Techniques and Practical Applications

Junling Hu, May 14, 2013


Presentation Transcript


  1. Data Mining in Practice: Techniques and Practical Applications • Junling Hu • May 14, 2013

  2. What is data mining? • Mining patterns from data • Is it statistics? • Functional form? • Computation speed concern? • Data size • Variable size • Is it machine learning? • Big data issue • New methods: network mining

  3. Examples of data mining • Frequently bought together • Movie recommendation

  4. More examples of data mining • Keyword suggestions • Genome & disease mining • Heart monitoring

  5. Overview of data mining • Frequent pattern mining • Machine learning (supervised, unsupervised) • Stream mining • Recommender system • Graph mining • Unstructured data (text, audio, image and video) • Big data technology

  6. Frequent Pattern Mining • Diaper and Beer • Product assortment • Click behavior • Machine breakdown?

  7. The case of Amazon • Count frequency of co-occurrence • Efficient algorithm
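
Counting co-occurrence, as named on this slide, can be sketched in a few lines. The baskets and the helper name `count_pairs` are hypothetical and only illustrate the counting step; the transcript does not say which efficient algorithm Amazon uses.

```python
from collections import Counter
from itertools import combinations


def count_pairs(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives every pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy baskets, not data from the talk
baskets = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "milk"],
]
pair_counts = count_pairs(baskets)
most_common_pairs = pair_counts.most_common(2)
```

This brute-force version enumerates every pair in every basket, so it only suits small baskets; frequent pattern mining algorithms exist precisely to prune that search.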

  8. Machine Learning Process

  9. Machine Learning • Supervised • Unsupervised (clustering)

  10. Binary classification • [Figure: data points, each with input features and an output class]

  11. Classification (1) • Decision tree

  12. Classification (2): Neural network • Perceptron • Multi-layer neural network
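
The perceptron named on this slide can be illustrated with the classic update rule. This is a minimal sketch on a hypothetical toy problem (the logical AND of two inputs); the function names, learning rate, and epoch count are assumptions, not taken from the talk.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(points, labels, epochs=20, lr=1.0):
    """Perceptron rule: change the weights only when a point is misclassified."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):  # y is +1 or -1
            if y * (dot(w, x) + b) <= 0:  # on the wrong side of (or on) the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b > 0 else -1


# Hypothetical toy data: output is +1 only when both inputs are 1
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
w, b = train_perceptron(points, labels)
predictions = [predict(w, b, x) for x in points]
```

A single perceptron can only learn linearly separable classes, which is one motivation for the multi-layer networks on the same slide.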

  13. Head pose detection

  14. Support Vector Machine (SVM) • Search for a separating hyperplane • Maximize margin

  15. Perceived advantage of SVM • Transform data into higher dimension
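
The idea of transforming data into a higher dimension can be shown without training an SVM at all. In the sketch below, hypothetical 1-D points cannot be split by any single cut, but become linearly separable once each point x is mapped to (x, x²). The mapping `lift`, the data, and the hand-picked hyperplane are assumptions for illustration; a real SVM would find the maximum-margin hyperplane itself, usually through a kernel.

```python
def lift(x):
    """Map a 1-D point to 2-D by adding its square as a second feature."""
    return (x, x * x)


def threshold_separable(xs, labels):
    """Return True if one cut on the number line splits the two classes."""
    pts = sorted(xs)
    cuts = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    for t in cuts:
        for sign in (1, -1):
            guesses = [1 if sign * (x - t) > 0 else -1 for x in xs]
            if guesses == list(labels):
                return True
    return False


def linear_rule(z, w=(0.0, 1.0), b=-2.5):
    """A hand-picked hyperplane in the lifted space (not learned)."""
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else -1


# Hypothetical data: the positive class sits on both ends of the line
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [1, 1, -1, -1, -1, 1, 1]

separable_in_1d = threshold_separable(xs, labels)
lifted_predictions = [linear_rule(lift(x)) for x in xs]
```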

  16. Applications of SVM: Spam Filter • Input features: • Transmission: IP address (167.12.24.555), sender URL (one-spam.com) • Email header: From ("admin@one-spam.cpm"), To ("undisclosed"), cc • Email body: # of paragraphs, # of words • Email structure: # of attachments, # of links

  17. Logistic regression • Advantage: Simple functional form • Can be parallelized • Large scale

  18. Applications of logistic regression • Click prediction • Search ranking (web pages, products) • Online advertising • Recommendation • The model • Output: Click/no click • Input features: page content, search keyword, User information
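
The "simple functional form" of logistic regression is the sigmoid of a weighted sum of the input features. The sketch below fits a one-feature click model with plain batch gradient descent. The feature (a hypothetical keyword-match score), the data, and the hyperparameters are assumptions for illustration, not values from the talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit p(click) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical data: x is a keyword-match score, y is 1 for click, 0 for no click
xs = [0.0, 0.5, 1.0, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
click_probabilities = [sigmoid(w * x + b) for x in xs]
predicted = [1 if p >= 0.5 else 0 for p in click_probabilities]
```

The gradient is a sum over examples, so it can be computed on separate data shards and added up, which is what makes the model easy to parallelize at large scale.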

  19. Regression • Linear regression • Non-linear regression • Applications: stock price prediction, credit scoring, employment forecast
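
For the linear case, a one-variable least-squares fit has a closed form: the slope is the covariance of x and y divided by the variance of x. A minimal sketch, with hypothetical data and a hypothetical helper name `fit_line`:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
forecast = slope * 5.0 + intercept  # prediction for an unseen x
```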

  20. History of Supervised learning

  21. Semi-supervised learning • Application: • Speech dialog system

  22. Unsupervised learning: Clustering • No labeled data • Methods • K-means
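
K-means alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. The sketch below uses hypothetical 2-D points and hand-picked starting centers so the result is deterministic; real implementations usually pick the starting centers at random or with a seeding heuristic.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))


def kmeans(points, centers, iterations=10):
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center in place
        centers = new_centers
    return centers, labels


# Hypothetical unlabeled points forming two loose groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centers, labels = kmeans(points, centers=[(0, 0), (10, 10)])
```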

  23. Categories of machine learning

  24. Applications of Clustering • Malware detection • Document clustering: Topic detection

  25. Graphs in our life • Social network: friend recommendation • Molecular compound: drug discovery

  26. Graph and its matrix representation • Adjacency matrix • [Figure: a graph with nodes 1 to 6]
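
An adjacency matrix stores a 1 in row i, column j when nodes i and j are connected. The sketch below builds one for a six-node undirected graph. The edge list is hypothetical, since the slide's figure is not preserved in the transcript.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 matrix for an undirected graph with nodes 1..num_nodes."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u - 1][v - 1] = 1  # nodes are numbered from 1, rows from 0
        A[v - 1][u - 1] = 1
    return A


# Hypothetical edges among nodes 1 to 6
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
A = adjacency_matrix(6, edges)
degrees = [sum(row) for row in A]  # row sums give each node's number of neighbors
```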

  27. The web graph • [Figure: Page 1, Page 2, and Page 3 connected by hyperlinks, each labeled with anchor text]

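  28. PageRank as a steady state • Transition matrix $P$ (the matrix itself is not preserved in this transcript) • PageRank is a probability vector $\pi$ such that $\pi P = \pi$ (the steady-state condition, written here for a row-stochastic $P$)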
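
The steady state can be approximated by power iteration: start from a uniform probability vector and keep multiplying it by the transition matrix until it stops changing. The sketch below assumes a row-stochastic matrix (row i holds the probabilities of moving from page i to each other page) and uses a hypothetical three-page graph. It omits the damping factor used in practical PageRank, which the slide does not mention.

```python
def stationary_distribution(P, steps=100):
    """Power iteration: repeatedly apply pi <- pi * P for a row-stochastic P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical web graph: page 1 links to pages 2 and 3,
# page 2 links to page 1, page 3 links to pages 1 and 2.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
pagerank = stationary_distribution(P)
```

For this graph the exact steady state is (4/9, 3/9, 2/9), so page 1, which receives links from both other pages, ranks highest.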

  29. Discover influencers on Twitter • The Twitter graph • Node • Link • A PageRank approach: TwitterRank • [Figure: nodes 1 to 5 connected by "following" links]

  30. Facebook graph search • Entity graph • Natural language search • “Restaurants liked by my friends”

  31. Recommending a game

  32. Recommendation in Travel site

  33. Prediction Problems • Rating Prediction • Given how a user rated other items, predict the user's rating for a given item • Top-N Recommendation • Given the list of items liked by a user, recommend new items that the user might like

  34. Explicit vs. Implicit Feedback Data • Explicit feedback • Ratings and reviews • Implicit feedback (user behavior) • Purchase behavior: Recency, frequency, … • Browsing behavior: # of visits, time of visit, time of staying, clicks

  35. Collaborative Filtering • Hypotheses • User/Item Similarities • Similar users purchase similar items • Similar items are purchased by similar users • Matching characteristics • Match exists between user’s and item’s characteristics

  36. User-User similarity • User’s movie rating

  37. Item-item similarity
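
User-user and item-item similarity can both be computed from the same rating matrix: compare rows for users, columns for items. The sketch below uses cosine similarity, which is one common choice; the transcript does not say which measure the talk uses. The ratings are hypothetical, with 0 meaning "not rated".

```python
import math


def cosine(u, v):
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical ratings: rows are users, columns are movies, 0 means unrated
ratings = [
    [5, 4, 0],
    [4, 5, 0],
    [0, 1, 5],
]

# User-user similarity compares rows
user_sim_01 = cosine(ratings[0], ratings[1])
user_sim_02 = cosine(ratings[0], ratings[2])

# Item-item similarity compares columns, obtained by transposing the matrix
items = [list(col) for col in zip(*ratings)]
item_sim_01 = cosine(items[0], items[1])
item_sim_02 = cosine(items[0], items[2])
```

Treating unrated entries as zeros is a simplification; a common refinement is to compare only the items that both users have rated.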

  38. Application of item-item similarity • Amazon

  39. SVD (Singular Value Decomposition)

  40. Latent factors
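
A latent factor model represents each user and each item by a short vector and predicts a rating as their dot product. The sketch below learns those vectors by stochastic gradient descent on the observed ratings only. This is a common stand-in for an exact SVD when most ratings are missing, not necessarily the method used in the talk; the ratings, the factor count `k`, and the other hyperparameters are hypothetical.

```python
import random


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed (user, item, rating) triples."""
    errs = [(r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings]
    return (sum(errs) / len(errs)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=1000, seed=0):
    rng = random.Random(seed)
    # Small random starting factors for users (P) and items (Q)
    P = [[rng.random() * 0.1 for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() * 0.1 for _ in range(k)] for _ in range(n_items)]
    initial_error = rmse(ratings, P, Q)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q, initial_error


# Hypothetical observed ratings as (user index, item index, rating)
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (2, 1, 1), (2, 2, 5)]
P, Q, initial_error = factorize(ratings, n_users=3, n_items=3)
final_error = rmse(ratings, P, Q)
predicted_unseen = dot(P[0], Q[2])  # estimate for a rating user 0 never gave
```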

  41. Application of Latent Factor Model • GetJar

  42. Ranking-based recommendation

  43. Application in LinkedIn • Ranking-based model

  44. Thanks and Contact • Co-author: Patricia Hoffman • Contact: junlinghu@gmail.com • Twitter: @junling_tech
