Data Mining

Data Mining: Association Rules, by Rui Zhao. What is data mining? The automated extraction of hidden predictive information from databases. It allows users to analyze large databases to solve business decision problems.


Presentation Transcript


  1. Data Mining: Association Rules, by Rui Zhao

  2. What is data mining? • The automated extraction of hidden predictive information from databases. • Allows users to analyze large databases to solve business decision problems. • An extension of statistics, with a few artificial intelligence and machine learning twists thrown in. • Attempts to discover rules and patterns from data.

  3. Example • Consider a catalog retailer who needs to decide who should receive information about a new product. The information operated on by the data mining process is contained in a historical database of previous interactions with customers and the features associated with those customers, such as age, zip code, and their responses. The data mining software uses this historical information to build a model of customer behavior that can predict which customers are likely to respond to the new product. Using this information, a marketing manager can select only the customers who are most likely to respond. The operational business software can then feed the results of the decision to the appropriate touch point systems (call centers, direct mail, web servers, email systems, etc.) so that the right customers receive the right offers.

  4. Applications of data mining • Prediction: for example, when a person applies for a credit card, the credit-card company wants to predict whether the person is a good credit risk. • Associations: for example, if a customer buys a book, an on-line bookstore may suggest other associated books.

  5. Association Rule Discovery • Task: discovering association rules among items in a transaction database. • An association between two items A and B means that the presence of A in a record implies the presence of B in the same record: A => B • In general: A1, A2, … => B

  6. Association rules (cont.) • Retail shops are often interested in associations between items that people buy. • Someone who buys bread is quite likely also to buy milk. association rule: bread => milk • A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts. association rule: DSC => OSC

  7. Association rules (cont.) • Two numbers describe a rule: • Support: a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule. • Confidence: a measure of how often the consequent is true when the antecedent is true.

  8. Association rules (cont.) • Used to find all rules in basket data • Basket data is also called transaction data • Analyze how items purchased by customers in a shop are related • Discover all rules that have: • support greater than minsup, specified by the user • confidence greater than minconf, specified by the user

  9. Association rules (cont.) • Let I = {i1, i2, …, im} be the total set of items, D a set of transactions, and d one transaction consisting of a set of items, d ⊆ I • Association rule: X => Y, where X ⊆ I, Y ⊆ I and X ∩ Y = ∅ • support = (# of transactions containing X ∪ Y) / |D| • confidence = (# of transactions containing X ∪ Y) / (# of transactions containing X)

  10. Example • Example of transaction data: • T1: CD player, music CD, music book • T2: CD player, music CD • T3: music CD, music book • T4: CD player • I = {CD player, music CD, music book} • |D| = 4 • # of transactions containing both CD player and music CD = 2 • # of transactions containing CD player = 3 • CD player => music CD (sup = 2/4, conf = 2/3)
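
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function names `count_containing`, `support` and `confidence` are chosen here for illustration and do not come from the slides, and the item names are written as "music CD" and "music book".

```python
from fractions import Fraction

# The four transactions from the example; each one is a set of items.
transactions = [
    {"CD player", "music CD", "music book"},
    {"CD player", "music CD"},
    {"music CD", "music book"},
    {"CD player"},
]

def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return Fraction(count_containing(x | y, transactions), len(transactions))

def confidence(x, y, transactions):
    """Among the transactions containing X, the fraction that also contain Y."""
    return Fraction(count_containing(x | y, transactions),
                    count_containing(x, transactions))

x, y = {"CD player"}, {"music CD"}
print(support(x, y, transactions))     # 1/2, i.e. 2/4
print(confidence(x, y, transactions))  # 2/3
```

Exact fractions are used instead of floats so the output can be compared directly with the 2/4 and 2/3 on the slide.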

  11. Strong Association Rule • User sets support and confidence thresholds. (e.g. at least 100 relations, 80% confidence) • Rules above support threshold have LARGE support. • Rules above confidence threshold have HIGH confidence. • Rules satisfying both are said to be STRONG.
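
As a rough illustration of these labels, the sketch below rates a single rule against user-chosen thresholds. The function name `rate_rule` is hypothetical, support is expressed as a fraction rather than an absolute count, and "above the threshold" is read as "greater than or equal to"; all three are assumptions made for this example.

```python
def rate_rule(sup, conf, min_sup, min_conf):
    """Label a rule as having large support, high confidence, or both (strong)."""
    large = sup >= min_sup      # assumption: meeting the threshold exactly counts
    high = conf >= min_conf
    return {"large_support": large, "high_confidence": high, "strong": large and high}

# The rule CD player => music CD has sup = 2/4 and conf = 2/3.
print(rate_rule(2 / 4, 2 / 3, min_sup=0.5, min_conf=0.6))
# {'large_support': True, 'high_confidence': True, 'strong': True}
print(rate_rule(2 / 4, 2 / 3, min_sup=0.5, min_conf=0.8))
# {'large_support': True, 'high_confidence': False, 'strong': False}
```

The same rule is strong under one pair of thresholds and not under another, which is why the thresholds are left to the user.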

  12. Association rules • How are association rules mined from large databases? • Two-step process: • Find all frequent item sets • Generate strong association rules from the frequent item sets
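
The two steps can be sketched in Python as follows. This is a brute-force enumeration of every candidate item set, practical only for a handful of items; it is not the pruning algorithm a real miner would use on a large database, and the slides do not say which algorithm is intended. The function names and the threshold values are assumptions for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Step 1: map every item set with support >= min_sup to its support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup = sum(1 for t in transactions if itemset <= t) / n
            if sup >= min_sup:
                frequent[itemset] = sup
    return frequent

def strong_rules(frequent, min_conf):
    """Step 2: split each frequent item set into X => Y and keep confident rules."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                x = frozenset(combo)
                # Any subset of a frequent item set is itself frequent,
                # so its support is already in the table.
                conf = sup / frequent[x]
                if conf >= min_conf:
                    rules.append((set(x), set(itemset - x), sup, conf))
    return rules

transactions = [
    {"CD player", "music CD", "music book"},
    {"CD player", "music CD"},
    {"music CD", "music book"},
    {"CD player"},
]
frequent = frequent_itemsets(transactions, min_sup=0.5)
for x, y, sup, conf in strong_rules(frequent, min_conf=0.8):
    print(x, "=>", y, sup, conf)
# {'music book'} => {'music CD'} 0.5 1.0
```

Step 2 never rescans the transactions: confidence is the support of the whole item set divided by the support of the antecedent, and both numbers were computed in step 1.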

  13. Classification vs. Association • Classification • mines a small set of rules existing in the data to form a classifier or predictor • has a target attribute • the dataset is in the form of a relational table • Association • the dataset is transaction data • has no fixed target • the target can be fixed, so association rules can also be used for classification • A=a, B=b => Class = yes • A=c => Class = no
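
One way to picture rules with a fixed target acting as a classifier is sketched below. The two rules are the ones on the slide; the first-match policy, the `classify` function, and the sample records are assumptions made for illustration only.

```python
# Each rule is (conditions, predicted class). Attribute names A, B and the
# values a, b, c follow the slide; the records tested below are made up.
rules = [
    ({"A": "a", "B": "b"}, "yes"),
    ({"A": "c"}, "no"),
]

def classify(record, rules, default=None):
    """Return the class of the first rule whose conditions all hold for `record`."""
    for conditions, label in rules:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label
    return default

print(classify({"A": "a", "B": "b"}, rules))  # yes
print(classify({"A": "c", "B": "b"}, rules))  # no
print(classify({"A": "a", "B": "z"}, rules))  # None (no rule applies)
```

A record that matches no rule gets the default, so a practical classifier built this way would also need a fallback class.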

  14. References • Professor Lee’s lectures • http://www.cs.sjsu.edu/~lee/cs157b/cs157b.html • Web-Site • http://www.thearling.com/ • pizza.unbsj.ca/~owen/backup/courses/OLAP-2004/dm.pdf
