Data Mining (and machine learning): ROC curves, Rule Induction, Basics of Text Mining
Two classes is a common and special case. Medical applications: cancer, or not? Computer Vision applications: landmine, or not? Security applications: terrorist, or not? Biotech applications: gene, or not? … …
Two classes is a common and special case. True Positive: these are ideal, e.g. we correctly detect cancer. False Positive: to be minimised – causes a false alarm – it can be better to be safe than sorry, but false alarms can be very costly. False Negative: also to be minimised – missing a landmine / cancer is very bad in many applications. True Negative?:
Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task. Sensitivity = TP/(TP+FN) - how many of the real ‘Yes’ cases are detected? How well can it detect the condition? Specificity = TN/(FP+TN) - how many of the real ‘No’ cases are correctly classified? How well can it rule out the condition?
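The two formulas above can be sketched in a few lines of Python. The counts used here are hypothetical (16 real ‘Yes’ cases and 12 real ‘No’ cases), chosen only to illustrate the arithmetic; they are not taken from the slides' data.

```python
def sensitivity(tp, fn):
    """Fraction of the real 'Yes' cases that are detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of the real 'No' cases correctly classified: TN / (FP + TN)."""
    return tn / (fp + tn)


# Hypothetical confusion-matrix counts (an assumption, for illustration only)
tp, fn = 15, 1   # 16 real 'Yes' cases, one of them missed
tn, fp = 6, 6    # 12 real 'No' cases, half of them raised as false alarms

print(f"Sensitivity: {sensitivity(tp, fn):.1%}")
print(f"Specificity: {specificity(tn, fp):.1%}")
```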
[Four plots of YES and NO points, one per setting:]
Sensitivity: 100%, Specificity: 25%
Sensitivity: 93.8%, Specificity: 50%
Sensitivity: 81.3%, Specificity: 83.3%
Sensitivity: 56.3%, Specificity: 100%
Sensitivity: 100%, Specificity: 25%. 100% Sensitivity means: detects all cancer cases (or whatever), but possibly with many false positives
Sensitivity: 56.3%, Specificity: 100%. 100% Specificity means: misses some cancer cases (or whatever), but no false positives
Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task. Sensitivity = TP/(TP+FN) - how many of the real TRUE cases are detected? How sensitive is the classifier to TRUE cases? A highly sensitive test for cancer: if “NO” then you can be sure it’s “NO”. Specificity = TN/(TN+FP) - how sensitive is the classifier to the negative cases? A highly specific test for cancer: if “Y” then you can be sure it’s “Y”. With many trained classifiers, you can ‘move the line’ in this way. E.g. with NB, we could use a threshold indicating how much higher the log likelihood for Y should be than for N.
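‘Moving the line’ can be sketched as sweeping a threshold over a classifier's scores and recording sensitivity and specificity at each setting; plotting those pairs is what gives a ROC curve. In this sketch the scores stand in for "log likelihood for Y minus log likelihood for N"; the scores, labels and thresholds are all made-up values, and the function names are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Predict YES when score > threshold; return (tp, fp, fn, tn)."""
    tp = fp = fn = tn = 0
    for score, is_yes in zip(scores, labels):
        predicted_yes = score > threshold
        if predicted_yes and is_yes:
            tp += 1
        elif predicted_yes and not is_yes:
            fp += 1
        elif not predicted_yes and is_yes:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_points(scores, labels, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    Assumes the data contains at least one YES and one NO case.
    """
    points = []
    for t in thresholds:
        tp, fp, fn, tn = confusion_counts(scores, labels, t)
        points.append((t, tp / (tp + fn), tn / (tn + fp)))
    return points


# Hypothetical scores (log-likelihood difference, Y minus N) and true labels
scores = [2.1, 1.4, 0.3, -0.2, 0.8, -1.5, -0.7, 0.1]
labels = [True, True, True, True, False, False, False, False]

points = roc_points(scores, labels, [-2.0, 0.0, 1.0])
for t, sens, spec in points:
    print(f"threshold {t:+.1f}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Raising the threshold trades sensitivity for specificity, which is the same trade-off the four plots above illustrate.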
ROC curves David Corne, and Nick Taylor, Heriot-Watt University - dwcorne@gmail.com These slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html
Rule Induction • Rules are useful when you want to learn a clear / interpretable classifier, and are less worried about squeezing out as much accuracy as possible • There are a number of different ways to ‘learn’ rules or rulesets. • Before we go there, what is a rule / ruleset?
Rules IF Condition … Then Class Value is …
Rules are Rectangular. IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES [Figure: YES/NO points, X axis 0 to 12, Y axis 0 to 5]
Rules are Rectangular. IF (X>5)&(X<11)&(Y>4.5)&(Y<5.1) THEN NO [Figure: YES/NO points, X axis 0 to 12, Y axis 0 to 5]
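A rectangular rule of this kind is just four bounds and a class value. A minimal sketch, where the `Rule` class and its field names are hypothetical, using the two rules from the slides:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF (X > x_min) & (X < x_max) & (Y > y_min) & (Y < y_max) THEN label."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    label: str

    def covers(self, x, y):
        # Strict inequalities, matching the way the rules are written above
        return self.x_min < x < self.x_max and self.y_min < y < self.y_max


rule_yes = Rule(0, 5, 0.5, 5, "YES")
rule_no = Rule(5, 11, 4.5, 5.1, "NO")

print(rule_yes.covers(2, 3))    # point inside the YES rectangle
print(rule_no.covers(6, 4.8))   # point inside the NO rectangle
print(rule_yes.covers(5, 3))    # on the boundary, so not covered
```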
A Ruleset IF Condition1 … Then Class = A IF Condition2 … Then Class = A IF Condition3 … Then Class = B IF Condition4 … Then Class = C …
What’s wrong with this ruleset? (two things) [Figure: ruleset drawn over YES/NO points, X axis 0 to 12, Y axis 0 to 5]
What about this ruleset? [Figure: ruleset drawn over YES/NO points, X axis 0 to 12, Y axis 0 to 5]
Two ways to interpret a ruleset: As a Decision List IF Condition1 … Then Class = A ELSE IF Condition2 … Then Class = A ELSE IF Condition3 … Then Class = B ELSE IF Condition4 … Then Class = C … ELSE … predict Background Majority Class
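Read as a decision list, the first rule whose condition holds decides the class, and the background majority class is the fallback. A minimal sketch, assuming rules are (condition, class) pairs; the rules reuse the two rectangles from earlier and the choice of "NO" as majority class is an assumption:

```python
def predict_decision_list(rules, default, point):
    """Return the class of the first rule that fires, else the default class."""
    for condition, label in rules:
        if condition(point):
            return label
    return default


# Rules are checked in order; each condition takes a point (x, y)
rules = [
    (lambda p: 0 < p[0] < 5 and 0.5 < p[1] < 5, "YES"),
    (lambda p: 5 < p[0] < 11 and 4.5 < p[1] < 5.1, "NO"),
]
majority_class = "NO"  # assumed background majority class

print(predict_decision_list(rules, majority_class, (2, 3)))
print(predict_decision_list(rules, majority_class, (12, 1)))
```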
Two ways to interpret a ruleset: As an unordered set IF Condition1 … Then Class = A IF Condition2 … Then Class = A IF Condition3 … Then Class = B IF Condition4 … Then Class = C Check each rule and gather votes for each class If no winner, predict background majority class
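Read as an unordered set, every rule is checked and the classes of the rules that fire are counted as votes. A minimal sketch; the three rules and the default class are hypothetical, and a tie between the top classes is treated here as "no winner":

```python
from collections import Counter


def predict_by_vote(rules, default, point):
    """Gather one vote per firing rule; fall back to default if no clear winner."""
    votes = Counter(label for condition, label in rules if condition(point))
    if not votes:
        return default
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return default  # tie between the two leading classes
    return ranked[0][0]


# Hypothetical overlapping rules on a point (x, y)
rules = [
    (lambda p: p[0] < 5, "A"),
    (lambda p: p[1] < 3, "A"),
    (lambda p: 2 < p[0] < 8, "B"),
]
majority_class = "C"  # assumed background majority class

print(predict_by_vote(rules, majority_class, (3, 1)))  # two votes for A, one for B
print(predict_by_vote(rules, majority_class, (3, 4)))  # one vote each: no winner
print(predict_by_vote(rules, majority_class, (9, 4)))  # no rule fires
```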
Three broad ways to learn rulesets 1. Just build a decision tree with ID3 (or something else) and you can translate the tree into rules!
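Translating a tree into rules amounts to writing out one rule per root-to-leaf path. A minimal sketch, assuming a tree is either a class label (leaf) or a tuple (feature, threshold, subtree_below, subtree_at_or_above); this representation and the small example tree are hand-written for illustration, not the output of ID3:

```python
def tree_to_rules(node, conditions=()):
    """Return a list of (conditions, class) pairs, one per root-to-leaf path."""
    if isinstance(node, str):  # leaf: the path so far becomes a rule
        return [(list(conditions), node)]
    feature, threshold, below, at_or_above = node
    rules = []
    rules += tree_to_rules(below, conditions + (f"{feature} < {threshold}",))
    rules += tree_to_rules(at_or_above, conditions + (f"{feature} >= {threshold}",))
    return rules


# Hypothetical tree: split on X at 5, then on Y at 0.5
tree = ("X", 5, ("Y", 0.5, "NO", "YES"), "NO")

rules = tree_to_rules(tree)
for conditions, label in rules:
    print("IF " + " & ".join(conditions) + " THEN " + label)
```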
Three broad ways to learn rulesets 2. Use any good search/optimisation algorithm. Evolutionary (genetic) algorithms are the most common. You will do this in coursework 3. This means simply guessing a ruleset at random, then trying mutations and variants, gradually improving it over time.
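The "guess at random, then mutate and keep improvements" idea can be sketched with a very simple mutation hill-climber on a single rectangular rule. This is a deliberately reduced stand-in, not a full evolutionary algorithm (no population, no crossover), and the training points, step size and iteration count are all assumptions.

```python
import random


def accuracy(rule, data):
    """Fraction of points classified correctly: inside the rectangle means YES."""
    x0, x1, y0, y1 = rule
    correct = 0
    for x, y, label in data:
        inside = x0 < x < x1 and y0 < y < y1
        if inside == (label == "YES"):
            correct += 1
    return correct / len(data)


def mutate(rule, rng, step=1.0):
    """Nudge one randomly chosen bound of the rectangle."""
    bounds = list(rule)
    i = rng.randrange(len(bounds))
    bounds[i] += rng.uniform(-step, step)
    return tuple(bounds)


def hill_climb(data, rng, iterations=500):
    """Start from a random rectangle; keep any mutant that is not worse."""
    xs = sorted(rng.uniform(0, 12) for _ in range(2))
    ys = sorted(rng.uniform(0, 5) for _ in range(2))
    best = (xs[0], xs[1], ys[0], ys[1])
    best_fit = accuracy(best, data)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        fit = accuracy(candidate, data)
        if fit >= best_fit:
            best, best_fit = candidate, fit
    return best, best_fit


# Hypothetical training set of (x, y, class) points
data = [
    (1, 1, "YES"), (2, 3, "YES"), (3, 2, "YES"), (4, 4, "YES"),
    (7, 1, "NO"), (8, 3, "NO"), (10, 2, "NO"), (11, 4, "NO"),
]

best_rule, best_accuracy = hill_climb(data, random.Random(0))
print(best_rule, best_accuracy)
```

Accepting equally good mutants (`>=` rather than `>`) lets the search drift across plateaus, which matters here because accuracy only changes when a bound crosses a training point.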
Three broad ways to learn rulesets 3. A number of ‘old’ AI algorithms exist that still work well, and/or can be engineered to work with an evolutionary algorithm. The basic idea is: iterated coverage
Take each class in turn .. [Figure sequence: YES/NO points, X axis 0 to 12, Y axis 0 to 5]
Pick a random member of that class in the training set
Extend it as much as possible without including another class
Next class
And so on…
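The steps above can be sketched as a greedy covering loop. This is one possible reading, not the specific algorithm behind the slides: "extend as much as possible" is approximated by growing a bounding box toward same-class points, nearest first, and rejecting any growth that would take in a point of another class. The function names and the training points are hypothetical.

```python
import random


def contains(box, x, y):
    """Closed rectangle test; box is (x_min, x_max, y_min, y_max)."""
    x0, x1, y0, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def grow_box(seed, same, others):
    """Grow a box from the seed point without including another class."""
    sx, sy = seed
    box = (sx, sx, sy, sy)
    by_distance = sorted(same, key=lambda p: (p[0] - sx) ** 2 + (p[1] - sy) ** 2)
    for x, y in by_distance:
        x0, x1, y0, y1 = box
        trial = (min(x0, x), max(x1, x), min(y0, y), max(y1, y))
        if not any(contains(trial, ox, oy) for ox, oy in others):
            box = trial
    return box


def iterated_coverage(data, rng):
    """Return a list of (box, class) rules that together cover the training set."""
    rules = []
    for cls in sorted({label for _, _, label in data}):  # take each class in turn
        same = [(x, y) for x, y, label in data if label == cls]
        others = [(x, y) for x, y, label in data if label != cls]
        uncovered = list(same)
        while uncovered:
            seed = rng.choice(uncovered)  # pick a random member of the class
            box = grow_box(seed, same, others)
            rules.append((box, cls))
            uncovered = [p for p in uncovered if not contains(box, *p)]
    return rules


# Hypothetical training set of (x, y, class) points
data = [
    (1, 1, "YES"), (2, 2, "YES"), (1, 3, "YES"), (10, 4, "YES"),
    (8, 1, "NO"), (9, 2, "NO"), (3, 2.5, "NO"),
]

rules = iterated_coverage(data, random.Random(0))
for box, cls in rules:
    print(box, cls)
```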
2012 Students’ implementation choices for DMML CW1 2014 “Word Clouds” - word frequency patterns provide useful information
Classify sentiment • “Word Clouds”: word frequency patterns provide useful information … • … which can be used to predict a class value / category / signal • … in this case the document(s) are “tweets mentioning our airline over past few hours” • class value is a satisfaction score, between 0 and 1 [Figure labels: ACS Index, Twitter sentiment] http://www.inside-r.org/howto/mining-twitter-airline-consumer-sentiment
Sentiment map of NYC: more info from tweets, this time a “happiness” score. http://necsi.edu/research/social/newyork/sentimentmap/
“similar pages” Based on distances between word frequency patterns
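A distance between word frequency patterns can be sketched by turning each document into a vector of relative word frequencies and measuring the Euclidean distance between the vectors. Whitespace tokenisation, Euclidean distance and the three example texts are all assumptions made for illustration; real systems typically use more careful tokenisation and weighting.

```python
import math
from collections import Counter


def word_frequencies(text):
    """Map each word to its relative frequency (assumes non-empty text)."""
    words = text.lower().split()
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def euclidean_distance(freq_a, freq_b):
    """Distance between two frequency patterns over their combined vocabulary."""
    vocabulary = set(freq_a) | set(freq_b)
    return math.sqrt(
        sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in vocabulary)
    )


# Hypothetical documents
page_a = word_frequencies("cheap flights to paris")
page_b = word_frequencies("cheap flights to rome")
page_c = word_frequencies("laptop with led screen")

print(euclidean_distance(page_a, page_b))  # shares three of four words
print(euclidean_distance(page_a, page_c))  # shares no words
```

Under this measure `page_b` is closer to `page_a` than `page_c` is, which is the sense in which it would be the more "similar page".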
Predicting relationship between two people based on their text messages
Can you predict the class (Desktop, Laptop or LED-TV) from word frequencies of the product description on Amazon?