Machine Learning in Practice Lecture 4

Machine Learning in Practice, Lecture 4. Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute

What is Sleep Apnea? Can we predict who will experience it? http://www.snoresnomore.com/images/snorin2.gif

Plan for Today
• Any questions about the second assignment?
• Announcements
• Quiz 1 Answer Key on Blackboard
• Comments about Assignment 1
• Kwiatkowska Paper
• Error Analysis
• Data Cleansing
• Trees vs Tables
• Weka helpful hints
• ARFF format

General Points – Assignment 1
• Bookkeeping Issues
• Write your name on the assignment
• Write the assignment number
• Format the assignment as an MS Word .doc file
• Please don’t use the .docx format
• Visualize the tree graphically
• Right-click and then take a screenshot
• Embed the figure in the .doc file

Visualizing the Tree

Kwiatkowska Paper

Clinical Prediction Rules
• Example application of machine learning
• Rules created by medical practitioners based on their experience
• Can we use machine learning to contribute to the accumulation of medical wisdom?
• ER1 = If BMI > 40 and Age > 65 and Gender = male, Then OSA = Yes
• ER2 = If BMI < 25 and Age < 25 and Gender = female, Then OSA = No

Methodological Flaw? Human-generated rules don’t cover most of the data (figure labels: ER1, ER2)
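To make the coverage concern concrete, here is a minimal sketch that encodes ER1 and ER2 as a function and measures how many cases they say anything about. The `Patient` class, the `expert_rules` helper, and the five patients are all hypothetical, invented purely for illustration; they are not data from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    bmi: float
    age: int
    gender: str  # "male" or "female"


def expert_rules(p: Patient) -> Optional[str]:
    """Return "Yes"/"No" if ER1 or ER2 fires, or None if neither rule covers the case."""
    if p.bmi > 40 and p.age > 65 and p.gender == "male":    # ER1
        return "Yes"
    if p.bmi < 25 and p.age < 25 and p.gender == "female":  # ER2
        return "No"
    return None


# Hypothetical patients (made up for this sketch, not taken from the paper)
patients = [
    Patient(42.0, 70, "male"),    # matches ER1
    Patient(22.0, 21, "female"),  # matches ER2
    Patient(31.0, 50, "male"),    # matches neither rule
    Patient(28.5, 45, "female"),  # matches neither rule
    Patient(45.0, 40, "male"),    # high BMI but too young for ER1
]

predictions = [expert_rules(p) for p in patients]
coverage = sum(pred is not None for pred in predictions) / len(patients)
print(predictions)
print(f"coverage = {coverage:.0%}")
```

A learned classifier always outputs a prediction, so comparing it against rules that abstain on most cases is only meaningful on the covered subset.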

Machine Learning Result

ML Wins Some and Loses Others (figure labels: ER1, ER2)

Compare results
• ER1 = If BMI > 40 and Age > 65 and Gender = male, Then OSA = Yes
• ER2 = If BMI < 25 and Age < 25 and Gender = female, Then OSA = No
Note: the paper says this is the tree for set B.

Claims from paper…
• Learned rules were largely consistent with human-generated rules
• Automatic: If BMI > 28.03 and Gender = Male, Then OSA = Yes
• Human: If BMI > 40 and Age > 65 and Gender = Male, Then OSA = Yes
• Do you buy their argument?

More claims…
• Is this a contradiction?
• Automatic: If Gender = Male and MP = 2, Then OSA = No
• Human: If BMI > 40 and Age > 65 and Gender = Male, Then OSA = Yes
• What about the relationship between age and MP or BMI and MP?

What is the Mallampati classification? http://www.accessmedicine.com/search/searchAMResultImg.aspx?rootterm=mallampati+score&rootID=46310&searchType=1

Thought questions
• Would you trust medical “wisdom” that comes from data mining?
• What would be your concerns?
• What would you want to know about how the “wisdom” was learned?

Data Cleansing
Obvious things you can fix…
• Inconsistent naming of nominal values
• Names with or without middle initial
• Nickname versus real name
• Typos
• City or street names may change over time
• Street names may change depending on the block
• Inconsistencies in how forms are filled out
• Address and phone number fields in different countries
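Two of these fixes can be sketched in a few lines of Python: collapsing inconsistent spellings of a nominal value, and dropping a middle initial so that name variants match. The `CANONICAL` mapping and both helper functions are hypothetical illustrations; real data usually needs a mapping built by inspecting the values that actually occur.

```python
import re

# Hypothetical mapping from observed variants to one canonical label
CANONICAL = {"m": "male", "male": "male", "f": "female", "female": "female"}


def normalize_nominal(value: str) -> str:
    """Trim, lowercase, and collapse whitespace, then map known variants."""
    key = re.sub(r"\s+", " ", value.strip().lower())
    return CANONICAL.get(key, key)


def drop_middle_initial(name: str) -> str:
    """Remove single-letter middle tokens, e.g. 'Mary J. Smith' becomes 'Mary Smith'."""
    parts = name.split()
    kept = [
        part
        for i, part in enumerate(parts)
        if not (0 < i < len(parts) - 1 and len(part.rstrip(".")) == 1)
    ]
    return " ".join(kept)


print(normalize_nominal("  Male "))
print(drop_middle_initial("Mary J. Smith"))
```

Nicknames and renamed streets cannot be fixed by string manipulation alone; they need a lookup table or manual review.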

Trees versus Tables

Decision Tables vs Decision Trees
Decision Trees
• Open World Assumption
• Only examine some attributes in particular contexts
• Use the majority class within a context to eliminate the closed world requirement
• Divide and Conquer approach
Decision Tables
• Closed World Assumption
• Every case enumerated
• No generalization except by limiting the number of attributes
• The 1R algorithm produces the simplest possible Decision Table
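As a rough illustration of the 1R idea, the sketch below builds a one-attribute table for each candidate attribute (each value maps to its majority class) and keeps the attribute with the fewest training errors. The `one_r` function and the six-row data set are hypothetical, written for this example; this is not Weka's implementation.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (attribute, rule, errors) for the single attribute with fewest errors.

    rule maps each attribute value to the majority class seen with that value.
    """
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != row[target] for row in rows)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy data, invented for illustration
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

attr, rule, errors = one_r(rows, ["outlook", "windy"], "play")
print(attr, rule, errors)
```

The resulting rule is a decision table over one attribute: every value of that attribute is enumerated, and nothing else is consulted.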

Weka Helpful Hints

Use the visualize tab to view 3-way interactions

Use the visualize tab to view 3-way interactions
• Click in one of the boxes to zoom in

Weka Data Structures

ARFF format

Types of Attributes
• Numeric (continuous)
• @attribute temperature numeric
• Real numbers or integers
• Can be compared (less than, greater than, equality, inequality)
• Some algorithms treat numeric scales as ratios or look at “distances”
• Some methods normalize numeric scales
• Some machine learning algorithms treat numbers as nominal values

Types of Attributes
• Nominal (categorical)
• @attribute outlook {sunny, overcast, rainy}
• Finite number of pre-specified values
• Values are just labels (the actual label is not meaningful to the algorithms)
• Values are not ordered and cannot be compared except for equality/inequality
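To show where these declarations sit in a file, here is a minimal sketch that writes a tiny ARFF document from Python. An ARFF file has an `@relation` line, one `@attribute` line per column, and an `@data` section with comma-separated rows. The `to_arff` helper, the relation name `weather`, and the three data rows are hypothetical.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text. attributes is a list of (name, spec) pairs, where spec is
    either a type keyword such as "numeric" or a list of nominal values."""
    lines = [f"@relation {relation}", ""]
    for name, spec in attributes:
        if isinstance(spec, (list, tuple)):
            spec = "{" + ", ".join(spec) + "}"  # nominal: enumerate allowed values
        lines.append(f"@attribute {name} {spec}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


attributes = [
    ("temperature", "numeric"),
    ("outlook", ["sunny", "overcast", "rainy"]),
]
rows = [(85, "sunny"), (64, "overcast"), (70, "rainy")]  # hypothetical data

arff_text = to_arff("weather", attributes, rows)
print(arff_text)
```

This sketch does no quoting or escaping, so it is only suitable for values without spaces, commas, or quotes.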

Types of Attributes
• Strings (just like nominal, makes troubleshooting text processing more convenient)
• @attribute description string
• Value can be any string in quotes, e.g. "Look, Mom! No hands!"
• Can be converted to a vector of numeric attributes, each representing one word
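The conversion from a string to a vector of per-word numeric attributes can be sketched as a simple bag-of-words count. This is only an illustration of the idea (Weka offers a StringToWordVector filter for the real thing); the tokenizer, the helper names, and the second example sentence are assumptions made for this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def to_word_vectors(texts):
    """Return (vocabulary, vectors), with one count per vocabulary word per text."""
    vocab = sorted({word for text in texts for word in tokenize(text)})
    vectors = []
    for text in texts:
        counts = Counter(tokenize(text))
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


texts = ["Look, Mom! No hands!", "No, no, look here."]  # second string is made up
vocab, vectors = to_word_vectors(texts)
print(vocab)
print(vectors)
```

Each position in a vector corresponds to one word in `vocab`, so every string becomes a fixed-length row of numeric attributes.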

Types of Attributes
• Date (numeric)
• @attribute today date "yyyy-MM-dd-'T'HH:mm:ss"
• 2006-01-24-T12:00:00
• Specified as strings but then converted to numbers when the file is read
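A small sketch of what "converted to numbers" can mean: parse the date string and express it as milliseconds since 1970-01-01. The millisecond representation and the UTC time zone are assumptions made here for illustration (they mirror how Java timestamps are commonly represented); the slide does not state the exact value Weka stores. The `date_to_millis` helper is hypothetical.

```python
from datetime import datetime, timezone


def date_to_millis(text, pattern="%Y-%m-%d-T%H:%M:%S"):
    """Parse a date string and return milliseconds since the Unix epoch.

    Assumes the timestamp is in UTC; the pattern matches the example on the slide.
    """
    parsed = datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
    return parsed.timestamp() * 1000.0


millis = date_to_millis("2006-01-24-T12:00:00")
later = date_to_millis("2006-01-25-T12:00:00")
print(millis, later > millis)
```

Once dates are numbers, a learner can split on them with an ordinary threshold, just like any other numeric attribute.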

Reasoning About Time

Not Bad Performance with a Simple Split

Threshold is Off

Ordinal Values
• Weka technically does not have ordinal attributes
• But you can simulate them with “temperature coding”!
• Try to represent “If X is less than or equal to .35”
Values: .2 .25 .28 .31 .35 .45 .47 .52 .6 .63
Bins: A, B, C, D
Coded attributes: A; A or B; A or B or C; A or B or C or D
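The coding scheme can be sketched as follows: each value falls into one bin, and the coded attributes are cumulative ("A", "A or B", and so on), so a threshold test becomes a test on a single binary attribute. The bin boundaries below are hypothetical (the slide does not state where A, B, C, and D begin and end); they were chosen so that the boundary between B and C falls at .35.

```python
# Hypothetical upper bounds for each bin; the slide does not give the boundaries
BINS = [("A", 0.25), ("B", 0.35), ("C", 0.50), ("D", float("inf"))]
LABELS = [label for label, _ in BINS]


def bin_of(x):
    """Return the label of the first bin whose upper bound is >= x."""
    for label, upper in BINS:
        if x <= upper:
            return label


def temperature_code(x):
    """Return cumulative binary attributes: A, A_or_B, A_or_B_or_C, A_or_B_or_C_or_D."""
    idx = LABELS.index(bin_of(x))
    return {
        "_or_".join(LABELS[: k + 1]): int(idx <= k)
        for k in range(len(LABELS))
    }


values = [0.2, 0.25, 0.28, 0.31, 0.35, 0.45, 0.47, 0.52, 0.6, 0.63]
for x in values:
    print(x, bin_of(x), temperature_code(x))
```

Under these assumed boundaries, "X less than or equal to .35" is exactly the attribute `A_or_B` being 1, which a learner that only handles nominal values can still test.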

Questions?
