This lecture discusses the application of machine learning in predicting sleep apnea. It explores the use of machine learning algorithms to contribute to the accumulation of medical wisdom and compares results with human-generated rules. The lecture also covers data cleansing techniques and the differences between decision trees and decision tables.
Machine Learning in Practice Lecture 4 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute
What is Sleep Apnea? Can we predict who will experience it? http://www.snoresnomore.com/images/snorin2.gif
Plan for Today • Any questions • About the second assignment? • Announcements • Quiz 1 Answer Key on Blackboard • Comments about Assignment 1 • Kwiatowska Paper • Error Analysis • Data Cleansing • Trees vs Tables • Weka helpful hints • ARFF format
General Points – Assignment I • Bookkeeping Issues • Write your name on the assignment • Write the assignment number • Format the assignment as an MS Word .doc file • Please don’t use the .docx format • Visualize the tree in graphical fashion • Use Right-Click and then take a screen shot • Embed the figure in the doc file
Clinical Prediction Rules • Example application of machine learning • Rules created by medical practitioners based on their experience • Can we use machine learning to contribute to the accumulation of medical wisdom? • ER1 = If BMI > 40 and Age > 65 and Gender = male • Then OSA = Yes • ER2 = If BMI < 25 and Age < 25 and Gender = female • Then OSA = No
Methodological Flaw? • Human-generated rules don’t cover most of the data • [Figure labels: ER1, ER2]
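A minimal sketch of the two expert rules and the coverage concern, in Python. The patient records are hypothetical and invented for illustration; they are not data from the paper, and the field names (`bmi`, `age`, `gender`) are assumptions.

```python
from typing import Optional


def expert_prediction(patient: dict) -> Optional[str]:
    """Apply ER1 and ER2; return None when neither rule fires."""
    if patient["bmi"] > 40 and patient["age"] > 65 and patient["gender"] == "male":
        return "Yes"  # ER1
    if patient["bmi"] < 25 and patient["age"] < 25 and patient["gender"] == "female":
        return "No"  # ER2
    return None  # not covered by either expert rule


# Hypothetical records, invented for illustration only.
patients = [
    {"bmi": 42, "age": 70, "gender": "male"},
    {"bmi": 22, "age": 21, "gender": "female"},
    {"bmi": 31, "age": 50, "gender": "male"},
    {"bmi": 27, "age": 34, "gender": "female"},
    {"bmi": 45, "age": 40, "gender": "female"},
]

predictions = [expert_prediction(p) for p in patients]
covered = [pred for pred in predictions if pred is not None]
coverage = len(covered) / len(patients)
print(f"Expert rules fire on {len(covered)} of {len(patients)} records")
```

Any record that returns `None` cannot be used to compare the expert rules against a learned model, which is why low coverage weakens the comparison.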
ML Wins Some and Loses Others • [Figure labels: ER1, ER2]
Compare results • ER1 = If BMI > 40 and Age > 65 and Gender = male • Then OSA = Yes • ER2 = If BMI < 25 and Age < 25 and Gender = female • Then OSA = No Note: the paper says this is the tree for set B.
Claims from paper… • Learned rules were largely consistent with human generated rules • Automatic • If BMI > 28.03 and Gender = Male • Then OSA = Yes • Human • If BMI > 40 and Age > 65 and Gender = Male • Then OSA = Yes • Do you buy their argument?
More claims… • Is this a contradiction? • Automatic • If Gender = Male and MP = 2 • Then OSA = No • Human • If BMI > 40 and Age > 65 and Gender = Male • Then OSA = Yes • What about the relationship between age and MP or BMI and MP?
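One way to probe both claims is to look for cases where two rules fire together but disagree. The sketch below does this over a small grid of hypothetical patients; the grid values and the assumption that MP takes the values 1 to 4 are illustrative and not taken from the paper.

```python
from itertools import product


def human_er1(p):
    """Human rule ER1 from the slides."""
    if p["bmi"] > 40 and p["age"] > 65 and p["gender"] == "male":
        return "Yes"
    return None


def auto_bmi(p):
    """Learned rule: BMI > 28.03 and male."""
    if p["bmi"] > 28.03 and p["gender"] == "male":
        return "Yes"
    return None


def auto_mp(p):
    """Learned rule: male and MP = 2."""
    if p["gender"] == "male" and p["mp"] == 2:
        return "No"
    return None


def conflicts(rule_a, rule_b, patients):
    """Patients on which both rules fire but reach different conclusions."""
    found = []
    for p in patients:
        a, b = rule_a(p), rule_b(p)
        if a is not None and b is not None and a != b:
            found.append(p)
    return found


# Hypothetical grid of attribute combinations (assumed values, not real data).
grid = [
    {"bmi": b, "age": a, "gender": g, "mp": m}
    for b, a, g, m in product([22, 30, 45], [20, 50, 70], ["male", "female"], [1, 2, 3, 4])
]

print(len(conflicts(human_er1, auto_bmi, grid)))  # the rules never disagree on this grid
print(conflicts(human_er1, auto_mp, grid))        # older, heavy males with MP = 2
```

The second check finds a logical conflict on the grid, but whether it matters in practice depends on whether such patients actually occur, which is what the question about age, BMI, and MP is getting at.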
What is the Mallampati classification? http://www.accessmedicine.com/search/searchAMResultImg.aspx?rootterm=mallampati+score&rootID=46310&searchType=1
Thought questions • Would you trust medical “wisdom” that comes from data mining? • What would be your concerns? • What would you want to know about how the “wisdom” was learned?
Data Cleansing Obvious things you can fix… • Inconsistent naming of nominal values • Names with or without middle initial • Nick name versus real name • Typos • City or street names may change over time • Street names may change depending on the block • Inconsistencies in how forms are filled out • Address and phone number fields in different countries
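A rough sketch of two of these fixes in Python: collapsing inconsistent spellings of a nominal value, and dropping a middle initial from a name. The alias table and the example names are hypothetical; a real project would build its own mapping from the data.

```python
import re

# Hypothetical alias table mapping variant spellings to one canonical value.
ALIASES = {
    "pgh": "pittsburgh",
    "pittsburg": "pittsburgh",
}


def normalize_nominal(value: str) -> str:
    """Trim, lowercase, collapse whitespace, then map known variants."""
    key = re.sub(r"\s+", " ", value.strip().lower())
    return ALIASES.get(key, key)


def drop_middle_initial(name: str) -> str:
    """Remove a single-letter middle initial, with or without a period."""
    return re.sub(r"\s+[A-Za-z]\.?(?=\s)", "", name.strip())


raw_cities = ["Pittsburgh", "  Pittsburg ", "PGH"]
print({normalize_nominal(c) for c in raw_cities})

print(drop_middle_initial("Jane Q. Doe"))
print(drop_middle_initial("Jane Doe"))
```

Rules like these only catch variants someone has already noticed, so it is worth listing the distinct values of each nominal attribute before and after cleaning.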
Decision Tables vs Decision Trees • Decision Trees: Open World Assumption • Only examine some attributes in particular contexts • Use the majority class within a context to eliminate the closed world requirement • Divide and Conquer approach • Decision Tables: Closed World Assumption • Every case enumerated • No generalization except by limiting the number of attributes • The 1R algorithm produces the simplest possible Decision Table
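To make the 1R point concrete, here is a small Python sketch of the idea: for each attribute, build a one-column table that maps every value to its majority class, then keep the attribute whose table makes the fewest training errors. It is a simplified illustration, not Weka's OneR implementation, and it uses the nominal weather data distributed with Weka as a toy input.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Nominal weather data; the last column is the class (play).
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def one_rule(rows, attributes):
    """Return (attribute, value -> class table, training errors) for the best attribute."""
    best = None
    for col, name in enumerate(attributes):
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[col]][row[-1]] += 1
        # Each value predicts its majority class; everything else is an error.
        table = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[2]:  # ties keep the earlier attribute
            best = (name, table, errors)
    return best


attribute, table, errors = one_rule(ROWS, ATTRIBUTES)
print(attribute, table, f"{errors}/{len(ROWS)} training errors")
```

The resulting table enumerates every value of a single attribute, which is the closed-world behaviour described above: there is one row per case and no context-dependent splitting.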
Click in one of the boxes to zoom in Use the visualize tab to view 3-way interactions
Types of Attributes • Numeric (continuous) • @attribute temperature numeric • Real numbers or integers • Can be compared (less than, greater than, equality, inequality) • Some algorithms treat numeric scales as ratios or look at “distances” • Some methods normalize numeric scales • Some machine learning algorithms treat numbers as nominal values
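As one illustration of normalizing a numeric scale, the Python sketch below rescales values to the range 0 to 1 (min-max normalization). This is only one of several possible schemes, and the temperature values are made up.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


temperatures = [60, 75, 90]  # hypothetical readings
print(min_max_normalize(temperatures))
```

Rescaling matters most for algorithms that compute distances, since an attribute with a large numeric range would otherwise dominate the others.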
Types of Attributes • Nominal (categorical) • @attribute outlook {sunny, overcast, rainy} • Finite number of pre-specified values • Values are just labels (the actual label is not meaningful to the algorithms) • Values are not ordered and cannot be compared except for equality/inequality
Types of Attributes • Strings (just like nominal, makes troubleshooting text processing more convenient) • @attribute description string • Value can be any string in quotes “Look, Mom! No hands!” • Can be converted to a vector of numeric attributes, each representing one word
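The string-to-vector conversion can be sketched in a few lines of Python: build a vocabulary from all the strings, then represent each string as a list of word counts. This is a simplified stand-in for what a filter such as Weka's StringToWordVector does; the tokenizer and the second example string are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def to_word_vectors(docs):
    """Return (vocabulary, vectors), with one count per vocabulary word per document."""
    vocabulary = sorted({word for doc in docs for word in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["Look, Mom! No hands!", "No hands, no problem"]
vocabulary, vectors = to_word_vectors(docs)
print(vocabulary)
for doc, vector in zip(docs, vectors):
    print(vector, doc)
```

Each position in a vector becomes one numeric attribute, so the number of attributes grows with the size of the vocabulary.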
Types of Attributes • Date (numeric) • @attribute today date ‘YYYY-MM-dd-THH:mm:ss’ • 2006-01-24-T12:00:00 • Specified as strings but then converted to numbers when file is read
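The string-to-number step can be illustrated in Python by parsing the sample value from the slide and converting it to milliseconds since 1970-01-01. Treating the value as UTC is an assumption made here for reproducibility, and the Python format string is a hand translation of the slide's layout, not something Weka itself reads.

```python
from datetime import datetime, timezone

# Python equivalent of the layout used for the slide's sample value (assumed).
SLIDE_LAYOUT = "%Y-%m-%d-T%H:%M:%S"


def date_to_millis(text, layout=SLIDE_LAYOUT):
    """Parse a date string and return milliseconds since the epoch, assuming UTC."""
    parsed = datetime.strptime(text, layout).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


noon = date_to_millis("2006-01-24-T12:00:00")
next_day = date_to_millis("2006-01-25-T12:00:00")
print(noon, next_day - noon)
```

Once dates are numbers they can be compared and subtracted like any other numeric attribute, for example to compute the time between two events.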
Ordinal Values • Weka technically does not have ordinal attributes • But you can simulate them with “temperature coding”! • Try to represent “If X less than or equal to .35”? • Bins: A, B, C, D • Values: .2, .25, .28, .31, .35, .45, .47, .52, .6, .63 • Coded features: A; A or B; A or B or C; A or B or C or D
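A small Python sketch of temperature coding follows. The slide does not state where the bins A to D begin and end, so the boundaries below are hypothetical and chosen so that bin B ends at .35. With those boundaries, the test "X less than or equal to .35" becomes a test on the single feature "A or B".

```python
from bisect import bisect_left

# Hypothetical upper edges for bins A, B, and C; bin D holds everything above.
UPPER_EDGES = [0.25, 0.35, 0.52]
FEATURES = ["A", "A_or_B", "A_or_B_or_C", "A_or_B_or_C_or_D"]


def bin_index(x):
    """Index of the bin containing x (0 = A, ..., 3 = D); upper edges are inclusive."""
    return bisect_left(UPPER_EDGES, x)


def temperature_code(x):
    """Cumulative features: each one is true when x falls in that bin or a lower one."""
    index = bin_index(x)
    return {name: index <= position for position, name in enumerate(FEATURES)}


for x in [0.2, 0.25, 0.28, 0.31, 0.35, 0.45, 0.47, 0.52, 0.6, 0.63]:
    code = temperature_code(x)
    print(x, "ABCD"[bin_index(x)], code["A_or_B"])
```

Because the features are cumulative, a tree or rule learner that only tests nominal equality can still express an ordered threshold by testing one of them.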