
Machine Learning in Practice Lecture 4

This lecture discusses the application of machine learning in predicting sleep apnea. It explores the use of machine learning algorithms to contribute to the accumulation of medical wisdom and compares results with human-generated rules. The lecture also covers data cleansing techniques and the differences between decision trees and decision tables.

Presentation Transcript


  1. Machine Learning in Practice, Lecture 4 Carolyn Penstein Rosé Language Technologies Institute / Human-Computer Interaction Institute

  2. What is Sleep Apnea? http://www.snoresnomore.com/images/snorin2.gif

  3. What is Sleep Apnea? Can we predict who will experience it? http://www.snoresnomore.com/images/snorin2.gif

  4. Plan for Today • Any questions about the second assignment? • Announcements • Quiz 1 Answer Key on Blackboard • Comments about Assignment 1 • Kwiatkowska Paper • Error Analysis • Data Cleansing • Trees vs Tables • Weka helpful hints • ARFF format

  5. General Points – Assignment I • Bookkeeping issues • Write your name on the assignment • Write the assignment number • Format the assignment as an MS Word .doc file • Please don’t use the .docx format • Visualize the tree graphically • Use right-click and then take a screenshot • Embed the figure in the .doc file

  6. Visualizing the Tree

  7. Visualizing the Tree

  8. Kwiatkowska Paper

  9. Clinical Prediction Rules • Example application of machine learning • Rules created by medical practitioners based on their experience • Can we use machine learning to contribute to the accumulation of medical wisdom? • ER1 = If BMI > 40 and Age > 65 and Gender = male • Then OSA = Yes • ER2 = If BMI < 25 and Age < 25 and Gender = female • Then OSA = No
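
To make the coverage question on the next slides concrete, here is a minimal sketch (not from the paper) of the two expert rules expressed as code; the Patient record and its field names are hypothetical:

    // Hypothetical patient record; fields mirror the attributes in ER1/ER2.
    record Patient(double bmi, int age, String gender) {}

    class ExpertRules {
        // Returns "Yes", "No", or null when neither rule fires --
        // the coverage gap discussed on the following slides.
        static String predictOSA(Patient p) {
            if (p.bmi() > 40 && p.age() > 65 && p.gender().equals("male")) {
                return "Yes";   // ER1
            }
            if (p.bmi() < 25 && p.age() < 25 && p.gender().equals("female")) {
                return "No";    // ER2
            }
            return null;        // no expert rule applies
        }
    }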

  10. Methodological Flaw? Human-generated rules (ER1, ER2) don’t cover most of the data

  11. Machine Learning Result

  12. ML Wins Some and Loses Others (compared against ER1 and ER2)

  13. Compare results • ER1 = If BMI > 40 and Age > 65 and Gender = male • Then OSA = Yes • ER2 = If BMI < 25 and Age < 25 and Gender = female • Then OSA = No • Note: the paper says this is the tree for set B.

  14. Claims from paper… • Learned rules were largely consistent with human-generated rules • Automatic • If BMI > 28.03 and Gender = Male • Then OSA = Yes • Human • If BMI > 40 and Age > 65 and Gender = Male • Then OSA = Yes • Do you buy their argument?

  15. More claims… • Is this a contradiction? • Automatic • If Gender = Male and MP = 2 • Then OSA = No • Human • If BMI > 40 and Age > 65 and Gender = Male • Then OSA = Yes • What about the relationship between age and MP or BMI and MP?

  16. What is the Mallampati classification? http://www.accessmedicine.com/search/searchAMResultImg.aspx?rootterm=mallampati+score&rootID=46310&searchType=1

  17. Thought questions • Would you trust medical “wisdom” that comes from data mining? • What would be your concerns? • What would you want to know about how the “wisdom” was learned?

  18. Data Cleansing Obvious things you can fix… • Inconsistent naming of nominal values • Names with or without middle initial • Nick name versus real name • Typos • City or street names may change over time • Street names may change depending on the block • Inconsistencies in how forms are filled out • Address and phone number fields in different countries
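
As a concrete illustration of fixing one of these problems (nicknames and trailing middle initials), a minimal cleansing step might look like the sketch below; the alias table and method name are hypothetical, not from the lecture:

    import java.util.Map;

    class NameCleaner {
        // Hypothetical nickname-to-canonical-name table.
        private static final Map<String, String> ALIASES =
            Map.of("Bob", "Robert", "Rob", "Robert", "Liz", "Elizabeth");

        // Trim whitespace, drop a trailing middle initial ("Robert A." -> "Robert"),
        // and map nicknames to one canonical form so records can be matched.
        static String canonicalFirstName(String raw) {
            String name = raw.trim().replaceAll("\\s+[A-Z]\\.$", "");
            return ALIASES.getOrDefault(name, name);
        }
    }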

  21. Trees versus Tables

  22. Decision Tables vs Decision Trees • Decision trees (open world assumption): only examine some attributes in particular contexts; use the majority class within a context to eliminate the closed-world requirement; divide-and-conquer approach • Decision tables (closed world assumption): every case enumerated; no generalization except by limiting the number of attributes; the 1R algorithm produces the simplest possible decision table
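
A quick way to see the difference in practice is to run Weka’s J48 (decision tree), DecisionTable, and OneR learners on the same data. The sketch below assumes a placeholder file sleep.arff whose class is the last attribute:

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.DecisionTable;
    import weka.classifiers.rules.OneR;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import java.util.Random;

    public class TreesVsTables {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("sleep.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);            // class = last attribute

            for (Classifier c : new Classifier[] { new J48(), new DecisionTable(), new OneR() }) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(c, data, 10, new Random(1)); // 10-fold cross-validation
                System.out.printf("%s: %.1f%% correct%n",
                        c.getClass().getSimpleName(), eval.pctCorrect());
            }
        }
    }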

  26. Weka Helpful Hints

  27. Use the visualize tab to view 3-way interactions

  28. Click in one of the boxes to zoom in Use the visualize tab to view 3-way interactions

  30. Weka Data Structures

  31. ARFF format

  32. Types of Attributes • Numeric (continuous) • @attribute temperature numeric • Real numbers or integers • Can be compared (less than, greater than, equality, inequality) • Some algorithms treat numeric scales as ratios or look at “distances” • Some methods normalize numeric scales • Some machine learning algorithms treat numbers as nominal values

  33. Types of Attributes • Nominal (categorical) • @attribute outlook {sunny, overcast, rainy} • Finite number of pre-specified values • Values are just labels (the actual label is not meaningful to the algorithms) • Values are not ordered and cannot be compared except for equality/inequality

  34. Types of Attributes • Strings (just like nominal, makes troubleshooting text processing more convenient) • @attribute description string • Value can be any string in quotes “Look, Mom! No hands!” • Can be converted to a vector of numeric attributes, each representing one word
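
For example, Weka’s StringToWordVector filter performs exactly this string-to-word-vector conversion; the file name reviews.arff below is just a placeholder:

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class TextToVector {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("reviews.arff").getDataSet();
            StringToWordVector bag = new StringToWordVector();   // one attribute per word
            bag.setInputFormat(data);
            Instances vectorized = Filter.useFilter(data, bag);
            System.out.println(vectorized.numAttributes() + " attributes after conversion");
        }
    }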

  35. Types of Attributes • Date (numeric) • @attribute today date "yyyy-MM-dd'T'HH:mm:ss" • 2006-01-24T12:00:00 • Specified as strings but then converted to numbers when the file is read
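
Putting the four attribute types together, a minimal hypothetical ARFF file (the relation name, attribute names, and data rows are illustrative only) might look like:

    @relation weather_notes

    @attribute temperature numeric
    @attribute outlook {sunny, overcast, rainy}
    @attribute description string
    @attribute recorded date "yyyy-MM-dd'T'HH:mm:ss"
    @attribute play {yes, no}

    @data
    72, sunny, 'clear all day', 2006-01-24T12:00:00, yes
    58, rainy, 'brief showers', 2006-01-25T09:30:00, no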

  36. Reasoning About Time

  37. Not Bad Performance with a Simple Split

  38. Threshold is Off

  39. Ordinal Values • Weka technically does not have ordinal attributes • But you can simulate them with “temperature coding”! • Try to represent “If X less than or equal to .35” • [Slide figure: values .2, .25, .28, .31, .35, .45, .47, .52, .6, .63 with the cumulative codes A, “A or B”, “A or B or C”, “A or B or C or D”]
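
A minimal sketch of what this “temperature coding” could look like, assuming four ordered levels A < B < C < D (the indicator names are illustrative, not from the slide): each level is expanded into cumulative yes/no attributes, so a single test such as “value is A or B” plays the role of a threshold like “X <= .35”.

    enum Level { A, B, C, D }

    class TemperatureCoding {
        // Expand one ordinal value into cumulative indicators
        // {A, A-or-B, A-or-B-or-C}; "A or B or C or D" is always true.
        static boolean[] encode(Level x) {
            return new boolean[] {
                x.compareTo(Level.A) <= 0,   // value is A
                x.compareTo(Level.B) <= 0,   // value is A or B
                x.compareTo(Level.C) <= 0    // value is A, B, or C
            };
        }
    }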

  40. Questions?
