Discover how to improve precision in data mining from surveys and questionnaires through techniques like clustering, pruning, and handling ordinal/Likert data. Learn from real datasets and explore future work areas.
Mining Rules from Surveys and Questionnaires
Scott Burton and Richard Morris
CS 676 Presentation, 12 April 2011
Surveys and Questionnaires
• Frequently used
• Problems for data mining:
  • Rarity
  • Related and dependent questions
  • Ordinal / Likert scales
Association Rule Mining
• Market basket analysis
• Example: Cookies -> Milk (support/confidence sketch below)
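To ground the example, here is a minimal market-basket sketch in Python; the transactions and item names are hypothetical, and support and confidence are computed directly rather than with a mining library:

```python
# Hypothetical transactions; each row is one "basket" of items.
transactions = [
    {"cookies", "milk", "bread"},
    {"cookies", "milk"},
    {"milk", "bread"},
    {"cookies", "bread"},
    {"cookies", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

# The classic rule from the slide:
print(support({"cookies", "milk"}))       # 0.6
print(confidence({"cookies"}, {"milk"}))  # 0.75
```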
Our Goal: Improve Precision
Standard algorithms/approaches:
• Apriori, MS-Apriori
• Produce too many rules
• Rules are not “interesting” or actionable
• Finding the needle in the haystack
Our goal:
• Improve precision
• How do you measure “interestingness”?
Interestingness Measures
• Mostly based on support or confidence (sketch below)
• Considered about 40 different metrics
• All seemed to favor the wrong types of rules
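For reference, a minimal sketch of three of the most common measures (support, confidence, lift) for a rule A -> C, computed from hypothetical contingency counts; it illustrates how a "true but obvious" rule can still score well on all of them:

```python
def measures(n, n_a, n_c, n_ac):
    """Interestingness measures for a rule A -> C.

    n    : total number of records
    n_a  : records matching the antecedent A
    n_c  : records matching the consequent C
    n_ac : records matching both A and C
    """
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)   # >1 means A and C co-occur more than chance
    return support, confidence, lift

# Hypothetical counts: a "true but obvious" rule scores high on all three,
# which is why support/confidence-based metrics tend to favor uninteresting rules.
print(measures(n=13000, n_a=9000, n_c=9500, n_ac=8800))
```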
Our Datasets
• Smoking habits of middle school students in Mexico
  • Global Youth Tobacco Survey for the Pan American Health Organization (GYTSPAHO)
  • ~65 questions and ~13,000 responses
• HINTS (Health Information National Trends Survey)
  • hints.cancer.gov
  • 2007 response data: ~475 questions and ~8,000 responses
  • We focused on a subset of ~100 questions
Apriori vs. MS-Apriori
• Apriori results (Figure 1)
• MS-Apriori results (Figure 2)
(per-item minimum support sketch below)
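MS-Apriori's key difference from plain Apriori is a per-item minimum support (MIS), which is what makes rare survey answers reachable. A minimal sketch of the usual MIS assignment, MIS(i) = max(beta * f(i), LS), where f(i) is item i's frequency; beta and LS are tuning parameters and the values below are illustrative:

```python
def mis(item_frequencies, beta=0.5, least_support=0.01):
    """Per-item minimum support as usually defined for MS-Apriori:
    MIS(i) = max(beta * f(i), least_support).
    beta and least_support are tuning parameters; the values here are illustrative.
    """
    return {item: max(beta * f, least_support)
            for item, f in item_frequencies.items()}

# Hypothetical answer frequencies: the rare answer gets a proportionally lower
# threshold, so rules involving it are not pruned away up front.
print(mis({"smokes=yes": 0.08, "smokes=no": 0.92}))
# -> {'smokes=yes': 0.04, 'smokes=no': 0.46}
```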
Related and Dependent Questions
True but worthless rules:
• "Do you smoke = no" -> "Did you smoke last week = no"
Our approach (pruning sketch below):
• Cluster similar questions
• Remove any intra-cluster rules
[diagram of question clusters]
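A minimal sketch of the intra-cluster pruning step; the question names, cluster labels, and rules are hypothetical:

```python
def prune_intra_cluster_rules(rules, cluster_of):
    """Drop rules whose items all come from a single question cluster.

    rules      : iterable of (antecedent, consequent) pairs, where each side
                 is a set of question identifiers
    cluster_of : dict mapping a question identifier to its cluster label
    """
    kept = []
    for antecedent, consequent in rules:
        clusters = {cluster_of[q] for q in antecedent | consequent}
        if len(clusters) > 1:        # rule spans more than one cluster: keep it
            kept.append((antecedent, consequent))
    return kept

# Hypothetical example: "smoke" and "smoked_last_week" fall in the same cluster,
# so the trivial rule between them is pruned.
cluster_of = {"smoke": 0, "smoked_last_week": 0, "parents_smoke": 1}
rules = [({"smoke"}, {"smoked_last_week"}),
         ({"parents_smoke"}, {"smoke"})]
print(prune_intra_cluster_rules(rules, cluster_of))
# -> [({'parents_smoke'}, {'smoke'})]
```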
Creating Clusters
• Distance metrics:
  • Bi-conditional prediction (one possible sketch below)
  • Attribute vs. attribute-value pair
• Involving the subject matter expert
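The slides do not spell out the distance metric, so the following is only one plausible reading of "bi-conditional prediction": predict each question's answer from the other question's majority answer, average the two accuracies, and turn that into a distance. All question names and responses are hypothetical:

```python
from collections import Counter, defaultdict

def prediction_accuracy(xs, ys):
    """Accuracy of predicting ys from xs using the majority answer per x value."""
    by_x = defaultdict(list)
    for x, y in zip(xs, ys):
        by_x[x].append(y)
    correct = sum(Counter(group).most_common(1)[0][1] for group in by_x.values())
    return correct / len(xs)

def question_distance(xs, ys):
    """1 minus the averaged two-way prediction accuracy: 0 means the questions are redundant."""
    return 1.0 - (prediction_accuracy(xs, ys) + prediction_accuracy(ys, xs)) / 2.0

# Hypothetical responses to two closely related questions:
smoke     = ["no", "no", "yes", "no", "yes", "no"]
last_week = ["no", "no", "yes", "no", "no",  "no"]
print(question_distance(smoke, last_week))   # small value -> cluster together
```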
A Sample Clustering of Questions (see handout)
Effects of Cluster Pruning
• MS-Apriori (Figure 2)
• After cluster pruning (Figure 3)
Similar Rules
Abstract viewpoint (pruning sketch below):
• A B -> C D
• A -> C D
• A B -> C
• A B Z -> C D
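One common way to prune such a family is to keep only the most general rule: drop a rule when another rule has a subset of its antecedent and a superset of its consequent. The slides do not specify exactly which representative is kept, so this is only a sketch of that convention:

```python
def prune_redundant_rules(rules):
    """Drop a rule when another rule is at least as general.

    A rule (A2 -> C2) is considered redundant if some other rule (A1 -> C1)
    has A1 <= A2 and C1 >= C2: it needs no more conditions yet concludes at
    least as much.
    """
    kept = []
    for i, (a2, c2) in enumerate(rules):
        dominated = any(
            j != i and a1 <= a2 and c1 >= c2 and (a1, c1) != (a2, c2)  # ignore exact duplicates
            for j, (a1, c1) in enumerate(rules)
        )
        if not dominated:
            kept.append((a2, c2))
    return kept

# The family from the slide, written with single-letter items:
rules = [
    ({"A", "B"}, {"C", "D"}),
    ({"A"},      {"C", "D"}),
    ({"A", "B"}, {"C"}),
    ({"A", "B", "Z"}, {"C", "D"}),
]
print(prune_redundant_rules(rules))   # only ({'A'}, {'C', 'D'}) survives
```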
Effects of Similar Rule Pruning
• After cluster pruning (Figure 3)
• After similar rule pruning (Figure 4)
Ordinal and Likert Data
Two approaches (pre-binning sketch below):
• Pre-process
• Post-process
[example Likert and ordinal scales shown as figures]
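Pre-processing here means collapsing the scale before mining. A minimal sketch of pre-binning a 5-point Likert item into three coarser categories; the grouping (1-2 / 3 / 4-5) is an illustrative choice, not necessarily the one used in the presentation:

```python
# Map a 5-point Likert response onto three coarser bins before mining.
# The grouping (1-2 / 3 / 4-5) is an illustrative choice.
LIKERT_BINS = {
    1: "disagree", 2: "disagree",
    3: "neutral",
    4: "agree",    5: "agree",
}

def prebin(responses):
    """Replace raw Likert codes with their bin labels; leave missing values alone."""
    return [LIKERT_BINS.get(r, r) for r in responses]

print(prebin([1, 2, 3, 5, None]))
# -> ['disagree', 'disagree', 'neutral', 'agree', None]
```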
Effects of Pre-Binning (Figure 5)
Other Examples
• HINTS data (see handout, Figures 6-10)
Conclusions and Future Work
Conclusions:
• Increased precision of “interesting” rules
• More work to be done
Future work:
• Tuning of existing processes
• Handle numerical data
• Handle questions not asked of every respondent
• Handle questions with multiple responses
• Try other record-matching techniques for similar rule pruning