Addressing Machine Learning Challenges to Perform Automated Prompting. Barnan Das. PhD Preliminary Exam. November 8, 2012.
Self-portraits by William Utermohlen, an American artist living in London, after he was diagnosed with Alzheimer’s disease in 1995. Utermohlen died from the consequences of Alzheimer’s disease in March 2007.
• 36 million: worldwide dementia population • [Figure: actual and expected number of Americans aged 65 and older with Alzheimer’s disease] • $200 billion: payment for care in 2012 • 15 million: unpaid caregivers Source: World Health Organization and Alzheimer’s Association.
Automated Prompting: Help with Activities of Daily Living (ADLs)
Existing Work • Rule-based (temporal or contextual) • Activity initiation • RFID- and video-input-based prompts for activity steps Our Contribution • Learning-based • Sub-activity-level prompts • No audio/video input
System Architecture (published at ICOST 2011 and in the Journal of Personal and Ubiquitous Computing, 2012)
Off-line Classification of Activity Steps [Figure: activity steps classified as prompt or no-prompt]
Class Distribution (total number of data points: 3980)
Existing Work • Preprocessing • Sampling: over-sampling the minority class, under-sampling the majority class • Over-sampling the minority class based on the spatial location of samples in Euclidean feature space
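SMOTE-style over-sampling, the over-sampling step inside SMOTEBoost, is the standard example of relying on the spatial location of samples: each synthetic minority point lies on the line segment between a minority sample and one of its nearest minority neighbours in Euclidean space. The minimal sketch below shows that idea next to random under-sampling of the majority class. The function names and toy data are hypothetical, and the code is an illustration rather than a reference implementation.

```python
import math
import random


def random_undersample(majority, target_size, rng):
    """Random under-sampling: keep `target_size` majority samples chosen uniformly."""
    return rng.sample(majority, target_size)


def smote_like_oversample(minority, n_new, k, rng):
    """SMOTE-style over-sampling: create `n_new` synthetic minority points, each on the
    segment between a minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),  # Euclidean distance
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


rng = random.Random(0)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]  # hypothetical feature vectors
majority = [(float(i), 5.0) for i in range(10)]
synthetic = smote_like_oversample(minority, n_new=4, k=2, rng=rng)
reduced_majority = random_undersample(majority, target_size=5, rng=rng)
```

Because every synthetic point is an interpolation between two existing minority samples, it always stays inside the region spanned by those samples.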
Proposed Approach • Preprocessing technique • Over-sampling the minority class • Based on Gibbs sampling [Figure legend: attribute value, Markov chain, node] Submitted to the Journal of Machine Learning Research, 2012.
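The slide states only that new minority samples are drawn with Gibbs sampling over attribute values. As a minimal sketch, assume discrete attributes whose joint distribution is approximated by a simple chain in which each attribute depends only on the previous one. This dependency structure is a hypothetical stand-in, not necessarily the one the method uses. Under that assumption, the Gibbs conditional of attribute i given all the others is proportional to P(x_i | x_{i-1}) * P(x_{i+1} | x_i). `fit_chain_model`, `gibbs_chain`, and the toy attributes are illustrative names.

```python
import random
from collections import Counter


def fit_chain_model(samples, values, alpha=1.0):
    """Estimate P(x_0) and P(x_i | x_{i-1}) from minority-class samples.

    values[i] lists the possible values of attribute i; alpha is Laplace smoothing."""
    first = Counter(s[0] for s in samples)
    prior = {v: (first[v] + alpha) / (len(samples) + alpha * len(values[0]))
             for v in values[0]}
    trans = []  # trans[i - 1][(prev, cur)] = P(x_i = cur | x_{i-1} = prev)
    for i in range(1, len(values)):
        pairs = Counter((s[i - 1], s[i]) for s in samples)
        prev_counts = Counter(s[i - 1] for s in samples)
        trans.append({
            (prev, cur): (pairs[(prev, cur)] + alpha)
            / (prev_counts[prev] + alpha * len(values[i]))
            for prev in values[i - 1]
            for cur in values[i]
        })
    return prior, trans


def gibbs_chain(start, values, prior, trans, n_sweeps, rng):
    """Run a Gibbs sampler from `start`, yielding the state after each full sweep."""
    state = list(start)
    d = len(values)
    for _ in range(n_sweeps):
        for i in range(d):
            weights = []
            for v in values[i]:
                # conditional of x_i given all other attributes under the chain model:
                # P(x_i | x_{i-1}) (or P(x_0) for i == 0) times P(x_{i+1} | x_i)
                w = prior[v] if i == 0 else trans[i - 1][(state[i - 1], v)]
                if i + 1 < d:
                    w *= trans[i][(v, state[i + 1])]
                weights.append(w)
            state[i] = rng.choices(values[i], weights=weights)[0]
        yield tuple(state)


rng = random.Random(1)
values = [("off", "on"), ("off", "on"), ("short", "long")]  # hypothetical discrete attributes
minority = [("on", "on", "long"), ("on", "off", "long"), ("off", "on", "short")]
prior, trans = fit_chain_model(minority, values)
chain = list(gibbs_chain(minority[0], values, prior, trans, n_sweeps=50, rng=rng))
```

Each chain starts from an existing minority sample and produces one candidate attribute vector per sweep; which of these states are kept as new samples is where RACOG and wRACOG differ.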
Proposed Approach [Figure: Markov chains; minority class samples; majority class samples]
(wrapper-based) RApidly COnverging Gibbs sampler: RACOG & wRACOG • Differ in how samples are selected from the Markov chains • RACOG: • Based on burn-in and lag • Stopping criterion: predefined number of iterations • Effectiveness of new samples is not judged • wRACOG: • Iterative training on the dataset, adding misclassified data points • Stopping criterion: no further improvement of the performance measure (TP rate)
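The two selection strategies can be sketched as follows, assuming the Gibbs chain states are already available (for RACOG) and that fresh minority candidates can be requested in batches (for wRACOG). The classifier is passed in as a `fit` callable that returns a predict function, so no particular learner or library is implied. `racog_select`, `wracog`, and their parameters are hypothetical names, and discarding the last batch when the TP rate stops improving is a design choice of this sketch.

```python
def racog_select(chain, burn_in, lag):
    """RACOG-style selection: drop the first `burn_in` states of a Markov chain,
    then keep every `lag`-th state to reduce correlation between kept samples."""
    return list(chain)[burn_in::lag]


def tp_rate(predict, data):
    """Fraction of minority (label 1) examples in `data` that `predict` labels 1."""
    positives = [x for x, y in data if y == 1]
    if not positives:
        return 0.0
    return sum(predict(x) == 1 for x in positives) / len(positives)


def wracog(train, new_minority_batch, fit, validation, max_rounds=100):
    """wRACOG-style wrapper (sketch).

    train:              list of (features, label) pairs, label 1 = minority
    new_minority_batch: callable returning freshly generated minority feature vectors
    fit:                callable taking a training list and returning a predict function
    validation:         held-out (features, label) pairs used to measure the TP rate
    """
    train = list(train)
    model = fit(train)
    best = tp_rate(model, validation)
    for _ in range(max_rounds):
        # keep only the generated samples that the current model misclassifies
        misclassified = [(x, 1) for x in new_minority_batch() if model(x) != 1]
        candidate = train + misclassified
        candidate_model = fit(candidate)
        score = tp_rate(candidate_model, validation)
        if score <= best:  # stopping criterion: no further TP-rate improvement
            break
        train, model, best = candidate, candidate_model, score
    return train, best
```

For instance, with a toy one-dimensional threshold classifier as `fit`, the loop accepts a batch of misclassified samples only while the validation TP rate keeps rising and then stops, whereas `racog_select` keeps states purely by position in the chain without judging them.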
Experimental Setup • Implemented Gibbs sampling, SMOTEBoost and RUSBoost
Results (RACOG & wRACOG) [Figures: geometric mean of TP rate and TN rate; TP rate]
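The geometric mean of TP rate and TN rate rewards a classifier only when it does well on both classes, which plain accuracy does not. A minimal sketch of the computation, assuming label 1 stands for the minority (prompt) class and that both classes occur in the true labels; the helper names are hypothetical:

```python
import math


def tp_tn_rates(y_true, y_pred, positive=1):
    """Return (TP rate, TN rate); both classes must be present in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of TP rate and TN rate."""
    tpr, tnr = tp_tn_rates(y_true, y_pred, positive)
    return math.sqrt(tpr * tnr)


y_true = [1, 1, 0, 0, 0, 0]
print(g_mean(y_true, [1, 0, 0, 0, 0, 1]))  # sqrt(0.5 * 0.75)
print(g_mean(y_true, [0] * 6))             # always predicting the majority class
```

On these toy labels both predictions have the same accuracy (4/6), but the first has a G-mean of about 0.61 and the second, which never predicts the minority class, has a G-mean of 0.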
Results (RACOG and wRACOG) [Figure: ROC curve]
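An ROC curve plots TP rate against FP rate as the decision threshold on a classifier's scores is swept. The sketch below computes those points and the area under the curve with the trapezoidal rule. It assumes the classifier outputs a real-valued score per example, higher meaning more likely to be the minority class; `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(y_true, scores, positive=1):
    """(FPR, TPR) points obtained by lowering the threshold over the scores."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # examples with tied scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
print(pts, auc(pts))
```

Grouping tied scores keeps the curve from depending on the arbitrary order of examples that share a score.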
Overlapping Classes in Prompting Data [Figure: 3D PCA plot of the prompting data]
Existing Work • Discard data in the overlapping region • Treat the overlapping region as a separate class
Cluster-Based Under-Sampling (ClusBUS) • Form clusters • Under-sample interesting clusters Published in the IOS Press book Agent-Based Approaches to Ambient Intelligence, 2012.
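The two ClusBUS steps can be sketched as follows. The slide does not specify the clustering algorithm or what makes a cluster "interesting", so this sketch assumes cluster labels produced beforehand by any clustering method, calls a cluster interesting when it contains both classes and its minority fraction reaches a `threshold`, and removes the majority samples in those clusters. The function name, the threshold rule, and the toy data are all assumptions.

```python
from collections import Counter, defaultdict


def clusbus(points, labels, cluster_ids, threshold, minority=1):
    """Cluster-based under-sampling sketch; returns the kept (point, label) pairs and
    the set of interesting cluster ids. The three input lists are parallel."""
    per_cluster = defaultdict(Counter)
    for label, cid in zip(labels, cluster_ids):
        per_cluster[cid][label] += 1
    interesting = set()
    for cid, counts in per_cluster.items():
        total = sum(counts.values())
        n_min = counts[minority]
        # assumed criterion: both classes present and a large enough minority share
        if 0 < n_min < total and n_min / total >= threshold:
            interesting.add(cid)
    kept = [
        (p, label)
        for p, label, cid in zip(points, labels, cluster_ids)
        if not (cid in interesting and label != minority)  # drop majority in interesting clusters
    ]
    return kept, interesting


points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
labels = [1, 0, 0, 0, 0, 0]
cluster_ids = [0, 0, 0, 1, 1, 1]  # e.g. output of any clustering algorithm
kept, interesting = clusbus(points, labels, cluster_ids, threshold=0.25)
```

Only the cluster where the classes overlap is thinned out; majority samples in clusters without minority samples are left untouched.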
Unsupervised Learning of Prompt Situations on Streaming Sensor Data [Figure: stream of sensor events s1, s2, s3, s4]
Motivation • Labeling activity steps takes several hundred man-hours • High probability of inaccurate labels • Requires an activity-step recognition model
Modeling Activity Errors • Abnormal occurrence • Delayed occurrence
Modeling Delayed Occurrence • Elapsed time • Sensor frequency
Predicting Errors: at every sensor event, evaluate • the likelihood of sensor s_i occurring for participant p_j • the probability of the elapsed time for the current nth occurrence of sensor s_i • the probability of the frequencies of all sensors for the current nth occurrence of sensor s_i
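A minimal sketch of the per-event evaluation, assuming each activity sequence is a list of (sensor, timestamp) pairs for one participant p_j. Because the choice of distributions is still open, the sketch assumes Gaussians for the elapsed time and for each sensor's count at the nth occurrence of s_i, treats those counts as independent so their log-densities add, and returns log-densities rather than probabilities. The class, helper names, `min_sigma`, and sensor labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import fmean, pstdev


def gaussian_logpdf(x, mu, sigma):
    """Log-density of the normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def walk(sequence):
    """Yield (sensor, n, elapsed, counts) for every event of one sequence.

    n is the 1-based occurrence index of this sensor, elapsed is the time since
    the first event, and counts is a snapshot of how often each sensor has fired."""
    counts = Counter()
    start = sequence[0][1]
    for sensor, t in sequence:
        counts[sensor] += 1
        yield sensor, counts[sensor], t - start, Counter(counts)


class ParticipantErrorModel:
    """Statistics for one participant p_j, fitted on that participant's normal sequences."""

    def __init__(self, normal_sequences, min_sigma=1.0):
        self.min_sigma = min_sigma  # floor on the std dev when few observations exist
        self.sensor_counts = Counter(s for seq in normal_sequences for s, _ in seq)
        self.total = sum(self.sensor_counts.values())
        self.sensors = sorted(self.sensor_counts)
        self.elapsed = defaultdict(list)  # (s_i, n) -> elapsed times
        self.freq = defaultdict(list)     # (s_i, n, s_k) -> count of s_k so far
        for seq in normal_sequences:
            for sensor, n, dt, counts in walk(seq):
                self.elapsed[(sensor, n)].append(dt)
                for other in self.sensors:
                    self.freq[(sensor, n, other)].append(counts[other])

    def _logpdf(self, observations, x):
        sigma = max(pstdev(observations), self.min_sigma)
        return gaussian_logpdf(x, fmean(observations), sigma)

    def score(self, sensor, n, elapsed, counts):
        """Return (likelihood of s_i for p_j, elapsed-time log-density,
        sensor-frequency log-density); the last two are None without history."""
        likelihood = self.sensor_counts[sensor] / self.total
        if (sensor, n) not in self.elapsed:
            return likelihood, None, None
        elapsed_lp = self._logpdf(self.elapsed[(sensor, n)], elapsed)
        freq_lp = sum(
            self._logpdf(self.freq[(sensor, n, other)], counts.get(other, 0))
            for other in self.sensors
        )
        return likelihood, elapsed_lp, freq_lp


normal = [
    [("cabinet", 0), ("kettle", 5), ("cup", 9)],
    [("cabinet", 0), ("kettle", 7), ("cup", 12)],
]
model = ParticipantErrorModel(normal)
live = [("cabinet", 0), ("kettle", 40), ("cup", 45)]  # kettle used much later than usual
for event in walk(live):
    print(event[0], model.score(*event))
```

In this toy run the kettle event arrives 40 time units after the start instead of roughly 6, so its elapsed-time log-density is far lower than for a typical sequence; a delayed-occurrence detector could threshold that value.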
Preliminary Experiments • Elapsed time: no observable trend • Sensor frequency: no observable trend
Current Obstacles • Noisy data • Unwanted sensor events, specifically from object sensors • Erroneous activity sequences that are not suitable for model evaluation
Proposed Plan • Identify suitable distributions for modeling sensor frequency and elapsed time • Find additional statistical measures that better model the errors • Build a generalized prompt model for all six ADLs (if at all possible) • Obtain data to evaluate the proposed model • Synthetically generate erroneous sequences from normal sequences (?) • Collect more data if necessary
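One possible way to generate erroneous sequences synthetically is to inject the two error types modeled earlier into normal sequences: a delayed occurrence shifts an event and everything after it later in time, and an abnormal occurrence inserts an unexpected sensor event. The sketch below assumes sequences of (sensor, timestamp) pairs with at least two events. The injection rules, the 50/50 choice between error types, `max_delay`, and all function names are illustrative assumptions, not part of the proposal.

```python
import random


def inject_delay(sequence, index, delay):
    """Delayed occurrence: push event `index` and every later event back by `delay`."""
    return [(s, t + delay if k >= index else t) for k, (s, t) in enumerate(sequence)]


def inject_abnormal(sequence, sensor, index):
    """Abnormal occurrence: insert an unexpected `sensor` event before position `index`,
    reusing the timestamp of the event it precedes."""
    t = sequence[index][1]
    return sequence[:index] + [(sensor, t)] + sequence[index:]


def make_erroneous(sequence, sensors, rng, max_delay=60):
    """Randomly apply one error type; returns (new_sequence, error_type)."""
    index = rng.randrange(1, len(sequence))  # never alter the first event
    if rng.random() < 0.5:
        return inject_delay(sequence, index, rng.randint(1, max_delay)), "delayed"
    return inject_abnormal(sequence, rng.choice(sensors), index), "abnormal"


normal = [("cabinet", 0), ("kettle", 5), ("cup", 9)]  # hypothetical sensor names
erroneous, kind = make_erroneous(normal, ["stove", "fridge"], random.Random(0))
```

Because the error type and position are known for every generated sequence, such data would come with ground-truth labels for evaluating the error model.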