
Thesis Proposal





Presentation Transcript


  1. Thesis Proposal PrActive Learning: Practical Active Learning, Generalizing Active Learning for Real-World Deployments

  2. Generic example system flow for interactive classification problems
  [Flow diagram: a large volume (in the millions) of incoming transactions passes through domain system pricing and validation and a Machine Learning model; the majority of transactions are cleared and paid, while a minority are flagged for auditing]
  • Common characteristics
  • Skewed class distribution (minority events)
  • Concept/feature drift
  • Expensive domain experts
  • Biased sampling of labeled historical data

  3. Interactive Classification Applications
  • Fraud detection
  • Network intrusion detection
  • Video surveillance
  • Information filtering / recommender systems
  • Error prediction / quality control

  4. Interactive Classification Setting
  [Flow diagram: unlabeled + labeled data → trained classifier → ranked list scored by the classifier]
  • Classifier trained from labeled data
  • Human (user/expert) in the loop, using the results but also providing feedback at a cost
  • Goal: maximize the return on investment, which is equivalent to the productivity of the human
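The loop on this slide (train a classifier on the labeled data, score the unlabeled pool, and present a ranked list for the expert to label at a cost) can be sketched as follows. This is a minimal illustration, assuming a fixed linear scoring model; the function and variable names are hypothetical.

```python
import numpy as np

def rank_unlabeled(weights, X_unlabeled):
    """Score each unlabeled example with a linear model and rank descending,
    so the examples most likely to be minority/flagged come first."""
    scores = X_unlabeled @ weights     # higher score = more likely to need auditing
    order = np.argsort(-scores)        # indices of examples, best first
    return order, scores[order]

# The expert would then label the top-k of this ranked list each round.
w = np.array([1.0, -0.5])                            # illustrative model weights
X = np.array([[0.2, 0.1], [0.9, 0.0], [0.1, 0.8]])   # three unlabeled examples
order, ranked_scores = rank_unlabeled(w, X)
```

In a deployed system the weights would come from retraining on the accumulated labeled data each round, closing the feedback loop the slide describes.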

  5. Factorization of the problem
  [Diagram relating: cost-sensitive exploitation, cost-sensitive active learning, exploration–exploitation tradeoffs, standard ranking / relevance feedback, and active learning]

  6. Interactive Classification - High-Level Picture
  [Flow diagram at round t: unlabeled data (t) and labeled data (1,…,t-1) feed a classifier trained on rounds 1,…,t-1, which produces a ranked list; expert feedback yields labeled data (1,…,t)]

  7. Thesis Contributions
  • Problem Statement: How to generalize active learning to incorporate crucial factors, namely the differential utility of a labeled example (dynamic/variable exploitation), the dynamic cost of labeling an example, and concept drift, in a unified framework that makes the deployment of such learning systems practical
  • Contributions
  • Generalization of active learning along the following dimensions:
  • Differential utility of a labeled example
  • Dynamic cost of labeling an example
  • Tackling concept drift
  • A unified framework to solve these considerations jointly
  • First solution: optimizing a joint utility function based on cost, exploration utility, and exploitation utility
  • Second solution: using an Upper Confidence Bound approach with a contextual multi-armed bandit setup to incorporate the different factors
  • Empirical evaluation of the proposed framework
  • Using an evaluation metric motivated by real business tasks
  • Datasets:
  • Synthetic dataset
  • Real-world dataset: health insurance claims rework
  • Cost-sensitive exploitation

  8. Situating the thesis work with respect to related work
  • Knowledge-Based Learning: feature-level knowledge encoding (GE)
  • Cost-Sensitive Active Learning
  • PrActive Learning: differential utility, dynamic cost, concept drift
  • Proactive Learning: unreliable oracle, oracle variation

  9. Factorization: Cost/Exploitation/Exploration
  • Type of model is pre-determined by the domain need
  • The following are the 3 possible types of models:
  • Uniform: each example gets the same value from the model
  • Variable: each example can potentially have a different value that is a function of its features
  • Markovian: each example has a variable value which is a function of its features and the (ordered) history of examples already labeled
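The three value models on this slide can be illustrated in code. This is a sketch; the specific value functions (a claim's dollar amount, diminishing returns for repeated example kinds) are hypothetical stand-ins for whatever the domain supplies.

```python
def uniform_value(example):
    """Uniform: every labeled example is worth the same fixed amount."""
    return 1.0

def variable_value(example):
    """Variable: value is a static function of the example's own features
    (here, illustratively, a claim's dollar amount)."""
    return example["amount"]

def markovian_value(example, history):
    """Markovian: value depends on the features AND the ordered history of
    examples already labeled (here, illustratively, value shrinks as more
    examples of the same kind have already been labeled)."""
    repeats = sum(1 for past in history if past["kind"] == example["kind"])
    return example["amount"] / (1 + repeats)

claim = {"amount": 100.0, "kind": "duplicate-billing"}
v_after_one = markovian_value(claim, [{"kind": "duplicate-billing"}])
```

The Markovian case is what makes selection order matter: labeling the second near-duplicate claim is worth less than labeling the first.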

  10. Utility Function
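The utility function itself appears only as an image in the original deck and is not recoverable from the transcript. As a plausible sketch consistent with slide 5's factors, one can imagine a linear combination of exploitation value, exploration value, and labeling cost; the additive form and the `lambda_explore` weight are assumptions, not the thesis's actual formula.

```python
def joint_utility(exploit, explore, cost, lambda_explore=0.5):
    """Hypothetical joint utility of labeling one example:
    U(x) = exploitation(x) + lambda * exploration(x) - cost(x)."""
    return exploit + lambda_explore * explore - cost

u = joint_utility(exploit=10.0, explore=4.0, cost=3.0)
```

A weight like `lambda_explore` would let the system trade off long-term model improvement (exploration) against immediate business value (exploitation) net of expert cost.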

  11. Joint Optimization Algorithm
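The algorithm on this slide is likewise an image in the original deck. A minimal greedy sketch, assuming the hypothetical additive utility above and a per-round labeling budget: rank candidates by joint utility and label the best affordable ones. The 0.5 exploration weight and the budget mechanics are assumptions for illustration.

```python
def greedy_select(candidates, budget):
    """candidates: list of (id, exploit_value, explore_value, cost) tuples.
    Greedily pick the highest-utility examples that fit the labeling budget."""
    chosen, spent = [], 0.0
    # Rank by (assumed) joint utility, highest first.
    ranked = sorted(candidates,
                    key=lambda c: c[1] + 0.5 * c[2] - c[3],
                    reverse=True)
    for cid, exploit, explore, cost in ranked:
        if spent + cost <= budget:   # skip examples the budget cannot cover
            chosen.append(cid)
            spent += cost
    return chosen

picked = greedy_select([("a", 10, 4, 3), ("b", 2, 0, 1), ("c", 8, 2, 6)], budget=8)
```

A real joint optimizer could of course be more sophisticated (e.g. solving a knapsack rather than going greedily), but this shows how the three factors enter one objective.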

  12. Evaluation Metric
  • Return on Investment: the net dollar amount saved by auditing the claim per dollar amount invested/spent
  • ROI = Net Savings / Net Cost, where Net Savings = net dollar amount saved - cost
  • The net dollar amount can be determined independently of the exploitation model
  • For claims rework: admin cost savings, or medical cost savings + admin cost savings
  • Long-term evaluation (it is difficult to see the exploration effects in short time windows)
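The ROI metric above made concrete; the dollar figures in the example are purely illustrative.

```python
def roi(dollars_saved, audit_cost):
    """Return on Investment = Net Savings / Net Cost,
    where Net Savings = dollars saved by auditing - cost of auditing."""
    net_savings = dollars_saved - audit_cost
    return net_savings / audit_cost

# E.g., auditing that costs $500 and recovers $1,500 in rework savings:
r = roi(dollars_saved=1500.0, audit_cost=500.0)
```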

  13. Factorization - continued
  • Each factor (cost, exploration, exploitation) can have the following 3 setups:
  • Uniform
  • Variable: static/pre-determined
  • Variable: dynamic/online

  14. Thesis Contributions
  • Define the novel area of interactive classification for skewed class distributions
  • Problem definition/setup
  • Characterization of the problem
  • Factorization of the problem
  • Hypothesis: jointly managing the different factors involved will lead to a better overall performance metric over time than considering the factors in isolation
  • Framework for solving interactive skewed classification problems
  • Modules: cost model, exploitation model, exploration model, utility function, joint optimization algorithm, evaluation metric
  • Demonstrate the usefulness of the framework for:
  • Synthetic data
  • Generalization
  • Health claims error prediction problem
  • Temporal active learning
  • Cost-sensitive exploitation

  15. [Flow diagram: unlabeled + labeled data → trained classifier → ranked list]

  16. [Flow diagram at round t: unlabeled data (t) and labeled data (t-1) feed the classifier trained at round t-1, producing a ranked list and new labeled data (t)]

  17. Thesis Contributions
  • Problem Statement: What are the considerations for developing/deploying a long-running system for an interactive classification task where the system is assisting human experts in solving business tasks?
  • Contributions
  • Define the area of interactive classification for skewed class problems, motivated by deploying these learning systems that run over time
  • Framework for solving interactive skewed classification problems
  • Defining the trade-offs between exploitation, exploration, and cost
  • What are the relevant metrics to evaluate such systems?
  • Hypothesis: jointly managing the different factors involved will lead to a better overall performance metric over time than considering the factors in isolation
  • Explore, evaluate, and compare solutions for the framework
  • First approach: defining a joint utility function and optimizing for it
  • Second approach: using upper confidence bounds with a contextual multi-armed bandit
  • Demonstrate the usefulness of the framework for:
  • Synthetic data
  • Health claims error prediction problem
  • For demonstrating generalization
  • Handling temporal drift with active sampling from an evolving unlabeled pool
  • Cost-sensitive exploitation
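The second approach mentioned on this slide (upper confidence bounds with a contextual multi-armed bandit) can be sketched in a LinUCB-style form, where the point estimate supplies the exploitation signal and the confidence-width bonus supplies the exploration signal. This is a generic textbook sketch, not the thesis's actual algorithm; the ridge prior and `alpha` are assumptions.

```python
import numpy as np

class LinUCBScorer:
    """Minimal LinUCB-style scorer: score(x) = theta^T x + alpha * sqrt(x^T A^-1 x)."""

    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)       # ridge-regularized design matrix (sum of x x^T + I)
        self.b = np.zeros(dim)     # reward-weighted feature sum
        self.alpha = alpha         # exploration weight

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                        # exploitation: reward estimate
        bonus = self.alpha * np.sqrt(x @ A_inv @ x)   # exploration: uncertainty width
        return theta @ x + bonus

    def update(self, x, reward):
        """Incorporate the expert's observed reward for a labeled example."""
        self.A += np.outer(x, x)
        self.b += reward * x

scorer = LinUCBScorer(dim=2)
x = np.array([1.0, 0.0])
before = scorer.score(x)       # all bonus: nothing observed yet
scorer.update(x, reward=1.0)
after = scorer.score(x)        # estimate rises, uncertainty bonus shrinks
```

Differential utility and dynamic labeling cost could enter by scaling the reward and subtracting the cost from the score, which is one way the slide's "different factors" fit the bandit view.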
