Modeling and Detecting Anomalous Topic Access

Siddharth Gupta1, Casey Hanson2, Carl A. Gunter3, Mario Frank4, David Liebovitz5, Bradley Malin6
1,2,3,4 Department of Computer Science; 3,5 Department of Medicine; 6 Department of Biomedical Informatics
1,2,3 University of Illinois at Urbana-Champaign; 4 University of California, Berkeley; 5 Northwestern University; 6 Vanderbilt University

Outline of the talk
• Motivation and Challenges
• Our Contributions
• Dataset Description
• Random Topic Access (RTA) Model
• Random Topic Access Detection (RTAD) Model
• Evaluation and Results

EMR Access Breach Reported in April 2013
• The University of Florida: 2 offenders illegitimately accessed the records of 15,000 patients over 3 years (March 2009 to October 2012).
• Personal information, including names, addresses, dates of birth, medical record numbers, and Social Security numbers, was compromised for the purposes of billing fraud.
• One of the offenders was an insider at the hospital without prior.
• How can we efficiently model and detect these types of attacks in the healthcare system?

Motivation
• Two broad classes of threats:
  • Inside threats: behaviors of hospital users (staff) that adversely affect the healthcare institution, such as financial fraud, medical identity theft, and curiosity accesses to EMRs.
  • Outside threats: an outside entity hires an insider to commit fraud, a visitor accesses records on open computers, or an untrustworthy patient seeks information about other patients' records.
• Ramifications: irreversible violation of patient privacy and subsequent high costs for hospitals.
• Deterrent: the current legal deterrent is a set of regulations, such as HIPAA and HITECH, which impose specific privacy rules for patients and financial penalties for violating them.

Classical Detection Methodologies
• Build a classifier on labeled data to differentiate anomalous users from legitimate users.
• Real healthcare data is not labeled.
• Current methods inject synthetic anomalous users and evaluate on them.

Random Object Access
• In healthcare information systems, the primary mechanism for generating anomalous users is to associate users with random patients in the dataset.
• We call such a scheme ROA (random object access).
• The resulting user does not appear to be a plausible attacker in a real hospital setting.

Our Contributions
• Random Topic Access (RTA): we introduce and study a random topic access (RTA) model aimed at users whose access may be illegitimate but is not fully random, because it is focused on common semantic themes.
• User Simulation: we use the latent topic framework to simulate illegitimate users, modeling them as samples from a Dirichlet distribution over topic multinomials.
• Anomaly Detection Framework: we study RTA to detect and evaluate users with suspicious access patterns.

Data Set
• Fig. a) Summary Statistics for Audit Logs
• Fig. b) Summary Statistics for Patient Records

Random Topic Access (RTA) Model
• Random Topic Access (RTA) Model: a mechanism that uses latent topic structures to represent real users in the population and allows the synthetic generation of semantically relevant anomalous users.
• Topic modeling can provide a concise description of how a user behaves in the context of their peers and of the meaning of that behavior.
• Users are modeled as samples from a Dirichlet distribution over topic multinomials.

Latent Dirichlet Allocation (LDA)
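For reference, the standard LDA generative process can be written as below. This is the textbook formulation, not something taken from the slide itself. Reading patients as the "documents" and their diagnoses as the "words" is an assumption based on the later slides, and the symbols K, D, γ, and η are chosen here for illustration (γ and η are used for the priors so they do not clash with the α that parameterizes the user simulation).

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dir}(\eta), & k &= 1,\dots,K
  && \text{(topic } k \text{: a distribution over diagnoses)} \\
\theta_d &\sim \mathrm{Dir}(\gamma), & d &= 1,\dots,D
  && \text{(patient } d \text{: a distribution over topics)} \\
z_{d,i} \mid \theta_d &\sim \mathrm{Mult}(\theta_d) &&
  && \text{(topic assignment of the } i\text{-th diagnosis)} \\
w_{d,i} \mid z_{d,i}, \phi &\sim \mathrm{Mult}(\phi_{z_{d,i}}) &&
  && \text{(the observed diagnosis)}
\end{align*}
```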

Topic Distributions

Topic Distributions: Diagnosis Topics (Kidney Topic, Neoplasm Topic, Obstetric Topic)

Characterizing Users
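The slides do not spell out how a user is placed in topic space. One plausible reading, shown here only as a sketch, is to average the topic distributions of the patients the user accessed. The function name `user_topic_vector`, the patient IDs, and the three-topic vectors are all hypothetical.

```python
def user_topic_vector(accessed_patients, patient_topics):
    """Average the topic distributions of the patients a user accessed.

    ASSUMPTION: averaging is an illustrative choice, not confirmed by the slides.
    patient_topics maps a patient ID to a list of topic probabilities.
    """
    if not accessed_patients:
        raise ValueError("user has no recorded accesses")
    n_topics = len(next(iter(patient_topics.values())))
    totals = [0.0] * n_topics
    for pid in accessed_patients:
        for t, weight in enumerate(patient_topics[pid]):
            totals[t] += weight
    return [x / len(accessed_patients) for x in totals]


# Hypothetical patient topic mixtures over (kidney, neoplasm, obstetric).
patient_topics = {
    "p1": [0.8, 0.1, 0.1],
    "p2": [0.6, 0.3, 0.1],
    "p3": [0.1, 0.1, 0.8],
}

# A user who accessed two kidney-heavy patients ends up kidney-heavy too.
nephrology_user = user_topic_vector(["p1", "p2"], patient_topics)
print(nephrology_user)
```

Because each patient vector sums to 1, the averaged user vector is again a distribution over topics, which keeps users and patients in the same space.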

Multidimensional Scaling: Patient Diagnosis

RTA: Simulating Users
• r ~ Dir(α) with n dimensions, where n is the number of topics.
  a) Directed or Masquerading User (α < 1): an anomalous user of some specialty gains sole access to the terminal of another user in the hospital.
  b) Purely Random User (α = 1): the user is characterized by completely random behavior, with little semantic congruence to the hospital setting.
  c) Indirect User (α > 1): the user type resembles an even blend of the topics of many specialized users.
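The three user types above differ only in the concentration parameter α of a symmetric Dirichlet. A minimal sketch of the sampling step, using only the Python standard library, is given below. It relies on the standard fact that normalized independent Gamma(α, 1) draws follow a Dirichlet distribution. The function names, the choice of 10 topics, 200 users per type, and the specific α values are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, n_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha)."""
    while True:
        # Normalizing independent Gamma(alpha, 1) draws gives a Dirichlet sample.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
        total = sum(draws)
        if total > 0.0:  # guard against every draw underflowing to zero
            return [d / total for d in draws]


def simulate_users(alpha, n_topics, n_users, seed=0):
    """Generate n_users synthetic topic mixtures for a given alpha."""
    rng = random.Random(seed)
    return [sample_dirichlet(alpha, n_topics, rng) for _ in range(n_users)]


def mean_peak(users):
    """Average weight of each user's single heaviest topic."""
    return sum(max(u) for u in users) / len(users)


# Illustrative settings (assumed): 10 topics, 200 users per type.
directed = simulate_users(alpha=0.01, n_topics=10, n_users=200)
purely_random = simulate_users(alpha=1.0, n_topics=10, n_users=200)
indirect = simulate_users(alpha=100.0, n_topics=10, n_users=200)

for name, users in [("directed", directed),
                    ("purely random", purely_random),
                    ("indirect", indirect)]:
    print(name, round(mean_peak(users), 3))
```

With small α most of the mass lands on one topic (a masquerading specialist), α = 1 samples uniformly from the simplex, and large α pushes every sample toward an even blend of all topics.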

Population Distribution
• A. Directed Users: α = 0.1, α = 0.01
• B. Purely Random Users: α = 1
• C. Indirect Users: α = 100

Role Distribution (role: NMH Resident Fellow CPOE)
• Real Users
• Anomalous Users: Purely Random Users, Masquerading Users, Indirect Users

Random Topic Access Detection (RTAD)
• Random Topic Access Detection (RTAD): an anomaly detection framework that generates synthetic users using RTA and applies a standard spatial outlier detection scheme, k-nearest neighbors (k-NN), for classification.
• Methodology
  • LDA: define patient topics and perform user typing to represent users in the topic space.
  • RTA user injection: generate three types of anomalous users and insert them into each role at a 5% mix rate.
  • Detection (k-NN): if the ratio of the average distance from a user to its k nearest spatial neighbors to the average pairwise distance among those neighbors is greater than a threshold, call the user anomalous.
  • Evaluation metric: best Area Under the Curve (AUC) for each (α, role) combination.
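The detection rule and the AUC metric described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: Euclidean distance is assumed as the spatial metric, the AUC is computed with the rank-based (Mann-Whitney) formula, and the function names and the toy 2-D points are hypothetical.

```python
import math
from itertools import combinations


def knn_ratio_score(user, others, k):
    """Mean distance from user to its k nearest neighbors, divided by
    the mean pairwise distance among those neighbors."""
    if k < 2 or len(others) < k:
        raise ValueError("need k >= 2 and at least k other users")
    # ASSUMPTION: Euclidean distance; the slides do not name the metric.
    neighbors = sorted(others, key=lambda p: math.dist(user, p))[:k]
    to_neighbors = sum(math.dist(user, p) for p in neighbors) / k
    pairs = list(combinations(neighbors, 2))
    among = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return to_neighbors / among if among > 0 else math.inf


def score_all(points, k):
    """Score every user against all the other users (leave-one-out)."""
    return [knn_ratio_score(p, points[:i] + points[i + 1:], k)
            for i, p in enumerate(points)]


def auc(scores, labels):
    """Probability that a random anomalous user (label 1) outscores a
    random real user (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four tightly grouped users and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = [0, 0, 0, 0, 1]

scores = score_all(points, k=3)
threshold = 2.0  # assumed cutoff, chosen only for this example
flagged = [s > threshold for s in scores]
print(flagged, auc(scores, labels))
```

A user sitting inside a cluster is about as far from its neighbors as they are from each other, so its ratio stays near 1. A user far from every cluster has a large numerator and a small denominator, which pushes the ratio well above the threshold. Sweeping the threshold over all scores traces the ROC curve that the AUC summarizes.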

Results - I
The best AUC across all evaluated dimensions is plotted for each role on which detection performs poorly.

Results - II
The best AUC across all evaluated dimensions is plotted for each role on which detection performs well or near average.

Thank You!
Contact: sid88in@gmail.com
