
Modeling and Detecting Anomalous Topic Access

Siddharth Gupta 1, Casey Hanson 2, Carl A Gunter 3, Mario Frank 4, David Liebovitz 5, Bradley Malin 6



  1. Modeling and Detecting Anomalous Topic Access • Siddharth Gupta1, Casey Hanson2, Carl A Gunter3, Mario Frank4, David Liebovitz5, Bradley Malin6 • 1,2,3,4Department of Computer Science, 3,5Department of Medicine, 6Department of Biomedical Informatics • 1,2,3University of Illinois at Urbana-Champaign, 4University of California, Berkeley, 5Northwestern University, 6Vanderbilt University

  2. Outline of the talk • Motivation and Challenges • Our Contributions • Dataset Description • Random Topic Access (RTA) Model • Random Topic Access Detection (RTAD) Model • Evaluation and Results

  3. EMR Access Breach Reported in April 2013 • The University of Florida: two offenders illegitimately accessed the records of 15,000 patients over three years (March 2009 to October 2012). • Personal information, including names, addresses, dates of birth, medical record numbers, and Social Security numbers, was compromised for the purpose of billing fraud. • One of the offenders was an insider at the hospital without prior. • How can we efficiently model and detect these types of attacks in the healthcare system?

  4. Motivation • Two broad classes of threats: • Inside threats: behavior of hospital users (staff) that adversely affects the healthcare institution, such as financial fraud, medical identity theft, and curiosity-driven access to EMRs. • Outside threats: an outside entity hires an insider to commit fraud, a visitor accesses records on an open computer, or an untrustworthy patient seeks information from other patients' records. • Ramifications: irreversible violation of patient privacy and subsequent high costs for hospitals. • Deterrent: the current legal deterrent is a set of regulations, such as HIPAA and HITECH, which impose specific privacy rules for patients and financial penalties for violating them.

  5. Classical Detection Methodologies • Build a classifier on labeled data to differentiate anomalous users from legitimate users. • Real healthcare data is not labeled. • Current methods therefore inject synthetic anomalous users and evaluate detection on them.

  6. Random Object Access • In healthcare information systems, the primary mechanism for generating anomalous users is to associate users with random patients in the dataset. • We call such a system ROA (random object access). • The resulting user does not appear to be a plausible attacker in a real hospital setting.
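The ROA idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `make_roa_user`, the integer patient identifiers, and the number of accesses are hypothetical and do not come from the talk.

```python
import random

def make_roa_user(patient_ids, n_accesses, rng):
    """Build one synthetic ROA user by drawing patients uniformly at random
    (without replacement) from the whole patient population."""
    return rng.sample(patient_ids, n_accesses)

rng = random.Random(0)            # fixed seed so the sketch is repeatable
patient_ids = list(range(1000))   # hypothetical patient identifiers
roa_user = make_roa_user(patient_ids, 20, rng)
print(roa_user)
```

Because every patient is equally likely to be picked, the accessed records share no common clinical theme, which is why such a user is easy to separate from real staff.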

  7. Our Contributions • Random Topic Access (RTA): we introduce and study a random topic access (RTA) model aimed at users whose access may be illegitimate but is not fully random, because it is focused on common semantic themes. • User Simulation: we use the latent topic framework to simulate illegitimate users and model them as samples from a Dirichlet distribution over topic multinomials. • Anomaly Detection Framework: we study RTA to detect and evaluate users with suspicious access patterns.

  8. Data Set • Fig. a) Summary statistics for audit logs • Fig. b) Summary statistics for patient records

  9. Random Topic Access (RTA) Model • A mechanism that uses latent topic structures to represent real users in the population and allows the synthetic generation of semantically relevant anomalous users. • Topic modeling can provide a concise description of how a user behaves in the context of their peers and of the meaning of that behavior. • Users are modeled as samples from a Dirichlet distribution over topic multinomials.

  10. Latent Dirichlet Allocation (LDA)

  11. Topic Distributions

  12. Topic Distributions: Diagnosis Topics • Kidney Topic • Neoplasm Topic • Obstetric Topic

  13. Characterizing Users

  14. Multidimensional Scaling: Patient Diagnosis

  15. RTA: Simulating Users • r ~ Dir(α) with n dimensions, where n is the number of topics. • a) Directed or masquerading user (α < 1): an anomalous user of some specialty gains sole access to the terminal of another user in the hospital. • b) Purely random user (α = 1): the user is characterized by completely random behavior, with little semantic congruence to the hospital setting. • c) Indirect user (α > 1): the user type resembles an even blend of the topics of many specialized users.
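A minimal sketch of how the three user types could be drawn, assuming a symmetric Dirichlet in which every topic shares the same concentration α. The helper `sample_dirichlet`, the choice of 10 topics, and the seed are illustrative assumptions; the α values 0.1, 1, and 100 are those listed on the Population Distribution slide. The sampler uses the standard construction of normalizing independent Gamma(α, 1) draws.

```python
import random

def sample_dirichlet(alpha, n_topics, rng):
    """Draw one topic mixture r ~ Dir(alpha, ..., alpha) by normalizing
    independent Gamma(alpha, 1) draws (symmetric Dirichlet assumed)."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
        total = sum(draws)
        if total > 0.0:  # retry in the unlikely case every draw underflows to 0
            return [d / total for d in draws]

rng = random.Random(42)   # fixed seed so the sketch is repeatable
n_topics = 10             # hypothetical number of topics

directed = sample_dirichlet(0.1, n_topics, rng)       # alpha < 1: mass on few topics
purely_random = sample_dirichlet(1.0, n_topics, rng)  # alpha = 1: uniform over the simplex
indirect = sample_dirichlet(100.0, n_topics, rng)     # alpha > 1: near-even blend

for name, r in [("directed", directed), ("purely random", purely_random), ("indirect", indirect)]:
    print(name, [round(x, 3) for x in r])
```

Small α tends to push most of the probability mass onto one or two topics, while large α pulls every component toward 1/n, which matches the directed and indirect user descriptions.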

  16. Population Distribution • A. Directed users (α = 0.1, α = 0.01) • B. Purely random users (α = 1) • C. Indirect users (α = 100)

  17. Role Distribution (role: NMH Resident Fellow CPOE) • Real users • Anomalous users: masquerading users, purely random users, indirect users

  18. Random Topic Access Detection (RTAD) • An anomaly detection framework that generates synthetic users using RTA and applies a standard spatial outlier detection scheme, k-nearest neighbor (k-NN), for classification. • Methodology • LDA: define patient topics, and use user typing to represent users in the topic space. • RTA user injection: generate three types of anomalous users and insert them into each role at a 5% mix rate. • Detection (k-NN): if the ratio of the average distance from a user to its k nearest spatial neighbors to the average pairwise distance among those neighbors is greater than a threshold, call the user anomalous. • Evaluation metric: best Area Under the Curve (AUC) for each , role combination.
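The detection rule and the AUC metric above can be sketched as follows. Several things here are assumptions made for illustration and are not stated in the talk: Euclidean distance in topic space, the toy three-topic user vectors, k = 3, and the threshold of 3.0. The function names are hypothetical as well.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two topic vectors (distance metric assumed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_ratio_score(index, points, k):
    """Average distance from points[index] to its k nearest neighbors, divided
    by the average pairwise distance among those k neighbors."""
    if k < 2 or len(points) <= k:
        raise ValueError("need k >= 2 and more than k points")
    target = points[index]
    others = [p for i, p in enumerate(points) if i != index]
    neighbors = sorted(others, key=lambda p: euclidean(target, p))[:k]
    to_neighbors = sum(euclidean(target, p) for p in neighbors) / k
    pairs = [euclidean(neighbors[i], neighbors[j])
             for i in range(k) for j in range(i + 1, k)]
    among = sum(pairs) / len(pairs)
    return math.inf if among == 0.0 else to_neighbors / among

def auc(scores, labels):
    """Rank-based AUC: probability that an anomalous user scores higher than
    a real user, counting ties as one half."""
    pos = [s for s, is_anomalous in zip(scores, labels) if is_anomalous]
    neg = [s for s, is_anomalous in zip(scores, labels) if not is_anomalous]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy role: four real users concentrated on topic 0, one injected user on topic 2.
users = [
    [0.80, 0.10, 0.10],
    [0.78, 0.12, 0.10],
    [0.82, 0.08, 0.10],
    [0.79, 0.10, 0.11],
    [0.10, 0.10, 0.80],  # injected anomalous user
]
labels = [False, False, False, False, True]

scores = [knn_ratio_score(i, users, k=3) for i in range(len(users))]
flags = [s > 3.0 for s in scores]  # hypothetical threshold
print(flags, auc(scores, labels))
```

Dividing by the spread among the neighbors makes the score relative to local density, so a user far from a tight cluster scores high even when absolute distances in topic space are small.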

  19. Results - I • The best AUC across all evaluated dimensions is plotted for each role performing poorly for .

  20. Results - II • The best AUC across all evaluated dimensions is plotted for each role performing well or near average for .

  21. Sponsors • Thank you! • Contact: sid88in@gmail.com
