
Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques

Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques. Pradeep Mohan * Department of Computer Science University of Minnesota, Twin-Cities Advisor: Prof. Shashi Shekhar Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava.





Presentation Transcript


  1. Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques Pradeep Mohan* Department of Computer Science University of Minnesota, Twin-Cities Advisor: Prof. Shashi Shekhar Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava *Contact: mohan@cs.umn.edu

  2. Biography • Education • Ph.D. student, Department of Computer Science and Engineering, University of Minnesota, MN, 2007 – Present. • B.E., Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India, 2003 – 2007. • Major Projects during PhD • US DoJ/NIJ – Mapping and Analysis for Public Safety • CrimeStat .NET Libraries 1.0: Modularization of CrimeStat, a tool for the analysis of crime incidents. • Performance tuning of spatial analysis routines in CrimeStat. • CrimeStat 3.2a – 3.3: Addition of new modules for spatial analysis. • US DoD/ERDC/TEC – Cascade models for multi-scale pattern discovery • Designed new interest measures and formulated pattern mining algorithms for identifying patterns from large crime report datasets.

  3. Thesis Related Publications • Cascading spatio-temporal pattern discovery (Chapter 2) • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery: A summary of results. In Proc. of 10th SIAM International Conference on Data Mining 2010 (SDM 2010, full paper acceptance rate 20%). • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery. IEEE Transactions on Knowledge and Data Engineering (TKDE) (accepted regular paper, in press, ~20% acceptance rate). • Regional co-location pattern discovery (Chapter 3) • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers, Z. Jiang, N. Wayant. A spatial neighborhood graph based approach to regional co-location pattern discovery: Summary of results. In Proc. of 19th ACM SIGSPATIAL International Conference on Advances in GIS 2011 (ACM SIGSPATIAL 2011, full paper acceptance rate 23%). • Crime Pattern Analysis Application (Chapter 4) • S. Shekhar, P. Mohan, D. Oliver, Z. Jiang, X. Zhou. Crime pattern analysis: A spatial frequent pattern mining approach. In M. Leitner (Ed.), Crime Modeling and Mapping Using Geospatial Technologies, Springer (accepted with revisions).

  4. Outline • Introduction • Motivation • Problem Statement • Our Approach • Future Work

  5. Motivation: Public Safety • Crime generators and attractors • Identifying events (e.g., bar closings, football games) that lead to increased crime. Question: What and where are the frequent crime generators? • Identifying frequent crime hotspots • Law enforcement planning. Question: Where are the crime hotspots? (Image courtesy: www.startribune.com; caption: Predicting the next location of burglary.) • Predicting crime events • Predictive policing (e.g., predicting the next location of an offense, forecasting crime levels around conventions). Question: What are the crime levels 1 hour after a football game within a radius of 1 mile? (Image courtesy: https://www.llnl.gov/str/September02/Hall.html) • Other applications: Epidemiology

  6. Scientific Domain: Environmental Criminology • Routine activity theory and Crime Triangle (Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepnum=8) • Crime pattern theory (Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16) • Crime Event: Motivated offender, vulnerable victim (available at an appropriate location and time), absence of a capable guardian. • Crime Generators: places where offenders and targets come together in time and place, such as large gatherings (e.g., bars, football games). • Crime Attractors: places offering many criminal opportunities, to which offenders may relocate (e.g., drug areas).

  7. Outline • Introduction • Problem Statement • Spatio-temporal frequent pattern mining problem • Challenges • Our Approach • Future Work

  8. Spatio-temporal frequent pattern mining problem • Given: • Spatial / spatio-temporal framework. • Crime reports with type, location and / or time. • Spatial features of interest (e.g., bars). • Interest measure threshold (Pθ). • Spatial / spatio-temporal neighbor relation. • Find: • Frequent patterns with interestingness >= Pθ. • Objective: • Minimize computation costs. • Constraints: • Correctness and completeness. • Statistical interpretation (i.e., account for autocorrelation or heterogeneity).

  9. Illustration: Output • Cascading ST patterns (inputs: spatial and temporal neighborhood of 0.5 miles and 20 mins, threshold 0.5). [Figure: event snapshots at Time T1, Time T2 > T1, Time T3 > T2 and Aggregate(T1, T2, T3); CSTP P1 over event types Bar Closing (B), Assault (A), Drunk Driving (C).] • Regional co-location patterns (inputs: spatial neighborhood of 1 mile, threshold 0.25).

  10. Challenges • Spatio-temporal semantics • Continuity of space / time • Partial order • Conflicting requirements • Statistical interpretation • Computational scalability • Computational cost • Exponential set of candidate patterns: # Patterns = Exponential(# event types). [Figures: time partitioning misses relationships (Time T1, Time T2 > T1, Time T3 > T2); space partitioning misses relationships (Aggregate(T1, T2, T3)); candidate pattern lattice rooted at {Null} over event types A, B, C.]

  11. Our Contributions • New spatio-temporal frequent pattern families. • Ex: Cascading ST patterns and regional co-location patterns. • Novel interest measures that guarantee statistical interpretation and are computable in polynomial time. • Scalable algorithms based on properties of spatio-temporal data and interest measures. • Experimental evaluation using synthetic and real crime datasets.

  12. Outline • Introduction • Problem Statement • Our Approach • Big Picture • Cascading Spatio-temporal pattern discovery • Other Frequent Pattern Families • Future Work

  13. Cascading ST pattern (CSTP) • Input: Crime reports with location and time. • Output: CSTP • Partially ordered subsets of ST event types. • Located together in space. • Occur in stages over time. [Figure: event snapshots at Time T1, Time T2 > T1, Time T3 > T2 and Aggregate(T1, T2, T3); CSTP P1 over event types Bar Closing (B), Drunk Driving (C), Assault (A).]

  14. Related Pattern Semantics: ST Data Mining • Spatio-temporal frequent patterns: Unordered (ST Co-occurrence), Totally Ordered (ST Sequences), Partially Ordered (Our Work: Cascading ST patterns), Others. • ST Co-occurrence [Celik et al. 2008, Cao et al. 2006] • Designed for moving object datasets by treating trajectories as location time series. • Performs partitioning over space and time. • ST Sequence [Huang et al. 2008, Cao et al. 2005] • Totally ordered patterns modeled as a chain. • Does not account for multiply connected (e.g., nonlinear) patterns. • Misses non-linear semantics. • No ST statistical interpretation.

  15. Interpretation Model: Directed Neighbor Graph (DNG) • Nodes: individual events. • Directed edge (N1 → N2) iff Neighbor(N1, N2) and After(N2, N1). [Figure: CSTP P1 over Bar Closing (B), Assault (A), Drunk Driving (C); DNG over event instances A.1 to A.5, B.1 to B.2 and C.1 to C.4 at Time T1, Time T2, Time T3.]
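The edge rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the `Event` record, the Euclidean distance test and the sample coordinates are assumptions made for the example, and the thresholds reuse the 0.5 miles / 20 mins neighborhood quoted earlier in the deck.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Event:
    name: str   # instance label, e.g. "B.1" (hypothetical sample data)
    x: float    # miles
    y: float    # miles
    t: float    # minutes


def build_dng(events, space_eps, time_eps):
    """Return directed edges (n1, n2) where n2 is a neighbor of n1 and occurs after n1."""
    edges = []
    for n1 in events:
        for n2 in events:
            dt = n2.t - n1.t
            close = hypot(n2.x - n1.x, n2.y - n1.y) <= space_eps
            # Neighbor(N1, N2): within both thresholds; After(N2, N1): dt > 0.
            if close and 0 < dt <= time_eps:
                edges.append((n1.name, n2.name))
    return edges


events = [
    Event("B.1", 0.0, 0.0, 0.0),
    Event("A.1", 0.2, 0.1, 10.0),
    Event("C.1", 0.3, 0.2, 25.0),
    Event("C.2", 5.0, 5.0, 12.0),
]
edges = build_dng(events, space_eps=0.5, time_eps=20.0)
print(edges)
```

Here B.1 and C.1 are close in space but 25 minutes apart, so no edge is drawn between them; this all-pairs version is quadratic and only meant to show the semantics.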

  16. Statistical Foundation: Interest Measures • Instances of CSTP P1 (B→A, B→C, A→C) are • (B1→A1, B1→C1, A1→C1) • (B1→A3, B1→C2, A3→C2) • (B1→A1; A1→C2; B1→C2)? • Cascade Participation Ratio, CPR(CSTP, M): conditional probability of an instance of the CSTP in the neighborhood, given an instance of event type M. • Cascade Participation Index, CPI(CSTP): min(CPR(CSTP, M)) over all M in the CSTP. [Figure: worked examples over event instances A.1 to A.5, B.1 to B.2 and C.1 to C.4.]
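As a rough worked example of the two measures, the sketch below estimates CPR as the fraction of instances of an event type that take part in at least one pattern instance, and CPI as the minimum CPR. That estimator is an assumption about how the conditional probability is computed. The two pattern instances are the ones listed on the slide; the per-type totals (5 A, 2 B, 4 C) are read off the instance labels in the figure, and the function names are hypothetical.

```python
def cascade_participation_ratio(instances, event_type, totals):
    """Fraction of `event_type` instances that appear in some pattern instance."""
    participating = {
        name
        for inst in instances
        for name in inst
        if name.split(".")[0] == event_type
    }
    return len(participating) / totals[event_type]


def cascade_participation_index(instances, totals, pattern_types):
    """Minimum participation ratio over the event types in the pattern."""
    return min(
        cascade_participation_ratio(instances, m, totals) for m in pattern_types
    )


# Instances of P1 as listed on the slide, written as (B, A, C) triples.
instances = [("B.1", "A.1", "C.1"), ("B.1", "A.3", "C.2")]
# Assumed totals per event type, taken from the figure labels.
totals = {"A": 5, "B": 2, "C": 4}

cpr_a = cascade_participation_ratio(instances, "A", totals)
cpi = cascade_participation_index(instances, totals, ["A", "B", "C"])
print(cpr_a, cpi)
```

Under these assumptions the ratios are 2/5 for A, 1/2 for B and 2/4 for C, so the index is determined by A.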

  17. Analytical Evaluation: Statistical Interpretation • Spatial statistics: ST K-Function (Diggle et al. 1995). • Cascade Participation Index (CPI) is an upper bound to the ST K-Function per unit volume. [Figure: example over event instances A.1 to A.3 and B.1 to B.2.]

  18. Comparison with Related Interest Measures [Figure: CSTP P1 over event types A, B, C with event instances A.1 to A.5, B.1 to B.2 and C.1 to C.4.]

  19. Computational Structure: CSTP Miner Algorithm • Basic Idea • Initialization • for k in (1, 2, 3, ..., K-1) and prevalent CSTP found do • Generate size k candidates. • Compute CSTP instances / materialize part of DNG. • Calculate interest measure and select prevalent CSTPs. • end • Item sets in association rule mining. • Chemical compounds / subgraphs in graph mining. • Directed acyclic graphs in CSTP mining. • Not part of a conventional apriori setting.
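The loop on this slide can be read as a level-wise search. The sketch below is a generic illustration of that control flow only, not the CSTP Miner itself: patterns are represented as sets of type-level directed edges, candidate generation is a simple pairwise join, and the interest values are a hypothetical lookup table rather than CPI computed from data. It also assumes that only prevalent patterns need extending, which holds only if the interest measure never increases as a pattern grows.

```python
from itertools import combinations


def level_wise_mine(seeds, interest, threshold, max_size):
    """Grow patterns one edge at a time, keeping those that meet the threshold."""
    prevalent = []
    candidates = list(seeds)
    size = 1
    while candidates and size <= max_size:
        # Evaluate the interest measure and keep the prevalent candidates.
        kept = [c for c in candidates if interest(c) >= threshold]
        prevalent.extend(kept)
        # Join pairs of kept patterns whose union has exactly one more edge.
        next_level = set()
        for p, q in combinations(kept, 2):
            union = p | q
            if len(union) == size + 1:
                next_level.add(union)
        candidates = list(next_level)
        size += 1
    return prevalent


# Hypothetical interest values; real values would come from the DNG.
scores = {
    frozenset({("B", "A")}): 0.8,
    frozenset({("B", "C")}): 0.4,
    frozenset({("A", "C")}): 0.2,
    frozenset({("B", "A"), ("B", "C")}): 0.4,
}
seeds = [
    frozenset({("B", "A")}),
    frozenset({("B", "C")}),
    frozenset({("A", "C")}),
]
found = level_wise_mine(seeds, lambda c: scores.get(c, 0.0), 0.33, max_size=3)
print(found)
```

With a threshold of 0.33 the edge A→C is dropped at the first level, so no larger candidate containing it is ever generated.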

  20. CSTP Miner Algorithm: Illustration • CPI threshold = 0.33 • Spatio-temporal join [Figure: candidate pattern lattice rooted at {Null} over event types A, B, C, annotated with interest-measure values ranging from 0 to 0.8, alongside event instances A.1 to A.5, B.1 to B.2 and C.1 to C.4.]

  21. Computational Structure: CSTP Miner Algorithm • Fixed parameters: spatial neighborhood = 0.62 miles, temporal neighborhood = 1 hr, CPI threshold = 0.0055. • Key Bottlenecks • Interest measure evaluation • Exponential pattern space • Computational Strategies • Reduce irrelevant interest measure evaluation • Filtering strategies • Compute interest measure efficiently • Time Ordered Nested Loop Strategy • Space-Time Partition Join Strategy

  22. CSTP Miner Algorithm: Interest Measure Evaluation • ST join strategies: perform each interest measure computation efficiently. • Time Ordered Nested Loop (TONL) strategy • Space-Time Partitioning (STP) strategy [Figure: ST join by plane sweep over event instances A.1 to A.5, B.1 to B.2 and C.1 to C.4 plotted in space versus time; legend: volume of ST neighborhood; # edges = 13.]
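The slides name the Time Ordered Nested Loop strategy without spelling it out. One plausible reading, sketched below as an assumption rather than the author's algorithm, is to sort events by time and let the inner loop stop as soon as the temporal threshold is exceeded, so that each event is only compared with the events that follow it within the time window. The tuple layout and sample data are hypothetical.

```python
from math import hypot


def time_ordered_join(events, space_eps, time_eps):
    """Join events (name, x, y, t) into directed neighbor edges using time order."""
    ordered = sorted(events, key=lambda e: e[3])
    edges = []
    for i, (n1, x1, y1, t1) in enumerate(ordered):
        for n2, x2, y2, t2 in ordered[i + 1:]:
            if t2 - t1 > time_eps:
                break  # every later event is even further away in time
            if t2 > t1 and hypot(x2 - x1, y2 - y1) <= space_eps:
                edges.append((n1, n2))
    return edges


events = [
    ("C.1", 0.3, 0.2, 25.0),
    ("B.1", 0.0, 0.0, 0.0),
    ("A.1", 0.2, 0.1, 10.0),
    ("C.2", 5.0, 5.0, 12.0),
]
edges = time_ordered_join(events, space_eps=0.5, time_eps=20.0)
print(edges)
```

The early `break` is what distinguishes this from an all-pairs comparison; the spatial test is still applied to every pair inside the time window.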

  23. CSTP Miner Algorithm: Filtering Strategies • Multi-resolution ST filter: summarizing on a coarser neighborhood yields compression in most cases. • CPI threshold = 0.33 [Figure: actual relation versus coarse relation, plotted in space versus time.]

  24. Experimental Evaluation: Experiment Setup • Goals • 1. Compare different design decisions of the CSTPM algorithm (performance: run time). • 2. Test the effect of parameters on performance (number of event types, dataset size, clumpiness degree). • Experiment platform: CPU: 3.2 GHz, RAM: 32 GB, OS: Linux, Matlab 7.9.

  25. Experimental Evaluation: Datasets • Real data (Lincoln, NE dataset) • Data size: 5 datasets • Drawn by increments of 2 months • 5000 – 33000 instances • Event types: • Drawn by increments of 5 event types • 5 – 25 event types. • Synthetic data • Data size: 5 datasets • 5000 – 26000 instances • Event types: • 5 – 25 event types. • Clumpiness degree: • 5 – 25 instances per event type per cell.

  26. Experimental Evaluation: Join strategy performance • Question: What is the effect of dataset size on the performance of join strategies? • Fixed parameters: real data (CPI = 0.15, 0.31 miles, 10 days); synthetic data (0.5, 25, 25). • Trends: ST partitioning improves performance by a factor of 5-10 on synthetic data and by a factor of 3 on real data.

  27. Lincoln, NE crime dataset: Case study • Is bar closing a generator for crime-related CSTPs? • Questions • Is bar closing a crime generator? • Are there other generators (e.g., Saturday nights)? • Observation: Crime peaks around bar closing. • K-S test: Saturday night is significantly different from normal-day bar closing (p-value = 1.249 × 10^-7, K = 0.41). [Figure: bar locations in Lincoln, NE.]

  28. Lincoln, NE crime dataset: Case study

  29. Outline • Introduction • Problem Statement • Our Approach • Big Picture • Cascading Spatio-temporal pattern discovery • Other Frequent Pattern Families • Future Work

  30. Regional co-location patterns (RCP) • Input: Spatial features, crime reports. • Output: RCP (e.g., < (Bar, Assaults), Downtown >) • Subsets of spatial features. • Frequently located in certain regions of a study area.

  31. Statistical Foundation: Accounting for Heterogeneity • Regional Participation Ratio: conditional probability of observing a pattern instance within a locality, given an instance of a feature within that locality. • Regional Participation Index: quantifies the local fraction participating in a relationship. [Figure: worked examples of both measures.]
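A small numeric sketch of the regional measures follows. It takes the slide's wording literally and divides by the number of feature instances inside the region; whether the thesis normalizes locally or over the whole study area is not stated in the transcript, so the denominator is an assumption. By analogy with the cascade participation index, the regional index is taken as the minimum ratio over the features in the pattern, which is also an assumption. All names and data are hypothetical.

```python
def regional_participation_ratio(pattern_instances, feature_instances, feature, region):
    """Fraction of `feature` instances in `region` that take part in the pattern."""
    local = {
        name
        for name, (ftype, reg) in feature_instances.items()
        if ftype == feature and reg == region
    }
    if not local:
        return 0.0
    participating = {
        name for inst in pattern_instances for name in inst if name in local
    }
    return len(participating) / len(local)


def regional_participation_index(pattern_instances, feature_instances, features, region):
    """Minimum regional participation ratio over the features in the pattern."""
    return min(
        regional_participation_ratio(pattern_instances, feature_instances, f, region)
        for f in features
    )


# Hypothetical instances: name -> (feature type, region).
feature_instances = {
    "Bar.1": ("Bar", "Downtown"),
    "Bar.2": ("Bar", "Downtown"),
    "Bar.3": ("Bar", "Suburb"),
    "Assault.1": ("Assault", "Downtown"),
    "Assault.2": ("Assault", "Downtown"),
    "Assault.3": ("Assault", "Downtown"),
    "Assault.4": ("Assault", "Downtown"),
}
# Hypothetical (Bar, Assault) co-location instances found Downtown.
pattern_instances = [
    ("Bar.1", "Assault.1"),
    ("Bar.1", "Assault.2"),
    ("Bar.2", "Assault.3"),
]
rpi = regional_participation_index(
    pattern_instances, feature_instances, ["Bar", "Assault"], "Downtown"
)
print(rpi)
```

In this toy data both Downtown bars participate but only three of the four Downtown assaults do, so the index for < (Bar, Assault), Downtown > is limited by the assault ratio.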

  32. Conclusions • Proposed SFPM techniques (e.g., Cascading ST patterns and Regional Co-location patterns) honor ST semantics (e.g., partial order, continuity). • Interest measures achieve a balance between statistical interpretation and computational scalability. • Algorithmic strategies exploiting properties of ST data (e.g., multiresolution filter) and properties of interest measures enhance computational savings.

  33. Future Work – Short and Medium Term [Table not preserved in the transcript; legend: X = Unexplored.]

  34. Future Work – Long Term • Exploring interpretation of discovered patterns by law enforcement. • ST predictive analytics, predictive models based on SFPM, and predictive policing. • Towards geo-social analytics for policing (e.g., criminal flash mobs, gangs, groups of offenders committing crimes). • New ST frequent pattern mining algorithms based on depth-first graph enumeration. • ST frequent pattern mining techniques that account for patron demographic levels. • Explore evaluation of choropleth maps via ST frequent pattern mining.

  35. Acknowledgment • Members of the Spatial Database and Data Mining Research Group, University of Minnesota, Twin-Cities. • This work was supported by grants from the U.S. Army, NGA and U.S. DOJ. • Advisor: Prof. Shashi Shekhar, Computer Science, University of Minnesota. • Thesis committee. • U.S. DOJ – National Institute of Justice: Mr. Ronald E. Wilson (Program Manager, Mapping and Analysis for Public Safety), Dr. Ned Levine (Ned Levine and Associates, CrimeStat Program). • U.S. Army – Topographic Engineering Center: Dr. J.A. Shine (Mathematician and Statistician, Geospatial Research and Engineering Division) and Dr. J.P. Rogers (Additional Director, Topographic Engineering Center). • Mr. Tom Casady, Public Safety Director (formerly Lincoln Police Chief), Lincoln, NE, USA. • Thank you for your questions, comments and attention!
