
Spatiotemporal Stream Mining Applied to Seismic+ Data




  1. Spatiotemporal Stream Mining Applied to Seismic+ Data
      Margaret H. Dunham, CSE Department, Southern Methodist University, Dallas, Texas 75275, USA (mhd@engr.smu.edu)
      CTBTO Data Mining/Data Fusion Workshop

  2. Outline
      Work in progress! Input/feedback needed!
      • CTBTO data
      • CTBTO modeling requirements
      • EMM

  3. CTBTO Data
      • Diverse: seismic, hydroacoustic, infrasound, radionuclide
      • Spatial (source and sensor)
      • Temporal
      • STREAM data
      As a data miner, I must first understand your DATA.

  4. From Sensors to Streams
      Stream data: data captured and sent by a set of sensors.
      • A real-time sequence of encoded signals that contain the desired information.
      • A continuous, ordered sequence of items (ordered implicitly by arrival time, or explicitly by timestamp or by geographic coordinates).
      • Stream data is infinite: the data keeps coming.
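
Because a stream never ends, any summary of it has to be updated one item at a time in bounded memory rather than computed over a stored data set. The Python sketch below illustrates that constraint with a running mean; the function name `running_mean` and the choice of summary are assumptions for illustration, not part of the EMM work.

```python
import itertools
from typing import Iterable, Iterator, Tuple


def running_mean(stream: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Consume a (possibly unbounded) stream one item at a time.

    Only a count and a mean are kept, so memory use stays constant
    no matter how long the stream runs.
    """
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: past items never need to be stored.
        mean += (value - mean) / count
        yield count, mean


if __name__ == "__main__":
    # itertools.cycle stands in for an endless sensor feed.
    feed = itertools.cycle([1.0, 3.0])
    for n, m in itertools.islice(running_mean(feed), 4):
        print(n, m)
```

The generator never materializes the stream, so the same code works whether the input is a short list or a feed that runs indefinitely.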

  5. CTBTO & Data Mining
      • Data mining techniques must be defined based on your data and applications.
      • Cannot use predefined, fixed models and prediction/classification techniques.
      • Must not reinvent the large body of algorithms already developed.

  6. CTBTO + DM Requirements
      • Model:
        - Handles different data types (seismic, hydroacoustic, etc.)
        - Spatial + temporal (spatiotemporal)
        - Hierarchical
        - Scalable
        - Online
        - Dynamic
      • Anomaly detection:
        - Not just a specific wave type or data values
        - Relationships between arrivals of waves/data
        - Combined values of data from all sensors

  7. EMM (Extensible Markov Model)
      • Time-varying, discrete, first-order Markov model
      • Nodes are clusters of real-world states.
      • Learning and validation phases overlap.
      • Learning:
        - Transition probabilities between nodes
        - Node labels (centroid or medoid of the cluster)
      • Nodes are added and removed as data arrives.
      • Applications: prediction, anomaly detection
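
A minimal sketch of how an EMM-style model might learn from a stream, assuming Euclidean distance between input vectors, a fixed clustering threshold, and node labels kept as running-mean centroids. None of these choices is fixed by the slide, the class name `SimpleEMM` is hypothetical, and node removal is omitted.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


class SimpleEMM:
    """EMM-style sketch: threshold clustering plus transition counts.

    Assumptions: Euclidean distance, a fixed threshold, centroids as
    running means. Node removal is not implemented.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []  # node labels
        self.sizes: List[int] = []              # vectors absorbed per node
        # counts[i][j] = number of observed transitions node i -> node j
        self.counts: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
        self.current: Optional[int] = None

    def _nearest(self, x: Sequence[float]) -> Optional[int]:
        """Index of the closest node within the threshold, else None."""
        if not self.centroids:
            return None
        dists = [math.dist(x, c) for c in self.centroids]
        best = min(range(len(dists)), key=dists.__getitem__)
        return best if dists[best] <= self.threshold else None

    def update(self, x: Sequence[float]) -> int:
        """Absorb one input vector and return the node it maps to."""
        node = self._nearest(x)
        if node is None:
            # No existing node is close enough: create one labeled by x.
            self.centroids.append(list(x))
            self.sizes.append(1)
            node = len(self.centroids) - 1
        else:
            # Fold x into the node's centroid as a running mean.
            self.sizes[node] += 1
            n = self.sizes[node]
            self.centroids[node] = [c + (v - c) / n for c, v in zip(self.centroids[node], x)]
        if self.current is not None:
            self.counts[self.current][node] += 1
        self.current = node
        return node

    def transition_probability(self, i: int, j: int) -> float:
        """Fraction of transitions leaving node i that went to node j."""
        total = sum(self.counts[i].values())
        return self.counts[i][j] / total if total else 0.0
```

Because learning happens inside `update`, the model can be queried with `transition_probability` at any point in the stream, which mirrors the overlap of learning and validation listed above.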

  8. Research Objectives
      • Apply a proven spatiotemporal modeling technique to seismic data
      • Construct an EMM to model sensor data
      • Local EMM at a location or area
      • Hierarchical EMM to summarize lower-level models
      • Represent all data in one vector of values
      • EMM learns normal behavior
      • Develop new similarity metrics that include all sensor data types (fusion)
      • Apply anomaly detection algorithms

  9. EMM Creation/Learning
      [Figure: successive snapshots of an EMM with nodes N1, N2, and N3, and edge labels such as 1/1, 1/2, 2/2, 1/3, and 2/3. Input vectors shown: <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, <18,10,3,3,1,1,0>]
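
Edge labels such as 2/3 or 1/1 in the EMM Creation/Learning figure can be read as unreduced count fractions: the number of transitions from Ni to Nj over the total number of transitions leaving Ni. The sketch below computes labels that way for a hypothetical node sequence; both the sequence and the interpretation of the labels are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def transition_labels(path: List[str]) -> Dict[Tuple[str, str], str]:
    """Label each edge Ni->Nj as 'count(Ni->Nj)/count(Ni->any)'."""
    edge_counts = Counter(zip(path, path[1:]))
    out_totals: Dict[str, int] = defaultdict(int)
    for (src, _dst), c in edge_counts.items():
        out_totals[src] += c
    # Fractions are left unreduced, e.g. "2/2" rather than "1/1".
    return {edge: f"{c}/{out_totals[edge[0]]}" for edge, c in edge_counts.items()}


# Hypothetical sequence of nodes visited as input vectors arrive.
path = ["N1", "N2", "N1", "N3", "N1", "N2"]
for (src, dst), label in sorted(transition_labels(path).items()):
    print(f"{src} -> {dst}: {label}")
```

For this sequence the loop prints `N1 -> N2: 2/3`, `N1 -> N3: 1/3`, `N2 -> N1: 1/1`, and `N3 -> N1: 1/1`; each new arrival changes only the labels on edges leaving the node it came from.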

  10. Input Data Representation
      • Vector of numeric sensor values at precise time points, or aggregated over time intervals.
      • Values need not come from the same sensor types.
      • The similarity/distance between vectors determines whether a new node is created in the EMM.
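
One way such input vectors could be built, sketched under stated assumptions: readings arrive as hypothetical `(time, channel, value)` tuples, each fixed time interval is summarized by the per-channel mean, and a channel with no readings in an interval is filled with 0.0. The channel names in `CHANNELS` are placeholders for a mix of sensor types.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical channels; any mix of sensor types could be listed here.
CHANNELS = ["seismic_amp", "hydro_pressure", "infrasound_db"]


def interval_vectors(readings: Iterable[Tuple[float, str, float]],
                     interval: float) -> Dict[int, List[float]]:
    """Aggregate (time, channel, value) readings into one vector per interval.

    Each vector holds the mean of every channel in CHANNELS over that
    interval, in a fixed order; missing channels default to 0.0
    (a simplifying assumption).
    """
    buckets: Dict[int, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for t, channel, value in readings:
        buckets[int(t // interval)][channel].append(value)
    return {
        k: [mean(buckets[k][ch]) if buckets[k][ch] else 0.0 for ch in CHANNELS]
        for k in sorted(buckets)
    }
```

Successive vectors from this function can then be compared with a distance function (Euclidean, or a domain-specific measure) to decide whether an existing EMM node absorbs the vector or a new node is created.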

  11. Anomaly Detection with EMM
      • Objective: detect rare (unusual, surprising) events
      • Advantages:
        - Dynamically learns what is normal
        - Based on this learning, can predict what is not normal
        - Normal behavior need not be specified a priori
      • Applications:
        - Network intrusion
        - Data: IP traffic data, automobile traffic data
        - Seismic: unusual seismic events; automatically filter out normal events
      [Figure: Minnesota DOT traffic data, weekdays vs. weekend; the EMM detected an unusual weekend traffic pattern.]
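
A learned transition model can be turned into a simple rare-event detector. The sketch below assumes one possible rule, chosen for illustration: a transition is flagged if its learned probability, before counting the current occurrence, is below `min_prob`. The cited EMM papers may use different criteria. `RareTransitionDetector`, `min_prob`, `warmup`, and the state labels in the usage lines are all hypothetical.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict


class RareTransitionDetector:
    """Flags a transition as anomalous when it has rarely or never been seen.

    Assumed rule: src -> dst is anomalous if its learned probability,
    before counting this occurrence, is below min_prob. Nothing is
    flagged until src has at least `warmup` observed exits.
    """

    def __init__(self, min_prob: float = 0.05, warmup: int = 20) -> None:
        self.min_prob = min_prob
        self.warmup = warmup
        self.counts: Dict[Hashable, Dict[Hashable, int]] = defaultdict(lambda: defaultdict(int))

    def observe(self, src: Hashable, dst: Hashable) -> bool:
        out = self.counts[src]
        total = sum(out.values())
        anomalous = total > 0 and total >= self.warmup and out[dst] / total < self.min_prob
        # Keep learning: a pattern flagged today may become normal later.
        out[dst] += 1
        return anomalous


if __name__ == "__main__":
    det = RareTransitionDetector(min_prob=0.1, warmup=5)
    history = [det.observe("weekday", "weekday") for _ in range(10)]
    print(any(history), det.observe("weekday", "weekend_spike"))  # False True
```

Because counts keep accumulating after a flag, the detector needs no a priori definition of normal behavior: whatever recurs often enough stops being flagged.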

  12. EMM with Seismic Data
      • Input: wave arrivals (all, or one per sensor)
      • Identify states and changes of state in seismic data
      • The waveform would first have to be converted into a series of vectors representing the activity at various points in time.
      • Initial testing with RDG data, using amplitude, period, and wave type
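
The slide leaves open how a waveform becomes a series of vectors. One possible conversion is sketched below, assuming fixed-length windows, amplitude taken as the peak absolute value in each window, and period estimated as twice the mean spacing between successive zero crossings. These choices and the function name `waveform_to_vectors` are assumptions; wave type is not derived here and would have to come from a separate classification step.

```python
import math
from typing import List, Sequence, Tuple


def waveform_to_vectors(samples: Sequence[float], sample_rate: float,
                        window_s: float) -> List[Tuple[float, float]]:
    """Summarize each non-overlapping window as (amplitude, period).

    amplitude: peak absolute sample value in the window.
    period: 2 * mean spacing between zero crossings, in seconds
            (0.0 if the window has fewer than two crossings).
    A trailing partial window is dropped.
    """
    n = int(window_s * sample_rate)
    vectors = []
    for start in range(0, len(samples) - n + 1, n):
        w = samples[start:start + n]
        amplitude = max(abs(v) for v in w)
        # Indices where the sign flips relative to the previous sample.
        crossings = [i for i in range(1, n) if (w[i - 1] < 0) != (w[i] < 0)]
        if len(crossings) >= 2:
            spacing = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
            period = 2.0 * spacing / sample_rate
        else:
            period = 0.0
        vectors.append((amplitude, period))
    return vectors


if __name__ == "__main__":
    # Synthetic 2 Hz signal (period 0.5 s), amplitude 3, sampled at 100 Hz.
    sig = [3.0 * math.sin(math.pi * i / 25 + 0.1) for i in range(400)]
    print(waveform_to_vectors(sig, 100.0, 1.0))
```

Zero-crossing period estimates are crude on noisy data; the point of the sketch is only the shape of the output, one numeric vector per time step, which is what the EMM consumes.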

  13. New Distance Measure
      • Data = <amplitude, period, wave type>
      • Different wave type = 100% difference ($d = 1$)
      • For events of the same wave type:
        - 50% weight given to the difference in amplitude
        - 50% weight given to the difference in period
      • If the distance is greater than the threshold, a state change is required.
      • $\Delta_A = |A_{\text{new}} - A_{\text{avg}}| / A_{\text{avg}}$ (relative amplitude difference)
      • $\Delta_P = |P_{\text{new}} - P_{\text{avg}}| / P_{\text{avg}}$ (relative period difference)
      • Same wave type: $d = 0.5\,\Delta_A + 0.5\,\Delta_P$
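
The distance measure above translates directly into code. The sketch assumes that the "average" values are the running averages of the current EMM state (the slide says only "average"), that those averages are non-zero, and that wave-type labels such as "P" are illustrative. `Arrival`, `seismic_distance`, and `needs_state_change` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Arrival:
    amplitude: float
    period: float
    wave_type: str  # illustrative labels, e.g. "P" or "S"


def seismic_distance(new: Arrival, avg: Arrival) -> float:
    """Different wave type -> 1.0 (100% difference); otherwise a 50/50
    weighting of the relative amplitude and period differences."""
    if new.wave_type != avg.wave_type:
        return 1.0
    d_amp = abs(new.amplitude - avg.amplitude) / avg.amplitude
    d_per = abs(new.period - avg.period) / avg.period
    return 0.5 * d_amp + 0.5 * d_per


def needs_state_change(new: Arrival, avg: Arrival, threshold: float) -> bool:
    # A state change is required when the distance exceeds the threshold.
    return seismic_distance(new, avg) > threshold
```

For example, against averages of amplitude 10.0 and period 1.0, a same-type arrival with amplitude 12.0 and period 0.9 gives $0.5 \cdot 0.2 + 0.5 \cdot 0.1 = 0.15$. Note that with this definition a same-type distance can exceed 1 when the relative differences are large, so it is not bounded by the value assigned to a wave-type mismatch.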

  14. EMM with Seismic Data
      States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.

  15. Preliminary Testing
      • RDG data, February 1, 1981: 6 earthquakes
      • Find transition times close to the known earthquakes
      • 9 total nodes
      • 652 total transitions
      • Found all quakes
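
The evaluation step "find transition times close to known earthquakes" can be sketched as a windowed match. The slide does not say how close counts as close, so the `tolerance` parameter is an assumption, as are the function name and the example times in the test, which are not RDG data.

```python
from bisect import bisect_left
from typing import List, Sequence


def quakes_found(transition_times: Sequence[float],
                 quake_times: Sequence[float],
                 tolerance: float) -> List[bool]:
    """For each known quake time, report whether any EMM state transition
    occurred within +/- tolerance of it (times in any consistent unit)."""
    times = sorted(transition_times)
    found = []
    for q in quake_times:
        # First transition at or after the start of the window.
        i = bisect_left(times, q - tolerance)
        found.append(i < len(times) and times[i] <= q + tolerance)
    return found
```

Sorting once and using binary search keeps each lookup logarithmic in the number of transitions, which matters little for a few hundred transitions but scales to long streams.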

  16. EMM Nodes

  17. Hierarchical EMM

  18. Now What?
      • Data needed
      • Interest the DM community
      • Noise may not be bad
      • KDD Cup

  19. References
      • Zhigang Li and Margaret H. Dunham, “STIFF: A Forecasting Framework for Spatio-Temporal Data,” Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp. 1-9.
      • Zhigang Li, Liangang Liu, and Margaret H. Dunham, “Considering Correlation Between Variables to Improve Spatiotemporal Forecasting,” Proceedings of the PAKDD Conference, May 2003, pp. 519-531.
      • Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings of the IEEE ICDM Conference, November 2004, pp. 371-374.
      • Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, Singapore, April 2006. (Also in Lecture Notes in Computer Science, Vol. 3918, Springer Berlin/Heidelberg, 2006, pp. 750-754.)
      • Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol. 1, No. 3, June 2006, pp. 43-50.
      • Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic Anomalies,” International Journal of Computer Science and Network Security, Vol. 6, No. 6, June 2006, pp. 258-265.
      • Margaret H. Dunham and Vijay Kumar, “Stream Hierarchy Data Mining for Sensor Data,” Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, Shreveport, Louisiana, November 26, 2007.
