
Data Mining, Decision Trees and Earthquake Prediction

CS157B Lecture 4: Data Mining, Decision Trees and Earthquake Prediction. Professor Sin-Min Lee. What is data mining? The process of automatically finding relationships and patterns in, and extracting meaning from, enormous amounts of data. Also called “knowledge discovery”.


Presentation Transcript


  1. CS157B Lecture 4 Data Mining, Decision Trees and Earthquake Prediction Professor Sin-Min Lee

  2. What is Data Mining? • The process of automatically finding relationships and patterns in, and extracting meaning from, enormous amounts of data. • Also called “knowledge discovery”

  3. Objective • Extract hidden or not easily recognizable knowledge from large data sets … Know the past • Predict what is likely to happen if a particular type of event occurs … Predict the future

  4. Application • Marketing example • A company sends direct mail to randomly chosen people • A database of recipients’ attribute data (e.g. gender, marital status, number of children) is available • How can this company increase the response rate of its direct mail?

  5. Application (Cont’d) • Figure out the patterns and relationships among the attributes that those who responded have in common • This helps the company decide which group of people to target

  6. Data mining helps analyze large amounts of data and make decisions… but how exactly does it work? • One commonly used method is the decision tree

  7. Decision Tree • One of many methods for performing data mining, particularly classification • Divides the dataset into multiple groups by evaluating attributes • A decision tree can be expressed as a series of nested if-then-else statements • The decision tree is one of the most popular classification algorithms in current use in data mining

  8. Decision Tree (Cont’d) • Each non-leaf node has an associated predicate that tests an attribute of the data • Each leaf node represents a class, or category • To classify a record, start from the root node and traverse down the tree, testing predicates and taking the matching branches
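The node structure and traversal described in this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Leaf`, `Node`, `classify`, and the toy direct-mail tree are hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # the class (category) this leaf represents


@dataclass
class Node:
    predicate: Callable[[dict], bool]  # tests one attribute of a record
    if_true: Union["Node", Leaf]       # branch taken when the predicate holds
    if_false: Union["Node", Leaf]      # branch taken otherwise


def classify(tree, record):
    """Start at the root and follow branches until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.predicate(record) else node.if_false
    return node.label


# Hypothetical toy tree for the direct-mail example (made-up rules).
toy_tree = Node(
    lambda r: r["married"],
    Node(lambda r: r["children"] > 0, Leaf("responds"), Leaf("ignores")),
    Leaf("ignores"),
)

result = classify(toy_tree, {"married": True, "children": 2})
```

Tracing the path taken through the predicates also explains why a record received its label.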

  9. Example of Decision Tree

  10. What is a Decision Tree? • 20-Questions Example • Progressive Yes-No Decisions Until an Answer is Obtained • 20-Questions Machine at Linens & Things • Key to the Phylum – classification tool • Carl Linnaeus, Swedish Botanist, 1730’s • Classifies known species: • Kingdoms, Phyla, Classes, Orders, Families, Genera, and Species

  11. What is a Decision Tree?

  12. What are Decision Trees Used For?

  13. How to Use a Decision Tree (Deduction) • Start from the root of the tree and apply it to the test data. • [Figure: decision tree. Refund = Yes → NO; Refund = No → MarSt. MarSt = Married → NO; MarSt = Single, Divorced → TaxInc. TaxInc < 80K → NO; TaxInc >= 80K → YES.]
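Read as nested if-then-else statements, the Refund / MarSt / TaxInc tree on this slide might look like the sketch below. The function name, the string encodings of the attribute values, and the reading of "80K" as 80,000 are assumptions made for illustration.

```python
def predict_cheat(refund: str, marital_status: str, taxable_income: float) -> str:
    """Classify one record by walking the tree from the root (Refund) downward."""
    if refund == "Yes":
        return "NO"
    else:
        if marital_status == "Married":
            return "NO"
        else:  # Single or Divorced
            if taxable_income < 80_000:  # assumption: "80K" means 80,000
                return "NO"
            else:
                return "YES"


# Hypothetical test record: no refund, single, income of 95,000.
prediction = predict_cheat("No", "Single", 95_000)
```

Each `return` corresponds to a leaf, and each `if` to the predicate at a non-leaf node.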

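  14. How to Make a Decision Tree (Induction) • Training Data (columns: categorical, categorical, continuous, class) → Model: Decision Tree • Splitting Attributes: Refund, MarSt, TaxInc • [Figure: decision tree. Refund = Yes → NO; Refund = No → MarSt. MarSt = Single, Divorced → TaxInc; otherwise → NO. TaxInc < 80K → NO; TaxInc > 80K → YES.]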

  15. Hunt’s Algorithm • Let Dt be the set of training records that reach a node t • General procedure: • If Dt contains only records that belong to the same class yt, then t is a leaf node labeled yt • If Dt is an empty set, then t is a leaf node labeled with the default class, yd • If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, then recursively apply the procedure to each subset
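The general procedure of Hunt’s algorithm can be sketched recursively as follows. This is a minimal sketch under stated assumptions, not the lecture’s implementation: it handles categorical attributes only, picks the next attribute in list order rather than by a purity measure, and falls back to the majority class when no attributes remain. The record layout and the tiny dataset are hypothetical.

```python
from collections import Counter


def hunt(records, attributes, default_class):
    """records: list of (features_dict, label) pairs; returns a leaf label or a subtree dict."""
    if not records:                        # Dt is empty -> leaf with the default class
        return default_class
    labels = [label for _, label in records]
    if len(set(labels)) == 1:              # all records share one class -> leaf
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:                     # assumption: nothing left to test -> majority leaf
        return majority
    attr, rest = attributes[0], attributes[1:]  # assumption: naive attribute choice
    values = {features[attr] for features, _ in records}
    return {
        "attribute": attr,
        "children": {
            v: hunt([r for r in records if r[0][attr] == v], rest, majority)
            for v in values
        },
    }


# Hypothetical three-record training set.
training = [
    ({"refund": "Yes", "marital": "Single"}, "No"),
    ({"refund": "No", "marital": "Married"}, "No"),
    ({"refund": "No", "marital": "Single"}, "Yes"),
]
tree = hunt(training, ["refund", "marital"], default_class="No")
```

A practical version would choose the splitting attribute with an impurity measure and would also handle continuous attributes such as taxable income.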

  16. Hunt’s Algorithm (example) • [Figure: the decision tree grown step by step: a single Don’t Cheat leaf; then a split on Refund (Yes / No); then, under Refund = No, a split on Marital Status (Single, Divorced / Married); then, under Single, Divorced, a split on Taxable Income (< 80K → Don’t Cheat; >= 80K → Cheat).]

  17. Measure of Purity: Gini • Gini index for a given node t: $\mathrm{GINI}(t) = 1 - \sum_j [p(j \mid t)]^2$ (NOTE: $p(j \mid t)$ is the relative frequency of class j at node t) • Maximum ($1 - 1/n_c$) when records are equally distributed among all $n_c$ classes, implying the least interesting information • Minimum (0.0) when all records belong to one class, implying the most interesting information
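As a quick check of the maximum and minimum stated above, the Gini index, taken in its standard form as one minus the sum of squared class frequencies at a node, can be computed directly. The function name and the sample label lists are made up for illustration, and returning 0.0 for an empty node is an assumption.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared relative class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


pure = gini(["Cheat"] * 6)                       # all one class: minimum, 0.0
even_two = gini(["Cheat", "Don't Cheat"] * 3)    # two classes, evenly split: 1 - 1/2
even_three = gini(["a", "b", "c"] * 2)           # three classes, evenly split: 1 - 1/3
```

A split is more useful the further it lowers the Gini index of the resulting child nodes.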

  18. Advantages of Decision Trees • Simple to understand and interpret • Require little data preparation • Able to handle both numerical and categorical data • Perform well on large datasets in a short time • The explanation for a given classification is easily expressed in boolean logic

  19. Advantages of Decision Tree • Easy to visualize the process of classification • Can easily tell why the data is classified in a particular category - just trace the path to get to the leaf and it explains the reason • Simple, fast processing • Once the tree is made, just traverse down the tree to classify the data

  20. Decision Tree is for… • Classifying datasets in which: • The predicates return discrete values • No attribute has the same value for every record

  21. CMT catalog: Shallow earthquakes, 1976-2005

  22. INDIAN PLATE MOVES NORTH, COLLIDING WITH EURASIA (Gordon & Stein, 1992)

  23. COMPLEX PLATE BOUNDARY ZONE IN SOUTHEAST ASIA • Northward motion of India deforms all of the region • Many small plates (microplates) and blocks (Molnar & Tapponnier, 1977)

  24. India subducts beneath the Burma microplate at about 50 mm/yr • Earthquakes occur at the plate interface along the Sumatra arc (Sunda trench) • These are spectacular and destructive results of many years of accumulated motion

  25. NOAA

  26. IN DEEP OCEAN, a tsunami has a long wavelength, travels fast, and has a small amplitude, so it doesn’t affect ships • AS IT APPROACHES SHORE, it slows. Since energy is conserved, the amplitude builds up, which makes it very damaging
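The slowing and amplitude growth can be illustrated with two idealized relations that the slide does not state: the shallow-water wave speed sqrt(g·h), and Green’s law, which follows from conservation of energy flux and gives amplitude growth proportional to (h_deep / h_shallow)^(1/4). The depths and the 0.5 m deep-water amplitude below are hypothetical values chosen only to show the trend.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def tsunami_speed(depth_m: float) -> float:
    """Shallow-water wave speed sqrt(g * h), in m/s."""
    return math.sqrt(G * depth_m)


def shoaling_amplitude(amp_deep_m: float, depth_deep_m: float, depth_shallow_m: float) -> float:
    """Green's law: amplitude grows as (h_deep / h_shallow) ** 0.25."""
    return amp_deep_m * (depth_deep_m / depth_shallow_m) ** 0.25


# Hypothetical case: a 0.5 m wave in 4000 m of water reaching 10 m depth.
deep_speed = tsunami_speed(4000.0)                  # roughly 198 m/s
shore_speed = tsunami_speed(10.0)                   # roughly 10 m/s
shore_amp = shoaling_amplitude(0.5, 4000.0, 10.0)   # roughly 2.2 m
```

Real coastlines add refraction, funneling, and breaking, so these figures indicate only the direction of the effect.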

  27. TSUNAMI WARNING • Because seismic waves travel much faster (km/s) than tsunamis, rapid analysis of seismograms can identify earthquakes likely to cause major tsunamis and predict when the waves will arrive • Deep ocean buoys can measure wave heights, verify a tsunami and reduce false alarms
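A rough sense of the available warning time comes from comparing travel times over the same distance. The sketch below assumes a seismic wave speed of 8 km/s, a uniform ocean depth of 4000 m, the shallow-water speed sqrt(g·h) for the tsunami, and a hypothetical 1000 km path; none of these numbers come from the slide.

```python
import math


def warning_lead_time_s(distance_km: float,
                        ocean_depth_m: float = 4000.0,
                        seismic_speed_km_s: float = 8.0) -> float:
    """Seconds between seismic-wave arrival and tsunami arrival at the same distance."""
    tsunami_speed_km_s = math.sqrt(9.81 * ocean_depth_m) / 1000.0  # assumed shallow-water speed
    tsunami_travel_s = distance_km / tsunami_speed_km_s
    seismic_travel_s = distance_km / seismic_speed_km_s
    return tsunami_travel_s - seismic_travel_s


# Hypothetical coast 1000 km from the source.
lead_minutes = warning_lead_time_s(1000.0) / 60.0  # roughly 82 minutes
```

Time spent analyzing the seismograms and distributing the warning would have to be subtracted from this figure.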

  28. HOWEVER, EARTHQUAKES ARE HARD TO PREDICT • Recurrence is highly variable (Sieh et al., 1989) • Extend earthquake history with geologic records: paleoseismology • M>7: mean 132 yr, σ 105 yr • Estimated probability in 30 yrs: 7-51%
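For orientation, a time-independent Poisson model turns a mean recurrence interval into a probability over a time window. This is a swapped-in textbook model, not necessarily how the range quoted on the slide was derived; it ignores both the time elapsed since the last event and the large scatter in recurrence.

```python
import math


def poisson_probability(window_yr: float, mean_recurrence_yr: float) -> float:
    """P(at least one event in the window) under a Poisson model: 1 - exp(-T / mean)."""
    return 1.0 - math.exp(-window_yr / mean_recurrence_yr)


# Using the slide's mean recurrence of 132 yr and a 30 yr window.
p30 = poisson_probability(30.0, 132.0)  # about 0.20
```

A single number like this hides the variability in recurrence, which is why the slide quotes such a wide range.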

  29. EARTHQUAKE RECURRENCE AT SUBDUCTION ZONES IS COMPLICATED • In many subduction zones, thrust earthquakes have patterns in space and time. • Large earthquakes have occurred in the Nankai trough area of Japan approximately every 125 years since 1498, with similar fault areas. • In some cases the entire region seems to have slipped at once; in others, slip was divided into several events over a few years. • Repeatability suggests that a segment that has not slipped for some time is a gap due for an earthquake, but it’s hard to use this concept well because of variability. • [Figure label: GAP? NOTHING YET] (Ando, 1975)

  30. 1985 MEXICO EARTHQUAKE • SEPTEMBER 19, 1985 • M8.1 • A SUBDUCTION ZONE QUAKE • ALTHOUGH LARGER THAN USUAL, THE EARTHQUAKE WAS NOT A “SURPRISE” • A GOOD, MODERN BUILDING CODE HAD BEEN ADOPTED AND IMPLEMENTED

  31. 1985 MEXICO EARTHQUAKE • EPICENTER LOCATED 240 KM FROM MEXICO CITY • 400 BUILDINGS COLLAPSED IN OLD LAKE BED ZONE OF MEXICO CITY • SOIL-STRUCTURE RESONANCE IN OLD LAKE BED ZONE WAS A MAJOR FACTOR

  32. 1985 MEXICO EARTHQUAKE: ESSENTIAL STRUCTURES--SCHOOLS

  33. 1985 MEXICO EARTHQUAKE: STEEL FRAME BUILDING

  34. 1985 MEXICO EARTHQUAKE: POUNDING

  35. 1985 MEXICO EARTHQUAKE: NUEVO LEON APARTMENT BUILDINGS

  36. 1985 MEXICO EARTHQUAKE: SEARCH AND RESCUE

  37. Definition • Characteristics • Project: California Earthquake Prediction

  38. Characteristics (cont.)

  39. Characteristics (cont.) • 2. Locality: the information transferred by a neuron is limited by its nearby neurons. • CAEP: short-term earthquake prediction is strongly influenced by local geologic features.

  40. Characteristics (cont.) • 3. Weighted sum and activation function with nonlinearity: the input signal is weighted at the synaptic connection by a connection weight. • CAEP: nearby locations are weighted, each with its activation function.
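A single artificial neuron of the kind described here, a weighted sum of inputs passed through a nonlinear activation, can be sketched as follows. The sigmoid activation, the bias term, and the sample inputs and weights are assumptions; the slide does not say which activation function is used.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid activation (assumed nonlinearity)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical signals from three nearby locations, with made-up connection weights.
signals = [0.2, 0.9, 0.4]
weights = [0.5, 1.5, -1.0]
activation = neuron_output(signals, weights)
```

Larger weights let a nearby location influence the output more, while the nonlinearity keeps the output bounded between 0 and 1.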
