
3D Geological Modeling: Solving as Classification Problem with Support Vector Machine



Presentation Transcript


  1. Earth Sciences Sector. 3D Geological Modeling: Solving as Classification Problem with Support Vector Machine. Groundwater. A. Smirnoff, E. Boisvert, S. J. Paradis

  2. Objectives • Find an algorithm for automating the 3D modeling procedure from sparse data • Test the algorithm on available data • Draw conclusions about its applicability

  3. Possible Input Data • Well data • Surface geology maps • Cross-section data • Can be used alone or in combination

  4. Algorithms Currently in Use and Their Limitations • Voronoi diagrams • Potential fields • Normally require too much information and/or additional procedures • What if we only have a few sections to start with?

  5. 3D Reconstruction as a Classification Problem • Given a set of points in 3D with known geological information • For the rest of the points in the reconstruction space, the information is not available • Based on the known points, classify the rest into a known number of units (classes) [Figure: reconstruction space with two labelled regions, Unit 1 and Unit 2]

  6. Available Classification Methods • Bayesian classification • requires a priori knowledge of probabilities • Nearest-neighbor classifiers • extremely sensitive to parameter choice and scaling • Decision trees • not flexible with many samples • Neural networks • slow and difficult to use • Support Vector Machine (SVM) • a relatively new method • becoming more and more popular

  7. SVM Algorithm • Input: Take a set of training samples with known features and classes • Model: Build a model (boundary) separating the training samples • Output: Classify any new (unclassified) or test samples using the model
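
A minimal sketch of this three-step workflow, assuming scikit-learn's SVC interface to LIBSVM; the coordinates and unit labels are invented toy data, not values from the presentation.

```python
# Minimal input / model / output sketch (assumption: scikit-learn's SVC, which wraps LIBSVM).
# The coordinates and unit labels below are invented toy values.
import numpy as np
from sklearn.svm import SVC

# Input: training samples with known features (X, Y, Z) and known classes (units)
X_train = np.array([[0.0, 0.0, 10.0],
                    [1.0, 2.0, 12.0],
                    [5.0, 5.0, 50.0],
                    [6.0, 4.0, 55.0]])
y_train = np.array([1, 1, 2, 2])           # geological unit labels

# Model: build a model (boundary) separating the training samples
model = SVC(kernel="rbf", C=2.0, gamma=0.001)
model.fit(X_train, y_train)

# Output: classify new (unclassified) samples with the model
X_new = np.array([[0.5, 1.0, 11.0],
                  [5.5, 4.5, 52.0]])
print(model.predict(X_new))                # e.g. [1 2]
```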

  8. Binary Reconstruction [Figure: three panels – 1. Original model, 2. Training set, 3. Output – shown on X, Y, Z axes]

  9. Input Data and Results • Input data: total points 389235; training set 17452 (4.48%) – 2 units on 11 sections; points to be classified 371783 • Results: total classified 371783; success 361909 (97.34%); failure 9874 (2.66%)

  10. Detailed Analysis (Class 1) [Figure: success rate (%) for Class 1 across all model sections (0–240), with the 11 training sections (Section 1 – Section 11) marked]

  11. Peeking into the SVM Black Box • A simple case: two classes and two features (e.g., petal and sepal lengths in flowers) • Training set: known data vectors $x_i$, where $i = 1, \dots, l$

  12. Maximum Margin Separating Hyperplane (MMSH) • Linearly separable data • Which linear separator is the best? • V. Vapnik (1995) suggested the maximum margin [Figure: two feature plots (Feature 1 vs. Feature 2); the left compares three candidate separators with margins ordered 1 < 3 < 2, the right shows the maximum-margin hyperplane between classes +1 and -1, defined by the support vectors]

  13. Hard Margin Classification (HMSH) • If $w^T x + b = 0$ is the separating hyperplane, then $w^T x_i + b > 0$ for class +1 and $w^T x_i + b < 0$ for class -1 • Decision function: $f(x) = \mathrm{sign}(w^T x + b)$, where $x$ is a test sample [Figure: feature plot of the hyperplane $w^T x + b = 0$ (HMSH) separating the training vectors $x_1, \dots, x_l$ of the two classes]
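
A worked illustration of this decision rule; the weight vector $w$ and offset $b$ below are hypothetical values, not taken from the presentation.

```python
# Decision rule f(x) = sign(w^T x + b); w and b are hypothetical values.
import numpy as np

w = np.array([1.0, -2.0])          # normal of the separating hyperplane (assumed)
b = 0.5                            # offset (assumed)

def f(x):
    return np.sign(w @ x + b)      # +1 or -1

print(f(np.array([3.0, 1.0])))     # 1.0  -> class +1
print(f(np.array([1.0, 2.0])))     # -1.0 -> class -1
```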

  14. How to Maximize the Margin? • For $w^T x + b = 0$, consider a pipe defined by the hyperplanes $w^T x + b = +1$ and $w^T x + b = -1$ • Then $w^T x_i + b \ge +1$ for class +1 and $w^T x_i + b \le -1$ for class -1, or $y_i (w^T x_i + b) \ge 1$ • Maximize the distance between the hyperplanes $w^T x + b = \pm 1$ [Figure: feature plot showing the hyperplanes $w^T x + b = +1$, $0$, and $-1$ between the two classes]

  15. Problem Formulation • The distance between the hyperplanes $w^T x + b = \pm 1$ is $2 / \|w\|$ • Then: maximize $2 / \|w\|$ subject to $y_i (w^T x_i + b) \ge 1$ • Or: minimize $\frac{1}{2}\|w\|^2$ subject to the same constraints • Quadratic optimization problem • A solution exists [Figure: feature plot of the margin hyperplanes and the support vectors]

  16. Soft Margin Classification (SMSH) • Real data are noisy and not easily separable • Allow classification errors by introducing slack variables $\xi_i \ge 0$ • Support vectors: samples at half the margin width from the SMSH plus the misclassified ones • Thus: penalize the slack in the objective (see the formulation below) • where C is the cost or penalty parameter [Figure: feature plot comparing the hard-margin (HMSH) and soft-margin (SMSH) hyperplanes, with support vectors and misclassified samples marked]
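
Written out in full, the soft-margin problem sketched on this slide is the standard formulation below; the hard-margin case is recovered when all $\xi_i = 0$.

$$
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i
\quad \text{subject to} \quad
y_i\,(w^T x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,l
$$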

  17. Non-Separable Data • Data that are separable, or separable with some noise, pose no problem (HMSH or SMSH) • What if the data are not linearly separable in the data space? • Find a function that re-maps the data into a higher-dimensional space (feature space) where they are separable, e.g. $\varphi: R^1 \rightarrow R^2$ [Figure: one-dimensional data with classes +1 and -1 interleaved along the x axis]

  18. Non-Linear SVM [Figure: 1. Problem – classes +1 and -1 interleaved in the data (input) space $R^1$ are not linearly separable; 2. Solution – the mapping $\varphi(x) = (x, x^2)$ into the feature space $R^2$ makes them separable by a line; 3. Solution carried back to the input space]
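
A small numeric sketch of the mapping shown on this slide; the 1D sample points are invented for illustration.

```python
# Mapping phi(x) = (x, x^2): interleaved 1D classes become linearly separable in R^2.
# The sample points are invented toy data.
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.where(np.abs(x) > 1.5, -1, 1)    # class -1 on the outside, class +1 in the middle

phi = np.column_stack([x, x ** 2])      # lift into the feature space R^2

# In feature space the horizontal line x2 = 2.25 separates the two classes
print(np.all((phi[:, 1] < 2.25) == (y == 1)))   # True
```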

  19. Kernel Trick • How do we find such a function in a more complicated situation? • We do not need to know the function explicitly! • The formulation and solution of the optimization problem use only inner products of vectors • A kernel function is the inner product of some mapping in its feature space: $K(x_i, x) = \varphi(x_i)^T \varphi(x)$ • Thus the final decision function is $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$ ($\alpha_i$ are weighting factors; $\alpha_i > 0$ only for support vectors)

  20. Kernel Functions • Known kernel functions: linear, polynomial, radial-basis function (RBF), etc. • The RBF is the most general form of kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$ • The decision function is then $f(x) = \sum_i \alpha_i y_i \exp(-\gamma \|x_i - x\|^2) + b$ • The only adjustable kernel parameter is $\gamma$
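
A sketch of evaluating this RBF decision function directly from a fitted model, assuming scikit-learn's SVC (whose dual_coef_ attribute stores the products $\alpha_i y_i$ for the support vectors); the training points are invented toy data.

```python
# Evaluating f(x) = sum_i alpha_i y_i K(x_i, x) + b with the RBF kernel,
# reusing the support vectors found by scikit-learn's SVC (wraps LIBSVM).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, -1, 1)    # ring-shaped toy classes

gamma = 0.5
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

def rbf(sv, x):
    return np.exp(-gamma * np.sum((sv - x) ** 2, axis=-1))   # K(x_i, x)

x_test = np.array([0.2, 0.1])
# dual_coef_[0] holds alpha_i * y_i for each support vector
f = np.sum(clf.dual_coef_[0] * rbf(clf.support_vectors_, x_test)) + clf.intercept_[0]
print(np.sign(f), clf.predict([x_test])[0])   # the sign of f gives the same class
```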

  21. How Did We Use SVM? • Using geological units as classes • Using X, Y, Z coordinates as features • Using non-linear soft-margin SVM with the RBF kernel • Using LIBSVM from National Taiwan University • Only two parameters to control: C and $\gamma$ • Selecting parameters is a black art, done on a trial-and-error basis • A simple grid search with validation is recommended, e.g. $C = 2^{-8}, 2^{-7}, \dots, 2^{15}$; $\gamma = 2^{-15}, 2^{-14}, \dots, 2^{12}$ (see the sketch below)
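
A sketch of the recommended grid search with validation, assuming scikit-learn's GridSearchCV; the toy (X, Y, Z) points and unit labels below stand in for the real training sections, and a coarser sub-grid of the ranges above keeps the example quick to run.

```python
# Grid search over C and gamma with cross-validation, as recommended on the slide.
# Assumption: scikit-learn's GridSearchCV; the toy points below are invented data.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.uniform(0, 100, size=(200, 3))     # X, Y, Z coordinates
y_train = np.where(X_train[:, 2] > 50, 2, 1)     # two toy units split by depth

# Full range from the slide: C = 2^-8 ... 2^15, gamma = 2^-15 ... 2^12.
# A coarser sub-grid keeps this sketch fast.
param_grid = {
    "C": 2.0 ** np.arange(-3, 16, 3),
    "gamma": 2.0 ** np.arange(-15, 13, 4),
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```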

  22. C and $\gamma$ Grid Search [Figure: grid-search success rates over $\log_2 C$ (-8 to 15) and $\log_2 \gamma$ (-15 to 12) for all experiments, with a zoom on the proposed range] • Proposed range: $C = 2^{-3}$ to $2^{15}$, $\gamma = 2^{4}$ to $2^{9}$ • Best binary result: 97.79% at $C = 2^{1}$, $\gamma = 2^{6}$ • Previous example: 97.34%

  23. Influence of C and $\gamma$ [Figure: reconstructions compared for five parameter combinations – low C / high $\gamma$, high C / high $\gamma$, average C / average $\gamma$, low C / low $\gamma$, and high C / low $\gamma$]

  24. Multi-Class Classification • Units: 1 – Organic, 2 – Littoral, 3 – Clay, 4 – Esker, 5 – Till, 6 – Bedrock [Figure: three panels – 1. Original model, 2. Training set, 3. Output – shown on X, Y, Z axes]
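
The multi-unit case needs no change to the classifier call itself: LIBSVM (and scikit-learn's SVC on top of it) handles several classes internally with one-against-one voting. A sketch with invented points and unit labels; the toy gamma is chosen for this synthetic scaling, not the range recommended in the presentation.

```python
# Multi-unit sketch: the same SVC call handles six classes (one-against-one in LIBSVM).
# The points, depth bands, and parameters below are invented toy data.
import numpy as np
from sklearn.svm import SVC

units = {1: "Organic", 2: "Littoral", 3: "Clay", 4: "Esker", 5: "Till", 6: "Bedrock"}

rng = np.random.default_rng(2)
X_train = rng.uniform(0, 100, size=(600, 3))                     # X, Y, Z coordinates
y_train = np.digitize(X_train[:, 2], [15, 30, 50, 70, 85]) + 1   # six toy depth bands

clf = SVC(kernel="rbf", C=2.0, gamma=1e-3).fit(X_train, y_train)

X_new = rng.uniform(0, 100, size=(5, 3))                         # points to classify
for point, label in zip(X_new, clf.predict(X_new)):
    print(point.round(1), "->", units[int(label)])
```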

  25. Data Statistics and Results

  26. Success per Class [Figure: success rate (%) versus training points per class (%, log scale) for Bedrock, Esker, Clay, Till, Littoral, and Organic]

  27. Area and Volume Comparison [Figure: log-log scatter plots of reconstructed versus original area and volume for Bedrock, Till, Clay, Esker, Littoral, and Organic]

  28. Conclusions • The SVM can successfully be used in single- and multi-unit 3D geological reconstructions: • Reasonable results are obtained with just a few training sections • Parameters must be picked from the range $C = 2^{-3}$ to $2^{15}$, $\gamma = 2^{4}$ to $2^{9}$ • Low C values – less detail, a more generalized model • High C values – more detail, a less generalized model • Additional experiments demonstrated: • The number of units can vary (all units must be represented in the training set) • Sections can be arbitrarily located • Other types of information (well data, surface geology maps) can be used

  29. References • Abe, S., 2005. Support Vector Machines for Pattern Classification. Springer-Verlag, London, 343 pp. • Cristianini, N., Shawe-Taylor, J., 2000. Support Vector Machines. Cambridge University Press, 189 pp. • Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 311 pp.
