Experiments with MRDTL – A Multi-relational Decision Tree Learning Algorithm


Presentation Transcript


  1. Experiments with MRDTL – A Multi-relational Decision Tree Learning Algorithm
  Hector Leiva, Anna Atramentov and Vasant Honavar*
  Artificial Intelligence Laboratory, Department of Computer Science and Graduate Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
  www.cs.iastate.edu/~honavar/aigroup/html
  *Support provided in part by the National Science Foundation, the Carver Foundation, and Pioneer Hi-Bred, Inc.

  2. Motivation
  Importance of multi-relational learning:
  • Growth of data stored in MRDBs
  • Techniques for learning from unstructured data often extract the data into an MRDB
  Expansion of techniques for multi-relational learning:
  • Blockeel's framework (ILP) (1998)
  • Getoor's framework (first-order extensions of PMs) (2001)
  • Knobbe's framework (MRDM) (1999). Problem: no experimental results available
  Goals:
  • Perform experiments and evaluate the performance of Knobbe's framework
  • Understand the strengths and limits of the approach

  3. Multi-Relational Learning Literature
  • Inductive logic programming
  • First-order extensions of probabilistic models
  • Multi-relational data mining
  • Propositionalization methods
  • PRM extensions for cumulative learning, and for learning and reasoning as agents interact with the world
  • Approaches for mining data in the form of graphs
  Blockeel, 1998; De Raedt, 1998; Knobbe et al., 1999; Friedman et al., 1999; Koller, 1999; Krogel and Wrobel, 2001; Getoor, 2001; Kersting et al., 2000; Pfeffer, 2000; Dzeroski and Lavrac, 2001; Dehaspe and De Raedt, 1997; Dzeroski et al., 2001; Jaeger, 1997; Karalic and Bratko, 1997; Holder and Cook, 2000; Gonzalez et al., 2000

  4. Problem Formulation
  Given: data stored in a relational database
  Goal: build a decision tree for predicting a target attribute in the target table
  [Figure: example of a multi-relational database schema and instances]
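
  The slides that follow use a running Department / Staff / Graduate_Student example. A minimal sketch of such a schema in Python's sqlite3; all column names are hypothetical, chosen only to match the attributes mentioned later (Specialization, #Students, Position, GPA):

    import sqlite3

    # Hypothetical instance of the running example schema; the column
    # names are illustrative, not taken from the original database.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table Department (
            id integer primary key,
            Specialization text,   -- e.g. 'math', 'physics'
            Students integer);     -- written '#Students' in the slides
        create table Staff (
            id integer primary key,
            Department integer references Department(id),
            Position text,         -- e.g. 'Professor'
            Salary text);          -- e.g. '70-80k', '80-100k'
        create table Graduate_Student (
            id integer primary key,
            Advisor integer references Staff(id),
            Department integer references Department(id),
            GPA real);
    """)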

  5. Propositional decision tree algorithm. Construction phase
  [Figure: example tree over days {d1, d2, d3, d4}, splitting first on Outlook (sunny: {d1, d2} / not sunny: {d3, d4}), then on Temperature (hot: {d3} / not hot: {d4})]
  Tree_induction(D: data)
    A = optimal_attribute(D)
    if stopping_criterion(D)
      return leaf(D)
    else
      D_left := split(D, A)
      D_right := splitcomplement(D, A)
      child_left := Tree_induction(D_left)
      child_right := Tree_induction(D_right)
      return node(A, child_left, child_right)
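
  A runnable Python sketch of the same induction scheme, assuming instances are dicts with a "label" key and candidate binary splits are (name, predicate) pairs; the helper names mirror the pseudocode, not any particular implementation:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(data, pred):
        labels = [d["label"] for d in data]
        left = [d["label"] for d in data if pred(d)]
        right = [d["label"] for d in data if not pred(d)]
        if not left or not right:
            return 0.0
        w = len(left) / len(labels)
        return entropy(labels) - w * entropy(left) - (1 - w) * entropy(right)

    def tree_induction(data, tests):
        # assumes non-empty data
        labels = [d["label"] for d in data]
        if len(set(labels)) == 1 or not tests:           # stopping_criterion(D)
            return ("leaf", Counter(labels).most_common(1)[0][0])
        name, pred = max(tests, key=lambda t: information_gain(data, t[1]))
        d_left = [d for d in data if pred(d)]            # split(D, A)
        d_right = [d for d in data if not pred(d)]       # splitcomplement(D, A)
        if not d_left or not d_right:
            return ("leaf", Counter(labels).most_common(1)[0][0])
        return ("node", name, tree_induction(d_left, tests),
                              tree_induction(d_right, tests))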

  6. MR setting. Splitting data with selection graphs
  [Figure: Department / Graduate Student / Staff schema; the Staff instances are split by the selection graph "Staff having a Grad.Student with GPA > 2.0" and its complement selection graph]

  7. What is a selection graph?
  • It corresponds to a subset of the instances from the target table
  • Nodes correspond to tables in the database
  • Edges correspond to associations between tables
  • Open edge = "has at least one"
  • Closed edge = "has none of"
  [Figure: example selection graph over Grad.Student (GPA > 3.9), Department (Specialization = math) and Staff]
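
  One way to make this concrete: a small Python sketch of a selection-graph data structure. The field names and the example below are illustrative, not the paper's exact representation:

    from dataclasses import dataclass, field

    @dataclass
    class SGNode:
        table: str                                       # a table in the database
        conditions: list = field(default_factory=list)   # e.g. ["GPA > 3.9"]

    @dataclass
    class SGEdge:
        parent: int            # indices into SelectionGraph.nodes
        child: int
        join: str              # the association, e.g. "T0.id = T1.Advisor"
        open: bool = True      # open = "has at least one"; closed = "has none of"

    @dataclass
    class SelectionGraph:
        nodes: list = field(default_factory=list)        # nodes[0] is the target table
        edges: list = field(default_factory=list)

    # Grad.Students with GPA > 3.9 in a department specializing in math:
    g = SelectionGraph(
        nodes=[SGNode("Graduate_Student", ["GPA > 3.9"]),
               SGNode("Department", ["Specialization = 'math'"])],
        edges=[SGEdge(0, 1, "T0.Department = T1.id")])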

  8. Automatic transformation of selection graphs into SQL queries
  Generic query:
    select distinct T0.primary_key
    from table_list
    where join_list and condition_list
  Staff with Position = Professor:
    select distinct T0.id
    from Staff T0
    where T0.Position = 'Professor'
  Staff having at least one graduate student (open edge):
    select distinct T0.id
    from Staff T0, Graduate_Student T1
    where T0.id = T1.Advisor
  Staff having no graduate students (closed edge):
    select distinct T0.id
    from Staff T0
    where T0.id not in (select T1.Advisor from Graduate_Student T1)
  Staff having a graduate student, but none with GPA > 3.9:
    select distinct T0.id
    from Staff T0, Graduate_Student T1
    where T0.id = T1.Advisor
      and T0.id not in (select T1.Advisor from Graduate_Student T1 where T1.GPA > 3.9)
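
  A minimal sketch of this translation in Python, taking the graph's pieces as plain lists: open edges contribute join clauses, and each closed ("has none of") subgraph becomes a NOT IN subquery on the target key. Table and column names follow the running example and are assumptions:

    def graph_to_sql(tables, joins=(), conditions=(), absent=()):
        """Build the slide's generic query: select distinct T0.primary_key
        from table_list where join_list and condition_list."""
        table_list = ", ".join(f"{name} {alias}" for name, alias in tables)
        clauses = list(joins) + list(conditions)
        clauses += [f"T0.id not in ({sub})" for sub in absent]
        sql = f"select distinct T0.id from {table_list}"
        return sql + (f" where {' and '.join(clauses)}" if clauses else "")

    # Staff having at least one graduate student with GPA > 3.9 (open edge):
    print(graph_to_sql([("Staff", "T0"), ("Graduate_Student", "T1")],
                       joins=["T0.id = T1.Advisor"],
                       conditions=["T1.GPA > 3.9"]))

    # Staff having no graduate students at all (closed edge):
    print(graph_to_sql([("Staff", "T0")],
                       absent=["select T1.Advisor from Graduate_Student T1"]))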

  9. MR decision tree
  • Each node contains a selection graph
  • Each child's selection graph is a supergraph of the parent's selection graph
  [Figure: a tree of selection graphs over Staff and Grad.Student, growing from Staff at the root toward Staff with a Grad.Student (GPA > 3.9) in deeper nodes]

  10. How to choose selection graphs in nodes?
  Problem: there are too many supergraph selection graphs to choose from at each node
  Solution:
  • start with an initial selection graph
  • use a greedy heuristic to choose supergraph selection graphs: refinements
  • use binary splits for simplicity
  • for each refinement, get the complement refinement
  • choose the best refinement based on the information gain criterion
  Problem: some potentially good refinements may give no immediate benefit
  Solution:
  • look-ahead capability
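
  A sketch of the greedy step, assuming two hypothetical helpers: candidates(g) yields (refinement, complement) pairs of selection graphs, and class_counts(g) runs g's SQL query and returns a Counter of target labels:

    import math
    from collections import Counter

    def entropy(counts: Counter) -> float:
        n = sum(counts.values())
        return -sum(c / n * math.log2(c / n) for c in counts.values() if c) if n else 0.0

    def split_gain(parent, left, right):
        n = sum(parent.values())
        if n == 0:
            return 0.0
        return (entropy(parent)
                - sum(left.values()) / n * entropy(left)
                - sum(right.values()) / n * entropy(right))

    def best_refinement(graph, candidates, class_counts):
        # score each binary (refinement, complement) split by information gain
        parent = class_counts(graph)
        best, best_gain = None, 0.0
        for ref, comp in candidates(graph):
            gain = split_gain(parent, class_counts(ref), class_counts(comp))
            if gain > best_gain:
                best, best_gain = (ref, comp), gain
        return best, best_gain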

  11. Refinements of a selection graph
  • add condition to the node - explore attribute information in the tables
  • add present edge and open node - explore relational properties between the tables
  [Figure: example refinements of the selection graph over Grad.Student (GPA > 3.9), Department (Specialization = math) and Staff]
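
  The two operators could be sketched as follows, reusing the SGNode/SGEdge/SelectionGraph classes from the slide 7 sketch; the "not (...)" syntax for complements is a simplification:

    import copy

    def add_condition(graph, i, cond):
        """Refinement 1: add `cond` to node i; the complement adds its negation."""
        ref, comp = copy.deepcopy(graph), copy.deepcopy(graph)
        ref.nodes[i].conditions.append(cond)
        comp.nodes[i].conditions.append(f"not ({cond})")
        return ref, comp

    def add_edge(graph, i, table, join):
        """Refinement 2: add a present edge and open node; the complement
        closes the same edge ("has none of")."""
        ref, comp = copy.deepcopy(graph), copy.deepcopy(graph)
        for g2, is_open in ((ref, True), (comp, False)):
            g2.nodes.append(SGNode(table))
            g2.edges.append(SGEdge(i, len(g2.nodes) - 1, join, open=is_open))
        return ref, comp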

  12. Refinements of a selection graph
  • add condition to the node
  • add present edge and open node
  Refinement: add the condition Position = Professor to the Staff node
  Complement refinement: Position != Professor
  [Figure: the refined and complement selection graphs over Grad.Student (GPA > 3.9), Department (Specialization = math) and Staff]

  13. Refinements of a selection graph
  • add condition to the node
  • add present edge and open node
  Refinement: add the condition GPA > 2.0 to a Grad.Student node
  Complement refinement: the negated condition
  [Figure: the refined and complement selection graphs]

  14. Refinements of a selection graph
  • add condition to the node
  • add present edge and open node
  Refinement: add the condition #Students > 200 to the Department node
  Complement refinement: the negated condition
  [Figure: the refined and complement selection graphs]

  15. Refinements of a selection graph
  • add condition to the node
  • add present edge and open node
  Refinement: add a present edge and open node for Department
  Complement refinement: the corresponding closed edge
  Note: information gain = 0
  [Figure: the refined and complement selection graphs]

  16. Refinements of a selection graph
  • add condition to the node
  • add present edge and open node
  [Figure: another "add present edge and open node" refinement of the same selection graph (here reaching a Staff node), and its complement refinement]

  17. Refinements of a selection graph
  • add condition to the node
  • add present edge and open node
  [Figure: a further "add present edge and open node" refinement and its complement refinement]

  18. Refinements of a selection graph
  • add condition to the node
  • add present edge and open node
  [Figure: a further refinement adding an edge to a second Grad.Student node, and its complement refinement]

  19. Look-ahead capability
  [Figure: a look-ahead refinement of the selection graph over Grad.Student (GPA > 3.9), Department (Specialization = math) and Staff, shown with its complement refinement]

  20. Look-ahead capability
  [Figure: a look-ahead refinement adding an edge to Department together with the condition #Students > 200 in a single step, shown with its complement refinement]
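
  A simplified sketch of look-ahead: when a one-step refinement (such as merely opening an edge to Department) yields zero gain, its two-step extensions are also offered as candidate splits. In MRDTL proper the complement of a look-ahead refinement is formed relative to the parent graph; this sketch glosses over that detail:

    def candidates_with_lookahead(graph, candidates, class_counts, split_gain):
        parent = class_counts(graph)
        for ref, comp in candidates(graph):
            yield ref, comp
            # zero immediate gain: look one refinement step ahead
            if split_gain(parent, class_counts(ref), class_counts(comp)) == 0.0:
                yield from candidates(ref)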

  21. MR decision tree algorithm. Construction phase
  For each non-leaf node:
  • consider all possible refinements, and their complements, of the node's selection graph
  • choose the best one based on the information gain criterion
  • create the children nodes
  [Figure: a partially built tree of selection graphs over Staff and Grad.Student (GPA > 3.9)]
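
  Putting the pieces together, a sketch of the construction phase, reusing best_refinement from the slide 10 sketch; class_counts is again the hypothetical helper that executes a selection graph's SQL query and tallies the target labels into a Counter:

    def build_tree(graph, candidates, class_counts):
        counts = class_counts(graph)
        if not counts:
            return ("leaf", graph, None)   # no training instances reach this graph
        best, gain = best_refinement(graph, candidates, class_counts)
        if best is None or gain <= 0.0 or len(counts) <= 1:
            return ("leaf", graph, counts.most_common(1)[0][0])   # majority class
        ref, comp = best
        return ("node", graph,
                build_tree(ref, candidates, class_counts),
                build_tree(comp, candidates, class_counts))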

  22. MR decision tree algorithm. Classification phase
  For each leaf:
  • apply the selection graph of the leaf to the test data
  • classify the resulting instances with the classification of the leaf
  [Figure: a completed tree whose leaves (e.g. Staff with Position = Professor and a Grad.Student with GPA > 3.9, in a Department with Spec = math or Spec = physics) predict the salary classes 70-80k and 80-100k]
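
  And a sketch of the classification phase, assuming run_query(graph) is a hypothetical helper that executes the leaf's selection graph as SQL against the test database and returns the matching instance ids:

    def classify(tree, run_query):
        if tree[0] == "leaf":
            _, graph, label = tree
            return {instance_id: label for instance_id in run_query(graph)}
        _, _, left, right = tree
        labels = classify(left, run_query)
        labels.update(classify(right, run_query))
        return labels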

  23. Experimental results. Mutagenesis
  • The database most widely used in ILP.
  • Describes molecules of certain nitroaromatic compounds.
  • Goal: predict their mutagenic activity (label attribute) - the ability to cause DNA to mutate. High mutagenic activity can cause cancer.
  • [Table: class distribution]
  • 5 levels of background knowledge: B0, B1, B2, B3, B4. They provide increasingly rich descriptions of the examples. Only the first three levels (B0, B1, B2) are used here.

  24. Experimental results. Mutagenesis
  • [Table: results of 10-fold cross-validation for the regression-friendly set]
  • [Table: sizes of the decision trees]

  25. Experimental results. Mutagenesis
  • [Table: results of leave-one-out cross-validation for the regression-unfriendly set]
  • Two recent approaches (Sebag and Rouveirol, 1997) and (Kramer and De Raedt, 2001) using B3 have achieved 93.6% and 94.7%, respectively, on the mutagenesis database.

  26. Experimental results. KDD Cup 2001
  • Consists of a variety of details about the various genes of one particular type of organism.
  • Genes code for proteins, and these proteins tend to localize in various parts of cells and interact with one another in order to perform crucial functions.
  • Task: prediction of gene/protein localization (15 possible values)
  • Target table: Gene
  • Target attribute: Localization
  • 862 training genes, 381 test genes.
  • Challenge: many attribute values are missing.
  • Approach: use a special value to encode a missing value. Result: accuracy of 50%.
  We have to find good techniques for filling in missing values.

  27. Experimental results. KDD Cup 2001
  • Approach: replace missing values with the most common value of the attribute for the class. Results:
  - accuracy of around 85% with a decision tree of 367 nodes, with no limit on the number of times an association can be instantiated
  - accuracy of 80% when limiting the number of times an association can be instantiated
  - accuracy of around 75% when following associations only in the forward direction
  This shows that providing reasonable guesses for missing values can significantly enhance the performance of MRDTL on real-world data sets. In practice, since the class labels for test data are unknown, it is not possible to apply this method directly.
  • Approach: extension of the Naïve Bayes algorithm for relational data. Result:
  - no improvement compared to the first approach
  We have to incorporate the handling of missing values into the decision tree algorithm.
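
  A sketch of this class-conditional imputation with pandas, applicable only to training data for the reason given above; the column handling is illustrative:

    import pandas as pd

    def impute_by_class_mode(df: pd.DataFrame, class_col: str) -> pd.DataFrame:
        """Replace each missing value with the most common value of that
        attribute among the rows sharing the same class label."""
        out = df.copy()
        for col in out.columns.drop(class_col):
            modes = out.groupby(class_col)[col].agg(
                lambda s: s.mode().iloc[0] if not s.mode().empty else None)
            missing = out[col].isna()
            out.loc[missing, col] = out.loc[missing, class_col].map(modes)
        return out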

  28. Experimental results. Adult database
  • Suitable for propositional learning: one table, 6 numerical attributes, 8 nominal attributes.
  • Information from the 1994 census.
  • Task: determine whether a person makes over 50K a year.
  • [Table: class distribution for the adult database]
  • Result after removal of missing values, using the original train/test split: 82.2%.
  • Filling missing values with the Naïve Bayes approach yields 83%.
  • C4.5 result: 84.46%.

  29. Summary
  • The algorithm is a promising alternative to existing algorithms such as Progol, Foil, and Tilde
  • Its running time is comparable to the best existing approaches
  • If equipped with principled approaches to handling missing values, it is an effective algorithm for learning from real-world relational data
  • The approach is an extension of propositional learning and can be successfully applied to propositional learning
  Questions:
  - Why can't we split the data based on the value of an attribute in an arbitrary table right away?
  - Is there a less restrictive and simpler way of representing the splits of data than selection graphs?
  - The running time for computing the first nodes of the decision tree is much less than for the rest of the nodes. Is this unavoidable? Can we implement the same idea more efficiently?

  30. Future work
  • Incorporation of more sophisticated techniques for handling missing values
  • Incorporation of more sophisticated pruning techniques or complexity regularizations
  • More extensive evaluation of MRDTL on real-world data sets
  • Development of ontology-guided multi-relational decision tree learning algorithms that generate classifiers at multiple levels of abstraction [Zhang et al., 2002]
  • Development of variants of MRDTL for classification tasks where the classes are not disjoint, based on the recently developed propositional decision tree counterparts of such algorithms [Caragea et al., 2002]
  • Development of variants of MRDTL that can learn from heterogeneous, distributed, autonomous data sources, based on recently developed techniques for distributed learning and ontology-based data integration
