Theory Revision Chris Murphy
The Problem • Sometimes we: • Have theories for existing data that do not match new data • Do not want to repeat learning every time we update data • Believe that our rule learners could perform much better if given basic theories to build on
Two Types of Errors in Theories • Over-generalization • Theory covers negative examples • Caused by incorrect rules in theory or by existing rules missing necessary constraints • Example: uncle(A,B) :- brother(A,C). • Solution: uncle(A,B) :- brother(A,C), parent(C,B).
Two Types of Errors in Theories • Over-specialization • Theory does not cover all positive examples • Caused by rules having additional, unnecessary constraints, or by the theory missing rules that are necessary to prove some examples • Example: uncle(A,B) :- brother(A,C), mother(C,B). • Solution: uncle(A,B) :- brother(A,C), parent(C,B).
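To make the two failure modes concrete, here is a minimal Python sketch that tests the three uncle clauses above against a tiny, hypothetical family database (the facts and coverage helpers are illustrations, not from the slides):

```python
# Hypothetical family facts: tuples model brother(A,C), parent(C,B), mother(C,B)
brother = {("tom", "ann"), ("tom", "sue")}
parent  = {("ann", "bob"), ("sue", "eve")}
mother  = {("ann", "bob")}

def covers_overgeneral(a, b):
    # uncle(A,B) :- brother(A,C).  -- ignores B, so it covers negatives
    return any(x == a for (x, _) in brother)

def covers_correct(a, b):
    # uncle(A,B) :- brother(A,C), parent(C,B).
    return any((a, c) in brother and (c, b) in parent
               for c in {c for (_, c) in brother})

def covers_overspecial(a, b):
    # uncle(A,B) :- brother(A,C), mother(C,B).  -- misses father cases
    return any((a, c) in brother and (c, b) in mother
               for c in {c for (_, c) in brother})

# Positives: uncle(tom, bob), uncle(tom, eve). Negative: uncle(tom, ann).
print(covers_overgeneral("tom", "ann"))   # True: covers a negative example
print(covers_correct("tom", "eve"))       # True
print(covers_overspecial("tom", "eve"))   # False: misses a positive example
```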
What is Theory Refinement? • “…learning systems that have a goal of making small changes to an original theory to account for new data.” • Combination of two processes: • Using a background theory to improve rule effectiveness and adequacy on data • Using problem detection and correction processes to make small adjustments to said theories
Basic Issues Addressed • Is there an error in the existing theory? • What part of the theory is incorrect? • What correction needs to be made?
Theory Refinement Basics • System is given a beginning theory about the domain • Can be incorrect or incomplete (and often is) • A well-refined theory will: • Be accurate on new/updated data • Make as few changes as possible to the original theory • Changes are monitored by a “Distance Metric” that keeps a count of every change made
The Distance Metric • Counts every addition, deletion, or replacement of clauses • Used to: • Measure the syntactic corruption of the original theory • Determine how good a learning system is at replicating human-created theories • Drawback: it does not recognize equivalent literals such as less(X,Y) and greq(Y,X) • Table on the right shows examples of distances between theories, as well as the metric's relationship to accuracy
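A minimal sketch of such a distance metric, assuming clauses are compared purely as strings and a replacement counts as one deletion plus one addition (the exact accounting in the slides may differ):

```python
def theory_distance(original, revised):
    """Count clauses deleted from `original` plus clauses added in `revised`.
    Purely syntactic: logically equivalent but differently written clauses
    (e.g. less(X,Y) vs. greq(Y,X)) still count as changes -- the drawback
    noted above."""
    orig, rev = set(original), set(revised)
    return len(orig - rev) + len(rev - orig)

original = {"uncle(A,B) :- brother(A,C), mother(C,B)."}
revised  = {"uncle(A,B) :- brother(A,C), parent(C,B)."}

# One clause replaced: one deletion plus one addition under this scheme.
print(theory_distance(original, revised))  # 2
```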
Why Preserve the Original Theory? • If you understood the original theory, you’ll likely understand the new one • Similar theories will likely retain the ability to use abstract predicates from the original theory
Theory Refinement Systems • EITHER • FORTE • AUDREY II • KBANN • FOCL, KR-FOCL, A-EBL, AUDREY, and more
EITHER • Explanation-based and Inductive Theory Extension and Revision • First system able to fix both over-generalization and over-specialization • Able to correct multiple faults • Uses one or more failings at a time to learn one or more corrections to a theory • Able to correct intermediate points in theories • Uses positive and negative examples • Able to learn disjunctive rules • Specialization algorithm does not allow positives to be eliminated • Generalization algorithm does not allow negatives to be admitted
FORTE • Attempts to prove all positive and negative examples using the current theory • When errors are detected: • Identify all clauses that are candidates for revision • Determine whether clause needs to be specialized or generalized • Determine what operators to test for various revisions • Best revision is determined based on its accuracy when tested on complete training set • Process repeats until system perfectly classifies the training set or until FORTE finds that no revisions improve the accuracy of the theory
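The FORTE loop above can be sketched as a hill-climbing search. The toy "theory" representation, `propose_revisions`, and `accuracy` below are hypothetical stand-ins for FORTE's much richer clause-level operators:

```python
def forte_revise(theory, examples, propose_revisions, accuracy):
    """Hill-climbing revision, as described above: score every candidate
    revision on the full training set, keep the best one, and stop at
    perfect accuracy or when no revision improves accuracy."""
    current, best_acc = theory, accuracy(theory, examples)
    while best_acc < 1.0:
        candidates = propose_revisions(current)
        if not candidates:
            break
        acc, best = max(((accuracy(t, examples), t) for t in candidates),
                        key=lambda pair: pair[0])
        if acc <= best_acc:
            break  # no revision improves training-set accuracy
        current, best_acc = best, acc
    return current

# Toy stand-in: a "theory" is a set of items it classifies as positive,
# and a revision adds or removes a single item.
examples = [("a", True), ("b", True), ("c", False)]

def accuracy(theory, exs):
    return sum((x in theory) == label for x, label in exs) / len(exs)

def propose_revisions(theory):
    items = {x for x, _ in examples}
    return [theory | {i} for i in items - theory] + [theory - {i} for i in theory]

final = forte_revise(frozenset(), examples, propose_revisions, accuracy)
print(sorted(final))  # ['a', 'b']
```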
Specializing a Theory • Needs to happen when one or more negatives are covered • Ways to fix the problem: • Delete a clause: simple, just delete and retest • Add new antecedents to existing clause • More difficult • FORTE uses two methods... • Add one antecedent at a time, like FOIL, choosing the antecedent that provides the best info gain at any point • Relational Pathfinding – uses graph structures to find new relations in data
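The FOIL-style antecedent scoring mentioned above can be sketched as follows. The formula is the standard FOIL information gain; treating each covered positive as a single binding is a simplifying assumption:

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for adding one antecedent.
    p0/n0: positives/negatives covered before adding it; p1/n1: after.
    Gain rewards antecedents that keep many positives while excluding
    negatives."""
    if p1 == 0:
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)

# e.g. adding parent(C,B) to uncle(A,B) :- brother(A,C): suppose the clause
# covered 10 positives / 8 negatives before, and 9 positives / 1 negative after
print(round(foil_gain(10, 8, 9, 1), 2))
```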
Generalizing a Theory • Need to generalize when positives are not covered • Ways FORTE generalizes: • Delete antecedents from an existing clause (either singly or in groups) • Add a new clause • Copy clause identified at the revision point • Purposely over-generalize • Send over-general rule to specialization algorithm • Use inverse relation operators “identification” and “absorption” • These use intermediate rules to provide more options for alternative definitions
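The first generalization operator, deleting antecedents singly, might look like the sketch below. The dict-based example representation and the strict-gain deletion condition are assumptions for illustration:

```python
def covers(antecedents, example):
    # an example is a dict of truth values; a clause covers it when
    # every remaining antecedent holds (hypothetical representation)
    return all(example[a] for a in antecedents)

def generalize_clause(antecedents, positives, negatives):
    """Greedily delete antecedents one at a time, keeping a deletion only
    if it strictly gains positives while still covering no negatives."""
    ants = list(antecedents)
    for a in list(ants):
        trial = [x for x in ants if x != a]
        admits_negative = any(covers(trial, n) for n in negatives)
        pos_before = sum(covers(ants, p) for p in positives)
        pos_after = sum(covers(trial, p) for p in positives)
        if not admits_negative and pos_after > pos_before:
            ants = trial
    return ants

# uncle(A,B) :- brother(A,C), mother(C,B): the mother constraint blocks
# positives reached through a father, so deleting it generalizes the clause
positives = [{"brother": True, "mother": True},
             {"brother": True, "mother": False}]
negatives = [{"brother": False, "mother": False}]
print(generalize_clause(["brother", "mother"], positives, negatives))
# ['brother']
```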
AUDREY II • Runs in two main phases: • Initial domain theory is specialized to eliminate negative coverage • At each step, a best clause is chosen and specialized, and the process repeats • The best clause is the one implicated in the most incorrectly classified negative examples and required by the fewest positives • If the best clause covers no positives, it is deleted; otherwise, literals are added in a FOIL-like manner to eliminate covered negatives
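The best-clause selection described above could be scored as in this sketch; the lexicographic tie-breaking is an assumption, since the slides don't specify how AUDREY II resolves ties:

```python
def best_clause(clauses, neg_covered, pos_requiring):
    """Pick the clause implicated in the most covered negatives,
    breaking ties in favor of the clause required by fewest positives."""
    return max(clauses, key=lambda c: (neg_covered[c], -pos_requiring[c]))

clauses = ["c1", "c2", "c3"]
neg_covered   = {"c1": 5, "c2": 5, "c3": 1}   # negatives each clause lets in
pos_requiring = {"c1": 0, "c2": 4, "c3": 2}   # positives that need the clause
chosen = best_clause(clauses, neg_covered, pos_requiring)
print(chosen)  # 'c1': covers many negatives and is needed by no positives,
               # so per the slide it would simply be deleted
```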
AUDREY II • Revised theory is then generalized to cover all positives (without covering any negatives) • An uncovered positive example is randomly chosen, and the theory is generalized to cover it • Process repeats until all remaining positives are covered • If assumed literals can be removed without decreasing positive coverage, they are removed • If not, AUDREY II tries replacing literals with a new conjunction of literals (also using a FOIL-type process) • If deletion and replacement fail, the system uses a FOIL-like method to induce entirely new clauses for proving the literal
KBANN • System that takes a domain theory of Prolog-style clauses and transforms it into a knowledge-based neural network (KNN) • Uses the knowledge base (background theory) to determine the topology and initial weights of the KNN • Different units and links within the KNN correspond to various components of the domain theory • Topologies of KNNs can differ from the topologies we have seen in standard neural networks
KBANN • KNNs are trained on example data, and rules are extracted using an N-of-M method (saves time) • Domain theories given to KBANN need not contain all the intermediate rules necessary to learn certain concepts • Adding hidden units alongside the units specified by the domain theory allows the network to induce necessary terms not stated in the background info • Problems arise when interpreting intermediate rules learned from hidden units • Difficult to label them based on the inputs they resulted from • In one case, programmers labeled rules based on the section of the network they were attached to in the topology
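A rough sketch of how KBANN maps one conjunctive rule onto a network unit: weight +omega for each positive antecedent, -omega for each negated one, and a bias that makes the unit fire only when the whole conjunction holds. The magnitude omega=4 and the bias formula follow common descriptions of KBANN, but treat the details as assumptions:

```python
import math

def rule_to_unit(n_pos, n_neg, omega=4.0):
    """Translate a conjunctive rule with n_pos positive and n_neg negated
    antecedents into initial weights and a bias for one sigmoid unit."""
    weights = [omega] * n_pos + [-omega] * n_neg
    bias = -(n_pos - 0.5) * omega   # unit exceeds 0.5 only if all hold
    return weights, bias

def unit_output(inputs, weights, bias):
    net = sum(w * x for w, x in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# e.g. uncle(A,B) :- brother(A,C), parent(C,B): two positive antecedents
w, b = rule_to_unit(2, 0)
print(unit_output([1, 1], w, b) > 0.5)   # True: both antecedents hold
print(unit_output([1, 0], w, b) > 0.5)   # False: the conjunction fails
```

The unit starts out computing the symbolic rule; backpropagation then adjusts the weights, which is what later makes the learned intermediate rules hard to label.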
System Comparison • AUDREY II is better than FOCL at theory revision, but it still has room for improvement • Its revised theories are closer to both original theory and human-created correct theory
System Comparison • AUDREY II is slightly more accurate than FORTE, and its revised theories are closer to the original and correct theories • KR-FOCL addresses some issues of other systems by allowing user to decide among changes that have the same accuracy
Applications of Theory Refinement • Used to identify different parts of both DNA and RNA sequences • Used to debug student-written basic Prolog programs • Used to maintain working theories as new data is obtained