Decision Abstention Reduces Errors: A Decision Abstaining N-version Genetic Programming

Kosuke Imamura, Robert B. Heckendorn, Terence Soule, James A. Foster
The Initiative for Bioinformatics and Evolutionary Studies at the University of Idaho; NIH NCRR grant 1P20RR016454-01; NIH NCRR grant 1P20RR016448-01; and NSF grant NSF EPS 809935.

What is the problem with GP?
• Genetic Programming is an unstable algorithm.
• Even a trained individual produces faulty outputs.
• Individuals that are equally fit on the training data may perform very differently on unseen data sets.
• Consequently, given several equally highly fit individuals, choosing one for actual use is a gamble.

The paper in a nutshell
• Reducing the errors of GP, and
• reducing the performance fluctuations of GP.
Questions addressed:
• What is an optimal GP ensemble?
• When should the ensemble abstain from a decision?
Proposal: an ensemble of GP individuals (N-version Genetic Programming, NVGP) with decision abstention

So, what is N-version Programming?
• Correct output: C
• Incorrect output: I
programA: I C I C I C (3 faults)
programB: C C C I I C (2 faults)
programC: C I C C C I (2 faults)
Majority vote: C C C C I C (1 fault)
The individuals average 7/3 ≈ 2.3 faults each, yet the majority vote makes only 1: the vote masks individual faults (fault masking).
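The fault-masking example can be reproduced with a short majority-vote sketch. It assumes a binary decision problem, so a program that is wrong on a case necessarily votes for the other answer, and counting "C" marks per case is the same as counting votes for the true class. The function name `majority_vote` is illustrative, not taken from the paper.

```python
def majority_vote(outputs):
    """Per-case majority vote over equal-length strings of 'C' (correct) / 'I' (incorrect)."""
    n = len(outputs)
    # zip(*outputs) yields one tuple per test case, holding every program's mark for it.
    return "".join("C" if case.count("C") > n // 2 else "I" for case in zip(*outputs))


programs = {"A": "ICICIC", "B": "CCCIIC", "C": "CICCCI"}
vote = majority_vote(list(programs.values()))
avg_faults = sum(p.count("I") for p in programs.values()) / len(programs)

print(vote, vote.count("I"))  # CCCCIC 1
print(round(avg_faults, 1))   # 2.3
```

Masking works here because the programs fail on different cases; if all three failed on the same case, the vote would fail there too.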

Then, our task is …
• Our task is to make sure that fault masking occurs among individuals.
• Phenotypic diversity is a necessary condition (genotypic diversity is disregarded).
• Phenotypic diversity must be defined quantifiably.

What is the definition of diversity? A Probabilistic Approach
• Individuals must be reasonably highly fit (this avoids combining low-fit individuals).
• Independent faults must be observed (quantifiable; individuals learn separately).
• Example: if all individuals have the same fault rate, the expected ensemble fault rate is an area under a binomial probability distribution, namely the probability that a majority of the voters fail.
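If each of N voters (N odd) fails independently with the same probability p, a majority vote fails exactly when at least (N+1)/2 voters fail, so the expected ensemble error is the upper tail of a binomial distribution:

$$P_{\text{fail}}(N, p) = \sum_{k=(N+1)/2}^{N} \binom{N}{k} p^{k} (1-p)^{N-k}$$

A minimal Python sketch of this formula follows. The name `expected_majority_error` is illustrative, and equal, independent fault rates are the idealised assumption of the slide, not a property real GP individuals are guaranteed to have.

```python
import math


def expected_majority_error(n, p):
    """Probability that a majority of n independent voters, each wrong with probability p, is wrong.

    Assumes n is odd, so a majority vote can never tie.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    k_min = (n + 1) // 2  # smallest number of faulty voters that outvotes the rest
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


print(round(expected_majority_error(3, 0.1), 3))  # 0.028
for n in (1, 15, 31):
    print(n, expected_majority_error(n, 0.2))
```

The ensemble error falls as N grows only while p < 0.5; for p > 0.5 the vote amplifies faults instead of masking them, which is one reason the individuals must be reasonably highly fit.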

How do we find a probabilistically optimal ensemble?
1. Mass-produce highly fit individuals (we did this with an isolated island model on a cluster).
2. Combine individuals to form an ensemble.
3. Check whether the ensemble's error rate matches the expected error rate under independent faults.
4. If the error rate is close enough to the expected rate, stop.
5. Otherwise, go back to step 2 and form another ensemble.
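Steps 2 to 5 can be sketched as a search loop, assuming step 1 has already produced a pool of individuals represented by their prediction lists on a binary-labelled data set. Several choices here are assumptions rather than the paper's method: ensembles are drawn at random, "close enough" is an absolute tolerance `tol`, and the expected rate uses the mean individual fault rate as the common p. The helper names are hypothetical.

```python
import math
import random


def error_rate(predictions, labels):
    """Fraction of cases an individual gets wrong."""
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


def majority_error(ensemble, labels):
    """Fraction of cases where the majority of an odd-sized ensemble is wrong (binary labels)."""
    n = len(ensemble)
    wrong = 0
    for i, y in enumerate(labels):
        correct_votes = sum(preds[i] == y for preds in ensemble)
        if correct_votes <= n // 2:  # the correct answer did not win a strict majority
            wrong += 1
    return wrong / len(labels)


def expected_majority_error(n, p):
    """Binomial tail: chance that at least (n+1)/2 of n independent voters fail."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range((n + 1) // 2, n + 1))


def find_ensemble(pool, labels, n, tol=0.01, max_tries=1000, seed=0):
    """Draw random n-member ensembles until the observed majority error is within tol
    of the error expected under independent faults.

    Returns (ensemble, observed_error), or None if no draw qualified.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        ensemble = rng.sample(pool, n)  # step 2: combine individuals
        p = sum(error_rate(ind, labels) for ind in ensemble) / n  # assumed common fault rate
        observed = majority_error(ensemble, labels)  # step 3
        if abs(observed - expected_majority_error(n, p)) <= tol:  # step 4
            return ensemble, observed
    return None  # step 5 repeated max_tries times without success
```

For example, `find_ensemble(pool, labels, 15)` would look for a 15-voter ensemble. A random draw is the simplest way to "form another ensemble"; a greedy or evolutionary selection could replace it without changing the stopping test.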

Contributions of NVGP
• Defines diversity quantifiably at the phenotypic level.
• Provides a theoretically grounded stopping criterion for evolution (an optimal ensemble).
• The proposed diversity metric is applicable to other training-based algorithms, such as neural networks.

The Idea Behind Abstention
• Why can’t a machine say, “I don’t know”?
• With abstention, a machine outputs “Yes”, “No”, or “Don’t know” for a binary decision problem.

NVGP Demonstration Problem (a classification problem)
• E. coli DNA promoter region classification (whether or not a DNA segment is a promoter region)
• Implementation:
  - Linear genome machines
  - Isolated island model
  - Inexpensive Beowulf cluster

Decision Abstention
• A decision abstention occurs when the ensemble modules give no decisive vote.
• A unanimous vote is the most decisive.
• A tie vote is the least decisive.
• A decision requires at least $(N+1)/2 + h$ votes for the same output, where N is the number of voters and h is the abstention threshold.
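A minimal sketch of the abstention rule, assuming an odd number of voters so that $(N+1)/2$ is an integer. The function name `abstaining_vote` is hypothetical, and `None` stands for the "Don't know" output.

```python
from collections import Counter


def abstaining_vote(votes, h):
    """Return the most common output if it has at least (N+1)/2 + h votes, else None ("don't know").

    votes: outputs of the N ensemble members (N assumed odd); h: abstention threshold.
    h = 0 is plain majority voting; h = (N-1)/2 demands a unanimous vote.
    """
    n = len(votes)
    if n % 2 == 0:
        raise ValueError("expected an odd number of voters")
    answer, count = Counter(votes).most_common(1)[0]
    required = (n + 1) // 2 + h
    return answer if count >= required else None


print(abstaining_vote(["yes", "yes", "no"], 0))   # yes
print(abstaining_vote(["yes", "yes", "no"], 1))   # None
print(abstaining_vote(["yes", "yes", "yes"], 1))  # yes
```

Raising h moves the ensemble from "majority is enough" toward "only unanimity counts", trading more "don't know" outputs for fewer errors.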

Results summary (two slides)

Performance of NVGP alone
Figure 1. Error rate distribution intervals of the single best versions and the corresponding N-voter NVGP ensembles at a 90% limit. The leftmost, middle, and rightmost bars show the distributions for the single-version, 15-voter, and 31-voter systems, respectively.

NVGP with decision abstention
[Figure: error rate and abstention rate (y-axis, 0.0 to 1.0) plotted against x-axis values 1 to 15. Annotations: "50% error reduction" and "0% error".]

Effect of decision abstention
Correct: C, Incorrect: I, ? = don't know. Abstention threshold h = 1 (with N = 5 voters, a decision needs 4 agreeing votes).
programA: C C I C I C (2 faults)
programB: C I C I C C (2 faults)
programC: C I C I C I (3 faults)
programD: I C C I C C (2 faults)
programE: C I C C C I (2 faults)
Majority vote: C I C I C C (2 errors)
Abstention:    C ? C ? C ? (0 errors)
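The five-program example can be checked mechanically. This sketch assumes a binary decision problem (so the "C"/"I" marks stand for votes for the two classes) and uses a hypothetical helper `decide` that applies the $(N+1)/2 + h$ rule per case, with h = 0 giving the simple majority vote.

```python
def decide(case, h):
    """Return 'C' or 'I' if it has at least (N+1)/2 + h votes in this case, else '?'."""
    required = (len(case) + 1) // 2 + h
    for answer in ("C", "I"):
        if case.count(answer) >= required:
            return answer
    return "?"


# Rows are programs A to E; columns are the six test cases.
programs = ["CCICIC", "CICICC", "CICICI", "ICCICC", "CICCCI"]
cases = list(zip(*programs))  # one tuple of five marks per test case

majority = "".join(decide(c, 0) for c in cases)
abstain = "".join(decide(c, 1) for c in cases)

print(majority, majority.count("I"))                    # CICICC 2
print(abstain, abstain.count("I"), abstain.count("?"))  # C?C?C? 0 3
```

The three abstentions fall exactly on the cases that split 3 to 2; two of them were majority-vote errors and one (the last case) was a narrow correct vote, which is the price paid for avoiding the errors.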

Trade-off between error reduction and abstention rate
• Adjusted errors: $Q = E_a + \lambda N$, where $E_a$ is the number of errors with abstention, $N$ is the number of "don't know" outputs (not the ensemble size), and $\lambda$ is a penalty weight.
• Trade-off point: $Q = E_0$, where $E_0$ is the number of errors under simple majority voting.
Results with $\lambda = 0.5$:
abstention threshold   test1   test2   test3   test4   test5
0                        6.7     8.0    10.1     7.7     6.8
1                        7.2     8.3    10.0     7.9     7.1
2                        8.5     9.2     9.8     8.4     7.8
3                       10.5    10.5    10.1     9.5     9.3
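The trade-off can be worked through in a few lines, with the adjusted error written as Q = E_a + λN (λ being the penalty weight, 0.5 on the slide). The sketch reuses the five-program example (E_a = 0 with 3 "don't know" outputs versus E_0 = 2) and then picks, for each test, the abstention threshold with the smallest table entry. Reading the table entries as Q values at λ = 0.5 is an assumption, and `adjusted_errors` is a hypothetical name.

```python
def adjusted_errors(e_a, n_dont_know, penalty=0.5):
    """Q = E_a + penalty * N: errors with abstention plus weighted 'don't know' count."""
    return e_a + penalty * n_dont_know


# Five-program example: h = 1 gives 0 errors and 3 abstentions; simple majority gives E0 = 2.
q = adjusted_errors(0, 3)
print(q, q < 2)  # 1.5 True

# Table entries for thresholds h = 0..3, assumed to be Q at penalty 0.5.
table = {
    "test1": [6.7, 7.2, 8.5, 10.5],
    "test2": [8.0, 8.3, 9.2, 10.5],
    "test3": [10.1, 10.0, 9.8, 10.1],
    "test4": [7.7, 7.9, 8.4, 9.5],
    "test5": [6.8, 7.1, 7.8, 9.3],
}
best_h = {test: min(range(len(qs)), key=qs.__getitem__) for test, qs in table.items()}
print(best_h)  # {'test1': 0, 'test2': 0, 'test3': 2, 'test4': 0, 'test5': 0}
```

Under this reading, abstention beats simple majority only on test3 at λ = 0.5; a smaller penalty weight would shift the optimum toward higher thresholds.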

Conclusion
• Abstention avoids random guesses.
• High accuracy can be obtained at a high abstention rate (but too much abstention makes the system of little use).
• Abstention may indicate that the training set was not adequate for particular instances.
• For safety-critical applications, a smaller penalty weight is appropriate in the trade-off analysis; that is, do not penalize an ensemble heavily when it is trying to avoid a random guess.

Future Research
• Embed confidence in individuals,
• so that abstention can occur at both the individual and the ensemble level.
