
Reliability and Information Gain



Presentation Transcript


  1. Reliability and Information Gain Ida Sprinkhuizen-Kuyper Evgueni Smirnov Georgi Nalbantov (UM/EUR)

  2. Outline • Reliability vs. accuracy • Version Spaces • Support Vector Machines • Version Space Support Vector Machines • Beating existing algorithms • Conclusions

  3. Reliability vs. accuracy • Accuracy is a global measure • It gives no information about individual instances • In practice we need the reliability of an individual instance (patient, diagnosis, …) in order to make an acceptable decision

  4. Version Spaces • A version space is the set of all hypotheses consistent with the training set • Strong point: unanimous voting over the version space results in highly reliable predictions • Problem: noise can make the version space collapse (no consistent hypothesis remains)
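The version-space idea with unanimous voting can be sketched on a toy hypothesis class, 1-D threshold classifiers h_t(x) = (x ≥ t). All names and data here are illustrative, not from the talk:

```python
# Minimal version-space sketch: keep every hypothesis consistent with
# the training set, and classify only when all of them agree.

def consistent(threshold, data):
    """True if the threshold hypothesis labels every training example correctly."""
    return all((x >= threshold) == label for x, label in data)

def version_space(thresholds, data):
    """All candidate hypotheses consistent with the training set."""
    return [t for t in thresholds if consistent(t, data)]

def unanimous_vote(vs, x):
    """Classify x only if every hypothesis in the version space agrees."""
    votes = {x >= t for t in vs}
    if len(votes) == 1:
        return votes.pop()   # reliable prediction
    return None              # abstain: hypotheses disagree

# Training set: negatives at 0 and 1, positives at 3 and 4.
data = [(0, False), (1, False), (3, True), (4, True)]
vs = version_space([0.5, 1.5, 2.5, 3.5], data)
print(vs)                     # [1.5, 2.5] remain consistent
print(unanimous_vote(vs, 5))  # True: all hypotheses agree
print(unanimous_vote(vs, 2))  # None: hypotheses disagree, abstain
```

Abstaining when the remaining hypotheses disagree is exactly what makes the non-abstained predictions reliable.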

  5. Support Vector Machines • SVMs and kernel methods make a trade-off between accuracy on the training set and the complexity of the hyperplane generated by the chosen kernel • They try to find a hyperplane with a margin as large as possible and an error term as small as possible
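The two competing terms can be made concrete in 1-D with a small illustrative computation (numbers are ours, not from the talk): the geometric margin 1/|w| versus the total hinge loss of a linear classifier sign(w·x + b).

```python
# Illustrative sketch of the SVM trade-off: margin width versus error term.

def margin_and_error(w, b, data):
    """For a 1-D linear classifier sign(w*x + b) with labels in {-1, +1}:
    return (geometric margin 1/|w|, sum of hinge losses max(0, 1 - y*(w*x + b)))."""
    margin = 1.0 / abs(w)
    error = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in data)
    return margin, error

data = [(0, -1), (1, -1), (3, +1), (4, +1)]
# Both hyperplanes put the boundary at x = 2 and make zero error,
# but the first has a wider margin, so an SVM prefers it.
print(margin_and_error(1.0, -2.0, data))  # (1.0, 0.0)
print(margin_and_error(2.0, -4.0, data))  # (0.5, 0.0)
```

The cost parameter C of a soft-margin SVM weights the error term against the margin term in exactly this kind of objective.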

  6. SVMs (2) • Strong point: natural way to handle noise • Problem: how to measure the reliability of an individual instance

  7. VSSVM • Combine the best of the two worlds! • First attempt: Version Space Support Vector Machines • Implement the concept of unanimous voting by using SVMs: an instance is positive iff no SVM hyperplane exists that classifies it as negative

  8. VSSVM (2) • Find a kernel (RBF or polynomial) and parameters (C: cost of error; G for RBF or E for polynomial) such that the training set is separated by the corresponding SVM • Classification: add the instance as positive (respectively negative) to the training set. If the new set is not separable, the instance is negative (respectively positive). If the set is separable in both cases, the instance is left unclassified.
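The classification rule above can be sketched as follows. As a stand-in for the talk's SVM separability test we use a plain perceptron on 2-D points, which succeeds exactly when the data are linearly separable; this simplification (and all names and data) is ours:

```python
# Sketch of the VSSVM classification rule: try both labels for the new
# instance and keep only the one under which the data stay separable.

def separable(data, max_epochs=100):
    """True if a perceptron finds a strict linear separator w.x + b
    for 2-D points with labels in {-1, +1} within max_epochs passes."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1; w[1] += y * x2; b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False

def vssvm_classify(train, x):
    """Positive iff no consistent separator calls x negative, and vice
    versa; abstain when the set is separable under both labels."""
    as_pos = separable(train + [(x, +1)])
    as_neg = separable(train + [(x, -1)])
    if as_pos and not as_neg:
        return +1
    if as_neg and not as_pos:
        return -1
    return None   # no unanimous decision

train = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((3.0, 0.0), +1), ((4.0, 0.0), +1)]
print(vssvm_classify(train, (5.0, 0.0)))  # +1: not separable with label -1
print(vssvm_classify(train, (2.0, 0.0)))  # None: separable either way, abstain
```

With an RBF kernel almost every labelling is separable, which is why the choice of kernel and parameters on the training set matters in the actual method.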

  9. VSSVM (3) • Results with leave-one-out cross-validation:

  10. Beating Existing Algorithms • VSSVMs give 100% reliability on their coverage c • Consider an arbitrary algorithm with accuracy a (e.g. the best algorithm so far) • How can we beat that?

  11. Beating Existing Algorithms (2) • Answer: information gain! • Algorithm CovA uses the algorithm Cov with coverage c and 100% accuracy on c, and the algorithm A with accuracy a • Theorem: the information gain of CovA is positive with respect to both A and Cov!
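The combination rule is simple: answer with Cov wherever it covers the instance, and fall back to A elsewhere. A minimal sketch, with all names and the toy classifiers invented for illustration:

```python
# Sketch of the combined classifier CovA.

def cov_a(cov_predict, a_predict, x):
    """cov_predict returns a label or None (abstains outside its coverage c);
    a_predict always returns a label with accuracy a."""
    label = cov_predict(x)
    return label if label is not None else a_predict(x)

# Toy example: Cov covers x < 0 with certainty; A always guesses positive.
cov = lambda x: -1 if x < 0 else None
a = lambda x: +1
print(cov_a(cov, a, -2))  # -1, taken from Cov (reliable)
print(cov_a(cov, a, 3))   # +1, taken from the fallback A
```

On the covered fraction c the combined classifier is certain, which is where the information gain over A alone comes from.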

  12. Reliability gives Information Gain • Theorem: Let reliability information ri be given, and let Ea and Er be the entropies given by the accuracy a and the reliabilities ri respectively; then the information gain IG = Ea – Er is positive.
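A worked numerical instance of this claim: the binary entropy implied by a single global accuracy a versus the average entropy of per-instance reliabilities ri. The concrete numbers are ours, chosen so the reliabilities average to a:

```python
# Worked example: entropy of a global accuracy vs. per-instance reliabilities.
import math

def h(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

a = 0.85                      # global accuracy
r = [1.0, 1.0, 0.7, 0.7]      # per-instance reliabilities, mean 0.85
E_a = h(a)                            # entropy from the single accuracy
E_r = sum(h(ri) for ri in r) / len(r) # average entropy from reliabilities
ig = E_a - E_r
print(E_a, E_r, ig)           # IG > 0: the reliabilities carry extra information
```

Because binary entropy is concave, spreading the same average accuracy over certain (ri = 1) and uncertain instances lowers the average entropy, which is the intuition behind the positive information gain.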

  13. Conclusions • Version spaces are powerful! • Implementing them with SVMs handles noisy training sets • Using SVMs with RBF kernels prevents version-space collapse caused by noisy training instances • Unanimous voting results in reliability • Reliability results in information gain

  14. Future Research • Extension to more than two classes • Extension to the non-separable case • Reliability of training instances • …

  15. Example
