
Reliability and Information Gain



Presentation Transcript


  1. Reliability and Information Gain Ida Sprinkhuizen-Kuyper Evgueni Smirnov Georgi Nalbantov (UM/EUR)

  2. Outline • Reliability vs. accuracy • Version Spaces • Support Vector Machines • Version Space Support Vector Machines • Beating existing algorithms • Conclusions

  3. Reliability vs. accuracy • Accuracy is a global measure • It gives no information about individual instances • In practice we need the reliability of an individual instance (patient, diagnosis, …) in order to make an acceptable decision

  4. Version Spaces • A version space is the set of all hypotheses consistent with the training set • Strong point: unanimous voting over the version space results in highly reliable predictions • Problem: noise can make the version space collapse (no consistent hypothesis remains)
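The version-space idea with unanimous voting can be sketched on a toy hypothesis class, 1-D threshold classifiers h_t(x) = (x ≥ t). All names and data here are illustrative, not from the talk:

```python
# Minimal version-space sketch: keep every hypothesis consistent with
# the training set, and classify only when all of them agree.

def consistent(threshold, data):
    """True if the threshold hypothesis labels every training example correctly."""
    return all((x >= threshold) == label for x, label in data)

def version_space(thresholds, data):
    """All candidate hypotheses consistent with the training set."""
    return [t for t in thresholds if consistent(t, data)]

def unanimous_vote(vs, x):
    """Classify x only if every hypothesis in the version space agrees."""
    votes = {x >= t for t in vs}
    if len(votes) == 1:
        return votes.pop()   # reliable prediction
    return None              # abstain: hypotheses disagree

# Training set: negatives at 0 and 1, positives at 3 and 4.
data = [(0, False), (1, False), (3, True), (4, True)]
vs = version_space([0.5, 1.5, 2.5, 3.5], data)
print(vs)                     # [1.5, 2.5] remain consistent
print(unanimous_vote(vs, 5))  # True: all hypotheses agree
print(unanimous_vote(vs, 2))  # None: hypotheses disagree, abstain
```

Abstaining when the remaining hypotheses disagree is exactly what makes the non-abstained predictions reliable.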

  5. Support Vector Machines • SVMs and kernel methods make a trade-off between accuracy on the training set and the complexity of the hyperplane generated by the chosen kernel • They try to find a hyperplane with a margin as large as possible and an error term as small as possible
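The two competing terms can be made concrete in 1-D with a small illustrative computation (numbers are ours, not from the talk): the geometric margin 1/|w| versus the total hinge loss of a linear classifier sign(w·x + b).

```python
# Illustrative sketch of the SVM trade-off: margin width versus error term.

def margin_and_error(w, b, data):
    """For a 1-D linear classifier sign(w*x + b) with labels in {-1, +1}:
    return (geometric margin 1/|w|, sum of hinge losses max(0, 1 - y*(w*x + b)))."""
    margin = 1.0 / abs(w)
    error = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in data)
    return margin, error

data = [(0, -1), (1, -1), (3, +1), (4, +1)]
# Both hyperplanes put the boundary at x = 2 and make zero error,
# but the first has a wider margin, so an SVM prefers it.
print(margin_and_error(1.0, -2.0, data))  # (1.0, 0.0)
print(margin_and_error(2.0, -4.0, data))  # (0.5, 0.0)
```

The cost parameter C of a soft-margin SVM weights the error term against the margin term in exactly this kind of objective.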

  6. SVMs (2) • Strong point: natural way to handle noise • Problem: how to measure the reliability of an individual instance

  7. VSSVM • Combine the best of the two worlds! • First attempt: Version Space Support Vector Machines • Implement the concept of unanimous voting by using SVMs: an instance is positive iff no SVM hyperplane exists that classifies it as negative

  8. VSSVM (2) • Find a kernel (RBF or polynomial) and parameters (C: cost of error; G for RBF or E for polynomial) such that the training set is separated by the corresponding SVM • Classification: add the instance as positive (respectively negative) to the training set. If the new set is not separable, the instance is negative (respectively positive). If the set is separable in both cases, the instance is left unclassified.
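The classification rule above can be sketched as follows. As a stand-in for the talk's SVM separability test we use a plain perceptron on 2-D points, which succeeds exactly when the data are linearly separable; this simplification (and all names and data) is ours:

```python
# Sketch of the VSSVM classification rule: try both labels for the new
# instance and keep only the one under which the data stay separable.

def separable(data, max_epochs=100):
    """True if a perceptron finds a strict linear separator w.x + b
    for 2-D points with labels in {-1, +1} within max_epochs passes."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1; w[1] += y * x2; b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False

def vssvm_classify(train, x):
    """Positive iff no consistent separator calls x negative, and vice
    versa; abstain when the set is separable under both labels."""
    as_pos = separable(train + [(x, +1)])
    as_neg = separable(train + [(x, -1)])
    if as_pos and not as_neg:
        return +1
    if as_neg and not as_pos:
        return -1
    return None   # no unanimous decision

train = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((3.0, 0.0), +1), ((4.0, 0.0), +1)]
print(vssvm_classify(train, (5.0, 0.0)))  # +1: not separable with label -1
print(vssvm_classify(train, (2.0, 0.0)))  # None: separable either way, abstain
```

With an RBF kernel almost every labelling is separable, which is why the choice of kernel and parameters on the training set matters in the actual method.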

  9. VSSVM (3) • Results with leave-one-out cross-validation:

  10. Beating Existing Algorithms • VSSVMs give 100% reliability on their coverage c • Consider an arbitrary algorithm with accuracy a (e.g. the best algorithm so far) • How can we beat that?

  11. Beating Existing Algorithms (2) • Answer: information gain! • Algorithm CovA uses the algorithm Cov with coverage c and 100% accuracy on c, and the algorithm A with accuracy a • Theorem: the information gain of CovA is positive with respect to both A and Cov!
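The combination rule is simple: answer with Cov wherever it covers the instance, and fall back to A elsewhere. A minimal sketch, with all names and the toy classifiers invented for illustration:

```python
# Sketch of the combined classifier CovA.

def cov_a(cov_predict, a_predict, x):
    """cov_predict returns a label or None (abstains outside its coverage c);
    a_predict always returns a label with accuracy a."""
    label = cov_predict(x)
    return label if label is not None else a_predict(x)

# Toy example: Cov covers x < 0 with certainty; A always guesses positive.
cov = lambda x: -1 if x < 0 else None
a = lambda x: +1
print(cov_a(cov, a, -2))  # -1, taken from Cov (reliable)
print(cov_a(cov, a, 3))   # +1, taken from the fallback A
```

On the covered fraction c the combined classifier is certain, which is where the information gain over A alone comes from.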

  12. Reliability gives Information Gain • Theorem: Let reliability information ri be given, and let Ea and Er be the entropies given by the accuracy a and the reliabilities ri respectively; then the information gain IG = Ea – Er is positive.
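A worked numerical instance of this claim: the binary entropy implied by a single global accuracy a versus the average entropy of per-instance reliabilities ri. The concrete numbers are ours, chosen so the reliabilities average to a:

```python
# Worked example: entropy of a global accuracy vs. per-instance reliabilities.
import math

def h(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

a = 0.85                      # global accuracy
r = [1.0, 1.0, 0.7, 0.7]      # per-instance reliabilities, mean 0.85
E_a = h(a)                            # entropy from the single accuracy
E_r = sum(h(ri) for ri in r) / len(r) # average entropy from reliabilities
ig = E_a - E_r
print(E_a, E_r, ig)           # IG > 0: the reliabilities carry extra information
```

Because binary entropy is concave, spreading the same average accuracy over certain (ri = 1) and uncertain instances lowers the average entropy, which is the intuition behind the positive information gain.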

  13. Conclusions • Version spaces are powerful! • Implementing them with SVMs handles noisy training sets • Using SVMs with RBF kernels prevents version-space collapse caused by noisy training instances • Unanimous voting results in reliability • Reliability results in information gain

  14. Future Research • Extension to more than two classes • Extension to the non-separable case • Reliability of training instances • …

  15. Example
