Testing the Limits of a QSAR Model:

Testing the Limits of a QSAR Model: How many cases are actually needed to develop a reliable predictive model? C. Matthew Sundling, Curt M. Breneman, Mark J. Embrechts, Changjian Huang, Xiaohua Wu, N. Sukumar. April 9th, 2008.

Presentation Transcript


  1. Testing the Limits of a QSAR Model: How many cases are actually needed to develop a reliable predictive model? C. Matthew Sundling, Curt M. Breneman, Mark J. Embrechts, Changjian Huang, Xiaohua Wu, N. Sukumar April 9th, 2008

  2. Related RECCR Presentations • Dr. Dominic Ryan - Stability of rank-order prediction models of the top ten testing molecules. QSAR model stability: How much information is in the data? (COMP, Morial Convention Center rm. 347, Wednesday, 3:05 pm) • Prof. Mark Embrechts - One-class SVM for outlier detection and applicability of a model to a specific testing set. Testing the validity range of QSAR models using one-class support vector machines. (COMP, Morial Convention Center rm. 347, Thursday, 1:30 pm)

  3. Objective • How do models function as training data is reduced? • Hypothesis: if (1) a model is "stable" and (2) the descriptors are appropriate to the effect being modeled, then a great deal of training information could be removed without the testing predictions degrading significantly.

  4. Datasets (ordered by difficulty) • BP - boiling points (298 compounds) • ACE - Angiotensin-Converting Enzyme inhibitors (112 compounds) • AChE - acetylcholinesterase inhibitors (60 compounds) • Lombardo - blood-brain barrier (BBB) partitioning (70 compounds) • Artemisinin - anti-malarial compounds (179 compounds)

  5. Descriptors • MOE - “classic” 2D descriptors • TAE - electron density derived surface property distributions • SS - surface statistics of TAE property distributions • Wavelets - alternative representation of TAE information • PEST - shape-property 3D hybrid of TAE property distributions • ALL - combination of all descriptors

  6. Testing Procedure [Diagram: the dataset is split into a training set (70%) and a testing set (30%); a subset of the training set (e.g., 90%) is used to build the model, which is evaluated on the same 30% testing set.] PLS models of five components were used throughout the study.
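The procedure on this slide might be sketched as follows. This is a minimal illustration, not the authors' code: the NIPALS-based PLS routine, the function names (`pls1_fit`, `pls1_predict`, `reduction_study`), the synthetic data, the subset fractions, and the use of test-set R² as the performance measure are all assumptions made for the example. Only the 70/30 split and the five PLS components come from the slide.

```python
import numpy as np

def pls1_fit(X, y, n_components=5):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # centered, then deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                      # no covariance left to model
            break
        w = w / norm
        t = Xr @ w                            # scores for this component
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = yr @ t / tt                       # y loading
        Xr = Xr - np.outer(t, p)              # deflate X
        yr = yr - c * t                       # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)    # regression vector in X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def reduction_study(X, y, fractions, rng, n_components=5):
    """Hold out 30% for testing, then model shrinking subsets of the 70% training set."""
    n = len(y)
    order = rng.permutation(n)
    n_train = int(round(0.7 * n))
    train, test = order[:n_train], order[n_train:]
    results = {}
    for frac in fractions:
        k = max(n_components + 1, int(round(frac * n_train)))
        subset = rng.choice(train, size=k, replace=False)
        model = pls1_fit(X[subset], y[subset], n_components)
        results[frac] = r_squared(y[test], pls1_predict(model, X[test]))
    return results

# Synthetic stand-in for a descriptor matrix and a measured property.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + rng.normal(scale=0.1, size=200)
scores = reduction_study(X, y, fractions=[0.9, 0.5, 0.2], rng=rng)
```

The same testing set is reused for every subset, so any change in the score reflects only the reduction in training data.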

  7. Typical Results [Figure panels: AChE, BP, Artemisinin, ACE]

  8. Repeating Testing Procedure [Diagram: many subsets (e.g., 90%) are drawn from the 70% training set, and each resulting model is evaluated on the same 30% testing set.]

  9. Multiple Training Sets

  10. Performance Instability

  11. Multiple Testing Sets [Diagram: the dataset is split repeatedly into a training set (70%) and a testing set (30%), and the modeling study is repeated for each split.]
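One way to generate the repeated 70/30 splits is sketched below. The function name `repeated_splits`, the seed, and the number of repeats are hypothetical; the slide only states that the split and the modeling study are repeated.

```python
import random

def repeated_splits(n_samples, n_repeats, test_fraction=0.3, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per random split."""
    rng = random.Random(seed)
    n_test = round(test_fraction * n_samples)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)                  # fresh random ordering for each repeat
        test = sorted(indices[:n_test])
        train = sorted(indices[n_test:])
        splits.append((train, test))
    return splits

# Five independent 70/30 splits of a 100-compound dataset.
splits = repeated_splits(n_samples=100, n_repeats=5)
```

Each pair of index lists can then be fed to the same modeling study, so that performance is summarized over several testing sets instead of one.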

  12. Multiple Testing Sets

  13. Distance(trainingset,testset) Can I understand more about the relationship between the training data and the testing data?

  14. Distance(trainingset,testset) Distance function = sum, over all testing molecules, of the Euclidean distance from each testing molecule to its nearest-neighbor training molecule
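The distance function defined on this slide can be written in a few lines. The sketch assumes each molecule is represented as a tuple of descriptor values; the slide does not say which descriptor space or scaling was used, and the name `train_test_distance` is hypothetical.

```python
import math

def train_test_distance(train, test):
    """Sum over testing molecules of the distance to the nearest training molecule."""
    return sum(
        min(math.dist(test_mol, train_mol) for train_mol in train)
        for test_mol in test
    )

# Toy two-descriptor example (values are illustrative only).
train = [(0.0, 0.0), (3.0, 4.0)]
test = [(0.0, 1.0), (3.0, 0.0)]
total = train_test_distance(train, test)  # 1.0 + 3.0 = 4.0
```

A small value means every testing molecule has a close neighbor in the training set; a large value means the testing set lies far from the training data.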

  15. Distance(trainingset,testset)

  16. Distance(trainingset,testset)

  17. Conclusion: Stable?

  18. Conclusion: Data vs. Information?

  19. Conclusion: Potential applications? Extend ensemble models to include stability analysis?

  20. Conclusion: Your model’s performance is dependent on your training data. (Duh!) It’s hard to know when you have enough.

  21. Thanks! Curt M. Breneman Mark J. Embrechts Changjian Huang Xiaohua Wu N. Sukumar
