A machine learning approach to identifying false positives in chemical drug screens.
 
                
PAINS Train: Identifying false positives in drug screens
Derrick DeConti
Insight Health Data Science
Drug Screens
[Figure: target of interest; specific vs. non-specific (promiscuous) compounds across Targets A, B, C, and D]
Pan Assay Interference Compounds (PAINS)
Current Practice
●Experience
  – Learned heuristics
●Structural similarity
  – Hard filters
Drawbacks
●Poor sensitivity
●Poor precision
●High false discovery rate (FDR)
Machine Learning Approach
●Based on chemical structure
●Creates its own substructure classification
●Provides a likelihood of promiscuity
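The slide does not say how the likelihood is computed. A minimal sketch, assuming it is the share of decision trees in a random forest (the model named later in the talk) that vote "promiscuous"; the function name `promiscuity_likelihood` and the vote data are hypothetical.

```python
def promiscuity_likelihood(tree_votes):
    """Fraction of trees that voted 'promiscuous' (label 1) for one compound.

    tree_votes: list of 0/1 predictions, one per tree in a hypothetical forest.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    return sum(tree_votes) / len(tree_votes)

# Hypothetical votes from a 10-tree forest for three compounds
votes = {
    "compound_a": [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
    "compound_b": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "compound_c": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

# Rank compounds from most to least likely to be promiscuous
ranked = sorted(votes, key=lambda name: promiscuity_likelihood(votes[name]), reverse=True)
for name in ranked:
    print(name, promiscuity_likelihood(votes[name]))
```

In scikit-learn, `RandomForestClassifier.predict_proba` returns a comparable per-class score, though it averages each tree's leaf probabilities rather than counting hard votes.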
Labeled Data
●PAINS
  – Baell 2010¹
  – Empirically derived
●Non-PAINS
  – ChEMBL
1. Baell, J. B., & Holloway, G. A. (2010). New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. Journal of Medicinal Chemistry, 53(7), 2719–2740. doi:10.1021/jm901137j
Transformation of Data with RDKit
●Each compound becomes a 2,048-bit vector: [1 0 0 … 0 1 0]
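The slide does not name the RDKit fingerprint. A common choice is a Morgan (circular) fingerprint folded to 2,048 bits, e.g. `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)` (newer releases favor `rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)`), but that choice is an assumption. The stdlib-only sketch below shows just the folding idea: each substructure identifier is hashed to one of 2,048 positions and that bit is set. The identifiers and the `fold_features` helper are hypothetical stand-ins for what RDKit would enumerate from a real molecule.

```python
import hashlib

N_BITS = 2048  # vector length shown on the slide

def fold_features(features, n_bits=N_BITS):
    """Hash each substructure identifier into one of n_bits positions.

    Distinct identifiers can collide on the same bit, as in any folded
    fingerprint, so the number of set bits is at most len(features).
    """
    bits = [0] * n_bits
    for feature in features:
        # sha1 gives a hash that is stable across runs (str hash() is salted)
        digest = hashlib.sha1(feature.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits

# Hypothetical substructure identifiers for one compound
fp = fold_features(["benzene_ring", "carbonyl", "catechol", "rhodanine"])
print(len(fp), sum(fp))
```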
Clustering of PAINS
●Overlap in cluster-based classification
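The slides do not specify the clustering algorithm or similarity measure. A minimal sketch, assuming Tanimoto similarity on fingerprints and Taylor-Butina clustering (the scheme RDKit ships as `rdkit.ML.Cluster.Butina`). Fingerprints are given as sets of on-bit indices; the four toy compounds and their labels are invented to show how a single cluster can mix PAINS and non-PAINS.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def butina_cluster(fps, cutoff=0.5):
    """Taylor-Butina clustering; cutoff is a Tanimoto distance (1 - similarity)."""
    n = len(fps)
    neighbors = [
        [j for j in range(n) if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= cutoff]
        for i in range(n)
    ]
    # Compounds with the most neighbors are tried as cluster centroids first
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in neighbors[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters

# Toy fingerprints and labels (invented for illustration)
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {100, 200, 300}]
labels = ["PAINS", "PAINS", "non-PAINS", "non-PAINS"]

clusters = butina_cluster(fps)
mixed = [c for c in clusters if len({labels[i] for i in c}) > 1]
print(clusters)  # [[0, 1, 2], [3]]
print(mixed)     # [[0, 1, 2]]
```

A mixed cluster like the first one is one way overlap shows up: a rule that labels compounds by cluster membership cannot separate its members.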
Predictive Classification
Random Forest
●Binary format of data
●Interdependence within vector
Testing
●5-fold cross-validation
●20% held out as a validation set
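A minimal sketch of the testing scheme, assuming the 20% validation set is held out first and 5-fold cross-validation runs on the remaining 80% (the slide does not state the order). The function name `holdout_then_kfold` is hypothetical, and model fitting is left as a comment. With scikit-learn, `train_test_split(X, y, test_size=0.2)` followed by `cross_val_score(RandomForestClassifier(), X_train, y_train, cv=5)` would cover the same steps.

```python
import random

def holdout_then_kfold(n_samples, holdout_frac=0.2, k=5, seed=0):
    """Shuffle sample indices, hold out a validation set, split the rest into k folds."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_holdout = round(n_samples * holdout_frac)
    holdout, rest = idx[:n_holdout], idx[n_holdout:]
    folds = [rest[f::k] for f in range(k)]  # round-robin assignment to k folds
    return holdout, folds

holdout, folds = holdout_then_kfold(100)
for f, test_fold in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # Fit the random forest on train_idx and score it on test_fold here
    assert not set(train_idx) & set(test_fold)
print(len(holdout), [len(fold) for fold in folds])  # 20 [16, 16, 16, 16, 16]
```

If PAINS are rare in the labeled data, a stratified split (scikit-learn's `StratifiedKFold`) keeps the class ratio similar across folds.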
Secondary Chemical Set
●Neglected tropical disease compound set
●Composed of two structural classes
●Drug-like molecules
Validation
●Spike PAINS into the secondary chemical set
●Test the previously trained random forest
  – Versus the cluster-based method
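A minimal sketch of the spike-in comparison, assuming both methods output a yes/no PAINS call per compound. The compound names, helper names, and predictions are invented to show the bookkeeping; they are not results from the talk. The metrics mirror the drawbacks listed under current practice: sensitivity, precision, and false discovery rate (FDR = FP / (TP + FP)).

```python
import random

def spike_in(background, pains, seed=0):
    """Mix known PAINS (label 1) into a background set (label 0), then shuffle."""
    data = [(c, 0) for c in background] + [(c, 1) for c in pains]
    random.Random(seed).shuffle(data)
    return data

def screen_metrics(y_true, y_pred):
    """Sensitivity, precision, and false discovery rate for binary PAINS calls."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    flagged = tp + fp
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / flagged if flagged else 0.0,
        "fdr": fp / flagged if flagged else 0.0,
    }

# Toy set: 6 hypothetical background compounds with 4 PAINS spiked in
data = spike_in([f"ntd_{i}" for i in range(6)], [f"pains_{i}" for i in range(4)])
y_true = [label for _, label in data]

# Invented calls: the forest misses one PAINS and flags one background compound;
# the cluster rule catches every PAINS but over-flags the background
rf_flags = {"pains_0", "pains_1", "pains_2", "ntd_0"}
cluster_flags = {"pains_0", "pains_1", "pains_2", "pains_3", "ntd_0", "ntd_1", "ntd_2"}

for method, flags in [("random forest", rf_flags), ("cluster-based", cluster_flags)]:
    y_pred = [int(c in flags) for c, _ in data]
    print(method, screen_metrics(y_true, y_pred))
```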
FDA Drug Promiscuity
●1,055 FDA-approved drugs
  – 38 classified as promiscuous
●86% cluster closely
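The slide reports counts only. A minimal sketch of how per-compound likelihoods could be turned into promiscuous calls across a drug library, assuming a 0.5 cutoff (not stated on the slide); the helper `flag_promiscuous` and the likelihood values are hypothetical. The last line computes the share implied by the slide's own counts: 38 of 1,055 is about 3.6%.

```python
def flag_promiscuous(likelihoods, threshold=0.5):
    """Return names of compounds whose predicted likelihood meets the cutoff."""
    return [name for name, p in likelihoods.items() if p >= threshold]

# Invented likelihoods for a few library members
scores = {"drug_1": 0.92, "drug_2": 0.08, "drug_3": 0.51, "drug_4": 0.49}
print(flag_promiscuous(scores))  # ['drug_1', 'drug_3']

# Share of FDA-approved drugs classified as promiscuous, from the slide's counts
print(f"{38 / 1055:.1%}")  # 3.6%
```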
About Me Bioinformatics Genomics