Bridging cheminformatics and bioinformatics using protein structures

Bridging cheminformatics and bioinformatics using protein structures Edith Chan Inpharmatica London 10 April 2001

SELECTING THE BEST TARGETS High Validity and Drugability Requires a Unifying Informatics Framework • Disease-association doesn’t make a protein a target - requires validation as point of intervention in pathway • Having good biological rationale doesn’t make a protein tractable to chemistry (drugable) • Genomics, HTS and Combichem have increased numerical throughput many hundred fold - overload of poorly integrated data, shortfall in productivity Target Validation Process Drug Discovery Process Target Selection Disease Target Leads Clinic Bioinformatics Cheminformatics Inpharmatica’s protein structure focus - uniquely placed to assess both parameters 1

BIOPENDIUM AND CHEMATICA ctgacaagtatgaaaacaacaagctgattg tccgcagagggcagtctttctatgtgcaga ttgacctcagtcgtc Biopendium Chematica Genome Data Target Structure Lead Hypotheses protein target validation drug discovery and selection 2

STRUCTURE-BASED METHODS FIND MANY HOMOLOGUES (AND PUTATIVE TARGETS) NOT DETECTABLE FROM SEQUENCE SIMILARITY Biochemical function and drugability defined by 3D structure, not sequence - structure is better conserved AHHLDRPGHNMCEAGFWQPILL Test Sequence 100% % SEQUENCE ID Standard Approaches 30% Inpharmatica AdvancedApproaches 0 3

BIOPENDIUM • Inputs - all public (or proprietary) protein data • Proprietary methods • Genome-Threader • QBI-Y-Blast • Reverse Search Maximisation • Massive computation • 1 million cpu hour set of calculations employing the most advanced algorithms (1100 processor farm) • Applied to 600,000 sequences, 14,000 structures + bound ligands • Yields 670m precalculated protein relationships • Query results in 15 minutes vs. two weeks with traditional bioinformatics in an Oracle database Protein Information • Structures • Sequences • Bound ligands • Families • Functions 4

THE INPHARMATICA BIOPENDIUM Proprietary structures Proprietary seq. ORF prediction Genbank Swissprot Prints Prosite Enzyme PDB Taxonomy Link complementary data in the 7 resources Mask sequences Pairwise sequence searches Precalculated data for 600,000 protein sequences. (scores and alignments for each hit) Threading based approaches Profile based searches Relational Database Processed PDB to XMAS data Ligplot Interactive sequence alignment editor Inpharmatica enhanced RasMol 3D viewer Ligplot ligand interaction editor Inpharmatica Workbench 5

DRUGABLE TARGET DISCOVERY Finding a novel brain metalloprotease BIOPENDIUM • Novel brain protein identified CHEMATICA • Drugable site identified 7

CHEMATICA IS…. Site Identification Site Mapping Fragment Mapping Pharmacophore Generation Gene Family Data Views Ligand 2-D structures Chemical annotation of PDB ‘real’ ligand structures Gene family structures consensus family analysis Database of putative/known binding sites site mapping and pharmacophore generation similarity searching/clustering of sites large scale virtual screening resource 8

Site identification - How sites in a protein structure are delineated? • Sphere is placed between the VDW surfaces of each atom pair. • Any neighbouring atoms penetrating sphere cause its size to be reduced. • Repeat for all possible atom pairs. • Generate surface around surviving sphere to define site region. SURFNET: A program for visualizing molecular surfaces, cavities and intermolecular interactions. Laskowski R A (1995), J. Mol. Graph., 13, 323-330. 9

Physical Parameters of the clefts 8 largest sites are stored together with their physical parameters Volume Hydrophobic content Polar content surface accessibility …… In total - 20 parameters calculated. 10

Prediction of binding/active sites • Rule driven: • use of Neural Netsa on a training set of • 100 ligand/protein PDBs • Validation: • success rate = 90% on a extended set of 500 PDBs a backpropagation net -7-5-1 network 11

How XSITE potential is derived? • 3-D distributions of 20 different atom types about the 20 amino acids are calculated. • No assumption of energy terms. X-SITE: use of empirically derived atomic packing preferences to identify favourable interaction regions in the binding sites of proteins. Laskowski R A, Thornton J M, Humblet C & Singh J (1996), Journal of Molecular Biology, 259, 175-201. 12

Data set Used (1) 521 non-homologus protein chains* from PDB that satisfy no two sequence identity is > 20% resolution <1.8Å R factor < 0.2 AND (2) 376 protein-ligand PDB structures for studying additional atom types other than those from peptides and proteins, such as Cl, F. Note: The PDB has about 14K entries! *cullpdb_pc20_res1.8_R0.2_d001130_chains521 (R. Dunbrack, Jr.) U. Hobohm, M. Scharf, R. Schneider, "Selection of representative protein data sets." Protein Science, 1, 409-417 (1993). 13

Projecting XSITE distributions onto the predicted binding site Application of XSITE distributions to side-chains making up the calculated protein binding site 14

How Pharmacophore is generated? • Compare the XSITE predictions generated for the different probe atoms at a 3D grid of densities encompassing the region of the binding site. • The higher the value at a given grid-point the higher the likelihood of finding that type of atom at that location. • For each probe atom, it derives a “best” map. • The net result is a new set of 3D grid maps, one per probe atom, holding only those regions where that atom scored higher than the others. 15

What is fragments mapping? • In-built database of more than 100 small molecule fragments - most common functional groups and represent the common building blocks that satisfy drug-like elements used in chemistry. • Privileged structures from companies. 16

How is fragments mapping done? • Each atom in a fragment is assigned one of the 20 atom type. • Each fragment is placed at every grid-point within the binding site and subjected to 300 rotations. • At each rotation a score is calculated using the appropriate X-SITE predictions for the atom types that the fragment contains. 17

CHEMATICA Gene Family Data Views • Curated, high-quality annotation and presentation of important ‘drugable’ gene families • NHRs, kinases, caspases, GPCRs,…. • Contains ligand structure information • Contains crystal environment classification • Automatic alerts for newly released structures • Multiple structure comparison options 18

CHEMATICA Consensus Family Analysis • Size and topology of binding sites for MMP-1 & MMP-8 are similar, but detailed interactions differ • Spheres signify negative charge requirement in different areas of the binding pockets • provides potential for specificity MMP-1MMP-8MMP-13MMP-3 19

Validation Study Taken two sets of data from literature 1) GOLD (Jones, Willett, Glen, Leach and Taylor) Genetic Optimization for Ligand Docking (71% success rate in ligand binding mode in 100 pdbs) our method - 70% 2) SUPERSTAR (Verdonk, Cole and Taylor) Empirical method for interactions in proteins (67% success rate for original 4 probes ~67% in 122 pdbs) our method - 84% 1. Jones et al. J. Mol. Biol. (1997) 267, 727-748 2. Verdonk et al. J. Mol. Biol. (1999) 289, 1093-1108 20

Acknowledgements • Inpharmatica • Alex Michie • John Overington • Simon Skidmore • UCL • Roman Laskowski • Adrian Shepherd • Janet Thornton 21

Bridging cheminformatics and bioinformatics using protein structures

Bridging cheminformatics and bioinformatics using protein structures

Presentation Transcript

Protein structures

Visualizing Protein Structures and Structural Bioinformatics

Basics of Protein Bioinformatics and Structural Bioinformatics

Bridging Bioinformatics and Chem(o)informatics

Bioinformatics and Protein Sequence Analysis

Using X-ray structures for bioinformatics

Bioinformatics and Protein Structural Analysis

BIOINFORMATICS Structures

Protein bioinformatics and systems biology

BIOINFORMATICS Structures

BIOINFORMATICS Structures

Protein Structures

BIOINFORMATICS Structures

CAP5510 – Bioinformatics Protein Structures

Bioinformatics Algorithms and Data Structures

Protein Structures and Methods

Protein Structures

BIOINFORMATICS Structures

BIOINFORMATICS Structures

Protein structures

Bridging cheminformatics and bioinformatics using protein structures