Analyzing the Simplicial Decomposition of Spatial Protein Structures Rafael Ördög, Zoltán Szabadka, Vince Grolmusz
Aims of our research • Aims • Easy to use protein database containing relevant geometrical data on proteins. (Capable of treating thousands of PDB entries at once.) • Drug discovery by data mining in the database.
Steps of our research • Steps • Cleaning and restructuring the PDB (RS-PDB) • Done by Zoltán Szabadka • Creating a database of geometrical & chemo-geometrical data • Under construction in our present research • Discovering rules and creating learning systems for ligand pre-docking • Mostly later work
Delaunay Decompositions • To find the Delaunay decomposition of a point set, we used the Qhull algorithm; its source is available at http://www.qhull.org/.
Important properties of Delaunay decompositions • A region is defined by its circumsphere being empty: no point of the set lies inside it (so the region itself is empty as well) • Regions are tetrahedra unless more than four points lie on the same sphere.
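The empty-circumsphere property can be checked directly for a single candidate tetrahedron. The following is a minimal sketch, not the Qhull implementation used in this work: the function names `circumsphere` and `is_delaunay_region` and the tolerance `eps` are assumptions made for illustration.

```python
from math import dist


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circumsphere(p0, p1, p2, p3):
    """Return (center, radius) of the sphere through four non-coplanar points."""
    others = (p1, p2, p3)
    # Equal distance from the center c to p0 and to pk gives a linear system:
    # 2 (pk - p0) . c = |pk|^2 - |p0|^2
    rows = [[2 * (pk[j] - p0[j]) for j in range(3)] for pk in others]
    rhs = [sum(x * x for x in pk) - sum(x * x for x in p0) for pk in others]
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("points are (nearly) coplanar")
    # Solve the 3x3 system with Cramer's rule.
    center = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / d)
    center = tuple(center)
    return center, dist(center, p0)


def is_delaunay_region(tetra, other_points, eps=1e-9):
    """True if no other point lies strictly inside the circumsphere of tetra."""
    center, radius = circumsphere(*tetra)
    return all(dist(center, q) >= radius - eps for q in other_points)
```

For example, `is_delaunay_region(((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)), [(2, 2, 2)])` holds, while adding the point `(0.5, 0.5, 0.5)` (the circumcenter itself) violates the property.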
Important properties of Delaunay decompositions • The regions form a partition of the convex hull of the point set A. • The graph defined by the edges of the Delaunay regions is the Delaunay graph • It can be used for searching closest neighbors
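A sketch of how the Delaunay graph can be read off the regions and used for a closest-neighbor query. It assumes the tetrahedra are already available as 4-tuples of point indices (for instance from Qhull output); the names `delaunay_graph` and `nearest_neighbor` are hypothetical. The query relies on the known fact that the nearest neighbor of a point of the set is always one of its Delaunay neighbors.

```python
from itertools import combinations
from math import dist


def delaunay_graph(tetrahedra):
    """Build an adjacency map from tetrahedra given as 4-tuples of point indices."""
    graph = {}
    for tet in tetrahedra:
        # Each tetrahedron contributes its six edges to the graph.
        for a, b in combinations(tet, 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def nearest_neighbor(points, graph, i):
    """Closest point to points[i], searched only among its Delaunay neighbors."""
    return min(graph[i], key=lambda j: dist(points[i], points[j]))


# Two tetrahedra sharing the face (1, 2, 3).
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1.2, 1.2, 1.2)]
graph = delaunay_graph([(0, 1, 2, 3), (1, 2, 3, 4)])
```

In this small example points 0 and 4 are not adjacent, so a neighbor search from point 0 only has to inspect points 1, 2 and 3.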
Delaunay decomposition of heavy atoms of the protein in 1n9c with the ligand
Important properties of Delaunay decompositions • The “dual” structure (the Voronoi diagram) can solve the Post Office problem. • Partitioning the city into service areas of given post offices, so that every location belongs to its closest post office. • Duality here is only theoretical; in practice it is the same structure.
Previous work • Singh, Tropsha and Vaisman • The point set was chosen to be the set of Cα atoms of the protein • Aim: predict secondary protein structure • In contrast: we chose the point set to be the set of all heavy (non-hydrogen) atoms.
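Volume and tetrahedrality • Tetrahedrality: $T = \dfrac{\sum_{i<j} (l_i - l_j)^2}{15 \left( \sum_i l_i / 6 \right)^2}$, where the $l_i$ are the six edge lengths of the tetrahedron • $T = 0$ for regular tetrahedra, and $T < 1$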
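A minimal sketch of the two region descriptors, assuming the tetrahedrality on this slide is the sum of squared pairwise differences of the six edge lengths divided by 15 times the squared mean edge length. The function names are illustrative and are not taken from the database code.

```python
from itertools import combinations
from math import dist


def volume(p0, p1, p2, p3):
    """Volume of a tetrahedron: |a . (b x c)| / 6 with edge vectors from p0."""
    a = [p1[k] - p0[k] for k in range(3)]
    b = [p2[k] - p0[k] for k in range(3)]
    c = [p3[k] - p0[k] for k in range(3)]
    triple = (
        a[0] * (b[1] * c[2] - b[2] * c[1])
        - a[1] * (b[0] * c[2] - b[2] * c[0])
        + a[2] * (b[0] * c[1] - b[1] * c[0])
    )
    return abs(triple) / 6


def tetrahedrality(p0, p1, p2, p3):
    """Sum over edge pairs of (l_i - l_j)^2, divided by 15 * (mean edge length)^2."""
    lengths = [dist(p, q) for p, q in combinations((p0, p1, p2, p3), 2)]
    mean = sum(lengths) / 6
    spread = sum((li - lj) ** 2 for li, lj in combinations(lengths, 2))
    return spread / (15 * mean ** 2)


# A regular tetrahedron (alternate corners of a cube) and a right-angled one.
regular = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))
corner = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
```

Here `tetrahedrality(*regular)` vanishes up to rounding, while `tetrahedrality(*corner)` is strictly positive; `volume(*corner)` is 1/6.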
Frequency • Two dimensional temperature plots of the frequency of regions with given volume and tetrahedrality. • In all proteins (Our whole database) • In a given protein
Classifying by corner atoms • Question: are the different peaks in the earlier plots related to the function of the corner atoms? • Classification by the element symbols of the corner atoms • Classification by the hetid of the residues the atoms are found in. • Question: How frequent are the different corner atom sets?
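Classification by corner symbols amounts to mapping each region to the sorted multiset of its four element symbols and counting. The sketch below assumes each region is given simply as a sequence of four element symbols; `corner_type` and `corner_type_frequencies` are hypothetical helper names, and the sample data are made up for illustration.

```python
from collections import Counter


def corner_type(symbols):
    """Order-independent label of a region, e.g. ('S', 'N', 'O', 'S') -> 'NOSS'."""
    return "".join(sorted(symbols))


def corner_type_frequencies(regions):
    """Count how often each corner-symbol type occurs among the regions."""
    return Counter(corner_type(symbols) for symbols in regions)


# Made-up sample input: one tuple of element symbols per tetrahedral region.
sample_regions = [
    ("S", "N", "O", "S"),
    ("O", "S", "S", "N"),
    ("C", "N", "O", "O"),
]
frequencies = corner_type_frequencies(sample_regions)
```

Sorting the symbols makes the label independent of the order in which the corners are listed, so both permutations of N, O, S, S above fall into the same class. The same counting works for hetid-based classification by passing residue identifiers instead of element symbols.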
Frequency of metals in different types of tetrahedra • Ca appears almost exclusively in the vicinity of four oxygen atoms • Zn prefers NOSS and NNNO type tetrahedra, but is also frequent in CNOO, NNOO and NOOO • Only Zn was found in NOSS
About the geometric extension • Presently we cannot handle: • Missing atoms • Precision errors, non-tetrahedral regions • The PDB is handled as a juggled input • The resulting database can only be used for quality statistical purposes. • Strongly restricted database. • No missing atoms, 2.2 Å resolution, includes protein • 5757 such PDB entries (June 23, 2006) • Our current research addresses the problems above.
Recent problems • For example, the atoms of an aromatic ring should lie on one circle, in one plane, hence on one sphere, but they refuse to: • The distortion is minor, not recognizable by eye • Is it just measurement error? • Or is it due to the structure around the ring? • In contrast, some atoms not expected to fall on one sphere tend to do so.
Structure of the geometric extension • Essential: • Corner • Reference to the atoms in the RS-PDB • Region • the radius and the coordinates of the center of the circumsphere • volume and tetrahedrality of the tetrahedron • codes for three types of bond graphs • hetid, atom name, and symbol set assigned to the region's corner set, and more • Additional: Edge, Neighbor, (Ligand) Atom
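One way to picture the essential records is as two small data classes. This is only an illustrative sketch: all class and field names are assumptions rather than the actual schema, and the bond graph codes are left out because their encoding is not described here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Corner:
    """One corner of a region; atom_id is a hypothetical key into the RS-PDB."""
    atom_id: int


@dataclass
class Region:
    """One Delaunay region with its geometric and chemical descriptors."""
    corners: Tuple[Corner, Corner, Corner, Corner]
    center: Tuple[float, float, float]  # center of the circumsphere
    radius: float                       # radius of the circumsphere
    volume: float
    tetrahedrality: float
    symbol_set: str                     # e.g. "NOSS"
    hetids: Tuple[str, ...] = ()        # residue identifiers of the corner atoms
    atom_names: Tuple[str, ...] = ()    # PDB atom names of the corner atoms


# Made-up example record.
example = Region(
    corners=(Corner(1), Corner(2), Corner(3), Corner(4)),
    center=(0.5, 0.5, 0.5),
    radius=0.866,
    volume=1 / 6,
    tetrahedrality=0.03,
    symbol_set="CNOO",
)
```

Keeping only a reference to the atom in each corner avoids duplicating coordinates that already live in the RS-PDB.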