This research paper discusses the problem of simplifying proteins for various computational applications, such as visualization, matching, and searching in protein databases. It introduces a new algorithm for line segment-based simplification of proteins, with better performance than previous solutions. Experimental results demonstrate the effectiveness of the proposed algorithm on protein chains with thousands of atoms.
Stabbing balls and simplifying proteins Ovidiu Daescu and Jun Luo Department of Computer Science University of Texas at Dallas Richardson, TX 75080
Problem definition • Input: an indexed sequence of balls $B = \{B_1, B_2, \ldots, B_n\}$ in $R^3$, with each $B_i$ specified by a center and radius pair $(p_i, \varepsilon_i)$. • Let $C = \{p_1, p_2, \ldots, p_n\}$ be the set of center points. • Find a set of stabbers defined by a subset $P = \{p_{i_1}, p_{i_2}, \ldots, p_{i_m}\}$ of $C$ such that: (i) $i_1 = 1$, $i_m = n$, and $i_j \in \{1, 2, \ldots, n\}$ for $j = 1, 2, \ldots, m$; (ii) $i_j < i_{j+1}$ for $j = 1, 2, \ldots, m-1$; (iii) the line segment $p_{i_j}p_{i_{j+1}}$ (or the line through $p_{i_j}$ and $p_{i_{j+1}}$) stabs each of the balls $\{B_{i_j}, B_{i_j+1}, \ldots, B_{i_{j+1}}\}$; (iv) there is no other subset $P'$ of $C$ that satisfies the first three conditions and is smaller than $P$, i.e., $m$ is minimized. [Figure: a ball $B_i$ with center $p_i$ and radius $\varepsilon_i$, crossed by a stabber]
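The conditions above can be checked directly for a candidate index set. The following is a minimal sketch for the line segment version in the Euclidean metric, not the authors' code; the function names and the 0-based indexing are assumptions made for illustration.

```python
import math

def point_segment_distance(q, a, b):
    """Euclidean distance from point q to the segment ab (3D tuples)."""
    ab = [b[k] - a[k] for k in range(3)]
    aq = [q[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    # Clamp the projection parameter to [0, 1]; a degenerate segment uses t = 0.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(aq, ab)) / denom))
    closest = [a[k] + t * ab[k] for k in range(3)]
    return math.dist(q, closest)

def segment_stabs_ball(a, b, center, radius):
    """A segment stabs a ball when it comes within `radius` of the center."""
    return point_segment_distance(center, a, b) <= radius

def is_valid_stabber_set(centers, radii, idx):
    """Check conditions (i)-(iii) for 0-based indices idx (minimality is not checked)."""
    n = len(centers)
    if not idx or idx[0] != 0 or idx[-1] != n - 1:
        return False  # condition (i): must start at the first and end at the last center
    for s, t in zip(idx, idx[1:]):
        if s >= t:
            return False  # condition (ii): strictly increasing indices
        a, b = centers[s], centers[t]
        # condition (iii): the segment must stab every ball from s to t
        if not all(segment_stabs_ball(a, b, centers[k], radii[k]) for k in range(s, t + 1)):
            return False
    return True
```

With centers (0,0,0), (1,0.4,0), (2,0,0), (3,0,0) and all radii 0.5, the index set [0, 3] is valid; shrinking the radii to 0.3 makes it invalid, because the second center lies 0.4 away from the segment.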
Applications • Simplification of proteins for visualization, manipulation, (approximate) matching and searching in protein databases, and neural map representation. • The problem is a generalization of the polygonal chain simplification problem. [Figure: points $p_i$, tolerance $\varepsilon$, and an approximating segment]
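Key difference from chain simplification: chain simplification uses a single tolerance $\varepsilon$ for all points, whereas our simplification uses a separate radius $\varepsilon_i$ for each ball. [Figure: chain simplification with $\varepsilon$ next to our simplification with $\varepsilon_i$]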
Related Works • Sergey Bereg, Cylindrical Hierarchy for Deforming Necklaces, International Journal of Computational Geometry & Applications, 14(1-2): 3-18, 2004. • Compute an optimal cylindrical cover of a necklace with n beads (balls) in $R^3$ in polynomial time. • The n balls are ordered in sequence; if not, the problem is NP-hard.
Related Works • Binhai Zhu, Approximating 3D points with Cylindrical Segments, International Journal of Computational Geometry & Applications, 14(3): 189-201, 2004. • Given a set S of n points in $R^3$, compute k cylindrical segments enclosing S such that the sum of their radii is minimized. • For unordered points: NP-hard. • A polynomial time approximation scheme (PTAS) for any fixed k > 1 is possible. • Used for constructing neural maps and some other computational biology applications.
Related Works • Frederic Vivien, Nicolas Wicker, Minimal Enclosing Parallelepiped in $R^3$, CG:T&A, 29 (2004), 177-190. • Find the minimum-volume parallelepiped enclosing a set of n points. • $O(n^6)$ time.
Our results • Quadratic or near-quadratic time solutions for line segment stabbers. • Subcubic, $O(n^{2.4} \log^{O(1)} n)$ time for line stabbers. • Experimental results: • for proteins with thousands of atoms, our solutions perform much better than previous solutions; • the actual running time is much smaller than the worst-case bound suggests.
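Line segment based simplification (used for protein simplification) • $L_p$ metric: the distance between two points X and Y. For example, for $X = (x_1, y_1)$ and $Y = (x_2, y_2)$: $L_p = (|x_1 - x_2|^p + |y_1 - y_2|^p)^{1/p}$, so $L_1 = |x_1 - x_2| + |y_1 - y_2|$, $L_2 = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, and $L_\infty = \max(|x_1 - x_2|, |y_1 - y_2|)$. [Figure: X and Y differ by 4 in one coordinate and 3 in the other, giving $L_1 = 7$, $L_2 = 5$, $L_\infty = 4$]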
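The three metrics on this slide can be reproduced with a few lines of code. This is a small illustrative sketch; the function name `lp_distance` is an assumption, not something from the talk.

```python
def lp_distance(x, y, p):
    """L_p distance between two points given as equal-length tuples.

    Pass p = float("inf") for the L-infinity (maximum) metric.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# The slide's example: coordinate differences of 4 and 3.
X, Y = (0, 0), (4, 3)
l1 = lp_distance(X, Y, 1)               # sum of coordinate differences
l2 = lp_distance(X, Y, 2)               # Euclidean distance
linf = lp_distance(X, Y, float("inf"))  # largest coordinate difference
```

For this example the three values match the figure: 7, 5, and 4.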
Line segment based simplification • $L_2$ metric: $O(n^2 \log n)$ time, $O(n^2)$ space algorithm. • Similar to the polygonal chain simplification algorithm of Daescu et al. • Replace the line segment $p_ip_j$ by two rays; for each ray, intersect it and the projections from $p_i$ (or $p_j$) of the balls $\{B_{i+1}, B_{i+2}, \ldots, B_j\}$ with a plane. • This reduces to deciding whether the projection of $p_i$ along the ray lies within the common intersection of some disks (the projected balls).
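For comparison, a brute-force baseline is easy to state: treat every valid shortcut $p_ip_j$ as a graph edge and find a path with the fewest edges from the first to the last center. The sketch below does this with breadth-first search in cubic time. It is not the $O(n^2 \log n)$ algorithm of the talk, which avoids testing each shortcut from scratch; the function names and 0-based indices are assumptions.

```python
import math
from collections import deque

def _dist_point_segment(q, a, b):
    """Euclidean distance from point q to the segment ab (3D tuples)."""
    ab = [b[k] - a[k] for k in range(3)]
    aq = [q[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(aq, ab)) / denom))
    return math.dist(q, [a[k] + t * ab[k] for k in range(3)])

def min_stabbers(centers, radii):
    """Return 0-based indices of a minimum-size stabber set (segment version).

    Brute force: O(n^2) candidate shortcuts, each tested in O(n) time.
    """
    n = len(centers)
    if n == 0:
        return []
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n - 1:
            break
        for j in range(i + 1, n):
            if seen[j]:
                continue
            # The endpoints are stabbed trivially, so only interior balls are tested.
            if all(_dist_point_segment(centers[k], centers[i], centers[j]) <= radii[k]
                   for k in range(i + 1, j)):
                seen[j] = True
                parent[j] = i
                queue.append(j)
    # Walk the BFS tree back from the last center to the first.
    path = [n - 1]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return path[::-1]
```

Because consecutive centers are always connected by a valid shortcut, the last center is always reachable, and breadth-first search returns a path with the minimum number of segments.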
Line segment based simplification • $L_1$ or $L_\infty$ metric: $O(n^2)$ time, $O(n)$ space. • $L_1$ ($L_\infty$) “balls” are cross-polytopes (cubes). • Main idea: the common intersection of the projections of n $L_1$ or $L_\infty$ “balls”, from any viewpoint onto any plane, if not empty, is a convex region bounded by O(1) edges.
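For the $L_\infty$ metric, a "ball" is an axis-aligned cube, so a single stabbing test can be done by clipping the segment against the three coordinate slabs. The sketch below shows only this direct test, under the assumption that the cube is given by its center and half side length; it does not implement the projection-based idea from the slide, and the function name is hypothetical.

```python
def segment_stabs_cube(a, b, center, half_side):
    """True if the segment ab meets the cube {x : |x - center|_inf <= half_side}."""
    t_lo, t_hi = 0.0, 1.0  # parameter range of the segment still inside all slabs
    for k in range(3):
        d = b[k] - a[k]
        lo = center[k] - half_side - a[k]
        hi = center[k] + half_side - a[k]
        if d == 0:
            # Segment is parallel to this slab: it must already lie inside it.
            if lo > 0 or hi < 0:
                return False
        else:
            t1, t2 = lo / d, hi / d
            if t1 > t2:
                t1, t2 = t2, t1
            t_lo, t_hi = max(t_lo, t1), min(t_hi, t2)
            if t_lo > t_hi:
                return False
    return True
```

An $L_1$ ball is a cross-polytope rather than a cube, so it would need a different (though similarly simple) test.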
Line segment based simplification • $O(n^2)$ time and space if each $B_i$ is a convex polytope whose projections Proj($B_i$, W, p), from any point p onto any plane W, satisfy the following complexity condition: • the intersection of the projections Proj($B_i$, W, p) of the $B_i$'s is a convex polygon of size O(n). • The algorithm is similar to the one for the $L_2$ metric.
Line based simplification • For n indexed points $P = \{p_1, p_2, \ldots, p_n\}$ in $R^d$, $d \ge 3$, with $O(n^{3 - 3/(\lfloor f(d)/2 \rfloor + 1)} \log^{O(1)} n)$ time and space one can report, for each line $p_ip_j$, $1 \le i < j \le n$, the farthest point $p_k$ with $i < k < j$. • $f(d) = O(d^2)$. • For protein chains, the radius of each ball $B_i$ takes its value from a small set, and $f(d) = 3^2 = 9$. • The minimum-size set P of stabbers can be found in $O(n^{2.4} \log^{O(1)} n)$ time and space.
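The exponent 2.4 follows from substituting $f(d) = 9$ into the general bound, as this short worked step shows:

```latex
3 - \frac{3}{\lfloor f(d)/2 \rfloor + 1}
  = 3 - \frac{3}{\lfloor 9/2 \rfloor + 1}
  = 3 - \frac{3}{4 + 1}
  = 3 - \frac{3}{5}
  = 2.4
```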
Main idea: • Use a constant number of balanced binary tree structures. • At each node, construct a farthest-point-from-line data structure, balanced with respect to the number of queries. • $O(n^2)$ queries overall.
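To make the query concrete, here is what a single farthest-point-from-line query computes, answered by a linear scan in three dimensions. The tree-based structure on the slide exists precisely to avoid this scan; the sketch below only illustrates the query and is not the authors' data structure. The function names and 0-based indices are assumptions.

```python
import math

def point_line_distance(q, a, b):
    """Distance from q to the infinite line through a and b (requires a != b)."""
    ab = [b[k] - a[k] for k in range(3)]
    aq = [q[k] - a[k] for k in range(3)]
    # |aq x ab| / |ab| is the height of the parallelogram spanned by aq and ab.
    cross = (aq[1] * ab[2] - aq[2] * ab[1],
             aq[2] * ab[0] - aq[0] * ab[2],
             aq[0] * ab[1] - aq[1] * ab[0])
    return math.hypot(*cross) / math.hypot(*ab)

def farthest_between(points, i, j):
    """Return (k, distance) for the point farthest from line p_i p_j with i < k < j.

    Returns (None, -1.0) when there is no point strictly between i and j.
    """
    best_k, best_d = None, -1.0
    for k in range(i + 1, j):
        d = point_line_distance(points[k], points[i], points[j])
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d
```

Answering all $O(n^2)$ queries this way costs cubic time in total, which is the bound the subcubic structure improves on.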
Experimental Results • Use RMSD to measure the similarity between the original and the simplified chains. • Different number of atoms in the original and the simplified chains.
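Since the two chains have different numbers of atoms, an RMSD needs some correspondence between them. The slides do not say which one was used. As one possible reading, offered only as an assumption, the sketch below measures the distance from each original point to the simplified segment that covers it and takes the root mean square of those distances. The function names are hypothetical.

```python
import math

def _dist_point_segment(q, a, b):
    """Euclidean distance from point q to the segment ab (3D tuples)."""
    ab = [b[k] - a[k] for k in range(3)]
    aq = [q[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(aq, ab)) / denom))
    return math.dist(q, [a[k] + t * ab[k] for k in range(3)])

def rmsd_to_simplified(points, kept):
    """RMS deviation of the original points from the simplified chain.

    kept: increasing 0-based indices of the retained points, starting at 0
    and ending at len(points) - 1. Assumed measure, not taken from the slides.
    """
    total = 0.0
    for s, t in zip(kept, kept[1:]):
        # Each original point is counted once, against its covering segment.
        for k in range(s, t):
            total += _dist_point_segment(points[k], points[s], points[t]) ** 2
    # The last point is itself retained, so it contributes zero deviation.
    return math.sqrt(total / len(points))
```

Under this measure, keeping every point gives an RMSD of zero, and the value grows as interior points move farther from the segments that replace them.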
Experimental Results 1CA2: 256 alpha carbons Simplified 1CA2: 168 alpha carbons RMSD= 0.62 Å
Experimental Results 1DDZ_A: 481 alpha carbons Simplified 1DDZ_A : 340 alpha carbons RMSD= 0.44 Å
Experimental Results 1DDZ_B: 481 alpha carbons Simplified 1DDZ_B : 351 alpha carbons RMSD= 0.43 Å
Conclusions • The RMSDs are very small: the simplified backbones remain similar to the originals while having significantly simpler representations (e.g., about 34% fewer alpha carbons for 1CA2, from 256 to 168). • The simplified chains can be used in place of the original ones in visualization, alignment, classification of protein structures, etc.