Enhancing Protein Structure Evaluation: Introducing TR Score for Better Model Assessment

Three scores: TS, TR and CS
ShuoYong Shi, Ruslan Sadreyev, Jing Tong, David Baker and Nick V. Grishin
Howard Hughes Medical Institute, Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas
http://prodata.swmed.edu/CASP8

GDT-TS: the best single score
Why? Because it is 4 scores in one, from 4 different superpositions.
GDT-TS = [N(1) + N(2) + N(4) + N(8)] / (4N), where N(r) is the number of superimposed residue pairs with Cα-Cα distance < r Å, and N is the total number of residues in the target.
Many approaches are trained to produce models that score better under some evaluation method, so flaws in that method yield better-scoring models that represent the real protein structure no better. One such danger is compression of coordinates, which decreases the radius of gyration and may increase some scores based on Cartesian superpositions.
http://prodata.swmed.edu/CASP8/evaluation/Scores.htm
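The GDT-TS formula above can be sketched in a few lines of Python. This is only an illustration: the function name `gdt_ts` and the input layout (one list of Cα-Cα distances per cutoff, each taken from the superposition optimized for that cutoff) are assumptions, and the superposition search itself is not shown.

```python
from typing import Mapping, Sequence

CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs r, in Angstroms


def gdt_ts(distances_by_cutoff: Mapping[float, Sequence[float]], n_target: int) -> float:
    """GDT-TS = [N(1) + N(2) + N(4) + N(8)] / (4N).

    distances_by_cutoff[r] holds the CA-CA distances of corresponding
    residues after the superposition chosen for cutoff r (hypothetical
    input layout; the superpositions are computed elsewhere).
    """
    total = 0
    for r in CUTOFFS:
        # N(r): number of residue pairs closer than r Angstroms
        total += sum(1 for d in distances_by_cutoff[r] if d < r)
    return total / (4 * n_target)


# Toy data: for simplicity the same superposition is reused for every cutoff.
toy_distances = [0.5, 1.5, 3.0, 7.0, 12.0]
toy_score = gdt_ts({r: toy_distances for r in CUTOFFS}, n_target=5)
print(toy_score)
```

In the toy data N(1)=1, N(2)=2, N(4)=3 and N(8)=4, so the score is 10/20.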

Compression is bad, but GDT-type scores favor it
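As a rough illustration of what "compression" means here, the sketch below shrinks Cα coordinates uniformly toward their centroid and reports the radius of gyration. Uniform scaling is an assumption made for this example; the slides do not specify how models were compressed, and the helper names are hypothetical.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    cx, cy, cz = centroid(coords)
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / len(coords))


def compress(coords, factor):
    """Scale coordinates toward the centroid (factor < 1 compresses).

    Assumption: compression is modeled as uniform scaling.
    """
    cx, cy, cz = centroid(coords)
    return [(cx + factor * (x - cx), cy + factor * (y - cy), cz + factor * (z - cz))
            for x, y, z in coords]


square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(radius_of_gyration(square), radius_of_gyration(compress(square, 0.9)))
```

Under uniform scaling the radius of gyration shrinks by exactly the scaling factor, while all inter-residue distances, including those between neighboring Cα atoms, become unphysically short.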

Attraction and repulsion in scores
The GDT-TS score measures the fraction of residues in a model that lie within a certain distance of the same residues in the structure after a superposition. This approach is based on a "reward". By analogy with physical forces, such a score is only the "attraction" part of a potential; there is no "repulsion" component in GDT-TS. That may have been reasonable a few years ago, when predictions were quite poor: it was important to detect any positive feature of a model, since a model had more negatives than positives. Today, many models reflect structures well. When the positives start to outweigh the negatives, it becomes important to pay attention to the negatives.
We therefore introduced a "repulsion" component into the GDT-TS score. When a residue is close to its "correct" residue, GDT-TS rewards it; if a residue is too close to "incorrect" residues (any residue other than the one being modeled), we subtract a penalty from the GDT-TS score. This idea was suggested by David Baker as part of our collaboration on CASP and model improvement. The score that Ruslan Sadreyev and ShuoYong Shi developed in the Grishin Lab based on this idea is called TR, i.e. 'The Repulsion'. In addition to rewarding close superposition of corresponding model and target residues, the TR score penalizes close placement of other residues.

TR score is calculated as follows:
1. Superimpose the model with the target using LGA in the sequence-dependent mode, maximizing the number of aligned residue pairs within a distance cutoff of 4 Å.
2. For each aligned residue pair, calculate a GDT-TS-like score: S0(R1, R2) = 1/4 * [N(1) + N(2) + N(4) + N(8)], where N(r) is the number of superimposed residue pairs with Cα-Cα distance < r Å.
3. Consider individual aligned residues in both structures. For each residue R, choose residues in the other structure that are spatially close to R, excluding the residue aligned with R and its immediate neighbors in the chain. Count the numbers of such residues with Cα-Cα distance to R within cutoffs of 1, 2, and 4 Å. (Unlike GDT-TS, we do not use the 8 Å cutoff, as it is too inclusive.)
4. The average of these counts defines the penalty assigned to a given residue R: P(R) = 1/3 * [N(1) + N(2) + N(4)].
5. For each aligned residue pair (R1, R2), the average of the two residue penalties, P(R1, R2) = 1/2 * [P(R1) + P(R2)], is weighted and subtracted from the GDT-TS score for this pair. The final score is prohibited from being negative: S(R1, R2) = Max[S0(R1, R2) - w * P(R1, R2), 0].
Among the tested values of the weight w, we found that w = 1.0 produced the scores most consistent with the evaluation of model abnormalities by human experts.
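Steps 2 to 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the model and target are already superimposed (step 1, done with LGA, is not shown), that residue i of the model is aligned with residue i of the target, and that "immediate neighbors" means chain positions i-1 and i+1. All function and parameter names are hypothetical.

```python
import math

REWARD_CUTOFFS = (1.0, 2.0, 4.0, 8.0)   # used for the GDT-TS-like reward S0
PENALTY_CUTOFFS = (1.0, 2.0, 4.0)       # 8 A is left out of the penalty


def penalty(i, own, other, skip=1):
    """P(R) for residue i of `own`, measured against residues of `other`.

    Residues of `other` at chain positions i-skip..i+skip (the aligned
    residue and its immediate neighbors) are excluded.
    """
    counts = 0
    for j, xyz in enumerate(other):
        if abs(j - i) <= skip:
            continue
        d = math.dist(own[i], xyz)
        counts += sum(1 for r in PENALTY_CUTOFFS if d < r)
    return counts / 3


def tr_pair_scores(model, target, w=1.0, skip=1):
    """Per-pair scores S(R1, R2) = max(S0 - w * P, 0) for aligned CA atoms."""
    scores = []
    for i in range(len(target)):
        d = math.dist(model[i], target[i])
        s0 = sum(1 for r in REWARD_CUTOFFS if d < r) / 4
        p = 0.5 * (penalty(i, model, target, skip) + penalty(i, target, model, skip))
        scores.append(max(s0 - w * p, 0.0))
    return scores


# Toy target: four CA atoms on a line, 3.8 A apart.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
# Toy model: residue 0 is collapsed onto the position of target residue 2.
collapsed = [(7.6, 0.0, 0.0)] + target[1:]
print(tr_pair_scores(target, target))
print(tr_pair_scores(collapsed, target))
```

In the collapsed model, the misplaced residue loses its own score, and the residues it crowds (pairs 2 and 3) are penalized as well, even though they superimpose perfectly.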

Worked example with a 1 Å distance cutoff: segments of superimposed structure (black) and model (red). The superposition does not look very good, but assume that only segments of larger structures are shown and that the rest of the structures superimpose better.
[Figure: residues 1, 2, 3 of the structure and residues 1, 2, 3 of the model; scale bar 0.6 Å]

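GDT-TS calculation for 1 Å: find the number of corresponding atoms within 1 Å. Total GDT-TS contribution: 0 + 1 + 0 = 1.
[Figure: distances between corresponding residues are 1.34 Å for pair (1,1), 0.6 Å for pair (2,2) and 1.34 Å for pair (3,3); scale bar 0.6 Å]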

Penalty calculation for 1 Å: for those residue pairs that contribute to GDT-TS, find "incorrect" atoms within 1 Å. Residue pairs (1,1) and (3,3) do not contribute to the penalty, as they do not contribute to GDT-TS (both are 1.34 Å apart).

Penalty calculation for 1 Å: residue pair (2,2), which is 0.6 Å apart, may contribute to the penalty.

Penalty calculation for 1 Å: Step 1: from the structure (black)
Which "incorrect" residues in the model are within 1 Å of residue 2 in the structure? It is residue 1 (0.84 Å): residue 2 (0.6 Å) is "correct", and residue 3 is 1.2 Å away.
From the structure's side, only residue 1 of the model contributes, adding a count of 1 to the penalty.

Penalty calculation for 1 Å: Step 2: from the model (red)
Which "incorrect" residues in the structure are within 1 Å of residue 2 in the model? They are residue 1 (0.84 Å) and residue 3 (0.84 Å); residue 2 (0.6 Å) is "correct".
From the model's side, residues 1 and 3 contribute a total count of 2 to the penalty.

Penalty calculation for 1 Å: Step 3: averaging the penalty contributions from the structure and the model
Total penalty = (penalty from the structure + penalty from the model) / 2 = (1 + 2) / 2 = 1.5
The penalty for 1 Å is 1.5.

TR contribution for 1 Å: compute it as TR = GDT - weight * penalty. If TR < 0, set it to 0.
Weight   TR
0.25     1 - 0.25 * 1.5 = 0.625 > 0
0.5      1 - 0.5 * 1.5 = 0.25 > 0
1        1 - 1 * 1.5 = -0.5 < 0, so set TR to 0
Next: compute these for 2 Å, 4 Å and 8 Å, average, and divide by the structure length.
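The arithmetic of this worked example can be checked with a short script. The function name `tr_contribution` is hypothetical; the numbers (GDT contribution 1, penalty counts 1 and 2, weights 0.25, 0.5 and 1) are taken from the slides above.

```python
def tr_contribution(gdt, penalty, weight):
    """TR = GDT - weight * penalty, clipped at zero."""
    return max(gdt - weight * penalty, 0.0)


gdt_1a = 1                   # only pair (2,2) is within 1 A
penalty_1a = (1 + 2) / 2     # average of the structure-side and model-side counts

for w in (0.25, 0.5, 1.0):
    print(w, tr_contribution(gdt_1a, penalty_1a, w))
```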

Correlation between TR score (vertical axis) and GDT-TS (horizontal axis): R=0.991.
Scores for the top 10 first server models were averaged for each domain. Each domain is shown by its number, positioned at the point whose coordinates equal these averaged scores. Domain numbers are colored according to the difficulty category suggested by our analysis: black - FM (free modeling); red - FR (fold recognition); green - CM_H (comparative modeling: hard); cyan - CM_M (comparative modeling: medium); blue - CM_E (comparative modeling: easy).

Comparison of remote homologs: compressing one homolog can increase GDT_TS
Sample of N=2050 pairs of SCOP domains sharing a superfamily.
[Figure panels: lower third by DALI Z (2.0 < DALI Z < 5.8), N=680; 2.0 < DALI Z < 5.0, N=540]

In 40% of remote homolog pairs, GDT_TS increases with compression
Domain pairs where compression causes GDT_TS growth: N=239 of 540.

Compression of FR models can cause GDT_TS growth
SAM-T08 server: all 108 models of FR targets; models of FR targets where compression causes GDT_TS growth: N=43.

Contact score CS
Scores that compare intramolecular distances between a model and a structure (contact scores) have different properties from intermolecular distance scores based on optimal superposition. One advantage of such scores is that superpositions, and thus arguments about their optimality, are not involved. The problems in developing a good contact score are 1) the contact definition and 2) the mathematical expression that converts distance differences to scores.
The CS score is calculated as follows:
1. A contact between residues is defined by a distance ≤ 8.4 Å between their Cα atoms.
2. The difference between such distances in a model and a structure is computed and expressed as a fraction of the distance in the structure.
3. Fractional distances above 1 (distance difference larger than the distance itself) are discarded, and an exponential is used to convert the remaining fractional distances to scores (0→1). The factor in the exponent is chosen to maximize the correlation between contact scores and GDT-TS scores.
4. These residue-pair scores are averaged over all pairs of contacting residues.
We call this score CS, i.e. 'contact score', for short.
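The CS recipe above leaves several details open, so the following is only a sketch under stated assumptions: contacts are taken from the target structure, the per-pair score is exp(-k * fractional difference) with a placeholder factor `k` (the fitted value is not given in the slides), and pairs with a fractional difference above 1 contribute a score of 0 to the average. The function name is hypothetical.

```python
import math

CONTACT_CUTOFF = 8.4  # Angstroms, CA-CA


def contact_score(model, target, k=1.0):
    """Superposition-free contact score over CA coordinates.

    Assumptions: contacts are defined in the target; k is a placeholder
    for the exponent factor; discarded pairs (fraction > 1) score 0.
    """
    pair_scores = []
    n = len(target)
    for i in range(n):
        for j in range(i + 1, n):
            d_target = math.dist(target[i], target[j])
            if d_target == 0.0 or d_target > CONTACT_CUTOFF:
                continue  # not a contact (or degenerate coordinates)
            d_model = math.dist(model[i], model[j])
            frac = abs(d_model - d_target) / d_target
            pair_scores.append(math.exp(-k * frac) if frac <= 1.0 else 0.0)
    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0


cs_target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in cs_target]    # rigid translation
halved = [(0.5 * x, 0.5 * y, 0.5 * z) for x, y, z in cs_target]  # compressed
print(contact_score(shifted, cs_target), contact_score(halved, cs_target))
```

Because only intramolecular distances enter, a rigidly shifted copy scores the same as the original, while the compressed copy is penalized on every contact.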

Correlation between contact score CS (vertical axis) and GDT-TS (horizontal axis): R=0.962.
Scores for the top 10 first server models were averaged for each domain. Each domain is shown by its number, positioned at the point whose coordinates equal these averaged scores. Domain numbers are colored according to the difficulty category suggested by our analysis: black - FM (free modeling); red - FR (fold recognition); green - CM_H (comparative modeling: hard); cyan - CM_M (comparative modeling: medium); blue - CM_E (comparative modeling: easy).

Server rankings on all targets in domains for three scores
On 143 domains, the ranking does not change much with the score, illustrating that 1) the scores correlate with each other and 2) the ranking is robust.

Server rankings on FR domains for three Z-scores
On 28 FR domains, the ranking shows small variations, illustrating the differences between individual scores and between servers.

Summary:
1. A single score is not enough for model evaluation.
2. Do not train your method on a single score.
3. Introducing "repulsion terms" into the score is useful, as it penalizes compression and may help improve alignments.
4. Superposition-independent contact scores are fast and easy to compute, accurate, and correlate well with superposition-based scores.

Acknowledgement
Our group: Shuoyong Shi, Jing Tong, Ruslan Sadreyev, Lisa Kinch, Jimin Pei, Ming Tang, Sasha Safronova, Yuan Qi, Hua Cheng, Jamie Wrabl, Indraneel Majumdar, Erik Nelson, Yong Wang, S. Sri Krishna, Bong-Hyun Kim, Dorothee Staber
Collaborators: David Baker (U. Washington), Kimmen Sjölander (UC Berkeley), William Noble (U. Washington)
HHMI, NIH, UTSW, The Welch Foundation
