"The Phase Problem"

"The Phase Problem"

when we measure the intensity ($I(hkl) \propto |F(hkl)|^2$), we lose all information about the phase α(hkl) • we need to recover this information in order to reconstruct the image of the molecule

Can we extract some information from the intensities? • A.L. Patterson: what if we take the Fourier transform of the intensities, FT(I(hkl))? • P(uvw) only has significant value if ρ(x, y, z) and ρ(x+u, y+v, z+w) are both centered on an atom • all phases = 0

The Patterson Function • P(uvw) is an interatomic vector map of all atoms in the unit cell • convolution of the electron density with its own inverse (its autocorrelation) • every atom makes an interatomic vector with itself → origin peak dominates • for N atoms, N² peaks, N² − N of which don't lie at the origin
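A minimal sketch of the peak count, assuming a hypothetical one-dimensional cell of 16 grid points with three equal point atoms (the positions and the grid are illustrative, not taken from the lecture). The map is the Fourier transform of the intensities alone, so no phases enter the calculation.

```python
import cmath
import math

def structure_factors(atoms, n):
    """F(h) for point atoms in a 1-D cell of n grid points.
    atoms is a list of (grid_index, weight) pairs."""
    return [sum(w * cmath.exp(2j * math.pi * h * x / n) for x, w in atoms)
            for h in range(n)]

def patterson(atoms, n):
    """Fourier transform of the intensities |F(h)|^2, i.e. all phases set to 0."""
    intensities = [abs(f) ** 2 for f in structure_factors(atoms, n)]
    return [sum(i_h * cmath.exp(-2j * math.pi * h * u / n)
                for h, i_h in enumerate(intensities)).real / n
            for u in range(n)]

# Hypothetical three-atom structure (grid positions 0, 3 and 7, unit weights).
atoms = [(0, 1.0), (3, 1.0), (7, 1.0)]
n = 16
p_map = patterson(atoms, n)

# Keep only grid points that carry a real peak.
peaks = {u: round(p, 6) for u, p in enumerate(p_map) if p > 0.5}
off_origin = [u for u in peaks if u != 0]
```

With N = 3 the map holds an origin peak of height 3 (each atom paired with itself) and N² − N = 6 off-origin peaks, one per ordered pair of distinct atoms.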

[Figure: a three-atom structure and the Patterson map for that three-atom structure]

The Patterson Map • small simple structures are easy to deconvolute • origin size, noise, complicated nature makes it intractable for large structures • contains info about interatomic distances and distance between different molecules in unit cell • basis for the two means of solving the phase problem: the heavy atom method (isomorphous replacement, MAD, etc.) and molecular replacement

Isomorphous Replacement • basic idea: add small number of atoms with many electrons (e.g. Hg), use to get α(hkl) • data from native and "derivatized" crystal used to locate heavy atoms (HAs) • requires isomorphism between crystals • most molecules in crystal must have HA • from HA positions, can get estimate of α(hkl)

Steps in Isomorphous Replacement • create derivative crystal and collect data • determine HA positions • refine HA parameters • estimate protein phase angles • evaluate electron density map and perform density modification if necessary (it almost always is)

Step 1: Creating the Derivative • take crystal, soak in stabilizing solution containing heavy atom (toxic) • completely empirical w.r.t. HA amount and soaking time • non-isomorphism and crystal damage vs. low or no HA incorporation • take advantage of reactive functional groups in molecule under study

Commonly-used Heavy Atoms • Hg: • reacts with cysteine thiols and disulfide bonds (better above pH 6, when Cys gets deprotonated) • can be chelated by histidine (best above pH 7, unless anionic mercury complex) • organic mercurials (e.g. methylmercury acetate) are particularly reactive (and toxic) • can access hydrophobic core Cys

Pt • K₂PtCl₄ is the single most successful heavy atom compound • reacts with Cys, Met sulfurs, His if anionic complex (like PtCl₄²⁻) • Met reaction is nearly insensitive to pH • gold makes similar, less reactive complexes

Hard Cations • uranyl cation: UO₂²⁺ • linear, binds to negatively-charged groups from Glu and Asp • at low pH, have been located near Ser, Thr • lanthanides: 15 members, decreasing ionic radius • allows for selection of appropriately-sized ion for binding site

Thallium and Lead • "in-between" character, can react with either carboxylates or imidazoles, depending upon oxidation state • Heavy Atom Databank

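Solving a Heavy Atom Substructure • Patterson map for all atoms is: $P(uvw) = \frac{1}{V}\sum_{hkl} |F(hkl)|^2 \cos 2\pi(hu + kv + lw)$ • interested only in positions of heavy atoms: $P_H(uvw) = \frac{1}{V}\sum_{hkl} |f_H(hkl)|^2 \cos 2\pi(hu + kv + lw)$ • for isomorphous replacement: $\Delta F_{iso} = |F_{PH}| - |F_P|$ • for anomalous scattering (Friedel's Law no longer holds): $\Delta F_{ano} = |F_{PH}(hkl)| - |F_{PH}(\bar{h}\bar{k}\bar{l})|$

anomalous ΔF usually denoted with its own symbol, here $\Delta F_{ano}$ • isomorphous or anomalous Patterson is: $P_{\Delta}(uvw) = \frac{1}{V}\sum_{hkl} (\Delta F)^2 \cos 2\pi(hu + kv + lw)$ • use ΔF as the estimate of the structure factor for the heavy atoms only to generate the Patterson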

Locating Heavy Atoms • for any space group, determine all Patterson vectors • simple example: P2₁ • equivalent positions: (x, y, z) and (−x, y+½, −z) • Patterson vectors: ±(2x, ½, 2z)

Harker Sections • in P2₁ example, Harker section is at y = ½ (±½ are equivalent because they are separated by exactly 1 unit cell repeat) • all atoms related by that symmetry operation have interatomic vectors that "pile up" on Harker sections • peaks are very pronounced (if experiment worked)

P2₁ continued • y position is arbitrary (no fixed origin on this axis) • if more than one HA site, picking first fixes y for all others • examine "cross peaks" between heavy atoms • HA #1: 0.23, 0, 0.08 • HA #2: 0.14, ?, 0.44 • should be cross peak at 0.09, ?, −0.36 (and −0.09, −?, 0.36)
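The Harker peak and cross peak for the two sites quoted above can be checked with a few lines of Python. This is a sketch, assuming the standard P2₁ symmetry mate (−x, y+½, −z); the helper names are hypothetical, and only the x and z components of the cross peak are computed because the y coordinate of HA #2 is still unknown at this stage.

```python
def wrap(value):
    """Map a fractional coordinate difference into the range (-0.5, 0.5]."""
    value = value % 1.0
    return round(value - 1.0 if value > 0.5 else value, 4)

def p21_mate(site):
    """Second equivalent position in P2(1): (-x, y + 1/2, -z)."""
    x, y, z = site
    return (-x, y + 0.5, -z)

def vector(a, b):
    """Patterson vector from site b to site a, wrapped into one cell."""
    return tuple(wrap(p - q) for p, q in zip(a, b))

ha1 = (0.23, 0.0, 0.08)          # HA #1, y fixed at 0 by choice of origin
ha2_x, ha2_z = 0.14, 0.44        # HA #2, y not yet known

# Self vector between HA #1 and its symmetry mate: lands on the v = 1/2 section.
harker = vector(ha1, p21_mate(ha1))

# x and z components of the cross vector HA #1 - HA #2.
cross_xz = (wrap(ha1[0] - ha2_x), wrap(ha1[2] - ha2_z))
```

The Harker peak comes out at (0.46, 0.5, 0.16), i.e. (2x, ½, 2z), and the cross-peak components reproduce the 0.09 and −0.36 quoted on the slide.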

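P2₁2₁2₁ example: more complicated • equivalent positions: (x, y, z); (½−x, −y, ½+z); (−x, ½+y, ½−z); (½+x, ½−y, −z) • Patterson vectors: (½−2x, −2y, ½); (−2x, ½, ½−2z); (½, ½−2y, −2z), plus their symmetry equivalents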

P2₁2₁2₁ • Harker sections at x=½, y=½, z=½ • peaks on any Harker section will give two (e.g. x and y) of the three positional coordinates • must pair up peaks on one section with the others to get third coordinate • make sure all are consistent • this is easy to mess up
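One way to avoid the bookkeeping mistakes is to let a short script generate the Harker vectors from the equivalent positions. The sketch below assumes the standard-setting equivalent positions of P2₁2₁2₁ and a hypothetical heavy-atom site at (0.1, 0.2, 0.3); exact fractions are used so that the ½ components can be tested for equality.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)

def p212121_positions(x, y, z):
    """The four equivalent positions of P2(1)2(1)2(1), standard setting."""
    return [
        (x, y, z),
        (HALF - x, -y, HALF + z),
        (-x, HALF + y, HALF - z),
        (HALF + x, HALF - y, -z),
    ]

def harker_vectors(site):
    """Vectors from the reference site to each symmetry mate, reduced mod 1."""
    first, *mates = p212121_positions(*site)
    return [tuple((m - f) % 1 for m, f in zip(mate, first)) for mate in mates]

site = (Fr(1, 10), Fr(1, 5), Fr(3, 10))   # hypothetical heavy-atom site
vectors = harker_vectors(site)

# For each vector, report which Patterson axis is pinned at 1/2.
sections = ["uvw"[v.index(HALF)] for v in vectors]
```

Each of the three vectors has exactly one component fixed at ½, and the other two components carry two of the three site coordinates, which is why peaks from different sections have to be paired up consistently.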

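Estimating protein phase angles • assuming no measurement error and no error in the HA parameters: $\alpha_P = \alpha_H \pm \cos^{-1}\left[\frac{|F_{PH}|^2 - |F_P|^2 - |f_H|^2}{2\,|F_P|\,|f_H|}\right]$ • the ± leaves two possible values of α_P from a single derivative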

Multiple Isomorphous Replacement • (in principle) phase ambiguity can be resolved by presence of a second derivative with different sites or with multiple wavelengths or anomalous scattering • density modification can also do this (in principle) • in practice, things are not so unambiguous
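The way a second derivative resolves the ambiguity in the error-free case can be sketched numerically. The example assumes the triangle relation F_PH = F_P + f_H, so that the law of cosines gives two candidate protein phases per derivative; the amplitudes and phases below are invented for illustration and the function name is hypothetical.

```python
import cmath
import math

def candidate_phases(fp_amp, fph_amp, f_h):
    """Two protein phases (degrees) consistent with one derivative."""
    fh_amp, alpha_h = abs(f_h), cmath.phase(f_h)
    cos_term = (fph_amp**2 - fp_amp**2 - fh_amp**2) / (2 * fp_amp * fh_amp)
    delta = math.acos(max(-1.0, min(1.0, cos_term)))  # clamp against round-off
    return sorted(round(math.degrees(alpha_h + s * delta) % 360, 3)
                  for s in (1, -1))

# Invented "true" protein structure factor and two heavy-atom contributions.
true_fp = cmath.rect(100.0, math.radians(40.0))
f_h1 = cmath.rect(20.0, math.radians(110.0))
f_h2 = cmath.rect(15.0, math.radians(250.0))

# Only amplitudes of F_P and F_PH are "measured"; f_H comes from the HA model.
cands1 = candidate_phases(abs(true_fp), abs(true_fp + f_h1), f_h1)
cands2 = candidate_phases(abs(true_fp), abs(true_fp + f_h2), f_h2)

# The phase shared by both derivatives is the protein phase.
common = [a for a in cands1 if any(abs(a - b) < 0.01 for b in cands2)]
```

Each derivative alone allows two phases (about 40° and 180° for the first, 40° and 100° for the second); only 40° is common to both. With real data the candidates do not coincide exactly, which is why the problem is treated probabilistically.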

Difference Fourier • once you have an initial phase estimate, additional derivatives can be easier to analyze • origin of HA substructure now fixed, all other derivatives must obey this origin or info not cumulative • no more Pattersons to deconvolute, now calculate "difference Fourier synthesis"

Difference Fourier • gives Fourier map of new HA (lots of noise plus hopefully one or more strong peaks) • x,y,z all determined • susceptible to wishful thinking • should check against Patterson of that derivative • should give rise to better behavior in HA refinement and (most importantly) BETTER MAPS

Refinement of HA Parameters • seeks to minimize "lack of closure" • modify x_i, y_i, z_i, occ_i, B_i for each of i derivatives such that f_H is more consistent with F_PH • changing the above changes the length and orientation (i.e. phase) of f_H • F_P + f_H = F_PH(calc), so minimize differences between observed and calculated structure factors for each derivative
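A minimal sketch of the lack-of-closure residual for a single reflection, under loud simplifications: one heavy-atom site, a constant scattering factor of 80 electrons, and a temperature factor applied as exp(−B·s²) with s² = (sin θ/λ)² passed in directly. All names and numbers are hypothetical.

```python
import cmath
import math

Z_HEAVY = 80.0  # crude constant scattering factor (assumption, roughly Hg)

def heavy_atom_sf(hkl, s2, sites):
    """f_H(hkl) from sites given as (x, y, z, occupancy, B)."""
    h, k, l = hkl
    total = 0j
    for x, y, z, occ, b in sites:
        damping = math.exp(-b * s2)
        total += occ * Z_HEAVY * damping * cmath.exp(
            2j * math.pi * (h * x + k * y + l * z))
    return total

def lack_of_closure(fp_amp, alpha_p, fph_obs, f_h):
    """Observed |F_PH| minus |F_P + f_H| calculated from the current model."""
    fph_calc = abs(cmath.rect(fp_amp, alpha_p) + f_h)
    return fph_obs - fph_calc

hkl, s2 = (2, 1, 3), 0.01
fp_amp, alpha_p = 500.0, 1.0                      # invented protein term
true_sites = [(0.23, 0.0, 0.08, 0.8, 20.0)]
trial_sites = [(0.25, 0.0, 0.08, 0.8, 20.0)]      # x slightly wrong

# "Observed" derivative amplitude generated from the true site.
fph_obs = abs(cmath.rect(fp_amp, alpha_p) + heavy_atom_sf(hkl, s2, true_sites))

eps_true = lack_of_closure(fp_amp, alpha_p, fph_obs,
                           heavy_atom_sf(hkl, s2, true_sites))
eps_trial = lack_of_closure(fp_amp, alpha_p, fph_obs,
                            heavy_atom_sf(hkl, s2, trial_sites))
```

The residual vanishes for the correct site and becomes non-zero once x is shifted; refinement adjusts the site parameters to shrink this residual over all reflections.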

Old vs. New • used to treat as least-squares minimization: • unstable, bias for initial values prevalent • not really a least-squares problem • requires maximum-likelihood treatment • beyond scope of lecture • better handling of non-isomorphism (refine rather than estimate)

Statistical Evaluation • figure of merit: max = 1.0 (no phase error), a measure of how sharp the phase probability distribution is
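Assuming the statistic described here is the figure of merit (the slide's formula did not survive extraction), it can be sketched as the length of the probability-weighted mean of the unit phase vectors. The distributions below are invented to show the two extremes and the bimodal single-derivative case.

```python
import cmath
import math

def figure_of_merit(prob):
    """prob maps phase angle in degrees -> unnormalised probability.
    Returns |sum P(a) exp(i a)| / sum P(a)."""
    total = sum(prob.values())
    centroid = sum(p * cmath.exp(1j * math.radians(a)) for a, p in prob.items())
    return abs(centroid) / total

sharp = {40: 1.0}                               # all probability at one phase
bimodal = {40: 0.5, 180: 0.5}                   # two equally likely phases
flat = {a: 1.0 for a in range(0, 360, 10)}      # no phase information

fom_sharp = figure_of_merit(sharp)
fom_bimodal = figure_of_merit(bimodal)
fom_flat = figure_of_merit(flat)
```

A single sharp peak gives 1.0, a flat distribution gives essentially 0, and the two-peak case falls in between (cos 70° ≈ 0.34 here).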

Note on Statistical Evaluation • all of those numbers can be manipulated to seem great in the absence of real information • e.g. multiple derivatives with identical sites will artificially sharpen probability distribution • LOOK AT THE MAP!

Where do we go now? • initial experimentally phased maps often have lots of noise and are difficult to interpret • need some way to improve information content

Density Modification • use known properties of protein electron density to improve results • create description of what generic electron density should "look like" and modify existing map accordingly • iterate until convergence (or failure)

[Figure: the density modification cycle: FT → map → density modification → new map → FT]

Map Properties to be Exploited • solvent region should be "flat" • no negative density in protein region • electron density histograms • non-crystallographic symmetry • pattern matching

Solvent Flattening (or Flipping) • 1-D example:

Steps in the Process • locate the molecular boundary • get rid of negative density within boundary, replace noisy density outside boundary with constant value • calculate F, φ from modified map • combine with experimental phases to get new phases
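The map-modification step can be sketched on a one-dimensional map. This assumes the molecular boundary is already known as a boolean mask and uses the mean solvent density as the constant value; that choice is an assumption, since the slide's own expression is not available. Function and variable names are hypothetical.

```python
def solvent_flatten(density, in_protein):
    """Return a modified copy of a 1-D map.
    Inside the boundary: negative density is truncated to zero.
    Outside the boundary: every point is set to the mean solvent density."""
    solvent = [d for d, inside in zip(density, in_protein) if not inside]
    solvent_level = sum(solvent) / len(solvent)
    return [max(d, 0.0) if inside else solvent_level
            for d, inside in zip(density, in_protein)]

# Invented noisy map: grid points 3..6 belong to the protein region.
density = [0.2, -0.1, 0.3, 1.5, 2.0, -0.4, 1.8, 0.1, -0.2, 0.3]
in_protein = [False, False, False, True, True, True, True, False, False, False]

flattened = solvent_flatten(density, in_protein)
```

The flattened map would then be Fourier transformed to give new amplitudes and phases, which are combined with the experimental phases before the next cycle.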

Is it Working? • look at the map • correlation coefficient between Fobs and Fcalc from modified map should increase if it's working (of course this isn't a foolproof measure) • fixed-point iteration: poor convergence properties

Additional Methods • can simultaneously perform solvent flattening and other methods to get even better results • non-crystallographic symmetry averaging most powerful constraint on phases • statistically more sophisticated density modification can possess better convergence properties, improve phases in more cases

Molecular Replacement

What is Molecular Replacement? • use phases from an existing model as approximate starting phases for protein of interest • advantages: • no heavy atom soaks, etc. • chain traced in proper direction, etc. (rebuilding rather than building from scratch)

disadvantages: • phases biased towards original model • since errors in phases are perfectly correlated with errors in model, they are difficult to overcome • in contrast, in experimental phasing, errors in model are uncorrelated to errors in initial phases • ∴ model constraints (bond lengths and angles, etc.) provide a powerful means of overcoming errors in starting phases

How do we use model phases as initial phases? • must determine orientation and position of known model in unit cell of structure to be solved: • six components: • three rotational angles (α, β, γ) • three translational values (x, y, z)

If we did this the brute force way… • three 100 Å cell edges sampled every 1 Å = 100³ points • all of rotation space in 1° increments = 360³ points • total = 4.7 × 10¹³ points, with 1 FFT per point • if each FFT took 1 ms, this would take about 1490 years

"Divide and Conquer" • separate rotation and translation searches • above search now has 47 million rotation and 1 million translation points to search • what function is dependent on rotation only? • what function is dependent on translation only? • ANSWER: the Patterson function (for both)
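The point counts for the brute-force and split searches can be checked directly. The sketch uses the slide's own assumptions (1 Å translation steps, 1° rotation steps, 1 ms per evaluation) and a 365-day year.

```python
translation_points = 100 ** 3          # 100 A cell edges, 1 A steps
rotation_points = 360 ** 3             # three angles, 1 degree steps

# Six-dimensional search: every rotation at every translation.
brute_force = translation_points * rotation_points

# Separate searches: rotations first, then translations.
divided = rotation_points + translation_points

seconds_per_eval = 1e-3
seconds_per_year = 365 * 24 * 3600

brute_years = brute_force * seconds_per_eval / seconds_per_year
divided_hours = divided * seconds_per_eval / 3600
```

The unrounded product is 4.6656 × 10¹³ points, or roughly 1480 years at 1 ms each; the 1490 figure follows from the rounded 4.7 × 10¹³. Splitting the search leaves about 47.7 million evaluations, on the order of 13 hours at the same rate.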

Different Patterson Vectors for Different Purposes • intramolecular vectors (describe shape of molecule) clustered around origin—applicable to rotation search • intermolecular vectors (describe distribution of molecules in unit cell) not near origin—applicable to translation search

How to Separate Intra- from Intermolecular Vectors • spherical interference function (G) applied to rotation function (R) • G is the Fourier transform of a sphere centered at the origin, with radius chosen to encompass most intramolecular vectors and NO intermolecular vectors
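The Fourier transform of a uniform sphere has a closed form, so G is cheap to evaluate. The sketch below uses the normalised form 3(sin x − x cos x)/x³ with x = 2πr|s|, where r is the sphere radius in Å and |s| is the reciprocal-space distance in Å⁻¹; the function name and the sample values are hypothetical.

```python
import math

def interference_g(radius, s):
    """Normalised Fourier transform of a uniform sphere of the given radius,
    evaluated at reciprocal-space distance s (1/Angstrom)."""
    x = 2.0 * math.pi * radius * s
    if x < 1e-6:
        return 1.0                      # limit of the expression as x -> 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

g_origin = interference_g(20.0, 0.0)    # coincident reciprocal-lattice points
g_far = interference_g(20.0, 0.5)       # points far apart in reciprocal space

# First zero of G lies where tan(x) = x, at x close to 4.493409.
g_first_zero = interference_g(1.0, 4.493409 / (2.0 * math.pi))
```

G equals 1 at zero separation and falls off quickly, so only pairs of reflections that are close together in reciprocal space contribute appreciably. A larger sphere radius narrows G further.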

"The Phase Problem"