MIR: Multiple Isomorphous Refinement

Contact Author: Keith Watenpaugh, The Upjohn Company,

Physical and Analytical Chemistry, Kalamazoo, MI 49001, USA

MIR is a general multiple isomorphous replacement program developed as a collaborative project of the National Resource for Computation in Chemistry (Alden et al., 1983). MIR calculates reflection phases and refines heavy atom parameters for macromolecular compounds by the method of multiple isomorphous replacement. The reflection intensity data for a native protein and a series of heavy atom isomorphous derivatives are used in the calculation. In addition, the observed anomalous scattering of both the native and isomorphous substances may be used in the phase determination and parameter refinement. The trial coordinates of the heavy atoms of the derivatives must be supplied; these coordinates may be refined by the program. The phases of the reflections of the parent protein are estimated and may be refined. Phases may be estimated by one of two methods. The first is a modified Blow-Crick (1959) procedure, a probability method given here the acronym FAZPRB. The second is the minimum variance Fourier coefficient method of Sygusch (1977), given the acronym MVFC. The phases of the native may then be refined by adjustment of the most probable phase by a method developed by Bricogne (1982). In this method the most probable structure factor of the native is adjusted so as to minimize the sum of the closure errors. This minimization is carried out by an application of Newton's method in the complex plane.

Acknowledgement Of Contributing Authors

The MIR program was written under the auspices of the National Resource for Computation in Chemistry at the Lawrence Berkeley Laboratory. It was started as a cooperative programming effort carried out in three workshop sessions by the following authors (in alphabetical order). Publications that make use of the program should cite Alden et al. (1983).

Richard Alden University of California at San Diego

Gerard Bricogne Cambridge University

Stephan Freer University of California at San Diego

Sydney Hall University of Western Australia

Wayne Hendrickson Naval Research Lab., Washington

Pella Machin SRC Computer Centre, Daresbury

Robert Munn University of Maryland

Arthur Olson NRCC

George Reeke Rockefeller University

Steven Sheriff Naval Research Lab., Washington

James Stewart University of Maryland

Jurgen Sygusch University of Sherbrooke

Lynn Ten Eyck University of North Carolina

Keith Watenpaugh University of Washington

Introduction

The coordinates of the heavy atoms of the isomorphous derivatives may be refined either by conventional block diagonal or by full matrix least squares. In the block diagonal approximation the parameters of each isomorph are refined independently. In the full matrix method all the parameters of all the isomorphs are refined simultaneously. The implicit dependence of the phases is taken into account in the calculation of the partial derivatives of the total lack of closure. This accounting leads to a correction to the normal matrix which reflects the interaction among all parameters, even if they belong to distinct isomorphous compounds. This method allows for the joint role of all isomorphs in the definition of the most probable structure factors of the native. In essence the correction accounts for the effect of the parameter shifts on the redefinition of the phases.

The ultimate result of the use of the MIR program is a subset of reflections for the parent compound, each with an estimated phase. These phases, together with the structure factor amplitudes, may be processed to produce an electron density map. The calculation requires that a standard bdf be prepared which has a collection of reflections for which multiple observations are available. These observations will usually be for the native protein, one or more isomorphous derivatives and the anomalous reflections for both compounds. The trial coordinates of the heavy atoms of the isomorphous derivatives must be known from an analysis of their Patterson functions or some other source. The methods of phasing and refinement are specified by means of the control lines shown below. Many kinds of statistics may be printed, and the user may specify which ones.

MIR Phasing By Probability Methods

The option in this program for MIR phasing by probability methods is based on a slight revision of the treatment of phases introduced by Blow and Crick (1959). In keeping with the general philosophy of this program to deal in $F^2$ rather than in $F$, the residuals between observation and calculation are taken in $F^2$ rather than in $F$. In particular, the closure error for isomorphous replacement is defined as

$$\varepsilon_{ISO}(\varphi) = |F_{PH}|^2 - |F_P + F_H|^2 \qquad (1)$$

as suggested by Hendrickson and Lattman (1970). This is used instead of the conventional definition of Blow and Crick,

$$\varepsilon'_{ISO}(\varphi) = |F_{PH}| - |F_P + F_H| \qquad (2)$$

In the case of anomalous scattering information, the closure errors are defined analogously in terms of $F^2$ as proposed by Hendrickson (1979),

$$\varepsilon_{ANO}(\varphi) = \Delta_{PH}^{obs} - \Delta_{PH}^{calc}(\varphi) \qquad (3)$$

rather than in terms of $F$ as given by North (1965) and Matthews (1966).
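The two isomorphous closure errors of equations (1) and (2) can be compared numerically. The following is a minimal sketch, assuming NumPy is available; the function name `closure_errors` and the synthetic amplitudes are illustrative only and are not part of the MIR program.

```python
import numpy as np

def closure_errors(fp_amp, fph_amp, fh, phases_deg):
    """Closure errors on F**2 (equation 1) and on F (equation 2).

    fp_amp     : observed native amplitude |FP|
    fph_amp    : observed derivative amplitude |FPH|
    fh         : complex calculated heavy-atom structure factor FH
    phases_deg : trial native phases in degrees
    """
    phi = np.deg2rad(np.asarray(phases_deg, dtype=float))
    fph_calc = np.abs(fp_amp * np.exp(1j * phi) + fh)  # |FP + FH| at each trial phase
    eps_f2 = fph_amp**2 - fph_calc**2                  # equation (1)
    eps_f = fph_amp - fph_calc                         # equation (2)
    return eps_f2, eps_f

# Synthetic, error-free reflection (made-up numbers).
true_phase = 40.0
fp = 100.0
fh = 20.0 * np.exp(1j * np.deg2rad(110.0))
fph = abs(fp * np.exp(1j * np.deg2rad(true_phase)) + fh)

grid = np.arange(0.0, 360.0, 5.0)
eps_f2, eps_f = closure_errors(fp, fph, fh, grid)
```

With error-free data both closure errors vanish at the true phase and at its mirror image about the heavy-atom phase, which is the familiar twofold ambiguity of a single derivative.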

Blow and Crick treat the closure errors in terms of a Gaussian distribution such that

$$P(\varphi) = N \exp\{-\varepsilon^2(\varphi) / 2E^2\} \qquad (4)$$

where $E^2$ is the variance, or mean-square value, of the closure errors. When closure errors are defined as in equations 1 and 3 the exponent in equation 4 can be reduced to a simplified form so that

$$P(\varphi) = N' \exp\{A\cos\varphi + B\sin\varphi + C\cos 2\varphi + D\sin 2\varphi\} \qquad (5)$$

where the phasing coefficients, A, B, C, and D, incorporate the information. This is the representation used in this program. Expressions for the coefficients for isomorphous replacements are taken from equation 6 of Hendrickson and Lattman (1970):

(6)

= -[ 4 ( + (7)

(8)

(9)

where and are the real and imaginary parts of the structure factor calculated from the atomic parameters of the heavy atoms in the jth derivative. The coefficients for anomalous scattering derive directly from equations 6 and 9 of Hendrickson (1979) and are as follows:

(10)

(11)

(12)

(13)

where $\Delta_{PHj} = (F^2(+) - F^2(-))_{PHj}$, $a_{PHj}$ and $b_{PHj}$ are real and imaginary components of the heavy-atom scattering due to the real parts of the scattering factors, and $a'_{PHj}$ and $b'_{PHj}$ are those due to the imaginary parts of the scattering factors.

An essential problem to be addressed in the probability method is the estimation of appropriate values for $E^2_{ISO}$ and $E^2_{ANO}$. It is generally observed that the $E'^2$ values for the Blow-Crick formulation are not very strongly dependent on either scattering angle or intensity. However, since by comparison of (1) and (2)

$$\varepsilon_{ISO} \approx 2 F_{PH}\, \varepsilon'_{ISO} \qquad (14)$$

it follows that standard errors, $E^2 = \langle\varepsilon^2_{ISO}\rangle$, for the treatment based on $F^2$ will depend directly on intensity. One way to account for this dependence is simply to tabulate E values in categories of intensity as well as scattering angle (Hendrickson, Love and Karle, 1973). This makes for a rather coarse function. Another approach is to assume a Gaussian distribution and relate errors to the Blow-Crick formulation in order to take account of the intensity dependence. This is the approach described by Blundell and Johnson (1976) and generalized by Hendrickson (1979) to

= 3 E'4 + 4( + σ)E'2 (15)

The approach adopted in the MIR program is based on the innovation of Ten Eyck and Arnone (1976) to divide closure errors into a part accounted for by variances of the observations and a residual closure error. The residual closure error accounts for expected variance due to lack-of-isomorphism, model incompleteness or other factors.

(16)

The first partial derivative is unity and the second is

(17)

when no prior knowledge of the phases is available this has the expected value of .

However, by intentionally erring toward overestimation, the relation used here is

(18)

Notice that since by the Ten Eyck-Arnone procedure

E'2ISO

and, since

,

much of the intensity dependence displayed in (15) should be accommodated in this formulation of the standard errors. The analogous formula used for anomalous scattering information is

(19)

Residual closure errors are tabulated in bins of intensity and sinθ/λ. See below for the definition of a "bin".

The procedure adopted here for combining the phase information from the several sources assumes each to be independent of the others. This is the standard assumption when using the probability method for MIR phasing and is a potential weakness. The product is formed simply by adding phasing coefficients:

(20)
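Adding phasing coefficients is equivalent to multiplying the individual probability distributions of equation (5). The sketch below checks this numerically, assuming NumPy; the function names and the coefficient values are made up for illustration and do not come from the program.

```python
import numpy as np

def hl_probability(coeffs, phi):
    """Unnormalised P(phi) from coefficients (A, B, C, D), as in equation (5)."""
    a, b, c, d = coeffs
    return np.exp(a * np.cos(phi) + b * np.sin(phi)
                  + c * np.cos(2 * phi) + d * np.sin(2 * phi))

def combine(coeff_sets):
    """Combine independent phase information by adding the coefficients."""
    return tuple(np.sum(np.asarray(coeff_sets, dtype=float), axis=0))

phi = np.deg2rad(np.arange(0.0, 360.0, 10.0))

# Two hypothetical sources, e.g. one isomorphous and one anomalous.
sources = [(1.2, -0.4, -0.3, 0.1), (0.5, 0.9, -0.2, -0.6)]
combined = combine(sources)

# Product of the separate distributions versus the distribution
# built from the summed coefficients.
product = hl_probability(sources[0], phi) * hl_probability(sources[1], phi)
from_sum = hl_probability(combined, phi)
```

The two arrays agree point by point, which is why only four numbers per reflection need to be stored regardless of how many sources contribute.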

The evaluation of centroid phases and figures-of-merit is accomplished using the numerical integration technique introduced by Dickerson, Kendrew and Strandberg (1961). Hence,

(21)

and (22)

which yield m and φc. In the case of "centric" reflections, where the phase must be one of two possible values, special formulae apply (Hendrickson, 1971, 1979). Thus

(23)

and (24)
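The centroid phase and figure of merit can be sketched as a numerical integration over a phase grid. This is a minimal illustration assuming NumPy, a 5 degree grid (an arbitrary choice) and the representation of equation (5). The centric expression used here is derived directly from equation (5) for a phase restricted to φ0 or φ0 + 180°; it is not a transcription of the manual's equations (23) and (24).

```python
import numpy as np

def centroid_phase_and_fom(a, b, c, d, step_deg=5.0):
    """Return (centroid phase in degrees, figure of merit m)."""
    phi = np.deg2rad(np.arange(0.0, 360.0, step_deg))
    exponent = (a * np.cos(phi) + b * np.sin(phi)
                + c * np.cos(2 * phi) + d * np.sin(2 * phi))
    p = np.exp(exponent - exponent.max())   # subtract the maximum to avoid overflow
    p /= p.sum()                            # normalise on the grid
    centroid = np.sum(p * np.exp(1j * phi)) # equals m * exp(i * phi_c)
    phase = float(np.degrees(np.angle(centroid)) % 360.0)
    return phase, float(np.abs(centroid))

def centric_fom(a, b, phi0_deg):
    """Figure of merit when only phi0 and phi0 + 180 degrees are allowed.

    The C and D terms are identical at both phases and cancel, so
    m = |P(phi0) - P(phi0 + 180)| / (P(phi0) + P(phi0 + 180)) reduces to a tanh.
    """
    phi0 = np.deg2rad(phi0_deg)
    return float(abs(np.tanh(a * np.cos(phi0) + b * np.sin(phi0))))

phase, m = centroid_phase_and_fom(0.0, 20.0, 0.0, 0.0)
```

A sharp unimodal distribution gives m close to 1, while a flat distribution (all coefficients zero) gives m close to 0.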

Note that the equation on which the isomorphous replacement phasing described above is based is equation 4 of Hendrickson (1970). This formula supposes that there is no imaginary component of the heavy atom scattering. A more appropriate formulation can readily be derived from equations 7 and 8 of Hendrickson (1979). Thus, if, as is done here, one defines

(25)

then (26)

and

(27)

This compares with Equation 4 of Hendrickson (1970),

(28)

by inclusion of the term. This will ordinarily be small in comparison with other terms. However, it can readily be included. It only modifies two definitions given above for A(ISO) and B(ISO), equations 6 and 7, thus,

(29)

(30)

In the case that the native structure includes anomalous scatterers

(31)

but and still both refer only to the structure factor contributions from the real and imaginary components of the scattering from the replacing heavy atoms and not at all to the scattering from anomalous scatterers in the native structure.

As written, the probability phasing option does not treat the case in which only or has been measured. In such cases, even approximately, new phasing coefficients based on

(32)

or (33)

using equation 6 or 7 of Hendrickson (1979) for and respectively, would have to be derived. Note that the formulae used in the probability phasing option for anomalous scattering information needs to be interpreted cautiously in the case that the native structure contains anomalous scatterers. The relationships will be correct as given provided that the following definitions are close to the truth.

The summation for each of the following is from i = 1 to the number of jth derivatives only.

(34)

(35)

The summation for each of the following equations is from i = 1 to the number of jth derivatives + anomalous scatterers of native.

(36)

(37)

Then (38)

(39)

Note further that Lynn Ten Eyck stated in a private communication that he chose to use the evaluation of

used in (17) in order to give a "worst case" estimate for the error. If this is the goal, Hendrickson in a second communication asserts that the appropriate value, at cos(φ-ϕ) = 1.0 gives

(40)

In as much as is usually less than , the added term, , will usually dominate . Notice that the expected value without phase knowledge is

(41)

and that the present program has

(42)

MIR Phasing By The MVFC Method

MIR phasing by the MVFC method requires the simultaneous least-squares refinement of both the heavy atom parameters of the derivatives and of the native protein phases. The method, due to Sygusch (1977), takes into account the experimental error in the measurement of intensities. It can include the implicit correlation existing among protein phases and heavy atom parameters. It is free of bias since all weights are based on experimental error in the measurement of the intensities. The weights are used for the refinement of both the heavy atom parameters and the native protein phases. These weights allow for both the error in the native structure amplitudes and the error in the derivative structure amplitudes. The MVFC method does not require a probability distribution to determine the phase of the reflections of the native protein. The method yields the expected values of the cosine and sine of the phase directly, ready for use in the calculating of the electron density of the native protein. A figure of merit is obtained which takes into account not only the error in the phases but also the error in the native protein structure amplitudes.

The minimum variance Fourier coefficient method of phase refinement for optimum phase determination requires a good estimate of the variance (error) of each measured reflection. This variance estimate is essential for the success of the phase refinement. The factors that contribute to the variance are counting statistics, short term fluctuations in the x-ray source, absorption correction errors, crystal decomposition errors, and inter-crystal scaling errors. The MVFC method, as programmed in the MIR program, will allow one to begin the phasing procedure without any prior knowledge of the phases. Special care must be taken for weak reflections (< 3σF), which are biased by the square root transformation of |F|² (i.e. the mean of |F| is not equal to the square root of the mean of |F|²). Unless the weak reflections are free from this bias or have been corrected for it, they should not be used during the refinement. This is because the error of the bias is of the order of, or greater than, the heavy atom signal which is being estimated.

It is essential that MVFC phase refinement be carried out concurrently with the heavy atom parameter refinement. Occupancy and thermal motion parameters of each and every heavy atom site should be refined simultaneously. If these conditions are not observed refinement to the lowest minimum will be slow. Anisotropic thermal motion parameter refinement of the heavy atoms is recommended.

In the case of heavy atom derivatives which share common sites, the usual matrix of normal equations must be modified to take into account the correlation among the derivatives. This is accomplished by forming the additional off-diagonal matrix blocks among the corresponding derivatives. It is recommended that the full normal equations matrix be used throughout the refinement. This will speed convergence and automatically avoid certain difficulties in refinement, for example the case in which one derivative unduly dominates the refinement.

The MVFC method may be used to assess the quality of any data set included in or deleted from the overall refinement. The weighted mean squared error, referred to as the "goodness of fit" (GOF), is the quotient of the least squares sum and the difference between the total number of observations and the number of parameters refined (including phases). The GOF will increase if a poorer heavy atom data set is added, and it will decrease if the heavy atom derivative data set added is better than any of the currently included heavy atom derivatives. This relationship will also hold true if a heavy atom derivative data set is taken out of the refinement. It should be noted that the GOF always pertains to the previous cycle of refinement. In the presence of systematic error the GOF is greater than its expected value of 1.0. If the GOF is less than 1.0 then the weights that are being used in the refinement are either inaccurate or need to be adjusted by a scale factor.
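The goodness of fit described above can be sketched in a few lines. The function name and argument layout are hypothetical, and the residuals and weights are whatever the least-squares sum is built from; this is an illustration of the quotient, not the program's code.

```python
def goodness_of_fit(residuals, weights, n_parameters):
    """Weighted least-squares sum divided by (observations - parameters).

    n_parameters should count every refined quantity, including phases.
    """
    n_observations = len(residuals)
    degrees_of_freedom = n_observations - n_parameters
    if degrees_of_freedom <= 0:
        raise ValueError("more parameters than observations")
    ls_sum = sum(w * r * r for r, w in zip(residuals, weights))
    return ls_sum / degrees_of_freedom

# Made-up residuals and weights for four observations, two parameters.
gof = goodness_of_fit([1.0, -1.0, 2.0, 0.5], [1.0, 1.0, 0.25, 4.0], 2)
```

In this toy case the result is 2.0, which under the interpretation given above would point to systematic error or to weights that are too large.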

Bricogne Method Of Refinement

This method is not fully implemented in this version. Bricogne has suggested an improved mathematical formulation for the refinement of phases in the MIR procedure. This formulation is incorporated in the MIR program and should overcome the most serious pitfall of the classical method; that is, the dominance of good derivatives over poorer ones. This enhancement does not require any increase in the number of parameters to be refined.

The conventional approach to parameter refinement from acentric reflections was originally conceived as a straightforward adaptation of the least-squares method previously used for centric data: the "most probable" or the "best" estimates of the phases, as defined by Blow and Crick (1959), were simply made to play a role analogous to that of the signs of centric reflections. Blow and Matthews (1973) found this method to have poor convergence properties unless it was ensured that the acentric phase estimates used in the refinement were independent of the parameters which were being refined. In reaction to these difficulties, a more conservative scheme was devised, under the name of "F_HLE method", in which the use of acentric phase estimates was avoided altogether. A common feature of both amendments was that the phase information used in the refinement had to be purposely impoverished, in order to make it truly independent of the parameters to be refined and thus avoid bias. Sygusch (1977) recognized that these restrictions could be lifted if the acentric phases were no longer deemed to be "estimates", but were instead treated as extra parameters and refined along with the others. The enormous increase in the number of variables dictated the use of a diagonal approximation. Introducing this approximation muted the original purpose of accommodating the correlations among phases and parameters.

A further analysis of the problem leads to a solution which overcomes the aforementioned difficulties without causing any increase in the number of variables. The main idea is that structure factor estimates for acentric reflections are implicit functions of the parameters which are being refined. This dependence is expressed analytically by means of the implicit function theorem, and is shown to result (via the chain rule) in a correction to the partial derivatives from which the normal equations are to be formed. The normal matrix, previously block-diagonal, is now full, as all parameters interact even if they belong to distinct isomorphous compounds.

In this way, all sources of bias are removed without discarding any of the available phase information, and no further parameters need to be refined because the native structure factor estimates have been eliminated. The occurrence of a full normal matrix also allows a unified treatment of situations which previously disturbed the block-diagonal structure of the standard normal equations, namely: non-crystallographic symmetry in the heavy-atom constellation, common sites shared by several derivatives, and anomalous scatterers in the native structure.

This formulation is equally applicable to cylindrically averaged data obtained from fibers or oriented gels, where the problem of bias in the conventional approach would have been even more formidable.

In the classical treatment, parameters describing the isomorphous substitution are refined by least-squares so as to minimize the total lack of closure of the phase triangles at the "most probable" phases. In the derivation of the normal equations, these most probable phases are treated as constants, so that the normal matrix is block-diagonal. In the Bricogne method the most probable phases are treated as quantities defined implicitly by the values of all the isomorphous derivative parameters. This implicit dependence is taken into account in the calculation of the partial derivatives of the total lack of closure, and this leads to a correction to the normal matrix. This correction reflects the interaction among all parameters (even if they belong to distinct isomorphous compounds). Since in this treatment the normal matrix is now full, instead of block-diagonal, a considerable increase in the speed of convergence should result. The correction takes into account the effect of the parameter shifts on the redefinition of the phases.

The formulation of the refinement is enhanced further, in the sense that the Blow and Crick approximation (that the native amplitude is error-free) is abandoned. In this method the implicit dependence of the two coordinates of a "most probable native structure factor", not just a "most probable phase", is decoded and incorporated into the correction. This correction takes into account the implicit dependence of the coordinates on all the refined parameters.

The most probable native structure factor heavy-atom contribution, most probably structure factor for jth compound.

= 1/()

All the local quantities are understood to be subscripted hkl. The global residual is zero where the local residuals are:

The normal equations for the minimization of E are then

/∂p ∂/∂q)]hkl} δp= Σhkli(( - ) ∂/∂q]hkl

The normal matrix is block-diagonal, since no term in the residual depends simultaneously on parameters belonging to two distinct isomorphous compounds (excluding the case of common sites).

For each hkl, the coordinates x(hkl) and y(hkl) are those of the "most probable native structure factors", i.e. of the point where the local residual is minimum. In the standard formulation, they are treated as constants when partial derivatives are evaluated. However, as a result of their definition, they are in reality implicit functions of all the isomorph parameters p, q, ... . Any alteration of a parameter belonging to a particular isomorph will induce a readjustment of x and y for all reflections, which in turn will modify the residuals of all the other isomorphs. In other words, all the parameters interact through their joint contribution to the definition of the most probable native structure factors from which the residuals are evaluated.

This effect can be described analytically, by first using the implicit function theorem to decode the dependence of x and y on the parameters p, q, ...; and then by using the chain rule to evaluate the corrections to the partial derivatives which result from this dependence.

For this purpose, some extra notation will be useful.

Let G = G(x,y; p,q,...) be any function of x,y,p,q,.... . Suppose that x and y are implicitly defined as functions of p,q,.... by the condition that a function E(x,y; p,q,....) be minimal with respect to x and y, i.e.:

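$$\frac{\partial E}{\partial x} = 0 \qquad (43)$$

$$\frac{\partial E}{\partial y} = 0 \qquad (44)$$

$$\begin{pmatrix} \partial^2 E/\partial x^2 & \partial^2 E/\partial x\,\partial y \\ \partial^2 E/\partial y\,\partial x & \partial^2 E/\partial y^2 \end{pmatrix} \ \text{positive definite} \qquad (45)$$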

Let G*(p,q,...) denote the function obtained from G when x and y are thus defined in terms of p,q,...:

G*(p,q,...) = G(x(p,q,...),y(p,q,...); p,q,....) (46)

The problem is to evaluate the partial derivative ∂G*/∂p.

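Differentiation of the stationarity conditions ∂E/∂x = 0 and ∂E/∂y = 0 with respect to p by the chain rule gives:

$$\frac{\partial^2 E}{\partial x^2}\frac{\partial x}{\partial p} + \frac{\partial^2 E}{\partial x\,\partial y}\frac{\partial y}{\partial p} + \frac{\partial^2 E}{\partial x\,\partial p} = 0 \qquad (47)$$

$$\frac{\partial^2 E}{\partial y\,\partial x}\frac{\partial x}{\partial p} + \frac{\partial^2 E}{\partial y^2}\frac{\partial y}{\partial p} + \frac{\partial^2 E}{\partial y\,\partial p} = 0 \qquad (48)$$

This system of two linear equations, in the two unknowns ∂x/∂p and ∂y/∂p, can always be solved, since the determinant is always positive by the positive-definiteness condition on the second derivatives of E. Its solution gives the desired local dependence of x and y on the isomorph parameters:

$$\begin{pmatrix} \partial x/\partial p \\ \partial y/\partial p \end{pmatrix} = \begin{pmatrix} \partial^2 E/\partial x^2 & \partial^2 E/\partial x\,\partial y \\ \partial^2 E/\partial y\,\partial x & \partial^2 E/\partial y^2 \end{pmatrix}^{-1} \begin{pmatrix} -\partial^2 E/\partial x\,\partial p \\ -\partial^2 E/\partial y\,\partial p \end{pmatrix} \qquad (49)$$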

Now the chain rule, applied to G*, gives:

∂G*/∂p = ∂G/∂p + ∂G/∂x ∂x/∂p + ∂G/∂y ∂y/∂p (50)

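The function G of interest for the least-squares refinement is the local residual $r_i$. The final result is that the partial derivatives $\partial r_i/\partial p$ in the normal equations should be replaced by the $\partial r_i^*/\partial p$, whose full expression is:

$$\frac{\partial r_i^*}{\partial p} = \frac{\partial r_i}{\partial p} - \left\{ \frac{\partial^2 E}{\partial x^2}\frac{\partial^2 E}{\partial y^2} - \left(\frac{\partial^2 E}{\partial x\,\partial y}\right)^2 \right\}^{-1} Z \qquad (51)$$

where

$$Z = \begin{pmatrix} \partial r_i/\partial x \\ \partial r_i/\partial y \end{pmatrix}^{T} \begin{pmatrix} \partial^2 E/\partial y^2 & -\partial^2 E/\partial x\,\partial y \\ -\partial^2 E/\partial y\,\partial x & \partial^2 E/\partial x^2 \end{pmatrix} \begin{pmatrix} \partial^2 E/\partial x\,\partial p \\ \partial^2 E/\partial y\,\partial p \end{pmatrix}$$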
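The corrected derivative of equations (49) to (51) can be checked on a toy problem. The sketch below assumes NumPy and uses an invented quadratic E(x, y; p) and an invented function G; neither is the crystallographic residual. It only demonstrates that the formula reproduces a finite-difference derivative of G*.

```python
import numpy as np

def corrected_derivative(g_p, g_x, g_y, e_xx, e_xy, e_yy, e_xp, e_yp):
    """dG*/dp following equation (51): explicit term minus Z / det(Hessian)."""
    det = e_xx * e_yy - e_xy**2
    adjugate = np.array([[e_yy, -e_xy], [-e_xy, e_xx]])
    z = np.array([g_x, g_y]) @ adjugate @ np.array([e_xp, e_yp])
    return g_p - z / det

# Toy model (hypothetical): E = x^2 + x*y + y^2 - p*x - 2*p^2*y.
# Its Hessian in (x, y) is [[2, 1], [1, 2]], which is positive definite.
HESSIAN = np.array([[2.0, 1.0], [1.0, 2.0]])

def minimiser(p):
    """Solve dE/dx = 0 and dE/dy = 0 for x and y at a given p."""
    return np.linalg.solve(HESSIAN, np.array([p, 2.0 * p**2]))

def g(x, y, p):
    """Toy function of interest: G = x*y + p*x."""
    return x * y + p * x

def g_star(p):
    x, y = minimiser(p)
    return g(x, y, p)

p0 = 0.7
x0, y0 = minimiser(p0)
analytic = corrected_derivative(
    g_p=x0, g_x=y0 + p0, g_y=x0,          # partial derivatives of G
    e_xx=2.0, e_xy=1.0, e_yy=2.0,         # second derivatives of E in x, y
    e_xp=-1.0, e_yp=-4.0 * p0,            # mixed derivatives of E with p
)
h = 1e-6
numeric = (g_star(p0 + h) - g_star(p0 - h)) / (2.0 * h)
```

Treating x and y as constants would give only the explicit term `g_p`; the difference between that and `analytic` is the correction that couples the parameters through the native structure factor estimate.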

MIR Statistics

There are three types of lies: lies, damned lies and statistics.

-attributed to Benjamin Disraeli

The following description includes the analytical formulation of all the statistics collected by the program.

Definition of Terms

ANOCALC calculated anomalous difference on F**2: FPHCALC(+)**2 - FPHCALC(-)**2

ANOCALC (on F) calculated anomalous difference on F: FPHCALC(+) - FPHCALC(-)

ANOOBS observed anomalous difference on F**2: FPHOBS(+)**2 - FPHOBS(-)**2

ANOOBS (on F) observed anomalous difference on F: FPHOBS(+) - FPHOBS(-)

ANOM anomalous amplitude of heavy atom model

E normalized structure factor

FH amplitude of heavy atom model

FP amplitude of parent

FPHCALC amplitude associated with vector sum of vector(FP) (at either best or most probable phase) and vector(FH)

FPHOBS observed amplitude of isomorph

M figure of merit

N number of reflections

V volume of unit cell

W weight = 1/variance()

Statistics Over the Entire Data Set

The phase differences between (or within) cycles, in degrees, give a check on the convergence of the refinement in addition to that provided by the parameter shifts. This is especially important in FAZPRB mode, as refinement of the parameters is often a slow and oscillatory process.

Change in best phase between cycles is defined as:

Σ [|phase(best,this cycle) - phase(best,last cycle)|] / N

where N in this case is defined as the total number of acentric reflections.
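The mean change in best phase can be sketched as follows, assuming NumPy. The `wrap` option, which maps each difference into the range -180 to 180 degrees before taking the absolute value, is an assumption added here; the formula above does not say whether the program treats 350° and 5° as 15° or 345° apart.

```python
import numpy as np

def mean_phase_change(phases_now_deg, phases_prev_deg, wrap=True):
    """Mean absolute phase change in degrees over acentric reflections."""
    diff = (np.asarray(phases_now_deg, dtype=float)
            - np.asarray(phases_prev_deg, dtype=float))
    if wrap:
        # Assumption: use the shorter way round the circle.
        diff = (diff + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(diff)))

# Made-up best phases for three acentric reflections in two cycles.
this_cycle = [10.0, 350.0, 180.0]
last_cycle = [20.0, 5.0, 170.0]
change_wrapped = mean_phase_change(this_cycle, last_cycle)
change_raw = mean_phase_change(this_cycle, last_cycle, wrap=False)
```

The same function applies to the most probable phase, or to the difference between best and most probable phases within a cycle, by passing the corresponding arrays.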

The number of changes in centric reflections and the total number of centric reflections give one a feel for the quality of the centric data as there should be very few centric phases changing signs.

In FAZPRB mode one also obtains the change in the most probable phase between cycles and the difference between best and most probable phases within a cycle. These are defined analogously to the change in the best phase above. All three quantities should decrease as the refinement proceeds. However, the difference between cycles of the most probable phase will in most cases be worse than that of the best phase. The most probable angle is defined as the angle during the numerical integration which has the maximum probability. Thus when the most probable phase changes value the effect will be a quantum leap.

The mean square error in electron density was originally defined by Dickerson et al. (1961) as:

Note, however, that if the reflection multiplicity is known the above simplifies to:

(reflection multiplicity)(1-m)2]

The figure of merit histogram is obtained by dividing the figure of merit by 0.1 and adding 1 to the appropriate category. In traditional Blow-Crick type phasing the figure of merit measures nothing more than the unimodality and sharpness of the phase probability distribution. There have been cases where higher figures of merit were obtained from wrong models than correct ones, but, to my knowledge none of those have ever been reported in the literature. Nevertheless, results of Sigler and coworkers on eukaryotic initiator tRNA (Schevitz et al., 1979) suggest that when there is truth in the heavy atom model there is a strong correlation between a high figure of merit and the "goodness" of the parent phase.
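The histogram bookkeeping described at the start of the paragraph above amounts to integer division of the figure of merit by the category width. A minimal sketch follows; the function name is hypothetical, and placing m = 1.0 in the top category is an assumption, since the text does not say how that edge case is handled.

```python
def fom_histogram(figures_of_merit, width=0.1):
    """Count figures of merit in categories of the given width."""
    n_categories = int(round(1.0 / width))
    counts = [0] * n_categories
    for m in figures_of_merit:
        # Assumption: m == 1.0 is placed in the last category.
        index = min(int(m / width), n_categories - 1)
        counts[index] += 1
    return counts

counts = fom_histogram([0.05, 0.15, 0.55, 0.57, 0.95, 1.0])
```

Values that sit exactly on a category boundary are sensitive to floating-point rounding in the division, so the example avoids them.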

When anomalous scattering data are used, a count is made of the number of reflections for which the signs of the observed and calculated anomalous differences are the same or different. The more signs that agree, the greater the confidence one has in both the data and the model.

Statistics Divided into Categories

The remaining statistics, with the exception of the agreement of the signs of the calculated and observed anomalous differences, are calculated by breaking each statistic down into categories based on the sinθ/λ range and a bin based on intensity or, more accurately, a quasi-normalized structure factor (see below). These two methods provide for a relatively even number of reflections in each grouping so that each grouping or bin is of approximately equal reliability. This is important in FAZPRB mode as the closure errors need to be updated on every cycle. Blow and Crick (1959) showed that, at least with hemoglobin data, there was a dependence of the closure error on the scattering angle, and this dependence has been observed consistently since. In addition, as described elsewhere in this manual, the Hendrickson and Lattman (1970) formulation of Blow-Crick type phasing subsumes a term containing an amplitude into the closure error, thus making the closure error dependent on this quantity. The adjustment of closure errors to account for observational errors (variances) of Ten Eyck and Arnone (1976) should, in principle, account for most or all of this dependence. Nevertheless, it was thought crucial to allow observation of the closure errors as dependent upon both the scattering angle and a function of the amplitude.

Procedure for Bins on Quasi-Normalized Structure Factors

The following explanation of the calculation of bins based on quasi-normalized structure factors is due to W. A. Hendrickson (personal communication). Normalized structure factors can be made by using a K-curve or Wilson plot, but a rough-and-ready value can be found by normalizing in shells. Thus

From Howells et al. (1950) we know that the fraction of reflections with a normalized intensity below a specified value can be determined from the distribution. If N(z) is the fraction of reflections with z' <= z and z' = E**2 = I/<I>, then:

acentric N(z) = 1 - exp(-z)

and centric N(z) = erf(sqrt(z/2))

where erf is the "error function." It is difficult to find the inverse function for the centric case, but if we are satisfied to base our categories on the acentric distribution then z = -ln(1 - N(z)).

Thus in practice the average intensity is calculated for each sinθ/λ range. The maximum normalized structure factor for each bin is calculated by the above equation and the largest value is arbitrarily set to 100. The normalized structure factor is then tested against the maximum for each bin until it is found to be smaller than one of them.
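The binning procedure can be sketched using the inverse of the acentric distribution. The function names are hypothetical, and the sketch assumes that the comparison is made on the normalized intensity z = I/<I> for the shell; if the program compares E rather than z, the limits would be the square roots of those computed here.

```python
import math

def bin_limits(n_bins, top=100.0):
    """Upper limits of z giving equally populated bins for acentric data.

    Uses z = -ln(1 - N(z)) at N = 1/n, 2/n, ...; the last limit is set
    arbitrarily to a large value, as described in the text.
    """
    limits = [-math.log(1.0 - k / n_bins) for k in range(1, n_bins)]
    limits.append(top)
    return limits

def assign_bin(z, limits):
    """Test z against each bin maximum until it is found to be smaller."""
    for index, upper in enumerate(limits):
        if z < upper:
            return index
    return len(limits) - 1  # anything beyond the top limit goes in the last bin

limits = bin_limits(4)
bins = [assign_bin(z, limits) for z in (0.1, 0.5, 1.0, 5.0)]
```

For four bins the second limit is ln 2, so half of the acentric reflections in a shell are expected to fall in the first two bins.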

Statistics Calculated for MVFC and FAZPRB

The average figure of merit is defined as: Σ(m) / N

RCULLIS (Cullis et al., 1961) is defined as:

Σ | |FPHOBS +/- FP| - FH | / Σ | FPHOBS - FP |

The problem of "cross-overs" needs to be prevented (see Blundell and Johnson, 1976, p338) from falsely inflating the statistic.

There are six cases for the Cullis R factor and their mirror images.

Case 1 FP FH FH FP

FH ----> --------->----> -or- <----<---------

FP ---------> --------------> -or- <--------------

FPH --------------> FPH FPH

Case 2 FP FH FH FP

FP ----> ---->---------> -or- <---------<----

FH ---------> --------------> -or- <--------------

FPH --------------> FPH FPH

Result: ABS(FPH-FP) yields FH in both cases 1 and 2

Case 3 FP FP

FH ----> --------------> -or- <--------------

FPH ---------> ---------><---- -or- ----><---------

FP --------------> FPH FH FH FPH

Case 4 FP FP

FPH ----> --------------> -or- <--------------

FH ---------> ----><--------- -or- ---------><----

FP --------------> FPH FH FH FPH

Result: ABS(FPH-FP) yields FH in both cases 3 and 4

N.B. that in case 4 FH > FPH, but crossover has not occurred.

Case 5 FPH FP FP FPH

FP ----> ---------><---- -or- ----><---------

FPH ---------> --------------> -or- <--------------

FH --------------> FH FH

Case 6 FPH FP FP FPH

FPH ----> ----><--------- -or- ---------><----

FP ---------> --------------> -or- <--------------

FH --------------> FH FH

Result: ABS(FPH+FP) yields FH in both cases 5 and 6

Therefore in cases 5 and 6 we have crossover.

There is an alternative way to determine RCULLIS. One could calculate FPA*FPHCALCA + FPB*FPHCALCB, where FPA and FPHCALCA are the real parts of FP and FPHCALC, respectively, and FPB and FPHCALCB are the imaginary parts of FP and FPHCALC, respectively. If this sum is positive then FP and FPH are pointed in the same direction and crossover has not occurred. If it is negative then FP and FPH are pointed in opposite directions and crossover has occurred. One then uses the SIGN function to transfer the sign of this sum to 1.0, and the result is multiplied by FP. Symbolically:

FP*SIGN(1.0,(FPA*FPHCALCA + FPB*FPHCALCB))
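The crossover test can be sketched in Python, with `math.copysign` standing in for the Fortran SIGN intrinsic. The function names and the tuple layout are hypothetical. The R value follows the RCULLIS expression as printed above (signed FP in the numerator, plain difference in the denominator), which is an assumption about the intended formula.

```python
import math

def signed_fp(fp_amp, fp_phase_deg, fph_calc):
    """FP carrying the sign of the dot product of FP and FPHCALC."""
    phase = math.radians(fp_phase_deg)
    fpa, fpb = fp_amp * math.cos(phase), fp_amp * math.sin(phase)
    dot = fpa * fph_calc.real + fpb * fph_calc.imag
    return math.copysign(fp_amp, dot)   # negative when crossover has occurred

def lack_of_closure(fph_obs, fp_amp, fp_phase_deg, fh):
    """| |FPHOBS -/+ FP| - |FH| | with the sign chosen by the crossover test."""
    phase = math.radians(fp_phase_deg)
    fp = complex(fp_amp * math.cos(phase), fp_amp * math.sin(phase))
    fph_calc = fp + fh                  # vector sum of FP and FH
    return abs(abs(fph_obs - signed_fp(fp_amp, fp_phase_deg, fph_calc)) - abs(fh))

def r_cullis(reflections):
    """reflections: iterable of (FPHOBS, |FP|, FP phase in degrees, complex FH)."""
    numerator = sum(lack_of_closure(*refl) for refl in reflections)
    denominator = sum(abs(refl[0] - refl[1]) for refl in reflections)
    return numerator / denominator

# Centric examples with made-up amplitudes.
no_crossover = lack_of_closure(120.0, 100.0, 0.0, complex(20.0, 0.0))
crossover = lack_of_closure(20.0, 10.0, 0.0, complex(-30.0, 0.0))
r_value = r_cullis([(125.0, 100.0, 0.0, complex(20.0, 0.0))])
```

In the second example FH is larger than FP and opposed to it, so FPH points the other way from FP; the signed FP turns the difference into a sum and the phase triangle closes exactly.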

RKRAUT (Kraut et al., 1962) is defined as:

Σ | FPHOBS - FPHCALC | / Σ FPHOBS
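As a small illustration of the RKRAUT expression above, the following sketch sums over parallel lists of observed and calculated derivative amplitudes. The function name and the numbers are invented.

```python
def r_kraut(fph_obs, fph_calc):
    """Sum of |FPHOBS - FPHCALC| divided by the sum of FPHOBS."""
    numerator = sum(abs(obs - calc) for obs, calc in zip(fph_obs, fph_calc))
    denominator = sum(fph_obs)
    return numerator / denominator

r_value_kraut = r_kraut([100.0, 50.0], [90.0, 55.0])
```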

RFAZPOWER is the weighted closure error divided by the weighted heavy atom model amplitude. It is identical to the R factor, RMODULUS, in the MIR phasing program written by Rossmann and coworkers (Adams et al., 1969) and is defined as:

Σ (FPHOBS - FPHCALC) / ΣFH

Note that as this value approaches or becomes greater than 1.0, it becomes impossible for the structure factor of the heavy atom model to close the phase triangle for most reflections. This condition is referred to as the loss of "phasing power."

Phase of parent minus phase of heavy atom model is a statistic suggested by Dodson (1976) which is sensitive to problems with scaling. The expected value is 90°, that is the phases should be entirely uncorrelated. It is defined as:

The rms weighted isomorphous closure error is based on F rather than F**2 and is defined as follows:

The rms weighted anomalous closure error is based on F rather than F**2 and is defined as follows (note that if the parent has anomalous scattering it is also calculated for the parent):

Statistics Calculated in the FAZPRB Mode Only

The parent has the following values calculated for it. The rms parent variance is defined as:

When anomalous scattering from the parent is included the following further statistics are provided. The rms anomalous closure error is defined as (Hendrickson, 1979):

The rms anomalous closure error adjusted for variances is defined as (see Ten Eyck and Arnone (1976) for a derivation of this value with respect to F rather than F**2):

Note that negative terms are not summed.

The rms anomalous differences are defined as:


For each isomorph other than the parent the following values are calculated.

The rms closure error is defined as (Hendrickson and Lattman, 1970):

The rms closure error adjusted for variances is defined as (see Ten Eyck and Arnone (1976) for a derivation with respect to F rather than F**2):

Note that negative terms are not summed.

The rms isomorphous differences are defined as:

The rms joint variance is defined as (see the program described by Adams et al. (1969) for a version of this statistic based on σFP and σFPHOBS):

The rms heavy atom model contribution is defined as:

The rms anomalous closure error is defined identically to that of the parent above.

The rms anomalous closure error adjusted for variances is defined analogously to that of the parent above:

Note that negative terms are not summed.

The rms anomalous differences are defined identically to the parent above.

Update Of Closure Errors

The closure errors are updated on the following basis:

1. Closure errors are only updated when new ones have been calculated, i.e. there were observations in a particular bin.

2. Adjusted closure errors, as defined by Ten Eyck and Arnone (1976), are used.

3. Closure errors used for isomorphous are from centric zones, if available.

4. Otherwise isomorphous closure errors are updated from the total reflections (which in this case must be acentric) if there were any measurements.

The above procedure may not be wise for the following reasons:

1. One or just a few centric reflections may determine the closure error for a bin.

2. Acentric reflections, properly weighted, might provide a better estimate when there is little or no centric data. But there is currently no mechanism for providing "proper weights." This probably could be done by the user on the fazprb line. However, some experience might lead us to a better default weight than 1.0.
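The four update rules listed under Update Of Closure Errors can be sketched as follows. This is an illustration of the selection logic only, not MIR's code: the per-bin data layout (a count of observations paired with an adjusted closure error) and all names are assumptions made for the example.

```python
def update_closure_errors(current, centric, total):
    """Return updated isomorphous closure errors, one per resolution bin.

    current: closure errors carried over from the previous cycle.
    centric, total: hypothetical per-bin (n_observations, adjusted_error)
    pairs for the centric zones and for all reflections.
    """
    updated = list(current)
    for i, ((n_cen, e_cen), (n_tot, e_tot)) in enumerate(zip(centric, total)):
        if n_cen > 0:
            # Rule 3: prefer the centric estimate when one is available.
            updated[i] = e_cen
        elif n_tot > 0:
            # Rule 4: otherwise fall back on the total (acentric) reflections.
            updated[i] = e_tot
        # Rule 1: with no observations in the bin, keep the old value.
    return updated


old = [5.0, 6.0, 7.0]
centric = [(3, 4.5), (0, 0.0), (0, 0.0)]
total = [(10, 4.8), (8, 6.3), (0, 0.0)]
print(update_closure_errors(old, centric, total))
```

In this sketch a bin with a single centric reflection still overrides the estimate from all reflections, which is exactly the first weakness noted above.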

Using MIR to Refine Phases From Another Source

MIR can be used to refine phases derived from another source. For example:

1. Do an FC calculation

2. Run SIMWGT to put the phase information into the output bdf as phase set 2

3. Run MIR using the fazin line

fazin init 2 comb 2 (see Examples below)

The init signal causes the phases to be tied to phase set 2 before the first cycle. The comb signal causes the phases generated during the refinement with the heavy-atom derivative to be "tethered" to phase set 2. To use the fazin feature, the input bdf must have phase information present in items 700-704 or 705-709 etc of lrrefl:, corresponding to phase sets previously determined. At the present time MIR writes phases in items 700-704 (Phase Set 1).

The new field in the mvfc line allows the return to an ab initio phase calculation by MIR. Previously, once phase set 1 was in place it was always used on subsequent runs. This function is independent of fazin. Note that fazin makes no sense in the absence of a previous phase set.

File Assignments

Reads reflection data from the input archive bdf

Writes the reflection data to the output archive bdf

Examples

MIR

datset cheynat 48. 0.

datset cheyuo2 42.7 4.

noref all u pp ppa

mvfc *7 newphases

cycle 5

In this example, an initial set of phases will be determined by the Blow-Crick algorithm included in the MVFC subroutines, whether or not phases exist on the bdf, followed by MVFC phasing. Because this is the first pass, overall scales and U's are being refined but individual U's and population parameters are not refined.

MIR

noref cheyno2 k c

noref all u

mvfc

cycle 5

In this example overall scales and U's are not refined while individual population parameters are refined.

MIR

datset cheynat

datset cheyno2

noref all pp ppa

fazprb

cycle 5

In this example individual atom U's are refined but not the population parameters. The FAZPRB (modified Blow-Crick) phasing method will be applied.

MIR

datset cheynat

datset cheyuo2

datscl cheynat 37.9 0.0

datscl cheyuo2 33.4 2.1

noref all ppa

fazin init 1 comb 2

mvfc

cycle 5

In this example, an initial set of phases exists as phase set 1 (items 700-704 in lrrefl:) which will be used as starting phases. This set of phases will be combined with phases from phase set 2 (items 705-709 in lrrefl:). The contribution from each phase set will be weighted by its figure-of-merit. Phase set 2 in this example could be a set of Fc phases from a partial model that has been run through SIMWGT to generate figures-of-merit of the phases based on agreement of Fc and Fo (see SIMWGT). The ratio between anomalous population parameter and real population parameter will not be refined.
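To give a feel for what weighting each phase set by its figure-of-merit means, the sketch below combines two phase estimates as a vector sum of unit phasors scaled by their figures-of-merit. This is a simplified illustration chosen for clarity; it is an assumption, not a description of how MIR actually combines phase sets, and the function name is hypothetical.

```python
import cmath
import math


def combine_phases(phi1_deg, fom1, phi2_deg, fom2):
    """Return a combined phase in degrees (0 to 360).

    Each phase contributes a vector of length equal to its
    figure-of-merit; the combined phase is the direction of the sum.
    """
    v1 = fom1 * cmath.exp(1j * math.radians(phi1_deg))
    v2 = fom2 * cmath.exp(1j * math.radians(phi2_deg))
    return math.degrees(cmath.phase(v1 + v2)) % 360.0


# Equal figures-of-merit: the result lies midway between the two phases.
print(combine_phases(0.0, 0.5, 90.0, 0.5))
# A reliable set 1 and a poor set 2: the result stays close to set 1.
print(combine_phases(0.0, 0.9, 90.0, 0.1))
```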

MIR

datset cheynat

datset cheyuo2

datscl cheynat 37.9 0.0

datscl cheyuo2 33.4 2.1

noref all ppa

fazin init 2 comb 2

mvfc

cycle 5

This example is the same as the previous one except that the Fc phase set and not a previous MIR phase set is used as the initial phase set to go into the MVFC phasing. The phases will be continually tethered to the Fc phases with appropriate weighting.

MIR

datset cheynat

datset cheyuo2

datscl cheynat 37.9 0.0

datscl cheyuo2 33.4 2.1

noref all ppa

fazin init 2

mvfc

cycle 5

In this example, the initial phases will be from the Sim weighted Fc (see SIMWGT). On subsequent cycles of refinement, the starting phases will no longer be remembered.

References

Adams, M.J., Haas, D.J., Jeffrey, B.A., McPherson, A., Jr., Mermall, H.L., Rossmann, M.G., Schevitz, R.W. and Wonacott, A.J. 1969. Low Resolution Study of Crystalline L-Lactate Dehydrogenase. J. Mol. Biol., 41, 159.

Alden, R.A., Bricogne, G., Freer, S.T., Hall, S.R., Hendrickson, W.A., Machin, P., Munn, R.J., Olsen, A.J., Reeke, G.N., Sheriff, S., Stewart, J.M., Sygusch, J., Ten Eyck, L.F. and Watenpaugh, K.D. 1983. Cooperative Programming in Crystallography. Comput. Chem., 7, 137-148.

Blow, D.M. and Crick, F.H.C. 1959. The Treatment of Errors in the Isomorphous Replacement Method. Acta Cryst., 12, 794.

Blow, D.M., and Matthews, B.W. 1973. Parameter Refinement in the Multiple Isomorphous-Replacement Method. Acta Cryst., A29, 56-62.

Blundell, T.L. and Johnson, L.N. 1976. Protein Crystallography. Academic Press: London.

Bricogne, G. 1982. in Computational Crystallography. Ed. D. Sayre. Oxford Press: New York. p223.

Cullis, A.F., Muirhead, H., Perutz, M.F., Rossmann, M.G. and North, A.C.T. 1961. The Structure of Haemoglobin. VIII. A Three-dimensional Fourier Synthesis at 5.5 Angstrom Resolution: Determination of the Phase Angles. Proc. Roy. Soc., A265, 15.

Dickerson, R.E., Kendrew, J.C. and Strandberg, B.E. 1961. The Crystal Structure of Myoglobin: Phase Determination to a Resolution of 2 Angstrom by the Method of Isomorphous Replacement. Acta Cryst., 14, 1188.

Dodson, E.J. 1976. A Comparison of Different Heavy Atom Refinement Procedures. Crystallographic Computing Techniques. Eds. F.R. Ahmed, K. Huml and B. Sedlacek, Munksgaard: Copenhagen, 259.

Hendrickson, W.A. 1971. Some Aids for Breaking the Phase Ambiguity in the Single Isomorphous Replacement Method. Acta Cryst., B27, 1474-1475.

Hendrickson, W.A. 1979. Phase Information from Anomalous Scattering Measurements. Acta Cryst., A35, 245.

Hendrickson, W.A. and Lattman, E.E. 1970. Representation of Phase Probability Distributions for Simplified Combination of Independent Phase Information. Acta Cryst., B26, 136.

Hendrickson, W.A., Love, W.E., and Karle, J. 1973. Crystal Structure Analysis of Sea Lamprey Hemoglobin at 2 Angstrom Resolution. J. Mol. Biol., 74, 331-361.

Howells, E.R., Phillips, D.C., and Rogers, D. 1950. The Probability Distribution of X-ray Intensities. II. Experimental Investigation and the X-ray Detection of Centres of Symmetry. Acta Cryst., 3, 210.

Kraut, J., Sieker, L.C., High, D. and Freer, S.T. 1962. Chymotrypsinogen: A Three-dimensional Fourier Synthesis at 5 Angstrom Resolution. Proc. Natl. Acad. Sci. USA, 48, 1417.

Matthews, B.W. 1966. The Extension of the Isomorphous Replacement Method to Include Anomalous Scattering Measurements. Acta Cryst., 20, 82-86.

North, A.C.T. 1965. The Combination of Isomorphous Replacement and Anomalous Scattering Data in Phase Determination of Non-centrosymmetric Reflections. Acta Cryst., 18, 212-216.

Schevitz, R.W., Podjarny, A.D., Krishnamachari, N., Hughes, J.J., Sigler, P.B. and Sussman, J.L. 1979. Crystal Structure of Eukaryotic Initiator tRNA. Nature, 278, 188.

Sygusch, J. 1977. Minimum-Variance Fourier Coefficients from the Isomorphous Replacement Method by Least-Squares Analysis. Acta Cryst., A33, 512-518.

Ten Eyck, L.F. and Arnone, A. 1976. Three-dimensional Fourier Synthesis of Human Deoxyhemoglobin at 2.5 Angstrom Resolution. J. Mol. Biol., 100, 3.