MIR: Multiple Isomorphous Refinement
Contact Author: Keith Watenpaugh, The Upjohn Company,
Physical and Analytical Chemistry, Kalamazoo, MI 49001, USA
MIR is a general multiple isomorphous replacement program developed as a
collaborative project of the National Resource for Computation in Chemistry
(Alden et al., 1983). MIR calculates reflection phases and refines heavy atom
parameters for macromolecular compounds by the method of multiple isomorphous
replacement. The reflection intensity data for a native protein and a series of
heavy atom isomorphous derivatives are used in the calculation. In addition, the
observed anomalous scattering of both the native and isomorphous substances may
be used in the phase determination and parameter refinement. The trial
coordinates of the heavy atoms of the derivatives must be supplied; these
coordinates may be refined by the program. The phases of the reflections of the
parent protein are estimated and may be refined. The phases may be estimated by
one of two methods. The first is a modified Blow-Crick procedure (Blow and
Crick, 1959), which is a probability method; it is here given the acronym
FAZPRB. The other is the minimum variance Fourier coefficient method of Sygusch
(1977), which is given the acronym MVFC. The phases of the native may then be
refined by adjusting the most probable phase, using a method developed by
Bricogne (1982). In this method the most probable structure factor of the native
is adjusted so as to minimize the sum of the closure errors. This minimization
is carried out by an application of Newton's method in the complex plane.
Acknowledgement Of Contributing Authors
The MIR program was written under the auspices of the National Resource for
Computation in Chemistry at the Lawrence Berkeley Laboratory. It was started as
a cooperative programming effort carried out in three workshop sessions by the
following authors (in alphabetical order). Publications that use the program
should cite Alden et al. (1983).
Richard Alden University of California at San Diego
Gerard Bricogne Cambridge University
Stephan Freer University of California at San Diego
Sydney Hall University of Western Australia
Wayne Hendrickson Naval Research Lab., Washington
Pella Machin SRC Computer Centre, Daresbury
Robert Munn University of Maryland
Arthur Olson NRCC
George Reeke Rockefeller University
Steven Sheriff Naval Research Lab., Washington
James Stewart University of Maryland
Jurgen Sygusch University of Sherbrooke
Lynn Ten Eyck University of North Carolina
Keith Watenpaugh University of Washington
The coordinates of the heavy atoms of the isomorphous derivatives may be
refined either by conventional block diagonal or by full matrix least squares.
In the block diagonal approximation the parameters of each isomorph are refined
independently. In the full matrix method all the parameters of all the
isomorphs are refined simultaneously. The implicit dependence of the phases on
these parameters is taken into account in the calculation of the partial
derivatives of the total
lack of closure. This accounting leads to a correction to the normal matrix
which reflects the interaction among all parameters, even if they belong to
distinct isomorphous compounds. This method allows for the joint role of all
isomorphs in the definition of the most probable structure factors of the
native. In essence the correction accounts for the effect of the parameter
shifts on the redefinition of the phases.
The ultimate result of the use of the MIR program is a subset of reflections
for the parent compound, each with an estimated phase. These phases with the
structure factor amplitudes may be processed to produce an electron density
map. The calculation requires that a standard bdf be prepared which has a
collection of reflections for which multiple observations are available. These
observations will usually be for the native protein, one or more isomorphous
derivatives and the anomalous reflections for both compounds. The trial
coordinates of the heavy atoms of the isomorphous derivatives must be known
from an analysis of their Patterson functions or some other source. The methods
of phasing and refinement are specified by means of the control lines shown
below. The kinds of statistics which may be printed are many and may be
specified by the user.
MIR Phasing By Probability Methods
The option in this program for MIR phasing by probability methods is based on
a slight revision of the treatment of phases introduced by Blow and Crick
(1959). In keeping with the general philosophy of this program to deal in F²
rather than in F, the residuals between observation and calculation are taken
in F² rather than in F. In particular, the closure error for isomorphous
replacement is defined as
$$ \varepsilon_{ISO}(\varphi) = |F_{PH}|^2 - |F_P + F_H|^2 \qquad (1) $$
as suggested by Hendrickson and Lattman (1970). This is used instead of the
conventional definition of Blow and Crick,
$$ \varepsilon'_{ISO}(\varphi) = |F_{PH}| - |F_P + F_H| \qquad (2) $$
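In the case of anomalous scattering information, the closure errors are defined
analogously in terms of F², as proposed by Hendrickson (1979),
$$ \varepsilon_{ANO}(\varphi) = \Delta_{PH} - \Delta_{PH}^{calc}(\varphi), \qquad \Delta_{PH} = |F_{PH}(+)|^2 - |F_{PH}(-)|^2 \qquad (3) $$
rather than in terms of F as given by North (1965) and Matthews (1966).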
Blow and Crick treat the closure errors in terms of a Gaussian distribution such that
$$ P(\varphi) = N \exp\{-\varepsilon^2(\varphi) / 2E^2\} \qquad (4) $$
where E² is the variance, or mean-square value, of the closure errors. When
closure errors are defined as in equations 1 and 3, the exponent in equation 4
can be reduced to a simplified form so that
$$ P(\varphi) = N' \exp\{A\cos\varphi + B\sin\varphi + C\cos 2\varphi + D\sin 2\varphi\} \qquad (5) $$
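where the phasing coefficients A, B, C and D incorporate all of the phase
information. This is the representation used in this program. Expressions for
the coefficients for isomorphous replacement are taken from equation 6 of
Hendrickson and Lattman (1970):
$$ A_{ISO,j} = -\frac{4\,(F_P^2 + F_{Hj}^2 - F_{PHj}^2)\,F_P\,a_{Hj}}{2E_{ISO,j}^2} \qquad (6) $$
$$ B_{ISO,j} = -\frac{4\,(F_P^2 + F_{Hj}^2 - F_{PHj}^2)\,F_P\,b_{Hj}}{2E_{ISO,j}^2} \qquad (7) $$
$$ C_{ISO,j} = -\frac{2\,F_P^2\,(a_{Hj}^2 - b_{Hj}^2)}{2E_{ISO,j}^2} \qquad (8) $$
$$ D_{ISO,j} = -\frac{4\,F_P^2\,a_{Hj}\,b_{Hj}}{2E_{ISO,j}^2} \qquad (9) $$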
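where a_Hj and b_Hj are the real and imaginary parts of the structure factor
calculated from the atomic parameters of the heavy atoms in the jth derivative,
and F_Hj² = a_Hj² + b_Hj². The coefficients for anomalous scattering derive
directly from equations 6 and 9 of Hendrickson (1979) and are as follows:
$$ A_{ANO,j} = -\frac{8\,u_j\,F_P\,b'_{PHj}}{2E_{ANO,j}^2} \qquad (10) $$
$$ B_{ANO,j} = \frac{8\,u_j\,F_P\,a'_{PHj}}{2E_{ANO,j}^2} \qquad (11) $$
$$ C_{ANO,j} = \frac{8\,F_P^2\,(a'^2_{PHj} - b'^2_{PHj})}{2E_{ANO,j}^2} \qquad (12) $$
$$ D_{ANO,j} = \frac{16\,F_P^2\,a'_{PHj}\,b'_{PHj}}{2E_{ANO,j}^2} \qquad (13) $$
where u_j = ΔPHj - 4(a'PHj bPHj - aPHj b'PHj), ΔPHj = (|F(+)|² - |F(-)|²)PHj,
aPHj and bPHj are the real and imaginary components of the heavy-atom scattering
due to the real parts of the scattering factors, and a'PHj and b'PHj are those
due to the imaginary parts of the scattering factors.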
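The coefficients (6) to (13) can be checked numerically: for every trial phase φ, -ε²(φ)/(2E²) should differ from A cos φ + B sin φ + C cos 2φ + D sin 2φ only by a φ-independent constant, which is absorbed into N'. The sketch below is a minimal Python/NumPy check, not MIR code, and all function names are hypothetical. It assumes that a'PHj + i b'PHj is the structure factor computed from the f'' components alone (so that it enters FPH as i(a' + i b')) and that the native contains no anomalous scatterers.

```python
import numpy as np

def iso_coefficients(fp, fph, a_h, b_h, e_iso):
    """Isomorphous coefficients, equations (6)-(9), for one derivative."""
    k = fp**2 + a_h**2 + b_h**2 - fph**2          # F_P^2 + F_H^2 - F_PH^2
    two_e2 = 2.0 * e_iso**2
    return (-4.0 * k * fp * a_h / two_e2,
            -4.0 * k * fp * b_h / two_e2,
            -2.0 * fp**2 * (a_h**2 - b_h**2) / two_e2,
            -4.0 * fp**2 * a_h * b_h / two_e2)

def ano_coefficients(fp, delta_ph, a, b, a2, b2, e_ano):
    """Anomalous coefficients, equations (10)-(13); a2, b2 stand for a'_PH, b'_PH."""
    u = delta_ph - 4.0 * (a2 * b - a * b2)        # observed minus phase-independent part
    two_e2 = 2.0 * e_ano**2
    return (-8.0 * u * fp * b2 / two_e2,
            8.0 * u * fp * a2 / two_e2,
            8.0 * fp**2 * (a2**2 - b2**2) / two_e2,
            16.0 * fp**2 * a2 * b2 / two_e2)

def eps_iso(phi, fp, fph, a_h, b_h):
    """Closure error (1) on F^2 as a function of the trial native phase phi."""
    return fph**2 - np.abs(fp * np.exp(1j * phi) + (a_h + 1j * b_h))**2

def eps_ano(phi, fp, delta_ph, a, b, a2, b2):
    """Closure error (3): observed minus calculated |F(+)|^2 - |F(-)|^2."""
    n = fp * np.exp(1j * phi) + (a + 1j * b)      # native + real-part heavy-atom scattering
    g = a2 + 1j * b2                               # structure factor of the f'' components
    delta_calc = np.abs(n + 1j * g)**2 - np.abs(n - 1j * g)**2
    return delta_ph - delta_calc

def exponent_spread(eps, coeffs, e, phi):
    """Range over phi of -eps^2/(2E^2) minus the form (5); ~0 when (5) holds."""
    A, B, C, D = coeffs
    form5 = A*np.cos(phi) + B*np.sin(phi) + C*np.cos(2*phi) + D*np.sin(2*phi)
    diff = -eps**2 / (2.0 * e**2) - form5
    return float(diff.max() - diff.min())

phi = np.linspace(0.0, 2.0 * np.pi, 73)
iso = iso_coefficients(100.0, 110.0, 20.0, -15.0, 300.0)
print(exponent_spread(eps_iso(phi, 100.0, 110.0, 20.0, -15.0), iso, 300.0, phi))
```

A spread at the level of floating-point round-off confirms that the quadratic exponent collapses exactly onto the four-term form (5).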
An essential problem to be addressed in the probability method is the
estimation of appropriate values for E_ISO and E_ANO. It is generally observed
that the E'² values for the Blow-Crick formulation are not very strongly
dependent on either scattering angle or intensity. However, since by comparison
of (1) and (2)
$$ \varepsilon_{ISO} \approx 2|F_{PH}|\,\varepsilon'_{ISO} \qquad (14) $$
it follows that the standard errors, E_ISO = ⟨ε²_ISO⟩^(1/2), for the treatment
based on F² will depend directly on intensity. One way to account for this
dependence is simply to tabulate E values in categories of intensity as well as
scattering angle (Hendrickson, Love and Karle, 1973), but this makes for a
rather coarse function. Another approach is to assume a Gaussian distribution
and relate the errors to the Blow-Crick formulation in order to take account of
the intensity dependence. This is the approach described by Blundell and Johnson
(1976) and generalized by Hendrickson (1979) to
$$ E_{ISO}^2 = 3E'^4 + 4\left(|F_{PH}|^2 + \sigma_{PH}^2\right)E'^2 \qquad (15) $$
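Equations (14) and (15) can be illustrated with a small Monte Carlo sketch (Python/NumPy, not part of MIR). Blow-Crick closure errors ε' are drawn from a Gaussian of standard deviation E', converted to F²-based closure errors through ε = FPH² - (FPH - ε')², and the sample mean of ε² is compared with equation (15). The sketch assumes error-free FPH (σPH = 0), and the amplitudes are arbitrary illustrative values.

```python
import numpy as np

def e_iso_sq(e_prime, fph, sigma_ph=0.0):
    """Equation (15): E_ISO^2 = 3E'^4 + 4(F_PH^2 + sigma_PH^2)E'^2."""
    return 3.0 * e_prime**4 + 4.0 * (fph**2 + sigma_ph**2) * e_prime**2

rng = np.random.default_rng(seed=0)
e_prime = 15.0
for fph in (50.0, 200.0, 800.0):
    eps_prime = rng.normal(0.0, e_prime, size=200_000)   # Blow-Crick errors on F
    eps_iso = fph**2 - (fph - eps_prime)**2              # the same errors expressed on F^2
    ratio = np.mean(eps_iso**2) / e_iso_sq(e_prime, fph)
    # the ratio stays near 1, while E_ISO grows roughly as 2*F_PH*E', as (14) predicts
    print(f"F_PH={fph:6.1f}  E_ISO={np.sqrt(e_iso_sq(e_prime, fph)):9.1f}  MC/formula={ratio:.3f}")
```

The printed E_ISO grows almost linearly with FPH even though E' is held fixed, which is the intensity dependence the following paragraphs set out to absorb.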
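The approach adopted in the MIR program is based on the innovation of Ten Eyck
and Arnone (1976) of dividing the closure error into a part accounted for by the
variances of the observations and a residual closure error, E_R. The residual
closure error accounts for the expected variance due to lack of isomorphism,
model incompleteness or other factors:
$$ E_{ISO}^2 = \left(\frac{\partial\varepsilon_{ISO}}{\partial|F_{PH}|^2}\right)^2\sigma^2(|F_{PH}|^2) + \left(\frac{\partial\varepsilon_{ISO}}{\partial|F_P|^2}\right)^2\sigma^2(|F_P|^2) + E_{R,ISO}^2 \qquad (16) $$
The first partial derivative is unity and the second is
$$ \frac{\partial\varepsilon_{ISO}}{\partial|F_P|^2} = -\left[1 + \frac{|F_H|}{|F_P|}\cos(\varphi - \varphi_H)\right] \qquad (17) $$
When no prior knowledge of the phases is available, the square of this
derivative has the expected value 1 + |F_H|²/(2|F_P|²). However, by
intentionally erring toward overestimation, the relation used here is
(18)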
Notice that since by the Ten Eyck-Arnone procedure
E'2ISO
and, since
,
much of the intensity dependence
displayed in (15) should be accommodated in this formulation of the standard
errors. The analogous formula used for anomalous scattering information is
(19)
Residual closure errors are tabulated in bins of intensity and
sinθ/λ. See below for the definition of a "bin".
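The procedure adopted here for combining the phase information from the
several sources assumes each to be independent of the others. This is the
standard assumption when using the probability method for MIR phasing, and it is
a potential weakness. The product of the individual probability distributions is
then formed simply by adding the phasing coefficients:
$$ A = \sum_s A_s, \quad B = \sum_s B_s, \quad C = \sum_s C_s, \quad D = \sum_s D_s \qquad (20) $$
where s runs over all the isomorphous and anomalous sources of phase information.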
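The evaluation of centroid phases and figures of merit is accomplished using
the numerical integration technique introduced by Dickerson, Kendrew and
Strandberg (1961). Hence,
$$ m\cos\varphi_c = \frac{\int_0^{2\pi}\cos\varphi\,P(\varphi)\,d\varphi}{\int_0^{2\pi}P(\varphi)\,d\varphi} \qquad (21) $$
and
$$ m\sin\varphi_c = \frac{\int_0^{2\pi}\sin\varphi\,P(\varphi)\,d\varphi}{\int_0^{2\pi}P(\varphi)\,d\varphi} \qquad (22) $$
which yield m and φc. In the case of "centric" reflections, where the phase
must be one of two possible values, φ1 or φ1 + π, special formulae apply
(Hendrickson, 1971, 1979). Thus
$$ m = \tanh\left|A\cos\varphi_1 + B\sin\varphi_1\right| \qquad (23) $$
and
$$ \varphi_c = \begin{cases}\varphi_1 & \text{if } A\cos\varphi_1 + B\sin\varphi_1 \ge 0\\ \varphi_1 + \pi & \text{otherwise}\end{cases} \qquad (24) $$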
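Equations (20) to (24) can be sketched as follows (Python/NumPy; the function names and the 5° integration step are assumptions, not MIR's actual routines or settings). Coefficients from independent sources are summed, P(φ) is sampled on a regular grid for acentric reflections, and the centric case uses the two allowed phases only. Subtracting the maximum exponent before exponentiating avoids overflow; the constant cancels in the ratios (21) and (22).

```python
import numpy as np

def combine(*sources):
    """Equation (20): sum the (A, B, C, D) tuples of independent sources."""
    return tuple(float(sum(s[k] for s in sources)) for k in range(4))

def acentric_centroid(coeffs, step_deg=5.0):
    """Figure of merit m and centroid phase (degrees) by numerical integration, (21)-(22)."""
    A, B, C, D = coeffs
    phi = np.deg2rad(np.arange(0.0, 360.0, step_deg))
    expo = A*np.cos(phi) + B*np.sin(phi) + C*np.cos(2*phi) + D*np.sin(2*phi)
    p = np.exp(expo - expo.max())     # N' cancels; subtracting the max avoids overflow
    p /= p.sum()
    mc, ms = np.sum(p * np.cos(phi)), np.sum(p * np.sin(phi))   # m cos(phi_c), m sin(phi_c)
    return float(np.hypot(mc, ms)), float(np.rad2deg(np.arctan2(ms, mc)) % 360.0)

def centric_centroid(coeffs, phi1_deg):
    """Centric case, (23)-(24): phase restricted to phi1 or phi1 + 180 degrees."""
    A, B = coeffs[0], coeffs[1]       # C, D terms are identical at both allowed phases
    phi1 = np.deg2rad(phi1_deg)
    x = A*np.cos(phi1) + B*np.sin(phi1)
    phase = phi1_deg if x >= 0.0 else (phi1_deg + 180.0) % 360.0
    return float(np.tanh(abs(x))), phase

total = combine((3.0, 0.5, 0.2, -0.1), (1.5, 2.0, -0.3, 0.4))
print(acentric_centroid(total))
print(centric_centroid(total, 0.0))
```

Because P(φ) is periodic and smooth, even a coarse uniform grid gives an accurate integral here; a finer step mainly matters for the most probable phase, which is only known to the nearest grid point.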
Note that the equation on which the isomorphous replacement phasing described
above is based is equation 4 of Hendrickson and Lattman (1970). This formula
supposes that
there is no imaginary component of the heavy atom scattering. A more
appropriate formulation can readily be derived from equations 7 and 8 of
Hendrickson (1979). Thus, if, as is done here, one defines
(25)
then
(26)
and
(27)
This differs from equation 4 of Hendrickson and Lattman (1970),
(28)
by inclusion of the term. This will ordinarily
be small in comparison with other terms. However, it can readily be included.
It only modifies two definitions given above for A(ISO) and B(ISO), equations 6
and 7, thus,
(29)
(30)
In the case that the native structure includes anomalous scatterers
(31)
but and still both
refer only to the structure factor contributions from the real and imaginary
components of the scattering from the replacing heavy atoms and not at all to
the scattering from anomalous scatterers in the native structure.
As written, the probability phasing option does not treat the case in which
only or has been
measured. In such cases, even approximately, new phasing coefficients based on
(32)
or (33)
using equation 6 or 7 of Hendrickson (1979) for
and respectively, would have to be derived. Note
that the formulae used in the probability phasing option for anomalous
scattering information need to be interpreted cautiously in the case that the
native structure contains anomalous scatterers. The relationships will be
correct as given provided that the following definitions are close to the
truth.
The summation for each of the following is from i = 1 to the number of heavy
atoms in the jth derivative only.
(34)
(35)
The summation for each of the following equations is from i = 1 to the number
of heavy atoms in the jth derivative plus the anomalous scatterers of the native.
(36)
(37)
Then
(38)
(39)
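Note further that Lynn Ten Eyck stated in a private communication that he chose
to use the evaluation of (∂ε_ISO/∂|F_P|²)² used in (17) in order to give a
"worst case" estimate for the error. If this is the goal, Hendrickson, in a
second communication, asserts that the appropriate value is the one at
cos(φ - φ_H) = 1.0, which gives
$$ \left(\frac{\partial\varepsilon_{ISO}}{\partial|F_P|^2}\right)^2 = \left(1 + \frac{|F_H|}{|F_P|}\right)^2 = 1 + \frac{2|F_H|}{|F_P|} + \frac{|F_H|^2}{|F_P|^2} \qquad (40) $$
Inasmuch as |F_H| is usually less than |F_P|, the added term, 2|F_H|/|F_P|,
will usually dominate |F_H|²/|F_P|². Notice that the expected value without
phase knowledge is
$$ \left\langle\left(\frac{\partial\varepsilon_{ISO}}{\partial|F_P|^2}\right)^2\right\rangle = 1 + \frac{|F_H|^2}{2|F_P|^2} \qquad (41) $$
and that the present program has
(42)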
MIR Phasing By The MVFC Method
MIR phasing by the MVFC method requires the simultaneous least-squares
refinement of both the heavy atom parameters of the derivatives and of the
native protein phases. The method, due to Sygusch (1977), takes into account
the experimental error in the measurement of intensities. It can include the
implicit correlation existing among protein phases and heavy atom parameters.
It is free of bias since all weights are based on experimental error in the
measurement of the intensities. The weights are used for the refinement of both
the heavy atom parameters and the native protein phases. These weights allow
for both the error in the native structure amplitudes and the error in the
derivative structure amplitudes. The MVFC method does not require a probability
distribution to determine the phase of the reflections of the native protein.
The method yields the expected values of the cosine and sine of the phase
directly, ready for use in calculating the electron density of the
native protein. A figure of merit is obtained which takes into account not only
the error in the phases but also the error in the native protein structure
amplitudes.
The minimum variance Fourier coefficient method of phase refinement for
optimum phase determination requires a good estimate of the variance (error) of
each measured reflection. This variance estimate is essential for the success
of the phase refinement. The factors that contribute to the variance are
counting statistics, short term fluctuations in the x-ray source, absorption
correction errors, crystal decomposition errors, and inter-crystal scaling
errors. The MVFC method, as programmed in the MIR program, will allow one to
begin the phasing procedure without any prior knowledge of the phases. Special
care must be taken with weak reflections (F < 3σF). The square-root
transformation of |F|² biases their amplitudes (i.e. ⟨|F|⟩ is not equal to
⟨|F|²⟩^(1/2)). Unless the weak reflections are free from this bias, or have
been corrected for it, they should not be used during the refinement, because
the error due to the bias is of the order of, or greater than, the heavy atom
signal which is being estimated.
It is essential that MVFC phase refinement be carried out concurrently with
the heavy atom parameter refinement. Occupancy and thermal motion parameters of
each and every heavy atom site should be refined simultaneously. If these
conditions are not observed, refinement to the lowest minimum will be slow.
Anisotropic thermal motion parameter refinement of the heavy atoms is
recommended.
In the case of heavy atom derivatives which share common sites, the usual
matrix of normal equations must be modified to take into account the
correlation among the derivatives. This is accomplished by forming the
additional off-diagonal matrix blocks among the corresponding derivatives. It
is recommended that the full normal equations matrix be used throughout the
refinement. This will speed convergence and automatically avoid certain
difficulties in refinement, for example the case in which one derivative
unduly dominates the refinement.
The MVFC method may be used to assess the quality of any data set included in
or deleted from the overall refinement. The weighted mean squared error,
referred to as the "goodness of fit", is the quotient of the least squares sum
and the difference between the total number of observations and the number of
parameters refined (including phases). This quantity (GOF) will increase if a
poorer heavy atom data set is added, and it will decrease if the heavy atom
derivative data set added is better than any of the currently included heavy
atom derivatives. An analogous relationship holds when a heavy atom
derivative data set is taken out of the refinement. It should be noted that the
GOF always pertains to the previous cycle of refinement. In the presence of
systematic error the GOF is greater than its ideal value of 1.0. If the GOF is
less than 1.0, then the weights that are being used in the refinement are
either inaccurate or in need of adjustment by a scale factor.
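The goodness of fit defined above can be sketched as follows (Python/NumPy; the function name is hypothetical, and the weights w = 1/σ² are an assumption, since the manual does not spell them out here). In MVFC mode n_params must include the refined phases. The example also shows the scale-factor remark in action: if every σ is underestimated by a factor of 2, the GOF rises by a factor of 4.

```python
import numpy as np

def goodness_of_fit(residuals, sigmas, n_params):
    """Weighted mean squared error: sum(w * r^2) / (n_obs - n_params), with w = 1/sigma^2.
    In MVFC mode n_params should include the refined phases."""
    r = np.asarray(residuals, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float)**2
    dof = r.size - n_params
    if dof <= 0:
        raise ValueError("need more observations than refined parameters")
    return float(np.sum(w * r**2) / dof)

rng = np.random.default_rng(2)
sig = rng.uniform(5.0, 20.0, 20_000)       # simulated standard deviations
res = rng.normal(0.0, sig)                 # residuals consistent with those sigmas
print(goodness_of_fit(res, sig, 100))         # close to 1.0
print(goodness_of_fit(res, 0.5 * sig, 100))   # about 4: sigmas too small by a factor of 2
```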
Bricogne Method Of Refinement
This method is not fully implemented in this version. Bricogne has
suggested an improved mathematical formulation for the refinement of phases in
the MIR procedure. This formulation is incorporated in the MIR program and
should overcome the most serious pitfall of the classical method, namely the
dominance of good derivatives over poorer ones. This enhancement does not
require any increase in the number of parameters to be refined.
The conventional approach to parameter refinement from acentric reflections
was originally conceived as a straightforward adaptation of the least-squares
method previously used for centric data: the "most probable" or the "best"
estimates of the phases, as defined by Blow and Crick (1959), were simply made
to play a role analogous to that of the signs of centric reflections. Blow and
Matthews (1973) found this method to have poor convergence properties unless it
was ensured that the acentric phase estimates used in the refinement were
independent of the parameters which were being refined. In reaction to these
difficulties, a more conservative scheme was devised, under the name of
" method", in which the use of acentric phase estimates was
avoided altogether. A common feature of both amendments was that the phase
information used in the refinement had to be purposely impoverished, in order
to make it truly independent of the parameters to be refined and thus avoid
bias. Sygusch (1977) recognized that these restrictions could be lifted if the
acentric phases were no longer deemed to be "estimates", but were instead
treated as extra parameters and refined along with the others. The enormous
increase in the number of variables dictated the use of a diagonal
approximation. Introducing this approximation muted the original purpose of
accommodating the correlations among phases and parameters.
A further analysis of the problem leads to a solution which overcomes the
aforementioned difficulties without causing any increase in the number of
variables. The main idea is that structure factor estimates for acentric
reflections are implicit functions of the parameters which are being refined.
This dependence is expressed analytically by means of the implicit function
theorem, and is shown to result (via the chain rule) in a correction to the
partial derivatives from which the normal equations are to be formed. The
normal matrix, previously block-diagonal, is now full, as all parameters
interact even if they belong to distinct isomorphous compounds.
In this way, all sources of bias are removed without discarding any of the
available phase information, and no further parameters need to be refined
because the native structure factor estimates have been eliminated. The
occurrence of a full normal matrix also allows a unified treatment of
situations which previously disturbed the block-diagonal structure of the
standard normal equations, namely: non-crystallographic symmetry in the
heavy-atom constellation, common sites shared by several derivatives, and
anomalous scatterers in the native structure.
This formulation is equally applicable to cylindrically averaged data obtained
from fibers or oriented gels, where the problem of bias in the conventional
approach would have been even more formidable.
In the classical treatment, parameters describing the isomorphous substitution
are refined by least-squares so as to minimize the total lack of closure of the
phase triangles at the "most probable" phases. In the derivation of the normal
equations, these most probable phases are treated as constraints, so that the
normal matrix is block-diagonal. In the Bricogne method the most probable
phases are treated as quantities defined implicitly by the values of all the
isomorphous derivative parameters. This implicit dependence is taken into
account in the calculation of the partial derivatives of the total lack of
closure, and this leads to a correction to the normal matrix. This correction
reflects the interaction among all parameters (even if they belong to distinct
isomorphous compounds). Since in this treatment, the normal matrix is now full,
instead of block-diagonal, a considerable increase in the speed of convergence
should result. The correction takes into account the effect of the parameter
shifts on the redefinition of the phases.
The formulation of the refinement is enhanced further, in the sense that the
Blow and Crick approximation (that the native amplitudes are error-free) is
abandoned.
In this method the implicit dependence of the two coordinates of a "most
probable native structure factor", not just a "most probable phase", is decoded
and incorporated into the correction. This correction takes into account the
implicit dependence of the coordinates on all the refined parameters.
The most probable native structure factor heavy-atom contribution, most
probable structure factor for jth compound.
= 1/()
All the local quantities are understood to be subscripted hkl. The global
residual is zero where the local residuals are:
The normal equations for the minimization of E are then
∂/∂p
∂/∂q)]hkl} δp=
Σhkl[Σi(( -
) ∂/∂q]hkl
The normal matrix is block-diagonal, since no term in the residual depends
simultaneously on parameters belonging to two distinct isomorphous compounds
(excluding the case of common sites).
For each hkl, the coordinates x(hkl) and y(hkl) are those of the "most
probable native structure factors", i.e. of the point where E is minimum. In
the standard formulation, they are treated as constants when partial
derivatives are evaluated. However, as a result of their definition, they are
in reality implicit functions of all the isomorph parameters p, q, .... Any
alteration of a parameter belonging to a particular isomorph will induce a
readjustment of x(hkl) and y(hkl) for all reflections, which in turn will
modify the residuals of all the other isomorphs. In other words, all the
parameters interact through their joint contribution to the definition of the
most probable native structure factors from which the residuals are evaluated.
This effect can be described analytically, by first using the implicit
function theorem to decode the dependence of x and y on the parameters
p, q, ..., and then by using the chain rule to evaluate the corrections to the
partial derivatives which result from this dependence.
For this purpose, some extra notation will be useful.
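Let G = G(x,y; p,q,...) be any function of x, y, p, q, .... Suppose that x and y
are implicitly defined as functions of p, q, ... by the condition that a
function E(x,y; p,q,...) be minimal with respect to x and y, i.e.:
$$ \frac{\partial E}{\partial x} = 0 \qquad (43) $$
$$ \frac{\partial E}{\partial y} = 0 \qquad (44) $$
$$ \begin{pmatrix} \partial^2E/\partial x^2 & \partial^2E/\partial x\partial y \\ \partial^2E/\partial y\partial x & \partial^2E/\partial y^2 \end{pmatrix} \text{ positive definite} \qquad (45) $$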
Let G*(p,q,...) denote the function obtained from G when x and y are thus
defined in terms of p,q,...:
G*(p,q,...) = G(x(p,q,...),y(p,q,...); p,q,....) (46)
The problem is to evaluate the partial derivative
∂G*/∂p.
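Differentiation of the constraint equations (43) and (44) by the chain rule
gives:
$$ \frac{\partial^2E}{\partial x^2}\frac{\partial x}{\partial p} + \frac{\partial^2E}{\partial x\partial y}\frac{\partial y}{\partial p} + \frac{\partial^2E}{\partial x\partial p} = 0 \qquad (47) $$
$$ \frac{\partial^2E}{\partial y\partial x}\frac{\partial x}{\partial p} + \frac{\partial^2E}{\partial y^2}\frac{\partial y}{\partial p} + \frac{\partial^2E}{\partial y\partial p} = 0 \qquad (48) $$
This system of two linear equations in the two unknowns ∂x/∂p and ∂y/∂p can
always be solved, since its determinant is always positive by condition (45).
Its solution gives the desired local dependence of x and y on the isomorph
parameters:
$$ \begin{pmatrix} \partial x/\partial p \\ \partial y/\partial p \end{pmatrix} = \begin{pmatrix} \partial^2E/\partial x^2 & \partial^2E/\partial x\partial y \\ \partial^2E/\partial y\partial x & \partial^2E/\partial y^2 \end{pmatrix}^{-1} \begin{pmatrix} -\partial^2E/\partial x\partial p \\ -\partial^2E/\partial y\partial p \end{pmatrix} \qquad (49) $$
Now the chain rule, applied to G*, gives:
$$ \frac{\partial G^*}{\partial p} = \frac{\partial G}{\partial p} + \frac{\partial G}{\partial x}\frac{\partial x}{\partial p} + \frac{\partial G}{\partial y}\frac{\partial y}{\partial p} \qquad (50) $$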
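The function G of interest for the least-squares refinement is each local
residual r_i. The final result is that the partial derivatives ∂r_i/∂p in the
normal equations should be replaced by the ∂r*_i/∂p, whose full expression is:
$$ \frac{\partial r_i^*}{\partial p} = \frac{\partial r_i}{\partial p} - \left\{\frac{\partial^2E}{\partial x^2}\frac{\partial^2E}{\partial y^2} - \left(\frac{\partial^2E}{\partial x\partial y}\right)^2\right\}^{-1} Z \qquad (51) $$
where
$$ Z = \begin{pmatrix} \partial r_i/\partial x \\ \partial r_i/\partial y \end{pmatrix}^T \begin{pmatrix} \partial^2E/\partial y^2 & -\partial^2E/\partial x\partial y \\ -\partial^2E/\partial y\partial x & \partial^2E/\partial x^2 \end{pmatrix} \begin{pmatrix} \partial^2E/\partial x\partial p \\ \partial^2E/\partial y\partial p \end{pmatrix} $$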
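The implicit-function correction can be checked on a toy problem (Python/NumPy). E, G and the single parameter p below are hypothetical stand-ins for the local residual function, a residual r_i and an isomorph parameter; they are not MIR quantities. The sketch evaluates ∂G*/∂p both through (49) and (50) and through the closed form (51), and compares the result with a finite-difference derivative of G* obtained by re-minimizing E at shifted values of p.

```python
import numpy as np

# Hypothetical toy problem: E(x, y; p) is quadratic in (x, y), so its minimiser is exact.
H = np.array([[2.0, 0.5],
              [0.5, 4.0]])   # [[E_xx, E_xy], [E_yx, E_yy]], positive definite as in (45)

def E(x, y, p):
    return (x - p)**2 + 2.0*(y - p**2)**2 + 0.5*x*y

def xy_min(p):
    """Solve dE/dx = dE/dy = 0, i.e. H @ (x, y) = (2p, 4p^2)."""
    return np.linalg.solve(H, np.array([2.0*p, 4.0*p**2]))

def G(x, y, p):
    return x*y + p*x                  # arbitrary function of x, y and p

def grad_G(x, y, p):
    return np.array([y + p, x]), x    # (dG/dx, dG/dy), dG/dp

def mixed_E(p):
    return np.array([-2.0, -8.0*p])   # (d2E/dxdp, d2E/dydp)

def dGstar_dp_49_50(p):
    x, y = xy_min(p)
    dxy_dp = np.linalg.solve(H, -mixed_E(p))   # equation (49)
    g_xy, g_p = grad_G(x, y, p)
    return g_p + g_xy @ dxy_dp                 # equation (50)

def dGstar_dp_51(p):
    x, y = xy_min(p)
    det = H[0, 0]*H[1, 1] - H[0, 1]**2
    adj = np.array([[H[1, 1], -H[0, 1]],
                    [-H[1, 0], H[0, 0]]])
    g_xy, g_p = grad_G(x, y, p)
    z = g_xy @ adj @ mixed_E(p)                # Z of equation (51)
    return g_p - z / det

p0, h = 0.7, 1e-6
fd = (G(*xy_min(p0 + h), p0 + h) - G(*xy_min(p0 - h), p0 - h)) / (2.0 * h)
print(dGstar_dp_49_50(p0), dGstar_dp_51(p0), fd)
```

Ignoring the ∂x/∂p and ∂y/∂p terms (the classical, block-diagonal treatment) would return only dG/dp, which in general differs from the finite-difference value.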
There are three types of lies: lies, damned lies and statistics.
-attributed to Benjamin Disraeli
The following description includes the analytical formulation of all the
statistics collected by the program.
Definition of Terms
ANOCALC calculated anomalous difference on F²: FPHCALC(+)² - FPHCALC(-)²
ANOCALC (on F) calculated anomalous difference of F: FPHCALC(+) - FPHCALC(-)
ANOOBS observed anomalous difference on F²: FPHOBS(+)² - FPHOBS(-)²
ANOOBS (on F) observed anomalous difference on F: FPHOBS(+) - FPHOBS(-)
ANOM anomalous amplitude of heavy atom model
E normalized structure factor
FH amplitude of heavy atom model
FP amplitude of parent
FPHCALC amplitude associated with vector sum of vector(FP) (at either best or
most probable phase) and vector(FH)
FPHOBS observed amplitude of isomorph
M figure of merit
N number of reflections
V volume of unit cell
W weight = 1/variance()
Statistics Over the Entire Data Set
The phase differences between (or within) cycles, in degrees, give a check on
the convergence of the refinement and also allow one to follow the effect of
the parameter shifts. This is especially important in FAZPRB mode, as refinement of
the parameters is often a slow and oscillatory process.
Change in best phase between cycles is defined as:
Σ [|phase(best,this cycle) - phase(best,last cycle)|] / N
where N in this case is defined as the total number of acentric reflections.
The number of changes in centric reflections and the total number of centric
reflections give one a feel for the quality of the centric data as there should
be very few centric phases changing signs.
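A minimal sketch of these two convergence statistics follows (Python/NumPy; the helper names are hypothetical). The manual's formula does not say how differences across 0/360° are handled; folding each difference into [0°, 180°] is an assumption made here so that a move from 350° to 10° counts as 20°, not 340°. A centric phase that changes sign shows up as a difference of 180°.

```python
import numpy as np

def wrapped_abs_diff(a_deg, b_deg):
    """|a - b| in degrees, folded into [0, 180] (assumed wrap-around convention)."""
    d = np.abs(np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)) % 360.0
    return np.minimum(d, 360.0 - d)

def mean_phase_change(this_cycle, last_cycle):
    """Average change in best phase over the acentric reflections, in degrees."""
    return float(np.mean(wrapped_abs_diff(this_cycle, last_cycle)))

def centric_sign_changes(this_cycle, last_cycle):
    """Number of centric reflections whose phase switched to the other allowed value."""
    return int(np.sum(wrapped_abs_diff(this_cycle, last_cycle) > 90.0))

print(mean_phase_change([10.0, 350.0, 100.0], [350.0, 10.0, 70.0]))
print(centric_sign_changes([0.0, 180.0, 90.0], [180.0, 180.0, 270.0]))
```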
In FAZPRB mode one also obtains the change in the most probable phase between
cycles and the difference between best and most probable phases within a cycle.
These are defined analogously to the change in the best phase above. All three
quantities should decrease as the refinement proceeds. However, the difference
between cycles of the most probable phase will in most cases be worse than that
of the best phase. The most probable phase is defined as the angle, among those
sampled in the numerical integration, which has the maximum probability. Thus
when the most probable phase changes value, it does so in a discrete jump.
The mean square error in electron density was originally defined by Dickerson
et al. (1961) as:
Note, however, that if the reflection multiplicity is known the above
simplifies to:
(reflection
multiplicity)(1-m)2]
The figure of merit histogram is obtained by dividing the figure of merit by
0.1 and adding 1 to the appropriate category. In traditional Blow-Crick type
phasing the figure of merit measures nothing more than the unimodality and
sharpness of the phase probability distribution. There have been cases where
higher figures of merit were obtained from wrong models than from correct ones,
but, to my knowledge, none of those has ever been reported in the literature.
Nevertheless, results of Sigler and coworkers on eukaryotic initiator tRNA
(Schevitz et al., 1979) suggest that when there is truth in the heavy atom
model there is a strong correlation between a high figure of merit and the
"goodness" of the parent phase.
When anomalous scattering data are used, a count is made of the number of
reflections for which the signs of observed and calculated anomalous scattering
are the same or different. The more often the signs agree, the greater the
confidence one has in both the data and the model.
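The figure of merit histogram and the anomalous sign count can be sketched as follows (Python/NumPy; the helper names are hypothetical). Two details are assumptions rather than documented MIR behaviour: m = 1.0 is placed in the top category, and reflections with a zero observed or calculated anomalous difference are not counted. The code multiplies by 10 instead of dividing by 0.1 so that, for example, m = 0.3 is not misfiled as 2.999... in category 2.

```python
import numpy as np

def fom_histogram(m, n_categories=10):
    """Count figures of merit in 0.1-wide categories (0-based index = int(m / 0.1))."""
    m = np.asarray(m, dtype=float)
    idx = np.floor(m * 10.0).astype(int)           # avoids 0.3/0.1 -> 2.999... rounding
    idx = np.clip(idx, 0, n_categories - 1)        # m = 1.0 goes in the top category (assumption)
    return np.bincount(idx, minlength=n_categories)

def anomalous_sign_agreement(ano_obs, ano_calc):
    """(same, different) counts of the signs of observed and calculated anomalous differences."""
    s_obs = np.sign(np.asarray(ano_obs, dtype=float))
    s_calc = np.sign(np.asarray(ano_calc, dtype=float))
    used = (s_obs != 0) & (s_calc != 0)            # zero differences are skipped (assumption)
    same = int(np.sum(used & (s_obs == s_calc)))
    return same, int(np.sum(used)) - same

print(fom_histogram([0.05, 0.15, 0.95, 1.0]))
print(anomalous_sign_agreement([1.0, -2.0, 3.0, 0.0], [2.0, -1.0, -3.0, 1.0]))
```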
Statistics Divided into Categories
The remaining statistics, with the exception of the agreement of the signs of
the calculated and observed anomalous differences, are calculated by breaking
each statistic down into categories based on the sinθ/λ range and on bins based
on the amplitude or, more accurately, on a quasi-normalized structure factor E
(see below). These two methods provide for a relatively even number of
reflections in each grouping so that each grouping or bin is of approximately
equal reliability. This is important in FAZPRB mode as the closure errors need
to be updated on every cycle. Blow and Crick (1959) showed that at least with
hemoglobin data there was a dependence of the closure error on the scattering
angle and this dependence has been observed consistently since. In addition, as
described elsewhere in this manual, the Hendrickson and Lattman (1970)
formulation of Blow-Crick type phasing subsumes a term containing an amplitude
into the closure error thus making the closure error dependent on this
quantity. The adjustment of closure errors to account for observational errors
(variances) of Ten Eyck and Arnone (1976) should, in principle, account for
most or all of this dependence. Nevertheless, it was thought crucial to allow
observation of the closure errors as dependent upon both the scattering angle
and a function of the amplitude.
Procedure for Bins on E
The following explanation of the calculation of bins based on E is
due to W. A. Hendrickson (personal communication). Normalized structure factors
can be made by using a K-curve or Wilson plot, but a rough-and-ready value can
be found by normalizing in shells. Thus
E² = |F|²/<|F|²>, with the average taken over the sinθ/λ shell of the reflection.
From Howells et al. (1950) we know that the fraction of reflections with a
normalized intensity below a specified value can be determined from the
distribution. If N(z) is the fraction of reflections with z' <= z and z' =
E² = I/<I>, then:
acentric N(z) = 1 - exp(-z)
and centric N(z) = erf(sqrt(z/2)),
where erf is the "error function". It is difficult to find the inverse function
for the centric case, but if we are satisfied to base our categories on the
acentric distribution then z = -ln(1 - N(z)).
Thus in practice the average <|F|²> is calculated for each
sinθ/λ range. The maximum normalized structure factor for each bin
is calculated by the above equation, and the largest value is arbitrarily set to
100. The normalized structure factor is then tested against the maximum for
each bin until it is found to be smaller than one of them.
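The bin procedure can be sketched as follows (Python/NumPy; the helper names are hypothetical). It normalizes |F|² within resolution shells, builds bin limits from the acentric distribution z = -ln(1 - N(z)) so that each bin receives an equal fraction of reflections, and assigns each reflection to the first bin whose limit it falls below. Comparing E² with z is equivalent to comparing E with √z, so the choice between the two does not change the assignment.

```python
import numpy as np

def bin_limits(n_bins):
    """Upper limits on z = E^2 that split acentric data into equal fractions:
    z_k = -ln(1 - k/n_bins), with the last limit set to 100 as in the text."""
    k = np.arange(1, n_bins)
    return np.append(-np.log(1.0 - k / n_bins), 100.0)

def assign_bins(f_obs, shell, n_bins):
    """Normalize |F|^2 within each sin(theta)/lambda shell, then place each
    reflection in the first bin whose limit exceeds its z = E^2."""
    f = np.asarray(f_obs, dtype=float)
    shell = np.asarray(shell)
    z = np.empty_like(f)
    for s in np.unique(shell):
        sel = shell == s
        z[sel] = f[sel]**2 / np.mean(f[sel]**2)    # E^2 = |F|^2 / <|F|^2>_shell
    idx = np.searchsorted(bin_limits(n_bins), z, side="right")
    return np.minimum(idx, n_bins - 1)

rng = np.random.default_rng(3)
f_demo = np.sqrt(rng.exponential(1.0, 50_000))     # acentric (Wilson) intensities
print(np.bincount(assign_bins(f_demo, np.zeros(f_demo.size, dtype=int), 5), minlength=5))
```

For data that follow acentric statistics the five counts come out close to 10000 each, which is the "relatively even number of reflections" the bins are designed to give.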
Statistics Calculated for MVFC and FAZPRB
The average figure of merit is defined as: Σ(m) / N
RCULLIS (Cullis et al., 1961) is defined as:
Σ | |FPHOBS +/- FP| - FH | / Σ | FPHOBS - FP |
Cross-overs must be prevented from falsely inflating the statistic (see
Blundell and Johnson, 1976, p. 338).
There are six cases for the Cullis R factor and their mirror images. In each
case the arrows on the left show the relative magnitudes of FH, FP and FPH.
Case 1                    FP     FH             FH      FP
FH   ---->            --------->---->   -or-   <----<---------
FP   --------->       -------------->   -or-   <--------------
FPH  -------------->        FPH                      FPH
Case 2                 FP      FH                  FH     FP
FP   ---->            ---->--------->   -or-   <---------<----
FH   --------->       -------------->   -or-   <--------------
FPH  -------------->        FPH                      FPH
Result: ABS(FPH-FP) yields FH in both cases 1 and 2
Case 3                      FP                       FP
FH   ---->            -------------->   -or-   <--------------
FPH  --------->       ---------><----   -or-   ----><---------
FP   -------------->     FPH     FH             FH     FPH
Case 4                      FP                       FP
FPH  ---->            -------------->   -or-   <--------------
FH   --------->       ----><---------   -or-   ---------><----
FP   -------------->   FPH     FH                  FH     FPH
Result: ABS(FPH-FP) yields FH in both cases 3 and 4
Note that in case 4 FH > FPH, but crossover has not occurred.
Case 5                   FPH     FP             FP     FPH
FP   ---->            ---------><----   -or-   ----><---------
FPH  --------->       -------------->   -or-   <--------------
FH   -------------->        FH                       FH
Case 6                 FPH     FP                  FP     FPH
FPH  ---->            ----><---------   -or-   ---------><----
FP   --------->       -------------->   -or-   <--------------
FH   -------------->        FH                       FH
Result: ABS(FPH+FP) yields FH in both cases 5 and 6
Therefore in cases 5 and 6 we have crossover.
There is an alternative way to determine RCULLIS. One could calculate
FPA*FPHCALCA + FPB*FPHCALCB, where FPA and FPHCALCA are the real parts of FP
and FPHCALC, respectively, and FPB and FPHCALCB are the imaginary parts of FP
and FPHCALC, respectively. If this sum is positive, then FP and FPH point in the
same direction and crossover has not occurred. If it is negative, then FP and
FPH point in opposite directions and crossover has occurred. One then uses the
Fortran SIGN function to transfer the sign of this sum to 1.0, and the result is
multiplied by FP. Symbolically:
FP*SIGN(1.0,(FPA*FPHCALCA + FPB*FPHCALCB))
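The crossover-corrected Cullis R factor can be sketched as follows (Python/NumPy; the function name is hypothetical). FP carries the sign of the dot product of FP and FPHCALC, mirroring FP*SIGN(1.0, ...); a zero dot product is treated as positive here. The usage line builds centric case 1 (FP = 10, FH = 5 pointing the same way) and case 5 (FP = 5, FH = 15 pointing the opposite way, so FPHCALC = -10); with the sign test both close exactly and RCULLIS is 0.

```python
import numpy as np

def r_cullis(fph_obs, fp_vec, fph_calc_vec, fh):
    """Cullis R factor with the crossover test: FP is given the sign of
    Re(FP)*Re(FPHCALC) + Im(FP)*Im(FPHCALC), like FP*SIGN(1.0, ...) in Fortran."""
    fph_obs = np.asarray(fph_obs, dtype=float)
    fp_vec = np.asarray(fp_vec, dtype=complex)
    fph_calc_vec = np.asarray(fph_calc_vec, dtype=complex)
    fp = np.abs(fp_vec)
    dot = fp_vec.real * fph_calc_vec.real + fp_vec.imag * fph_calc_vec.imag
    signed_fp = np.where(dot >= 0.0, fp, -fp)      # a zero dot product counts as +1 here
    num = np.sum(np.abs(np.abs(fph_obs - signed_fp) - np.asarray(fh, dtype=float)))
    den = np.sum(np.abs(fph_obs - fp))
    return float(num / den)

# centric case 1 and case 5 from the diagrams above
print(r_cullis([15.0, 10.0], [10.0, 5.0], [15.0, -10.0], [5.0, 15.0]))
```

Without the sign test, case 5 would contribute ||10 - 5| - 15| = 10 to the numerator and the statistic would be inflated to 1.0.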
RKRAUT (Kraut et al., 1962) is defined as:
Σ | FPHOBS - FPHCALC | / Σ FPHOBS
RFAZPOWER is the weighted closure error divided by the weighted heavy atom model
amplitude. It is identical to the R factor, RMODULUS, in the MIR phasing
program written by Rossmann and coworkers (Adams et al., 1969) and is defined
as:
Σ |FPHOBS - FPHCALC| / Σ FH
Note that as this value approaches or becomes greater than 1.0, it becomes
impossible for the structure factor of the heavy atom model to close the phase
triangle for most reflections. This condition is referred to as the loss of
"phasing power."
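A sketch of the two R factors just defined (Python/NumPy; the function names are hypothetical). The weights mentioned for RFAZPOWER are not given in the manual, so this version is unweighted, and the absolute value in its numerator is assumed, since a signed lack of closure would largely cancel in the sum.

```python
import numpy as np

def r_kraut(fph_obs, fph_calc):
    """RKRAUT = sum|FPHOBS - FPHCALC| / sum FPHOBS."""
    fph_obs = np.asarray(fph_obs, dtype=float)
    lack = np.abs(fph_obs - np.asarray(fph_calc, dtype=float))
    return float(np.sum(lack) / np.sum(fph_obs))

def r_fazpower(fph_obs, fph_calc, fh):
    """Lack of closure over heavy-atom amplitude, sum|FPHOBS - FPHCALC| / sum FH (unweighted)."""
    lack = np.abs(np.asarray(fph_obs, dtype=float) - np.asarray(fph_calc, dtype=float))
    return float(np.sum(lack) / np.sum(np.asarray(fh, dtype=float)))

print(r_kraut([100.0, 200.0], [90.0, 210.0]))
print(r_fazpower([100.0, 200.0], [90.0, 210.0], [20.0, 30.0]))
```

In this example the lack of closure is 40% of the heavy-atom amplitude; values approaching 1.0 would signal the loss of phasing power described above.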
Phase of parent minus phase of heavy atom model is a statistic suggested by
Dodson (1976) which is sensitive to problems with scaling. The expected value
is 90°, that is, the phases should be entirely uncorrelated. It is defined
as:
The rms weighted isomorphous closure error is based on F rather than F**2 and
is defined as follows:
The rms weighted anomalous closure error is based on F rather than F**2 and is
defined as follows (note that if the parent has anomalous scattering, it is
also calculated for the parent):
Statistics Calculated in the FAZPRB Mode Only
The parent has the following values calculated for it. The rms parent
variance is defined as:
When anomalous scattering from the parent is included the following further
statistics are provided. The rms anomalous closure error is defined as
(Hendrickson, 1979):
The rms anomalous closure error adjusted for variances is defined as (see Ten
Eyck and Arnone (1976) for a derivation of this value with respect to F rather
than F**2):
Note that negative terms are not summed.
The rms anomalous differences are defined as:
For each isomorph other than the parent the following values are
calculated.
The rms closure error is defined as (Hendrickson and Lattman, 1970):
The rms closure error adjusted for variances is defined as (see Ten Eyck and
Arnone (1976) for a derivation with respect to F rather than F**2):
Note that negative terms are not summed.
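The following Python sketch assumes the common form of these statistics: the closure error of one reflection is FPHOBS - |FP exp(iφP) + FH|, the rms value is the square root of the mean squared error, and the variance-adjusted value subtracts a per-reflection variance from each squared error, dropping negative terms as stated above. The variance passed in for each reflection, the divisor used after dropping terms, and all function names are assumptions of this sketch, not MIR's code.

```python
import cmath
import math
from typing import Sequence


def closure_error(fph_obs: float, fp: float, phi_p_deg: float, fh: complex) -> float:
    """Lack of closure FPHOBS - |FP*exp(i*phi_P) + FH| for one reflection."""
    fp_vec = fp * cmath.exp(1j * math.radians(phi_p_deg))
    return fph_obs - abs(fp_vec + fh)


def rms_closure_error(errors: Sequence[float]) -> float:
    """Root-mean-square of the per-reflection closure errors."""
    if not errors:
        raise ValueError("no reflections")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rms_adjusted_closure_error(errors: Sequence[float], variances: Sequence[float]) -> float:
    """rms of (error**2 - variance), skipping negative terms.

    Dividing by the number of terms actually summed is an assumption.
    """
    terms = [e * e - v for e, v in zip(errors, variances)]
    kept = [t for t in terms if t >= 0.0]  # negative terms are not summed
    if not kept:
        return 0.0
    return math.sqrt(sum(kept) / len(kept))
```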
The rms isomorphous differences are defined as:
The rms joint variance is defined as (see the program described by Adams et al.
(1969) for a version of this statistic based on σFP and
σFPHOBS):
The rms heavy atom model contribution is defined as:
The rms anomalous closure error is defined identically to that of the parent
above.
The rms anomalous closure error adjusted for variances is defined analogously
to that of the parent above:
Note that negative terms are not summed.
The rms anomalous differences are defined identically to the parent above.
The closure errors are updated on the following basis:
1. Closure errors are only updated when new ones have been calculated, i.e.
there were observations in a particular bin.
2. Uses adjusted closure errors as defined by Ten Eyck and Arnone (1976).
3. Closure errors used for the isomorphous derivatives are taken from centric
zones, if available.
4. Otherwise isomorphous closure errors are updated from the total reflections
(which in this case must be acentric) if there were any measurements.
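The update rules above can be expressed as a small decision function. This is a sketch under stated assumptions: the BinStats record and the function name are hypothetical, a value of None stands for "no observations in this bin for that class of reflections", and the values passed in are taken to be the already adjusted (Ten Eyck and Arnone) closure errors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinStats:
    """Hypothetical per-bin results of one cycle (None means no observations)."""
    centric_adjusted: Optional[float]  # adjusted closure error from centric reflections
    total_adjusted: Optional[float]    # adjusted closure error from all (here acentric) reflections


def update_closure_error(previous: float, stats: BinStats) -> float:
    """Return the closure error to carry into the next cycle for one bin."""
    # Rule 3: prefer the centric estimate when centric data exist.
    if stats.centric_adjusted is not None:
        return stats.centric_adjusted
    # Rule 4: otherwise use the total (acentric) reflections, if any were measured.
    if stats.total_adjusted is not None:
        return stats.total_adjusted
    # Rule 1: nothing new was calculated for this bin, so keep the old value.
    return previous
```

The first branch is exactly where the concern raised below applies: a single centric reflection in a bin is enough to override all acentric information.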
The above procedure may not be wise for the following reasons:
1. One or just a few centric reflections may determine the closure error for a
bin.
2. Acentric reflections, properly weighted, might provide a better estimate
when there is little or no centric data. However, there is currently no
mechanism for providing "proper weights." This could probably be done by the
user on the fazprb line, and some experience might lead us to a better default
weight than 1.0.
Using MIR to Refine Phases From Another Source
MIR can be used to refine phases derived from another source. For example:
1. Do an FC calculation.
2. Run SIMWGT to put the phase information into the output bdf as phase set 2.
3. Run MIR using the fazin line
fazin init 2 comb 2
(see Examples below).
The init signal causes phase set 2 to be used as the starting phases before the
first cycle. The comb signal causes the phases generated during the refinement
with the heavy-atom derivative to be "tethered" to phase set 2. To use the
fazin feature, the input bdf must have phase information present in items
700-704 or 705-709, etc., of lrrefl:, corresponding to previously determined
phase sets. At present, MIR writes phases in items 700-704 (phase set 1).
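The item numbering follows a simple pattern of five items per phase set. The sketch below assumes the "etc." continues that pattern beyond phase set 2; the function name is hypothetical and nothing here reads a real bdf.

```python
def phase_set_items(phase_set: int) -> range:
    """Item numbers in lrrefl: for a phase set, assuming 5 items per set starting at 700."""
    if phase_set < 1:
        raise ValueError("phase sets are numbered from 1")
    first = 700 + 5 * (phase_set - 1)
    return range(first, first + 5)
```

Phase set 1 maps to items 700-704 and phase set 2 to items 705-709, matching the text.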
The newphases field on the mvfc line allows a return to an ab initio phase
calculation by MIR. Previously, once phase set 1 was in place, it was always
used on subsequent runs. This function is independent of fazin. Note that
fazin makes no sense in the absence of a previous phase set.
MIR reads reflection data from the input archive bdf and writes the reflection
data to the output archive bdf.
MIR
datset cheynat 48. 0.
datset cheyuo2 42.7 4.
noref all u pp ppa
mvfc *7 newphases
cycle 5
In this example, an initial set of phases will be determined by the Blow-Crick
algorithm included in the MVFC subroutines, whether or not phases exist on the
bdf, followed by MVFC phasing. Because this is the first pass, overall scales
and U's are being refined, but individual U's and population parameters are
not refined.
MIR
noref cheyno2 k c
noref all u
mvfc
cycle 5
In this example overall scales and U's are not refined while individual
population parameters are refined.
MIR
datset cheynat
datset cheyno2
noref all pp ppa
fazprb
cycle 5
In this example, individual atom U's are refined but not the population
parameters. The FAZPRB (modified Blow-Crick) phasing method will be applied.
MIR
datset cheynat
datset cheyuo2
datscl cheynat 37.9 0.0
datscl cheyuo2 33.4 2.1
noref all ppa
fazin init 1 comb 2
mvfc
cycle 5
In this example, an initial set of phases exists as phase set 1 (items 700-704
in lrrefl:) and will be used as the starting phases. This set of phases
will be combined with phases from phase set 2 (items 705-709 in
lrrefl:). The contribution from each phase set will be weighted by its
figure-of-merit. Phase set 2 in this example could be a set of Fc phases from a
partial model that has been run through SIMWGT to generate figures-of-merit of
the phases based on agreement of Fc and Fo (see SIMWGT). The ratio between the
anomalous population parameter and the real population parameter will not be
refined.
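The idea of weighting each phase set by its figure-of-merit can be illustrated with a plain vector sum of unit phase vectors, each scaled by its figure of merit. This is a simplification assumed for illustration, not necessarily how MIR combines phase sets internally (a full treatment would combine probability distributions, for example in the Hendrickson and Lattman (1970) representation); the function name is hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def combine_phase_sets(sets: Sequence[Tuple[float, float]]) -> float:
    """Combine (phase in degrees, figure of merit) pairs for one reflection.

    Each set contributes a unit phase vector scaled by its figure of merit;
    the combined phase is the direction of the vector sum, in 0..360 degrees.
    """
    total = sum(fom * cmath.exp(1j * math.radians(phi)) for phi, fom in sets)
    if abs(total) < 1e-12:
        raise ValueError("weighted phase vectors cancel; combined phase undefined")
    return math.degrees(cmath.phase(total)) % 360.0
```

Two sets at 0° and 90° with equal figures of merit combine to 45°, while a set with a figure of merit of zero has no influence on the result.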
MIR
datset cheynat
datset cheyuo2
datscl cheynat 37.9 0.0
datscl cheyuo2 33.4 2.1
noref all ppa
fazin init 2 comb 2
mvfc
cycle 5
This example is the same as the previous one except that the Fc phase set and
not a previous MIR phase set is used as the initial phase set to go into the
MVFC phasing. The phases will be continually tethered to the Fc phases with
appropriate weighting.
MIR
datset cheynat
datset cheyuo2
datscl cheynat 37.9 0.0
datscl cheyuo2 33.4 2.1
noref all ppa
fazin init 2
mvfc
cycle 5
In this example, the initial phases will be from the Sim-weighted Fc (see
SIMWGT). On subsequent cycles of refinement, the starting phases will no longer
be remembered.
Adams, M.J., Haas, D.J., Jeffrey, B.A., McPherson, A., Jr., Mermall, H.L.,
Rossmann, M.G., Schevitz, R.W. and Wonacott, A.J. 1969. Low Resolution Study
of Crystalline L-Lactate Dehydrogenase. J. Mol. Biol., 41, 159.
Alden, R.A., Bricogne, G., Freer, S.T., Hall, S.R., Hendrickson, W.A., Machin,
P., Munn, R.J., Olsen, A.J., Reeke, G.N., Sheriff, S., Stewart, J.M., Sygusch,
J., Ten Eyck, L.F. and Watenpaugh, K.D. 1983. Cooperative Programming in
Crystallography. Comput. Chem., 7, 137-148.
Blow, D.M. and Crick, F.H.C. 1959. The Treatment of Errors in the
Isomorphous Replacement Method. Acta Cryst., 12, 794.
Blow, D.M., and Matthews, B.W. 1973. Parameter Refinement in the Multiple
Isomorphous-Replacement Method. Acta Cryst., A29, 56-62.
Blundell, T.L. and Johnson, L.N. 1976. Protein Crystallography. Academic
Press: London.
Bricogne, G. 1982. In Computational Crystallography, Ed. D. Sayre. Oxford
Press: New York, p. 223.
Cullis, A.F., Muirhead, H., Perutz, M.F., Rossmann, M.G. and North, A.C.T.
1961. The Structure of Haemoglobin. VIII. A Three-dimensional Fourier
Synthesis at 5.5 Angstrom Resolution: Determination of the Phase Angles.
Proc. Roy. Soc., A265, 15.
Dickerson, R.E., Kendrew, J.C. and Strandberg, B.E. 1961. The Crystal
Structure of Myoglobin: Phase Determination to a Resolution of 2 Angstrom by
the Method of Isomorphous Replacement. Acta Cryst., 14, 1188.
Dodson, E.J. 1976. A Comparison of Different Heavy Atom Refinement
Procedures. Crystallographic Computing Techniques. Eds. F.R. Ahmed,
K. Huml and B. Sedlacek, Munksgaard: Copenhagen, 259.
Hendrickson, W.A. 1971. Some Aids for Breaking the Phase Ambiguity in the
Single Isomorphous Replacement Method. Acta Cryst., B27, 1474-1475.
Hendrickson, W.A. 1979. Phase Information from Anomalous Scattering
Measurements. Acta Cryst., A35, 245.
Hendrickson, W.A. and Lattman, E.E. 1970. Representation of Phase
Probability Distributions for Simplified Combination of Independent Phase
Information. Acta Cryst., B26, 136.
Hendrickson, W.A., Love, W.E., and Karle, J. 1973. Crystal Structure
Analysis of Sea Lamprey Hemoglobin at 2 Angstrom Resolution. J. Mol. Biol.,
74, 331-361.
Howells, E.R., Phillips, D.C., and Rogers, D. 1950. The Probability
Distribution of X-ray Intensities. II. Experimental Investigation and the X-ray
Detection of Centres of Symmetry. Acta Cryst., 3, 210.
Kraut, J., Sieker, L.C., High, D. and Freer, S.T. 1962. Chymotrypsinogen: A
Three-dimensional Fourier Synthesis at 5 Angstrom Resolution. Proc. Natl.
Acad. Sci. USA, 48, 1417.
Matthews, B.W. 1966. The Extension of the Isomorphous Replacement Method to
Include Anomalous Scattering Measurements. Acta Cryst., 20, 82-86.
North, A.C.T. 1965. The Combination of Isomorphous Replacement and Anomalous
Scattering Data in Phase Determination of Non-centrosymmetric Reflections.
Acta Cryst., 18, 212-216.
Schevitz, R.W., Podjarny, A.D., Krishnamachari, N., Hughes, J.J., Sigler, P.B.
and Sussman, J.L. 1979. Crystal Structure of Eukaryotic Initiator tRNA.
Nature, 278, 188.
Sygusch, J. 1977. Minimum-Variance Fourier Coefficients from the Isomorphous
Replacement Method by Least-Squares Analysis. Acta Cryst., A33, 512-518.
Ten Eyck, L.F. and Arnone, A. 1976. Three-dimensional Fourier Synthesis of
Human Deoxyhemoglobin at 2.5 Angstrom Resolution. J. Mol. Biol., 100, 3.