Author: Syd Hall, Crystallography Centre, University of Western Australia, Nedlands WA 6907, Australia
GENTAN uses triplet and/or quartet structure invariant relationships in a general tangent formula to propagate and refine structure factor phases. The phases required to define the cell origin and enantiomorph are specified automatically, or may be selected (wholly or partly) by the user.
The tangent formula approach of Karle and Hauptman (1956) is the most widely used procedure for the extension and refinement of structure factor phases. Users who are relatively unfamiliar with this field are well advised to read summaries of these methods by Karle and Karle (1966) and Stewart and Hall (1971). Practical guidelines and background information on the application of the tangent formula are detailed in the proceedings of the 1975 IUCr Computing School (Ahmed, 1976).
The tangent formula is a straightforward computational approach to summing phases estimated with the triplet invariant relationship
The conditions for which
has a value close to zero have been discussed in the
Because contributing phases can vary in reliability, a weighted tangent formula is required to average the phases as follows,
The quantity is a measure of the joint reliability of the two phases contributing to the i'th triplet invariant. This approach can be applied to any order of structure invariant. A general expression for structure invariants of order n is
It follows from equation (4.46) that the general tangent formula suitable for application to these invariants has the form shown in equation (4.48) (Hall, 1978). X is A, B, . . . according to the order of the invariant.
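The weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration, not GENTAN's code: the function name `tangent_phase`, the tuple layout of `terms`, and the assumption that every invariant phase is zero are all choices made here for the example. Each term carries a joint weight, the coefficient X of the invariant (A for triplets, B for quartets), and the phases that contribute to it.

```python
import math

def tangent_phase(terms):
    """Weighted tangent estimate from structure invariants of any order.

    terms: iterable of (joint_weight, x, contributing_phases), phases in radians.
    Assumes the invariant phase of every term is zero (illustrative only).
    Returns (phase estimate, magnitude of the (T, B) vector).
    """
    top = 0.0     # numerator T: weighted sine sum
    bottom = 0.0  # denominator B: weighted cosine sum
    for joint_weight, x, contributing_phases in terms:
        phase_sum = sum(contributing_phases)
        top += joint_weight * x * math.sin(phase_sum)
        bottom += joint_weight * x * math.cos(phase_sum)
    # atan2 keeps the correct quadrant, which a plain tan^-1(T/B) would lose
    return math.atan2(top, bottom), math.hypot(top, bottom)

# Two triplets that agree on a phase sum of 0.5 rad
phi, magnitude = tangent_phase([(1.0, 2.0, [0.2, 0.3]), (0.5, 4.0, [0.1, 0.4])])
```

When the contributing terms agree, the magnitude of (T, B) is large; when they conflict, the terms cancel and it falls towards zero.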
The general tangent formula also permits the use of
values. This is
particularly important if oriented or positioned fragment
information is used to estimate
from the group
structure factors (see the
It has also been discussed in GENSIN that the invariant phase may be expected to be closer to π than 0 for negative quartets (see the
The tangent phasing process is usually initiated with a few "known" phases. From these phases additional values are determined through the application of the tangent formula to connecting invariant relationships. In turn, new phases are used to expand the phasing process further.
The tangent refinement of a phase set stops when the estimated phases converge to a constant value. Convergence occurs when the refined phases are self-consistent with the structure invariant relationships and the refinement constraints applied (e.g., weighting scheme). Due to the pyramidal nature of the phasing process (i.e., a few phases determine many), the final phase set is also strongly dependent on the starting phases. To a large extent the success or failure of the multisolution method is determined by the actual values of the starting phases. This is why a great deal of the computational effort goes into the selection of starting reflections.
Where do the "known" starting phases come from? Some may be specified directly in order to fix the origin in the cell. These are known as origin defining reflections (ODR). ODR phases alone are usually insufficient to reliably start the phasing process. The larger the number of starting phases, the less dependence there is on the reliability of a few invariant relationships and the higher the likelihood of obtaining a correct solution.
Where do the additional "known" phases come from? Because phase values are usually not known before a solution, one way is to assign trial phase values to a limited number of reflections and permute these in a series of separate tangent refinements. Permuted starting phases form the basis for the multisolution approach to the tangent phasing process.
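The permutation idea can be sketched as follows. The trial values are assumptions made for this example (quadrant centres for general phases, 0 and π for restricted phases) and are not taken from the GENTAN documentation; `permuted_phase_sets` is a hypothetical helper name.

```python
import itertools
import math

# Assumed trial values, chosen for illustration only
GENERAL_VALUES = (math.pi / 4, 3 * math.pi / 4, 5 * math.pi / 4, 7 * math.pi / 4)
RESTRICTED_VALUES = (0.0, math.pi)

def permuted_phase_sets(fixed, general, restricted):
    """Yield one {reflection: phase} dict per trial starting set.

    fixed: dict of phases that never change (e.g. origin-defining reflections).
    general, restricted: lists of reflection labels whose phases are permuted.
    """
    labels = list(general) + list(restricted)
    choices = [GENERAL_VALUES] * len(general) + [RESTRICTED_VALUES] * len(restricted)
    for combination in itertools.product(*choices):
        trial = dict(fixed)
        trial.update(zip(labels, combination))
        yield trial

# One general and two restricted reflections give 4 * 2 * 2 = 16 trial sets
trial_sets = list(permuted_phase_sets({"odr1": 0.0}, ["g1"], ["r1", "r2"]))
```

Each trial set would then be the starting point of a separate tangent refinement, which is why the number of permuted reflections has to stay small.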
The first step in the GENTAN calculation is to specify sufficient phases to define the cell origin uniquely. The formal conditions required to do this have been detailed in the GENSIN documentation. The phases that fix the cell origin are specified automatically unless the user intervenes via the phi or assign control lines. All ODR phases entered are checked for validity by the program. If they are incorrect or insufficient, additional or new phases will be automatically selected to satisfy the origin fixing requirements.
The selection of starting phases which are involved in reliable structure invariant relationships is critical to the success of these methods. In GENTAN, the generator reflections are sorted by a convergence-type process (Woolfson, 1976) that maximizes the connections through structure invariant relationships between ODR phases and additional starting phases. This is called the MAXCON(nection) procedure. If the structure is noncentrosymmetric this procedure is also used to specify one or more EDR (enantiomorph defining reflection) phases to fix the enantiomorphic form of the structure. This is done automatically by the MAXCON procedure unless the EDR phase is selected manually by the user. Additional phases are specified in the MAXCON procedure as requested (see field 2, select line). The MAXCON procedure sorts generator reflections in order of descending connectivity and this sort order is used in all subsequent operations. The sorted order of generators is referred to as the phase path. The origin, enantiomorph, and any other MAXCON-selected phases are at the beginning of this path.
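A convergence-type sort can be illustrated with a simplified elimination loop. This is a sketch of the general idea only and should not be read as the MAXCON algorithm: the scoring rule (sum of invariant weights among reflections still in the pool), the tie-break, and the name `convergence_order` are assumptions made here.

```python
def convergence_order(reflections, invariants, protected=()):
    """Return reflections ordered from most to least connected.

    invariants: list of (members, weight), members being a tuple of labels.
    protected: labels (e.g. origin-defining reflections) eliminated last.
    """
    remaining = set(reflections)
    eliminated = []

    def score(label):
        # Only invariants whose members are all still in the pool count
        return sum(weight for members, weight in invariants
                   if label in members and all(m in remaining for m in members))

    while remaining:
        candidates = [r for r in remaining if r not in protected] or list(remaining)
        # Remove the weakest-linked reflection; ties broken by label
        weakest = min(candidates, key=lambda r: (score(r), str(r)))
        remaining.remove(weakest)
        eliminated.append(weakest)
    # Last eliminated = best connected, so reverse to get the path order
    return eliminated[::-1]

path = convergence_order(
    ["a", "b", "c", "d"],
    [(("a", "b", "c"), 3.0), (("a", "b", "d"), 1.0)],
    protected=("a",),
)
```

Reversing the elimination order puts the protected and best-connected reflections at the head of the list, which mirrors the description of the phase path above.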
In addition to the starting phases selected for maximum connectivity, phases are specified to optimize the rate of "phase extension" to the remaining generators. This is referred to as the MAXEXT(ension) procedure. Extra phases are often needed in the starting set to accelerate phase propagation along the sorted phase path, rather than to maximize the connections within the initial starting set. It is important to emphasise that the MAXCON approach ensures that there are strong links between the initial starting set and other strong generators, while the MAXEXT procedure provides additional phases which enhance the rate at which new phases are generated in one pass down the phase path. The criterion used in the MAXEXT procedure to measure the rate of "phase extension" is specified in field 6 of the select control line.
Both the MAXCON and MAXEXT procedures use all available triplet and quartet invariant relationships unless instructed otherwise (see field 5, select line). This option is useful if the automatically selected starting phases fail to provide a solution. Note that field 1 of the invar line has precedence over this option.
The permutation of phases assigned to starting
reflections is performed in several ways. In the
An alternative approach to phase permutation is
available using the magic integer procedure of White and
Woolfson (1975). The
The inherent fragility of a phasing process started with a limited set of known phases and extended sequentially to all other phases has already been discussed. The success of this procedure hinges on the reliability of a few individual structure invariants. This is particularly critical in the early stages of the extension process when new phase estimates are often determined from one or two relationships. An incorrect phase estimate at this stage will frequently cause the phasing procedure to fail.
The random phase approach of Yao Jia-xing (1981) is an alternative to using a limited set of permuted phases.
The extension and refinement of phases is simultaneous in the tangent process. When a structure invariant relationship contains only one unknown phase then this phase is estimated from the relationship. This is referred to as a phase extension. If all phases in an invariant are known, then each member phase is determined by a combination of the others. When a phase occurs in more than one such invariant, the estimate of its value is averaged or refined. Repeated averaging of phases in this way is referred to as phase refinement. As discussed earlier, the generator reflections and related invariants are sorted into an optimal phase path during the selection of starting phases. Reflections and invariants are subsequently processed in this sequence and one pass down this list is known as a single refinement iteration. The maximum number of refinement iterations may be specified by the user via the refine control line (field 3). The default value is 30.
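The distinction between extension and refinement can be sketched with triplets only. The code below is an illustrative loop under stated assumptions, not GENTAN's implementation: it omits weights and acceptance thresholds, assumes zero invariant phases, and uses a convergence tolerance `tol` that is invented for the example. Only the default of 30 iterations comes from the text.

```python
import math

def tangent_refine(path, triplets, start, fixed=(), max_iter=30, tol=1e-4):
    """Extend and refine phases by repeated passes down the phase path.

    path: reflection labels in phase-path order.
    triplets: {h: [(a, k, hk), ...]} meaning phi(h) ~ phi(k) + phi(hk).
    start: starting phases in radians; fixed: labels that are never changed.
    Returns (phases, number of iterations performed).
    """
    phases = dict(start)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        largest_shift = 0.0
        extended = False
        for h in path:
            if h in fixed:
                continue
            top = bottom = 0.0
            for a, k, hk in triplets.get(h, ()):
                if k in phases and hk in phases:  # both contributors known
                    phase_sum = phases[k] + phases[hk]
                    top += a * math.sin(phase_sum)
                    bottom += a * math.cos(phase_sum)
            if top == 0.0 and bottom == 0.0:
                continue  # no usable relationship yet
            new_phase = math.atan2(top, bottom)
            if h in phases:  # refinement of an existing estimate
                d = new_phase - phases[h]
                shift = abs(math.atan2(math.sin(d), math.cos(d)))
                largest_shift = max(largest_shift, shift)
            else:            # extension to a previously unknown phase
                extended = True
            phases[h] = new_phase
        if not extended and largest_shift < tol:
            break  # phases are self-consistent
    return phases, iteration

phases, n_iter = tangent_refine(
    ["a", "b", "c"], {"c": [(2.0, "a", "b")]},
    {"a": 0.5, "b": 0.25}, fixed={"a", "b"},
)
```

In this sketch a new estimate is stored as soon as it is computed, so reflections later in the path see the updated values within the same pass.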
The reliability of each phase estimate φ(h) is gauged by the calculated value of α(h) (see equation (4.52) below). Phases are accepted for phase extension if the value of α(h) is above a threshold (see field 5, refine line). This threshold is reduced with each iteration (see field 7), thus ensuring that the most reliable phases are used in the early iterations.
Only the top sorted generators (those used in the MAXEXT selection process; see field 4, select line) are phased in the early iterations. When the average value of α(h) changes by less than a specified percentage (see field 7, refine line), additional generators (50, or 25% of the total, whichever is greater) are added to the phasing process. The weight of each phase, w(h), is also used to control phase extension and refinement. A weight threshold is set for the duration of the refinement (see field 4, refine line) and is used to reject less reliable phases.
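The block-expansion rule quoted above (50 generators, or 25% of the total, whichever is greater) reduces to a small piece of arithmetic. The sketch below is illustrative: the function names are hypothetical, rounding the 25% figure upwards is an assumption, and the percentage test is written as a relative change in the average reliability measure.

```python
import math

def generators_to_add(total_generators):
    """Size of the next block: 50, or 25% of the total, whichever is greater."""
    return max(50, math.ceil(0.25 * total_generators))

def next_active_count(active, total, mean_prev, mean_now, percent):
    """Enlarge the active block once the mean reliability has settled.

    percent: allowed relative change between iterations, in percent.
    """
    change = abs(mean_now - mean_prev) / mean_prev * 100.0
    if change < percent:
        return min(total, active + generators_to_add(total))
    return active

# 400 generators in total: the block grows by max(50, 100) = 100
grown = next_active_count(100, 400, 2.00, 2.01, percent=2.0)
```

For small problems the fixed minimum of 50 dominates, so most of the generators become active after only a few expansions.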
There are two different methods for propagating
phases in GENTAN;
The two main characteristics of this process are that it is very dependent on the order of the phase path, and that phase values change during an iteration. This is important because three or four phases are dependent on a common triplet or quartet. If the value of this invariant changes substantially during the course of a single iteration, the phase estimates tend to be dominated by phases near the end of the phase path. As these phases are usually of lower reliability, some instability in the refinement process can result. This may show itself as phase oscillations from iteration to iteration. This will be a problem if the value of the invariant phase is strongly non-zero, as may be the case in heavy-atom or strongly planar structures.
The tangent formula weight is defined by the joint probabilities of the phases contributing to the RHS of equation (4.48). If the individual weight of a phase is defined as
and the joint tangent weight as
The variance of an unrestricted phase is given by the following expression (Karle and Karle, 1966) where and are modified Bessel functions dependent on ,
α(h) is a measure of agreement between contributing estimates of φ(h) within the tangent formula and is defined in the following equation, where T and B are the tangent formula numerator (top) and denominator (bottom), respectively:
It follows that the variance of φ(h) may be calculated from equations (4.51) and (4.52). Computationally, however, this is prohibitive, so in practice it is necessary to approximate w(h). GENTAN provides for three different weighting schemes: probabilistic, Hull-Irwin statistical, and modified statistical.
For unrestricted phases the weight w(h) based on equation (4.51) is approximately linear with respect to α(h) (see pp. 92-94, Stewart and Hall, 1971). It is possible, therefore, to approximate w(h) as Kα(h), where K is set to a fixed fraction. This approach has the disadvantage, however, of not providing weights normalized about 1.0. This normalization is important for the correct evaluation of α(h) from equation (4.52). In addition, for very large structures, all α(h) values will be small and weights based on α(h) alone must also be small. The converse is true for small structures. For this reason, GENTAN uses a modified probabilistic weight which is essentially structure-independent and has values restrained to the range of 0 to 1.
The value of w(h) calculated from equation (4.51) is not valid for restricted phases because it assumes a continuous phase distribution. For restricted phases the probability that cos φ(h) is positive is given by the following expression:
It follows that the weight of an individual phase has the form
Although w(h) calculated from equation (4.55) will lie in the range 0.0 to 1.0, it has the same deficiencies as unrestricted weights based solely on α(h). For this reason GENTAN uses the relative weight expression for restricted phases.
where ' = min(2., X) and X = mean of A (triplets) or B (quartets)
The probabilistic weight
To overcome these deficiencies, Hull and Irwin (1978) have suggested a weighting scheme based on the ratio of α(h) to its expectation value <α(h)>. It has the general functional form
The precise functional form of is given by Hull and Irwin.
There is, however, a fundamental limitation to
This weighting scheme is identical to
It is very desirable in multisolution tangent methods to have some method of detecting "correct" phase sets prior to computing a Fourier transform (i.e., E-map). An a priori assessment of phase sets is made in GENTAN using two 'measure of success' parameters CFOM and AMOS. These parameters are based principally on the four figures-of-merit, RFOM, RFAC, PSI0, and NEGQ.
This parameter is the inverse of the ABSFOM parameter of the MULTAN program (Main et al., 1980) and has the form
where all α's are mean values: the calculated α, the expected α (see equation (4.58)), and the random α (i.e., the value if all phases were randomly distributed). For a correct phase set the calculated mean should approach the expected mean and RFOM should tend to 1.0. Incorrect phase sets will deviate significantly from 1.0, random phases towards 2.0, and overcorrelated phases towards 0.0. In general, however, phase sets with small RFOMs are more likely to be correct than those with larger RFOMs. The actual range of RFOMs for a given GENTAN run will vary according to the validity of the estimate of the expected α. For this reason RFOM tends to be less reliable for strongly non-random structures.
The RFAC parameter is similar to the residual FOM calculated in MULTAN (Main et al., 1980) except for a scale that takes into account the relative dominance of heavy atoms in the structure.
RFAC is a minimum when there is close correspondence between the refined α and the expected α. In this respect it is very similar to the R-factor of Karle and Karle (1966). RFAC is, like RFOM, dependent on a reliable estimate of the expected α.
invariants of Cochran and Douglas (1957) provide a
sensitive figure-of-merit which is largely independent of
the triplet and quartet invariants used in the tangent
relates two generator reflections (with |E| > EMIN) to
a third which is selected to have an |E|-value as close
as possible to zero (see the
for ψ(0) triplets.
PSI0 should be smallest for the correct phase sets. PSI0 is, along with NEGQ, one of the most sensitive and independent methods of measuring the relative likelihood of success.
Quartet structure invariant relationships are classified according to the magnitude of their crossvector |E| values. When the crossvector sum is very low there is a high probability that the invariant phase will tend to have a value of π rather than 0. These invariants are referred to as negative quartets. In GENTAN negative quartets are usually not applied to the tangent refinement process but are retained as a test of the phase sets (unless the
for n negative quartets, where is the tangent refined phase, is the phase estimated from negative quartets alone. Correct phase sets should have low values of NEGQ ranging from 0° for centrosymmetric structures, to 20-60° for noncentrosymmetric structures. Note that if fragment QPSI values are used, the value of is automatically set to 0° and the NEGQ test will remain valid. This FOM is a very powerful discriminator of phase sets provided that sufficient negative quartets are available.
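As an illustration of how a figure-of-merit of this kind can be evaluated, the sketch below takes NEGQ to be the mean absolute difference, in degrees, between the tangent-refined phases and the phases estimated from negative quartets alone. That functional form is an assumption made for the example, since the defining equation is not reproduced here; the name `negq` and the argument layout are likewise hypothetical.

```python
def negq(refined_deg, quartet_deg):
    """Mean absolute phase difference in degrees, wrapped to 0..180.

    refined_deg: tangent-refined phases (degrees).
    quartet_deg: phases estimated from negative quartets alone (degrees).
    Assumed form: sum(|difference|) / n; not taken from the GENTAN source.
    """
    differences = []
    for refined, estimated in zip(refined_deg, quartet_deg):
        d = abs(refined - estimated) % 360.0
        differences.append(min(d, 360.0 - d))  # shortest way round the circle
    return sum(differences) / len(differences)

# 10 vs 20 degrees differs by 10; 350 vs 10 degrees differs by 20
value = negq([10.0, 350.0], [20.0, 10.0])
```

Wrapping each difference to the range 0 to 180 degrees matters here, because phases of 350° and 10° are only 20° apart.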
The combined FOM is a scaled sum of the four FOM parameters RFOM, RFAC, PSI0 and NEGQ.
The FOM weights may be specified on the setfom control line. These values are subsequently scaled so that the maximum value of CFOM is 1.0. It is important to stress that CFOM is a relative parameter and serves mainly to highlight which is the best combination of FOMs for a given run. It does not indicate whether these FOMs will provide a solution.
The AMOS parameter is a structure-independent gauge of the correctness of a phase set. It uses pre-defined estimates of the optimal values for the FOM parameters RFOM, RFAC, PSI0 and NEGQ. OPTFOM values may be user defined (see setfom line). Rejection values for the four FOM parameters are derived from the OPTFOM values as REJFOM=2*OPTFOM. The default values are as follows,
The absolute measure-of-success parameter is calculated from all active FOMs as
where the WFOM values are scaled so that AMOS ranges from 0 to 100. In addition to being used to sort phase sets in order of correctness, the AMOS values provide a realistic gauge of the correctness of phase sets. As a rule of thumb, AMOS values can be interpreted in the following way:
These classifications are only approximations. The predictability of optimal FOM values can be perturbed by a variety of structure dependent factors and by the FOM weighting. Nevertheless, the AMOS value provides the user with a concise overview of the phase sets.
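The relation REJFOM = 2*OPTFOM given earlier can be illustrated with a short sketch. The OPTFOM numbers below are placeholders invented for the example and are not the GENTAN defaults. The rule that a phase set fails when a figure-of-merit exceeds its rejection value is also an assumption, based on all four parameters being of the smaller-is-better kind.

```python
def rejection_limits(optfom):
    """Derive rejection values from optimal values: REJFOM = 2 * OPTFOM."""
    return {name: 2.0 * value for name, value in optfom.items()}

def failed_foms(fom, rejfom):
    """List the FOM names whose values exceed their rejection limits."""
    return [name for name, value in fom.items() if value > rejfom[name]]

# Placeholder optimal values for illustration only (not the program defaults)
optfom = {"RFOM": 1.0, "RFAC": 0.25, "PSI0": 1.5, "NEGQ": 30.0}
rejfom = rejection_limits(optfom)

phase_set = {"RFOM": 1.2, "RFAC": 0.75, "PSI0": 1.0, "NEGQ": 45.0}
failures = failed_foms(phase_set, rejfom)
```

Because the rejection values are tied to OPTFOM, changing an optimal value on the setfom line moves the corresponding rejection limit with it.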
Phase sets must satisfy certain criteria before being considered for possible output to the bdf. Each phase set is tested at three stages in the tangent process and is rejected if the FOM values and other parameters are outside acceptable limits. In this way time is not spent on phase sets that have little or no chance of being correct - a very desirable feature for a multisolution procedure.
The first rejection test is made following the sixth tangent iteration and involves only the top block of sorted phase estimates. This is referred to as the PRETEST of FOMs and the rejection criteria 011, 012, and 014 are applied. The user may disable the PRETEST of FOMs with the setfom line (field 1). The second rejection test is made after the last tangent iteration and a phase set is rejected according to the criteria 021, 022, 023, 024, and 025. The final rejection test occurs during the sorting of phase sets and the calculation of the AMOS value. Phase sets are rejected if criteria 036 and 037 are not satisfied.
PRETEST Rejection Criteria
Last Iteration Rejection Criteria
Final Rejection Criteria
Tests for phase correctness are made at the same time as the second rejection test. These are made on the basis of the optimum FOM values, OPTFOM. If all tests are satisfied, the tangent phasing process is terminated and the program enters the sort mode. In this way GENTAN ensures that computing time is not wasted on generating further phase sets when a correct set of phases has already been calculated. This is particularly important when the
If inadvertent termination occurs, the user can adjust the OPTFOM values used in the above criteria, or switch off this test entirely (field 10, setfom line).
If structure information of types 3 and 4 (oriented
and positioned) is entered into the GENEV calculation,
An alternative approach to partial structure
information is to use calculated structure factor phases
as input starting phases (Karle, 1976). This approach has
the advantage over the group structure factor method of
not requiring the repeat of
This run automatically selects starting phases.
Phases are extended and refined in the
GENTAN pout 10                 :output the top 10 phase sets
invar trip                     :use triplet invariants only
select magic                   :use magic integer permuted phases
assign odr 7 23 5 per 1 17     :assign ODR/permute phases
refine block w3                :block mode with mod H-I weights
setfom nopr                    :do not pretest FOM values
GENTAN pset 128                :permit 128 phase sets
invar *7 allxv                 :include all quartets
refine cascade w2 30           :max iterations to 30
phi 5 2 -3 0. odr              :define ODR
phi 1 5 7 0. odr               :define ODR
phi 3 2 2 0. odr               :define ODR
select random                  :select all phases with random values
archiv -1601 -1603             :delete items 1601 and 1603 from bdf