SIMPEL : Symbolic-addition phasing

Authors: Henk Schenk and Syd Hall

Contact: Syd Hall, Crystallography Centre, University of Western Australia, Nedlands 6907, Australia

SIMPEL applies the symbolic addition procedure to triplet and/or quartet structure invariant relationships to determine structure factor phases from normalized structure factors. The program is space group independent, and the origin and enantiomorph are specified automatically, or may be selected by the user. SIMPEL contains a wide range of phase selection and extension options. Different procedures for accepting, propagating, and evaluating symbol phases are provided, and a variety of figure-of-merit tests are available to identify the correct symbol phases. Up to 16 phase sets may be output for subsequent E-map calculations.

The Algorithm

Symbols have been used with structure invariant relationships to determine structure factor phases from the beginning of direct methods. Symbolic addition procedures (SAP) compete with and complement the other main direct-methods approach, the multi-solution procedure. SAP defines and extends phases as symbols which are evaluated at the conclusion of the process. Unlike the multi-solution procedure, no permutation of phase sets is involved, so the process is fast.

There is considerable background literature on the symbolic addition procedure. In particular we refer to Karle & Karle (1966), Karle (1974) and Schenk (1980). References specific to the SIMPEL approach to symbolic addition can be found in Overbeek & Schenk (1978), Schenk (1983) and Schenk & Kiers (1984).

Step 1. Convergence Procedure: selecting the starting set

The first step in the symbolic addition procedure is the specification of the origin/enantiomorph defining phases, and the selection of symbolic phases. These phases, referred to as the starting set, must be those which will most reliably propagate to all other large E-values via the triplet and quartet structure invariants. This is the pivotal step in the phasing process and considerable care is taken to ensure that the best possible starting set is selected. The starting phases are selected from the largest |E|-values (25% of total, or user option on the SIMPEL line) using a convergence type procedure as described by Germain, Main and Woolfson (1970). Convergence rejection is based on either:

(1) \(\alpha ^{2}\) for centrosymmetric and noncentrosymmetric

or (2) \(\alpha \tanh(\alpha /2)\) for centrosymmetric or \(\alpha I_{1}(\alpha )/I_{0}(\alpha )\) for noncentrosymmetric

See the GENSIN writeup for the full \(\alpha \) definition. The result of this procedure is a set of reflections which form an optimal choice for a starting set. Within this set origin (and enantiomorph, if required) defining reflections are assigned phases first, and then several symbols are assigned amongst the remaining reflections of the set (maximum is 8). The assigned phases are applied to |E|-values above the convergence limit using triplet and/or quartet invariants (user option on start line). Generated phases which have consistent values (no conflicts) are accepted and set at a weight of 1.0 for the rest of the phasing procedure.
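The two convergence rejection measures listed above can be sketched as follows. This is an illustration only, not the SIMPEL source: the function names and the `option` argument are hypothetical, and the modified Bessel functions are evaluated from their power series so that the block needs only the standard library.

```python
import math

def bessel_i(n, x, terms=60):
    """Modified Bessel function I_n(x) from its power series (illustrative helper)."""
    return sum((x / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def convergence_measure(alpha, centrosymmetric, option=2):
    """Quantity on which convergence rejection is based.

    option 1: alpha**2 for both structure types
    option 2: alpha*tanh(alpha/2) (centrosymmetric) or
              alpha*I1(alpha)/I0(alpha) (noncentrosymmetric)
    The 'option' argument is a hypothetical name, not a SIMPEL input field.
    """
    if option == 1:
        return alpha ** 2
    if centrosymmetric:
        return alpha * math.tanh(alpha / 2.0)
    return alpha * bessel_i(1, alpha) / bessel_i(0, alpha)
```

Both forms of option (2) tend to \(\alpha \) itself when \(\alpha \) is large, so the two options differ mainly in how they treat weakly determined reflections.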

Step 2. Divergence Procedure: test phase propagation

A divergence or accessibility procedure is used to test if the starting set of phases selected in Step 1 will propagate satisfactorily within the divergence set of generators (default is 50% of total). The procedure checks if the phases of all |E|-values can be accessed via sufficiently reliable invariant phase relationships. No test for (symbolic) phase consistency is made in this procedure. New phases are accepted as accessible if the sum of the \(\alpha \)s of the contributing invariants exceeds a threshold value (ALTHR starts as the 10th largest \(\alpha \) of the input invariants, or user option on the start line). The threshold value ALT(m) is varied according to the number of invariants, m, used to derive a phase. For example, for one invariant ALT(1) = ALTHR, for two invariants ALT(2) = 1.3*ALT(1), for three invariants ALT(3) = 1.3*ALT(2), and so on. For each successive cycle the threshold value is reduced by a factor of 0.9. This ensures that the most probable phases are accessed first.
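The threshold schedule described above can be sketched in a few lines. The function names are hypothetical, and two details are assumptions rather than statements from the writeup: cycles are counted from 0, and the 0.9 reduction is applied multiplicatively to every ALT(m) in each later cycle.

```python
def alt_threshold(althr, m, cycle=0, growth=1.3, decay=0.9):
    """ALT(m) for a phase derived from m invariants in the given cycle.

    ALT(1) = ALTHR, ALT(m) = 1.3 * ALT(m-1); each later cycle scales the
    thresholds by 0.9 (assumed to act on all m alike).
    """
    return althr * growth ** (m - 1) * decay ** cycle

def is_accessible(alphas, althr, cycle=0):
    """A new phase is accessible if the sum of its invariant alphas exceeds ALT(m)."""
    return sum(alphas) > alt_threshold(althr, len(alphas), cycle)
```

With ALTHR = 2.0, two invariants of \(\alpha \) = 1.2 each give a sum of 2.4, which fails ALT(2) = 2.6 in the first cycle but passes the reduced threshold of 2.34 in the next one.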

If more than 10 of the |E|-values in the divergence set remain unphased, additional symbol phases are assigned to unaccessed reflections. This enables any unconnected groups of related reflections to enter easily into the phasing process.

Step 3. Symbolic Addition Process

Once the starting set of phases has been fixed by the convergence and divergence processes, they are used to phase all remaining |E|-values. The starting set is composed of numeric phases assigned to specify the cell origin (and enantiomorph), and symbol phases to ensure complete phase extension. The starting set also contains reliable numeric and symbolic phases which were derived in the earlier processes. All these phases, assigned and derived, are now considered to be active phase propagators and have been assigned a weight of 1.0.

The symbolic addition process uses these phases and the structure invariant relationships to derive the phases of the other |E|-values. Before accepting a newly derived phase into the list of known phases (and thus successively using this reflection to derive additional phases) it must be carefully checked for reliability. If an incorrect phase is accepted, it can lead to a failure of the whole phasing process.

Two different mechanisms are used in SIMPEL to test if a new phase is suitable as an active phase propagator. These are the "probability threshold" test and the "multisymbol acceptance" test. In each case there are two separate procedures for performing these tests - each has different properties that may suit the solution of particular structural types.

I. Probability Threshold Test

A new phase is derived by substituting known phases into one, or more, structure invariant relationships. Each of these relationships has an \(\alpha \) value which is a measure of the probability that the phase relationship has a value of zero (modulo 2\(\pi \)). If \(\alpha \) is high, there is a high probability this is true; if it is not, the relationship must be used with caution, or in conjunction with other relationships. The sum of these \(\alpha \)s is therefore an important test in gauging the probable reliability of a new phase. In SIMPEL the sum of \(\alpha \)s may be applied in two different ways. They are:

  • (i) Weighted Alpha method ( wa ): Each phase in the active phase list is assigned a weight according to its expected reliability. These weights range from WMIN to 1.0 and are calculated for centrosymmetric phases as

    \(W = \tanh(\alpha _{c}/ALTHR)\)

    and for non-centrosymmetric phases as

    \(W = \min(1, \alpha _{c}/ALTHR)\) where

    \(\alpha _{c}= [ (\sum W_{k}\alpha _{k}\sin \phi _{k})^{2}+ (\sum W_{k}\alpha _{k}\cos \phi _{k})^{2}]^{1/2}\) (summed over the m invariants).

    ALTHR is the threshold \(\alpha \) value specified by the user, or set automatically as the 10th largest \(\alpha \) of the input invariants. This is the same ALTHR used in the divergence process. \(W_{k}\) is the combined weight of the component phases in the invariant derived from

    \(W_{k}= \prod W_{j}/ \sum W_{j}\) (j = 1 to 2 for triplets, or 1 to 3 for quartets).

    A phase is accepted if its calculated weight \(W\) is above the minimum weight WMIN. WMIN is an input option or is set automatically to 0.3. In this method phase acceptance is a relatively smooth and continuous process. Each new phase is given an associated reliability index, which is used in turn to determine the reliability of subsequent phases (i.e. the history of prior determinations has a bearing on future phase estimates). The calculation of \(W_{k}\) is slower than the alternative but this tends to be offset by the more rapid propagation of phases.

  • (ii) Alpha Ninv method ( an ): Another test for phase acceptance is available in SIMPEL based on the same procedure described above in Step 2. The weights of all active phases are assumed to be 1.0. The calculated \(\alpha \) of a new phase (see the \(\alpha _{c}\) definition above) is tested against ALT(m), where m is the number of invariants used to calculate \(\alpha _{c}\) and the new phase. The values of ALT(m) are preset as described in Step 2 above. This procedure is relatively simple and fast and is based on a relatively demanding criterion for acceptance. It is discontinuous (i.e. it accepts or rejects - nothing in between) and therefore requires more phasing cycles than (i). It also does not use the relative reliability of the active phases. This may be particularly important for non-restricted phases.
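The weighted alpha test of method (i) can be illustrated with a short sketch. It is not taken from the SIMPEL source: all function names are hypothetical, phases are assumed to be in radians, and the centrosymmetric weight is read as the hyperbolic tangent of the ratio \(\alpha _{c}\)/ALTHR.

```python
import math

def combined_weight(weights):
    """W_k = prod(W_j) / sum(W_j) over the known phases of one invariant
    (two phases for a triplet, three for a quartet), as written in the text."""
    return math.prod(weights) / sum(weights)

def alpha_c(contribs):
    """Resultant alpha from m invariants.

    contribs: list of (W_k, alpha_k, phi_k) with phi_k in radians (assumed unit).
    """
    s = sum(w * a * math.sin(p) for w, a, p in contribs)
    c = sum(w * a * math.cos(p) for w, a, p in contribs)
    return math.hypot(s, c)

def phase_weight(ac, althr, centrosymmetric):
    """Weight W of a new phase from its resultant alpha and ALTHR."""
    if centrosymmetric:
        return math.tanh(ac / althr)  # assumed reading: tanh of the ratio
    return min(1.0, ac / althr)

def accept_wa(ac, althr, centrosymmetric, wmin=0.3):
    """Accept the phase if its weight exceeds WMIN (default 0.3)."""
    return phase_weight(ac, althr, centrosymmetric) > wmin
```

Because each accepted phase carries its weight into later \(W_{k}\) products, a phase built on weakly determined parents contributes less to subsequent estimates, which is the "history" effect described for this method.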

II. Multisymbol Acceptance Test

In the symbolic addition process a second phase acceptance test is applied when more than one structure invariant is used to derive a new phase (i.e. m>1). This test has two separate modes of operation.

  • (i) Accept multiple symbol indications ( mult ): If two or more symbol combinations are generated for a new phase (e.g. -A and +AD), the phase is still accepted provided the strongest indication (i.e. largest \(\alpha _{c}\)) satisfies acceptance test I. This mode assumes that certain symbol combinations are equivalent (i.e. will reduce to the same numeric phase) and promotes a new phase to the active propagation role provided it satisfies the probability acceptance criteria. This is the default mode.

  • (ii) Reject multiple symbol indications ( sing ): In this mode a new phase is rejected if more than one symbol combination is derived. This is a more demanding requirement than in (i) and means that fewer phases are promoted to the active list to assume the role of phase propagators. In this mode the symbolic addition process requires more cycles and the statistics available to the subsequent figure-of-merit tests are fewer in number.

It should be noted that in previous versions of SIMPEL the only available options were I(ii) and II(ii). These are the most conservative options in accepting new phases, and have been used successfully in the past. The strong points of the alternative acceptance criteria I(i) and II(i) are discussed above and, in view of these, they are currently set as the default modes. Users are advised that if the defaults fail to provide a solution the more conservative combination of I(ii) and II(ii) should be applied.

Step 4. Correlation of Symbol Phases

In Step 3 the starting phase set is expanded into a larger list of known phases containing numeric and symbolic values. In this step a final symbolic addition cycle is applied so that all phase estimates can be tabulated as symbol correlation statistics (Schenk, 1971). Only phase estimates that involve more than one symbol combination will contribute to these statistics. For example, if a given phase is estimated from ten different invariants to be:

phase   m   \(\sum \alpha \)
-       5   20
+A      3   15
-AD     2   10

where m is the number of invariants, this would lead to the correlation statistics

phase = 0 (+)   frequency   product of \(\sum \alpha \)
-A              1           300
+AD             1           200
-D              1           150

This process assumes that different symbol indications for the same reflection are in fact equal and may therefore be correlated. The statistics above are consistent with symbols A and D having the value of 180°.
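The way the three indications in the example combine into correlation statistics can be reproduced with a small sketch. It assumes restricted phases (0 or 180°), so that a sign is +1 or -1 and any symbol appearing in both indications cancels; the tuple layout and the function name are hypothetical.

```python
from itertools import combinations

def correlate(indications):
    """Pair up the symbol indications of one reflection.

    indications: list of (sign, symbols, m, sum_alpha), sign being +1 or -1.
    Setting two indications equal gives a relation "combination = 0":
    the signs multiply and symbols common to both cancel (restricted phases assumed).
    Returns (relation, product of the two sum_alpha values) for each pair.
    """
    rows = []
    for (s1, sym1, _, a1), (s2, sym2, _, a2) in combinations(indications, 2):
        sign = "+" if s1 * s2 > 0 else "-"
        symbols = "".join(sorted(set(sym1) ^ set(sym2)))
        rows.append((sign + symbols, a1 * a2))
    return rows

example = [(-1, "", 5, 20), (+1, "A", 3, 15), (-1, "AD", 2, 10)]
print(correlate(example))
# [('-A', 300), ('+AD', 200), ('-D', 150)]
```

Each relation states that the combination equals 0 (+): a minus sign therefore forces the remaining symbol product to 180°, so -A = 0 gives A = 180°, and +AD = 0 then gives D = 180°, which -D = 0 confirms.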

The symbol correlation table is then used to test the plausibility of numeric values for each symbol. Symbols assigned to restricted phases are assigned their two possible values (e.g. \(\pi \)/2 and 3\(\pi \)/2) and symbols assigned to unrestricted phases are tested for numeric values in the range 0 to 2\(\pi \) in intervals of \(\pi \)/4. The correlation statistics are used to calculate a correlation factor QFAC (Schenk, 1971) that has a maximum value of 100 if the numeric phases agree exactly (and -100 if they disagree exactly!). In space groups with translational symmetry (non-symmorphic) QFAC values greater than 50 are good, and >70 are excellent. However, in the other space groups the QFAC is less indicative.

The last part of this step orders the phase sets in descending order of QFAC. Only the top phase sets (16, or specified by user) will enter into the more exhaustive figure-of-merit tests in the next step.

Step 5. Figure-of-Merit Tests: Identifying the correct phase set

The previous step selected, and ordered, the numeric phase combinations that have the best chance of being correct. In this step each of these combinations is applied in a separate symbolic addition cycle to provide the agreement statistics needed to calculate various figures-of-merit. A figure-of-merit is intended to discriminate between a 'good' phase set and a 'bad' phase set (i.e. one that may provide a correct solution from one that will not). Not all FOMs of the original SIMPEL versions are implemented at this time; the remainder will be considered for future development. On the other hand, several FOMs have been added that are not present in the other SIMPEL versions.

Correlation factor Figure-of-Merit (QFOM)

This figure-of-merit is a reformulation of QFAC calculated in Step 4. It is

QFOM = 1.5 - QFAC/100.

In accordance with all other FOM values, the best QFOM is the lowest. It has an active range from 0.5 to 2.5, and any value below 1.1 is considered good, and above 1.5 is considered unlikely. QFOM is, of course, correlated to the symbol extension process and cannot be considered an independent phase set discriminator in the same sense as the FOM tests PSI0 and NEGQ. Caution must therefore be exercised in interpreting small differences in QFOM values.
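The mapping from QFAC to QFOM and the quoted thresholds can be written out directly. The function names and the label "intermediate" for values between 1.1 and 1.5 are illustrative choices, not SIMPEL terms.

```python
def qfom(qfac):
    """QFOM = 1.5 - QFAC/100, so QFAC in [-100, 100] maps to [2.5, 0.5]."""
    return 1.5 - qfac / 100.0

def qfom_rating(value):
    """Rule-of-thumb reading of a QFOM value (labels are illustrative)."""
    if value < 1.1:
        return "good"
    if value > 1.5:
        return "unlikely"
    return "intermediate"
```

A QFAC of 70, rated excellent in Step 4, corresponds to QFOM = 0.8, and QFAC = 0 gives QFOM = 1.5, the edge of the "unlikely" region.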

Relative Figure-of-Merit (RFOM)

This parameter is the inverse of the CFOM parameter of the MULTAN program (Main et al., 1980) and has the form

\(RFOM = ( \sum  <\alpha >  - \sum  \alpha _{r}) / ( \sum  \alpha _{c}- \sum  \alpha _{r})\) (summed over all h)

where \(<\alpha >\) is the expected \(\alpha \) of a phase, and \(\alpha _{r}\) is the \(\alpha \) if all phases were randomly distributed. For a correct phase set the value of \(\alpha _{c}\) should approach that of \(<\alpha >\) and RFOM should tend to 1.0. Incorrect phase sets will deviate significantly from 1.0, random phases towards 2.0, and overcorrelated phases towards 0.0. In general, however, phase sets with small RFOM's are more likely to be correct than those with large RFOM's. The range of RFOM's will vary according to the validity of the estimate of \(<\alpha >\). For this reason RFOM tends to be less reliable for strongly non-random structures.
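As a minimal sketch of the RFOM formula, the function below takes three parallel lists of per-reflection values: the calculated \(\alpha _{c}\), the expected \(<\alpha >\) and the random-phase \(\alpha _{r}\). The name and argument layout are assumptions; how SIMPEL obtains the expected and random values is not shown here.

```python
def rfom(alpha_c, alpha_exp, alpha_rand):
    """RFOM = (sum<alpha> - sum alpha_r) / (sum alpha_c - sum alpha_r) over all h."""
    numerator = sum(alpha_exp) - sum(alpha_rand)
    denominator = sum(alpha_c) - sum(alpha_rand)
    return numerator / denominator
```

When the calculated values match the expected ones the ratio is exactly 1.0; calculated values that fall short of expectation push RFOM above 1.0, and overcorrelated (too large) values push it below.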

R-factor Figure-of-Merit (RFAC)

The RFAC parameter is similar to the residual FOM calculated in MULTAN (Main et al., 1980) except for a scale that takes into account the relative dominance of heavy atoms in the structure.

\(RFAC = \sum  { | \alpha _{c}- <\alpha >| } /  \sum <\alpha >\) (summed over all h)

RFAC is a minimum when there is close correspondence between \(\alpha _{c}\) and \(<\alpha >\). In this respect it is very similar to the R-factor of Karle and Karle (1966). RFAC is, like RFOM, dependent on the reliability of \(<\alpha >\).
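The RFAC formula above translates directly into code. This sketch follows the formula as printed and does not attempt to model the heavy-atom scale mentioned in the text; the function name and list arguments are hypothetical.

```python
def rfac(alpha_c, alpha_exp):
    """RFAC = sum |alpha_c - <alpha>| / sum <alpha>, summed over all h."""
    residual = sum(abs(c - e) for c, e in zip(alpha_c, alpha_exp))
    return residual / sum(alpha_exp)
```

Unlike RFOM, deviations in either direction raise RFAC, so overcorrelated and undercorrelated phase sets are both penalised.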

PSI0 Figure-of-Merit (PSI0)

PSI0 triplet invariants (Cochran and Douglas, 1955) provide a figure-of-merit which is largely independent of the triplet and quartet invariants used in the tangent refinement. A PSI0 triplet relates two strong reflections (with |E| > EMIN) to a third which has an |E|-value as close as possible to zero (see the GENSIN writeup). The phases estimated from a series of PSI0 triplets are expected to be random when the contributing phases from the other two large-|E| reflections are correct. When this is the case the resulting values of \(\alpha _{c}\) are significantly lower than if the distribution of contributing phases was biased or incorrect. These invariants are used to form the figure-of-merit

\(PSI0 = \sum \alpha _{c}/ \sum <\alpha >\) (summed over PSI0 triplets).

PSI0 should be smallest for the correct phase set. PSI0 is, along with NEGQ, one of the most independent methods of measuring the relative likelihood of success.
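The reason randomly distributed contributions give a small PSI0 can be seen from a short sketch. The resultant is formed in the same way as \(\alpha _{c}\) elsewhere in this writeup; the function names, the per-triplet strength argument and the use of radians are assumptions made for illustration.

```python
import math

def resultant(contribs):
    """Resultant alpha_c for one weak reflection.

    contribs: list of (strength, phase_sum) pairs, one per PSI0 triplet,
    where phase_sum is the sum of the two strong-reflection phases in radians.
    """
    s = sum(k * math.sin(p) for k, p in contribs)
    c = sum(k * math.cos(p) for k, p in contribs)
    return math.hypot(s, c)

def psi0_fom(alpha_c, alpha_exp):
    """PSI0 = sum alpha_c / sum <alpha> over the PSI0 triplets."""
    return sum(alpha_c) / sum(alpha_exp)
```

Two equal contributions with opposite phase sums cancel to a resultant of zero, whereas two aligned contributions add to twice the strength; a phase set that is biased towards aligned contributions therefore produces a larger PSI0.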

Negative Quartet Figure-of-Merit (NEGQ)

Quartet structure invariant relationships are classified according to the magnitude of their crossvector |E| values. When the crossvector |E|'s are small there is a high probability that the phase invariant has a value close to \(\pi \) rather than 0 (Hauptman, 1974; Schenk, 1974). These invariants are referred to as negative quartets. In SIMPEL negative quartets are not used in the phasing process but are retained as a test of the phase sets. The negative quartets are considered independent because, unlike the positive quartets, they cannot be represented by a series of triplet invariants. They provide, therefore, a separate estimate of the phases. A direct comparison of these phases provides the basis for the figure-of-merit (Schenk, 1974).

\(NEGQ = \sum [ \alpha _{c}|\phi _{k}- \Phi _{k}| ] / \sum \alpha _{c}\) (summed over all k neg. quartets)

where \(\phi _{k}\) is the phase estimated from triplets and positive quartets, and \(\Phi _{k}\) is the phase estimated from negative quartets alone. Correct phase sets should have low values of NEGQ ranging from 0 for centrosymmetric structures, to 20-60° for non-centrosymmetric structures. Note that if fragment QPSI values are used the value of \(\psi \) is automatically set to 0 and the NEGQ test will remain valid. This FOM is a very powerful discriminator of phase sets provided that sufficient negative quartets are available.
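The NEGQ average can be sketched as below. Phases are taken in degrees, and the absolute difference is assumed to be wrapped into the range 0-180° (so that 10° and 350° differ by 20°); that wrapping, the function names and the tuple layout are assumptions for illustration.

```python
def phase_difference(a, b):
    """Smallest absolute difference between two phases in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def negq(entries):
    """NEGQ = sum(alpha_c * |phi - Phi|) / sum(alpha_c).

    entries: list of (alpha_c, phi, Phi) with phi from triplets and positive
    quartets and Phi from negative quartets alone, both in degrees.
    """
    numerator = sum(a * phase_difference(p, q) for a, p, q in entries)
    denominator = sum(a for a, _, _ in entries)
    return numerator / denominator
```

For example, two reflections with (\(\alpha _{c}\), \(\phi \), \(\Phi \)) of (2.0, 10°, 350°) and (1.0, 90°, 30°) give NEGQ = (2*20 + 1*60)/3, roughly 33°, which lies inside the range quoted for a correct non-centrosymmetric set.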

Combined Figure-of-Merit (CFOM)

The combined FOM is a scaled sum of the FOM parameters QFOM, RFOM, RFAC, PSI0 and NEGQ.

\(CFOM =\sum  [WFOM_{i}(FOMMAX_{i}-FOM_{i})/ (FOMMAX_{i}-FOMMIN_{i})]\) (i=1 to 5)

The weights WFOM may be specified on the SETFOM control line. These values are subsequently scaled so that the maximum value of CFOM is 1.0. It is important to emphasise that CFOM is a relative parameter and serves only to highlight which is the best combination of FOM's for a given run. It does not indicate if a given FOM will provide a solution.
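A sketch of the CFOM sum for a group of phase sets follows. Two points are assumptions rather than statements from this writeup: FOMMAX and FOMMIN are taken as the largest and smallest value of each FOM among the phase sets of the current run, and the weights are scaled by dividing by their sum so that a set which is best on every FOM scores 1.0. The function name is hypothetical.

```python
def cfom(fom_table, wfom):
    """Combined FOM for each phase set.

    fom_table: one row per phase set, each row holding the FOM values in a
               fixed order (e.g. QFOM, RFOM, RFAC, PSI0, NEGQ).
    wfom:      one weight per FOM; scaled here to sum to 1 (assumed scaling).
    """
    n = len(wfom)
    total = sum(wfom)
    weights = [w / total for w in wfom]
    fom_max = [max(row[i] for row in fom_table) for i in range(n)]
    fom_min = [min(row[i] for row in fom_table) for i in range(n)]
    scores = []
    for row in fom_table:
        score = 0.0
        for i in range(n):
            span = fom_max[i] - fom_min[i]
            if span > 0:  # a FOM that is identical for all sets cannot discriminate
                score += weights[i] * (fom_max[i] - row[i]) / span
        scores.append(score)
    return scores
```

Because every term is measured against the extremes found in the same run, the best set always scores well even when none of the sets is correct, which is why CFOM is described as a relative parameter.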

Absolute Measure-of-Success Parameter (AMOS)

The AMOS parameter is a structure-independent gauge of the correctness of a phase set. It uses pre-defined estimates of the optimal values for the FOM parameters QFOM, RFOM, RFAC, PSI0 and NEGQ. OPTFOM values may be user defined (see setfom line). Rejection values for the five FOM parameters are derived from the OPTFOM values as REJFOM = 3*OPTFOM. The default values are as follows:

         QFOM   RFOM   RFAC   PSI0   NEGQ
OPTFOM   0.75   1.0    0.25   0.75   60°
REJFOM   2.25   3.0    0.75   2.25   180°

The absolute measure-of-success parameter is calculated from all active FOMs as

\(AMOS = \sum [ WFOM_{i}( REJFOM_{i}- FOM_{i}) / OPTFOM_{i}]\) (i=1 to 5)

where the WFOM values are scaled so that AMOS ranges from 0 to 100. In addition to being used to sort phase sets in order of correctness, the AMOS values provide a realistic gauge of the correctness of phase sets. As a rule of thumb, they can be interpreted in the following way:

AMOS     Interpretation
100-81   high probability of being correct set
80-61    good chance of being correct set
60-41    possibility of being correct set
40-21    low probability of being correct set
20-0     very unlikely to be correct set

These classifications are only approximations. The predictability of optimal FOM values can be perturbed by a variety of structure dependent factors and by the FOM weighting.

Rejection of Phase Sets

Phase sets must satisfy certain criteria before being considered for possible output to the binary file for subsequent E-map calculations.

FOM rejection criterion           Message
Reject if QFOM > REJFOM(1)        REJECT1
Reject if RFOM > REJFOM(2)        REJECT2
Reject if RFAC > REJFOM(3)        REJECT3
Reject if PSI0 > REJFOM(4)        REJECT4
Reject if NEGQ > REJFOM(5)        REJECT5
Reject if |av.φ-<av.φ>| > 45°     REJECT10

The value of \(<av.\phi >\) is 90° for centrosymmetric structures and 150-180° for non-centrosymmetric structures. This test avoids the "all-plus catastrophe" phase set.
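The rejection tests can be sketched as a single check that returns the messages a phase set would trigger. The default REJFOM values are those tabulated under AMOS; the first test is applied here to QFOM, on the assumption that REJFOM(1) refers to the QFOM column of that table. The function name, the dictionary layout and the arguments for the average phase are hypothetical.

```python
DEFAULT_REJFOM = {"QFOM": 2.25, "RFOM": 3.0, "RFAC": 0.75, "PSI0": 2.25, "NEGQ": 180.0}
FOM_ORDER = ["QFOM", "RFOM", "RFAC", "PSI0", "NEGQ"]

def rejection_messages(fom, av_phase, expected_av_phase, rejfom=DEFAULT_REJFOM):
    """Return the REJECT messages raised by one phase set.

    fom:               dict of the five FOM values for the set
    av_phase:          average phase of the set, in degrees
    expected_av_phase: 90 for centrosymmetric structures, 150-180 otherwise
    """
    messages = [f"REJECT{i + 1}"
                for i, name in enumerate(FOM_ORDER)
                if fom[name] > rejfom[name]]
    if abs(av_phase - expected_av_phase) > 45.0:
        messages.append("REJECT10")
    return messages
```

A centrosymmetric set whose phases are nearly all zero has an average phase near 0°, which differs from the expected 90° by more than 45° and so raises REJECT10 regardless of its other FOM values.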

File Assignments

  • Reads |E| values from the input archive bdf

  • Writes the estimated phases to the output archive bdf

  • Reads structure invariant relationships from bdf inv

Examples

SIMPEL

This is the standard run in which all defaults will be applied to the converge, diverge, symbolic addition and FOM testing processes. All |E|-values used in the GENSIN program, and all invariants entered on inv, will be applied in the phasing process. Phase extension will be based on weighted \(\alpha \)s.

SIMPEL 60 200 *6 8       :set con/divergence limits
invar all 1. 0.7         :use all invariants with new a/b limits
start trip               :use only triplets in converge process
symbad an single         :accept single symbols based on altm
print bdfout             :print phase sets output to bdf

SIMPEL
symbad wa single         :use weighted alphas; reject multisymbols
phase 5 7 11 0 nul       :suppress from process
phase 1 2 7              :assign as symbolic phase

References

  • Cochran, W. and Douglas, A.S. 1955. Proc. Roy. Soc. A227, 486-500.

  • Germain, G., Main, P. and Woolfson, M.M. 1970. Acta Cryst. B26, 274-285.

  • Hauptman, H.A. 1974. Acta Cryst. A30, 472-476.

  • Karle, J. and Karle, I.L. 1966. Acta Cryst. 21, 849.

  • Karle, J. 1974. International Tables , Vol. IV, section 6, 337.

  • Main, P. 1980. Multan-80 York England: University of York.

  • Overbeek, A.R. & Schenk, H. 1978. Computing in Crystallography, University Press, Delft, p108-112.

  • Schenk, H. 1971. Acta Cryst. B27, 2037-2039.

  • Schenk, H. 1974. Acta Cryst. A30, 477-481.

  • Schenk, H. 1980. Computing in Crystallography, eds. R. Diamond, S. Ramaseshan and K. Venkatesan, Indian Academy of Sciences, Bangalore, p701-722.