Authors: Ernst Egert and Syd Hall
Contact: Ernst Egert, Institut für Organische Chemie, Universität Frankfurt, Niederurseler Hang, D-6000 Frankfurt am Main, FRG.
PATSEE searches for a fragment of known geometry in the unit cell using an integrated Patterson and direct methods procedure. This program, which is valid and efficient for all space groups, is based on the standalone program written by Ernst Egert for the SHELX system. The rotation search is applicable to a fragment of any size and allows one torsional degree of freedom. The translation search may locate up to two independent search models of any size (including single atoms), taking into account known atoms at fixed positions, if any. The principles of this method are detailed by Egert & Sheldrick (1985), Acta Cryst. A41, 262-268.
The choice of strategy for the solution of a crystal structure at atomic resolution is usually determined by the presence or absence of heavy atoms. Thus it is common practice to solve light-atom structures with direct methods and those containing heavy atoms with Patterson techniques. If this strategy fails, it may be advisable to resort to the corresponding alternative method; direct methods may well reveal the positions of heavy atoms, and the Patterson function can be interpreted even for purely light-atom structures, such as those of organic molecules, provided that part of the molecular geometry is known. This so-called Patterson search has been shown by various authors to be a powerful tool for solving difficult crystal structures; its great strength is that it employs chemical information directly, and so can compensate for mediocre precision and resolution of the X-ray data. PATSEE combines the merits of both Patterson and direct methods - in a manner that is generally applicable, efficient, automatic and easy to use - and thus exploits all the a priori available information in order to solve large problem structures.
Generally, a Patterson search in vector space consists of the following stages: (1) definition of a search model; (2) calculation and storage of the Patterson function; (3) rotation search, and (4) translation search. It is a serial technique, with the last two stages crucially dependent on the accuracy of the preceding ones. Thus the first step is by no means trivial; this is especially true for a procedure such as this where the fragments are taken as rigid and no model refinement is attempted (with the exception of one torsional degree of freedom between rigid groups). Usually a small well-defined search model is more appropriate than a larger one containing several incorrect atoms. The model is defined by atomic coordinates in a given coordinate system; these will normally be either fractional (taken from a related crystal structure) or Cartesian (e.g. from a force-field calculation).
The triplet structure invariant relationships which are required for the translation search (if applied) are calculated prior to the PATSEE run using GENSIN (as in the example input at the end of this section).
The Patterson map is generated by the program FOURR (see the same example input).
The region around the origin of the Patterson function is dominated by intramolecular vectors, which depend on the orientation but not on the position of the fragment. Thus the full six-dimensional search can be split into two three-dimensional searches, a rotation and a translation search (depending on the space group, the latter may be of even lower dimensionality).
The atom fragment information used in the searches is entered in the following way. Atom site coordinates may be entered as fractional or Cartesian, according to the option on the preceding FRAG line. Each group of atom sites must be preceded by a FRAG line or, in the case of sites loaded from the bdf, the FRAG line(s) must contain the labels of the atom sites to be used in the search. The position of each fragment in the input stream determines how it is used in PATSEE. A fragment that precedes the rotate line is fixed (i.e. the vectors between these atoms are neither rotated nor translated, but they are used in the figure of merit calculations). Fragments that follow the rotate line but precede the transl line are rotated and translated. Fragments that follow the transl line are translated only. The shift, spin and twist lines are used to modify the atom sites in a fragment. A shift or a spin line will shift or rotate, respectively, the atom sites of the next fragment. The twist line serves a different function: it enables two parts of a fragment to be rotated about a connecting bond (and the searches are applied for each twist setting). The twist line must be positioned in the input stream between the two atom sites which will be twisted with respect to each other.
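The positional rule for fragments can be illustrated with a minimal Python sketch. It is not part of PATSEE: the function name classify_fragments and the simplified reading of input lines (only the first keyword on each line is inspected) are assumptions made for illustration.

```python
def classify_fragments(lines):
    """Assign a role to each frag line from its position in the input stream.

    Hypothetical helper: only the leading keyword of each line is examined.
    """
    stage = "fixed"  # fragments before any rotate line are fixed
    roles = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        keyword = fields[0].lower()
        if keyword == "rotate":
            stage = "rotated and translated"
        elif keyword == "transl":
            stage = "translated only"
        elif keyword == "frag":
            roles.append(stage)
    return roles


deck = ["frag", "rotate vfom .8", "frag", "frag", "transl", "frag"]
for number, role in enumerate(classify_fragments(deck), start=1):
    print(f"fragment {number}: {role}")
```

In this sketch the first fragment is reported as fixed, the second and third as rotated and translated, and the last as translated only.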
The first step in the rotation search is to set up the
intramolecular vector set to be used
for the search, i.e. to express the model geometry (which
should always be checked thoroughly - see the
Any orientation of a rigid fragment relative to a
fixed coordinate system can be described by three angles
corresponding to successive rotations about properly chosen
axes. (There are various definitions of the Eulerian
angles. For computational reasons, we prefer successive
rotations about the
c axes, in that order.) The asymmetric
unit of angular space depends on both the Laue group and
the model symmetry. Instead of scanning the respective
range of angles by specifying rotation increments, we have
chosen to generate random orientations (see
For each orientation, the correlation between the
rotated intramolecular vector set and the Patterson
function is measured by a
product function (note that this is a
different approach to non-Xtal versions of PATSEE). The
weight of each vector
is thus multiplied with the nearest Patterson
. The rotation figure of merit (
Rfom ) is ?
for a specified sample (see fraction
Before an orientation is placed in the short list of best solutions, it must pass two tests. The 'overlap test' ensures that no close interatomic contacts arise from the application of the lattice translations present, and the 'equivalence test' compares the orientation in question with those already stored. Two orientations are regarded as similar when all pairs of equivalent atoms are close to each other; in that case only the better one is kept.
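The equivalence test can be sketched in Python as follows. This is an illustration under stated assumptions, not the PATSEE code: atoms are compared one-to-one in Cartesian coordinates, the 0.5 Å tolerance is an arbitrary choice, and symmetry-related orientations (Laue group or model symmetry) are ignored. The names rotate_z, orientations_equivalent and keep_best are hypothetical.

```python
import math


def rotate_z(points, angle_deg):
    """Rotate Cartesian points about the z axis (used only to make test data)."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def orientations_equivalent(atoms_a, atoms_b, tol=0.5):
    """True when every pair of corresponding atoms is closer than tol."""
    return all(math.dist(p, q) < tol for p, q in zip(atoms_a, atoms_b))


def keep_best(solutions, tol=0.5):
    """Keep only the better of any two equivalent orientations.

    solutions is a list of (figure_of_merit, atoms) pairs.
    """
    kept = []
    for fom, atoms in sorted(solutions, key=lambda s: s[0], reverse=True):
        if not any(orientations_equivalent(atoms, other, tol) for _, other in kept):
            kept.append((fom, atoms))
    return kept


fragment = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 2.0)]
trial = [
    (0.8, rotate_z(fragment, 30.0)),
    (0.9, rotate_z(fragment, 32.0)),   # nearly the same as the first
    (0.7, rotate_z(fragment, 120.0)),  # clearly different
]
print([fom for fom, _ in keep_best(trial)])
```

Sorting by figure of merit first means that, of two similar orientations, the one met first (and therefore kept) is always the better one.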
In order to improve the performance of the subsequent
translation search, the best solutions are 'refined' by a
restricted and finer rotation search. The maximum within
each promising region of angular space is found by testing
Users may also specify the starting orientation of a
If the search model has one torsional degree of freedom, the rotation searches are repeated for each distinct geometry using the twist option. This is specified by a range of possible torsion angles and an appropriate increment. Invocation of twist causes a merged list of best solutions to be set up. At the end of the rotation search, a small number of promising orientations are passed over to the translation search. It is our experience that the correct one is usually present among the best two or three for reasonably sized fragments.
In procedures to position a fragment of known geometry in the unit cell, the translation search has usually proved to be less reliable than the rotation search. This is because the 'cross' (i.e. intermolecular) vectors used to locate a fragment with respect to the origin suffer from errors in both the model geometry and orientation amplified by the symmetry elements; in addition, model vectors with very high weight are less likely than in the rotation search.
The phases calculated from the coordinates of an oriented model are a continuous function of the shift vector r. When the fragment is moved through the unit cell with its orientation kept fixed, its structure-factor contribution for reflection h becomes
F(r) = F(0) . exp( 2πi h.r )
since all atomic displacements r are the same. So the scattering contributions from the atoms of the search model have to be summed only once for each orientation and reflection to yield a structure factor F(0) for the starting position; subsequently, the structure factor F(r) for any position is readily obtained by multiplication with a simple phase factor. For the true structure, the individual phases of the strongest reflections are linked by various statistical phase relations; amongst these, the three-phase structure invariants have proved to be especially useful. The search fragment is usually incomplete and may also be not very accurate. Nevertheless, if its scattering power is significant, the triple-phase relations should hold at least approximately for the correct solution, in the sense that the distribution of the phase sums is far from being random.
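The phase-factor shortcut can be checked numerically with a short Python sketch. It assumes unit scattering factors and a single fragment in fractional coordinates; the function names and the numerical values are illustrative and are not taken from PATSEE.

```python
import cmath
import math


def structure_factor(h, atoms):
    """Sum exp(2*pi*i h.x) over atoms (unit scattering factors assumed)."""
    return sum(
        cmath.exp(2j * math.pi * sum(hi * xi for hi, xi in zip(h, x)))
        for x in atoms
    )


def shifted(atoms, r):
    """Move every atom by the same fractional shift vector r."""
    return [tuple(xi + ri for xi, ri in zip(x, r)) for x in atoms]


def shifted_structure_factor(f0, h, r):
    """Apply the phase factor exp(2*pi*i h.r) to the stored value f0."""
    return f0 * cmath.exp(2j * math.pi * sum(hi * ri for hi, ri in zip(h, r)))


atoms = [(0.10, 0.20, 0.30), (0.15, 0.25, 0.42), (0.22, 0.18, 0.37)]
h = (2, -1, 3)
r = (0.31, 0.07, 0.55)

f0 = structure_factor(h, atoms)                  # summed once per orientation
direct = structure_factor(h, shifted(atoms, r))  # full atom sum at the new position
fast = shifted_structure_factor(f0, h, r)        # single multiplication
print(abs(direct - fast) < 1e-9)
```

The atom sum is needed only once per orientation and reflection; every further trial position costs one complex multiplication per reflection.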
These considerations led us to the development of a novel strategy for a Patterson translation search, which exploits in an integrated fashion the information contained in the sharpened Patterson function, the three-phase structure invariants and allowed intermolecular distances. In short, we have chosen the optimization of a weighted sum of cosine invariants as our refinement procedure, with the Patterson correlation and R indices as additional figures of merit, and the minimum intermolecular distance as a possible rejection criterion. This method is computationally efficient, especially for larger structures, because the refinement is based on phase relations derived from a relatively small number of large E magnitudes (say, >1.8). Only when an acceptable solution has been found by this 'direct search' is it necessary to calculate the time-consuming Patterson correlation.
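A weighted sum of cosine invariants can be sketched in Python for the simplest symmetric case. The sketch assumes space group P-1 (so that structure factors are real and each phase is 0 or π), unit scattering factors, and arbitrary weights supplied with each triplet; none of these choices is taken from the PATSEE implementation, and the function names are hypothetical.

```python
import math


def f_centro(h, atoms, shift):
    """P-1 structure factor for unit scatterers at x + shift and -(x + shift)."""
    total = 0.0
    for x in atoms:
        arg = sum(hi * (xi + ri) for hi, xi, ri in zip(h, x, shift))
        total += 2.0 * math.cos(2.0 * math.pi * arg)
    return total


def t3sum(triplets, atoms, shift):
    """Weighted sum of cos(phi_h + phi_k + phi_-h-k) for one trial shift.

    triplets is a list of (h, k, weight); the third index is -(h + k).
    """
    total = 0.0
    for h, k, weight in triplets:
        m = tuple(-(a + b) for a, b in zip(h, k))
        product = (
            f_centro(h, atoms, shift)
            * f_centro(k, atoms, shift)
            * f_centro(m, atoms, shift)
        )
        # With real structure factors the cosine of the phase sum
        # equals the sign of the product of the three values.
        cos_sum = (product > 0) - (product < 0)
        total += weight * cos_sum
    return total


triplets = [((1, 0, 2), (0, 1, -1), 1.0), ((2, 1, 0), (-1, 0, 1), 2.0)]
atom = [(0.0, 0.0, 0.0)]
print(t3sum(triplets, atom, (0.0, 0.0, 0.0)))
print(t3sum(triplets, atom, (0.5, 0.0, 0.0)))
```

Both calls give the same value because a shift of (1/2, 0, 0) is an allowed origin shift in P-1, which is why solutions related in this way have to be treated as equivalent.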
Since, in order to save computing time, relatively
few phase relations are employed for the refinement, they
have to be selected carefully. Normally only the 40-60 most
probable and translation-sensitive three-phase structure
invariants are used for a translation search. It is
advisable to apply a
limit to the E
values before searching for phase relations, since
high-order reflections may be influenced considerably by
errors in the model. However, if the cut-off is too severe,
the accuracy of the phase-refinement procedure suffers. It
seems that a nominal resolution of about 1 Å is the best
Then random positions are generated for the rotated
search fragment(s); it is our experience that about one
translation try per cubic Ångstrom is sufficient in
order to have a good chance of locating one search model
Taking the limited range of the subsequent refinement
into account, only those random positions that are fairly
close to physically reasonable solutions are worth
refining; thus all positions that give rise to short
inter-molecular distances (say d <
t3sum = Σ w . cos( φ(h) + φ(k) + φ(-h-k) )
t3sum is expected to be large and
positive for the correct solution. At the end of the second
cycle, only positions with
For solutions that have survived these tests, the Tfom value is calculated in the same way as Rfom, but now for the intermolecular vectors. A small number of best solutions (according to both t3sum and Tfom) are stored, provided that they pass various tests for possible equivalence (allowed origin shift or lattice translation). Although the true position of the search fragment is usually recognizable at this stage, R indices Re1 and Re2 based on E magnitudes have proved very useful in distinguishing further between correct and false solutions.
Finally, the solutions are sorted according to a combined figure of merit:
Cfom = ( Rfom . Tfom . t3sum ) / ( 10 . Re1 . Re2 )
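As a small worked illustration, the combined figure of merit and the final sort can be written in Python. The solution records and their numerical values are invented for the example and are not PATSEE output.

```python
def cfom(rfom, tfom, t3sum, re1, re2):
    """Combined figure of merit: (Rfom * Tfom * t3sum) / (10 * Re1 * Re2)."""
    return (rfom * tfom * t3sum) / (10.0 * re1 * re2)


# Hypothetical solutions; all numbers are made up for illustration.
solutions = [
    {"label": "A", "rfom": 2.0, "tfom": 3.0, "t3sum": 0.5, "re1": 0.30, "re2": 0.25},
    {"label": "B", "rfom": 1.8, "tfom": 2.0, "t3sum": 0.2, "re1": 0.40, "re2": 0.35},
]


def score(solution):
    """Evaluate Cfom for one solution record."""
    return cfom(
        solution["rfom"], solution["tfom"], solution["t3sum"],
        solution["re1"], solution["re2"],
    )


ranked = sorted(solutions, key=score, reverse=True)
print([s["label"] for s in ranked])
```

Because the R indices appear in the denominator, a solution with good Patterson and triplet agreement but poor agreement with the E magnitudes is pushed down the list.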
For all solutions printed, a Patterson sum function is calculated as a measure of fit/misfit for each individual atom, taking all vectors (intra- and intermolecular) into account; this enables identification of possible wrong atoms and thus model correction.
The procedure described differs from other Patterson translation functions in that the oriented model is placed with respect to all symmetry elements of the space group simultaneously. Tests with known structures have indicated that this routine is able to locate very large fragments (of more than 300 atoms), in which case the distance tests sometimes preclude the majority of trial positions, as well as single atoms even when the latter are not very heavy (e.g. phosphorus or sulphur in large organic structures). Above all, the variety of different criteria employed to judge solutions should make this combination of Patterson and direct methods a powerful structure-solving strategy, if chemical information is available. One would expect that a position that is in agreement simultaneously with packing criteria (dmin), the Patterson function ( Tfom ), triple-phase relations ( t3sum ) and E values ( Re ) is probably correct, and our experience shows that this is indeed the case.
compid lac1
GENEV smax 0.5 list 1.8 :calculate the E values
GENSIN :calculate the structure invariants
gener 1.8quar noprint 100 1 100
FOURR epat full :calculate the E.F Patterson map
PATSEE geom
rotate vfom .8
frag 9.946 29.966 11.189 90 90 90
setid site
C1 0.43394 0.51778 0.51693 :Diastereoisomer coordinates
C2 0.46733 0.54875 0.62187 :Acta Cryst,C39,95 (1983).
C3 0.48235 0.52338 0.73773
C4 0.57311 0.48221 0.72824
C5 0.54921 0.45318 0.61956
C6 0.65325 0.41691 0.60175
C7 0.63235 0.38980 0.51082
C8 0.50339 0.39796 0.44173
C9 0.49348 0.44807 0.39978
C10 0.53945 0.48047 0.50109
C11 0.56162 0.45583 0.27891
C12 0.51547 0.42331 0.18395
C13 0.54556 0.37553 0.22118
C14 0.46982 0.36608 0.34010
C15 0.48559 0.31558 0.35594
C16 0.47342 0.29772 0.22711
C17 0.48286 0.33807 0.14279
C18 0.69738 0.36838 0.23249
C19 0.67749 0.50191 0.47513
C20 0.55074 0.32664 0.02137
O1 0.54488 0.55020 0.82750
N1 0.41764 0.42882 0.62876
N2 0.39747 0.39645 0.53877
setid
transl
finish
This is the lac1 test deck. It is the standard test for PATSEE. Use the lac1.dat listing as a guide for other applications of PATSEE.