MODEL: Build molecule from map
Authors: Ruth Doherty and Sydney Hall
Contact: Syd Hall,
Crystallography Centre, University of Western Australia,
Nedlands 6907, Australia
MODEL searches peak or atom sites for a connected
molecular model. The model is interpreted on the basis
of bond lengths and angles or, if information concerning
the connectivity sequence of a fragment or molecule is
supplied, by matching a canonical description
of the input fragment to the connected peak and atom sites.
Connected sites are plotted on the printer and a
comprehensive list of distances and angles is
produced.
The first step in the interpretation of a molecular
model is to determine which sites are connected. Sites are
considered connected if their bond radii overlap. The bond
radii of atom sites are extracted from the archive bdf. The
bond radius for peak sites is input on the
limits line. The search
for these connections includes all possible symmetry
transformations of the sites. Sites connected in this way
are placed in the same group and labelled as a
cluster.
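The clustering step described above can be sketched as follows. This is a minimal illustration, not MODEL's own code: it assumes Cartesian coordinates in Angstroms, ignores the symmetry transformations that MODEL includes in its search, and uses hypothetical names (cluster_sites, the site labels and the 0.8 A radii).

```python
import math
from itertools import combinations

def cluster_sites(sites):
    """Group sites whose bond radii overlap.

    sites: list of (label, (x, y, z), radius), Cartesian Angstroms.
    Symmetry-equivalent positions are NOT generated here (assumption).
    Returns a sorted list of clusters, each a sorted list of labels.
    """
    parent = list(range(len(sites)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(sites)), 2):
        _, pos_i, r_i = sites[i]
        _, pos_j, r_j = sites[j]
        # Two sites are connected if their bond radii overlap.
        if math.dist(pos_i, pos_j) <= r_i + r_j:
            parent[find(i)] = find(j)

    clusters = {}
    for i, (label, _, _) in enumerate(sites):
        clusters.setdefault(find(i), []).append(label)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical peak sites, all with an assumed peak radius of 0.8 A.
sites = [
    ("P1", (0.0, 0.0, 0.0), 0.8),
    ("P2", (1.4, 0.0, 0.0), 0.8),
    ("P3", (2.1, 1.2, 0.0), 0.8),
    ("P4", (6.0, 6.0, 6.0), 0.8),
]
clusters = cluster_sites(sites)
print(clusters)
```

Note that P1 and P3 are not directly connected, but they fall in the same cluster because both are connected to P2.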
Maximum and minimum bond angles represent a different
type of constraint on the site coordination. Within a
cluster of sites there may be a large number of different
ways in which the angle constraints can be satisfied. Sites
that satisfy a particular interpretation of angle
constraints are grouped in a
subcluster. One of three different
approaches must be selected (on the
MODEL line) for
identifying sites that belong to the same subcluster. They
are:
No subcluster search (none)
In this mode no angle limits are applied to the
connected sites. All sites within the specified site
radii will be included in the model.
Connected subcluster search (conn)
In this mode sites are accepted into a subcluster
by testing the angles in
order of decreasing peak height and bond
connectivity (the number of bonds to each
site). This is referred to as the maximum connectivity
approach.
Weighted mean subcluster search (mean)
A more geometric approach to angle constraints is
based on weighting sites according to the bond lengths and
angles they form. Peak sites are assigned weights based
on the proximity of each connecting bond length to the
mean accepted value
(bondmax+bondmin)/2 and of each bond angle to the mean
(angmax+angmin)/2. This can be an important approach for
very regular (e.g. 'chickenwire') or highly-coordinated
structures. It is not suitable, however, for structures
with a range of coordination geometries (e.g. heavy atoms
or 5- & 6-membered rings).
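The idea behind the weighted mean mode can be illustrated with a small sketch. The actual weighting function used by MODEL is not given in this description, so the linear fall-off below is purely an assumption, as are the function names. The mean accepted value is taken as the midpoint of the limits, (max+min)/2.

```python
def midpoint_weight(value, lower, upper):
    """Weight of 1.0 at the midpoint of [lower, upper], falling
    linearly to 0.0 at either limit and staying 0.0 outside.
    The linear form is an assumption, not MODEL's documented formula."""
    if upper <= lower:
        raise ValueError("upper limit must exceed lower limit")
    mid = (lower + upper) / 2.0
    half_range = (upper - lower) / 2.0
    return max(0.0, 1.0 - abs(value - mid) / half_range)

def site_weight(bond_lengths, bond_angles, bond_limits, angle_limits):
    """Average the weights of all bonds and angles formed by one site."""
    weights = [midpoint_weight(d, *bond_limits) for d in bond_lengths]
    weights += [midpoint_weight(a, *angle_limits) for a in bond_angles]
    return sum(weights) / len(weights) if weights else 0.0

# Hypothetical limits: bonds 1.2-1.8 A, angles 100-140 degrees.
ideal = site_weight([1.5], [120.0], (1.2, 1.8), (100.0, 140.0))
strained = site_weight([1.5], [135.0], (1.2, 1.8), (100.0, 140.0))
print(ideal, strained)
```

A site whose geometry sits at the centre of both ranges scores highest, which is why this kind of weighting favours very regular structures and penalises mixed coordination geometries.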
The ability of MODEL to identify the correct set of
connected sites is strongly dependent on the bond length
and angle constraints discussed above. When only peak sites
are input the default values (see the
limits line) will permit
a range of geometries. In such cases the limits will be
adequate for a molecular search provided the peak sites are
well-defined and there are not too many spurious peaks in
the map. If possible the user should specify limits to suit
the stereochemistry of the structure. Be careful - if you
are too restrictive, some legitimate peaks may be rejected;
if you are too permissive, spurious peaks will complicate
the interpretation of the subcluster.
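The effect of the angle limits on site selection can be sketched as below. This is a simplified illustration under stated assumptions: Cartesian coordinates, no symmetry, and hypothetical function names. The limits of 100 and 140 degrees are example values only.

```python
import math
from itertools import combinations

def bond_angle(center, a, b):
    """Angle a-center-b in degrees."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))

def angles_within_limits(center, neighbours, angmin, angmax):
    """True if every angle formed at `center` lies within the limits."""
    return all(
        angmin <= bond_angle(center, a, b) <= angmax
        for a, b in combinations(neighbours, 2)
    )

center = (0.0, 0.0, 0.0)
trigonal = [(1.4, 0.0, 0.0), (-0.7, 1.2124, 0.0)]   # about 120 degrees
right_angle = [(1.4, 0.0, 0.0), (0.0, 1.4, 0.0)]    # 90 degrees

print(angles_within_limits(center, trigonal, 100.0, 140.0))
print(angles_within_limits(center, right_angle, 100.0, 140.0))
```

Widening the limits to, say, 85 to 140 degrees would accept the second site as well, which shows how permissive limits let more (possibly spurious) peaks into a subcluster.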
In addition to the bond length and angle modes for
site selection, the user may specify a known fragment as a
template (see the
conect line). This
approach places stringent stereochemical constraints on the
selection of connected sites and should be used whenever
possible. It should be emphasised, however, that the
fitting of the input model to the peak sites is strongly
dependent on the sites that are connected in rings. Special
care must be taken that the input ring sites are correct -
otherwise the fitting process will probably fail. The
reliability of the non-ring atoms is less critical.
Interpretation of the connected peaks depends on the
'quality' of the Fourier map, the appropriateness of the
bond length and angle constraints and the availability of
reliable stereochemical information. The 'best' model is
selected using two figure-of-merit (FOM) values.
The first FOM is based on the sum of the subcluster
connections, weighted according to peak height. This is a
reasonable measure provided the bond length and angle
constraints are appropriate for the structure. The expected
value for the FOM is very dependent on the number of peak
sites entered compared with the number of non-hydrogen sites
in the structure.
A second FOM relies on the stereochemical information
input on
conect lines. If this is
used a FOM value is calculated based on the fit of the
input model to subcluster peaks. This is a more sensitive
FOM than the first and has an optimal value around 2.0. It
should be emphasised that, just as the fitting process is
strongly influenced by sites connected in rings, this FOM
is especially sensitive to matching atoms connected in
rings.
MODEL outputs a range of numerical and graphical
information about the modelling process. Here is a summary
of the different sections.
1. The control parameters are listed as a record
of the constraints used in the search process. It
must be emphasised again that the MODEL calculation
is only as effective as the constraints are appropriate to
the structure being processed.
2. If conect lines are
input, the atomic coordination of each site is
listed. Check to make certain that the specified
model is as you intended - this is a common source of
error.
3. Each interpretation of a cluster of peak sites
satisfying the FOM limits is listed. This listing
includes the calculated FOM values, the number of
subclusters and the peak connectivity table. The
table lists the atom and peak sites in decreasing
order of peak height, with information on subcluster
allocation, connected peaks and sites that have been
assigned from the input model. This table must be
referred to when connecting up the projected peak
positions output in section 4 below. Note that the atom
labels stored on the bdf are truncated to 4
characters in MODEL to aid their presentation on the
printer plot of the molecule described in
section 4.
4. A graphical representation of the peak clusters
projected down one or more axes is provided. The
number of projection axes can be specified by the
user (see the
limits line). The
default is for one projection perpendicular to the
best least-squares plane through sites connected in
rings (or through all sites, if none are in rings),
and one projection down an axis in the
least-squares plane. The second projection is not
output if the cluster is approximately planar. The
projected peak sites are shown to scale (default is
2.5 A/cm) and as the peak sequence number preceded by
a special site marker. For peaks connected in rings
the site marker is an asterisk (*); for
peaks satisfying the subcluster constraints, a
plus (+); and
for peaks in the cluster but not within the
subcluster, a period (.). In addition to the
graphing of peak clusters, projections are also
output for all peak sites. This gives a composite
picture of the peak positions independent of
clustering and subclustering selections.
5. At the conclusion of the graph outputs, the
peak sites are listed with the allocated cluster
number and, if model sites were input, the assigned
atom sites.
6. A table of contact distances less than a
specified limit (see field 7 of the
limits line),
calculated using the coordinates from list 5. The
default limit is 2.4 A.
7. A table of bond distances for the connected
sites (based on the atom radii and the peak radius
from the
limits line),
calculated using the coordinates from list 5 and the
symmetry equivalent positions.
8. A table of bond angles for the peaks listed in
list 5 for all asymmetric units.
9. A table of bond lengths for twinned peaks
related only by symmetry operations.
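The contact-distance table in the output summary above can be pictured with the following sketch. It is an illustration only: it works on Cartesian coordinates within a single asymmetric unit, does not apply symmetry operations, and the function name and site labels are hypothetical. The default limit of 2.4 A is the one quoted for the limits line.

```python
import math
from itertools import combinations

def contact_table(sites, limit=2.4):
    """Return (label_a, label_b, distance) for every pair of sites
    closer than `limit` Angstroms, shortest contact first.

    sites: list of (label, (x, y, z)) in Cartesian Angstroms.
    Symmetry-equivalent positions are not considered (assumption).
    """
    rows = []
    for (label_a, pos_a), (label_b, pos_b) in combinations(sites, 2):
        distance = math.dist(pos_a, pos_b)
        if distance < limit:
            rows.append((label_a, label_b, round(distance, 3)))
    return sorted(rows, key=lambda row: row[2])

# Hypothetical peak sites.
peaks = [
    ("P1", (0.0, 0.0, 0.0)),
    ("P2", (1.4, 0.0, 0.0)),
    ("P3", (0.0, 3.0, 0.0)),
]
rows = contact_table(peaks)
for label_a, label_b, distance in rows:
    print(f"{label_a:4s} {label_b:4s} {distance:6.3f}")
```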
This example shows the fully defaulted run. All
default search conditions apply. No molecular connections
(as defined by
conect lines) are used in
the FOM assessment.
The compound described in this input is salicylic
acid. If PEKPIK
has been run to select and sort the top peaks from a
Fourier map onto file pek, MODEL will then
sort the peaks into clusters of peaks that are bonded to
one another. Then an analysis of the peaks in terms of some
structure will be attempted. Only those peaks which are
bonded at a distance of between 1.2 and 1.8 Angstroms and
which form angles of between 100° and 140° with
the other peaks will be considered in the interpretation.
An attempt is made to assign the atom names, given in the
conect line, to the
cluster, or clusters, of peaks which have met the bonding
criteria. Up to three interpretations will be attempted (by
default). For each interpretation, a plot will be made of
the peaks in the cluster projected on the least-squares
plane of the cluster. After the interpretation has been
completed, a list of all the inter-peak distances up to 2.0
Angstroms will be produced.