MODEL : Build molecule from map

Authors: Ruth Doherty and Sydney Hall

Contact: Syd Hall, Crystallography Centre, University of Western Australia, Nedlands 6907, Australia

MODEL searches peak or atom sites for a connected molecular model. The interpretation is based on bond lengths and angles or, if the connectivity sequence of a fragment or molecule is supplied, on matching a canonical description of the input fragment to the connected peak and atom sites. Connected sites are plotted on the printer and a comprehensive list of distances and angles is produced.

Clusters And Subclusters

The first step in the interpretation of a molecular model is to determine which sites are connected. Sites are considered connected if their bond radii overlap. The bond radii of atom sites are extracted from the archive bdf. The bond radius for peak sites is input on the limits line. The search for these connections includes all possible symmetry transformations of the sites. Sites connected in this way are placed in the same group and labelled as a cluster.
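The grouping of overlapping sites into clusters can be illustrated with a short Python sketch. It is not the MODEL implementation: it assumes Cartesian coordinates in Angstroms, ignores symmetry transformations entirely, and the function name cluster_sites and the tuple layout are hypothetical.

```python
import math

def cluster_sites(sites):
    """Group sites whose bond radii overlap.

    sites: list of (label, (x, y, z), radius), Cartesian Angstroms (assumed).
    Returns a sorted list of clusters, each a sorted list of labels.
    Symmetry-equivalent positions are NOT generated in this sketch.
    """
    parent = list(range(len(sites)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            _, pos_i, rad_i = sites[i]
            _, pos_j, rad_j = sites[j]
            # Two sites are connected when their radii overlap.
            if math.dist(pos_i, pos_j) <= rad_i + rad_j:
                parent[find(i)] = find(j)

    groups = {}
    for i, (label, _, _) in enumerate(sites):
        groups.setdefault(find(i), []).append(label)
    return sorted(sorted(group) for group in groups.values())

example_sites = [
    ("p1", (0.0, 0.0, 0.0), 0.8),
    ("p2", (1.4, 0.0, 0.0), 0.8),
    ("p3", (2.8, 0.0, 0.0), 0.8),
    ("p4", (10.0, 0.0, 0.0), 0.8),
]
clusters = cluster_sites(example_sites)
```

Here p1 and p3 do not overlap directly, but both overlap p2, so all three fall in one cluster while p4 forms a cluster of its own.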

Maximum and minimum bond angles represent a different type of constraint on the site coordination. Within a cluster of sites there may be a large number of different ways that the angle constraints can be satisfied. Sites that satisfy a particular interpretation of angle constraints are grouped in a subcluster. One of three different approaches must be selected (on the MODEL line) for identifying sites that belong to the same subcluster. They are:

No subcluster search (none)

In this mode no angle limits are applied to the connected sites. All sites within the specified site radii will be included in the model.

Connected subcluster search (conn)

In this mode sites are accepted into a subcluster by testing the angles in the order of decreasing peak height and bond connectivity (the number of bonds to each site). This is referred to as the maximum connectivity approach.

Weighted mean subcluster search (mean)

A more geometric approach to angle constraints weights sites according to the bond lengths and angles they form. Peak sites are assigned weights based on the proximity of each connecting bond length to the mean accepted value, (bondmax+bondmin)/2, and of each bond angle to the mean, (angmax+angmin)/2. This can be an important approach for very regular (e.g. 'chickenwire') or highly-coordinated structures. It is not suitable, however, for structures with a range of coordination geometries (e.g. heavy atoms or 5- & 6-membered rings).
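The idea behind the weighted mean mode can be sketched in Python. The actual weighting function used by MODEL is not given here, so this sketch assumes a simple linear fall-off and takes the mean accepted value to be the midpoint of the allowed range. The names proximity_weight and site_weight are hypothetical.

```python
def proximity_weight(value, lower, upper):
    """Weight of 1.0 at the midpoint of [lower, upper], falling linearly
    to 0.0 at either limit and clamped to 0.0 outside the range.
    The linear form is an assumption made for illustration only."""
    half_range = (upper - lower) / 2.0
    if half_range <= 0.0:
        raise ValueError("upper limit must exceed lower limit")
    midpoint = (lower + upper) / 2.0
    return max(0.0, 1.0 - abs(value - midpoint) / half_range)

def site_weight(bond_lengths, bond_angles, bond_limits, angle_limits):
    """Average the proximity weights of all bonds and angles at one site."""
    terms = [proximity_weight(d, *bond_limits) for d in bond_lengths]
    terms += [proximity_weight(a, *angle_limits) for a in bond_angles]
    return sum(terms) / len(terms) if terms else 0.0

# A site with two bonds and one angle, tested against limits of
# 1.2 to 1.8 Angstroms and 100 to 140 degrees.
weight = site_weight([1.5, 1.5], [110.0], (1.2, 1.8), (100.0, 140.0))
```

Under these assumptions a site with ideal bond lengths but an angle halfway between the mean and a limit scores below 1.0, which is why a structure with mixed coordination geometries is penalised in this mode.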

Search Parameters

The ability of MODEL to identify the correct set of connected sites is strongly dependent on the bond length and angle constraints discussed above. When only peak sites are input the default values (see the limits line) will permit a range of geometries. In such cases the limits will be adequate for a molecular search provided the peak sites are well-defined and there are not too many spurious peaks in the map. If possible the user should specify limits to suit the stereochemistry of the structure. Be careful - if you are too restrictive, some legitimate peaks may be rejected; if you are too permissive, spurious peaks will complicate the interpretation of the subcluster.

In addition to the bond length and angle modes for site selection, the user may specify a known fragment as a template (see the conect line). This approach places stringent stereochemical constraints on the selection of connected sites and should be used whenever possible. It should be emphasised, however, that the fitting of the input model to the peak sites is strongly dependent on the sites that are connected in rings. Special care must be taken that the input ring sites are correct - otherwise the fitting process will probably fail. The reliability of the non-ring atoms is less critical.
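Because the fit depends so heavily on ring sites, it can help to check which sites of a template are actually in rings before running MODEL. The Python sketch below is a stand-alone illustration, not part of MODEL. It assumes that each semicolon-separated group of a fragment description names a site followed by its bonded neighbours, and the helper names are hypothetical.

```python
from collections import defaultdict, deque

def parse_fragment(text):
    """Build an undirected adjacency map from 'a b c; d e ...'.
    Assumes the first name in each group is bonded to all the others."""
    adjacency = defaultdict(set)
    for group in text.split(";"):
        names = group.split()
        if len(names) < 2:
            continue
        centre, *neighbours = names
        for name in neighbours:
            adjacency[centre].add(name)
            adjacency[name].add(centre)
    return adjacency

def connected_without(adjacency, a, b):
    """True if a still reaches b when the direct bond a-b is ignored."""
    seen = {a}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        for nxt in adjacency[current]:
            if current == a and nxt == b:
                continue  # skip the bond under test
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def ring_sites(adjacency):
    """Sites that lie on at least one ring bond."""
    ring = set()
    for a in list(adjacency):
        for b in adjacency[a]:
            if a < b and connected_without(adjacency, a, b):
                ring.update((a, b))
    return ring

# Hypothetical six-membered ring with a carboxyl and a hydroxyl substituent.
fragment = parse_fragment("c1 o1 o2 c6; c7 c2 c6 o3; c5 c4 c6; c3 c4 c2")
rings = ring_sites(fragment)
```

A mistyped label in such a description typically shows up as a ring site missing from the result, or as an unexpected extra site in the adjacency map.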

Selecting The Best Model

Interpretation of the connected peaks depends on the 'quality' of the Fourier map, the appropriateness of the bond length and angle constraints and the availability of reliable stereochemical information. The 'best' model is selected using two figure-of-merit (FOM) values.

The first FOM is based on the sum of the subcluster connections, weighted according to peak height. This is a reasonable measure provided the bond length and angle constraints are appropriate for the structure. The expected value for the FOM is very dependent on the number of peak sites entered compared to the number of non-hydrogen sites in the structure.

A second FOM relies on the stereochemical information input on conect lines. If conect lines are supplied, an FOM value is calculated based on the fit of the input model to the subcluster peaks. This is a more sensitive FOM than the first and has an optimal value around 2.0. It should be emphasised that, just as the fitting process is strongly influenced by sites connected in rings, this FOM is especially sensitive to matching atoms connected in rings.

Output Information

MODEL outputs a range of numerical and graphical information about the modelling process. Here is a summary of the different sections.

  1. The control parameters are listed as a record of the constraints used in the search process. It must be emphasised again that the MODEL calculation is only as effective as the constraints are appropriate to the structure being processed.

  2. If conect lines are input, the atomic coordination of each site is listed. Check to make certain that the specified model is as you intended - this is a common source of error.

  3. Each interpretation of a cluster of peak sites satisfying the FOM limits is listed. This listing includes the calculated FOM values, the number of subclusters and the peak connectivity table. The table lists the atom and peak sites in decreasing order of peak height, with information on subcluster allocation, connected peaks and sites that have been assigned from the input model. This table must be referred to when connecting up the projected peak positions output in 4. below. Note that the atom labels stored on the bdf are truncated to 4 characters in MODEL to aid their presentation on the printer plot of the molecule as described in 4.

  4. A graphical representation of the peak clusters projected down one or more axes is provided. The number of projection axes can be specified by the user (see the limits line). The default is one projection perpendicular to the best least-squares plane through sites connected in rings (or, if no atoms are in rings, through all sites), and one projection down an axis in the least-squares plane. The second projection is not output if the cluster is approximately planar. The projected peak sites are shown to scale (default is 2.5 A/cm), each as the peak sequence number preceded by a special site marker. For peaks connected in rings the site marker is an asterisk (*); for peaks satisfying the subcluster constraints, a plus (+); and for peaks in the cluster but not within the subcluster, a period (.). In addition to the graphing of peak clusters, projections are also output for all peak sites. This gives a composite picture of the peak positions independent of clustering and subclustering selections.

  5. At the conclusion of the graph outputs, the peak sites are listed with the allocated cluster number and, if model sites were input, the assigned atom sites.

  6. A table of contact distances less than a specified limit (see field 7 of the limits line), calculated using the coordinates from list 5. The default limit is 2.4 A.

  7. A table of bond distances for the connected sites (based on the atom radii and the peak radius from the limits line) calculated using the coordinates from list 5 and the symmetry equivalent positions.

  8. A table of bond angles for the peaks listed in list 5 for all asymmetric units.

  9. A table of bond lengths for twinned peaks related only by symmetry operations.

File Assignments

  • Reads symmetry, atoms and peak sites from bdf pek

  • Writes connected atom sites to line file pch

Examples

MODEL

This example shows the fully defaulted run. All default search conditions apply. No molecular connections (as defined by conect lines) are used in the FOM assessment.

MODEL conn
limits 4 0.6 0.9 100 140
conect c1 o1 o2 c6; c7 c2 c6 o3; c5 c4 c6; c3 c4 c2

The compound described in this input is salicylic acid. If PEKPIK has been run to select and sort the top peaks from a Fourier map onto file pek, MODEL will sort the peaks into clusters of peaks that are bonded to one another. It then attempts to interpret each cluster as a molecular structure. Only those peaks which are bonded at a distance of between 1.2 and 1.8 Angstroms and which form angles of between 100° and 140° with the other peaks will be considered in the interpretation. An attempt is made to assign the atom names given in the conect line to the cluster, or clusters, of peaks which have met the bonding criteria. Up to three interpretations will be attempted (by default). For each interpretation, a plot will be made of the peaks in the cluster projected on the least-squares plane of the cluster. After the interpretation has been completed, a list of all the inter-peak distances up to 2.0 Angstroms will be produced.
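The distance and angle limits quoted for this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes that the values 0.6 and 0.9 on the limits line are minimum and maximum peak radii, so that two peaks bond when their separation lies between twice the minimum and twice the maximum radius. That reading is inferred from the 1.2 to 1.8 Angstrom range stated above, and the function names are hypothetical.

```python
import math

def bond_window(radius_min, radius_max):
    """Allowed peak-to-peak distance range, assuming both peaks
    carry the same minimum and maximum bond radius."""
    return 2 * radius_min, 2 * radius_max

def angle_deg(a, centre, b):
    """Angle a-centre-b in degrees for Cartesian points."""
    v1 = [p - q for p, q in zip(a, centre)]
    v2 = [p - q for p, q in zip(b, centre)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp against rounding error before taking the arc cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def acceptable(a, centre, b, radius_min=0.6, radius_max=0.9,
               angle_min=100.0, angle_max=140.0):
    """True if both bonds and the angle at centre satisfy the limits."""
    d_min, d_max = bond_window(radius_min, radius_max)
    bonds_ok = all(d_min <= math.dist(p, centre) <= d_max for p in (a, b))
    return bonds_ok and angle_min <= angle_deg(a, centre, b) <= angle_max

# Three peaks with 1.4 Angstrom bonds and a 120 degree angle (aromatic-like).
centre = (0.0, 0.0, 0.0)
peak_a = (1.4, 0.0, 0.0)
peak_b = (1.4 * math.cos(math.radians(120.0)),
          1.4 * math.sin(math.radians(120.0)), 0.0)
ok = acceptable(peak_a, centre, peak_b)
```

With these limits a linear arrangement of the same three peaks (an angle of 180°) would be rejected, as would any pair of peaks closer than 1.2 or further apart than 1.8 Angstroms.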