PROTIN: Set Constraints for Refinement of Parameters
Authors: Wayne A. Hendrickson
Contact: Jim Stewart, Department of Chemistry,
University of Maryland, College Park, MD 20742, USA
PROTIN combines the atomic parameters of the trial structure with canonical
stereochemical information and sets up the constraint observational terms for
PROLSQ, the restrained-parameter structure-factor least-squares atomic
parameter refinement program for large molecules. This XTAL version is derived
from the program of John H. Konnert (1976) which was elaborated for protein
refinement by Wayne A. Hendrickson (1985). Since that time a number of others
have made changes, in details, to the original code and documentation. Among
these contributors are Barry Finzel, Steven Sheriff, Anita Sielecki, Janet
Smith, and Alex Wlodawer.
The pair of programs PROTIN and PROLSQ are designed to allow the refinement of
the atomic parameters obtained from crystallographic diffraction data while
restraining the parameters to conform to stereochemical information. These
programs use stereochemical information as additional observations which are
incorporated with the observations of reflection intensities so that the
refinement of atomic parameters includes information in the observational
equations about both stereochemistry and diffraction structure factors. In the
XTAL system PROATM is used to load the atomic parameters, identifying each atom
in terms of its type and residue or group. PROTIN is used to load the canonical
stereochemical information and connect the attributes of the ideal atoms to
atoms of corresponding type and group in the given structure. Once the
attachments are made, the quantities for the structure being refined which are
needed to contribute to the matrix of normal equations are written to a file
for use in the PROLSQ program. PROLSQ takes in this information and the
diffraction information and forms and solves the combined least-squares normal
equations for shifts in the atomic parameters.
Before using PROTIN/PROLSQ the Hendrickson paper (1985) should be read.
Especially important are pages 264-268 where the "Refinement Strategy" is laid
out.
The method used in PROTIN/PROLSQ is documented in publications by Konnert and
Hendrickson (1976,1979,1980,1985). From the standpoint of use of these programs
Hendrickson (1985) presents an overview of the PROTIN-PROLSQ programs which are
the ones which have been translated into RATMAC and placed in the XTAL system.
A summary of this paper follows in which the input lines set forth below are
related to the terms used in that paper. In general the geometric data must be
placed in the input stream in a very restricted order. Therefore, the input
lines are grouped according to the kinds of stereochemical features which may
be restrained.
The first lines, PROTIN, spcpep, and
altchn, set global conditions for the structure.
spcpep and altchn are optional if the protein
has well-defined chains which have been specified in the atom
lines loaded by PROATM. The PROATM program marks, through the lrsequ:
record of the binary data file, the N terminal and C terminal group of each
chain. The chain identity and terminal residue information is then generated in
PROTIN from the signals set by PROATM.
disulb is optional if no disulphide linkages in the structure
are to be restrained.
The optional lines from intrad to symwgt
supplied at this point cause all the lines from restrm through
ellres to be read from the "default" file,
PRORES, which is supplied with the XTAL system. This file
contains the images shown below. The groups contained in this file will serve
for many substances. If other groups or restraints must be defined, this may
be done either by editing the ASCII PRORES file
or by supplying all the lines shown below in the specified order.
NOTE WELL: Additional groups which are added must be added after those groups
already present. As noted elsewhere in this document the line order is very
restricted in PROTIN due to the structure of the program itself.
The lines, restrm, reslnk, and
residu contain the canonical coordinates for the amino acids
and some other well known moieties. They also serve to set up the character
strings which identify groups and atoms within the groups. The substrings
trm, lnk, and res are then used
in forming names for lines which follow where the distinction between groups
which are terminal on chains, linking in chains, or a residue in a chain is
important in forming the restraint derivatives. The lines which follow use the
strings established in the resxxx lines to connect restraints
to the defined groups. Thus, any new groups added will require the addition of
all the other corresponding restraint input lines described below.
distrm, dislnk, and disres
supply the connectivity for the groups. This permits the calculation of the
important distances. Four types of distances have been distinguished: actual
bond distances, the next-nearest-neighbour distances from the triple of atoms
that define bond angles, first to fourth atom distances that relate to
prescribed dihedral angles as within planar groups, and hydrogen bond lengths.
In restraining these geometric factors all the information is encoded in terms
of atom pair distances which are used to produce the factors in the normal
equations formed in PROLSQ.
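The encoding of a bond angle as a next-nearest-neighbour distance can be
sketched as follows. This is a minimal illustration, not the PROTIN code: the
function names are hypothetical, and the bond lengths and angle in the example
are illustrative values, not entries from the PRORES file.

```python
import math

def distance_13(d12, d23, angle_deg):
    """1-3 distance implied by two bond lengths and the bond angle at atom 2
    (law of cosines)."""
    theta = math.radians(angle_deg)
    return math.sqrt(d12 * d12 + d23 * d23 - 2.0 * d12 * d23 * math.cos(theta))

def pair_distance(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def distance_residual(d_ideal, atom_a, atom_b):
    """Ideal minus model distance for one restrained atom pair."""
    return d_ideal - pair_distance(atom_a, atom_b)

# Illustrative values only (assumed, not taken from PRORES):
# two bonds of 1.47 and 1.53 Angstrom meeting at an angle of 110 degrees.
ideal_n_c = distance_13(1.47, 1.53, 110.0)
```

Restraining the 1-3 distance together with the two bond distances fixes the
bond angle, which is why every geometric feature here can be expressed as an
atom pair distance.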
plares, plapro, platrm,
plalnk and plalnl serve to establish
constraints on the coplanarity that exists and should be conserved in some
groups. In PROTIN/PROLSQ a method is used which restrains the deviations of the
atoms of the refined structure from the least-squares plane of the group.
chiral serves to define chiral centres within groups. The set
of interatomic distances is insensitive to handedness so additional restraints
must be imposed to assure the preservation of chirality. This is done by
introducing a chiral volume equal to the triple scalar product of the vectors
from a central atom to three attached atoms to quantify chirality. The sign of
the chiral volume depends upon the handedness of the group and its magnitude
equals the volume of the parallelepiped formed by the three vectors. The
connectivity given in the chiral line serves to allow the
calculation of the ideal and model volumes to be used in forming the
observational function.
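The chiral volume described above can be sketched as a triple scalar product.
This is an illustration only; the function names are hypothetical and the
ordering of the three attached atoms, which fixes the sign, is an assumption
of the example rather than the convention of the chiral line.

```python
def _sub(a, b):
    """Vector a - b for (x, y, z) tuples."""
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    """Vector product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def chiral_volume(center, a1, a2, a3):
    """Triple scalar product of the vectors from the central atom to three
    attached atoms. The sign flips when two attached atoms are exchanged."""
    v1, v2, v3 = _sub(a1, center), _sub(a2, center), _sub(a3, center)
    return _dot(v1, _cross(v2, v3))

def chiral_residual(v_ideal, center, a1, a2, a3):
    """Ideal minus model chiral volume for one chiral centre."""
    return v_ideal - chiral_volume(center, a1, a2, a3)
```

Because exchanging two attached atoms changes the sign but not the magnitude,
a model with inverted handedness yields a large residual even when all of its
interatomic distances are ideal.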
vdwdis serves to set up a table of atomic van der Waals
diameters which may then be used in the calculations of non-bonded contacts.
vdwtrm, vdwlnk, vdwres, serve
to supply the number of contact distances characteristic of each group.
vdwcon lines are used to supply the distance information in the
form of pairs of atoms which might come in non-bonding contact within the
group. For these distances the constraints are designed to follow a
potential energy function that features a steep repulsive barrier against close
contacts and a very weak attractive potential. The observational function used
is taken only over possible repulsive contacts, that is, those for which
d(model) < d(minimum). The value of d(minimum) depends upon the atomic elements
in contact and on the type of contact: single-torsion separated atom pairs or
multiple-torsion separated atoms.
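The one-sided character of the contact term can be sketched as below. The
function name, the weight sigma, and the exponent are placeholders chosen for
illustration; the form actually used is given in Hendrickson (1985).

```python
def contact_term(d_model, d_min, sigma=0.5, power=4):
    """Contribution of one non-bonded pair to the observational function.

    Only repulsive contacts count: a pair at or beyond d_min contributes
    nothing. sigma and power are assumed values for this sketch.
    """
    if d_model >= d_min:
        return 0.0
    return ((d_min - d_model) / sigma) ** power

def contact_sum(pairs, sigma=0.5, power=4):
    """Sum the terms over (d_model, d_min) pairs."""
    return sum(contact_term(d, dmin, sigma, power) for d, dmin in pairs)
```

A pair separated by more than d(minimum) is simply ignored, so the term never
pulls atoms together; it only pushes apart atoms that are too close.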
tortrm, torres, and their data supplying
adjuncts neighb, chidis and
chiwgt serve to restrain torsion angles. Torsion angles
are among the least restricted of stereochemical features, but
certain restrictions, related to non-bonded contacts, do apply. The nature of
these restrictions, both for main-chain and for side-chain conformations in
proteins, is known. In PROTIN a simple quadratic form for the torsional
potential is used. The torsion angle, χ, is calculated both for the ideal
group, where it is the angle of the torsional potential minimum, and for the
model, and the square of the difference is used in the observational function. The
configurations treated can include quasi-planar torsions, as in the peptide
bond, staggered potentials as in aliphatic side chains, transverse preferred
conformations as in aromatic side chains, and targeted main-chain conformation
angles such as in α helices. The definition of the torsion angles
ω, φ, ψ, χ(1), χ(2), etc. are given in Jane S.
Richardson's paper (1981) which can serve as a useful general reference for the
nomenclature used in setting up the restraints used in PROTIN/PROLSQ.
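A quadratic torsion term of the kind described above can be sketched as
follows. The function names are hypothetical. The period argument, which folds
the difference onto the nearest of several equivalent minima (for example
three for a staggered side chain), is an assumption of this sketch and not a
documented PROTIN parameter.

```python
import math

def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p1, p2, p3, p4):
    """Torsion angle in degrees about the p2-p3 bond (cis = 0, trans = 180)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def torsion_term(chi_model, chi_ideal, sigma, period=1):
    """Squared, weighted difference between model and ideal torsion angles.

    The difference is wrapped into half a period either side of the ideal
    value, so that 170 and -170 degrees differ by 20, not 340.
    """
    span = 360.0 / period
    delta = (chi_model - chi_ideal + span / 2.0) % span - span / 2.0
    return (delta / sigma) ** 2
```

Wrapping the difference matters for quasi-planar torsions such as the peptide
bond, where the model value may fall on either side of 180 degrees.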
secstr allows for the naming and characterizing of known
secondary structural features in terms of characteristic ψ and φ
values.
elltrm and ellres serve to control the thermal
parameters shifts by restraining the motion of bonded atoms relative to one
another.
interd, intrad, and spcdis
restrain the distances between chains in structures with multiple chains.
spcpla serves to set up constraints for special planar
groups.
excon serves to set up restraints which will exclude contacts
between chains in the structure.
secsel serves to set up restraints on elements of secondary
structure for back-bone torsion-angles.
spcsym, symop, and symwgt
serve to set up constraints based on chains related by non-crystallographic
symmetry.
The PROTIN program should be run once before a series of refinements using
PROLSQ. The augmenting normal-equation elements pertinent to the stereochemical
restraints are stored in file F for communication to PROLSQ. The values in this
file will change slowly as refinement takes place so that, in general, it is
not necessary to rerun PROTIN for every execution of PROLSQ. Of course, after
several cycles of least-squares or when additional atoms are added to the
model, PROTIN must be run again before PROLSQ.
Names Of Chains And Groups/Residues
In the description which follows a "chain identifier" will mean the character
string supplied in field 5 of an atom line in PDB format or
field 7 of a proat line. All reference to chains will be by
these "names". Note that the PDB format limits chain names to 1 character while
proat lines allow up to three characters. These names are
placed sequentially in a table and transformed to relative pointers for use in
PROTIN and PROLSQ.
Groups are defined in the restrm, reslnk,
residu and reside lines. The symbols used for
these groups must be used in the PROATM lines to identify the group to which an
atom belongs. Residues are a special subset of these groups consisting of the
amino acid groups which make up the chains. Each residue in a chain is assigned
a "residue sequence number" during atom loading by PROATM. This number is the
one in field 6 of a PDB atom line or in field 8 of a
proat line. During processing these residue sequence numbers
are compressed into a sequential set of pointers developed at atom loading. The
user refers to the "residue sequence number" in the input lines which follow,
but it is well to keep in mind that internally these pointers dominate.
Error messages may therefore give the pointer rather than the residue sequence
number, and some thought may be needed to find which residue is being pointed to.
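The compression of residue sequence numbers into sequential pointers, and the
reverse lookup needed to interpret a pointer in an error message, can be
sketched as follows. The function names and the table layout are hypothetical;
the tables built by PROATM may be organized differently.

```python
def build_pointers(residues):
    """Assign sequential pointers (1, 2, 3, ...) to residues in load order.

    residues: (chain identifier, residue sequence number) pairs, one per
    atom, in the order the atoms were loaded.
    Returns (pointer_of, label_of), where label_of[pointer - 1] is the
    residue a pointer stands for.
    """
    pointer_of = {}
    label_of = []
    for key in residues:
        if key not in pointer_of:
            pointer_of[key] = len(label_of) + 1
            label_of.append(key)
    return pointer_of, label_of

def residue_for_pointer(label_of, pointer):
    """Translate an internal pointer back to (chain, sequence number)."""
    return label_of[pointer - 1]

# Example: chain A has residues numbered 1, 2 and 5; chain B starts at 1.
atoms = [("A", 1), ("A", 1), ("A", 2), ("A", 5), ("B", 1)]
pointer_of, label_of = build_pointers(atoms)
```

In this example the residue numbered 5 in chain A receives pointer 3, so a
message quoting pointer 3 does not refer to a residue numbered 3.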
The term "order number" refers to the number which points to the relative
position of a given atom in a group. For example, arginine, ARG contains atoms
N, CA, C, O, CB, CG, CD, NE, CZ, NH1, and NH2 with order numbers 1 through 11.
The order numbers of linking groups are sometimes negative as a signal that the
atoms are in the previous residue. This term is used in the data lines which
follow and may appear in error messages from PROTIN.
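The order numbers can be sketched as a lookup into the atom list of a group.
The arginine list is taken from the text above. The treatment of a negative
order number, where its magnitude is read as a position in the previous
residue, is an assumption made for illustration, as are the function names.

```python
ARG_ATOMS = ["N", "CA", "C", "O", "CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"]

def order_number(group_atoms, atom_name):
    """1-based position of an atom within its group."""
    return group_atoms.index(atom_name) + 1

def resolve_order_number(order, current_atoms, previous_atoms):
    """Return (residue, atom name) for an order number.

    Assumption: a negative value -n means the n-th atom of the previous
    residue; a positive value n means the n-th atom of the current one.
    """
    if order < 0:
        return ("previous", previous_atoms[-order - 1])
    return ("current", current_atoms[order - 1])
```

Under this assumed reading, a linking restraint from C of the previous residue
to N of the current residue would carry the order numbers -3 and 1.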
Only the essential counts of restraints and atoms are printed at priority 3.
Printer output is copious at priority 4. At 4, all the input data are echoed,
and the compound specific distances, angles, etc. are displayed.
Reads atom and unit cell data from an input archive bdf
Writes geometric constraint data for PROLSQ on file prt
Optionally reads standard constraint data from file prores on
profil:
Note that the program PROATM must have been run before PROTIN may be run in
order that the atom coordinates to be constrained will be available on the
input archive bdf so that the appropriate constraint information can be
generated on file F for use in the refinement process.
title CRAMBIN FROM ABYSSINIAN CABBAGE SEED-HENDRICKSON & TEETER
JMS
PROTIN vdw 4.51 nap 5 atl gnl cel 3 aniso
PROLSQ ncy 25 pch iso
print alis 1
stats 5.0 4.0 3.0 2.0 1.0
rtest 5 23 1 1 4 3 .25 .45 .65 1.25 .35 .65 .95
Finzel, B.C. "Incorporation of Fast Fourier Transforms to Speed Restrained
Least-squares Refinement of Protein Structures". J. Appl. Cryst. (1987)
20, 53-55
Hendrickson, W.A. "Stereochemically Restrained Refinement of Macromolecular
Structures" in Diffraction Methods for Biological Macromolecules Part B,
Methods in Enzymology, edited by H.W. Wyckoff, C.H.W. Hirs, & S.N.
Timasheff. Academic Press, Inc. (1985)
Hendrickson, W.A., & Konnert, J.H. (1979) in Biomolecular Structure,
Conformation, Function, and Evolution, edited by R. Srinivasan, Vol 1, pp.
43-57. Pergamon Press, New York
Konnert, J.H. "A Restrained-Parameter Structure-factor Least-squares Refinement
Procedure for Large Asymmetric Units" (1976). Acta Cryst. A32,
614-617.
Konnert, J.H. & Hendrickson, W.A. "A Restrained-parameter Thermal-Factor
Refinement Procedure" (1980). Acta Cryst. A36, 344-350.
Richardson, J. S. "Protein Anatomy", page 168- Chapter in Advances in
Protein Chemistry (1981) Academic Press, New York Edited by C.B. Anfinsen,
J.T. Edsall, and F.M. Richards
Sheriff, S. "Addition of Symmetry-Related Contact Restraints to PROTIN &
PROLSQ" J. Appl. Cryst. (1987) 20, 55-57.