PROTIN: Set Constraints for Refinement of Parameters

Authors: Wayne A. Hendrickson

Contact: Jim Stewart, Department of Chemistry, University of Maryland, College Park, MD 20742, USA

PROTIN combines the atomic parameters of the trial structure with canonical stereochemical information and sets up the constraint observational terms for PROLSQ, the restrained-parameter structure-factor least-squares atomic parameter refinement program for large molecules. This XTAL version is derived from the program of John H. Konnert (1976), which was elaborated for protein refinement by Wayne A. Hendrickson (1985). Since then a number of others have changed details of the original code and documentation. Among these contributors are Barry Finzel, Steven Sheriff, Anita Sielecki, Janet Smith, and Alex Wlodawer.

Purpose

The pair of programs PROTIN and PROLSQ is designed to allow the refinement of the atomic parameters obtained from crystallographic diffraction data while restraining the parameters to conform to stereochemical information. These programs use stereochemical information as additional observations, which are combined with the observations of reflection intensities so that the observational equations used in refining the atomic parameters include information about both stereochemistry and diffraction structure factors. In the XTAL system PROATM is used to load the atomic parameters, identifying each atom in terms of its type and residue or group. PROTIN is used to load the canonical stereochemical information and connect the attributes of the ideal atoms to atoms of corresponding type and group in the given structure. Once the attachments are made, the quantities for the structure being refined that are needed to contribute to the matrix of normal equations are written to a file for use in the PROLSQ program. PROLSQ takes in this information and the diffraction information, then forms and solves the combined least-squares normal equations for shifts in the atomic parameters.

Before using PROTIN/PROLSQ the Hendrickson paper (1985) should be read. Especially important are pages 264-268, where the "Refinement Strategy" is laid out.

Method

The method used in PROTIN/PROLSQ is documented in publications by Konnert and Hendrickson (1976, 1979, 1980, 1985). For the user, Hendrickson (1985) presents an overview of the PROTIN/PROLSQ programs, which are the versions that have been translated into RATMAC and placed in the XTAL system. A summary of that paper follows, in which the input lines set forth below are related to the terms used in the paper. The geometric data must be placed in the input stream in a very restricted order. The input lines are therefore grouped according to the kinds of stereochemical features which may be restrained.

The first lines, PROTIN, spcpep, and altchn, set global conditions for the structure. spcpep and altchn are optional if the protein has well-defined chains which have been specified in the atom lines loaded by PROATM. The PROATM program marks, through the lrsequ: record of the binary data file, the N-terminal and C-terminal group of each chain. The chain identity and terminal residue information is then generated in PROTIN from the signals set by PROATM.

disulb is optional if no disulphide linkages in the structure are to be restrained.

The optional lines from intrad to symwgt supplied at this point cause all the lines from restrm through ellres to be read from the "default" file, PRORES, which is supplied with the XTAL system. This file contains the images shown below. The groups contained in this file will serve for many substances. If other groups or restraints must be defined, this may be done either by editing the ASCII PRORES file or by supplying all the lines shown below in the specified order.

NOTE WELL: Additional groups which are added must be added after those groups already present. As noted elsewhere in this document the line order is very restricted in PROTIN due to the structure of the program itself.

The lines restrm, reslnk, and residu contain the canonical coordinates for the amino acids and some other well-known moieties. They also serve to set up the character strings which identify groups and atoms within the groups. The substrings trm, lnk, and res are then used in forming names for the lines which follow, where the distinction between groups which are terminal on chains, linking in chains, or a residue in a chain is important in forming the restraint derivatives. The lines which follow use the strings established in the resxxx lines to connect restraints to the defined groups. Thus, any new groups added will require the addition of all the other corresponding restraint input lines described below. distrm, dislnk, and disres supply the connectivity for the groups. This permits the calculation of the important distances. Four types of distances have been distinguished: actual bond distances, the next-nearest-neighbour distances from the triple of atoms that define bond angles, first-to-fourth atom distances that relate to prescribed dihedral angles as within planar groups, and hydrogen bond lengths. In restraining these geometric factors all the information is encoded in terms of atom-pair distances, which are used to produce the factors in the normal equations formed in PROLSQ.
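The reduction of bonds and bond angles to atom-pair distances can be illustrated with a short Python sketch. It is not taken from PROTIN. The function names and the numerical values are hypothetical, and the weighting applied to each residual in PROLSQ is not shown.

```python
import math

def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.dist(a, b)

def angle_distance(d12, d23, angle_deg):
    """Next-nearest-neighbour (1-3) distance implied by two bond lengths
    and the bond angle between them, from the law of cosines."""
    theta = math.radians(angle_deg)
    return math.sqrt(d12 * d12 + d23 * d23 - 2.0 * d12 * d23 * math.cos(theta))

def distance_residual(d_ideal, atom_a, atom_b):
    """Difference between an ideal pair distance and the model pair distance."""
    return d_ideal - distance(atom_a, atom_b)

# Hypothetical ideal geometry for three bonded atoms (values are illustrative only).
ideal_12 = 1.47
ideal_23 = 1.53
ideal_13 = angle_distance(ideal_12, ideal_23, 110.0)

# Hypothetical model coordinates for the same three atoms.
atom_1 = (0.0, 0.0, 0.0)
atom_2 = (1.45, 0.0, 0.0)
atom_3 = (2.0, 1.4, 0.0)

residuals = [
    distance_residual(ideal_12, atom_1, atom_2),  # bond distance
    distance_residual(ideal_23, atom_2, atom_3),  # bond distance
    distance_residual(ideal_13, atom_1, atom_3),  # angle expressed as a distance
]
```

Expressing the angle as a 1-3 distance means that bonds and angles are handled by the same kind of pair-distance residual.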

plares, plapro, platrm, plalnk and plalnl serve to establish constraints on the coplanarity that exists and should be conserved in some groups. In PROTIN/PROLSQ a method is used which restrains the deviations of the atoms of the refined structure from the least-squares plane of the group.

chiral serves to define chiral centres within groups. The set of interatomic distances is insensitive to handedness so additional restraints must be imposed to assure the preservation of chirality. This is done by introducing a chiral volume equal to the triple scalar product of the vectors from a central atom to three attached atoms to quantify chirality. The sign of the chiral volume depends upon the handedness of the group and its magnitude equals the volume of the parallelepiped formed by the three vectors. The connectivity given in the chiral line serves to allow the calculation of the ideal and model volumes to be used in forming the observational function.
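The chiral volume described above can be sketched in Python as follows. The function names and the coordinates are hypothetical and serve only to show that the sign of the triple scalar product changes when two attached atoms are exchanged.

```python
def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Vector (cross) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar (dot) product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def chiral_volume(center, a1, a2, a3):
    """Triple scalar product of the vectors from the central atom
    to the three attached atoms."""
    v1, v2, v3 = sub(a1, center), sub(a2, center), sub(a3, center)
    return dot(v1, cross(v2, v3))

# Hypothetical coordinates: a central atom with three attached atoms.
center = (0.0, 0.0, 0.0)
atom_a = (1.0, 0.0, 0.0)
atom_b = (0.0, 1.0, 0.0)
atom_c = (0.0, 0.0, 1.0)

volume = chiral_volume(center, atom_a, atom_b, atom_c)
mirrored = chiral_volume(center, atom_a, atom_c, atom_b)  # two atoms exchanged
```

Exchanging two attached atoms leaves every interatomic distance unchanged but reverses the sign of the volume, which is why a separate chirality restraint is needed.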

vdwdis serves to set up a table of atomic van der Waals diameters which may then be used in the calculations of non-bonded contacts.

vdwtrm, vdwlnk, and vdwres serve to supply the number of contact distances characteristic of each group. vdwcon lines are used to supply the distance information in the form of pairs of atoms which might come into non-bonding contact within the group. For these distances the constraints are designed to follow a potential energy function that features a steep repulsive barrier against close contacts and a very weak attractive potential. The observational function used is taken only over possible repulsive contacts, that is, when d(model) < d(minimum). The value of d(minimum) depends upon the atomic elements in contact and on the type of contact: single-torsion separated atom pairs or multiple-torsion separated atoms.
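The selection of repulsive contacts only can be sketched in Python as follows. The function names and the distances are hypothetical. How each violation is weighted, and the power to which it is raised in the observational function, are not stated here and are deliberately left out.

```python
def contact_violation(d_model, d_min):
    """Amount by which a contact is shorter than the allowed minimum.
    Returns zero when the contact is not repulsive (d_model >= d_min)."""
    return max(0.0, d_min - d_model)

def repulsive_contacts(contacts):
    """Keep only the (d_model, d_min) pairs that satisfy d_model < d_min."""
    return [(d_model, d_min) for d_model, d_min in contacts if d_model < d_min]

# Hypothetical contact list: (model distance, minimum allowed distance).
contacts = [(3.0, 3.4), (3.6, 3.4), (3.4, 3.4)]
selected = repulsive_contacts(contacts)
violations = [contact_violation(d_model, d_min) for d_model, d_min in contacts]
```

Contacts at or beyond d(minimum) contribute nothing, so the very weak attractive part of the potential is not modelled.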

tortrm, torres, and their data-supplying adjuncts neighb, chidis, and chiwgt serve to restrain torsion angles. Torsion angles are among the least restricted of stereochemical features, but certain restrictions, related to non-bonded contacts, do apply. The nature of these restrictions is known for both main-chain and side-chain conformations in proteins. In PROTIN a simple quadratic form for the torsional potential is used. The angle at the torsional potential minimum, χ, is obtained for the ideal group, the corresponding angle is calculated for the model, and the square of their difference is used in the observational function. The configurations treated can include quasi-planar torsions, as in the peptide bond, staggered potentials as in aliphatic side chains, transverse preferred conformations as in aromatic side chains, and targeted main-chain conformation angles such as in α helices. The definitions of the torsion angles ω, φ, ψ, χ(1), χ(2), etc. are given in Jane S. Richardson's paper (1981), which can serve as a useful general reference for the nomenclature used in setting up the restraints used in PROTIN/PROLSQ.
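A torsion angle and a quadratic penalty on its deviation from a target can be sketched in Python as follows. This is an illustration, not the PROTIN code. The function names and coordinates are hypothetical, and the use of a period (for example 120 degrees for a staggered potential) to pick the nearest minimum is an assumption made for the example.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Vector (cross) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar (dot) product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle in degrees about the p2-p3 bond."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def angular_difference(model_deg, ideal_deg, period=360.0):
    """Deviation of the model angle from the nearest equivalent minimum,
    folded into the range (-period/2, period/2]. The period is an assumption."""
    d = (model_deg - ideal_deg) % period
    if d > period / 2.0:
        d -= period
    return d

def torsion_penalty(model_deg, ideal_deg, period=360.0):
    """Simple quadratic form: square of the folded angular difference."""
    return angular_difference(model_deg, ideal_deg, period) ** 2

# Hypothetical four-atom chain with a 90 degree torsion about the p2-p3 bond.
angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

For example, `torsion_penalty(-65.0, 60.0, period=120.0)` treats -60 degrees as an equivalent staggered minimum, so the deviation is 5 degrees and not 125 degrees.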

secstr allows for the naming and characterizing of known secondary structural features in terms of characteristic φ and ψ values.

elltrm and ellres serve to control the thermal parameters shifts by restraining the motion of bonded atoms relative to one another.

interd, intrad, and spcdis restrain the distances between chains in structures with multiple chains.

spcpla serves to set up constraints for special planar groups.

excon serves to set up restraints which will exclude contacts between chains in the structure.

secsel serves to set up restraints on elements of secondary structure for back-bone torsion-angles.

spcsym, symop, and symwgt serve to set up constraints based on chains related by non-crystallographic symmetry.

The PROTIN program should be run once before a series of refinements using PROLSQ. The augmenting normal-equation elements pertinent to the stereochemical restraints are stored in file F for communication to PROLSQ. The values in this file change slowly as refinement takes place, so in general it is not necessary to rerun PROTIN for every execution of PROLSQ. After several cycles of least-squares, or when additional atoms are added to the model, PROTIN must be run again before PROLSQ.

Names Of Chains And Groups/Residues

In the description which follows a "chain identifier" will mean the character string supplied in field 5 of an atom line in PDB format or field 7 of a proat line. All reference to chains will be by these "names". Note that the PDB format limits chain names to 1 character while proat lines allow up to three characters. These names are placed sequentially in a table and transformed to relative pointers for use in PROTIN and PROLSQ.

Groups are defined in the restrm, reslnk, residu and reside lines. The symbols used for these groups must be used in the PROATM lines to identify the group to which an atom belongs. Residues are a special subset of these groups consisting of the amino acid groups which make up the chains. Each residue in a chain is assigned a "residue sequence number" during atom loading by PROATM. This number is the one in field 6 of a PDB atom line or in field 8 of a proat line. During processing these residue sequence numbers are compressed into a sequential set of pointers developed at atom loading. The user refers to the "residue sequence number" in the input lines which follow, but it is well to keep in mind that internally these pointers dominate. As a result, error messages may give the pointer, and some thought may be required to find which residue is being pointed to.
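The compression of residue sequence numbers into sequential pointers can be pictured with the following Python sketch. The actual internal tables of PROATM and PROTIN are not described here, so the data layout, the function names, and the pointer numbering from 1 are all assumptions made for illustration.

```python
def build_pointers(atom_records):
    """Assign a sequential pointer to each distinct (chain, residue sequence
    number) pair, in the order in which it is first met during atom loading.
    atom_records holds one (chain, residue sequence number) pair per atom."""
    pointers = {}
    for key in atom_records:
        if key not in pointers:
            pointers[key] = len(pointers) + 1
    return pointers

def residue_for_pointer(pointers, pointer):
    """Reverse lookup: find the (chain, residue sequence number) pair
    that a pointer reported in an error message refers to."""
    for key, value in pointers.items():
        if value == pointer:
            return key
    return None

# Hypothetical atom records with a gap in the numbering of chain A.
records = [("A", 5), ("A", 5), ("A", 7), ("B", 1)]
pointers = build_pointers(records)
```

With these hypothetical records, pointer 2 refers to residue 7 of chain A. A reverse lookup of this kind is the step a user performs by hand when an error message reports a pointer in place of a residue sequence number.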

The term "order number" refers to the number which points to the relative position of a given atom in a group. For example, arginine, ARG contains atoms N, CA, C, O, CB, CG, CD, NE, CZ, NH1, and NH2 with order numbers 1 through 11. The order numbers of linking groups are sometimes negative as a signal that the atoms are in the previous residue. This term is used in the data lines which follow and may appear in error messages from PROTIN.
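Order numbers can be illustrated with a small Python sketch using the arginine atom list given above. The function names are hypothetical. The interpretation of a negative order number, in which its magnitude indexes the atom list of the previous residue, is an assumption made purely for this example and may not match the PROTIN convention in detail.

```python
# Atom names of arginine in order, as listed in the text (order numbers 1 to 11).
ARG_ATOMS = ["N", "CA", "C", "O", "CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"]

def order_number(atom_names, name):
    """1-based position of an atom within its group."""
    return atom_names.index(name) + 1

def resolve_atom(order, current_atoms, previous_atoms):
    """Return (residue, atom name) for an order number.
    Assumption: a negative number points into the previous residue,
    with its magnitude used as the 1-based position there."""
    if order < 0:
        return ("previous", previous_atoms[-order - 1])
    return ("current", current_atoms[order - 1])

# Hypothetical main-chain atom list for the preceding residue.
PREVIOUS_ATOMS = ["N", "CA", "C", "O"]

cz_order = order_number(ARG_ATOMS, "CZ")
linked = resolve_atom(-3, ARG_ATOMS, PREVIOUS_ATOMS)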

Printer Output

Only the essential counts of restraints and atoms are printed at priority 3. Printer output is copious at priority 4: all the input data are echoed, and the compound-specific distances, angles, etc. are displayed.

File Assignments

Reads atom and unit cell data from an input archive bdf

Writes geometric constraint data for PROLSQ on file prt

Optionally reads standard constraint data from file prores on profil:

Example

Note that PROATM must be run before PROTIN so that the atom coordinates to be constrained are available on the input archive bdf. PROTIN can then generate the appropriate constraint information on file F for use in the refinement process.

title CRAMBIN FROM ABYSSINIAN CABBAGE SEED-HENDRICKSON & TEETER JMS

PROTIN vdw 4.51 nap 5 atl gnl cel 3 aniso

PROLSQ ncy 25 pch iso

print alis 1

stats 5.0 4.0 3.0 2.0 1.0

rtest 5 23 1 1 4 3 .25 .45 .65 1.25 .35 .65 .95

References

Finzel, B.C. "Incorporation of Fast Fourier Transforms to Speed Restrained Least-squares Refinement of Protein Structures". J. Appl. Cryst. (1987) 20, 53-55

Hendrickson, W.A. "Stereochemically Restrained Refinement of Macromolecular Structures" in Diffraction Methods for Biological Macromolecules Part B, from Methods in Enzymology, edited by H.W. Wyckoff, C.H.W. Hirs, & S.N. Timasheff. Academic Press, Inc. (1985)

Hendrickson, W.A., & Konnert, J.H. (1979) in Biomolecular Structure, Conformation, Function, and Evolution, edited by R. Srinivasan, Vol 1, pp. 43-57. Pergamon Press, New York

Konnert, J.H. "A Restrained-Parameter Structure-factor Least-squares Refinement Procedure for Large Asymmetric Units" (1976). Acta Cryst. A32, 614-617.

Konnert, J.H. & Hendrickson, W.A. "A Restrained-parameter Thermal-Factor Refinement Procedure" (1980). Acta Cryst. A36, 344-350.

Richardson, J.S. "Protein Anatomy", page 168- Chapter in Advances in Protein Chemistry (1981), edited by C.B. Anfinsen, J.T. Edsall, and F.M. Richards. Academic Press, New York

Sheriff, S. "Addition of Symmetry-Related Contact Restraints to PROTIN & PROLSQ" J. Appl. Cryst. (1987) 20, 55-57.