PROATM: Load Protein Atom Parameters

Author: Jim Stewart, Department of Chemistry, University of Maryland, College Park, MD 20742, USA

PROATM loads atomic structure parameters in the form specified by the Protein Data Bank (PDB) (Bernstein et al., 1977). Data may be presented exactly as specified in the PDB literature or in more convenient XTAL free format. The data stored on the output archive bdf may be listed and/or output as lines in the PDB format. Provision is made to insert, replace, delete, and renumber atoms in terms of the PDB serial numbers.

Calculations Performed

The primary function of PROATM is the transformation of atomic parameters of protein structures into a form suitable for use in the XTAL system and back. In PDB form the atomic coordinates are given in orthogonal Angstrom coordinates and the thermal displacement parameters as U. Moreover, the anisotropic U values are scaled by 10000. In addition to the scaling problem, the PDB specifies more naming characters than the XTAL convention allows. These extra characters serve to unambiguously locate the atom in the structure. Moreover, atoms are classified as atoms or heteroatoms depending upon whether they are in the polymeric part of the structure or are attached to it as solvent or chelated moieties.
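The coordinate part of this transformation can be sketched as follows. This is a minimal illustration, not PROATM's own code. It assumes the common PDB default orthogonal frame (x along a, y in the a-b plane), which the text above does not specify, and all function names are hypothetical. The anisotropic U values are only rescaled here; any change of axes they may also need is not shown.

```python
import math

def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Return the 3x3 matrix that maps fractional to orthogonal coordinates.

    Assumes x is along a and y lies in the a-b plane (an assumption; the
    frame used by a given entry is defined in that entry).
    Cell lengths are in Angstroms, angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c.
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]

def orthogonal_to_fractional(xyz, m):
    """Solve m * frac = xyz by back substitution (m is upper triangular)."""
    X, Y, Z = xyz
    z = Z / m[2][2]
    y = (Y - m[1][2] * z) / m[1][1]
    x = (X - m[0][1] * y - m[0][2] * z) / m[0][0]
    return (x, y, z)

def unscale_anisotropic_u(scaled):
    """Undo the PDB scaling of the anisotropic U values by 10000."""
    return [u / 10000.0 for u in scaled]

# Orthorhombic example cell: 10 x 20 x 30 Angstroms.
m = orthogonalization_matrix(10.0, 20.0, 30.0, 90.0, 90.0, 90.0)
frac = orthogonal_to_fractional((5.0, 10.0, 15.0), m)
u_aniso = unscale_anisotropic_u([1234, 2345, 3456, 12, -34, 56])
```

For the orthorhombic example cell the point (5, 10, 15) Angstroms lies at the centre of the cell, so each fractional coordinate comes out as one half to within rounding.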

Each atom is specified by a unique serial number in addition to the other naming parameters which are listed below. All of these naming parameters are stored in the atom record of the bdf. Once they are stored they may later be retrieved in the exact form specified by the authors of the PDB.

As the atoms are loaded, a survey of the residue sequence is made. The sequence of amino acids in the protein is stored in the bdf in logical record sequence. This requires that all atoms be loaded in a preliminary pass. The atoms are written to a scratch file and then copied to the output archive bdf after the orthogonal-to-fractional transformation matrix and the sequence of residues have been written to records lrcell: and lrsequ: of the output archive bdf. If loading is bypassed, all printed/punched information is taken from the input archive bdf alone. If loading is done, the printed/punched data are taken from the output archive bdf.
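The sequence survey made during the preliminary pass can be pictured with the sketch below. It assumes each atom is available as a small record with hypothetical field names, and it notes a new residue whenever the residue identity changes between consecutive atoms. The layout PROATM actually writes to lrsequ: is not reproduced.

```python
from collections import namedtuple

# Hypothetical record; the field names are illustrative only.
Atom = namedtuple("Atom", "serial name res_name chain res_seq icode")

def survey_sequence(atoms):
    """Return the residue names in the order the residues are met.

    A new residue starts whenever the (chain, res_seq, icode) key differs
    from that of the previous atom, so atoms must be in serial order.
    """
    sequence = []
    previous_key = None
    for atom in atoms:
        key = (atom.chain, atom.res_seq, atom.icode)
        if key != previous_key:
            sequence.append(atom.res_name)
            previous_key = key
    return sequence

atoms = [
    Atom(1, "N", "GLY", "A", 1, " "),
    Atom(2, "CA", "GLY", "A", 1, " "),
    Atom(3, "N", "ALA", "A", 2, " "),
    Atom(4, "CA", "ALA", "A", 2, " "),
    Atom(5, "N", "ALA", "A", 3, " "),
]
sequence = survey_sequence(atoms)
```

Because the sequence is only complete once the last atom has been read, a survey of this kind has to finish before the atom records can follow the sequence record in the output file, which is why a scratch file is needed.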

Quantities stored in lratom:

The quantities actually stored in record lratom: of the output archive bdf are, to a certain extent, under user control. At a minimum the XTAL and PDB forms of the atom name, the atom serial number, x, y, z, the residue sequence number, the remoteness indicator, the branch and sub-branch designators, the alternate location indicator, the chain identifier, the residue name and the scattering factor type will be stored for each atom. All other quantities are optional. The program is set up to store the most important quantities by default. However, a prolod line can be used to specify any of the possible atomic parameter items described by the PDB documentation.

The following list gives the items which may be stored. Those marked with * are stored by default; those marked with ** are stored depending upon the complexity of the thermal parameters specified. The rest will not be stored unless specified in a prolod line. The description gives the form the parameter takes in the bdf. Translation to and from the PDB prescribed form is done automatically.

Mnemonic  Idnum        Description

None        14   *     XTAL form of atom name as two words:
                       Word 1 contains the atom type followed by the residue number (right justified)
                       Word 2 contains the insertion code followed by the remoteness indicator information
X            1   *     x fractional coordinate
Y            2   *     y fractional coordinate
Z            3   *     z fractional coordinate
U            4   **    isotropic thermal parameter U
U11          5   **    U(1,1) individual anisotropic thermal parameter
U22          6   **    U(2,2) individual anisotropic thermal parameter
U33          7   **    U(3,3) individual anisotropic thermal parameter
U12          8   **    U(1,2) individual anisotropic thermal parameter
U13          9   **    U(1,3) individual anisotropic thermal parameter
U23         10   **    U(2,3) individual anisotropic thermal parameter
POP         11         atom population (occupancy) parameter
APP         12         atom anomalous population parameter
SEQ         15   *     character 1: remoteness indicator
                       character 2: branch designator
                       character 3: sub-branch designator
                       character 4: alternate location indicator
RSQ         16   *     characters 1-3: least significant digits of residue sequence number
                       character 4: code for insertion of residues
SET         17         dataset to which the atom belongs
CHN         18   *     characters 1-3: chain identifier
                       character 4: most significant digit of residue sequence number
RES         19   *     4 character residue name
SQN         20   *     atom serial number
SFT         22   *     scattering factor type; pointer to SF table
TFT         23         thermal parameter type
SX         101         standard deviation in x fractional coordinate
SY         102         standard deviation in y fractional coordinate
SZ         103         standard deviation in z fractional coordinate
SPP        111         standard deviation in population parameter
SU         104         standard deviation in U
SAP        112         standard deviation in anomalous population parameter
SU1        105         standard deviation in U(1,1)
SU2        106         standard deviation in U(2,2)
SU3        107         standard deviation in U(3,3)
SU4        108         standard deviation in U(1,2)
SU5        109         standard deviation in U(1,3)
SU6        110         standard deviation in U(2,3)

Because PROATM is biased toward keeping the bdf as small as possible, a prolod line must be used to keep any item not flagged with an asterisk from being purged during a run.
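As an illustration of how the residue sequence number is shared between items 16 (RSQ) and 18 (CHN), the sketch below splits a sequence number of up to four digits into its most significant digit and its three least significant digits. The blank padding and the left justification of the chain identifier are assumptions, and the function names are hypothetical; the list above fixes only which characters hold which piece.

```python
def pack_rsq_chn(chain, res_seq, icode=" "):
    """Return the 4-character RSQ and CHN words for one atom.

    RSQ: characters 1-3 = least significant digits of the residue
         sequence number, character 4 = insertion code.
    CHN: characters 1-3 = chain identifier, character 4 = most
         significant digit of the residue sequence number.
    Padding with blanks is an assumption made for this sketch.
    """
    if not 0 <= res_seq <= 9999:
        raise ValueError("residue sequence number must fit in 4 digits")
    digits = f"{res_seq:>4d}"            # right justified, blank padded
    rsq = digits[1:] + icode[:1].ljust(1)
    chn = chain[:3].ljust(3) + digits[0]
    return rsq, chn

def unpack_res_seq(rsq, chn):
    """Recover the residue sequence number from the two words."""
    return int((chn[3] + rsq[:3]).strip())

rsq, chn = pack_rsq_chn("B", 1234, "A")
```

Here residue 1234 with insertion code A in chain B gives the word `234A` for RSQ and `B  1` for CHN, and `unpack_res_seq` puts the two halves of the number back together.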

Hydrogen atoms

The original definition of data for the PDB did not allow for the inclusion of hydrogen atoms. When H-atoms were added to protein structures it became necessary to have a sub-branch designator since there could be up to three hydrogens attached to a carbon atom. This sub-branch designator has been placed as a number on the part of the scattering factor symbol H in column 7 of an ATOM line. The PROATM program stores this sub-branch designator as the third character of item 15 of the atom record.

Order of Entry of Atoms

The algorithm used in PROATM is based on the serial number as defined in the Protein Data Bank. All atom data must be presented in serial number order. Except for the a priori run, all runs are treated as edits of the binary data file record lratom:. Atoms may be added, replaced, inserted or deleted, but only in serial number order. The serial number must be increasing in the list but need not be continuous. If gaps are not included initially (to provide for future additions) special provision is made to insert atoms subsequently. This does, however, increase the serial numbers of following atoms already present in the bdf.
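The ordering rule can be stated compactly in code. The check below is a sketch with a hypothetical function name; it accepts gaps in the serial numbers but rejects any repeat or decrease within one input stream.

```python
def check_serial_order(serials):
    """Raise ValueError unless serial numbers are strictly increasing.

    Gaps are allowed; repeats and decreases are not.
    """
    previous = None
    for position, serial in enumerate(serials, start=1):
        if previous is not None and serial <= previous:
            raise ValueError(
                f"serial {serial} at position {position} "
                f"does not follow {previous}"
            )
        previous = serial
    return True

ok = check_serial_order([1, 2, 5, 10])   # gaps are fine
```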

Application of PROATM

There are four functions allowed in the loading of atom parameters with this program. They are input, replacement, deletion and insertion of atoms into the bdf. Renumbering of following atoms takes place only with insertion. All operations are done in terms of the PDB serial numbers.

Input of Atoms

With the a priori option specified all atoms are entered in serial number order by input lines. Any atoms present in lratom: of the input binary data file are deleted. With the 'merge' option atoms are entered in serial number order and merged in proper order with the atoms previously loaded into the lratom: record.

Replacement of Atoms

An input atom whose serial number matches that of an atom already in the file causes the values in the file to be replaced by the values in the line input stream.
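The input and replacement rules can be sketched together. The function below is illustrative only: atoms are represented as hypothetical (serial, data) pairs, and the `a_priori` flag stands in for the corresponding program option.

```python
def merge_atoms(existing, incoming, a_priori=False):
    """Combine two serial-ordered lists of (serial, data) pairs.

    a_priori=True discards the existing atoms. Otherwise the lists are
    merged in serial order, and an incoming atom whose serial number
    matches an existing one replaces it.
    """
    if a_priori:
        return list(incoming)
    merged = dict(existing)
    merged.update(dict(incoming))    # equal serial number: replacement
    return sorted(merged.items())

existing = [(1, "N"), (2, "CA"), (4, "C")]
incoming = [(2, "CB"), (3, "O")]
merged = merge_atoms(existing, incoming)
fresh = merge_atoms(existing, incoming, a_priori=True)
```

In the merged list, atom 2 carries the new data and atom 3 is slotted in between atoms 2 and 4. With the a priori flag set, only the two incoming atoms remain.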

Deletion of Atoms

If an atomd line is prepared in the following form:

atomd <first serial number> <last serial number>,

then all atoms from the first serial number through the last serial number will be deleted from the file. If the last serial number is void or smaller than the first serial number, only the atom with the first serial number will be deleted.
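The effect of a single atomd line, including the fallback to a one-atom deletion, can be sketched as follows. The function name and the (serial, data) representation are hypothetical.

```python
def delete_atoms(atoms, first, last=None):
    """Apply one atomd line to a list of (serial, data) pairs.

    If last is missing or smaller than first, only the atom with serial
    number `first` is removed.
    """
    if last is None or last < first:
        last = first
    return [(s, d) for s, d in atoms if not first <= s <= last]

atoms = [(1, "N"), (2, "CA"), (3, "C"), (4, "O"), (7, "CB")]
after_range = delete_atoms(atoms, 2, 4)    # removes serials 2, 3 and 4
after_single = delete_atoms(atoms, 3)      # removes serial 3 only
after_reversed = delete_atoms(atoms, 3, 1) # last < first: serial 3 only
```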

Insertion of Atoms (with subsequent renumbering)

atomi and atome lines containing the serial numbers to control the insertion are entered surrounding a group of new atoms to be loaded. e.g.

atomi < existing serial number of the atom preceding the insert >

............atom input lines in any recognized form

atome < existing serial number of the last atom to be pushed down >

All the inserted atoms will be forced into the file just after the atomi-specified atom. The atoms in the bdf that follow the inserted atoms, down to and including the one specified in the atome line, will have their serial numbers increased by the number of inserted atoms.
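The renumbering that accompanies an insertion can be sketched as below. The function name and the (serial, data) representation are hypothetical, and the sketch assumes that the inserted atoms take the serial numbers immediately following the atomi atom, which the description above implies but does not state. The duplicate check is an addition of this sketch, not a documented PROATM behaviour.

```python
def insert_atoms(atoms, after_serial, new_data, push_down_to):
    """Apply an atomi/atome pair to a serial-ordered list of (serial, data).

    after_serial : serial number from the atomi line
    new_data     : data for the atoms being inserted, in order
    push_down_to : serial number from the atome line
    Assumption: inserted atoms are numbered after_serial+1, +2, ...
    """
    shift = len(new_data)
    result = []
    for serial, data in atoms:
        if after_serial < serial <= push_down_to:
            serial += shift              # atom pushed down by the insert
        result.append((serial, data))
    inserted = [(after_serial + 1 + i, d) for i, d in enumerate(new_data)]
    result = sorted(result + inserted, key=lambda item: item[0])
    serials = [s for s, _ in result]
    if len(set(serials)) != len(serials):
        raise ValueError("renumbering produced duplicate serial numbers")
    return result

atoms = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (10, "f")]
edited = insert_atoms(atoms, 2, ["x", "y"], 5)
```

In this example the two new atoms become serials 3 and 4, the old atoms 3 to 5 become 5 to 7, and atom 10 keeps its number because it lies beyond the atome limit. The gap between 5 and 10 is what makes the shift possible without a collision.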

All the editing features described here may be used in any combination so long as all are presented in the input stream in increasing serial number order.

After the binary data file has been prepared, it may be printed and/or punched in standard Protein Data Bank form. This operation can be carried out just at the end of a loading/editing session or as a separate operation on an existing bdf.

File Assignments

Reads atom data from the input archive bdf

Writes protein atomic parameters to the output archive bdf

Reference

Bernstein et al. 1977. J. Mol. Biol. 112, 535-542.