MEPHAS: Establish Phases of Selected Reflections by Application of a Maximum Entropy Procedure

Authors: Edward Prince, Douglas M Collins & James M Stewart

Contact: James M Stewart, Department of Chemistry, University of Maryland, College Park, MD 20742, USA

MEPHAS calculates reflection moduli from a low-resolution, direct-space, constrained exponential electron density distribution and a selected subset of reflections. It also calculates an updated constrained exponential electron density map and its entropy. Trials are performed with different phases assigned to the chosen reflections according to a fractional factorial design. The trial whose map has the maximum entropy is accepted as the updated constrained exponential electron density distribution, and the phases so determined are assigned to the chosen reflection subset. The input electron density map is based on a smaller number of starting reflections. It serves as the basis for establishing, by the maximum entropy criterion, the phases of the next subset of chosen reflections, so that the number of phased reflections is extended step by step. At each stage the constrained exponential electron density function is held everywhere positive. The magnitudes of F calculated implied by the electron density are refined by a dual function procedure which forces the observed and calculated F magnitudes of the chosen subset of reflections to match.

This program is modeled on the programs MESIGN and MEFFIT written by E. Prince at the National Institute of Standards and Technology, Gaithersburg, Maryland.

Purpose

MEPHAS reads a direct-space exponential electron density map generated by the programs FOURR and MEDENS, or by MEPHAS itself. It extends the number of phased reflections and modifies the map so that the moduli of the observed reflections are expressed in it. The input map may be prepared from only the phases needed to define the origin and enantiomorph of the space group, perhaps supplemented by phases determined from other information or from a previous MEPHAS run.

Method

Two terms, trial and iteration, are used repeatedly in this description and in the program output. A trial refers to the use of one set of phases applied to the subset of reflections to be phased. An iteration refers to one application of Prince's dual function to bring the moduli into conformance.

MEPHAS operates in four distinct modes:

1) The 'MEsign' mode.

Sixteen centric reflections are tested. From 49 to 289 separate trials are carried out until the phases corresponding to the maximum entropy of the constrained exponential electron density have been found. In each trial dual function iterations are applied until a good fit of the 16 reflection moduli is achieved. The first 32 trials consist of assigning 16 patterns of centric phases and their supplements to the chosen subset of reflections to be phased. A 'centric' reflection is one with phase restricted to 0/180, 30/210, 45/225, 60/240, 90/270, 120/300, 135/315, or 150/330 degrees. The difference in entropy between the maps from each pair of trials is used in Yates's algorithm to predict the maximum entropy phase set. A check is made to see whether any of the 32 trials shows a greater entropy than the Yates result. The phases from one or the other source are then tested by taking the supplement of each phase, reflection by reflection in turn. If the entropy rises the new phase is accepted, and the next reflection is tested until all have been checked. This checking is confined to one pass over the 16 reflections or, optionally, is repeated until no changes are detected. That is a maximum of 256 possible trials. Usually the process is complete in 49 trials. The final phases and their F(calc) values are written to the bdf without changing any other reflections previously treated. In addition, the final constrained exponential density as modified in the run is written to an output file.

2) The 'MEosc' mode.

A specified number of centric reflections are oscillated by 180 degrees in phase. At each trial, when the F fitting iterations are completed, if the entropy has risen, the current value of the reflection phase is saved and left unchanged as the trials continue. This is tantamount to running the last part of the MEsign mode.

3) The 'MEcyc' mode.

A small number of general reflections, usually one, may be specified along with a phase 'step' to be applied. These reflections will then be tested against the maximum entropy criterion at each step from 0 to 360 degrees less the step. Once the maximum entropy phase is established for each reflection in turn that value is used while the next reflection is tested.

4) The 'MEfit' mode.

A number of general reflections may be specified for moduli fitting. In this mode the initial phases will be those obtained from the input bdf. Usually a previous run of RFOURR using the most recent constrained exponential electron density map will be required. In this case the map will be refined in one trial of as many iterations as are required to bring the moduli into conformance. This is the many-reflection moduli fitting mode of operation. Note that, in contrast to the MEFFIT approach and except for special cases, the input phases are held fixed in the logarithm of the density while the constrained exponential map is modified to fit the input |F(obs)| exactly. The phases derived from the modified map will then show a departure from the phases of the prior map. The program MEFFIT refines the phases as well as forcing the moduli to conform.
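The phase stepping of the 'MEcyc' mode can be sketched in a few lines of Python. This is only an illustration of the search order, not the MEPHAS code: the function name scan_phase and the callback entropy_for_phase are hypothetical, and the callback stands in for a complete trial (dual function iterations followed by the entropy calculation). The toy entropy used to exercise it is invented.

```python
import math
from typing import Callable


def scan_phase(entropy_for_phase: Callable[[float], float], step_deg: float) -> float:
    """Return the trial phase (degrees) that gives the highest entropy.

    Phases 0, step, 2*step, ... are tried, stopping at 360 degrees less the step.
    """
    if step_deg <= 0:
        raise ValueError("step must be positive")
    n_steps = int(round(360.0 / step_deg))
    phases = [k * step_deg for k in range(n_steps)]
    # Each call to entropy_for_phase represents one full trial.
    return max(phases, key=entropy_for_phase)


def toy_entropy(phase_deg: float) -> float:
    """Invented stand-in for a trial: entropy peaks at 90 degrees."""
    return math.cos(math.radians(phase_deg - 90.0))


best = scan_phase(toy_entropy, 30.0)
print(best)
```

When several reflections are cycled, the same scan would be repeated for each reflection in turn, with the phases already established held at their best values.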

The overall process may be described in terms of the 'MEsign' mode (1). The other three modes are essentially similar, with two differences. In trials of general reflections (3), far fewer reflections may be treated at a time. In moduli fitting (4), many reflections are treated in just one trial. In every case the updated constrained exponential electron density map is written to file mee at the end of the run, and the final phases and calculated moduli of the chosen reflections are placed in the output binary data file.
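The 'MEsign' mode names Yates's algorithm as the step that turns the entropy differences of the paired trials into a predicted phase set. The following Python sketch shows the textbook form of that algorithm for a full two-level factorial with responses in standard order. How MEPHAS maps its 16 phase patterns onto the factors of the design is not described here, so the function name yates and the sample responses are purely illustrative.

```python
from typing import List


def yates(responses: List[float]) -> List[float]:
    """Textbook Yates's algorithm for a 2**k factorial in standard order.

    Returns the grand total followed by the contrasts. The sign of a
    contrast shows which level of that factor raises the response.
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n < 2 or (1 << k) != n:
        raise ValueError("need 2**k responses in standard order")
    column = list(responses)
    for _ in range(k):
        # One pass: pairwise sums first, then pairwise differences.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    return column


# Invented responses for a 2**2 design in the order (1), a, b, ab.
contrasts = yates([1.0, 3.0, 5.0, 11.0])
print(contrasts)  # [20.0, 8.0, 12.0, 4.0]
```

With sixteen responses the same routine makes four passes.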

The following steps give an overview of the procedure in mode 1:

I) The program must have as input a "prior" exponential electron density map which has been prepared using a subset of previously phased reflections. This requires the use of the XTAL links FOURR and MEDENS. The resolution of this map must be sufficient to accommodate the highest order reflections used in the calculation of Prince's dual function which follows.

II) Sixteen reflections which are to be phased in the run must be chosen and specified by means of PHI input lines or by allowing automatic selection based on magnitude of F observed. In the 'MEsign' stages of the phasing process these will be "centric" reflections, that is, ones restricted to two possible values differing by 180 degrees.

The steps which follow are under program control:

III) If "centric" reflections are being determined, the sixteen reflections specified by the user are assigned phases, iteratively, in 32 patterns. For each pattern the entropy of the "prior" electron density as modified by the inclusion of these new reflections is calculated. The 'MEsign' feature sets up a search for the phase (sign) pattern associated with the highest entropy, the "maximum entropy". Once the sign pattern with the highest entropy is identified, the phase of each of the sixteen reflections in turn is switched and the result tested by the maximum entropy criterion. By this technique the testing of the 65536 possible sign combinations is reduced to 49 trials. For every trial there are a number of iterations carried out using Prince's dual function to force the map to reflect the observed moduli of the reflections being tested.
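The reflection-by-reflection switching described in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's code: refine_by_supplement and the entropy callback are hypothetical names, the callback stands in for a complete trial, and the toy entropy simply counts agreement with an invented target pattern.

```python
from typing import Callable, List, Sequence, Tuple


def refine_by_supplement(
    phases: Sequence[float],
    entropy: Callable[[List[float]], float],
    repeat: bool = False,
) -> Tuple[List[float], float, int]:
    """Switch each phase by 180 degrees in turn and keep changes that raise entropy.

    With repeat=False each reflection is checked once; with repeat=True
    passes continue until a whole pass makes no change.
    Returns (phases, best entropy, number of trials run).
    """
    current = list(phases)
    best = entropy(current)
    trials = 1
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            original = current[i]
            current[i] = (original + 180.0) % 360.0
            trial_entropy = entropy(current)
            trials += 1
            if trial_entropy > best:
                best = trial_entropy      # keep the switched phase
                changed = True
            else:
                current[i] = original     # restore the previous phase
        if not repeat:
            break
    return current, best, trials


# Invented test case: entropy is the number of phases matching a target.
target = [0.0, 180.0, 0.0, 180.0]
toy = lambda p: float(sum(a == b for a, b in zip(p, target)))
result, score, n_trials = refine_by_supplement([0.0, 0.0, 0.0, 0.0], toy)
print(result, score, n_trials)
```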

The entropy of each map for each iteration of each trial is calculated by:

S = sum(-rho*ln(rho))/sum(rho) + ln(sum(rho)/(nx*ny*nz))

where the sums are over all the pixels in the unit cell, sampled at nx points along x, ny along y, and nz along z. The values of rho used in the calculation are in electrons per pixel. The entropy, S, is independent of the scale of rho and, provided rho is sufficiently sampled by the pixel grid of nx, ny, and nz points, S is also independent of the number of pixels. This "standard relative entropy", S, is calculated against "the uniform distribution" so that it represents a measure of the "lumpiness" of rho. The maximum entropy corresponds to the smoothest possible all-positive map.
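As a worked illustration of the entropy expression, the following Python sketch evaluates S for a map supplied as a flat sequence of pixel values, reading the last term as ln(sum(rho)/(nx*ny*nz)) so that the number of pixels is the length of the sequence. The function name relative_entropy is hypothetical and the sample maps are invented. The sketch shows the two properties stated in the text: a uniform map gives S = 0, and rescaling rho leaves S unchanged.

```python
import math
from typing import Iterable


def relative_entropy(rho: Iterable[float]) -> float:
    """Entropy of an all-positive map relative to the uniform distribution."""
    values = list(rho)
    n_pixels = len(values)                      # nx * ny * nz
    if n_pixels == 0 or any(v <= 0.0 for v in values):
        raise ValueError("map must be non-empty and everywhere positive")
    total = sum(values)
    weighted = sum(-v * math.log(v) for v in values)
    return weighted / total + math.log(total / n_pixels)


uniform = relative_entropy([2.0] * 8)           # smoothest possible map
lumpy = relative_entropy([1.0, 3.0])            # any departure lowers S
scaled = relative_entropy([10.0, 30.0])         # same map, ten times the scale
print(uniform, lumpy, scaled)
```

On this definition S is zero for the uniform map and negative for any other positive map, so maximizing S favors the smoothest map consistent with the fitted moduli.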

IV) At any stage in the search, the subset of sixteen selected reflections produces 304 reflections which will be used in the Hessian matrix to compute corrections to the electron density. These corrections will force the magnitudes of F observed and F calculated closer to one another. The "reverse" Fourier transform of an output map from MEPHAS will give F calculated values with the derived phases, and the magnitudes of F calculated will conform to the observed F.

The application of Prince's dual function in a system of equations, which leads to a modification of the electron density map, forces the magnitudes of the F values closer together while maintaining the smoothness and positivity of the map. The number of reflections (304) in the expanded list arises from the combination of the indices of the sixteen chosen reflections:

h'(+) = h(i) + h(j) ; h'(-) = h(i) - h(j)

with i ranging from 1 to 16 and j ranging from i to 17, where in the 17th case h'(+) = h(i) and h'(-) = h(i). In each phase search cycle the structure factors of all 304 reflections are needed to set up a Hessian lower triangular positive definite matrix and a gradient based on the delta F values. This system of equations is solved by Cholesky factorization to give Fourier coefficients for the sixteen reflections being phased. These coefficients are then used in a 'forward' Fourier transform to produce a map, on a logarithmic scale, whose values are used to adjust the ln(rho) of the 'prior' constrained exponential map. The corrections obtained in this way change the value of rho at each pixel so that each of the 16 |F calc| moduli moves closer to the corresponding |F obs|. At the same time the map remains everywhere positive and as smooth as possible. After each trial the prior map is left untouched, to be adjusted anew in the succeeding trials.
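The count of 304 can be checked by generating the index combinations directly. The Python sketch below follows the rule given in the text, with the extra case (the 17th when sixteen reflections are chosen) contributing h(i) twice. The function name expand_indices and the sample indices are hypothetical, and the list is deliberately not de-duplicated, since the text counts entries rather than unique reflections.

```python
from typing import List, Tuple

Index = Tuple[int, int, int]


def expand_indices(chosen: List[Index]) -> List[Index]:
    """Build the expanded list h(i)+h(j), h(i)-h(j) used to set up the Hessian."""
    n = len(chosen)
    expanded: List[Index] = []
    for i in range(n):
        hi = chosen[i]
        for j in range(i, n + 1):
            if j == n:
                # Extra case: both entries are h(i) itself.
                expanded.append(hi)
                expanded.append(hi)
            else:
                hj = chosen[j]
                expanded.append((hi[0] + hj[0], hi[1] + hj[1], hi[2] + hj[2]))
                expanded.append((hi[0] - hj[0], hi[1] - hj[1], hi[2] - hj[2]))
    return expanded


# Sixteen invented reflections: 152 (i, j) pairs, two entries each.
sixteen = [(h, 0, 1) for h in range(1, 17)]
print(len(expand_indices(sixteen)))  # 304
```

Note that the case j = i yields the doubled index 2*h(i), which is why the map grid must accommodate indices twice as large as those of the chosen reflections.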

The trials are carried out until the phase combination is found which, in conjunction with the reflection moduli, produces a maximum entropy constrained exponential electron density (c.e.e.d.). The phases determined are placed in the output binary data file on aa2 without changing the phase values of any reflections other than those established in the run. The updated c.e.e.d. map corresponding to the new result is written to file mee in the same format as the 'prior' map on file med. By copying file mee to file med it may then be used as a new 'prior' c.e.e.d. input map, and other reflections may be phased under any of the four modes in which the program may be run.

Automatic Selection of Reflections

If no phi lines are supplied the reflections to be phased will be taken, in order, from the input binary data file. The reflections selected are the largest unphased reflections encountered for which delta(F)>DF. In a 'MEsign' run the next 16 centric reflections will be treated. In a 'MEcyc' run, where general reflections are being tested, or a 'MEfit' run, the number of phases specified in the MEPHAS line will be used. For any of the phase cycling modes delta(F) is ABS(Fobs - Fc), where the value of Fc is that found in the input bdf. If Fc in the input is void, i.e. never determined, Fc is taken to be zero, hence delta(F) is just F observed. For the 'MEfit' mode delta(F) is taken as Fc*(Fo-Fc) so that all large Fc reflections will be left out. The use of a MAXHKL input line restricts the choices to reflections with magnitudes of h, k, and l less than the values given. It is important to note that the generation of the Hessian matrix involves the combination of all the reflection indices with themselves. Thus if the maximum h is 6 a reflection with h = 12 will be generated. For this reflection to have an F calc from the reverse Fourier transform, the Fourier map grid along that axis must have at least 2n+1 points, where n is the largest generated index. In this example the x grid must be 25 or finer. To save computation time and to assure the most meaningful values of entropy it is best to choose grids appropriate to the maximum h, k, and l being fitted, i.e. 4*maxh+1 or as close thereto as FOURR and RFOURR allow.
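The grid arithmetic in this paragraph can be written out as a short Python sketch. The function name min_grid_points is hypothetical, and the sketch does not round to the grid values that FOURR and RFOURR actually accept, since those are not listed here.

```python
def min_grid_points(max_index: int) -> int:
    """Smallest grid along one axis for a given maximum fitted index.

    Combining indices with themselves doubles the largest index, and the
    map needs 2n+1 points to carry a generated index n, giving 4*max+1.
    """
    largest_generated = 2 * max_index
    return 2 * largest_generated + 1


print(min_grid_points(6))  # 25, as in the example in the text
```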

File Assignments

Reads space group, cell, and reflection information from input binary data file

Reads a 'prior' constrained electron density map from med

Updates the newly phased reflections on the output binary data file

Example

: Test of maximum entropy routines MEDENS and MEPHAS

: REMARK HYDROLASE 06-APR-81 1BP2*

REMARK PHOSPHOLIPASE A=2= (E.C.3.1.1.4) (PHOSPHATIDE

REMARK 2 ACYL-HYDROLASE)

REMARK BOVINE (BOS TAURUS L.) PANCREAS

REMARK B.W.DIJKSTRA,K.H.KALK,W.G.J.HOL,J.DRENTH

TITLE BOVINE (BOS TAURUS L.) PHOSPHOLIPASE A=2= B.W.DIJKSTRA, et. al.

compid BPHOLP

STARTX

cell 47.070 64.450 38.150 90. 90. 90.

cellsd 0.02 0.02 0.02 0. 0. 0.

sgname P 2AC 2AB : int tab P212121

celcon H 5350

celcon C 2408

celcon N 652

celcon O 1188

celcon S 56

celcon Ca 4

exper 1 1.54178

ADDREF ffac frie

limits *4 .2941176

hklgen HKL FREL SIGF FFRI SFFR

proatm

scale1 .021245 0.000000 0.000000 0.00000

scale2 0.000000 .015516 0.000000 0.00000

scale3 0.000000 0.000000 .026212 0.00000

: PDB atom coordinates omitted

end

SORTRF order KHL sort 3 FREL pakfrl aver

end

FC iso archiv 1800 1801 1802 1803 1804 1700

end

PHONYD fri err 0.00001

end

reset psta 3

MESTAR

phi 1 0 1 90

phi 0 1 1 270

phi 0 2 1 180

phi 1 1 0 90

end

FOURR fobs ffsum nw

grid 60 80 48

layout down across layer 60 80 48 0 0 0 1 1 1

end

MEDENS

end

MEPHAS nr 4 mf si 0.005

: MEfit test

phi 1 0 1 90

phi 0 1 1 270

phi 0 2 1 180

phi 1 1 0 90

end : There must be an end line

copybdf mee med : This will overwrite the initial "prior"

MEPHAS CR 350 CV 0.5 SI 0.005 ms

: MEsign test

phi 1 0 1 90 FIX

phi 0 1 1 270 FIX

phi 1 0 2 0

phi 1 5 0 270

phi 0 2 0 180

phi 0 0 2 0

phi 2 2 0 180

phi 0 2 1 180 FIX

phi 1 1 0 90 FIX

phi 0 3 1 270

phi 2 1 0 0

phi 2 0 1 90

phi 1 4 0 90

phi 3 4 0 270

phi 2 0 3 270

phi 0 9 9 270

phi 0 2 3 0

phi 4 2 0 0

phi 2 0 8 180

phi 2 0 0 180

end : There must be an end line

copybdf mee med

MEPHAS CR 350 CV 0.5 SI 0.005 ms

: MEsign test program chooses refl

end : There must be an end line

MEPHAS CR 350 si 0.005 mc nr 10

: MEcyc test

phi 1 1 1

end : There must be an end line

copybdf mee med

PHACMP ext PHD shel 16

phaso1 ds 1 fc

phaso2 ds 1 fc

end

FINISH

Reference

E. Prince, L. Sjolin, and R. Alenljung (1988) Phase Extension by Combined Entropy Maximization and Solvent Flattening. Acta Cryst. A44, 216-222