Binary File Processing

The processing of binary files is one of the most important aspects of Xtal programming. Here we summarise the various procedures and macros used to read and write these files. Knowledge of the archive file structure is assumed (see Section 5). In particular, familiarity with the concepts of a logical record, a logical record directory and a logical record packet is needed to understand the procedures discussed here.

As with line input/output, the developer is provided with a set of tools for handling binary files. The standard Fortran input/output instructions are inadequate for this purpose, both for the reasons cited for line I/O and because of the enormous flexibility in the structure of the archive file. This flexibility necessitates considerable checking and bookkeeping on the part of the nucleus routines.

The checking operations have been largely concealed from the programmer through the use of the following macro tools:

writepkt: provides the position for the next packet to be inserted into the designated output bdf buffer, and writes the buffer when it is full.
readwpkt: provides the position of the next packet to be extracted from the designated input bdf buffer, and reads a new buffer when the current one has been exhausted.
indexpkt: reads, and if requested constructs, the directory packet of a logical record.
copyfile: copies designated logical records from one bdf to another.

The nucleus permits up to eight different binary files to be assigned to a calculation at one time; typically between two and four are used. The device number for each binary file is stored in the system array elements IOUNIT(1) to IOUNIT(8). Within a program, files are referenced by their index in IOUNIT, i.e. bdf 1 to bdf 8. Externally, these bdf's are usually referred to by their filename extensions.

The line files discussed above are character files and, as such, require the use of the character buffers CHRIN and CHROT. Bdf's, by contrast, are output in 'binary', i.e. exactly as they appear in computer memory. The buffers used for these files are allocated by the programmer as part of the QX array: each bdf used in a program must be assigned a buffer of real words in this array. This is quite simple to do. For example, the buffer area for bdf 2 is designated in the QX array with the following lines:

IOMARK(2)=MARKS#                 mark start of bdf 2 buffer
incrqx:(MARKE,MARKS+bdfbuf:,XX0502)# mark end of bdf 2 buffer

The first line specifies the start of the buffer space in the QX array by setting the system variable IOMARK(2) to the current QX limit MARKS. The next line uses the macro incrqx: to set MARKE equal to MARKS plus the length of the buffer in words (defined by the macro bdfbuf:) and to request that the QX limit be extended to this value. The bdf reading and writing macros will automatically use this allocated buffer space.
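The bookkeeping performed by these two lines can be sketched in Python. This is a hypothetical model, not nucleus code: the class Workspace and the method allocate_bdf_buffer are invented for illustration, BDFBUF = 512 is an assumed stand-in for the installation-dependent value of bdfbuf:, and the error code XX0502 passed to incrqx: (and any check against the available memory) is not modelled.

```python
BDFBUF = 512   # assumed buffer length in words (stands in for bdfbuf:)
NBDF = 8       # the nucleus allows at most eight bdf's


class Workspace:
    """Hypothetical model of bdf buffer bookkeeping in the QX array."""

    def __init__(self, marks=0):
        self.marks = marks                  # current QX limit (MARKS)
        self.iomark = [None] * (NBDF + 1)   # IOMARK(1..8); element 0 unused

    def allocate_bdf_buffer(self, nfil, length=BDFBUF):
        """Reserve `length` words for bdf `nfil` and return (IOMARK, MARKE)."""
        if not 1 <= nfil <= NBDF:
            raise ValueError("bdf number must be 1 to 8")
        self.iomark[nfil] = self.marks      # IOMARK(nfil)=MARKS
        marke = self.marks + length         # MARKE=MARKS+bdfbuf:
        self.marks = marke                  # extend the QX limit, as incrqx: requests
        return self.iomark[nfil], marke
```

For example, with the QX limit at 1000, `Workspace(marks=1000).allocate_bdf_buffer(2)` returns `(1000, 1512)`. Under the QX(PAKPT+1) convention used by the packet macros, the buffer would then occupy QX(1001) to QX(1512); that reading is an assumption of the sketch.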

Here is a brief summary of how these macros are applied. For detailed information refer to the later macro definitions.

writepkt:(NFIL, LREC, PSIZ, PAKPT)

writepkt: is responsible for 'putting' the binary file buffer to the physical device. It is equivalent to writeline: for the line files, except that each application of writepkt: does not necessarily result in a buffer being output. The difference arises because information is stored in a bdf buffer as 'packets'. Each call to writepkt: provides the position in the buffer where the next packet starts. If there are insufficient words left for another packet, writepkt: outputs the buffer and positions the pointer at the front of the buffer. writepkt: performs all of the functions needed to construct a logical record (bdf 'lead words' are described in Section 6), but the programmer is responsible for transferring the packet information into the buffer.

The arguments of writepkt: are as follows. NFIL is the bdf file number (1 to 8). LREC is the logical record number in the form of a system macro (see the XMACRO file or Section 6). PSIZ is the number of words to output as a packet. PAKPT is the QX array index returned by writepkt: which points to the first word of the packet minus one. That is, the first word of a packet is QX(PAKPT+1).
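The pointer arithmetic described above (return PAKPT so that the packet is QX(PAKPT+1) to QX(PAKPT+PSIZ), and output the buffer when the next packet will not fit) can be modelled as follows. This is a simplified sketch: PacketWriter is a hypothetical name, the "device" is a Python list, only the occupied part of the buffer is output, and lead words and logical record numbers are ignored.

```python
class PacketWriter:
    """Hypothetical model of writepkt:'s pointer arithmetic (not the real macro)."""

    def __init__(self, qx, iomark, bufsize, device):
        self.qx = qx              # QX work array; qx[i] plays the role of QX(i)
        self.iomark = iomark      # buffer start: first word is qx[iomark + 1]
        self.bufsize = bufsize    # buffer length in words
        self.device = device      # list standing in for the physical file
        self.used = 0             # words already occupied by packets

    def flush(self):
        """Output the occupied part of the buffer and reset the pointer."""
        start = self.iomark + 1
        self.device.append(self.qx[start:start + self.used])
        self.used = 0

    def next_packet(self, psiz):
        """Return PAKPT so that the packet occupies qx[PAKPT+1 .. PAKPT+psiz]."""
        if psiz > self.bufsize:
            raise ValueError("packet larger than buffer")
        if self.used + psiz > self.bufsize:   # insufficient words left
            self.flush()
        pakpt = self.iomark + self.used
        self.used += psiz
        return pakpt
```

With a 10-word buffer, two 4-word packets fit; a request for a third forces the buffer out and returns a pointer to the front of the buffer again. As in the nucleus, the caller fills `qx[pakpt + 1]` onwards itself.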

writepkt: can create a file on its own, but it is usually used in conjunction with copyfile: and readwpkt:. Logical record 1 is special in that the nucleus automatically updates it with the calculation history, and it is usually handled by these latter two macros (see example below). For all other records, writepkt: may be used either separately or in conjunction with readwpkt: and copyfile:.

readwpkt:(NFIL, LREC, PSIZ, PAKPT, MFIL)

readwpkt: is used to read files, or to read and write them. It is the 'read' equivalent of writepkt:, where PAKPT points to the start of the packet to be read. readwpkt: inputs from bdf NFIL into a buffer starting at IOMARK(NFIL). Provided the input packets do not need to be expanded or contracted, readwpkt: may also be used to output this buffer to file MFIL. This is referred to as the read-write single-buffer mode. When the packets being output differ in size from those being read, it is necessary to use both readwpkt: and writepkt: with separate buffers (double-buffer mode). Note that in single-buffer mode the values of words in a packet can be changed in place, and no transfer of data between buffers is needed, as it is in double-buffer mode.
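The single-buffer mode can be pictured with the following hypothetical model. SingleBufferReader is an invented name; input and output bdf's are Python lists of buffers, each buffer is assumed to hold whole packets, and `outfile=None` plays the role of MFIL=0. A returned pointer of 0 signals that no packets remain, mirroring the `IF(IP<=0) BREAK` test in the example below; the model therefore requires IOMARK >= 1.

```python
class SingleBufferReader:
    """Hypothetical model of readwpkt: in read-write single-buffer mode."""

    def __init__(self, qx, iomark, infile, outfile=None):
        if iomark < 1:
            raise ValueError("model assumes IOMARK >= 1 so IP <= 0 can mean 'end'")
        self.qx = qx                  # QX work array; buffer starts at qx[iomark + 1]
        self.iomark = iomark
        self.infile = iter(infile)    # input bdf: a sequence of buffers (word lists)
        self.outfile = outfile        # output bdf list; None plays the role of MFIL=0
        self.nwords = 0               # words in the current buffer
        self.pos = 0                  # words already handed out

    def _next_buffer(self):
        """Write the current buffer to the output file (if any), then read the next."""
        start = self.iomark + 1
        if self.outfile is not None and self.nwords:
            self.outfile.append(self.qx[start:start + self.nwords])
        self.nwords = self.pos = 0
        words = next(self.infile, None)
        if words is None:
            return False
        self.qx[start:start + len(words)] = words   # overwrite buffer in place
        self.nwords = len(words)
        return True

    def next_packet(self, psiz):
        """Return PAKPT for the next psiz-word packet, or 0 after the last packet."""
        if self.pos + psiz > self.nwords and not self._next_buffer():
            return 0
        pakpt = self.iomark + self.pos
        self.pos += psiz
        return pakpt
```

Because the same QX words are both read and written, a change made through `qx[pakpt + 1]` appears in the output buffer with no copying; this is the saving the single-buffer mode offers over the double-buffer mode.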

copyfile:(NFIL, MFIL, LREC1, LREC2)

copyfile: copies all data of logical records LREC1 to LREC2 from the buffer of binary file NFIL to the buffer of binary file MFIL. Note that if LREC2 is endrecord:, both bdf's NFIL and MFIL will be closed. If NFIL=1 and MFIL=2, the device numbers stored in IOUNIT(1) and IOUNIT(2) will be interchanged. This 'automatic interchange' is a standard feature of the nucleus, and ensures that the most recent data will always be read from bdf 1 (i.e. fileA). Note also that reading logical record endrecord: from bdf 1 with readwpkt: and, in double-buffer mode, writing endrecord: to bdf 2 with writepkt: will also result in the interchange of these device numbers.
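The effect of the automatic interchange can be illustrated with a small hypothetical model. The function copy_records, the value ENDRECORD = 999 and the representation of a file as a list of (record number, data) pairs are all assumptions made for the sketch; we also assume that the interchange happens when the copy runs to endrecord: and therefore closes both files. Closing itself is not modelled.

```python
ENDRECORD = 999   # assumed stand-in for the value of the endrecord: macro


def copy_records(iounit, devices, nfil, mfil, lrec1, lrec2):
    """Hypothetical model of copyfile:(NFIL, MFIL, LREC1, LREC2).

    iounit  -- list where iounit[k] is the device number of bdf k (k = 1..8)
    devices -- dict mapping a device number to its list of (lrec, data) records
    """
    source = devices[iounit[nfil]]
    target = devices[iounit[mfil]]
    target.extend([rec for rec in source if lrec1 <= rec[0] <= lrec2])
    if lrec2 == ENDRECORD and (nfil, mfil) == (1, 2):
        # Both files are closed at endrecord:; interchanging the units means
        # the newest data will be read from bdf 1 (fileA) next time.
        iounit[1], iounit[2] = iounit[2], iounit[1]
```

After a full copy from bdf 1 (device 21) to bdf 2 (device 22), `iounit[1]` holds 22, so a later program reading "bdf 1" sees the file that was just written. A partial copy leaves the units alone.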

indexpkt:(NFIL, LREC, PSIZ, PAKPT, MFIL, KEY, WANT, RELPT)

When processing directory-format (rather than fixed-format) logical records the programmer needs to set up a procedure for extracting (or installing) information about the contents of a given record. The directory packet is the first packet in the logical record, and contains a unique identification number for each item stored in subsequent packets. The order of the ID numbers in the directory packet is identical to the order of the item values in all subsequent packets in the logical record.

indexpkt: is used to extract item identification numbers from, and to install them into, the directory packet of a logical record. The first five arguments of this command are identical in function to those of readwpkt:. WANT is an integer array of item identification numbers (see Section 6) that are to be searched for in, appended to, updated in or deleted from the directory of the designated logical record LREC. The purpose of the ID numbers in WANT is specified by control signals in the four-element array KEY. The results of the directory search by indexpkt: are returned in the integer array RELPT, which contains the relative position in the packet of each ID number listed in WANT. There is usually a one-for-one correspondence between WANT and RELPT. Note that indexpkt: requires the dimension of the RELPT array to be greater than that of the WANT array.

Warning: If the WANT array contains the same ID number repeated consecutively, as used for example to point to all of the words of a large character string data item (e.g. an atom label), the RELPT array will be returned containing pointers to consecutive words. That is, if WANT contains the IDN's 11 11 11 11, RELPT will be returned with 5 6 7 8 if the first IDN 11 is located in word 5. This is correct, but a problem can occur if, by chance, a non-character IDN is loaded into WANT twice consecutively. indexpkt: will then return pointers to consecutive words, which may not be what is needed. This is an unusual circumstance, but some care is required on the part of the programmer.
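The search step of indexpkt:, including the consecutive-repeat behaviour described in the warning, can be sketched as follows. index_items is a hypothetical helper; it assumes that relative positions are counted from 1 (so the item word is QX(PAKPT+pos), as in the example below), that 0 means "not found", and it ignores the KEY controls for appending, updating and deleting IDs.

```python
def index_items(directory, want):
    """Hypothetical model of indexpkt:'s search step.

    directory -- item ID numbers in the order they appear in every packet
    want      -- ID numbers to locate
    Returns 1-based relative positions; 0 means the ID is not in the directory.
    """
    relpt = []
    for i, idn in enumerate(want):
        if i > 0 and idn == want[i - 1] and relpt[-1] > 0:
            # Consecutive repeat: point at the next word (multi-word item).
            relpt.append(relpt[-1] + 1)
        elif idn in directory:
            relpt.append(directory.index(idn) + 1)
        else:
            relpt.append(0)
    return relpt
```

With a directory whose fifth entry is ID 11, `index_items(directory, [11, 11, 11, 11])` gives `[5, 6, 7, 8]`, as in the warning. The same rule is what causes the pitfall: asking for a single-word ID twice in a row, e.g. `[8, 8]`, yields two different words rather than the same word twice.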

Complete details of this powerful command are given in the macro definitions. It will suffice here to show the purpose of indexpkt: in the overall binary file manipulation process. The RELPT packet pointers returned by indexpkt: are used during subsequent file reading and writing processes to extract, modify and insert data into the buffer. Note that it is only necessary to locate items which are actually 'used' in the calculation - all other items in the packet are transferred as a block. Below is an example which shows the general principles. Refer to the program FC for more complete examples.

Example of binary file processing

Here is an illustrative example of processing the logical record lrtest:. Note that the dimension of the REL array is one greater than that of the WNT array.

INTEGER KEY(4)#				Indexpkt controls
INTEGER REL(6)#				Indexpkt rel pointers
intdata:(WNT,[11,17,18,8,23])#		Indexpkt list of ID numbers
..............
KEY(1)=1#				Item 1 is mandatory
KEY(2)=3#				Items 2-3 are optional
KEY(3)=5#				Items 4-5 are to be appended
KEY(4)=0#				Set append-item control
indexpkt:(1,lrtest:,NP,IP,2,KEY,WNT,REL)# Process directory
IF(KEY(4)<=0) iquit:(90103.)#	If item 1 not found-- exit
MP=KEY(4)#				Set expanded packet size
..............
REPEAT#					Loop over lrtest: packets
$(#					>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3
readwpkt:(1,lrtest:,NP,IP,0)#		Point to input lrtest packet
IF(IP<=0) BREAK#			Exit after last packet
writepkt:(2,lrtest:,MP,JP)#		Point to expanded o/p packet
movereal:(QX,IP,QX,JP,NP,0)#		Transfer input items to o/p
.............
I=JP+REL(1); QX(I)=QX(I)+1.#		Increment mandatory item
.............
I=JP+REL(3); IF(I>JP)AMX=QX(I)#	   Extract optional item
.............
I=JP+REL(4); QX(I)=QX(MM+2)#		Store appended item
$)#					<<<<<<<<<<<<<<<<<<<<<<<<<<<< 3
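For readers more at home outside Ratmac, the double-buffer loop above can be paraphrased in Python. expand_packets and all of its parameters are hypothetical; packets are plain lists, positions are 0-based rather than the REL(i) offsets from JP, only one optional item is extracted, and the appended values come from a caller-supplied function in place of QX(MM+2). It is meant to show the shape of the read, expand, modify and write cycle, not the behaviour of indexpkt:'s KEY controls.

```python
def expand_packets(packets, directory, mandatory, optional, appended, new_values):
    """Hypothetical Python analogue of the lrtest: loop.

    packets    -- input packets (lists of words), all NP words long
    directory  -- item IDs of the input packets, in packet order
    mandatory  -- ID that must be present (incremented by 1.0 in each packet)
    optional   -- ID that may be absent (its values are collected if present)
    appended   -- IDs added to the end of each packet
    new_values -- function(packet_number) -> values for the appended items
    """
    if mandatory not in directory:
        raise LookupError("mandatory item not found")    # cf. iquit:(90103.)
    rel_mand = directory.index(mandatory)                # 0-based position
    rel_opt = directory.index(optional) if optional in directory else None
    out, extracted = [], []
    for n, pkt in enumerate(packets):
        new = list(pkt) + [0.0] * len(appended)   # expanded packet, MP words
        new[rel_mand] += 1.0                      # increment mandatory item
        if rel_opt is not None:
            extracted.append(new[rel_opt])        # extract optional item
        new[len(pkt):] = new_values(n)            # store appended items
        out.append(new)
    return directory + list(appended), out, extracted
```

For two 3-word packets with directory `[11, 17, 18]`, mandatory item 11, optional item 18 and appended items `[8, 23]`, each output packet is 5 words long, and the new directory is `[11, 17, 18, 8, 23]`.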