The Program Status Block

The program status block is 512 bits long.

The format of the first 224 bits of the program status block, which control high-level components of the program in current execution, such as whether it is running in supervisor mode, is shown below:

The first bit is a 1 if the computer is in diagnostic mode, where all normal restrictions on the computer are lifted, and special model-dependent operations are possible. If clear, it can only be set by an interrupt, or a program instruction that is equivalent to an interrupt (that is, an instruction such as a supervisor call instruction).

The next three bits of the program status block are the ring; this can be increased, but not decreased, by instructions running in any mode of operation, and decreased only by interrupts, including software interrupts. Note that this means that all interrupt service routines, in addition to running in supervisor mode, must also be in ring 0. However, this in no way prevents such a routine from spawning a less privileged process to do the bulk of its work, and then returning from the interrupt after the less privileged process completes.
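The rule governing the ring field can be restated as a short sketch. This is an illustration only: the function name is hypothetical, and leaving the ring unchanged on an illegal request (rather than, say, trapping) is an assumption, as the text does not say what happens in that case.

```python
RING_BITS = 3
MAX_RING = (1 << RING_BITS) - 1  # three bits give rings 0 through 7


def next_ring(current, requested, via_interrupt):
    """Return the ring in effect after an attempted change.

    Ordinary instructions may only raise the ring number; only an
    interrupt (hardware or software) may lower it.
    Assumption: an illegal request leaves the ring unchanged.
    """
    if not 0 <= requested <= MAX_RING:
        raise ValueError("ring must fit in three bits")
    if requested >= current or via_interrupt:
        return requested
    return current
```

For example, a process in ring 2 may move itself to ring 5, but once there it cannot return to ring 2 except by way of an interrupt.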

The fifth bit of the program status block is a 1 if the computer is in supervisor mode, in which the computer can perform input-output operations, and alter that bit, and the following 107 bits, of the program status block, which have such functions as controlling the access of processes to resources. Although no instruction in user mode can directly change to supervisor mode, interrupts, including software interrupts that can be initiated by a user-mode instruction, will shift the computer to supervisor mode; as these also cause the computer to start executing code whose location is specified, and that code and the interrupt vector will both be in memory not changeable by user programs, this is not in itself dangerous, and is in fact the normal method by which user mode programs call the supervisor.
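The positions of the three fields described so far can be illustrated by a small decoding sketch. It treats the program status block as a 512-bit integer and assumes that the "first bit" is the most significant one; that convention, and the function names, are assumptions made for the example.

```python
PSB_BITS = 512


def psb_field(psb, first_bit, width):
    """Extract `width` bits starting at `first_bit`.

    Assumption: bit 0 is the leftmost (most significant) bit of the
    512-bit program status block.
    """
    shift = PSB_BITS - first_bit - width
    return (psb >> shift) & ((1 << width) - 1)


def decode_privilege(psb):
    """Decode the first five bits of the program status block."""
    return {
        "diagnostic": bool(psb_field(psb, 0, 1)),  # first bit
        "ring": psb_field(psb, 1, 3),              # next three bits
        "supervisor": bool(psb_field(psb, 4, 1)),  # fifth bit
    }
```

A block whose first five bits are 0, 101, 1 would thus decode as not in diagnostic mode, ring 5, supervisor mode.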

However, the normal limitations of machine security can be bypassed in one way: if the interrupt vector table is subject to alteration by any process, that process may place entries in that table having any level of privilege, however high. Normally, some supervisor mode processes will have write access to the interrupt vector table. This means that diagnostic mode processes are no longer immune to tampering or creation by less-privileged supervisor mode processes. In this way, supervisor mode, which, unlike diagnostic mode, is not model-dependent, is used for obtaining total control of the hardware, and is the highest mode that needs to be entered during normal operation for all normal system functions, including initial program load, even when the operating system provides access to diagnostic mode functionality. Thus, supervisor mode is the topmost mode in terms of general privilege from a security standpoint, but diagnostic mode processes still have the higher level of privilege they require to function undisturbed while they are running.

The bit marked Virtual indicates that the machine is executing code which executes as if a secondary program status block is the program status block. Instruction types and data formats may differ in the two blocks, and in that case, those in the secondary block are followed for the program running on a virtual machine; where a limitation of privilege exists, the actual program status block takes precedence, and violations will result in an interrupt returning control to the parent process, which will, within its own privilege level, simulate the more privileged operation for the child process running on a virtual machine.

The Multi-Level Virtual bit indicates that the parent process of a given process is itself in a virtual machine; when this bit is on, a third copy of the Program Status Block is also utilized in determining how instructions are to be executed.

This feature will be discussed further in a later section.

A bit in the Program Status Block is used to turn off all memory mapping so that the computer can handle swapping pages between random-access memory and an external storage device used to supply swap space, and another bit is used to turn off only the first level of memory mapping, so that addresses directly refer to global virtual memory, as is useful to many supervisory processes.

The Interrupt Mask is either zero, to indicate that the computer is not processing a hardware interrupt, and that all hardware interrupts are permitted, or it contains a number from 1 to 31, indicating that the interrupt service routine for that hardware interrupt is being executed, and only lower-numbered hardware interrupts may be processed.
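The test implied by the Interrupt Mask can be sketched as follows. The function name is hypothetical; the logic follows the description above, in which a mask of zero admits every hardware interrupt and a nonzero mask admits only lower-numbered ones.

```python
def hardware_interrupt_permitted(mask, irq):
    """Decide whether hardware interrupt `irq` may be processed.

    mask: current Interrupt Mask field, 0 to 31.
    irq:  number of the hardware interrupt requesting service, 1 to 31.
    """
    if not 0 <= mask <= 31:
        raise ValueError("mask must be in the range 0 to 31")
    if not 1 <= irq <= 31:
        raise ValueError("irq must be in the range 1 to 31")
    # Zero: no service routine is running, so everything is permitted.
    # Otherwise only interrupts numbered below the one in service pass.
    return mask == 0 or irq < mask
```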

The Register Map ID

Note that there is also an eight-bit register map ID in the supervisor portion of the program status block. This needs to be in the program status block, and not in additional model-dependent status bits, as it needs to be restored during some forms of context switching: this ID is used by the processor to determine which mapping of internal register space to registers used by the programmer is in effect.

The Register Map ID field of the program status block is used to identify, for implementations which make use of register renaming, which scheme of allocating the registers visible to the programmer to actual register storage space is in effect.

It might seem that having a register map ID field makes it unnecessary to also have bits in the program status block indicating which registers a process is allowed to use. However, register renaming is an optional implementation-dependent feature of the architecture. Even where it is present, these bits remain fully relevant: they can temporarily deny access to some registers that have been mapped to a process in order to speed context switching, and it is intended that a parent process using the same register set as a child process might set up some registers and then make them read-only for the child process.

Note also that it is entirely possible for register renaming to have two levels; one consisting of explicit allocation of register space to running processes by a supervisor program, and one performed by the processor itself to speed context switching, or allow more registers to be allocated than are physically realized. Thus, a processor could have only enough registers for two threads using the full complement of registers, and while in one thread gradually swap the registers of the previous thread out to an internal cache memory.

Since it is possible for some processes to be allocated significantly more register space than others, thus allowing more threads to be simultaneously executed in this way, it would seem reasonable to also allow the supervisor to specify that a particular job might use a segment of cache memory as its main memory, and treat the main memory as a paging device. With multiple levels of cache, the further flexibility of specifying which level of cache is used in this way is also possible; a big program with a high priority might receive a segment of level two external cache for use as its main memory, while a very small program might even receive internal cache. Of course, the latter might well effectively happen anyway as the result of normal cache operation: but appropriate circuitry could speed memory references when this is done explicitly by eliminating the test for whether a data item is in a particular cache or in the next more external element of the memory hierarchy.

In fact, even the capacity of cache could be increased by doing this; the bits indicating the external address corresponding to the current cache contents could instead be used for data when a cache is acting as main memory. This would work best, of course, if exactly half of the capacity of the cache were normally dedicated to each purpose, but this is a common proportion.

In addition to simplifying context-switching during interrupts, and when returning from interrupts, at least if only one process is allowed to use a particular group of registers, the registers denied can be ignored in register renaming for implementations that use a larger internal register bank to allow multiple threads to run concurrently without the need for context-switching between them. This allows more threads to run if those threads use fewer registers.

If register renaming is possible, that means that it may also be worthwhile to make it possible to give a favored process more than its share of registers; two bits in the status word indicate that four, eight, or sixteen extra bit matrix multiply register banks, each having the same size as the entire set of short vector registers, have been allocated to a process.

Register renaming would normally be associated with implementations that support multithreading. A time-shared computer will normally use a clock interrupt, occurring perhaps sixty times a second, to switch between different computer programs that are running at the same time, and which haven't already relinquished control of the CPU because of having made an I/O request of the operating system (or to switch to one whose I/O request has been met, and which may return to executing). Since a computer can execute thousands of instructions in a sixtieth of a second, the overhead associated with context switching at this frequency is not high.

A multithreaded computer, on the other hand, has more than one program counter. On alternate memory cycles, it fetches instructions belonging to different running programs. This can be useful in maximizing throughput if the different programs use different functional units within the computer; thus, one could have one program that is making extensive use of the Execute Extended Translate instruction, another program that is performing floating-point long vector operations, and a third program written in scratchpad mode and using only the conventional arithmetic-logic unit.

The operating system would normally set up user programs that work in separate areas of memory to work this way, and each process could even have its own dedicated segment of the cache. This is particularly important if the programs have been individually optimized to avoid cache collisions.

Note that the program status block and the program counter for a process are not located in the register bank.

The Process ID

In addition to a process having a Register Map ID, which is used to reference a description of how its working registers are allocated to internal register space, and a Memory Map ID, to reference a description of how its address space is allocated to physical memory, a process also has a Process ID, which is a unique number assigned to each running process.

The Parent Process ID is used to allow the owner of a process running in a virtual machine to be identified.

The Source Process ID is used to allow a process that has been interrupted to be identified by the interrupt service routine which responded to that interruption; this permits it to access the Program Status Block of the process, as well as other portions of the status of the process. Note that this means that when a process is inactive due to being interrupted, it still consumes one of the 255 possible Process ID values. No process can have zero as its process ID, as this is used in the Source Process ID field to indicate to an interrupt service routine that it was started as a new process.

To prevent processes having their Process ID value changed unexpectedly, and to conserve the 255 available spots for active concurrent threads, an interrupt service routine may retire its source process to inactivity after storing its status, taking over that process's Process ID as its own so as to reserve that Process ID for the resumption of that process, thereby surrendering the interrupt service routine's initial Process ID for reuse.

Interrupt handling will be discussed more fully in a later section.

Register Availability Control

Because of the size of the long vector registers, and the even larger size of the long vector scratchpad, there is a provision in the program status word to deny their use to a given process. Even the short vector registers consume a scarce resource, and so it is possible for the supervisor program to prevent user processes from using them at all, thus speeding up context switching in interrupts. It is also possible to make those registers read-only to a process; this can be useful if they are set to a constant value for a specific problem.

As well, the 64 supplementary accumulator/index registers together with the 64 supplementary floating-point registers, which are only used in vector register mode, and the 24 registers in three groups which constitute the scratchpad registers, the scratchpad pointer registers, and the array scratchpad base registers, only used in the scratchpad modes and (as 24 additional base registers) in vector register mode, and even the floating-point registers, can be denied to a process.

Also, the availability of the dedicated cache area used for MIMD parallel computing with RISC-style instructions using the long vector ALUs can be restricted, as it is envisaged that the segment of the processor with this capability will be designed in a simple manner that renders it available to only one thread at a time. Two bits in the program status block also control trapping when either all the cache-parallel processes conclude, or when a single cache-parallel process concludes; this allows the parent process to avoid having to repeatedly test the status of these processes, while still being able to keep them busy. A trap, as it is an interrupt, is under the control of code running in supervisory mode; thus, suitable operating system calls to allow the use of this facility by user programs are required.

The sixteen-bit Translate Program Mask and the sixteen-bit Translate Program Read Mask allow the operating system to dedicate the resources used for the different forms of extended translate microprograms to different processes.

If the first bit of the Translate Program Mask is a one, the current process is allowed to compose and store a code 0 microprogram; if the thirteenth bit of this area is a one, the current program is allowed to compose and store a code 12 microprogram.

The Translate Program Read Mask determines if a process can execute the types of extended translate microprograms corresponding to the bits in that mask. Thus, if the first bit of this area is a one, the current process is allowed to use a code 0 microprogram; if the thirteenth bit of this area is a one, the current program is allowed to use a code 12 microprogram. This allows processes to use extended translate microprograms that are standardized, and which are maintained and controlled by other, more privileged, processes.
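The correspondence between mask bits and microprogram codes can be sketched as below. The function names are hypothetical, and the sketch assumes that the "first bit" of each sixteen-bit mask is its most significant bit, so that code 0 maps to the leftmost bit and code 12 to the thirteenth bit from the left.

```python
def mask_bit(mask, code):
    """Test the bit for microprogram `code` (0 to 15) in a 16-bit mask.

    Assumption: the first bit of the mask is the most significant bit.
    """
    if not 0 <= code <= 15:
        raise ValueError("code must be in the range 0 to 15")
    return bool((mask >> (15 - code)) & 1)


def may_store_microprogram(translate_program_mask, code):
    """True if the process may compose and store a microprogram of this code."""
    return mask_bit(translate_program_mask, code)


def may_use_microprogram(translate_program_read_mask, code):
    """True if the process may execute a microprogram of this code."""
    return mask_bit(translate_program_read_mask, code)
```

A process given a Translate Program Mask of zero but a nonzero Translate Program Read Mask could therefore run the standardized microprograms it has been granted without being able to replace them.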

Program Status Block Format

Four bits in the Program Status Block are available to permit the use of program status block formats other than the one shown above, which is indicated by filling these bits with 0000.

Other values for these bits may indicate a different significance for any or all of the rest of the Program Status Block. This may be considered analogous to the way in which one bit of the Program Status Word, used in the IBM System/360 to indicate the use of a particular 8-bit code based on ASCII, was redefined in the IBM System/370 to indicate extended control mode.

One possible use for these bits would be to select a mode in which a secondary set of data formats is not used, and operand filtering is not available, so that the 48 bits of the Program Status Block used for these purposes may be used for something else, such as allowing the types used in six-bit opcode translation to be individually specified.

Another possibility would be to switch the computer into a completely different mode of operation, with entirely new features, while retaining the ability to operate in the original mode for purposes of compatibility.

Status Modification Control

Each bit in the Program Status Mask corresponds to four bits in the last 272 bits (except for the last 32 bits) of the Program Status Block; if that bit is a one, the corresponding four bits in the Program Status Block cannot be changed by user programs.
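The arithmetic behind this correspondence can be worked through in a short sketch. The last 272 bits of a 512-bit block begin at bit 240, and leaving out the last 32 bits ends the protected region at bit 479, giving 240 bits, or 60 groups of four. The count of 60 mask bits is derived here from those figures rather than stated in the text, and the ordering (mask bit 0 covering the first group) is an assumption.

```python
PSB_BITS = 512
REGION_START = PSB_BITS - 272  # bit 240: start of the last 272 bits
REGION_END = PSB_BITS - 32     # bit 480 (exclusive): last 32 bits excluded
MASK_BITS = (REGION_END - REGION_START) // 4  # 240 bits / 4 = 60


def locked_bits(mask_index):
    """Return the PSB bit numbers protected by mask bit `mask_index`.

    Assumption: mask bit 0 covers the first four bits of the region,
    mask bit 1 the next four, and so on.
    """
    if not 0 <= mask_index < MASK_BITS:
        raise ValueError("mask index out of range")
    start = REGION_START + 4 * mask_index
    return range(start, start + 4)
```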

When the bit marked Lock Memory Map is set, user programs cannot make any changes to their Memory Map ID; note that they can never do so directly by altering the Program Status Block, but user programs in Ring 0, if this bit is not set, may alter the Memory Map ID by means of special instructions.

When the bit marked Lock Machine Type is set, user programs can only alter the last sixty-four bits of the program status block, rather than the last two hundred and seventy-two.

Instruction Interpretation

Now we have come to the next section of the Program Status Block, comprising sixty-four bits from bit 224 through bit 287.

These bits are primarily concerned with how instructions are decoded and interpreted, and their format is shown below:

Explicit Indication of Parallelism

The bit labelled Explicit Indication of Parallelism causes instructions to be treated as belonging to blocks which normally contain sixteen 16-bit units; the first 16 bits of such a block, or the first 32 bits of such a block, instead of containing instructions, contains additional bits, to be associated with each 16 bit instruction or portion thereof, giving information about which instructions can be executed in parallel with, or in immediate pipeline succession with, the instructions which preceded them in order. This option is more fully described in a later section.

Fast Decode Assistance

An alternative to explicit indication of parallelism is to use the first 16 bits of a block to indicate which 16-bit units begin a new instruction.

If both the bits for explicit indication of parallelism and fast decode assistance are set, instead of simply indicating both parallelism and where instructions begin, a more efficient mode, called Advanced Indication of Parallelism, is enabled. In that mode, the bits indicating dependencies are in a different format, and bits indicating where instructions begin are optional on a block-by-block basis. This will also be described in the section devoted to the indication of parallelism.

The Postfix Supplementary Bit

There are nine other bits in the Program Status Block which also cause instructions to be treated as belonging to blocks of sixteen 16-bit units. These bits associate a 16-bit unit with each such block. This 16-bit unit is the last unit of a block rather than the first unit. If the bit marked Individual Postfix Supplementary Bit Fields is set, then each such mode selected leads to the use of an additional 16-bit area at the end of each block; otherwise, a single 16-bit area triggers all selected modes, modifying the way in which some of the modes are indicated as noted below. Also available is a bit marked Dual Postfix Supplementary Bit Fields; if this bit is set, then there are two 16-bit areas at the end of the block; a 1 bit in the first one triggers the modes selected normally, and a 1 bit in the second one triggers the modes selected by corresponding bits in the area labelled Secondary Postfix Supplementary Bit Field Interpretation.

If both of these bits are set, the use of a postfix supplementary bit is disabled. Instead, it is the format of the prefix supplementary bits which are indicated by setting the bit marked explicit indication of parallelism that is modified; this is done in a manner embodying the concepts outlined in the paper by Heidi Pan on Heads-and-Tails instruction encoding, the details of the particular implementation being described in the section on explicit indication of parallelism.

The bit marked Advanced Indication of Parallelism, if set, modifies explicit indication of parallelism in a different way to achieve some of the same goals as Heads-and-Tails instruction encoding as well as some additional goals, at the cost of some restrictions on the instruction stream. This, too, is described in the same later section.

In most cases, these bits are subject to the following conditions:

The last bit of the unit is not used, and must be zero.
All other bits correspond to the 16-bit units within the block, one bit to each unit.
Those bits which do not correspond to the first unit of an instruction are not used, and should also be zero.

These conditions apply to the simplest case, where only one bit of supplementary information is appended to each instruction by this mode, but exceptions to these rules will be noted below.
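The three conditions for the simplest case can be expressed as a small validity check. This is a sketch under stated assumptions: the function name is hypothetical, the leftmost bit of the postfix unit is taken to correspond to unit 0 of the block, and no prefix unit is assumed to be present.

```python
UNITS_PER_BLOCK = 16  # fifteen instruction units plus the postfix unit


def postfix_is_valid(postfix, instruction_starts):
    """Check a 16-bit postfix unit against the three conditions.

    postfix: the 16-bit value of the last unit in the block.
    instruction_starts: set of unit indices (0 to 14) at which an
        instruction begins.
    Assumption: the leftmost bit corresponds to unit 0.
    """
    if postfix & 1:
        return False  # the last bit is unused and must be zero
    for unit in range(UNITS_PER_BLOCK - 1):
        bit = (postfix >> (15 - unit)) & 1
        if bit and unit not in instruction_starts:
            return False  # only the first unit of an instruction may be flagged
    return True
```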

Another bit in the Program Status Block can be used to specify that the long vector instructions are to act upon numbers in the secondary format. This can be used to allow a program to continue to perform vector operations in the format usually used by the computer, while the numeric format for scalar operations is temporarily changed.

This also makes possible strict upwards compatibility with a subset implementation of the architecture in which the memory width and the numeric format for long vector instructions could not be changed, to simplify the construction of the arithmetic-logic units used for them, so that a full 64 such units, as required for maximum speed, could be provided more simply. Such a subset implementation might also exclude the cache-internal MIMD parallelism feature, and in such a case those ALUs would likely be only used for vector operations, and not to provide 65-way superscalar operation.

Of course, this is only needed if it is desired to retain all or part of the ability to change numeric formats and memory width for scalar operations, since a subset could offer only one numeric format in all cases.

For greater flexibility in the use of multiple numeric formats, without the overhead of Dual-Format Mode (to be described below) or of otherwise explicitly indicating in the instruction stream when the secondary formats are in use, Base-Controlled Secondary Formats may be indicated. In this case, the least significant bits of the base registers are not added to addresses; instead, when the least significant bit of a base register is set, memory-reference instructions which use that base register will use the secondary formats.

This mode, to be useful, requires that the secondary formats be compatible with the primary formats, which would still apply to all register-to-register instructions. One important application of this mode is to allow Fast Long Single and Fast Intermediate modes, which affect how floating-point numbers are stored in memory, to be alternated with other modes which provide single-precision and intermediate-precision floating-point numbers of the same width, but arranged in a different fashion in memory.
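How Base-Controlled Secondary Formats might affect address formation can be sketched as follows. The function name is hypothetical, and the sketch assumes that exactly one low-order bit of the base register serves as the flag and is cleared before the displacement is added.

```python
def resolve_operand(base, displacement):
    """Return (effective_address, use_secondary_format).

    Assumption: only the least significant bit of the base register is
    withheld from the address and used as the format selector.
    """
    use_secondary = bool(base & 1)
    address = (base & ~1) + displacement
    return address, use_secondary
```

Two base registers pointing at the same area, one with the low bit set and one without, would then let the same instruction read data in either set of formats.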

The first available use for the postfix supplementary bits, Condition Code Update Suppression, is to indicate whether or not the condition code bits are set by an instruction. A 1 bit indicates that updating the condition code bits is suppressed, as opposed to the default behavior. This applies not only to the four standard condition code bits at the end of the Program Status Block (or their alternate versions for Biconditional Operation) but also to the extra status bits for standard format floating-point operation.

This allows the computer to adopt a feature provided by some RISC architectures, such as the ARM and the SPARC, so that a branch instruction can easily refer to the state of the condition codes as set some instructions previously. In this way, it need not be delayed, nor would the computer need to resort to speculative execution, in pipelined implementations where an instruction might take several cycles to execute while new instructions are started, perhaps as rapidly as one per cycle, before earlier ones have completed.

The second purpose for these bits, Alternate Condition Code Copy, addresses the same issue; here, a 1 bit indicates that the primary four bits of the condition code only are saved to the four bits of the Program Status Block which precede them. This feature works with the additional branch instructions found in Normal Mode, and acts like the C bit found only in register-to-register instructions in that mode. Note that in the case of biconditional operation, a second set of alternate condition codes is present for the alternate sequence.

The third possible use for the postfix supplementary bits is Branch Hint/Cache Hint mode, in which the 16 bits at the end of each 256-bit block of instructions are used to provide branch hints and cache hints.

When an instruction is a conditional branch instruction, the bit corresponding to the first halfword of the instruction indicates, if it is a 0, that the branch is most likely not to be performed, and, if it is a 1, it indicates that the branch is more likely to be taken than not.

This is used to determine which case of the branch should be given priority for placement in the instruction pipeline. (Whether both sides of the branch, or either side of the branch, can be so placed, and even whether or not instruction execution is pipelined, is, of course, model-dependent.)

When an instruction is a memory-reference instruction, and an address field corresponding to each of its memory operands that occupies at least one halfword is present, the bit corresponding to the first halfword of the address field for an operand indicates, if it is 0, that the operand is likely to be found in the cache, and, if it is 1, that the operand is expected to be more likely to be in main memory.

A correct hint may speed accesses to memory operands on a few implementations of this architecture. The main use of this hint that is likelier to be applicable on a given implementation, however, deals with what happens to a flagged operand after it is fetched. If an operand is flagged, this indicates that it is used just once, and if the same instruction is executed again, indexing will cause it to refer to an unrelated and distant location in memory. This means that a flagged operand need not be retained in the cache. On some implementations, data must always be pulled into the cache before it can be processed. In that case, given a cache which operates on the basis of a least-recently-used algorithm, a cache line which contains only flagged operands will be aged at an accelerated rate.

This rate is indicated by the Cache Hint Ratio field in the Program Status Block as follows:

If, in addition, the bit corresponding to the first halfword of a memory-reference instruction is a 1, this indicates that flagged operands belong to a class with an intermediate likelihood of being in the cache, and in that case, the rate of accelerated aging for them is indicated by the Low Cache Hint Ratio field in the Program Status Block, which is coded in this manner:

Note, however, that this differential rate only applies when the process to which the data belongs is operating. When any one process is executing, all data belonging to other processes may age at an accelerated or decelerated rate determined by the priority assigned to the process. This is set by placing three bits in the Inactive Cache Ratio field of the Program Status Block while that process is active, which are then saved in association with the process when it is inactive. These bits are interpreted as follows:

000 1/16x
001  1/4x
010    1x
011    4x
100   16x 
101   64x
110  256x
111 do not reserve cache once inactive
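The table above follows a regular pattern, with each step multiplying the rate by four, so it can be decoded arithmetically. The sketch below is illustrative; the function name, and the use of None for the 111 case, are choices made for the example.

```python
from fractions import Fraction


def inactive_cache_ratio(code):
    """Decode the 3-bit Inactive Cache Ratio field.

    Returns the aging-rate multiplier as a Fraction, or None for 111,
    meaning that no cache is reserved once the process is inactive.
    """
    if not 0 <= code <= 7:
        raise ValueError("field is three bits wide")
    if code == 7:
        return None
    # Codes 000 through 110 give 1/16, 1/4, 1, 4, 16, 64 and 256.
    return Fraction(4) ** (code - 2)
```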

Since the amount of the cache used by a given process depends on how long it was active, in order to keep some cache reserved for a very high-priority process, a specified portion of cache has to be specifically allocated to the process: not aging the cache the process was using at the time the switch to another process takes place could easily leave the computer without enough available cache to function effectively.

Also, other factors influence the rate at which information is swapped out of the cache. When data memory width control is used, if there is data stored in the cache at a width no longer in use for either the primary or secondary set of data formats, it will normally be swapped out before any other data, but the TSETPW and TSETSW instructions are available to override this when the data memory width control is changed for only a brief interval.

The fourth possible use for the 16-bit field at the end of each 256-bit block of instructions is Extended Translate Mode. In this mode, a one bit which corresponds to the first 16 bits of an instruction has the effect of placing an implied 173703 mode-independent instruction before the instruction. This causes the integer types to be replaced by the register packed types, and the floating types to be replaced by the simple floating types. This mode is useful if it is necessary to use all the available data types on a more or less equal footing for a program.

The fifth possible use for the 16-bit field at the end of each 256-bit block of instructions is Dual-Format Mode. Here, a one bit corresponding to the first 16 bits of an instruction places an implied INUAF mode-independent instruction, with the opcode 171754, in front of the instruction. Note that there is also a bit for Suspended Dual-Format Mode operation present; this is required for this mode, and for the use of the INUAF instruction when this mode is not present, as it may specify that space in the Program Status Block is to be used to describe a secondary format, depending on the current Program Status Block format in use. Also, when Suspended Dual-Format mode is in effect, a bit in the Program Status Block indicates that the operands of long vector instructions are to be in the secondary format without the need of an INUAF instruction to indicate this.

The sixth possible use for the 16-bit field at the end of each 256-bit block is Bimodal Operation. Here, a bit corresponding to the first halfword of an instruction, if it is a 1, indicates that the instruction is to be interpreted according to the mode of operation indicated in the 16-bit Alternate Mode section of the Program Status Block, consisting of bits 368 to 383.

It may be noted that one particular mode of operation, Supplementary Mode, was designed specifically for use with this feature as a secondary mode supplying additional instructions other modes lack.

The seventh possible use for the 16-bit field at the end of each 256-bit block is to indicate an alternate set of base registers. If this is chosen, then if the bit corresponding to the first halfword of an instruction is a 1, in each case where a base register is specified to have its contents added to a displacement given in the instruction to form a memory address, the corresponding array scratchpad register is used instead. This mode is particularly useful in combination with bisequential operation, or with bimodal operation, and in that latter case, particularly when one of the two modes is a short page mode and the other mode is not.

The eighth possible use for the 16-bit field at the end of each 256-bit block is Direct Cache Indication. If the bit corresponding to one of the 16-bit address fields of an instruction is a 1, then the 3-bit base register field in the instruction associated with that address is combined with the 16-bit address field to form a 19-bit address within the same high-performance memory as is used with direct cache mode. This allows more comprehensive instruction sets than that of direct cache mode, such as that of vector register mode, to be used in conjunction with an explicit 512 kilobyte high-performance memory. Note that if the Program Status Block bit labelled 32/28 bit displacements is set, direct cache indication cannot operate in the manner described; in this case, if the bit corresponding to the first halfword of an instruction is a 1, all the addresses in the instruction refer to high-performance memory, and none of the address fields in the instruction are lengthened by 16 bits as that mode would otherwise indicate.
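The formation of the 19-bit address in the ordinary case can be sketched as below. The text says only that the two fields are combined; placing the base register field in the three high-order bits is an assumption, as is the function name.

```python
def direct_cache_address(base_field, address_field):
    """Combine a 3-bit base register field and a 16-bit address field.

    Assumption: the base register field supplies the high-order bits.
    """
    if not 0 <= base_field < (1 << 3):
        raise ValueError("base register field is three bits wide")
    if not 0 <= address_field < (1 << 16):
        raise ValueError("address field is sixteen bits wide")
    return (base_field << 16) | address_field
```

Nineteen bits address 524,288 locations, which agrees with the 512 kilobyte size given for the high-performance memory.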

The ninth possible use for the 16-bit field at the end of each 256-bit block is Bisequential Operation.

For this mode, the supplementary bits are interpreted as follows:

When a bit which corresponds to the first unit in an instruction is a 0, the following instruction is obtained from the next unit in sequence after the end of the instruction in normal fashion, unless the instruction involves a transfer of control (such as a jump instruction).

When a bit which corresponds to the first unit in an instruction is a 1, the computer switches from the program counter currently in use to the other of two available program counters. Thus, this mode implies the existence of an auxiliary program counter. When instructions are being fetched by the use of the main program counter, a 1 bit means the auxiliary program counter will be used to fetch the next instruction; when instructions are being fetched by the use of the auxiliary program counter, the main program counter will be used to fetch the next instruction.
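The switching rule for the two program counters can be modelled in a few lines. What follows is only a sketch under simplifying assumptions: every instruction occupies a single unit, jump instructions are left out, and the names run_bisequential, memory, main_pc, and aux_pc are invented for the illustration and are not part of the architecture.

```python
def run_bisequential(memory, main_pc, aux_pc, steps):
    """Trace instruction fetches under bisequential operation.

    memory maps a unit address to (mnemonic, flag), where flag is the
    supplementary bit for that instruction. Hypothetical model: one
    unit per instruction, no jumps.
    """
    pcs = {"main": main_pc, "aux": aux_pc}
    active = "main"
    trace = []
    for _ in range(steps):
        mnemonic, flag = memory[pcs[active]]
        trace.append((active, mnemonic))
        pcs[active] += 1          # advance only the counter that was used
        if flag:                  # a 1 bit: fetch next via the other counter
            active = "aux" if active == "main" else "main"
    return trace


memory = {
    0: ("A0", 0), 1: ("A1", 1), 2: ("A2", 0), 3: ("A3", 1),
    100: ("B0", 1), 101: ("B1", 0), 102: ("B2", 1),
}
trace = run_bisequential(memory, main_pc=0, aux_pc=100, steps=6)
```

In this trace the flagged instruction A1 hands control to the auxiliary sequence, whose flagged instruction B0 hands it back; each counter resumes where it left off.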

This also applies to jump instructions; when a jump instruction is so flagged, it will cause the other program counter to be loaded with its destination, and then execution continues using the other program counter. Thus, a jump instruction so flagged is used to initialize the auxiliary program counter when bisequential operation is started.

Also, if the first 16 bits of a conditional branch instruction, which contain the opcode, are not flagged, but the 16-bit address field (or the first 16 bits of a 32-bit address) of the conditional branch instruction is flagged, then the switch between program counters will happen only if the branch is taken. This is not affected, and remains fully usable, if the addresses are, due to the addressing mode in use, shortened by four bits to make room for an indirect addressing bit and a base register field.

As well as there being a second program counter for the second sequence of operation, sizable areas of the program status block are duplicated. Sign, carry, and overflow bits are shared, and so are all the bits implying privilege levels and access to registers. But each sequence of operation may use a different addressing mode, and each sequence can also use different formats for floating-point numbers and integers. When bisequential operation is enabled, these items are copied over from the main process into the alternate process. When it is disabled, the other process ceases to exist; if it is disabled from within the process using the auxiliary program counter, the contents of the auxiliary program counter are copied into the main program counter, and the relevant portions of the program status block are also loaded from the duplicate bits. Also, note that disabling bisequential operation only takes effect when the first instruction is fetched that begins in the next block of sixteen 16-bit units.

One case in which this unusual operational feature may be useful is if a program makes frequent calls to a subroutine that operates like a "state machine", executing one of several different short sequences of instructions depending on which step in its own process it is following. It is possible that this feature could be useful for optimizing merge sort routines.

This feature is based on the bisequential operation mode offered by the Honeywell 800 computer and related computers, but it has one important difference. In that computer, the first bit of an instruction selected either the sequence counter or the cosequence counter explicitly; here, the extra bits select either the program counter in current use, or switching to the other program counter. This means that in this architecture, unlike the Honeywell 800, symmetry between the use of the main and auxiliary program counter allows the same object code to be used with either program counter.

Since the supplemental word for this mode is placed at the opposite end of a block of instructions, it is possible for both explicit indication of parallelism and bisequential operation to be selected at the same time. However, bisequential operation is not available with variant alignment modes.

A bit labelled Suspended Bisequential Operation must be set when bisequential operation is selected to indicate that the auxiliary program counter is in use. As long as that bit remains set, even if the value of the Postfix Supplementary Bit Usage field is changed, the value of the auxiliary program counter will be retained.

This is useful if a procedure that is bisequential in nature has a long stretch of code not involving switches to the other sequence, since then a 16-bit unit that would contain no useful information is restored to being available for program code. The SPC, SPCAN, and SPCIB instructions can still be used in suspended bisequential mode, and thus it is perfectly possible to enter suspended bisequential mode directly, and use that mode exclusively for a computation requiring bisequential operation, if less frequent changes of program counter are more efficiently represented by occasional instructions than by associating a bit with each instruction.

Since bisequential operation leads to two threads of execution which operate to some extent independently, it would at least appear to be useful to allow conditional branch instructions in one thread to function with respect to operations taking place in that thread alone. This is indicated by setting the bit labelled Biconditional Operation, and if that bit is set, the Aux Condition Bits in the Program Status Block are used as the condition code bits whenever the Aux Program Counter is being used as the program counter during bisequential operation.

This facility, of course, is not strictly necessary, since in bisequential operation, transfer of control to the other sequence is strictly voluntary on the part of each sequence, and so such transfers do not need to take place between an instruction that sets a condition code bit and the conditional branch instruction that makes use of it.

This facility is, however, particularly useful if the bit indicating Alternating Bisequential Mode is set. In that mode, instead of requiring either explicit SPC instructions or an additional bit associated with each instruction, once a value has been defined for the auxiliary program counter, until that bit is cleared, instructions are fetched using the two program counters in alternation. Bisequential Operation takes precedence over Alternating Bisequential Mode when it is indicated.

In addition to the standard condition code bits, which reflect the result of the last arithmetic operation performed, the latching bits associated with IEEE-754 compliance also exist in a second copy for use with biconditional operation.

When the bit marked Individual Postfix Supplementary Bit Fields is not set, as noted above, but more than one of the bits corresponding to a possible use of the postfix supplementary bits is set, a supplementary bit indicates the simultaneous presence of two special attributes for the instruction to whose first halfword it corresponds. Of course, not all combinations will be useful.

If the bit for bisequential operation is set simultaneously with the bits for other usages of the supplementary bits, except for the cache hint and branch hint usage, and the direct cache indication usage (when the bit marked 32/28 bit displacements is not set), then the supplementary bits are only used in the normal manner for bisequential operation, and the other special functions indicated apply to instructions when they are fetched by means of the auxiliary program counter, not when the bit corresponding to those instructions is set.

When the array indexing mask is used in this fashion, whether or not any other usages of the supplementary bits are present, the use of supplementary bits for indicating cache hints and branch hints is modified as follows: the branch hint is only present in conditional instructions with an address field occupying at least one halfword, and is in the bit corresponding to the first halfword of the address field, not the first halfword of the instruction, and a cache hint bit always invokes the Cache Hint Ratio field of the Program Status Block, and never the Low Cache Hint Ratio field, so that the bit corresponding to the first halfword of the instruction will not be used in connection with either cache hints or branch hints, so that these remain independent of any other instruction attributes that may be indicated by use of the supplementary bits.

Address Space Size

If the bit marked 64-bit Address Space is set, then the base registers, the scratchpad registers, pointer scratchpad registers, and array scratchpad registers would be actually 64 bits long, although only their least significant 32 bits would be used when this bit is not set.

In addition, the two bits labelled Extended Addressing control an alternative way to have a virtual address longer than 32 bits without going to 64-bit addressing.

If they are not zero, a virtual address that is 36, 40, or 44 bits wide is used, and the contents of the base register used to reference memory in any instruction are shifted left four, eight, or twelve bits before the displacement is added to them to form the effective address.
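As a worked illustration of this address calculation, the following sketch forms an effective address from a base register and a displacement. It assumes that the field values 1, 2, and 3 select shifts of four, eight, and twelve bits in that order, and that the result wraps to the width of the virtual address; neither point is stated above, and the function name is hypothetical.

```python
def effective_address(base, displacement, extended_addressing):
    """Form an effective address under Extended Addressing.

    extended_addressing is the two-bit field; the mapping 1, 2, 3 ->
    shifts of 4, 8, 12 bits is an assumption about the encoding order.
    """
    if not 0 <= extended_addressing <= 3:
        raise ValueError("Extended Addressing is a two-bit field")
    shift = 4 * extended_addressing           # 0, 4, 8 or 12 bits
    width = 32 + shift                        # 32, 36, 40 or 44 bits
    address = (base << shift) + displacement
    return address & ((1 << width) - 1)       # wrap to the address width (assumed)
```

With a field value of 1, for example, a base register holding 0x12345678 and a displacement of 0x10 give the 36-bit address 0x123456790.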

Note that if base-controlled secondary format is selected, the least significant bit of the base register continues to be excluded from address calculation, and thus the size of the pages into which memory is effectively divided by this mode will be doubled.

Instruction Interpretation Control

The bit marked Guarded Execution Mode will cause three of the eight scratchpad registers to be put to a special purpose. Scratchpad registers 2 and 3 will indicate the lower and upper bounds respectively of the area of memory to which this mode applies, and scratchpad register 0 will point to another area of memory, each bit of which corresponds to a 256-bit block of memory in the area delimited by the contents of scratchpad registers 2 and 3.

In that area of memory, instructions can only be fetched for execution if the corresponding bit is a 1. An attempt to do so otherwise will cause an interrupt. That bit will be cleared if data is stored anywhere in the 256-bit block of memory to which it refers.
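The behaviour of the execute bits can be sketched as a small model. The class below is hypothetical and makes several assumptions not stated above: addresses are byte addresses, so that one 256-bit block is 32 bytes; the upper bound is inclusive; and the bit area is held as a simple list, since the order of bits within it is not specified here.

```python
BLOCK_BYTES = 32  # one 256-bit block, assuming byte addresses


class GuardedRegion:
    """Hypothetical model of the area covered by Guarded Execution Mode."""

    def __init__(self, lower, upper):
        self.lower = lower            # role of scratchpad register 2
        self.upper = upper            # role of scratchpad register 3
        blocks = (upper - lower) // BLOCK_BYTES + 1
        self.bitmap = [0] * blocks    # area pointed to by scratchpad register 0

    def _index(self, address):
        return (address - self.lower) // BLOCK_BYTES

    def covers(self, address):
        return self.lower <= address <= self.upper

    def mark_executable(self, address):
        self.bitmap[self._index(address)] = 1

    def fetch(self, address):
        # Outside the guarded area the mode imposes no restriction.
        if self.covers(address) and not self.bitmap[self._index(address)]:
            raise PermissionError("guarded execution interrupt")
        return True

    def store(self, address):
        # Storing data anywhere in a block clears its execute bit.
        if self.covers(address):
            self.bitmap[self._index(address)] = 0
```

A block marked executable may be fetched from until something is stored into it; after that, a fetch from the same block raises the error that stands in for the interrupt.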

The bit marked Bounded Index Mode causes seven of the pointer scratchpad registers, and seven of the array scratchpad registers, to be put to a special purpose. For any memory-reference instruction which uses one of the arithmetic/index registers as an index register, the corresponding pointer scratchpad register indicates the lower bound, and the corresponding array scratchpad register indicates the upper bound, of the memory to which it is allowed to refer. Reference to any other part of memory will cause an interrupt.

The purpose of these two modes, which may be used in conjunction with each other, is to provide an additional layer of protection against buffer overflows. Although they can normally be turned off in user mode, they are still useful, because their purpose is to prevent untrusted code from being executed at all, not to restrict its execution. The Program Status Mask, of course, can be used to lock either or both of these modes in place.

The guarded execution mode is similar to, but more limited than, a feature which has recently been offered on 64-bit microprocessors from both AMD and Intel. A single bit corresponds to a 256-bit block of memory, rather than to a 16-bit halfword, so that the mode can operate with explicit indication of parallelism, or bisequential operation, to be described next. Note that while guarded execution can be used for a program that calls external subroutines, as long as they make no use of guarded execution, bounded index mode will normally have to be turned on only between subroutine calls. Also, of course, these modes are only useful when an addressing mode is selected where the various scratchpad pointer registers are not used for their normal purpose of permitting shorter instruction formats.

The entries in the lowest-level process page tables in the address translation unit also contain flag bits which, in addition to indicating whether a process has read or write access to the portion of memory to which that entry refers, indicate whether or not the processor can fetch instructions for the purpose of execution from that part of memory. This feature, which is always on, provides a functionality similar to that of guarded execution mode, and is much more similar to the feature which the existing processors referred to above offer. What is the purpose of having both features?

Guarded Execution Mode has two benefits: a process can itself specify which parts of memory are executable, without having to call upon a supervisory process, and, because it does not make use of the Process Page Table in the address translation unit, fragmentation of that page table is avoided where a large number of isolated areas of memory have the executability of their contents controlled individually. It also has a drawback: since the execute flag is in main memory instead of in the address translation unit, it impacts performance.

In addition to Bounded Index Mode, there is also Bounded Array Mode. In this mode, instead of a register holding the maximum allowed value in an index register, following each address field in an instruction that supplies the displacement for an indexed address, a field is present having the same length as the index register used, giving the maximum value it may have for that instruction; thus, every address referencing an array is followed by the size of that array. This allows simple generation of code with enforced bounds checking for arrays; note that this does not enforce bounds within scratchpad areas of memory, or for stack operations. However, the stacks and scratchpads used with some addressing modes are inherently limited in size.

Numeric Formats

Bits 288 through 447 of the Program Status Block control the format in which data is interpreted. As it is possible to have the computer simultaneously working with data in two different formats, bits 288 through 351 and 416 through 431 define the normal data formats used by the machine, and bits 352 through 415 and 432 through 447 define the secondary data formats which may also be in use.
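To make the bit ranges concrete, the following sketch extracts the primary and secondary data-format areas from a Program Status Block held as a single integer. It assumes that bit 0 is the first, most significant, bit of the 512-bit block; the function names are invented for the illustration.

```python
PSB_BITS = 512


def psb_field(psb, first, last):
    """Return bits first..last (inclusive) of a 512-bit status block.

    psb is held as a Python int; bit 0 is taken to be the first
    (most significant) bit, which is an assumption about numbering.
    """
    width = last - first + 1
    shift = PSB_BITS - 1 - last
    return (psb >> shift) & ((1 << width) - 1)


def data_format_fields(psb):
    """Split out the primary and secondary data-format areas."""
    return {
        "primary": (psb_field(psb, 288, 351), psb_field(psb, 416, 431)),
        "secondary": (psb_field(psb, 352, 415), psb_field(psb, 432, 447)),
    }
```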

The format of this portion of the Program Status Block is shown below:

To assist emulation, the integer format can be changed from the normal two's complement form to one's complement or sign-magnitude. The two bits which control this have the following meaning:

00: Two's complement

10: One's complement
11: Sign-magnitude

Six bits control the format of the simple floating type.

The first two of those bits allow the simple floating-point format to be modified so that instead of the entire second word of the number being used for the exponent (including the sign of the exponent but not the sign of the number), a portion of that word is used for the exponent only, as follows:

00: Entire last word is exponent
01: 12 bits
10: 8 bits
11: 9 bits

The second two bits of that field give the format of the mantissa, as follows:

00: Two's complement

10: One's complement
11: Sign-magnitude

This is the same coding as used for the format of integers.

The last two bits of that field give the exponent format, as follows:

00: Two's complement
01: Excess-n
10: One's complement
11: Sign-magnitude

Note that the default format for simple floating numbers is also two's complement for both the mantissa and the exponent.
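The three two-bit subfields can be decoded mechanically. The sketch below takes the six bits as a string in the order in which they were described, which avoids any assumption about bit numbering; the code 01 for the mantissa format is not listed above and is therefore reported as undefined. The names used are hypothetical.

```python
EXPONENT_SIZE = {"00": "entire last word", "01": "12 bits",
                 "10": "8 bits", "11": "9 bits"}
# Code 01 is not listed for the mantissa, so it is left undefined here.
MANTISSA_FORMAT = {"00": "two's complement", "10": "one's complement",
                   "11": "sign-magnitude"}
EXPONENT_FORMAT = {"00": "two's complement", "01": "excess-n",
                   "10": "one's complement", "11": "sign-magnitude"}


def decode_simple_format(bits):
    """Decode the six-bit simple floating format field, given as a string."""
    if len(bits) != 6 or set(bits) - {"0", "1"}:
        raise ValueError("expected six binary digits")
    return {
        "exponent size": EXPONENT_SIZE[bits[0:2]],
        "mantissa format": MANTISSA_FORMAT.get(bits[2:4], "undefined"),
        "exponent format": EXPONENT_FORMAT[bits[4:6]],
    }
```

For instance, the value 011101 decodes as a 12-bit exponent in excess-n form with a sign-magnitude mantissa.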

A later section will describe how the floating-point format is indicated by the eight bits dedicated to that purpose.

Because the results of floating-point arithmetic calculations routinely extend into less significant digits than those represented, a rule for rounding the results of calculations is required.

The Program Status Block contains a three-bit field in which a rounding rule can be specified, and the following choices are currently defined:

000: Round
001: Round on load, truncate on store
010: Truncate
011: Round magnitude up


110: Round down
111: Round up

Rounding means that all quantities are rounded to the nearest representable quantity; if, for simplicity, we consider the case of rounding to an integer, 5.1 becomes 5, 5.9 becomes 6.

Truncation means that the magnitude of the number is always rounded down: 5.9 becomes 5, and -5.9 becomes -5.

Rounding the magnitude up means that 5.1 becomes 6, and -5.1 becomes -6.

Rounding down means that each quantity goes to the next lower quantity, with the sign considered: 5.5 becomes 5, -5.5 becomes -6.

Rounding up means that each quantity goes to the next higher quantity, with the sign considered: 5.5 becomes 6, -5.5 becomes -5.
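These rules, for the simplified case of rounding to an integer, can be written out as follows. This is a sketch only: code 001 is left out because its effect depends on whether a load or a store is taking place, and the treatment of exact halfway cases under code 000 (sent upward here) is an assumption, since it is not specified above.

```python
import math


def round_to_integer(x, mode):
    """Apply one of the rounding rules, rounding to an integer for simplicity.

    mode is the three-bit code as a string. Code 001 is omitted because it
    depends on whether a load or a store is being performed. Halfway cases
    under 000 are sent upward here, which is an assumption.
    """
    if mode == "000":                       # Round
        return math.floor(x + 0.5)
    if mode == "010":                       # Truncate
        return math.trunc(x)
    if mode == "011":                       # Round magnitude up
        return math.ceil(x) if x >= 0 else math.floor(x)
    if mode == "110":                       # Round down
        return math.floor(x)
    if mode == "111":                       # Round up
        return math.ceil(x)
    raise ValueError("mode not modelled: " + mode)
```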

Note that these six modes include all four of the rounding modes called for in the IEEE 754 standard.

A bit is provided to permit exponent wraparound. This allows some calculations which have intermediate results that overflow or underflow to yield the correct final result.

This bit is not applicable to the Standard floating-point format, or to the Native floating-point format unless extremely gradual underflow/overflow and NaN-safe mode are both disabled. It does not interfere with the use of unnormalized floating-point instructions to monitor significance.

When exponent wraparound is in effect, that is, when the bit is set and an applicable floating-point format is in use, floating-point overflows and underflows, in effect, never take place: a wraparound will not trigger a trap, even if floating-point overflow or floating-point underflow is trapped.

Care is required in the use of this setting, as it can easily lead to misleading results from some types of calculation.

The bits marked Common Floating-Point Format, if they are nonzero, override both the regular Floating-point Format and the Simple Floating-Point Format. When these bits are set to a nonzero value, for the Simple Floating-Point format, instead of the exponent field occupying the entire length of the second word of the number, it only occupies the last few bits, as indicated below:

00: Common format not used
01: 12 bits
10: 8 bits
11: 9 bits

The number of bits shown includes the sign of the exponent, but not the sign of the mantissa, which, at the beginning of the mantissa field, is no longer adjacent to the exponent. Also, when a common format is active, the mantissa in the Simple Floating-Point Format is always in sign-magnitude form, and the exponent is always in excess-n form, to permit interoperability with the regular floating-point format.

In addition, the regular floating-point format is changed so that it is the same as the simple floating-point format as modified. Thus, in this case, both the integer and floating-point arithmetic units can operate on floating-point numbers in the same format, allowing an increase in the speed in which the computer can work on floating-point problems.

Setting these bits to a nonzero value, if it is done by an instruction that does not also directly set these other status bits, changes the simple floating-point format bits and the floating-point format bits as shown below:

Common      Simple     Floating-point
00          xxxxxx     xxxxxxxx
01          011101     x0000101
10          101101     x0000001
11          111101     x0000010

so that, when those bits are changed back to zero, intermediate results in the floating-point registers will remain valid, and so that the representation of negative simple floating-point numbers is of a type requiring no conversion for use in the floating-point arithmetic unit.

An x indicates a bit that is left unaffected.
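The effect of the table can be expressed as an overlay of each pattern on the current contents of the corresponding field. The sketch below does this with bit strings; the function names are invented, and it models only the case where the instruction setting the Common bits does not also set the other fields directly.

```python
# Patterns from the table above; "x" marks a bit left unaffected.
COMMON_OVERRIDES = {
    "00": ("xxxxxx", "xxxxxxxx"),
    "01": ("011101", "x0000101"),
    "10": ("101101", "x0000001"),
    "11": ("111101", "x0000010"),
}


def overlay(current, pattern):
    """Replace each bit of current unless the pattern holds an x there."""
    return "".join(c if p == "x" else p for c, p in zip(current, pattern))


def set_common_format(common, simple_bits, float_bits):
    """Return the simple and regular format bits after common is set."""
    simple_pattern, float_pattern = COMMON_OVERRIDES[common]
    return overlay(simple_bits, simple_pattern), overlay(float_bits, float_pattern)
```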

Subsequent changes to the format bits for the simple or regular floating-point formats by instructions that do not affect the bits identifying the common floating-point format, where those bits are not both zero, have no immediate effect on the format of floating-point operands; they merely mean that, once the bits identifying the common floating-point format are set to zero, then the new format they specify will take effect.

Thus, those bits are ignored when a common floating-point format is active, but activating such a format will, when possible, set those bits to a value allowing compatible use of internal results.

Note that when the common floating-point format is selected, this applies not only to the main arithmetic-logic unit, but to the long vector unit and the short vector unit as well; thus, selecting the common floating-point format continues to allow the floating-point portions of the main arithmetic-logic unit and the long vector unit, as well as the short vector unit, to handle floating-point numbers in the same format, and only increases the ability to perform floating-point computations by allowing the integer portion of the main arithmetic-logic unit and the integer portion of the long vector unit to contribute as well.

Note also that neither the short vector arithmetic unit nor the integer arithmetic unit retains extra guard bits in registers, but the floating-point arithmetic unit will continue to be able to use guard bits in this mode, if this function is not disabled.

Also note that the integer arithmetic unit does not have a guard bit, a round bit, and a sticky bit: thus, although when a common floating-point format is active, simple floating instructions will act on numbers represented in the same format as those acted upon by ordinary and short vector instructions, these instructions will produce less accurate results, even when register guard bits are disabled.

The Integer/Fraction bit changes how the result of a multiplication, and the dividend to be input to a division, are represented in fixed-point arithmetic; if it is set, the former is shifted one place to the left, and the latter must previously be shifted one place to the left, as compared to its position when this bit is not set, so that fixed-point arithmetic proceeds as if the binary point of fixed-point numbers follows the sign rather than being located at the end of the number.
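The reason for the one-place shift can be seen from a small example of fractional multiplication. The sketch below uses 16-bit operands and a 32-bit product purely as an example word length, and its function names are hypothetical; it ignores the overflow case in which both operands are the most negative fraction.

```python
def multiply_fixed(a, b, fraction_mode):
    """Multiply two signed fixed-point words into a double-length product.

    With fraction_mode set the product is shifted one place left, so the
    binary point of the result again follows the sign.
    """
    product = a * b
    if fraction_mode:
        product <<= 1
    return product


def as_fraction(value, bits):
    """Interpret a signed word with the binary point just after the sign."""
    return value / (1 << (bits - 1))
```

With one half represented as 1 << 14 in a 16-bit word, the shifted product read as a 32-bit fraction is one quarter; without the shift the same bits would read as one eighth.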

The bit marked Compliant Mode, if it is not set, permits the computer to use fast algorithms for divide and square root which may not produce in all cases the result nearest to the exact answer, contrary to what is required for compliance with IEEE 754.

For these operations, if an implementation has available an algorithm which benefits by removing the requirement for compliance, and for operations where the best result is not required by that standard, such as the hardware log and trig functions, the attempt is still made to provide an accuracy of 0.51 units in the last place of the number as represented in the internal format within the register; this requires an approximate answer with seven additional bits of precision. Furthermore, as extra precision is required to calculate such functions as log and trig functions to a given accuracy, the unit which calculates them may have internally as much as sixteen bits of precision over and above the added precision of the internal register format of numbers, which is an additional eight bits in most formats, but which may also be nine or twelve bits for the Standard floating-point format.

It may also be noted that the speed penalty for use of the Compliant bit, where it has an effect, is intended to be limited: perhaps three additional cycles of latency for divide using the main ALU (any additional ALUs would likely use algorithms requiring less circuitry and which are compliant at all times) and square root, with no throughput penalty.

Two bits in the Program Status Block indicate which format is used for the Small floating-point type, a floating-point type which is only 16 bits long, and which is used with the short vector instructions instead of the Medium floating-point type. The available formats are:

01: Small floating-point with gradual underflow
10: Small floating-point with extremely gradual underflow
11: Small floating-point with hyper-gradual overflow and hyper-gradual underflow

and they are defined in the section on the short vector instructions.

Also, although the compressed decimal format can be selected by use of the 173703 prefix, and, hence, through the use of Extended Translate Mode as well, since it cannot be more fully integrated into the instruction set by means of seven-bit opcode translation, a bit in the Program Status Block is provided to cause the instructions that would operate on normal packed decimal quantities to operate on quantities in compressed decimal format instead.

Normally, the floating-point registers contain guard bits which retain additional precision between operations. As this can mean that the results of a numerical calculation can be modified by whether or not a compiler retains certain intermediate results in registers, instead of storing them in memory, this may be intolerable for some applications, and a bit is provided to disable this feature.

This does not disable the guard bit, the round bit, or the sticky bit, used during a calculation to ensure that any individual operation produces the most accurate possible result; what is disabled instead is performing register-to-register operations at a slightly higher precision than operations which involve numbers stored in memory.

The two bits labelled Rounding Floor Precision have the following significance:

00: Floating
01: Medium/Double
10: Double/Medium
11: Quad

These bits are normally zero. When they are not zero, and the bit disabling the normal retention of extra guard bits in the floating-point registers is set, they indicate that retention of guard bits is to be performed according to IEEE 754 practice; operands of precision greater than the selected precision will not have guard bits retained for them when they are in registers, but operands of precision less than the selected precision will be retained in registers at the selected precision.

When these bits are not zero, and the bit disabling normal retention of extra guard bits is not set, then each format will additionally be expanded by the number of extra guard bits specified in the section on floating-point formats.

Note that the precisions are not listed in their usual order in this field, but in order of length. If the Medium format is 48 bits long, it is represented by 01; if it is 80 bits long, as it is for the Standard floating-point format, it is represented by 10.

Exponent Offset

This architecture provides a considerable flexibility in defining the floating-point format used. However, because of the wide variety in floating-point formats that have existed, including such things as the location of the mantissa and exponent fields, and unused bits or duplicate fields in the format, the attempt has not been made to directly implement every floating-point format ever used; if it is desired to emulate precisely the machine-language operation of a machine whose floating-point format cannot be reproduced by the options provided, operand filtering, to be described shortly, can be used.

Because of this, two features in particular have been deliberately omitted from the possible choices, one of which is representing the exponent of a floating-point number in any format (such as sign-magnitude, one's complement, or two's complement) other than excess-n. The other deals with negative floating-point numbers; while it is possible to invert the remaining bits of a negative floating-point number so that floating-point numbers will collate in the same fashion as two's complement integers, the ability is not provided to increment those bits afterwards to make the complementation a two's complement one, even though two popular architectures, that of the PDP-10 and that of the Scientific Data Systems, later Xerox Data Systems, Sigma computers did this.

These omitted features, along with specifying the location of every bit, it may be noted, are irrelevant to allowing higher-level programs, without explicit dependencies on the floating-point format in use, to be compiled without change for this architecture, provided that an equivalent floating-point format is chosen.

By default, the exponent of a floating-point number is in excess-n notation, where n is that power of two which most nearly divides the exponent range in half, and where the binary point of the mantissa is considered to immediately precede the most significant bit of the mantissa, except where there is a hidden first bit specified for an exponent which is a power of two, in which case the binary point immediately precedes the hidden first bit.

As the permissible range of floating-point numbers is a characteristic of the floating-point format in use that does affect the higher-level language programmer, a four-bit field in the Program Status Block is provided which is a number in two's complement form to be added to the exponent field in a floating-point number before use, so as to permit matching some floating-point formats that have been used at this level.
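The interpretation of the four-bit field as a two's complement number is a simple sign extension, which gives offsets from -8 to 7. A minimal sketch, with hypothetical function names:

```python
def exponent_offset(field):
    """Interpret the four-bit Exponent Offset field as two's complement."""
    if not 0 <= field <= 0b1111:
        raise ValueError("Exponent Offset is a four-bit field")
    return field - 16 if field & 0b1000 else field


def adjusted_exponent(stored_exponent, field):
    """Add the offset to the stored exponent field before use."""
    return stored_exponent + exponent_offset(field)
```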

Subdivided Floating-Point

Normally, when the computer is operating on data in units of 8, 16, 32, and 64 bits, a Medium Precision floating-point number is either 48 bits long or 80 bits long, depending on the floating-point format in use, and aligned on a 16-bit boundary. This means that it may be necessary to fetch two consecutive 256-bit memory words to fetch an aligned Medium Precision number.

The Subdivided Medium mode, described in detail elsewhere, resolves this by instead using 51-bit or 85-bit floating-point formats, stored five to a 256-bit block or three to a 256-bit block.
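Five 51-bit numbers, or three 85-bit numbers, occupy 255 bits, so that each group fits in one 256-bit block with a single bit to spare. The sketch below packs and unpacks such a block; placing the numbers from the most significant end and leaving the spare bit at the least significant end is an assumption made only for the illustration, as are the function names.

```python
def pack_block(values, width):
    """Pack equal-width values into one 256-bit block, leftmost first.

    The values are placed from the most significant end and any spare
    bits are left at the least significant end; that placement is an
    assumption made for the illustration.
    """
    count = 256 // width
    if len(values) != count:
        raise ValueError("expected %d values" % count)
    block = 0
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError("value does not fit")
        block = (block << width) | v
    return block << (256 - count * width)


def unpack_block(block, width):
    """Recover the values stored by pack_block, leftmost first."""
    count = 256 // width
    spare = 256 - count * width
    mask = (1 << width) - 1
    return [(block >> (spare + (count - 1 - i) * width)) & mask
            for i in range(count)]
```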

This issue does not arise with either normal single-precision floating point or Double Precision floating point. However, the fact that with some small sacrifice of wasted bits it is possible to adjust the length of a floating-point number without sacrificing convenient fetching of aligned operands means that it may be possible to use floating-point operands that are of a more appropriate size than that which is natural for the current memory word size.

Associated with this is Multiplicative Block Indexing. This technique, the basic principle of which is discussed elsewhere, allows the contents of an index register to indicate a specific floating-point number directly, at the cost of a few locations being left unused so as to allow the addresses to be calculated rapidly by multiplication by a constant rather than by division by a constant.

Fast Long Single/Fast Intermediate

This feature, described in detail elsewhere, is intended to permit efficient addressing as well as efficient fetching of floating-point numbers 36 bits and 48 bits in length in a way that is applicable even to programs operating on large memory arrays which have a significant number of unavoidable cache misses.

The intent is to permit large programs to be sped up by being allowed to use no more floating-point precision than they actually require.

It works by causing four 48-bit floating-point numbers, or four 36-bit floating-point numbers, in the rightmost portion of a 256-bit aligned area of memory to be addressed as though they were 64-bit floating-point numbers in that area of memory. The four 48-bit numbers leave the leftmost 64 bits of that area available, and the four 36-bit numbers leave the leftmost 48-bit number available in the space for 48-bit numbers, as illustrated below:

This feature is designed around memory being composed of 256-bit blocks, and therefore it conflicts with data memory width control. Thus, a field in the Program Status Block, Memory Width Override, is provided in both the primary and secondary data formats areas, to indicate which floating-point precisions are used with this feature, to which data memory width control is not to be applied:

01: double precision
10: double precision, medium
11: double precision, medium, single precision

These are the relevant choices, since using this feature on 48-bit floating-point numbers is viewed as depending on also having the ability to address the unused 64 bits in the block, and using it on 36-bit numbers is similarly viewed as depending on the ability to address the unused 48 bits now also present.
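The packing described in this section can be sketched numerically. The following is a minimal illustration, not a definition of the hardware behaviour: it assumes that bit offsets are counted from the left edge of the 256-bit block and that the four packed numbers are stored left to right in slot order. The function names are hypothetical.

```python
BLOCK_BITS = 256  # the feature is designed around 256-bit aligned blocks

def fast_slot_bit_offset(slot, width):
    """Bit offset, from the left edge of the block, of packed number `slot`.

    Assumption: slots 0..3 are laid out left to right in the rightmost
    4 * width bits of the block.
    """
    if width not in (36, 48):
        raise ValueError("only 36-bit and 48-bit numbers are packed this way")
    if not 0 <= slot < 4:
        raise ValueError("a block holds four packed numbers")
    first = BLOCK_BITS - 4 * width  # start of the rightmost portion
    return first + slot * width

def apparent_byte_offset(slot):
    """Byte offset used by the program: each slot is addressed as though it
    were a 64-bit number in the same block."""
    if not 0 <= slot < 4:
        raise ValueError("a block holds four packed numbers")
    return slot * 8

# Four 48-bit numbers start at bit 64, leaving the leftmost 64 bits free.
# Four 36-bit numbers start at bit 112, leaving bits 64..111 (one 48-bit
# slot) free in the space for 48-bit numbers.
```

Under these assumptions, the unused areas come out as the text states: 64 bits for the 48-bit case, and one further 48-bit slot for the 36-bit case.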

Operand Filtering

Fifteen bits of the portion of the Program Status Block which defines the data formats currently in use can be used to specify the use of a filter program to modify the format of numeric operands for machine instructions. At present, this capability is defined only for code 12 microprograms.

Alternatively, these same fifteen bits can be used to indicate, for floating-point operands in the Medium, Floating, and Double precisions, but not in Quadruple precision, that they either have a different size of the exponent field than indicated in the eight bits of the floating-point format (to be described later in the section on floating-point formats), or are stored in different widths of memory, but are otherwise in the same format.

Mixed Exponent Mode

When the mixed floating mode bit is not set, and the mixed exponent/alignment mode bit is set, the 16 bits usually allocated to indicate operand filtering are instead used to indicate the number of bits in use for the exponent portion of medium, floating, and double floating-point operands. This only applies when the general floating-point format is one in which this can be specified, and then either five bits are used to specify that from five to thirty-six bits are used for the exponent, or four bits are used to specify that from four to nineteen bits are used for the exponent, depending on whether floating-point numbers are normalized based on radix-2, or on some larger radix; this field follows that specified in the section on floating point formats. This length is inclusive of the sign of the exponent, which is to be expected as exponents are stored exclusively in excess-n mode in the architecture here specified.
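The two ranges quoted above follow directly from the field sizes: a five-bit field has 32 values, covering 5 through 36, and a four-bit field has 16 values, covering 4 through 19. A small sketch of the decoding is given here under two assumptions that the text does not state outright: that a field value of zero stands for the minimum length, and that the five-bit form is the one paired with radix-2 normalization. The function name is hypothetical.

```python
def exponent_width(field_value, radix_two):
    """Return the exponent length in bits, including the exponent sign.

    Assumptions: the field holds (length - minimum), and the five-bit form
    belongs to radix-2 formats while the four-bit form belongs to formats
    normalized on a larger radix.
    """
    field_bits, minimum = (5, 5) if radix_two else (4, 4)
    if not 0 <= field_value < (1 << field_bits):
        raise ValueError("field value does not fit in the field")
    return minimum + field_value
```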

Having a longer exponent field for higher-precision floating-point numbers is a characteristic not only of the IEEE-754 floating-point format, which is predefined as the Compatible floating-point format, and therefore does not require the use of this feature, but of the floating-point formats of some computer architectures predating that standard, such as the Univac 1107 and its successors, the PDP-15 and its predecessors, and the Foxboro FOX-1, and thus it assists in emulation of those architectures.

Mixed Alignment Mode

When the mixed floating mode bit and the mixed exponent/alignment mode bit are both set, the 16 bits usually allocated to indicate operand filtering are instead used to indicate the memory width, and the type, of medium, floating, and double floating-point operands.

The memory width is indicated as in the memory width field:

                   Med    Flt Dbl Qua

000:  32-bit word  48/80   32  64 128

010:  48-bit word  72/--   48  96  --
011:  36-bit word  54/90   36  72  --
100:  40-bit word  60/100  40  80  --

110:  60-bit word  45/75   30  60 120

with the widths of the various floating-point types summarized as well, and the type is indicated as follows:

00 Medium
01 Floating
10 Double
11 Quad

One way this can be used would be to fill the 16-bit area for this information with:

x 010 01 011 01 000 10

This would indicate that Medium floating-point numbers were to be treated as single-precision floating-point numbers for a 48-bit word; this would mean that they would be 48 bits long even in the Standard floating-point format, and would be aligned on 48-bit boundaries rather than on 16-bit boundaries, and it would indicate that single-precision floating-point numbers were to be 36 bits long. Double precision numbers would be 64 bits long.
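The example can be checked mechanically. The sketch below assumes the layout suggested by the example itself: one leading unused bit, then three five-bit groups applying to Medium, Floating, and Double operands in that order, each group holding a three-bit memory width code followed by a two-bit type code. For Medium, only the first of the two widths listed in the table is kept, which is a simplification. All names are hypothetical.

```python
# Width in bits of each type for each memory width code; None marks "--".
WIDTH_TABLE = {
    0b000: ("32-bit word", {"Medium": 48, "Floating": 32, "Double": 64, "Quad": 128}),
    0b010: ("48-bit word", {"Medium": 72, "Floating": 48, "Double": 96, "Quad": None}),
    0b011: ("36-bit word", {"Medium": 54, "Floating": 36, "Double": 72, "Quad": None}),
    0b100: ("40-bit word", {"Medium": 60, "Floating": 40, "Double": 80, "Quad": None}),
    0b110: ("60-bit word", {"Medium": 45, "Floating": 30, "Double": 60, "Quad": 120}),
}
TYPE_NAMES = ("Medium", "Floating", "Double", "Quad")

def decode_mixed_alignment(field):
    """Decode the 16-bit area into {operand: (word size, treated as, bits)}."""
    field &= 0x7FFF  # drop the leading bit, shown as "x" in the example
    result = {}
    for position, operand in enumerate(("Medium", "Floating", "Double")):
        group = (field >> (10 - 5 * position)) & 0b11111
        width_code, type_code = group >> 2, group & 0b11
        if width_code not in WIDTH_TABLE:
            raise ValueError("memory width code not listed in the table")
        word, widths = WIDTH_TABLE[width_code]
        treated_as = TYPE_NAMES[type_code]
        result[operand] = (word, treated_as, widths[treated_as])
    return result

decoded = decode_mixed_alignment(0b0_010_01_011_01_000_10)
for operand, (word, treated_as, bits) in decoded.items():
    print(operand, "->", treated_as, "for a", word + ",", bits, "bits")
```

With the example value, the decoded widths are 48 bits for Medium, 36 bits for Floating, and 64 bits for Double, in agreement with the description above.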

This allows optimal lengths to be chosen for different floating-point types, but it means that floating-point numbers of different precisions cannot be mixed in the same data structures if their types use different memory widths. As an example, suppose a FORTRAN compiler were extended to include support, even partial support, for this feature, so that it produced code for the processor running in a mode in which most variables were in memory with the usual 32/64-bit word length, but single-precision floating-point numbers were 36 bits wide, located in memory with the appropriate alignment for an 18/36-bit word length, and double-precision floating-point numbers were 48 bits wide, located in memory with a 24/48-bit word length for maximum efficiency in handling them. The intent would be to secure adequate precision for single-precision numbers while avoiding excessive precision for double-precision numbers. In that case, REAL and DOUBLE PRECISION quantities could not appear in the same COMMON or EQUIVALENCE statement as each other, or as variables of any other type.

Secondary Data Formats

To assist in converting data from one format to another, an area of the Program Status Block is devoted to the description of a second set of data formats; an instruction can be caused to use the data formats described there by the use of a mode-independent instruction, the INUAF instruction, as a prefix. This description is followed by sixteen bits indicating the filtering, if any, used with the secondary data formats, and a separate bit for controlling the use of register guard bits with the secondary floating-point format is also provided.

User-Level Status

The last sixty-four bits of the Program Status Block are almost always available to user programs, and their format is shown below:

In particular, the last thirty-two bits are invariably available to user programs, as some of those bits may change with every instruction executed.

Enlarged Address Space Handling

In 64-bit addressing mode and Extended Addressing Mode, bits are available to control whether or not other quantities involved with addressing are expanded.

If the bit indicating 64-bit indexing is set, when an arithmetic/index register would normally be used for indexing, the indexing field indicates instead a register pair. Register pairs must start with an even-numbered register, and if register pair 0 is indicated, indexing does not take place, just as indicating register 0 will normally indicate no indexing.

Since the supplementary arithmetic/index registers are 64 bits long, whether or not this bit is set, instructions which directly specify a supplementary register as an index will use its full value even if this bit is not set (however, the higher bits will normally be ignored unless addresses are wider than 32 bits, whether because 64-bit addressing is enabled, or because extended addressing, which shifts the contents of the base registers left before use, is in effect).

Because 64-bit addressing uses only three register pairs as index registers, an odd value in the index field, instead of causing an error or being rounded down to the preceding even number, indicates that the supplementary register of that number serves as the index register. This allows seven index registers to be specified in a normal memory-reference instruction, which can be useful if the program makes use of the supplementary registers. Thus, with 64-bit indexing, the interpretation of a three-bit index field in an instruction is as follows:

0 No indexing
1 Supplementary register 1
2 Arithmetic/index registers 2 and 3
3 Supplementary register 3
4 Arithmetic/index registers 4 and 5
5 Supplementary register 5
6 Arithmetic/index registers 6 and 7
7 Supplementary register 7
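The table above reduces to a simple rule on the low bit of the index field. The following sketch restates it; the function name and the form of the return value are illustrative only.

```python
def index_source(field):
    """Interpret a three-bit index field when 64-bit indexing is enabled.

    Returns None for no indexing, ("supplementary", n) for an odd field
    value, or ("pair", n, n + 1) for an even nonzero field value.
    """
    if not 0 <= field <= 7:
        raise ValueError("the index field is three bits long")
    if field == 0:
        return None                      # no indexing takes place
    if field % 2 == 1:
        return ("supplementary", field)  # 64-bit supplementary register
    return ("pair", field, field + 1)    # arithmetic/index register pair

for value in range(8):
    print(value, index_source(value))
```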

Another method of allowing each base register to permit addressing of a larger area of memory is to have the machine operate with word addressing instead of byte addressing. The field marked Word Displacements controls this, and indicates the size of the word referenced as a multiple of the current size of the operand of a byte instruction:

                                      For alternate
                                      memory widths:

Memory width bits:          000       010 011 100  110

00   1 byte                  8 bits     6   9  10   15
01   2 bytes (halfword)     16 bits    12  18  20   30
10   4 bytes (integer)      32 bits    24  36  40   60
11   8 bytes (long)         64 bits    48  72  80  120

The interpretation of base and index register contents is not affected by this field; base registers should, in any event, contain a value ending in five zero bits so as not to affect the alignment of operands of any size, but it is intended that in this mode the index register will allow addressing individual bytes within a word.

The function of this field should not be confused with Array Indexing, described below, wherein a bit can specify that index values in a particular arithmetic/index register are in multiples of the operand width instead of in bytes. Note that combining this feature with array indexing can make it impossible to address an unaligned operand for some operand lengths.
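Every entry in the Word Displacements table is the byte size for the current memory width multiplied by a power of two given by the field. The sketch below reproduces the table that way. The second function rests on an assumption suggested by the name of the field but not stated in the text, namely that only the displacement in the instruction is counted in words; all names are hypothetical.

```python
# Size in bits of the operand of a byte instruction for each memory width code.
BYTE_BITS = {0b000: 8, 0b010: 6, 0b011: 9, 0b100: 10, 0b110: 15}

def word_bits(width_code, word_displacements):
    """Size in bits of the word selected by the Word Displacements field."""
    if width_code not in BYTE_BITS:
        raise ValueError("memory width code not listed in the table")
    if not 0 <= word_displacements <= 3:
        raise ValueError("the field is two bits long")
    return BYTE_BITS[width_code] << word_displacements

def displacement_in_bytes(displacement, word_displacements):
    """Assumption: the displacement counts words of 1, 2, 4, or 8 bytes,
    while base and index register contents remain in bytes."""
    if not 0 <= word_displacements <= 3:
        raise ValueError("the field is two bits long")
    return displacement << word_displacements
```

For instance, `word_bits(0b110, 0b10)` gives 60 and `word_bits(0b010, 0b11)` gives 48, matching the corresponding table entries.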

Base-Register Range Modification

The bit marked Alternate Jump Base Registers causes the eight scratchpad registers, instead of the eight base registers, to be used as the base registers for any instructions involving a transfer of control.

The bit marked Alternate Instruction Memory Base Registers does this both for instructions involving a transfer of control, and for short vector instructions; this bit is for use with memory width control. Since the arithmetic-logic unit for short vector operations does not include the flexibility to handle data that comes from memory built up from 36-bit, 40-bit, or 24-bit words instead of 32-bit words, operands of short vector instructions will not be in the same area of memory as operands for other arithmetic instructions. Given this lack of overlap, a separate set of base registers is a reasonable way to extend the data memory available to a program.

If both of these bits are set, the scratchpad registers are used as the base registers for short vector instructions, and the pointer scratchpad registers are used as the base registers for instructions involving a transfer of control.
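The four combinations of these two bits can be summarized as a small decision function. This is only a restatement of the three preceding paragraphs; the function name, the instruction categories, and the returned labels are illustrative.

```python
def base_register_set(kind, alt_jump, alt_instruction_memory):
    """Return which register set supplies base registers for an instruction.

    kind is "jump" (any transfer of control), "short_vector", or "other".
    """
    if kind not in ("jump", "short_vector", "other"):
        raise ValueError("unknown instruction kind")
    if alt_jump and alt_instruction_memory:
        if kind == "short_vector":
            return "scratchpad"
        if kind == "jump":
            return "pointer scratchpad"
    elif alt_instruction_memory:
        if kind in ("jump", "short_vector"):
            return "scratchpad"
    elif alt_jump:
        if kind == "jump":
            return "scratchpad"
    return "base"  # the eight ordinary base registers
```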

When either or both of these bits are set, since both the jump instructions and the short vector instructions belong to the operate instructions, and not to the memory-reference instructions, it is possible, in the short page modes, to load the scratchpad registers with displacements pertaining to 65,536-byte pages instead of 4,096-byte pages, and restore the format of those operate instructions to that associated with the normal mode. In some cases, however, this may not be desired, as it may be intended to allow some code to be used either with the regular base registers or with alternate base registers. Hence, a bit associated with the instruction mode in use, Force Long Page With Alternate Jump/Instruction Memory Base Registers, indicates which course is followed. Note that this does not change the format of instructions when the alternate base registers associated with the postfix supplementary bits are used; with those instructions, the format associated with the short page modes remains in use when those modes are in effect.

The auxiliary program counter has two bits associated with it; one indicates whether it contains a value, the other whether it is currently the active program counter by means of which instructions are chosen for execution.

The Array Indexing Mask controls how the Arithmetic/Index registers function when used as index registers in instructions. The bits correspond to registers 0 through 7 in order from left to right. Normally, when the corresponding bit in this field is zero, the value in an index register is taken to be a displacement in units of bytes when used in the formation of an effective address.

If this bit is set, however, the quantity in the index register is shifted left before being added to the base and displacement portions of the address by an amount depending on the type of the operand involved.

The shifts are:

Byte              0 bits (also long vector of byte)
Halfword          1 bit  (also long vector of halfword)
Integer           2 bits (also long vector of integer)
Long              3 bits (also long vector of long)
Medium            1 bit  (also long vector of medium)
Floating          2 bits (also long vector of floating)
Double            3 bits (also long vector of double)
Quad              4 bits (also long vector of quad)
Short Vector      5 bits (any type of contents)
Character String  0 bits
Packed Decimal    0 bits

Array indexing makes sense when operand items are placed in arrays of values of the same type; when they are placed in records containing items of different types, even if each individual item is fully aligned, the default mode of byte indexing is more appropriate. The array indexing mask allows the number of registers used in each fashion to be adjusted depending on the application addressed by a given program, without the need to explicitly specify this component of the addressing mode within the bits representing an instruction.

If this component of the addressing mode is explicitly specified in an instruction format, these bits are overridden and ignored.

Note that for the mode where the word length is 24 bits, the shifts for the Medium, Floating, Double, and Quad types are increased by one bit, and in the mode where the word length is 60 bits, the shifts for the Medium, Floating, Double, and Quad types are decreased by one bit.
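The effect of the Array Indexing Mask on address formation can be sketched as follows. The shift table and the adjustments for the 24-bit and 60-bit word modes are taken from the text; the function names, the type labels, and the way the word mode is passed in are assumptions made for illustration, and long vectors are not modelled separately since they use the shift of their element type.

```python
SHIFTS = {
    "byte": 0, "halfword": 1, "integer": 2, "long": 3,
    "medium": 1, "floating": 2, "double": 3, "quad": 4,
    "short_vector": 5, "character_string": 0, "packed_decimal": 0,
}
FLOAT_TYPES = {"medium", "floating", "double", "quad"}

def index_shift(operand_type, word_mode=None):
    """Left shift applied to a scaled index; word_mode is 24, 60, or None."""
    shift = SHIFTS[operand_type]
    if operand_type in FLOAT_TYPES:
        if word_mode == 24:
            shift += 1
        elif word_mode == 60:
            shift -= 1
    return shift

def effective_address(base, displacement, index_value, register, mask,
                      operand_type, word_mode=None):
    """Form base + displacement + index, scaling the index if the mask bit
    for `register` is set. Mask bits correspond to registers 0 through 7
    from left to right, so register r uses bit (7 - r) of the 8-bit mask."""
    if not 0 <= register <= 7:
        raise ValueError("register number must be 0 through 7")
    if (mask >> (7 - register)) & 1:
        index_value <<= index_shift(operand_type, word_mode)
    return base + displacement + index_value

# Register 2 holds element number 3 of an array of doubles.
print(effective_address(0x1000, 8, 3, 2, 0b00100000, "double"))  # scaled
print(effective_address(0x1000, 8, 3, 2, 0b00000000, "double"))  # byte index
```

In the first call the index is shifted left three bits before the addition, so the index contributes 24 bytes; in the second it contributes 3 bytes.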

A group of four bits is used to indicate if divide by zero, fixed-point overflow, and floating-point overflow are trapped. No bit is required to indicate whether or not a divide check is trapped because the Divide Extensibly instruction in this architecture produces a double-length quotient, and there is no other divide instruction provided with a double-length dividend but a single-length quotient.

As well, several bits are used to indicate whether or not additional conditions, associated with the IEEE 754 standard, are trapped. Following the Motorola 68881 coprocessor, invalid operation is divided into three, and inexact into two, cases for which trapping may be controlled separately.

The final four bits of the Program Status Block are the condition code bits which indicate the results of arithmetic operations. Boolean operations set the zero bit as applicable, but they do not set the negative bit, although it is potentially applicable, nor the carry and overflow bits, which do not apply to them; floating-point operations always clear the carry bit, to ensure the ability to use all conditional branches with them in a consistent manner. As well, there are five latching status bits present in the first byte of the last thirty-two bits of the Program Status Block to meet the requirements of the IEEE 754 standard.

There are two main copies of the four condition code bits: in addition to the one in the final four bits, an extra one ends the first sixteen bits of the last 32 bits of the Program Status Block, since the sequence of instructions directed by the auxiliary program counter is handled separately, both for those condition codes and for the additional ones for IEEE 754. These two main copies are preceded by extra copies, which are changed only by certain instructions, such as register-to-register instructions in which the condition code save bit is set.

The purpose of this feature is to allow special conditional branch instructions to test those bits, instead of the regular ones, where those bits were last changed several instructions previously, thus avoiding the need for the immediately preceding instruction to finish executing before the correct outcome of the conditional branch can be determined. This allows efficient pipelined operation even where speculative execution is not available in an implementation.

Concluding Remarks

Note that as the Program Status Block describes the state of a process, not the global state of the machine, the bit indicating that some interrupt vectors are to be found at the high end of physical address space rather than at the low end of physical address space will be contained in a control register, not in the Program Status Block. The contents of the control registers are largely implementation dependent, and thus are not discussed here at this time. This bit is normally intended to be set when a system including the processor described here is powered on, so that it can initially use read-only memory for initialization purposes. Only interrupt vectors, not locations where copies of the Program Status Block and the program counters are saved, are moved by this bit.

Thus, there are additional program status bits outside the Program Status Block that are visible only to supervisor processes, and the contents of these may vary, in whole or in part, between implementations of the architecture. Another example of this is that the machine state saved during an interrupt may include information supplementing the program counter value, in order that instructions such as string instructions can be interrupted before they have completed their execution: in the section on interrupts, it is noted that a maximum of 384 bits is allocated for this purpose in the 1024-bit process state vector, which also includes the Program Status Block, the program counter, and the auxiliary program counter.

The machine state for a given process also includes up to 4,096 extended translation procedures defined using the Define Translation Program instruction. While some procedures might be short, the possibility of procedures as long as 131,072 bytes in length is not excluded. Bits in the status word can be used either to lock this part of the machine state out completely, or to make it read-only to a process.