As seen on the previous page, the instruction formats for this architecture are as shown below:
The first line shows the format of sixteen-bit register-to-register instructions.
These operate on banks of 32 registers. The source register is indicated by a field that is only three bits long; this is achieved by requiring that the source and destination registers both belong to the same one of four groups of eight registers: registers 0 to 7, registers 8 to 15, registers 16 to 23, and registers 24 to 31.
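The grouping rule can be sketched as follows. This is a minimal illustration, assuming the destination is available as a full register number from 0 to 31; the function names are hypothetical and not part of the architecture definition.

```python
def source_register(dest: int, src_field: int) -> int:
    """Rebuild the full source register number from the 3-bit source field.

    The group (0-7, 8-15, 16-23, 24-31) is taken from the destination.
    """
    if not 0 <= dest <= 31:
        raise ValueError("destination register must be 0..31")
    if not 0 <= src_field <= 7:
        raise ValueError("source field must be 0..7")
    group_base = dest & 0b11000  # 0, 8, 16 or 24
    return group_base | src_field


def encode_source_field(dest: int, src: int) -> int:
    """Return the 3-bit source field, or fail if the groups differ."""
    if dest >> 3 != src >> 3:
        raise ValueError("source and destination must share a group of eight")
    return src & 0b111
```

For example, with destination register 13 and a source field of 2, the source register is register 10, since both lie in the group 8 to 15.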
One bit in the instruction, as with many RISC architectures, is reserved for indicating if the instruction sets the condition codes. As a result, only six opcode bits are available, and thus the instruction format is used for instructions dealing only with the most common integer and floating-point types:
       0000  0001  0010  0011  0100  0101  0110  0111
       SWH   IH    SW    I     SWM   SWF   SWD   SWQ
000
001    LH    ULH   L     UL    LM    LF    LD    LQ
010    STH   XH    ST    X     STM   STF   STD   STQ
011    AH    NH    A     N     AM    AF    AD    AQ
100    SH    OH    S     O     SM    SF    SD    SQ
101    MH    MEH   M     ME    MM    MF    MD    MQ
110    DH    DEH   D     DE    DM    DF    DD    DQ
111
applying the types:
Fixed-point:
  H   Halfword    16 bits
      Integer     32 bits
Floating-point:
  M   Medium      48 bits (16-bit aligned)
  F   Floating    32 bits
  D   Double      64 bits
  Q   Quad       128 bits
to the operations:
SW   Swap
L    Load
ST   Store
A    Add
S    Subtract
M    Multiply
D    Divide
I    Insert
UL   Unsigned Load
X    Exclusive OR
N    AND
O    OR
ME   Multiply Extensibly
DE   Divide Extensibly
As on the System/360, Load performs sign extension of fixed-point quantities. Insert instead loads the operand into the least significant part of the destination register without changing the most significant part, and Unsigned Load clears the most significant part.
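The difference between the three kinds of load can be sketched for a halfword operand. A 32-bit destination register is assumed here purely for illustration, and the function names are hypothetical.

```python
def load_halfword(value16: int) -> int:
    """Load: sign-extend the 16-bit operand to fill the register."""
    value16 &= 0xFFFF
    if value16 & 0x8000:
        return value16 | 0xFFFF0000
    return value16


def unsigned_load_halfword(value16: int) -> int:
    """Unsigned Load: clear the most significant part of the register."""
    return value16 & 0xFFFF


def insert_halfword(old_register: int, value16: int) -> int:
    """Insert: replace the least significant part, keep the rest unchanged."""
    return (old_register & 0xFFFF0000) | (value16 & 0xFFFF)
```

With the operand 0x8001 and a register that previously held 0x12345678, Load gives 0xFFFF8001, Unsigned Load gives 0x00008001, and Insert gives 0x12348001.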
Multiply Extensibly produces a product that is twice the length of its terms.
Divide Extensibly involves a dividend that is twice the length of the divisor; both the quotient and the remainder are also twice as long as the divisor. Except for Divide Extensibly Long (not part of this group of instructions, but which is available in other instruction formats), the remainder is in the destination register, and the quotient is placed in the following register; in the case of Divide Extensibly Long, as the divisor is a 64 bit integer, the dividend, remainder, and quotient are all contained in register pairs.
Instructions that have the same register as source and destination are not allowed in this format, permitting a small number of additional 16-bit instructions.
The second line of the diagram shows an instruction which sets one of seven predication flags based on the status of the condition codes. The field in this instruction that indicates the condition to test for has the same interpretation as that in the conditional branch instructions.
The third line of the diagram shows an instruction the purpose of which is to indicate, for DSP instructions, which U bit, indicating an instruction upon which there is a dependency, corresponds to which D bit, indicating a dependent instruction. The offset field contains a value from 0 to 7, indicating the number of set D bits to skip prior to the first set D bit following this instruction, after the set U bit, which may have preceded this instruction, to which it corresponds.
It is important to re-establish this synchronization at those points in the code which serve as branch targets, or dependencies will not be processed correctly.
The configuration field is used to speed decoding of the following block of instructions, if that block is composed of five 48-bit DSP instructions and one 16-bit instruction (presumably another instruction of this type). Its contents indicate as follows how the next block will be laid out:
0   [ 3 ][ 3 ][ 3 ][ 3 ][ 3 ]
1   [ 3 ][ 3 ][ 3 ][ 3 ][ 3 ]
2   [ 3 ][ 3 ][ 3 ][ 3 ][ 3 ]
3   [ 3 ][ 3 ][ 3 ][ 3 ][ 3 ]
4   [ 3 ][ 3 ][ 3 ][ 3 ][ 3 ]
5   [ 3 ][ 3 ][ 3 ][ 3 ][ 3 ]
6   [ 3 ][ 3 ][ 3 ][ 3 ]
The values from 0 to 5 allow a single 16-bit instruction to be in any position, and the value 6 permits a block of DSP instructions to have multiple branch targets within it.
If the configuration field contains 7, this means that no layout is imposed on the following block.
This instruction may be placed at any position in the block because if there is a branch target within DSP code, it should be this instruction rather than the following DSP instruction to which the branch points. That is because the length of instructions, being indicated at the start of an instruction rather than the end, can only be determined moving forwards from a given address known to be the start of an instruction; the address portion of a 32-bit instruction can also be a valid 16-bit instruction, for example.
As may be expected, this implies that this instruction only speeds the decoding of the instructions in the following block when it is reached by fall-through; when an instruction within a block is reached by a branch, the lengths of the instructions from the branch point onwards within that block must be determined in the conventional sequential manner.
This instruction is not defined as overriding the normal length indications. If the leading bits of the opcodes of the instructions in the next block do not correspond to the lengths indicated, an error condition may result, but the behavior is undefined: for example, there may be a fall-back to normal sequential decoding, or the lengths may be treated as an override.
The behavior is left undefined rather than specified in order to permit the most efficient implementation of the normal case, in which the actual instruction lengths do match the indication, so that the maximum gains, or minimum losses, in speed result from the use of the instruction, whatever techniques may be used for the implementation of the architecture.
This is one of two ways in which instructions may be arranged to permit rapid decoding, without the need to serially proceed through instructions one at a time to determine from one instruction where the next one begins. In addition to rapid decoding, this is intended to be used with those 48-bit instructions that contain a U bit and a D bit, thus explicitly indicating instructions that may be executed in parallel, allowing functionality similar to that provided in DSP designs.
It should also be noted that the length is indicated in a simple fashion: instructions beginning with 0 are 16 bits long, instructions beginning with 10 are 32 bits long, and instructions beginning with 11 are 48 bits long. Determining where each instruction begins among the instructions in a wide fetch from memory should therefore take only a limited number of gate delays, as opposed to multiple cycles.
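The length rule can be sketched in software as follows. It is assumed here that the prefix bits are the most significant bits of the first 16-bit syllable of an instruction; the function names are hypothetical, and real hardware would do this in parallel rather than in a loop.

```python
def instruction_length_bits(first_syllable: int) -> int:
    """Return 16, 32 or 48 from the leading bits of an instruction."""
    if first_syllable & 0x8000 == 0:
        return 16  # prefix 0
    if first_syllable & 0x4000 == 0:
        return 32  # prefix 10
    return 48      # prefix 11


def instruction_starts(syllables):
    """Walk forward through 16-bit syllables, listing where instructions begin."""
    starts = []
    position = 0
    while position < len(syllables):
        starts.append(position)
        position += instruction_length_bits(syllables[position]) // 16
    return starts
```

For the syllables 0x0000, 0x8000, 0x1234, 0xC000, 0x0000, 0x0000, 0x7FFF, the instructions start at positions 0, 1, 3 and 6. The syllable 0x1234 is the second half of a 32-bit instruction, yet taken alone it looks like a 16-bit instruction, which is why lengths can only be determined moving forwards from a known starting point.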
The other way in which instructions can be arranged to allow for more rapid decoding is to use only 32-bit instructions, possibly accompanied by 16-bit instructions in pairs. This is a feasible option because the 32-bit instructions alone are a reasonably complete instruction set, and it provides performance comparable to conventional RISC architectures.
It is intended that implementations will attempt to accelerate decoding both for exclusive sequences of 32-bit instructions aligned on 32-bit boundaries and for exclusive sequences of 32-bit instructions which begin on odd 16-bit boundaries, so that occasional 48-bit or 16-bit instructions in a program primarily composed of 32-bit instructions will not slow down the decoding of instructions in later blocks, without the need for padding with 16-bit no-operation instructions.
The fourth through ninth lines of the diagram show the format of those basic memory-reference instructions which utilize the short base registers. These instructions may reference unaligned operands, and they may use any register as their destination register.
Instructions of this format begin with the prefix bits 10, indicating an instruction 32 bits in length.
Next is a six-bit opcode; for reasons of pipeline efficiency, as well as to allow access to as many data types as possible, following RISC practice, this format is used only for load and store instructions. However, in addition to the normal Load instruction, which performs sign extension for integer operands, Unsigned Load and Insert are provided as well for integer data types shorter than the arithmetic-index registers.
Next is a five-bit field indicating the destination register for the instruction. In the case of store instructions, the register indicated by that field serves as the source operand, of course.
Then there is a three-bit index register field. It is interpreted as follows:
Field value   Register used for indexing
0             No indexing
1             1
2             2
3             3
4             4/12/20/28
5             5/13/21/29
6             6/14/22/30
7             7/15/23/31
Thus, the value zero indicates that the address is not indexed; other values in the first half refer to index registers shared among all subthreads working in different groups of eight registers, and values in the second half refer to index registers in the same group of eight registers as the destination register (or, in the case of floating-point instructions, the corresponding group of eight registers).
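The interpretation of the index register field can be sketched as below. The function name is hypothetical, and for floating-point instructions the sketch simply takes the destination register number as selecting the corresponding group.

```python
def index_register(field: int, dest: int):
    """Return the index register number, or None when there is no indexing."""
    if not 0 <= field <= 7:
        raise ValueError("index field must be 0..7")
    if not 0 <= dest <= 31:
        raise ValueError("destination register must be 0..31")
    if field == 0:
        return None               # no indexing
    if field <= 3:
        return field              # registers 1..3, shared by all groups
    return (dest & 0b11000) | field  # registers 4..7 of the destination's group
```

With a field value of 5 and destination register 21, the index register is register 21 itself; with a field value of 4 and destination register 9, it is register 12.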
The basic memory-reference instruction format occupies six lines of the diagram as there are six possible formats for the address constant portion of these instructions, as well as all other instructions which reference main memory with the aid of the short base registers.
In the first address constant format, the fifteen-bit address field is an unsigned positive displacement which is added to the contents of base register zero to form the effective address of the instruction (additionally subject to indexing, if specified).
The second address constant format also uses base register zero. In this case, the eleven-bit address field of the instruction is shifted left by two bits (in the case of 32-bit addressing) or three bits (in the case of 64-bit addressing) before being added to the contents of base register zero to indicate the memory location containing the address to use for the instruction itself. If indexing is indicated, the contents of the index register are then added to that address to form the effective address of the instruction.
This is post-indexed indirect addressing; it is provided so that a program can make use of multiple large arrays without having to reserve one base register as a pointer to each array.
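The address calculation for this format can be sketched as follows. Memory is modelled as a dictionary from addresses to stored pointers, which is an assumption made only for illustration, as are the function and parameter names.

```python
def post_indexed_indirect(memory, base0, field, index=0, addr_bits=32):
    """Effective address for the second address constant format.

    memory   : dict mapping an address to the pointer stored there (model only)
    base0    : contents of base register zero
    field    : the eleven-bit address field
    index    : contents of the index register, or 0 when not indexed
    addr_bits: 32 or 64, selecting a shift of two or three bits
    """
    if addr_bits not in (32, 64):
        raise ValueError("addressing mode must be 32 or 64 bits")
    if not 0 <= field < (1 << 11):
        raise ValueError("address field must fit in eleven bits")
    shift = 2 if addr_bits == 32 else 3
    pointer_location = base0 + (field << shift)
    # Indexing is applied after the pointer has been fetched.
    return memory[pointer_location] + index
```

With base register zero holding 0x1000, a field of 3, and 32-bit addressing, the pointer is fetched from 0x100C; if that location holds 0x20000 and the index register holds 0x10, the effective address is 0x20010.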
In the third address constant format, the eleven-bit displacement field contains a signed two's complement number which is relative to the contents of the program counter. A value of zero corresponds to the address of the byte immediately following the end of the instruction.
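A sketch of the program-counter-relative calculation follows. The text does not state the unit of the displacement; the sketch assumes it counts bytes and that the instruction is four bytes long, and both points, like the function names, are assumptions.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def pc_relative_address(instruction_address: int, disp_field: int,
                        instruction_bytes: int = 4) -> int:
    """Address relative to the byte following the instruction (assumed byte units)."""
    return instruction_address + instruction_bytes + sign_extend(disp_field, 11)
```

Under these assumptions, an instruction at 0x100 with a displacement field of zero refers to 0x104, and a field of all ones (minus one) refers to 0x103.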
The fourth address constant format allows an address contained in an integer register to be used instead of one produced by adding a displacement to the content of a base register; this is register indirect addressing. The fifth address constant format is similar, but also causes the contents of that register to be incremented by the size of the operand after use; this is auto-increment register indirect addressing.
The sixth address constant format uses base registers 1 through 7; their contents are added to the twelve-bit address field to form the effective address of the instruction.
It is intended that base register zero will be used to point to the local storage of a program, and the other base registers from 1 to 7 will be used for tasks such as passing parameters when calling subroutines, accessing storage shared between routines, and so on.
Fitting an address constant that includes a base register specification into 16 bits, while still allowing the displacement field to be 15 bits long in one case instead of the 12 bits available with most choices of base register, is, of course, inspired by the IBM System/360 Model 20. The additional register indirect and auto-increment register indirect addressing modes take their inspiration from those offered on the DEC PDP-11 computer and the later TI 9900 microprocessor.
The instructions provided in this format are:
10 000000   LB    Load Byte (8-bit integer)
10 000001   STB   Store Byte
10 000010   ULB   Unsigned Load Byte
10 000011   IB    Insert Byte
10 000100   LH    Load Halfword (16-bit integer)
10 000101   STH   Store Halfword
10 000110   ULH   Unsigned Load Halfword
10 000111   IH    Insert Halfword
10 001000   L     Load (32-bit integer)
10 001001   ST    Store
10 001010   UL    Unsigned Load
10 001011   I     Insert
10 001100   LL    Load Long (64-bit integer)
10 001101   STL   Store Long
10 010000   LC    Load Classic (40-bit float)
10 010001   STC   Store Classic
10 010010   LM    Load Medium (48-bit float)
10 010011   STM   Store Medium
10 010100   LF    Load Floating (32-bit float)
10 010101   STF   Store Floating
10 010110   LD    Load Double (64-bit float)
10 010111   STD   Store Double
10 011000   LQ    Load Quad (Extended) (128-bit float)
10 011001   STQ   Store Quad
Note that some of the floating-point data types are not accessible by instructions in this format, and instead require the use of the 48-bit Supplementary Memory-Reference instruction format.
The tenth through nineteenth lines of the diagram show the format of 32-bit memory-reference instructions that use the long base registers that are additionally provided.
These instructions are only allocated a limited amount of opcode space. The three-bit destination register field is interpreted as follows:
Value   Register used
0       0
1       1
2       8
3       9
4       16
5       17
6       24
7       25
The one-bit index register field indicates no indexing if 0; if it is 1, it indicates that the last register in the group of eight registers corresponding to the destination register is used: that is, register 7, 15, 23, or 31.
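The two short fields of these instructions can be decoded as in the sketch below. The closed-form expression for the destination register is simply a compact way of reproducing the table above; the function names are hypothetical.

```python
def long_base_dest_register(field: int) -> int:
    """Map the 3-bit destination field to register 0, 1, 8, 9, 16, 17, 24 or 25."""
    if not 0 <= field <= 7:
        raise ValueError("destination field must be 0..7")
    return (field >> 1) * 8 + (field & 1)


def long_base_index_register(dest: int, index_bit: int):
    """Return register 7, 15, 23 or 31 for the destination's group, or None."""
    if index_bit == 0:
        return None
    return (dest & 0b11000) | 7
```

For instance, a destination field of 5 selects register 17, and with the index bit set the index register is then register 23.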
Note that all eight long base registers are used as base registers; specifying zero in the base register field does not indicate absolute addressing.
These instructions use the opcodes, here shown with the prefix bits and the address postfix bits, if applicable, included:
10 1010 0 00             LB    Load Byte
10 1010 0 01             STB   Store Byte
10 1010 0 10             ULB   Unsigned Load Byte
10 1010 0 11             IB    Insert Byte
10 1010 1 00 ... 0       LH    Load Halfword
10 1010 1 01 ... 0       STH   Store Halfword
10 1010 1 10 ... 0       ULH   Unsigned Load Halfword
10 1010 1 11 ... 0       IH    Insert Halfword
10 1010 1 00 ... 01      L     Load
10 1010 1 01 ... 01      ST    Store
10 1010 1 10 ... 01      UL    Unsigned Load
10 1010 1 11 ... 01      I     Insert
10 1010 1 00 ... 011     LL    Load Long
10 1010 1 01 ... 011     STL   Store Long
10 10110 0 0             LC    Load Classic
10 10110 0 1             STC   Store Classic
10 10110 1 0 ... 0       LM    Load Medium
10 10110 1 1 ... 0       STM   Store Medium
10 10110 1 0 ... 01      LF    Load Floating
10 10110 1 1 ... 01      STF   Store Floating
10 10110 1 0 ... 011     LD    Load Double
10 10110 1 1 ... 011     STD   Store Double
10 10110 1 0 ... 0111    LQ    Load Quad (Extended)
10 10110 1 1 ... 0111    STQ   Store Quad
10 10110 1 0 ... 01111   LO    Load Octuple (Double Extended)
10 10110 1 1 ... 01111   STO   Store Octuple
Note that a different set of floating-point variable types are accessible through these instructions; since they only refer to aligned operands as a way of conserving opcode space, certain operand lengths are more convenient than others to use with this format.
The scheme of using the last few bits, which would otherwise be unused if only aligned operands are addressed, of the displacement field in an instruction to allow the same opcode bits to be used for instructions dealing with different data types was used by the Systems Engineering Laboratories SEL 32 computer, as I have acknowledged elsewhere for other designs of mine as well.
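The postfix scheme can be sketched as follows, reading the table above as "a zero preceded by some number of ones at the low end of the displacement field". The sketch assumes the postfix bits are treated as zero when the address is formed; that assumption, and the names used, are not taken from the text.

```python
# Number of trailing one bits -> operand type, as read from the opcode table.
INTEGER_TYPES = {0: "Halfword", 1: "Integer", 2: "Long"}
FLOAT_TYPES = {0: "Medium", 1: "Floating", 2: "Double", 3: "Quad", 4: "Octuple"}


def split_displacement(field: int):
    """Return (trailing_ones, displacement with the postfix bits cleared)."""
    ones = 0
    probe = field
    while probe & 1:          # count the ones below the terminating zero
        ones += 1
        probe >>= 1
    postfix_mask = (1 << (ones + 1)) - 1   # the ones plus the zero above them
    return ones, field & ~postfix_mask
```

A field ending in 01 thus gives one trailing one, selecting Integer or Floating, and a displacement that is a multiple of four bytes; a field ending in 0111 selects Quad with a displacement that is a multiple of sixteen.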
The twentieth and twenty-first lines show the formats of the conditional jump instructions. With these instructions, the condition code field can be thought of as part of the opcode of the instruction.
The instructions the format of which is shown in the twentieth line have an index register field, but they do not have a destination register field. The subroutine jump instructions which specify an index register, shown in the twenty-second line of the diagram, to be described below, do indicate a base register as the destination register in which the return address is stored, but this is also not relevant to indicating a group of eight arithmetic-index registers within the complement of thirty-two such registers.
Thus, for these instructions, the index register field either indicates no indexing, if it is zero, or indicates that the index register is one of registers 1 through 7. Since branching affects all subthreads, no attempt has been made to distribute the index register assignments through the four groups of eight arithmetic-index registers.
The opcodes of the conditional jump instructions described in the twentieth and twenty-first lines of the diagram are:
10 011100 0000 0   NOP   No Operation
10 011100 0001 0   JLT   Jump if Less Than
10 011100 0010 0   JEQ   Jump if Equal
10 011100 0011 0   JLE   Jump if Less than or Equal
10 011100 0100 0   JGT   Jump if Greater Than
10 011100 0101 0   JNE   Jump if Not Equal
10 011100 0110 0   JGE   Jump if Greater than or Equal
10 011100 0111 0   JMP   Jump
10 011100 1000 0   JV    Jump if oVerflow
10 011100 1001 0   JNV   Jump if No oVerflow
10 011100 1010 0   JC    Jump if Carry
10 011100 1011 0   JNC   Jump if No Carry
In addition, conditional jump instructions of the form 10 011100 cccc 1 are provided; the format of these instructions is shown in the twenty-first line of the diagram. These instructions use the long base registers, and may not be indexed.
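Reading the encodings listed above, the condition values 0000 through 0111 look like a three-bit mask in which one bit each stands for less than, equal, and greater than. That reading is an observation about the table, not something the text states, and the sketch below, with its hypothetical function name, rests on it.

```python
def condition_holds(cccc, lt=False, eq=False, gt=False,
                    overflow=False, carry=False):
    """Evaluate a 4-bit condition field against the condition code state."""
    if not 0 <= cccc <= 15:
        raise ValueError("condition field must be 0..15")
    if cccc < 8:
        # Assumed mask reading: bit 0 = less than, bit 1 = equal, bit 2 = greater than.
        return bool((cccc & 0b001 and lt) or
                    (cccc & 0b010 and eq) or
                    (cccc & 0b100 and gt))
    other = {0b1000: overflow, 0b1001: not overflow,
             0b1010: carry, 0b1011: not carry}
    if cccc not in other:
        raise ValueError("condition value not listed in the table")
    return other[cccc]
```

Under this reading, 0011 (JLE) succeeds when the result was equal, 0000 (NOP) never succeeds, and 0111 (JMP) succeeds whenever any one of the three comparison states holds.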
The twenty-second and twenty-third lines show the format of the subroutine jump instructions.
These instructions have the opcodes:
10 01110100     JSR      Jump to Subroutine
10 01110101     JSRLR    Jump to Subroutine Long Return
10 01110110     JSRYR    Jump to Subroutine Extended Return
10 0111100000   JLSRSR   Jump to Long Subroutine Short Return
10 0111100001   JLSR     Jump to Long Subroutine
10 0111100010   JLSRYR   Jump to Long Subroutine Extended Return
Thus, the return address may be placed in any of the three types of base register available, independently of the type of base register used for the address of the instruction.
The twenty-fourth line shows the format of the shift instructions.
The available instructions are:
10 10000 000   SLB    Shift Left Byte
10 10000 001   SRB    Shift Right Byte
10 10000 011   ASRB   Arithmetic Shift Right Byte
10 10000 100   ROLB   Rotate Left Byte
10 10000 101   RORB   Rotate Right Byte
10 10000 110   RLCB   Rotate Left through Carry Byte
10 10000 111   RRCB   Rotate Right through Carry Byte
10 10001 000   SLH    Shift Left Halfword
10 10001 001   SRH    Shift Right Halfword
10 10001 011   ASRH   Arithmetic Shift Right Halfword
10 10001 100   ROLH   Rotate Left Halfword
10 10001 101   RORH   Rotate Right Halfword
10 10001 110   RLCH   Rotate Left through Carry Halfword
10 10001 111   RRCH   Rotate Right through Carry Halfword
10 10010 000   SL     Shift Left
10 10010 001   SR     Shift Right
10 10010 011   ASR    Arithmetic Shift Right
10 10010 100   ROL    Rotate Left
10 10010 101   ROR    Rotate Right
10 10010 110   RLC    Rotate Left through Carry
10 10010 111   RRC    Rotate Right through Carry
10 10011 000   SLL    Shift Left Long
10 10011 001   SRL    Shift Right Long
10 10011 011   ASRL   Arithmetic Shift Right Long
10 10011 100   ROLL   Rotate Left Long
10 10011 101   RORL   Rotate Right Long
10 10011 110   RLCL   Rotate Left through Carry Long
10 10011 111   RRCL   Rotate Right through Carry Long
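The shift opcodes follow a regular pattern: two bits of the five-bit group select the operand size, and the final three bits select the operation. A small decoder illustrating this, with a hypothetical function name and taking the first ten bits of the instruction as an integer, might look like this:

```python
SHIFT_SIZES = {0b00: "B", 0b01: "H", 0b10: "", 0b11: "L"}
SHIFT_OPS = {0b000: "SL", 0b001: "SR", 0b011: "ASR", 0b100: "ROL",
             0b101: "ROR", 0b110: "RLC", 0b111: "RRC"}


def decode_shift(top10: int):
    """Return the shift mnemonic for the first ten instruction bits, or None."""
    prefix = (top10 >> 8) & 0b11
    group = (top10 >> 3) & 0b11111
    op = top10 & 0b111
    if prefix != 0b10 or (group >> 2) != 0b100:
        return None           # not in the shift opcode range
    if op not in SHIFT_OPS:
        return None           # 010 is not assigned in the table
    return SHIFT_OPS[op] + SHIFT_SIZES[group & 0b11]
```

For example, the bits 10 10001 011 decode to ASRH, and 10 10011 111 to RRCL.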
The twenty-fifth through twenty-seventh lines show the format of multiple-register load and store instructions. These are useful to save and restore the registers at the beginning and end of subroutines.
The twenty-fifth line shows the instructions that save and restore the arithmetic-index registers and the floating-point registers, as well as those that save and restore the four groups of eight base registers as a single unit of thirty-two base registers:
10 110000   LML    Load Multiple Long
10 110001   STML   Store Multiple Long
10 110010   LMQ    Load Multiple Quad
10 110011   STMQ   Store Multiple Quad
10 110100   LMB    Load Multiple Base
10 110101   STMB   Store Multiple Base
The twenty-sixth line shows the instructions that save and restore the short vector registers:
10 110110   LMSV    Load Multiple Short Vector
10 110111   STMSV   Store Multiple Short Vector
The twenty-seventh line shows the instructions that save and restore individual sets of eight base registers:
10 11100000   LMSB    Load Multiple Short Base
10 11100001   STMSB   Store Multiple Short Base
10 11100010   LMLB    Load Multiple Long Base
10 11100011   STMLB   Store Multiple Long Base
10 11100100   LMEB    Load Multiple Extended Base
10 11100101   STMEB   Store Multiple Extended Base
All the base registers are 64 bits wide.
The twenty-eighth line shows the format of the short vector arithmetic instructions, which will be described in detail in a later section.
The twenty-ninth line shows the format of three-address register-to-register arithmetic instructions. These allow more operations, and more flexibility in performing those operations, than is available from the 16-bit register-to-register arithmetic instructions. They will be described on a later page, as there are a considerable number of these instructions, covering a large number of data formats.
Note that the instructions in the thirtieth through thirty-fourth lines have a large number of opcode bits in their first 16 bits. This will allow all of these instructions to begin with the bits 10 11110, which are one of the few patterns left unused by the other 32-bit instructions.
The thirtieth line shows the format of additional two-address register-to-register arithmetic instructions; these are intended to provide sufficient opcode space for a wide variety of operations on a wide variety of data types.
The thirty-first line shows the format of instructions which perform various operations on a single operand contained in one register; these will also be described on a later page for the same reason as those the format of which was shown on the preceding two lines.
The thirty-second and thirty-third lines show the format of regular instructions which access the banks of 128 registers used primarily for DSP-type instructions.
These instructions are normal instructions, not DSP instructions, so they do not have a D bit, as their execution is delayed if necessary due to a dependency by means of the normal interlock mechanism. But they do have a U bit, because they can affect the contents of DSP registers, and thus a DSP instruction could be dependent on their completion.
For purposes of associating a U bit with its corresponding D bit, such an instruction behaves as a DSP instruction with the D bit set to zero.
The thirty-fourth line shows an instruction the purpose of which is to allow, with an overhead limited to 16 bits per instruction, instructions of lengths over 48 bits without complicating the decoding of instructions.
This instruction has the opcode:
10 11111 aaa 0 bbbbb
and would normally be inserted in a program by the assembler as a result of its encountering, later in the program, the mnemonic for an instruction with an augmented length.
Inserting longer instructions is done as follows:
The 32-bit instruction contains the first 16 bits of the special instruction to be inserted in the program code as its last 16 bits.
A five-bit field indicates how many other instructions are to be executed before the instruction being composed is to be considered reached. This count includes instructions following this special instruction itself within the same 256-bit block of code, but the position of the composed instruction must be in either the next 256-bit aligned block of code, or the one following, not in a later block, or in the same block as the instruction creating it.
The length field in the instruction shows how many additional 16-bit instruction syllables are part of the instruction being composed. They will be read from the end of the next 256-bit code block, thus excluding it from instruction decoding.
Note that multiple instructions of this type are permitted in the same block, in which case they will read multiple sequences of 16-bit instruction syllables starting from the end of the block and not overlapping. Such instructions, however, may not call for more than a total of sixteen syllables from one block.
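The way several such instructions share the tail of one block can be sketched as below. A 256-bit block holds sixteen 16-bit syllables. The sketch assumes the first request takes the syllables nearest the end of the block and later requests take those just before; the text only says the sequences start from the end and do not overlap, so this ordering, like the function name, is an assumption.

```python
SYLLABLES_PER_BLOCK = 256 // 16  # sixteen 16-bit syllables per block


def allocate_tails(lengths):
    """Assign each request a (start, end) syllable range, working back from the end.

    lengths: number of additional syllables wanted by each instruction, in order.
    Ranges are half-open, so (13, 16) means syllables 13, 14 and 15.
    """
    end = SYLLABLES_PER_BLOCK
    ranges = []
    for count in lengths:
        start = end - count
        if start < 0:
            raise ValueError("more than sixteen syllables requested from one block")
        ranges.append((start, end))
        end = start
    return ranges
```

Two requests for three and two syllables would then occupy syllables 13 to 15 and 11 to 12 of the block, leaving syllables 0 to 10 for ordinary instruction decoding.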
The intent is to allow instructions of odd lengths to be specified while ensuring that the difficult parts of identifying them are taken care of ahead of the normal fetch and decode cycle for a block of 256 bits of instruction code.
While this scheme definitely has taken inspiration from the "Heads and Tails" instruction format proposal of Heidi Pan, it differs from that scheme in many important respects.
There are three basic types of instructions immediately envisaged which require this technique, or some other technique, to be available so as to allow instructions to be provided outside the range of the three instruction lengths provided normally:
Note that the first two cases, due to a lack of available opcode space among 32-bit instructions, among other factors, also apply to the case where the immediate operand, or the predicated instruction, is 16 bits long, leading to an instruction that is 32 bits long instead of 48 bits long.
The thirty-fifth line shows the format of the first type of 48-bit instruction that we encounter, the string instructions.
The available string instructions are:
11 00000   CC    Compare Character
11 00001   T     Translate
11 00010   MVC   Move Character
11 00011   TT    Translate and Test
11 00100   P     Pack
11 00101   UP    Unpack
11 00110   E     Edit
11 00111   EM    Edit and Mark
For the Translate instruction, the source operand is the translate table, 256 bytes in length, and the destination operand is the string to be transformed in place.
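The effect of Translate can be sketched in a few lines: each byte of the destination string is used as an index into the 256-byte table and replaced by the byte found there. The function name is hypothetical, and the sketch models only the data transformation, not operand lengths or condition codes.

```python
def translate_in_place(table: bytes, data: bytearray) -> None:
    """Replace every byte of data by the table entry it indexes."""
    if len(table) != 256:
        raise ValueError("the translate table must be 256 bytes long")
    for position, value in enumerate(data):
        data[position] = table[value]
```

For example, with a table that maps the ASCII lower-case letters to upper case and leaves every other byte unchanged, the string "risc 360" becomes "RISC 360".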
The thirty-sixth line shows the format of the packed decimal instructions:
11 01000   CP    Compare Packed
11 01010   MVP   Move Packed
11 01100   AP    Add Packed
11 01101   SP    Subtract Packed
11 01110   MP    Multiply Packed
11 01111   DP    Divide Packed
The thirty-seventh line shows the format of the long vector register-to-register instructions, and the thirty-eighth and thirty-ninth lines show the formats of the long vector memory-reference instructions. As there are quite a few of these, they will also be dealt with on a later page.