
Basic Opcodes

As seen on the previous page, the instruction formats for this architecture are as shown below:

The 16-bit instructions

The first line shows the format of sixteen-bit register-to-register instructions.

These operate on banks of 32 registers. The source register is indicated by a field that is only three bits long; this is achieved by requiring that the source and destination registers both belong to the same one of four groups of eight registers: registers 0 to 7, registers 8 to 15, registers 16 to 23, and registers 24 to 31.
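As a minimal sketch of this register grouping, the following Python function recovers the full source register number from the destination register and the three-bit source field. The function name and the assumption that the field holds the low three bits of the register number are illustrative only; the text above does not spell out the encoding.

```python
def source_register(dest: int, src_field: int) -> int:
    """Combine the group of the destination register with the 3-bit source field.

    Assumption: the 3-bit field holds the low three bits of the source
    register number, and the group is given by the top two bits of the
    5-bit destination register number.
    """
    if not 0 <= dest <= 31:
        raise ValueError("destination register must be 0..31")
    if not 0 <= src_field <= 7:
        raise ValueError("source field must be 0..7")
    group_base = dest & 0b11000  # 0, 8, 16 or 24
    return group_base | src_field
```

For example, with destination register 13 (in the group 8 to 15) and a source field of 2, the source register would be register 10.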

One bit in the instruction, as with many RISC architectures, is reserved for indicating if the instruction sets the condition codes. As a result, only six opcode bits are available, and thus the instruction format is used for instructions dealing only with the most common integer and floating-point types:

 0000  0001  0010  0011  0100  0101  0110  0111
 SWH   IH    SW    I     SWM   SWF   SWD   SWQ   000
 LH    ULH   L     UL    LM    LF    LD    LQ    010
 STH   XH    ST    X     STM   STF   STD   STQ   011
 AH    NH    A     N     AM    AF    AD    AQ    100
 SH    OH    S     O     SM    SF    SD    SQ    101
 MH    MEH   M     ME    MM    MF    MD    MQ    110
 DH    DEH   D     DE    DM    DF    DD    DQ    111

applying the types:


H  Halfword   16 bits
   Integer    32 bits

M  Medium     48 bits (16-bit aligned)
F  Floating   32 bits
D  Double     64 bits
Q  Quad      128 bits

to the operations:

SW  Swap

L   Load
ST  Store
A   Add
S   Subtract
M   Multiply
D   Divide

I   Insert

UL  Unsigned Load
X   Exclusive OR
O   OR
ME  Multiply Extensibly
DE  Divide Extensibly

As on the System/360, Load performs sign extension of fixed-point quantities. Insert instead loads the operand into the least significant part of the destination register without changing the most significant part, and Unsigned Load clears the most significant part.
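The difference between the three kinds of load can be sketched as follows. This is only an illustration: the function names are hypothetical, and the default register width of 64 bits is an assumption, not something stated in this section.

```python
def load(value: int, op_bits: int, reg_bits: int = 64) -> int:
    """Load: sign-extend an op_bits-wide operand to the register width."""
    value &= (1 << op_bits) - 1
    if value & (1 << (op_bits - 1)):      # sign bit of the operand is set
        value -= 1 << op_bits             # make it negative
    return value & ((1 << reg_bits) - 1)  # two's complement in reg_bits


def unsigned_load(value: int, op_bits: int, reg_bits: int = 64) -> int:
    """Unsigned Load: the most significant part of the register is cleared."""
    return value & ((1 << op_bits) - 1)


def insert(old_reg: int, value: int, op_bits: int, reg_bits: int = 64) -> int:
    """Insert: replace only the least significant op_bits of the register."""
    low_mask = (1 << op_bits) - 1
    reg_mask = (1 << reg_bits) - 1
    return (old_reg & reg_mask & ~low_mask) | (value & low_mask)
```

Under these assumptions, loading the halfword 0x8000 gives 0xFFFFFFFFFFFF8000 with Load and 0x0000000000008000 with Unsigned Load, while Insert leaves the upper 48 bits of the register as they were.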

Multiply Extensibly produces a product that is twice the length of its terms.

Divide Extensibly involves a dividend that is twice the length of the divisor; both the quotient and the remainder are also twice as long as the divisor. The remainder is placed in the destination register, and the quotient in the following register. The exception is Divide Extensibly Long (not part of this group of instructions, but available in other instruction formats): as its divisor is a 64-bit integer, the dividend, remainder, and quotient are all contained in register pairs.

Instructions that have the same register as source and destination are not allowed in this format, permitting a small number of additional 16-bit instructions.

The second line of the diagram shows an instruction which sets one of seven predication flags based on the status of the condition codes. The field in this instruction that indicates the condition to test for has the same interpretation as that in the conditional branch instructions.

The third line of the diagram shows an instruction whose purpose is to indicate, for DSP instructions, which U bit (marking an instruction upon which there is a dependency) corresponds to which D bit (marking a dependent instruction). The offset field contains a value from 0 to 7. It gives the number of set D bits to skip, among those following this instruction, before reaching the set D bit that corresponds to the set U bit in question; that U bit may have preceded this instruction.

It is important to re-establish this synchronization at those points in the code which serve as branch targets, or dependencies will not be processed correctly.

The configuration field is used to speed decoding of the following block of instructions, if that block is composed of five 48-bit DSP instructions and one 16-bit instruction (presumably another instruction of this type). Its contents indicate as follows how the next block will be laid out:

0 [1][   3   ][   3   ][   3   ][   3   ][   3   ]
1 [   3   ][1][   3   ][   3   ][   3   ][   3   ]
2 [   3   ][   3   ][1][   3   ][   3   ][   3   ]
3 [   3   ][   3   ][   3   ][1][   3   ][   3   ]
4 [   3   ][   3   ][   3   ][   3   ][1][   3   ]
5 [   3   ][   3   ][   3   ][   3   ][   3   ][1]
6 [1][   3   ][1][   3   ][1][   3   ][1][   3   ]

The values from 0 to 5 allow a single 16-bit instruction to be in any position, and the value 6 permits a block of DSP instructions to have multiple branch targets within it.

If the configuration field contains 7, this means that no layout is imposed on the following block.
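The layouts selected by the configuration field can be summarized in a few lines of Python. This is only a sketch of the table above; the function name is hypothetical, and lengths are given in 16-bit syllables, so that every imposed layout adds up to the 16 syllables of a 256-bit block.

```python
def block_layout(config: int):
    """Return the instruction lengths, in 16-bit syllables, for the next block.

    Returns None when the configuration field is 7 (no layout imposed).
    """
    if not 0 <= config <= 7:
        raise ValueError("configuration field is three bits long")
    if config == 7:
        return None
    if config == 6:
        return [1, 3] * 4          # alternating 16-bit and 48-bit instructions
    layout = [3] * 5               # five 48-bit instructions...
    layout.insert(config, 1)       # ...with one 16-bit instruction at position config
    return layout
```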

This instruction may be placed at any position in the block because if there is a branch target within DSP code, it should be this instruction rather than the following DSP instruction to which the branch points. That is because the length of instructions, being indicated at the start of an instruction rather than the end, can only be determined moving forwards from a given address known to be the start of an instruction; the address portion of a 32-bit instruction can also be a valid 16-bit instruction, for example.

As may be expected, this implies that this instruction only speeds the decoding of the instructions in the following block when it is reached by fall-through; when an instruction within a block is reached by a branch, the lengths of the instructions from the branch point onwards within that block must be determined in the conventional sequential manner.

This instruction is not defined as overriding the normal length indications. If the leading bits of the opcodes of the instructions in the next block do not correspond to the lengths indicated, an error condition may result, but the behavior is undefined: for example, there may be a fall-back to normal sequential decoding, or the lengths may be treated as an override.

The reason the behavior is left undefined instead of being specified is to permit the most efficient implementation of the normal case, when the actual lengths do match the indication, so that the maximum gains, or minimum losses, in speed result from the use of the instruction, whatever techniques may be used for the implementation of the architecture.

This is one of two ways in which instructions may be arranged to permit rapid decoding, without the need to serially proceed through instructions one at a time to determine from one instruction where the next one begins. In addition to rapid decoding, this is intended to be used with those 48-bit instructions that contain a U bit and a D bit, thus explicitly indicating instructions that may be executed in parallel, allowing functionality similar to that provided in DSP designs.

It should also be noted that, because the length is indicated in a simple fashion, with instructions beginning with 0 being 16 bits long, instructions beginning with 10 being 32 bits long, and instructions beginning with 11 being 48 bits long, determining where each instruction begins in a number of instructions in a wide fetch from memory should only take a limited number of gate delays, as opposed to multiple cycles.
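The length rule can be modelled in software as follows. This sketch walks through the syllables sequentially, which is exactly what a hardware decoder would try to avoid, but it shows the rule itself; the function names are hypothetical.

```python
def instruction_length_bits(first_syllable: int) -> int:
    """Length of an instruction, from the leading bits of its first 16 bits."""
    if not 0 <= first_syllable <= 0xFFFF:
        raise ValueError("a syllable is 16 bits long")
    if first_syllable >> 15 == 0:        # prefix 0
        return 16
    if first_syllable >> 14 == 0b10:     # prefix 10
        return 32
    return 48                            # prefix 11


def instruction_starts(syllables):
    """Return the syllable index at which each instruction begins."""
    starts = []
    i = 0
    while i < len(syllables):
        starts.append(i)
        i += instruction_length_bits(syllables[i]) // 16
    return starts
```

For instance, the syllable sequence 0x0000, 0x8000, 0x0000, 0xC000, 0x0000, 0x0000, 0x1234 contains a 16-bit, a 32-bit, a 48-bit and a final 16-bit instruction, beginning at syllables 0, 1, 3 and 6.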

The other way in which instructions can be arranged to allow for more rapid decoding is to use only 32-bit instructions, possibly accompanied by 16-bit instructions in pairs. This is a feasible option because the 32-bit instructions alone are a reasonably complete instruction set, and it provides performance comparable to conventional RISC architectures.

It is intended that implementations will attempt to accelerate decoding both in the case of exclusive sequences of 32-bit instructions aligned on 32-bit boundaries, and exclusive sequences of 32-bit instructions which begin on odd 16-bit boundaries, so that occasional 48-bit or 16-bit instructions in a program primarily composed of 32-bit instructions will not slow down the decoding of instructions in later blocks, without the need for padding with 16-bit no-operation instructions.

The 32-bit instructions

The fourth through ninth lines of the diagram show the format of those basic memory-reference instructions which utilize the short base registers. These instructions may reference unaligned operands, and they may use any register as their destination register.

Instructions of this format begin with the prefix bits 10, indicating an instruction 32 bits in length.

Next is a six-bit opcode; for reasons of pipeline efficiency, as well as to allow access to as many data types as possible, following RISC practice, this format is used only for load and store instructions. However, in addition to the normal Load instruction, which performs sign extension for integer operands, Unsigned Load and Insert are provided as well for integer data types shorter than the arithmetic-index registers.

Next is a five-bit field indicating the destination register for the instruction. In the case of store instructions, the register indicated by that field serves as the source operand, of course.

Then there is a three-bit index register field. It is interpreted as follows:

Field   Register used for indexing

  0     No indexing
  1      1
  2      2
  3      3
  4      4/12/20/28
  5      5/13/21/29
  6      6/14/22/30
  7      7/15/23/31

Thus, the value zero indicates that the address is not indexed; other values in the first half refer to index registers shared among all subthreads working in different groups of eight registers, and values in the second half refer to index registers in the same group of eight registers as the destination register (or, in the case of floating-point instructions, the corresponding group of eight registers).
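The table can be expressed as a short Python function. The function name is hypothetical, and the destination register is taken as a plain number from 0 to 31 (for floating-point instructions, the corresponding group of integer registers would be used instead).

```python
def index_register(field: int, dest: int):
    """Return the index register selected by the 3-bit field, or None.

    Values 1 to 3 select the shared registers 1 to 3; values 4 to 7 select
    a register in the same group of eight as the destination register.
    """
    if not 0 <= field <= 7:
        raise ValueError("index field is three bits long")
    if not 0 <= dest <= 31:
        raise ValueError("destination register must be 0..31")
    if field == 0:
        return None                      # no indexing
    if field <= 3:
        return field                     # shared among all groups
    return (dest & 0b11000) | field      # same group as the destination
```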

The basic memory-reference instruction format occupies six lines of the diagram as there are six possible formats for the address constant portion of these instructions, as well as all other instructions which reference main memory with the aid of the short base registers.

In the first address constant format, the fifteen-bit address field is an unsigned positive displacement which is added to the contents of base register zero to form the effective address of the instruction (additionally subject to indexing, if specified).

The second address constant format also uses base register zero. In this case, the eleven-bit address field of the instruction is shifted left by two bits (in the case of 32-bit addressing) or three bits (in the case of 64-bit addressing) before being added to the contents of base register zero to indicate the memory location containing the address to use for the instruction itself. If indexing is indicated, the contents of the index register are then added to that address to form the effective address of the instruction.

This is post-indexed indirect addressing; it is provided so that a program can make use of multiple large arrays without having to reserve one base register as a pointer to each array.
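The calculation for this second format can be sketched as below. Memory is modelled, purely for illustration, as a Python dictionary mapping an address to the pointer stored there; that model and all the names are assumptions of the sketch.

```python
def post_indexed_indirect(memory, base0, address_field, index_value=0, addr_64bit=False):
    """Effective address for the post-indexed indirect address format.

    memory:        dict mapping an address to the pointer word stored there
                   (a hypothetical model of main memory)
    base0:         contents of base register zero
    address_field: the eleven-bit field from the instruction
    index_value:   contents of the index register, or 0 if not indexed
    """
    if not 0 <= address_field < (1 << 11):
        raise ValueError("address field is eleven bits long")
    shift = 3 if addr_64bit else 2              # pointers are 8 or 4 bytes wide
    pointer_location = base0 + (address_field << shift)
    pointer = memory[pointer_location]          # fetch the array's address
    return pointer + index_value                # indexing is applied afterwards
```

In this way a table of array addresses can sit in the local storage addressed by base register zero, with each array reached through one entry of the table.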

In the third address constant format, the eleven-bit displacement field contains a signed two's complement number which is relative to the contents of the program counter. A value of zero corresponds to the address of the byte immediately following the end of the instruction.

The fourth address constant format allows an address contained in an integer register to be used instead of one produced by adding a displacement to the content of a base register; this is register indirect addressing. The fifth address constant format is similar, but also causes the contents of that register to be incremented by the size of the operand after use; this is auto-increment register indirect addressing.

The sixth address constant format uses base registers 1 through 7; their contents are added to the twelve-bit address field to form the effective address of the instruction.

It is intended that base register zero will be used to point to the local storage of a program, and the other base registers from 1 to 7 will be used for tasks such as passing parameters when calling subroutines, accessing storage shared between routines, and so on.

Allowing an address constant that includes a base register specification to fit into 16 bits, while still allowing the displacement field to be 15 bits long in one case, instead of only 12 bits long as for most choices of base register, is, of course, inspired by the IBM System/360 Model 20. As well, the additional addressing modes of register indirect addressing and auto-increment register indirect addressing take their inspiration from those offered on the DEC PDP-11 computer and the later TI 9900 microprocessor.

The instructions provided in this format are:

10 000000 LB    Load Byte                    (8-bit integer)
10 000001 STB   Store Byte
10 000010 ULB   Unsigned Load Byte
10 000011 IB    Insert Byte

10 000100 LH    Load Halfword                (16-bit integer)
10 000101 STH   Store Halfword
10 000110 ULH   Unsigned Load Halfword
10 000111 IH    Insert Halfword

10 001000 L     Load                         (32-bit integer)
10 001001 ST    Store
10 001010 UL    Unsigned Load
10 001011 I     Insert

10 001100 LL    Load Long                    (64-bit integer)
10 001101 STL   Store Long

10 010000 LC    Load Classic                 (40-bit float)
10 010001 STC   Store Classic

10 010010 LM    Load Medium                  (48-bit float)
10 010011 STM   Store Medium

10 010100 LF    Load Floating                (32-bit float)
10 010101 STF   Store Floating

10 010110 LD    Load Double                  (64-bit float)
10 010111 STD   Store Double

10 011000 LQ    Load Quad (Extended)         (128-bit float)
10 011001 STQ   Store Quad

Note that some of the floating-point data types are not accessible by instructions in this format, and instead require the use of the 48-bit Supplementary Memory-Reference instruction format.

The tenth through nineteenth lines of the diagram show the format of 32-bit memory-reference instructions that use the long base registers that are additionally provided.

These instructions are only allocated a limited amount of opcode space. The three-bit destination register field is interpreted as follows:

Value   Register used
 0       0
 1       1
 2       8
 3       9
 4      16
 5      17
 6      24
 7      25

The one-bit index register field indicates no indexing if 0; if it is 1, it indicates that the seventh register in the group of eight registers corresponding to the destination register is used: that is, either register 7, 15, 23, or 31.
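Both fields follow a simple pattern, which the following sketch makes explicit; the function name is hypothetical.

```python
def long_format_registers(dest_field: int, index_bit: int):
    """Decode the 3-bit destination field and the 1-bit index field.

    Returns (destination register, index register or None).
    """
    if not 0 <= dest_field <= 7:
        raise ValueError("destination field is three bits long")
    group = dest_field >> 1                    # which group of eight registers
    dest = group * 8 + (dest_field & 1)        # register 0 or 1 of that group
    index = group * 8 + 7 if index_bit else None
    return dest, index
```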

Note that all eight long base registers are used as base registers; specifying zero in the base register field does not indicate absolute addressing.

These instructions use the opcodes, here shown with the prefix bits and the address postfix bits, if applicable, included:

10 1010 0 00              LB     Load Byte
10 1010 0 01              STB    Store Byte
10 1010 0 10              ULB    Unsigned Load Byte
10 1010 0 11              IB     Insert Byte

10 1010 1 00 ...     0    LH     Load Halfword
10 1010 1 01 ...     0    STH    Store Halfword       
10 1010 1 10 ...     0    ULH    Unsigned Load Halfword
10 1010 1 11 ...     0    IH     Insert Halfword

10 1010 1 00 ...    01    L      Load
10 1010 1 01 ...    01    ST     Store   
10 1010 1 10 ...    01    UL     Unsigned Load
10 1010 1 11 ...    01    I      Insert

10 1010 1 00 ...   011    LL     Load Long
10 1010 1 01 ...   011    STL    Store Long

10 10110 0 0              LC     Load Classic
10 10110 0 1              STC    Store Classic

10 10110 1 0 ...     0    LM     Load Medium
10 10110 1 1 ...     0    STM    Store Medium       

10 10110 1 0 ...    01    LF     Load Floating
10 10110 1 1 ...    01    STF    Store Floating       

10 10110 1 0 ...   011    LD     Load Double
10 10110 1 1 ...   011    STD    Store Double      

10 10110 1 0 ...  0111    LQ     Load Quad (Extended)
10 10110 1 1 ...  0111    STQ    Store Quad       

10 10110 1 0 ... 01111    LO     Load Octuple (Double Extended)
10 10110 1 1 ... 01111    STO    Store Octuple      

Note that a different set of floating-point variable types are accessible through these instructions; since they only refer to aligned operands as a way of conserving opcode space, certain operand lengths are more convenient than others to use with this format.
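The way the trailing bits of the displacement select the operand type can be sketched as follows. The sketch assumes that the marker bits are simply cleared to obtain the aligned displacement; that detail, and the names used, are not stated in the text and are only illustrative.

```python
INTEGER_TYPES = ["Halfword", "Integer", "Long"]
FLOAT_TYPES = ["Medium", "Floating", "Double", "Quad", "Octuple"]


def trailing_ones(displacement: int) -> int:
    """Count the 1 bits at the least significant end of the displacement."""
    if displacement < 0:
        raise ValueError("displacement field is treated as unsigned here")
    count = 0
    while displacement & 1:
        count += 1
        displacement >>= 1
    return count


def decode_aligned_operand(displacement: int, types):
    """Return (type name, displacement with the marker bits cleared)."""
    n = trailing_ones(displacement)
    if n >= len(types):
        raise ValueError("no operand type assigned to this bit pattern")
    marker_mask = (1 << (n + 1)) - 1     # the n ones and the zero above them
    return types[n], displacement & ~marker_mask
```

Under this assumption a displacement ending in 0 is aligned on a 16-bit boundary, one ending in 01 on a 32-bit boundary, one ending in 011 on a 64-bit boundary, and so on, which matches the sizes of the types involved.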

The scheme of using the last few bits, which would otherwise be unused if only aligned operands are addressed, of the displacement field in an instruction to allow the same opcode bits to be used for instructions dealing with different data types was used by the Systems Engineering Laboratories SEL 32 computer, as I have acknowledged elsewhere for other designs of mine as well.

The twentieth and twenty-first lines show the formats of the conditional jump instructions. With these instructions, the condition code field can be thought of as part of the opcode of the instruction.

The instructions the format of which is shown in the twentieth line have an index register field, but they do not have a destination register field. The subroutine jump instructions which specify an index register, shown in the twenty-second line of the diagram, to be described below, do indicate a base register as the destination register in which the return address is stored, but this is also not relevant to indicating a group of eight arithmetic-index registers within the complement of thirty-two such registers.

Thus, for these instructions, the index register field either indicates no indexing, if it is zero, or indicates that the index register is one of registers 1 through 7. Since branching affects all subthreads, no attempt has been made to distribute the index register assignments through the four groups of eight arithmetic-index registers.

The opcodes of the conditional jump instructions described in the twentieth and twenty-first lines of the diagram are:

10 011100 0000 0  NOP  No Operation
10 011100 0001 0  JLT  Jump if Less Than
10 011100 0010 0  JEQ  Jump if Equal
10 011100 0011 0  JLE  Jump if Less than or Equal
10 011100 0100 0  JGT  Jump if Greater Than
10 011100 0101 0  JNE  Jump if Not Equal
10 011100 0110 0  JGE  Jump if Greater than or Equal
10 011100 0111 0  JMP  Jump

10 011100 1000 0  JV   Jump if oVerflow
10 011100 1001 0  JNV  Jump if No oVerflow
10 011100 1010 0  JC   Jump if Carry
10 011100 1011 0  JNC  Jump if No Carry
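The condition field values listed above can be collected into a lookup table, as in this sketch. The second function rests on an inference from the pattern of the first eight entries, namely that the three low bits act as a mask over the less-than, equal, and greater-than outcomes; this is not stated in the text and should be read as an assumption.

```python
CONDITIONS = {
    0b0000: "NOP", 0b0001: "JLT", 0b0010: "JEQ", 0b0011: "JLE",
    0b0100: "JGT", 0b0101: "JNE", 0b0110: "JGE", 0b0111: "JMP",
    0b1000: "JV",  0b1001: "JNV", 0b1010: "JC",  0b1011: "JNC",
}


def comparison_branch_taken(cccc: int, lt: bool, eq: bool, gt: bool) -> bool:
    """Whether a comparison-based jump (cccc from 0 to 7) would be taken.

    Assumption: bit 0 selects 'less than', bit 1 'equal', bit 2 'greater
    than', and the jump is taken if any selected outcome holds.
    """
    if not 0 <= cccc <= 7:
        raise ValueError("only values 0..7 test the comparison result")
    outcome = (1 if lt else 0) | (2 if eq else 0) | (4 if gt else 0)
    return bool(cccc & outcome)
```

Read this way, NOP is a jump on no condition at all, and JMP is a jump on every possible outcome of a comparison.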

In addition, conditional jump instructions of the form 10 011100 cccc 1 are provided; the format of these instructions is shown in the twenty-first line of the diagram. These instructions use the long base registers, and may not be indexed.

The twenty-second and twenty-third lines show the format of the subroutine jump instructions.

These instructions have the opcodes:

10 01110100       JSR      Jump to Subroutine
10 01110101       JSRLR    Jump to Subroutine Long Return
10 01110110       JSRYR    Jump to Subroutine Extended Return

10 0111100000     JLSRSR   Jump to Long Subroutine Short Return
10 0111100001     JLSR     Jump to Long Subroutine
10 0111100010     JLSRYR   Jump to Long Subroutine Extended Return

thus, the return address may be placed in any of the three types of base register available, independently of the type of base register used for the address of the instruction.

The twenty-fourth line shows the format of the shift instructions.

The available instructions are:

10 10000 000   SLB     Shift Left Byte
10 10000 001   SRB     Shift Right Byte

10 10000 011   ASRB    Arithmetic Shift Right Byte

10 10000 100   ROLB    Rotate Left Byte
10 10000 101   RORB    Rotate Right Byte
10 10000 110   RLCB    Rotate Left through Carry Byte
10 10000 111   RRCB    Rotate Right through Carry Byte

10 10001 000   SLH     Shift Left Halfword
10 10001 001   SRH     Shift Right Halfword

10 10001 011   ASRH    Arithmetic Shift Right Halfword

10 10001 100   ROLH    Rotate Left Halfword
10 10001 101   RORH    Rotate Right Halfword
10 10001 110   RLCH    Rotate Left through Carry Halfword
10 10001 111   RRCH    Rotate Right through Carry Halfword

10 10010 000   SL      Shift Left
10 10010 001   SR      Shift Right

10 10010 011   ASR     Arithmetic Shift Right

10 10010 100   ROL     Rotate Left
10 10010 101   ROR     Rotate Right
10 10010 110   RLC     Rotate Left through Carry
10 10010 111   RRC     Rotate Right through Carry

10 10011 000   SLL     Shift Left Long
10 10011 001   SRL     Shift Right Long

10 10011 011   ASRL    Arithmetic Shift Right Long

10 10011 100   ROLL    Rotate Left Long
10 10011 101   RORL    Rotate Right Long
10 10011 110   RLCL    Rotate Left through Carry Long
10 10011 111   RRCL    Rotate Right through Carry Long
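The shift opcodes are regular enough that the mnemonic can be built from two fields, as this sketch shows. It takes the eight opcode bits that follow the 10 prefix; the function and table names are hypothetical.

```python
SHIFT_OPS = {
    0b000: "SL",  0b001: "SR",  0b011: "ASR",
    0b100: "ROL", 0b101: "ROR", 0b110: "RLC", 0b111: "RRC",
}
SIZE_SUFFIX = {0b00: "B", 0b01: "H", 0b10: "", 0b11: "L"}


def shift_mnemonic(opcode: int) -> str:
    """Decode the eight bits 100ss ooo that follow the 10 prefix."""
    if not 0 <= opcode <= 0xFF or opcode >> 5 != 0b100:
        raise ValueError("not a shift opcode")
    size = (opcode >> 3) & 0b11      # byte, halfword, 32-bit or long
    op = opcode & 0b111              # kind of shift or rotate
    if op not in SHIFT_OPS:
        raise ValueError("operation 010 is not assigned")
    return SHIFT_OPS[op] + SIZE_SUFFIX[size]
```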

The twenty-fifth through twenty-seventh lines show the format of multiple-register load and store instructions. These are useful to save and restore the registers at the beginning and end of subroutines.

The twenty-fifth line shows the instructions that save and restore the arithmetic-index registers and the floating-point registers, as well as those that save and restore the four groups of eight base registers as a single unit of thirty-two base registers:

10 110000   LML    Load Multiple Long
10 110001   STML   Store Multiple Long
10 110010   LMQ    Load Multiple Quad
10 110011   STMQ   Store Multiple Quad
10 110100   LMB    Load Multiple Base
10 110101   STMB   Store Multiple Base

The twenty-sixth line shows the instructions that save and restore the short vector registers:

10 110110   LMSV   Load Multiple Short Vector
10 110111   STMSV  Store Multiple Short Vector

The twenty-seventh line shows the instructions that save and restore individual sets of eight base registers:

10 11100000   LMSB    Load Multiple Short Base
10 11100001   STMSB   Store Multiple Short Base
10 11100010   LMLB    Load Multiple Long Base
10 11100011   STMLB   Store Multiple Long Base
10 11100100   LMEB    Load Multiple Extended Base
10 11100101   STMEB   Store Multiple Extended Base

All the base registers are 64 bits wide.

The twenty-eighth line shows the format of the short vector arithmetic instructions, which will be described in detail in a later section.

The twenty-ninth line shows the format of three-address register-to-register arithmetic instructions. These allow more operations, and more flexibility in performing those operations, than is available from the 16-bit register-to-register arithmetic instructions. They will be described on a later page, as there are a considerable number of these instructions, covering a large number of data formats.

Note that the instructions in the thirtieth through thirty-fourth lines have a large number of opcode bits in their first 16 bits. This will allow all of these instructions to begin with the bits 10 11110, which is one of the few patterns left unused by the other 32-bit instructions.

The thirtieth line shows the format of additional two-address register-to-register arithmetic instructions; these are intended to provide sufficient opcode space for a wide variety of operations on a wide variety of data types.

The thirty-first line shows the format of instructions which perform various operations on a single operand contained in one register; these will also be described on a later page for the same reason as those the format of which was shown on the preceding two lines.

The thirty-second and thirty-third lines show the format of regular instructions which access the banks of 128 registers used primarily for DSP-type instructions.

These instructions are normal instructions, not DSP instructions, so they do not have a D bit, as their execution is delayed if necessary due to a dependency by means of the normal interlock mechanism. But they do have a U bit, because they can affect the contents of DSP registers, and thus a DSP instruction could be dependent on their completion.

For purposes of associating a U bit with its corresponding D bit, such an instruction behaves as a DSP instruction with the D bit set to zero.

The thirty-fourth line shows an instruction the purpose of which is to allow, with an overhead limited to 16 bits per instruction, instructions of lengths over 48 bits without complicating the decoding of instructions.

This instruction has the opcode:

10 11111 aaa 0 bbbbb

and would normally be inserted in a program by the assembler as the result of its later encountering the mnemonic for an instruction with an augmented length.

Inserting longer instructions is done as follows:

The 32-bit instruction contains the first 16 bits of the special instruction to be inserted in the program code as its last 16 bits.

A five-bit field indicates how many other instructions are to be executed before the instruction being composed is to be considered reached. This count includes instructions following this special instruction itself within the same 256-bit block of code, but the position of the composed instruction must be in either the next 256-bit aligned block of code, or the one following, not in a later block, or in the same block as the instruction creating it.

The length field in the instruction shows how many additional 16-bit instruction syllables are part of the instruction being composed. They will be read from the end of the next 256-bit code block, thus excluding it from instruction decoding.

Note that multiple instructions of this type are permitted in the same block, in which case they will read multiple sequences of 16-bit instruction syllables starting from the end of the block and not overlapping. Such instructions, however, may not call for more than a total of sixteen syllables from one block.
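How several such instructions might share the tail of one block can be sketched as follows. The order of allocation (the first instruction taking the syllables nearest the end of the block) is an assumption made for the sake of the example, as are the names; the text only requires that the sequences start from the end, do not overlap, and total no more than sixteen syllables.

```python
def allocate_tail_syllables(lengths, block_syllables: int = 16):
    """Assign non-overlapping syllable ranges, working back from the block end.

    lengths: number of additional 16-bit syllables requested by each
             instruction of this type, in program order.
    Returns a list of (start, end) syllable indexes, end exclusive.
    """
    ranges = []
    end = block_syllables
    for n in lengths:
        start = end - n
        if start < 0:
            raise ValueError("more than sixteen syllables requested from one block")
        ranges.append((start, end))
        end = start                  # the next sequence sits just before this one
    return ranges
```

With two instructions asking for two and three syllables, this gives syllables 14 to 15 to the first and 11 to 13 to the second, leaving syllables 0 to 10 for normally decoded instructions.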

The intent is to allow instructions of odd lengths to be specified while ensuring that the difficult parts of identifying them are taken care of ahead of the normal fetch and decode cycle for a block of 256 bits of instruction code.

While this scheme definitely has taken inspiration from the "Heads and Tails" instruction format proposal of Heidi Pan, it differs from that scheme in many important respects.

There are three basic types of instructions immediately envisaged which require this technique, or some other technique, to be available so as to allow instructions to be provided outside the range of the three instruction lengths provided normally:

Note that the first two cases, due to a lack of available opcode space among 32-bit instructions, among other factors, also apply to the case where the immediate operand, or the predicated instruction, is 16 bits long, leading to an instruction that is 32 bits long instead of 48 bits long.

The 48-bit instructions

The thirty-fifth line shows the format of the first type of 48-bit instruction that we encounter, the string instructions.

The available string instructions are:

11 00000   CC   Compare Character
11 00001   T    Translate
11 00010   MVC  Move Character
11 00011   TT   Translate and Test
11 00100   P    Pack
11 00101   UP   Unpack
11 00110   E    Edit
11 00111   EM   Edit and Mark

For the Translate instruction, the source operand is the translate table, 256 bytes in length, and the destination operand is the string to be transformed in place.

The thirty-sixth line shows the format of the packed decimal instructions:

11 01000   CP   Compare Packed

11 01010   MVP  Move Packed

11 01100   AP   Add Packed
11 01101   SP   Subtract Packed
11 01110   MP   Multiply Packed
11 01111   DP   Divide Packed

The thirty-seventh line shows the format of the long vector register-to-register instructions, and the thirty-eighth and thirty-ninth lines show the formats of the long vector memory-reference instructions. As there are quite a few of these, they will also be dealt with on a later page.
