The Opcodes

Having worked out, on the previous page, how to fit everything required for the instruction set into the space available, I can now show the instruction formats for the addressing modes that have survived:

The diagram is divided into three boxes, labelled DSP, RISC, and CISC.

The next step is to ask if the instructions for every operation of which I wish the architecture to be capable will really fit into the available space.

Some minor changes have been made to the CISC mode compared to the depiction on the previous page. The seven-bit opcodes for the vector instructions are no longer restricted to the basic set; the first two bits of the opcode may both be one. Instead, additional opcode space was created by not allowing both operands of a register-to-register instruction to be identical.

Thus, the instructions in the CISC mode from the Branch instructions to the Register-to-Register instructions are now shown in decoding order. If the first two bits of the opcode field are both 1, the word is a Branch or Conditional Branch instruction. Otherwise, if the two register operands of the first of the two register-to-register instructions in the 32-bit word are identical, it may be either a Three-Address Register-to-Register instruction or a Shift instruction. Otherwise, if the two register operands of the second register-to-register instruction in the word are identical, it may be a Supplementary Register-to-Register instruction. Otherwise, it is the default compound Register-to-Register word containing two instructions.

The Supplementary Register-to-Register instructions, like normal register-to-register instructions, cannot have both register fields identical. They are therefore decoded after the Three-Address Register-to-Register instructions and the Shift instructions: those instructions repurpose the last six bits of the 32-bit word, so they do not (at least in the case of the Shift instructions) guarantee that the last two three-bit fields of the word differ.
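The decoding order just described can be sketched as a chain of tests. This is a minimal sketch only: it assumes the fields have already been extracted from the word (their bit positions are not given here), that the word is already known to lie in the Branch through Register-to-Register range, and the function name and return labels are hypothetical.

```python
def classify_cisc_word(opcode_top2: int, first_regs: tuple, second_regs: tuple) -> str:
    """Hypothetical classifier following the CISC decoding order.

    opcode_top2 -- the first two bits of the opcode field (0..3)
    first_regs  -- (register, register) of the first register-to-register slot
    second_regs -- (register, register) of the second register-to-register slot
    """
    # Both leading opcode bits set: Branch or Conditional Branch.
    if opcode_top2 == 0b11:
        return "branch"
    # First slot uses the same register twice: Three-Address or Shift.
    if first_regs[0] == first_regs[1]:
        return "three-address-or-shift"
    # Second slot uses the same register twice: Supplementary.
    if second_regs[0] == second_regs[1]:
        return "supplementary"
    # Default: two ordinary register-to-register instructions.
    return "dual-register-to-register"


print(classify_cisc_word(0b01, (3, 4), (2, 2)))
```

The order of the tests matters: the Supplementary test must come after the Three-Address/Shift test, because a Shift instruction may legitimately leave identical values in the last two three-bit fields.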

Another change is that the copious opcode space available for 64-bit instructions is now used to provide a simple means of specifying unaligned memory-reference instructions. Just as a 32-bit instruction can contain two register-to-register instructions, a 64-bit instruction can contain two 32-bit instructions. Each of these may be either a dual register-to-register instruction or a memory-reference instruction not restricted to aligned operands; 32-bit instructions of other types may not appear in this format.

To be more specific, the "other types" that are excluded are the Branch, Conditional Branch, Vector, and Multiple-Register instructions. The additional types of instruction that are indicated by making the source and destination registers of a register-to-register instruction the same (the Three-Address Register-to-Register, Shift, and Supplementary Register-to-Register instructions) are permitted to occur as one of the halves of a 64-bit instruction in the Dual Simple format.

Multiple-register instructions, on the other hand, must be available as regular 32-bit instructions. They are used in calling sequences to save and restore registers, so one may need to appear, for example, at the very beginning of a subroutine. Remember the restrictions on branch targets: a 64-bit instruction can only appear where it is guaranteed that the previous memory block of instructions was fetched, so that the warning bits are available.

Note that the opcode of the first register-to-register operation in the second word of this format is eight bits long. Unlike the corresponding position in the first word, it does not need a bit to distinguish instructions in this format from the remaining 64-bit instructions.

CISC

Let's start with the CISC mode of operation.

Because 64-bit instructions are available in this mode, the amount of opcode space available is plentiful indeed, as long as one allows oneself to resort to them.

However, the 32-bit portion of the instruction set is somewhat constrained, particularly as I sought to have the multiple-register instructions remain only 32 bits long; this meant that a quarter of the opcode space had to be turned over to them.

The register-to-register instructions use seven-bit opcodes, the first two bits of which cannot both be one; the vector instructions also use seven-bit opcodes, although, as noted above, they are no longer subject to that restriction. These opcodes use the same coding we have met frequently in connection with the Concertina architecture, seen elsewhere on these pages.

 0000  0001  0010  0011  0100  0101  0110  0111
 SWB   IB    SWH   IH    SW    I     SWL         000
 CB    UCB   CH    UCH   C     UC    CL    UCL   001
 LB    ULB   LH    ULH   L     UL    LL   *LA    010
 STB   XB    STH   XH    ST    X     STL   XL    011
 AB    NB    AH    NH    A     N     AL    NL    100
 SB    OB    SH    OH    S     O     SL    OL    101
*LAB         MH    MEH   M     ME    ML    MEL   110
*STAB *STBG  DH    DEH   D     DE    DL    DEL   111

 1000  1001  1010  1011
 SWM   SWF   SWD   SWQ                           000
 CM    CF    CD    CQ                            001
 LM    LF    LD    LQ                            010
 STM   STF   STD   STQ                           011
 AM    AF    AD    AQ                            100
 SM    SF    SD    SQ                            101
 MM    MF    MD    MQ                            110
 DM    DF    DD    DQ                            111
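The table can be read as a lookup from a seven-bit opcode. The column heading supplies the first four bits and the row label the last three, which agrees with the codes given further on for LA (0111010), LAB (0000110), STAB (0000111), and STBG (0001111). The following is a minimal Python sketch of that reading. The names `RR_TABLE` and `decode_rr_opcode` are hypothetical, and the asterisks from the table are dropped.

```python
# Each column (first four opcode bits) lists its mnemonics for rows 000..111
# (last three opcode bits). None marks a position left blank in the table.
RR_TABLE = {
    0b0000: ["SWB", "CB", "LB", "STB", "AB", "SB", "LAB", "STAB"],
    0b0001: ["IB", "UCB", "ULB", "XB", "NB", "OB", None, "STBG"],
    0b0010: ["SWH", "CH", "LH", "STH", "AH", "SH", "MH", "DH"],
    0b0011: ["IH", "UCH", "ULH", "XH", "NH", "OH", "MEH", "DEH"],
    0b0100: ["SW", "C", "L", "ST", "A", "S", "M", "D"],
    0b0101: ["I", "UC", "UL", "X", "N", "O", "ME", "DE"],
    0b0110: ["SWL", "CL", "LL", "STL", "AL", "SL", "ML", "DL"],
    0b0111: [None, "UCL", "LA", "XL", "NL", "OL", "MEL", "DEL"],
    0b1000: ["SWM", "CM", "LM", "STM", "AM", "SM", "MM", "DM"],
    0b1001: ["SWF", "CF", "LF", "STF", "AF", "SF", "MF", "DF"],
    0b1010: ["SWD", "CD", "LD", "STD", "AD", "SD", "MD", "DD"],
    0b1011: ["SWQ", "CQ", "LQ", "STQ", "AQ", "SQ", "MQ", "DQ"],
}


def decode_rr_opcode(opcode: int):
    """Return the mnemonic for a seven-bit opcode, or None if unassigned."""
    if not 0 <= opcode < 128:
        raise ValueError("opcode must fit in seven bits")
    column, row = opcode >> 3, opcode & 0b111
    # Columns 1100..1111 (first two bits both one) are not register-to-register opcodes.
    if column >> 2 == 0b11:
        return None
    return RR_TABLE[column][row]


print(decode_rr_opcode(0b0111010))
```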

First, we have four groups of sixteen instructions, for each of four data types.

The data types are:

B  Byte      8 bits
H  Halfword 16 bits
   Integer  32 bits
L  Long     64 bits

and the operations are:

SW  Swap
C   Compare
L   Load
ST  Store
A   Add
S   Subtract
M   Multiply
D   Divide

I   Insert
UC  Unsigned Compare
UL  Unsigned Load
X   Exclusive OR
N   AND
O   OR
ME  Multiply Extensibly
DE  Divide Extensibly

Some of these operations may need some explanation.

In this architecture, the eight arithmetic/index registers are each 64 bits long, for compatibility with the RISC and DSP modes of operation.

But fixed-point data may be 8, 16, or 32 bits long as well as 64 bits long.

Thus, loading data from memory into a register is handled by three different instructions.

Load performs sign extension, since fixed-point values are normally treated as two's complement numbers.

Insert puts the data being addressed in the least significant portion of the destination register, leaving the more significant bits unaffected.

Unsigned Load clears the bits of the register which are more significant than the data being retrieved.
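The three ways of bringing a narrow operand into a 64-bit register can be sketched as follows. This is an illustrative model only: registers are represented as unsigned Python integers in the range 0 to 2**64 - 1, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def _field_mask(bits: int) -> int:
    return (1 << bits) - 1


def load(reg: int, data: int, bits: int) -> int:
    """Load: sign-extend a two's complement field; old register contents are ignored."""
    value = data & _field_mask(bits)
    if value >> (bits - 1):            # sign bit of the field is set
        value -= 1 << bits
    return value & MASK64              # back to a 64-bit pattern


def insert(reg: int, data: int, bits: int) -> int:
    """Insert: replace only the low-order field, keep the more significant bits."""
    mask = _field_mask(bits)
    return (reg & ~mask & MASK64) | (data & mask)


def unsigned_load(reg: int, data: int, bits: int) -> int:
    """Unsigned Load: zero-extend; old register contents are ignored."""
    return data & _field_mask(bits)


print(hex(load(0, 0x80, 8)))
```

The unused `reg` parameter in `load` and `unsigned_load` is kept only so that all three functions share one signature.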

These operations, and even their names, will, of course, be familiar to System/360 assembler programmers.

Multiply and Divide, however, do not work the way the corresponding instructions did on the System/360. Instead, like the corresponding floating-point instructions, they take two operands of the same length to produce a result that is also of the same length. So no remainder is provided from the divide instruction, and overflows are quite possible from the multiplication of large numbers.

This makes them correspond directly to the usual multiplication and division operations used in higher-level languages.
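The same-length Multiply can be modelled as a two's complement product truncated to the operand length. The text does not say how an overflow is reported (trap, condition code, or otherwise), so this hypothetical sketch simply returns a flag alongside the truncated result.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply(a: int, b: int, bits: int) -> tuple:
    """Same-length multiply: returns (result bit pattern, overflowed flag)."""
    exact = to_signed(a, bits) * to_signed(b, bits)
    result = exact & ((1 << bits) - 1)          # keep only `bits` bits of the product
    overflowed = to_signed(result, bits) != exact
    return result, overflowed


print(multiply(0x10000, 0x10000, 32))
```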

When additional capabilities are needed, however, the Multiply Extensibly and Divide Extensibly operations are available. These two instructions, as described here, work somewhat differently from the instructions of the same name for the Concertina architecture.

Multiply Extensibly takes two operands, and produces an integer product that is twice as long.

This can still fit in a single register, except for the case of the Multiply Extensibly Long instruction. This instruction will take an odd-numbered register as its destination operand; the product of its contents and those of the source operand will have its most significant part placed in the even-numbered register preceding the one that is the destination operand, and its least significant part placed in the destination operand itself.
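The register-pair rule for Multiply Extensibly Long can be sketched over a list of eight 64-bit registers. The sketch assumes signed (two's complement) operands, which the text implies but does not state for this instruction, and passes the source operand as a value whether it comes from a register or memory. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def multiply_extensibly_long(regs: list, dest: int, source: int) -> None:
    """MEL: 64 x 64 -> 128-bit product split across an even/odd register pair."""
    if dest % 2 != 1:
        raise ValueError("MEL requires an odd-numbered destination register")
    product = to_signed64(regs[dest]) * to_signed64(source)
    product &= (1 << 128) - 1           # 128-bit two's complement pattern
    regs[dest - 1] = product >> 64      # most significant half to the preceding even register
    regs[dest] = product & MASK64       # least significant half to the destination itself


regs = [0] * 8
regs[3] = 1 << 40
multiply_extensibly_long(regs, 3, 1 << 40)
print(regs[2], regs[3])
```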

Divide Extensibly takes a destination operand that is twice as long as the source operand. An even-numbered register is its destination operand in all cases; in the case of the Divide Extensibly Long instruction, the constraint is more strict, and the number of the register used as the destination operand must be divisible by four.

The number in the destination operand is divided by the number in the source operand. The remainder is placed in the destination operand, and the quotient is placed in the register, or registers, that follow it.

In the case of Divide Extensibly Halfword, the destination operand is the last 32 bits of the register specified; for Divide Extensibly, it is the entire register, and for Divide Extensibly Long, it is the register pair consisting of the register specified (with the most significant part of the dividend) and the register following (with the least significant part of the dividend).

The quotient is placed in the next higher register for Divide Extensibly Halfword and Divide Extensibly; it is placed in the next higher register pair for Divide Extensibly Long.

The quotient and remainder are the same length as the dividend (the destination operand), and are thus both twice the length of the divisor (the source operand).

As a result, a divide check exception, which occurs when the quotient of a double-length dividend and a single-length divisor is too long to fit in a single-length result, does not arise from the Divide Extensibly instructions.
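Divide Extensibly and Divide Extensibly Long can be sketched the same way. To keep the example short it assumes unsigned operands, which the text does not specify; a signed version would also have to fix a rounding rule for negative quotients. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def divide_extensibly(regs: list, dest: int, divisor: int) -> None:
    """DE: 64-bit dividend in an even register, 32-bit divisor (unsigned sketch)."""
    if dest % 2:
        raise ValueError("destination must be an even-numbered register")
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    quotient, remainder = divmod(regs[dest], divisor)
    regs[dest] = remainder          # remainder replaces the dividend
    regs[dest + 1] = quotient       # quotient goes to the next higher register


def divide_extensibly_long(regs: list, dest: int, divisor: int) -> None:
    """DEL: 128-bit dividend in registers dest, dest+1; dest divisible by four."""
    if dest % 4:
        raise ValueError("destination register number must be divisible by four")
    divisor &= MASK64
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    dividend = (regs[dest] << 64) | regs[dest + 1]   # most significant part first
    quotient, remainder = divmod(dividend, divisor)
    regs[dest], regs[dest + 1] = remainder >> 64, remainder & MASK64
    regs[dest + 2], regs[dest + 3] = quotient >> 64, quotient & MASK64


regs = [0] * 8
regs[4], regs[5] = 5, 0               # dividend 5 * 2**64
divide_extensibly_long(regs, 4, 2)
print(regs[6], regs[7])
```

Since the quotient slot is as long as the dividend, even the largest quotient (a divisor of 1) fits, which is why no divide check arises.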

Additional opcodes, marked with an asterisk in the table above because they do not follow the standard pattern of the other entries there but perform additional vital functions, are:

0000110 LAB  Load Address/Base
0000111 STAB Store Address/Base

0001111 STBG Store Byte if Greater

0111010 LA   Load Address

The Load Address/Base and Store Address/Base instructions allow data to be transferred to and from the eight base registers.

The Store Byte if Greater instruction stores the byte in the destination register at the source memory location only if the unsigned value of that byte is greater than the value already stored in that memory location.
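The Store Byte if Greater rule can be modelled with memory as a `bytearray`. Two assumptions here are not from the text: the byte taken from the register is its least significant eight bits, and the Boolean result simply reports whether a store took place. The function name is hypothetical.

```python
def store_byte_if_greater(memory: bytearray, address: int, register: int) -> bool:
    """Store the register's low byte only if it exceeds the byte already in memory."""
    new_byte = register & 0xFF            # assumed: least significant byte of the register
    if new_byte > memory[address]:        # unsigned comparison: bytearray holds 0..255
        memory[address] = new_byte
        return True
    return False


mem = bytearray([0x10, 0xF0])
print(store_byte_if_greater(mem, 0, 0x12345680), hex(mem[0]))
```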

The Load Address instruction loads the destination register with the effective address itself, rather than the data contained in memory at that address.

Except for LAB and STAB, however, these opcodes are not applicable to register-to-register instructions.

These opcodes can fit in the format of memory-reference instructions provided in this CISC mode, but in different positions in most cases.

LA   Load Address               w0001110
STBG Store Byte if Greater      w0001111

LAB  Load Address/Base          w0010000 ... 011
STAB Store Address/Base         w0010010 ... 011

The Load Address instruction is placed among the byte-aligned instructions for maximum flexibility.

Since the base registers are 64 bits long, the instructions to load and store their contents are placed among the instructions for the 64-bit long integer type, using the unused Insert and Unsigned Load opcodes for that type.

Floating-point instructions use the first eight basic operations listed above:

SW  Swap
C   Compare
L   Load
ST  Store
A   Add
S   Subtract
M   Multiply
D   Divide

and apply them to the following data types:

M   Medium     48 bits
F   Floating   32 bits
D   Double     64 bits
Q   Quad      128 bits

Floating, Double, and Quad are the standard IEEE 754 single, double, and extended precision floating-point numbers respectively.

Medium floating-point numbers, 48 bits long, are similar to single and double precision floating-point numbers in IEEE 754, but with one sign bit, an exponent field of ten bits, and a significand that is thirty-seven bits long.
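A decoder for the 48-bit Medium format can be sketched by analogy with IEEE 754. The text gives only the field widths (1 + 10 + 37 bits). The exponent bias of 511, the hidden leading bit, and the handling of zero, subnormals, infinities, and NaNs are assumptions carried over from the IEEE formats. Every Medium value fits exactly in a Python float (an IEEE double), so the conversion is exact.

```python
import math

BIAS = 511            # assumed: 2**(10 - 1) - 1, as in the IEEE formats
FRACTION_BITS = 37


def decode_medium(bits: int) -> float:
    """Decode a 48-bit Medium pattern: 1 sign, 10 exponent, 37 significand bits."""
    sign = -1.0 if (bits >> 47) & 1 else 1.0
    exponent = (bits >> FRACTION_BITS) & 0x3FF
    fraction = bits & ((1 << FRACTION_BITS) - 1)
    if exponent == 0x3FF:                              # assumed: infinities and NaNs
        return math.copysign(math.inf, sign) if fraction == 0 else math.nan
    if exponent == 0:                                  # assumed: zero and subnormals
        magnitude = math.ldexp(fraction, 1 - BIAS - FRACTION_BITS)
    else:                                              # normal: restore the hidden bit
        magnitude = math.ldexp((1 << FRACTION_BITS) | fraction,
                               exponent - BIAS - FRACTION_BITS)
    return math.copysign(magnitude, sign)


print(decode_medium((1 << 47) | (512 << 37) | (1 << 36)))
```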

Note that the Medium floating-point type is placed first in the list; numbers of this type are considered aligned operands, as opposed to unaligned operands, when they are aligned on 16-bit boundaries in memory. Thus, the ordering reflects alignment granularity rather than length.

The memory-reference instructions, as is typical for a CISC architecture, allow all normal operations between registers and memory instead of just loads and stores. Since there are sixteen such operations, a four-bit field for the opcode is sufficient. The floating-point types use only eight of these operations, so for the medium, single, and double types additional opcodes are left unused.

In the case of extended precision floating point, however, as the first bit is not hidden with that type, additional operations involving unnormalized arithmetic are provided:

The last potentially tricky part of the instruction set is the multiple-register instructions.

There are four possible opcodes. That obviously is sufficient for load multiple, store multiple, load multiple floating, and store multiple floating.

But what about the eight base registers?

Fortunately, a detail not shown in the diagram comes to the rescue. The ordinary memory-reference instructions have been restricted to aligned operands in order to save enough opcode space for the multiple-register instructions to be placed among the 32-bit instructions.

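There is no reason for the multiple-register instructions to handle unaligned operands either. The low-order displacement bits that an aligned operand would require to be zero are therefore free to serve as part of the opcode, and the opcodes can be allocated as follows: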

LM   Load Multiple             w1100 ...  011
STM  Store Multiple            w1100 ...  011
LMF  Load Multiple Floating    w1100 ... 0111
STMF Store Multiple Floating   w1100 ... 0111
LMB  Load Multiple Base        w1101 ...  011
STMB Store Multiple Base       w1101 ...  011
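The trick behind these suffixes can be sketched in a few lines. An operand aligned on a 2**k-byte boundary always has its low k address bits zero, so a nonzero k-bit pattern in that position can never belong to an aligned reference and is free for another purpose. This sketch makes three assumptions. The displacement is in bytes. The suffix sits in its least significant bits. The suffix width matches the register size: 3 bits for the 64-bit registers, 4 bits for the floating-point registers (taken to be 128 bits wide), and 5 bits for the 256-bit short vector registers. All names are hypothetical.

```python
def low_bits(displacement: int, width: int) -> int:
    return displacement & ((1 << width) - 1)


def is_aligned(displacement: int, operand_bytes: int) -> bool:
    """True if the low bits required to be zero for this operand size are zero."""
    width = operand_bytes.bit_length() - 1       # operand_bytes is a power of two
    return low_bits(displacement, width) == 0


# Suffixes used under opcode w1100: (suffix width, bit pattern) -> register set.
W1100_SUFFIXES = {
    (3, 0b011): "integer",
    (4, 0b0111): "floating",
    (5, 0b01111): "short vector",
}


def classify_w1100(displacement: int):
    """Return which multiple-register group a displacement suffix selects, if any."""
    for (width, pattern), group in W1100_SUFFIXES.items():
        if low_bits(displacement, width) == pattern:
            return group
    return None


print(classify_w1100(0b10111), is_aligned(0b1011, 8))
```

The three patterns are mutually exclusive (their low three bits are 011, 111, and 111, and the last two differ in the fourth bit), so the order in which they are tested does not matter.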

Incidentally, it may be desirable to provide short vector instructions. If only eight short vector registers (again of 256 bits) are provided, rather than the sixteen of the Concertina architecture, one can have:

LMSV  Load Multiple Short Vector    w1100 ... 01111
STMSV Store Multiple Short Vector   w1100 ... 01111

and 256-bit alignment can also provide opcodes for short vector memory-reference instructions.

Register-to-register short vector instructions, packed one to a 32-bit word instead of two to a word, could be provided within the very large extent of opcode space shown in the diagram as available for the shift instructions. This would also allow for additional fields for masking portions of the vectors, which would be lacking in the memory-reference instructions provided by the expedient noted above.

RISC

In RISC mode, memory-reference instructions are shown as having a five-bit opcode field that overlaps the position of the C bit, which in register-to-register instructions indicates that the condition codes may be set.

Thus, some attention to how their opcodes are organized is required, and I propose the following organization:

00100 LM   Load Medium
00101 STM  Store Medium
00110 LF   Load Floating
00111 STF  Store Floating

01000 LB   Load Byte
01001 STB  Store Byte
01010 IB   Insert Byte
01011 ULB  Unsigned Load Byte
01100 LH   Load Halfword
01101 STH  Store Halfword
01110 IH   Insert Halfword
01111 ULH  Unsigned Load Halfword

10100 LD   Load Double
10101 STD  Store Double
10110 LQ   Load Quad
10111 STQ  Store Quad

11000 L    Load
11001 ST   Store
11010 I    Insert
11011 UL   Unsigned Load
11100 LL   Load Long
11101 STL  Store Long

This means that the first two bits of the opcode of a register-to-register instruction must be zero, but that still leaves a seven-bit opcode, which, as we have seen, is adequate.
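Transcribing the proposed RISC memory-reference opcodes into a lookup makes it easy to see which five-bit codes remain free. This is a minimal sketch; the names are hypothetical.

```python
RISC_MEMORY_OPCODES = {
    0b00100: "LM",  0b00101: "STM", 0b00110: "LF",  0b00111: "STF",
    0b01000: "LB",  0b01001: "STB", 0b01010: "IB",  0b01011: "ULB",
    0b01100: "LH",  0b01101: "STH", 0b01110: "IH",  0b01111: "ULH",
    0b10100: "LD",  0b10101: "STD", 0b10110: "LQ",  0b10111: "STQ",
    0b11000: "L",   0b11001: "ST",  0b11010: "I",   0b11011: "UL",
    0b11100: "LL",  0b11101: "STL",
}


def decode_risc_memory(opcode5: int):
    """Return the mnemonic for a five-bit memory-reference opcode, or None."""
    return RISC_MEMORY_OPCODES.get(opcode5)


# Five-bit codes not used by any memory-reference instruction.
UNASSIGNED = [code for code in range(32) if code not in RISC_MEMORY_OPCODES]
print([format(code, "05b") for code in UNASSIGNED])
```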

This does suggest, however, that it may prove useful after all to include the first bit of the instruction in the opcode, rather than simply leaving it zero as the analogy with the other instruction modes had suggested; in those modes, that bit serves as a block mark bit (DSP) or a warning bit (CISC).

For one thing, it has been noted that RISC mode will require special instructions to transfer data between the set of 128 registers normally used in that mode and the set of 8 registers used in CISC mode. Extra register-to-register opcodes will be needed for this special function, and the eight base registers must also be accessible, just as in CISC mode.

DSP

In DSP mode, the instructions have effectively been lengthened from 32 bits to 40. Even so, they are tightly squeezed, particularly if predicated.

The two-bit opcode for memory-reference instructions is enough to specify load, store, insert, and unsigned load:

00 L  Load
01 ST Store
10 I  Insert
11 UL Unsigned Load

Unconditional branch and jump to subroutine can have these opcodes:

bc1010 ... 01 JMP Jump
bc1011 ... 01 JSR Jump to Subroutine

since insert and unsigned load are not applicable to floating-point numbers.

However, register-to-register operations have only a seven-bit opcode, which still leaves room for some operations beyond the basic ones. Fortunately, an operation that sets a flag bit does not need as much opcode space as a conditional branch: sixty-four flag bits require only a six-bit field to specify, as opposed to the twenty-one bits (three for an index register, three for a base register, and fifteen for the displacement) used for a memory address.
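The field-width comparison can be checked directly. An operand field that is 15 bits narrower leaves 15 more bits of an instruction of the same length for the opcode. A flag-setting operation therefore uses 2**15 times less encoding space than an operation that needs a full memory address. The helper name is hypothetical.

```python
def bits_to_select(n: int) -> int:
    """Number of bits needed to name one of n items."""
    return (n - 1).bit_length()


FLAG_FIELD = bits_to_select(64)          # one of sixty-four flag bits: 6 bits
ADDRESS_FIELD = 3 + 3 + 15               # index + base + displacement: 21 bits
SAVING = ADDRESS_FIELD - FLAG_FIELD      # bits left over for the opcode

print(FLAG_FIELD, ADDRESS_FIELD, SAVING, 2 ** SAVING)
```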

If one is desperate, a conditional jump can be implemented as a predicated unconditional jump, although that involves two instructions instead of one.