This is now my eighth attempt to propose a successor to my original Concertina architecture. I hope that finally this time I have found a way to achieve the goals I have set for myself while avoiding excessive complexity.
This architecture has the following sets of registers visible to the programmer:
One bank of 32 integer arithmetic/index registers, each 64 bits in length.
One bank of 32 floating-point registers, each 128 bits in length.
Most instructions that use a register from either of these two banks of registers can reference and use only the first eight of those registers; however, there are both memory-reference instructions and register-to-register instructions which can use all 32 of the registers.
Two banks of eight base registers, each 64 bits in length. One bank points to areas in memory that are 4,096 bytes in length, and the other points to areas in memory that are 32,768 bytes in length. The first bank allows certain memory-reference instructions to have a more compact form than would otherwise be possible.
One bank of sixteen short vector registers, each 256 bits in length.
One bank of sixty-four long vector registers, each composed of 64 scalar entries which are each 64 bits in length.
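As a quick cross-check of the sizes listed above, the register complement can be tabulated and totalled. This is only a bookkeeping sketch: the names REGISTER_BANKS, bank_bits and total_state_bits are made up for illustration and are not part of the architecture.

```python
# Register banks as listed above: (description, number of registers, bits per register).
# The table and helper names are hypothetical, chosen only for this illustration.
REGISTER_BANKS = [
    ("integer arithmetic/index", 32, 64),
    ("floating-point", 32, 128),
    ("base registers, 4,096-byte areas", 8, 64),
    ("base registers, 32,768-byte areas", 8, 64),
    ("short vector", 16, 256),
    ("long vector", 64, 64 * 64),  # 64 scalar entries of 64 bits each
]


def bank_bits(count, width):
    """Total number of bits held by one bank of registers."""
    return count * width


def total_state_bits(banks=REGISTER_BANKS):
    """Total number of bits across all programmer-visible banks listed above."""
    return sum(bank_bits(count, width) for _, count, width in banks)
```

Almost all of the total comes from the long vector registers, which hold 64 x 64 x 64 bits between them.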
This complement of registers is illustrated in the diagram below:
The basic instruction formats for this architecture have the form:
This instruction set is organized so that program code may be fetched in 256-bit blocks, and the block is divided into 32-bit instruction words which may be decoded in parallel. After decoding, the instructions will still be executed in series.
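The following minimal sketch shows what splitting one fetched block into independently decodable words amounts to. The function name split_block is hypothetical, and the big-endian byte order is an assumption made for the example; the text above does not specify byte order.

```python
import struct

BLOCK_BYTES = 32  # one 256-bit fetch block
WORDS_PER_BLOCK = 8  # eight 32-bit instruction words


def split_block(block):
    """Split a 32-byte block into eight 32-bit words.

    Assumption: words are stored big-endian. Each returned word could then
    be handed to its own decoder, since none depends on the others.
    """
    if len(block) != BLOCK_BYTES:
        raise ValueError("a fetch block must be exactly 32 bytes")
    return list(struct.unpack(">8I", block))
```

For example, split_block(bytes(range(32))) yields eight words, the first of which is 0x00010203.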
Instructions longer than 32 bits are provided, but each 32-bit portion of the instruction is formatted so that it can be decoded independently.
One exception to the principle of separate decoding is permitted, under conditions that will not cause delays. The last instruction of a 256-bit instruction block, or any group of consecutive instructions which include the last instruction, may have an immediate operand that is a multiple of 32 bits in length. These immediate operands will then begin the next instruction block; as that will be fetched in a subsequent cycle, this makes use of an inherently serial situation. Immediate operands that extend for more than 256 bits are intended to be fully supported.
A 32-bit instruction word may be composed of two 16-bit instructions. This instruction format is illustrated by line 1 of the diagram.
In addition to the first two bits of the word being 11, the first two bits of the opcode field must not both be 1. If that condition holds, the dR and sR fields are checked, and they must not be the same; if they are the same, the instruction word instead indicates an instruction of one of several specialized types.
The second half of that instruction word is another 16-bit short instruction if its first two bits are zero. (If the second instruction has the first two opcode bits both one, or its destination and source registers the same, it may be a specialized 16-bit instruction; these are reserved for future expansion.)
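The checks described so far can be sketched as a small classifier. The exact field positions are an assumption made for this example: two leading tag bits, a 7-bit opcode, one condition code bit, then 3-bit dR and sR fields, with the first bit taken as the most significant. The function names are hypothetical as well. Only the tag, opcode and register tests come from the description above.

```python
def fields16(half):
    """Split a 16-bit half-word into fields (assumed layout, MSB first):
    tag(2) | opcode(7) | condition code bit(1) | dR(3) | sR(3)."""
    return {
        "tag": (half >> 14) & 0b11,
        "opcode": (half >> 7) & 0x7F,
        "c": (half >> 6) & 1,
        "dR": (half >> 3) & 0b111,
        "sR": half & 0b111,
    }


def is_short_instruction(half, expected_tag):
    """True if the half-word passes the three tests for a 16-bit short instruction."""
    f = fields16(half)
    if f["tag"] != expected_tag:
        return False
    if (f["opcode"] >> 5) == 0b11:
        return False  # first two opcode bits both 1: some other word type
    if f["dR"] == f["sR"]:
        return False  # identical register fields: a specialized type
    return True


def is_short_pair(word):
    """True if a 32-bit word holds two 16-bit short instructions:
    the first half tagged 11, the second half tagged 00."""
    first, second = (word >> 16) & 0xFFFF, word & 0xFFFF
    return is_short_instruction(first, 0b11) and is_short_instruction(second, 0b00)
```

Under these assumptions, 0xC08A0093 is accepted as a pair, while 0xC0890093 is rejected because dR and sR are equal in its first half.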
Also, as there was room for a 7-bit opcode field, with the restriction that its first two bits not both be 1, even after including a condition code bit in the instruction format, 16-bit operate instructions are allowed to affect the condition codes. However, it is not useful for both operate instructions in a 32-bit instruction word to change the condition codes. Therefore, if both condition code bits are set, the instruction word is instead taken to consist of two 16-bit instructions, neither of which may alter the condition codes, drawn from a set of additional instructions for specialized operations such as register BCD arithmetic; this instruction format is shown in line 2 of the diagram.
An instruction word may also consist of a valid 16-bit short instruction in its first half and something else in its second half.
One example of this is shown in line 3 of the diagram. The first three bits of the second half of the word are 000. This ensures the two bits following the initial 00 are not both 1, so decoding of a 16-bit instruction continues. But the source and destination registers of the instruction in the second half of the word are both indicated as register 0, which is invalid; instead, the seven bits following the initial 000 are used as a seven-bit prefix to the following instruction. Since this instruction word type may be decoded independently of anything which precedes it, the decoding of the following instruction word isn't required to wait for the decoding of all previous words in the block to check for the presence of a prefix, which is why this instruction extension mechanism doesn't violate the principle of independent decoding.
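A minimal sketch of recognizing this word type follows. It assumes that "first" bits are the most significant and that the two register fields occupy the last six bits of the half-word; the function name seven_bit_prefix is hypothetical. The sketch looks only at the second half and does not verify that the first half is a valid 16-bit instruction.

```python
def seven_bit_prefix(word):
    """Return the 7-bit prefix carried in the second half of a 32-bit word,
    or None if the second half does not have the prefix pattern.

    Pattern (assumed MSB-first): 000 | 7 prefix bits | dR = 000 | sR = 000.
    """
    second = word & 0xFFFF
    starts_with_000 = (second >> 13) == 0
    both_registers_zero = (second & 0x3F) == 0
    if starts_with_000 and both_registers_zero:
        return (second >> 6) & 0x7F
    return None
```

Because the test uses nothing but the word itself, it can run in parallel for all eight words of a block, which is the property the paragraph above relies on.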
However, there is one important restriction to be noted: because of the existence of prefixes, when branching into a block where the first several 32-bit instruction words are instead used for immediate data from instructions in the preceding block, it is possible that the immediate data might resemble an instruction word type that provides a prefix, causing incorrect decoding of the first instruction in such a block. Therefore, the first actual instruction in a block starting with immediate data may not be a branch target from outside the block.
Next, we will look at memory-reference instructions.
Line 4 of the diagram shows another instruction prefix format; this one provides a 12-bit prefix (thus, it is large enough to possibly include additional fields indicating registers). The same note concerning decoding applies to this format.
As a consequence of this, only one prefix, either of this type or of the preceding type, may be applied to any one instruction. If two instruction words indicating prefixes appear, the prefix indicated in the first one will apply to the 16-bit instruction in the first half of the second instruction word, not to the instruction following both.
Lines 5 through 10 of the diagram show that when a 16-bit instruction, indicated by the first two bits of the instruction word being 11, is followed by a second 16-bit half beginning with 01, then the instruction word also contains two 16-bit instructions, but this time the second one is of an alternate type, either a 16-bit shift instruction, as shown in lines 5 through 8, or a 16-bit conditional relative branch instruction, as shown in line 9.
Also in this group is another prefix instruction; but this one does not provide a prefix for the next instruction; instead, it contains an 8-bit prefix for the next 256-bit block. Each bit in the prefix corresponds to one of the eight 32-bit instruction words in the block. There is also an opcode field, as there can be different prefixes of this type. In one case, each zero in the prefix field might indicate an instruction that can be performed in parallel with those that precede it. In another, each one in the prefix field might indicate an instruction word to be decoded in a completely different manner for a secondary instruction set.
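To make the per-word correspondence concrete, here is a small sketch that expands an 8-bit block prefix into one flag per instruction word. It assumes the most significant prefix bit belongs to the first word of the block, which the text does not state. The grouping function illustrates just one possible reading of the "parallel" case mentioned above (a 0 bit joins the word to the group before it, a 1 bit starts a new group). Both function names are hypothetical.

```python
WORDS_PER_BLOCK = 8


def flagged_words(mask):
    """One boolean per instruction word; True where the prefix bit is 1.
    Assumption: the most significant bit corresponds to the first word."""
    if not 0 <= mask <= 0xFF:
        raise ValueError("block prefix must fit in 8 bits")
    return [bool((mask >> (WORDS_PER_BLOCK - 1 - i)) & 1)
            for i in range(WORDS_PER_BLOCK)]


def parallel_groups(mask):
    """Group word indices under one hypothetical interpretation:
    a 0 bit means the word may execute in parallel with those before it,
    a 1 bit starts a new group. The first word always starts a group."""
    groups = []
    for i, flag in enumerate(flagged_words(mask)):
        if flag or not groups:
            groups.append([i])
        else:
            groups[-1].append(i)
    return groups
```

With the mask 0b10010000, for instance, words 0 to 2 form one group and words 3 to 7 another.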
Line 11 of the diagram shows the format of a multiple-register load and store instruction. These instructions must have a short-format memory address, and they cannot be indexed. They are distinguished from other memory-reference instructions by their opcodes.
Line 12 shows a standard memory-reference instruction with a long-format memory address, and Line 13 shows a standard memory-reference instruction with a short-format memory address.
Line 14 shows a standard memory-reference instruction that can access all 32 registers of the main integer and floating-point banks. These instructions also must use a short-format memory address.
Line 15 shows a three-address register-to-register instruction. The two opcode bits which precede the condition code bit must not be both one.
Line 16 shows the format of an instruction with a 16-bit immediate operand. The first two opcode bits must not be both one, because it is decoded as an instruction word beginning with a 16-bit instruction, only made invalid by the register field contents, which are only tested if the first two opcode bits do not, by being 11, indicate a different kind of instruction word.
Lines 17 and 18 show the format of a 48-bit long string or packed-decimal instruction. Both memory addresses in these instructions must be short format. The first word is distinguished from a word beginning with a 16-bit instruction by the two bits immediately following the first two bits also being both one; therefore, the last six bits are not tested for being two identical octal digits. The second word does contain a 16-bit instruction, to be executed before the string or packed-decimal instruction: this is why that instruction only occupies 48 bits instead of 64 bits.
Lines 19 and 20 show the format of a 64-bit long string or packed-decimal instruction. Only the format of instructions with long-format memory addresses is shown; however, the addresses in these instructions may also be short format instead, in which case the corresponding base address fields in the instruction are instead available to indicate indexing for that memory operand.
Lines 21 through 23 indicate the 96-bit format for a three-operand string instruction; this allows separate source and destination operands for the Translate instruction.
Lines 24 through 29 show the formats of the three types of long vector instructions. Long vector instructions are similar in philosophy to the vector capabilities of vector supercomputers such as the Cray-1 and its successors.
Lines 30 through 35 show the formats of the three types of short vector instructions. Short vector instructions work by splitting a fixed-length long word into different numbers of parts, in this case, a 256-bit long word into two 128-bit floating-point numbers, four 64-bit floating-point numbers or long integers, eight 32-bit floating-point numbers or integers, sixteen short integers, or thirty-two bytes. Thus, they follow the philosophy of the vector instructions that are common on most microprocessor architectures today.
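The splitting of a 256-bit short vector register into parts can be sketched as follows. The register is modelled as a plain Python integer, the function name split_lanes is hypothetical, and placing lane 0 at the most significant end is an assumption made for the example.

```python
VECTOR_BITS = 256
LANE_WIDTHS = (128, 64, 32, 16, 8)  # the element sizes listed above


def split_lanes(value, lane_bits):
    """Split a 256-bit value into equal lanes of lane_bits each.
    Assumption: lane 0 is taken from the most significant end."""
    if lane_bits not in LANE_WIDTHS:
        raise ValueError("unsupported lane width")
    if not 0 <= value < (1 << VECTOR_BITS):
        raise ValueError("value does not fit in 256 bits")
    mask = (1 << lane_bits) - 1
    count = VECTOR_BITS // lane_bits
    return [(value >> (VECTOR_BITS - lane_bits * (i + 1))) & mask
            for i in range(count)]
```

The lane counts come out as 2, 4, 8, 16 and 32 for the five widths, matching the list in the paragraph above.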
Line 36 shows the format of a three-address register-to-register instruction that can access all 32 registers of the register banks.
In a memory-reference instruction, if the index register field contains a zero, this means that the address is not indexed, so arithmetic-index register 0 can only be used as an accumulator, and not an index register.
If the base register field contains a zero, on the other hand, it indicates Array Mode or the related Direct Mode. The first bit of the displacement field, if 0, indicates Array Mode, and if 1, indicates Direct Mode. In Array Mode, the contents of base register 0 are added to the value in the displacement field, shifted three places left, to provide the address of a 64-bit quantity in memory; this quantity, indexed as indicated in the instruction, is the effective address of the instruction.
Array Mode therefore allows a program to access multiple large arrays without having to use one base register for each array.
Direct Mode is only used with long integer instructions; here, the contents of base register 0 are added to the value in the displacement field, shifted three places left, to form the effective address of the instruction (which may also be indexed). This permits the array pointers used in Array Mode to be accessed and updated without the need to set two base registers to the same value.
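The address calculation for a base register field of zero can be sketched as below. Several details are assumptions made for the example, not statements about the architecture: the displacement width is passed in as a parameter, the mode bit is excluded from the value that is shifted, memory is modelled as a dictionary from addresses to 64-bit values, and addresses wrap at 64 bits. The function name effective_address is hypothetical.

```python
MASK64 = (1 << 64) - 1


def effective_address(base0, disp_field, disp_bits, index_field, index_regs, memory):
    """Effective address when the base register field is 0.

    base0       -- contents of base register 0
    disp_field  -- raw displacement field; its first (top) bit selects the mode
    disp_bits   -- width of the displacement field (assumed parameter)
    index_field -- index register number; 0 means "not indexed"
    index_regs  -- list of arithmetic/index register contents
    memory      -- dict mapping an address to the 64-bit value stored there
    """
    mode_bit = (disp_field >> (disp_bits - 1)) & 1
    # Assumption: the mode bit is not part of the value that gets shifted.
    value = disp_field & ((1 << (disp_bits - 1)) - 1)
    pointer_address = (base0 + (value << 3)) & MASK64
    index = index_regs[index_field] if index_field != 0 else 0
    if mode_bit == 0:
        # Array Mode: fetch the array pointer from memory, then apply indexing.
        return (memory[pointer_address] + index) & MASK64
    # Direct Mode: address the pointer slot itself, optionally indexed.
    return (pointer_address + index) & MASK64
```

In this model, a Direct Mode access with the same displacement value reaches the very slot that an Array Mode access reads its pointer from, which is what allows the pointers to be updated through base register 0 alone.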