This is now my twelfth attempt to propose a successor to my original Concertina architecture. Building on previous attempts, it is intended to be reasonably simple while keeping overhead cost to a minimum.
Programs consist of 256-bit blocks of program code, each of which contains eight 32-bit instruction slots.
A 32-bit instruction slot will normally contain either
a pair of 15-bit instructions, preceded with
a single 32-bit instruction, which begins with either
or an instruction that is 64 bits or 96 bits in length, the first 32 bits of which
will begin with
0101, and each remaining 32 bits of which will begin with
There is, however, some reserved opcode space within that for 32-bit instructions which allows for one other possibility:
If the contents of the first 32-bit instruction slot in a block begin with
that means that the beginning of the block is a header, containing additional information to control
the decoding of the block.
The intent of this scheme is, once the presence or absence of a prefix to a 256-bit instruction block is detected, and the prefix, if any, is decoded, to allow all the 32-bit instruction slots to be decoded in parallel, since each one will contain within itself the information needed to permit it to be decoded, despite the existence of 64-bit and 96-bit instructions.
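As a rough illustration of the parallel-decode idea, a block can be split into its eight slots up front, after which each slot can be examined on its own. The sketch below assumes big-endian byte order within the block, and it only recognizes the one prefix spelled out above, 0101 for the first 32 bits of a 64-bit or 96-bit instruction. The function names are hypothetical.

```python
import struct

BLOCK_BYTES = 32  # one 256-bit block


def split_block(block: bytes) -> list[int]:
    """Split a 256-bit block into eight 32-bit slots (big-endian assumed)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block must be exactly 32 bytes")
    return list(struct.unpack(">8I", block))


def starts_long_instruction(slot: int) -> bool:
    """True if the top four bits of the slot are 0101."""
    return (slot >> 28) == 0b0101
```

Because `starts_long_instruction` looks only at the slot it is given, it could be applied to all eight slots at once, which is the property the scheme is after.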
Instructions may cross block boundaries, provided that they are either crossing between two blocks without a prefix, or two blocks with the same type of prefix. This usually only applies to instructions longer than 32 bits, but in blocks whose header includes an instruction start field, it applies to 32-bit instructions as well.
The block prefixes currently defined are illustrated below:
A block may also consist entirely of instructions without a prefix.
The first format provides for predicated instructions. Four bits indicate which
of sixteen flags is tested to determine if instructions predicated with it are executed.
If the S bit is 0, the instruction is executed if the flag is set (or true);
if the S bit is 1, the instruction is executed if the flag is reset (or false).
The following seven-bit predicated field indicates which instructions are controlled by that flag,
each one referring to one of the remaining 32-bit instruction slots in the block in order.
Two sets of fields are present, so that instructions controlled by two different flags may be present in the same block. If a bit is set in one of the predicated fields, the corresponding bit in the other predicated field must not be set.
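The predication rules above can be sketched as follows. This is only an illustration: the fields are taken as already extracted from the header, since their exact positions are not given here. The bit ordering (first remaining slot in the most significant bit of the seven-bit field, flag n in bit n of a 16-bit flag word) and all names are assumptions.

```python
from typing import NamedTuple


class PredicateField(NamedTuple):
    flag: int        # which of the sixteen flags is tested (0..15)
    s_bit: int       # 0: execute when the flag is set; 1: execute when it is reset
    predicated: int  # seven-bit mask, one bit per remaining slot


def slot_bit(slot: int) -> int:
    """Mask bit for slot 1..7 (assumed: slot 1 is the most significant bit)."""
    if not 1 <= slot <= 7:
        raise ValueError("slot must be in 1..7")
    return 1 << (7 - slot)


def should_execute(slot: int, flags: int, a: PredicateField, b: PredicateField) -> bool:
    """Decide whether the instruction in the given slot is executed."""
    if a.predicated & b.predicated:
        raise ValueError("a slot may be predicated by only one of the two flags")
    for field in (a, b):
        if field.predicated & slot_bit(slot):
            flag_set = bool((flags >> field.flag) & 1)
            return flag_set if field.s_bit == 0 else not flag_set
    return True  # slots not named in either field always execute
```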
The second format includes a break field. This permits the explicit indication of parallelism; bits that are set in this field divide the block into parts, beginning with the 32-bit instruction slot which corresponds to them, within which all of the instructions may be executed in parallel.
In code in this block format, if there are two 16-bit instructions in a 32-bit instruction slot, they must be chosen so that they can both be safely executed in parallel.
Also, one set of fields is present to allow some of the instructions in the block to be predicated. This block format is the one that allows functionality most closely resembling that of the very long instruction word (VLIW) instruction formats of some digital signal processors (DSPs).
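The way the break field divides a block can be sketched as below. The break bits are passed as a list with one entry per instruction slot after the header, in slot order; how those bits are laid out in the header is not assumed. The function name is hypothetical.

```python
def parallel_groups(break_bits: list[bool]) -> list[list[int]]:
    """Group slots 1..n into parts whose instructions may run in parallel.

    break_bits[i] is the break bit for slot i + 1; a set bit means that a
    new part begins with that slot.
    """
    groups: list[list[int]] = []
    for slot, starts_group in enumerate(break_bits, start=1):
        # The first slot always opens a part, whether or not its bit is set.
        if starts_group or not groups:
            groups.append([])
        groups[-1].append(slot)
    return groups
```

With no bits set, the whole block forms a single part; with every bit set, each slot stands alone and execution is effectively sequential.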
The third format also contains a break field, to which the comments for the second block format apply. Instead of allowing predication, it has a fourteen-bit target field, which indicates which sixteen-bit halfword locations within the block may be the target of branch instructions; attempting to branch anywhere else in the block will cause an error condition.
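A check of a branch target against the target field might look like the sketch below. It assumes that the header occupies the first 32-bit slot, so that the fourteen bits cover the fourteen halfwords of the remaining seven slots, and that the most significant bit of the field corresponds to the first of those halfwords. Both the bit ordering and the function names are assumptions.

```python
def halfword_index(byte_offset: int) -> int:
    """Map a byte offset within a 32-byte block to halfword 0..13 after the header."""
    if byte_offset % 2 or not 4 <= byte_offset < 32:
        raise ValueError("offset is not a halfword after the header")
    return (byte_offset - 4) // 2


def is_valid_target(target_field: int, byte_offset: int) -> bool:
    """True if the 14-bit target field marks this halfword as a branch target."""
    return bool((target_field >> (13 - halfword_index(byte_offset))) & 1)
```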
The fourth format keeps the normal 32-bit alignment of instructions 32 bits or longer, and requires 16-bit instructions to come in pairs, aligned on 32-bit boundaries, like all the other formats except those with an instruction start field.
A break field is also included in this header format; see the comments in the description of the second header format.
For any instruction slot in the part of the block after the header which is to contain a 32-bit instruction, or a pair of 15-bit short instructions, the corresponding bit in the decode field is to be set.
If the instruction is to be in the alternate format, the corresponding bit in the alternate field is to also be set.
Note that in this format, the Medium Immediate instructions that include the first 16 bits of their 48-bit immediate value within the first 32 bits of the instruction itself are necessary to avoid wasting space that could not otherwise be put to use with 48-bit immediates. In the fifth block format, by contrast, they are only useful in that they might save some space by making the sequence of instructions more compact.
The length field included in this header format works as follows:
In normal operation, this field must contain all zeroes; the value 0001 is used for a form of operation that is close to normal operation, except that the Medium floating-point data type is replaced by 80-bit extended precision.
When it contains a value greater than 1, the operation of some 32-bit instructions will be affected as follows:
The least-significant five bits of the contents of base registers will not be used when adding displacements to those contents to form memory addresses.
Instead, the contents of those bits will indicate how the memory to which they point is organized.
If those bits are all zeroes, the memory is organized normally, and is either used for program code or for data organized around the 8-bit byte, which does apply to some of the alternate values of the length field.
If those bits contain the number 1 or 5, aligned 256-bit blocks of memory will be addressed in groups of three and no bits in them will be unused, thus favoring rapid access to data that is 24, 48, or 96 bits in length, particularly on implementations with triple-channel memory.
If those bits contain the number 2 or 6, each aligned 256-bit block of memory will have four unused bits at its beginning (the most significant bits of the byte at the lowest address), and the remaining bits will be divided into forty-two characters each six bits in length, thus favoring rapid access to data that is 36 or 72 bits in length.
If those bits contain the number 3 or 7, each aligned 64-bit doubleword of memory will have four unused bits at its beginning (its most significant bits, or the most significant bits of the byte at the lowest address: for it is in an architecture that is big-endian that our scene lies) so that each 256-bit block of memory will be divided into forty characters each six bits in length, thus favoring rapid access to data that is 60 bits in length.
Branch, Conditional Jump, and Jump to Subroutine instructions will continue to address memory in bytes, and are to be used with base registers which contain all zeroes in their least significant five bits. This is also true of the Load Multiple and Store Multiple instructions, as they are to always operate on the full length of registers.
All instructions that address data will either address memory in 12-bit storage units, if they are used with
base registers which contain either
00011 in their
least significant five bits, or in 6-bit characters, if they are used with base registers which contain
00111 in their last five bits.
In this case, the displacement within the instruction and the values in any index register used are in units of 12-bit storage units or 6-bit characters, while the address in the base register, once its least significant five bits are masked out, is still in bytes. The address in the base register points to a 256-bit block of physical memory, while the portion of the address that is in storage units ignores the unused bits of each 256-bit block.
The intent of the three possible values at the end of the base register for 6-bit addressing is, in the case of the value 1, to make accessing aligned 48-bit values fast and efficient, in the case of the value 2, to make accessing aligned 36-bit and 72-bit values fast and efficient, and in the case of the value 3, to make accessing aligned 60-bit values fast and efficient.
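The three memory organizations can be compared by working out where a given 6-bit character lands. The sketch below assumes that characters are simply packed in ascending order after any unused bits, which the text implies but does not spell out. Offsets are counted in bits from the start of the area the base register points to, and the function name is hypothetical.

```python
BLOCK_BITS = 256


def char_bit_offset(mode: int, index: int) -> int:
    """Bit offset of 6-bit character number `index` for layout 1, 2 or 3.

    Values 5, 6 and 7 select the same layouts as 1, 2 and 3.
    """
    layout = mode & 0b11
    if layout == 1:  # no unused bits; blocks are used in groups of three
        return 6 * index
    if layout == 2:  # 4 unused bits, then 42 characters, per 256-bit block
        block, pos = divmod(index, 42)
        return block * BLOCK_BITS + 4 + 6 * pos
    if layout == 3:  # 4 unused bits, then 10 characters, per 64-bit doubleword
        dword, pos = divmod(index, 10)
        return dword * 64 + 4 + 6 * pos
    raise ValueError("mode does not select a 6-bit layout")
```

In the first layout, character 128 falls at bit 768, exactly at the end of a group of three blocks; in the other two layouts, characters 42 and 40 respectively are the first characters of the second block.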
When the length field contains the number 8, most of the same changes to memory addressing noted
above will be made in the same manner, but now it is recommended that instructions that access data
use base registers the contents of which end in
010000, so that
displacements within instructions and index register contents will be in units of 9-bit characters.
Using 6-bit addressing is also an option, at least for accessing floating-point numbers, if the length field contains 8; 12-bit addressing would not be compatible with 54-bit Medium floating-point variables.
However, using such a base register when dealing with character data would cause problems.
Finally, in addition to ending the base register contents with
00000 for normal access
to memory, if the length code is either
0011, another option is to
end base register contents with
10000; in this case, the data to which the base register is
pointing is in little-endian format.
In order for little-endian versions of data accessed by addresses in units of 6 bits, 9 bits, and 12 bits all to be interoperable, it would be necessary for the individual bits of the data to be reversed in memory; it is not clear to me at the moment whether support for little-endian data in non-power-of-two bit lengths is a useful feature.
The lengths of variables of the different floating-point types, for the different length values in the block header, are:
Length code    Single   Medium   Double   Extended
0000, 0010       32       48       64       128
0001, 0011       32       80       64       128
0100             36       48       60       120
0101             36       48       60        96
0110             36       48       72       120
0111             48       72       96       120
1000             36       54       72       108
For integer types, they are:
Length code     Byte   Halfword   Word   Long
0, 1, 2 or 3      8       16       32     64
4, 5, 6 or 7      6       12       24     48
8                 9       18       36     72
The exponent field of 36-bit floating-point numbers is one bit longer than the exponent field of 32-bit IEEE 754 floating-point numbers; otherwise the formats are similar except that the significand is three bits longer.
Note that with a length code of 1, 80-bit extended precision numbers are accessed using the Medium floating-point instructions; this keeps their alignment the same. In general, floating-point formats based on either 6-bit addressing or 9-bit addressing do not follow the scheme of alignments applicable to the corresponding standard types with length code 0. Length codes 7 and 8 approach this, for the single, medium, and double sizes of floating-point numbers, but because extended-precision floats cannot be longer than 128 bits, they cannot continue the scheme consistently by having extended precision floats that are 192 and 144 bits in length respectively.
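The remark about length codes 7 and 8 can be checked with a little arithmetic. The sketch below takes the floating-point sizes from the table above and reads "the scheme" as the 2:3:4:8 ratio of Single, Medium, Double, and Extended sizes found with length code 0. That reading is an interpretation, and the names are hypothetical.

```python
FLOAT_BITS = {}
for codes, sizes in [
    ((0b0000, 0b0010), (32, 48, 64, 128)),
    ((0b0001, 0b0011), (32, 80, 64, 128)),
    ((0b0100,), (36, 48, 60, 120)),
    ((0b0101,), (36, 48, 60, 96)),
    ((0b0110,), (36, 48, 72, 120)),
    ((0b0111,), (48, 72, 96, 120)),
    ((0b1000,), (36, 54, 72, 108)),
]:
    for code in codes:
        FLOAT_BITS[code] = sizes  # (Single, Medium, Double, Extended)


def follows_standard_scheme(sizes: tuple[int, int, int, int]) -> bool:
    """True if Single:Medium:Double:Extended is 2:3:4:8, as with length code 0."""
    single, medium, double, extended = sizes
    return (2 * medium == 3 * single
            and double == 2 * single
            and extended == 4 * single)


def extended_needed(sizes: tuple[int, int, int, int]) -> int:
    """Extended size that would continue the 2:3:4:8 scheme."""
    return 4 * sizes[0]
```

For length codes 7 and 8, `extended_needed` gives 192 and 144 bits, both beyond the 128-bit limit, which is why those codes only approach the scheme.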
On the other hand, alignments for integer variables are consistent across all length values, provided that for length values from 4 to 7, 6-bit addressing rather than 12-bit addressing is used, so that character data is accessible.
As noted above, instructions for transferring program control are not affected, because there are no alternate instruction sets built around a different length of storage unit.
As well, because the ordinary memory-reference instructions are made to use less opcode space by making use of the fact that aligned operands have addresses that end in a certain number of zeroes, depending on their size, they can't be modified by the length field either, as this ceases to be the case in alternate lengths. Instead, the length field only modifies the operation of the unaligned memory-reference instructions.
Because of that, the arithmetic instructions that can be modified by the length field contain an L bit, to indicate whether or not they are to be so modified; in this way, even when the length field contains a non-zero value, operations on data in standard sizes are also possible, and can be mixed with operations on data in the sizes specified by the length field.
The H bit, if set, affects floating-point instructions which meet the following conditions:
These floating-point instructions are caused to become the corresponding instructions which operate on floating-point numbers in the Compatible format instead of floating-point numbers in the Standard format.
The fifth format allows programs to include instructions that are longer than 32 bits. As well, it allows an instruction block to contain space which is not used for instructions, and it allows instructions to begin at any 16-bit boundary, instead of being governed by 32-bit boundaries.
Each bit in the instruction start field corresponds to 16 bits
in the remainder of the instruction block after the header. If a bit in that field is
1, the corresponding 16 bits of the instruction block are the first 16
bits of an instruction.
If the following bit in the instruction start field is set, or, in the case of the last bit in the instruction start field, if the E bit which follows that field is set, then that instruction is a 16-bit instruction. Since short instructions are explicitly indicated in the header, all 16 bits are available to specify the instruction, and thus their format is changed for the fifth header format.
Otherwise, the instruction that begins in the indicated 16 bits of the instruction block is usually a 32-bit instruction.
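The rule for telling 16-bit instructions apart from longer ones can be sketched as follows. The start bits are passed as a list, one per 16 bits after the header, in order, followed by the E bit. The alternate field, which refines the "32-bit or longer" case, is left out of this sketch, and the function name is hypothetical.

```python
def instruction_starts(start_bits: list[int], e_bit: int) -> list[tuple[int, str]]:
    """Return (halfword index, kind) for every instruction that starts in the block.

    An instruction is 16 bits long when the bit after its start bit is also
    set; for the last start bit, the E bit plays the role of the next bit.
    """
    following = start_bits[1:] + [e_bit]
    result = []
    for index, (bit, nxt) in enumerate(zip(start_bits, following)):
        if bit:
            result.append((index, "16-bit" if nxt else "32-bit or longer"))
    return result
```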
Each bit in the alternate field corresponds to 32 bits in the remainder of the instruction block, and to two consecutive bits in the instruction start field.
This field is used to indicate instructions that are 48 bits, 64 bits, or longer, in an alternative format with fewer overhead bits than the normal format for long instructions, which can only indicate long instructions that are 64 bits or 96 bits in length. When a bit is set in the alternate field, the last set bit of the two corresponding bits in the instruction start field indicates a longer instruction. So if only one of them is set, that bit indicates a longer instruction; if both of them are set, the first one must indicate a 16-bit instruction, since a new instruction starts 16 bits later.
While bits within the instruction aren't needed with this header format to distinguish between 16-bit and 32-bit instructions, the different lengths of longer instructions are distinguished within the instruction:
Length (bits)   Leading bits
48              11
64              101
96              1001
and so on.
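Since no listed pattern is a prefix of another, the length can be read off the leading bits without ambiguity. A minimal sketch, covering only the three patterns listed (the continuation implied by "and so on" is not guessed at), with hypothetical names:

```python
LENGTH_PREFIXES = [("11", 48), ("101", 64), ("1001", 96)]


def long_instruction_length(first_halfword: int):
    """Length in bits implied by the leading bits of an instruction's first
    16 bits, or None if no listed pattern matches."""
    bits = format(first_halfword & 0xFFFF, "016b")
    for prefix, length in LENGTH_PREFIXES:
        if bits.startswith(prefix):
            return length
    return None
```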
As normal long instructions all start with
0, they may also be indicated in this
manner, which means that it is not necessary to redefine all long instructions in the new
The sixth format has a header that takes up 64 bits instead of thirty-two, which reduces the number of 16-bit slots available for instructions by two. It combines the instruction start field from the fifth block format, now reduced to twelve bits in length; the break field from the second, third, and fourth block formats, now increased to twelve bits in length, as instructions may begin on any 16-bit boundary instead of only on 32-bit boundaries; and two sets of flag bits and predicated bits.
The first bit in each predicated field is not used; a bit that is set applies to any instructions that may start in either 16 bit slot to which it corresponds.
The seventh format drops one of the sets of flag bits and predicated bits in order to include an alternate field (again, the first bit is not used) like that in the fifth block format.
The complement of registers included with this architecture is as follows:
There are 32 integer registers, each of which is 64 bits in length, numbered from 0 to 31.
Registers 1 through 7 may be used as index registers.
Registers 25 through 31 may be used as base registers, each of which points to an area of 65,536 bytes in length.
Register 24 serves as a base register which points to an area 32,768 bytes in length.
Registers 9 through 15 may be used as base registers, each of which points to an area of 4,096 bytes in length.
At least part of the 4,096-byte area pointed to by register 8 will normally be used to contain up to 512 pointers, each 64 bits in length, for use in either Array Mode addressing or Address Table addressing.
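The roles of the integer registers listed above can be summarized in a small lookup. This is only a restatement of the list in code form, with hypothetical function names; the displacement widths are derived from the area sizes rather than stated in the text.

```python
def is_index_register(reg: int) -> bool:
    """Registers 1 through 7 may be used as index registers."""
    return 1 <= reg <= 7


def base_area_bytes(reg: int):
    """Size of the area a register points to as a base register, or None."""
    if 25 <= reg <= 31:
        return 65536
    if reg == 24:
        return 32768
    if 8 <= reg <= 15:  # register 8 points to the pointer table area
        return 4096
    return None


def displacement_bits(reg: int):
    """Bits needed to address every byte of the area (derived, not stated)."""
    size = base_area_bytes(reg)
    return None if size is None else (size - 1).bit_length()


# 4,096 bytes hold 512 pointers of 64 bits (8 bytes) each.
POINTER_SLOTS = 4096 // 8
```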
There are 32 floating-point registers, each of which is 128 bits in length, numbered from 0 to 31.
Floating point numbers in IEEE 754 format have exponent fields of different length, depending on the size of the number. For faster computation, floating-point numbers are stored in floating-point registers in an internal form which corresponds to the format in which extended precision floating-point numbers are stored in memory: with a 15-bit exponent field, and without a hidden first bit in the significand.
As 128-bit extended floating-point numbers are already in this format in memory, all floating-point numbers will fit in a 128-bit register, although shorter floating-point numbers are expanded.
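To make the expansion concrete, here is a sketch of how a 64-bit IEEE 754 number could be widened to a form with a 15-bit exponent and an explicit leading significand bit. The exponent bias of 16383 is an assumption, borrowed from the usual convention for 15-bit exponent fields; the internal register layout itself is not specified here. Subnormals, infinities, and NaNs are left out, and the function name is hypothetical.

```python
import struct


def expand_double(x: float) -> tuple[int, int, int]:
    """Return (sign, 15-bit biased exponent, 53-bit significand with its
    leading bit made explicit) for a normal double or zero."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exp11 = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp11 == 0 and frac == 0:
        return sign, 0, 0
    if exp11 == 0 or exp11 == 0x7FF:
        raise NotImplementedError("subnormals, infinities and NaNs are not handled")
    exp15 = exp11 - 1023 + 16383  # rebias from 11-bit to 15-bit (assumed bias)
    return sign, exp15, (1 << 52) | frac  # hidden bit becomes explicit
```

With every size expanded this way, arithmetic units would see one exponent width and never need to reinsert a hidden bit, which is presumably where the speed advantage comes from.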
However, the 32 floating-point registers may also be used for Decimal Floating-Point (DFP) numbers. These numbers will also be expanded into an internal form for faster computation, but that internal form may take more than 128 bits. In order to allow the floating-point registers to behave as if they are 160 bits long, the last four short vector registers are used to provide 32 additional bits to each of the floating-point registers.
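The choice of four short vector registers follows from the sizes involved, as this small calculation shows. It uses only the figures given in the text; how the extra bits are distributed among the four registers is not assumed, and the names are hypothetical.

```python
FP_REGISTERS = 32
FP_REGISTER_BITS = 128
DFP_INTERNAL_BITS = 160
SHORT_VECTOR_BITS = 256


def short_vector_registers_needed() -> int:
    """Short vector registers needed to extend every floating-point register."""
    extra_bits = FP_REGISTERS * (DFP_INTERNAL_BITS - FP_REGISTER_BITS)
    return -(-extra_bits // SHORT_VECTOR_BITS)  # ceiling division
```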
There are 16 short vector registers, each of which is 256 bits in length.
Each of these registers may contain:
As well, they may contain sixteen 16-bit short floating-point numbers in one of two formats.
These numbers all remain in these registers in the same format as that in which they appear in memory.
Also, there is a primary register group of eight long vector registers, and a scratchpad of sixty-four long vector registers, where each long vector register is composed of sixty-four floating-point registers, each 128 bits in length.
As for how data values are stored:
Signed integer values are stored in binary two's complement format.
The standard types of floating-point numbers used by this architecture are shown below:
The 32-bit and 64-bit floating-point formats correspond to those used in the IEEE 754 specification. For normal operation, a similar 48-bit floating-point format is added.
The 80-bit floating-point format, except for being stored in big-endian order, is that found on popular microcomputers, and the 128-bit format is the same format except for the significand being larger.
For code executed with a length code other than 0, additional floating-point formats have been defined, also based on those used in the IEEE 754 specification.
Of particular interest is the choice to lengthen the exponent by one bit for the 36-bit floating-point format. Because IEEE 754 floats have a hidden first bit, this can be done while still leaving the numbers the same level of precision as floats on the IBM 7090, and it extends the exponent range to match that of floating-point numbers on the IBM System/360; thus, this particular choice facilitates the conversion of FORTRAN programs from either of those systems. As well, the 48-bit floating-point format was given the minimum exponent length consistent with allowing the exponent range to fully include numbers from 10^-99 to 10^99; this left the format with eleven digits of precision, thus making it as comparable as possible to the numerical range made available by typical scientific pocket calculators.
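The figures in this paragraph can be reproduced with a short calculation. The sketch assumes an IEEE 754-style layout (one sign bit, a biased exponent whose largest normal exponent is 2^(e-1) - 1, and a hidden leading bit) and considers normal numbers only. The function names are hypothetical.

```python
import math


def exponent_range(exp_bits: int) -> tuple[int, int]:
    """(smallest, largest) binary exponent of normal numbers."""
    emax = 2 ** (exp_bits - 1) - 1
    return 1 - emax, emax


def min_exponent_bits(decimal_exponent: int) -> int:
    """Smallest exponent width whose normal range includes 10**-n to 10**n."""
    need = decimal_exponent * math.log2(10)
    bits = 2
    while True:
        emin, emax = exponent_range(bits)
        if emax >= need and emin <= -need:
            return bits
        bits += 1


def precision_bits(total_bits: int, exp_bits: int) -> int:
    """Significand precision: stored fraction bits plus the hidden bit."""
    return total_bits - 1 - exp_bits + 1


def decimal_digits(total_bits: int, exp_bits: int) -> float:
    """Approximate precision in decimal digits."""
    return precision_bits(total_bits, exp_bits) * math.log10(2)
```

Under these assumptions, a range of 10^-99 to 10^99 needs a 10-bit exponent, leaving the 48-bit format 38 bits of precision, or a little over eleven decimal digits. The 36-bit format, with an exponent one bit longer than the 8 bits of the 32-bit IEEE 754 format, keeps 27 bits of precision.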