This is now my eleventh attempt to propose a successor to my original Concertina architecture. Building on previous attempts, it is intended to be reasonably simple, regardless of the overhead cost of achieving that goal.
Programs consist of 256-bit blocks of program code, each of which contains sixteen 16-bit instruction slots.
An instruction consists of a basic portion and an optional non-basic portion which immediately follows the basic portion.
The last instruction slot in the basic portion of an instruction contains 0 in its initial bit.
If the basic portion of an instruction occupies more than one instruction slot, all preceding instruction slots so occupied contain 1 in their initial bit.
All the bits in the non-basic portion of an instruction are available to contain data; thus, immediate values which are 16 bits or longer are placed in the non-basic portion of an instruction.
Instructions may not cross block boundaries.
The first bit in the last instruction slot of a block will be 0 if, and only if, every instruction slot in that block is part of the basic portion of an instruction, and none of the instructions in the block have a non-basic portion.
If the first bit of the final instruction slot in a block is 1, then that instruction slot is not part of any instruction. Instead, each of the remaining bits in that instruction slot corresponds, in order, to one of the remaining instruction slots in the block. If that bit is 1, it corresponds to an instruction slot which contains part of the basic portion of an instruction. If that bit is 0, it corresponds to an instruction slot which does not contain part of the basic portion of an instruction. It may contain part of the non-basic portion of an instruction, or it may be unused.
Normally, since the non-basic portion of an instruction always immediately follows the basic portion of the same instruction, and because instructions do not cross block boundaries, if the final instruction slot of a block begins with 1, the second bit in that instruction slot will also be 1. However, that is not necessarily the case.
In the event that the final instruction slot of a block begins with 10, the contiguous string of zeroes immediately following the one in the first bit corresponds to the instruction slots which contain the block header. Block headers may be used for purposes such as the explicit indication of instruction parallelism, and will occupy one or more consecutive instruction slots in a block, beginning with the first instruction slot.
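The rules above for the final instruction slot can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `decode_block_layout` is hypothetical, a block is taken to be a list of sixteen integers, and the "first bit" of a slot is assumed to be its most significant bit.

```python
def decode_block_layout(slots):
    """Return (header_len, basic) for a block of sixteen 16-bit slots.

    basic[i] is True when slot i holds part of the basic portion of an
    instruction. Assumption: the "first bit" of a slot is its MSB.
    """
    if len(slots) != 16:
        raise ValueError("a block has exactly sixteen 16-bit slots")
    last = slots[15]
    if (last >> 15) == 0:
        # Final slot begins with 0: every slot is basic-portion code.
        return 0, [True] * 16
    # Final slot begins with 1: its other 15 bits map, in order, onto
    # slots 0..14, so the bit for slot i is bit (14 - i).
    flags = [bool((last >> (14 - i)) & 1) for i in range(15)]
    # Zeroes directly after the leading 1 mark the header slots.
    header_len = 0
    while header_len < 15 and not flags[header_len]:
        header_len += 1
    # The final slot itself is a bit map, not part of any instruction.
    return header_len, flags + [False]
```

For a final slot of `0xA000` (binary `1010...`), this sketch reports a one-slot header followed by an instruction beginning in the second slot.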
The opcode space for instructions having a basic portion that is 16 bits in length includes all possible bit combinations from 0000... to 0111..., and the opcode space for instructions having a basic portion that is 32 bits in length includes all possible bit combinations from 1000... to 1111..., and the same is true for instructions that are 48 bits in length, or 64 bits in length, and so on.
Instructions with a non-basic portion are placed within the opcode space for the instructions with a basic portion of their length, with the opcode bits indicating the presence and length of the non-basic portion.
The block formats currently defined are illustrated below:
The first format shows the appearance of a block which consists entirely of instructions that do not have a non-basic portion; thus, the last instruction slot in the block is also the last instruction slot within the basic portion of an instruction, and begins with zero.
The second format shows the appearance of a block which does not have a header portion, but which is not completely filled with the basic portions of instructions. Therefore, the last instruction slot of the block begins with 1, and contains bits which indicate which instruction slots in the block do contain the basic portion of some instruction. The first of those bits must also be 1 when there is no header.
The third format shows one example of a block with a header. Here, the header is 16 bits long, spanning only one instruction slot, and so the last instruction slot in the block begins with 1, indicating that it contains bits showing where the basic portions of instructions are, and those bits, in the remaining 15 bit positions of that instruction slot, begin with 01, indicating that one instruction slot contains the header, followed by the first instruction of the block.
The header in this format contains bits, each of which begins a group of instructions that may be executed in parallel. Those bits occupy the last 14 bit positions of the first instruction slot, as only 14 instruction slots remain available in the block. The first of them contains a 1, as a group of instructions to be executed in parallel cannot cross block boundaries.
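As a rough illustration of how such a field divides the block, the sketch below splits the fourteen slots that follow a one-slot header into parallel groups. The name `split_parallel_groups` is hypothetical, slots are numbered from 0, and the first bit of the field is assumed to be its most significant bit.

```python
def split_parallel_groups(break_bits, first_slot=1, count=14):
    """Split slots first_slot .. first_slot+count-1 into parallel groups.

    break_bits is a count-bit integer; a 1 bit marks the slot at which
    a new group begins. Assumption: the highest bit is the first slot.
    """
    groups = []
    for k in range(count):
        starts_group = (break_bits >> (count - 1 - k)) & 1
        # The first slot always opens a group, since groups cannot
        # cross block boundaries.
        if starts_group or not groups:
            groups.append([])
        groups[-1].append(first_slot + k)
    return groups
```

With the field `0b10010000000000`, the slots numbered 1 to 3 form one group and the slots numbered 4 to 14 form a second.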
The fourth format shows a block where the header indicates which instruction slots may be branch targets; a bit in the target field that is set to one indicates that the corresponding instruction slot may be the target of a branch.
The fifth format shows a block which combines explicit indication of parallelism with predication. The flag field specifies one of the sixteen flag bits. Where a bit in the predicated field is set to one, the instruction that begins in the corresponding instruction slot is executed only if the flag is set when the S bit is zero, or only if the flag is cleared when the S bit is one.
The break field is only twelve bits long; the bits in it correspond to the fourth through fifteenth instruction slots in the block; a bit that is a 1 corresponds to the beginning of a series of instructions that may be executed simultaneously. The first instruction slot available for instructions in this block format, the third instruction slot, does not have a corresponding bit in this field, as that bit, which would always be a one, is omitted to conserve space here.
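The predication rule and the omitted break bit can be expressed compactly. The following is a sketch only: both function names are hypothetical, and it assumes that an instruction whose predicated bit is zero executes unconditionally, which the description implies but does not state outright.

```python
def should_execute(predicated_bit, s_bit, flag_value):
    """Decide whether a predicated instruction runs.

    S = 0: run only if the selected flag is set.
    S = 1: run only if the selected flag is cleared.
    Assumption: a zero predicated bit means "always execute".
    """
    if not predicated_bit:
        return True
    return bool(flag_value) == (s_bit == 0)


def expand_break_field(field12):
    """Restore the omitted, always-one bit for the third slot.

    Returns a 13-bit value covering the third through fifteenth
    instruction slots, highest bit first.
    """
    return (1 << 12) | (field12 & 0xFFF)
```

Restoring the implied bit first means a decoder can treat the break field of this format the same way as one in which every slot has an explicit bit.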
Removing one bit from each of the 16-bit instruction slots in an instruction already severely limits the available opcode space compared to what is needed in order to provide the desired functionality within 16-bit and 32-bit instructions.
But it may not be clear that even that is adequate to permit each instruction to be decoded independently in parallel.
If, for example, 16-bit instructions all began with
00, the two instruction
slots in a 32-bit instruction began with
the three instruction slots in a 48-bit instruction began with
1 respectively, then it would be obvious that each 16-bit instruction
slot could be processed independently, with the contents of succeeding instruction slots then
read if they are also part of the instruction. This scheme, however, would impose an excessive
level of overhead.
In the format given, some extra processing is needed, but it remains possible to process each instruction independently.
The first step in instruction decoding is decoding the block format. If the final instruction slot begins with 0, it is part of the basic portion of an instruction, and all the instruction slots in the block are part of the basic portion of an instruction. Otherwise, that instruction slot contains a bit map which indicates which instruction slots belong to the basic portion of an instruction.
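Once that is handled, it's possible to immediately determine if an instruction slot contains the beginning of the basic portion of an instruction; this is true if the slot belongs to the basic portion of an instruction, and the instruction slot before it either does not belong to the basic portion of an instruction (or does not exist, as for the first slot of a block), or begins with 0, which means that it is the last instruction slot within the preceding instruction.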
Thus, each instruction can then be processed in a self-contained manner, starting from either the beginning or the end of the instruction, and proceeding to the other end to determine the length of the instruction and then commence decoding.
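A minimal sketch of this per-slot test follows. It assumes the block format has already been decoded into a list `basic` of booleans (one per slot, true where the slot belongs to a basic portion), and that the first bit of a slot is its most significant bit; all names here are hypothetical.

```python
def leading_bit(slot):
    """First (most significant) bit of a 16-bit instruction slot."""
    return (slot >> 15) & 1


def is_instruction_start(slots, basic, i):
    """True if slot i begins the basic portion of an instruction.

    Only slot i and its predecessor are examined, so every slot can
    be tested independently of the others.
    """
    if not basic[i]:
        return False
    return i == 0 or not basic[i - 1] or leading_bit(slots[i - 1]) == 0


def basic_length(slots, basic, start):
    """Length, in slots, of the basic portion beginning at start."""
    end = start
    # Slots beginning with 1 are continued by the next slot; the slot
    # beginning with 0 is the last one of the basic portion.
    while leading_bit(slots[end]) == 1:
        end += 1
        if end >= len(slots) or not basic[end]:
            raise ValueError("basic portion lacks a final slot beginning with 0")
    return end - start + 1
```

For example, with slots `[0x1234, 0x8000, 0x0001, 0xFFFF, 0x0002]`, where the fourth slot holds data rather than basic-portion code, instructions begin in the first, second, and fifth slots, and the second instruction has a two-slot basic portion.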
The complement of registers included with this architecture is as follows:
There are 32 integer registers, each of which is 64 bits in length, numbered from 0 to 31.
Registers 1 through 7 may be used as index registers.
Registers 29 through 31 may be used as base registers, each of which points to an area of 32,768 bytes in length.
Registers 9 through 15 may be used as base registers, each of which points to an area of 4,096 bytes in length.
At least part of the area of 4,096 bytes in length pointed to by register 8 will normally be used to contain up to 512 pointers, each 64 bits in length, for use in either Array Mode addressing or Address Table addressing.
There are 32 floating-point registers, each of which is 128 bits in length, numbered from 0 to 31.
Floating point numbers in IEEE 754 format have exponent fields of different length, depending on the size of the number. For faster computation, floating-point numbers are stored in floating-point registers in an internal form which corresponds to the format in which extended precision floating-point numbers are stored in memory: with a 15-bit exponent field, and without a hidden first bit in the significand.
As 128-bit extended floating-point numbers are already in this format in memory, all floating-point numbers will fit in a 128-bit register, although shorter floating-point numbers are expanded.
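To make the expansion concrete, here is a hedged sketch of how a 64-bit IEEE 754 number might be widened into such an internal form. The layout assumed here (1 sign bit, a 15-bit exponent with bias 16383, and a 112-bit significand with an explicit leading bit) is an inference from the description above, not a confirmed specification, and the function name is hypothetical.

```python
import struct

EXP_BIAS_INTERNAL = 16383  # assumed bias for a 15-bit exponent field
SIG_BITS = 112             # 128 bits - 1 sign bit - 15 exponent bits


def expand_double(x):
    """Return (sign, exponent, significand) for a normal double or zero."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0 and frac == 0:
        return sign, 0, 0
    if exp == 0 or exp == 0x7FF:
        raise ValueError("subnormals, infinities and NaNs are outside this sketch")
    # Make the hidden bit explicit, then left-align the 53-bit
    # significand within the wider 112-bit field.
    significand = ((1 << 52) | frac) << (SIG_BITS - 53)
    return sign, exp - 1023 + EXP_BIAS_INTERNAL, significand
```

Under these assumptions, 1.0 expands to a biased exponent of 16383 and a significand with only its top bit set.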
However, the 32 floating-point registers may also be used for Decimal Floating-Point (DFP) numbers. These numbers will also be expanded into an internal form for faster computation, but that internal form may take more than 128 bits. In order to allow the floating-point registers to behave as if they are 160 bits long, the last four short vector registers are used to provide 32 additional bits to each of the floating-point registers.
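The arithmetic works out exactly: four 256-bit registers hold 1,024 bits, which is 32 bits for each of the 32 floating-point registers. The sketch below checks this and shows one possible assignment of extension bits; the mapping itself, the numbering of the short vector registers from 0 to 15, and the name `extension_location` are all assumptions made for illustration.

```python
FP_REGISTERS = 32
SHORT_VECTOR_BITS = 256
EXTENSION_BITS = 160 - 128  # extra bits needed per floating-point register

# Four short vector registers supply exactly the extra bits required.
assert 4 * SHORT_VECTOR_BITS == FP_REGISTERS * EXTENSION_BITS


def extension_location(fp_reg):
    """Hypothetical mapping: (short vector register, 32-bit lane)."""
    if not 0 <= fp_reg < FP_REGISTERS:
        raise ValueError("floating-point registers are numbered 0 to 31")
    lanes_per_vector = SHORT_VECTOR_BITS // EXTENSION_BITS  # 8 lanes
    # Assumed numbering: the last four short vector registers are 12..15.
    return 12 + fp_reg // lanes_per_vector, fp_reg % lanes_per_vector
```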
There are 16 short vector registers, each of which is 256 bits in length.
Each of these registers may contain:
As well, they may contain sixteen 16-bit short floating-point numbers in one of two formats.
These numbers all remain in these registers in the same format as that in which they appear in memory.
Also, there is a primary register group of eight long vector registers, and a scratchpad of sixty-four long vector registers, where each long vector register is composed of sixty-four floating-point registers, each 128 bits in length.
As for how data values are stored:
Signed integer values are stored in binary two's complement format.
The standard types of floating-point numbers used by this architecture are shown below:
The 32-bit and 64-bit floating-point formats correspond to those used in the IEEE 754 specification. For normal operation, a similar 48-bit floating-point format is added.
The 80-bit floating-point format, except for being stored in big-endian order, is that found on popular microcomputers, and the 128-bit format is the same format except for the significand being larger.
For code executed with a length code other than 0, additional floating-point formats have been defined, also based on those used in the IEEE 754 specification.
Of particular interest is the choice to lengthen the exponent by one bit for the 36-bit floating-point format. Because IEEE 754 floats have a hidden first bit, this can be done while still leaving the numbers the same level of precision as floats on the IBM 7090, and it extends the exponent range to match that of floating-point numbers on the IBM System/360; thus, this particular choice facilitates the conversion of FORTRAN programs from either of those systems. As well, the 48-bit floating-point format was given the minimum exponent length consistent with allowing the exponent range to fully include numbers from 10^-99 to 10^99; this left the format with eleven digits of precision, thus making it as comparable as possible to the numerical range made available by typical scientific pocket calculators.
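These choices can be checked with a little arithmetic. The sketch below assumes IEEE 754 style conventions (a sign bit, a biased exponent whose all-zeros and all-ones codes are reserved, and a hidden first bit), and it assumes the 36-bit format has a 9-bit exponent, one more than the 8 bits of the 32-bit format. The function names are hypothetical.

```python
import math


def min_exponent_bits(max_decimal_exponent):
    """Smallest exponent width whose normal range covers 10**±max."""
    e = 2
    while True:
        bias = 2 ** (e - 1) - 1
        # The smallest normal number is 2**(1 - bias); the large end of
        # the range reaches further, so this is the binding condition.
        if (bias - 1) * math.log10(2) >= max_decimal_exponent:
            return e
        e += 1


def significand_bits(total_bits, exponent_bits):
    """Precision in bits, counting the hidden first bit."""
    return total_bits - 1 - exponent_bits + 1


def decimal_digits(total_bits, exponent_bits):
    """Whole decimal digits of precision the format provides."""
    return math.floor(significand_bits(total_bits, exponent_bits) * math.log10(2))
```

Under these assumptions, covering 10^-99 to 10^99 needs a 10-bit exponent, which leaves a 48-bit format with 38 bits, or eleven decimal digits, of precision; a 36-bit format with a 9-bit exponent keeps 27 bits of precision, the same as the fraction length of IBM 7090 floats.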