
The Concertina II Architecture

This is the fourteenth attempt on my part to propose a successor to my original Concertina architecture. Once again, it builds on previous attempts. A major goal is to keep overhead to a minimum and to ensure that program code is compact.

The design attempts to combine many of the benefits of RISC, CISC, and VLIW architectures.

The basic instruction set consists entirely of 32-bit instructions. There are 32 integer general registers and 32 floating-point registers. Instructions that perform arithmetic or logical operations include a bit that controls whether they may change the condition codes. These are characteristics found in RISC architectures.

Having register banks of 32 registers allows different calculations to be intertwined in the code. Being able to control whether instructions affect the condition codes allows more intervening instructions between an instruction that sets the condition codes and the branch instruction that uses those results. Both of these things allow code to be written to obtain some of the same benefits as out-of-order execution, without its hardware overhead. At the microprocessor clock rates in use today, these measures are normally not enough on their own. However, if code written this way is combined with simultaneous multi-threading (SMT), there is still the potential to compete with out-of-order execution.

It is also possible to place a pair of 16-bit instructions in a 32-bit instruction slot in the basic instruction set. The most common register-to-register arithmetic operations are available this way, with the restriction that both operands must be within the same eight-register area of the register bank used. This again facilitates intertwining multiple independent computations.

Instructions are organized into 256-bit blocks which contain eight 32-bit instruction slots.

The first instruction slot may contain a header which indicates that several instruction slots are not to be decoded.

In this way, all the instructions in a block can be decoded in parallel, and yet immediate values corresponding in length to any of the major data types used in the architecture can be placed in the block without having to allow for instructions having a large number of different possible lengths. This avoids the need for a separate memory access when a program makes use of a constant value.

Headers have other functions as well. The header can provide instruction predication, so that some instructions can be marked for conditional execution without the need for branching.

As well, it can be indicated which instructions depend on the results of previous instructions, or which instructions cannot execute simultaneously with preceding instructions for other reasons.

This allows rapid superscalar execution of suitable code without the overhead of interlock and out-of-order circuitry, and is a feature of VLIW designs.

The header may also indicate that the instructions in a block are from the extended instruction set, instead of the basic instruction set. In the extended instruction set, instructions may be 16, 32, 48 or 64 bits long, exclusive of immediate values. This allows for an instruction set that offers a rich set of potential functions, similar to that offered with CISC architectures.

As well, an additional security feature is available which restricts branch instructions to destinations that are explicitly indicated as branch targets. Some header formats therefore include a field indicating the permissible branch targets in the block. (In addition, a 16-bit instruction is provided that indicates that it is itself a permissible branch target.)


Programs consist of 256-bit blocks of program code. Each block contains eight 32-bit instruction slots when no header is present, and also for most header formats. For some other header formats, the block may instead be viewed as consisting of sixteen 16-bit instruction slots.

A 32-bit instruction slot will normally contain either a single 32-bit instruction or a pair of 15-bit instructions, each preceded by a 0 bit.

That 32-bit instruction may begin with 1, in which case it will usually be a memory-reference instruction, or its first and second 16-bit halves may begin with 0 and 1 respectively, in which case it will usually be a register-to-register instruction.

There is, however, some reserved opcode space within the 32-bit instruction format, and it allows for one other possibility: the beginning of the block may be a header that contains additional information to control the decoding of the block.

Instructions may not cross block boundaries.


The block prefixes currently defined are illustrated below:


A block may also consist entirely of instructions without a prefix.


To allow parallel decoding of instructions, one of two conditions holds. Either all instructions are the same length, 32 bits, or a redundant encoding for instruction length is used, as seen in the eighth through fourteenth header formats. That encoding allows each 16-bit slot in an instruction block to be examined independently to determine what decoding action is to be taken with it.


The first format for a header is a 32-bit register-to-register instruction which is also a header at the same time.

In this way, the overhead required to allow immediate values for instructions is kept to an absolute minimum.

Instructions operate on data values that may be 8, 16, 32 or 64 bits in length, as well as on values of other lengths. Including immediate values of any of these types within an instruction would make decoding instruction lengths inordinately complex.

Therefore, immediate values are replaced by pseudo-immediates. In a register-to-register instruction, the contents of one source register field are replaced by a five-bit pointer to the value to be used. This pointer points within the 256-bit block that contains the instruction. A 256-bit instruction block is fetched as a unit, so the immediate data is already in an internal register used for processing instructions. Although the pseudo-immediate is addressed by a pointer, it therefore keeps the advantage of a genuine immediate: no separate memory access is required to fetch it.

In the first header format, we have the following elements:

Belonging to it as a header, we have a decode field; this would normally contain a value from 0 through 6, one less than the number of 32-bit instruction slots in the block which will actually contain 32-bit instructions. This instruction/header is included in that count.
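As an illustration only, the following Python sketch shows how a decoder might classify the eight 32-bit slots of a block from the decode field. It assumes that the instruction slots come first and that any remaining slots hold pseudo-immediate data, as the count-only encoding suggests. The function name `slot_roles` is hypothetical.

```python
def slot_roles(decode: int) -> list[str]:
    """Classify the eight 32-bit slots of a block from its decode field.

    Assumption: slots 0..decode (the instruction/header in slot 0 included)
    hold instructions, and any later slots hold pseudo-immediate data.
    """
    # 0 through 6 is the normal range; 7 would mean every slot is decoded.
    if not 0 <= decode <= 7:
        raise ValueError("decode value out of range")
    return ["instruction" if slot <= decode else "data" for slot in range(8)]


# A decode value of 5 means six instruction slots, leaving two for data.
print(slot_roles(5))
```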

There is a position field. It is intended that this combined header and instruction format will be useful if any of the instructions within a block, not just the first instruction, is an instruction with a pseudo-immediate value which also fits within the limitations of this instruction/header format.

If the position field contains 0, this immediate-mode instruction is actually the first instruction in the block. If it contains 1, it is to be treated as the second instruction in the block. The 32-bit instruction (or pair of 15-bit instructions) in instruction slot 1 is executed first, followed by the instruction/header from slot 0, and then the instructions in slot 2 and later slots are executed in order. Similarly, if it contains 2, the instructions in slots 1 and 2 are executed before the instruction/header from slot 0, and the rest of the block proceeds normally. Larger values work the same way.
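The reordering described by the position field can be sketched in Python as follows. Slot numbers are 0-based. The parameter `instruction_slots` is the number of slots that actually hold instructions (the decode value plus one). The function name is hypothetical, and the sketch only models the execution order stated above, not how hardware would implement it.

```python
def execution_order(position: int, instruction_slots: int = 8) -> list[int]:
    """Return the order in which slots execute when the combined
    instruction/header in slot 0 carries the given position field."""
    if not 0 <= position < instruction_slots:
        raise ValueError("position must name one of the instruction slots")
    later = list(range(1, instruction_slots))  # slots 1, 2, ... in order
    # The instruction/header from slot 0 is placed after `position` of them.
    return later[:position] + [0] + later[position:]


print(execution_order(2))  # slots 1 and 2 run before slot 0
```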

Belonging to it as an instruction, we have a nine-bit opcode field, split into a four-bit part and a five-bit part; we have a C bit, which, if it is 1, indicates that the instruction is allowed to affect the condition code bits, we have a five-bit destination register field, and we have a five-bit pImm field, containing a byte pointer within the 256-bit instruction block to the source operand of the instruction.
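A five-bit byte pointer can address each of the 32 bytes in a 256-bit block. The sketch below makes that concrete by extracting an operand of a given size from a fetched block and reading it big-endian, as the architecture specifies. The text does not state whether operands must be aligned or how signedness is chosen, so the range checks and the `signed` flag are assumptions. The name `pseudo_immediate` is hypothetical.

```python
def pseudo_immediate(block: bytes, pimm: int, size: int,
                     signed: bool = False) -> int:
    """Fetch a `size`-byte operand at byte offset `pimm` of a 32-byte block."""
    if len(block) != 32:
        raise ValueError("an instruction block is 256 bits (32 bytes)")
    if not 0 <= pimm <= 31:
        raise ValueError("pImm is a five-bit byte pointer")
    if size not in (1, 2, 4, 8):
        raise ValueError("operand size must be 1, 2, 4 or 8 bytes")
    if pimm + size > len(block):
        raise ValueError("operand would run past the end of the block")
    # Big-endian: the byte at the lowest offset is the most significant.
    return int.from_bytes(block[pimm:pimm + size], "big", signed=signed)


# A block whose last 32-bit slot holds the constant 0x12345678.
block = bytes(28) + bytes([0x12, 0x34, 0x56, 0x78])
print(hex(pseudo_immediate(block, 28, 4)))
```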

Note that this combined instruction/header format is not the only format that instructions having immediate values may have. If a block only contains instructions with immediates that don't fit within the limitations of this format, the block will need to have a header, to indicate that some 32-bit instruction slots aren't decoded as instructions, but now the header will have to be at least a full 32 bits long, and the advantage of this combined format in still allowing a full eight instructions in the block is no longer available.


The remaining header formats are the pure header formats, where the header is not also an instruction.


The second format of header shown in this diagram provides for predicated instructions.

Indicating that an instruction is not to be executed, called predication, can be more efficient than using a conditional branch to skip over the instruction.

In this block format, the flag field is only three bits long, and so only flag bits 0 through 7 may be used to control the execution of instructions.

In the predicated field, there is a bit corresponding to each of the seven remaining instruction slots of the block. If that bit is set, the instruction in that instruction slot will depend on the value of the flag bit in order to execute.

There is also an S bit. If that bit is 0, execution is controlled normally; predicated instructions execute if the flag bit used to control them is set. If that bit is 1, those instructions execute if the flag bit is clear instead.
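The rule for predicated slots can be summarized in a short Python sketch. The predicated field is modeled as a list of seven booleans for slots 1 through 7. The flag bits are modeled as an integer with flag n in bit n. Both representations, and the function names, are assumptions made for illustration.

```python
def slot_executes(predicated: bool, flag_set: bool, s_bit: int) -> bool:
    """Decide whether one slot executes under the S-bit rule."""
    if not predicated:
        return True  # unpredicated instructions always execute
    # S = 0: execute when the flag is set; S = 1: execute when it is clear.
    return flag_set if s_bit == 0 else not flag_set


def executing_slots(predicated_field: list[bool], flags: int,
                    flag_index: int, s_bit: int) -> list[int]:
    """Return the slots (1..7) whose instructions execute in this block."""
    flag_set = bool((flags >> flag_index) & 1)
    return [slot for slot, predicated in enumerate(predicated_field, start=1)
            if slot_executes(predicated, flag_set, s_bit)]


# Slots 1 and 7 are predicated on flag 2, which is currently clear.
print(executing_slots([True, False, False, False, False, False, True],
                      flags=0, flag_index=2, s_bit=0))
```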

As well, there is a break field. If the bit in that field for an instruction slot is 1, the instruction in that slot may not be executed simultaneously with the instructions that precede it. Instructions for which that bit is 0 are executed simultaneously with those that precede them. They do not wait for those instructions even to be decoded far enough to determine what resources they use, as interlocks would require.
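The break field can be viewed as a grouping step. Each instruction whose break bit is 1 starts a new group. Each instruction whose bit is 0 joins the group of the instructions before it. The Python sketch below assumes that the first instruction in the block always starts a group, and the name `issue_groups` is hypothetical.

```python
def issue_groups(slots: list[int], break_bits: list[bool]) -> list[list[int]]:
    """Split instruction slots, in program order, into groups that may be
    executed simultaneously, according to their break bits."""
    if len(slots) != len(break_bits):
        raise ValueError("one break bit is needed per instruction slot")
    groups: list[list[int]] = []
    for slot, brk in zip(slots, break_bits):
        if brk or not groups:
            groups.append([slot])  # must wait for the preceding instructions
        else:
            groups[-1].append(slot)  # issues alongside its predecessors
    return groups


print(issue_groups([1, 2, 3, 4, 5], [False, False, True, False, True]))
```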

There is also a decode field in this header format, and so in this block format, instructions with pseudo-immediates may be present.


The third header format provides for predicated instructions.

Indicating that an instruction is not to be executed, called predication, can be more efficient than using a conditional branch to skip over the instruction.

As in the second block format, the flag field is only three bits long, so only flag bits 0 through 7 may be used to control the execution of instructions.

In the predicated field, there is a bit corresponding to each of the seven remaining instruction slots of the block. If that bit is set, the instruction in that instruction slot will depend on the value of the flag bit in order to execute.

There is also an S bit. If that bit is 0, execution is controlled normally; predicated instructions execute if the flag bit used to control them is set. If that bit is 1, those instructions execute if the flag bit is clear instead.

As well, this block format provides for the indication of instructions the meaning of which is modified by the instruction preceding them in the instruction stream.

Normally, since an entire block of 256 bits may be fetched at once, with the contents of each 32-bit (or 16-bit) instruction slot interpreted in parallel following the decoding of the block header, there is a potential for issues if any feature is included in the instruction set which involves one instruction modifying how following instructions are interpreted.

This block format prevents such issues. The bit in the prefixed field is set to 1 for any instruction slot containing an instruction modified in this way.


The fourth format provides for predicated instructions.

Indicating that an instruction is not to be executed, called predication, can be more efficient than using a conditional branch to skip over the instruction.

In the flag field, one of sixteen flag bits that can be set or cleared based on a condition is indicated.

In the predicated field, there is a bit corresponding to each of the seven remaining instruction slots of the block. If that bit is set, the instruction in that instruction slot will depend on the value of the flag bit in order to execute.

There is also an S bit. If that bit is 0, execution is controlled normally; predicated instructions execute if the flag bit used to control them is set. If that bit is 1, those instructions execute if the flag bit is clear instead.

As well, this block format contains a target field; bits in that field which are set to 1 indicate that the corresponding 32-bit instruction slot, of the seven remaining in the block, is a permissible target for a branch instruction, when the controlled branch feature is activated.
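A check against the target field might look like the following Python sketch. The field is modeled as seven booleans for slots 1 through 7. How a branch to the header slot itself is treated is not stated here, so the sketch simply rejects slot numbers the field does not cover. The function name is hypothetical.

```python
def branch_allowed(target_field: list[bool], slot: int,
                   controlled_branching: bool = True) -> bool:
    """Check whether a branch may land on the given 32-bit slot."""
    if not controlled_branching:
        return True  # without the feature, any slot may be a destination
    if not 1 <= slot <= 7:
        raise ValueError("the target field covers slots 1 through 7 only")
    return target_field[slot - 1]


field = [False, True, False, False, False, False, True]
print(branch_allowed(field, 2), branch_allowed(field, 3))
```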

There is also a decode field in this header format, and so in this block format, instructions with pseudo-immediates may be present.


The fifth header format allows an instruction format consisting primarily of 16-bit instructions to be used. When the mode bits are 00, the instruction format is as described on the page about 16-bit instructions. As well, it allows those instructions which may be the targets of branch instructions to be indicated.


The sixth header format uses the same instruction format, emphasizing 16-bit instructions, as the fifth and seventh block formats. In addition, it allows instructions modified by a preceding instruction in the instruction stream proper to be indicated.


The seventh header format allows a block consisting primarily of 16-bit instructions to have a header which is only 16 bits long. When the mode bits are 00, the instruction format is as described on the page about 16-bit instructions.


The eighth header format allows the use of an extended form of the instruction set instead of the basic instruction set. Four prefix bits are provided for each 32-bit instruction slot, making the 32-bit instructions effectively 36 bits long.

This instruction set is described on the page concerning Extended Format Instructions.

When the prefix bits are 1110, this indicates that the 32-bit instructions of the standard instruction set are used; this includes the possibility of placing two 16-bit instructions in the instruction slot. The 32-bit operate instructions are in this set, and as they already have enough opcode space allocated to them, no alternate forms are added in the extended instruction set.

When the prefix bits are 1111, this indicates that there is no 32-bit instruction in this instruction slot. In the fourth and fifth header modes, that means that the instruction slot is not used for an instruction, and may be used for a pseudo-immediate value; as we shall see, in the sixth, seventh, and eighth header modes, this has a somewhat different meaning.

The prefix bit values of 1100 and 1101 also have a special meaning, instead of forming part of a 32-bit instruction that is effectively 36 bits long. 1100 indicates the first of two 32-bit instruction slots that will form a 64-bit block, and 1101 indicates the second of those instruction slots (this may not be necessary, and 1111 may also work for this purpose).

The contents of such a block will be determined by its first few bits, as follows:

First bits   Contents
0            A 16-bit instruction, followed by a 48-bit instruction
10           A 48-bit instruction, followed by a 16-bit instruction
11           A 64-bit instruction
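The table above amounts to a two-bit decode of the 64-bit area formed by the slots prefixed 1100 and 1101. The Python sketch below treats that area as a 64-bit integer whose most significant bit is its first bit, which is consistent with the architecture being big-endian. The function name is hypothetical.

```python
def pair_layout(area: int) -> list[int]:
    """Return the instruction lengths, in bits, packed into a 64-bit area."""
    if not 0 <= area < 1 << 64:
        raise ValueError("area must be a 64-bit value")
    if area >> 63 == 0:
        return [16, 48]  # leading 0: a 16-bit then a 48-bit instruction
    if (area >> 62) & 1 == 0:
        return [48, 16]  # leading 10: a 48-bit then a 16-bit instruction
    return [64]          # leading 11: a single 64-bit instruction


print(pair_layout(0b10 << 62))
```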

This applies to the extended 32-bit instruction set block formats, the fourth and the seventh. In those formats, a 32-bit instruction slot with the prefix 1110 uses the original instruction set and may contain a pair of 16-bit instructions. The 48-bit and 64-bit instructions are also available, but with restrictions on their positioning. That is because these block formats lack the instruction start field. That field allows 16-bit, 32-bit, 48-bit and 64-bit instructions to occupy any position, as long as they are aligned on 16-bit boundaries.


The ninth header format allows the use of an extended form of the instruction set instead of the basic instruction set in the same manner as the eighth header format.

As well, this block format allows the extended 32-bit instruction set to be combined with two features. The first is an explicit indication of parallelism using the break bit, as in the second block format. The second is predication using any of the sixteen possible flags. The format also allows indication of the instructions that may be the target of a branch and of those that are modified by a preceding instruction.

Note that the ninth through fourteenth header formats are distinguished from the sixth header format by having 1111 in the area coinciding with the first four prefix bits in that header format, as a code block must begin with an instruction and not unused space.


The tenth header format allows for combining the extended 32-bit instruction set with independent 16-bit instructions and with 48-bit and 64-bit instructions.

In the instruction start field, a 1 bit is present whenever a 16-bit instruction slot contains the beginning of an instruction.

Each of the four-bit prefix fields corresponds to a pair of 16-bit instruction slots, and hence to a pair of bits in the instruction start field.


In this block format, unlike the tenth through sixteenth header formats, because the header is only 48 bits long rather than 64, some special considerations apply.

First, the instruction start field remains only twelve bits long, as, since instructions cannot cross block boundaries, the first 16-bit instruction slot available must be the beginning of an instruction.

Second, the first four-bit prefix field corresponds to this implied instruction start bit and to the first bit of the instruction start field. Subsequent four-bit prefix fields correspond to odd pairs of bits in the instruction start field. The final bit of the instruction start field is not associated with a four-bit prefix field. As will be seen below, this permits a 32-bit instruction, modified by a four-bit prefix field, to occupy any position in the block, since such instructions can be moved 16 bits later in the instruction sequence.


When a prefix field contains 1111, then no 32-bit instruction begins in either of the two 16-bit instruction slots with which it is associated. Therefore, if there are any 1 bits in the corresponding bits of the instruction start field, the length of the instruction that begins in the 16-bit instruction slot to which they correspond is indicated by the first few bits of the instruction, as follows:

First bits          Length
0                   16 bits
10                  48 bits
11                  64 bits
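Consider a region that contains only 16-, 48- and 64-bit instructions (prefix field 1111). There, the start bits could in principle be recovered by walking the instructions one after another using the table above. The header supplies them precisely so that this serial walk is unnecessary and every slot can be decoded at once. The Python sketch below performs the serial walk, which could be used to check a header's instruction start field when assembling code. Halfwords are modeled as 16-bit integers whose most significant bit is the first bit. The function names are hypothetical.

```python
def length_from_first_bits(halfword: int) -> int:
    """Instruction length in bits from the first bits of its first halfword."""
    if halfword >> 15 == 0:
        return 16  # leading 0
    return 48 if (halfword >> 14) & 1 == 0 else 64  # leading 10 or 11


def start_bits(halfwords: list[int]) -> list[int]:
    """Derive the instruction start bits for a run of 16-bit slots."""
    bits = [0] * len(halfwords)
    i = 0
    while i < len(halfwords):
        bits[i] = 1  # an instruction begins in this 16-bit slot
        step = length_from_first_bits(halfwords[i]) // 16
        if i + step > len(halfwords):
            raise ValueError("instruction would cross the end of the region")
        i += step
    return bits


# A 16-bit, a 48-bit and a 64-bit instruction, in that order.
print(start_bits([0x0001, 0x8000, 0x0000, 0x0000,
                  0xC000, 0x0000, 0x0000, 0x0000]))
```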

If a prefix field contains any other combination of bits, then exactly one 32-bit instruction begins in one of the two 16-bit instruction slots with which it corresponds.

The possibilities, given by the corresponding pair of bits in the instruction start field, are:

01 A 32-bit instruction begins in the second slot
10 A 32-bit instruction begins in the first slot
11 A 16-bit instruction begins in the first slot, and a 32-bit instruction
   begins in the second slot.

The eleventh header format allows for the extended instruction set, in the same fashion as the tenth header format, and permits the instructions which may be the target of a branch instruction to be indicated.


The twelfth header format allows for the extended instruction set, in the same fashion as the tenth header format, and permits the instructions which are prefixed to be indicated.


The thirteenth header format provides both for explicitly indicating instruction dependencies and for instruction predication, and thus it provides for full VLIW (Very Long Instruction Word) operation.

The break field operates in the same manner as in the second header format.

As well, S bits, flag bits, and predicated bits are present.

Three sets of these bits are present, so with this header format, there may be instructions present in the same block the execution of which is controlled by three different flag bits. It is also possible for there to be instructions controlled by the same flag bit, but in opposite senses, within the same block using this format.

In each of those three sets, there are only six predicated bits, since only six instruction slots remain after a 64-bit header. The flag field is four bits long allowing any of the flag bits to be used to control the execution of instructions.

There is also a decode field in this header format, and so in this block format, instructions with pseudo-immediates may be present.

Note that the thirteenth and fourteenth header formats are distinguished from the seventh through sixteenth header formats by having 1111 in the area coinciding with the first four prefix bits in those header formats, as a code block must begin with an instruction and not unused space. This is the same way the ninth through twelfth (and also the thirteenth and fourteenth) header formats are distinguished from the eighth header format.


The fourteenth header format allows explicit indication of parallelism and instruction predication in conjunction with the standard form of instructions, like the thirteenth header format, and in addition allows permissible branch targets to be indicated. Here, only two predication conditions are available, and all sixteen flag bits may be used in them.


This selection of fifteen possible block types, without a header, or with one of the fourteen possible headers, achieves the following goals:

No matter what the block type is, all the instructions in a block may be decoded in parallel, after the block is fetched, and the header is decoded.

Four instruction formats are available. The first is a plain 32-bit instruction format, which need not require any overhead. The second is an extended 32-bit instruction format, which requires only a header occupying a single instruction slot as overhead; it could be argued that only four bits of that header are overhead, the others being part of the following instructions. The third is an extended instruction format that includes the instructions of the extended 32-bit instruction format together with independent 16-bit instructions and instructions 48 bits and 64 bits in length. The fourth is an instruction set composed chiefly of 16-bit instructions, which can be used with a header only sixteen bits long.

The degree to which instructions may be executed in parallel may be indicated explicitly, and instruction predication is available, for code in the plain format, and code in the 32-bit version of the extended format, but not in the instruction formats which allow the general positioning of instructions of different lengths.

Sometimes the only reason a block requires a prefix is that one of its instructions requires an immediate value. In that case, the first prefix format lets an instruction with an immediate serve as the prefix. The restrictions on such an instruction are not onerous, so most instructions requiring an immediate can meet them. This keeps the overhead in that case to an absolute minimum.

Also, when controlled branching is enabled, permissible branch targets may be indicated in conjunction with all the instruction formats, and, as well, prefixed instructions, that is, instructions that are modified by previous instructions (or prefix opcodes, as the case may be) in the instruction stream proper may be indicated in conjunction with all the instruction formats.

Registers and Data Formats

The complement of registers included with this architecture is as follows:

There are 32 integer registers, each of which is 64 bits in length, numbered from 0 to 31.

Registers 1 through 7 may be used as index registers.

Registers 25 through 31 may be used as base registers, each of which points to an area of 65,536 bytes in length.

Register 24 serves as a base register which points to an area 32,768 bytes in length.

Registers 9 through 15 may be used as base registers, each of which points to an area of 4,096 bytes in length.

At least part of the area of 4,096 bytes pointed to by register 8 will normally be used to contain up to 512 pointers, each 64 bits in length, for use in either Array Mode addressing or Address Table addressing.

Registers 17 through 23 may be used as base registers, each of which points to an area of 1,048,576 bytes in length. This addressing format is used for 48-bit extended memory-reference instructions.

Register 16 may be used as a base register which points to an area 512 bytes in length. This is where the operands of the 16-bit memory-reference instructions used in association with blocks having a header in the eleventh header format are found.
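The area sizes above imply the width of the displacement each base register needs, and a 4,096-byte area holds exactly 512 pointers of 8 bytes. The Python sketch below tabulates the sizes given in the text. The effective-address function assumes an unsigned displacement that must stay within the register's area. That assumption, and all of the names, are hypothetical.

```python
# Area sizes, in bytes, addressed through each base register (from the text).
BASE_AREAS = {8: 4096, 16: 512, 24: 32768}
BASE_AREAS.update({r: 4096 for r in range(9, 16)})
BASE_AREAS.update({r: 1048576 for r in range(17, 24)})
BASE_AREAS.update({r: 65536 for r in range(25, 32)})


def displacement_bits(reg: int) -> int:
    """Width of a displacement spanning the register's whole area."""
    return BASE_AREAS[reg].bit_length() - 1


def effective_address(base_value: int, reg: int, disp: int) -> int:
    """Add an (assumed unsigned) displacement to a 64-bit base value."""
    if not 0 <= disp < BASE_AREAS[reg]:
        raise ValueError("displacement falls outside the base register's area")
    return (base_value + disp) & 0xFFFF_FFFF_FFFF_FFFF


for reg in (25, 24, 9, 17, 16):
    print(reg, displacement_bits(reg))
```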

There are 32 floating-point registers, each of which is 128 bits in length, numbered from 0 to 31.

Floating point numbers in IEEE 754 format have exponent fields of different length, depending on the size of the number. For faster computation, floating-point numbers are stored in floating-point registers in an internal form which corresponds to the format in which extended precision floating-point numbers are stored in memory: with a 15-bit exponent field, and without a hidden first bit in the significand.
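The following Python sketch shows what widening into this internal form involves for a 64-bit double. It rebiases the 11-bit exponent to a 15-bit field and makes the hidden leading bit explicit. The bias of 16383 (the usual bias for a 15-bit IEEE exponent field) is an assumption for this internal form. Keeping the significand at 53 bits is also an assumption, and subnormals, infinities and NaNs are left out for brevity. The function name is hypothetical.

```python
import struct


def widen_double(x: float) -> tuple[int, int, int]:
    """Split an IEEE 754 double into (sign, 15-bit exponent, significand)
    with the leading significand bit made explicit.

    Assumptions: exponent bias 16383 for the 15-bit field, and the
    significand kept at 53 bits; only zero and normal numbers are handled.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0 and fraction == 0:
        return sign, 0, 0  # signed zero
    if exponent in (0, 0x7FF):
        raise ValueError("subnormals, infinities and NaNs are not handled here")
    # Rebias from the 11-bit field (bias 1023) to the 15-bit field (bias 16383)
    # and restore the hidden leading 1 as an explicit bit.
    return sign, exponent - 1023 + 16383, (1 << 52) | fraction


print(widen_double(-2.0))
```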

As 128-bit extended floating-point numbers are already in this format in memory, all floating-point numbers will fit in a 128-bit register, although shorter floating-point numbers are expanded.

However, the 32 floating-point registers may also be used for Decimal Floating-Point (DFP) numbers. These numbers will also be expanded into an internal form for faster computation, but that internal form may take more than 128 bits. To let the floating-point registers behave as if they are 160 bits long, the last four short vector registers provide 32 additional bits to each of the floating-point registers.
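The arithmetic works out exactly: four 256-bit registers hold 1,024 bits, which is 32 extra bits for each of 32 floating-point registers. The text does not say which 32-bit lane extends which register. The Python sketch below therefore assumes short vector registers numbered 0 to 15 and a simple sequential assignment, purely for illustration. The constants and the function name are hypothetical.

```python
SHORT_VECTOR_BITS = 256
EXTRA_BITS = 32
FIRST_EXTENSION_REG = 12  # assumed numbering: the last four of 0..15

# Four 256-bit registers supply exactly 32 extra bits for each of 32 registers.
assert 4 * SHORT_VECTOR_BITS == 32 * EXTRA_BITS


def extension_location(fp_reg: int) -> tuple[int, int]:
    """Return (short vector register, 32-bit lane) assumed to extend fp_reg."""
    if not 0 <= fp_reg < 32:
        raise ValueError("there are 32 floating-point registers")
    lanes = SHORT_VECTOR_BITS // EXTRA_BITS  # eight 32-bit lanes per register
    return FIRST_EXTENSION_REG + fp_reg // lanes, fp_reg % lanes


print(extension_location(9))
```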

There are 16 short vector registers, each of which is 256 bits in length.

Each of these registers may contain:

As well, they may contain sixteen 16-bit short floating-point numbers in one of two formats.

These numbers all remain in these registers in the same format as that in which they appear in memory.


Two additional groups of registers exist, that should be viewed as optional features of the architecture:

The first consists of sixty-four long vector registers, where each long vector register is composed of sixty-four floating-point registers, each 128 bits in length.

The second consists of two extended register banks of 128 registers; one of 128 64-bit integer registers, and one of 128 128-bit floating-point registers. These are primarily intended to allow code to be written with a higher degree of instruction-level parallelism (ILP) for use in the block formats that offer VLIW features.

In addition to an implementation possibly not offering these as features, possibilities such as the following exist:

An implementation might offer eight-way SMT, but with the following restrictions:

At most two of the eight simultaneous threads may be executed out-of-order, with rename registers.

There will be only one set of long vector registers, and so only one thread may use them.

There will be only one set of banks of 128 registers, and so only one thread may use them.

There are no rename registers for the long vector registers or the banks of 128 registers. However, this does not preclude a thread using either or both of these features from being executed out-of-order in respect of the instructions that don't use those features.


Thus, for example, one situation that might arise on a fully loaded core with this kind of large-scale implementation is the following:

Five low-priority threads are running in-order;

One higher-priority thread is running out-of-order;

One thread uses the banks of 128 registers; it is running in-order, since it is using VLIW features to run at high speed instead;

One thread uses the long vector registers, and it is also running out-of-order so that scalar calculations within the thread will run as fast as possible.


Therefore, these optional register banks are best thought of as intended only for programs expected to be the only program of their kind running on the computer at a given time. Such a program may then monopolize one or both of these additional resources and so execute with higher performance.


As for how data values are stored:

Signed integer values are stored in binary two's complement format.

Floating-point numbers are stored in IEEE 754 format.

The architecture is big-endian: the most significant bits of a value are stored in the byte at the lowest numbered address.
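A minimal Python illustration of these two conventions together, two's complement and big-endian byte order, follows. It uses only the standard `int.to_bytes` and `int.from_bytes` methods. The helper names are hypothetical.

```python
def to_memory(value: int, size: int) -> bytes:
    """Lay out a signed integer as it would appear in memory: two's
    complement, most significant byte at the lowest address."""
    return value.to_bytes(size, "big", signed=True)


def from_memory(data: bytes) -> int:
    """Read the same layout back into a signed integer."""
    return int.from_bytes(data, "big", signed=True)


print(to_memory(-2, 2).hex())  # the byte 0xff is at the lower address
```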


