
The Concertina II Architecture

This is now my tenth attempt to propose a successor to my original Concertina architecture. Unlike my previous attempts, however, it has a distinct chance of not being superseded by subsequent attempts, as it finally achieves my goal of high code density with minimal overhead, without requiring multiple modes of operation or most other kinds of excessive complication, although the number of block formats used has grown considerably during the design process.


Programs consist of 256-bit blocks of program code, each of which contains eight 32-bit instruction slots, sixteen 16-bit instruction slots, or fourteen 18-bit instruction slots.

A block may be either a header block, which starts with either 0011 or 0100, or a non-header block, which may start with a 1 or any of the six other combinations of four bits starting with a 0 which do not begin header blocks.


There are four formats of non-header block. This is made possible by using the format II and format III instruction formats for these blocks. In both of these instruction formats, all 32-bit instructions must begin with 1, and format III places further limits on how an instruction can begin. Therefore, by replacing, in one case, the 1 that begins the first instruction of the block with a 0, two types of non-header block can be distinguished. Because format III instructions also leave some initial four-bit combinations beginning with 1 unused, these are available to indicate header blocks (as it is the format III instructions that have their initial 1 bit replaced by a 0, the initial four-bit combinations used for header blocks all begin with a zero).

The reason that there are four formats of non-header block, rather than two, is as follows. Of the two types of non-header block distinguished by their first bit, one type is used in association with a header block that precedes a group of such non-header blocks. As three types of header block may be used in that fashion, these dependent non-header blocks are further divided into three types, according to the specific type of header block that precedes them and with which they are associated.

There also exist two types of block which are not non-header blocks, but which have the same form as non-header blocks that begin with 0; these blocks, and the circumstances in which they may occur, will be described later.


In the type of non-header block shown as the twelfth, fourteenth, and sixteenth block formats, the 1 that begins the first instruction is replaced with a 0. In these blocks, the first instruction must be in format III. Because format III instructions cannot begin with 1011 or 1100, the combinations 0011 and 0100 are available as initial four bits for block headers.


The twelfth, fourteenth, and sixteenth block formats do not need to be distinguished from each other by their own contents, despite the absence of persistent mode bits in the architecture, because which of them applies is determined by the header block that precedes them.

These dependent non-header blocks exist in order to allow a low-overhead block format, in which the three-bit decode fields for several blocks of code are contained in a single 32-bit header which serves as the only header needed for a group of 256-bit code blocks.

Since the decode values are in the header block on which these dependent non-header blocks depend, the non-header blocks can't be decoded on their own; essential information for decoding them is contained in that preceding block. Therefore, there are restrictions placed on branching to a dependent non-header block.

Given that the header block that begins a low-overhead block group contains essential information for decoding the following dependent non-header blocks, there is no reason why other essential information for decoding those blocks could not be placed in it in addition to the decode field.

Thus, the header block indicates, by whether it is in the eleventh or fifteenth block format on the one hand, or the thirteenth block format on the other, whether the following non-header blocks will use the first bit of each of their instruction slots, except the first, as a B bit.

As well, while the instruction in the first instruction slot of a dependent non-header block must be in format III, in order to make enough bit combinations available for the beginnings of the various header block formats, the header block may also specify an instruction format for the remaining instructions in the following dependent non-header blocks, as well as the instruction format used in the header block itself.


In addition, in all the instruction formats, no 32-bit instruction may begin with 1111111, so the initial combinations 0111111 and 1111111 are also available to begin header blocks; the first of these is used for some additional header formats, the seventeenth through the twenty-second block formats, and the second is now used for the twenty-third block format.
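As an illustration of how these leading bit patterns partition the block space, here is a minimal sketch that classifies a 256-bit block by its first four or seven bits. The function names `leading_bits` and `classify_block` are hypothetical, and the sketch deliberately ignores context: a disguised header block that has the appearance of a dependent non-header block is reported as a dependent non-header block.

```python
# Hypothetical sketch: classify a 256-bit block by its leading bits, using only
# the prefixes named in the text. It ignores context, so a disguised header
# block (one that looks like a dependent non-header block) is reported as
# "dependent non-header".

BLOCK_BYTES = 32  # 256 bits


def leading_bits(block: bytes, n: int) -> str:
    """Return the first n bits of the block as a string of '0' and '1'."""
    value = int.from_bytes(block, "big")
    return format(value >> (len(block) * 8 - n), f"0{n}b")


def classify_block(block: bytes) -> str:
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block is exactly 256 bits (32 bytes)")
    first4 = leading_bits(block, 4)
    first7 = leading_bits(block, 7)
    if first4 in ("0011", "0100"):
        return "header"
    if first7 == "0111111":
        return "header (seventeenth to twenty-second formats)"
    if first7 == "1111111":
        return "header (twenty-third format)"
    if first4[0] == "1":
        return "independent non-header"  # first block format
    # twelfth, fourteenth or sixteenth format: which one is fixed by the preceding header
    return "dependent non-header"
```

Note that the seven-bit test for 1111111 has to come before the generic "begins with 1" test, since a twenty-third-format header also begins with a 1.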

As well, the fact that no format I instruction may begin with 1100 is used in the second instruction slot in a block to distinguish the tenth block format from the ninth block format.


Programs are organized into blocks in this way in order to allow, in the case of some of the possible block formats, instructions to be longer than 32 bits in length while allowing instructions to be fetched and decoded in parallel as if every instruction was the same length.

The usual scheme by which this is achieved is as follows:

As an aligned 256-bit block of program code is fetched as a unit, once a header indicates which of the 32-bit instruction slots in that block actually contain instructions, those instructions could, of course, all be decoded in parallel if none of them were longer than 32 bits.

So as to include instructions that are longer than 32 bits in a program, and yet still allow parallel decoding, only the first 32 bits of such an instruction are placed in the instruction slot, among those indicated as containing instructions, that corresponds to that instruction's position in the program sequence.

The first 32 bits of an instruction longer than 32 bits will contain one or more short pointers, four or five bits in length, to the remaining portion or portions of that instruction.

This is why the header indicates how many instruction slots contain the first 32 bits of an instruction, so that space not used in that fashion is available for the remaining portions of instructions.

Two kinds of pointers exist: pointers to the supplemental portion of an instruction, designated pSupp, which are four bits long, and which point to a portion of the block aligned on a 16-bit boundary, and pointers to immediate values, designated pImm, which are five bits long, as they may point to an item aligned on only a byte boundary.
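The pointer arithmetic can be sketched as follows: a four-bit pSupp selects one of the 16 halfwords in the block, and a five-bit pImm selects one of its 32 bytes. The helper names below are hypothetical, and the sketch assumes big-endian bit numbering (bit 0 is the first bit of the block), which the text does not state.

```python
# Sketch: convert pSupp / pImm pointer values into bit offsets within a
# 256-bit block and read the item they point to. Assumes big-endian bit order.

def psupp_bit_offset(psupp: int) -> int:
    """pSupp is 4 bits and addresses one of 16 halfwords (16-bit units)."""
    if not 0 <= psupp < 16:
        raise ValueError("pSupp is a four-bit field")
    return psupp * 16


def pimm_bit_offset(pimm: int) -> int:
    """pImm is 5 bits and addresses one of 32 bytes."""
    if not 0 <= pimm < 32:
        raise ValueError("pImm is a five-bit field")
    return pimm * 8


def fetch(block: bytes, bit_offset: int, length_bits: int) -> int:
    """Read length_bits bits starting at bit_offset within the block."""
    if bit_offset + length_bits > len(block) * 8:
        raise ValueError("item would run past the end of the block")
    value = int.from_bytes(block, "big")
    shift = len(block) * 8 - bit_offset - length_bits
    return (value >> shift) & ((1 << length_bits) - 1)


block = bytes(range(32))  # byte n of the block holds the value n
print(hex(fetch(block, psupp_bit_offset(3), 16)))  # halfword 3 = bytes 6 and 7
```

The difference in pointer width follows directly from the block size: 256 bits hold 16 halfwords (four bits of address) but 32 bytes (five bits of address).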

In the fourteenth block format, 16-bit instruction slots which contain the start of an instruction are indicated by the corresponding bit being set to 1 in the instruction start field of the header.

The second block format uses a different scheme: instructions may be decoded in parallel, despite varying in length, because every 18-bit instruction slot indicates what is to be done with it. Slots that begin with 0 contain 18-bit instructions; slots that begin with 10 contain the first 16 bits of an instruction that is 32 bits or longer; and slots that begin with 11 contain a further 16 of the remaining bits of such an instruction.
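A minimal sketch of this self-describing slot scheme follows. It takes the 14 slots of a block as 18-bit integers and groups them into instructions. The function name is hypothetical, and the sketch assumes that the 11 slots of an instruction immediately follow its 10 slot, which fits the way the second block format encodes instruction length but is not spelled out here.

```python
# Sketch: group 18-bit slots into instructions using their leading bits.
#   0...  -> an 18-bit short instruction (kept whole)
#   10... -> first 16 bits of a longer instruction (low 16 bits kept)
#   11... -> a further 16 bits of the instruction begun by the last 10 slot
# Assumption: continuation slots follow their 10 slot directly.

def group_slots(slots: list[int]) -> list[tuple[str, list[int]]]:
    result: list[tuple[str, list[int]]] = []
    for index, slot in enumerate(slots):
        if not 0 <= slot < (1 << 18):
            raise ValueError(f"slot {index} is wider than 18 bits")
        if slot >> 17 == 0:
            result.append(("short", [slot]))
        elif slot >> 16 == 0b10:
            result.append(("long", [slot & 0xFFFF]))
        else:  # prefix 11: extend the most recent long instruction
            if not result or result[-1][0] != "long":
                raise ValueError(f"continuation slot {index} has no instruction to extend")
            result[-1][1].append(slot & 0xFFFF)
    return result
```

Because each slot's role is visible from its first one or two bits, a decoder can examine all 14 slots at once; only the final assembly of long instructions needs the grouping shown here.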

A modified version of the original scheme is combined with this second scheme in the fourth block format, in which the block is also divided into instruction slots that are 18 bits in length. The unused portion of the block is still accessed in terms of the 8-bit byte and the 16-bit halfword, despite the imperfect fit making wasted space likely.

This scheme is again modified in the fifth block format, in that the block is now divided into instruction slots that are 36 bits in length; once again, the original scheme and the second scheme, as modified, are combined.

In the second block format, the absence of any decode field means that instructions with pImm fields may not be used, but instructions that have a pSupp field in order to be longer than 32 bits may be used; here, the pSupp field does not contain a pointer, but instead indicates the length of the instruction:

0000 48 bits
0001 64 bits
0010 80 bits
0011 96 bits
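The length codes follow a simple rule, 48 bits plus 16 bits per step of the code. The sketch below also derives how many 18-bit slots such an instruction occupies, on the assumption (drawn from the slot scheme described above) that each slot carries 16 instruction bits after its two-bit 10 or 11 prefix. The function names are hypothetical.

```python
# Sketch: pSupp length codes in the second block format.
LENGTH_CODES = {0b0000: 48, 0b0001: 64, 0b0010: 80, 0b0011: 96}


def instruction_length(psupp_code: int) -> int:
    try:
        return LENGTH_CODES[psupp_code]
    except KeyError:
        raise ValueError(f"pSupp code {psupp_code:04b} has no length assigned") from None


def slots_needed(psupp_code: int) -> int:
    # Assumption: one 10 slot plus 11 slots, each holding 16 instruction bits.
    return instruction_length(psupp_code) // 16
```

For example, a 48-bit instruction occupies three slots (one beginning with 10, two beginning with 11), and a 96-bit instruction occupies six.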

The fourth block format does have a decode field, which is four bits long instead of three, as the instruction slots in this format are only 18 bits long, and instructions with a pImm field may be used in this format. As well, an instruction with a pSupp field may, if it spans only two instruction slots, use that pSupp field normally as a pointer, or it may span more than two instruction slots as in the second block format; in the latter case, the pSupp field must contain all zeroes.

The fifth block format also has a decode field; however, it is three bits long, as in that block format, the instruction slots are thirty-six bits long. Because 18-bit short instructions must be used in pairs, two things are gained: the second 18-bit short instruction of a pair may optionally set the condition codes, as one extra bit is available, and 32-bit instructions are now accompanied by a three-bit supplementary opcode field, to allow access to special instructions that extend the normal instruction set.

The F bit in the fourth and fifth block formats indicates the instruction format to be used with them as follows:

0 Format I
1 Format IV

For block formats two through eleven, thirteen, fifteen, and seventeen through twenty-three, the goal of allowing instructions longer than 32 bits to be included in programs, while also allowing all the instructions in a block to be fetched and decoded in parallel is achieved in one of two ways. Both ways apply to the fourth and fifth block formats.

Of these block formats, the third and most of those from the sixth block format onwards achieve this by including a decode field in the block header. This indicates how many of the 32-bit instruction slots following the header are to be ignored when decoding instructions, because the space in those instruction slots will be used for other purposes.


Instructions that are effectively longer than 32 bits contain pointers which normally point to some area within the unused portion of the instruction block.

These pointers can be of two kinds.

A four-bit pointer, marked pSupp in the diagrams of instruction formats, points to additional bits belonging to the instruction proper, which can be any multiple of 16 bits in length up to 192 bits (the maximum that leaves room for the 32-bit header and one 32-bit instruction in the block) and which is aligned on 16-bit boundaries.

A five-bit pointer, marked pImm in the diagrams of instruction formats, points to an immediate value used by the instruction. The value may be in the format of any data type used by the computer. The pointer can point to any byte within the block; the data should be aligned as appropriate for its length.


The second, fourth, and fifth block formats achieve this instead by indicating within each 18-bit instruction slot what the decoder is to do with its contents. If the instruction slot begins with 0, the slot contains an 18-bit short format instruction. If the instruction slot begins with 10, the slot contains the first 16 bits of a 32-bit instruction. (However, the fourth and fifth block formats also have a decode field, allowing pImm pointers to be used in those formats, and, if an instruction with a pSupp field occupies only two 18-bit instruction slots, its pSupp field is taken as being a pointer.)

The remaining portions of a 32-bit instruction are in instruction slots which begin with 11; these don't initiate any decoding action; instead, the decoding of their contents, if it takes place, does so under the control of the decoding of an instruction slot that began with 10.


The various block formats are as illustrated below:

The information shown to the left of the numbers of the individual block formats is supplementary information relating to the use of dependent non-header blocks or blocks which have the same appearance as those blocks, and will be explained later.


The features of the various block formats are summarized in the table below:


Block format  Instruction Format  Short Instructions  Long Instructions  Overhead  Explicit Indication of Parallelism  Predication
1             II                  16 bit              -                  1         -                                   -
2             I                   18 bit              S                  4.57      -                                   -
3             I                   -                   -                  4.57      -                                   4*
4             I/IV                18 bit              S I                6.71      -                                   -
5             I/IV                18 bit              S I                6.71      -                                   -
6             I                   -                   S I                10.67     UDB                                 8
7             IV                  -                   S I                10.67     UDB                                 8
8             I                   -                   S I                10.67     UDB                                 16*
9             I                   -                   S I                4.57      UDB                                 -
10            I                   -                   S I                10.67     UDB                                 4*
11            any                 16 bit*             S I                0.69      -                                   -
12            III/any             16 bit*             S I                0.69      -                                   -
13            II or III           -                   S I                1.58      B                                   -
14            III/II or III       -                   S I                1.58      B                                   -
15            any                 16 bit*             S I                1.15      -                                   -
16            III/any             16 bit*             S I                1.15      -                                   -
17            I                   -                   S I                10.67     U/D/B                               16
18            I/IV                16 bit              S I                10.67     -                                   16
19            any                 16 bit              S I                4.57      -                                   -
20            IV                  -                   S I                4.57      U/D/B                               -
21            I/IV                16 bit              S I                4.57      -                                   -
22            I/IV                16 bit              S I                4.57      -                                   -
23            II                  16 bit              S I                5.57      -                                   4*

The different instruction formats differ only in the basic 32-bit memory-reference instructions. The features these formats offer for those instructions are:

In Short Instructions, for the eleventh and twelfth block formats, as well as the fifteenth and sixteenth block formats, 16 bit is followed by an asterisk, because short instructions may or may not be available (but will be 16-bit short instructions if they are available) depending on the instruction format chosen.

In Long Instructions, S means that instructions inherently longer than 32 bits which have a pSupp field are available; S I means that both those instructions and pseudo-immediate values indicated with a pImm field are available.

In Overhead, a number indicating the number of overhead bits additional to the bits in the instructions themselves is given. This figure is based on instructions being 32 bits long as far as is possible, and otherwise is an illustrative best-case figure. The bit made available because the instruction set used only makes use of half the opcode space is counted as overhead.
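Several of the Overhead figures can be reproduced by treating them as bits not occupied by instructions, averaged over the instructions in one 256-bit block. This reading is an interpretation, not something the text states; the sketch below only checks the two header cases (a 32-bit header with seven instructions, and a 64-bit header with six). The function name is hypothetical.

```python
from fractions import Fraction


def overhead_per_instruction(block_bits: int, instruction_bits: int,
                             instructions: int) -> Fraction:
    """Bits in the block not used by instructions, averaged per instruction."""
    return Fraction(block_bits - instruction_bits * instructions, instructions)


# 32-bit header followed by seven 32-bit instructions
print(round(float(overhead_per_instruction(256, 32, 7)), 2))
# 64-bit header followed by six 32-bit instructions
print(round(float(overhead_per_instruction(256, 32, 6)), 2))
```

Under this reading, the two printed values are 4.57 and 10.67, matching the figures given for the formats with 32-bit and 64-bit headers respectively.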

In Explicit Indication of Parallelism, UDB means that each instruction has three bits associated with it, indicating respectively whether another instruction depends on its result (U), whether it depends on the result of a preceding instruction (D), and whether, even in the absence of a dependency issue, it cannot start on the same cycle as the preceding instruction (B). U/D/B means that a two-bit field is available for each instruction which can indicate that one of those conditions applies to that instruction. B means that only the break bit is provided, to group instructions that can be executed simultaneously; the processor must still keep track of dependencies itself.

In Predication, the number is the number of predication flags that can be used; if it is followed by an asterisk, only one sense bit is present, so either all the flags permit execution if they are set, or all the flags permit execution if they are reset.


In the third, ninth, eleventh, thirteenth, nineteenth through twenty-first, and twenty-third block formats, the three-bit decode field contains a number from 0 to 6; in the sixth, seventh, eighth, tenth, fifteenth, seventeenth, and eighteenth block formats (where the header portion of the block is 64 bits long instead of 32 bits long), it contains a number from 0 to 5. In both cases, this number indicates how many 32-bit instruction slots, counting back from the end of the block (the higher addresses), do not contain instructions, and thus are not to be decoded, processed, and executed as instructions.
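The effect of the decode field can be shown with a short sketch that lists which 32-bit slots of a block are decoded as instructions. Slot 0 (and also slot 1, when the header is 64 bits long) holds the header, and the decode value removes slots from the end of the block. The function name is hypothetical.

```python
# Sketch: which of the eight 32-bit slots (numbered 0-7) hold instructions,
# given the decode field and the header length.

def instruction_slots(decode: int, header_bits: int) -> list[int]:
    if header_bits == 32:
        first, max_decode = 1, 6  # header in slot 0
    elif header_bits == 64:
        first, max_decode = 2, 5  # header in slots 0 and 1
    else:
        raise ValueError("the header is 32 or 64 bits long")
    if not 0 <= decode <= max_decode:
        raise ValueError(f"decode must be 0..{max_decode} for a {header_bits}-bit header")
    # slots from 8 - decode upward are left for pSupp and pImm data
    return list(range(first, 8 - decode))
```

The upper limits of 6 and 5 are exactly those that still leave at least one instruction slot in the block.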


In the eleventh and thirteenth block formats, the six fields d1 through d6 each contain a number from 0 to 7 indicating the number of 32-bit instruction slots not to be decoded, counting back from the end of the block, for each of up to six non-header blocks which directly follow a header block in that format.

The fifteenth block format, also related, has seven fields d1 through d7 which have the same function, for up to seven non-header blocks which may follow a header block of that format.

Thus, in the case of the eleventh and thirteenth block formats, only one 32-bit instruction slot needs to be taken away as overhead to indicate the number of instruction slots used for supplementary information in as many as seven 256-bit blocks of program code (in the fifteenth block format, two slots are used for as many as eight blocks of program code).
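The following is a minimal sketch of how the d fields might be applied across a group, together with the share of the group's slots consumed by the header. In a dependent non-header block, slot 0 always holds an instruction, which is why d may range up to 7. The function names are hypothetical.

```python
from fractions import Fraction


def dependent_slots(d_fields: list[int], max_blocks: int) -> list[list[int]]:
    """For each dependent block, list the slots (0-7) decoded as instructions."""
    if len(d_fields) > max_blocks:
        raise ValueError(f"at most {max_blocks} dependent blocks may follow this header")
    layouts = []
    for n, d in enumerate(d_fields, start=1):
        if not 0 <= d <= 7:
            raise ValueError(f"d{n} must fit in three bits")
        layouts.append(list(range(8 - d)))  # slot 0 is always an instruction
    return layouts


def header_share(header_slots: int, blocks: int) -> Fraction:
    """Fraction of all 32-bit slots in the group taken up by the header."""
    return Fraction(header_slots, blocks * 8)
```

Here `header_share(1, 7)` gives 1/56 for a full eleventh- or thirteenth-format group, and `header_share(2, 8)` gives 1/32 for a full fifteenth-format group; `max_blocks` would be 6 and 7 respectively.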

The eleventh and fifteenth block formats indicate that no additional information is included with the instructions in that block and the dependent non-header blocks that follow; the thirteenth block format indicates that the first bit of the instruction slots in the header block itself, and in all but the first instruction slot in the dependent non-header blocks that follow, is a B bit. This allows the instruction stream to be divided into groups of instructions which may be executed simultaneously; dependencies, however, are not indicated but are left to the computer to detect.

The eleventh and fifteenth block formats contain a two-bit field labelled F. This field indicates the format of instructions in the header block itself, and in all but the first instruction slot of each of the following dependent non-header blocks. This field is coded as follows:

00 Format I
01 Format IV
10 Format II
11 Format III

This ordering is because Format I and Format IV are very similar; the numbering by which they are referred to simply follows the order in which it seemed appropriate to introduce them.

In the thirteenth block format, there is also a field labelled F, but it is one bit long. Here, it is interpreted as follows:

0 Format II
1 Format III

and again it governs the format of instructions in the header block itself and in all but the first instruction slot of the immediately following dependent non-header blocks which depend on that particular header block.
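The reach of the F field across a group can be sketched as follows. The sketch only records which instruction format applies to each slot; it does not account for slots left undecoded by the decode fields. The names are hypothetical.

```python
# Sketch: instruction format governing each 32-bit slot in a group.
TWO_BIT_F = {0b00: "I", 0b01: "IV", 0b10: "II", 0b11: "III"}  # eleventh, fifteenth
ONE_BIT_F = {0b0: "II", 0b1: "III"}                              # thirteenth


def slot_formats(f_value: int, f_width: int, header_slots: int,
                 dependent_blocks: int) -> list[list[str]]:
    """First list: the header block; then one list per dependent block.

    Slot 0 of every dependent block is always Format III.
    """
    table = {2: TWO_BIT_F, 1: ONE_BIT_F}.get(f_width)
    if table is None or f_value not in table:
        raise ValueError("F is a one-bit or two-bit field")
    fmt = table[f_value]
    header_block = ["header"] * header_slots + [fmt] * (8 - header_slots)
    dependents = [["III"] + [fmt] * 7 for _ in range(dependent_blocks)]
    return [header_block] + dependents
```

For a thirteenth-format group with F set to 1, every slot of the dependent blocks is Format III, so the fixed Format III in slot 0 is indistinguishable from the selected format there.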


There are four kinds of non-header blocks.

The first block format shown in the diagram is that of independent non-header blocks. These use format II instructions; here, all instructions that are 32 bits or longer must begin with a 1, so as to distinguish them from 16-bit instructions.

In these blocks, the first instruction slot may only contain 32-bit instructions. The remaining instruction slots may contain either a 32-bit instruction, or a pair of 16-bit instructions. Instruction slots containing a pair of 16-bit instructions are indicated by beginning with a 0; as no distinctive indication is required for the second 16-bit instruction of a pair, the first bit is instead used as a C bit, allowing the second 16-bit instruction to modify the condition codes.
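A minimal sketch of decoding an independent non-header block follows. A slot whose first bit is 1 holds a 32-bit instruction (or the first 32 bits of a longer one); a slot whose first bit is 0 holds two 16-bit instructions, and the first bit of the second halfword is the C bit. The dictionary layout of the result is a hypothetical choice for illustration.

```python
# Sketch: split the eight 32-bit slots of a first-format block into instructions.

def parse_independent_block(slots: list[int]) -> list[dict]:
    if len(slots) != 8:
        raise ValueError("a block has eight 32-bit slots")
    out = []
    for i, word in enumerate(slots):
        if not 0 <= word < (1 << 32):
            raise ValueError(f"slot {i} is wider than 32 bits")
        if word >> 31:  # leading 1: a 32-bit (or longer) instruction
            out.append({"slot": i, "kind": "32-bit", "bits": word})
        elif i == 0:
            raise ValueError("slot 0 may only hold a 32-bit instruction")
        else:
            first, second = word >> 16, word & 0xFFFF
            out.append({"slot": i, "kind": "16-bit", "bits": first})
            # the second halfword's first bit is the C bit, not part of the opcode
            out.append({"slot": i, "kind": "16-bit", "bits": second & 0x7FFF,
                        "sets_cc": bool(second >> 15)})
    return out
```

Only the first 16-bit instruction of a pair needs its leading 0, since the slot as a whole has already been identified as a pair; that frees the corresponding bit of the second instruction for the C bit.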


The twelfth block format is that of the non-header blocks which are associated with a block in the eleventh block format.

The fourteenth block format shown in the diagram is that of the non-header blocks which are associated with a block in the thirteenth block format.

The sixteenth block format is that of the non-header blocks which are associated with a block in the fifteenth block format.

The first instruction in a dependent non-header block in the twelfth, fourteenth, or sixteenth block formats is in format III; this allows the initial bit combinations 0011 and 0100 to be available to indicate block headers. (As well, the initial combination 0111111 may indicate a block header, due to all the instruction formats excluding instructions which begin with 1111111.)


The following restrictions on branches apply to dependent non-header blocks in both the twelfth and the fourteenth block formats; they also apply to those blocks when in the sixteenth block format, but with certain exceptions, which will be explained in the description of the fifteenth block format that follows shortly:

The target of a branch instruction must be in either a header block, or an independent non-header block (a block in the first block format), and never in a dependent non-header block (a block in the twelfth, fourteenth, or sixteenth block format), with one exception, as follows:

A branch instruction may have its target in a dependent non-header block if:

Thus, one may think of an implementation as containing an internal register storing the seven decode fields from the last header encountered, as well as information concerning the position of that header in storage, used to permit branches within the group of instruction blocks those decode fields apply to without having to re-read the header block.

Note that it is generally not permitted to branch forwards to subsequent blocks, even though they may have their decoding controlled by that group of decode fields, as that might not be the case; possibly a header block intrudes between the block with the branch instruction and the block containing its target, so that some of the decode fields are not used. However, as this problem cannot arise if the target is in the block immediately following, and that block is in fact a dependent non-header block, branches forwards to that limited extent are allowed.

The intent is that it is to be possible to enforce the condition that a branch to a dependent non-header block will certainly branch to an instruction within a dependent non-header block for which the decode value is known, and that this enforcement can be accomplished without reading any program code blocks other than the one containing the target instruction.

While one can confidently expect that the internal register which contains the offset values for dependent non-header blocks and other required information (specifically, the location of the associated header block) will be saved and restored by an interrupt service routine, no such expectation is possible for a subroutine call instruction. Hence, subroutine calls may not be made within dependent non-header blocks (except as the last instruction in a dependent non-header block followed by a header block), nor may the last instruction in a header block followed by a dependent non-header block be a subroutine call instruction.


Not only is a dependent non-header block restricted from being the target of a branch except under limited circumstances, an error will also be produced if code falls through into a dependent non-header block from any type of block except another dependent non-header block, or a header block in the eleventh, thirteenth, or fifteenth block formats, the ones explicitly intended to work with dependent non-header blocks.

It is also an error to have more than six consecutive dependent non-header blocks following a header block in the eleventh or thirteenth block formats, or more than seven consecutive dependent non-header blocks following a header block in the fifteenth block format.
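The two fall-through rules above lend themselves to a simple check over a sequence of blocks in program order. This sketch models each block only by its format number (1 to 23); the function name is hypothetical, and it covers straight-line execution only, not the branch restrictions.

```python
# Sketch: check fall-through rules for dependent non-header blocks.
GROUP_LIMITS = {11: 6, 13: 6, 15: 7}  # group header format -> max dependent blocks
DEPENDENT_FORMATS = {12, 14, 16}


def check_fall_through(formats: list[int]) -> bool:
    """Raise ValueError if falling through the blocks would be an error."""
    header = None  # group header currently in force, if any
    run = 0        # dependent blocks seen since that header
    for pos, fmt in enumerate(formats):
        if fmt in DEPENDENT_FORMATS:
            if header is None:
                raise ValueError(f"block {pos}: falls into a dependent block without a group header")
            run += 1
            if run > GROUP_LIMITS[header]:
                raise ValueError(f"block {pos}: more than {GROUP_LIMITS[header]} dependent blocks")
        else:
            header = fmt if fmt in GROUP_LIMITS else None
            run = 0
    return True
```

For example, `[1, 11, 12, 12, 1]` passes, while `[1, 12]` fails because the dependent block follows an independent block.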


The restriction on branching into dependent non-header blocks from the outside limits the usefulness of these header formats, which provide a way of minimizing overhead and yet using a decode field with each block to allow instructions to have both supplemental portions and pseudo-immediates.

The fifteenth block format attempts to address this as follows:

Instead of a 32-bit header containing seven decode fields (one for the header block itself and six for successive non-header blocks), there is a 64-bit header, which, in addition to eight decode fields (one for the header block itself and seven for successive non-header blocks), also contains a jump table with four entries.

Four instructions, either in the header block in the fifteenth block format, or in the following dependent non-header blocks associated with that header block, can be entered into that table.

An external jump instruction that branches to one of the last four 32-bit instruction slots in the header block will instead branch to the instruction specified in the corresponding position in the jump table.

The reason the last four positions are re-purposed instead of the first four is so as not to restrict the use of a Jump to Subroutine instruction as the last instruction of the preceding block, which may be in any block format.

Jump table entries are six bits long, and have this form: the first three bits, if 000, indicate an instruction in the header block in the fifteenth block format itself, and if greater indicate the first through seventh dependent non-header blocks following in that order; and the last three bits indicate which instruction slot, of the eight in a block, is the target of a branch.
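The entry layout and the redirection of branches aimed at the last four slots of the header block can be sketched as follows. The function names are hypothetical; "first three bits" is taken to mean the three high-order bits of the six-bit entry.

```python
# Sketch: decode fifteenth-format jump table entries.

def decode_jump_entry(entry: int) -> tuple[int, int]:
    """Return (block, slot): block 0 is the header block itself,
    blocks 1-7 are the following dependent non-header blocks."""
    if not 0 <= entry < 64:
        raise ValueError("jump table entries are six bits")
    return entry >> 3, entry & 0b111


def redirect(target_slot: int, table: list[int]) -> tuple[int, int]:
    """A branch to slot 4-7 of the header block goes via table[slot - 4]."""
    if len(table) != 4:
        raise ValueError("the jump table has four entries")
    if not 4 <= target_slot <= 7:
        raise ValueError("only the last four slots of the header block are redirected")
    return decode_jump_entry(table[target_slot - 4])
```

For instance, the entry 010011 names slot 3 of the second dependent block.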

In this way, a branch is possible to instructions in the dependent non-header blocks by having that branch instruction first trigger the reading of the header block, which has the necessary information for permitting the branch to take place, before the actual block containing the instruction itself is read.


In addition to allowing this low-overhead method of providing decode fields to be used with code containing branch targets, the fifteenth block format also restricts which instructions may be branch targets.

By default, that restriction is imperfect, as the first four slots in the header block itself can still always be branch targets. Since placing restrictions on branching may be useful for security purposes, the header block also contains two bits labelled T and R. If the T bit is set to 1, transfers to the first four instruction slots in the header block are allowed; if 0, they are disabled. If the R bit is set to 1, direct branching to instructions in the dependent non-header blocks within the restrictions given above is also disabled.


Also note that while normal branch instructions can be prepared in advance to point to the header block instead of the actual location of the instruction, it is not workable to make Jump to Subroutine instructions put the proxy address for accessing the jump table entry as the return address, and so subroutine calls still may not be included in code in this format (and the restriction in this regard on the header block itself now covers the last five instruction slots instead of just the last one).

It is not absolutely impossible from a technical point of view to design hardware so as to have subroutine call instructions within blocks of types fifteen and sixteen use a proxy address for their return address, but it seems inappropriate to place that level of complexity in the execution of that instruction: while block decoding is definitely complex, dealing with that complexity is in most respects confined to the process of initially fetching and interpreting a block of program code.

This, however, does not mean that it is impossible to call subroutines from within code in this form. Instead, one could use two instructions: one to load the register to be used for the return address with the proxy address in the header block that causes a bounce jump to the correct return location, and then a Jump instruction instead of a Jump to Subroutine instruction to go to the subroutine.

And, indeed, that sequence of two instructions could be made to look like a single instruction to the programmer by means of an assembler macro.

The twenty-first block format provides an alternative option for preventing any instructions, other than those explicitly permitted, in a block from being the targets of a branch, without the overhead of having to fetch both a header block and then the dependent non-header block containing the actual instruction.

Its 32-bit header contains one three-bit decode field, a four-bit length field, a seven-bit split field to indicate instruction slots which contain two 16-bit short instructions, and a seven-bit target field to indicate which of the remaining instruction slots contain instructions to which branching is allowed: branching is permitted to an instruction slot if the corresponding bit in the target field is a 1. Note that such an instruction slot may contain a pair of 16-bit short instructions; in that case, only a branch to the first of those instructions is permitted in that way.
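A sketch of the branch-target check for this block format follows. It assumes, since the text does not say, that the most significant bit of each seven-bit field describes slot 1 and the least significant describes slot 7, with slot 0 holding the header; the function name is hypothetical.

```python
# Sketch: may a branch land on a given slot of a twenty-first-format block?
# Assumption: MSB of each seven-bit field = slot 1, LSB = slot 7.

def branch_allowed(target_field: int, split_field: int, decode: int,
                   slot: int, second_half: bool = False) -> bool:
    if not (0 <= target_field < 128 and 0 <= split_field < 128 and 0 <= decode <= 6):
        raise ValueError("field out of range")
    if not 1 <= slot <= 7 - decode:
        return False  # header slot, or a slot the decode field excludes
    mask = 1 << (7 - slot)
    if not target_field & mask:
        return False
    if second_half and not split_field & mask:
        raise ValueError("this slot holds a single 32-bit instruction, not a pair")
    # in a split slot, only the first 16-bit instruction may be a target
    return not second_half
```

A hardware check of this kind needs only the header of the block containing the target, which is the point of this format compared with the fifteenth.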

Blocks in this format could be used in combination with blocks in the fifteenth and sixteenth block formats so as to combine code that is as compact as possible with low overhead with efficient rapid branching that does not involve fetching an extra block in addition to the one containing the destination instruction.

The F bit in this block format indicates the instruction format to be used with it as follows:

0 Format I
1 Format IV

Blocks in the twenty-first and tenth block formats may be followed by blocks which have the appearance of a dependent non-header block. These blocks are actually disguised header blocks, none of the instructions in which may be branch targets.

The intention here is to allow code which is protected against branching to use a greater selection of block formats.


The letter A preceding the numbers for the eleventh, thirteenth, and fifteenth block formats indicates these blocks may be followed by dependent non-header blocks as shown in the following line.

The letter B preceding the numbers for the twelfth, fourteenth, and sixteenth block formats indicates these blocks may be followed by further dependent non-header blocks in the same format as they have.

The letter C preceding the numbers for the tenth and twenty-first block formats indicates that they may be followed by blocks which begin with either 0101 or 0110, which are disguised header blocks in the second through ninth block formats which normally begin with 0011 or 0100 respectively, so as to allow other block formats, in addition to the tenth and twenty-first block formats, or the fifteenth and sixteenth block formats, to be used within code that is to be protected against branching.

The column preceding the letters shows the alternate first four bits of code blocks to be used for this type of disguised code - or the first seven in the case of block formats where these are the bits not used in Format III instructions with the first bit inverted.

Since the tenth and twenty-first block formats themselves contain a target field, it is not necessary to invoke their transformed form by means of this technique. Revised starting bits are shown for these formats, but with a dotted border, indicating that they are only used for the case indicated by the letter E.

The letter D precedes the number of the nineteenth block format.

The nineteenth block format contains a decode field, a length field, and a split field which have the same functions as found in other block formats.

The two-bit F field in this block format indicates the instruction format to be used with it as follows:

00 Format I
01 Format IV
10 Format II
11 Format III

Since a split field is present, when Format II or Format III instructions are selected, both 16-bit instructions in a 32-bit instruction slot have a condition code bit available when a bit in the split field is set for that slot, just as with Format I or Format IV instructions.

It also contains an I bit and an S bit. This block format does not provide predication, so the S bit is not a sense bit here.

What is indicated by the letter D is that blocks in this format may be followed by exactly one block which has the appearance of a dependent non-header block, and which begins with 0000.

This block provides additional unused space to be used with instructions.

The I bit indicates that the pointers to pseudo-immediate values in instructions in the block will point into this following block, instead of unused space in the block containing the instructions.

The S bit indicates that the pointers to the supplementary portions of instructions in the block will point into this following block, instead of unused space in the block containing the instructions.
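The effect of the I and S bits can be sketched as a choice of base block for each pointer. The sketch assumes that pointers keep their usual granularity (bytes for pImm, halfwords for pSupp) when they point into the following block, which the text does not state explicitly; the function name is hypothetical.

```python
# Sketch: where a pointer in a nineteenth-format block leads.
# Assumption: granularity is unchanged when pointing into the following block.

def resolve_pointer(kind: str, pointer: int, i_bit: int, s_bit: int) -> tuple[str, int]:
    """Return (which block, bit offset within that block)."""
    if kind == "pImm":
        if not 0 <= pointer < 32:
            raise ValueError("pImm is a five-bit field")
        return ("following block" if i_bit else "same block", pointer * 8)
    if kind == "pSupp":
        if not 0 <= pointer < 16:
            raise ValueError("pSupp is a four-bit field")
        return ("following block" if s_bit else "same block", pointer * 16)
    raise ValueError("kind must be 'pImm' or 'pSupp'")
```

Because I and S are independent, immediates can be moved to the following block while supplementary portions stay in the instruction block, or the other way round.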

This is done to provide an alternative which may save space in occasional unusual cases where the distribution of additional data needed to be used with an instruction creates an awkward situation where a considerable amount of space has to be wasted by being left unused.

The letter E precedes the number of the fifth block format. This block format was described above; it includes a three-bit supplemental opcode field with 32-bit instructions, allowing access to an extended instruction set.

When a block appearing to be a dependent non-header block follows it, it is once again a transformed header block, as in the case indicated by the letter C, but this time the transformation modifies the instruction set, providing a further extension of it.


The third, sixth through tenth, seventeenth, eighteenth, twentieth and twenty-third block formats provide for including additional information with each instruction, for the explicit indication of parallelism, for instruction predication, or for both.


In the ninth block format, for each of instruction slots 1 through 7 in the block (the first instruction slot, containing the header, is slot 0) three bits are provided, U, D, and B, and in the sixth, seventh, eighth and tenth block formats, these bits are provided for each of instruction slots 2 through 7 in the block.

If the D bit corresponding to a given instruction slot is set, this indicates that the instruction in that slot depends on the result of some previous instruction.

If the U bit corresponding to a given instruction slot is set, this indicates that some subsequent instruction depends on the result of the instruction in that slot.

The offset field connects D bits to U bits: it determines which set U bit is associated with a given set D bit. The value it contains is the number, on entry to the block, of additional set D bits that intervene between a given set U bit and the set D bit that corresponds to it.

The B bit is used to indicate instruction slots which contain instructions that do not need to be delayed for a dependency, but which cannot execute simultaneously with preceding instructions for another reason, such as a resource conflict; such instructions therefore need to be delayed only to the extent of being required to start on the next cycle.

The sixth, seventh, and eighth block formats work the same way as the ninth block format, but their headers are 64 bits long, leaving only six instruction slots in the block available, so that they can provide for predicated instructions.

In the sixth and seventh block formats, corresponding to each of the six instruction slots which may contain instructions, there are the following fields:

A P bit, which is 1 if the instruction is predicated. Otherwise, if that bit is zero, the instruction is always executed unconditionally.

A three-bit flag field, which gives the number, from 0 to 7, of the flag bit that controls execution of the corresponding instruction if it is predicated.

An S bit. If it is 0, normal predication is in effect: a predicated instruction executes only if the selected flag bit is set. If it is 1, inverse predication is in effect: a predicated instruction executes only if the selected flag bit is cleared.
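The predication rule for a single instruction slot can be summarized in a few lines. This is a minimal Python sketch, assuming the sixteen flag bits are held in one integer with flag n at bit position n; the function name `slot_executes` is hypothetical. The same function covers the eighth block format (a four-bit flag number, with one S bit shared by all slots) and the third and twenty-third block formats (flag numbers 0 to 3).

```python
def slot_executes(p: int, flag_num: int, s: int, flags: int) -> bool:
    """Decide whether the instruction in one slot executes.

    p        -- P bit: 0 means the instruction is not predicated
    flag_num -- number of the controlling flag bit (0-7 in the sixth and
                seventh block formats, 0-15 in the eighth)
    s        -- S bit: 0 for normal predication, 1 for inverse predication
    flags    -- the sixteen flag bits; flag n is assumed to be bit n
    """
    if not p:
        return True  # an unpredicated instruction always executes
    flag_set = bool((flags >> flag_num) & 1)
    # Normal predication executes when the flag is set, inverse when cleared.
    return flag_set if s == 0 else not flag_set
```

For example, with only flag 2 set, a slot predicated on flag 2 executes under normal predication and is suppressed under inverse predication.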

The seventh block format only differs from the sixth block format in that format IV instructions are used with it.

In the eighth block format, there is only one S bit shared among all the instruction slots, in order that the flag bit field for each instruction can be expanded to four bits in length, allowing the use of all sixteen available flag bits, numbered from 0 to 15.

The seventeenth block format allows the explicit indication of parallelism to be combined with full predication, where all sixteen flag bits are available, and with an individual S bit for each instruction slot. This is achieved without increasing the size of the header beyond 64 bits by using a more compact format for the explicit indication of parallelism.

Only the U and D bits are present for each instruction slot; if both are set, the instruction slot is treated as though only the B bit is set for that slot. This takes advantage of the fact that it is often not necessary to set more than one of those three bits for any instruction.
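The compact encoding can be expanded back into the three bits used by the other formats. A minimal Python sketch (the function name is hypothetical); the same rule applies to the twentieth block format, where setting both the U and D bits also stands for the B bit.

```python
def expand_ud(u: int, d: int) -> tuple[int, int, int]:
    """Map the two stored bits of a slot to the effective (U, D, B) bits."""
    if u and d:
        # Both set: the slot is treated as though only the B bit were set.
        return (0, 0, 1)
    return (u, d, 0)
```

One consequence of this design choice is that, in these formats, a single slot cannot be marked as both producing a result for a later instruction and depending on an earlier one.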

The eighteenth block format also allows 16-bit short instructions to be combined with either format I or format IV instructions; in addition, it allows predication. If an instruction slot is predicated, both 16-bit instructions in that slot are affected. There was room to allow full predication: all sixteen flag bits are available, and each instruction slot has its own S bit, which indicates that a set flag bit inhibits rather than enables execution. As well, the length field is included. For normal operation, the length field must contain 0000; the purpose of the length field and its use will be explained in the description of the fourth and fifth block formats.

The F bit in this block format indicates the instruction format to be used with it as follows:

0 Format I
1 Format IV

The twentieth block format, like the ninth block format, provides the explicit indication of parallelism, but it omits the B bit; thus, to indicate what a B bit is normally used to indicate, both the U and D bits are to be set. This block format allows format IV instructions to appear in a block with the explicit indication of parallelism that has a header which is only 32 bits long.


The third and twenty-third block formats allow predication without the explicit indication of parallelism: here, one S bit is shared among all the instruction slots that may contain instructions, and the flag fields are two bits long, so that flag bits 0 through 3 are available for use.

The twenty-third block format is distinguished from the third block format in that it works with format II instructions, and each instruction slot may also contain a pair of 16-bit short instructions if its contents start with a zero; the first bit of the second 16-bit short instruction in an instruction slot is used as a C bit for that instruction.

The tenth block format also shares one S bit among all the instruction slots and uses only two bits to indicate the flag, but it additionally allows the explicit indication of parallelism. Its special property is a six-bit target field: if a bit in the target field is set to 1, the corresponding instruction slot is allowed to be a branch target. This allows branch destinations to be controlled in code that requires the explicit indication of parallelism for performance.
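Checking a branch destination against the target field might look like the following minimal Python sketch. Since the tenth block format has a 64-bit header, its six target bits are assumed to correspond to instruction slots 2 through 7; the bit ordering (most significant bit for slot 2) and the function name are hypothetical.

```python
def is_branch_target(target_field: int, slot: int) -> bool:
    """Return True if a branch may land on the given instruction slot.

    target_field -- the six-bit target field from the header
    slot         -- instruction slot number, 2 through 7
    Assumes the most significant bit of the field corresponds to slot 2
    and the least significant bit to slot 7.
    """
    if not 2 <= slot <= 7:
        raise ValueError("slots 0 and 1 hold the 64-bit header")
    return bool((target_field >> (7 - slot)) & 1)
```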

The second, fourth and fifth block formats each occupy two lines in the diagram, like the first format for non-header blocks at the top of the diagram, whereas the other formats for header blocks occupy only one line.

Also, while the other header formats are either 32 or 64 bits long, headers in the second block format are four bits long, and headers in the fourth and fifth block formats are 22 bits long.

These three forms of the header allow short-format instructions, which are 18 bits long, to be included in programs.

Since instruction slots are 18 bits long in these three block formats, the question arises of how program instructions are to be addressed by jump instructions. The answer is that each 18-bit instruction slot is given the address it would have had if the 18-bit instruction slots were instead all only 16 bits long and were aligned to the end of the block.
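The address arithmetic works out simply: a 256-bit block is 32 bytes, fourteen 16-bit slots would occupy the last 28 bytes, so slot i is given byte offset 4 + 2i, while its actual position is bit 4 + 18i (the four header bits plus fourteen 18-bit slots make exactly 256 bits). A minimal Python sketch of both mappings; the function names are hypothetical.

```python
BLOCK_BYTES = 32      # one 256-bit block
SLOTS = 14            # 4 header bits + 14 * 18 bits = 256 bits
SLOT_BITS = 18
FIRST_SLOT_BIT = 4

# If the 14 slots were 16 bits (2 bytes) each and aligned to the end of the
# block, the first would start at byte 32 - 14 * 2 = 4.
FIRST_SLOT_BYTE = BLOCK_BYTES - 2 * SLOTS

def slot_address(block_address: int, slot: int) -> int:
    """Byte address assigned to 18-bit slot `slot` (0-13) of a block."""
    if not 0 <= slot < SLOTS:
        raise ValueError("slot must be 0-13")
    return block_address + FIRST_SLOT_BYTE + 2 * slot

def slot_from_address(address: int) -> tuple[int, int]:
    """Inverse mapping: (block address, slot number) for a jump target."""
    offset = address % BLOCK_BYTES
    if offset < FIRST_SLOT_BYTE or offset % 2:
        raise ValueError("address does not correspond to an 18-bit slot")
    return address - offset, (offset - FIRST_SLOT_BYTE) // 2

def slot_bit_offset(slot: int) -> int:
    """Actual starting bit of the slot within its block."""
    return FIRST_SLOT_BIT + SLOT_BITS * slot
```

Note that byte offsets 0 through 3 of such a block, and all odd offsets, never correspond to an instruction slot.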

The header in the second block format consists only of the four bits 0011 by themselves. This header format is distinguished from that in the third block format because it can only be followed either by a 0 bit, which begins an 18-bit short format instruction, or by the bits 10, which precede a 32-bit instruction.

Such a header is followed by fourteen instruction slots, each 18 bits in length, to make up a 256-bit program code block.

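Each instruction slot may contain an 18-bit short-format instruction, which begins with 0; the bits 10 followed by the first 16 bits of a 32-bit instruction; or the bits 11 followed by 16 further bits that continue a 32-bit instruction.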

As noted above, only instruction slots that start with 0 or 10 initiate decoding, so instruction slots which begin with 11 may also be used as padding.

In this header format, no decode bits are provided.

This means that 32-bit instructions used in blocks with this header format may not contain any pImm fields for immediate values.

They may, however, contain pSupp fields. In that case, the pSupp field is to contain all zeroes, and the supplementary portion of the instruction to which that field would have pointed will instead be contained in additional instruction slots starting with 11 following the two instruction slots starting with 10 and 11 which contain the primary 32 bits of the instruction.
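The slot rules for the second block format can be illustrated with a minimal Python sketch that splits the fourteen 18-bit slots into instructions. Each slot is assumed to be given as an integer whose two most significant bits are the leading bits of the slot; the function name is hypothetical, and the sketch skips, rather than attaches, any extra slots beginning with 11.

```python
def parse_second_format(slots: list[int]) -> list[tuple[str, int]]:
    """Split the fourteen 18-bit slots after a 0011 header into instructions.

    Returns ("short", slot) for each 18-bit short-format instruction and
    ("long", value) for each 32-bit instruction reassembled from a slot
    beginning 10 and the following slot beginning 11.
    """
    result = []
    i = 0
    while i < len(slots):
        prefix = slots[i] >> 16          # the two leading bits of the slot
        if prefix < 0b10:                # leading 0: short-format instruction
            result.append(("short", slots[i]))
            i += 1
        elif prefix == 0b10:             # 10: first half of a 32-bit instruction
            # Instructions may not cross block boundaries, so the second
            # half must follow in this block, in a slot beginning 11.
            if i + 1 == len(slots) or slots[i + 1] >> 16 != 0b11:
                raise ValueError("32-bit instruction not completed in this block")
            high = slots[i] & 0xFFFF
            low = slots[i + 1] & 0xFFFF
            result.append(("long", (high << 16) | low))
            i += 2
        else:
            # An 11 slot not consumed above is padding, or 16 bits of the
            # supplementary portion of the preceding instruction.
            i += 1
    return result
```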

Also, since the second block format is distinguished from the fourth and fifth block formats, as well as the third block format, and even the sixth through the eighth block formats, by the fact that the four-bit header 0011 of the second block format may not be followed by the two bits 11, instructions may not cross block boundaries in the second format. As well, they may not cross block boundaries in the fourth and fifth block formats either. (Instructions may also not cross block boundaries in the other block formats the headers for which are shown in the diagram, but in those cases, that is an inherent consequence of the block format, rather than a specific additional restriction imposed on it.)


The headers in the fourth and fifth block formats are 22 bits long; thus, they occupy the first four bits of the block, and the first 18-bit instruction slot in the block.

This header format also allows short-format 18-bit instructions and regular 32-bit instructions to be mixed.

Since this header format does contain a decode field, this time four bits long, to indicate a number from 0 to 12 specifying how many 18-bit instruction slots from the end of the block are not to be decoded, 32-bit instructions may now contain pImm fields.
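Because the 22-bit header takes the four leading bits plus slot 0, slots 1 through 13 can hold instructions, and the decode field removes slots from the end of that range. A minimal Python sketch of the resulting layout, assuming data in the undecoded space is byte-aligned; the function names are hypothetical.

```python
HEADER_PREFIX_BITS = 4   # the block's first four bits
SLOT_BITS = 18
BLOCK_BITS = 256

def decoded_slots(decode: int) -> list[int]:
    """Slots holding instructions; slot 0 is the rest of the 22-bit header."""
    if not 0 <= decode <= 12:
        raise ValueError("the decode field holds a value from 0 to 12")
    return list(range(1, 14 - decode))

def data_region_bits(decode: int) -> tuple[int, int]:
    """Bit range [start, end) of the undecoded slots at the end of the block."""
    decoded_slots(decode)                # validate the decode value
    start = HEADER_PREFIX_BITS + SLOT_BITS * (14 - decode)
    return start, BLOCK_BITS

def whole_data_bytes(decode: int) -> int:
    """Complete 8-bit bytes in that range, assuming byte-aligned data."""
    start, end = data_region_bits(decode)
    first_byte = -(-start // 8)          # round up to a byte boundary
    return end // 8 - first_byte
```

With a decode value of 1, the single undecoded slot provides 18 bits but only 2 whole bytes, which illustrates the unusable space mentioned below.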

As well, if they contain pSupp fields, the pSupp field may either contain a zero, in which case the supplementary portion of the instruction is contained 16 bits at a time in subsequent 18-bit instruction slots, or it may contain a nonzero value which acts as a pointer in the normal fashion.

Of course, given instruction slots that are 18 bits long, and data built around the 8-bit byte being placed in the remaining portion of a block, some unusable space will often be present in this block format.

Since these block formats, unlike the second one, are not distinguished from other block formats by having only 0 or 10, and not 11 following the header, it might be thought that it would be appropriate to allow instructions to cross block boundaries in these block formats.

This would mean that instructions could also cross the ending boundary of blocks in the second format, as long as the following block is in the fourth format rather than the second one.

However, since these formats include a decode field in the header, so that instructions including a pImm field are allowed, and to maintain consistency between the second and fourth block formats, it has been decided that it is most appropriate not to permit instructions to cross block boundaries here either, even though that would allow a closer approach to an unblocked experience.

A special feature of the fourth and fifth block formats is the length field contained in their headers. In normal operation, this field must contain all zeroes; the value 0001 is used for a form of operation that is close to normal operation, except that the Medium floating-point data type is replaced by 80-bit extended precision.


When it contains a value greater than 1, the operation of the computer will be affected as follows:


The least-significant five bits of the contents of base registers will not be used when adding displacements to those contents to form memory addresses.

Instead, the contents of those bits will indicate how the memory to which they point is organized.


If those bits are all zeroes, the memory is organized normally, and is used either for program code or for data organized around the 8-bit byte; such data is also used with some of the alternate values of the length field.

If those bits contain the number 1 or 5, aligned 256-bit blocks of memory will be addressed in groups of three and no bits in them will be unused, thus favoring rapid access to data that is 24, 48, or 96 bits in length, particularly on implementations with triple-channel memory.

If those bits contain the number 2 or 6, each aligned 256-bit block of memory will have four unused bits at its beginning (the most significant bits of the byte at the lowest address), and the remaining bits will be divided into forty-two characters each six bits in length, thus favoring rapid access to data that is 36 or 72 bits in length.

If those bits contain the number 3 or 7, each aligned 64-bit doubleword of memory will have four unused bits at its beginning (its most significant bits, or the most significant bits of the byte at the lowest address: for it is in an architecture that is big-endian that our scene lies) so that each 256-bit block of memory will be divided into forty characters each six bits in length, thus favoring rapid access to data that is 60 bits in length.

Branch, Conditional Jump, and Jump to Subroutine instructions will continue to address memory in bytes, and are to be used with base registers which contain all zeroes in their least significant eight bits. This is also true of the Load Multiple and Store Multiple instructions, as they always operate on the full length of registers.

All instructions that address data will either address memory in 12-bit storage units, if they are used with base registers which contain either 00001, 00010, or 00011 in their least significant five bits, or in 6-bit characters, if they are used with base registers which contain either 00101, 00110, or 00111 in their last five bits.

In this case, the displacement within the instruction and the values in any index register used are in units of 12-bit storage units or 6-bit characters, while the address in the base register, once its least significant eight bits are masked out, is still in bytes. The address in the base register points to a 256-bit block of physical memory, while the portion of the address that is in storage units ignores the unused bits of each 256-bit block.

The intent of the three possible values at the end of the base register for 6-bit addressing is, in the case of the value 1, to make accessing aligned 48-bit values fast and efficient, in the case of the value 2, to make accessing aligned 36-bit and 72-bit values fast and efficient, and in the case of the value 3, to make accessing aligned 60-bit values fast and efficient.

When the length field contains the number 8, most of the same changes to memory addressing noted above will be made in the same manner, but now it is recommended that instructions that access data use base registers the contents of which end in 010000, so that displacements within instructions and index register contents will be in units of 9-bit characters.

Using 6-bit addressing is also an option, at least for accessing floating-point numbers, if the length field contains 8; 12-bit addressing would not be compatible with 54-bit Medium floating-point variables.

However, using such a base register when dealing with character data would cause problems, since characters are 9 bits long when the length field contains 8.

Finally, in addition to ending the base register contents with 00000 for normal access to memory, if the length code is either 0010 or 0011, another option is to end base register contents with 10000; in this case, the data to which the base register is pointing is in little-endian format.

In order for little-endian versions of data accessed by addresses in units of 6 bits, 9 bits, and 12 bits all to be interoperable, it would be necessary for the individual bits of the data to be reversed in memory; it is not clear to me at the moment whether support for little-endian data in non-power-of-two bit lengths is a useful feature.


The lengths of variables of the different floating-point types, for the different length values in the block header, are:

Length code   Single  Medium  Double  Extended

0000, 0010      32      48      64      128
0001, 0011      32      80      64      128

0100            36      48      60      120
0101            36      48      60       96
0110            36      48      72      120
0111            48      72      96      120

1000            36      54      72      108

For integer types, they are:

Length code     Byte  Halfword  Word  Long

0, 1, 2 or 3      8      16      32    64

4, 5, 6 or 7      6      12      24    48

8                 9      18      36    72

The exponent field of 36-bit floating-point numbers is one bit longer than the exponent field of 32-bit IEEE 754 floating-point numbers; otherwise the formats are similar except that the significand is three bits longer.
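The 36-bit layout is therefore 1 sign bit, a 9-bit exponent and a 26-bit stored significand, giving 27 bits of precision with the hidden bit. A minimal Python sketch of decoding such a value, assuming an IEEE 754 style bias of 255 and IEEE-style handling of the all-zeroes and all-ones exponents (these are assumptions, not stated above); `decode_f36` is a hypothetical name.

```python
import math

EXP_BITS, FRAC_BITS = 9, 26
BIAS = (1 << (EXP_BITS - 1)) - 1   # 255, assumed by analogy with IEEE 754

def decode_f36(word: int) -> float:
    """Convert a 36-bit floating-point bit pattern to a Python float."""
    sign = -1.0 if (word >> 35) & 1 else 1.0
    exp = (word >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    frac = word & ((1 << FRAC_BITS) - 1)
    if exp == (1 << EXP_BITS) - 1:          # all ones: infinity or NaN
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                            # subnormal: no hidden bit
        return sign * math.ldexp(frac, 1 - BIAS - FRAC_BITS)
    # Normal number: restore the hidden leading 1 bit.
    return sign * math.ldexp((1 << FRAC_BITS) | frac, exp - BIAS - FRAC_BITS)
```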

Note that with a length code of 1, 80-bit extended precision numbers are accessed using the Medium floating-point instructions; this keeps their alignment the same. In general, floating-point formats based on either 6-bit addressing or 9-bit addressing do not follow the scheme of alignments applicable to the corresponding standard types with length code 0. Length codes 7 and 8 approach this, for the single, medium, and double sizes of floating-point numbers, but because extended-precision floats cannot be longer than 128 bits, they cannot continue the scheme consistently by having extended precision floats that are 192 and 144 bits in length respectively.
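The point about length codes 7 and 8 can be checked from the table: their Single, Medium and Double sizes are 3/2 and 9/8 times the sizes for length code 0, which would call for extended-precision sizes of 192 and 144 bits. A minimal Python sketch; the table is transcribed from above, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import Optional

# Floating-point sizes in bits: (Single, Medium, Double, Extended)
FLOAT_BITS = {
    0b0000: (32, 48, 64, 128), 0b0010: (32, 48, 64, 128),
    0b0001: (32, 80, 64, 128), 0b0011: (32, 80, 64, 128),
    0b0100: (36, 48, 60, 120), 0b0101: (36, 48, 60, 96),
    0b0110: (36, 48, 72, 120), 0b0111: (48, 72, 96, 120),
    0b1000: (36, 54, 72, 108),
}

def consistent_extended(code: int) -> Optional[int]:
    """Extended size that would keep the scaling of length code 0, or None.

    Single, Medium and Double are compared with their length-code-0 sizes;
    if all three share one ratio, that ratio is applied to 128 bits.
    """
    base = FLOAT_BITS[0b0000]
    ratios = {Fraction(n, b) for n, b in zip(FLOAT_BITS[code][:3], base[:3])}
    if len(ratios) != 1:
        return None    # the three sizes do not scale uniformly
    return int(128 * ratios.pop())
```

For length codes 7 and 8 this gives 192 and 144 bits, both beyond the 128-bit limit, which is why the table uses 120 and 108 instead.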

On the other hand, alignments for integer variables are consistent across all length values, provided that for length values from 4 to 7, 6-bit addressing rather than 12-bit addressing is used, so that character data is accessible.

The fifth block format is distinguished from the fourth block format only in that it works with format IV instructions instead of format I instructions.


Finally, the twenty-second block format divides the block into 16-bit instruction slots. Instruction slot 0 contains the header, and instruction slot 1 always contains the first 16 bits of an instruction; therefore, instructions may not cross block boundaries in this format.

For the remaining instruction slots, the corresponding bits in the instruction start field of the header are 1 if and only if that instruction slot contains the first 16 bits of an instruction.

Because the instruction start bits indicate the individual 16-bit instruction slot in which each instruction begins, 16-bit and 32-bit instructions may be freely mixed in sequence in this block format; thus, short instructions do not have to occur in pairs.

The decode field is once again four bits long, as it was for the fourth and fifth block formats, but here it refers to actual 16-bit instruction slots rather than to 18-bit instruction slots mapped to the addresses of 16-bit locations.
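Locating instructions in the twenty-second block format reduces to finding the start bits: each instruction runs from its starting slot up to the next starting slot, or to the end of the decoded area. A minimal Python sketch, assuming a hypothetical layout in which the instruction start field has one bit for each of slots 2 through 15, most significant bit first; the function name is also hypothetical.

```python
def split_instructions(start_field: int, decode: int) -> list[tuple[int, int]]:
    """Return (first_slot, length_in_16_bit_slots) for each instruction.

    start_field -- 14-bit field, one bit per slot 2-15, MSB for slot 2
    decode      -- number of 16-bit slots at the end of the block not decoded
    """
    last = 15 - decode                  # last slot that holds instructions
    if last < 1:
        raise ValueError("slot 1 always begins an instruction")
    # Slot 1 always starts an instruction; later slots only if their bit is set.
    starts = [1] + [s for s in range(2, last + 1)
                    if (start_field >> (15 - s)) & 1]
    ends = starts[1:] + [last + 1]
    return [(s, e - s) for s, e in zip(starts, ends)]
```

For example, with start bits set for slots 2 and 3 and a decode value of 10, slots 1 through 5 are decoded: a 16-bit instruction in slot 1, another in slot 2, and the remaining three slots form one instruction starting at slot 3.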

The F bit in this block format indicates the instruction format to be used with it as follows:

0 Format I
1 Format IV

Registers and Data Formats

The complement of registers included with this architecture is as follows:

There are 32 integer registers, each of which is 64 bits in length, numbered from 0 to 31.

Registers 1 through 7 may be used as index registers.

Registers 25 through 31 may be used as base registers, each of which points to an area of 65,536 bytes in length.

Register 16 may be used as a base register pointing to an area of 32,768 bytes in length.

Registers 18 through 23 may be used as base registers, each of which points to an area of 4,096 bytes in length.

At least part of the area of 4,096 bytes in length pointed to by register 18 will normally be used to contain up to 512 pointers, each 64 bits in length, for use in either Array Mode addressing or Address Table addressing.

Also, registers 8 through 15 may be used as base registers each pointing to an area 1,048,576 bytes in length for extended memory-reference instructions.
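The base register assignments above can be collected into a small table. This is a minimal Python sketch, assuming that the displacement used with each base register is an unsigned field just wide enough to span its area (the text gives only the area sizes); all names are hypothetical.

```python
# Area, in bytes, addressable through each base register (from the text).
BASE_AREA = {r: 65_536 for r in range(25, 32)}             # registers 25-31
BASE_AREA[16] = 32_768                                      # register 16
BASE_AREA.update({r: 4_096 for r in range(18, 24)})        # registers 18-23
EXTENDED_BASE_AREA = {r: 1_048_576 for r in range(8, 16)}  # registers 8-15
INDEX_REGISTERS = set(range(1, 8))                          # registers 1-7

# Register 18's 4,096-byte area holds up to this many 64-bit pointers.
POINTER_TABLE_ENTRIES = 4_096 // 8

def displacement_bits(reg: int, extended: bool = False) -> int:
    """Assumed width of an unsigned displacement that spans the area."""
    table = EXTENDED_BASE_AREA if extended else BASE_AREA
    if reg not in table:
        raise ValueError(f"register {reg} is not a base register here")
    return table[reg].bit_length() - 1   # log2, since areas are powers of two
```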

There are 32 floating-point registers, each of which is 128 bits in length, numbered from 0 to 31.

There are 32 type extension registers, each of which is 32 bits in length, and each of which is associated with a floating-point register.

Floating-point numbers in IEEE 754 format have exponent fields of different lengths, depending on the size of the number. For faster computation, floating-point numbers are stored in floating-point registers in an internal form which corresponds to the format in which extended precision floating-point numbers are stored in memory: with a 15-bit exponent field, and without a hidden first bit in the significand.

As 128-bit extended floating-point numbers are already in this format in memory, all floating-point numbers will fit in a 128-bit register, although shorter floating-point numbers are expanded.
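The expansion into the internal form can be illustrated for a 64-bit double. This is a minimal Python sketch for normal numbers only, assuming the 15-bit exponent uses the bias of 16383 found in the 80-bit format of popular microcomputers, and that the 128-bit format's significand is 112 bits wide with its leading bit explicit; `double_to_internal` is a hypothetical name.

```python
import struct

def double_to_internal(x: float) -> tuple[int, int, int]:
    """Expand a normal IEEE 754 double into (sign, exponent, significand).

    The exponent is re-biased to 15 bits, and the significand is returned
    with its leading 1 bit made explicit, left-aligned in a 112-bit field.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp11 = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp11 == 0 or exp11 == 0x7FF:
        raise ValueError("this sketch handles normal numbers only")
    exp15 = exp11 - 1023 + 16383                   # 11-bit bias -> 15-bit bias
    significand = ((1 << 52) | frac) << (112 - 53)  # hidden bit made explicit
    return sign, exp15, significand
```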

However, the 32 floating-point registers may also be used for Decimal Floating-Point (DFP) numbers. These numbers will also be expanded into an internal form for faster computation, but that internal form may take more than 128 bits. The type extension registers allow the floating-point registers to behave as registers which are 160 bits in length for programs which use such data types.

There are 16 short vector registers, each of which is 256 bits in length.

Each of these registers may contain:

As well, they may contain sixteen 16-bit short floating-point numbers in one of two formats.

These numbers all remain in these registers in the same format as that in which they appear in memory.

Also, there is a primary register group of eight long vector registers, and a scratchpad of sixty-four long vector registers, where each long vector register is composed of sixty-four floating-point registers, each 128 bits in length.

As for how data values are stored:

Signed integer values are stored in binary two's complement format.
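Two's complement works the same way at every integer width in the tables above, including the non-power-of-two widths of 6, 9, 12, 18, 24, 36, 48 and 72 bits. A minimal Python sketch of converting between bit patterns and signed values; the function names are hypothetical.

```python
def from_twos_complement(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a signed integer."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value

def to_twos_complement(n: int, width: int) -> int:
    """Encode a signed integer as a `width`-bit two's complement pattern."""
    if not -(1 << (width - 1)) <= n < (1 << (width - 1)):
        raise OverflowError(f"{n} does not fit in {width} bits")
    return n & ((1 << width) - 1)
```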


The standard types of floating-point numbers used by this architecture are shown below:

The 32-bit and 64-bit floating-point formats correspond to those defined in the IEEE 754 specification. For normal operation, a similar 48-bit floating-point format is added.

The 80-bit floating-point format, except for being stored in big-endian order, is that found on popular microcomputers, and the 128-bit format is the same format except for the significand being larger.

For code executed with a length code other than 0, additional floating-point formats have been defined, also based on those in the IEEE 754 specification.

Of particular interest is the choice to lengthen the exponent by one bit for the 36-bit floating-point format. Because IEEE 754 floats have a hidden first bit, this can be done while still giving the numbers the same precision as floats on the IBM 7090, and it extends the exponent range to match that of floating-point numbers on the IBM System/360; thus, this particular choice facilitates the conversion of FORTRAN programs from either of those systems. As well, the 48-bit floating-point format was given the minimum exponent length that allows the exponent range to fully include numbers from 10^-99 to 10^99; this left the format with eleven digits of precision, making it as close a match as possible to the numbers that typical scientific pocket calculators can represent.

