
The Concertina II Architecture

Welcome to the home page of the Concertina II computer architecture.

The original Concertina computer architecture was intended as a simple example of a conventional old-style CISC architecture, to help explain how computers work. It was expanded over time to include many features from a wide selection of historical computer architectures, to explain those as well.

Concertina II was intended as an ISA that could conceivably be of practical use in an actual implementation. However, I cannot make ambitious claims for it, as my experience in this area is quite limited. This architecture went through quite a number of drafts before I felt that I had struck an acceptable balance between the various factors that had to be compromised to provide the architecture with the capabilities I sought.

However, I believe that the current version of the ISA is a sound basis on which to proceed, and I only expect to be changing it with minor tweaks as I continue to flesh out the architecture and describe its features.

Once I have it completed, it may serve as an alternative to RISC-V, even though the designer of that architecture is far more knowledgeable and experienced than I am. This is because I feel it may at least suit some people's tastes more than RISC-V does.

Introduction

What is the Concertina II ISA, and what choices were made in its design?


The Concertina II design is still unfinished; many parts of it are yet to be described, and, although I do not intend to tear it up and start afresh, as I no longer feel I will be able to do better, it is still subject to minor tweaks.

It will be freely available to all to implement without restrictions once completed, subject to export controls on computer technology.


The basic Concertina II instruction set is largely patterned after today's most popular type of ISA (instruction set architecture) design, RISC (reduced instruction set computing), but it does not qualify as a genuine RISC design by any reasonable contemporary definition of RISC, even the least puristic.

The basic instruction set consists of 32-bit instructions, but also adds the ability to use a pair of 16-bit instructions at any point in the sequence of instructions in place of a 32-bit instruction.

This allows increasing code density by using smaller instructions for many operations, without losing the simplicity of fetching and decoding instructions gained by having all instructions of the same length.

As in many RISC designs, there are two main register files, one for integer values (with registers that are 64 bits wide) and one for floating-point values (with registers that are 128 bits wide), each of which contains 32 registers.

Also, the memory-reference instructions are of the load-store variety, following standard RISC practice.

The following extensions to the RISC model are included in the most basic portion of the instruction set:

It is precisely because base-index addressing is provided by restricting potential index registers to registers 1-7, and potential base registers to groups of 7 (which group depends on the displacement length) that this design does not qualify as RISC, and instead could be called CISC in RISC clothing.

RISC architectures normally allow only two registers to be indicated in a memory-reference instruction. One is the destination register of the instruction, and the other is the one whose contents are added to the displacement to form the effective address. Since a base register is needed for any memory access when the displacement is not large enough to indicate any location in the available memory, this means that the advantage of having an index register isn't available, and array accesses require additional explicit arithmetic instructions to compute addresses.

Thus, since the use of arrays is a very common operation, full base-index addressing was considered a very important feature to add.

In order to make it possible to provide this feature, the integer registers were split up into groups of eight so that the index register and base register fields could be only three bits long instead of five bits long, thus allowing both to fit in an instruction.

Normally, if one allocates a block of memory containing 65,536 bytes, using a base register to point to that block, it is not useful to have addressing modes that can only access the first 4,096 bytes of that block. Therefore, separate groups of registers are used as the possible base registers for different sizes of displacement values.

Only one register serves as the implicit base register for 15-bit displacements; this is done to allow one larger block of memory to be used in conjunction with those accessed with 12-bit displacements. This permits more compact memory-reference instructions, and is inspired by the System/360 Model 20 computer.


The above summarizes how the basic instruction set of this computer was designed to take the basic RISC design, and offer important extensions to it, while still having instructions that fit in 32 bits.

But a number of other extensions are also offered. These require going beyond the somewhat RISC-like model of the basic instruction set, and instead recognizing that this architecture also has VLIW (Very Long Instruction Word) characteristics.

Instructions are grouped in blocks of 256 bits, each of which contains eight 32-bit instruction slots. If feasible, an implementation aiming for maximum performance should have at least a 256-bit data bus to main memory, permitting a block of instructions to be fetched at once.

A small portion of the opcode space for instructions is dedicated to codes which represent headers instead of instructions. A block may begin with a header, and if it does, an additional header may follow it. A header may be 32, 48, or 64 bits long. 48-bit long headers are possible because some headers indicate that the instruction set to be used in the current block will not be the basic one composed only of 32-bit instructions, but instead one containing variable-length instructions, with the length of each instruction being a multiple of 16 bits.

Headers, if any, are processed before the instructions in a block are decoded.

After the headers are processed, or after it is determined that the block does not begin with a header, the computer has the information required to decode all the instructions in the block in parallel.


One of the most important features that having headers provides, which is still considered part of the basic instruction set of the Concertina II architecture, is pseudo-immediate values.

Some register-to-register instructions may have a source register specification replaced by a five-bit byte pointer to an address within the current instruction block, which points to an operand for that instruction.

This capability is supported by headers which contain a three bit decode field, which indicates that some of the eight 32-bit instruction slots in the current block are to be ignored during instruction decoding, and skipped over in execution, so that pseudo-immediate values can be placed in them.
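As an illustration only, the way a five-bit byte pointer selects an operand inside the current 32-byte block can be sketched as follows. The function name `fetch_pseudo_immediate`, the idea that the operand width comes from the opcode, and the most-significant-byte-first storage order are all assumptions made for this sketch; none of them is stated above.

```python
BLOCK_BYTES = 32  # one 256-bit instruction block


def fetch_pseudo_immediate(block: bytes, pointer: int, width: int) -> int:
    """Return the operand selected by a five-bit byte pointer within a block.

    Assumptions (not from the text): `width` is the operand size in bytes,
    known from the opcode, and the value is stored most significant byte first.
    """
    if len(block) != BLOCK_BYTES:
        raise ValueError("an instruction block is exactly 32 bytes")
    if not 0 <= pointer < BLOCK_BYTES:
        raise ValueError("the pointer must fit in five bits")
    if pointer + width > BLOCK_BYTES:
        raise ValueError("the operand must lie entirely within the block")
    return int.from_bytes(block[pointer:pointer + width], "big")


# A block whose last 32-bit slot holds the constant 1000 instead of an instruction.
block = bytes(28) + (1000).to_bytes(4, "big")
value = fetch_pseudo_immediate(block, 28, 4)
```

Because the operand lies in the block that has already been fetched, no separate data access is needed to obtain it.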


What are pseudo-immediate values, and why are they included in this ISA? Essentially, they are inspired by the Heads and Tails design of Heidi Pan. As Mitch Alsup has reminded us all in the design of his "My 66000" ISA, immediate mode instructions have the advantage that a constant value can be used in a calculation without requiring an additional fetch of data, with all the delays and overhead of memory accesses in modern architectures, where DRAM is slow compared to processor logic.

This is because the immediate value is part of the instruction itself, and thus has already been fetched as part of the instruction stream.

But since data items come in several widths, comprehensive support of immediate values means that instructions must come in many different lengths, and I felt this would complicate their decoding to an unacceptable extent.

With pseudo-immediate values, the length of the instruction doesn't have to be changed. A pointer to the value only takes up the same space as a register specification.

But if the value is fetched from a location indicated by a pointer, it isn't an immediate value any more. Hence the term "pseudo-immediate" - given that instructions are fetched from memory in 256-bit blocks, and the data to which the pointer refers is within the same block as the instruction itself, even though the values are not actually immediate values, they still offer the same basic advantage as immediate values. (To some extent, of course, this depends on how the implementation handles the instruction stream. Specifically, to gain the full advantages of this, the entire block needs to be buffered within the processor during instruction decoding.)


In addition to pseudo-immediate values, headers allow two basic sets of features to be added to the ISA that go beyond the RISC model.

Thus, while the architecture initially has the appearance of a conventional RISC architecture, it is intended to combine the basic features and advantages of RISC, CISC, and VLIW architectures.

Note, however, that by VLIW, I mean modern VLIW architectures, such as the Itanium or, even more particularly, the Texas Instruments TMS320C6000 chip, and not the type of classic VLIW architecture the term was originally conceived of as referring to, such as that of the Control Data Cyber 200 computer.

Given that both the Itanium and the i860 were failures in the marketplace, despite being backed by the might of Intel, it is understandable that some might doubt my sanity in proposing a VLIW design in this day and age.

However, instead of including a break bit in every instruction, the break bits are in an optional header at the beginning of a 256-bit block of instructions. Implementations don't need to be designed around VLIW operation, but they can be, if they are aimed at a niche where a VLIW design is appropriate.

The Architecture

There are 32 integer general registers and 32 floating-point registers, and those instructions that perform arithmetic or logical operations include a bit for enabling changes to the condition codes as a result of those instructions. These are characteristics found in RISC architectures.

Having register banks of 32 registers allows different calculations to be intertwined in the code, and being able to control whether instructions affect the condition codes allows more intervening instructions between an instruction that sets the condition codes and a branch instruction that makes use of those results. Both of these things allow code to be designed to offer some of the same benefits as are obtained from out-of-order execution, without the hardware overhead. At the microprocessor clock rates in use today, these measures are normally not enough to be effective on their own; however, if code written this way is combined with simultaneous multi-threading (SMT), then there is still the potential for competing with out-of-order execution.

Also, the architecture provides extended register banks of 128 integer registers, 64 bits in width, and 128 floating-point registers, 128 bits in width, which will also promote efficient VLIW operation.

Block Organization

Instructions are organized into 256-bit blocks which contain eight 32-bit instruction slots.

These blocks are always aligned on the boundaries of aligned 32-byte areas in memory, so an instruction slot that may contain the initial header of a block must have an address the last five bits of which are zero.
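The alignment rule can be expressed as simple address arithmetic. The following is a minimal sketch; the helper names are hypothetical, and slots are numbered 0 through 7 here purely for convenience.

```python
BLOCK_BYTES = 32  # a 256-bit block
SLOT_BYTES = 4    # a 32-bit instruction slot


def block_base(address: int) -> int:
    """Address of the aligned 32-byte block containing `address`."""
    return address & ~(BLOCK_BYTES - 1)


def slot_index(address: int) -> int:
    """Which of the eight 32-bit slots (0..7) `address` falls in."""
    return (address & (BLOCK_BYTES - 1)) // SLOT_BYTES


def may_hold_initial_header(address: int) -> bool:
    """True when the last five bits of the address are zero."""
    return address & 0x1F == 0
```

For example, address 0x1234 lies in the block at 0x1220, in slot 5, so it cannot be the location of an initial header.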

When a block header makes provision for instructions longer than 32 bits, it is possible that these instructions may cross block boundaries, depending on the rules applicable to the particular block header format in use.

The instruction set is organized so that the computer can fetch a 256-bit block of instructions, process any block header within the block to determine what special processing, if any, is required, and then immediately begin decoding each 32-bit instruction slot independently of the others in the block.

There are a few different types of block header, which are shown in the diagram below.

Eleven types of header are illustrated in this diagram. These are those that are required to provide the functionality that will be described here.


The first type of header allows for instructions of different lengths to be freely mixed. This is achieved by having fourteen two-bit prefix fields in the header, each of which corresponds, in order, to the remaining 16-bit halves of the seven remaining instruction slots in the current instruction block.

These fields each indicate what is contained in the corresponding 16-bit area of the instruction block, and are interpreted as follows:

00 A 17-bit instruction starting with 0
01 A 17-bit instruction starting with 1
10 The start of an instruction 32 bits in length or longer
11 Not the start of an instruction

Note that the prefix value of 10 must always be followed by the prefix value of 11. An invalid combination where this is not the case is used to distinguish the other two types of header, so as to avoid the need for additional opcode space for those headers.
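A minimal sketch of how the fourteen prefix fields might be interpreted is given below. It assumes the fields have already been extracted from the header into a list, in order; how they are packed into the header word is not covered here, and the function name is hypothetical. A prefix of 10 in the very last field has no following field in the same header, and the sketch leaves that case unchecked, since the text does not say how it is handled.

```python
MEANING = {
    0b00: "17-bit instruction starting with 0",
    0b01: "17-bit instruction starting with 1",
    0b10: "start of an instruction 32 bits in length or longer",
    0b11: "not the start of an instruction",
}


def instruction_starts(prefixes):
    """Return the indices of the 16-bit areas at which an instruction starts.

    `prefixes` is a list of fourteen two-bit values, one per 16-bit area.
    Raises ValueError when 10 is followed by anything other than 11, which
    is the combination that marks a different header type.
    """
    if len(prefixes) != 14:
        raise ValueError("a Type I header carries fourteen prefix fields")
    starts = []
    for i, prefix in enumerate(prefixes):
        if prefix not in MEANING:
            raise ValueError("each prefix field is two bits wide")
        if prefix == 0b10 and i + 1 < len(prefixes) and prefixes[i + 1] != 0b11:
            raise ValueError("prefix 10 must be followed by prefix 11")
        if prefix != 0b11:
            starts.append(i)
    return starts


example = [0b00, 0b01, 0b10, 0b11, 0b10, 0b11, 0b11,
           0b00, 0b00, 0b10, 0b11, 0b01, 0b00, 0b00]
starts = instruction_starts(example)
```

In the example, the instruction starting at area 4 occupies three 16-bit areas, since two fields of 11 follow it.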


The second type of header indicates legitimate branch targets in the same manner as the seventh type of header, but for code with variable-length instructions of the kind allowed with the Type I header. The group and type fields have the same meaning as in the Type VII header; the target field performs the same function as the one in the Type VII header, but its bits now each correspond to a 16-bit area in the remaining portion of the current instruction block.


The third type of header provides a three-bit prefix field for each remaining 16-bit area in the current instruction block, so as to allow both 17-bit short instructions and 34-bit memory-reference operate instructions without restriction, using the normal way of handling variable-length instructions as done with the Type I header.

The bits in a prefix field are interpreted as follows:

000 A 17-bit instruction starting with 0
001 A 17-bit instruction starting with 1
010 The start of an instruction 32 bits in length or longer
011 Not the start of an instruction
100 The start of a 34-bit instruction starting with 00
101 The start of a 34-bit instruction starting with 01
110 The start of a 34-bit instruction starting with 10
111 (not used, reserved for future expansion)

The 34-bit memory-to-register operate instructions are as illustrated for the eleventh header type, but without the need for re-arrangement of bits.

Note how the third type of header is distinguished from the second type of header by making use of the one unused value for its type field.

The fifth type of header also functions as a two-operand register-to-register operate instruction, as well as a header which, with its decode field, specifies the number of 32-bit instruction slots at the end of the block which are not decoded as instructions, but are instead reserved for other purposes, such as the data values for pseudo-immediates.

The decode field is used to indicate the number of 32-bit instruction slots that are reserved for data other than instructions, such as pseudo-immediate values, for which no attempt is to be made to decode them as instructions. A value of 000 in the decode field indicates that all the remaining instruction slots are to be decoded as instructions; a value of 001 indicates the last instruction slot is to be reserved, and not decoded, and so on.
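The effect of the decode field can be sketched as a split of the block's slots into those that are decoded and those that are reserved. This is an illustration under assumptions: slots are numbered 0 through 7, the header is taken to occupy slot 0 (the `header_slots` parameter is hypothetical and would be 2 for a 64-bit header), and no header-specific restrictions on the field's value are enforced.

```python
SLOTS_PER_BLOCK = 8


def split_slots(decode: int, header_slots: int = 1):
    """Return (decoded, reserved) lists of slot numbers for a decode value.

    The last `decode` slots of the block are reserved for data such as
    pseudo-immediate values; the slots between the header and the reserved
    area are decoded as instructions.
    """
    if not 0 <= decode <= 7:
        raise ValueError("the decode field is three bits wide")
    first_reserved = SLOTS_PER_BLOCK - decode
    if first_reserved < header_slots:
        raise ValueError("reserved slots would overlap the header")
    decoded = list(range(header_slots, first_reserved))
    reserved = list(range(first_reserved, SLOTS_PER_BLOCK))
    return decoded, reserved
```

With a decode value of 011, for instance, slots 1 through 4 are decoded and slots 5 through 7 are left for data.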

An immediate value in an instruction allows it to perform an arithmetic operation involving a constant without having to perform a fetch of data from memory in addition to the fetching from memory already performed as part of reading in the instruction stream.

An important design goal of the Concertina II architecture has been to drastically simplify the decoding of instructions; once a 256-bit instruction block has been checked for a header, and that header, if present, has been processed, all the instructions in the block can be decoded in parallel independently. The varying lengths of different data types mean that including a wide selection of instructions with immediate values would conflict with this.

A pseudo-immediate is addressed by a pointer in the instruction, which seems to be the same thing as a memory-to-register instruction making use of a constant value stored somewhere else. However, the pointer is a short-range one, which only points to a location within the same 256-bit instruction block as the current instruction is contained in.

Therefore, although it involves a pointer reference, and thus is not "really" an immediate, hence the name "pseudo-immediate", it provides the same advantage of the constant argument having been fetched as part of the instruction stream!

This type of header reserves space for these constants, which therefore won't be decoded erroneously as instructions, and because the header is also an instruction, it lets these three bits of information be provided without the overhead of using a full 32-bit instruction slot for a header and nothing else.


It is now possible to explain the fourth type of header, after having understood the fifth type of header.

Like the fifth type of header, the fourth type of header aims at providing a way to specify the amount of space reserved for pseudo-immediates in the current block through the use of the decode field.

The fifth type of header had enough room to provide a two-address register-to-register operate instruction. But there was not enough room in that header to provide the option of starting the block with a memory-reference instruction, not even one that was severely restricted.

The fourth type of header addresses this limitation by being expanded to 64 bits. Adding a full 32 bits to the header allows it to contain two instructions, each of which has twenty-six bits available to it. This allows either or both of those instructions to be memory-reference instructions, although with severe restrictions:

Thus, this header format is not a cure-all for allowing a decode field to be provided with absolutely no overhead, but it does provide additional flexibility which still requires some effort to use.

In the event this is not clear from the diagram, the two instructions are distributed between the two 32-bit words of the header as follows. All twenty-five bits of the second instruction are contained in the last twenty-five bits of the second word, and the bit which indicates whether the second instruction is a register-to-register operate instruction or a memory-reference instruction is the first bit (bit zero) of the second word. For the first instruction, the bit determining whether it is a register-to-register instruction or a memory-reference instruction is the tenth bit of the first 32-bit word of the header; the first six bits of the twenty-five-bit instruction proper are the second through seventh bits (bits 1 through 6) of the second word of the header, and the remaining 19 bits of the first instruction are the last 19 bits of the first word of the header. The intent of this somewhat complicated arrangement is to keep as many of the bits of the two instructions as possible in the same positions in the 32-bit words which contain them.
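The bit positions just described can be restated as a short sketch. It assumes that bits are numbered from zero starting at the most significant bit of each 32-bit word, so that "the tenth bit" is bit 9; the function names are hypothetical, and the kind bits are returned as raw values because the text does not say which value selects which kind of instruction.

```python
def field(word: int, first_bit: int, length: int) -> int:
    """Extract `length` bits starting at `first_bit`, where bit 0 is the
    most significant bit of a 32-bit word (an assumption of this sketch)."""
    return (word >> (32 - first_bit - length)) & ((1 << length) - 1)


def split_type_iv(word0: int, word1: int):
    """Return ((kind1, instr1), (kind2, instr2)) from the two header words."""
    kind1 = field(word0, 9, 1)             # tenth bit of the first word
    instr1_high = field(word1, 1, 6)       # bits 1 through 6 of the second word
    instr1_low = field(word0, 13, 19)      # last 19 bits of the first word
    instr1 = (instr1_high << 19) | instr1_low
    kind2 = field(word1, 0, 1)             # bit zero of the second word
    instr2 = field(word1, 7, 25)           # last 25 bits of the second word
    return (kind1, instr1), (kind2, instr2)


# Build two words with recognizable field contents and take them apart again.
word0 = (1 << 22) | 0x12345
word1 = (0b101010 << 25) | 0x1ABCDEF
result = split_type_iv(word0, word1)
```

Each instruction thus has twenty-six bits in all: one kind bit and a twenty-five-bit instruction proper.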

Note that in addition to being a memory-reference instruction in a restricted format, each of the two instructions in this header may also be a three-operand register-to-register operate instruction, limited to the basic standard set of operate instructions, a two-operand register-to-register operate instruction, a single-operand instruction, or a shift instruction.


The sixth type of header also attempts to provide a reduced-overhead form of header, although in this case the instruction included with the header has a format similar to that of a 15-bit short instruction, rather than attempting to approximate the functionality of a full 32-bit instruction.

In addition to the decode field, however, this header also contains a two-bit field which, as the diagram shows, determines the format of the eight bits which specify the source and destination registers for the instruction contained in the header.

What is not immediately visible in the diagram, but which is the primary purpose of this header type, is that this format also applies to all the remaining 15-bit short instructions which are register-to-register operate instructions in the instruction block that starts with this header.

In the first case, these instructions may involve any of the 32 registers in a register bank, but both the source and destination registers must be in the same group of eight registers within those thirty-two.

In the second case, the destination register must be one of registers 0 through 7, while the source register may be any of the 32 possible registers. In this case, the opcodes for all the store instructions are defined.

In the third case, only registers 0 through 15 may serve as the source register and also as the destination register.


The seventh type of header is used to provide a security feature for the architecture.

In the page table, a region of memory which will be used for executable code may be designated as having branch control. In that case, branches to instructions within that region of memory will only be permissible if the instructions are within an instruction block beginning with a header of the fifth or seventh type, and only under the conditions specified by that header.

The decode field has its usual meaning in this header.

The type field indicates what type of branches are permitted to the potential branch targets indicated by this header:

00 This indicates an entry point, with branches to the branch target being permitted from anywhere.

10 This indicates a local branch target. Only branches in the same page of storage are permitted.
11 This indicates a single-source branch target. Only branches from a specific address are permitted.

In the case of a type field containing 10, any branch instruction with a target in the block with this header must also have a Type V or Type VII header, even if there are no branch targets in that block, and the group field must match, which, of course, is a necessary result of being in the same memory area indicated by the same page table entry.

In the case of a type field containing 11, the instruction indicated as a branch target is to be immediately followed by a 64-bit absolute address indicating the location of the branch instruction that is permitted to branch to the instruction.

By an absolute address, an address relative to the address space of the current program is meant, so it is not required to add the contents of any base register to the address. But it is still not a physical hardware address, since offsets due to the page table mechanism and similar functionality transparent to the programmer are still applied.

The bits of the target field each correspond, in order, to one of the remaining 32-bit instruction slots in the current instruction block, and if a bit is 1, it indicates that the instruction in the corresponding position may be the subject of a branch instruction, subject to the criterion indicated by the type field.
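The checks described for this header can be sketched as follows. This is a simplification under stated assumptions: the target field is passed as a list of seven 0/1 values for slots 1 through 7, the caller supplies whether the branch source lies in the same page and, for single-source targets, the permitted address stored after the target instruction. The group field comparison and the requirement that the branching block itself carry a suitable header are left out. All names are hypothetical.

```python
ENTRY_POINT = 0b00
LOCAL = 0b10
SINGLE_SOURCE = 0b11


def branch_allowed(type_field, target_bits, slot, same_page,
                   source=None, permitted_source=None):
    """Decide whether a branch to `slot` (1..7) of a block is permitted."""
    if len(target_bits) != 7:
        raise ValueError("one target bit per remaining 32-bit slot")
    if not 1 <= slot <= 7 or target_bits[slot - 1] != 1:
        return False  # the slot is not marked as a branch target
    if type_field == ENTRY_POINT:
        return True   # branches permitted from anywhere
    if type_field == LOCAL:
        return same_page
    if type_field == SINGLE_SOURCE:
        return source is not None and source == permitted_source
    raise ValueError("type field value 01 is not described in the text")
```

A branch to a slot whose target bit is zero is refused regardless of the type field.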

Note that the sole purpose of the Type V header is to supply a decode field to an instruction block; therefore, in addition to the value of 111 being invalid for it, in this case the value 000 is also invalid, which is why it may be used to differentiate the Type VII header from it. Also note that this is not true of the Type VI header, and therefore the relevant bits of the Type VII header are specified as 00 to differentiate it from that type of header as well.


The ninth type of header functions in a similar fashion to the tenth type of header, to be described below, but at the cost of the header being increased in length by sixteen bits, it removes the restrictions on the placement of instructions present with that type of header by providing a type bit for every remaining 16-bit area in the instruction block. The details of the instruction formats provided with this header type will be found in the description of the tenth header type.

Unlike the Type X header, this type of header includes a decode field. Since no combination of type bit and prefix bit indicates that a 16-bit area does not contain the start of an instruction, a decode field is required to allow space to be reserved for pseudo-immediate operands. Note that the decode field is four bits in length, as there was room for a four-bit field, and this header type supports variable-length instructions the lengths of which are in units of 16 bits.

This type of header (but not the eleventh type of header or the third type of header) contains a bit marked C. This indicates that, for all operate instructions in the block of a type such that the opcode field may indicate floating-point instructions with the default Standard format for floating-point numbers, but which cannot, perhaps because of not being long enough, indicate floating-point instructions in the Compatible floating-point format, those instructions are interpreted as being for the Compatible floating-point format instead.

This unusual option is provided for this particular block format as it is focused on providing instructions which perform the same operations as those of a particular popular mainframe architecture.

There is also a bit marked T. When it is set, the results of floating-point operations in the Compatible floating-point type performed by instructions within the block with this header are truncated, for further compatibility, instead of being rounded normally. Normal rounding behavior is to round to the value closest to the exact result, as specified in IEEE 754, for addition, subtraction, and multiplication, and to a result within 1/64 of the units in the last place of the exact result for division, just as in the case of the Standard floating-point type. This bit does not affect any instructions in which the rounding type is explicitly indicated in the instruction.

In addition, this header type contains a bit marked E. When this bit is set, for the instructions starting in the block, the first sixteen floating-point registers, as seen by the program, are changed from 128-bit registers to 64-bit registers, and they are placed in pairs in the first eight actual 128-bit floating-point registers of the machine. Registers 16 through 31 are not changed.

This affects instructions operating on both the Standard and the Compatible floating-point formats.

The purpose of this is to enable code with this header type to interface with code running in emulation mode for one particular computer architecture.

The locations of the 64-bit registers within the 128-bit registers are shown in the table below:

128-bit   64-bit
register  registers

0         0,2
1         4,6
2         8,10
3         12,14
4         1,3
5         5,7
6         9,11
7         13,15

This arrangement stems from the historical characteristics of the architecture being emulated; originally, it only had four floating-point registers, and they were numbered 0, 2, 4, and 6, and so when a register pair was needed, only even-numbered registers were available out of which to build it.
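The table above follows a regular pattern, which can be captured in a small formula. The sketch below is only a restatement of the table; the function name is hypothetical, and "position" merely records whether a 64-bit register is listed first or second in its row, without claiming which physical half of the 128-bit register that is.

```python
def locate_64bit_register(r: int):
    """Map a 64-bit register number (0..15) to (128-bit register, position).

    Position 0 means listed first in the table row, 1 means listed second.
    """
    if not 0 <= r <= 15:
        raise ValueError("only registers 0 through 15 are paired")
    physical = (r >> 2) + 4 * (r & 1)  # odd-numbered registers go to 4..7
    position = (r >> 1) & 1
    return physical, position


# Rebuild the table from the formula as a check.
table = {}
for r in range(16):
    physical, position = locate_64bit_register(r)
    table.setdefault(physical, [None, None])[position] = r
```

Even-numbered registers land in 128-bit registers 0 through 3, and odd-numbered ones in 4 through 7.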

A pictorial representation of this arrangement is shown below:

The rightmost portion of the image is after Figure 2-2 on page 2-5 of the Ninth Edition of Enterprise Systems Architecture/390 Principles of Operation, publication SA22-7201-08, by IBM.

A block header with this bit set also modifies the behavior of floating-point instructions involving the Standard floating-point type in another important way. Since the first sixteen registers are now only 64 bits long, floating point values in these registers will be in the same form as they are kept in main memory, and will not be converted to internal form on being loaded, and from internal form on being stored.

The conversion will remain in place for registers 16 through 31, since the purpose of this block format is to facilitate communication between programs running in emulation mode and ordinary programs. Thus, instructions operating on 128-bit floats in the Standard floating-point type will continue to use the internal form of floats without a hidden first bit, rather than the IEEE 754 standard format for 128-bit floats.


The tenth type of header provides supplementary information which allows the computer to provide VLIW functionality.

The primary feature of this type of header is to provide for VLIW features which can be used to accelerate the speed of instruction execution, particularly on lightweight implementations of the architecture which lack out-of-order execution.

There are seven bits marked B, for break; they correspond to the seven remaining 32-bit instruction slots in the block, and if a bit marked B is set, this indicates that the instruction in its corresponding instruction slot may not be executed in parallel with the instructions that precede it.
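One way to picture the effect of the break bits is to divide the seven slots into groups of instructions that may execute in parallel. The sketch below assumes the break bits are given as a list of seven 0/1 values for slots 1 through 7, and it considers one block in isolation; whether a group can continue from the preceding block is not addressed here. The function name is hypothetical.

```python
def parallel_groups(break_bits):
    """Split slots 1..7 into groups; a set break bit starts a new group."""
    if len(break_bits) != 7:
        raise ValueError("one break bit per remaining 32-bit slot")
    groups = []
    for slot, brk in enumerate(break_bits, start=1):
        if brk or not groups:
            groups.append([slot])    # may not run in parallel with what precedes
        else:
            groups[-1].append(slot)  # may join the preceding instructions
    return groups


groups = parallel_groups([0, 0, 1, 0, 0, 1, 0])
```

Here slots 1 and 2 form one group, slots 3 through 5 a second, and slots 6 and 7 a third.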


Important note: it is intended that this ISA may be implemented in a number of ways. Specifically, in relation to the VLIW feature of the break bit, these three classes of implementations are possible:

  • Implementations without superpipelining (that is, pipelining of the execution of instructions; a pipeline that breaks instructions into fetch, decode, and execute, performing fetch and decode of subsequent instructions in parallel with the execution of one instruction is still possible) or superscalar capabilities, which simply execute instructions serially one after another, and thus ignore the break bit as they cannot execute instructions in parallel;
  • Implementations where the break bit materially speeds up execution, by allowing more efficient pipelining of instructions;
  • Implementations which have out-of-order execution, guided by a full set of interlocks, which do not require explicit guidance from break bits for the optimum execution of a sequence of instructions.

In consequence, any programs which would produce a different result on the first two types of implementation listed above are to be considered to be invalid programs which have been written incorrectly.

Thus, the architecture specification requires implementations to execute code which does not contain any explicit indications of parallel execution with sequential consistency.

When code does contain such indications, implementations may follow those indications, or they may execute the code sequentially, even if different results are produced in the two cases; it is the programmer's responsibility, if consistent model-independent execution of programs is desired, only to indicate parallelism where it does not lead to results different from those of completely sequential code.


In this header format, there is also a four-bit flag field. This indicates which of the sixteen flag bits may be used for predicating instructions in this block. A seven-bit predicated field indicates which instruction slots contain an instruction the execution of which is conditional, based on that flag bit. There is also a bit marked S, for sense; if that bit is zero, a predicated instruction will execute if and only if the selected flag bit is set (equal to 1); if it is one, the predicated instruction will instead execute if and only if the selected flag bit is cleared (equal to 0).
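The predication rule can be sketched as a single test per instruction slot. To avoid guessing at bit ordering, the sketch takes the predicated field as a list of seven 0/1 values for slots 1 through 7 and the flags as a list of sixteen 0/1 values; these representations and the function name are assumptions made for illustration.

```python
def executes(slot, predicated, sense, flags, flag_index):
    """Return True if the instruction in `slot` (1..7) should execute."""
    if len(predicated) != 7 or len(flags) != 16:
        raise ValueError("expected seven predicated bits and sixteen flags")
    if not predicated[slot - 1]:
        return True                 # unpredicated instructions always execute
    flag = flags[flag_index]        # the flag selected by the four-bit field
    if sense == 0:
        return flag == 1            # execute only if the flag is set
    return flag == 0                # execute only if the flag is cleared


flags = [0] * 16
flags[2] = 1
predicated = [0, 0, 1, 0, 0, 0, 0]  # only slot 3 is predicated
```

With flag 2 set, the predicated instruction in slot 3 executes when S is zero and is skipped when S is one.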

This header also has a decode field.


The eleventh type of header is an attempt to allow 17-bit short register-to-register operate instructions (but not any other 17-bit short instructions) and/or 34-bit memory-to-register operate instructions to augment the existing set of 32-bit instructions.

Each bit in the prefix field corresponds, in order, to one of the fourteen remaining 16-bit regions in the current instruction block after the 32-bit header.

There is also a type field in this header. Each bit of that field corresponds to two successive 16-bit regions in the current instruction block; the regions are taken in order.

When a bit in the type field is zero, instructions which start in any of the 16-bit regions corresponding to that bit may be 17-bit register-to-register operate instructions in addition to standard 32-bit instructions.

When a bit in the type field is one, instructions which start in any of the 16-bit regions corresponding to that bit may be 34-bit memory-to-register operate instructions in addition to standard 32-bit instructions.

The format of instructions in a block with this type of header is shown in the diagram below:

The top line of the diagram shows how the prefix bit, shown as raised, combines with the contents of a 16-bit region to form a 17-bit register-to-register operate instruction.

This applies to the case when the corresponding type bit is a zero.

The middle portion of the diagram shows the format of 34-bit memory-to-register operate instructions in blocks with this type of header. As with the 17-bit short instructions described above, the prefix bit and the first bit of the corresponding 16-bit region cannot both be ones. So these two bits remain in place as the first two bits of the instruction.

The prefix bit for the second half of the instruction supplies the next bit of the instruction. In this way, the remaining 15 bits of the first 16-bit region used for the instruction, and all the bits of the second 16-bit region used for the instruction become the last 31 bits of the instruction without having to be shifted.

This applies to the case when the corresponding type bit is a one.
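The way the 34 bits are gathered can be sketched in Python as below. The function and parameter names are invented for illustration, and the sketch only models the bit placement described in the text, not the meaning of any field within the instruction.

```python
def assemble_34bit(prefix1, region1, prefix2, region2):
    """Build a 34-bit instruction from two prefix bits and two 16-bit regions.

    Bit order of the result, most significant first:
      prefix1, first bit of region1, prefix2,
      remaining 15 bits of region1, all 16 bits of region2.
    """
    lead = (region1 >> 15) & 1
    if prefix1 and lead:
        # The text states these two bits cannot both be ones.
        raise ValueError("prefix bit and first region bit are both 1")
    top3 = (prefix1 << 2) | (lead << 1) | prefix2
    low31 = ((region1 & 0x7FFF) << 16) | (region2 & 0xFFFF)
    return (top3 << 31) | low31
```

Because the low 31 bits are taken from the two regions in place, only the three leading bits need to be gathered from separate locations.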

Finally, the bottom portion of the diagram shows how standard 32-bit instructions are represented in blocks having this header type.

Because the 34-bit memory-reference instructions have seven bits allocated to their opcodes, they include operate instructions and not just load and store instructions. In order to make room for a C bit, so that as with all the other operate instructions, it is possible to control whether or not they affect the condition codes, these instructions have been limited to acting on aligned operands only.

Note that in the blocks with headers of the sixth and ninth types, the places where an instruction begins are not explicitly indicated, so where instructions begin must be detected essentially by means of serial decoding of each instruction, as is the case with a classic CISC architecture with variable length instructions.

In previous iterations of the Concertina II design, I had always completely avoided this being the case under any circumstances. However, I had recently been reminded that since instruction decoding can be done well in advance of execution, the apparent decoding overhead of classic CISC instruction sets with variable-length instructions really is not a bottleneck. Initially, this simply led me to examine the possibility of proposing a Concertina III architecture in the classic CISC mold, but when I turned back to Concertina II, it was possible to make use of that insight here as well.

Instructions may cross from one block to the next where each of the two blocks involved is either of the sixth or the ninth type. Note that blocks with a header of the ninth type do not support pseudo-immediate values, but they are available with a Type VI header, as it does have room for a decode field.

Instructions longer than 32 bits may be used in the blocks with these types of headers. The first 32 bits of such an instruction are rearranged in the manner shown in the diagram above for standard 32-bit instructions. For the remainder of the instruction, the contents of the following 16-bit areas of the instruction block are used without modification, and the corresponding prefix bit is not used.


Four additional types of header are shown below:

These header types are still subject to change. Their purpose is to allow the use of instructions from additional sets of instructions in combination with the existing basic instruction set, so as to allow for future extensions to the architecture.


The twelfth type of header allows up to 256 sets of additional 32-bit instructions to be added to the architecture. The alternate field contains bits, corresponding to the remaining instruction slots in the current block, which, if 1, indicate that the instruction in that slot belongs to the selected set of additional instructions.


The thirteenth type of header is 64 bits long. It is used to provide additional flexibility in extending the instruction set, should it be required. Here, in addition to the standard set of instructions, two additional alternate instruction sets may be chosen, and instructions from all three of these instruction sets may be mixed together in the same block.

The contents of the twelve three-bit prefix fields in this header are interpreted as follows:

000 A 17-bit instruction starting with 0
001 A 17-bit instruction starting with 1
010 The start of an instruction in the standard instruction set that is 32 bits or longer
011 Not the start of an instruction
100 The start of an instruction in alternate instruction set 1 that is 32 bits or longer
101 The start of an instruction in alternate instruction set 2 that is 32 bits or longer
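A table-driven decoder for these fields might look like the following sketch. How the twelve fields are packed into the header is not stated here, so the sketch assumes they are supplied as a single 36-bit integer with the first field in the most significant bits; the codes 110 and 111 are not listed above and are reported as unassigned.

```python
PREFIX3 = {
    0b000: "A 17-bit instruction starting with 0",
    0b001: "A 17-bit instruction starting with 1",
    0b010: "The start of an instruction in the standard instruction set that is 32 bits or longer",
    0b011: "Not the start of an instruction",
    0b100: "The start of an instruction in alternate instruction set 1 that is 32 bits or longer",
    0b101: "The start of an instruction in alternate instruction set 2 that is 32 bits or longer",
}

def decode_prefix_fields(packed, count=12):
    """Split `packed` into `count` three-bit fields and describe each one.

    Assumption: the first field occupies the most significant three bits.
    """
    meanings = []
    for i in range(count):
        shift = 3 * (count - 1 - i)
        code = (packed >> shift) & 0b111
        meanings.append(PREFIX3.get(code, "unassigned"))
    return meanings
```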

The fourteenth type of header provides for sets of additional instructions which have had to include instructions longer than 32 bits.

Each two-bit prefix field corresponds to a 16-bit area in the remainder of the block. The contents of these fields are interpreted as follows:

00 A short instruction
01 The start of an alternate instruction in the selected set, 32 bits or longer
10 The start of a conventional instruction, 32 bits or longer
11 Not the start of an instruction

Unlike the Type I header, this block format doesn't provide for 17-bit short instructions; instead, 15-bit instructions prefixed by 0 may be used, with 1 prefixing any short instructions added by the alternate set of instructions. If, however, the new instruction set being added does not include any short instructions, but only requires instructions longer than 32 bits, then 16-bit short instructions may be used instead.


The fifteenth type of header also provides for sets of additional instructions which have had to include instructions longer than 32 bits, but with important differences from the fourteenth type of header.

This type of header provides for alternate sets of instructions which, while they require their own instructions longer than 32 bits, do not require any short instructions of their own. Therefore, the 17-bit short instructions are included with the regular set of variable-length instructions.

One of up to sixteen alternate sets of instructions may be selected.

The three-bit prefix field refers to the 16-bit area immediately following the header, and is interpreted as follows:

000 A 17-bit short instruction starting with 0
001 A 17-bit short instruction starting with 1
010 The start of a conventional instruction, 32 bits or longer
011 Not the start of an instruction
100 The start of an alternate instruction in the selected set, 32 bits or longer

The four seven-bit prefix fields each correspond to three of the remaining 16-bit areas in the current instruction block. As a seven-bit field may contain any number from 0 to 127, and sequences of three base-5 digits are equivalent to numbers from 0 to 124, there is enough space to encode all the possibilities.

In order to simplify encoding, a scheme similar to Chen-Ho encoding is used:

AA
BB
CC

00 A 17-bit short instruction starting with 0
01 A 17-bit short instruction starting with 1
10 The start of a conventional instruction, 32 bits or longer
11 Not the start of an instruction
-- The start of an alternate instruction in the selected set, 32 bits or longer

with the three digits being encoded as follows:

0AABBCC
100AABB
101AACC
110BBCC
11100AA
11101BB
11110CC
1111100

with the two bits for each of the three 16-bit extents AA, BB, and CC being as in the table above, and any extent not mentioned in a pattern taking the option marked with -- rather than a two-bit value.
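The encoding above can be decoded with a few prefix tests, as in this Python sketch. The function name and the use of the string "--" to stand for the alternate-instruction option are choices made for the sketch; the three seven-bit values not listed in the table are treated as unassigned.

```python
ALT = "--"  # the option with no two-bit code: start of an alternate instruction

def decode_prefix7(field):
    """Decode a seven-bit prefix field into the options for (AA, BB, CC).

    Each element of the result is a two-character bit string
    ("00", "01", "10", "11") or ALT. Returns None for unassigned values.
    """
    if not 0 <= field <= 0b1111111:
        raise ValueError("field must fit in seven bits")
    bits = format(field, "07b")
    if bits[0] == "0":                      # 0AABBCC
        return bits[1:3], bits[3:5], bits[5:7]
    if bits[:3] == "100":                   # 100AABB
        return bits[3:5], bits[5:7], ALT
    if bits[:3] == "101":                   # 101AACC
        return bits[3:5], ALT, bits[5:7]
    if bits[:3] == "110":                   # 110BBCC
        return ALT, bits[3:5], bits[5:7]
    if bits[:5] == "11100":                 # 11100AA
        return bits[5:7], ALT, ALT
    if bits[:5] == "11101":                 # 11101BB
        return ALT, bits[5:7], ALT
    if bits[:5] == "11110":                 # 11110CC
        return ALT, ALT, bits[5:7]
    if bits == "1111100":                   # all three are alternate
        return ALT, ALT, ALT
    return None                             # 1111101, 1111110, 1111111
```

Enumerating all 128 field values yields 125 distinct decodings, one for each combination of five options over three extents.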

Registers and Data Formats

The basic complement of registers included with this architecture is as follows:


There are 32 integer registers, each of which is 64 bits in length, numbered from 0 to 31.

Registers 1 through 7 may be used as index registers.

Registers 25 through 31 may be used as base registers, each of which points to an area of 65,536 bytes in length.

Register 24 serves as a base register which points to an area 32,768 bytes in length.

Registers 17 through 23 may be used as base registers, each of which points to an area of 4,096 bytes in length.

At least part of the area of 3,072 bytes in length pointed to by register 16 will normally be used to contain up to 384 pointers, each 64 bits in length, for use in either Array Mode addressing or Address Table addressing.

Registers 9 through 15 may be used as base registers, each of which points to an area of 1,048,576 bytes in length. This addressing format is used for 48-bit extended memory-reference instructions.

Register 8 serves as a pointer to a table of pseudo-operations, if this feature is used.
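The base register assignments above can be summarized in a small lookup, shown here as a Python sketch. The names are invented for illustration, and the displacement width is simply the number of bits needed to address every byte of the stated area, which is an inference from the area sizes rather than a statement about the instruction formats.

```python
# (registers, area size in bytes) for each group of base registers.
BASE_AREAS = [
    (range(25, 32), 65_536),
    (range(24, 25), 32_768),
    (range(17, 24), 4_096),
    (range(9, 16), 1_048_576),
]

def base_area_bytes(reg):
    """Return the size of the area addressed through base register `reg`,
    or None if the register is not one of the base registers listed."""
    for regs, size in BASE_AREAS:
        if reg in regs:
            return size
    return None

def displacement_bits(reg):
    """Bits needed to address any byte within the register's area."""
    size = base_area_bytes(reg)
    return None if size is None else size.bit_length() - 1

# Register 16 is a special case: 384 pointers of 8 bytes each fill 3,072 bytes.
POINTER_TABLE_BYTES = 384 * 8
```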


There are 32 floating-point registers, each of which is 128 bits in length, numbered from 0 to 31.

Floating-point numbers in IEEE 754 format have exponent fields of different lengths, depending on the size of the number. For faster computation, floating-point numbers are stored in floating-point registers in an internal form which corresponds to the format in which extended precision floating-point numbers are stored in memory: with a 15-bit exponent field, and without a hidden first bit in the significand.

As 128-bit extended floating-point numbers are already in this format in memory, all floating-point numbers will fit in a 128-bit register, although shorter floating-point numbers are expanded.
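To make the expansion concrete, the following Python sketch widens a 64-bit IEEE 754 number to a 15-bit exponent with an explicit leading significand bit. It handles normal numbers only, and it assumes the 15-bit exponent uses a bias of 16383 as in the usual extended formats; the text above does not state the bias, nor how the significand is aligned within the 128-bit register.

```python
import struct

def expand_double(x):
    """Return (sign, exponent15, significand) for a normal double.

    significand is 53 bits wide with the leading 1 made explicit.
    Assumption: the 15-bit exponent is biased by 16383.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent11 = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent11 == 0 or exponent11 == 0x7FF:
        raise ValueError("only normal numbers are handled in this sketch")
    exponent15 = exponent11 - 1023 + 16383   # rebias from 11 to 15 bits
    significand = (1 << 52) | fraction       # hidden bit becomes explicit
    return sign, exponent15, significand
```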

However, the 32 floating-point registers may also be used for Decimal Floating-Point (DFP) numbers. These numbers will also be expanded into an internal form for faster computation, but that internal form may take more than 128 bits.

This is dealt with as follows: Only 24 DFP numbers that are 128 bits in length may be stored in the 32 floating-point registers. When such a DFP number is stored in an even-numbered register, it is stored in that register, and the first 32 bits of the following register. When it is stored in a register the number of which is of the form 4n + 1 for integer n, the first 84 bits of the internal form of that number are stored in the last 84 bits of that register, and the remainder of the internal form of that number is stored in the last 84 bits of the second register after that register.
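The register pairing just described can be checked with a short Python sketch. The function names are hypothetical; the sketch only records which registers are involved, not the bit layout within them.

```python
def dfp_registers(r):
    """Return the pair of registers holding a 128-bit DFP number
    whose nominal register is r.

    Even r:        the number overflows into the following register.
    r = 4n + 1:    the number overflows into the second register after r.
    Other values:  not available for 128-bit DFP numbers.
    """
    if not 0 <= r < 32:
        raise ValueError("register number must be in 0..31")
    if r % 2 == 0:
        return (r, r + 1)
    if r % 4 == 1:
        return (r, r + 2)
    raise ValueError("register cannot hold a 128-bit DFP number")

def dfp_capable_registers():
    """List every register that may be named for a 128-bit DFP number."""
    return [r for r in range(32) if r % 2 == 0 or r % 4 == 1]
```

Counting the sixteen even-numbered registers and the eight registers of the form 4n + 1 gives the 24 numbers mentioned above.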

In this way, the same principle as that of storing double-length numbers in two adjacent registers is respected: numbers too long to be stored in a given register are stored in that register, and in another nearby register of the same register file. But the method is extended to allow more efficient use of the available space.

The same technique is used for the 128-bit floating-point format recently added to IEEE 754, which does have a hidden first bit. Therefore, in order to support this format, the usual 128-bit floating-point format offered by this architecture, while similar to, and based on, the Temporary Real format of the original 8087 coprocessor, has an exponent field that is one bit longer than that of the Temporary Real format.


There are 16 short vector registers, each of which is 256 bits in length.

Each of these registers may contain:

As well, they may contain sixteen 16-bit short floating-point numbers in one of two formats.

These numbers all remain in these registers in the same format as that in which they appear in memory.

The entire set of 16 short vector registers can contain a table of bits used for bit-matrix-multiply operations on 64 bit binary words. As well, the short vector registers may also be used as four string registers, each 128 bytes in length.
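The capacities mentioned above follow from simple arithmetic, shown here as a Python sketch. Reading the bit table as 64 rows of 64 bits is an assumption made for the sketch; the text states only that the table is used with 64-bit words.

```python
REGISTERS = 16
BITS_PER_REGISTER = 256

total_bits = REGISTERS * BITS_PER_REGISTER      # 4096 bits in all
total_bytes = total_bits // 8                   # 512 bytes in all

# A 64 x 64 bit matrix needs exactly the whole register set.
matrix_rows = total_bits // 64                  # 64 rows of 64 bits

# Four string registers of 128 bytes each also use the whole set.
string_registers = total_bytes // 128           # 4

# One register holds sixteen 16-bit short floating-point numbers.
short_floats_per_register = BITS_PER_REGISTER // 16
```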

This is done, rather than using them as two string registers, each containing 256 bytes, because four registers are the minimum number of registers required for the general register style of operations, at least as claimed in advertising literature for the Data General Nova. Having these strings only half the maximum length of those available to memory-to-memory string operations is presumed to be acceptable, since strings "really" only have to be at least 80 characters long, as everyone knows.


In addition to the basic set of registers, two other larger sets of registers are also included in the architecture:

A set of 128 64-bit integer registers, and a set of 128 128-bit floating point registers.


A set of 8 vector registers, each of which contains 64 storage locations for floating-point numbers, each one 80 bits wide. This allows the computer to process vectors of 72-bit floating-point numbers in addition to vectors of 64-bit floating-point numbers, if the optional variable memory width feature is included.


As for how data values are stored in memory:

Signed integer values are stored in binary two's complement format.

Floating-point numbers are stored in IEEE 754 format, but in addition there are instructions for processing data in the format originally used by IBM's System/360 computers, including the Extended Precision format introduced on the Model 85.

The architecture is big-endian: the most significant bits of a value are stored in the byte at the lowest numbered address.
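As a brief illustration, Python's built-in integer conversion shows the byte order that results for a 64-bit value; the particular value used is arbitrary.

```python
value = 0x0123456789ABCDEF

# Big-endian: the most significant byte goes to the lowest address.
stored = value.to_bytes(8, byteorder="big")

lowest_address_byte = stored[0]    # 0x01, the most significant byte
highest_address_byte = stored[7]   # 0xEF, the least significant byte

# Reading the bytes back in the same order recovers the value.
recovered = int.from_bytes(stored, byteorder="big")
```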

As well, there are 16 flag bits which are used for instruction predication, and of course there is a 64-bit program counter. The program status quadword includes eight alternate sets of condition codes in addition to the normal set of condition codes, and the program counter and flag bits are also part of the program status quadword.

Branch Targets

In general, as with most other computer architectures, instructions are provided to jump to a specified location in memory to continue execution.

In this architecture, instructions are considered to fall on 16-bit boundaries.

The normal rule for branch instructions is that their targets must be executable instructions. However, there are exceptions to that principle, as well as another special consideration, which apply to this architecture.

Although these points might seem obvious, it should also be obvious that it is advisable to state them explicitly to avoid the possibility of confusion.

