The Concertina II Architecture

Welcome to the home page of the Concertina II computer architecture.

The original Concertina computer architecture was intended as a simple example of a conventional old-style CISC architecture, to help explain how computers work. It was expanded over time to include many features from a wide selection of historical computer architectures, to explain those as well.

Concertina II was intended as an ISA that could conceivably be of practical use in an actual implementation. However, I cannot make ambitious claims for it, as my experience in this area is quite limited. This architecture went through quite a number of drafts before I felt that I had struck an acceptable balance between the various factors that had to be compromised to provide the architecture with the capabilities I sought.

Concertina II uses instructions that are normally 32 bits in length. A 32-bit instruction slot may also contain a pair of 16-bit instructions, or it may contain a portion of an instruction longer than 32 bits.

Introduction

What is the Concertina II ISA, and what choices were made in its design?

The basic Concertina II instruction set is still largely patterned after today's most popular type of ISA (instruction set architecture) design, RISC (reduced instruction set computing), but now it allows 16-bit instructions, and instructions longer than 32 bits, to be freely mixed in the instruction stream.

As in many RISC designs, there are two main register files, one for integer values (with registers that are 64 bits wide) and one for floating-point values (with registers that are 128 bits wide), each of which contains 32 registers.

Also, the memory-reference instructions are of the load-store variety, following standard RISC practice.

The following extensions to the RISC model are included:

For integer variable types shorter than the full register length of 64 bits, following the lead of the IBM System/360, there are three kinds of load instruction, all of which load values into the least significant bits of the destination register: Load, which performs sign extension; Unsigned Load, which clears the bits more significant than those of the value loaded; and Insert, which leaves all other bits intact and unaffected.
Full base-index addressing is provided. Memory-reference instructions typically have a three-bit index register field, a two-bit or three-bit base register field, and a displacement which may be 15 or 12 bits in width. The 32 integer registers are divided into four groups of eight registers; the index registers are found in the first group, base registers for 20-bit displacements are found in the second, base registers for 12-bit displacements are found in the third, and base registers for 16-bit displacements are found in the fourth.
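
The three kinds of load listed among the extensions above can be sketched in Python. This is only an illustration: the function names and the width parameter (the size in bits of the value loaded) are invented here and are not part of the architecture's definition.

```python
MASK64 = (1 << 64) - 1  # integer registers are 64 bits wide


def load(value, width):
    """Load: place value in the low bits and sign-extend it to 64 bits."""
    value &= (1 << width) - 1
    if value >> (width - 1):          # sign bit of the short value is set
        value -= 1 << width           # make it negative ...
    return value & MASK64             # ... and wrap into 64-bit two's complement


def unsigned_load(value, width):
    """Unsigned Load: clear every bit above the value loaded."""
    return value & ((1 << width) - 1)


def insert(old_register, value, width):
    """Insert: replace only the low bits; all other bits stay as they were."""
    low_mask = (1 << width) - 1
    return (old_register & ~low_mask & MASK64) | (value & low_mask)
```

For the 16-bit value 0x8001 and a register previously holding 0x1122334455667788, load gives 0xFFFFFFFFFFFF8001, unsigned_load gives 0x0000000000008001, and insert gives 0x1122334455668001.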

RISC architectures normally allow a memory-reference instruction to indicate only one register, the contents of which are added to the displacement to form the effective address. Since a base register is needed for any memory access when the displacement is not large enough to indicate any location in the available memory, this means that the advantage of having an index register isn't available, and array accesses require additional explicit arithmetic instructions to compute addresses.

Thus, since the use of arrays is a very common operation, full base-index addressing was considered a very important feature to add.

In order to make it possible to provide this feature, the integer registers were split up into groups of eight so that the index register and base register fields could be only three bits long instead of five bits long, thus allowing both to fit in an instruction.

Normally, if one allocates a block of memory containing 32,768 bytes, using a base register to point to that block, it is not useful to have addressing modes that can only access the first 4,096 bytes of that block. Therefore, separate groups of registers are used as the possible base registers for different sizes of displacement values.
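
As a rough model of base-index addressing, the effective address can be taken as the sum of the base register, the index register, and the displacement. The sketch below rests on assumptions that the text does not state: the displacement is treated as unsigned, the sum wraps to 64 bits, and the function names are invented for illustration.

```python
MASK64 = (1 << 64) - 1


def effective_address(base, index, displacement, disp_bits):
    """Base + index + displacement, wrapped to 64 bits (assumed behaviour)."""
    if not 0 <= displacement < (1 << disp_bits):
        raise ValueError("displacement does not fit in the field")
    return (base + index + displacement) & MASK64


def reach(disp_bits):
    """Number of bytes past the base that a displacement field can address."""
    return 1 << disp_bits
```

Under these assumptions, reach(12) is 4,096 and reach(16) is 65,536, which is why a base register paired with a 12-bit displacement cannot cover a larger block on its own.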

The above summarizes how the basic instruction set of this computer was designed to take the basic RISC design, and offer important extensions to it, while still having instructions that fit in 32 bits.

The option of writing code following the VLIW (Very Long Instruction Word) model is also present.

Note, however, that by VLIW, I mean modern VLIW architectures, such as the Itanium or, even more particularly, the Texas Instruments TMS320C6000 chip, and not the type of classic VLIW architecture to which the term was originally conceived as referring, such as that of the Multiflow TRACE computer.

The Architecture

There are 32 integer general registers and 32 floating-point registers, and those instructions that perform arithmetic or logical operations include a bit for enabling changes to the condition codes as a result of those instructions. These are characteristics found in RISC architectures.

Having register banks of 32 registers allows different calculations to be intertwined in the code, and being able to control whether instructions affect the condition codes allows more intervening instructions between an instruction that sets the condition codes and a branch instruction that makes use of those results. Both of these things allow code to be designed to offer some of the same benefits as are obtained from out-of-order execution, without the hardware overhead. At the microprocessor clock rates in use today, these measures normally are not enough to be effective on their own; however, if code written this way is combined with simultaneous multi-threading (SMT), there is still the potential for competing with out-of-order execution.

Block Organization

Instructions are organized into 256-bit blocks which contain eight 32-bit instruction slots.

The first instruction in a block may contain a decode field that indicates that a portion of the block is to be reserved for pseudo-immediate values, and thus its contents are not to be decoded as instructions. This type of instruction, therefore, acts as a block header in addition to being an instruction, and is the only type of block header used in this architecture.
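
The block layout can be sketched as follows. The encoding of the decode field is not given here, so reserved_slots is a hypothetical stand-in for whatever that field specifies, and placing the reserved portion at the end of the block is likewise an assumption made only for this illustration.

```python
BLOCK_BYTES = 32   # 256 bits
SLOT_BYTES = 4     # 32 bits


def split_block(block):
    """Split a 256-bit block into eight 32-bit slots (big-endian integers)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block is exactly 32 bytes")
    return [int.from_bytes(block[i:i + SLOT_BYTES], "big")
            for i in range(0, BLOCK_BYTES, SLOT_BYTES)]


def partition(block, reserved_slots):
    """Separate slots to decode from slots holding pseudo-immediate data.

    reserved_slots is a hypothetical count standing in for whatever the
    header's decode field encodes; placing the data at the end of the
    block is also an assumption.
    """
    slots = split_block(block)
    if not 0 <= reserved_slots < len(slots):
        raise ValueError("the header itself must remain an instruction")
    cut = len(slots) - reserved_slots
    return slots[:cut], slots[cut:]
```

With reserved_slots set to 2, for example, the first six slots would be decoded as instructions and the last two would be left alone as data.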

Registers and Data Formats

The basic complement of registers included with this architecture is as follows:

There are 32 integer registers, each of which is 64 bits in length, numbered from 0 to 31.

Registers 1 through 7 may be used as index registers.

Registers 25 through 31 may be used as base registers, each of which points to an area of 65,536 bytes in length.

Registers 9 through 15 may be used as base registers, each of which points to an area of 4,096 bytes in length.

At least part of the area of 4,096 bytes in length pointed to by register 8 will normally be used to contain up to 512 pointers, each 64 bits in length, for use in either Array Mode addressing or Address Table addressing.

Registers 17 through 23 may be used as base registers, each of which points to an area of 1,048,576 bytes in length. This addressing format is used for 48-bit extended memory-reference instructions.
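
The register assignments just listed can be summarized in a small lookup function. This is a sketch that follows the register numbers given in this section; the function name and the wording of the role strings are invented, and the displacement widths are simply the base-2 logarithms of the area sizes stated above.

```python
def register_roles(r):
    """Return the addressing roles the text assigns to integer register r."""
    if not 0 <= r <= 31:
        raise ValueError("integer registers are numbered 0 to 31")
    roles = []
    if 1 <= r <= 7:
        roles.append("index")
    if r == 8:
        roles.append("pointer table base (Array Mode / Address Table)")
    if 9 <= r <= 15:
        roles.append("base, 12-bit displacement (4,096 bytes)")
    if 17 <= r <= 23:
        roles.append("base, 20-bit displacement (1,048,576 bytes)")
    if 25 <= r <= 31:
        roles.append("base, 16-bit displacement (65,536 bytes)")
    return roles
```

Registers 0, 16 and 24 are given no addressing role in the list above, so the sketch returns an empty list for them.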

There are 32 floating-point registers, each of which is 128 bits in length, numbered from 0 to 31.

Floating-point numbers in IEEE 754 format have exponent fields of different length, depending on the size of the number. For faster computation, floating-point numbers are stored in floating-point registers in an internal form which corresponds to the format in which extended precision floating-point numbers are stored in memory: with a 15-bit exponent field, and without a hidden first bit in the significand.

As 128-bit extended floating-point numbers are already in this format in memory, all floating-point numbers will fit in a 128-bit register, although shorter floating-point numbers are expanded.
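
To illustrate what expanding a shorter number involves, the sketch below converts a 64-bit IEEE 754 value into a 128-bit form with one sign bit, a 15-bit exponent, and a 112-bit significand whose leading bit is explicit. The exponent bias of 16383 and the exact bit positions are assumptions made for illustration; only zero and normal numbers are handled.

```python
import struct

INTERNAL_BIAS = 16383       # assumed bias for a 15-bit exponent
SIGNIFICAND_BITS = 112      # 128 - 1 sign bit - 15 exponent bits


def expand_double(x):
    """Expand a normal (or zero) binary64 value into a 128-bit internal form."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0 and fraction == 0:
        return sign << 127                       # signed zero
    if exponent == 0 or exponent == 0x7FF:
        raise ValueError("subnormals, infinities and NaNs are not handled")
    significand = (1 << 52) | fraction           # make the hidden bit explicit
    significand <<= SIGNIFICAND_BITS - 53        # left-justify in 112 bits
    new_exponent = exponent - 1023 + INTERNAL_BIAS
    return (sign << 127) | (new_exponent << SIGNIFICAND_BITS) | significand
```

Because every length of number ends up with the same exponent width and an explicit leading bit, the arithmetic unit does not have to treat each memory format as a separate case.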

However, the 32 floating-point registers may also be used for Decimal Floating-Point (DFP) numbers. These numbers will also be expanded into an internal form for faster computation, but that internal form may take more than 128 bits.

This is dealt with as follows: Only 24 DFP numbers that are 128 bits in length may be stored in the 32 floating-point registers. When such a DFP number is stored in an even-numbered register, it is stored in that register, and the first 32 bits of the following register. When it is stored in a register the number of which is of the form 4n + 1 for integer n, the first 84 bits of the internal form of that number are stored in the last 84 bits of that register, and the remainder of the internal form of that number is stored in the last 84 bits of the second register after that register.

In this way, the same principle as that of storing double-length numbers in two adjacent registers is respected: numbers too long to be stored in a given register are stored in that register, and in another register of the same register file that is nearby. But the method is extended to allow more efficient use of the available space.
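
The register pairing described above can be checked with a short sketch. The function name is invented; the rule it encodes (even-numbered registers spill into the following register, registers of the form 4n + 1 share the second register after them) is taken directly from the description.

```python
def dfp_companion(r):
    """Return the register that holds the overflow of a DFP number in r.

    Returns None when r cannot be the home of a 128-bit DFP number.
    """
    if not 0 <= r <= 31:
        raise ValueError("floating-point registers are numbered 0 to 31")
    if r % 2 == 0:
        return r + 1        # spills into the first 32 bits of the next register
    if r % 4 == 1:
        return r + 2        # shares the tail of the second register after it
    return None             # registers of the form 4n + 3 only hold overflow


usable = [r for r in range(32) if dfp_companion(r) is not None]
```

The list usable has 24 entries (the 16 even-numbered registers plus the 8 registers of the form 4n + 1), which agrees with the limit of 24 DFP numbers stated above.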

The same technique is used for the 128-bit floating-point format, recently added to IEEE 754, which does have a hidden first bit; therefore, in order to support this format, the usual 128-bit floating-point format offered by this architecture, while similar to, and based on, the Temporary Real format of the original 8087 coprocessor, has an exponent field that is one bit longer than that of the Temporary Real format.

There are 16 short vector registers, each of which is 256 bits in length.

Each of these registers may contain:

Two 128-bit floating-point numbers.
Four 64-bit floating-point numbers.
Four 64-bit integers.
Eight 32-bit floating-point numbers.
Eight 32-bit integers.
Sixteen 16-bit integers.
Thirty-two 8-bit integers.

As well, they may contain sixteen 16-bit short floating-point numbers in one of two formats.

These numbers all remain in these registers in the same format as that in which they appear in memory.
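
The element counts listed above follow from dividing 256 by the element width. The sketch below shows that arithmetic, and also views 32 bytes as eight 32-bit integers in big-endian order; the function names and the choice of which element comes first in the register are assumptions made for illustration.

```python
import struct

VECTOR_BITS = 256


def lane_count(element_bits):
    """How many elements of a given width fit in one short vector register."""
    if VECTOR_BITS % element_bits:
        raise ValueError("element width must divide 256")
    return VECTOR_BITS // element_bits


def as_int32_lanes(register_bytes):
    """View 32 bytes as eight signed 32-bit integers, big-endian as in memory."""
    return list(struct.unpack(">8i", register_bytes))
```

For example, lane_count(64) is 4 and lane_count(8) is 32, matching the list of possible contents.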

Also, the entire set of 16 short vector registers can contain a table of bits used for bit-matrix-multiply operations on 64-bit binary words.
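
Sixteen registers of 256 bits hold 4,096 bits in all, which is exactly a 64 by 64 matrix of bits. A minimal sketch of a bit-matrix multiply on such a table is given below. It assumes the usual definition over GF(2), in which each result bit is the parity of the AND of one matrix row with the input word; the definition the architecture uses, and how the rows are laid out across the registers, are not stated here.

```python
def bit_matrix_multiply(rows, word):
    """Multiply a 64x64 bit matrix by a 64-bit word over GF(2).

    rows[i] is row i of the matrix as a 64-bit integer. Bit i of the result
    (counting from the most significant bit) is the parity of rows[i] AND word.
    """
    if len(rows) != 64:
        raise ValueError("expected 64 rows")
    result = 0
    for row in rows:
        parity = bin(row & word).count("1") & 1
        result = (result << 1) | parity
    return result
```

With an identity matrix, where rows[i] is 1 << (63 - i), the word is returned unchanged; with rows[i] set to 1 << i instead, the bits of the word come back in reverse order.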

In addition to the basic set of registers, two other larger sets of registers are also included in the architecture:

A set of 128 64-bit integer registers, and a set of 128 128-bit floating-point registers.

A set of 8 vector registers, each of which contains 64 storage locations for floating-point numbers, each one 80 bits wide. This allows the computer to process vectors of 72-bit floating-point numbers in addition to vectors of 64-bit floating-point numbers, if the optional variable memory width feature is included.

As for how data values are stored in memory:

Signed integer values are stored in binary two's complement format.

Floating-point numbers are stored in IEEE 754 format, but in addition there are instructions for processing data in the format originally used by IBM's System/360 computers, including the Extended Precision format introduced on the Model 85.

The architecture is big-endian: the most significant bits of a value are stored in the byte at the lowest numbered address.
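
The combination of two's complement representation and big-endian byte order can be shown with Python's built-in integer conversions. The function names below are invented for illustration.

```python
def store_signed(value, size_bytes):
    """Bytes as they would appear in memory: two's complement, big-endian."""
    return value.to_bytes(size_bytes, byteorder="big", signed=True)


def load_signed(memory_bytes):
    """Inverse of store_signed."""
    return int.from_bytes(memory_bytes, byteorder="big", signed=True)
```

Storing -2 as a 32-bit value gives the bytes FF FF FF FE at ascending addresses, and storing 0x0102 as a 16-bit value places 01 at the lower address and 02 at the higher one.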

As well, there are 16 flag bits which are used for instruction predication, and of course there is a 64-bit program counter. The program status quadword includes eight sets of condition codes, and the program counter and flag bits are also part of the program status quadword.

32-bit Instruction Format
- 16-bit Instructions
- Composed Instructions