
The Concertina II Architecture

Welcome to the home page of the Concertina II computer architecture.

The original Concertina computer architecture was intended as a simple example of a conventional old-style CISC architecture, to help explain how computers work. It was expanded over time to include many features from a wide selection of historical computer architectures, to explain those as well.

Concertina II was intended as an ISA that could conceivably be of practical use in an actual implementation. However, I cannot make ambitious claims for it, as my experience in this area is quite limited. This architecture went through quite a number of drafts before I felt that I had struck an acceptable balance between the various factors that had to be compromised to provide the architecture with the capabilities I sought.

Introduction

What is the Concertina II ISA, and what choices were made in its design?

The basic Concertina II instruction set is patterned after today's most popular type of ISA (instruction set architecture) design, RISC (reduced instruction set computing).

The basic instruction set consists of 32-bit instructions, but also adds the ability to use a pair of 16-bit instructions at any point in the sequence of instructions in place of a 32-bit instruction.

This allows increasing code density by using smaller instructions for many operations, without losing the simplicity of fetching and decoding instructions gained by having all instructions of the same length. It was considered important enough that the length of the base register field was shortened from three bits to two bits for 16-bit displacements in the standard memory-reference instructions, as the least painful compromise that would provide the opcode space required.

As in many RISC designs, there are two main register files, one for integer values (with registers that are 64 bits wide) and one for floating-point values (with registers that are 128 bits wide), each of which contains 32 registers.

Also, the memory-reference instructions are of the load-store variety, following standard RISC practice.

The following extensions to the RISC model are included:

RISC architectures normally allow only one register, the contents of which are added to the displacement to form the effective address, to be indicated in a memory-reference instruction. Since a base register is needed for any memory access when the displacement is not large enough to indicate any location in the available memory, this means that the advantage of having an index register isn't available, and array accesses require additional explicit arithmetic instructions to compute addresses.

Thus, since the use of arrays is a very common operation, full base-index addressing was considered a very important feature to add.

In order to make it possible to provide this feature, the integer registers were split up into groups of eight so that the index register and base register fields could be only three bits long instead of five bits long, thus allowing both to fit in an instruction.

Normally, if one allocates a block of memory containing 65,536 bytes, using a base register to point to that block, it is not useful to have addressing modes that can only access the first 4,096 bytes of that block. Therefore, separate groups of registers are used as the possible base registers for different sizes of displacement values.

Only one register serves as the implicit base register for 15-bit displacements; this is done to allow one larger block of memory to be used in conjunction with those accessed with 12-bit displacements. This permits more compact memory-reference instructions, and is inspired by the System/360 Model 20 computer.


The above summarizes how the basic instruction set of this computer was designed to take the basic RISC design, and offer important extensions to it, while still having instructions that fit in 32 bits.

But a number of other extensions are also offered. These require going beyond the RISC-like model of the basic instruction set, and instead recognizing that this architecture also has VLIW (Very Long Instruction Word) characteristics.

Instructions are to be fetched in blocks of 256 bits, each of which contains eight 32-bit instruction slots.

A small portion of the opcode space for instructions is dedicated to codes which represent headers instead of instructions. The first instruction slot in a block may contain a header, and if it does, the following slot may also contain a header, and so on.

Headers, if any, are processed before the instructions in a block are decoded.

After the headers are processed, or after it is determined that the block does not begin with a header, the computer has the information required to decode all the instructions in the block in parallel.

One of the most important features that having headers provides, which is still considered part of the basic instruction set of the Concertina II architecture, is pseudo-immediate values.

Some register-to-register instructions may have a source register specification replaced by a four-bit halfword (that is, denominated in units of 16 bits) pointer to an address within the current instruction block, which points to an operand for that instruction.

This capability is supported by headers which contain a three bit decode field, which indicates that some of the eight 32-bit instruction slots in the current block are to be ignored during instruction decoding, and skipped over in execution, so that pseudo-immediate values can be placed in them.


What are pseudo-immediate values, and why are they included in this ISA? Essentially, they are inspired by the Heads and Tails design of Heidi Pan. Immediate mode instructions have the advantage that a constant value can be used in a calculation without requiring an additional fetch of data, with all the delays and overhead of memory accesses in modern architectures, where DRAM is slow compared to processor logic.

This is because the immediate value is part of the instruction itself, and thus has already been fetched as part of the instruction stream.

But since data items come in several widths, comprehensive support of immediate values means that instructions must come in many different lengths, complicating their decoding.

With pseudo-immediate values, the length of the instruction doesn't have to be changed. A pointer to the value only takes up the same space as a register specification.

But if the value is fetched from a location indicated by a pointer, it isn't an immediate value any more. Hence the term "pseudo-immediate" - given that instructions are fetched from memory in 256-bit blocks, and the data to which the pointer refers is within the same block as the instruction itself, even though the values are not actually immediate values, they still offer the same basic advantage as immediate values.
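
As a rough illustration of the pointer arithmetic only, the sketch below treats a block as 32 bytes and turns a four-bit halfword pointer into a byte offset. The function name, the big-endian byte order, and the operand width parameter are assumptions made for this example; the text above specifies only that the pointer counts in 16-bit units within the current block.

```python
BLOCK_BYTES = 32  # one 256-bit instruction block


def pseudo_immediate(block: bytes, halfword_ptr: int, width_bytes: int) -> int:
    """Fetch an operand stored inside the instruction block itself."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block is 256 bits (32 bytes)")
    if not 0 <= halfword_ptr <= 0xF:
        raise ValueError("the pointer is a four-bit field")
    offset = halfword_ptr * 2  # the pointer counts in units of 16 bits
    if offset + width_bytes > BLOCK_BYTES:
        raise ValueError("operand would extend past the end of the block")
    # Byte order is an assumption of this sketch, not taken from the text.
    return int.from_bytes(block[offset:offset + width_bytes], "big")
```

Because the operand lies in the block that has already been fetched, no separate data access is modelled here, which is the point of the scheme.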


In addition to pseudo-immediate values, headers allow two basic sets of features to be added to the ISA that go beyond the RISC model.

Thus, while the architecture initially has the appearance of a conventional RISC architecture, it is intended to combine the basic features and advantages of RISC, CISC, and VLIW architectures.

Note, however, that by VLIW, I mean modern VLIW architectures, such as the Itanium or, even more particularly, the Texas Instruments TMS320C6000 chip, and not the type of classic VLIW architecture the term was originally conceived of as referring to, such as that of the Control Data Cyber 200 computer.

The Architecture

There are 32 integer general registers and 32 floating-point registers, and those instructions that perform arithmetic or logical operations include a bit for enabling changes to the condition codes as a result of those instructions. These are characteristics found in RISC architectures.

Having register banks of 32 registers allows different calculations to be intertwined in the code, and being able to control whether instructions affect the condition codes allows more intervening instructions between an instruction that sets the condition codes and a branch instruction that makes use of those results. Both of these things allow code to be designed to offer some of the same benefits as are obtained from out-of-order execution, without the hardware overhead. At the microprocessor clock rates in use today, these measures normally are not enough to be effective; however, if code written this way is combined with simultaneous multi-threading (SMT), then there is still the potential for competing with out-of-order execution.

Block Organization

Instructions are organized into 256-bit blocks which contain eight 32-bit instruction slots.

When a block header makes provision for instructions longer than 32 bits, it is possible that these instructions may cross block boundaries, depending on the rules applicable to the particular block header format in use.

The instruction set is organized so that the computer is able to fetch a 256-bit block of instructions and, after processing any block header within the block to determine what, if any, special processing is required, immediately begin decoding each 32-bit instruction slot independently of the others in the block.

If a block begins with an instruction slot which begins with the bits 011, and in which the last 16 bits begin with the bit 1, then the first 32 bits of a block will be a 32-bit block header, in the form shown in the diagram below:

A 32-bit header, after the three bits 011 that begin it, is usually divided into three sections. The first section is four bits long, the second is nine bits long, and the final section is sixteen bits long.

The four-bit first section in the first form of the header consists of a 0 followed by a 3-bit decode field.

The decode field may contain 7, indicating that all the remaining 32-bit instruction slots in the block are to be decoded, or it may contain a lesser number, in which case its difference from 7 indicates how many instruction slots at the end of the block are to be ignored in decoding, so that they can be used to contain pseudo-immediate values.
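
The recognition rule and the decode field arithmetic can be sketched as below. This is only an illustration under stated assumptions: the header is held as a 32-bit integer whose most significant bit is the "first" bit, and the function name is hypothetical. Only the first form of the header (first section starting with 0) is handled.

```python
def slots_reserved_for_data(word: int):
    """Return how many trailing slots are skipped, or None if `word`
    is not a header of the first form."""
    leading = (word >> 29) & 0b111       # first three bits of the slot
    second_half_lead = (word >> 15) & 1  # first bit of the last 16 bits
    if leading != 0b011 or second_half_lead != 1:
        return None
    first_section = (word >> 25) & 0b1111  # the four bits after 011
    if first_section & 0b1000:
        return None  # not the first form: the section must start with 0
    decode = first_section & 0b111
    # 7 means decode every remaining slot; smaller values free slots
    # at the end of the block for pseudo-immediate values.
    return 7 - decode
```

For example, a decode field of 5 leaves the last two instruction slots of the block available for pseudo-immediate values.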


The second section of the header consists of a nine-bit short header clause which may have four possible forms.

The third section of the full 32-bit header contains the body of the header.

This section contains a fifteen-bit long header field.

The first thing that the long header field can contain is a fourteen-bit target field. If this field is present, no part of the current 256-bit block may be the target of an instruction except for those instructions which correspond to bits set to 1 in the target field.


The second thing the long header field may contain is a sequence of thirteen bits, each of which is indicated by a B. These bits correspond, in order, to the 16-bit areas within the remainder of the block, not including the first one, as a break between blocks is implicit. Each marks the beginning of a group of instructions which may all be executed in parallel; thus, a B bit equal to 1 marks a break, across which the instructions cannot execute in parallel.

When a field consisting of B bits is present in the block header in the first 32-bit instruction slot of a block, it indicates that the instructions in that block may, normally, begin execution independently of the instructions in the block which precede them. The fourteen last bits of the first instruction slot, when equal to 1, indicate when this is not the case; each 1 bit shows the start of the first instruction of a group of one or more instructions, each of which may execute independently of the others in that group, but which must wait for the completion of the instructions which precede the group. Thus, the 1 bits split the instructions in the block into multiple groups of independently executing instructions, where the groups must still execute in sequence.

In order that groups of instructions indicated as executing in parallel will work properly on all implementations, certain restrictions are to be observed for the instructions within a single such group.

The most basic such restriction is that more than one instruction may access a given register only if all the accesses are reads. Even the combination of one or more reads with a single write is not permitted.

This is necessary because not all implementations will actually have a microarchitecture designed around the VLIW philosophy, and so instructions specified as executing in parallel may still execute serially.
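
A checker for this basic restriction might look like the following sketch. Representing each instruction as a pair of register-number sets (registers read, registers written) is an assumption of the example, as is the function name.

```python
from collections import Counter


def group_is_portable(instructions) -> bool:
    """instructions: list of (reads, writes) pairs of register-number sets,
    one pair per instruction in a single parallel group."""
    touched = Counter()  # how many instructions access each register
    written = set()      # registers written by any instruction
    for reads, writes in instructions:
        for reg in set(reads) | set(writes):
            touched[reg] += 1
        written |= set(writes)
    # A register accessed by more than one instruction must never be written.
    return all(count == 1 or reg not in written
               for reg, count in touched.items())
```

A group in which one instruction writes register 3 and another reads it fails the check, since on a serial implementation the result would depend on execution order.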

Depending on the technologies used in producing implementations, more severe limitations may be required for conformant and portable code. One possibility is that even multiple read accesses to a single register are not allowed, because while register files need multiple ports, individual registers in them might run into fan-out considerations.

An even more extreme scenario is that the register files are divided into groups of registers, and two different simultaneous instructions can't access the same group of registers. 32-register files would be divided into groups of four registers, and 128-register files would be divided into groups of eight registers. To be clear, this would not prevent a single instruction from referencing multiple registers in the same group of registers; indeed, that is the normal and expected case, but an instruction would have to have entirely to itself all the groups of registers that it accesses in any way.

It may turn out that there is no reason for an implementation to require this restriction, and if experience shows that to be the case, it would be dropped as the criterion for portable code. However, while this restriction may seem severe, it does not interfere with what is the intended normal use of the ability to execute multiple instructions in parallel.
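
The stricter register-group rule can be sketched in the same spirit. The text gives only the group sizes (four registers for 32-register files, eight for 128-register files); that a group consists of consecutively numbered registers is an assumption of this example, and the names are hypothetical.

```python
GROUP_SIZE = {32: 4, 128: 8}  # registers per group, keyed by file size


def register_group(reg: int, file_size: int = 32) -> int:
    # Assumes groups are runs of consecutively numbered registers.
    return reg // GROUP_SIZE[file_size]


def groups_disjoint(instructions, file_size: int = 32) -> bool:
    """instructions: list of sets of register numbers, one set per
    instruction in a parallel group (all registers it accesses)."""
    claimed = set()
    for regs in instructions:
        groups = {register_group(r, file_size) for r in regs}
        if groups & claimed:
            return False  # two instructions share a register group
        claimed |= groups
    return True
```

Note that a single instruction using several registers from one group is accepted, matching the normal and expected case described above.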


The third option is that the long header clause may instead contain a predication clause.

A predication clause begins with a bit marked S. If that bit is 0, the instruction (or instructions) in the 32-bit instruction slot corresponding to a bit in the predicated field that is a 1 will execute only if the flag bit indicated in the flag field is set (that is, equal to 1); if the S bit is 1 instead, indicated instruction slots will execute when the flag bit is cleared.
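
Expressed as a truth function, the rule reads as follows. This is a minimal sketch; the function and parameter names are chosen for the example and do not come from the architecture description.

```python
def slot_executes(predicated_bit: int, s_bit: int, flag: int) -> bool:
    """Decide whether the instruction(s) in one slot execute.

    predicated_bit: this slot's bit in the predicated field
    s_bit:          the S bit of the predication clause
    flag:           current value (0 or 1) of the selected flag bit
    """
    if not predicated_bit:
        return True           # slot is not predicated at all
    if s_bit == 0:
        return flag == 1      # execute only when the flag is set
    return flag == 0          # S = 1: execute only when the flag is cleared
```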


Finally, if the long header field begins with 1111, those bits are followed by a bit indicated as T and by a ten-bit short header clause, which may be in any of the possible forms.

If the bit T is zero, the instructions will be in the usual format, except as modified by the two short header clauses. If it is a one, the instructions will instead be in the full format, which allows all seven of the possible base registers to be used with 16-bit displacements. In that case, 16-bit instructions can still be present in the block, although there is no space reserved for them in the opcode space of the version of the instruction set in use, but they must be indicated explicitly in the header, for example, in a dual-16 short header clause.

Also, if the T bit is one, the opcode space normally reserved for headers is no longer available, so this header clause cannot be followed by another header.


The first type of short header clause extends the instruction set by supplying an additional bit for each 32-bit instruction slot. It may potentially be used in conjunction with the alternate and alternate II bits also being set for the same instruction slot, but not dual 16 or long, as the bit will be ignored if the instruction slot it affects contains anything but a 32-bit instruction.


The second type of short header clause provides six break bits, associated in order with each 32-bit instruction slot in the remainder of the block, except the first of them (the second instruction slot). While this type of short header clause is redundant as the second header clause, as a long header clause with a break bit for every 16 bits of the block is available there, it is very useful as the first header clause, as this allows it to be combined with other long and short header clause types.
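
To show how six break bits partition the seven remaining slots, here is a small sketch. It assumes slot 0 holds the header, that the bits are supplied in slot order for slots 2 through 7, and that a 1 marks the slot that starts a new group; the function name is hypothetical.

```python
def split_into_groups(break_bits):
    """break_bits: six 0/1 values for slots 2..7 (slot 0 holds the header,
    slot 1 always starts the first group)."""
    if len(break_bits) != 6:
        raise ValueError("expected six break bits")
    groups = [[1]]
    for slot, brk in zip(range(2, 8), break_bits):
        if brk:
            groups.append([slot])    # a break: this slot starts a new group
        else:
            groups[-1].append(slot)  # may run in parallel with its group
    return groups
```

With break bits 0, 1, 0, 0, 1, 0 the slots fall into the groups [1, 2], [3, 4, 5] and [6, 7], which execute one after another.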


The third type of short header clause indicates instruction slots which can only contain a pair of 16-bit instructions, never a 32-bit instruction, nor anything else. In these slots, the first bit of each 16-bit half is used to indicate, if it is a 1, that the instruction is permitted to modify the condition codes.

A 32-bit instruction slot within a block will normally contain either a 32-bit instruction which starts with 1, a pair of 16-bit instructions, both of which start with 0, or a 32-bit instruction which starts with 0 and the second half of which starts with 1 to distinguish it from a pair of 16-bit instructions; this latter form of 32-bit instructions is primarily used for the operate instructions, but it is also used for an abbreviated form of a complete instruction set, including memory-reference instructions, where three bits are used for a decode field, to allow one to be specified with minimal overhead.
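
Ignoring headers and every header-driven reinterpretation, this default three-way distinction can be sketched as follows. Treating the first bit as the most significant bit of a 32-bit integer is an assumption of the example, as are the function name and the returned labels.

```python
def classify_slot(word: int) -> str:
    """Classify a 32-bit slot using only its two distinguishing bits."""
    first_bit = (word >> 31) & 1        # first bit of the slot
    second_half_bit = (word >> 15) & 1  # first bit of the second 16 bits
    if first_bit == 1:
        return "32-bit instruction"
    if second_half_bit == 0:
        return "pair of 16-bit instructions"
    # Starts with 0, second half starts with 1.
    return "32-bit instruction (0/1 form)"
```

Only two bits per slot need to be examined, which is what lets every slot in a block be classified independently and in parallel.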


The fourth type of short header clause indicates those instruction slots which are used for instructions longer than 32 bits.

Bits inside each of the 32-bit portions of such an instruction indicate whether it is the first, last, or middle portion of the instruction, and, also, for the first instruction slot occupied by a long instruction, if a 16-bit instruction precedes an instruction the length of which is an odd multiple of 16 bits, and so this information does not need to be indicated in the header.


The fifth type of short header clause indicates instruction slots which are to be decoded as 32-bit instructions in an alternate format. This allows more general memory load and store instructions, and more general load and store multiple instructions, to be present with limited additional overhead.

A bit that is 1 indicates that the alternate form of decoding is to be applied.


The sixth type of short header clause is used to indicate instruction slots which contain compound operations. These operations take operands from certain registers that are indicated, and put results in registers that are also indicated, while using operand forwarding instead of registers for intermediate results. The notation used is the same as would be used for operations performed on a stack.

The final type of short header clause contains a seven-bit decode field.

This field allows individual 32-bit instruction slots in the block to be indicated as not to be decoded as instructions. However, this type of decode field works differently from the three-bit decode field also present in the header.

Although it causes the contents of a 32-bit instruction slot not to be initially decoded when the block is read from memory, it does not by itself prevent an attempt to execute instructions in those slots. Such an attempt will result in a program halting with an error condition. Therefore, those slots have to be skipped explicitly in the code, for example by a branch instruction, or by incrementing the return address of a jump to subroutine instruction before returning from the subroutine.

The bits in this type of decode field that correspond to instruction slots at the end of the block that are indicated to be skipped by the three-bit decode field are to also be zero. That will not prevent those instruction slots from being skipped over.


If the header begins with 0111 then the entire header is used to indicate a block containing a special type of code.

Because headers of this form completely transform the nature of the remainder of the block, they cannot normally be followed by another header, unlike the regular types of header described above. However, the bit marked C in this header type, if equal to zero, indicates that the specified change to the block is delayed by at least one instruction slot, so that an additional header may follow. Note that if a conventional header, instead of a header in this format, is what follows, the chain of headers may not be extended further.

In this way, for example, the header for 8-bit instructions can be combined with the header for High Density Code, as may be desired.


The first such case, indicated by the value 0000 following the C bit, is a block intended to permit and facilitate free-format code, where instructions of all lengths can be freely mixed.

In this block format, the decode field is replaced by a four bit decode field later in the header, with the same function, but specifying how much of the block is to be decoded in units of 16 bits rather than in 32-bit instruction slots. A value of 15, or all ones, indicates that all of the remaining fourteen 16-bit extents in the block will be considered to belong to instructions.

A fifteen-bit field follows; the first fourteen bits of it are labelled as the instruction start field, and the last fourteen bits of it are labelled as the instruction end field. There are fourteen 16-bit extents in what remains of the block. Each of these extents can be thought of as corresponding to either the bits of the instruction start field or the instruction end field; each extent corresponds both to one bit as part of the instruction start field and to the following bit as part of the instruction end field.

How this works is: if a particular 16-bit extent, which is intended to contain part of an instruction (as determined by the contents of the decode field), is the first extent belonging to an instruction, the corresponding bit in the instruction start field is set to 1.

If an extent contains the first extent of an instruction, then the preceding extent must be the last extent of the previous instruction, in this part of the block that may only contain instructions.

By making this field of bits fifteen bits long, and sharing it between two overlapping fourteen-bit fields in this manner, it's possible to see where every instruction both begins and ends; most importantly, it's possible to determine if the last instruction in the block is complete within the block, and can be decoded and executed, or if it is incomplete, and cannot be decoded and executed until the next block, with its remaining extents, is fetched.

Bits in the instruction end field corresponding to extents past the last one containing executable code, as indicated by the decode field, must contain zeroes. While such extents can be neither the start nor the end of an instruction, indicating that the last available extent is the end of an instruction is also giving information about the first available extent in the next block being the start of an instruction - so there is no contradiction involved in going one extent beyond in the instruction start field.

Because instructions may cross block boundaries, it is also the case that when the last instruction of a block in this format is not complete within the block, the following block of code must also have a header in this format, since this is the only way for the same instruction repertoire, including instructions of all the possible lengths, to continue to be available in the same form.

The length of an instruction is inferred from the number of bits in the instruction start field between two consecutive 1 bits, plus one for the first 1 bit.
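
The way instruction boundaries fall out of the shared fifteen-bit field can be sketched as below. For simplicity the example assumes that all fourteen extents are marked for decoding and that the bits are supplied as a list in header order; the function name and the tuple layout are hypothetical.

```python
def instruction_extents(field15):
    """field15: fifteen 0/1 values in header order.

    Bits 0..13 form the instruction start field; bit i+1 doubles as the
    end-field bit for extent i. Returns (start_extent, extents_in_block,
    complete) for each instruction that starts in this block.
    """
    if len(field15) != 15:
        raise ValueError("expected fifteen bits")
    starts = [i for i in range(14) if field15[i]]
    result = []
    for n, start in enumerate(starts):
        if n + 1 < len(starts):
            # Length is the distance to the next start bit.
            result.append((start, starts[n + 1] - start, True))
        else:
            # The fifteenth bit says whether extent 13 ends an instruction.
            result.append((start, 14 - start, field15[14] == 1))
    return result
```

Extents before the first 1 bit belong to an instruction carried over from the previous block and are not reported by this sketch.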

Also, because instructions may cross block boundaries, if the first bit of the instruction start field is not a 1, and this block is reached by a branch, rather than by falling through from the previous block, the 16-bit extents corresponding to all the 0 bits in the instruction start field preceding the first 1 bit in that field will definitely not be decoded, as it is not possible to determine what kind of instruction they belong to. (In some implementations, decoding of all instructions prior to the branch location can be skipped as well.)

16-bit instructions have the same form as in the dual-16 mode. The first bit is available as a C bit, which, if a 1, indicates the instruction may modify the condition codes.

In blocks of this type, 32-bit instructions are in the full format, which allows all seven of the appropriate registers to be used as base registers with a 16-bit displacement, rather than in the regular format.

64-bit and 96-bit instructions have the same form as in the long mode.

48-bit and 80-bit instructions have the same form as in the long mode, except that their first 16 bits, including the prefixing 0 bit and the 16-bit instruction sharing the instruction slots they use, are omitted.

Note that while this type of header allows code to be written that resembles that on conventional computer architectures, where instructions are fetched and decoded individually and in sequence, in one respect the code that follows such a header differs from the code on such architectures. Except for the 16-bit instruction slots at the end of a block indicated as not decoded by the four-bit decode field, following the header there must be a continuous stream of instructions, since everything following the header, and not excluded by the decode field, will be decoded as instructions.

In order to avoid this, and allow data which is not instruction code to be inserted within instruction code, as is sometimes done on such architectures, it is necessary to have a second instruction slot containing a header, where that header has the type of long header clause which includes a fourteen-bit decode field as described above.


The second type of special code provided for is indicated when 00010 follows the C bit. Here, what is indicated is High Density Code.

The first field in the header is the HDC field. If its first bit is a 1, the next three instruction slots, after the instruction slot containing the header, each contain two 16-bit instructions of high-density code. If its second bit is a 1, the final four instruction slots of the current block each contain two 16-bit instructions of high-density code.

This allows high-density code 16-bit instructions to be mixed with conventional 32-bit instructions (pairs of 16-bit instructions are also possible, where both start with zero, of course) which is useful, because some vital instructions, such as the subroutine call instruction, are absent from the set of 16-bit high-density code instructions, even though memory reference instructions are now included among high-density code instructions that are only 16 bits long, in order to achieve the desired high code density.

Next, there is a three bit field marked ssB. This indicates which base register, from register 9 to register 15, is used as the base register, in combination with the 12-bit addresses in HDC memory-reference instructions.

The next bit indicates the format of HDC memory-reference instructions. The two possible formats are illustrated in the diagram below:

If that bit is 0, the format in the first line is used; if that bit is 1, the format in the second line is used.

When the format in the first line is used, then if the bit marked X is 1, the instruction is indexed by arithmetic/index register 1.

All memory-reference instructions in high-density code have register 0, either arithmetic/index register 0 or floating-point register 0, as appropriate, as their destination register.

Finally, there is a 7-bit memref field.

Each bit in this field corresponds, in order, to one of the seven remaining instruction slots in the block.

If the bit is zero, then the instruction slot is of the dual-16 format; if the first bit of a 16-bit instruction is a 1, that means the instruction may set the condition codes.

If the bit is one, then when the first bit of a 16-bit instruction is a 1, the instruction is a memory-reference instruction, in whichever of the two formats shown above is selected for the whole block.


The third header of this form provides for adding two bits to each instruction slot remaining in the block, so as to allow the use of what are effectively 34-bit instructions. A two bit type field allows four different sets of 34-bit instructions to be defined.


The fourth header of this form, indicated by 00011, indicates where what would be 16-bit instructions are replaced by pairs of 8-bit instructions. The type field indicates, as above, the type of operands on which the 8-bit instructions operate; the bits of the 8-bit field correspond to 16-bit extents in the remainder of the block.

When a bit in the 8-bit field causes half of a 32-bit instruction slot to contain two 8-bit instructions, and the interpretation of that instruction slot is not modified by any other header field, the other 16 bits of that instruction slot will be in the dual-16 format.

These 8-bit instructions can credit their inspiration to the SDS 92 computer.


The fifth header of this form is indicated by 010 following the C bit, and the sixth header of this form is indicated by 011 following the C bit.

These two headers work together; a header of the sixth kind is to immediately follow a header of the fifth kind to form what is essentially a 64-bit header.

The header of the fifth type contains four sets of an S bit and its associated flag field; the header of the sixth type contains two sets of an S bit and its associated flag field, and six pairs of a B bit and a P bit (excepting that the first B bit is shown as a zero, as it must always be zero). The first B bit is located within what would have been the decode field for this header, as, since a header of the sixth type must always follow a header of the fifth type, it can never be the first header of a block, which is the only header in which the decode value can be specified, so that space is available for other purposes. Thus, together, they contain both six pairs of a B bit and a P bit, and six sets of an S bit and its associated flag field. This permits a block to include six 32-bit instructions, each of which may be predicated, with the flag bit to use to control predication, and whether the instruction is to be executed when the flag bit is set (if the S bit is zero) or if it is cleared (if the S bit is one) specified independently for each of these instructions. And, as well, a break bit can be applied to each of those instructions, except the first (as there is an automatic break at the beginning of each block).

Since a header of the sixth type always follows a header of the fifth type, it is never the first header, and therefore never appears in the first instruction slot, and thus its decode field is never used, and thus those bits are available to provide one of the pairs of a B bit and a P bit that is required.


The seventh header of this form is indicated by 1 immediately following the C bit.

What it contains is a P bit, a two-bit prefix field for each of the seven remaining instruction slots in the block, and a bit marked B for each of those slots, even the first of them.

The P bit indicates that the B bit, which is not normally relevant for the first instruction slot in a block, since there is always a break between blocks, instead is used to indicate a pseudo-instruction.

The B bit functions as a break bit; the prefix field acts as a prefix to the instruction, effectively lengthening instructions by two bits, or three bits, if one includes the break bit as part of the instruction. The lengthened instruction format will be modified so that load and store memory-reference instructions which have one of the 128 registers of the extended memory banks, instead of the 32 registers of the usual register banks, as their destination registers are also available.


What if it is desired to specify that some of the instructions in a block are branch targets, and that some of them are to be executed in parallel, and to predicate some of them?

In addition to using the alternate II instruction format to indicate predication and parallelism, instead of the bits in the header which serve that function, another option is available:

It is possible to begin a block with more than one 32-bit header.

The following rules apply:

Only the format and decode fields in the first 32-bit header are valid, and those in subsequent headers must contain all zeroes.

Unused header clauses are to be those which contain alternate fields, which are to consist of all zeroes.


Instructions Longer than 32 Bits

Instructions longer than 32 bits also exist. These instructions can only occupy those instruction slots which are explicitly indicated in the header as containing an instruction longer than 32 bits. Such instructions consume two or more instruction slots, and have this form:

The instruction slot whose contents start with 11 indicates the end of the instruction.

When an instruction slot starts with 0, the next 15 bits of the instruction contain a 16-bit instruction with the implied first bit of 0 omitted; this instruction executes first, immediately before the long instruction being composed. Thus, instructions that are 48 or 80 bits in length start with 0, and instructions that are 64 or 96 bits in length start with 100.

Registers and Data Formats

The complement of registers included with this architecture is as follows:

There are 32 integer registers, each of which is 64 bits in length, numbered from 0 to 31.

Registers 1 through 7 may be used as index registers.

Registers 25 through 31 may be used as base registers, each of which points to an area of 65,536 bytes in length.

Register 24 serves as a base register which points to an area 32,768 bytes in length.

Registers 9 through 15 may be used as base registers, each of which points to an area of 4,096 bytes in length.

At least part of the area of 4,096 bytes in length pointed to by register 8 will normally be used to contain up to 512 pointers, each 64 bits in length, for use in either Array Mode addressing or Address Table addressing.

Registers 17 through 23 may be used as base registers, each of which points to an area of 1,048,576 bytes in length. This addressing format is used for 48-bit extended memory-reference instructions.

Register 16 serves as a pointer to a table of pseudo-operations, if this feature is used.
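The assignment of roles to the integer registers listed above can be summarized as a small lookup, sketched here in Python. The function name and the role strings are hypothetical labels of my own; the register ranges and area sizes are taken from the list above. The displacement widths are simply the base-2 logarithms of the area sizes, on the assumption that each area is reached by an unsigned byte displacement.

```python
def integer_register_roles(reg):
    """Return a list of (role, area_size_in_bytes) pairs for integer register reg.

    area_size_in_bytes is None where the text gives no area size.
    """
    if not 0 <= reg <= 31:
        raise ValueError("integer registers are numbered from 0 to 31")
    roles = []
    if 1 <= reg <= 7:
        roles.append(("index", None))
    if reg == 8:
        roles.append(("pointer table base", 4096))
    if 9 <= reg <= 15:
        roles.append(("base", 4096))
    if reg == 16:
        roles.append(("pseudo-operation table pointer", None))
    if 17 <= reg <= 23:
        roles.append(("base", 1048576))
    if reg == 24:
        roles.append(("base", 32768))
    if 25 <= reg <= 31:
        roles.append(("base", 65536))
    return roles


def displacement_bits(area_size):
    """Bits needed to address every byte of a power-of-two sized area."""
    return area_size.bit_length() - 1


# The 4,096-byte area pointed to by register 8 holds exactly 512 pointers of 8 bytes.
POINTERS_IN_TABLE = 4096 // 8
```

Under that assumption, the four area sizes correspond to displacements of 12, 15, 16 and 20 bits. Register 0 has no special role in the list above, so the lookup returns an empty list for it.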

There are 32 floating-point registers, each of which is 128 bits in length, numbered from 0 to 31.

Floating-point numbers in IEEE 754 format have exponent fields of different lengths, depending on the size of the number. For faster computation, floating-point numbers are stored in floating-point registers in an internal form which corresponds to the format in which extended precision floating-point numbers are stored in memory: with a 15-bit exponent field, and without a hidden first bit in the significand.

As 128-bit extended floating-point numbers are already in this format in memory, all floating-point numbers will fit in a 128-bit register, although shorter floating-point numbers are expanded.
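As an illustration of what expanding a shorter number involves, here is a minimal Python sketch that converts a 64-bit IEEE 754 number into a sign, a 15-bit exponent, and a significand with an explicit first bit. Several details are assumptions of mine and are not stated above: the exponent bias of 16,383, the significand width of 112 bits (128 bits less the sign and the exponent), and the left alignment of the significand. The name expand_double is hypothetical, and infinities and NaNs are deliberately left out.

```python
import struct

EXP_BITS = 15
EXP_BIAS = (1 << (EXP_BITS - 1)) - 1   # 16383, assumed
SIG_BITS = 128 - 1 - EXP_BITS          # 112 bits, first bit explicit (assumed)


def expand_double(x):
    """Return (sign, biased_exponent, significand) for a finite Python float."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if exponent == 0:
        if fraction == 0:
            return sign, 0, 0               # zero
        # Subnormal: shift left until the first bit of the significand is 1.
        shift = 53 - fraction.bit_length()
        significand = fraction << shift
        unbiased = -1022 - shift
    else:
        # Normal: make the hidden first bit explicit.
        significand = (1 << 52) | fraction
        unbiased = exponent - 1023
    # Left-align the 53-bit significand within the wider field.
    return sign, unbiased + EXP_BIAS, significand << (SIG_BITS - 53)
```

Because the wider exponent field covers the whole range of the shorter format, numbers that are subnormal in the 64-bit format can be held in normalized form after expansion.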

However, the 32 floating-point registers may also be used for Decimal Floating-Point (DFP) numbers. These numbers will also be expanded into an internal form for faster computation, but that internal form may take more than 128 bits.

This is dealt with as follows: Only 24 DFP numbers that are 128 bits in length may be stored in the 32 floating-point registers. When such a DFP number is stored in an even-numbered register, it is stored in that register, and the first 32 bits of the following register. When it is stored in a register the number of which is of the form 4n + 1 for integer n, the first 80 bits of the internal form of that number are stored in the last 80 bits of that register, and the remainder of the internal form of that number is stored in the last 80 bits of the second register after that register.

In this way, the same principle as that of storing double-length numbers in two adjacent registers is respected: numbers too long to be stored in a given register are stored in that register, and in another register of the same register file that is nearby. But the method is extended to allow more efficient use of the available space.
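The placement rule for 128-bit DFP numbers can be checked with a short Python sketch. It returns, for a given register number, the pieces of register storage that hold the internal form. The function name dfp_storage is hypothetical, and I am assuming that bit 0 denotes the first bit of a register, that the internal form is 160 bits long (128 + 32, or 80 + 80), and that registers of the form 4n + 3 cannot be named as the location of such a number, which is what the limit of 24 numbers implies.

```python
def dfp_storage(reg):
    """Return a list of (register, first_bit, bit_count) pieces for a
    128-bit DFP number held in floating-point register reg."""
    if not 0 <= reg <= 31:
        raise ValueError("floating-point registers are numbered from 0 to 31")
    if reg % 2 == 0:
        # The whole register, plus the first 32 bits of the following register.
        return [(reg, 0, 128), (reg + 1, 0, 32)]
    if reg % 4 == 1:
        # The last 80 bits of this register, plus the last 80 bits of the
        # second register after it.
        return [(reg, 48, 80), (reg + 2, 48, 80)]
    raise ValueError("assumed: registers of the form 4n + 3 cannot hold a 128-bit DFP number")
```

Under these assumptions, each group of four registers holds three numbers: register 4n + 1 gives its first 32 bits to register 4n and uses its last 80 bits itself, while register 4n + 3 gives its first 32 bits to register 4n + 2 and its last 80 bits to register 4n + 1. No bit is claimed twice.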

There are 16 short vector registers, each of which is 256 bits in length.

Each of these registers may contain:

As well, they may contain sixteen 16-bit short floating-point numbers in one of two formats.

These numbers all remain in these registers in the same format as that in which they appear in memory.


As for how data values are stored:

Signed integer values are stored in binary two's complement format.

Floating-point numbers are stored in IEEE 754 format.

The architecture is big-endian: the most significant bits of a value are stored in the byte at the lowest numbered address.
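A brief Python illustration of these two conventions follows, using only the standard int.to_bytes method; the helper name store_big_endian is hypothetical. It shows the bytes of a value in order of increasing address.

```python
def store_big_endian(value, size_bytes, signed=False):
    """Return the bytes of value as they would appear in memory,
    lowest-numbered address first, in big-endian order."""
    return value.to_bytes(size_bytes, byteorder="big", signed=signed)


# The most significant byte, 0x01, lands at the lowest address.
unsigned_example = store_big_endian(0x0102030405060708, 8)

# -2 in 64-bit two's complement is all ones except for the last bit.
signed_example = store_big_endian(-2, 8, signed=True)
```

Here unsigned_example is the byte sequence 01 02 03 04 05 06 07 08, and signed_example is seven bytes of FF followed by FE.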

As well, there are 16 flag bits which are used for instruction predication, and of course there is a 64-bit program counter. The program status quadword includes eight sets of condition codes, and the program counter and flag bits are also part of the program status quadword.


