
And Then There's 12 Bits

On the previous page, we looked at the possibility of fetching a fixed, power-of-2, number of 36-bit or 45-bit words at one time, and then subdividing that unit into aliquot parts so as to permit the use of units of different sizes.

The opposite approach, starting with a small unit and building objects of the desired size from it, is also possible, perhaps for less ambitious projects. Or for very ambitious ones. If 12 bits is used as the fundamental unit, from which floating-point numbers occupying 36, 48, or 60 bits of storage are obtained, one could have an unambitious computer with a 12-bit bus to memory, or a very ambitious one with a 720-bit bus to memory.
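The 720-bit figure is the least common multiple of the three floating-point sizes, so that a block of that width holds a whole number of operands of each size. A quick check of the arithmetic, as a small Python sketch (the variable names are only illustrative):

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

operand_bits = [36, 48, 60]            # the three floating-point sizes
bus_bits = reduce(lcm, operand_bits)   # narrowest bus holding whole operands of each size

print(bus_bits)                                   # width of the ambitious bus
print({w: bus_bits // w for w in operand_bits})   # operands of each size per block
print(bus_bits // 12)                             # 12-bit units per block
```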

If one thinks in terms of using standard parts, one possibility would be to use memory modules that are 128 bits wide, with each fetch obtaining 120 bits of data, and 8 bits used for (maximally efficient!) SECDED (single error correcting, double error detecting) coding. Six fetches would obtain a single 720-bit block, and larger designs with wider memory buses could be two or three times as wide as this as well as the full six times as wide.
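The remark about maximal efficiency can be checked with the usual Hamming bound: r check bits can correct a single error in at most 2^r - r - 1 data bits, and one more overall parity bit adds double error detection. The sketch below assumes that the SECDED code meant is an extended Hamming code; the function name is made up for the illustration.

```python
def secded_check_bits(data_bits):
    """Check bits needed by an extended Hamming (SECDED) code for data_bits of data."""
    r = 1
    # Single error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall-parity bit upgrades this to double error detection.
    return r + 1

for m in (64, 120, 121):
    print(m, "data bits need", secded_check_bits(m), "check bits")
```

With 8 check bits, 120 is the largest number of data bits that can be protected, which is why a 128-bit module wastes nothing.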

Here, then, are the instruction formats for a small-scale computer which still handles the usual big-computer data types.

It is envisaged that we are dealing with a computer which has four index registers, each one 24 bits in length (although only the last 15 bits are used when indexing is performed); four general registers, each one 36 bits in length; and four floating-point registers, each one 60 bits in length.

Regular memory reference instructions could use any of the four general (or floating-point, as applicable) registers as their destination, while indexed memory-reference instructions would always have register 0 as their destination. Note that index register 0 would also operate as a normal index register, unlike the convention in some architectures.

Memory addresses indicate a particular 12-bit storage unit in memory.

The opcodes would be the following:

000001 C   Compare
000010 L   Load
000011 ST  Store
000100 A   Add
000101 S   Subtract
000110 M   Multiply
000111 D   Divide

001000 N   AND
001001 O   OR
001010 LX  Load Index
001011 STX Store Index
001100 X   XOR
001101 (conditional jump instruction)
001110 MX  Multiply Extensibly
001111 DX  Divide Extensibly

010000 XS  XOR Small
010001 CS  Compare Small
010010 LS  Load Small
010011 STS Store Small
010100 AS  Add Small
010101 SS  Subtract Small
010110 NS  AND Small
010111 OS  OR Small

011000 (subroutine jump instructions)
011001 CF  Compare Floating
011010 LF  Load Floating
011011 STF Store Floating
011100 AF  Add Floating
011101 SF  Subtract Floating
011110 MF  Multiply Floating
011111 DF  Divide Floating

100000 IS  Insert Small
100001 CI  Compare Intermediate
100010 LI  Load Intermediate
100011 STI Store Intermediate
100100 AI  Add Intermediate
100101 SI  Subtract Intermediate
100110 MI  Multiply Intermediate
100111 DI  Divide Intermediate

101000 ULS Unsigned Load Small
101001 CD  Compare Double
101010 LD  Load Double
101011 STD Store Double
101100 AD  Add Double
101101 SD  Subtract Double
101110 MD  Multiply Double
101111 DD  Divide Double

Since a mode bit changes the standard integer type from 36 bits to 24 bits, there are no insert or unsigned load instructions for the 24 bit type; in the 24-bit integer mode, all instructions leave the 12 most significant bits of the four general registers unchanged, and the integer ALU behaves as a 24-bit ALU.
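As an illustration of the 24-bit integer mode, here is a small Python model of an Add on one 36-bit general register. It is only a sketch: the function name is hypothetical, and the carry bit and condition codes are not modelled.

```python
MASK24 = (1 << 24) - 1
MASK36 = (1 << 36) - 1

def add_integer(reg, operand, mode_36bit):
    """Add operand to a 36-bit register value under the integer mode bit."""
    if mode_36bit:
        return (reg + operand) & MASK36
    # 24-bit mode: the ALU acts on the low 24 bits only, and the
    # 12 most significant bits of the register are left unchanged.
    low = (reg + operand) & MASK24
    high = reg & MASK36 & ~MASK24
    return high | low

r = 0o123477777777   # upper 12 bits 1234 (octal), lower 24 bits all ones
print(oct(add_integer(r, 1, mode_36bit=False)))
print(oct(add_integer(r, 1, mode_36bit=True)))
```

In 24-bit mode the carry out of the low 24 bits does not reach the upper 12 bits, while in 36-bit mode it does.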

As there are only 48 operations, no opcode begins with 11; so 11 in the first two bit positions moves the opcode over two bits, and indicates a register-to-register operation.

The conditional jump instructions would use the next three bits to indicate the circumstance under which a jump is performed:

001101001 JG  Jump if greater
001101010 JE  Jump if equal
001101011 JGE Jump if greater or equal
001101100 JL  Jump if less
001101101 JNE Jump if not equal
001101110 JLE Jump if less or equal
001101111 JMP Jump
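The three condition bits in this table can be read as a mask, with one bit each for "less", "equal", and "greater"; the jump is taken if the outcome of the preceding comparison is among the selected bits. This reading is an interpretation of the table rather than something stated in the text, and the names in the Python sketch below are hypothetical. The pattern 000 does not appear in the table and is left out.

```python
LESS, EQUAL, GREATER = 0b100, 0b010, 0b001

MNEMONICS = {
    0b001: "JG", 0b010: "JE", 0b011: "JGE", 0b100: "JL",
    0b101: "JNE", 0b110: "JLE", 0b111: "JMP",
}

def compare(a, b):
    """Return the single outcome bit describing how a relates to b."""
    if a < b:
        return LESS
    if a == b:
        return EQUAL
    return GREATER

def jump_taken(cond_bits, outcome):
    """The jump is taken when the outcome is one of those selected by cond_bits."""
    return (cond_bits & outcome) != 0

for bits, name in sorted(MNEMONICS.items()):
    taken = [jump_taken(bits, compare(a, 5)) for a in (4, 5, 6)]
    print(format(bits, "03b"), name, taken)
```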

The jump to subroutine instruction stores the return address in the index register selected as the destination register, and cannot be indexed. When the index bit is set with the same opcode, an indexed unconditional jump is the instruction, and such a jump with an address displacement of zero can be used to return from a subroutine.

0110000 JSR Jump to Subroutine
0110001 XJ  Indexed Jump

An indexed jump could also be used to jump into a series of jump instructions, although the index would have to be an even number, since each jump instruction occupies two 12-bit storage cells; so a pre-indexed indirect jump is not needed for choosing one of multiple alternatives.

Also, instructions starting with 1111 would be used for things like shift instructions.
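Putting these rules together, the first 12-bit cell of an instruction is enough to tell which kind of instruction it begins. The Python sketch below is one way this might be done, using only the leading-bit conventions given above; the function name and the returned labels are made up, and the remaining bits (register and address fields) are not decoded.

```python
def classify(first_cell):
    """Classify an instruction from its first 12-bit cell.

    Returns (kind, opcode), where opcode is the 6-bit opcode or None.
    """
    if not 0 <= first_cell < (1 << 12):
        raise ValueError("a storage cell holds 12 bits")
    if first_cell >> 8 == 0b1111:
        # Shifts, operate instructions, prefixes and the like.
        return ("extended", None)
    if first_cell >> 10 == 0b11:
        # The opcode is moved over two bits; the instruction is 12 bits long.
        return ("register-to-register", (first_cell >> 4) & 0b111111)
    # Otherwise the opcode leads a 24-bit memory-reference instruction.
    return ("memory-reference", first_cell >> 6)

print(classify(0b000010_000000))   # Load, memory-reference form
print(classify(0b11_000100_0000))  # Add, register-to-register form
print(classify(0b1111_00000000))   # something from the 1111 group
```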

Floating, Intermediate, and Double would be floating-point types that are 36, 48, and 60 bits long respectively.

Small would refer to 12-bit integer operands, and the operations without a type in their mnemonic would operate on either 24-bit integers or 36-bit integers, depending on the setting of a mode bit; some programs need 36-bit integers, and some only need 24-bit integers.

A computer such as this would easily satisfy many computing needs; a 32K contiguous address space would be well suited to compiling and running FORTRAN programs of reasonable size.

Fitting memory-reference instructions into 24 bits and register to register instructions into 12 bits makes highly efficient use of memory for programs as well.

But while 32K 12-bit words is space enough for many programs, it is still constricting for some. One can assume that there might be a 24-bit or 36-bit register containing the address of the 32K area in which a given program runs.

Using the index registers as base registers for distant accesses is one possibility; but better compatibility would be obtained by reserving some of the opcodes beginning with 1111 for prefix symbols, particularly as one important use for a larger address space would be the ability to operate on arrays larger than 32K in size, so both an index and a base are needed.

To increase the address space from being enough to reach 32K 12-bit storage cells, or 48K bytes, to being enough to reach 96 Gigabytes, by using 36-bit addresses, and in addition to show the formats of operate instructions that would be needed even without address extension, the instruction formats can be augmented as shown below:



prefix an instruction to indicate that the contents of one of eight 36-bit base registers is added to its address. The bits marked xx indicate one of three 36-bit long index registers, or, if 00, indicate no indexing is present, and the bits marked dd indicate the destination register in the normal way.
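The address-space figures quoted above (48K bytes with 15-bit addresses, 96 gigabytes with 36-bit addresses) follow from each address naming a 12-bit cell, that is, one and a half 8-bit bytes. A short Python check, with illustrative names:

```python
CELL_BITS = 12   # every address names one 12-bit storage cell

def reach_in_bytes(address_bits):
    """Storage reachable with the given address width, in 8-bit bytes."""
    return (1 << address_bits) * CELL_BITS // 8

print(reach_in_bytes(15) // 2**10, "K bytes with 15-bit addresses")
print(reach_in_bytes(36) // 2**30, "G bytes with 36-bit addresses")
```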

The LX and STX instructions still refer to the normal index registers, but in this format, opcodes with the first two bits both equal to one are allowed, adding the following operations:

110000 LB   Load Base
110001 STB  Store Base
110010 LLX  Load Long Index
110011 STLX Store Long Index

Further, the shift instructions can be defined:

1111000000rr LSR Logical Shift Right
1111000001rr ASR Arithmetic Shift Right
1111000010rr LSL Logical Shift Left

1111000100rr RR  Rotate Right
1111000101rr RL  Rotate Left

where rr specifies the register whose contents are shifted, and the shift count is in the next 12 bits of the instruction.
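A shift instruction therefore occupies two 12-bit cells. As a sketch only, the encoding might be assembled as below in Python; the function name is hypothetical, and the count is treated as a raw 12-bit field, since the text does not say how many of its bits are significant.

```python
SHIFT_OPS = {
    "LSR": 0b1111000000,
    "ASR": 0b1111000001,
    "LSL": 0b1111000010,
    "RR":  0b1111000100,
    "RL":  0b1111000101,
}

def encode_shift(mnemonic, reg, count):
    """Return the two 12-bit cells of a shift instruction."""
    if not 0 <= reg < 4:
        raise ValueError("rr selects one of four registers")
    if not 0 <= count < (1 << 12):
        raise ValueError("the shift count must fit in 12 bits")
    first = (SHIFT_OPS[mnemonic] << 2) | reg   # ten opcode bits, then rr
    return first, count

first, second = encode_shift("ASR", 2, 5)
print(format(first, "012b"), format(second, "012b"))
```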

A number of operate instructions can also be defined:

1111010000rr CLR   Clear
1111010001rr INC   Increment
1111010010rr INV   Invert (one's complement)
1111010011rr NEG   Negate (two's complement)

1111011111ci MODE  Mode set

The mode set instruction controls whether 6-bit or 12-bit characters will be used, based on whether the bit c is 0 or 1 respectively, and whether 24-bit or 36-bit integers will be used, based on whether the bit i is 0 or 1 respectively. The instructions affected by the character size will be described below.

The contents of base register zero will be added to addresses in the 24-bit memory reference instructions, so that these wouldn't be limited to the beginning of the virtual address space; programs which are not cognizant of the 36-bit memory-reference instruction format would also not be cognizant of the eight base registers, and compatibility is maintained.

Register-to-register instructions are 12 bits instead of 16 bits, memory-reference instructions are 24 bits instead of 32 bits, and there are some limitations, such as having four general registers instead of eight or sixteen; but it should not be thought on that account that this architecture is not powerful or extensible.

The diagram below illustrates the format of a 96-bit long three-address instruction, used for packed decimal, string, and vector operations.

The opcodes for this instruction type would be:

000010 MV  Move

000100 A   Add
000101 S   Subtract
000110 M   Multiply
000111 D   Divide

001000 N   AND
001001 O   OR

001100 X   XOR

010000 XS  XOR Small

010010 MVS Move Small

010100 AS  Add Small
010101 SS  Subtract Small
010110 NS  AND Small
010111 OS  OR Small

011010 MVF Move Floating

011100 AF  Add Floating
011101 SF  Subtract Floating
011110 MF  Multiply Floating
011111 DF  Divide Floating

100010 MVI Move Intermediate

100100 AI  Add Intermediate
100101 SI  Subtract Intermediate
100110 MI  Multiply Intermediate
100111 DI  Divide Intermediate

101010 MVD Move Double

101100 AD  Add Double
101101 SD  Subtract Double
101110 MD  Multiply Double
101111 DD  Divide Double

110010 MVP Move Packed

110100 AP  Add Packed
110101 SP  Subtract Packed
110110 MP  Multiply Packed
110111 DP  Divide Packed

111010 MVC Move Character

111100 PK  Pack
111101 UPK Unpack
111110 T   Translate

With the Packed and Character types, the length represents the number of digits in a single number, or the number of characters in a single string. With the other types, the length represents the length of a vector, and the source and operand lengths can only be either the same as the destination length, or equal to one for a constant operand. A value of 0 in a length field represents a length of 64 items.
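The length rules can be stated compactly in code. The Python sketch below assumes that each length field is six bits wide, which the text does not say outright but which the "0 means 64" convention suggests; both function names are hypothetical.

```python
def decode_length(field):
    """Decode a length field (assumed 6 bits wide); 0 stands for 64 items."""
    if not 0 <= field < 64:
        raise ValueError("length field assumed to be 6 bits")
    return field or 64

def vector_lengths_valid(dest, source, operand):
    """For vector types, each input is as long as the destination, or is a constant."""
    return all(n in (dest, 1) for n in (source, operand))

print(decode_length(0), decode_length(17))
print(vector_lengths_valid(64, 64, 1))   # vector combined with a constant
print(vector_lengths_valid(64, 32, 1))   # mismatched vector lengths
```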

The Move, Pack, and Unpack instructions are two-address, so only source and destination addresses are used, and the instruction is 72 bits long instead of 96 bits long.

For the translate instruction, the operand address is that of the translate table. The length is ignored, since a translate table has either 64 or 4,096 entries depending on the character length mode in use.

On the previous page, a word length of 36 bits was chosen because a bus width of four times that word length would permit fetching any aligned 36-bit, 48-bit, or 72-bit operand in a single memory access.

Here, we build up 36-bit, 48-bit, and 60-bit operands from individual 12-bit cells, and the only way to handle aligned operands of these three sizes would be to have a very wide 720-bit bus. How could we avoid frequent penalties for fetching operands which are not aligned, at least not with respect to the memory layout in use?

If a 12-bit wide bus to memory is used, the issue does not arise; a 60-bit operand will take five fetches to load, no matter where it starts.

But how to deal with this for a wider memory? One way would be this: to have a 96-bit wide data bus, but to divide it into two halves, each with its own address bus. Then, a 60-bit wide value starting on any 12-bit boundary could be fetched in a single operation, either by fetching the 48 bits at the same address in the left and right halves, or by fetching the 48 bits in the left half at an address one greater than the 48 bits in the right half.
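To make the two cases concrete, here is a Python sketch that computes the address to present to each half of the bus for a 60-bit operand starting at any 12-bit cell. It assumes that cells are numbered so that, within each 96-bit row, the left half holds the four lower-numbered cells; the function name is made up for the example.

```python
CELLS_PER_HALF = 4   # each 48-bit half holds four 12-bit cells

def bank_rows(cell_address):
    """Return (left_row, right_row) for a 5-cell operand starting at cell_address."""
    half_index = cell_address // CELLS_PER_HALF    # 48-bit half holding the first cell
    row, starts_in_right = divmod(half_index, 2)
    if starts_in_right:
        # The operand begins in the right half and runs on into the
        # left half of the next row, so the left address is one greater.
        return row + 1, row
    # The operand begins in the left half and ends in the right half
    # of the same row, so both halves use the same address.
    return row, row

for start in (0, 3, 4, 7, 8):
    print(start, bank_rows(start))
```

Since five consecutive cells always touch exactly two adjacent 48-bit halves, one of these two cases always applies.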

It is felt that a 36-bit floating-point format will be significantly more useful than a 32-bit one in more cases, and that a 48-bit floating-point format will also further reduce the need for double precision. As well, a 60-bit format should satisfy the requirement for double precision, when it genuinely arises.

However, the IBM System/360 had 64-bit double precision, and with the Model 85, it still introduced a 128-bit extended precision floating-point type; more recently, a 128-bit type has been added to the IEEE 754 standard, even though that already provided for an 80-bit floating-point type.

Given that a 12-bit word is the basic unit, of 36-bit, 48-bit, and 60-bit floating-point types, only the 48-bit type is a power-of-two multiple of the basic unit; the architecture above, with dual 48-bit buses, as it allows 60-bit quantities to be unaligned, can also cope with unaligned 48-bit quantities.

It could also handle 96-bit floating-point numbers, as long as they were aligned on 48-bit boundaries. The long instruction format for distant references allows the first two bits of an opcode to be 1. A few opcodes of that form have been added above to allow access to the new registers required for those instructions. Available opcodes still exist to provide a complete set for 96-bit floating-point numbers:

111001 CE  Compare Extended
111010 LE  Load Extended
111011 STE Store Extended
111100 AE  Add Extended
111101 SE  Subtract Extended
111110 ME  Multiply Extended
111111 DE  Divide Extended

It again seems to me that 96-bit floating point will be quite long enough, and also that it will be rarely used, so having these instructions only in the longer instruction format will be satisfactory.

Perhaps another mode bit will be required, to allow the character instructions to be replaced by extended precision vector operations, if this type is added.

It may be noted that a 12-bit program status word would suffice to hold the carry bit, condition codes, a user/supervisor bit, and the two mode bits for character and integer lengths. But to support IEEE 754 type operation, even with different lengths, the status word would have to be expanded, perhaps only to 24 bits.

In an interrupt, the program counter, the program status word, and base register zero would all have to be automatically saved before the interrupt service routine could begin; it could store everything else using conventional instructions without disturbing anything, although automatically saving an index register and replacing it with a stack pointer in memory as well would simplify dealing with multiple levels of interrupts that were dispatched initially to the same interrupt service routine. If an interrupt service routine can only be interrupted by higher-priority interrupts which have their own interrupt vectors, then they can have their own static locations to store saved information, and this would not be needed.

However, in the more ambitious case where a 720-bit block is used as the basic unit, it would not be possible to align 96-bit floating-point numbers. Remaining with multiples of 12 bits, either 72 bits or 120 bits would be possible. If that condition is dropped, in addition to dividing a 720-bit block into six 120-bit numbers or ten 72-bit numbers, eight 90-bit numbers or nine 80-bit numbers would also be possible as alternatives. Following IEEE 754, 16 of those bits would be used for the sign and the exponent. It might be useful, however, to chop off the last five bits of the mantissa, and use them instead to indicate additional data that might be kept in the floating-point registers:

Two bits to indicate the precision of the number currently stored:

00  36 bits
01  48 bits
10  60 bits
11 120 bits

and three bits indicating how the last result was rounded, or if the result is a NaN (Not a Number):

000 exact
001 rounded down        (true value is higher than nominal value)
010 rounded up          (true value is lower than nominal value)
011 rounding indefinite (previous operation did not produce exact rounding)
100 value indefinite            (i.e. 0/0)
101 plus infinity  
110 minus infinity
111 infinite, but sign unknown  (i.e. 1/0)
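These five bits could be unpacked as in the Python sketch below. The order of the two fields within the five bits (precision first, then rounding status) is an assumption made for the illustration, as are the names; only the two code tables come from the lists above.

```python
PRECISION = {0b00: 36, 0b01: 48, 0b10: 60, 0b11: 120}

ROUNDING = {
    0b000: "exact",
    0b001: "rounded down",
    0b010: "rounded up",
    0b011: "rounding indefinite",
    0b100: "value indefinite",
    0b101: "plus infinity",
    0b110: "minus infinity",
    0b111: "infinite, but sign unknown",
}

def unpack_tag(tag):
    """Split a 5-bit tag into (precision in bits, rounding status)."""
    if not 0 <= tag < 32:
        raise ValueError("the tag is five bits long")
    return PRECISION[tag >> 3], ROUNDING[tag & 0b111]

print(unpack_tag(0b10_001))
print(unpack_tag(0b11_000))
```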

If memory is divided into blocks of 120 bits, simple binary divisions would be 15, 30, and 60 bits. To treat it as being composed of parts 12 bits in length would require dividing a block of 120 bits into ten parts, or a block of 720 bits into sixty parts.

How can addressing be prevented from being a nightmare?

Multiplying by three requires one shift and add. So if base registers contain the number of a 720-bit block in memory, this can be converted to the address of a 120-bit word in external memory through multiplication by three and an additional shift.
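Since a 720-bit block holds six 120-bit words, the conversion is a multiplication by six, which is a multiplication by three followed by a one-place shift. In Python, with a hypothetical function name:

```python
def block_to_word_address(block_number):
    """Address of the first 120-bit word of a given 720-bit block."""
    times_three = block_number + (block_number << 1)   # one shift and one add
    return times_three << 1                            # the additional shift: times six

for block in (0, 1, 2, 1000):
    print(block, block_to_word_address(block))
```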

An aligned 60-bit double-precision number can then be referenced using a raw address, which requires only shifting to obtain the address of a 120-bit word. Matters can be equally simple for addressing 30-bit or 15-bit entities, which is an argument for using 30-bit integer variables, and instructions composed of 15-bit halfwords in this case.

What of 36-bit and 48-bit floating-point numbers, however? A hardware divide-by-five circuit could deal with indexing and addressing in the case of such numbers. And 60-bit floating-point numbers could also be addressed in an alternate fashion within that system.

To simplify matters, since a multiply by three circuit is seen as needed so as to make the architecture independent of the use of an external 120-bit bus instead of the ideal 720-bit bus, thus avoiding the need for a divide by three circuit in the latter case, one can think instead of a divide by fifteen circuit. In binary, fifteen is 1111, and, thus, one-fifteenth is 0.11111111... in hexadecimal notation.
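The repeating pattern of one-fifteenth is what makes such a circuit cheap: division by fifteen becomes multiplication by a constant made of repeated digits, followed by a shift. The Python sketch below shows the familiar multiply-and-shift form for 32-bit unsigned values. Whether a hardware circuit for this architecture would be organized this way is not stated in the text, and the constant and names are choices made for the example.

```python
from fractions import Fraction

# Eight hexadecimal digits of 0.11111111... as an exact fraction,
# printed beside one-fifteenth for comparison.
approx = Fraction(0x11111111, 16 ** 8)
print(float(approx), 1 / 15)

# ceil(2**35 / 15); its hexadecimal digits repeat because those of 1/15 do.
MAGIC = 0x88888889

def div15(n):
    """Divide a 32-bit unsigned integer by 15 with one multiply and one shift."""
    if not 0 <= n < (1 << 32):
        raise ValueError("n must fit in 32 bits")
    return (n * MAGIC) >> 35

for n in (0, 14, 15, 720, 2**32 - 1):
    print(n, div15(n), n // 15)
```

The constant is rounded up rather than truncated, so that the small excess in the product never carries into the quotient for operands of this width.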
