Sixteen Registers

A very complex imaginary architecture was defined here, primarily motivated by what I perceived as a flaw in the architecture of the IBM System/360: that the displacement field in an instruction was only 12 bits long, instead of 16 bits long, as on many microprocessors or the PDP-11.

On the previous page, the possibility of creating instruction formats very closely resembling that of the IBM System/360 which would address this issue was examined.

On this page, an alternative approach, resulting in something closer to my imaginary architecture, able to retain most of its capabilities, is illustrated. Important similarities to the 360 are retained: registers are in sets of 16 rather than 8, and the instructions are limited to a small number of lengths, with length decoding kept very simple.

Here are the instruction formats:

Because sets of 16 registers are used, a full indexed memory-reference instruction is forced to be 48 bits long. So a non-indexed format is provided, and also a short indexed format. In the short indexed format, the destination register is one of general registers 0 to 7, the index register is one of general registers 1 through 7, and the base register is one of base registers 8 to 15; as in my architecture, unlike the System/360, a separate set of registers is used for base registers rather than having general registers be consumed by being allocated as base registers, even though there are sixteen of each instead of eight.

It's all very well to draw pretty pictures, but is there actually enough opcode space to fit all the instructions in that I would want using these formats?

There's only one way to find out: allocate them.

The fixed-point register-to-register instructions take up half of the opcode space allocated to 16-bit instructions:

00 SWBR   Swap Byte Register
01 CBR    Compare Byte Register
02 LBR    Load Byte Register

04 ABR    Add Byte Register
05 SBR    Subtract Byte Register

08 IBR    Insert Byte Register
09 UCBR   Unsigned Compare Byte Register
0A ULBR   Unsigned Load Byte Register
0B XBR    XOR Byte Register
0C NBR    AND Byte Register
0D OBR    OR Byte Register

10 SWHR   Swap Halfword Register
11 CHR    Compare Halfword Register
12 LHR    Load Halfword Register

14 AHR    Add Halfword Register
15 SHR    Subtract Halfword Register
16 MHR    Multiply Halfword Register
17 DHR    Divide Halfword Register

18 IHR    Insert Halfword Register
19 UCHR   Unsigned Compare Halfword Register
1A ULHR   Unsigned Load Halfword Register
1B XHR    XOR Halfword Register
1C NHR    AND Halfword Register
1D OHR    OR Halfword Register
1E MYHR   Multiply Extensibly Halfword Register
1F DYHR   Divide Extensibly Halfword Register

20 SWR    Swap Register
21 CR     Compare Register
22 LR     Load Register

24 AR     Add Register
25 SR     Subtract Register
26 MR     Multiply Register
27 DR     Divide Register

28 IR   * Insert Register
29 UCR    Unsigned Compare Register
2A ULR  * Unsigned Load Register
2B XR     XOR Register
2C NR     AND Register
2D OR     OR Register
2E MYR    Multiply Extensibly Register
2F DYR    Divide Extensibly Register

30 SWLR   Swap Long Register
31 CLR    Compare Long Register
32 LLR    Load Long Register

34 ALR    Add Long Register
35 SLR    Subtract Long Register
36 MLR    Multiply Long Register
37 DLR    Divide Long Register

39 UCLR   Unsigned Compare Long Register

3B XLR    XOR Long Register
3C NLR    AND Long Register
3D OLR    OR Long Register
3E MYLR   Multiply Extensibly Long Register
3F DYLR   Divide Extensibly Long Register

The instructions marked with an asterisk are valid only in 64-bit mode. Load sign-extends into the unused high portion of the register, Unsigned Load clears it, Insert leaves it undisturbed. Multiply and Divide take two inputs and return one output of the same length, just like Add and Subtract. Multiply Extensibly returns a double-length product, with the additional part in the high portion of the destination register if possible; if not, the high part of the product is in the destination register, and the low part is in the following register. Divide Extensibly has a double-length dividend, a single-length divisor, a double-length quotient and a single-length remainder. The quotient is therefore placed where the dividend was found, and the remainder in the next available register.
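The difference between Load, Unsigned Load and Insert can be sketched as follows. This is only an illustration under stated assumptions: a 64-bit register, the operand occupying the low-order bits, and function names that are hypothetical rather than part of the architecture.

```python
MASK64 = (1 << 64) - 1  # assumed register width (64-bit mode)

def load(reg, value, bits):
    """Load: sign-extend a `bits`-wide value into the whole register."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):      # sign bit of the short operand is set
        value -= 1 << bits       # make it negative before widening
    return value & MASK64

def unsigned_load(reg, value, bits):
    """Unsigned Load: clear the unused high portion of the register."""
    return value & ((1 << bits) - 1)

def insert(reg, value, bits):
    """Insert: replace the low `bits` bits, leave the rest undisturbed."""
    low = (1 << bits) - 1
    return (reg & ~low & MASK64) | (value & low)

old = 0x1122334455667788
print(hex(load(old, 0x80, 8)))
print(hex(unsigned_load(old, 0x80, 8)))
print(hex(insert(old, 0x80, 8)))
```

The previous register contents matter only for Insert, which is why the other two functions ignore their first argument.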

Because IEEE 754 floating-point will be used, there will be no need for unnormalized floating-point instructions, and so only 1/4 of the opcode space for 16-bit instructions is used by the floating-point instructions:

40 SWMR   Swap Medium Register
41 CMR    Compare Medium Register
42 LMR    Load Medium Register

44 AMR    Add Medium Register
45 SMR    Subtract Medium Register
46 MMR    Multiply Medium Register
47 DMR    Divide Medium Register

48 SWFR   Swap Floating Register
49 CFR    Compare Floating Register
4A LFR    Load Floating Register

4C AFR    Add Floating Register
4D SFR    Subtract Floating Register
4E MFR    Multiply Floating Register
4F DFR    Divide Floating Register

50 SWDR   Swap Double Register
51 CDR    Compare Double Register
52 LDR    Load Double Register

54 ADR    Add Double Register
55 SDR    Subtract Double Register
56 MDR    Multiply Double Register
57 DDR    Divide Double Register

58 SWER   Swap Extended Register
59 CER    Compare Extended Register
5A LER    Load Extended Register

5C AER    Add Extended Register
5D SER    Subtract Extended Register
5E MER    Multiply Extended Register
5F DER    Divide Extended Register

Only half of what is left is needed for the shift instructions:

60 110 SHLB   Shift Left Byte
60 10  SHLH   Shift Left Halfword
60 0   SHL    Shift Left
61 110 SHRB   Shift Right Byte
61 10  SHRH   Shift Right Halfword
61 0   SHR    Shift Right

63 110 ASRB   Arithmetic Shift Right Byte
63 10  ASRH   Arithmetic Shift Right Halfword
63 0   ASR    Arithmetic Shift Right
64 110 ROLB   Rotate Left Byte
64 10  ROLH   Rotate Left Halfword
64 0   ROL    Rotate Left
65 110 RORB   Rotate Right Byte
65 10  RORH   Rotate Right Halfword
65 0   ROR    Rotate Right
66 110 RLCB   Rotate Left through Carry Byte
66 10  RLCH   Rotate Left through Carry Halfword
66 0   RLC    Rotate Left through Carry
67 110 RRCB   Rotate Right through Carry Byte
67 10  RRCH   Rotate Right through Carry Halfword
67 0   RRC    Rotate Right through Carry

68     SHLL   Shift Left Long
69     SHRL   Shift Right Long

6B     ASRL   Arithmetic Shift Right Long
6C     ROLL   Rotate Left Long
6D     RORL   Rotate Right Long
6E     RLCL   Rotate Left through Carry Long
6F     RRCL   Rotate Right through Carry Long

leaving the other half for the relative branch instructions:

71            BL      Branch if Low
72            BE      Branch if Equal
73            BLE     Branch if Low or Equal
74            BH      Branch if High
75            BNE     Branch if Not Equal
76            BHE     Branch if High or Equal
77            BNV     Branch if No Overflow

78            BV      Branch if Overflow

7A            BC      Branch if Carry
7B            BNC     Branch if No Carry

7F            BRA     Branch

This should have been the hard part, since restricting the 32-bit memory-reference instructions to aligned operands, and using the least-significant bits of the address field to distinguish between operations on data of different lengths (an idea pioneered by the SEL 32 computer), produces an extreme saving of opcode space. Even with that saving, however, obtaining as much space as I deemed necessary required complicating the format of the 48-bit and 64-bit instructions:

80    0   SWHX    Swap Halfword Indexed
80   01   SWX     Swap Indexed
80  011   SWLX    Swap Long Indexed
82    0   CHX     Compare Halfword Indexed
82   01   CX      Compare Indexed
82  011   CLX     Compare Long Indexed
84    0   LHX     Load Halfword Indexed
84   01   LX      Load Indexed
84  011   LLX     Load Long Indexed
86    0   STHX    Store Halfword Indexed
86   01   STX     Store Indexed
86  011   STLX    Store Long Indexed
88    0   AHX     Add Halfword Indexed
88   01   AX      Add Indexed
88  011   ALX     Add Long Indexed
8A    0   SHX     Subtract Halfword Indexed
8A   01   SX      Subtract Indexed
8A  011   SLX     Subtract Long Indexed
8C    0   MHX     Multiply Halfword Indexed
8C   01   MX      Multiply Indexed
8C  011   MLX     Multiply Long Indexed
8E    0   DHX     Divide Halfword Indexed
8E   01   DX      Divide Indexed
8E  011   DLX     Divide Long Indexed

90    0   IHX     Insert Halfword Indexed
90   01 * IX      Insert Indexed

92    0   UCHX    Unsigned Compare Halfword Indexed
92   01   UCX     Unsigned Compare Indexed
92  011   UCLX    Unsigned Compare Long Indexed
94    0   ULHX    Unsigned Load Halfword Indexed
94   01   ULX     Unsigned Load Indexed
94  011   ULLX    Unsigned Load Long Indexed
96    0   XHX     XOR Halfword Indexed
96   01   XX      XOR Indexed
96  011   XLX     XOR Long Indexed
98    0   NHX     AND Halfword Indexed
98   01   NX      AND Indexed
98  011   NLX     AND Long Indexed
9A    0   OHX     OR Halfword Indexed
9A   01   OX      OR Indexed
9A  011   OLX     OR Long Indexed
9C    0   MYHX    Multiply Extensibly Halfword Indexed
9C   01   MYX     Multiply Extensibly Indexed
9C  011   MYLX    Multiply Extensibly Long Indexed
9E    0   DYHX    Divide Extensibly Halfword Indexed
9E   01   DYX     Divide Extensibly Indexed
9E  011   DYLX    Divide Extensibly Long Indexed

A0    0   SWMX    Swap Medium Indexed
A0   01   SWFX    Swap Floating Indexed
A0  011   SWDX    Swap Double Indexed
A0 0111   SWEX    Swap Extended Indexed
A2    0   CMX     Compare Medium Indexed
A2   01   CFX     Compare Floating Indexed
A2  011   CDX     Compare Double Indexed
A2 0111   CEX     Compare Extended Indexed
A4    0   LMX     Load Medium Indexed
A4   01   LFX     Load Floating Indexed
A4  011   LDX     Load Double Indexed
A4 0111   LEX     Load Extended Indexed
A6    0   STMX    Store Medium Indexed
A6   01   STFX    Store Floating Indexed
A6  011   STDX    Store Double Indexed
A6 0111   STEX    Store Extended Indexed
A8    0   AMX     Add Medium Indexed
A8   01   AFX     Add Floating Indexed
A8  011   ADX     Add Double Indexed
A8 0111   AEX     Add Extended Indexed
AA    0   SMX     Subtract Medium Indexed
AA   01   SFX     Subtract Floating Indexed
AA  011   SDX     Subtract Double Indexed
AA 0111   SEX     Subtract Extended Indexed
AC    0   MMX     Multiply Medium Indexed
AC   01   MFX     Multiply Floating Indexed
AC  011   MDX     Multiply Double Indexed
AC 0111   MEX     Multiply Extended Indexed
AE    0   DMX     Divide Medium Indexed
AE   01   DFX     Divide Floating Indexed
AE  011   DDX     Divide Double Indexed
AE 0111   DEX     Divide Extended Indexed

C0    0   SWHA    Swap Halfword Aligned
C0   01   SWA     Swap Aligned
C0  011   SWLA    Swap Long Aligned
C2    0   CHA     Compare Halfword Aligned
C2   01   CA      Compare Aligned
C2  011   CLA     Compare Long Aligned
C4    0   LHA     Load Halfword Aligned
C4   01   LA      Load Aligned
C4  011   LLA     Load Long Aligned
C6    0   STHA    Store Halfword Aligned
C6   01   STA     Store Aligned
C6  011   STLA    Store Long Aligned
C8    0   AHA     Add Halfword Aligned
C8   01   AA      Add Aligned
C8  011   ALA     Add Long Aligned
CA    0   SHA     Subtract Halfword Aligned
CA   01   SA      Subtract Aligned
CA  011   SLA     Subtract Long Aligned
CC    0   MHA     Multiply Halfword Aligned
CC   01   MA      Multiply Aligned
CC  011   MLA     Multiply Long Aligned
CE    0   DHA     Divide Halfword Aligned
CE   01   DA      Divide Aligned
CE  011   DLA     Divide Long Aligned

D0    0   IHA     Insert Halfword Aligned
D0   01 * IA      Insert Aligned

D2    0   UCHA    Unsigned Compare Halfword Aligned
D2   01   UCA     Unsigned Compare Aligned
D2  011   UCLA    Unsigned Compare Long Aligned
D4    0   ULHA    Unsigned Load Halfword Aligned
D4   01   ULA     Unsigned Load Aligned
D4  011   ULLA    Unsigned Load Long Aligned
D6    0   XHA     XOR Halfword Aligned
D6   01   XA      XOR Aligned
D6  011   XLA     XOR Long Aligned
D8    0   NHA     AND Halfword Aligned
D8   01   NA      AND Aligned
D8  011   NLA     AND Long Aligned
DA    0   OHA     OR Halfword Aligned
DA   01   OA      OR Aligned
DA  011   OLA     OR Long Aligned
DC    0   MYHA    Multiply Extensibly Halfword Aligned
DC   01   MYA     Multiply Extensibly Aligned
DC  011   MYLA    Multiply Extensibly Long Aligned
DE    0   DYHA    Divide Extensibly Halfword Aligned
DE   01   DYA     Divide Extensibly Aligned
DE  011   DYLA    Divide Extensibly Long Aligned

E0    0   SWMA    Swap Medium Aligned
E0   01   SWFA    Swap Floating Aligned
E0  011   SWDA    Swap Double Aligned
E0 0111   SWEA    Swap Extended Aligned
E2    0   CMA     Compare Medium Aligned
E2   01   CFA     Compare Floating Aligned
E2  011   CDA     Compare Double Aligned
E2 0111   CEA     Compare Extended Aligned
E4    0   LMA     Load Medium Aligned
E4   01   LFA     Load Floating Aligned
E4  011   LDA     Load Double Aligned
E4 0111   LEA     Load Extended Aligned
E6    0   STMA    Store Medium Aligned
E6   01   STFA    Store Floating Aligned
E6  011   STDA    Store Double Aligned
E6 0111   STEA    Store Extended Aligned
E8    0   AMA     Add Medium Aligned
E8   01   AFA     Add Floating Aligned
E8  011   ADA     Add Double Aligned
E8 0111   AEA     Add Extended Aligned
EA    0   SMA     Subtract Medium Aligned
EA   01   SFA     Subtract Floating Aligned
EA  011   SDA     Subtract Double Aligned
EA 0111   SEA     Subtract Extended Aligned
EC    0   MMA     Multiply Medium Aligned
EC   01   MFA     Multiply Floating Aligned
EC  011   MDA     Multiply Double Aligned
EC 0111   MEA     Multiply Extended Aligned
EE    0   DMA     Divide Medium Aligned
EE   01   DFA     Divide Floating Aligned
EE  011   DDA     Divide Double Aligned
EE 0111   DEA     Divide Extended Aligned
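The way the low-order bits of the address field select the operand length in the tables above can be sketched as a small decoder. This is an illustration only: it assumes a 16-bit address field, uses "Word" as a label for the unsuffixed 32-bit type, and the function and table names are hypothetical.

```python
# Number of trailing 1 bits in the address field -> (type name, length in bytes)
FIXED = {0: ("Halfword", 2), 1: ("Word", 4), 2: ("Long", 8)}
FLOAT = {0: ("Medium", 6), 1: ("Floating", 4), 2: ("Double", 8), 3: ("Extended", 16)}

def decode_length(addr_field, table):
    """Return (type name, length in bytes, displacement with tag bits cleared)."""
    addr_field &= 0xFFFF                 # assumed 16-bit address field
    ones = 0
    while addr_field & (1 << ones):      # count the trailing 1 bits: 0, 01, 011, 0111
        ones += 1
    if ones not in table:
        raise ValueError("no operand length is assigned to this bit pattern")
    tag_mask = (1 << (ones + 1)) - 1     # the 1 bits plus the 0 bit that ends them
    name, length = table[ones]
    return name, length, addr_field & ~tag_mask

print(decode_length(0x1003, FIXED))
print(decode_length(0x1002, FLOAT))
```

Note that the Medium type, although 48 bits long, uses the same tag as the halfword, since both are aligned on 16-bit boundaries.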

In order to double the space available to indicate registers in the Short Indexed instructions, there are no 48-bit indexed memory reference instructions for the byte data type, which would use up fully half of the opcode space available as they require an undiminished displacement field.

Only a tiny bit of space is left to include 32-bit versions of the conditional jump instructions and subroutine call instructions, but it can be made to serve.

The destination register field, while it serves to indicate where to store the return address for the jump to subroutine instruction, is unused for a jump instruction, and is therefore available to indicate the condition applicable to a conditional jump, as was done on the IBM System/360. This is complicated somewhat for the Short Indexed format, in which the destination register field has been reduced to three bits in length, but sufficient opcode space is still available.

Because instructions are aligned on halfword boundaries, and there is no alternate kind of instruction aligned on 32-bit boundaries to jump to, 16-bit alignment is the terminal alignment for this instruction type, and so both possible values of the one spare bit in the displacement may be used. This is needed because, while there is extra space among the Aligned format instructions, none is available among the Short Indexed format instructions.

In order to fit the Short Indexed jump instructions in the limited space available, their format is modified. Only index registers 0 to 3 are available for them, unlike the other Short Indexed memory reference instructions, for which registers 0 to 7 are available. Also, since the destination register field is three bits long, instead of four, two opcodes rather than one are allocated to the conditional jump instructions.

E8    0   JSRA    Jump to Subroutine Register Aligned
E9    0   JSBA    Jump to Subroutine Base Aligned

EA1   0   JLA     Jump if Low Aligned
EA2   0   JEA     Jump if Equal Aligned
EA3   0   JLEA    Jump if Low or Equal Aligned
EA4   0   JHA     Jump if High Aligned
EA5   0   JNEA    Jump if Not Equal Aligned
EA6   0   JHEA    Jump if High or Equal Aligned
EA7   0   JNVA    Jump if No Overflow Aligned

EA8   0   JVA     Jump if Overflow Aligned

EAA   0   JCA     Jump if Carry Aligned
EAB   0   JNCA    Jump if No Carry Aligned

EAF   0   JA      Jump Aligned

E8    1   JSRX    Jump to Subroutine Register Indexed
E9    1   JSBX    Jump to Subroutine Base Indexed

EB0   1   JLX     Jump if Low Indexed
EA2   1   JEX     Jump if Equal Indexed
EB2   1   JLEX    Jump if Low or Equal Indexed
EA4   1   JHX     Jump if High Indexed
EB4   1   JNEX    Jump if Not Equal Indexed
EA6   1   JHEX    Jump if High or Equal Indexed
EB6   1   JNVX    Jump if No Overflow Indexed
EA8   1   JVX     Jump if Overflow Indexed

EAA   1   JCX     Jump if Carry Indexed
EBA   1   JNCX    Jump if No Carry Indexed

EBE   1   JX      Jump Indexed

Two versions of the Jump to Subroutine instruction are required; one saves the return address to a specified general register, the other to a specified base register. In the case of the Jump to Subroutine Base instruction, the possible targets are base registers 8 through 15, even though the instruction can only use base registers 11 through 15 with its destination address.

The Aligned instruction format clearly poses no issues, as it handles all non-indexed memory references well, with full four-bit fields available for both the destination register and the base register.

It is hoped that the Indexed instruction format can handle most indexed memory references, as the destination register can be anything from 0 to 3, the index register anything from 1 to 7, and the base register field is used to indicate a base register from 8 to 15.

Adequate opcode space is still available for the 48-bit full memory reference instructions, multiple register instructions, and vector register instructions, despite their having been allocated only a very small portion of total opcode space.

Thus, for the full memory-reference instructions, we have:

FE 0nn

as the opcode, where nn is the opcode of the corresponding register-to-register instruction; i.e., we have

FE 024     A     Add

for the instruction that performs 32-bit addition.

Also, there are the jump instructions:

FE  060  JSR    Jump to Subroutine Register
FE  061  JSB    Jump to Subroutine Base

FE1 062  JL     Jump if Low
FE2 062  JE     Jump if Equal
FE3 062  JLE    Jump if Low or Equal
FE4 062  JH     Jump if High
FE5 062  JNE    Jump if Not Equal
FE6 062  JHE    Jump if High or Equal
FE7 062  JNV    Jump if No Overflow

FE8 062  JV     Jump if Overflow

FEA 062  JC     Jump if Carry
FEB 062  JNC    Jump if No Carry

FEF 062  J      Jump

FE  064  JXLE   Jump if Index Low or Equal
FE  065  JXH    Jump if Index High

The Jump if Index Low or Equal instruction increments the general register indicated in the index register field, and jumps if its contents are less than or equal to those of the general register indicated in the destination register field. Jump if Index High decrements instead, and jumps if the contents are greater.
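A minimal sketch of the two loop-control instructions follows. It assumes a step of one (the description only says "increments" and "decrements") and ignores register wraparound; the register file is modelled as a plain list, and all names are hypothetical.

```python
def jxle(regs, x, d):
    """Jump if Index Low or Equal: bump regs[x], report whether to jump."""
    regs[x] += 1                 # assumed step of one
    return regs[x] <= regs[d]

def jxh(regs, x, d):
    """Jump if Index High: decrement regs[x], report whether to jump."""
    regs[x] -= 1                 # assumed step of one
    return regs[x] > regs[d]

def count_up(limit):
    """Run a loop closed by JXLE, counting from 0; return the number of trips."""
    regs = [0] * 16
    regs[2] = limit              # limit held in the destination register field
    trips = 0
    while True:
        trips += 1               # the loop body would go here
        if not jxle(regs, 1, 2):
            break
    return trips

print(count_up(3))
```

Since the test comes at the bottom of the loop, the body always runs at least once, as with any loop closed by a conditional jump.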

This provides space for further expansion to handle additional data types.

To permit register-to-register instructions for additional data types, the opcode CF, which would indicate a Divide Extensibly Byte instruction if one were useful, can be used to indicate a 32-bit instruction that contains a 16-bit instruction in its second half and which provides additional opcode bits.

Thus, we have:

CF02 nnds      FEdx 2nnb aaaa

  0 1 2        3       4       5 6 7 8 9 A B C D E F
0     SFSWH    SFSW    SFSWL
1     SFCH     SFC     SFCL 
2     SFLH     SFL     SFLL 
3     SFSTH    SFST    SFSTL
4     SFAH     SFA     SFAL 
5     SFSH     SFS     SFSL 
6     SFMH     SFM     SFML 
7     SFDH     SFD     SFDL 
8     SFMEUH   SFMEU   SFMEUL
9     SFDEUH   SFDEU   SFDEUL
A     SFLUH    SFLU    SFLUL 
B     SFSTUH   SFSTU   SFSTUL
C     SFAUH    SFAU    SFAUL 
D     SFSUH    SFSU    SFSUL 
E     SFMUH    SFMU    SFMUL 
F     SFDUH    SFDU    SFDUL

for the Simple Floating instructions,

CF03 nnds      FEdx 3nnb aaaa

  0      1       2 3 4 5 6 7 8 9 A B C D E F
0
1 RPC    RCDC   
2 RPME   RCDME 
3 RPDE   RCDDE 
4 RPA    RCDA  
5 RPS    RCDS  
6 RPM    RCDM  
7 RPD    RCDD  
8              
9 RPCL   RCDCL 
A RPMEL  RCDMEL
B RPDEL  RCDDEL
C RPAL   RCDAL 
D RPSL   RCDSL 
E RPML   RCDML 
F RPDL   RCDDL

for the Register Packed Decimal and the Register Compressed Decimal instructions, and

CF04 nnds      FEdx 4nnb aaaa

  0       1       2        3       4        5        6 7 8        9        A        B C D E F
0 SWFRC           SWDRC            SWERC                 SWNFRC   SWNDRC   SWNERC
1 CFRC            CDRC             CERC                  CNFRC    CNDRC    CNERC 
2 LFRC            LDRC             LERC                  LNFRC    LNDRC    LNERC 
3 STFRC           STDRC            STERC                 STNFRC   STNDRC   STNERC
4 AFRC    AFRCH   ADRC     AFDCH   AERC     AFDCH        ANFRC    ANDRC    ANERC 
5 SFRC    SFRCH   SDRC     SFDCH   SERC     SFDCH        SNFRC    SNDRC    SNERC 
6 MFRC    MFRCH   MDRC     MFDCH   MERC     MFDCH        MNFRC    MNDRC    MNERC 
7 DFRC    DFRCH   DDRC     DFDCH   DERC     DFDCH        DNFRC    DNDRC    DNERC 
8
9
A LUFRC           LUDRC            LUERC 
B STUFRC          STUDRC           STUERC
C AUFRC           AUDRC            AUERC 
D SUFRC           SUDRC            SUERC 
E MUFRC           MUDRC            MUERC 
F DUFRC           DUDRC            DUERC

for the Floating Register Compressed Decimal instructions.

Finally, rather than switching into a special mode for the Subdivided Floating and Subdivided Medium data types, they as well may be given their own instructions, although this will mean that the instructions are longer:

CF05 nnds      FEdx 5nnb aaaa

   0 1 2 3 4       5       6 7 8       9      A B C D E F
0          SWSM
1          CSM
2          LSM
3          STSM    STRSM       STRM    STRD
4          ASM
5          SSM
6          MSM
7          DSM
8          SWSF
9          CSF
A          LSF
B          STSF    STRSF       STRF
C          ASF
D          SSF
E          MSF
F          DSF

Store Rounded instructions are shown for both these formats and the regular floating-point formats other than Extended: they will be discussed below.

Thus, the floating-point formats available on this machine are the following:

where Floating is 32 bits long, Subdivided Floating is 36 bits long, Medium is 48 bits long, Subdivided Medium is 51 bits long, Double is 64 bits long, and Extended is 128 bits long.

The formats are based on those specified by the IEEE 754 standard for 32 and 64 bit floating-point numbers. The sizes of the exponent fields for other sizes of floating-point numbers were chosen on the basis of the following rationales:

In the case of 36-bit floats, since the hidden first bit grants an extra bit of precision, the floating-point precision of the IBM 7090 can be attained while adding a bit to the exponent to approximate the exponent range of the IBM 360. This allows a broader range of older FORTRAN programs to run without modification if this type of number is used for single precision (provided, of course, that the unusual storage layout is not an issue).

In the case of 48-bit and 51-bit floats, the size of the exponent was chosen because these formats provide a precision of 11 or 12 decimal digits, thus approximating that provided by a typical scientific pocket calculator; therefore, it was deemed desirable that the exponent range exceed that of such a calculator as well (10^-99 to 10^99).

Medium floating point numbers, as can be seen from the instruction formats, are aligned on 16 bit boundaries. Floating, Double, and Extended numbers are aligned when placed on multiples of their lengths, as they have power-of-two lengths.

It is assumed that memory is connected to a microprocessor using this ISA by a 256-bit data bus. A 256 bit memory word can be divided into seven 36-bit Subdivided Floating numbers with four bits left over, and into five 51-bit Subdivided Medium floating numbers with one bit left over.

In order to avoid the slow operation of dividing by either five or seven when addressing numbers of these types, however, they are stored with some additional wastage.

32 Subdivided Floating numbers are stored in five 256-bit memory words, leaving the 33rd, 34th, and 35th slots vacant.

64 Subdivided Medium numbers are stored in thirteen 256-bit memory words, leaving the 65th slot vacant.

In this way, only multiplication plus a small table lookup is required to locate the Nth element of an array of numbers of these types for any value of N. Numbers of these types are addressed as if they are 32 bits long, with only the first seven or the first five positions in a 256-bit memory word being valid. When an instruction is indexed, the index is treated as a displacement in floating point numbers rather than one in bytes. The last five bits, in the case of Subdivided Floating, or the last six bits, in the case of Subdivided Medium, are used to indicate the position within a block, and the higher portion of the index is multiplied by the length of a block in 256-bit memory words.
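The index calculation just described can be sketched like this. It is an illustration only: the contribution of the base register and displacement is left out, the index is taken relative to the start of a block, and the table and function names are hypothetical.

```python
# kind -> (index bits selecting a position in a block,
#          numbers per 256-bit word, 256-bit words per block)
LAYOUTS = {
    "subdivided_floating": (5, 7, 5),    # 32 numbers in 5 words
    "subdivided_medium": (6, 5, 13),     # 64 numbers in 13 words
}

# The "small table lookup": position in block -> (word within block, slot in word)
SLOT_TABLE = {
    kind: [divmod(pos, per_word) for pos in range(1 << pos_bits)]
    for kind, (pos_bits, per_word, _) in LAYOUTS.items()
}

def locate(index, kind):
    """Return (256-bit word number, slot within that word) for element `index`."""
    pos_bits, _, words_per_block = LAYOUTS[kind]
    block = index >> pos_bits                    # higher portion of the index
    pos = index & ((1 << pos_bits) - 1)          # position within the block
    word_in_block, slot = SLOT_TABLE[kind][pos]
    return block * words_per_block + word_in_block, slot

print(locate(31, "subdivided_floating"))
print(locate(32, "subdivided_floating"))
```

The only arithmetic at run time is a shift, a mask, a multiplication by 5 or 13, and an addition; the division by seven or five is confined to building the table.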

The portion of the basic address, formed by the sum of the base register contents and the displacement field in the instruction, that indicates a 32-bit word within a 256-bit memory cell, when shifted as required to make it in units of 32 bits, is added to the index register contents before processing. In the indexed case, values of 7 or 5 or higher respectively will work properly, whereas only values of 0 to 6 or 0 to 4 are valid for Subdivided Floating and Subdivided Medium respectively without indexing.

Because Medium floating-point numbers do not work well with indexing if the inefficiency of occasional double memory accesses is not to be tolerated, while Subdivided Medium floating-point numbers do not have this issue, but do not allow overlapping offset arrays, it might be useful to mix both types in the same program.

Thus, some points about type interoperability need to be noted.

Internally, all conventional floating-point numbers will be stored with exponents as in the Extended floating point format, followed by a mantissa with the appropriate number of bits, which will be one more than for the external form of any size other than Extended, as there will be no hidden first bit. (Note that these notes apply only to this architecture, not the parent architecture to which reference is made for the definitions of some exotic data types; that architecture allows considerably more flexibility in floating-point formats, and so it needs to do type conversions explicitly and store all numbers internally in their memory format.)

The floating-point registers, however, are still essentially 128-bit wide fast memory locations without special circuitry; the internal format simply allows numbers to be quickly fed to the floating-point ALU. This permits flexibility in register renaming. (This particular item is a characteristic of the parent architecture as well.)

Floating-point operations clear all the unused less significant mantissa bits. But storing a floating-point number at a lesser precision involves truncation, not rounding, to avoid imposing an overhead on the great majority of operations which involve numbers being stored at their own precision. This conflicts with the aim of the IEEE 754 standard to retain the maximum possible accuracy in all calculations, and, thus, the Store Rounded instructions, placed in the same region of opcode space as the Subdivided floating-point instructions, are provided.
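The difference between an ordinary store at lesser precision and a Store Rounded can be illustrated on the significand alone. This sketch assumes round-to-nearest-even for Store Rounded, which the text above does not specify, and the function names are hypothetical.

```python
def store_truncated(sig, from_bits, to_bits):
    """Ordinary store: simply drop the low-order significand bits."""
    return sig >> (from_bits - to_bits)

def store_rounded(sig, from_bits, to_bits):
    """Store Rounded, assuming round-to-nearest-even (to_bits < from_bits)."""
    shift = from_bits - to_bits
    kept = sig >> shift
    dropped = sig & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1    # may carry out of to_bits; the exponent would then need adjusting
    return kept

sig = 0b101111       # a 6-bit significand stored at 4-bit precision
print(bin(store_truncated(sig, 6, 4)))
print(bin(store_rounded(sig, 6, 4)))
```

Truncation needs no adder at all, which is the overhead the ordinary store instructions avoid.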

The Floating Register Compressed Decimal numbers are also stored in the floating-point registers. Because they are actually 128 bits long, the internal and external formats of 128-bit Floating Register Compressed Decimal numbers are the same. For maximum efficiency in handling decimal floating-point numbers of other precisions, they are stored internally as a sign, a 15-bit binary exponent, and seven or sixteen BCD digits. So Chen-Ho (or, rather, Densely Packed Decimal) encoding and decoding is combined with memory operations for the shorter formats, but is done during register operations for the extended precision.

The short vector instructions also require an additional field to indicate the mask register being used, and another to indicate if masking is present, so they're 16 bits longer. Fortunately, even among 48-bit instructions, there is space for extra codes, using F8 as the opcode:

               FE00 1nn0 mMds      F801 nn0d mMxb aaaa

  0      1      2      3      4      5      6
0 SWBSV  SWHSV  SWSV   SWLSV  SWFSV  SWDSV  SWESV
1
2 LBSV   LHSV   LSV    LLSV   LFSV   LDSV   LESV
3 STBSV  STHSV  STSV   STLSV  STFSV  STDSV  STESV
4 ABSV   AHSV   ASV    ALSV   AFSV   ADSV   AESV
5 SBSV   SHSV   SSV    SLSV   SFSV   SDSV   SESV
6        MHSV   MSV    MLSV   MFSV   MDSV   MESV
7        DHSV   DSV    DLSV   DFSV   DDSV   DESV
8 SMBPB  SMBPH  SMBP   SMBPL  SMBPF  SMBPD  SMBPE
9 SMBZB  SMBZH  SMBZ   SMBZL  SMBZF  SMBZD  SMBZE
A SMBNB  SMBNH  SMBN   SMBNL  SMBNF  SMBND  SMBNE
B XBSV   XHSV   XSV    XLSV
C NBSV   NHSV   NSV    NLSV
D OBSV   OHSV   OSV    OLSV
E
F

Key to the instruction formats:

nn: opcode as indicated in the table
aaaa: address displacement field
m: 0000 if no mask, 0001 if masked
M: mask register
d: destination register
s: source register
x: index register
b: base register

and, of course, mnemonics are suffixed R for register to register instructions.

In the case of the multiple register and the vector register instructions, the secondary opcode field is eight bits long rather than twelve, but this is still sufficient:

FE 0 82      LBVR     Load Byte Vector Register
             (and so forth)

FE 4 82      LMVR    Load Medium Vector Register
             (and so forth)

FE 4 D0      SWSMVR  Swap Subdivided Medium Vector Register
             (and so forth)

FE   F2      LM       Load Multiple
FE   F3      STM      Store Multiple
FE   F4      LML      Load Multiple Long
FE   F5      STML     Store Multiple Long
FE   F6      LME      Load Multiple Extended
FE   F7      STME     Store Multiple Extended

To distinguish these instructions from the full memory reference instructions, the first digit of their secondary opcode field is always 8 or greater. The primary opcode field contains the first digit of the two-digit opcode of the analogous instruction; the first digit of the secondary opcode field contains 8 plus the supplementary opcode digit, and the second digit of the secondary opcode field contains the second digit of the opcode of the analogous instruction.

Thus, a full memory reference instruction with opcode 123, analogous to a register to register instruction with opcode 23, would correspond to a vector instruction with the split opcode 2 93.

The vector memory-reference instructions are 80 bits long, rather than 64 bits long, partly because of a serious shortage of opcode space for the 64-bit long packed decimal and string instructions.

For the 80-bit instructions, we have:

FF 0 02      LBV      Load Byte Vector
             (and so forth)

FF 4 02      LMV      Load Medium Vector
             (and so forth)

FF 4 50      SWSMV    Swap Subdivided Medium Vector
             (and so forth)

FF   A       TR       Translate

FF   C       FMT      Format

FF   E       SC       Scan

Because the first bit of the secondary opcode must be zero, to distinguish it from the translate instructions, once again the prefix digit becomes the first digit of the secondary opcode, and the two digits of the corresponding original opcode become the primary opcode and the second digit of the secondary opcode respectively. Thus, the opcode 123 becomes 2 13.
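The two opcode rearrangements, for the vector register instructions under FE and the vector memory-reference instructions under FF, can be sketched side by side. The function names are hypothetical; the three-digit opcode is given as a hexadecimal number whose first digit is the supplementary (prefix) digit, assumed here to be at most 7.

```python
def split_digits(opcode):
    """Split a three-digit opcode into (prefix, first, second) hex digits."""
    if not 0 <= opcode <= 0x7FF:
        raise ValueError("prefix digit must be in the range 0 to 7")
    return opcode >> 8, (opcode >> 4) & 0xF, opcode & 0xF

def vector_register_form(opcode):
    """FE form: (primary digit, secondary byte), secondary first digit is 8 + prefix."""
    prefix, first, second = split_digits(opcode)
    return first, ((8 + prefix) << 4) | second

def vector_memory_form(opcode):
    """FF form: (primary digit, secondary byte), secondary first digit is the prefix."""
    prefix, first, second = split_digits(opcode)
    return first, (prefix << 4) | second

primary, secondary = vector_register_form(0x123)
print(f"{primary:X} {secondary:02X}")
primary, secondary = vector_memory_form(0x123)
print(f"{primary:X} {secondary:02X}")
```

Applied to opcode 123, this reproduces the split opcodes 2 93 and 2 13 given in the text.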

While indexing works properly with Subdivided Medium and Subdivided Floating vector operations, remember that the address pointed to by the base and the displacement always defines the beginning of a block of floats with unused space at the end (even though it does not have to point to the very first element of that block), and so it is not possible, in general, to have overlapping vectors work the way one would expect from datatypes which fit more neatly into storage.

The rule to remember is that the counter locating array elements for a vector operation is treated like an index register for purposes of address formation with Subdivided floating-point numbers; even stride works properly with them without issues.

Also, not only are vector operations on 51-bit Subdivided Medium numbers allowed; vector operations on 48-bit Medium numbers are also supported. While elements of such vectors would cross memory word boundaries, handling a vector from memory involves fetching each memory word once, not repeatedly for each element containing it. So no actual efficiency issue results during vector operations, provided no nonunit stride is present, although one would exist when operating individually on those elements that cross such boundaries.

The FMT instruction operates as follows: The first operand is a translation table with 256 one-byte entries in which entries 0, 1, and 255 have a special significance. The length field of the instruction determines the length of the source operand. Successive characters from the source operand are moved to the destination operand as follows:

- only bytes in the destination operand containing zero are filled, and other bytes are skipped, but no more than three successive bytes in the destination operand can be skipped between zero bytes (excessively long runs of nonzero bytes in the destination operand cause the instruction to stop with an overflow condition);
- if a byte taken from the source operand is the index of a nonzero character in the translation table, the character found in the translation table is placed in the destination operand;
- if a byte taken from the source operand is the index of a zero character in the translation table, then the character placed in the destination operand is:
  - the character found in position 0 in the translation table, if no previous characters taken from the source operand translated to nonzero entries in the translation table, unless
  - the source character is the last such character, in which case the character found in position 255 in the translation table is taken,
  - and the character found in position 1 in the translation table if there have been previous bytes taken from the source operand that translated to nonzero values.

The source operand may not contain bytes having the values 0, 1, or 255. If it does, the instruction stops, and an overflow condition is set.

Except that it does not convert from packed decimal to unpacked form, this instruction performs a function similar to that of the Edit and Edit and Mark instructions of the IBM System/360 computer. Thus, a translation table can contain a fill character in position 0, the digit zero in position 1, and a floating currency symbol (or another fill character) in position 255 to convert a raw zoned decimal string (produced by an unpack instruction) to the format used in printing. Note that the decimal point would be placed in the destination operand beforehand, with zero bytes in the positions to be filled with digits.
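
To make the rules for FMT more concrete, here is a rough Python model of them. It is a sketch under several assumptions of mine, not a definition of the instruction: the function name is hypothetical; "the last such character" is read as the final byte in the leading run of source bytes that translate to zero; an overflow is modelled by raising OverflowError; and running out of destination bytes is also treated as an overflow. The table in the example passes the digits 1 through 9 through unchanged and maps the digit 0 to a zero entry.

```python
def fmt(table: bytes, source: bytes, dest: bytearray) -> None:
    """Model of FMT: fill the zero bytes of dest from translated source bytes."""
    if len(table) != 256:
        raise ValueError("translation table must have 256 entries")
    # Length of the leading run of source bytes that translate to zero.
    leading = 0
    for b in source:
        if table[b] != 0:
            break
        leading += 1
    pos = 0
    seen_nonzero = False
    for i, b in enumerate(source):
        if b in (0, 1, 255):
            raise OverflowError("source byte 0, 1 or 255 is not permitted")
        # Skip nonzero destination bytes, but never more than three in a row.
        skipped = 0
        while pos < len(dest) and dest[pos] != 0:
            pos += 1
            skipped += 1
            if skipped > 3:
                raise OverflowError("more than three nonzero bytes skipped")
        if pos >= len(dest):
            # Assumption: an exhausted destination counts as overflow.
            raise OverflowError("destination operand exhausted")
        translated = table[b]
        if translated != 0:
            out = translated
            seen_nonzero = True
        elif seen_nonzero:
            out = table[1]        # zero after a significant character
        elif i == leading - 1:
            out = table[255]      # last of the leading zeros
        else:
            out = table[0]        # earlier leading zeros
        dest[pos] = out
        pos += 1


# Example table: fill character, digit zero, floating currency symbol.
table = bytearray(256)
for digit in b"123456789":
    table[digit] = digit
table[0] = ord("*")
table[1] = ord("0")
table[255] = ord("$")

# Destination prepared with a decimal point and zero bytes for the digits.
dest = bytearray(b"\x00\x00\x00\x00\x00.\x00\x00")
fmt(table, b"0001234", dest)
print(dest.decode("ascii"))
```

With this reading of the rules, the zoned string 0001234 is edited into `**$12.34`: the first two leading zeros become fill characters, the last one becomes the currency symbol, and the decimal point already present in the destination is skipped over.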

The SC instruction begins by ignoring any characters in the source operand that translate to bytes containing zero in the translation table; then, bytes not translating to zero are copied with translation until a character whose entry in the translate table contains zero is encountered. The number of characters translated is placed in accumulator/index register 2, giving the length of the result in the destination operand. The number of characters translating to zero that were initially ignored, plus the number of characters translated, is placed in accumulator/index register 1, giving the portion of the source operand that was scanned before the first character translating to zero following a character not translating to zero was found. The remainder of the destination operand is filled with the character found in position 0 of the translate table.

The source operand may not contain a byte having the value 0. If it does, the instruction stops, and an overflow condition is set.

If the instruction completes within the provided length, then it is treated as having a zero result for a subsequent conditional branch instruction; if it did not complete, but characters with nonzero translations were encountered, it has a positive result; if only characters translating to zero were encountered, it has a negative result.

This instruction can be used for some of the same purposes as the translate and test instruction of the IBM System/360, although it works differently. It can be used to scan for keywords and translate them to upper case, for example.

As for the 64-bit instructions, they do still squeeze in:

F1         CP    Compare Packed
F2         MVP   Move Packed

F4         AP    Add Packed
F5         SP    Subtract Packed
F6         MP    Multiply Packed
F7         DP    Divide Packed

FA         MVB   Move Byte

FC         P     Pack
FD         UP    Unpack

Note that the Pack and Unpack instructions take a packed argument that is half the length of the argument that would be indicated if it were in character form: these instructions have a single length field, like the MVB instruction. Unpack adds hexadecimal 30 to each packed decimal digit, converting it to an ASCII digit.
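
The relationship between the two forms can be illustrated with a short Python sketch. The function names are hypothetical, sign handling is left out entirely, and the behavior of Pack shown here (keeping the low four bits of each character) is my assumption as the natural inverse of Unpack; only the addition of hexadecimal 30 in Unpack is taken from the description above.

```python
def unpack(packed: bytes) -> bytes:
    """Expand each packed byte into two ASCII digits by adding 0x30 to each digit."""
    out = bytearray()
    for byte in packed:
        out.append(0x30 + (byte >> 4))     # high digit
        out.append(0x30 + (byte & 0x0F))   # low digit
    return bytes(out)


def pack(chars: bytes) -> bytes:
    """Assumed inverse of unpack: keep the low four bits of each character."""
    if len(chars) % 2 != 0:
        raise ValueError("character operand must have an even length")
    out = bytearray()
    for high, low in zip(chars[0::2], chars[1::2]):
        out.append(((high & 0x0F) << 4) | (low & 0x0F))
    return bytes(out)


digits = unpack(bytes([0x12, 0x34]))
print(digits, len(digits))
```

A two-byte packed operand becomes the four characters 1234, which is why the packed argument is half the length of the character argument.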

The length fields of these instructions contain the operand length minus one.