Three bits in the program status block allow the fundamental width of data memory to be changed under program control.
It is envisaged that external memory connected to the computer chip will have a bus width such as 256 bits or 128 bits, and this will not be subject to change.
Instead, because external memory is to be interleaved at least eight ways, and preferably sixteen ways, the on-chip cache would consist of units wider than the bus to main memory. Thus, with sixteen-way interleaved main memory, and a bus width of 256 bits, a line of the cache would correspond to 4,096 bits of memory.
Additional circuitry on the chip would permit a line of the cache to instead be filled by means of nine, twelve, ten, or fifteen memory accesses.
Another possibility would be to have a cache each line of which could be filled with eighteen memory accesses, to replace the high inefficiency of using nine out of sixteen locations with the lesser inefficiency of using sixteen out of eighteen locations. The sixteen-unit cache line is far more likely, however: the normal mode of operation would be overwhelmingly more common than the mode using this special feature to assist emulation, and so a large waste of cache space in the mode that is much less frequently used would be preferred to even a small waste in the mode almost always used.
Then, it would be possible for the processor to work, in the first case, with 36-bit integers and 72-bit double-precision floating-point numbers, in the second case with 48-bit integers and floating-point numbers, in the third case with 40-bit integers and floating-point numbers, and in the fourth case with 60-bit integers and double-precision floating-point numbers, while still using external memory with full efficiency.
This would primarily be useful in enabling the efficient emulation of older computer architectures.
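The arithmetic behind these line sizes can be sketched as follows. This is only an illustration: the constant and function names are hypothetical, and the figure of 64 ALUs sharing a cache line is taken from the discussion of cache distribution later in this text.

```python
BUS_BITS = 256     # bus width to main memory, as in the text
LINE_UNITS = 16    # bus-width units in one full cache line
ALUS = 64          # ALUs sharing one cache line (figure from later in the text)


def bits_per_alu(accesses: int) -> int:
    """Bits delivered to each ALU when a line is filled by `accesses` fetches."""
    total = accesses * BUS_BITS
    if total % ALUS:
        raise ValueError("line contents do not divide evenly among the ALUs")
    return total // ALUS


def utilization(accesses: int, line_units: int = LINE_UNITS) -> float:
    """Fraction of the cache line that actually holds data."""
    return accesses / line_units


if __name__ == "__main__":
    for n in (16, 9, 12, 10, 15):
        print(f"{n:2d} accesses: {bits_per_alu(n):2d} bits per ALU, "
              f"{utilization(n):.1%} of the line used")
```

Nine accesses into a sixteen-unit line use 56.25% of the line, while sixteen accesses into a hypothetical eighteen-unit line would use about 88.9%; these are the two inefficiencies weighed above.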
The lengths of the various types would be modified as follows:
Memory width bits:             000    010    011    100    110

Length in bits

Byte                             8      6      9     10     15
Halfword                        16     12     18     20     30
Word                            32     24     36*    40*    60*
Long                            64*    48*    72**   80**  120**

Medium (standard)               80     --     90*   100*    75*
Medium (compatible)             48     72     --     60     --
Medium (all other formats)      48     72     54*    60     45*

Floating                        32     24     36     40     30
Double                          64     48     72     80     60
Quad                           128     96     --     --    120

Subdivided Medium
  (standard)                    85     --     --     --     80
                             256/3                        240/3
Subdivided Medium
  (compatible)                  --     64     48     --     48
                                    192/3  144/3         240/5
Subdivided Medium
  (all others)                  51     64     48     53     48
                             256/5  192/3  144/3  160/3  240/5

String character size:

Normal                           8      6      9     10     15
Alternate (1)                   --      8      6      8      6
Alternate (2)                   --     --     --     --     10
Alternate (3)                (never used)
and the bit indicating alternate string character size would affect the operation of the string instructions, including conversions to and from packed decimal and translate instructions. When this bit is on, memory addresses for those instructions must be aligned on 24-bit boundaries or 18-bit boundaries for the 24-bit and 36-bit word modes respectively.
For the fixed-point data types, one asterisk indicates the type requires two consecutive arithmetic/index registers, and two asterisks indicate the type requires four consecutive arithmetic/index registers or two consecutive supplementary registers.
For the Medium floating-point data type, an asterisk indicates the type is not available with long vector instructions.
The standard floating-point format is not available in 24-bit word mode, and the compatible floating-point format is not available in 36-bit word mode or 60-bit word mode. Quad precision floating-point is only available in two alternate length modes, the ones with a 24-bit word and a 60-bit word.
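As a rough check on the table of lengths, the fixed-point lengths in each mode are simple multiples of the byte length, and the subdivided Medium lengths appear to be the listed fractions rounded down. The sketch below assumes exactly that reading; the names are hypothetical and nothing here is taken from an actual implementation.

```python
# Byte length in bits for each setting of the memory width bits (from the table).
BYTE_BITS = {"000": 8, "010": 6, "011": 9, "100": 10, "110": 15}


def integer_lengths(mode: str) -> dict:
    """Fixed-point type lengths in bits, derived from the byte length."""
    b = BYTE_BITS[mode]
    return {"Byte": b, "Halfword": 2 * b, "Word": 4 * b, "Long": 8 * b}


def subdivided_length(container_bits: int, parts: int) -> int:
    """Length of one item when a container is split into `parts` equal pieces,
    rounded down (an assumed reading of entries such as 256/3 giving 85)."""
    return container_bits // parts


if __name__ == "__main__":
    for mode in BYTE_BITS:
        print(mode, integer_lengths(mode))
    print(subdivided_length(256, 3), subdivided_length(160, 3))
```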
In general, because the number of bits in the exponent may be selected with single-bit resolution, it might appear that floating-point types with base-4, octal, hexadecimal, base-32, base-256, or base-65536 exponents would be available in any memory width, with the only restriction that the only allowed values for the length of the exponent field are those which would lead to the length of the mantissa field being a multiple of two, three, four, five, eight, or sixteen bits respectively. However, when the length difference between floating-point formats is not a multiple of the width of a digit, any given floating-point format would divide the mantissa into an integral number of digits of the base chosen for some precisions but not others.
This is purely an aesthetic consideration, however, not a practical one. The mantissa of a floating-point number is still a binary fraction, the size of the digit merely setting the condition for normalization. The floating-point flexibility of this architecture is intended to permit emulation or approximation of a wide variety of existing and historic computers. Some of their floating-point formats involved unused bits, and others, such as that of the MANIAC II, called for incomplete digits in the mantissa. There is therefore no enforced requirement for the mantissa to consist of whole digits in the exponent base used; floating-point operations will perform properly in any case, independently of this consideration.
To the extent that this is felt to be a consideration, however, this may also be dealt with by the use of mixed exponent mode. When mixed exponent mode is not used, a floating-point type that leads to fractional digits only for the Medium precision may reasonably be used, where the floating-point type used would at least have only complete digits for the Floating, Double, and Quad types. An exception to this rule applies when the word length is 24 bits; in that mode, the Floating type is 24 bits long, which may not be considered particularly useful, and thus in that case, floating-point types may be used even if they are valid only for Double and Quad.
The following table indicates recommended exponent bases when mixed exponent mode is not in use for various memory widths, base-2 exponents not raising this issue in any widths:
Memory width bits:                     000   010   011   100   110
Basic floating-point number length:     32    48    36    40    30

Base-4
  Floating, Double, Quad                 Y     Y     Y     Y     Y
  Medium                                 Y     Y     Y     Y    --
Octal
  Floating, Double, Quad                --     Y     Y    --     Y
  Medium                                --     Y     Y    --     Y
Hexadecimal
  Floating, Double, Quad                 Y     Y     Y     Y    --
  Medium                                 Y     Y    --     Y    --
Base-32
  Floating, Double, Quad                --    --    --     Y     Y
  Medium                                --    --    --     Y     Y
Base-256
  Floating, Double, Quad                 Y     Y    --     Y    --
  Medium                                 Y     Y    --    --    --
Base 65,536
  Floating, Double, Quad                 Y     Y*   --    --    --
  Medium                                 Y    --    --    --    --
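The digit-width condition discussed above can be illustrated with a small check. It assumes that a floating-point format uses the same number of non-mantissa bits at every precision, so that a single exponent width yields whole digits at all precisions exactly when the lengths differ from one another by multiples of the digit width. The function name is hypothetical, and this is a sketch of the condition, not of how the table was actually produced.

```python
from itertools import combinations

# Width in bits of one mantissa digit for each exponent base named in the text.
DIGIT_BITS = {4: 2, 8: 3, 16: 4, 32: 5, 256: 8, 65536: 16}


def whole_digits_in_all(lengths, base: int) -> bool:
    """True if every pair of format lengths differs by a multiple of the
    digit width, so one exponent width can give whole digits in all of them."""
    d = DIGIT_BITS[base]
    return all((a - b) % d == 0 for a, b in combinations(lengths, 2))


if __name__ == "__main__":
    # 36-bit width: Floating 36 and Double 72, then adding Medium at 54 bits.
    print(whole_digits_in_all([36, 72], 16))
    print(whole_digits_in_all([36, 72, 54], 16))
```

For the 36-bit width with hexadecimal exponents, the first call returns True and the second False, which agrees with the corresponding entries in the table.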
In alternate length modes, since the cache handles memory containing instructions in the normal fashion, while memory containing data is handled so as to correspond to a multiple of the alternate unit of length, code and data must be kept separate in programs. Failing to do so may simply result in the same area of memory being cached twice, leading to unpredictable results.
This appears to mean that the switch to an alternate length mode can only take place after program code has been stored in memory. That, however, would prevent the use of techniques such as "just-in-time compilation" of code being emulated, and stored in data memory because it has instruction lengths characteristic of the architecture emulated. Thus, one of the mode-independent prefix codes is used to indicate instructions which refer to program memory instead of data memory.
Since an alternate length mode changes the interpretation of register contents, length modes cannot simply be associated with ranges of memory, but are part of the state of the current running program.
Also note that while the 128-bit length of the floating-point registers is an upper limit to the length of floating-point formats in any word length, the effective length of the arithmetic/index registers may be increased from 32 bits to 36 or 40 bits. This is achieved by using the least significant 36 or 40 bits of a 64-bit register. This can be done directly in the case of the supplementary arithmetic/index registers; in the case of the regular arithmetic/index registers, it means that integer operations in those length modes, like long operations in other modes, require that only an even-numbered arithmetic/index register be used as their destination, and as their source if the source is a register instead of a memory location.
Thus, just as quad precision floating-point arithmetic is only available in modes 000, 010, and 110, for vector operations (cache-internal parallel operation always takes place with the standard memory width) long fixed-point arithmetic is only available in modes 000, 010, and 110.
Long fixed-point operands 72, 80, or 120 bits in length are available for scalar operations involving the regular arithmetic/index registers, and involve using four of those registers at once to create a 128-bit register, and for scalar operations involving the supplementary registers, using two of those registers at once to create a 128-bit register.
Also, bit-field instructions and the other extended operations, such as population count and bit matrix multiply, are not available when the memory width is changed from its default value of a pure power of two.
Once a cache line is partially filled, additional circuitry is also required to allow the nine 256-bit entries, in a cache line which normally contains sixteen 256-bit entries, to be transmitted to the 64 ALUs as either one set of sixty-four words each 36 bits long, or two sets of sixty-four halfwords each 18 bits long, and so on. One form this circuitry might take is that of an assembly such as that illustrated on this page, but oriented towards each of the nonstandard memory sizes. At the end, it still directs its output to the individual ALUs separately.
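In software terms, the redistribution described here amounts to treating the fetched units as one long bit string and cutting it into fields of the nonstandard width. The following sketch models that with Python integers; the function name is hypothetical, and the bit ordering (first unit most significant) is an assumption made for illustration, not something the text specifies.

```python
UNIT_BITS = 256  # width of one bus-sized unit in the cache line


def split_line(units, word_bits: int, unit_bits: int = UNIT_BITS):
    """Concatenate the fetched units (first unit most significant, an assumed
    ordering) and cut the result into fields of `word_bits` bits each."""
    total_bits = len(units) * unit_bits
    if total_bits % word_bits:
        raise ValueError("units do not hold a whole number of words")
    line = 0
    for u in units:
        line = (line << unit_bits) | u
    count = total_bits // word_bits
    mask = (1 << word_bits) - 1
    # Field 0 is the most significant field of the concatenated line.
    return [(line >> (word_bits * (count - 1 - i))) & mask for i in range(count)]


if __name__ == "__main__":
    units = [0] * 9                      # nine 256-bit units, all zero
    print(len(split_line(units, 36)))    # words, one per ALU
    print(len(split_line(units, 18)))    # halfwords, two per ALU
```

Nine units yield sixty-four 36-bit fields or one hundred twenty-eight 18-bit fields, matching the one set of words or two sets of halfwords mentioned above.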
Usually, to continue the specific example given, an architecture with a 36-bit word uses a 6-bit character instead of a 9-bit character. Although this means the global scatter/local gather model will not address all the cases, the global scatter/local gather circuitry for a 48-bit word could be employed in that case, in combination with the operand shifting logic illustrated here to avoid the need for more additional circuitry.
While twelve is greater than nine, however, both twelve and ten are less than fifteen, and so using six-bit or ten-bit characters with a 60-bit word will require some special circuitry. Using eight-bit characters with a 40-bit word or a 48-bit word, the other case not yet discussed, will, of course, not have this problem, since sixteen is the largest width, embracing the whole cache line.
It may also be noted that the arithmetic registers require considerable additional flexibility to handle the data types with modified precision which this produces.
It would be theoretically possible to allow changes in data memory width to apply to the short vector arithmetic unit as well, with the result that the effective width of the short vector arithmetic unit becomes, for each word length, the following:

Word length    Short Vector Length
24 bits        192 bits
30 bits        240 bits
32 bits        256 bits
36 bits        144 bits
40 bits        160 bits

This would mean that the short vector arithmetic unit contains eight words of 24, 30, or 32 bits, but only four words of 40, 48, or 36 bits. There would be some justification for doing so, since the Lincoln Laboratory TX-2, which pioneered the type of arithmetic-unit flexibility embodied in short vector operations, had a 36-bit word, and the AN/FSQ-31 and AN/FSQ-32, which also had this type of operation, had a 48-bit word.
Would doing so impose a prohibitive cost in circuitry and speed? Allowing the internal arithmetic units that work on the components of a 256-bit value to work with shorter-precision numbers would not, in itself, be prohibitive. The operands could be placed in the required positions when data is fetched from the cache by first using the global scatter/local gather circuitry used to fetch operands of the required width from the cache, and then using the local scatter/global gather circuitry used to store operands of regular width in the cache. This would limit the cost of circuitry, but impose a significant cost in speed.
Thus, it is proposed that while short vector instructions, unlike bit-field operations, are available when the data memory width is changed, by default their operands are treated as belonging to instruction memory, and always have their standard lengths, unaffected by the value in these three bits. In that case, in order that the short vector instructions avoid being modified for alternate precision regimes, any memory they reference, like the memory from which program code is drawn, is cached normally instead of in the modified manner. Thus, they can only interoperate with instructions with the 173730 prefix, as the memory they access will be treated as code memory rather than data memory. But the option will also be available, in the definition of the architecture, to direct that short vector instructions operate in terms of the current modified data width. Not every implementation will necessarily provide that feature.
Note, however, that the Extended Translate instructions do refer to data memory, and are affected by the modifications to the precision of all data types; thus, the arithmetic-logic units within the circuitry for these instructions also require the flexibility to handle different data memory widths.
Note that for the standard floating-point format, the number of bits used for the exponent is determined by the identity of the operand type, not its actual length: for example, with a 40-bit word length, 80-bit double-precision floating-point numbers have the same format, except for additional mantissa bits, as 64-bit double-precision floating-point numbers with a 32-bit word length, rather than the same exponent field size and format as that of 80-bit floating-point numbers of type medium when the word length is 32 bits.
The following table shows how the two modified simple floating-point number sizes correspond to the precisions provided by the regular floating-point arithmetic unit for different basic word lengths:
Word Length    32 bit   24 bit   36 bit   40 bit   30 bit

Integer        Floating-point
Format         Format

Byte                                               Float
Halfword       Float    Float    Float    Float    Double
Word           Double   Double   Double   Double   Quad
Long           Quad     Quad
as a result of the nonexistence of quad precision floating-point sizes longer than 128 bits, and of the number of words in a floating-point number being doubled when the word size is reduced to 24 bits, and being halved when the word size is increased to 60 bits. The correspondence is, of course, based on the fact that a simple floating number consists of two integers, one with the exponent, one with the mantissa.
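The correspondence can be reproduced by doubling each integer length and looking for a regular floating-point type of the same length in the same mode. The sketch below does this with the lengths from the table of type lengths given earlier; the function and table names are hypothetical, and the doubling rule is simply the statement that a simple floating number consists of two integers.

```python
# Byte length per memory width setting, and the floating-point lengths
# available in each mode, both copied from the table of type lengths.
BYTE_BITS = {"000": 8, "010": 6, "011": 9, "100": 10, "110": 15}
FLOAT_BITS = {
    "000": {"Floating": 32, "Double": 64, "Quad": 128},
    "010": {"Floating": 24, "Double": 48, "Quad": 96},
    "011": {"Floating": 36, "Double": 72},
    "100": {"Floating": 40, "Double": 80},
    "110": {"Floating": 30, "Double": 60, "Quad": 120},
}
BYTES_PER_TYPE = {"Byte": 1, "Halfword": 2, "Word": 4, "Long": 8}


def simple_float_match(mode: str, integer_type: str):
    """Regular floating-point type whose length equals two integers of the
    given type (exponent plus mantissa), or None if there is no such type."""
    total = 2 * BYTES_PER_TYPE[integer_type] * BYTE_BITS[mode]
    for name, bits in FLOAT_BITS[mode].items():
        if bits == total:
            return name
    return None


if __name__ == "__main__":
    for mode in BYTE_BITS:
        print(mode, {t: simple_float_match(mode, t) for t in BYTES_PER_TYPE})
```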
The following diagram:
shows the different organizations of memory that are obtained from setting these bits.
In tagged word mode, each word of memory has an additional four bits of tag information associated with it. This is achieved by including one additional memory fetch when filling a cache line, but then using the circuitry for distributing cache line contents to the memory bus associated with the word width not including the tag bits.
The size of the word to which four bits are attached is shown below:
Memory   Units in   Word      Units in      Instruction
width    Cache      Size      Cache Line    Syllable
bits     Line                 when Tagged   Sizes

000      16         32+4       9            8
010      12         48+4      13            8,12
011       9         36+4      10            12
100      10         40+4      11            8
110      15         60+4      16            12
Thus, in physical memory, on an implementation with a 256-bit-wide data bus, where the width of a cache line is 16 256-bit units, one has one 256-bit area containing the four-bit tag fields for the following words, and then one has from eight to fifteen 256-bit areas containing the data words themselves, packed across word boundaries. The data bus can be smaller, in which case the organization is in terms of the smaller unit. A data bus as narrow as 32 bits would still mean that eight memory accesses would fill the 256-bit width of a short vector, so all data types could be properly accessed through the cache.
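The unit counts for tagged word mode follow from simple arithmetic if one assumes that a tagged cache line holds 64 words, so that the 64 four-bit tags exactly fill one 256-bit unit. That figure of 64 words is an inference from the table, not something stated outright, and the function name below is hypothetical.

```python
def tagged_line_units(word_bits: int, words_per_line: int = 64,
                      tag_bits: int = 4, unit_bits: int = 256):
    """Return (data_units, tag_units) for one tagged cache line, assuming
    `words_per_line` words whose tags are gathered into separate units."""
    data_units, data_rem = divmod(words_per_line * word_bits, unit_bits)
    tag_units, tag_rem = divmod(words_per_line * tag_bits, unit_bits)
    if data_rem or tag_rem:
        raise ValueError("contents do not fill a whole number of units")
    return data_units, tag_units


if __name__ == "__main__":
    for w in (32, 48, 36, 40, 60):
        data, tags = tagged_line_units(w)
        print(f"{w}+4: {data} data units + {tags} tag unit = {data + tags}")
```

Under this assumption the totals come to 9, 13, 10, 11, and 16 units, the same values as the tagged column of the table.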
The meanings of the tag values are:
0000 Executable Code
0001 Subsequent Word of Multi-Word Item
0010 Array Descriptor
0011 Character String
0100 Register Packed
0101 Register Packed Long
0110 Simple Floating
0111 Simple Floating Long
1000 Byte
1001 Halfword
1010 Integer
1011 Long
1101 Floating
1110 Double
1111 Quad

The tag values are set when storing a value, and the tag value can be overridden by the use of the 17146x/17147x prefix.
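For an emulator or assembler that needs to work with these tags, the list above can be held as a simple lookup table. The sketch below is only one way of doing so; the names are hypothetical, and the value 1100, which the list does not assign, is deliberately left out rather than given a meaning.

```python
# Tag values exactly as listed above; 1100 is not assigned in the list.
TAGS = {
    0b0000: "Executable Code",
    0b0001: "Subsequent Word of Multi-Word Item",
    0b0010: "Array Descriptor",
    0b0011: "Character String",
    0b0100: "Register Packed",
    0b0101: "Register Packed Long",
    0b0110: "Simple Floating",
    0b0111: "Simple Floating Long",
    0b1000: "Byte",
    0b1001: "Halfword",
    0b1010: "Integer",
    0b1011: "Long",
    0b1101: "Floating",
    0b1110: "Double",
    0b1111: "Quad",
}


def describe_tag(tag: int):
    """Name of a four-bit tag value, or None for a value the list omits."""
    if not 0 <= tag <= 0b1111:
        raise ValueError("a tag is a four-bit value")
    return TAGS.get(tag)


if __name__ == "__main__":
    print(describe_tag(0b1010))
```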
The tagged mode as set here does not cause the tags to be enforced, or array descriptors to be used. Instead, this mode is used to allow memory to be set up for the subsequent execution of a program which will be bound by tag values in its operation. Thus, where user programs are of that type, the operating system will run in the tagged mode in order to create new arrays, or to load executable programs into memory.
Note that some, but not all, of the additional data types have codes associated with them.
The character type is used for character strings; if a 48+4 bit word length is chosen, the length of a byte is always 6 bits, but an 8-bit character length might be in use; if a 36+4 bit word length is chosen, the length of a byte is always 9 bits, but a 6-bit character length might be in use, and so on.
Since the Medium floating point does not fit evenly into words, it is not available in this mode.
Note that since there is a tag for executable code, the programs which will run in the user mode which this tagged mode provides services for will be located in what is considered to be data memory, not code memory, from the viewpoint of data memory width control. Thus, two instruction syllable sizes are supported for this form of programs, 8 bits and 12 bits, at least one of which will be available for any memory width.
The user programs associated with this feature will be similar to those used with such computers as the Burroughs 5000 and 6000 series.
Also available is tagged character mode. This is indicated by a two-bit field.
Its values are:
Word size:                     32 bits  36 bits  40 bits  48 bits  60 bits

00  Tagged Character Mode not present
10  One Supplementary Unit        8        9       10       12       15
11  Two Supplementary Units       4       --        5        6       --

                               Size of character with one tag bit
This mode can be used to tag 8-bit or 6-bit characters, and also 4-bit BCD digits, with a single bit that is used as a field delimiter.
This makes it unnecessary to include length codes in packed decimal instructions and/or character instructions. It is useful in assisting the emulation of computers such as the IBM 1401 and the IBM 1620.
It may be noted that the mode with two supplementary units is not possible with a 60-bit word. In addition to the problem of assigning one tag bit to each seven and one-half bits of memory, 60-bit word mode normally uses 15 of the 16 units in a cache line. Unlike the 16 units used with the 32-bit word, the number of units used for this word length cannot simply be halved to allow this tag mode.
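The character sizes in the table can be understood as the number of data units divided by the number of supplementary units, since every unit holds the same number of bits. The sketch below assumes the data unit counts are the same as those for tagged word mode (8, 9, 10, 12, and 15); that is an inference, and the names used are hypothetical.

```python
# Assumed number of 256-bit data units per line for each word size
# (the tagged word mode counts, less the one unit of tags).
DATA_UNITS = {32: 8, 36: 9, 40: 10, 48: 12, 60: 15}


def tagged_char_bits(word_bits: int, supplementary_units: int):
    """Data bits covered by each tag bit, or None when the ratio is not a
    whole number and the combination is therefore unavailable."""
    if supplementary_units not in (1, 2):
        raise ValueError("only one or two supplementary units are defined")
    size, remainder = divmod(DATA_UNITS[word_bits], supplementary_units)
    return size if remainder == 0 else None


if __name__ == "__main__":
    for w in DATA_UNITS:
        print(w, tagged_char_bits(w, 1), tagged_char_bits(w, 2))
```

The 36-bit and 60-bit word sizes give no whole-number result with two supplementary units, corresponding to the four and one-half and seven and one-half bits per tag that make those combinations impractical.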
Note that in both of these modes, addressing refers only to the portion of memory containing the actual data; the tag bits are treated as supplementary to the address space, and associated with individual words or characters, regardless of how they are actually stored. Thus, appropriate conversions to base register contents would need to be performed when entering or leaving these modes. For word lengths other than 32 bits, it may also be noted that in general these modes also move word boundaries in memory; thus, while nine 256-bit blocks contain a number of bits divisible by nine, ten 256-bit blocks do not. Dual-format mode is the appropriate method for moving data from non-tagged storage to tagged storage.
It should be clear that, for any memory width chosen, the memory units fetched which contain actual data, as opposed to the one (or two) containing the tag bits, are intended to be placed in the same locations in a cache line as they would occupy when a tagged mode is not in effect. This, plus the fact that the unit containing the tag bits was described as preceding the units containing data in memory, may lead one to assume that data in a partially-filled cache line is normally right-aligned. In fact, when the unit containing tag bits comes first, by starting to fill a cache line divided into 16 units from 0 to 15 at unit 15, one can both put data items into their usual left-aligned positions, and additionally put the tag bits in a constant position which is independent of the data memory width in use.
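The fill order just described, for the case of a single tag unit, is a wrap-around starting at unit 15. A minimal sketch, with a hypothetical function name:

```python
def cache_slots(units_in_memory: int, line_units: int = 16, start: int = 15):
    """Cache-line unit number receiving each successive unit fetched from
    memory, when filling begins at `start` and wraps around to unit 0."""
    if not 0 < units_in_memory <= line_units:
        raise ValueError("a line holds between 1 and line_units units")
    return [(start + i) % line_units for i in range(units_in_memory)]


if __name__ == "__main__":
    # 36+4 tagged mode: one tag unit followed by nine data units.
    print(cache_slots(10))
```

For the ten units of the 36+4 case, the tag unit lands in unit 15 and the nine data units in units 0 through 8, which is where they would sit in the untagged 36-bit mode.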
While for many purposes, the method described on this page of making the computer's memory appear as if it is composed of 36-bit words or 60-bit words instead of being oriented around the 8-bit byte and its power-of-two multiples will be convenient and seamless, it makes an inefficient use of cache memory and only has a low overhead for those programs the data for which can efficiently fit in the cache.
Thus, this architecture provides two alternative methods of operating on data of nonstandard widths. Unlike data memory width control, however, they are primarily oriented towards altering the length of floating-point values only, rather than changing the length of characters and fixed-point numbers.
The first such method, subdivided floating-point, has already been mentioned above, since it can be used in conjunction with data memory width control so as to permit the use of floating-point variables the width of which is alternative to the default for the basic unit of memory that is currently in effect, even when that basic unit is not the original default unit of 8 bits.
The second one is Fast Long Single/Fast Long Intermediate operation, which attempts to provide alternate-length floating-point numbers while retaining a high degree of compatibility with the normal architecture of memory, built around the 8-bit byte.
These methods will be discussed in two following pages.