IBM has recently developed a decimal floating point format which it is including on its new z9 computers. These computers replace the z990, the previous top-of-the-line z/Architecture machine from IBM, z/Architecture being the 64-bit extension to the architecture which began with System/360 and continued with extensions to System/370 and ESA/390.
This section refers to instructions which implement operations on numbers in that format and in related formats.
This format is also described on this page.
The basic characteristics of this data type are as follows:
Three data types are defined. All three data types feature a five-bit field which contains both the first decimal digit of the mantissa (or coefficient) of the floating-point number, and the first two bits of the exponent (which is in binary form), those two bits being allowed to take only the values 00, 01, or 10, but not 11.
This provides an efficient means of coding decimal floating-point numbers, since in each case the remaining digits of the mantissa are all contained within 10-bit fields. Had there been no extra decimal digit left over, a simple binary exponent field would of course have been just as efficient, and simpler. As it happened, the coding scheme used allowed efficient coding to be retained while providing an exponent field which was neither too large nor too small, particularly for 32-bit and 64-bit floating-point values, and it also ensured that the size of the exponent field would increase monotonically as the length of the number increased.
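As an illustration of how such a five-bit field can be unpacked, here is a minimal sketch in Python. The bit assignment used is the one defined for the combination field in the decimal interchange formats of IEEE 754-2008; the function name `decode_combination` and the return convention are hypothetical, chosen only for this example.

```python
def decode_combination(field):
    """Decode a 5-bit combination field (IEEE 754-2008 decimal layout).

    Returns (high_exponent_bits, leading_digit) for a finite number,
    or the string "infinity" or "nan" for the two special codes.
    """
    if not 0 <= field <= 0b11111:
        raise ValueError("combination field must fit in 5 bits")
    if field == 0b11110:
        return "infinity"
    if field == 0b11111:
        return "nan"
    top2 = field >> 3
    if top2 != 0b11:
        # Leading digit 0-7: two exponent bits, then the digit in three bits.
        return top2, field & 0b111
    # Leading digit 8 or 9: prefix 11, two exponent bits, one bit for 8/9.
    return (field >> 1) & 0b11, 8 + (field & 1)
```

The 30 finite codes cover exactly the three permitted exponent prefixes (00, 01, 10) times the ten decimal digits, which is why the prefix 11 never needs to appear as an exponent value.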
Since this data type permits unnormalized values to be represented, three groups of instructions are provided: the humanized floating-point instructions given below, which follow the "ideal exponent" rules described in the standard; instructions for conventional unnormalized operation, for the purpose of carrying out significance arithmetic; and instructions for conventional normalized arithmetic.
In addition, another type is provided that allows only normalized numbers to be represented, and which may include a partial decimal digit appended at the end of the number depending on the value of the first digit. This type is called numeric floating register compressed decimal. The coding of the first digit is shown in the table below:
0000   1 ... -     1 ... 0
0001   1 ... 1/5   1 ... 2
0010   1 ... 2/5   1 ... 4
0011   1 ... 3/5   1 ... 6
0100   1 ... 4/5   1 ... 8
----------------------------
0101   2 ... -     2 ... 0
0110   2 ... 1/2   2 ... 5
0111   3 ... -     3 ... 0
1000   3 ... 1/2   3 ... 5
1001   4 ... -     4 ... 0
1010   4 ... 1/2   4 ... 5
----------------------------
1011   5           5 ... 0
1100   6           6 ... 0
1101   7           7 ... 0
1110   8           8 ... 0
1111   9           9 ... 0
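The table above amounts to a sixteen-entry lookup. A minimal sketch in Python follows; the names `COMPOUND` and `decode_compound` are hypothetical, and the pairs are simply transcribed from the table (leading digit, then the value of the appended final digit).

```python
# (leading digit, appended final digit) for each 4-bit code,
# transcribed from the table above.
COMPOUND = {
    0b0000: (1, 0), 0b0001: (1, 2), 0b0010: (1, 4),
    0b0011: (1, 6), 0b0100: (1, 8),
    0b0101: (2, 0), 0b0110: (2, 5),
    0b0111: (3, 0), 0b1000: (3, 5),
    0b1001: (4, 0), 0b1010: (4, 5),
    0b1011: (5, 0), 0b1100: (6, 0), 0b1101: (7, 0),
    0b1110: (8, 0), 0b1111: (9, 0),
}


def decode_compound(code):
    """Return (leading_digit, final_digit) for a 4-bit code."""
    return COMPOUND[code]


def final_digit_choices(leading_digit):
    """List the appended final digits available for a leading digit."""
    return sorted(f for (d, f) in COMPOUND.values() if d == leading_digit)
```

A leading digit of 1 leaves room for five values of the appended digit, leading digits 2 through 4 leave room for two, and leading digits 5 through 9 leave room for none beyond zero.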
The four-bit field containing the first and last digits of the mantissa is referred to as the compound field. In the case of 32, 64, and 128-bit numbers, it replaces the combination field, and so the length of the exponent field is increased by one bit, leading to the range of exponents being first divided by three and then multiplied by two as compared to that in the standard format.
In alternate precisions, the general rule is that the compound field encodes the most significant digit of the number, and then the remaining digits are encoded as appropriate, with a combination field if the number of remaining digits is of the form 3n+1.
For the regular floating register compressed decimal type, when a combination field is present, the values 11110 and 11111 in that field are used, as provided for by the revised IEEE 754 standard, to encode infinity and NaN. When one is not present, inadmissible codes for the 7-bit or 10-bit field including the most significant digit of the number will be used.
For the numeric floating register compressed decimal type, gradual underflow is provided for by replacing the compound field with a four-bit field containing a single BCD digit when the exponent is at its minimum value; thus, in that case, the most significant digit may be zero. For this type, whether or not a combination field is present, the values E and F in the compound field, when the exponent is at its minimum value, encode infinity and NaN respectively.
The instructions which deal with these numbers have the opcodes shown below:
001000 100xxx  SWFRC   Swap Floating Register Compressed
001000 101xxx  CFRC    Compare Floating Register Compressed
001000 102xxx  LFRC    Load Floating Register Compressed
001000 103xxx  STFRC   Store Floating Register Compressed
001000 104xxx  AFRC    Add Floating Register Compressed
001000 105xxx  SFRC    Subtract Floating Register Compressed
001000 106xxx  MFRC    Multiply Floating Register Compressed
001000 107xxx  DFRC    Divide Floating Register Compressed
001000 112xxx  LUFRC   Load Unnormalized Floating Register Compressed
001000 113xxx  STUFRC  Store Unnormalized Floating Register Compressed
001000 114xxx  AUFRC   Add Unnormalized Floating Register Compressed
001000 115xxx  SUFRC   Subtract Unnormalized Floating Register Compressed
001000 116xxx  MUFRC   Multiply Unnormalized Floating Register Compressed
001000 117xxx  DUFRC   Divide Unnormalized Floating Register Compressed
001000 124xxx  AFRCH   Add Floating Register Compressed Humanized
001000 125xxx  SFRCH   Subtract Floating Register Compressed Humanized
001000 126xxx  MFRCH   Multiply Floating Register Compressed Humanized
001000 127xxx  DFRCH   Divide Floating Register Compressed Humanized

001100 100xxx  SWDRC   Swap Double Register Compressed
001100 101xxx  CDRC    Compare Double Register Compressed
001100 102xxx  LDRC    Load Double Register Compressed
001100 103xxx  STDRC   Store Double Register Compressed
001100 104xxx  ADRC    Add Double Register Compressed
001100 105xxx  SDRC    Subtract Double Register Compressed
001100 106xxx  MDRC    Multiply Double Register Compressed
001100 107xxx  DDRC    Divide Double Register Compressed
001100 112xxx  LUDRC   Load Unnormalized Double Register Compressed
001100 113xxx  STUDRC  Store Unnormalized Double Register Compressed
001100 114xxx  AUDRC   Add Unnormalized Double Register Compressed
001100 115xxx  SUDRC   Subtract Unnormalized Double Register Compressed
001100 116xxx  MUDRC   Multiply Unnormalized Double Register Compressed
001100 117xxx  DUDRC   Divide Unnormalized Double Register Compressed
001100 124xxx  AFDCH   Add Double Register Compressed Humanized
001100 125xxx  SFDCH   Subtract Double Register Compressed Humanized
001100 126xxx  MFDCH   Multiply Double Register Compressed Humanized
001100 127xxx  DFDCH   Divide Double Register Compressed Humanized

001200 100xxx  SWQRC   Swap Quad Register Compressed
001200 101xxx  CQRC    Compare Quad Register Compressed
001200 102xxx  LQRC    Load Quad Register Compressed
001200 103xxx  STQRC   Store Quad Register Compressed
001200 104xxx  AQRC    Add Quad Register Compressed
001200 105xxx  SQRC    Subtract Quad Register Compressed
001200 106xxx  MQRC    Multiply Quad Register Compressed
001200 107xxx  DQRC    Divide Quad Register Compressed
001200 112xxx  LUQRC   Load Unnormalized Quad Register Compressed
001200 113xxx  STUQRC  Store Unnormalized Quad Register Compressed
001200 114xxx  AUQRC   Add Unnormalized Quad Register Compressed
001200 115xxx  SUQRC   Subtract Unnormalized Quad Register Compressed
001200 116xxx  MUQRC   Multiply Unnormalized Quad Register Compressed
001200 117xxx  DUQRC   Divide Unnormalized Quad Register Compressed
001200 124xxx  AFDCH   Add Quad Register Compressed Humanized
001200 125xxx  SFDCH   Subtract Quad Register Compressed Humanized
001200 126xxx  MFDCH   Multiply Quad Register Compressed Humanized
001200 127xxx  DFDCH   Divide Quad Register Compressed Humanized

001300 100xxx  SWNFRC  Swap Numerical Floating Register Compressed
001300 101xxx  CNFRC   Compare Numerical Floating Register Compressed
001300 102xxx  LNFRC   Load Numerical Floating Register Compressed
001300 103xxx  STNFRC  Store Numerical Floating Register Compressed
001300 104xxx  ANFRC   Add Numerical Floating Register Compressed
001300 105xxx  SNFRC   Subtract Numerical Floating Register Compressed
001300 106xxx  MNFRC   Multiply Numerical Floating Register Compressed
001300 107xxx  DNFRC   Divide Numerical Floating Register Compressed
001300 110xxx  SWNDRC  Swap Numerical Double Register Compressed
001300 111xxx  CNDRC   Compare Numerical Double Register Compressed
001300 112xxx  LNDRC   Load Numerical Double Register Compressed
001300 113xxx  STNDRC  Store Numerical Double Register Compressed
001300 114xxx  ANDRC   Add Numerical Double Register Compressed
001300 115xxx  SNDRC   Subtract Numerical Double Register Compressed
001300 116xxx  MNDRC   Multiply Numerical Double Register Compressed
001300 117xxx  DNDRC   Divide Numerical Double Register Compressed
001300 120xxx  SWNQRC  Swap Numerical Quad Register Compressed
001300 121xxx  CNQRC   Compare Numerical Quad Register Compressed
001300 122xxx  LNQRC   Load Numerical Quad Register Compressed
001300 123xxx  STNQRC  Store Numerical Quad Register Compressed
001300 124xxx  ANQRC   Add Numerical Quad Register Compressed
001300 125xxx  SNQRC   Subtract Numerical Quad Register Compressed
001300 126xxx  MNQRC   Multiply Numerical Quad Register Compressed
001300 127xxx  DNQRC   Divide Numerical Quad Register Compressed
As well, a few additional instructions are provided for the regular register compressed formats that provide targeted arithmetic. These instructions, so that they can retain the same standard format as other instructions with a register type, where a zero base register indicates a register-to-register instruction, use the 32-bit prefix form to allow additional room in the instruction for the target exponent.
141100 00nnnn 114xxx  ATFRC   Add Targeted Floating Register Compressed
141100 00nnnn 115xxx  STFRC   Subtract Targeted Floating Register Compressed
141100 00nnnn 116xxx  MTFRC   Multiply Targeted Floating Register Compressed
141100 00nnnn 117xxx  DTFRC   Divide Targeted Floating Register Compressed
141100 00nnnn 124xxx  AETFRC  Add Extensibly Targeted Floating Register Compressed
141100 00nnnn 125xxx  SETFRC  Subtract Extensibly Targeted Floating Register Compressed
141100 00nnnn 126xxx  METFRC  Multiply Extensibly Targeted Floating Register Compressed
141100 00nnnn 127xxx  DETFRC  Divide Extensibly Targeted Floating Register Compressed

141110 00nnnn 114xxx  ATDRC   Add Targeted Double Register Compressed
141110 00nnnn 115xxx  STDRC   Subtract Targeted Double Register Compressed
141110 00nnnn 116xxx  MTDRC   Multiply Targeted Double Register Compressed
141110 00nnnn 117xxx  DTDRC   Divide Targeted Double Register Compressed
141110 00nnnn 124xxx  AETDRC  Add Extensibly Targeted Double Register Compressed
141110 00nnnn 125xxx  SETDRC  Subtract Extensibly Targeted Double Register Compressed
141110 00nnnn 126xxx  METDRC  Multiply Extensibly Targeted Double Register Compressed
141110 00nnnn 127xxx  DETDRC  Divide Extensibly Targeted Double Register Compressed

141120 00nnnn 114xxx  ATQRC   Add Targeted Quad Register Compressed
141120 00nnnn 115xxx  STQRC   Subtract Targeted Quad Register Compressed
141120 00nnnn 116xxx  MTQRC   Multiply Targeted Quad Register Compressed
141120 00nnnn 117xxx  DTQRC   Divide Targeted Quad Register Compressed
141120 00nnnn 124xxx  AETQRC  Add Extensibly Targeted Quad Register Compressed
141120 00nnnn 125xxx  SETQRC  Subtract Extensibly Targeted Quad Register Compressed
141120 00nnnn 126xxx  METQRC  Multiply Extensibly Targeted Quad Register Compressed
141120 00nnnn 127xxx  DETQRC  Divide Extensibly Targeted Quad Register Compressed
In these instructions, the field marked xxx contains the destination register, the index register or source register, and the base register in the usual manner for memory-reference instructions. The field marked nnnn contains a twelve-bit target exponent value in excess-6,176 format, matching the exponent in the largest size of register compressed decimal numbers.
For decimal fixed-point arithmetic where all the numbers involved have the same exponent value, only a small range of exponent values is useful, since otherwise multiplication and division cannot produce a usable result. However, the inputs to a targeted instruction may have any exponent, and so the target exponent of the result can be one applicable to holding the result of an operation on two operands whose exponents are themselves determined through previous targeted operations, but which differ from that which is specified for the result.
A targeted arithmetic operation has the final operand aligned so that its exponent has the value specified as the target. This permits fixed-point arithmetic to be carried out automatically, without separate instructions for alignment, and in addition it has the benefit that since the fixed-point quantities are valid floating-point quantities, they are tagged with an indication of their magnitude. Normally, fixed-point arithmetic depends on adjustment steps being carried out after multiplies and divides, and the fixed-point quantities, being no different from the patterns of bits that represent integers, can easily be used incorrectly in calculations that assume a different location of the radix point.
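The effect of a targeted multiply can be imitated in software with Python's `decimal` module, using `quantize` to force the result to a given exponent. This is only a rough sketch: the function name `targeted_multiply`, the 34-digit working precision, and the round-half-even mode are all assumptions made for the example, not properties stated for the instructions themselves.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN


def targeted_multiply(a, b, target_exponent):
    """Multiply a by b, then align the product to target_exponent.

    Assumed for illustration: 34 digits of working precision and
    round-half-even when digits below the target are discarded.
    """
    ctx = Context(prec=34, rounding=ROUND_HALF_EVEN)
    product = ctx.multiply(a, b)
    # Decimal((0, (1,), e)) is 1 x 10**e; only its exponent matters here.
    return ctx.quantize(product, Decimal((0, (1,), target_exponent)))


# An amount with exponent -2 times a rate with exponent -4,
# with the result forced back to exponent -2.
amount = Decimal("19.99")
rate = Decimal("0.0825")
tax = targeted_multiply(amount, rate, -2)   # Decimal('1.65')
```

The operands carry different exponents (-2 and -4), yet the result comes back already aligned to the target, with no separate adjustment step after the multiplication.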
Extensibly targeted arithmetic operations are carried out without rounding, and overflows from the most significant part of the mantissa will be ignored unless integer overflows are trapped, so they behave like integer operations in this respect as well. Ordinary targeted arithmetic operations, on the other hand, do not do this, so as to produce valid numerical results that can be incorporated into floating-point calculations.
This is inspired by a capability provided by the NORC computer.
For this type, add, subtract, divide, and multiply humanized instructions are defined.
These operations accept both normalized and unnormalized numbers as operands, and may produce unnormalized results, but they do so based on a different criterion from the conventional unnormalized instructions previously described.
The divide instructions always produce a normalized result, but they explicitly accept unnormalized inputs without creating an exception.
The multiply instructions produce a result having the same number of significant digits (where possible) as there would be digits in the product of the mantissas of the two operands, where these mantissas are treated as decimal integers.
The add and subtract instructions produce a result that includes, as its least significant digit, a digit having the same place value as the lesser of the place values of the least significant digits of the two operands.
When two numbers are added together using an unnormalized operation, they are brought into alignment, and any digit position that was not part of both operands before alignment is omitted from the result; with a humanized operation, a digit position is omitted only if it was part of neither operand before alignment.
These rules correspond to the "ideal exponent" rules used with the new Decimal Floating Point architecture to be specified in the revised version of the IEEE 754 standard, and implemented in the IBM z9 computer.
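The add and multiply rules just described can be sketched in a few lines of Python, representing a number as a pair (integer coefficient, exponent). The function names are hypothetical, and the sketch ignores the limit on the number of digits a format can hold (the "where possible" qualification above). Python's own `decimal` module follows the same ideal-exponent rules, so it serves as a cross-check.

```python
from decimal import Decimal


def humanized_add(a, b):
    """Add two (coefficient, exponent) pairs.

    The result keeps the lesser of the two exponents, so its least
    significant digit has the lesser of the two place values.
    """
    (ca, ea), (cb, eb) = a, b
    e = min(ea, eb)
    return (ca * 10 ** (ea - e) + cb * 10 ** (eb - e), e)


def humanized_multiply(a, b):
    """Multiply two pairs: integer product of coefficients, summed exponents."""
    (ca, ea), (cb, eb) = a, b
    return (ca * cb, ea + eb)


def to_pair(d):
    """Convert a finite Decimal to a (coefficient, exponent) pair."""
    sign, digits, exponent = d.as_tuple()
    coefficient = int("".join(map(str, digits)))
    return (-coefficient if sign else coefficient, exponent)


print(humanized_add((120, -2), (34, -1)))        # (460, -2), i.e. 4.60
print(humanized_multiply((120, -2), (30, -1)))   # (3600, -3), i.e. 3.600
print(to_pair(Decimal("1.20") + Decimal("3.4")))   # (460, -2)
```

Thus 1.20 + 3.4 yields 4.60 rather than 4.6, and 1.20 times 3.0 yields 3.600, the trailing zeroes being retained as they would be in a calculation done by hand.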
Note that the use of a combination field, while it is appropriate with floating-point sizes of 32, 64, and 128 bits, may not necessarily work well with floating-point sizes of 48 and 96 bits, 36 and 72 bits, 30, 60, and 120 bits, or 40 and 80 bits.
This is because the overall length of the field in memory allocated to a floating-point number determines the number of decimal digits of precision it may have. Given that the compressed decimal format involves placing three digits at a time in a 10-bit long field, and the design of the combination field was predicated on there being one digit left over after a number of such fields for each of the three formats defined, we can conclude that there are three possible cases:
If one digit is left over (3n+1 digits in all), it is placed, together with the first two bits of the exponent, in a five-bit combination field.
If two digits are left over (3n+2 digits in all), they are placed in a seven-bit field.
If no digits are left over (3n digits in all), every digit lies within a 10-bit field, and the exponent is a simple binary field.
Given these three choices of format, it seems that decimal floating-point, when implemented across varying word sizes, might lead to the following formats, if it is desired to maintain a relatively close correspondence with the exponent sizes provided by the existing IBM formats, and to follow the same rule as they do with regard to the choice of exponent bias:
Value size:  Exponent Values        Exponent Bias       Precision in Digits  Sign  Exponent  Coefficient  Conventional Exponent Bias
 32 bits     3 *    64 =    192       101 (   96+ 5)     2 * 3 + 1 =  7      1      6+(2-)    20+(3+)         94
 64 bits     3 *   256 =    768       398 (  384+14)     5 * 3 + 1 = 16      1      8+(2-)    50+(3+)        382
128 bits     3 * 4,096 = 12,288     6,176 (6,144+32)    11 * 3 + 1 = 34      1     12+(2-)   110+(3+)      6,142

 36 bits                    256       134 (  128+ 6)     2 * 3 + 2 =  8      1      8         20+7           126
 72 bits                  2,048     1,040 (1,024+16)     6 * 3     = 18      1     11         60           1,022

 48 bits                  1,024       521 (  512+ 9)     3 * 3 + 2 = 11      1     10         30+7           510
 96 bits     3 * 1,024 =  3,072     1,559 (1,536+23)     8 * 3 + 1 = 25      1     10+(2-)    80+(3+)      1,534

 30 bits                    512       260 (  256+ 4)     2 * 3     =  6      1      9         20             254
 60 bits                    512       269 (  256+13)     5 * 3     = 15      1      9         50             254
120 bits                  4,096     2,078 (2,048+30)    10 * 3 + 2 = 32      1     12        100+7         2,046

 40 bits                    512       263 (  256+ 7)     3 * 3     =  9      1      9         30             254
 80 bits                  4,096     2,066 (2,048+18)     6 * 3 + 2 = 20      1     12         60+7         2,046
The notations (2-) and (3+) above refer to components of the 5-bit field which combines a value from 0 to 2 for the beginning of the exponent with a value from 0 to 9 for the beginning of the mantissa included in IBM's decimal floating point format.
The final column, Conventional Exponent Bias, shows what the exponent bias would be, if the radix point of the coefficient (or mantissa) were regarded, as has been the more common convention, as being at the beginning of the field rather than at the end of the field. This is derived by subtracting the precision of the number in digits from the exponent bias value normally given for the format, which is itself that number of digits, less two, added to half the exponent range.
An exponent in excess-n notation has n subtracted from the exponent to determine the power of the radix by which the mantissa is to be multiplied, and so, if we regard the mantissa as a fraction instead of an integer, we are making it smaller, and that power needs to be increased. Therefore, n, which is subtracted from it, is decreased. Thus, the difference between this floating-point format and conventional formats, which place the radix point in front of the mantissa and simply choose an exponent bias which is half the exponent range without adjustment, is not as large as it seems at first.
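The arithmetic behind the two bias columns is simple enough to state as a short Python function. The name `exponent_biases` is hypothetical; the rule itself is the one described above, namely half the exponent range plus the number of digits less two, and then that value less the number of digits.

```python
def exponent_biases(exponent_values, digits):
    """Return (bias, conventional_bias) for a decimal format.

    bias: radix point taken to be after the last digit of the coefficient.
    conventional_bias: radix point taken to be before the first digit.
    """
    half_range = exponent_values // 2
    bias = half_range + digits - 2
    conventional_bias = bias - digits   # always half_range - 2
    return bias, conventional_bias


print(exponent_biases(192, 7))       # (101, 94)     32-bit format
print(exponent_biases(768, 16))      # (398, 382)    64-bit format
print(exponent_biases(12288, 34))    # (6176, 6142)  128-bit format
```

Seen this way, the conventional bias is always just two less than half the exponent range, whatever the precision of the format.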
Since it is felt that each of the series of word sizes would normally be used independently, strict monotonicity between series is not treated as an overriding goal. In one case, the series of 30, 60, and 120 bits, even monotonicity in the growth of the exponent field within a series had to be set aside in order to achieve a reasonably large exponent field for the 30 bit size without this leading to excessively-large exponent fields for the other sizes.
Note that, in the absence of a 5-bit field combining the start of the exponent and mantissa, it is assumed that no limitation is placed on the range of the exponent field in order to indicate infinity and NaN values. Thus, either the 7 bit field giving the first two digits of the mantissa, or the 10 bit field giving the first three digits of the mantissa, would presumably be used for that purpose, two of the 28 or 24 unused combinations of bits serving this purpose.
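The way the leading digits are stored follows from the number of digits modulo three, and the counts of unused codes mentioned above can be derived in the same step. The following Python sketch uses a hypothetical function name and dictionary keys; the count of coefficient bits follows the convention of the table, in which three bits of a five-bit combination field are credited to the coefficient.

```python
def coefficient_layout(digits):
    """Describe how a coefficient of the given length would be stored."""
    groups, leftover = divmod(digits, 3)
    if leftover == 1:
        # One digit shares a 5-bit combination field with 2 exponent bits;
        # 3 exponent prefixes x 10 digits = 30 codes are in use.
        lead_bits, used = 5, 30
        coefficient_bits = 10 * groups + 3
    elif leftover == 2:
        # Two digits in a 7-bit field: 100 of 128 codes are in use.
        lead_bits, used = 7, 100
        coefficient_bits = 10 * groups + 7
    else:
        # No digit left over: the first 10-bit field holds the
        # leading three digits, using 1000 of 1024 codes.
        lead_bits, used = 10, 1000
        coefficient_bits = 10 * groups
    return {
        "leading_field_bits": lead_bits,
        "unused_codes": 2 ** lead_bits - used,
        "coefficient_bits": coefficient_bits,
    }


print(coefficient_layout(8))    # 36-bit format: 7-bit field, 28 unused codes
print(coefficient_layout(18))   # 72-bit format: 10-bit field, 24 unused codes
```

In each of the three cases at least two codes are left unused, which is all that is needed to mark infinity and NaN.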
In the case of the numerical register compressed decimal floating-point data type, for the 32, 64, and 128 bit-long data types, the five-bit combination field representing the first two bits of the exponent and the first digit of the number is replaced by a four-bit field representing the first digit of the number, an extra partial digit appended to the end of the number, and, in effect, the last two bits of the exponent if it is thought of as applying to a mixed-radix system with radices alternating between 2, 2.5, and 2 in a cycle of three.
For other lengths, this four bit field representing the first digit of the number must be retained, and therefore the presence of a seven-bit field containing the next two digits of the number, or a combination field, following the form used in the previous numerical format, but in this case containing the second most significant digit of the number, is determined by the number of digits represented (ignoring the final appended partial digit) less one.
The resulting numerical formats are:
Value size:  Exponent Values        Precision in Digits  Sign  Exponent  Compound  Mantissa
 32 bits                    128      2 * 3 + 1 =  7      1      7        4          20
 64 bits                    512      5 * 3 + 1 = 16      1      9        4          50
128 bits                  8,192     11 * 3 + 1 = 34      1     13        4         110

 36 bits     3 *    64 =    192      2 * 3 + 2 =  8      1      6+(2-)   4          20+(3+)
 72 bits                  1,024      6 * 3     = 18      1     10        4          50+7

 48 bits     3 *   256 =    768      3 * 3 + 2 = 11      1      8+(2-)   4          30+(3+)
 96 bits                  2,048      8 * 3 + 1 = 25      1     11        4          80

 30 bits                    256      2 * 3     =  6      1      8        4          10+7
 60 bits                    256      5 * 3     = 15      1      8        4          40+7
120 bits     3 * 1,024 =  3,072     10 * 3 + 2 = 32      1     10+(2-)   4         100+(3+)

 40 bits                    256      3 * 3     =  9      1      8        4          20+7
 80 bits     3 * 1,024 =  3,072      6 * 3 + 2 = 20      1     10+(2-)   4          60+(3+)
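As a check on the table above, the field widths of each format can be added up and the precision recomputed. The Python sketch below is only a bookkeeping aid; the tuple layout and the names `NUMERIC_FORMATS` and `summarize` are hypothetical. Each entry gives the value size, the bits of plain binary exponent, the width of the extra field after the compound field (0 for none, 5 for a combination field, 7 for a two-digit field), and the bits held in 10-bit groups.

```python
# (size, plain exponent bits, extra field bits, bits in 10-bit groups)
NUMERIC_FORMATS = [
    (32, 7, 0, 20), (64, 9, 0, 50), (128, 13, 0, 110),
    (36, 6, 5, 20), (72, 10, 7, 50),
    (48, 8, 5, 30), (96, 11, 0, 80),
    (30, 8, 7, 10), (60, 8, 7, 40), (120, 10, 5, 100),
    (40, 8, 7, 20), (80, 10, 5, 60),
]

EXTRA_DIGITS = {0: 0, 5: 1, 7: 2}   # digits carried by the extra field


def summarize(exp_bits, extra_bits, group_bits):
    """Return (total bits, precision in digits, number of exponent values)."""
    total = 1 + exp_bits + 4 + extra_bits + group_bits   # sign + ... + compound
    digits = 1 + EXTRA_DIGITS[extra_bits] + 3 * (group_bits // 10)
    # A combination field contributes a factor of three to the exponent range.
    exponent_values = (3 if extra_bits == 5 else 1) * 2 ** exp_bits
    return total, digits, exponent_values


for size, exp_bits, extra_bits, group_bits in NUMERIC_FORMATS:
    total, digits, exponent_values = summarize(exp_bits, extra_bits, group_bits)
    assert total == size
    print(size, digits, exponent_values)
```

Every row accounts for exactly the stated number of bits, the compound field always supplying the most significant digit.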
In this format, unlike the one supporting unnormalized operation, the decimal point of the mantissa field lies before the most significant digit, and the exponent bias is always one-half the number of possible values for the exponent.
When the exponent is at its minimum value, the four bit compound field instead contains a single BCD digit, which may be zero, to allow gradual underflow as with the standard floating-point type.
The layouts of the formats in the different sizes for these two types of floating-point number are illustrated below: