5.4 Floating-Point Conversion (IEEE Std. 754)
The 'C3x floating-point format is not compatible with the IEEE Std. 754
format. The IEEE floating-point format uses sign-magnitude notation for the
mantissa, and the exponent is biased by 127. In a 32-bit word representing a
floating-point number, the first (most significant) bit is the sign bit. The
next eight bits hold the exponent, which is expressed in offset-by-127 format
(the actual exponent is e - 127). The next 23 bits represent the absolute value
of the mantissa with the most significant 1 implied. The binary point follows
this most significant 1. In other words, the mantissa actually has 24 bits (see
Figure 5–14). There are several special cases, summarized below.
These are the values of the represented numbers in the IEEE floating-point
format:
$$x = (-1)^{s} \times 2^{e-127} \times (1.f) \quad \text{if } 0 < e < 255$$
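As an illustrative worked example (the word 0xC0200000 is chosen here for
illustration and does not come from the original text), the formula can be
applied by hand. The sign bit is 1, the exponent field is 1000 0000 binary
(e = 128), and the fraction field is 010 followed by twenty 0s, so 1.f is
binary 1.01, or 1.25 decimal:

$$
\begin{aligned}
x &= (-1)^{s} \times 2^{e-127} \times (1.f) \\
  &= (-1)^{1} \times 2^{128-127} \times 1.25 \\
  &= -2.5
\end{aligned}
$$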
Figure 5–14. IEEE Single-Precision Std. 754 Floating-Point Format
[Figure: 32-bit word. Bit 31 = s, bits 30–23 = e, bits 22–0 = f (mantissa).]
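The field layout in Figure 5–14 can be sketched in a few lines of Python. This
is only an illustration running on a host machine, not 'C3x code; the function
names ieee754_fields and float_to_word are hypothetical helpers introduced
here, and the sketch assumes the word is supplied as an unsigned 32-bit
integer.

```python
import struct


def ieee754_fields(word):
    """Split a 32-bit IEEE single-precision word into (s, e, f).

    Hypothetical helper for illustration; `word` is an unsigned integer.
    """
    word &= 0xFFFFFFFF          # keep only the low 32 bits
    s = (word >> 31) & 0x1      # bit 31: sign bit
    e = (word >> 23) & 0xFF     # bits 30-23: exponent field, biased by 127
    f = word & 0x7FFFFF         # bits 22-0: fraction field (implied 1 not stored)
    return s, e, f


def float_to_word(x):
    """Return the 32-bit IEEE single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]
```

For instance, ieee754_fields(float_to_word(0.15625)) returns (0, 124, 2097152):
0.15625 is 1.25 x 2^-3, so the exponent field is 127 - 3 = 124 and the
fraction field is 0x200000 (binary .01).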
The following five cases define the value v of a number expressed in the IEEE
format:
1) If $e = 255$ and $f \neq 0$, then $v = \text{NaN}$
2) If $e = 255$ and $f = 0$, then $v = (-1)^{s} \times \infty$
3) If $0 < e < 255$, then $v = (-1)^{s} \times 2^{e-127} \times (1.f)$
4) If $e = 0$ and $f \neq 0$, then $v = (-1)^{s} \times 2^{-126} \times (0.f)$
5) If $e = 0$ and $f = 0$, then $v = (-1)^{s} \times 0$
where:
s = sign bit
e = the exponent field
f = the fraction field
NaN = not a number
For the above five representations, e is treated as an unsigned integer. Case
1 generates NaN (not a number) and is primarily used for software signaling.
Case 4 represents a denormalized number. Case 5 represents positive and
negative 0.
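The five cases can be checked against each other with a short host-side
sketch in Python. It is not 'C3x code and not a conversion routine from this
manual; the function name ieee754_value is a hypothetical helper introduced
here, and the sketch assumes the word is supplied as an unsigned 32-bit
integer.

```python
import math


def ieee754_value(word):
    """Decode a 32-bit IEEE single-precision word using the five cases.

    Hypothetical helper for illustration; `word` is an unsigned integer.
    """
    word &= 0xFFFFFFFF
    s = (word >> 31) & 0x1       # sign bit
    e = (word >> 23) & 0xFF      # exponent field, treated as unsigned
    f = word & 0x7FFFFF          # fraction field

    sign = -1.0 if s else 1.0    # (-1)^s
    frac = f / 2.0 ** 23         # the value 0.f

    if e == 255:
        if f != 0:
            return math.nan                          # case 1: NaN
        return sign * math.inf                       # case 2: signed infinity
    if e > 0:
        return sign * 2.0 ** (e - 127) * (1.0 + frac)  # case 3: normalized
    if f != 0:
        return sign * 2.0 ** -126 * frac             # case 4: denormalized
    return sign * 0.0                                # case 5: signed zero
```

Under these assumptions, ieee754_value(0x3F800000) gives 1.0 (case 3),
ieee754_value(0x00000001) gives 2^-149 (case 4, the smallest denormalized
number), and ieee754_value(0x80000000) gives negative zero (case 5).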