Floating-Point Addition and Subtraction
Data Formats and Floating-Point Operation
Example 5–14. Floating-Point Subtraction
A subtraction is performed in this example. Let:
α = 01.0000000000000000000000000000001 × 2^0
b = 01.0000000000000000000000000000000 × 2^0
The operation performed is α – b. The mantissas are already aligned because
the two numbers have the same exponent. The result is a large cancellation
of the upper bits, as shown below.
  01.0000000000000000000000000000001 × 2^0
– 01.0000000000000000000000000000000 × 2^0
  ----------------------------------------
  00.0000000000000000000000000000001 × 2^0
The result must be normalized. In this case, a left shift of 31 is required, and
the exponent of the result is decreased by 31, from 0 to –31. The result is:
  01.0000000000000000000000000000001 × 2^0
– 01.0000000000000000000000000000000 × 2^0
  ----------------------------------------
  01.0000000000000000000000000000000 × 2^–31
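The normalization step in this example can be sketched in Python. This is only an illustrative model, not the device's implementation: it assumes the mantissa layout used in the examples (2 bits before the binary point, 31 after, two's complement), and the names parse, fmt, and normalize are hypothetical helpers introduced here. Exponent range checks and the special handling of zero are left out.

```python
MANT_BITS = 33                 # assumed layout: 2 bits before the binary point, 31 after
MASK = (1 << MANT_BITS) - 1

def parse(bits: str) -> int:
    """Read a string such as '01.000...1' as a signed two's-complement integer."""
    raw = bits.replace(".", "")
    if len(raw) != MANT_BITS:
        raise ValueError("expected 2 integer bits and 31 fraction bits")
    value = int(raw, 2)
    return value - (1 << MANT_BITS) if raw[0] == "1" else value

def fmt(mant: int) -> str:
    """Format a signed mantissa back into the 'xx.fraction' notation."""
    raw = format(mant & MASK, f"0{MANT_BITS}b")
    return raw[:2] + "." + raw[2:]

def normalize(mant: int, exp: int):
    """Shift left until the two leading bits differ; return (mant, exp, shifts)."""
    shifts = 0
    # Leading bits 00 or 11 mean the mantissa is not normalized.
    # Zero is left untouched here; a real format needs a special case for it.
    while mant != 0 and ((mant & MASK) >> (MANT_BITS - 2)) in (0b00, 0b11):
        mant *= 2      # one-bit left shift; cannot overflow while the leading bits match
        exp -= 1       # each left shift lowers the exponent by one
        shifts += 1
    return mant, exp, shifts

a = parse("01." + "0" * 30 + "1")
b = parse("01." + "0" * 31)
mant, exp, shifts = normalize(a - b, 0)   # equal exponents, so no alignment step
print(fmt(mant), exp, shifts)             # 01.0000000000000000000000000000000 -31 31
```

Representing the mantissa as a signed integer keeps the subtraction exact, so the cancellation of the upper bits shows up directly as the small difference that normalize then shifts back into place.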
Example 5–15. Floating-Point Addition With a 32-Bit Shift
This example illustrates a situation where a full 32-bit shift is necessary to
normalize the result. Let:
α = 01.1111111111111111111111111111111 × 2^127
b = 10.0000000000000000000000000000000 × 2^127
The operation to be performed is α + b.
  01.1111111111111111111111111111111 × 2^127
+ 10.0000000000000000000000000000000 × 2^127
  ------------------------------------------
  11.1111111111111111111111111111111 × 2^127
Normalizing the result requires a left shift of 32 and a subtraction of 32 from
the exponent. The result is:
  01.1111111111111111111111111111111 × 2^127
+ 10.0000000000000000000000000000000 × 2^127
  ------------------------------------------
  10.0000000000000000000000000000000 × 2^95
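The arithmetic of this example can be checked with exact rational numbers. The sketch below is an illustration under stated assumptions: the mantissas are read as two's-complement values with 2 bits before the binary point and 31 after, and mantissa_value is a hypothetical helper, not part of any device tool. It confirms that the unnormalized sum equals –2^–31, and that after a left shift of 32 the same value is written as 10.0000000000000000000000000000000 × 2^95.

```python
from fractions import Fraction

def mantissa_value(bits: str) -> Fraction:
    """Exact value of a two's-complement mantissa written as 'ii.fff...'."""
    integer, fraction = bits.split(".")
    raw = int(integer + fraction, 2)
    if integer[0] == "1":                       # sign bit set: negative value
        raw -= 1 << (len(integer) + len(fraction))
    return Fraction(raw, 1 << len(fraction))

a = mantissa_value("01." + "1" * 31)            # 2 - 2**-31
b = mantissa_value("10." + "0" * 31)            # -2
raw_sum = a + b                                 # mantissa sum, exponents both 127

# The unnormalized mantissa 11.111...1 is the same number, -2**-31.
assert raw_sum == mantissa_value("11." + "1" * 31) == Fraction(-1, 2**31)

# Left shift of 32 with the exponent lowered by 32 leaves the value unchanged.
normalized = mantissa_value("10." + "0" * 31)   # -2
assert raw_sum * Fraction(2) ** 127 == normalized * Fraction(2) ** (127 - 32)
```

A full 32-bit shift is needed because the sum is negative: 31 shifts produce 11.000...0, which still has matching leading bits, and one more shift yields the normalized negative form 10.000...0.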