5.6 Floating-Point Addition and Subtraction
In floating-point addition and subtraction, two floating-point numbers a and b can be defined as:

$$
\begin{aligned}
a &= a(\mathrm{man}) \times 2^{a(\mathrm{exp})} \\
b &= b(\mathrm{man}) \times 2^{b(\mathrm{exp})}
\end{aligned}
$$
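The sum (or difference) of a and b can be defined as:

$$
c = a \pm b =
\begin{cases}
\left( a(\mathrm{man}) \pm \left( b(\mathrm{man}) \times 2^{-(a(\mathrm{exp}) - b(\mathrm{exp}))} \right) \right) \times 2^{a(\mathrm{exp})}, & \text{if } a(\mathrm{exp}) \ge b(\mathrm{exp}) \\
\left( \left( a(\mathrm{man}) \times 2^{-(b(\mathrm{exp}) - a(\mathrm{exp}))} \right) \pm b(\mathrm{man}) \right) \times 2^{b(\mathrm{exp})}, & \text{if } a(\mathrm{exp}) < b(\mathrm{exp})
\end{cases}
$$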
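The two cases above can be checked numerically. The following is a minimal sketch in Python using exact rational arithmetic; the function names `aligned_sum` and `value` are hypothetical and chosen only for illustration, and the mantissas are treated as plain rational numbers rather than as any particular hardware bit format.

```python
from fractions import Fraction


def aligned_sum(a_man, a_exp, b_man, b_exp, subtract=False):
    """Return (c_man, c_exp) for c = a + b, or c = a - b if subtract is True.

    The operand with the smaller exponent is scaled down so that both
    mantissas share the larger exponent, as in the two cases of the formula.
    """
    sign = -1 if subtract else 1
    if a_exp >= b_exp:
        # Case a(exp) >= b(exp): scale b(man) by 2^-(a(exp) - b(exp)).
        scale = Fraction(1, 2 ** (a_exp - b_exp))
        return a_man + sign * b_man * scale, a_exp
    # Case a(exp) < b(exp): scale a(man) by 2^-(b(exp) - a(exp)).
    scale = Fraction(1, 2 ** (b_exp - a_exp))
    return a_man * scale + sign * b_man, b_exp


def value(man, exp):
    """Evaluate man * 2^exp exactly."""
    return Fraction(man) * Fraction(2) ** exp


# a = 1.5 * 2^3 = 12 and b = 1.25 * 2^1 = 2.5
a = (Fraction(3, 2), 3)
b = (Fraction(5, 4), 1)
c_man, c_exp = aligned_sum(*a, *b)
print(c_man, c_exp, value(c_man, c_exp))
```

With these inputs the second mantissa is scaled by 2^-2, giving c(man) = 1.8125 and c(exp) = 3, which evaluates to 14.5 = 12 + 2.5. The result of this formula is not necessarily normalized.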
Figure 5–17 shows the flowchart for floating-point addition. Because this flowchart assumes signed data, it is also appropriate for floating-point subtraction. In this figure, it is assumed that a(exp) ≤ b(exp).
- In step 1, the source exponents, a(exp) and b(exp), are compared, and c(exp) is set equal to the larger of the two source exponents.
- In step 2, d is set to the difference of the two exponents.
- In step 3, the mantissa with the smaller exponent, in this case a(man), is right-shifted d bits to align the mantissas.
- In step 4, after the mantissas have been aligned, they are added.
- Steps 5 through 7 check for special cases of c(man). If c(man) is 0 (step 5), then c(exp) is set to its most negative value (step 8) to yield the correct representation of 0. If c(man) has overflowed (step 6), then in step 9 c(man) is right-shifted one bit and 1 is added to c(exp). In step 10, the result is normalized.
- Steps 11 through 13 check for special cases of c(exp). If c(exp) has overflowed (step 11) in the positive direction, then step 14 sets c(exp) to the most positive extended-precision format value. If c(exp) has overflowed (step 11) in the negative direction, then step 14 sets c(exp) to the most negative extended-precision format value. If c(exp) has underflowed (step 12), then step 15 sets c to 0; that is, c(man) = 0 and c(exp) = –128. If no overflow or underflow occurred, then c is not modified.
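The steps above can be sketched in software. The following Python model is only an illustration under stated assumptions, not the device's actual data path: the mantissa is a toy signed integer with `FRAC_BITS` fraction bits, normalized so that its magnitude lies in [1.0, 2.0); the usable exponent range is assumed to be –127 to 127, with –128 reserved for zero; and exponent overflow is read as saturating the whole result to the largest positive or negative value according to the sign of c(man). The names `fp_add`, `shift_right`, and all constants are hypothetical.

```python
FRAC_BITS = 8                  # assumed fraction width of the toy mantissa
ONE = 1 << FRAC_BITS           # integer encoding of 1.0
EXP_MIN, EXP_MAX = -127, 127   # assumed usable exponent range
EXP_ZERO = -128                # exponent used to represent zero


def shift_right(man, bits):
    """Right-shift a signed mantissa, truncating toward zero (a simplification)."""
    return -((-man) >> bits) if man < 0 else man >> bits


def fp_add(a_man, a_exp, b_man, b_exp):
    """Add two (mantissa, exponent) pairs, loosely following steps 1 to 15."""
    # Steps 1 and 2: keep the larger exponent and compute the difference d.
    c_exp = max(a_exp, b_exp)
    d = abs(a_exp - b_exp)
    # Step 3: right-shift the mantissa that has the smaller exponent by d bits.
    if a_exp <= b_exp:
        a_man = shift_right(a_man, d)
    else:
        b_man = shift_right(b_man, d)
    # Step 4: add the aligned mantissas.
    c_man = a_man + b_man
    # Steps 5 and 8: a zero mantissa gets the most negative exponent.
    if c_man == 0:
        return 0, EXP_ZERO
    # Steps 6 and 9: on mantissa overflow, shift right one bit and add 1 to c(exp).
    if abs(c_man) >= 2 * ONE:
        c_man = shift_right(c_man, 1)
        c_exp += 1
    # Step 10: normalize by shifting left until the magnitude reaches 1.0.
    while abs(c_man) < ONE:
        c_man <<= 1
        c_exp -= 1
    # Steps 11 and 14: exponent overflow saturates (assumed: by sign of c_man).
    if c_exp > EXP_MAX:
        largest = 2 * ONE - 1
        return (largest if c_man > 0 else -largest), EXP_MAX
    # Steps 12 and 15: exponent underflow forces c to zero.
    if c_exp < EXP_MIN:
        return 0, EXP_ZERO
    # No overflow or underflow: c is returned unmodified.
    return c_man, c_exp


# 1.5 * 2^3 + 1.25 * 2^1, with mantissas scaled by ONE = 256
print(fp_add(384, 3, 320, 1))
```

In this model the call above returns `(464, 3)`, that is, 464/256 = 1.8125 with exponent 3, or 14.5. Because the mantissas are signed, subtraction is obtained by negating `b_man` before the call.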