This function, which has already been considered by several authors for hyper-
cube target topologies [11, 21, 25], has several interesting properties: it is easy
to compute, allows incremental updates performed by iterative algorithms, and its
minimization favors the mapping of intensively intercommunicating processes onto
nearby processors; regardless of the type of routage implemented on the target
machine (store-and-forward or cut-through), it models the traffic on the intercon-
nection network and thus the risk of congestion.
The strong positive correlation between values of this function and effective
execution times has been experimentally verified by Hammond [21] on the CM-2,
and by Hendrickson and Leland [26] on the nCUBE 2.
The quality of mappings is evaluated with respect to the criteria for quality that
we have chosen: the balance of the computation load across processors, and the
minimization of the interprocessor communication cost modeled by function
f
C
.
These criteria lead to the definition of several parameters, which are described
below.
For load balance, one can define
µ
map
, the average load per computational
power unit (which does not depend on the mapping), and
δ
map
, the load imbalance
ratio, as
µ
map
def
=
P
v
S
∈
V
(
S
)
w
S
(
v
S
)
P
v
T
∈
V
(
T
)
w
T
(
v
T
)
and
δ
map
def
=
P
v
T
∈
V
(
T
)
1
w
T
(
v
T
)
P
v
S
∈
V
(
S
)
τ
S,T
(
v
S
) =
v
T
w
S
(
v
S
)
−
µ
map
P
v
S
∈
V
(
S
)
w
S
(
v
S
)
.
However, since the maximum load imbalance ratio is provided by the user in input
of the mapping, the information given by these parameters is of little interest, since
what matters is the minimization of the communication cost function under this
load balance constraint.
For communication, the straightforward parameter to consider is
f
C
. It can be
normalized as
µ
exp
, the average edge expansion, which can be compared to
µ
dil
,
the average edge dilation; these are defined as
µ
exp
def
=
f
C
P
e
S
∈
E
(
S
)
w
S
(
e
S
)
and
µ
dil
def
=
P
e
S
∈
E
(
S
)
|
ρ
S,T
(
e
S
)
|
|
E
(
S
)
|
.
δ
exp
def
=
µ
exp
µ
dil
is smaller than 1 when the mapper succeeds in putting heavily inter-
communicating processes closer to each other than it does for lightly communicating
processes; they are equal if all edges have same weight.
3.1.3
The Dual Recursive Bipartitioning algorithm
Our mapping algorithm uses a
divide and conquer
approach to recursively allocate
subsets of processes to subsets of processors [41]. It starts by considering a set of
processors, also called
domain
, containing all the processors of the target machine,
and with which is associated the set of all the processes to map. At each step, the
algorithm bipartitions a yet unprocessed domain into two disjoint subdomains, and
9