multiple nodes in the cluster, access to the database files is seamless and transparent to the applications
and users that reference the database. To the users, the partitioned files still behave as though they were
local to their system.
The most important aspect of obtaining optimal performance with DB2 Multisystem is to plan ahead for
what data should be partitioned and how it should be partitioned. The main idea behind this planning is to
ensure that the systems in the cluster run in parallel with each other as much as possible when processing
distributed queries while keeping the amount of communications data traffic to a minimum. Following is
a list of items to consider when planning for the use of distributed data via DB2 Multisystem.
y
Avoid large amounts of data movement between systems. A distributed query often achieves optimal
performance when it is able to divide the query among several nodes, with each node running its
portion of the query on data that is local to that system and with a minimum number of accesses to
remote data on other systems. Also, if a file that is heavily used for transaction processing is to be
distributed, it should be done such that most of the database accesses are local since remote accesses
may add significantly to response times.
y
Choosing which files to partition is important. The largest improvements will be for queries on large
files. Files that are primarily used for transaction processing and not much query processing are
generally not good candidates for partitioning. Also, partitioning files with only a small number of
records will generally not result in much improvement and may actually degrade performance due to
the added communications overhead.
y
Choose a partitioning key that has many different values. This will help ensure a more even
distribution of the data across the multiple nodes. In addition, performance will be best if the
partitioning key is a single field that is a simple data type.
y
It is best to choose a partition key that consists of a field or fields whose values are not updated.
Updates on partition keys are only allowed if the change to the field(s) in the key will not cause that
record to be partitioned to a different node.
y
If joins are often performed on multiple files using a single field, use that field as the partitioning key
for those files. Also, the fields used for join processing should be of the same data type.
y
It will be helpful to partition the database files based on how quickly each node can process its
portion of the data when running distributed queries. For example, it may be better to place a larger
amount of data on a large multiprocessor system than on a smaller single processor system. In
addition, current normal utilization levels of other resources such as main memory, DASD and IOPs
should be considered on each system in order to ensure that no one individual system becomes a
bottleneck for distributed query performance.
y
For the best query performance involving distributed files, avoid the use of commitment control when
possible. DB2 Multisystem uses two-phase commit, which can add a significant amount of overhead
when running distributed queries.
For more information on DB2 Multisystem refer to the DB2 Multisystem manual.
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
©
Copyright IBM Corp. 2008
Chapter 4 - DB2 Performance
57