activation time. This means that a partition that requires 4 GB of memory could be assigned 2 GB from
the quad with 4 GB DIMMs and the other 2 GB from the quad with 8 GB DIMMs. This too can cause an
application to have different performance characteristics on partitions configured with exactly the same
amount of resources.
When planning a Power6 520 system, there are a number of memory-related factors that
should be considered, each of which can affect the performance of memory-sensitive workloads. First and
foremost, the Power6 520 has no L3 cache, which makes memory speed even more critical
for memory-sensitive workloads. If memory capacity needs can be met with 4 GB DIMMs or
smaller, this will give the best memory speed. If memory capacity needs require mixing 4 GB and 8 GB
DIMMs, that option is available, but it can have a negative performance effect on memory-sensitive
workloads. Mixing DIMMs can also cause partitions configured with exactly the same amount of
resources to have varying performance characteristics. Because the Power6 520 has only 8
memory DIMM slots, memory capacity can be an issue. If memory capacity is a concern, the 8 GB
DIMMs will increase the capacity, but result in a slower memory speed.
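The capacity side of this trade-off can be tabulated with a short sketch. It assumes, for illustration only, that the 8 slots form two quads of four DIMMs each and that all DIMMs within a quad are the same size. The names mem_config and describe_config are hypothetical and are not part of any IBM tool.

```c
#define DIMMS_PER_QUAD 4   /* assumption: 8 slots arranged as two quads */

/* Hypothetical summary of one two-quad memory configuration. */
struct mem_config {
    int total_gb;   /* installed capacity in GB */
    int mixed;      /* 1 if the two quads use different DIMM sizes */
    int uses_8gb;   /* 1 if any quad holds 8 GB DIMMs */
};

/* Describe a configuration given the DIMM size (in GB) used in each quad. */
struct mem_config describe_config(int quad0_dimm_gb, int quad1_dimm_gb)
{
    struct mem_config c;
    c.total_gb = DIMMS_PER_QUAD * (quad0_dimm_gb + quad1_dimm_gb);
    c.mixed    = (quad0_dimm_gb != quad1_dimm_gb);
    c.uses_8gb = (quad0_dimm_gb == 8 || quad1_dimm_gb == 8);
    return c;
}
```

Under these assumptions, describe_config(4, 4) gives 32 GB with the best memory speed, describe_config(4, 8) gives 48 GB in a mixed configuration, and describe_config(8, 8) gives 64 GB at the slower memory speed.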
20.6 Aligning Floating Point Data on Power6
The PowerPC architecture specifies that storage operands ought to be appropriately aligned. In many
cases, there is a slight performance benefit and the compiler knows this. In other cases, the operands must
be aligned for functional reasons. For example:
1. Pointers used by IBM i must be aligned on a 16-byte boundary,
2. PowerPC instructions in a program must be word aligned,
3. Binary Floating-Point operands ought to be word-aligned and should not cross a page boundary.
Other operand types allow generally free alignment of the data.
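The alignment rules listed above can be expressed as simple address checks. The following is a minimal sketch, not IBM-supplied code: the function names are hypothetical, addresses are treated as plain integers, and the 4 KB page size is an assumption that should be adjusted to the page size actually in use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_SIZE 4u      /* a PowerPC word is 4 bytes */
#define PAGE_SIZE 4096u   /* assumption: 4 KB pages */

/* True if addr is a multiple of align (align must be a power of two). */
bool is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}

/* True if an operand of len bytes starting at addr spans two pages. */
bool crosses_page(uintptr_t addr, size_t len)
{
    return (addr / PAGE_SIZE) != ((addr + len - 1) / PAGE_SIZE);
}

/* True if a floating-point operand of len bytes (4 or 8) is word aligned
   and does not cross a page boundary (item 3 in the list above). */
bool float_operand_ok(uintptr_t addr, size_t len)
{
    return is_aligned(addr, WORD_SIZE) && !crosses_page(addr, len);
}
```

For example, is_aligned(addr, 16) corresponds to the 16-byte pointer rule, while float_operand_ok(0x1002, 4) is false because the address is not a multiple of 4.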
Although such a specification exists for Binary Floating-Point operands, the processor designs have the
option of allowing free alignment of Binary Floating-Point operands as well. The Power6 processors,
however, took a different approach. If either a 4-byte short form or an 8-byte long form operand is not
word-aligned, the Power6 processor will produce an alignment interrupt. Fortunately, the IBM i
alignment interrupt handler recognizes this and does allow programs to successfully execute even if the
Binary Floating-Point operand is not word aligned. However, this emulation of each such operation
comes at a very considerable cost to the performance of such floating-point load and store instructions.
While an appropriately aligned floating-point load or store can execute extremely rapidly, the emulation
when misaligned can take thousands of times longer. If such accesses are rare compared to the remainder
of the function being provided, this emulation may not matter to the performance of the application. As
such floating-point accesses become more frequent, this emulation alone can account for most of the time
spent within an application.
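The effect of access frequency can be illustrated with a simple model. This is a sketch under stated assumptions: the function name is hypothetical, the slowdown factor is an input rather than a measured value, and the model covers only the time spent in floating-point loads and stores, not the whole application.

```c
/* Fraction of floating-point load/store time spent in emulation.
   misaligned_fraction: share of accesses that are misaligned (0.0 to 1.0).
   slowdown: hypothetical cost of an emulated access relative to an
             aligned one (for example 1000.0 for "a thousand times longer"). */
double emulation_time_share(double misaligned_fraction, double slowdown)
{
    double emulated = misaligned_fraction * slowdown;   /* cost of misaligned accesses */
    double aligned  = 1.0 - misaligned_fraction;        /* cost of aligned accesses */
    return emulated / (emulated + aligned);
}
```

With an assumed slowdown of 1000, one misaligned access in a thousand already accounts for about half of the access time, and one in a hundred accounts for roughly 90 percent of it.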
The compiler does attempt to ensure that such Binary Floating-Point operands are at least word aligned.
However, there are ways that the compiler's intent can be overridden. Structures are sometimes packed
to save some space in memory, and packing a structure that includes floating-point variables can leave
those variables misaligned. For this reason, it is prudent to ensure that floating-point
variables are allowed to be at least word aligned. If this cannot be done, it may be appropriate to first
copy the floating-point variables to a local aligned variable in storage; this may need to be done via an
explicit move operation which is unaware of the type of the data for if the type is known; without this the
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008
Chapter 20 - General Tips and Techniques
324