All of these caches are effectively shared. The L2 cache has a longer access latency than L1,
and L3 has a longer access latency than L2. Each chip also has memory controllers, allowing
direct access to a portion of the memory DIMMs in the system.
Thus, it takes longer for an application thread to access data in cache or memory that is attached to a remote chip than to access data in a local cache or memory. These types of characteristics are often referred to as affinity performance effects (see “The POWER7 processor and affinity performance effects” on page 14). In many cases, systems that are built around different processor models have varying characteristics (for example, while L3 is supported, it might not be implemented on some models).
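To make these latency differences concrete, the following minimal C sketch (an illustration, not taken from this guide) times a chain of dependent loads over working sets sized to fall roughly into L1, L2, L3, and main memory. The working-set sizes and iteration count are illustrative assumptions, not POWER7 cache geometries.

    /* Illustrative latency probe: time dependent loads over working sets
     * that roughly fit in L1, L2, L3, and main memory. Sizes and counts
     * are assumptions for illustration only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double ns_per_load(size_t n, size_t iters)
    {
        size_t *next = malloc(n * sizeof *next);
        if (next == NULL)
            return -1.0;

        /* Sattolo's algorithm: a random single-cycle permutation, so each
         * load depends on the previous one and prefetching is defeated. */
        for (size_t i = 0; i < n; i++)
            next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        struct timespec t0, t1;
        volatile size_t idx = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < iters; i++)
            idx = next[idx];                 /* dependent load chain */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        free(next);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
        return ns / (double)iters;
    }

    int main(void)
    {
        /* Working sets chosen only to land in different hierarchy levels. */
        size_t sizes_kib[] = { 16, 256, 4096, 131072 };
        for (int s = 0; s < 4; s++) {
            size_t n = sizes_kib[s] * 1024 / sizeof(size_t);
            printf("%8zu KiB working set: ~%.1f ns per load\n",
                   sizes_kib[s], ns_per_load(n, 20u * 1000 * 1000));
        }
        return 0;
    }

On a system where the larger working sets spill out of the on-chip caches, the reported nanoseconds per load typically step up at each level of the hierarchy.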
Functionally, it does not matter which core in the system an application thread is running on, or which memory holds the data that it accesses. However, this situation does affect the performance of applications, because accessing a remote memory or cache takes more time than accessing a local memory or cache. This effect becomes even more significant with the capability of modern systems to support massive scaling and the resulting possibility for remote accesses to occur across a large processor interconnection complex.
Application threads are exposed to the effect of these system properties because they often move, sometimes rather frequently, between processor cores. This movement can happen for various reasons, such as a page fault or lock contention that causes the application thread to be preempted while it waits for a condition to be satisfied, and then resumed on a different core. Any application data that was held in the cache local to the original core is no longer in the local cache after the move, so a remote cache access is required. Although modern operating systems, such as AIX, attempt to ensure that cache and memory affinity is retained, this movement does occur, and can result in a loss in performance. For an introduction to the concepts of cache and memory affinity, see “The POWER7 processor and affinity performance effects” on page 14.
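Where such migration is costly, an application can pin its work to a particular logical processor so that its working set stays warm in the caches of one core. The following is a minimal sketch that assumes an AIX environment and uses the bindprocessor() service; the choice of logical CPU 0 is purely illustrative, and real code would first query which logical processors are available to the partition.

    /* Minimal sketch (assumes AIX): bind the current process to one
     * logical processor so the scheduler does not migrate it between
     * cores, helping its working set stay in the caches of that core. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/processor.h>   /* bindprocessor(), BINDPROCESS */

    int main(void)
    {
        int target = 0;          /* illustrative: first logical processor */

        if (bindprocessor(BINDPROCESS, getpid(), target) != 0) {
            perror("bindprocessor");
            return EXIT_FAILURE;
        }
        printf("Process %ld bound to logical CPU %d\n",
               (long)getpid(), target);

        /* ... run the cache-sensitive work here ... */
        return EXIT_SUCCESS;
    }

On Linux on Power, sched_setaffinity() or the taskset command provides a comparable binding facility. Binding trades scheduling flexibility for cache affinity, so it is best reserved for threads whose performance is known to be cache-sensitive.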
The IBM POWER Hypervisor is responsible for:
- Virtualization of the processor cores and memory that is presented to the operating system
- Ensuring that the affinity between the processor cores and the memory an LPAR is using is maintained as much as possible
However, it is important for application designers to consider affinity issues in the design of
applications, and to carefully assess the impact of application thread and data placement on
the cores and the memory that is assigned to the LPAR the application is running in.
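One common data-placement pattern (a general technique, offered here as an illustration rather than a recommendation from this guide) is to have each worker thread allocate and initialize the data it later processes. Under a first-touch memory placement policy, those pages then tend to be backed by memory that is local to the core running that worker. A minimal POSIX threads sketch, with the thread count and buffer size as illustrative assumptions:

    /* Sketch of the first-touch pattern (illustrative sizes): each worker
     * allocates and initializes its own buffer, so under a first-touch
     * policy the pages tend to land in memory local to that worker's core. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define N_WORKERS 4
    #define BUF_BYTES (64UL * 1024 * 1024)   /* 64 MiB per worker */

    static void *worker(void *arg)
    {
        long id = (long)arg;

        /* Allocate and touch the data in the thread that will use it. */
        unsigned char *buf = malloc(BUF_BYTES);
        if (buf == NULL)
            return NULL;
        memset(buf, 0, BUF_BYTES);               /* first touch happens here */

        unsigned long sum = 0;
        for (size_t i = 0; i < BUF_BYTES; i++)   /* the actual processing */
            sum += buf[i];

        printf("worker %ld done (sum=%lu)\n", id, sum);
        free(buf);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[N_WORKERS];

        for (long i = 0; i < N_WORKERS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (int i = 0; i < N_WORKERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

The opposite pattern, where a single thread initializes all of the data up front, concentrates the pages near one core and forces the remaining workers into remote accesses.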
Various techniques that are employed at the system level can alleviate the effect of cache sharing. One example is to configure the LPAR so that the amount of memory that is requested for the LPAR can be satisfied by the memory that is locally available to the processor cores in the system (the memory DIMMs that are attached to the memory controllers of each processor chip). In that case, it is more likely that the POWER Hypervisor is able to maintain affinity between the processor cores and the memory that is assigned to the partition, improving performance.
For more information about LPAR configuration and running the lssrad command to query the affinity characteristics of a partition, see Chapter 3, “The POWER Hypervisor” on page 55.