All of these caches are effectively shared. The L2 cache has a longer access latency than L1,
and L3 has a longer access latency than L2. Each chip also has memory controllers, allowing
direct access to a portion of the memory DIMMs in the system.
Thus, it takes longer for an application thread to access data in cache or memory that is attached to a remote chip than to access data in a local cache or memory. These types of characteristics are often referred to as affinity performance effects (see “The POWER7 processor and affinity performance effects” on page 14). In many cases, systems that are built around different processor models have varying characteristics (for example, while L3 is supported, it might not be implemented on some models).
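To make these latency differences concrete, the following minimal C sketch (an illustration, not taken from this guide) times a chain of dependent loads over working sets sized to fall roughly into L1, L2, L3, and main memory. The working-set sizes and iteration count are illustrative assumptions, not POWER7 cache geometries.

    /* Illustrative latency probe: time dependent loads over working sets
     * that roughly fit in L1, L2, L3, and main memory. Sizes and counts
     * are assumptions for illustration only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double ns_per_load(size_t n, size_t iters)
    {
        size_t *next = malloc(n * sizeof *next);
        if (next == NULL)
            return -1.0;

        /* Sattolo's algorithm: a random single-cycle permutation, so each
         * load depends on the previous one and prefetching is defeated. */
        for (size_t i = 0; i < n; i++)
            next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        struct timespec t0, t1;
        volatile size_t idx = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < iters; i++)
            idx = next[idx];                 /* dependent load chain */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        free(next);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
        return ns / (double)iters;
    }

    int main(void)
    {
        /* Working sets chosen only to land in different hierarchy levels. */
        size_t sizes_kib[] = { 16, 256, 4096, 131072 };
        for (int s = 0; s < 4; s++) {
            size_t n = sizes_kib[s] * 1024 / sizeof(size_t);
            printf("%8zu KiB working set: ~%.1f ns per load\n",
                   sizes_kib[s], ns_per_load(n, 20u * 1000 * 1000));
        }
        return 0;
    }

On a system where the larger working sets spill out of the on-chip caches, the reported nanoseconds per load typically step up at each level of the hierarchy.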
Functionally, it does not matter which core in the system an application thread is running on, or which memory holds the data that it accesses. However, this situation does affect the performance of applications, because accessing a remote memory or cache takes more time than accessing a local memory or cache. This effect becomes even more significant with the capability of modern systems to support massive scaling and the resulting possibility for remote accesses to occur across a large processor interconnection complex.
Application threads are exposed to the effect of these system properties because they often move, sometimes rather frequently, between processor cores. This movement can happen for various reasons, such as a page fault or lock contention that causes the application thread to be preempted while it waits for a condition to be satisfied, and then resumed on a different core. Any application data that was held in the cache local to the original core is no longer in the local cache after the move, so a remote cache access is required. Although modern operating systems, such as AIX, attempt to ensure that cache and memory affinity is retained, this movement does occur, and can result in a loss in performance. For an introduction to the concepts of cache and memory affinity, see “The POWER7 processor and affinity performance effects” on page 14.
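Where such migration is costly, an application can pin its work to a particular logical processor so that its working set stays warm in the caches of one core. The following is a minimal sketch that assumes an AIX environment and uses the bindprocessor() service; the choice of logical CPU 0 is purely illustrative, and real code would first query which logical processors are available to the partition.

    /* Minimal sketch (assumes AIX): bind the current process to one
     * logical processor so the scheduler does not migrate it between
     * cores, helping its working set stay in the caches of that core. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/processor.h>   /* bindprocessor(), BINDPROCESS */

    int main(void)
    {
        int target = 0;          /* illustrative: first logical processor */

        if (bindprocessor(BINDPROCESS, getpid(), target) != 0) {
            perror("bindprocessor");
            return EXIT_FAILURE;
        }
        printf("Process %ld bound to logical CPU %d\n",
               (long)getpid(), target);

        /* ... run the cache-sensitive work here ... */
        return EXIT_SUCCESS;
    }

On Linux on Power, sched_setaffinity() or the taskset command provides a comparable binding facility. Binding trades scheduling flexibility for cache affinity, so it is best reserved for threads whose performance is known to be cache-sensitive.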
The IBM POWER Hypervisor is responsible for:
- Virtualization of the processor cores and memory that is presented to the operating system
- Ensuring that the affinity between the processor cores and the memory an LPAR is using is maintained as much as possible
However, it is important for application designers to consider affinity issues in the design of
applications, and to carefully assess the impact of application thread and data placement on
the cores and the memory that is assigned to the LPAR the application is running in.
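One common data-placement pattern (a general technique, offered here as an illustration rather than a recommendation from this guide) is to have each worker thread allocate and initialize the data it later processes. Under a first-touch memory placement policy, those pages then tend to be backed by memory that is local to the core running that worker. A minimal POSIX threads sketch, with the thread count and buffer size as illustrative assumptions:

    /* Sketch of the first-touch pattern (illustrative sizes): each worker
     * allocates and initializes its own buffer, so under a first-touch
     * policy the pages tend to land in memory local to that worker's core. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define N_WORKERS 4
    #define BUF_BYTES (64UL * 1024 * 1024)   /* 64 MiB per worker */

    static void *worker(void *arg)
    {
        long id = (long)arg;

        /* Allocate and touch the data in the thread that will use it. */
        unsigned char *buf = malloc(BUF_BYTES);
        if (buf == NULL)
            return NULL;
        memset(buf, 0, BUF_BYTES);               /* first touch happens here */

        unsigned long sum = 0;
        for (size_t i = 0; i < BUF_BYTES; i++)   /* the actual processing */
            sum += buf[i];

        printf("worker %ld done (sum=%lu)\n", id, sum);
        free(buf);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[N_WORKERS];

        for (long i = 0; i < N_WORKERS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (int i = 0; i < N_WORKERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

The opposite pattern, where a single thread initializes all of the data up front, concentrates the pages near one core and forces the remaining workers into remote accesses.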
Various techniques that are employed at the system level can alleviate the effect of cache sharing. One example is to configure the LPAR so that the amount of memory that is requested for the LPAR can be satisfied by the memory that is locally available to the processor cores in the system (the memory DIMMs that are attached to the memory controllers of each processor chip). In that case, it is more likely that the POWER Hypervisor is able to maintain affinity between the processor cores and the memory that is assigned to the partition, improving performance.
For more information about LPAR configuration and running the lssrad command to query the affinity characteristics of a partition, see Chapter 3, “The POWER Hypervisor” on page 55.