Chapter 2. Architecture and technical overview
2.1.1 Simultaneous multi-threading
Because applications continually demand better performance, simultaneous multi-threading
(SMT) functionality is embedded in the POWER5 chip technology. Developers are familiar
with process-level parallelism (multi-tasking) and thread-level parallelism (multi-threading).
SMT is the next stage of processor saturation for throughput-oriented applications: it
applies instruction-level parallelism to keep the processor's multiple pipelines supplied
with instructions.
By default, SMT is activated. On a 2-way POWER5 processor-based system, the operating
system views the available processors as a 4-way system. To achieve a higher performance
level, SMT is also applicable in Micro-Partitioning, capped or uncapped, and dedicated
partition environments.
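The processor count seen by the operating system can be sketched as a simple product. The following is a minimal illustration, assuming two hardware threads per processor when SMT is active (consistent with the 2-way system appearing as 4-way); the function and parameter names are hypothetical and not part of any IBM tool.

```python
def logical_processors(physical_processors: int,
                       smt_enabled: bool = True,
                       threads_per_processor: int = 2) -> int:
    """Return the number of processors the operating system would see.

    Assumption: each physical processor exposes `threads_per_processor`
    hardware threads when SMT is active, and exactly one otherwise.
    """
    if physical_processors < 1 or threads_per_processor < 1:
        raise ValueError("counts must be positive")
    threads = threads_per_processor if smt_enabled else 1
    return physical_processors * threads


# A 2-way system with SMT active (the default) is seen as a 4-way system.
print(logical_processors(2))                     # prints 4
# With SMT turned off, the same system is seen as a 2-way system.
print(logical_processors(2, smt_enabled=False))  # prints 2
```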
Simultaneous multi-threading is supported on POWER5 systems running a Linux operating
system based on the required 2.6 kernel. For Linux, an additional boot option must be set to
activate SMT after a reboot.
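The text does not name the boot option. As a hedged sketch, the helper below looks for an SMT setting on a kernel command line; the default option name `smt-enabled` is an assumption based on the parameter used by 64-bit PowerPC Linux kernels, and the function names are hypothetical. Check the documentation for your distribution before relying on it.

```python
from typing import Optional


def smt_boot_option(cmdline: str, option: str = "smt-enabled") -> Optional[str]:
    """Return the value given for `option=` on a kernel command line.

    Assumption: the option name defaults to "smt-enabled"; the source
    text does not state which boot option is meant. Returns None when
    the option is absent. If it appears more than once, this sketch
    simply reports the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == option and sep:
            value = val
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running Linux kernel."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read().strip()
```

On a running Linux system, `smt_boot_option(read_cmdline())` returns the configured value, or `None` if the option was not passed at boot.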
The SMT mode maximizes the usage of the execution units. The POWER5 chip introduces more
rename registers (for floating-point operations, the number of rename registers increased
to 120). These registers are essential for out-of-order execution and therefore vital for SMT.
Enhanced SMT features
To improve SMT performance for various workload mixes and provide robust quality of
service, POWER5 provides two features:
Dynamic resource balancing
– The objective of dynamic resource balancing is to ensure that the two threads
executing on the same processor flow smoothly through the system.
– Depending on the situation, the POWER5 processor resource balancing logic has
different thread throttling mechanisms.
Adjustable thread priority
– Adjustable thread priority lets software determine when one thread should have a
greater (or lesser) share of execution resources.
– POWER5 supports eight software-controlled priority levels for each thread.
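The effect of adjustable thread priority can be illustrated with a toy model. The sketch below assumes that the share of decode cycles depends only on the difference between the two thread priorities, with R = 2^(|X - Y| + 1) and the lower-priority thread receiving one of every R cycles. This ratio comes from general descriptions of the POWER5 design, not from this document, so treat it as an assumption. The sketch accepts only levels 2 to 7, because the lowest levels may have special meanings that are not modeled here.

```python
from fractions import Fraction
from typing import Tuple


def decode_shares(priority_a: int, priority_b: int) -> Tuple[Fraction, Fraction]:
    """Return the assumed decode-cycle shares for two threads on one processor.

    Assumed model (not taken from this document): with priorities X and Y,
    R = 2 ** (abs(X - Y) + 1); the lower-priority thread gets 1/R of the
    decode cycles and the other thread gets the remainder.
    """
    for priority in (priority_a, priority_b):
        if not 2 <= priority <= 7:
            raise ValueError("this sketch only models priority levels 2 to 7")
    ratio = 2 ** (abs(priority_a - priority_b) + 1)
    low_share = Fraction(1, ratio)
    high_share = 1 - low_share
    if priority_a >= priority_b:
        return high_share, low_share
    return low_share, high_share


# Equal priorities split the decode cycles evenly.
print(decode_shares(4, 4))  # prints (Fraction(1, 2), Fraction(1, 2))
# A difference of two levels gives the favored thread 7 of every 8 cycles.
print(decode_shares(6, 4))  # prints (Fraction(7, 8), Fraction(1, 8))
```

Exact fractions are used instead of floating-point values so that the two shares always sum to exactly 1.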
ST operation
Not all applications benefit from SMT. Having two threads executing on the same processor
will not increase the performance of applications that are limited by the execution units or
applications that consume all of the chip’s memory bandwidth. For this reason, the POWER5
supports the single-threaded (ST) execution mode. In this mode, the POWER5 processor gives
all the physical resources to the active thread, allowing it to achieve higher performance
than a POWER4 processor-based system at equivalent frequencies. Highly optimized
scientific codes are one example where ST operation is ideal.
2.1.2 Dynamic power management
In current Complementary Metal Oxide Semiconductor (CMOS) technologies, chip power is
one of the most important design parameters. With the introduction of SMT, more instructions
execute per cycle per processor core, thus increasing the core’s and the chip’s total switching
power. To reduce switching power, POWER5 chips use a fine-grained, dynamic clock gating
mechanism extensively. This mechanism gates off clocks to a local clock buffer if dynamic
power management logic knows the set of latches driven by the buffer will not be used in the