INTEL® XEON® PROCESSOR 7500 SERIES UNCORE PROGRAMMING GUIDE
UNCORE PERFORMANCE MONITORING
2.3 C-Box Performance Monitoring
2.3.1 Overview of the C-Box
For the Intel Xeon Processor 7500 Series, the LLC coherence engine (C-Box) manages the interface
between the core and the last level cache (LLC). All core transactions that access the LLC are directed
from the core to a C-Box via the ring interconnect. The C-Box is responsible for managing data delivery
from the LLC to the requesting core. It is also responsible for maintaining coherence between the cores
within the socket that share the LLC, generating snoops to the local cores and collecting their snoop
responses when the MESI protocol requires it.
The C-Box is also the gatekeeper for all Intel® QuickPath Interconnect (Intel® QPI) messages that
originate in the core and is responsible for ensuring that all Intel QuickPath Interconnect messages that
pass through the socket’s LLC remain coherent.
The Intel Xeon Processor 7500 Series contains eight instances of the C-Box, each assigned to manage a
distinct 3MB, 24-way set associative slice of the processor’s total LLC capacity. For processors with
fewer than eight 3MB LLC slices, the C-Boxes for missing slices will still be active and track ring traffic
caused by their co-located core even if they have no LLC related traffic to track (i.e., hits, misses,
snoops).
Every physical memory address in the system is uniquely associated with a single C-Box instance via a
proprietary hashing algorithm that is designed to keep the distribution of traffic across the C-Box
instances relatively uniform for a wide range of possible address patterns. This enables the individual
C-Box instances to operate independently, each managing its slice of the physical address space without
any C-Box in a given socket ever needing to communicate with the other C-Boxes in that same socket.
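
The effect of such a hash can be illustrated with a toy example. The actual hashing algorithm is proprietary and is not described in this guide; the XOR-fold below is a stand-in chosen only to show how both sequential and strided address patterns can be spread evenly over eight C-Box instances. The 64-byte line size and the name toy_cbox_index are assumptions made for this sketch.

```python
from collections import Counter

CBOX_COUNT = 8        # eight C-Box instances, as stated in the text
CACHE_LINE_BITS = 6   # assumption: 64-byte cache lines


def toy_cbox_index(phys_addr):
    """Map a physical address to a C-Box index with a toy XOR-fold.

    This is NOT the processor's hash. It folds the cache line address
    three bits at a time so that many address bits influence the result.
    """
    line = phys_addr >> CACHE_LINE_BITS
    index = 0
    while line:
        index ^= line & (CBOX_COUNT - 1)
        line >>= 3
    return index


# 64 consecutive cache lines land evenly on the eight instances.
sequential = Counter(toy_cbox_index(a) for a in range(0, 64 * 64, 64))

# A 512-byte stride, which would hit a single instance if only the low
# three line-address bits were used, still reaches all eight.
strided = {toy_cbox_index(a) for a in range(0, 8 * 512, 512)}
```

Because the mapping depends only on the address, every request for a given line reaches the same C-Box, which is what lets the instances work without talking to each other.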
Each C-Box is uniquely associated with a single S-Box. All messages which a given C-Box sends out to
the system memory or Intel QPI pass through the S-Box that is physically closest to that C-Box.
2.3.2 C-Box Performance Monitoring Overview
Each of the C-Boxes in the Intel Xeon Processor 7500 Series supports event monitoring through six
48-bit wide counters (CBx_CR_C_MSR_PMON_CTR{5:0}). Each of these six counters can be programmed
to count any C-Box event. The C-Box counters can increment by a maximum of 5b per cycle.
For information on how to set up a monitoring session, refer to Section 2.1, “Global Performance
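
Because the counters are 48 bits wide, software that reads them periodically has to allow for wraparound. The following is a minimal sketch of the arithmetic only; it does not access any MSR. The helper names are hypothetical, and preloading a counter so that it overflows after a chosen number of events is a common sampling practice assumed here rather than something this section prescribes.

```python
COUNTER_BITS = 48                      # counter width stated in the text
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def counter_delta(previous, current):
    """Events counted between two raw reads, allowing for one wraparound."""
    return (current - previous) & COUNTER_MASK


def preload_value(events_until_overflow):
    """Initial value that makes a 48-bit counter wrap after N events."""
    if not 0 < events_until_overflow <= COUNTER_MASK + 1:
        raise ValueError("event count must fit in the counter width")
    return (COUNTER_MASK + 1 - events_until_overflow) & COUNTER_MASK


# A read just below the 48-bit limit followed by a small value means the
# counter wrapped in between; the masked difference is still correct.
wrapped = counter_delta(COUNTER_MASK - 1, 3)   # 5 events
```

The masked subtraction is only reliable if the counter is read at least once per full wrap, which is a generous interval for a 48-bit counter.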
2.3.2.1 C-Box PMU - Overflow, Freeze and Unfreeze
If an overflow is detected from a C-Box performance counter, the overflow bit is set at the box level
(C_MSR_PMON_GLOBAL_STATUS.ov), and forwarded up the chain towards the U-Box. If a C-Box0
counter overflows, a notification is sent and stored in S-Box0 (S_MSR_PMON_SUMMARY.ov_c_l) which,
in turn, sends the overflow notification up to the U-Box (U_MSR_PMON_GLOBAL_STATUS.ov_s0). Refer
to Table 2-26, “S_MSR_PMON_SUMMARY Register Fields” to determine how each C-Box’s overflow bit is
accumulated in the attached S-Box.
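
The notification chain described above can be written down as a short list of hops. This is a sketch of the path only, not of register contents. The grouping of four C-Boxes per S-Box is an assumption that this section does not state (it names only C-Box0 and S-Box0), and the exact summary field for each C-Box must be taken from Table 2-26.

```python
CBOX_COUNT = 8
CBOXES_PER_SBOX = 4   # assumption: not stated in this section


def overflow_path(cbox_id):
    """Return the (box, register) hops an overflow from one C-Box takes."""
    if not 0 <= cbox_id < CBOX_COUNT:
        raise ValueError("C-Box index out of range")
    sbox_id = cbox_id // CBOXES_PER_SBOX
    return [
        (f"C-Box{cbox_id}", "C_MSR_PMON_GLOBAL_STATUS"),  # box-level .ov bit
        (f"S-Box{sbox_id}", "S_MSR_PMON_SUMMARY"),        # field per Table 2-26
        ("U-Box", "U_MSR_PMON_GLOBAL_STATUS"),            # .ov_s<N> for that S-Box
    ]


# The C-Box0 case from the text: C-Box0, then S-Box0, then the U-Box.
path_for_cbox0 = overflow_path(0)
```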
Hardware can also be configured (by setting the corresponding .pmi_en to 1) to send a PMI to the U-Box
when an overflow is detected. The U-Box may be configured to freeze all uncore counting and/or send a
PMI to selected cores when it receives this signal.
Once a freeze has occurred, the overflow field responsible for it must be cleared before a new freeze
can be observed. This is done by setting the corresponding bit in C_MSR_PMON_GLOBAL_OVF_CTL.clr_ov.
Assuming all the counters have been locally enabled (.en bit in data registers meant to monitor events)
and the overflow bit(s) has been cleared, the C-Box is prepared for a new sample interval. Once the
global controls have been re-enabled (Section 2.1.4, “Enabling a New Sample Interval from Frozen),
counting will resume.
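
The clear-before-rearm rule can be modelled in a few lines. The sketch below is a software model, not driver code. It assumes one overflow bit per counter in C_MSR_PMON_GLOBAL_STATUS and a matching bit position in C_MSR_PMON_GLOBAL_OVF_CTL.clr_ov; the class and method names are hypothetical, and real code would use MSR reads and writes at addresses not given in this section.

```python
NUM_COUNTERS = 6   # six counters per C-Box, as stated in the text


class CBoxOverflowModel:
    """Toy model of one C-Box's overflow status and its clear control."""

    def __init__(self):
        # Models the .ov field; assumption: bit i corresponds to counter i.
        self.status_ov = 0

    def counter_overflowed(self, counter):
        """Record that the given counter has overflowed."""
        if not 0 <= counter < NUM_COUNTERS:
            raise ValueError("counter index out of range")
        self.status_ov |= 1 << counter

    def write_clr_ov(self, mask):
        """Model a write to .clr_ov: each 1 bit clears that overflow bit."""
        self.status_ov &= ~mask

    def ready_for_new_interval(self):
        """True when no overflow bit is left to block the next freeze."""
        return self.status_ov == 0


box = CBoxOverflowModel()
box.counter_overflowed(0)
box.counter_overflowed(5)
box.write_clr_ov(1 << 0)              # counter 5's bit is still pending
still_pending = not box.ready_for_new_interval()
box.write_clr_ov(1 << 5)              # now every overflow bit is clear
```

Clearing only some of the pending bits leaves the box unready, which is why software should clear every overflow bit that contributed to the freeze before re-enabling the global controls.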