INTEL® XEON® PROCESSOR 7500 SERIES UNCORE PROGRAMMING GUIDE
UNCORE PERFORMANCE MONITORING
2.3.4.3 The Queues:
There are seven internal occupancy queue counters, each of which is 5 bits wide and dedicated to its
queue: IRQ, IPQ, VIQ, MAF, RWRF, RSPF, IDF.
Note:
IDQ, ICQ, SRQ and IGQ occupancies are not tracked since they are mapped 1:1 to the
MAF and, therefore, cannot create back pressure.
Note that while the IRQ, IPQ, VIQ and MAF queues reside within the C-Box, the RWRF, RSPF and IDF
queues do not. Instead, they sit between the Core and the Ring, buffering messages as those messages
transit between the two. This distinction is useful because the queues located within the C-Box can
provide information about what is going on in the LLC with respect to the flow of transactions at the
point where they become “observed” by the coherence fabric (i.e., where the MAF is located).
Occupancy of these buffers indicates how many transactions the C-Box is tracking and where
bottlenecks manifest when the C-Box becomes busy or congested.
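
As a minimal sketch of the grouping described above, the seven tracked queues can be recorded together with where each one sits. The names `Location`, `TRACKED_QUEUES`, `COUNTER_MAX` and `queues_at` are hypothetical and exist only for this illustration; the only hardware facts used are the queue names, their locations, and the 5-bit counter width stated in the text.

```python
from enum import Enum


class Location(Enum):
    C_BOX = "inside the C-Box"
    CORE_RING = "between the Core and the Ring"


# Each occupancy counter is 5 bits wide, so the largest value it can hold is 31.
COUNTER_BITS = 5
COUNTER_MAX = (1 << COUNTER_BITS) - 1

# The seven queues with dedicated occupancy counters, keyed by name.
TRACKED_QUEUES = {
    "IRQ": Location.C_BOX,
    "IPQ": Location.C_BOX,
    "VIQ": Location.C_BOX,
    "MAF": Location.C_BOX,
    "RWRF": Location.CORE_RING,
    "RSPF": Location.CORE_RING,
    "IDF": Location.CORE_RING,
}


def queues_at(location):
    """Return the names of the tracked queues found at the given location."""
    return [name for name, loc in TRACKED_QUEUES.items() if loc is location]
```

For example, `queues_at(Location.C_BOX)` lists the queues whose occupancy reflects transactions at the point where the coherence fabric observes them.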
There is no need to explicitly reset the occupancy counters in the C-Box, since they have been counting
from reset de-assertion.
2.3.4.4 Detecting Performance Problems in the C-Box Pipeline:
IRQ occupancy counters should be used to track whether the C-Box pipeline is exerting back pressure on
the Core-request path. There is a one-to-one correspondence between the LLC requests generated by
the cores and the IRQ allocations. IPQ occupancy counters should be used to track whether the C-Box
pipeline is exerting back pressure on the Intel QPI-snoop path. There is a one-to-one correspondence
between the Intel QPI snoops received by the socket and the IPQ allocations in the C-Boxes. In both
cases, if the message is in the IRQ/IPQ, then the C-Box hasn’t acknowledged it yet and the request
hasn’t yet entered the LLC’s “coherence domain”. The message deallocates from the IRQ/IPQ at the
moment the C-Box acknowledges it. In optimal performance scenarios, where there are minimal
conflicts between transactions and loads are low enough to keep latencies close to idle, IRQ and IPQ
occupancies should remain very low.
One relatively common scenario in which IRQ back pressure will be high is worth mentioning: the IRQ
will back up when software demands data from memory at a rate that exceeds the available memory
bandwidth (BW). The IRQ is designed to be the place where the extra transactions wait for the U-Box’s
RTIDs to become available when memory becomes saturated. IRQ back pressure becomes interesting
when memory is not operating at or near peak sustainable BW. That can be a sign of a
performance problem that may be correctable with software tuning.
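
The reasoning above can be sketched as a small triage function. This is only an illustration: the function name `irq_pressure_hint`, its parameters, and both threshold defaults are assumptions chosen for the example, not values from this guide. The caller is assumed to have already derived an average IRQ occupancy and a measured memory bandwidth by some other means.

```python
def irq_pressure_hint(avg_irq_occupancy, mem_bw, peak_mem_bw,
                      occupancy_threshold=8.0, saturation_fraction=0.9):
    """Classify IRQ back pressure against memory bandwidth utilization.

    All thresholds are illustrative assumptions. mem_bw and peak_mem_bw
    must be expressed in the same unit.
    """
    if peak_mem_bw <= 0:
        raise ValueError("peak_mem_bw must be positive")

    # Low occupancy: the C-Box is keeping up with core requests.
    if avg_irq_occupancy < occupancy_threshold:
        return "low"

    # High occupancy while memory is saturated: requests are waiting
    # in the IRQ for RTIDs, which is the designed behavior.
    if mem_bw >= saturation_fraction * peak_mem_bw:
        return "expected"

    # High occupancy without memory saturation: worth a closer look,
    # possibly correctable with software tuning.
    return "investigate"
```

A result of `"investigate"` corresponds to the interesting case in the text: the IRQ is backing up even though memory is not near its peak sustainable bandwidth.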
One final warning on LLC pipeline congestion: when investigating performance issues that are
concentrated in the C-Box pipelines, take care not to blindly sum events across C-Boxes without also
checking the deviation across individual C-Boxes. Performance problems caused by congestion in the
C-Box pipelines should be rare, but if they do occur, the event counts may not be homogeneous
across the C-Boxes in the socket, and the average count across the C-Boxes may be misleading. If
performance issues are found in this area, it will be useful to know whether or not they are
localized to specific C-Boxes.
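
A minimal sketch of such a check follows, assuming the per-C-Box counts for one event have already been collected into a list indexed by C-Box. The function name `localized_cboxes` and the `rel_threshold` default are hypothetical. The median is used as the reference point so that a single congested C-Box does not shift the baseline.

```python
from statistics import mean, median


def localized_cboxes(counts, rel_threshold=0.5):
    """Return indices of C-Boxes whose count deviates strongly from the median.

    counts: one event count per C-Box, in C-Box order.
    rel_threshold: allowed deviation as a fraction of the median
                   (an illustrative assumption, not a documented value).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    mid = median(counts)
    if mid == 0:
        # With a zero baseline, any non-zero count stands out.
        return [i for i, c in enumerate(counts) if c != 0]
    return [i for i, c in enumerate(counts) if abs(c - mid) > rel_threshold * mid]


# Illustrative data only: seven quiet C-Boxes and one congested one.
counts = [100, 100, 100, 100, 100, 100, 100, 900]
total = sum(counts)            # 1600
average = mean(counts)         # 200, which matches no individual C-Box
hot = localized_cboxes(counts) # [7]
```

In this made-up data the sum and the average hide the fact that nearly all of the excess comes from one C-Box, which is the situation the warning above describes.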
2.3.5 C-Box Events Ordered By Code
Table 2-15 summarizes the directly measured C-Box events.
Table 2-15. Performance Monitor Events for C-Box Events
Symbol Name        Event Code   Max Inc/Cyc   Description
Ring Events
BOUNCES_P2C_AD     0x01         1             Number of P2C AD bounces.
BOUNCES_C2P_AK     0x02         1             Number of C2P AK bounces.
BOUNCES_C2P_BL     0x03         1             Number of C2P BL bounces.