
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring


Reference Number: 327043-001

"speed" (for example, 8.0 GT/s), the "transfers" here refer to "phits". Therefore, in L0, the system will transfer 1 "flit" at the rate of 1/4th the Intel® QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as "data" bandwidth. For example, when we are transferring a 64B cacheline across Intel® QPI, we will break it into 9 flits: 1 with header information and 8 that each carry 64 bits of actual "data" plus an additional 16 bits of other information. To calculate "data" bandwidth, one should therefore do: data flits * 8B / time for L0, or data flits * 4B / time for L0p.
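The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the flit counts are assumed to have been read from the counters already, and the elapsed time is assumed to be measured separately in seconds.

```python
# Minimal sketch of the bandwidth formulas in the text.
# All names here are illustrative; none come from the guide.

FLIT_BITS = 80       # bits per flit, per the text
DATA_BYTES_L0 = 8    # data bytes per data flit in full-width (L0) mode
DATA_BYTES_L0P = 4   # value the text gives for half-width (L0p) mode


def link_bandwidth_bytes_per_s(flits: int, seconds: float) -> float:
    """Raw link bandwidth: flits * 80b / time, expressed in bytes per second."""
    return flits * FLIT_BITS / 8 / seconds


def data_bandwidth_bytes_per_s(data_flits: int, seconds: float,
                               half_width: bool = False) -> float:
    """Data bandwidth: data flits * 8B / time (L0) or * 4B / time (L0p)."""
    bytes_per_flit = DATA_BYTES_L0P if half_width else DATA_BYTES_L0
    return data_flits * bytes_per_flit / seconds
```

For example, one million flits observed over one second corresponds to 10 MB/s of raw link traffic, while one million data flits over the same interval corresponds to 8 MB/s of data in L0.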

TxL_FLITS_G1

• Title: Flits Transferred - Group 1

• Category: FLITS_TX Events

• Event Code: 0x00

• Extra Select Bit: Y

• Max. Inc/Cyc: 2, Register Restrictions: 0-3

• Definition: Counts the number of flits transmitted across the Intel® QPI Link. This is one of three "groups" that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each "flit" is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four "phits", each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the phits are only 10 bits, and therefore it takes twice as many phits to transmit a flit. When one talks about Intel® QPI "speed" (for example, 8.0 GT/s), the "transfers" here refer to "phits". Therefore, in L0, the system will transfer 1 "flit" at the rate of 1/4th the Intel® QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as "data" bandwidth. For example, when we are transferring a 64B cacheline across Intel® QPI, we will break it into 9 flits: 1 with header information and 8 that each carry 64 bits of actual "data" plus an additional 16 bits of other information. To calculate "data" bandwidth, one should therefore do: data flits * 8B / time.
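As a worked illustration of the definition above, the following Python sketch derives the peak flit rate from the link speed and the share of a 64B cacheline transfer that is actual data. The names are hypothetical. The only inputs are figures stated in the definition: four transfers per flit in L0, twice as many in L0p, 80 bits per flit, and 9 flits per cacheline.

```python
# Illustrative arithmetic only; names are not from the guide.

TRANSFERS_PER_FLIT_L0 = 4    # full width: four 20-bit transfers per flit
TRANSFERS_PER_FLIT_L0P = 8   # half width: twice as many transfers per flit
FLIT_BYTES = 80 / 8          # 80 bits of information per flit
CACHELINE_BYTES = 64
FLITS_PER_CACHELINE = 9      # 1 header flit + 8 data flits


def peak_flits_per_s(gt_per_s: float, half_width: bool = False) -> float:
    """Peak flit rate for a link speed given in GT/s."""
    per_flit = TRANSFERS_PER_FLIT_L0P if half_width else TRANSFERS_PER_FLIT_L0
    return gt_per_s * 1e9 / per_flit


def cacheline_payload_fraction() -> float:
    """Fraction of the bits sent for one 64B cacheline that are data."""
    return CACHELINE_BYTES / (FLITS_PER_CACHELINE * FLIT_BYTES)
```

At 8.0 GT/s in L0 this gives 2e9 flits per second, which is 20 GB/s of raw link bandwidth at 10 bytes per flit. A cacheline transfer moves 64 data bytes in 90 bytes of flits, so roughly 71% of that traffic is data.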

Table 2-101. Unit Masks for TxL_FLITS_G0

Extension | umask [15:8] | Description

IDLE | bxxxxxxx1 | Idle and Null Flits: Number of flits transmitted over Intel® QPI that do not hold protocol payload. When Intel® QPI is not in a power saving state, it continuously transmits flits across the link. When there are no protocol flits to send, it will send IDLE and NULL flits across. These flits sometimes do carry a payload, such as credit returns, but are generally not considered part of the Intel® QPI bandwidth.

DATA | bxxxxxx1x | Data Tx Flits: Number of data flits transmitted over Intel® QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and non-coherent). This can be used to calculate the data bandwidth of the Intel® QPI link. One can get a good picture of the Intel® QPI link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This does not include the header flits that go in data packets.

NON_DATA | bxxxxx1xx | Non-Data protocol Tx Flits: Number of non-NULL non-data flits transmitted across Intel® QPI. This basically tracks the protocol overhead on the Intel® QPI link. One can get a good picture of the Intel® QPI link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This includes the header flits for data packets.
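The unit masks in the table are single bits, so several can be OR-ed together to count more than one flit type in one counter. The Python sketch below shows that composition and a simple breakdown of link traffic from the three counts. All names are hypothetical. Placing the event code in bits [7:0] is an assumption not stated on this page, and other control bits such as the enable and extra select bits are deliberately left out.

```python
# Sketch only: bit values follow Table 2-101; everything else is illustrative.

UMASK_BITS = {
    "IDLE": 0x01,      # bxxxxxxx1
    "DATA": 0x02,      # bxxxxxx1x
    "NON_DATA": 0x04,  # bxxxxx1xx
}


def umask_for(*extensions: str) -> int:
    """OR together the umask bits for the named extensions."""
    value = 0
    for name in extensions:
        value |= UMASK_BITS[name]
    return value


def control_bits(event_code: int, umask: int) -> int:
    """Place the umask in bits [15:8]; event code in bits [7:0] is assumed."""
    return ((umask & 0xFF) << 8) | (event_code & 0xFF)


def flit_breakdown(idle: int, data: int, non_data: int) -> dict:
    """Share of idle/null, data, and non-data protocol flits on the link."""
    total = idle + data + non_data
    if total == 0:
        return {"idle": 0.0, "data": 0.0, "non_data": 0.0}
    return {
        "idle": idle / total,
        "data": data / total,
        "non_data": non_data / total,
    }
```

For instance, `umask_for("DATA", "NON_DATA")` selects every flit that carries protocol traffic, and `flit_breakdown` turns three separately collected counts into the picture of link characteristics the table describes.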

Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Guide, Reference Number 327043-001, March 2012