
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring

Reference Number: 327043-001 

Program the .ev_sel and .umask bits in the control register with the encodings necessary to capture the requested event, along with any signal conditioning bits (.thresh/.edge_det/.invert) used to qualify the event.

e.g., Set C0_MSR_PMON_CTL2.{ev_sel, umask} to {0x03, 0x1} in order to capture LLC_VICTIMS.M_STATE in CBo 0’s C0_MSR_PMON_CTR2.

Note:

It is also important to program any additional filter registers used to further qualify the events (e.g., setting the opcode match field in Cn_MSR_BOX_FILTER to qualify TOR_INSERTS by a specific opcode).
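The event-programming step above can be sketched as a small encoder that packs the fields into a 32-bit control value. This is a minimal illustration, not a definitive implementation: the function name `encode_pmon_ctl` is hypothetical, and the bit positions used for each field are assumptions that should be checked against the register field definition table of the box being programmed, since field widths can differ between boxes.

```python
# Assumed field positions for a generic PMON control register.
# Verify each one against the field definition table for the target box.
EV_SEL_SHIFT = 0    # ev_sel, assumed bits 7:0
UMASK_SHIFT = 8     # umask, assumed bits 15:8
EDGE_DET_BIT = 18   # edge_det, assumed bit 18
EN_BIT = 22         # counter enable, assumed bit 22
INVERT_BIT = 23     # invert, assumed bit 23
THRESH_SHIFT = 24   # thresh, assumed bits 31:24


def encode_pmon_ctl(ev_sel, umask, thresh=0, edge_det=False,
                    invert=False, enable=True):
    """Pack event selection and signal conditioning fields into one value."""
    for name, field in (("ev_sel", ev_sel), ("umask", umask), ("thresh", thresh)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")
    value = (ev_sel << EV_SEL_SHIFT) | (umask << UMASK_SHIFT)
    value |= thresh << THRESH_SHIFT
    if edge_det:
        value |= 1 << EDGE_DET_BIT
    if invert:
        value |= 1 << INVERT_BIT
    if enable:
        value |= 1 << EN_BIT
    return value


# LLC_VICTIMS.M_STATE from the example above: ev_sel 0x03, umask 0x1.
llc_victims_m_state = encode_pmon_ctl(ev_sel=0x03, umask=0x1)
```

The resulting value would then be written to the control register paired with the chosen counter (C0_MSR_PMON_CTL2 in the example), using whatever MSR or PCI configuration access mechanism the platform provides.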

Back to the box level:

e) Reset counters in each box to ensure no stale values have been acquired from previous sessions. 

• For each CBo, set Cn_MSR_PMON_BOX_CTL[1:0] to 0x2.
• For each Intel® QPI Port, set Q_Py_PCI_PMON_BOX_CTL[1:0] to 0x2.
• Set PCU_MSR_PMON_BOX_CTL[1:0] to 0x2.
• For each Link, set R3QPI_PCI_PMON_BOX_CTL[1:0] to 0x2.
• Set R2PCIE_PCI_PMON_BOX_CTL[1:0] to 0x2.

Note:

The UBox does not have a Unit Control register, and neither the iMC nor the HA has a reset bit in its Unit Control register. The counters in the UBox, the HA, and each populated DRAM channel in the iMC will need to be manually reset by writing a 0 to each data register.
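The two reset paths in step e) can be sketched as follows. This is an illustration under stated assumptions: `with_counter_reset`, `zero_data_registers`, and the `fake_regs` dictionary are hypothetical names, and the dictionary merely stands in for real MSR or PCI configuration accesses. The sketch only shows the bit manipulation; it makes no claim about how the hardware reads back write-only bits.

```python
def with_counter_reset(box_ctl_value):
    """Return the box control value with bits [1:0] replaced by 0x2."""
    return (box_ctl_value & ~0x3) | 0x2


def zero_data_registers(write_reg, counter_names):
    """Manual reset for boxes without a reset bit: write 0 to each data register."""
    for name in counter_names:
        write_reg(name, 0)


# In-memory stand-in for hardware registers (hypothetical names and values).
fake_regs = {"BOX_CTL": 0x10100, "CTR0": 1234, "CTR1": 5678}

# Boxes with a reset field: set BOX_CTL[1:0] to 0x2.
fake_regs["BOX_CTL"] = with_counter_reset(fake_regs["BOX_CTL"])

# UBox, HA, and iMC channels: clear each counter explicitly.
zero_data_registers(fake_regs.__setitem__, ["CTR0", "CTR1"])
```

Passing the write function as a parameter keeps the sketch independent of the access method, so the same loop could drive an MSR write for the UBox or a PCI configuration write for the HA and iMC.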

Back to the box level:

f) Commence counting at the box level by unfreezing the counters in each box

e.g., set Cn_MSR_PMON_BOX_CTL.frz to 0

And with that, counting will begin.
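A minimal sketch of the freeze and unfreeze bit manipulation is shown here. The helper names are hypothetical, and the positions of the .frz and .frz_en bits are assumptions to be confirmed against the box control register field definitions for each box.

```python
FRZ_BIT = 8       # assumed position of .frz in *_PMON_BOX_CTL
FRZ_EN_BIT = 16   # assumed position of .frz_en in *_PMON_BOX_CTL


def freeze(box_ctl_value):
    """Return the value with .frz_en and .frz both set to 1."""
    return box_ctl_value | (1 << FRZ_EN_BIT) | (1 << FRZ_BIT)


def unfreeze(box_ctl_value):
    """Return the value with .frz cleared, leaving .frz_en as it was."""
    return box_ctl_value & ~(1 << FRZ_BIT)


frozen = freeze(0x0)
running = unfreeze(frozen)
```

Applying `unfreeze` to each box's control value in turn corresponds to step f); `freeze` is the inverse operation used before counters are read.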

Note:

The UBox does not have a Unit Control register. Once enabled and programmed with a valid event, its counters will be collecting events. For somewhat better synchronization, a user can keep U_MSR_PMON_CTL.ev_sel at 0x0 while enabled and write it with a valid value just prior to unfreezing the registers in other boxes.

2.1.2 Reading the Sample Interval

Software can poll the counters whenever it chooses.

a) Polling - before reading, it is recommended that software freeze the counters in each box in which counting is to take place (by setting *_PMON_BOX_CTL.frz_en and .frz to 1). After reading the event counts from the counter registers, the monitoring agent can choose to reset the event counts to avoid event-count wrap-around, or resume the counter registers without resetting their values. The latter choice will require the monitoring agent to check and adjust for potential wrap-around situations.
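The wrap-around adjustment for counters that are resumed without being reset can be sketched with modular arithmetic. The function name `counter_delta` is hypothetical, and the sketch assumes the counter wraps at most once between two reads. The counter width depends on the box (the guide describes both 44-bit and 48-bit counters), so it is passed as a parameter.

```python
def counter_delta(previous, current, width_bits=48):
    """Events counted between two reads of a free-running counter.

    Assumes at most one wrap-around occurred between the reads.
    """
    modulus = 1 << width_bits
    return (current - previous) % modulus


# No wrap: the counter simply advanced.
no_wrap = counter_delta(100, 250)

# Wrap: a 48-bit counter rolled over from near its maximum back past zero.
wrapped = counter_delta((1 << 48) - 10, 5)
```

If the polling interval is long enough for a counter to wrap more than once, this calculation undercounts, which is one reason to poll more often or to reset the counts after each read.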
