Sun Microsystems UltraSPARC-I, User Manual, Page: 289 / 410

Sun Microelectronics
274

UltraSPARC User’s Manual

see later, this is desirable not only for improving the D-Cache hit rate (by
increasing its utilization density), but also for D-Cache misses where, for
sequential accesses, one out of two requests to the E-Cache can be eliminated.
Grouping load data beyond a D-Cache sub-block is also desirable, since an
E-Cache line contains four D-Cache sub-blocks (for a total of 64 bytes). Thus,
sequential accesses can guarantee that only one E-Cache miss will occur for
loads that access up to four consecutive D-Cache sub-blocks (two D-Cache
lines). Section 16.3.6 discusses how code scheduled for accessing data directly
out of the E-Cache can hide the extra latency introduced by D-Cache misses.
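
The block arithmetic in the paragraph above can be checked with a small C
sketch. The function name blocks_touched and the two constants are
illustrative, not part of any UltraSPARC interface; the 16-byte sub-block size
follows from the statement that four sub-blocks total 64 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the text: an E-Cache line holds four D-Cache sub-blocks. */
enum {
    DCACHE_SUBBLOCK_BYTES = 16,
    ECACHE_LINE_BYTES     = 64
};

/*
 * Hypothetical helper: number of distinct aligned blocks of `block` bytes
 * covered by the byte range [addr, addr + len). Requires len > 0.
 */
static uint64_t blocks_touched(uint64_t addr, uint64_t len, uint64_t block)
{
    assert(len > 0 && block > 0);
    uint64_t first = addr / block;
    uint64_t last  = (addr + len - 1) / block;
    return last - first + 1;
}
```

For a 64-byte run that starts on a 64-byte boundary, blocks_touched(addr, 64,
DCACHE_SUBBLOCK_BYTES) is 4 while blocks_touched(addr, 64, ECACHE_LINE_BYTES)
is 1: four sub-blocks are served by a single E-Cache line. The same run started
at an unaligned address would straddle two E-Cache lines.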

Data alignment (right justification) for byte, halfword, and word accesses does
not add latency to the loads (unless superseded by the sign rule described in
Section 16.3.2.1, “Signed Loads”). This is true whether the load goes to the
register file or to internal pipeline bypasses.

16.3.4 Direct-Mapped Cache Considerations

A direct-mapped cache is more susceptible to collisions than a set-associative
cache. It is possible, however, to organize data at compile time so that
collisions are minimized. For frequently executed loops, the compiler should
organize the data so that all accesses within the loop map to different cache
lines, unless an access falls in a line that is already mapped and refers to
the same physical line. For UltraSPARC, this means that accesses should differ
in the virtual address bits VA<13:5>. Hot spots can be detected by configuring
the on-chip counters to accumulate D-Cache accesses and D-Cache misses. The
counters can be turned on before and off after the load of interest, or around
a series of loads where hot spots are suspected to occur.
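
As a rough illustration of the VA<13:5> rule, the following C sketch extracts
the nine index bits and flags two addresses that would compete for the same
D-Cache line. The function names are hypothetical, and the check compares
virtual line numbers as a stand-in for the "same physical line" condition; it
assumes the two addresses are not virtual aliases of one physical line.

```c
#include <assert.h>
#include <stdint.h>

/* VA<13:5>: nine bits that select one of 512 lines of 32 bytes each. */
static unsigned dcache_index(uint64_t va)
{
    return (unsigned)((va >> 5) & 0x1FFu);
}

/*
 * Hypothetical check: returns 1 when two addresses select the same D-Cache
 * line but belong to different 32-byte lines in memory, so that they would
 * evict each other in a direct-mapped cache. Returns 0 otherwise.
 */
static int dcache_conflict(uint64_t a, uint64_t b)
{
    return dcache_index(a) == dcache_index(b) && (a >> 5) != (b >> 5);
}
```

Two arrays whose bases differ by a multiple of 16 Kbytes (0x4000) and are
walked in step would collide on every access under this model, so
dcache_conflict(0x10000, 0x14000) is 1. Offsetting one base by at least one
32-byte line removes the collision for that pair.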

16.3.5 D-Cache Miss, E-Cache Hit Timing

Under normal circumstances (for example, no snoops, no arbitration conflict for
the E-Cache bus, etc.), loads that hit the E-Cache are returned N cycles later than
loads that hit the D-Cache, where N is determined by the E-Cache SRAM mode.
Table 16-1 shows the latency for all supported SRAM Modes. (See Section 1.3.9.1,
“E-Cache SRAM Modes,” on page 9 for more information, including which
modes are supported by each UltraSPARC model.)

Table 16-1  D-Cache Miss, E-Cache Hit Latency Depends on SRAM Mode

SRAM Mode      1–1–1    2–2
# of Cycles    6        7
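
For use in a scheduling or cost model, Table 16-1 can be encoded as a lookup.
This is only a sketch: the enum and function names are invented for
illustration, and the values apply under the normal circumstances described
above (no snoops, no arbitration conflict for the E-Cache bus).

```c
#include <assert.h>

/* Hypothetical names for the two SRAM modes listed in Table 16-1. */
enum ecache_sram_mode {
    SRAM_MODE_1_1_1,
    SRAM_MODE_2_2
};

/*
 * Extra cycles (N) for a load that misses the D-Cache and hits the E-Cache,
 * relative to a D-Cache hit, as given in Table 16-1.
 */
static int ecache_hit_extra_cycles(enum ecache_sram_mode mode)
{
    switch (mode) {
    case SRAM_MODE_1_1_1: return 6;
    case SRAM_MODE_2_2:   return 7;
    }
    assert(0 && "unsupported SRAM mode");
    return -1;
}
```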

Summary of Contents for UltraSPARC-I

Page 1: ...service in house repair center WE BUY USED EQUIPMENT Sell your excess underutilized and idle used equipment We also offer credit for buy backs and trade ins www artisantg com WeBuyEquipment REMOTE IN...

Page 2: ...02 This July 1997 02 Revision is only available on line The only changes made were to support hypertext links in the pdf file UltraSPARC User sManual UltraSPARC I UltraSPARC II July 1997 Artisan Techn...

Page 3: ...ms Inc Sun Sun Microsystems and the Sun logo are trademarks or registered trademarks of Sun Microsystems Inc in the United States and other countries All SPARC trademarks are used under license and ar...

Page 4: ...ent Overview 5 1 4 UltraSPARC Subsystem 10 2 Processor Pipeline 11 2 1 Introductions 11 2 2 Pipeline Stages 12 3 Cache Organization 17 3 1 Introduction 17 4 Overview of the MMU 21 4 1 Introduction 21...

Page 5: ...Interfaces 73 7 1 Introduction 73 7 2 Overview of UltraSPARC External Interfaces 73 7 3 Interaction Between E Cache and UDB 76 7 4 SYSADDR Bus Arbitration Protocol 84 7 5 UltraSPARC Interconnect Tran...

Page 6: ...C Data Buffer UDB Control Register 185 11 5 Overwrite Policy 185 Section III UltraSPARC and SPARC V9 12 Instruction Set Summary 189 13 UltraSPARC Extended Instructions 195 13 1 Introduction 195 13 2 S...

Page 7: ...ons 290 17 8 Floating Point and Graphic Instructions 295 Appendixes Debug and Diagnostics Support 303 A 1 Overview 303 A 2 Diagnostics Control and Accesses 303 A 3 Dispatch Control Register 303 A 4 Fl...

Page 8: ...on 337 E 2 Pin Descriptions 337 E 3 Signal Descriptions 341 F ASI Names 345 F 1 Introduction 345 G Differences Between UltraSPARC Models 351 G 1 Introduction 351 G 2 Summary 351 G 3 References to Mode...

Page 9: ...Sun Microelectronics viii UltraSPARC User s Manual Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artisantg com...

Page 10: ...rs Extensions to and implementation dependencies of the SPARC V9 architecture Techniques for managing the pipeline and for producing optimized code A Brief History of SPARC SPARC stands for Scalable P...

Page 11: ...dependencies are introduced in The SPARC Architecture Manual Version 9 they are numbered throughout the body of the text and are cross referenced in Appendix C that book This book the UltraSPARC User...

Page 12: ...he following notational conventions are used Square brackets indicate a numbered register in a register file Angle brackets indicate a bit number or colon separated range of bit numbers within a field...

Page 13: ...ter 9 Interrupt Handling describes how UltraSPARC processes interrupts Chapter 10 Reset and RED_state describes how UltraSPARC handles the various SPARC V9 reset conditions and how it implements RED_s...

Page 14: ...low level technical material or information not needed for a general understanding of the architecture The manual contains the following ap pendixes Appendix A Debug and Diagnostics Support describes...

Page 15: ...Sun Microelectronics 14 UltraSPARC User s Manual Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artisantg com...

Page 16: ...IntroducingUltraSPARC 1 UltraSPARC Basics 3 2 Processor Pipeline 11 3 Cache Organization 17 4 Overview of the MMU 21 Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artis...

Page 17: ...Sun Microelectronics 2 UltraSPARC User s Manual Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artisantg com...

Page 18: ...is a full implementation of the 64 bit SPARC V9 architecture It sup ports a 44 bit virtual address space and a 41 bit physical address space The core instruction set has been extended to include grap...

Page 19: ...e data cache or 8 bytes per cycle into the register files To reduce instruction dependency stalls UltraSPARC has short latency opera tions and provides direct bypassing between units or within the sam...

Page 20: ...struction Translation Lookaside Buffer iTLB and a 64 entry Data Translation Ext Cache RAM Prefetch and Dispatch Unit PDU Integer Execution Unit IEU Floating Point Unit FPU Graphics Unit GRU Instructio...

Page 21: ...refetch across conditional branches a dynamic branch prediction scheme is implemented in hardware The outcome of a branch is based on a two bit history of the branch A next field associated with every...

Page 22: ...ns are not pipelined and take 12 22 cycles single double to execute but they do not stall the processor Other in structions following the divide square root can be issued executed and retired to the r...

Page 23: ...g 16Kb direct mapped cache with two 16 byte sub blocks per line It is virtually indexed and physically tagged VIPT The tag array is dual ported so tag updates due to line fills do not collide with tag...

Page 24: ...supports The modes are described below 1 1 1 Pipelined Mode The E Cache SRAMS have a cycle time equal to the processor cycle time The name 1 1 1 indicates that it takes one processor clock to send th...

Page 25: ...subsystem which consists of the UltraSPARC processor synchronous SRAM components for the E Cache tags and data and two UltraSPARC Data Buffer UDB chips The UDBs isolate the E Cache from the system pro...

Page 26: ...line This simplifies pipeline synchronization and ex ception handling It also eliminates the need to implement a floating point queue Floating point instructions with a latency greater than three divi...

Page 27: ...e Stages Detail X1 IU Register File E C N1 N2 G D Cache TLB FP add FP RF 32 x 64 IST_data Icc FPST_data Annex D FPU IEU DU G ALU FP mul G mul GRU address bus data bus instruction bus LSU Tag Tag Check...

Page 28: ...nd then sent to the Instruction Buffer The pre decoded bits generated during this stage accompany the instruc tions during their stay in the Instruction Buffer Upon reaching the next stage where the g...

Page 29: ...that the data can be forwarded to dependent instruc tions in the pipeline as soon as possible ALU operations executed in the E Stage generate condition codes in the C Stage The condition codes are se...

Page 30: ...d to the data portion of the Store Buffer All loads that have entered the Load Buffer in N1 continue their progress through the buffer they will reappear in the pipeline only when the data comes back...

Page 31: ...Sun Microelectronics 16 UltraSPARC User s Manual Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artisantg com...

Page 32: ...are used to index into the I Cache tag and data arrays while accessing the I MMU that is the iTLB The resulting tag is compared against the translated physical address to determine I Cache hits 3 1 1...

Page 33: ...16 byte sub blocks per line Data accesses bypass the data cache when the D Cache enable bit in the LSU_Control_Register is clear see Section A 6 LSU_Control_Register on page 306 Load misses will not a...

Page 34: ...vide a noncacheable ECC less scratch memory for use of the booting code until the MMUs are enabled The E Cache is a unified write back allocating direct mapped cache The E Cache always includes the co...

Page 35: ...Sun Microelectronics 20 UltraSPARC User s Manual Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artisantg com...

Page 36: ...om features required of SPARC V8 Reference MMUs 4 2 Virtual Address Translation The UltraSPARC MMU supports four page sizes 8 Kb 64 Kb 512 Kb and 4 Mb It supports a 44 bit virtual address space with 4...

Page 37: ...either all zeros or all ones Figure 4 2 on page 23 illustrates the UltraSPARC virtual address space 0 0 12 12 13 13 63 40 8K byte Virtual Page Number 8K byte Physical Page Number Page Offset Page Offs...

Page 38: ...Translation Storage Buffer TSB which acts like a direct mapped cache is the in terface between the two The TSB can be shared by all processes running on a processor or it can be process specific The...

Page 39: ...se when multiple mappings from one VA context to multiple PAs produce a multi ple TLB match is not detected in hardware it produces undefined results Note The hardware ensures the physical reliability...

Page 40: ...ternal Architecture 41 7 UltraSPARC External Interfaces 73 8 Address Spaces ASIs ASRs and Traps 145 9 Interrupt Handling 161 10 Reset and RED_state 169 11 Error Handling 175 Artisan Technology Group Q...

Page 41: ...Sun Microelectronics 26 UltraSPARC User s Manual Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artisantg com...

Page 42: ...ization of memory accesses Accesses to addresses that cause side effects I O accesses Non faulting loads Instruction prefetching Load and store buffers This chapter only address coherence in a uniproc...

Page 43: ...locks from the I and D Caches because UltraSPARC main tains inclusion between the external and internal caches See Section 5 2 2 Com mitting Block Store Flushing on page 29 2 1 Address Aliasing Flushi...

Page 44: ...ection 13 6 4 Block Load and Store Instructions on page 230 5 2 3 Displacement Flushing Cache flushing also can be accomplished by a displacement flush This is done by reading a range of read only add...

Page 45: ...C a MEMBAR Lookaside executes more efficiently than a MEMBAR StoreLoad 3 1 1 Cacheable Accesses Accesses that fall within the coherence domain are called cacheable accesses They are implemented in Ult...

Page 46: ...correct ordering between the cacheable and noncacheable domains explicit memory synchronization is needed in the form of MEMBARs or atomic instructions Code Example 5 1 illustrates the issues in volve...

Page 47: ...EMBARs at both 1 and 2 are needed 3 2 Memory Synchronization MEMBAR and FLUSH The MEMBAR STBAR in SPARC V8 and FLUSH instructions are provide for ex plicit control of memory ordering in program execut...

Page 48: ...emIssue Forces all outstanding memory accesses to be completed before any memory ac cess instruction after the MEMBAR is issued It must be used to guarantee order ing of cacheable accesses following n...

Page 49: ...on immediately af ter the FLUSH 3 3 Atomic Operations SPARC V9 provides three atomic instructions to support mutual exclusion These instructions behave like both a load and a store but the operations...

Page 50: ...CASX Instruction Compare and swap combines a load compare and store into a single atomic in struction It compares the value in an integer register to a value in memory if they are equal the value in m...

Page 51: ...ng loads allow the null pointer to be accessed safely in a read ahead fashion if the OS can ensure that the page at virtual address 016 is accessed with no penalty The NFO non fault access only bit in...

Page 52: ...fined in The SPARC Architecture Manual Version 9 A data_access_MMU_miss exception D MMU disabled For PREFETCHA any ASI other than the following 0416 0C16 1016 1116 1816 1916 8016 8316 8816 8B16 Attemp...

Page 53: ...ads or stores to pages that have this bit set have the following behavior Noncacheable accesses are strongly ordered with respect to each other Noncacheable loads with the E bit set will not be issued...

Page 54: ...quires the following A MEMBAR Sync is needed after an internal ASI store other than MMU ASIs before the point that side effects must be visible This MEMBAR must precede the next load or noninternal st...

Page 55: ...on instructions MEMBAR and STBAR are entered into the Store Buffer 5 1 Stores Delayed by Loads The store buffer normally has lower priority than the load buffer when arbitrat ing for the D Cache or E...

Page 56: ...ache the tag is used to determine whether there is a hit in the TSB If there is a hit the data is fetched by software Figure 6 1 Translation Table Entry TTE from TSB G Global If the Global bit is set...

Page 57: ...LT _LITTLE ASI_SECONDARY_NO_FAULT _LITTLE are translated Any other access will trap with a data_access_exception trap FT 1016 The NFO bit in the I MMU is read as zero and ignored when written If this...

Page 58: ...tware must ensure that at least one entry is not locked when replacing a TLB entry otherwise the last TLB entry will be replaced CP CV The cacheable in physically indexed cache and cacheable in virtua...

Page 59: ...ftware The Global Privileged and Writable fields replace the 3 bit ACC field of the SPARC V8 Reference MMU Page Translation Entry 3 Translation Storage Buffer TSB The TSB is an array of TTEs managed e...

Page 60: ...amic sharing of the level 2 cache resource should provide a better overall solution than that provided by a fixed partitioning Figure 6 2 shows both the common and shared TSB organization The constant...

Page 61: ...e 55 Note that there are no separate physical registers in UltraSPARC hardware for the Pointer registers but rather they are implemented through a dynamic re ordering of the data stored in the Tag Acc...

Page 62: ...ling of TLB misses For the following traps the trap handler is presented with a special set of MMU globals fast_ instruction da ta _access_MMU_miss instruction data _access_exception and fast_data_acc...

Page 63: ...l Address Space on page 237 Note that the case of JMPL RETURN and branch CALL sequential are handled differently The contents of the I Tag Access Register are undefined in this case but are not needed...

Page 64: ...S_BYPASS_EC_WITH_EBIT _LITTLE ASIs In this case SFSR FT 0416 6 4 5 Data_access_protection Trap This trap occurs when the MMU detects a protection violation for a data access A protection violation is...

Page 65: ...r atomic Also access to UltraSPARC internal registers other than LDXA LDFA STDFA or STXA except for I Cache diagnostic accesses other than LDDA STDFA or STXA See Section 8 3 2 UltraSPARC Non SPARC V9...

Page 66: ...t The MMU signals a data_access_exception trap FT 2016 for this case Table 6 4 D MMU Operations for Normal ASIs Condition Behavior Opcode PRIV Mode ASI W TLB Miss E 0 P 0 E 0 P 1 E 1 P 0 E 1 P 1 Load...

Page 67: ...e Primary Context identifier there is no I MMU Primary Context register Note The endianness of a data access is specified by three conditions the ASI specified in the opcode or ASI register the PSTATE...

Page 68: ...ASI_NUCLEUS Table 6 7 ASI Mapping for Data Accesses Condition for Data Access Access Processed with Opcode PSTATE TL PSTATE CLE D MMU IE Endianness ASI Value Recorded in SFSR LD ST Atomic FLUSH 0 0 0...

Page 69: ...gs can be found in Section 6 10 MMU Bypass Mode on page 68 However if a bypass ASI is used while the D MMU is disabled the bypass operation behaves as it does when the D MMU is enabled that is the acc...

Page 70: ...e conditions 6 9 MMU Internal Registers and ASI Operations 6 9 1 Accessing MMU Registers All internal MMU registers can be accessed directly by the CPU through UltraSPARC defined ASIs Several of the r...

Page 71: ...are must guarantee that the VA is within range Writes to the TSB register Tag Access register and PA and VA Watchpoint Ad dress Registers are not checked for out of range VA No matter what is written...

Page 72: ...ts of the missing virtual address 6 9 3 Context Registers The context registers are shared by the I and D MMUs The Primary Context Register is defined as follows Figure 6 4 D MMU Primary Context Regis...

Page 73: ...field records the 8 bit ASI associated with the faulting instruction This field is valid for both D MMU and I MMU SFSRs and for all traps in which the FV bit is set JMPL and RETURN mem_address_not_ali...

Page 74: ...ys reads as 0 in the I MMU SFSR OW Overwrite Set to one when the MMU detects a fault if the Fault Valid bit Table 6 11 MMU Synchronous Fault Status Register FT Fault Type Field FT 6 0 Fault Type 0116...

Page 75: ...ns the virtual address that was not found in the I MMU TLB For instruction_access_exception traps privilege violation fault type TPC con tains the virtual address of the instruction in the privileged...

Page 76: ...emble the trapping instruction 6 9 6 I D Translation Storage Buffer TSB Registers The TSB registers provide information for the hardware formation of TSB point ers and tag target to assist software in...

Page 77: ...a it may load an incorrect TTE I D TSB_Size The Size field provides the size of the TSB according to the following Number of entries in the TSB or each TSB if split 512 2TSB_Size Number of entries in...

Page 78: ...e location of the missing or trapping TTE in the software maintained TSB The TSB 8 Kb and 64 Kb Pointer registers provide the possible locations of the 8 Kb and 64 Kb TTE re spectively The Direct Poin...

Page 79: ...d stores on the Tag Access register and the TLB Table 6 13 Effect of Loads and Stores on MMU Registers Software Operation Effect on MMU Physical Registers ad Store Register TLB tag TLB data Tag Access...

Page 80: ...Entry The TLB Entry number to be accessed in the range 0 63 The format for the Tag Read register is as follows Figure 6 14 I D MMU TLB Tag Read Registers I D VA 63 13 The 51 bit virtual page number P...

Page 81: ...d TLB entry ASI loads from the TLB Data In register are not supported 9 10 I D MMU Demap Demap is an MMU operation as opposed to a register as described above The purpose of Demap is to remove zero on...

Page 82: ...e data demap registers requires either a MEMBAR Sync FLUSH DONE or RETRY before the point that the effect must be visible to data accesses A STXA to the I MMU demap registers requires a FLUSH DONE or...

Page 83: ...from the specified TLB If the TTE Global bit is set the TTE is not removed 10 MMU Bypass Mode In a bypass access the D MMU sets the physical address equal to the truncated virtual address that is PA 4...

Page 84: ...In Data Access Tag Read Registers on page 64 Write operation The TLB simultaneously writes the CAM and RAM portion of the specified entry or the entry given by the replacement policy described in Sec...

Page 85: ...ardware Description The hardware diagram in Figure 6 16 on page 70 and the code fragment in Code Example 6 1 on page 71 describe the generation of the 8 Kb and 64 Kb pointers in more detail Figure 6 1...

Page 86: ...Mask marks the bits from TSB Base Reg TSBBaseMask 0xffffffffffffe000 split TSBSize 1 TSBSize Shift va towards lsb appropriately and zero out the original va page offset vaPortion va type 8K_POINTER 9...

Page 87: ...Sun Microelectronics 72 UltraSPARC User s Manual Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artisantg com...

Page 88: ...how to obtain the data sheet 7 2 Overview of UltraSPARC External Interfaces Figure 7 1 on page 74 shows the UltraSPARC s main interfaces Model dependent interface lengths are labeled in italics inste...

Page 89: ...an interconnect master and an interconnect slave As an interconnect master UltraSPARC issues read write transactions to the interconnect using part of the transaction set Section 7 5 As a master it al...

Page 90: ...s paper discussing this algorithm is documented in the Bibliography The UDBs generate ECC when sending data and check the ECC when receiving data The SYSDATA transaction set supports both 64 byte blo...

Page 91: ...UltraSPARC to the system by hiding system latency for example for Writebacks and noncacheable stores The UDB supports multiple outstanding transactions to increase overall bandwidth The UDB also handl...

Page 92: ...the E Cache Store buffer All cacheable stores go to the E Cache because the D Cache is write through the order of stores with respect to loads is determined by the memory ordering model Prefetch unit...

Page 93: ...tisfy copyback requests from the system Table 7 5 shows the number of Writeback buffer entries for each UltraSPARC model Note Models that support more than one Writeback buffer entry can be restricted...

Page 94: ...ments Notice that the reads are fully pipelined thus full throughput is achieved Three requests are made before the data of the first request comes back and the latency of each re quest is three cycle...

Page 95: ...The data address is presented on the ECAD pins in the cycle after the request cycle 6 for W0 and the data is sent in the following cycle cycle 7 Separating the ad dress and the data by one cycle redu...

Page 96: ...ated to Modified M state at the same time that the data is written as shown in Figure 7 7 on page 82 1 1 1 CLK CYCLE 0 1 2 3 4 5 6 7 8 9 TSYN_WR_L R0 R1 R2 TOE_L R0 R1 R2 ECAT A0_tag A1_tag A2_tag TDA...

Page 97: ...1 1 1 Mode overlap of tag and data accesses The data for three previous writes W0 W1 and W2 is written while three tag accesses reads are made for three younger stores R3 R4 and R5 Figure 7 8 Timing O...

Page 98: ...force an extra dead cycle while the E Cache data bus driver is switched from the SRAMs to the UltraSPARC UltraSPARC uses a one deep write buffer in the data SRAMs to reduce the read to write turn aro...

Page 99: ...poten tial drivers the same enable logic can and should be used for both Holding am plifiers in the System Controller must maintain the last state of Addr_Valid whenever UltraSPARC or the SC stop driv...

Page 100: ...e inside the SC or UltraSPARC All tristate output enables for the SYSADDR bus and Addr_Valid are registered This requires the protocol to be described as a pipeline where only the state of the request...

Page 101: ...lowing the de assertion of RESET_L 3 The UltraSPARC for which LAST PORT DRIVER port_ID 1 0 can take advantage of a rule that allows request then drive Otherwise the UltraSPARC will minimally see a req...

Page 102: ...DRIVER can drive without being dependent on possible simultaneously asserted requests Fairness is provided by the release request in presence of another request rule for example a request from anothe...

Page 103: ...DRIVER Addr_Valid tells the SC when the CURRENT DRIVER is driving a valid packet it is needed because the CURRENT DRIVER may keep its request asserted for longer than the minimum time required to deli...

Page 104: ...13 4 cycles if the CURRENT DRIVER must be forced off Figure 7 14 Figure 7 12 shows the timing in a uniprocessor system with the UltraSPARC driving back to back packets in the absence of a request from...

Page 105: ...ycle however and Port1 becomes CURRENT DRIVER as a result Figure 7 14 Arbitration CURRENT DRIVER Loses Ownership While Asserting Request Figure 7 15 on page 91 shows the timing when the SC takes owner...

Page 106: ...is al lowed to drive its packet s after one arbitration cycle 0 0 0 0 0 SC drives SYSADDR Addr_Valid 0 Undriven Addr_Valid 0 SYSADDR Port0 drives LAST PORT DRIVER Req 0 SC Request SYSADDR Addr_Valid 0...

Page 107: ...ed only to cacheable memory UltraSPARC splits P_REQ transactions into two independent classes Class 0 contains read transactions due to cacheable misses and block loads Class 1 contains Writeback requ...

Page 108: ...receives an S_REPLY for that line The SC must not issue an S_REPLY for a request with the same cache index that is for each coherent read or Writeback during the window between an S_REQ and P_REPLY f...

Page 109: ...cache coherence protocol operates on Physically Indexed Physically Tagged PIPT writeback caches The E Cache maintains inclusion for both the I Cache and the D Cache that is all lines in the internal c...

Page 110: ...g are invariants for the state transitions 1 Only one cache in the system can ever have the line in E or M state while a line is in E or M state no other cache can have a copy of that line 2 Only one...

Page 111: ...line that is already in its cache this includes P_RDD_REQ Figure 7 20 on page 95 shows that some transitions are caused by the PREFETCH A instructions which are not supported by all UltraSPARC models...

Page 112: ...KD S M Store hit atomic hit to Shared Clean line PREFETCH P_RDO_REQ S_OAK S I i A Shared Clean line is victimized by UltraSPARC I Cache miss Write hit on shared line P_RDS_REQ or P_RDSA_REQ or P_RDO_R...

Page 113: ...making the read with Writeback complete atomically this is described later Figure 7 21 illustrates a system that uses Dtags to maintain cache coherence the system contains multiple UltraSPARCs one Dt...

Page 114: ...mize block A for block B then block B will simply overwrite block A in the Etags and the Dtags for UltraSPARCk In this case the writeback buffer and DtagTB would not be used for this transaction since...

Page 115: ...h it sent an S_REQ before S_REPLYing to the original requesting UltraSPARC In general the SC does not complete the original transaction until all of the related S_REQs are P_REPLYed Implementations ma...

Page 116: ...hat UltraSPARC was initiating or had an outstanding P_WRB_REQ to the same address 40 6 Since some other writer has ownership this Writeback should not complete to memory because the other writer s mod...

Page 117: ...he has this datum that is if this is the first read of the datum then Etag transitions to E This gives exclusive access to the requesting UltraSPARC to later write this datum without generating anothe...

Page 118: ...a store hit or atomic hit on a shared line Etag transitions to M For a store miss or atomic miss SC gets data from memory or another processor and provides it to UltraSPARC with the S_RBU reply after...

Page 119: ...that each UltraSPARC model supports 7 4 1 Error Handling The system can reply with S_RTO time out typically if the address is for unim plemented memory or S_ERR bus error typically if the access is il...

Page 120: ...pts to report write failures 7 7 6 WriteInvalidate P_WRI_REQ Coherent Write and Invalidate request Generated by UltraSPARC for a block store to an S O or I state line or a block store commit to a line...

Page 121: ...P_SACKD if the block has been victimized from the E Cache but not yet written back P_SNACK if the block is not present in the E Cache or the writeback buffer UltraSPARC responds more quickly if NDP 0...

Page 122: ...undefined data in response to the S_CRAB UltraSPARC responds more quickly if NDP 0 SC should assert NDP only in sys tems that do not support Dtags Section 7 10 S_REQ on page 111 for more tim ing info...

Page 123: ...st SC can send its next coherent request on the cycle after the S_CRAB reply 7 10 CopybackToDiscard S_CPD_REQ Non destructive copyback request from SC to UltraSPARC Generated by SC to service a ReadTo...

Page 124: ...traSPARC does not cache data associated with these transactions 7 8 1 NonCachedRead P_NCRD_REQ Noncached Read Generated by an UltraSPARC by a load or instruction fetch from a noncached address space o...

Page 125: ...from SYSDATA Table 7 13 shows the number of outstanding NonCachedBlockRead transactions that each UltraSPARC model supports 8 3 NonCachedWrite P_NCWR_REQ Noncached Write Generated by UltraSPARC to wri...

Page 126: ...es UltraSPARC I also imposes the following restrictions on back to back S_REQs If the previous S_REQ requires a data transfer the earliest that SC can send the next S_REQ both S_INV_REQ and S_CP _REQ...

Page 127: ...with P_SNACK In systems with Dtags SC sets NDP 0 in all S_REQs This allows UltraSPARC to reply P_SACK D without searching its tag store which is a significant optimi zation All other effects are the s...

Page 128: ...ust receive its S_REPLY before UltraSPARC II issues a third read with DVP 1 UltraSPARC delays issue of a coherent read to any address that has an outstand ing Writeback UltraSPARC inhibits its own int...

Page 129: ...ust like for clean victims 2 UltraSPARC keeps the dirty victimized block in the coherence domain for copyback invalidate requests from SC until it receives the S_REPLYs for both the read and Writeback...

Page 130: ...h P_SACK if the requested line is in the E Cache P_SACKD if there is a pending Writeback for the line and P_SNACK if the line is not present Some special cases to this are described below The only dif...

Page 131: ...lock See the discussion accompanying Figure 7 19 on page 93 for more information 12 Interrupts P_INT_REQ UltraSPARC can both send and receive interrupt requests Interrupt requests are used to report i...

Page 132: ...try later after some backoff period 7 12 1 Extended Interrupt Target ID During an interrupt send UltraSPARC also passes PA 20 19 to create an extend ed MID 6 5 field See Chapter 9 Interrupt Handling T...

Page 133: ...s all P_REPLYs as an ac knowledgment to a previous SC request UltraSPARC can assert P_FERR at any time to indicate a fatal error requiring system reset upon seeing P_FERR from Table 7 17 P_REPLY Encod...

Page 134: ...be sent P_IAK Interrupt Acknowledge Reply to a P_INT_REQ from SC UltraSPARC acknowledges that the interrupt transaction has been serviced SC can send the next P_INT_REQ request and its data P_SACK Coh...

Page 135: ...andshake for delivering data to UltraSPARC 4 Figure 7 27 on page 124 shows the timing for back to back S_REQs for Copyback The earliest that SC can send another S_REQ to the same Table 7 19 S_REPLY En...

Page 136: ...SDATA bus can be kept continually busy without any dead cycles as long as the same source is driving the data If sources are switched one dead cycle is required on SYSDATA this allows the first source...

Page 137: ...SC commands the output data queue of the UltraSPARC that contains the block to drive 64 bytes of copyback data on SYSDATA Issued in response to a P_SACK or P_SACKD reply from UltraSPARC containing th...

Page 138: ...alls Figure 7 24 S_REPLY Timing UltraSPARC Sourcing Block Write No Data Stall Figure 7 25 S_REPLY Timing UltraSPARC Receiving Block Write No Data Stall S_REPLY Data on Bus S_WAB D 0 D 1 D 2 D 3 2 cloc...

Page 139: ...lifies the S_REPLY signal accompanying a data transfer The following rules govern the assertion of Data_Stall 1 When UltraSPARC is sourcing data the earliest that SC can assert Data_Stall is one syste...

Page 140: ...g of any quadword including the first quadword at the sink UltraSPARC can be delayed for an arbitrary number of clock cycles by keeping Data_Stall asserted for that many clock cycles Figure 7 30 shows...

Page 141: ...raSPARC II will not issue any other request. Finally, UltraSPARC II will not issue a P_NCRD_REQ if any Class 0 transaction is outstanding. UltraSPARC issues all other transactions in Class 1 and can hav...

Page 142: ...ictim read miss before its corresponding Writeback. If the E-Cache data bus is busy, or if the assertion of an external request takes away SYSADDR, the Writeback can be delayed. A Writeback is not issued...

Page 143: ...k. Typical systems will, however, since they complete all Class 1 transactions in order. Additionally, UltraSPARC I restricts the issue of a read with Writeback until any prior read with Writeback has com...

Page 144: ...summarizes the requests and replies generated by the SC Table 7 21 Requests and Replies Generated by UltraSPARC Requests Replies P_RDS_REQ P_IDLE P_RDSA_REQ P_RERR P_RDO_REQ P_RAS P_RDD_REQ P_SACK P_W...

Page 145: ...ror and data is to be transferred to from UltraSPARC Table 7 23 Valid Request and Reply Types UltraSPARC to SC UltraSPARC Request Reply from SC P_RDS_REQ S_RBU or S_RBS or S_ERR2 or S_RTO2 P_RDSA_REQ...

Page 146: ...how the transfer of control between the processors and the SC Thus each table row may represent zero or more clock ticks 7 16 1 ReadToShare Block Condition Load miss on Processor 1 no other processor...

Page 147: ...Etag I P_RDS_REQ to System Initial state Etag E Initial state Etag I S_CPB_REQ to P2 P2 copies block to copyback buffer P2 updates Etag E S P_SACK reply to System S_CRAB reply to P2 S_RBS reply to P1...

Page 148: ...block When the miss victimizes a clean block instead of an invalid block the sequence is the same When Processor 2 s initial state is Etag M or O the sequence is the same 7 16 6 ReadToOwn Block Condi...

Page 149: ...ag S P_RDO_REQ to System Initial state Etag O Initial state Etag S S_INV_REQ to P2 S_INV_REQ to P3 P2 updates Etag O I P_SACK to System P3 updates Etag S I P_SACK to System S_OAK to P1 no data is tran...

Page 150: ...stem Processor 2 Processor 3 Initial victim state Etag1 M Initial missed state Etag2 I P1 copies the victim block into the Writeback buffer P_RDS_REQ to System DVP bit set Initial state Etag2 I Initia...

Page 151: ...rty Victimized Block Processor 1 System Processor 2 Processor 3 Initial victim state Etag1 M Initial missed state Etag2 I P1 copies the victimized block into the Writeback buffer P_RDS_REQ to System DVP bit set Initial stat...

Page 152: ...missed state Etag2 I P1 copies the victimized block into the writeback buffer P_RDS_REQ to System DVP bit set Initial state Etag1 I Initial state Etag2 I Initial state Etag2 I S_RBU reply to P1 P1 re...

Page 153: ...uest packets are carried over SYSADDR. Table 7-36 Copyback Invalidate Dirty Victimized Block in Owned State Processor 1 System Processor 2 Processor 3 Initial victim state Etag1 O Initial missed state Etag2 I P1 copies...

Page 154: ...7 31 Transaction Types Figures 7 32 7 33 and 7 34 show the transaction request packet formats Packet Type Initiated by UltraSPARC Cache Coherent P_RDS_REQ P_RDSA_REQ P_RDO_REQ Non Cached P_NCWR_REQ In...

Page 155: ...s 16 4 Reserved 22 13 NDP 33 Physical Address 8 6 Class 0 Reserved Second Cycle Master ID 35 29 Parity Physical Address 16 4 12 ByteMask 15 0 33 Class 0 34 28 13 Transaction Type Physical Address 38 1...

Page 156: ...ass bit identifies which of the two master Class queues the request has been issued from The system must maintain strong ordering between transac Table 7 37 Interconnect Transaction Type Encoding Tran...

Page 157: ...riteback bit. This bit is set when a coherent read victimized a dirty line. The system uses this bit for victim handling. 7.17.2.7 IVA: Invalidate me Advisory bit (in P_WRI_REQ transaction only). UltraSPAR...

Page 158: ...without Dtags, however, SC must send the requesting UltraSPARC an S_INV_REQ if IVA = 1 in a P_WRI_REQ. 7.18.1 Using the IVA bit in a P_WRI_REQ. UltraSPARC can issue a cache-coherent block store that will g...

Page 159: ...a subsequent coherent miss to the same address might complete first. Systems with Dtags ignore the IVA bit, so this is not an issue. Note: This hazard occurs only in uniprocessor systems without Dtags. In...

Page 160: ...32 bits of the 64-bit address to zero when the address mask (AM) bit in the PSTATE register is set. Both big- and little-endian byte orderings are supported in UltraSPARC. The default data access byte ord...
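
The address-mask behavior mentioned in the page 160 excerpt can be sketched in C as a small software model. This is only an illustration under stated assumptions: the excerpt is truncated, so the sketch assumes it is the upper 32 bits that are forced to zero, and the names `apply_address_mask` and `pstate_am` are hypothetical rather than taken from the manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of PSTATE.AM: when the address mask bit is set,
 * the upper 32 bits of a 64-bit address are forced to zero.
 * Assumption: "upper" is inferred, since the excerpt is truncated. */
static uint64_t apply_address_mask(uint64_t address, bool pstate_am)
{
    return pstate_am ? (address & UINT64_C(0xFFFFFFFF)) : address;
}
```

With the mask bit clear, the address passes through unchanged; with it set, only the low 32 bits survive.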

Page 161: ...through their virtual addresses as physical addresses Accesses made using these ASIs are always made in big endian mode regardless of the setting of the D MMU s IE bit Accesses to Internal ASIs with i...

Page 162: ...address space user privilege V9 1116 ASI_AS_IF_USER_SECONDARY ASI_AIUS RW2 Secondary address space user privilege V9 1816 ASI_AS_IF_USER_PRIMARY_LITTLE ASI_AIUPL RW2 Primary address space user privile...

Page 163: ...nostics access A 8 2 16 ASI_INTR_DISPATCH_STATUS ASI_INTR_DISPATCH_STATUS 016 R1 Interrupt vector dispatch status 9 3 3 16 ASI_INTR_RECEIVE ASI_INTR_RECEIVE 016 RW Interrupt vector receive status 9 3...

Page 164: ...U TSB 8K Pointer Register 6.9.8 5A₁₆ ASI_DMMU_TSB_64KB_PTR_REG ASI_DMMU_TSB_64KB_PTR_REG 0₁₆ R1 D-MMU TSB 64K Pointer Register 6.9.8 5B₁₆ ASI_DMMU_TSB_DIRECT_PTR_REG ASI_DMMU_TSB_DIRECT_PTR_REG 0₁₆ R...

Page 165: ...ASI_UDB_INTR_W 50₁₆ W1 Outgoing interrupt vector data register 1 9.3.1 16 ASI_UDB_INTR_W ASI_UDB_INTR_W 60₁₆ W1 Outgoing interrupt vector data register 2 9.3.1 16 ASI_BLOCK_AS_IF_USER_PRIMARY_LITTLE...

Page 166: ...ry address space 4 16 bit partial store little endian 13 6 1 CB16 ASI_PST16_SECONDARY_LITTLE ASI_PST16_SL W1 4 Secondary address space 4 16 bit partial store little endian 13 6 1 CC16 ASI_PST32_PRIMAR...

Page 167: ...ster s ID field A16 ASI_FL16_PRIMARY_LITTLE ASI_FL16_PL RW 4 Primary address space one 16 bit floating point load store little endian 13 6 2 B16 ASI_FL16_SECONDARY_LITTLE ASI_FL16_SL RW 4 Secondary ad...

Page 168: ...DQ Set to zero since incoming slave data writes are not supported by UltraSPARC PREQ_RQ Set to one since one incoming P_REQ request may be outstanding at one time Two types of incoming requests are su...

Page 169: ...RC I Figure 8 3 UPA_CONFIG Register UltraSPARC II MCAP UltraSPARC II Implementation dependent module capability bits Software can use these bits to determine the processor module speed capability Thes...

Page 170: ...nsaction WB 10 UltraSPARC II Maximum number of outstanding Writebacks SCIQ0 9 8 UltraSPARC II Maximum number of outstanding Class 0 transactions BST 7 Maximum number of outstanding block stores NCST 6...

Page 171: ...for use by an implementation 4 2 SPARC V9 Defined ASRs Table 8 3 defines the SPARC V9 ASRs that must be supported by a conforming processor implementation 1 An attempt to read this register by non pri...

Page 172: ...m asi rd tick regrd rd pc regrd rd fprs regrd wr regrs1 reg_or_imm fprs Table 8 4 Non SPARC V9 ASRs ASR Value ASR Name Syntax Access Description Section 1016 PERF_CONTROL_REG RW3 Performance Control R...

Page 173: ...1 tick_cmpr rd dcr regrd wr regrs1 dcr Table 8 5 Other UltraSPARC Registers Register Name Access Description Section INTERRUPT_GLOBAL_REG RW 8 Interrupt handler globals 14 5 9 MMU_GLOBAL_REG RW 8 MMU...

Page 174: ...is generated instead of a data_access_protection trap 9 AG alternate globals MG MMU globals IG interrupt globals illegal_instruction AG 01016 710 privileged_opcode AG 01116 6 fp_disabled AG 02016 8 fp...

Page 175: ...these ASIs are used with incorrect opcodes they do not take mem_address_not_aligned or illegal_instruction traps for memory and register alignment required by the ASI For example block ASIs require 6...

Page 176: ...cross call a processor or I O device first writes to the Outgoing Interrupt Vector Data Registers according to an established software convention described below A subsequent write to the Interrupt Ve...

Page 177: ...ence PSTATE IE 1 DONE if NACK Retry after random delay if NACKED Until DONE Note In order to avoid deadlocks interrupts must be enabled for some period before retrying the atomic sequence Alternativel...

Page 178: ...n Section 9.1.2, Interrupt Vector Receive, on page 162, the processor takes an implementation-dependent interrupt_vector trap after receiving an interrupt packet. Software uses a number of scratch regist...

Page 179: ...70₁₆ A write to this ASI triggers an interrupt vector dispatch to the target CPU residing at slot MID (Module ID), along with the contents of the three Interrupt Vector Data Registers. A read from this...

Page 180: ...rrupt Vector Data Registers Privileged ASI_UDB_INTR_R data 0 ASI 7F16 VA 63 0 4016 ASI_UDB_INTR_R data 1 ASI 7F16 VA 63 0 5016 ASI_UDB_INTR_R data 2 ASI 7F16 VA 63 0 6016 Data Interrupt data A read fr...

Page 181: ...ield matches the TICK Register s counter field the TICK_INT field is set and a software interrupt is generated See also Section 14 1 7 TICK Register on page 239 and Section 14 5 1 Per Processor TICK C...

Page 182: ...d for service at level n have been serviced, the kernel will write to the CLEAR_SOFTINT register (ASR 15₁₆) with bit n set in order to clear that interrupt. Note that the complement of the value written t...
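
The CLEAR_SOFTINT write described in the page 182 excerpt can be modelled in C as below. This is a rough sketch, not the hardware definition: the excerpt breaks off at "the complement of the value written", so the sketch assumes that complement is ANDed into the pending software-interrupt bits. The function names and the plain `uint64_t` register model are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a write to CLEAR_SOFTINT: every bit set in the
 * written value is cleared in the pending SOFTINT bits.
 * Assumption: the complement of the written value is ANDed into SOFTINT. */
static uint64_t clear_softint(uint64_t softint, uint64_t written)
{
    return softint & ~written;
}

/* Hypothetical helper: clear the interrupt pending at level n (0 <= n < 64). */
static uint64_t clear_softint_level(uint64_t softint, unsigned n)
{
    return clear_softint(softint, UINT64_C(1) << n);
}
```

Modelling the write as a pure function keeps the sketch testable; in a kernel the same effect would come from a single write to the ASR.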

Page 184: ...e bits in the LSU_Control_Register When PSTATE RED is explicitly set by a software write there are no side effects other than disabling the I MMU Software must create the appropriate state itself Trap...

Page 185: ...E or RETRY instruction in RED_state Note that the RAS is cleared after Power on Reset Section 16 2 10 Return Address Stack RAS on page 272 discusses the RAS in detail The following code fragment fills...

Page 186: ...s reset affects only one processor not the entire system 10 1 4 Watchdog Reset WDR and error_state A SPARC V9 processor enters error_state when a trap occurs and TL MAXTL The processor signals itself...

Page 187: ...Unchanged Y Unknown Unchanged PIL Unknown Unchanged CWP Unknown Unchanged except for register window traps TT TL 1 trap type 3 4 trap type CCR Unknown Unchanged ASI Unknown Unchanged TL MAXTL min TL...

Page 188: ...dep impl dep impl dep 0 0 0 0 0 0 slot ID 1 0 1 1B16 Unchanged Unchanged Unchanged Unchanged Unchanged Unchanged Unchanged Unchanged Unchanged slot ID 1 0 1 1B16 LSU_CONTROL all 0 off 0 off VA_WATCHPO...

Page 189: ...changed MID Unknown Unchanged ESTATE_ERR_EN ISAPEN sys addr err NCEEN non CE CEEN CE 0 off 0 off 0 off Unchanged Unchanged Unchanged AFAR PA Unknown Unchanged AFSR all Unchanged Unchanged Other UltraS...

Page 190: ...Fault Status Register and the UDB Error Register; see Section 11.3.3, Asynchronous Fault Address Register, on page 182, Section 11.3.2, Asynchronous Fault Status Register, on page 180, and Section 11.3.4 U...

Page 191: ...the oldest non executed instruction and its next PC As a result execution cannot normally be resumed from the point that the trap is taken Instruction access errors are reported before executing the...

Page 192: ...ter flushing to remove the corrupted data In case of an instruction error the instruction returned to the CPU is marked for ter mination to be aborted This means that a bad instruction will not create...

Page 193: ...aSPARC will take a disrupting data_access_error trap with priority 33 instead of a deferred trap This avoids panics when the system displaces corrupted user data from the cache Note To prevent multipl...

Page 194: ...ced before installing in the E Cache This prevents using the bad data or having the bad data written back to memory with good ECC bits Uncorrectable ECC errors on cache fills will be reported for any...

Page 195: ...olicy described in Table 11-6, Error Detection and Reporting in AFAR and AFSR, on page 183. The AFSR is logically divided into four fields. Bit 32, the accumulating multiple error (ME) bit, is set when multi...

Page 196: ...e clear will be performed before logging the new error status. The syndrome field is read-only, and writes to this field are ignored. Refer to Table 10-1, Machine State After Reset and in RED_state, on pa...

Page 197: ...ll corresponding error bits in AFSR If software attempts to write to these bits at the same time as an error that captures address occurs the error ad Table 11 3 E Cache Data Parity Syndrome Bit Order...

Page 198: ...ress Register Bits Field Use RW 63 41 Reserved R 40 4 PA 40 4 Physical address of faulting transaction RW 3 0 Reserved R Table 11 6 Error Detection and Reporting in AFAR and AFSR Error Type PA SYNDROM...

Page 199: ...for correctable errors from system In case of multiple outstanding errors only the first is recorded Bits 9 8 are sticky error bits that record the most recently detected errors These bits accumulate...

Page 200: ...ultiple errors conditions have occurred Errors are captured in the order that they are detected not necessarily in program order If an error occurs at the same time as error bits are cleared by softwa...

Page 201: ...e P_SYND field 11 5 3 AFSR E Cache Tag Parity ETS Overwrite Policy Parity information for the first occurrence of any error is captured in the ETS field of the AFSR register Error logging in this fiel...

Page 202: ...V9 12 Instruction Set Summary 189 13 UltraSPARC Extended Instructions 195 14 Implementation Dependencies 235 15 SPARC V9 Memory Models 255...

Page 204: ...cates a SPARC V9 core instruction The Ref column lists the section number that contains the instruction documentation SPARC V9 core instructions are documented in The SPARC Architecture Manual Version...

Page 205: ...add A.12 FALIGNDATA Perform data alignment for misaligned data 13.5.5 FANDNOT1{s} Negated src1 AND src2 (single precision) 13.5.6 FANDNOT2{s} src1 AND negated src2 (single precision) 13.5.6 FAND{s} Logical AND (sing...

Page 206: ...bit fixed pack 13 5 3 FPACK 16 32 Four 16 bit two 32 bit pixel pack 13 5 3 FPADD 16 32 s Four 16 bit two 32 bit partitioned add single precision 13 5 2 FPMERGE Two 32 bit pixel to 64 bit pixel merge 1...

Page 207: ...7 LDUWA Load unsigned word from alternate space A.28 LDX Load extended A.27 LDXA Load extended from alternate space A.28 LDXFSR Load extended floating-point state register A.25 MEMBAR Memory barrier A.31 MOVcc M...

Page 208: ...ht logical extended A 31 STB Store byte A 53 STBA Store byte into alternate space A 54 STBAR Store barrier A 50 STD Store doubleword A 53 STDA Store doubleword into alternate space A 54 STDF Store dou...

Page 209: ...teger divide and modify condition codes A.10 UDIVX 64-bit unsigned integer divide A.36 UMUL, UMULcc Unsigned integer multiply (and modify condition codes) A.37 WRASI Write ASI register A.62 WRASR Write ancill...

Page 210: ...emory accesses, see Section 13.6, Memory Access Instructions. 13.2 SHUTDOWN Format (3) Description: The SHUTDOWN instruction waits for all outstanding transactions to be completed. This leaves the system an...

Page 211: ...tain the data sheet This is a privileged instruction an attempt to execute it while in non privileged mode causes a privileged_opcode trap Traps privileged_opcode Note Privileged software should save...

Page 212: ...value. Conversion from 32-bit fixed to 16-bit fixed is also supported with the FPACKFIX instruction. Rounding can be performed by adding 1 to the round bit position. Complex calculations needing more dy...

Page 213: ...S_LITTLE instruction. See Section 13.5.5, Alignment Instructions, on page 214. Traps: fp_disabled 5 Graphics Instructions. All instruction operands are in floating-point registers unless otherwise specifie...

Page 214: ...01 0001 Two 16 bit add FPADD32 0 0101 0010 Two 32 bit add FPADD32S 0 0101 0011 One 32 bit add FPSUB16 0 0101 0100 Four 16 bit subtract FPSUB16S 0 0101 0101 Two 16 bit subtract FPSUB32 0 0101 0110 Two...

Page 215: ...raphics instruction source operand in the next instruction group Similarly do not use the result of a standard FPADD as a 32 bit graphics instruction source operand in the next instruction group Traps...

Page 216: ...o not use the result of an FPACK as part of a 64 bit graphics instruction source operand in the next three instruction groups Do not use the result of FEXPAND or FPMERGE as a 32 bit graphics instructi...

Page 217: ...on is performed to convert the scaled value into a signed integer that is round toward negative infinity If the resulting value is negative that is the MSB is set zero is delivered as the clipped valu...

Page 218: ...ing clipping information 2 For each 32 bit value truncate and clip to an 8 bit unsigned integer starting at the bit immediately to the left of the implicit binary point i e between bits 23 and 22 of e...

Page 219: ...s the result in the 32 bit rd register This operation illustrated in Figure 13 5 is carried out as follows 1 Left shift each 32 bit value in rs2 by the number of bits in the 3 rs2 rd 7 2 0 5 implicit...

Page 220: ...igned integer (i.e., rounds toward negative infinity). If the resulting value is less than -32768, -32768 is delivered as the clipped value. If the value is greater than 32767, 32767 is delivered. Otherwise the...
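
The clipping step in the page 220 excerpt amounts to saturating a wider integer into the signed 16-bit range. A minimal C sketch of just that step follows. The function name `clip_to_int16` is hypothetical, and the preceding scaling and truncation steps are deliberately left out because the excerpt does not show them in full.

```c
#include <stdint.h>

/* Saturate a wide signed value into the signed 16-bit range:
 * values below -32768 deliver -32768, values above 32767 deliver 32767,
 * anything else is passed through unchanged. */
static int16_t clip_to_int16(int64_t value)
{
    if (value < -32768) {
        return -32768;
    }
    if (value > 32767) {
        return 32767;
    }
    return (int16_t)value;
}
```

The input is taken as `int64_t` so that the comparison itself cannot overflow, whatever width the intermediate fixed-point value has.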

Page 221: ...s to a 16 bit fixed value 2 Stores the results in the rd register Figure 13 6 FEXPAND Operation 5 3 5 FPMERGE FPMERGE interleaves four corresponding 8 bit unsigned values in rs1 and rs2 to produce a 6...

Page 222: ...ed when it is applied twice in succession, for example R1R2R3R4 B1B2B3B4 R1B1R2B2R3B3R4B4 R1G1B1A1R2G2B2A2. Figure 13-7 FPMERGE Operation...

Page 223: ...duct FMUL8x16AU 0 0011 0011 8 16 bit upper partitioned product FMUL8x16AL 0 0011 0101 8 16 bit lower partitioned product FMUL8SUx16 0 0011 0110 upper 8 16 bit partitioned product FMUL8ULx16 0 0011 011...

Page 224: ...e most significant bit Typically this operation is used with filter coefficients as the fixed point rs2 value and image data as the rs1 pixel value Appropriate scaling of the coefficient allows variou...

Page 225: ...L is the same as FMUL8x16AU, except that the least significant 16 bits of the 32-bit rs2 register are used for the value. Figure 13-10 FMUL8x16AL Operation...

Page 226: ...4 5 FMUL8ULx16 FMUL8ULx16 multiplies the unsigned lower 8 bits of each 16 bit value in rs1 by the corresponding fixed point signed integer in rs2 Each 24 bit product is sign extended to 32 bits The u...

Page 227: ...ed left by 8 bits to make up a 32-bit result. The result is stored in the corresponding 32 bits of the destination (rd) register. The operation is illustrated in Figure 13-13...

Page 228: ...duct is sign extended to 32 bits and stored in the rd register The operation is illustrated in Figure 13 14 Figure 13 14 FMULD8ULx16 Operation Code Example 13 2 16 bit x 16 bit 32 bit Multiply fmuld8s...

Page 229: ...lf of the concatenated value. Bytes in this value are numbered from most significant to least significant, with the most significant byte being byte 0. Eight bytes are extracted from this value, where th...
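
The byte extraction described in the page 229 excerpt can be sketched in C. This is a hedged model, not the instruction definition: it assumes `rs1` supplies bytes 0 to 7 (the more significant half) and `rs2` supplies bytes 8 to 15 of the concatenated value, and it takes the starting byte number as an explicit `offset` parameter, because the excerpt is cut off before saying where that number comes from. The function name is hypothetical.

```c
#include <stdint.h>

/* Extract eight consecutive bytes from the 16-byte concatenation rs1:rs2.
 * Byte 0 is the most significant byte of rs1 (assumption: rs1 is the
 * upper half). `offset` is the number of the first extracted byte and
 * is reduced to the range 0..7. */
static uint64_t extract_aligned(uint64_t rs1, uint64_t rs2, unsigned offset)
{
    offset &= 7u;
    if (offset == 0) {
        return rs1; /* avoid an undefined 64-bit shift of rs2 */
    }
    return (rs1 << (8u * offset)) | (rs2 >> (64u - 8u * offset));
}
```

For example, with an offset of 3 the result is made of bytes 3 to 7 of `rs1` followed by bytes 0 to 2 of `rs2`.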

Page 230: ...c2 single precision FOR 0 0111 1100 Logical OR FORS 0 0111 1101 Logical OR single precision FNOR 0 0110 0010 Logical NOR FNORS 0 0110 0011 Logical NOR single precision FAND 0 0111 0000 Logical AND FAN...

Page 231: ...fregrs1 fregrs2 fregrd fands fregrs1 fregrs2 fregrd fnand fregrs1 fregrs2 fregrd fnands fregrs1 fregrs2 fregrd fxor fregrs1 fregrs2 fregrd fxors fregrs1 fregrs2 fregrd fxnor fregrs1 fregrs2 fregrd fx...

Page 232: ...Pixel Compare Instructions Format 3 opcode opf operation FCMPGT16 0 0010 1000 Four 16 bit compare set rd if src1 src2 FCMPGT32 0 0010 1100 Two 32 bit compare set rd if src1 src2 FCMPLE16 0 0010 0000...

Page 233: ...For FCMPLE each bit in the result is set if the corresponding value in rs1 is less than or equal to the value in rs2 Greater than or equal comparisons are made by swapping the operands For FCMPEQ each...

Page 234: ...mask is computed from left and right edge masks as follows: 1. The left edge mask is computed from the 3 least significant bits (LSBs) of rs1, and the right edge mask is computed from the 3 LSBs of rs2 a...

Page 235: ...left edge mask The integer condition codes are set the same as a SUBCC instruction with the same operands End of scan line comparison tests may be performed using edge with an appropriate conditional...

Page 236: ...ce the result of a nonPDIST instruction in the previous two instruction groups Table 13 2 Edge Mask Specification Little Endian Edge Size A2 A0 Left Edge Right Edge 8 000 1111 1111 0000 0001 8 001 111...

Page 237: ...8 16 ARRAY16 or 32 bits ARRAY32 The rs2 operand speci fies the power of two size of the X and Y dimensions of a 3D image array The legal values for rs2 and their meanings are shown in the following ta...

Page 238: ...zero. The number of zeros in the least significant bits is determined by the element size. An element size of eight bits has no zeros, an element size of 16 bits has one zero, and an element size of 32 b...
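
The relationship between element size and low-order zero bits in the page 238 excerpt can be illustrated with a short C sketch. It assumes the truncated final clause continues the pattern (two zeros for 32-bit elements); the function names and the choice to return 0 for an unsupported size are hypothetical.

```c
#include <stdint.h>

/* Number of zero bits in the least significant positions of the
 * address for a given element size: 8 -> 0, 16 -> 1, 32 -> 2.
 * The 32-bit case is an assumption (the excerpt is truncated).
 * Returns -1 for any other size. */
static int low_zero_bits(unsigned element_bits)
{
    switch (element_bits) {
    case 8:  return 0;
    case 16: return 1;
    case 32: return 2;
    default: return -1;
    }
}

/* Scale a byte-element address to the requested element size by
 * shifting in the zero bits; returns 0 for an unsupported size. */
static uint64_t scale_blocked_address(uint64_t byte_address, unsigned element_bits)
{
    int shift = low_zero_bits(element_bits);
    return shift < 0 ? 0 : byte_address << shift;
}
```

Under these assumptions, the 16-bit and 32-bit results are the 8-bit result shifted left by one and two bits.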

Page 239: ...block. The following code fragment shows assembly of components along an interpolated line at the rate of one component per clock on UltraSPARC. Code Example 13-4 Assembly of Components Along an Inter...

Page 240: ...8 bit conditional stores to secondary address space little endian STDFA ASI_PST16_P C216 Four 16 bit conditional stores to primary address space STDFA ASI_PST16_S C316 Four 16 bit conditional stores...

Page 241: ...t is big endian Note If the byte ordering is little endian the byte enables generated by this instruction are swapped with respect to big endian Traps fp_disabled mem_address_not_aligned data_access_e...

Page 242: ...condary address space little endian LDDFA STDFA ASI_FL16_P D216 16 bit load store from to primary address space LDDFA STDFA ASI_FL16_S D316 16 bit load store from to secondary address space LDDFA STDF...

Page 243: ...the low order 8 or 16 bits of the register Little endian ASIs transfer data in little endian format in memory otherwise memory is assumed to big endian Short loads and stores typically are used with...

Page 244: ...access_exception trap will be taken for a noncacheable access or use with any instruction other than LDDA A mem_address_not_aligned trap will be taken if the access is not aligned on a 128 bit boundar...

Page 245: ...STDFA ASI_BLK_S F116 64 byte block load store from to secondary address space LDDFA STDFA ASI_BLK_PL F816 64 byte block load store from to primary address space little endian LDDFA STDFA ASI_BLK_SL F9...

Page 246: ...from a 64 byte aligned memory area into eight double precision floating point registers specified by fregrd The lowest addressed eight bytes in memory are loaded into the lowest numbered double preci...

Page 247: ...rules, data from before or after the load may be used. UltraSPARC continues execution before all of the store data has been transferred. If store data registers are overwritten before the next block sto...

Page 248: ...MEMBAR #StoreLoad instruction, the contents of the block are undefined. If the BST overlaps a later store or flush and there is no intervening trap or MEMBAR #StoreStore instruction, the contents of the b...

Page 249: ...a f6 f8 f40 faligndata f8 f10 f42 faligndata f10 f12 f44 faligndata f12 f14 f46 addcc l0 1 l0 bg pt l1 fmovd f14 f48 end of loop handling l1 ldda regaddr ASI_BLK_P f0 stda f32 regaddr ASI_BLK_P falign...

Page 250: ...P opcodes and instructions with invalid values in reserved fields (other than reserved FPops or fields in graphics instructions that reference floating-point registers and the reserved field in the T...

Page 251: ...erroneous condition. Upon completion of trap processing, the state of the CPU is restored before returning to the offending code or terminating the process. This time-consuming operation is necessary be...

Page 252: ...4.1.5 SIGM Support (Impdep #116). UltraSPARC initiates a Software-Initiated Reset (SIR) by executing a SIGM instruction while in privileged mode. When in non-privileged mode, SIGM behaves as a NOP. See also S...

Page 253: ...struction_access_exception trap if PSTATE AM is not set If the target address of a JMPL or RETURN instruction is an out of range address and PSTATE AM is not set a trap is generated with the PC the ad...

Page 254: ...e the D MMU SFAR contains only 44 bits the trap handler must decode the load or store instruction if the full 64 bit virtual address is needed See also Section 6 9 4 I D MMU Synchronous Fault Status R...

Page 255: ...ight-window, 64-bit integer register file (that is, NWINDOWS = 8). UltraSPARC truncates values stored in the CWP, CANSAVE, CANRESTORE, CLEANWIN, and OTHERWIN registers to three bits. This includes implicit updat...
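
The three-bit truncation of the window-state registers in the page 255 excerpt can be sketched in C as a simple mask. This is an illustrative model only: `truncate_window_reg` and `next_cwp` are hypothetical names, and the wrap-around increment is shown as one assumed example of an implicit update, not as the manual's definition of any instruction.

```c
#include <stdint.h>

#define NWINDOWS 8u /* eight register windows, so three index bits */

/* Keep only the low three bits of a value written to CWP, CANSAVE,
 * CANRESTORE, CLEANWIN, or OTHERWIN. */
static unsigned truncate_window_reg(uint64_t value)
{
    return (unsigned)(value & (NWINDOWS - 1u));
}

/* Assumed example of an implicit update: advancing the current window
 * pointer wraps from 7 back to 0 because of the same truncation. */
static unsigned next_cwp(unsigned cwp)
{
    return truncate_window_reg((uint64_t)cwp + 1u);
}
```

Because NWINDOWS is a power of two, masking with `NWINDOWS - 1` is equivalent to reducing the value modulo 8.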

Page 256: ...manufacturer code (0017₁₆, TI JEDEC number) that identifies the manufacturer of an UltraSPARC CPU. impl: 16-bit implementation code (0010₁₆) that uniquely identifies an UltraSPARC-class CPU. Table 14-3 shows...

Page 257: ...hed_FPop trap is signalled and these operations are handled in system software The unfinished trapping cases are listed in Table 14 4 and Table 14 5 Because trapping on subnormal operands and results...

Page 258: ...n handling Underflow is detected before rounding Prediction of overflow underflow and inexact traps for divide and square root is used to simplify the hardware For divide pessimistic prediction occurs...

Page 259: ...t register file modifying instructions include floating point Table 14 6 Unimplemented Quad Precision Floating Point Instructions Instruction Description F s d TOq Convert single double to quad precis...

Page 260: ...instructions The FBfcc FMOVcc and MOVcc instructions use one of these condition code sets to determine conditional control transfers and conditional register moves Note fcc0 is the same as the fcc in...

Page 261: ...mal Result Trapping Cases NS 0 on page 243 ver This field identifies a particular implementation of the UltraSPARC FPU architecture ftt The 3 bit floating point trap type field is set whenever an floa...

Page 262: ...54 exceptions 14 4 SPARC V9 Memory Related Operations 14 4 1 Load Store Alternate Address Space Impdep 5 29 30 Supported ASI accesses are listed in Section 8 3 Alternate Address Spaces on page 146 14...

Page 263: ...FLUSH sequence is performed UltraSPARC guarantees that earlier code modifications will be visible across the whole system 14 4 5 PREFETCH A Impdep 103 117 For UltraSPARC I PREFETCH A instructions with...

Page 264: ...mpdep 113 121 UltraSPARC supports all three memory models TSO PSO RMO See Section 15 2 Supported Memory Models on page 256 14 4 10 I O Operations Impdep 118 123 I O spaces and their accesses are speci...

Page 265: ...e is described in Chapter 3 Cache Organization 5 3 Memory Management Unit UltraSPARC implements a multi level memory management scheme The MMU architecture is described in Chapter 4 Overview of the MM...

Page 266: ...the trap globals Two 1 bit fields PSTATE IG and PSTATE MG have been added to the PSTATE register to select which set of global registers to use The PSTATE IG and PSTATE MG bits are also stored with th...

Page 267: ...ong with the rest of the PSTATE register. When an interrupt_vector trap (trap type 60₁₆) is taken, UltraSPARC selects the Interrupt Global registers by setting IG and clearing AG and MG. When a fast_instr...

Page 268: ...power requirements during idle periods A privileged instruction SHUTDOWN has been added to facilitate a software controlled power down of the CPU and system Power down support is described in Appendix...

Page 269: ...g and Diagnostics Support. UltraSPARC support for debug and diagnostics is described in Appendix A, Debug and Diagnostics Support, on page 303...

Page 270: ...s to function correctly if data is shared. MEMBAR is a SPARC-V9 memory synchronization primitive that enables a programmer to explicitly control the ordering in a sequence of memory operations. Process...

Page 271: ...ual, Version 9. The definitions in the following sections apply to system behavior as seen by the programmer. A description of MEMBAR can be found in Section 5.3.2, Memory Synchronization: MEMBAR and FLUS...

Page 272: ...must check snoop the store buffer for the most recent store to that address For SPARC V9 compatibility a MEMBAR Lookaside should be used between a store and a subsequent load to the same non cacheabl...

Page 273: ...accesses with the E bit set that is those having side effects are all strongly ordered with respect to each other A MEMBAR must be used between cacheable memory references if stronger order is desired...

Page 274: ...Section IV Producing Optimized Code 16 Code Generation Guidelines 261 17 Grouping Rules and Stalls 281...

Page 276: ...ntage of by using modern compiler technology This technology was not available previously mainly because the hardware support was not sufficient to justify its development 16 2 Instruction Stream Issu...

Page 277: ...cted. For simplicity and timing considerations, hardware support for getting instructions from two adjacent lines was not included. Consequently, on average, for random accesses, 3.25 instructions are fet...

Page 278: ...he first three instructions in a group occupy slots that in most cases are interchangeable with respect to resources. Only special cases of instructions that can only be executed in IEU1 followed by I...

Page 279: ...taken branch is among the four instructions the next field contains the index of the target of the branch The following cases represent situations when the prediction bits and or the next field do not...

Page 280: ...al branch is forced after address 31 and there is already a branch in the group Figure 16 5 Artificial Branch Inserted after a 32 byte Boundary 16 2 3 I Cache Timing If accesses to the I Cache hit the...

Page 281: ...s fit in the I-Cache and avoid hot spots (collisions). UltraSPARC provides instrumentation to profile a program and detect if instruction accesses generate a cache miss or a cache hit. For example, one ca...

Page 282: ...ns from that line are simply forwarded to the next stage If the line is from a different virtual page the translation is obtained from the iTLB a cycle later The cost of crossing a page boundary is th...

Page 283: ...gorithm used for branch prediction is represented in Figure 16 6 Note This figure is identical to Figure A 15 Figure 16 6 Dynamic Branch Prediction State Diagram For loops in steady state the algorith...

Page 284: ...gain any performance The penalty for a mispredicted branch is always 4 cycles SETcc Bicc and the delay slot can be grouped together Figure 16 7 Figure 16 7 Handling of Conditional Branches Conditiona...

Page 285: ...the second CTI. It processes the first CTI, executes instructions until the second CTI reaches the N3 stage, squashes all instructions executed after the first CTI, and executes instructions starting wi...

Page 286: ...edictions for hard to predict branches For example in Figure 16 10 if the outcome of branch A which is executed before branch B has an impact on the direction on branch B then it is desirable to split...

Page 287: ...L or RETURN with rs1 equal to o7 normal subroutine or i7 leaf subroutine The RAS provides a guess for the target address so that prefetching can continue even though the address calculation has not ye...

Page 288: ...on between two loads returning data As soon as a cycle without a load appears in the pipeline the latency of loads is brought back to two cycles Note The SPARC V8 LD instruction is replaced with LDUW...

Page 289: ...s possible to organize data at compile time so that collisions are minimized, however. For frequently executed loops, the compiler should organize the data so that all accesses within the loop are mappe...

Page 290: ...Loads that miss the D Cache do not necessarily stall the pipeline non blocking loads Instead they are sent to the load buffer where they wait for the data to be returned from the E Cache The pipeline...

Page 291: ...In this case the younger load also must enter the load buffer it will access the D Cache array only after the older load D Cache miss does so If the load buffer is not empty the D Cache array access i...

Page 292: ...to the same 16 byte sub block the entering load is marked as a hit since by the time it accesses the D Cache array the sub block will be present Code Example 16 2 The detection of a hit eliminates a...

Page 293: ...this in more detail Code Example 16 3 Avoiding Bus Turnaround Penalties 1 1 1 mode only 3 6 5 Using LDDF to Load Two Single Precision Operands Cycle UltraSPARC supports single cycle 8 byte data trans...

Page 294: ...hat there could be a match between them In order to simplify the hardware the full 40 physical address bits are not used when comparing the address of the memory location requested by the load with th...

Page 295: ...s non faulting loads equivalent to silent loads used for Multiflow TRACE and Cydrome Cydra 5 so that loads can be moved ahead of conditional control structures that guard their use Non faulting loads...

Page 296: ...ten in Mixed Case BODY FONT Examples are FdMULq Floating point multiply double to quad SPARC V9 LDDF Load Double Floating Point Register SPARC V9 SHUTDOWN Power Down Support UltraSPARC Instruction Fam...

Page 297: ...uctions are shown with offsets between their stages to indicate the amount of latency that normally occurs between the instructions. The following instruction pair has one cycle of latency. This instru...

Page 298: ...are added for each I Cache miss The next fetch from the I Cache will not add instructions to the instruction buffer for one to two clocks after the E Cache instructions are added Back to back I Cache...

Page 299: ...UBcc TV ADDcc ANDcc ANDNcc ORcc ORNcc SUBcc XORcc XNORcc EDGE and ARRAY CALL JMPL BPr PST and FC MP LE NE GT EQ 16 32 also require the IEU1 data path besides counting as CTI store or floating point in...

Page 300: ...s to the TICK PSTATE and TL registers and FLUSH W instructions cause a pipeline flush when they reach the W Stage effectively inserting nine bubbles 17 5 2 IEU Dependencies Instructions that have the...

Page 301: ...To avoid this software should explicitly force the use instruction to be in the third group or later after the FCMP LE NE GT EQ 16 32 MULX U S MUL cc MULScc U S DIV X U S DIVcc and STD cannot be in t...

Page 302: ...erting nine bubbles into the pipe. The pipeline is flushed even if the second DCTI is annulled. 17.6.1 Control Transfer Dependencies UltraSPARC can group instructions following a control transfer with...

Page 303: ...ing the delay slot For example When a control transfer is mispredicted, the instruction buffer and instructions younger than the delay slot in the pipe are flushed, effectively inserting four bubbles i...

Page 304: ...ruction is stalled in issue until the FDIV instruction completes. A predicted annulled load does not affect dependency checking after it is dispatched For example 1 The W1 Stage is a virtual stage th...

Page 305: ...7 Load/Store Instructions Load/store instructions can be dispatched only if they are in the first three instruction slots. One load/store instruction can be dispatched per group. Load/store instruction...

Page 306: ...r example When an instruction referencing a load result enters the E Stage and the data is not yet returned all instructions in the E Stage and earlier will be stalled If there are multiple load uses...

Page 307: ...six clocks after the load reaches the C Stage otherwise Because load data is returned in order a D Cache load hit that reaches the C Stage one clock after a D Cache miss also returns data seven clocks...

Page 308: ...e previous store is dispatched 17 7 1 5 Other Timing Issues Additional clocks are added to the time a load returns data for E Cache misses and arbitration for the D and E Caches An E Cache miss adds a...

Page 309: ...after the data leaves the store buffer Data leaves the store buffer when the write is issued to the E Cache SRAM for cacheable accesses UDB for non cacheable accesses and internal register for intern...

Page 310: ...latency to the first word of read data is at least 18 processor clocks Noncacheable stores are removed from the store buffer with the same timing as if the store were an E Cache hit provided that the...

Page 311: ...ns that have the same destination register in the same register file cannot be grouped together For example FBfcc cannot be grouped with an older FCMP E s d even if they reference different floating...

Page 312: ...IV or FSQRT dependent instruction would be released will be held for one clock regardless of data dependency. FDIV and FSQRT use the floating-point multiplier for final rounding, so an M-Class operatio...

Page 313: ...les Floating point loads and stores are independent of these mixed precision rules 1 A floating point or graphics instruction that follows an FMOV FABS FNEG of different precision break the group even...

Page 314: ...ixed precision floating point instruction To avoid the pipe flush overhead software should explicitly force the use instruction to be at least the latency number of groups after the source instruction...

Page 315: ...Vcc s d FMOV s d FABS s d FNEG s d FPADD 16 32 s FPSUB 16 32 s FALIGNDATA FPMERGE FEXPAND FPACK 16 32 FIX FMUL8x16 AL AU FMUL d 8ULx16 FMUL d 8SUx16 PDIST rs1 rs2 FCMPLE 16 32 FCMPNE 16 32 FCMPGT 16 3...

Page 316: ...ort 303 B Performance Instrumentation 319 C Power Management 327 D IEEE 1149 1 Scan Interface 329 E Pin and Signal Descriptions 337 F ASI Names 345 Artisan Technology Group Quality Instrumentation Gua...

Page 318: ...rough system calls to these facilities See Section 6 9 4 I D MMU Synchronous Fault Status Registers SFSR on page 58 for SFSR details Caution A STXA to any internal debug or diagnostic register require...

Page 319: ...tructions that read or update the Graphic Status Register GSR are treated as floating point instructions They cause an fp_disabled trap if either PSTATE PEF or FPRS FEF is cleared See Section 13 5 Gra...

Page 320: ...ority than the physical address watchpoint trap. Separate 8-bit byte masks allow watchpoints to be set for a range of addresses. Zero bits in the byte mask cause the comparison to ignore the correspond...

Page 321: ...ed 64 bit address into the watch point register 6 LSU_Control_Register ASI 4516 VA 0016 Name ASI_LSU_CONTROL_REGISTER The LSU_Control_Register contains fields that control several memory related hardw...

Page 322: ...le with side effects A 6 3 Parity Control FM 15 0 LSU parity_mask If set UltraSPARC writes will generate incorrect parity on the E Cache data bus for bytes corresponding to this mask The parity_mask c...

Page 323: ...in the watchpoint mask a virtual watchpoint trap is generated A 6 4 3 Physical Address Data Watchpoint Enable PR PW LSU physical_address_data_watchpoint_enable If PR PW is set a data read write that m...

Page 324: ...nto four fields per entry The instruction field contains eight 32 bit instructions The tag field contains a 28 bit physical tag and a valid bit The pre decode field contains eight 4 bit information pa...

Page 325: ...Instruction Access Address Format ASI 6616 IC_set This 1 bit field selects a set 2 way associative IC_addr This 10 bit index 12 3 selects an aligned pair of 32 bit instructions Figure A 7 I Cache Ins...

Page 326: ...This 8 bit index i e addr 12 5 selects an IC_Line IC_line For LDDA accesses this 2 bit field selects a pair of pre decode fields in a 64 bit aligned instruction pair For STXA accesses the least signi...

Page 327: ...ng 7 4 I Cache LRU BRPD SP NFA Fields ASI 6F16 VA 63 14 0 VA 13 IC_set VA 12 3 IC_addr VA 2 0 0 Name ASI_ICACHE_PRE_NEXT_FIELD Figure A 13 I Cache LRU BRPD SP NFA Field Access Address Format ASI 6F16...

Page 328: ...f either of the corresponding instructions is a branch with static prediction bit set other wise IC_brpd is set to likely not taken The prediction bits are subsequently up dated according to the dynam...

Page 329: ...Cache ASI accesses are supported data ASI 4616 and tag valid ASI 4716 A 8 1 D Cache Data Field ASI 4616 VA 63 14 0 VA 13 3 DC_addr VA 2 0 0 Name ASI_DCACHE_DATA Figure A 16 D Cache Data Access Address...

Page 330: ...o PA without page mapping To prevent interference from instruction prefetching modifying the E Cache state LDXA STXA instructions which use these ASIs should be on non physical cacheable pages A 9 1 E...

Page 331: ...READING VA 63 41 0 VA 40 39 2 VA 38 19 0 VA 18 6 EC_addr VA 5 0 0 0 5 Mb VA 38 20 0 VA 19 6 EC_addr VA 5 0 0 1 Mb VA 38 21 0 VA 20 6 EC_addr VA 5 0 0 2 Mb VA 38 22 0 VA 21 6 EC_addr VA 5 0 0 4 Mb VA...

Page 332: ...ASI_ECACHE TAG are ignored but the contents of the E Cache_tag_data_register are written to the selected E Cache line A 9 3 E Cache Tag State Parity Data Accesses ASI 4E16 VA 63 0 0 Name ASI_ECACHE_TA...

Page 334: ...vileged software to access the PICs causes a privileged_action trap. Event measurements in non-privileged and/or privileged modes can be controlled by setting the PCR.UT and PCR.ST fields. Two 32-bit P...

Page 335: ...er_trace If set events in non privileged user mode are counted This may be set along with PCR ST to count all selected events ST System_trace If set events in privileged system mode are counted This m...

Page 336: ...cle counting is controlled by the PCR UT and PCR ST fields Instr_cnt PIC0 PIC1 The number of instructions completed Annulled mispredicted or trapped start set up PCR end sel PCR sel accumulate stat PI...

Page 337: ...t instruction in the group Dispatch0_FP_use PIC1 First instruction in the group depends on an earlier floating point result that is not yet available but only while the earlier instruction is not stal...

Page 338: ...regardless of whether the access will be used IC_ref PIC0 I Cache references I Cache references are fetches of up to four instructions from an aligned block of eight instructions I Cache references ar...

Page 339: ...wnership UPA transaction EC_wb PIC1 E Cache misses that do writebacks EC_snoop_inv PIC0 E Cache invalidates from the following UPA transactions S_INV_REQ S_CPI_REQS_INV_REQ S_CPI_REQS_INV_REQ S_CPI_RE...

Page 340: ...tch0_IC_miss 0011 Dispatch0_storeBuf 1000 IC_ref 1001 DC_rd 1010 DC_wr 1011 Load_use 1100 EC_ref 1101 EC_write_hit_RDO 1110 EC_snoop_inv 1111 EC_rd_hit Table B 2 PIC S1 Selection Bit Field Encoding S1...

Page 342: ...flushed to memory by software This flush should be done by displacement flush if other masters are doing coherent accesses while the flush is being performed Cache flushing is described in Section 5 2...

Page 343: ...a synchronous wake up signal eliminates the problems of warm switching the PLL loops and sampling the wake up signal without a clock When the reset pin is deasserted UltraSPARC begins RED_state reset...

Page 344: ...parts A test access port controller An instruction register Numerous public and private test data registers For information about how to obtain a copy of IEEE Std 1149.1-1990, see the Bibliography. D 2...

Page 345: ...isabled, the instruction register is initialized to select the Device ID register. 3 2 RUN TEST IDLE An intermediate controller state between scan operations. If no instruction is selected, all test data...

Page 346: ...IR SCAN SELECT DR SCAN RUN TEST IDLE TEST LOGIC RESET CAPTURE IR CAPTURE DR EXIT 2 IR EXIT 1 DR PAUSE DR EXIT 2 IR EXIT 2 DR UPDATE IR UPDATE DR 0 PAUSE IR SHIFT DR 0 SHIFT IR 1 0 1 1 1 1 0 0 0 1 1 0...

Page 347: ...orary controller state in which the IR DR retain their previous state D 3 8 PAUSE IR DR A temporary controller state in which the IR DR retain their previous state This state is provided so that the s...

Page 348: ...5 Instructions The UltraSPARC 8-bit instruction register (IR) implements numerous public and private instructions. There are 75 valid instructions out of the 256 possible encodings; all invalid encoding...

Page 349: ...ter as the active test data register Used to perform board level interconnect testing When active the boundary scan chain drive the processor pins Therefore UltraSPARC cannot operate in its normal fun...

Page 350: ...vice inoperative. D.6 Public Test Data Registers D.6.1 Device ID Register A 32-bit register that is loaded with the UltraSPARC ID upon entering the CAPTURE-DR TAP state when the ID instruction is acti...

Page 351: ...etween register bits and the pin signals is described in a Boundary Scan Description Language (BSDL) file, available from your SPARC sales representative. Note: It is recommended that transitions from th...

Page 352: ...system clock UDB_UEL I Asserted when the Low UDB is driving EDATA<63:0> and it has detected an uncorrectable ECC error in that data. Synchronous to system clock UDB_CEH I Asserted when the High UDB is...

Page 353: ...These pins are also used to transfer data to control status registers on the UDB chip EDPAR 7 0 I O Byte parity for EDATA Odd parity is driven for all EDATA transfers from the UDB and checked if UDB...

Page 354: ...s arbitration protocol Synchronous to system clock S_REPLY 3 0 I System Reply packet from the system to UltraSPARC Used by UltraSPARC for flow control and initiating data transfers between the system...

Page 355: ...ynchronous to processor clock TOE_L O Active low operation enable for all E Cache tag SRAM reads and writes Active low Synchronous to processor clock Table E 5 Clock Interface Pins Symbol Type Name an...

Page 356: ...nterface Pins Symbol Type Name and Function RESET_L I Asserted asynchronously for POR power on resets Deasserted synchronous to system clock Active low XIR_L I Asserted to signal XIR resets Acts like...

Page 357: ...ntial Clock Input A CLKA 1 I Differential Clock Input B CLKB 1 I PLL loop filter connection LOOP_CAP3 1 I Low Frequency D C signal DC_SPARE 1 I UDB Clock A copy SDBCLKA 1 I UDB Clock B copy SDBCLKB 1...

Page 358: ...m Reply S_REPLY 3 0 4 I System Identification SYSID 4 0 5 I System Clock Input A SYSCLKA 1 I System Clock Input B SYSCLKB 1 I External Event EXT_EVENT 1 I Phase Lock Loop Bypass PLL_BYPASSS 1 I Reset...

Page 360: ..._USER_SECONDARY Secondary address space user privilege 1116 ASI_AS_IF_USER_SECONDARY_LITTLE Secondary address space user privilege little endian 1916 ASI_BLK_AIUP Primary address space block load stor...

Page 361: ...ss 4716 ASI_DMMU D MMU PA Data Watchpoint Register 5816 ASI_DMMU D MMU Secondary Context Register 5816 ASI_DMMU D MMU Synch Fault Address Register 5816 ASI_DMMU D MMU Synch Fault Status Register 5816...

Page 362: ...imary address space one 8 bit floating point load store D016 ASI_FL8_PL Primary address space one 8 bit floating point load store little endian D816 ASI_FL8_PRIMARY Primary address space one 8 bit flo...

Page 363: ...e TL 0 little endian 0C16 ASI_NUCLEUS_QUAD_LDD Cacheable 128 bit atomic LDDA 2416 ASI_NUCLEUS_QUAD_LDD_L Cacheable 128 bit atomic LDDA little endian 2C16 ASI_NUCLEUS_QUAD_LDD_LITTLE Cacheable 128 bit...

Page 364: ...space 8 8 bit partial store little endian C816 ASI_PST8_PRIMARY Primary address space 8 8 bit partial store C016 ASI_PST8_PRIMARY_LITTLE Primary address space 8 8 bit partial store little endian C816...

Page 365: ...rnal UDB Control Register write low 7716 ASI_UDB_ERROR_W External UDB Error Register write high 7716 ASI_UDB_ERROR_W External UDB Error Register write low 7716 ASI_UDB_INTR_R Incoming interrupt vector...

Page 366: ...cements Reduced gate dimensions 0 35 and faster cycles times 4 ns 8 Mb and 16 Mb E Cache sizes Additional Processor System clock ratios Use of reduced cost increased density E Cache SRAMs Support for...

Page 367: ...transition 1 1 1 Mode 82 Timing overlap for tag read data write for coherent write 1 1 1 Mode 83 Read to write bus turnaround penalty 1 1 1 Mode 96 Support for the PREFETCH A instructions 102 Number...

Page 368: ...SRAM mode 275 Load buffer depth optimized for 1 1 1 mode 277 E Cache accessed every other cycle in 2 2 mode 278 Read-to-Write bus turnaround penalty in 1 1 1 mode only 284 CTI at end of cache line not...

Page 370: ...BackMatter Glossary 357 Bibliography 363 Index 367 Artisan Technology Group Quality Instrumentation Guaranteed 888 88 SOURCE www artisantg com...

Page 372: ...indow is one in which all of the registers contain either zero or a valid address from the current address space or valid data from the current address space coherence A set of protocols guaranteeing...

Page 373: ..._register and IEEE_754_exception floating point IEEE 754 exception A floating point exception as specified by IEEE Std 754 1985 floating point trap type The specific type of a floating point exception...

Page 374: ...3 an instruction that can be executed when the processor is in either privileged mode or non privileged mode non privileged mode The mode in which processor is operating when PSTATE PRIV 0 See also pr...

Page 375: ...ions within an instruction field or a register field that is reserved for definition by future versions of the architecture A reserved field should only be written to zero by software A reserved regis...

Page 376: ...faulting load that is carried out before it is known whether the result of the operation is required These accesses typically are used to speed program execution An implementation through a combina t...

Page 377: ...mplementation unimplemented An architectural feature that is not directly executed in hardware because it is optional or is emulated in software unpredictable Synonymous with undefined unrestricted An...

Page 378: ...rs Boney Joel SPARC Version 9 Points the Way to the Next Generation RISC Sun World October 1992 pp 100 105 Greenley D et al UltraSPARC The Next Generation Superscalar 64 bit SPARC 40th Annual CompCon...

Page 379: ...t Interrupt Clock Controller Data Sheet STP2210QFP UltraSPARC I Uniprocessor System Controller Data Sheet STP2200BGA UltraSPARC I UPA Modules Data Sheet STP5110 UltraSPARC II Data Sheet STP1031 UltraS...

Page 380: ...standing Requests Whitepaper STB0117 How to Contact SME Sun Microelectronics SME is a division of Sun Microsystems Inc 2550 Garcia Avenue Mountain View CA U S A 94043 Phone 408 774 8545 FAX 408 774 85...

Page 382: ...ter 48 to 49 51 145 167 220 238 to 239 Address Space Identifier ASI 145 to 146 255 address translation virtual to physical 21 to 22 ADR_VLD signal 342 alias 357 address 17 28 boundary 28 boundary mini...

Page 383: ..._SDB_INTR 164 to 165 _SDBH_CONTROL_RE 185 _SDBH_ERROR_REG 184 ASI_SDBL_ERROR_REG 184 ASI_SECONDARY 34 ASI_SECONDARY_LITTLE 34 ASI_SECONDARY_NO_FAULT 36 42 49 to 51 ASI_SECONDARY_NO_FAULT_LITTLE 36 42...

Page 384: ...305 byte granularity 279 Byte Mask 110 142 BYTE_WE_L signals 341 Bytemask field 142 C C Stage 276 290 292 C stage 269 cache direct mapped 274 external 18 flushing 28 inclusion 28 level 1 27 level 2 27...

Page 385: ...rency transactions in power down mode 327 coherent P_REQ 92 Coherent P_REQ transaction packet format illustrated 140 coherent read hit timing 79 coherent read hit timing illustrated 79 Coherent S_REQ...

Page 386: ...D0 see Data 0 D0 field of PIC register D1 see Data 1 D1 field of PIC register Data 0 D0 field of PIC register 320 data alignment 7 273 data byte addresses within quadword illustrated 76 Data Cache D...

Page 387: ...map Context operation 67 dependency load use 269 dependency checking 289 destination register 360 Diag see Diagnostics Diag field of TTE Diagnostic Diag field of TTE 43 diagnostic accesses I Cache 50...

Page 388: ...raction 76 E Cache client transactions relative priorities 77 E Cache coherence states defined 94 E Cache coherency system responsibility 94 E Cache Data Access Address illustrated 315 E Cache Data Ac...

Page 389: ...nded Interrupt Target ID 117 external cache 4 18 External Cache E Cache 8 14 External Cache Unit ECU 8 illustrated 5 external power down EPD signal 196 328 External Reset pin 169 Externally Initiated...

Page 390: ...3 fcc3 field of FSR register 245 floating point condition codes 296 floating point deferred trap queue FQ 247 floating point exception 358 floating point IEEE 754 exception 358 floating point multipli...

Page 391: ...MERGE instruction 200 MERGE operation illustrated 207 RS Register 285 UB16 instruction 199 FPSUB32 instruction 199 FPSUB32S instruction 199 to 200 FPU Enabled FEF field of FPRS register 198 304 FQ see...

Page 392: ...Cache Predecode Field Access Address 311 illustrated 311 I Cache Predecode Field Access Data 311 I Cache Predecode Field LDDA Access Data illustrated 311 I Cache Predecode Field STXA Access Data illu...

Page 393: ...nation 15 ruction Translation Lookaside Buffer iTLB 5 8 170 instruction Translation Lookaside Buffer iTLB 17 Instruction Translation Lookaside Buffer iTLB misses 267 instruction_access_error exception...

Page 394: ...t Vector Receive Register 117 interrupt vector receive register 165 interrupt vector transmission 180 Interrupt Vector Uncorrectable Error IVUE field of AFSR 181 interrupt vectors in power down mode 3...

Page 395: ...instructions 296 machine state after reset 171 machine state in RED_state 171 mandatory SPARC V9 ASRs 156 manuf field of VER register 241 manuf see Manufacturer manuf field of VER register mask field...

Page 396: ...tate 54 MMU behavior during reset 54 MMU demap 66 MMU demap context operation 66 68 MMU demap operation format illustrated 66 MMU demap page operation 66 68 MMU dTLB Tag Access Register illustrated 63...

Page 397: ...n branches next program counter 359 NFO bit in MMU 36 NFO page attribute bit 280 NFO see No Fault Only NFO field of TTE No Dual Tag Present NDP option 93 no dual tag present NDP bit 106 to 108 NO_FAUL...

Page 398: ...Writes PREQ_ DQ field of UPA_PORT_ID register 153 Number of Noncacheable Stores NCST field of Number of Slave Reads ONEREAD field of UPA_PORT_ID register 153 Number of Writebacks WB field of UPA_ CON...

Page 399: ...to 109 115 118 to 120 122 137 to 138 NACK 101 106 to 109 111 to 112 115 118 to P_SNACK transaction 93 P_WRB_REQ 95 to 97 101 104 113 115 120 122 128 135 138 141 P_WRI_REQ 95 to 96 101 105 to 106 122 1...

Page 400: ...62 Physical Address PA field of TTE 43 physical address data watchpoint 306 Physical Address Data Watchpoint Read Enable Physical Address Data Watchpoint Write Enable PW field of LSU_Control_Register...

Page 401: ...ry Context Register 57 PRIV see Privileged PRIV field of PCR register Privilege PRIV field of AFSR 177 privilege PRIV field of PSTATE register 180 privilege violation 60 privileged 47 360 Privileged P...

Page 402: ...er RED_state 17 19 39 54 to 55 169 to 171 177 236 252 328 360 default memory model 255 exiting 39 170 252 MMU behavior 54 RED_state_exception trap 158 Reference MMU 24 Register R Stage 14 register fil...

Page 403: ...05 113 115 117 120 122 129 135 S_WAS 110 to 111 120 122 129 S_WBCAN 97 101 105 113 115 120 to 122 125 129 137 to 138 S0 see Select Code 0 S0 field of PCR register S1 see Select Code 1 S1 field of PCR...

Page 404: ...ed in ECU 9 snooping 33 361 store buffer 256 Soft see Software Defined Soft field of TTE Soft2 see Software Defined Soft2 field of TTE SOFTINT Register 161 166 SOFTINT register 250 SOFTINT_REG Ancilla...

Page 405: ...r SFAR 61 chronous Fault Status Register SFSR 58 illustrated 58 in E Cache 77 SYSADDR pins 339 SYSADDR bus 85 87 92 116 119 138 to 139 143 arbitration protocol 84 current driver 84 dead cycle when swi...

Page 406: ..._INT field of SOFTINT register 166 TICK Register 285 TICK_CMPR see Tick Compare TICK_CMPR field of TICK_compare register TICK_CMPR_REG register 157 TICK_INT 167 250 TICK_REG Ancillary State Register A...

Page 407: ...ddress TSB_Base field of TSB register TSB_Size see TSB Size TSB_Size field of TSB register TSO 295 mode 30 32 ordering 30 TSO memory model 249 TSTATE 253 TSYN_WR_L pin 340 TSYN_WR_L signal 341 turn ar...

Page 408: ...lustrated 154 UPA_PORT_ID Register 152 shadowed 156 UPA_Slave_Int_L signal unused in UltraSPARC I 153 UPACAP see UPA Capabilities UPACAP field of UPA_PORT_ID register UPACAP see UPA Capabilities UPACA...

Page 409: ...R register Stage virtual stage 289 chdog Reset WDR 169 171 236 chdog_reset trap 158 chpoint trap 49 304 see Number of Writebacks WB subfield of UPA_CONFIG register dow_fill trap 238 table W field of T...
