Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS VOLUME 3 REV 2.3 Manual Download | Manualshive

Page: 216 / 1898

background image

Volume 1, Part 2: Floating-point Applications

1:205

Floating-point Applications

6

6.1

Overview

The Itanium floating-point architecture is fully ANSI/IEEE-754 standard compliant and
provides performance enhancing features such as the fused multiply accumulate
instruction, the large floating-point register file (with static and rotating sections), the
extended range register file data representation, the multiple independent
floating-point status fields, and the high bandwidth memory access instructions that
enable the creation of compact, high performance, floating-point application code.

The beginning of this chapter reviews some specific performance limitations that are
common in floating-point intensive application codes. Later, architectural features that
address these limitations are presented with illustrative code examples. The remainder
of this chapter highlights the optimization of some commonly used kernels using these
features.

6.2

FP Application Performance Limiters

Floating-point applications are characterized by a predominance of loops. Some loops
compute complex calculations on regularly structured data, others simply copy data
from one place to another, while others perform gather/scatter-type operations that
simultaneously compute and rearrange data. The following sections describe code
characteristics that limit performance and how they affect these different kinds of
loops.

6.2.1

Execution Latency

Loops often contain recurrence relationships. Consider the tri-diagonal elimination
kernel from the Livermore Fortran Kernel suite.

DO 5 i = 2, N
5X[i] = Z[i] * (Y[i] - X[i-1])

The dependency between

X[i]

and

X[i-1]

limits the iteration time of the loop to be

the sum of the latency of the subtract and the multiply. The available parallelism can be
increased by unrolling the loop and can be exploited by replicating computation,
however the fundamental limitation of the data dependency remains.

Sometimes, even if the loop is vectorizable and can be software pipelined, the iteration
time of the loop is limited by the execution latency of the hardware that executes the
code. A simple vector divide (shown below) is a typical example:

DO 1 I = 1, N
1X[i] = Y[i] / Z[i]

Since typical modern microprocessors contain a non-pipelined floating-point unit, the
iteration time of the loop is the latency of the divide which can be tens of clocks.

«
...
214
215
216
217
218
...
»

Summary of Contents for ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS VOLUME 3 REV 2.3

Page 1: ......

Page 2: ...Intel Itanium Architecture Software Developer s Manual Volume 1 Application Architecture Revision 2 3 May 2010 Document Number 245317 ...

Page 3: ...ng applications Intel may make changes to specifications and product descriptions at any time without notice Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them Intel processo...

Page 4: ...ommunication 1 16 2 6 Speculation 1 16 2 6 1 Control Speculation 1 16 2 6 2 Data Speculation 1 17 2 6 3 Predication 1 17 2 7 Register Stack 1 18 2 8 Branching 1 19 2 9 Register Rotation 1 19 2 10 Floating point Architecture 1 19 2 11 Multimedia Support 1 20 2 12 Intel Itanium System Architecture Features 1 20 2 12 1 Support for Multiple Address Space Operating Systems 1 20 2 12 2 Support for Singl...

Page 5: ...4 4 Control Speculation 1 60 4 4 5 Data Speculation 1 63 4 4 6 Memory Hierarchy Control and Consistency 1 69 4 4 7 Memory Access Ordering 1 73 4 5 Branch Instructions 1 74 4 5 1 Modulo scheduled Loop Support 1 75 4 5 2 Branch Prediction Hints 1 78 4 5 3 Branch Predict Instructions 1 79 4 6 Multimedia Instructions 1 79 4 6 1 Parallel Arithmetic 1 79 4 6 2 Parallel Shifts 1 81 4 6 3 Data Arrangement...

Page 6: ... 140 2 3 1 Format 1 140 2 3 2 Expressing Parallelism 1 140 2 3 3 Bundles and Templates 1 141 2 4 Memory Access and Speculation 1 142 2 4 1 Functionality 1 142 2 4 2 Speculation 1 142 2 4 3 Control Speculation 1 142 2 4 4 Data Speculation 1 143 2 5 Predication 1 143 2 6 Architectural Support for Procedure Calls 1 144 2 6 1 Stacked Registers 1 144 2 6 2 Register Stack Engine 1 144 2 7 Branches and H...

Page 7: ... Software Pipelining and Loop Support 1 181 5 1 Overview 1 181 5 2 Loop Terminology and Basic Loop Support 1 181 5 3 Optimization of Loops 1 181 5 3 1 Loop Unrolling 1 182 5 3 2 Software Pipelining 1 183 5 4 Loop Support Features in the Intel Itanium Architecture 1 184 5 4 1 Register Rotation 1 185 5 4 2 Note on Initializing Rotating Predicates 1 186 5 4 3 Software pipelined Loop Branches 1 186 5 ...

Page 8: ...ation Recovery Using ld c 1 64 4 3 Data Speculation Recovery Using chk a 1 65 4 1 Memory Hierarchy 1 70 4 2 Allocation Paths Supported in the Memory Hierarchy 1 71 5 1 Floating point Register Format 1 86 5 2 Floating point Status Register Format 1 88 5 3 Floating point Status Field Format 1 89 5 4 Memory to Floating point Register Data Translation Single Precision 1 92 5 5 Memory to Floating point...

Page 9: ...187 5 2 wtop and wexit Execution Flow 1 189 Tables Part I Application Architecture Guide 2 1 Major Operating Environments 1 14 3 1 Reserved and Ignored Registers and Fields 1 24 3 2 Frame Marker Field Description 1 27 3 3 Application Registers 1 28 3 4 RSC Field Description 1 29 3 5 PFS Field Description 1 32 3 6 User Mask Field Descriptions 1 33 3 7 CPUID Register 3 Fields 1 35 3 8 CPUID Register...

Page 10: ... 4 Floating point Status Register s Status Field Description 1 89 5 5 Floating point Rounding Control Definitions 1 90 5 6 Floating point Computation Model Control Definitions 1 90 5 7 Floating point Memory Access Instructions 1 91 5 8 Floating point Register Transfer Instructions 1 97 5 9 General Register Integer to Floating point Register Data Translation setf 1 98 5 10 Floating point Register t...

Page 11: ...x Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 12: ...1 1 Intel Itanium Architecture Software Developer s Manual Rev 2 3 Part I Application Architecture Guide ...

Page 13: ...1 2 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 14: ...rchitecture including application level resources programming environment and the IA 32 application interface This volume also describes optimization techniques used to generate high performance software 1 1 1 Part 1 Application Architecture Guide Chapter 1 About this Manual provides an overview of all volumes in the Intel Itanium Architecture Software Developer s Manual Intel Itanium Architecture...

Page 15: ...em software 1 2 1 Part 1 System Architecture Guide Chapter 1 About this Manual provides an overview of all volumes in the Intel Itanium Architecture Software Developer s Manual Chapter 2 Intel Itanium System Environment introduces the environment designed to support execution of Itanium architecture based operating systems running IA 32 or Itanium architecture based applications Chapter 3 System S...

Page 16: ...rating systems need to preserve Itanium register contents and state This chapter also describes system architecture mechanisms that allow an operating system to reduce the number of registers that need to be spilled filled on interruptions system calls and context switches Chapter 5 Memory Management introduces various memory management strategies Chapter 6 Runtime Support for Control and Data Spe...

Page 17: ...er 2 Instruction Reference provides a detailed description of all Itanium instructions organized in alphabetical order by assembly language mnemonic Chapter 3 Pseudo Code Functions provides a table of pseudo code functions which are used to define the behavior of the Itanium instructions Chapter 4 Instruction Formats describes the encoding and instruction format instructions Chapter 5 Resource and...

Page 18: ...em Environment The operating system environment that supports the execution of both IA 32 and Itanium architecture based code Itanium Architecture based Firmware The Processor Abstraction Layer PAL and System Abstraction Layer SAL Processor Abstraction Layer PAL The firmware layer which abstracts processor features that are implementation dependent System Abstraction Layer SAL The firmware layer w...

Page 19: ...m firmware 1 7 Revision History Date of Revision Revision Number Description March 2010 2 3 Added information about illegal virtualization optimization combinations and IIPA requirements Added Resource Utilization Counter and PAL_VP_INFO PAL_VP_INIT and VPD vpr changes New PAL_VPS_RESUME_HANDLER parameter to indicate RSE Current Frame Load Enable setting at the target instruction PAL_VP_INIT_ENV i...

Page 20: ...WN procedures Allows IPI redirection feature to be optional Undefined behavior for 1 byte accesses to the non architected regions in the IPI block Modified insertion behavior for TR overlaps See Vol 2 Part I Ch 4 for details Bus parking feature is now optional for PAL_BUS_GET_FEATURES Introduced low power synchronization primitive using hint instruction FR32 127 is now preserved in PAL calling con...

Page 21: ...ction 2 2 and 3 Part I Volume 3 Added Performance Counter Standardization Sections 7 2 3 and 11 6 Part I Volume 2 Added Freeze Bit Functionality in Context Switching and Interrupt Generation Clarification Sections 7 2 1 7 2 2 7 2 4 1 and 7 2 4 2 Part I Volume 2 Added IA_32_Exception Debug IIPA Description Change Section 9 2 Part I Volume 2 Added capability for Allowing Multiple PAL_A_SPEC and PAL_...

Page 22: ...TURES changes extend calls to allow implementation specific feature control Section 11 8 3 Split PAL_A architecture changes Section 11 1 6 Simple barrier synchronization clarification Section 13 4 2 Limited speculation clarification added hardware generated speculative references Section 4 4 6 PAL memory accesses and restrictions clarification Section 11 9 PSP validity on INITs from PAL_MC_ERROR_I...

Page 23: ...simplify the call to provide more information regarding machine check Chapter 11 PAL_ENTER_IA_32_Env call changes entry parameter represents the entry order SAL needs to initialize all the IA 32 registers properly before making this call Chapter 11 PAL_CACHE_FLUSH added a new cache_type argument Chapter 11 PAL_SHUTDOWN removed from list of PAL calls Chapter 11 Clarified memory ordering changes Cha...

Page 24: ...architecture and other enhancements that support the high performance requirements of workstation applications such as digital content creation design engineering and scientific analysis The Itanium architecture also provides binary compatibility with the IA 32 instruction set Processors based on the Itanium architecture can run IA 32 applications on an Itanium architecture based operating system ...

Page 25: ... to return to an IA 32 or Itanium instruction Interrupts transition the processor to the Itanium instruction set for all interrupt conditions Figure 2 1 System Environment Table 2 1 Major Operating Environments System Environment Application Environment Usage Itanium System Environment IA 32 Protected Mode IA 32 Protected Mode applications in the Intel Itanium System Environment IA 32 Real Mode IA...

Page 26: ... prediction to minimize the cost of branches Focused enhancements for improved software performance Special support for software modularity High performance floating point architecture Specific multimedia instructions The following sections highlight these important features of the Itanium architecture 2 4 Instruction Level Parallelism Instruction Level Parallelism ILP is the ability to execute mu...

Page 27: ...chy to improve utilization This is particularly important as the cost of cache misses is expected to increase 2 6 Speculation There are two types of speculation control and data In both control and data speculation the compiler exposes ILP by issuing an operation early and removing the latency of this operation from critical path The compiler will issue an operation speculatively if it is reasonab...

Page 28: ... be performed prior to the store then the load would be data speculative with respect to the store If memory addresses overlap during execution a data speculative load issued before the store might return a different value than a regular load issued after the store Therefore analogous to control speculation when the compiler data speculates a load it leaves a check instruction at the original loca...

Page 29: ... and parallel compares 2 7 Register Stack The Itanium architecture avoids the unnecessary spilling and filling of registers at procedure call and return interfaces through compiler controlled renaming At a call site a new frame of registers is available to the called procedure without the need for register spill and fill either by the caller or by the callee Register access occurs by renaming the ...

Page 30: ... predict instructions provide information that allows for perfect prediction of loop termination thereby eliminating costly mispredict penalties and a reduction of the loop overhead 2 9 Register Rotation Modulo scheduling of a loop is analogous to hardware pipelining of a functional unit since the next iteration of the loop starts before the previous iteration has finished The iteration is split i...

Page 31: ...unique address space Translation Lookaside Buffers TLBs which hold virtual to physical mappings often need to be flushed on a process context switch Some memory areas may be shared among processes e g kernel areas and shared libraries Most operating systems assume at least one local and one global space To promote sharing of data between processes MAS operating systems aggressively use virtual ali...

Page 32: ...rful runtime and debug environment The protection model includes four protection rings and enables increased system integrity by offering a more sophisticated protection scheme than has generally been available The machine check model allows detailed information to be provided describing the type of error involved and supports recovery for many types of errors Several mechanisms are provided for d...

Page 33: ...1 22 Volume 1 Part 1 Introduction to the Intel Itanium Architecture ...

Page 34: ...P Register which holds the bundle address of the currently executing instruction or byte address of the currently executing IA 32 instruction Current Frame Marker CFM State that describes the current general register stack frame and FR PR rotation Application Registers ARs A collection of special purpose registers Performance Monitor Data Registers PMD Data registers for performance monitor hardwa...

Page 35: ...e not defined are reserved Software must always write defined values to these fields Any attempt to write a reserved value will raise a Reserved Register Field fault Certain registers are read only registers A write to a read only register raises an Illegal Operation fault When fields are marked as reserved it is essential for compatibility with future processors that software treat these fields a...

Page 36: ...vailable to a program by allocating a register stack frame consisting of a programmable number of local and output registers See Register Stack on page 1 47 for a description A portion of the stacked registers can be programmatically renamed to accelerate loops See Modulo scheduled Loop Support on page 1 75 Figure 3 1 Application Register Model APPLICATION REGISTER SET pr0 IP Predicates Floating p...

Page 37: ...gh 31 contain the IA 32 floating point and multi media registers when executing IA 32 instructions For details see IA 32 Floating point Registers on page 1 124 3 1 4 Predicate Registers A set of 64 1 bit predicate registers are used to hold the results of compare instructions These registers are numbered PR0 through PR63 and are available to all programs at all privilege levels These registers are...

Page 38: ...rrent stack frame The CFM cannot be directly read or written see Register Stack on page 1 47 The frame markers contain the sizes of the various portions of the stack frame plus three Register Rename Base values used in register rotation The layout of the frame markers is shown in Figure 3 2 and the fields are described in Table 3 2 On a call the CFM is copied to the Previous Frame Marker field in ...

Page 39: ...AR 20 Reserved AR 21 FCR IA 32 Floating point Control Register AR 22 AR 23 Reserved AR 24 EFLAGb b Some IA 32 EFLAG field writes are silently ignored if the privilege level is not zero See Section 10 3 2 IA 32 System EFLAG Register on page 2 243 for details IA 32 EFLAG register AR 25 CSD IA 32 Code Segment Descriptor Compare and Store Data register AR 26 SSD IA 32 Stack Segment Descriptor AR 27 CF...

Page 40: ... 3 3 and the field description is contained in Table 3 4 Instructions that modify the RSC can never set the privilege level field to a more privileged level than the currently executing process 3 1 8 3 RSE Backing Store Pointer BSP AR 17 The RSE Backing Store Pointer is a 64 bit read only register Figure 3 4 It holds the address of the location in memory which is the save location for GR 32 in the...

Page 41: ...6 instructions and receives data loaded by the Itanium ld16 instruction For implementations that do not support the ld16 st16 and cmp8xchg16 instructions bits 61 60 may be optionally implemented This means that on move application register instructions the implementation can either ignore writes and return zero on reads or write the value and return the last value written on reads For implementati...

Page 42: ...are can secure the interval time counter from non privileged access When secured a read of the ITC at any privilege level other than the most privileged causes a Privileged Register fault The ITC can be written only at the most privileged level The IA 32 Time Stamp Counter TSC is similar to ITC counter ITC can directly be read by the IA 32 rdtsc read time stamp counter instruction System software ...

Page 43: ...tate PFS AR 64 The Previous Function State register PFS contains multiple fields Previous Frame Marker pfm Previous Epilog Count pec and Previous Privilege Level ppl Figure 3 7 diagrams the PFS format and Table 3 5 describes the PFS fields These values are copied automatically on a call from the CFM register Epilog Count Register EC and PSR cpl Current Privilege Level in the Processor Status Regis...

Page 44: ...evel Refer to Chapter 7 Debugging and Performance Monitoring in Volume 2 for details Performance monitors can be used to gather performance information for the execution of both IA 32 and Itanium instruction sets 3 1 10 User Mask UM The user mask is a subset of the Processor Status Register and is accessible to application programs The user mask controls memory access alignment byte ordering and u...

Page 45: ... and Table 3 7 specify the definitions of each field up 2 User performance monitor enable including IA 32 0 user performance monitors are disabled 1 user performance monitors are enabled ac 3 Alignment check for data memory references including IA 32 0 unaligned data memory references may cause an Unaligned Data Reference fault 1 all unaligned data memory references cause an Unaligned Data Referen...

Page 46: ...f selection This register does not contain IA 32 instruction set features IA 32 instruction set features can be acquired by the IA 32 cpuid instruction Table 3 7 CPUID Register 3 Fields Field Bits Description number 7 0 The index of the largest implemented CPUID register one less than the number of implemented CPUID registers This value will be at least 4 revision 15 8 Processor revision number An...

Page 47: ... Memory can be addressed in units of 1 2 4 8 10 and 16 bytes It is recommended that all addressable units be stored on their naturally aligned boundaries Hardware and or operating system software may have support for unaligned accesses possibly with some performance cost 10 byte floating point values should be stored on 16 byte aligned boundaries Bits within larger units are always numbered from 0...

Page 48: ...as big endian data the instruction will appear reversed in a register Figure 3 13 shows various loads in little endian format Figure 3 14 shows various loads in big endian format Stores are not shown but behave similarly Figure 3 13 Little endian Loads Figure 3 14 Big endian Loads a b c d e f g h d a c b h e g f 63 0 0 1 2 3 4 5 6 7 LD8 0 7 0 Memory Registers 0 b 0 0 0 0 0 0 63 0 LD1 1 0 c 0 d 0 0...

Page 49: ...after slot 2 In addition to the location of stops the template field specifies the mapping of instruction slots to execution unit types Not all possible mappings of instructions to units are available Table 3 10 indicates the defined combinations The three rightmost columns correspond to the three instruction slots in a bundle Listed within each column is the execution unit type controlled by that...

Page 50: ...ined in byte 0 of a bundle Within a bundle instructions are ordered from instruction slot 0 to instruction slot 2 as specified in Figure 3 15 on page 1 38 Instruction execution consists of four phases 1 Read the instruction from memory fetch 2 Read architectural state if necessary read 3 Perform the specified operation execute 0B M unit M unit I unit 0C M unit F unit I unit 0D M unit F unit I unit...

Page 51: ... the change in control flow even for the above cases If the instructions in instruction groups meet the resource dependency requirements then the behavior of a program will be as though each individual instruction is sequenced through these phases in the order listed above The order of a phase of a given instruction relative to any phase of a previous instruction is prescribed by the instruction s...

Page 52: ...tion of the instruction group A store following a load to the same address will not affect the data loaded by the load Advanced loads check loads advanced load checks stores and memory semaphore instructions implicitly access the ALAT RAW WAW and WAR ALAT dependencies are allowed within an instruction group and behave as described for memory dependencies The net effect of the dependency restrictio...

Page 53: ...gister file may appear in the same instruction group as alloc and will see the stack frame specified by the alloc Note Some instructions have RAW or WAW dependencies on resources other than CFM affected by alloc and are thus not allowed in the same instruction group after an alloc flushrs loadrs move from AR BSPSTORE move from AR RNAT br cexit br ctop br wexit br wtop br call brl call br ia br ret...

Page 54: ...ons in the same instruction group may target the same predicate register provided The compare type instructions are either all AND type compares or all OR type compares AND type compares correspond to and and andcm completers OR type compares correspond to or and orcm completers or The compare type instructions all target PR0 All WAW dependencies for PR0 are allowed the compares can be of any type...

Page 55: ...dependency violations must be reported as Illegal Dependency Faults defined in Chapter 5 Interruptions in Volume 2 When an Illegal Dependency fault is taken the value of the resource subject to the dependency violation is undefined Undetected dependency violations cause undefined program behavior as described in Undefined Behavior on page 1 44 All detected read after write and write after write de...

Page 56: ...ions refer to Volume 4 IA 32 Instruction Set Reference Therefore the result of an undefined scenario is strictly implementation dependent User should not rely on these undefined behaviors for correct program behavior and compatibility across future implementations An undefined response undefined behavior undefined result is subject to the following restrictions It must not impede forward progress ...

Page 57: ...1 46 Volume 1 Part 1 Execution Environment ...

Page 58: ...procedures and consists of the 32 registers from GR 0 through GR 31 The stacked subset is local to each procedure and may vary in size from zero to 96 registers beginning at GR 32 The register stack mechanism is implemented by renaming register addresses as a side effect of procedure calls and returns The implementation of this rename mechanism is not otherwise visible to application programs The ...

Page 59: ...s output area becomes GR 32 for the callee The size of the local area is set to zero The size of the callee s frame sofb1 is set to the size of the caller s output area sofa sola Values in the output area of the caller s register stack frame are visible to the callee This overlap permits parameter and return value passing between procedures to take place entirely in registers Procedure frames may ...

Page 60: ... PR0 an Illegal Operation fault is raised An alloc does not affect the values or NaT bits of the allocated registers When a register stack frame is expanded newly allocated registers may have their NaT bit set In addition there are three instructions which provide explicit control over the state of the register stack These instructions are used in thread and context switching which necessitate a c...

Page 61: ... backing store before being used In addition stacked registers outside the current frame that have not been spilled by the RSE will not be stored to the backing store A loadrs instruction must be the first instruction in an instruction group otherwise the results are undefined A loadrs cannot be predicated Table 4 1 lists the architectural visible state relating to the register stack Table 4 2 sum...

Page 62: ...nstruction which takes two 32 bit unsigned register operands and produces a 64 bit result The unsigned integer shift left and multiply mpyshl4 instruction provides a building block for doing 64 bit multiplication It takes a 32 bit operand in the upper half of a first register a 32 bit operand in the lower half of a second register multiplies them and places the least significant 32 bits of the pro...

Page 63: ... on bit fields within a general register variable shifts fixed shift and mask instructions a 128 bit input funnel shift and special compare operations to test an individual bit within a general register The compare instructions for testing a single bit tbit or for testing the NaT bit tnat are described in Compare Instructions and Predication on page 1 54 The variable shift instructions shift the c...

Page 64: ...erforms a 128 bit input funnel shift It extracts an arbitrary 64 bit field from a 128 bit field formed by concatenating two source general registers The starting position is specified by an immediate This instruction can be used to accelerate the adjustment of unaligned data A bit rotate operation can be performed by using shrp and specifying the same register for both operands Table 4 6 summarize...

Page 65: ...s true predicate register PR0 is hardwired to one A few instructions cannot be predicated These instructions are allocate stack frame alloc branch predict brp bank switch bsw clear rrb clrrrb cover stack frame cover enter privileged code epc flush register stack flushrs load register stack loadrs counted branches br cloop br ctop br cexit and return from interruption rfi 4 3 2 Compare Instructions...

Page 66: ...eir floating point register sources are such that a valid approximation can be produced otherwise the predicate target is cleared 4 3 3 Compare Types Compare instructions can have as many as five compare types Normal Unconditional AND OR or DeMorgan The type defines how the instruction writes its target predicate registers based on the outcome of the comparison and on the qualifying predicate The ...

Page 67: ... the same predicate target in the same instruction group For all compare instructions except for tnat and fclass if one or both of the source registers contains a deferred exception token NaT or NaTVal see Control Speculation on page 1 60 the result of the compare is different Both predicate targets are treated the same and are either written to 0 or left unchanged In combination with speculation ...

Page 68: ...software should clear CFM rrb pr before initializing rotating predicates 4 4 Memory Access Instructions Memory is accessed by simple load store and semaphore instructions which transfer data to and from general registers or floating point registers The memory address is specified by the contents of a general register Most load and store instructions can also specify base address register update Ba...

Page 69: ...on 4 bytes double precision 8 bytes double extended precision 10 bytes and integer parallel FP 8 bytes The value s loaded from memory are converted into floating point register format see Memory Access Instructions on page 1 91 for details Table 4 12 Memory Access Instructions Mnemonic Operation General Floating point Normal Load Pair ld ldf ldfp Load ld s ldf s ldfp s Speculative load ld a ldf a ...

Page 70: ... transfer data from a general register a general register and the CSD register or floating point register to memory Store instructions are always non speculative Store instructions can specify base address register update but only by an immediate value A variant is also provided for controlling the memory cache subsystem An ordered store can be used to force ordering in memory accesses Both genera...

Page 71: ...eculation Concepts Control speculation describes the compiler optimization where an instruction or a sequence of instructions is executed before it is known that the dynamic control flow of the program will actually reach the point in the program where the sequence of instructions is needed This is done with instruction sequences that have long execution latencies Starting the execution early allo...

Page 72: ...instructions can be executed speculatively and only the result register need be checked for a deferred exception token to determine whether any exceptions occurred At the point in the program when it is known that the result of a speculative calculation is needed a speculation check chk s instruction is used This instruction tests for a deferred exception token If none is found then the speculativ...

Page 73: ...e results of speculative calculations can be checked simply by using non speculative instructions 4 4 4 5 Operating System Control over Exception Deferral An additional mechanism is defined that allows the operating system to control the exception behavior of speculative loads The operating system has the option to select which exceptions are deferred automatically in hardware and which exceptions...

Page 74: ...pill fill instructions allow spilling filling of registers that are targets of a speculative instruction and may therefore contain a deferred exception token Note also that transfers between the general and floating point register files cause a conversion between the two deferred exception token formats Table 4 14 lists the state relating to control speculation Table 4 15 summarizes the instructio...

Page 75: ...stores followed by a check The decision to perform such a transformation is highly dependent upon the likelihood and cost of recovering from an unsuccessful data speculation 4 4 5 2 Data Speculation and Instructions Advanced loads are available in integer ld a floating point ldf a and floating point pair ldfp a forms When an advanced load is executed it allocates an entry in a structure called the...

Page 76: ...ways by physical addresses and by ALAT register tags An ALAT register tag is a unique number derived from the physical target register number and type in conjunction with other implementation specific state Implementation specific state might include register stack wraparound information to distinguish one instance of a physical register that may have been spilled by the RSE from the current insta...

Page 77: ...ain the entry A check load checks for a matching entry in the ALAT If no matching entry is found it reloads the value from memory and any faults that occur during the memory reference are raised When a matching entry is found there is flexibility in the actions that a processor can perform 1 The implementation may choose to either leave the target register unchanged or to reload the value from mem...

Page 78: ...T to see if it overlaps with the locations affected by the invalidation event ALAT entries whose memory regions overlap with the invalidation event locations are removed The invalidation of ALAT entries due to the execution of stores semaphores or ptc ga instructions must occur no later in visibility order than the store of the data or the TLB purge Note that some invalidation events may require t...

Page 79: ...th for data speculation failures and to detect deferred exceptions 4 4 5 5 Instruction Completers for ALAT Management To help the compiler manage the allocation and deallocation of ALAT entries two variants of advanced load checks and check loads are provided variants with clear chk a clr ld c clr ld c clr acq ldf c clr ldfp c clr and variants with no clear chk a nc ld c nc ldf c nc ldfp c nc The ...

Page 80: ...In addition memory access instructions can specify which levels of the memory hierarchy are affected by the access This leads to an architectural view of the memory hierarchy depicted in Figure 4 1 composed of zero or more levels of cache between the register files and memory where each level may consist of two parallel structures a temporal structure and a non temporal structure Note that this vi...

Page 81: ... a cache line closer in the hierarchy than specified in the hint does not demote the line This enables the precise management of lines using lfetch and then subsequent uses by nta loads and stores to retain that level in the hierarchy For example specifying the nt2 hint by a prefetch indicates that the data should be cached at level 3 Subsequent loads and stores can specify nta and have the data r...

Page 82: ... occurs Both immediate and register post increment are defined for lfetch and lfetch fault The lfetch instruction does not cause any exceptions does not affect program behavior and may be ignored by the implementation The lfetch fault instruction affects the memory hierarchy in exactly the same way as lfetch but takes exceptions as if it were a 1 byte load instruction Implicit prefetch is based on...

Page 83: ...r processor s instruction caches Data accesses from different processors in the same coherence domain are coherent with respect to each other this consistency is provided by the hardware Data accesses from the same processor are subject to data dependency rules see Memory Access Ordering below The mechanism s by which coherence is maintained is implementation dependent Separate or unified structur...

Page 84: ...ng semantics unordered release acquire or fence Unordered data accesses may become visible in any order Release data accesses guarantee that all previous data accesses are made visible prior to being made visible themselves Acquire data accesses guarantee that they are made visible prior to all subsequent data accesses Fence operations combine the release and acquire semantics into a bi directiona...

Page 85: ...the 64 bit address space Because of the long immediate long branches occupy two instruction slots Indirect branches use the branch registers to specify the target address There are several branch types as shown in Table 4 22 The conditional branch br cond or br is a branch which is taken if the specified predicate is 1 and not taken otherwise The conditional call branch br call does the same thing...

Page 86: ...o or more instructions in each stage Modulo scheduled loops have three phases prolog kernel and epilog During the prolog phase new loop iterations are started each time around filling the software pipeline During the kernel phase the pipeline is full A new loop br ia Invoke the IA 32 instruction set Unconditional Indirect br cloop Counted loop branch Loop count IP rel br ctop br cexit Modulo sched...

Page 87: ...e unless all rotating register bases rrb s in the CFM are zero All rrb s can be set to zero with the clrrrb instruction The clrrrb pr form can be used to clear just the rrb for the predicate registers The clrrrb instruction must be the last instruction in an instruction group Rotation by one register position occurs when a software pipelined loop type branch is executed Registers are rotated towar...

Page 88: ...executed with both LC and EC equal to zero will have a branch direction to exit the loop LC EC and the rrb s will not be modified no rotation and PR 63 will be set to 0 LC and EC equal to zero can occur in some types of optimized unrolled software pipelined loops if the target of a cexit branch is set to the next sequential bundle and the loop trip count is not evenly divisible by the unroll amoun...

Page 89: ... see the processor specific documentation for further information Predictor deallocation This provides re use information to allow the hardware to better manage branch prediction resources Normally prediction resources keep track of the most recently executed branches However sometimes the most recently executed branch is not useful to remember either because it will not be re visited any time soo...

Page 90: ... are included to indicate that the branch will be a positive CLOOP CTOP WTOP or negative CEXIT WEXIT loop type The move to branch register instruction can also provide this same hint information simplifying the setup for a hinted indirect branch 4 6 Multimedia Instructions Multimedia instructions see Table 4 29 treat the general registers as concatenations of eight 8 bit four 16 bit or two 32 bit ...

Page 91: ...tions or The parallel multiply right instruction pmpy r multiplies the corresponding two even numbered signed 2 byte elements of both sources and writes the results into two 4 byte elements in the target The pmpy l instruction performs a similar operation on odd numbered 2 byte elements The parallel multiply and shift right instruction pmpyshr pmpyshr u multiplies the corresponding 2 byte elements...

Page 92: ...ller elements in the target register The pack sss instruction treats the extracted elements as signed values and performs signed saturation on them The pack uss instruction performs unsigned saturation The mux instruction mux copies individual 2 byte or 1 byte elements in the source to arbitrary positions in the target according to a specified function For 2 byte elements an 8 bit immediate allows...

Page 93: ...gement Instructions Mnemonic Operation 1 byte 2 byte 4 byte mix l Interleave odd elements from both sources x x x mix r Interleave even elements from both sources x x x mux Arbitrary copy of individual source elements x x pack sss Convert from larger to smaller elements with signed saturation x x pack uss Convert from larger to smaller elements with unsigned saturation x unpack l Interleave least ...

Page 94: ...ddition instructions are defined to move values between the general register file and the user mask mov psr um and mov psr um The sum and rum instructions set and reset the user mask The user mask is the non privileged subset of the Process Status Register PSR The mov pmd instruction is defined to move from a performance monitor data PMD register to a general register If the operating system has n...

Page 95: ...handler The epc instruction increases the privilege level without causing an interruption or a control flow transfer The new privilege level is specified by the TLB entry for the page containing the epc if virtual address translation for instruction fetches is enabled If the privilege level specified by PFS ppl in the Previous Function State application register is lower than the current privilege...

Page 96: ...nt register format A Parallel FP format where a pair of IEEE single precision values occupy a floating point register s significand is also supported A seventh data type IEEE style quad precision is supported by software routines A future architecture extension may include additional support for the quad precision real type 5 1 1 Real Types The parameters for the supported IEEE real types are summ...

Page 97: ...e of a finite floating point number encoded with zero exponent field can be calculated using the expression Integers 64 bit signed unsigned and Parallel FP numbers reside in the 64 bit significand field In their canonical form the exponent field is set to 0x1003E biased 63 and the sign field is set to 0 5 1 3 Representation of Values in Floating point Registers The floating point register encoding...

Page 98: ...e Real Normals produced when the computation model is IA 32 Stack Double 0 1 0x0C001 through 0x13FFE 1 000 00 11 0s through 1 111 11 11 0s Unnormalized Numbers Floating point Register Format unnormalized numbers 0 1 0x00000 0 000 01 through 1 111 11 0x00001 through 0x1FFFE 0 000 01 through 0 111 11 0x00001 through 0x1FFFD 0 000 00 1 0x1FFFE 0 000 00 Integers or Parallel FP positive signed unsigned...

Page 99: ...mbers not zeros 5 2 Floating point Status Register The Floating Point Status Register FPSR contains the dynamic control and status information for floating point operations There is one main set of control and status information FPSR sf0 and three alternate sets FPSR sf1 FPSR sf2 FPSR sf3 The FPSR layout is shown in Figure 5 2 and its fields are defined in Table 5 3 Table 5 4 gives the FPSR s stat...

Page 100: ...t Exception fault disabled when this bit is set traps zd 2 Zero Divide Floating Point Exception fault IEEE Trap disabled when this bit is set traps od 3 Overflow Floating Point Exception trap IEEE Trap disabled when this bit is set traps ud 4 Underflow Floating Point Exception trap IEEE Trap disabled when this bit is set traps id 5 Inexact Floating Point Exception trap IEEE Trap disabled when this...

Page 101: ...nding direction see Table 5 5 Table 5 5 Floating point Rounding Control Definitions Nearest or even Infinity down Infinity up Zero truncate chop FPSR sfx rc 00 01 10 11 Table 5 6 Floating point Computation Model Control Definitions Computation Model Control Fields Computation Model Selected Instruction s pc Completer FPSR sfx s Dynamic pc Field FPSR sfx s Dynamic wre Field Significand Precision Ex...

Page 102: ...nstructions require the two target registers to be odd even or even odd See ldfp Floating point Load Pair on page 3 161 The floating point store instructions stfs stfd stfe require the value in the floating point register to have the same type as the store for the format conversion to be correct Unsuccessful speculative loads write a NaTVal into the destination register or registers see Section 4 ...

Page 103: ...ecision Load setf s normal numbers bit 0 1 sign exponent integer significand FR Memory GR Single precision Load setf s infinities and NaNs bit 0 1 sign exponent integer significand FR Memory GR Single precision Load setf s zeros bit 0 0 0x1FFFF 1111111 1 0 0000000 0 0 sign exponent integer significand FR Memory GR Single precision Load setf s denormal numbers bit 0 0x0FF81 0000000 0 0 0 0 ...

Page 104: ...ad setf d normal numbers bit 0 1 sign exponent integer significand FR Memory GR Double precision Load setf d infinities and NaNs bit 0 1 sign exponent integer significand FR Memory GR Double precision Load setf d zeros bit 0 0 0x1FFFF 1111111 1 0 0000000 0 0 sign exponent integer significand FR Memory GR Double precision Load setf d denormal numbers bit 0 0x0FC01 0000000 0 0 111 000 000 0 0 0 0 0 ...

Page 105: ...rmal unnormal numbers bit sign exponent integer significand FR Memory Double extended precision Load infinities and NaNs bit sign exponent integer significand FR Memory Double extended precision Load denormal pseudo denormals and zeros bit 0x1FFFF 0 1111111 11111111 0000000 00000000 sign exponent integer significand FR Memory GR Integer Parallel FP Load setf sig bit 0 0x1003E sign exponent signifi...

Page 106: ...o Memory Data Translation Single Precision Figure 5 8 Floating point Register to Memory Data Translation Double Precision sign exponent integer significand FR Memory GR Single precision Store getf s bit AND sign exponent significand FR Memory GR Double precision Store getf d integer bit AND ...

Page 107: ...Section 3 2 3 Byte Ordering The byte ordering for the spill fill memory and double extended formats is shown in Figure 5 10 Figure 5 9 Floating point Register to Memory Data Translation Double Extended Integer Parallel FP and Spill sign exponent significand FR Memory integer bit sign exponent significand FR Memory Register Spill integer bit 0 0 0 0 0 0 sign exponent integer significand FR Memory G...

Page 108: ...nt register respectively and their translation formats are described in Table 5 9 and Table 5 10 Figure 5 10 Spill Fill and Double extended 80 bit Floating point Memory Formats Table 5 8 Floating point Register Transfer Instructions Operations GR to FR FR to GR Single setf s getf s Double setf d getf d Sign and Exponent setf exp getf exp Significand Integer setf sig getf sig s0 s1 s2 s3 s4 s5 s6 s...

Page 109: ...Register Integer to Floating point Register Data Translation setf General Register Floating Point Register sig Floating Point Register exp Class NaT Integer Sign Exponent Significand Sign Exponent Significand NaT 1 ignore NaTVal NaTVal integers 0 000 00 through 111 11 0 0x1003E integer integer 17 integer 16 0 0x8000000000000000 Table 5 10 Floating point Register to General Register Integer Data Tr...

Page 110: ...ctions are used to manipulate the Parallel FP data in the floating point significand The fand fandcm for and fxor instructions are used to perform logical operations on the floating point significand The fselect instruction is used for conditional selects Floating point minimum fmin sf fpmin sf Floating point maximum fmax sf fpmax sf Floating point absolute minimum famin sf fpamin sf Floating poin...

Page 111: ...onic s Floating point classify fclass fcrel fctype Floating point merge sign Parallel FP merge sign fmerge s fpmerge s Floating point merge negative sign Parallel FP merge negative sign fmerge ns fpmerge ns Floating point merge sign and exponent Parallel FP merge sign and exponent fmerge se fpmerge se Floating point mix left fmix l Floating point mix right fmix r Floating point mix left right fmix...

Page 112: ...cept for a NaTVal check The product of two 64 bit source significands is added to the third 64 bit significand zero extended to produce a 128 bit result The low and high versions of the instruction select the appropriate low high 64 bits of the 128 bit result respectively and write it into the destination register as a canonical integer The signed and unsigned versions of the instructions treat th...

Page 113: ...mainly integer task is able to use only FR 2 to FR 32 for executing integer multiply and divide operations then context switch time may be reduced by disabling access to the high floating point registers 5 4 1 2 Floating point Exception Fault A Floating Point Exception fault occurs if one of the following four circumstances arises 1 The processor requests system software assistance to complete the...

Page 114: ... NaTVal Operand Y N NaTVal Response Denormal Enabled FP Fault ISR d 1 FLAGS d 1 Limits Check 2 Terminal State Decision Point START SWA Fault ISR swa 1 N N Y Y Y N Y N Y N N Unsupported Operand UnNormal Operand Y COMPUTE OPERATION 1 For frcpa fprcpa 2 For frcpa frsqrta QNaN Operand N Invalid Enabled FP Fault ISR v 1 Reg prioritized NaN resp f4 f2 f3 Y N Y FLAGS v 1 Y Invalid Enabled FP Fault ISR v ...

Page 115: ...h ISR fpa is set to 1 when the magnitude of the delivered result is greater than the magnitude of the infinitely precise result It is set to 0 otherwise The magnitude of the delivered result may be greater if The significand is incremented during rounding or A larger pre determined value e g infinity is substituted for the computed result e g when overflow is disabled There is no requirement that ...

Page 116: ... MOD 28 The value s significand is rounded to the specified precision and written to the destination register If the rounded value is different from the infinitely precise value Figure 5 12 Floating point Exception Trap Prioritization Emax Emin Overflow Enabled Underflow Enabled FLAGS o 1 FLAGS i tmp_i Exp tmp_exp 217 Sig tmp_sig ISR o 1 ISR i tmp_i ISR fpa tmp_fpa FP TRAP Infinity Result Inexact ...

Page 117: ...Point Exception trap is disabled and tininess but not inexactness occurs then neither underflow nor inexactness is signaled and the result is a denormal The IEEE Underflow Floating Point Exception trap disabled response for all normal and Parallel FP arithmetic instructions is to denormalize the infinitely precise result and then round it to the destination precision The result may be a denormal z...

Page 118: ...it of the significand Quiet NaNs have a one in the most significant fractional bit of the significand This definition of signaling and quiet NaNs easily preserves NaNness when converting between different precisions When propagating NaNs in operations that have more than one NaN operand the result NaN is chosen from one of the operand NaNs in the following priority based on register encoding field...

Page 119: ...ndard suggests that implementations allow lower precision operands to produce higher precision results this is supported The IEEE standard also suggests that implementations not allow higher precision operands to produce lower precision results this suggestion is not followed When computations with higher precision operands produce values beyond the destination precision range the information prov...

Page 120: ...on Layer uses the native OS for acquiring system resources memory synchronization objects etc executing 32 bit system calls issued by the IA 32 application signal handling exceptions and other system notifications IA 32 Execution Layer supports user mode 32 bit flat protected applications Consistent with Itanium based operating systems that support legacy IA 32 applications 16 bit applications and...

Page 121: ...ion Branch to an IA 32 target instruction and change the instruction set to IA 32 rfi Itanium instruction Return from interruption is defined to return to either an IA 32 or Itanium instruction when resuming from an interruption Interruptions transition the processor to the Itanium instruction set for all interruption conditions The jmpe and br ia instructions provide a low overhead mechanism to t...

Page 122: ... and LDT segments are defined to support IA 32 segmented applications Segmented 16 and 32 bit code is fully supported Virtual addresses are confined to the lower 4G bytes of virtual region 0 Itanium architecture memory management is used to translate virtual to physical addresses for all IA 32 instruction set memory and I O Port references Instruction and Data memory references are forced to be li...

Page 123: ...d ensure the code segment descriptor and selector are properly loaded before issuing the branch If the target EIP value exceeds the code segment limit or has a code segment privilege violation an IA 32 GPFault 0 exception is reported on the target IA 32 instruction The processor does not ensure Itanium instruction set generated writes into the IA 32 instruction stream are observed by the processor...

Page 124: ...re operations are performed in a consistent manner 6 2 2 IA 32 Application Register State Model As shown in Figure 6 2 and Table 6 1 IA 32 general purpose registers segment selectors and segment descriptors are mapped into the lower 32 bits of Itanium general purpose registers GR8 to GR31 The floating point register stack MMX technology and SSE registers are mapped on Itanium floating point regist...

Page 125: ...may contain any value during Itanium instruction execution according to Itanium software conventions Software should follow IA 32 and Itanium calling conventions for these registers Undefined Registers marked as undefined may be used as scratch areas for execution of IA 32 instructions by the processor and are not ensured to be preserved across instruction set transitions Figure 6 2 IA 32 Applicat...

Page 126: ...t 0 GR1 3 undefinedf scratch for IA 32 execution GR4 7 unmodified Intel Itanium preserved registers GR8 EAX IA 32 state 32a IA 32 general purpose registers GR9 ECX GR10 EDX GR11 EBX GR12 ESP GR13 EBP GR14 ESI GR15 EDI GR16 15 0 DS 64 IA 32 selectors GR16 31 16 ES GR16 47 32 FS GR16 63 48 GS GR17 15 0 CS GR17 31 16 SS GR17 47 32 LDT GR17 63 48 TSS GR18 23 undefinedf scratch for IA 32 execution GR24...

Page 127: ...gisters BR6 7 undefined IA 32 code execution space Application Registers RSC unmodified not used for IA 32 execution Intel Itanium preserved registers BSP BSPSTORE RNAT CCV undefinedf 64 IA 32 code execution space UNAT unmodified not used for IA 32 execution Intel Itanium preserved register FPSR sf0 unmodified Intel Itanium numeric status and controls register FPSR sf1 2 3 undefinedf IA 32 code ex...

Page 128: ... 32 code execution Prior EC is preserved in PFM Intel Itanium preserved registers LC EC EFLAG EFLAG IA 32 state 32 IA 32 System Arithmetic flags writes of some bits condition by CPL and EFLAG iopl CSD CSD 64 IA 32 code segment register format b SSD SSD IA 32 stack segment register format b CFLG CR0 CR4 64 IA 32 control flags CR0 CFLG 31 0 CR4 CFLG 63 32 writable at CPL 0 only a On transitions into...

Page 129: ...oftware Developer s Manual Figure 6 4 IA 32 Segment Register Selector Format 63 48 47 32 31 16 15 0 GS FS ES DS GR16 TSS LDT SS CS GR17 Figure 6 5 IA 32 Code Data Segment Register Descriptor Format 63 62 61 60 59 58 57 56 55 52 51 32 31 0 g d b ig av p dpl s type lim 19 0 base 31 0 Table 6 2 IA 32 Segment Register Fields Field Bits Description selector 15 0 Segment Selector value see the Intel 64 ...

Page 130: ...of the IA 32 Protected Mode Real Mode or VM86 environments Table 6 3 defines software guidelines for establishing the initial IA 32 environment The processor checks the integrity of the IA 32 environment as defined in IA 32 Environment Runtime Integrity Checks on page 1 122 On the av 60 Ignored This field is ignored by the processor during IA 32 instruction set execution This field is available fo...

Page 131: ... RM 64KB operation 16 32 bit 16 bit type data rd wr expand up execute data rd wr expand up s bit 1 1 1 p bit 1 1 1 a bit 1 1 1 g bit limit 0xFFFFe e Segment limit should be set to 0xFFFF for normal RM 64KB operation limit 0xFFFF SS selector base 4a selector base 4 base selector 4b base selector 4 dpl PSR cpl 0 PSR cpl PSR cpl 3 d bit 16 bitd 16 32 bit size 16 bit type data rd wr expand up data typ...

Page 132: ...he stack segment descriptor register s DPL PSR cpl 3 and set to 16 bit data read write expand up For CSD DSD ESD FSD and GSD segment descriptor registers Itanium architecture based software should ensure DPL 3 the segment is set to 16 bit data read write expand up Software should ensure that all code stack and data segment descriptor registers do not contain encodings for any system segments Softw...

Page 133: ...ddition to the runtime checks defined on IA 32 processors and are high lighted in Table 6 4 Existing IA 32 runtime checks are listed but not highlighted Descriptor fields not listed in the table are not checked As defined in the table runtime checks are performed either on IA 32 instruction code fetches or on an IA 32 data memory reference to one of the specified segment registers These runtime ch...

Page 134: ...and GS dpl ignored GPFault 0 d bit ignored is not 16 bit type ignored data expand down read and not readable write and not writeable s p a bits are not 1 g bit limit segment limit violation data memory references to CS dpl ignored GPFault 0 d bit ignored is not 16 bit type ignored data expand down rd wr checks are ignored rd and not readable wr and not writeable rd wr checks are ignored s p a bits...

Page 135: ...mplementation can either ignore writes and return zero on reads or write the value and return the last value written on reads pf 2 IA 32 Parity Flag See the Intel 64 and IA 32 Architectures Software Developer s Manual for details af 4 IA 32 Aux Flag See the Intel 64 and IA 32 Architectures Software Developer s Manual for details zf 6 IA 32 Zero Flag See the Intel 64 and IA 32 Architectures Softwar...

Page 136: ...n un normalized denormal value placed in the target floating point register There are two canonical exponent values in the Itanium architecture which indicate single precision and double precision denormals When transferring floating point values from Itanium to IA 32 instructions it is highly recommended that typical IA 32 calling conventions be followed which pass floating point values through t...

Page 137: ...et or on a consuming IA 32 floating point instruction Dependent IA 32 floating point instructions that directly or indirectly consume a propagated NaTVal register will either propagate the NaTVal indication or generate an IA_32_Exception FPError Invalid Operand fault Whether a processor generates the fault or propagates the NaTVal is model specific In no case will the processor allow a NaTVal regi...

Page 138: ...reserved fields the implementation can either raise Reserved Register Field fault on non zero writes and return zero on reads or write the value no Reserved Register Field fault and return the last value written on reads Figure 6 1 IA 32 Floating point Control Register FCR IA 32 FCW 12 0 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 reserved set to 0 IC RC P...

Page 139: ...Fault FSW es FSR esa 7 Error Summary FSW c3 0 FSR c3 0 8 10 14 Numeric Condition codes FSW top FSR top 11 13 Top of IA 32 numeric stack FSW b FSR b 15 IA 32 FPU Busy always equals state of FSW ES FTW FSR tg 7 0 b 16 18 20 22 24 26 28 30 Numeric Tags 0 NotEmpty 1 Emptyc zeros 17 19 21 23 25 27 29 31 39 47 Ignored Writes are ignored reads return zero MXCSR ie FSR ie 32 SSE Invalid operation Exceptio...

Page 140: ...n IA 32 MMX technology instruction The exponent field of the corresponding floating point register bits 80 64 and the sign bit bit 81 are set to all ones The mantissa bits 63 0 is set to the MMX technology data value When a value is read from an MMX technology register by an IA 32 MMX technology instruction The exponent field of the corresponding floating point register bits 80 64 and its sign bit...

Page 141: ... odd registers When a SSE register is read using IA 32 SSE instructions The exponent field of the corresponding Itanium floating point register bits 80 64 and the sign bit bit 81 are ignored including any NaTVal encodings 6 2 3 Memory Model Overview Virtual addresses within either the Itanium or IA 32 instruction set are defined to address the same physical memory location Itanium instructions dir...

Page 142: ...te operands 16 bit effective addresses can extend above the 64K byte boundary however ending 32 bit effective addresses are truncated to 32 bits and do not extend above the 4G byte effective address boundary Refer to the Intel 64 and IA 32 Architectures Software Developer s Manual for complete details on wrap conditions IA 32 Code 16 32 bit Effective Addresses 16 or 32 bit EIP based on CSD d is us...

Page 143: ...l detect a IA 32 self modifying code event for the following conditions 1 PSR dt or PSR it is 0 or 2 there are virtual aliases to different physical addresses between the instruction and data TLBs To ensure self modifying code works correctly for IA 32 applications the operating system must ensure that there are no virtual aliases to different physical addresses between the instruction and data TL...

Page 144: ...tores from passing prior IA 32 stores by issuing a release operation or mf after the instruction set transition 6 2 4 IA 32 Usage of Intel Itanium Registers This section lists software considerations for the Itanium general and floating point registers and the ALAT when interacting with IA 32 code 6 2 4 1 Register Stack Engine Software must ensure that all dirty registers in the register stack hav...

Page 145: ...g entry to the IA 32 instruction set or on a consuming IA 32 floating point instruction Dependent IA 32 floating point instructions that directly or indirectly consume a propagated NaTVal register will either propagate the NaTVal indication or generate an IA_32_Exception FPError Invalid Operand fault Whether a processor generates the fault or propagates the NaTVal is model specific In no case will...

Page 146: ...1 135 Intel Itanium Architecture Software Developer s Manual Rev 2 3 Part II Optimization Guide for the Intel Itanium Architecture ...

Page 147: ...1 136 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 148: ...ation For these code examples ALU operations are assumed to take one cycle and loads take two cycles to return from first level cache and that there are two load store execution units and four ALUs Other latencies and execution unit details are described as needed 1 1 Overview of the Optimization Guide Chapter 2 Introduction to Programming for the Intel Itanium Architecture provides an overview of...

Page 149: ...1 138 Volume 1 Part 2 About the Optimization Guide ...

Page 150: ...h this chapter provides a high level introduction to application level programming it assumes prior experience with assembly language programming as well as some familiarity with the Itanium application architecture Optimization is explored in other chapters of this guide 2 2 Registers The architecture defines 128 general purpose registers 128 floating point registers 64 predicate registers and up...

Page 151: ...p mnemonic comp dest srcs Where qp Specifies a qualifying predicate register The value of the qualifying predicate determines whether the results of the instruction are committed in hardware or discarded When the value of the predicate register is true 1 the instruction executes its results are committed and any exceptions that occur are handled as usual When the value is false 0 the results are n...

Page 152: ...e used by a template type specification Bundle templates enable processors based on the Itanium architecture to dispatch instructions with simple instruction decoding and stops enable explicit specification of parallelism There are five slot types M I F B and L six instruction types M I A F B L and 12 basic template types MII MI_I MLX MMI M_MI MFI MMF MIB MBB BBB MMB MFB Each basic template type h...

Page 153: ...on in the Itanium architecture See Chapter 3 Memory Reference for more detailed descriptions of speculative instruction behavior and application 2 4 3 Control Speculation Control speculation allows loads and their dependent uses to be safely moved above branches Support for this is enabled by special NaT bits that are attached to integer registers and by special NatVal values for floating point re...

Page 154: ... Other instructions st8 r55 r45 Cycle 0 ld8 c r3 r5 Cycle 0 check shr r7 r3 r87 Cycle 0 Note The shr instruction in this schedule could issue in cycle 0 if there were no con flicts between the advanced load and intervening stores If there were a con flict the check load instruction ld8 c would detect the conflict and reissue the load 2 5 Predication Predication is the conditional execution of an i...

Page 155: ...t of the stacked registers The stacked registers are numbered r32 up to a user configurable maximum of r127 A called procedure specifies the size of its new stack frame using the alloc instruction The procedure can use this instruction to allocate up to 96 registers per frame shared amongst input output and local values When a call is made the output registers of the calling procedure are overlapp...

Page 156: ...on all loops for the following reasons Unrolling may not fully exploit the parallelism available Unrolling is tailored for a statically defined number of loop iterations Unrolling can increase code size To maintain the advantages of loop unrolling while overcoming these limitations the Itanium architecture provides architectural support for software pipelining Software pipelining enables the compi...

Page 157: ... through p63 are rotated and floating point registers f32 through f127 are rotated 2 8 Summary The Itanium architecture provides features that reduce the effects of traditional microarchitectural performance barriers by enabling Improved ILP with a large number of registers and software scheduling of instruction groups and bundles Better branch handling through predication Reduced overhead for pro...

Page 158: ...struction scheduling 3 2 Non speculative Memory References The Itanium architecture supports non speculative loads and stores as well as explicit memory hint instructions 3 2 1 Stores to Memory Itanium integer store instructions can write either 1 2 4 or 8 bytes and 4 8 or 10 bytes for floating point stores For example a st4 instruction will write the first four bytes of a register to memory Altho...

Page 159: ...g answer if a data dependency is broken or raising a fault that should not be raised if a control dependency is broken This section describes Background material on memory reference dependencies Descriptions of how dependencies constrain code scheduling on traditional architectures Section 3 4 describes memory reference features defined in the Itanium architecture that increase the number of depen...

Page 160: ...rms describe data dependencies between instructions Write after write WAW A dependency between two instructions that write to the same register or memory location Write after read WAR A dependency between two instructions in which an instruction reads a register or memory location that a subsequent instruction writes Read after write RAW A dependency between two instructions in which an instructio...

Page 161: ...tructions were executed sequentially and in program order The pseudo code below demonstrates a memory dependency that will be observed by hardware mov r16 1 mov r17 2 st8 r15 r16 st8 r14 r17 If the address in r14 is equal to the address in r15 uni processor hardware guarantees that the memory location will contain the value in r17 2 The following RAW dependency is also legal in the same instructio...

Page 162: ...e 3 In this case the load cannot be moved past the store due to the memory dependency Stores will cause data dependencies if they cannot be disambiguated from loads or other stores In the absence of other architectural support stores can prevent moving loads and their dependent instructions The following C language statements could not be reordered unless ptr1 and ptr2 were statically known to poi...

Page 163: ...on speculative loads 3 4 2 Using Data Speculation in the Intel Itanium Architecture Data speculation in the Itanium architecture uses a special load instruction ld a called an advanced load instruction and an associated check instruction chk a or ld c to validate data speculated results When the ld a instruction is executed an entry is allocated in a hardware structure called the Advanced Load Add...

Page 164: ...uce the critical path of a sequence of instructions In the code below a load and store may access conflicting memory addresses st8 r4 r12 Cycle 0 ambiguous store ld8 r6 r8 Cycle 0 load to advance add r5 r6 r7 Cycle 2 st8 r18 r5 Cycle 3 On the generic machine model the code above would execute in four cycles but it can be rewritten using an advanced load and check ld8 a r6 r8 Cycle 2 or earlier Oth...

Page 165: ...d code is three cycles less than the original code 3 4 2 3 Terminology Review Terms related to speculation such as advanced loads and check loads have well defined meanings in the Itanium architecture The terms below were introduced in the preceding sections Data speculative load A speculative load that is statically scheduled prior to one or more stores upon which it may be dependent The data spe...

Page 166: ...tations 3 4 3 2 Control Speculation Example When a control speculative load is scheduled the compiler must insert a speculative check chk s along all paths on which results of the speculative load are consumed If a non speculative instruction other than a chk s reads a register with its NaT bit set a NaT consumption fault occurs and the operating system will terminate the program The code sequence...

Page 167: ...store floating point registers in floating point register format without surfacing exceptions due to NaTVals 3 4 3 4 Terminology Review The terms below are related to control speculation Control speculative load A speculative load that is scheduled prior to an earlier controlling branch References to speculative loads without qualifiers generally refer to control speculative loads and not data spe...

Page 168: ...ere are no unused ALAT entries the hardware may choose to invalidate an existing entry to make room for a new one Moreover exceptions associated with control speculative calculations are uncommon in correct code since they are related to events such as page faults and TLB misses However excessive control speculation can be expensive as associated instructions fill issue slots Although the static c...

Page 169: ...th ALAT entries allocated from calls in parent functions If it is unknown whether a large number of advanced loads will be executed by the called routines then the possibility that the capacity of that ALAT may be exceeded must be considered 3 5 3 Optimizing Code Size Part of the decision of when to speculate should involve consideration of any possible increases in code size Such consideration is...

Page 170: ... 3 5 4 Using Post increment Loads and Stores Post increment loads and stores can improve performance by combining two operations in a single instruction Although the text in this section mentions only post increment loads most of the information applies to stores as well Post increment loads are issued on M units and can increment their address register by either an immediate value or by the conte...

Page 171: ...o ptr will overwrite the values of a and b unless analysis can guarantee that this can never happen The use of advanced loads and checks allows code that is likely to be invariant to be removed from a loop even when a pointer cannot be disambiguated ld4 a r1 a ld4 a r2 b add r3 r1 r2 Move computation out of loop while cond chk a nc r1 recover1 L1 chk a nc r2 recover2 L2 p r3 At the end of the modu...

Page 172: ...trol speculative loads of r1 or r2 the NaT bit would have been propagated to r3 from r1 or r2 via the add instruction Another way to reduce the amount of check code is to use control flow analysis to avoid issuing extra ld c or ld a instructions For example the compiler can schedule a single check where it is known to be reached by all copies of the advanced load The portion of a flow graph shown ...

Page 173: ...chitectural support allows implementation of speculation in common scenarios in which it would normally not be allowed Speculation in turn increases ILP by making greater code motion possible thus enhancing traditional optimizations such as those involving loops Even though the speculation model can be applied in many different situations careful cost and benefit analysis is needed to insure best ...

Page 174: ... optimizations and techniques based on predication 4 2 1 Performance Costs of Branches Branches can decrease application performance by consuming hardware resources for prediction at execution time and by restricting instruction scheduling freedom during compilation 4 2 1 1 Prediction Resources Branch prediction resources include branch target buffers branch prediction tables and the logic used to...

Page 175: ...10 cycles 5 If the branch misprediction penalty could be eliminated either by reducing contention for resources or by removing the branch itself performance of the code sequence would improve by a factor of two 4 2 1 2 Instruction Scheduling Branches limit the ability of the compiler to move instructions that alter memory state or that can raise exceptions because instructions in a program are con...

Page 176: ...p4 r6 5 Additionally a predicate almost always requires a stop to separate its producing instruction and its use cmp eq p1 p2 r1 r2 p1 add r1 r2 r3 The only exception to this rule involves an integer compare or test instruction that sets a predicate that is used as the condition for a subsequent branch instruction cmp eq p1 p2 r1 r2 No stop required p1 br cond some_target 4 2 3 Optimizing Program ...

Page 177: ...ompiler could also try to schedule these statements with earlier or later code since several branches and labels have been removed as part of if conversion Since the branches have been removed no branch misprediction is possible and there will be no pipeline bubbles due to taken branches Such effects are significant in many large applications and these transformations can greatly reduce branch ind...

Page 178: ... moved above the enclosing conditional instruction because it could cause an address fault or other exception depending upon the branch direction p1 br cond some_label Cycle 0 st4 r34 r23 Cycle 1 ld4 r5 r56 Cycle 1 ld4 r6 r57 Cycle 2 no cycle 1 M s One reason why it might be desirable to move the store instruction up is to allow loads below it to move up Note Ambiguous stores are barriers beyond w...

Page 179: ...p0 r0 r0 Initialize p1 to false Other instructions cmp eq p1 p0 r0 r0 Initialize p1 to true ld8 r56 r45 Cycle 0 label_A add Cycle 1 add add add p1 st4 r23 r56 Cycle 2 Here downward code motion saves one cycle There are examples of more sophisticated situations involving cyclic scheduling other store constrained code motion or pulling code from outside loops into them but they are not described her...

Page 180: ...f p2 is true and five cycles if p2 is false When analyzing such cases consider execution weights branch misprediction probabilities and prediction costs along each path In the three scenarios presented below assume a branch misprediction costs ten cycles No instruction cache or taken branch penalties are considered 4 2 4 2 Case 1 Suppose the if clause is executed 50 of the time and the branch is n...

Page 181: ...ment Given the generic machine model that has only two load store M units If a compiler predicates and combines these two blocks then the resource availability height through the block will be four clocks since that is the minimum amount of time necessary to issue eight memory operations then_clause ld r1 r21 Cycle 0 ld r2 r22 Cycle 0 st r32 r3 Cycle 1 st r33 r4 Cycle 1 br end_if else_clause ld r3...

Page 182: ...rams is limited by non local effects such as overall branch behavior sensitivity to code size percentage of time spent servicing branch mispredictions etc In these situations the decision to use if convert or perform other speculative transformation becomes more involved 4 3 Control Flow Optimizations A common occurrence in programs is for several control flows to converge at one point or for mult...

Page 183: ...which can then be used to branch to the if block Note It is also possible to predicate the if block using p1 to avoid branch mispredic tions To reduce the cost of compound conditionals the Itanium architecture has special parallel compare instructions to optimize expressions that have and and or operations These compare instructions are special in that multiple and or compare instructions are allo...

Page 184: ...e sequence for the C code above cmp eq p1 p0 r0 r0 initialize p1 to 1 cmp ne and p1 p0 rB 15 cmp ge and p1 p0 rA r0 cmp le and p1 p0 rC r0 When used correctly and or compares write both target predicates with the same value or do not write the target predicate at all Another variation on parallel compare usage is where both the if and else part of a complex conditional are needed if rA 0 rB 10 r1 ...

Page 185: ...and then continue A variant of this is when separate paths need to compute separate results but could otherwise use the same registers since the paths are known to be complementary The use of predication can optimize these cases 4 3 3 1 Selecting One of Several Values When several control paths that each compute a different value of a single variable meet a sequence of conditionals is usually requ...

Page 186: ...e compiler to be complementary p1 add r1 r2 r3 p2 sub r5 r4 r56 p1 ld8 r7 r2 p2 ld8 r9 r6 p1 a use of r1 p2 a use of r5 p1 a use of r7 p2 a use of r9 Assuming registers r1 r5 r7 and r9 are used for compiler temporaries each of which is live only until its next use the preceding code segment can be rewritten as p1 add r1 r2 r3 p2 sub r1 r4 r56 Reuse r1 p1 ld8 r7 r2 p2 ld8 r7 r6 Reuse r7 p1 a use of...

Page 187: ...uction group extends across more than one bundle That is if both of the following conditions are true at some label L then padding previous instruction groups so that L is aligned on a cache line boundary is recommended The label is commonly branched to from out of line Examples include tops of loops and commonly executed else clauses The instruction group starting at label L extends across more t...

Page 188: ...e threads multiple logical processors through a common set of execution resources data paths functional units TLBs etc Functionally each of these hardware threads fully implements the Itanium architecture therefore software need not be aware of multi threading nor do anything special to support it From performance standpoint there are a few circumstances where it may be beneficial for software to ...

Page 189: ...ting an OS kernel idle loop It can provide this information to the processor also by executing a hint pause instruction This encourages the processor to allocate more processor resources to other threads of execution for the next while Resource allocation within the processor eventually reverts to a fair allocation so there s no need for software to hint that it is no longer in an idle loop Conver...

Page 190: ...y This chapter has presented a wide variety of topics related to optimizing control flow including predication branch architecture multiway branches parallel compares instruction stream alignment and branch hints Although such topics could have been presented in separate chapters the interplay between the features is best understood by their effects on each other Predication and its interplay on s...

Page 191: ...1 180 Volume 1 Part 2 Predication Control Flow and Instruction Stream ...

Page 192: ... is a more general calculation not a simple count and the trip count is unknown Both types are directly supported in the architecture The Itanium architecture improves the performance of conventional counted loops by providing a special counted loop branch the br cloop instruction and the Loop Count application register LC The br cloop instruction does not have a branch predicate Instead the branc...

Page 193: ... For simplicity assume that the loop trip count is a constant N that is a multiple of two so that no exit branch is required after the first copy of the loop body L1 ld4 r4 r5 4 Cycle 0 ld4 r14 r5 4 Cycle 1 add r7 r4 r9 Cycle 2 add r17 r14 r9 Cycle 3 st4 r6 r7 4 Cycle 3 st4 r6 r17 4 Cycle 4 br cloopL1 Cycle 4 The above code does not expose as much ILP as possible The two loads are serialized becau...

Page 194: ...he loop 5 3 2 Software Pipelining Software pipelining is a technique that seeks to overlap loop iterations in a manner that is analogous to hardware pipelining of a functional unit Each iteration is partitioned into stages with zero or more instructions in each stage A conceptual view of a single pipelined iteration of the loop from page 1 181 in which each stage is one cycle long is shown below s...

Page 195: ...e iterations already in progress are completed draining the pipeline In the above example iterations 3 5 are completed during the epilog phase The software pipeline is coded as a loop that is very different from the original source code loop To avoid confusion when discussing loops and loop iterations we use the term source loop and source iteration to refer back to the original source code loop a...

Page 196: ...ng point registers CFM rrb pr for the predicate registers The software pipelined loop branches decrement all the rrb registers simultaneously Below is an example of register rotation The swp_branch pseudo instruction represents a software pipelined loop branch L1 ld4 r35 r4 4 post increment by 4 st4 r5 r37 4 post increment by 4 swp_branchL1 The value that the load writes to r35 is read by the stor...

Page 197: ... The registers are rotated rrb registers are decremented 4 The Loop Count LC and or the Epilog Count EC application registers are selectively decremented There are two types of software pipelined loop branches counted and while 5 4 3 1 Counted Loop Branches Figure 5 1 shows a flowchart for modulo scheduled counted loop branches During the prolog and kernel phase a decision to continue kernel loop ...

Page 198: ...tion decision is located somewhere other than the bottom of the loop 5 4 3 2 Counted Loop Example A conceptual view of a pipelined iteration of the example counted loop on page 1 181 with II equal to one is shown below stage 1 p16 ld4 r4 r5 4 stage 2 p17 empty stage stage 3 p18 add r7 r4 r9 stage 4 p19 st4 r6 r7 4 To generate an efficient pipeline the compiler must take into account the latencies ...

Page 199: ...top is executed in cycle 202 EC is equal to 1 EC is decremented the registers are rotated one last time and execution falls out of the kernel loop Note After this final rotation EC and the stage predicates p16 p19 are 0 It is desirable to allocate variables that are loop variant to the rotating portion of the register file whenever possible to preserve space in the static portion for loop invarian...

Page 200: ...val II The number of cycles between the start of successive source iterations in a software pipelined loop Each stage of the pipeline is II cycles long Prolog The first phase of a software pipelined loop in which the pipeline is filled Kernel The second phase of a software pipelined loop in which the pipeline is full Epilog The third phase of a software pipelined loop in which the pipeline is drai...

Page 201: ...tructure of the loop This section discusses do while loops in which the loop condition is computed at the bottom of the loop Optimizing compilers often transform while loops where the condition is computed at the top of the loop into do while loops by moving the condition computation to the bottom of the loop and placing a copy of the condition computation prior to the loop to reduce the number of...

Page 202: ...pipelined version of the while loop on page 1 190 is shown below A check for the speculative load is included mov ec 2 mov pr rot 1 16 PR16 1 rest 0 L1 ld4 s r32 r5 4 Cycle 0 p18 chk s r34 recovery Cycle 0 p18 cmp ne p17 p0 r34 r0 Cycle 0 p18 st4 r6 r34 4 Cycle 0 p17 br wtop sptkL1 Cycle 0 L2 To explain why the kernel loop is programmed the way it is it is helpful to examine a trace of the executi...

Page 203: ... this is the final source iteration the result of the compare is a zero and p17 is unmodified The zero that was rotated into p17 from p16 causes the br wtop to fall through to the loop exit EC is decremented and the registers are rotated one last time In the above example there are no epilog stages As soon as the branch predicate becomes zero the kernel loop is exited 5 5 2 Loops with Predicated I...

Page 204: ... p20 fcmp ge uncp31 p32 f36 f42 p33 stfs r9 f43 4 L2 br ctop sptkL1 5 5 3 Multiple exit Loops All of the example loops discussed so far have a single exit at the bottom of the loop The loop below contains multiple exits an exit at the bottom associated with the loop closing branch and an early exit in the middle L1 ld4 r4 r5 4 ld4 r9 r4 cmp eq unc p1 p0 r9 r7 p1 br cond exit early exit add r8 1 r8...

Page 205: ...imiting the II It is assumed that either r8 is not live out at the early exit or that compensation code is added at the target of the early exit The pipeline for this loop is shown below with the stage predicate assignments but no other rotating register allocation The compare and the branch at the end of stage 4 are not assigned stage predicates because they already have qualifying predicates in ...

Page 206: ...generated for the early exit The LC register is initialized to five more than 199 because there are five speculative stages The purpose of the first five executions of br ctop is simply to keep the loop going until the first valid branch predicate is generated for the br cond During each of these executions LC is decremented so five must be added to the LC initialization amount to compensate A sma...

Page 207: ...ed loads allow some code that is likely to be invariant to be removed from loops thus reducing the resource requirements of the loop Use of advanced loads also can reduce the critical path through the iterations allowing a smaller II to be achieved See Chapter 3 Memory Reference for more information on advanced loads However caution must be exercised when using advanced loads with register rotatio...

Page 208: ...t any ALAT misses because the two advanced load check load pairs never create more than 32 simultaneously live ALAT entries L1 p16 ld4 a r32 r8 p31 ld4 c r47 r8 p16 ld4 a r48 r9 p31 ld4 c r63 r9 br ctop L1 When the code cannot be arranged to avoid ALAT misses it may be best to assign static registers to the destinations of the advanced loads and unroll the loop to explicitly rename the destination...

Page 209: ...ve II of 1 5 for the original source loop Below is a possible pipeline for the unrolled loop stage 1 p16 ld4 r4 r5 8 odd iteration p16 ld4 r9 r8 8 odd iteration stage 2 p16 ld4 r14 r15 8 even iteration p16 ld4 r19 r18 8 even iteration empty cycle stage 3 p18 add r7 r4 r9 odd iteration p17 add r17 r14 r19 even iteration stage 4 empty cycle p19 st4 r6 r7 8 odd iteration p18 st4 r16 r17 8 even iterat...

Page 210: ... number of epilog stages is three starting after the br cexit and ending at the br ctop If the trip count is even the number of epilog stages is two starting after the br ctop and ending at the br ctop The EC must be set to account for the maximum number of epilog stages Thus for this example EC is initialized to four When the trip count is even one extra epilog stage is executed and br exit L3 is...

Page 211: ...nrolling is most beneficial for small loops because the potential performance degradation due to under utilized resources is greater and the effect of unrolling on the instruction cache performance is smaller compared to large loops 5 5 7 Implementing Reductions In the following example a sum of products is accumulated in register f7 mov f7 0 initialize sum L1 ldfs f4 r5 4 ldfs f9 r8 4 fma f7 f4 f...

Page 212: ...16 ldfs f50 r5 4 Cycle 0 p16 ldfs f60 r8 4 Cycle 0 p25 fma f41 f59 f69 f46 Cycle 0 br ctop sptk L1 Cycle 0 fadd f10 f42 f43 add sums fadd f11 f44 f45 fadd f12 f10 f11 fadd f7 f12 f46 5 5 8 Explicit Prolog and Epilog In some cases an explicit prolog is necessary for code correctness This can occur in cases where a speculative instruction generates a value that is live across source iterations Consi...

Page 213: ...e but cannot be controlled during the first kernel iteration because it is speculative and does not have a stage predicate This problem can be solved by peeling off one iteration of the kernel and excluding from that copy any instructions that are not in the first stage of the pipeline as shown below Note that the destination register numbers for the instructions in the explicit prolog have been i...

Page 214: ...dicates and eliminates the need for software pipelined loop branches between prolog stages Thus the entire prolog is independent of the initialization of LC and EC that precede it The register numbers in the prolog and epilog have been adjusted to account for the lack of rotation between stages during those phases Note This code assumes that the trip count of the source loop is at least four If th...

Page 215: ...4 br cloop L1 In traditional architectures the mov instruction can only be removed by unrolling the loop twice One instruction is removed from the loop at the cost of two times code expansion The register rotation feature in the Itanium architecture can be used to eliminate the mov instruction without unrolling the loop add r8 r5 4 ld4 r33 r5 4 a i L1 ld4 r32 r8 4 a i 1 add r7 r33 r32 st4 r6 r7 4 ...

Page 216: ... compute complex calculations on regularly structured data others simply copy data from one place to another while others perform gather scatter type operations that simultaneously compute and rearrange data The following sections describe code characteristics that limit performance and how they affect these different kinds of loops 6 2 1 Execution Latency Loops often contain recurrence relationsh...

Page 217: ...rity between the processor and memory creates a general memory latency problem for most codes there are a few special conditions in floating point codes that exacerbate its impact One such condition is the use of indirect addressing Gather scatter codes in general and sparse matrix vector multiply code below in particular are good examples DO 1 ROW 1 N R ROW 0 0d0 DO 1 I ROWEND ROW 1 1 ROWEND ROW ...

Page 218: ...chitectural features that reduce the impact of the performance limiters described in Section 6 2 using illustrative examples 6 3 1 Large and Wide Floating point Register Set As machine cycle times are reduced the latency in cycles of the execution units generally increases As latency increases register pressure due to multiple operations in flight also increases Furthermore as multiple execution u...

Page 219: ... 16 Load B k j ldfd f8 r8 16 Load B k j 1 fma s0 f9 f5 f7 f9 on C i j fma s0 f10 f6 f7 f10 on C i 1 j fma s0 f11 f5 f8 f11 on C i j 1 fma s0 f12 f6 f8 f12 on C i 1 j 1 br cloop L1 With 128 available registers the outer loops of i and j could be unrolled by 8 each so that 64 multiplies and adds can be performed by loading just 16 operands The floating point register file is divided into two regions...

Page 220: ... same loop can be scheduled with an initiation interval of just 2 clocks without unrolling and 1 5 clocks if unrolled by 2 L1 p24 stf r7 f57 8 Cycle 15 17 p21 fadd f57 f37 f47 Cycle 9 11 13 p16 ldf f32 r5 8 Cycle 0 2 4 6 p16 ldf f42 r6 8 Cycle 0 2 4 6 br ctop L1 It is thus often advantageous to modulo schedule and then unroll if required Please see Chapter 5 Software Pipelining and Loop Support fo...

Page 221: ...loating point computation rather than the peak computational rate the multiply add operation can often be used advantageously Consider the Livermore FORTRAN Kernel 9 General Linear Recurrence Equations DO 191 k 1 n B5 k KB5I SA k STB5 SB k STB5 B5 k KB5I STB5 191CONTINUE Since there is a true data dependency between the two statements on variable B5 k KB5I and a loop carried dependency on variable...

Page 222: ...pa s0 f8 p6 f6 f7 p6 fma s1 f9 f6 f8 f0 p6 fnma s1 f10 f7 f8 f1 p6 fma s1 f9 f10 f9 f9 p6 fma s1 f11 f10 f10 f0 p6 fma s1 f8 f10 f8 f8 p6 fma s1 f9 f11 f9 f9 p6 fma s1 f10 f11 f11 f0 p6 fma s1 f8 f11 f8 f8 p6 fma d s1 f9 f10 f9 f9 p6 fma s1 f8 f10 f8 f8 p6 fnma d s1 f6 f7 f9 f6 p6 fma d s0 f8 f6 f8 f9 Square Root Max Throughput a 14 Instructions 10 Groups a The following value is assumed preset f1...

Page 223: ...l throughput be improved whereas in hardware these operations are typically not pipelineable Another significant advantage of the software based divide square root computations is that the accuracy of the result can be controlled by the user and can be traded off for speed This trade off is often used in graphics codes where the divide accuracy of about 14 bits suffices and the sequence can be sho...

Page 224: ...te if they ever are One of these additional status fields typically status fields 2 or 3 can be used for this purpose Consider the Livermore FORTRAN kernel 16 Monte Carlo Search DO 470 k 1 n k2 k2 1 j4 j2 k k j5 ZONE j4 IF j5 n 420 475 450 415 IF j5 n II 430 425 425 420 IF j5 n LB 435 415 415 425 IF PLAN j5 R 445 480 440 430 IF PLAN j5 S 445 480 440 435 IF PLAN j5 T 445 480 440 440 IF ZONE j4 1 45...

Page 225: ...classes Consider the following code used for screening invalid operands for square root computation IF A EQ NATVAL OR A EQ SNAN OR A EQ QNAN OR A EQ NEG_INF OR A EQ POS_INF OR A LT 0 0D0 THEN WRITE INVALID INPUT OPERAND ELSE WRITE SQUARE ROOT SQRT A ENDIF The above conditional can be determined by two fclass instructions as indicated below fclass m p1 p2 f2 0x1E3 Detect NaTVal NaN Inf or Inf p2 fc...

Page 226: ...parison on the absolute value of the input operands i e they ignore the sign bit but otherwise operate in the same non commutative way as the fmin fmax instructions 6 3 6 3 Integer Floating point Conversion Unsigned integers are converted to their equivalently valued floating point representations by simply moving the integer to the significand field of the floating point register using the setf s...

Page 227: ...a prefetch instructions are defined In order to maximize the utilization of caches the architecture defines locality attributes as part of memory access instructions to help control the allocation and de allocation of data in the caches For instances where the instruction bandwidth may become a performance limiter the architecture defines machine hints to trigger relevant instruction prefetches 6 ...

Page 228: ...cate the nature of the locality of the subsequent accesses on that data and to indicate which level of cache that data needs to be promoted to While regular loads can also be used to achieve the effect of data prefetching if the load target is never used lfetches can more effectively reduce the memory latency without using floating point registers as targets of the data being prefetched Furthermor...

Page 229: ...d Loop Support that help to overcome some of these performance limiters Architectural support for speculation rounding and precision control are also described Examples in the chapter include how to implement floating point division and square root common scientific computations such as reductions use of features such as the fma instruction and various Livermore kernels ...

Page 230: ......

Page 231: ...Intel Itanium Architecture Software Developer s Manual Volume 2 System Architecture Revision 2 3 May 2010 Document Number 245318 ...

Page 232: ...hanges to specifications and product descriptions at any time without notice Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them Intel processors based on the Itanium architec...

Page 233: ...Serialization 2 17 3 2 1 Instruction Serialization 2 18 3 2 2 Data Serialization 2 18 3 2 3 Definition of In flight Resources 2 19 3 3 System State 2 20 3 3 1 System State Overview 2 20 3 3 2 Processor Status Register PSR 2 23 3 3 3 Control Registers 2 29 3 3 4 Global Control Registers 2 31 3 3 5 Interruption Control Registers 2 36 3 3 6 External Interrupt Control Registers 2 42 3 3 7 Banked Gener...

Page 234: ...6 Interruption Priorities 2 108 5 6 1 IA 32 Interruption Priorities and Classes 2 111 5 7 IVA based Interruption Vectors 2 113 5 8 Interrupts 2 114 5 8 1 Interrupt Vectors and Priorities 2 118 5 8 2 Interrupt Enabling and Masking 2 119 5 8 3 External Interrupt Control Registers 2 121 5 8 4 Processor Interrupt Block 2 127 5 8 5 Edge and Level sensitive Interrupts 2 131 6 Register Stack Engine 2 133...

Page 235: ...4 2 Exiting IA 32 Processes 2 253 10 5 IA 32 Instruction Set Behavior Summary 2 253 10 6 System Memory Model 2 259 10 6 1 Virtual Memory References 2 260 10 6 2 IA 32 Virtual Memory References 2 261 10 6 3 IA 32 TLB Forward Progress Requirements 2 261 10 6 4 Multiprocessor TLB Coherency 2 262 10 6 5 IA 32 Physical Memory References 2 262 10 6 6 Supervisor Accesses 2 263 10 6 7 Memory Alignment 2 2...

Page 236: ...res 2 353 11 10 1 PAL Procedure Summary 2 354 11 10 2 PAL Calling Conventions 2 358 11 10 3 PAL Procedure Specifications 2 365 11 11 PAL Virtualization Services 2 486 11 11 1 PAL Virtualization Service Invocation Convention 2 486 11 11 2 PAL Virtualization Service Specifications 2 488 Part II System Programmer s Guide 1 About the System Programmer s Guide 2 503 1 1 Overview of the System Programme...

Page 237: ...epc Demoting Branch Return 2 555 4 4 2 break rfi 2 556 4 4 3 NaT Checking for NaTs in System Calls 2 556 4 5 Context Switching 2 557 4 5 1 User level Context Switching 2 557 4 5 2 Context Switching in an Operating System Kernel 2 558 5 Memory Management 2 561 5 1 Address Space Model 2 561 5 1 1 Regions 2 561 5 1 2 Protection Keys 2 564 5 2 Translation Lookaside Buffers TLBs 2 565 5 2 1 Translation...

Page 238: ...2 and Itanium Architecture based Code 2 600 9 3 1 Instruction Breakpoints 2 600 9 3 2 Data Breakpoints 2 600 9 3 3 Single Step Traps 2 601 9 3 4 Taken Branch Traps 2 601 10 External Interrupt Architecture 2 603 10 1 External Interrupt Basics 2 603 10 2 Configuration of External Interrupt Vectors 2 604 10 3 External Interrupt Masking 2 604 10 3 1 PSR i 2 604 10 3 2 IVR Reads and EOI Writes 2 605 10...

Page 239: ...Counter ITC AR44 2 32 3 5 Interval Timer Match Register ITM CR1 2 32 3 6 Interval Timer Offset Register ITO CR4 2 34 3 7 Interruption Vector Address IVA CR2 2 35 3 8 Page Table Address PTA CR8 2 35 3 9 Interruption Status Register ISR CR17 2 36 3 10 Interruption Instruction Bundle Pointer IIP CR19 2 38 3 11 Interruption Faulting Address IFA CR20 2 39 3 12 Interruption TLB Insertion Register ITIR 2...

Page 240: ...ion Register LRR CR80 81 2 127 5 15 Processor Interrupt Block Memory Layout 2 128 5 16 Address Format for Inter processor Interrupt Messages 2 129 5 17 Data Format for Inter processor Interrupt Messages 2 129 6 1 Relationship Between Physical Registers and Backing Store 2 134 6 2 Backing Store Memory Format 2 134 6 3 Four Partitions of the Register Stack 2 137 7 1 Data Breakpoint Registers DBR 2 1...

Page 241: ...tion Acceleration Control vac 2 329 11 14 Virtualization Disable Control vdc 2 330 11 15 PAL Virtualization Intercept Handoff Opcode GR25 2 335 11 1 operation Parameter Layout 2 371 11 2 config_info_1 Return Value 2 374 11 3 config_info_2 Return Value 2 375 11 4 config_info_1 Return Value 2 378 11 5 config_info_2 Return Value 2 378 11 6 config_info_3 Return Value 2 379 11 7 cache_protection Fields...

Page 242: ... 48 Layout of vm_info_1 Return Value 2 468 11 49 Layout of vm_info_2 Return Value 2 469 11 50 Layout of TR_valid Return Value 2 470 Part II System Programmer s Guide 2 1 Intel Itanium Ordering Semantics 2 512 2 2 Interaction of Ordering and Accesses to Sequential Locations 2 524 2 3 Why a Fence During Context Switches is Required in the Intel Itanium Architecture 2 526 2 4 Spin Lock Code 2 527 2 5...

Page 243: ...Ordering Semantics and Instructions 2 83 4 16 Ordering Semantics 2 84 4 17 ALAT Behavior on Non faulting Advanced Check Loads 2 88 5 1 ISR Settings for Non access Instructions 2 104 5 2 Programming Models 2 105 5 3 Exception Qualification 2 106 5 4 Qualified Exception Deferral 2 107 5 5 Spontaneous Deferral 2 107 5 6 Interruption Priorities 2 109 5 7 Interruption Vector Table IVT 2 113 5 8 Interru...

Page 244: ...n Field Values 2 291 11 4 status Field Values 2 292 11 5 Geographically Significant Processor Identifier Fields 2 293 11 6 state Field Values 2 294 11 7 Processor State Parameter Fields 2 299 11 8 Software Recovery Bits in Processor State Parameter 2 301 11 9 PSP Bit Settings for Unconsumed Data poisoning Events on MCA 2 302 11 10 NaT Bits for Saved GRs 2 305 11 11 function Field Values 2 305 11 1...

Page 245: ... Virtualization Disables Summary 2 346 11 47 Supported Virtualization Optimization Combinations 2 349 11 48 PAL Procedure Index Assignment 2 354 11 49 PAL Cache and Memory Procedures 2 354 11 50 PAL Processor Identification Features and Configuration Procedures 2 355 11 51 PAL Machine Check Handling Procedures 2 356 11 52 PAL Power Information and Management Procedures 2 356 11 53 PAL Processor Se...

Page 246: ...106 err_struct_info Bus Processor Interconnect 2 431 11 107 capabilities vector for Bus Processor Interconnect 2 431 11 108 hw_check Fields 2 432 11 109 control_word Layout 2 438 11 110 pm_info Fields 2 440 11 111 pm_buffer Layout 2 440 11 112 Processor Features 2 447 11 113 Values for ddt Field 2 452 11 114 info_request Return Value 2 454 11 115 RSE Hints Implemented 2 455 11 116 Processor Hardwa...

Page 247: ...oads 2 519 2 12 Bypassing to a Semaphore Operation 2 521 2 13 Bypassing from a Semaphore Operation 2 521 2 14 Enforcing the Same Visibility Order to All Observers in a Coherence Domain 2 522 2 15 Intel Itanium Architecture Obeys Causality 2 523 2 16 Potential Pipeline Behaviors of the Branch at x from Figure 2 9 2 534 3 1 Interruption Handler Execution Environment PSR and RSE CFLE Settings 2 540 4...

Page 248: ...236 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 249: ...2 1 Intel Itanium Architecture Software Developer s Manual Rev 2 3 Part I System Architecture Guide ...

Page 250: ...2 2 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 251: ...es the Itanium application architecture including application level resources programming environment and the IA 32 application interface This volume also describes optimization techniques used to generate high performance software 1 1 1 Part 1 Application Architecture Guide Chapter 1 About this Manual provides an overview of all volumes in the Intel Itanium Architecture Software Developer s Manua...

Page 252: ...em software 1 2 1 Part 1 System Architecture Guide Chapter 1 About this Manual provides an overview of all volumes in the Intel Itanium Architecture Software Developer s Manual Chapter 2 Intel Itanium System Environment introduces the environment designed to support execution of Itanium architecture based operating systems running IA 32 or Itanium architecture based applications Chapter 3 System S...

Page 253: ...rating systems need to preserve Itanium register contents and state This chapter also describes system architecture mechanisms that allow an operating system to reduce the number of registers that need to be spilled filled on interruptions system calls and context switches Chapter 5 Memory Management introduces various memory management strategies Chapter 6 Runtime Support for Control and Data Spe...

Page 254: ...er 2 Instruction Reference provides a detailed description of all Itanium instructions organized in alphabetical order by assembly language mnemonic Chapter 3 Pseudo Code Functions provides a table of pseudo code functions which are used to define the behavior of the Itanium instructions Chapter 4 Instruction Formats describes the encoding and instruction format instructions Chapter 5 Resource and...

Page 255: ...em Environment The operating system environment that supports the execution of both IA 32 and Itanium architecture based code Itanium Architecture based Firmware The Processor Abstraction Layer PAL and System Abstraction Layer SAL Processor Abstraction Layer PAL The firmware layer which abstracts processor features that are implementation dependent System Abstraction Layer SAL The firmware layer w...

Page 256: ...m firmware 1 7 Revision History Date of Revision Revision Number Description March 2010 2 3 Added information about illegal virtualization optimization combinations and IIPA requirements Added Resource Utilization Counter and PAL_VP_INFO PAL_VP_INIT and VPD vpr changes New PAL_VPS_RESUME_HANDLER parameter to indicate RSE Current Frame Load Enable setting at the target instruction PAL_VP_INIT_ENV i...

Page 257: ...WN procedures Allows IPI redirection feature to be optional Undefined behavior for 1 byte accesses to the non architected regions in the IPI block Modified insertion behavior for TR overlaps See Vol 2 Part I Ch 4 for details Bus parking feature is now optional for PAL_BUS_GET_FEATURES Introduced low power synchronization primitive using hint instruction FR32 127 is now preserved in PAL calling con...

Page 258: ...ction 2 2 and 3 Part I Volume 3 Added Performance Counter Standardization Sections 7 2 3 and 11 6 Part I Volume 2 Added Freeze Bit Functionality in Context Switching and Interrupt Generation Clarification Sections 7 2 1 7 2 2 7 2 4 1 and 7 2 4 2 Part I Volume 2 Added IA_32_Exception Debug IIPA Description Change Section 9 2 Part I Volume 2 Added capability for Allowing Multiple PAL_A_SPEC and PAL_...

Page 259: ...TURES changes extend calls to allow implementation specific feature control Section 11 8 3 Split PAL_A architecture changes Section 11 1 6 Simple barrier synchronization clarification Section 13 4 2 Limited speculation clarification added hardware generated speculative references Section 4 4 6 PAL memory accesses and restrictions clarification Section 11 9 PSP validity on INITs from PAL_MC_ERROR_I...

Page 260: ...simplify the call to provide more information regarding machine check Chapter 11 PAL_ENTER_IA_32_Env call changes entry parameter represents the entry order SAL needs to initialize all the IA 32 registers properly before making this call Chapter 11 PAL_CACHE_FLUSH added a new cache_type argument Chapter 11 PAL_SHUTDOWN removed from list of PAL calls Chapter 11 Clarified memory ordering changes Cha...

Page 261: ...rchitectures debug performance monitoring control registers and the set of privileged instructions 2 1 Processor Boot Sequence Figure 2 1 shows the defined boot sequence Unlike IA 32 processors which power up in 32 bit Real Mode processors in the Itanium processor family power up in the Itanium System Environment running Itanium architecture based code Processor initialization testing memory and p...

Page 262: ...ating system the processor directly executes all performance critical but non sensitive IA 32 application level instructions Accesses to sensitive system resources interrupt flags control registers TLBs etc are intercepted into the Itanium architecture based operating system Using this set of intervention hooks an Itanium architecture based operating system can emulate or virtualize an IA 32 syste...

Page 263: ...he Itanium system environment is defined by chapters Chapter 9 describes IA 32 interruption handler entry points Chapter 10 Itanium Architecture based Operating System Interaction Model with IA 32 Applications describes how IA 32 applications interact with Itanium architecture based operating systems ...

Page 264: ...2 16 Volume 2 Part 1 Intel Itanium System Environment ...

Page 265: ...el PL of the virtual page and the CPL 3 2 Serialization For all application and system level resources apart from the control register file the processor ensures values written to a register are observed by instructions in subsequent instruction groups This is termed data dependency For example writes to general registers floating point and application registers are observed by subsequent reads of...

Page 266: ...rol Registers on page 2 29 The instructions Return from Interruption rfi and Instruction Serialize srlz i perform explicit instruction serialization An interruption performs an implicit instruction serialization operation so the first instruction group in the interruption handler will observe the serialized state Instruction Serialization Example mov ibr reg reg move to instruction debug register ...

Page 267: ...licit instruction or data serialization is changed by one or more writers that resource is said to be in flight until the required serialization is performed There can be multiple in flight values if multiple writers have occurred since the last serialization An instruction that reads an in flight resource will see one of the in flight values or the state prior to any of the unserialized writers H...

Page 268: ...re defined in detail in Interval Time Counter and Match Register ITC AR44 and ITM CR1 on page 2 32 Resource Utilization Facility A 64 bit resource utilization counter is provided for privileged and non privileged use This counts the number of Interval Timer cycles consumed by this logical processor See Section 3 1 8 11 Resource Utilization Counter RUC AR 45 on page 1 31 Debug Breakpoint Registers ...

Page 269: ... domains Please see the processor specific documentation for further information on the number of Protection Key Registers implemented on the Itanium processor Refer to Protection Keys on page 2 59 for details Translation Lookaside Buffer TLB Holds recently used virtual to physical address mappings The TLB is divided into Instruction ITLB Data DTLB Translation Registers TR and Translation Cache TC...

Page 270: ...rmance Monitor 63 0 Banked Reg 0 0 0 1 0 General Registers 0 NaTs CFM Current Frame Marker Performance Monitor 63 0 pr63 pr15 pr16 cr17 cr16 IPSR cr20 IFA cr19 cr24 IIM cr25 IHA cr64 External 37 0 pmd0 pmd1 pmdn pmc0 pmc1 pmcn ibr0 ibr1 ibrn dbr0 dbr1 dbrn pkrn itr1 itrn itc dtr0 dtr1 dtrn dtc cr8 Interrupt Registers cr81 Processor Identifiers 63 0 cpuid0 cpuid1 cpuidn Configuration Registers Data...

Page 271: ...er PSR 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv rt tb lp db si di pp sp dfh dfl dt rv pk i ic rv mfh mfl ac up be rv 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 rv vm ia bn ed ri ss dd da id it mc is cpl Table 3 1 Processor Status Register Instructions Mnemonic Description Operation Instr Type Seria...

Page 272: ...ian This bit is ignored for IA 32 data references which are always performed little endian Instruction fetches are always performed little endian DCR be dataa up 2 User Performance monitor enable When 1 performance monitors configured as user monitors are enabled to count events including IA 32 When 0 user configured monitors are disabled See Performance Monitoring on page 2 155 for details unchan...

Page 273: ...ed by the next instruction Toggling PSR i from one to zero via Move to PSR l requires data serialization When executing IA 32 instructions external interrupts are enabled if PSR i and CFLG if is 0 or EFLAG if is 1 NMI interrupts are enabled if PSR i is 1 regardless of EFLAG if 0 clear implicit serialization set datad pk 15 Protection Key enable When 1 and PSR it is 1 instruction references includi...

Page 274: ...ed Instruction Set Transition fault This bit doesn t restrict instruction set transitions due to interruptions or rfi 0 data si 23 Secure Interval timer When 1 the Interval Time Counter ITC register and the Resource Utilization Counter RUC are readable only by privileged code non privileged reads result in a Privileged Register fault When 0 ITC and RUC are readable at any privilege level System so...

Page 275: ...and the IA 32 jmpe instruction 0 rfig br iah mc 35 Machine Check abort mask When 1 machine check aborts are masked When 0 machine check aborts can be delivered including IA 32 instruction set execution Processor operation is undefined if PSR mc is 1 and a transition is made to execute IA 32 code unchanged 1i rfig it 36 Instruction address Translation When 1 virtual instruction addresses are transl...

Page 276: ...he current bundle is a speculative load the operation is forced to indicate a deferred exception by setting the load target register to NaT or NaTVal No memory references are performed however any address post increments are performed If the operation is a speculative advanced load the ALAT entry corresponding to the load address and target register is purged If the operation is an lfetch instruct...

Page 277: ...SR i are observed before a given point in program execution e Requires instruction or data serialization based on whether the dependent use is an instruction fetch access or data access f CPL can be modified due to interruptions Return From Interruption rfi Enter Privilege Code epc and Branch Return br ret instructions g Can only be modified by the Return From Interruption rfi instruction rfi perf...

Page 278: ...4 CMCV Corrected Machine Check Vector dataa CR75 79 reserved reserved CR80 LRR0 Local Redirection Register 0 dataa CR81 LRR1 Local Redirection Register 1 dataa Reserved CR82 127 reserved reserved a Serialization is needed to ensure external interrupt masking new interval timer match values or new interruption table addresses are observed before a given point in program execution b Serialization is...

Page 279: ...ck for atomicity when the memory transaction is made to non write back memory or are unaligned across an implementation specific non supported alignment boundary When 0 and an IA 32 atomic memory reference is defined as requiring a read modify write operation external to the processor under external bus lock the processor may either execute the transaction as a series of non atomic transactions or...

Page 280: ...ITC is guaranteed to be clocked at a constant rate even if the instruction execution frequency may vary The ITC counting rate is not affected by power management mechanisms dk 10 Defer Key Miss faults only When 1 and a Key Miss fault is deferred lower priority Access Bit Access Rights or Debug faults may still be delivered A Key Miss fault deferred or not precludes concurrent Key Permission faults...

Page 281: ...erval time counter in an multiprocessor system nor is it synchronized with the wall clock Software must calibrate interval timer ticks to wall clock time and periodically adjust for drift In a multiprocessor system a processor s ITC is not architecturally guaranteed to be clocked synchronously with the ITC s on other processors and may not be clocked at the same nominal clock rate as ITC s on othe...

Page 282: ...for further information on the level of sampling error of the Itanium processor RUC should only be written by Virtual Machine Monitors other Operating Systems should not write to RUC but should only read it The RUC register is not supported on all processor implementations Software can check CPUID register 4 to determine the availability of this feature The RUC register is reserved when this featu...

Page 283: ...tor Address IVA CR2 63 15 14 0 IVA ig 49 15 Figure 3 8 Page Table Address PTA CR8 63 15 14 9 8 7 2 1 0 base rv vf size rv ve 49 6 1 6 1 1 Table 3 6 Page Table Address Fields Field Bits Description ve 0 VHPT Enable When 1 the processor is enabled to walk the VHPT size 7 2 VHPT Size VHPT table size in power of 2 increments table size is 2size bytes Size generates a mask that is logically AND ed with...

Page 284: ...uption and if PSR ic is 1 the IPSR receives the value of the PSR The IPSR IIP and IFS are used to restore processor state on a Return From Interruption rfi The IPSR has the same format as PSR see Processor Status Register PSR on page 2 23 for details 3 3 5 2 Interruption Status Register ISR CR17 The ISR receives information related to the nature of the interruption and is written by the processor ...

Page 285: ...truction set sp 36 Speculative load exception Interruption is associated with a speculative load instruction This bit is always 0 for interruptions taken in the IA 32 instruction set rs 37 Register Stack Interruption is associated with a mandatory RSE fill or spill This bit is always 0 for interruptions taken in the IA 32 instruction set ir 38 Incomplete Register frame The current register frame i...

Page 286: ...PA_MSB 1 are set to 0 2 IIP may be written with the full unimplemented physical address from IP When an rfi is executed with an unimplemented address in IIP an unimplemented virtual address if IPSR it is 1 or an unimplemented physical address if IPSR it is 0 and an Unimplemented Instruction Address trap is taken an implementation may optionally leave IIP unchanged preserving the unimplemented addr...

Page 287: ...uption Faulting Address IFA CR20 63 0 IFA 64 Figure 3 12 Interruption TLB Insertion Register ITIR 63 32 31 8 7 2 1 0 rv ci key ps rv ci 32 24 6 2 Table 3 8 ITIR Fields Field Bits Description rv ci 63 32 1 0 Reserved Check on Insert On a read these fields may return zeros or the value last written to them If a non zero value is written a Reserved Register Field fault may be raised on the mov to ITI...

Page 288: ...SR ic is one accesses to IIPA cause an Illegal Operation fault When PSR ic is zero IIPA is not updated by hardware and can be read and written by software This permits low level code to preserve IIPA across interruptions If the PSR ic bit is explicitly cleared e g by using rsm then the contents of IIPA are undefined Only when the PSR ic bit is cleared by an interruption is the value of IIPA define...

Page 289: ... a result of the fault not by the instruction itself 3 3 5 9 Interruption Hash Address IHA CR25 The IHA Figure 3 16 is loaded with the address of the Virtual Hash Page Table VHPT entry the processor referenced or would have referenced to resolve a translation fault The IHA is written on interruptions by the processor when PSR ic is 1 Refer to VHPT Hashing on page 2 65 for complete details See Tabl...

Page 290: ...ruptions instruction bundle information is not provided and the values in IIB registers are undefined The IIB registers are not supported on all processor implementations Software can call PAL_PROC_GET_FEATURES to determine the availability of this feature see PAL_PROC_GET_FEATURES Get Processor Dependent Features 17 on page 2 446 for details The IIB registers are reserved when this feature is not...

Page 291: ...nium architecture based application code is executed within register bank 1 If IA 32 or Itanium architecture based application code executes out of register bank 0 the application register state including IA 32 will be lost on any interruption During interruption processing the operating system uses register bank 0 as the initial working register context Usage of these additional registers is dete...

Page 292: ... may provide an implementation dependent mechanism to disable virtual machine features see PAL_PROC_GET_FEATURES Get Processor Dependent Features 17 on page 2 446 for details Processor virtualization is largely invisible to system software and therefore its effects on virtualized instructions are not discussed in this document except on the instruction description pages themselves Table 3 10 Virtu...

Page 293: ...s TLB that resides in memory and can be automatically searched by the processor A particular operating system page table format is not dictated However the VHPT is designed to mesh with two common translation structures the virtual linear page table and hashed page table Enabling of the VHPT and the size of the VHPT are completely under software control Sparse 64 bit virtual addressing is supporte...

Page 294: ...isters the TLB is then searched for a translation entry with a matching VPN and RID value The VRN may optionally be used when searching for a matching translation on memory references references other than inserts and purges see Section 4 1 1 4 Purge Behavior of TLB Inserts and Purges If a matching translation entry is found the entry s physical page number PPN is concatenated with the page offset...

Page 295: ... further divided into two sub sections Translation Registers TR and Translation Cache TC In the remainder of this document the term TLB refers to the combined instruction data translation register and translation cache structures Figure 4 2 Conceptual Virtual Address Translation for References Figure 4 3 TLB Organization Virtual Region Number VRN Virtual Address rr0 rr1 rr2 rr7 Region Search Prote...

Page 296: ...her TR and or TC entries to be removed refer to Section 4 1 1 4 Purge Behavior of TLB Inserts and Purges for details Prior to inserting a TR entry software must ensure that no overlapping translation exists in any TR including the one being written otherwise a Machine Check abort may be raised or the processor may exhibit other undefined behavior Translation register entries may be removed by the ...

Page 297: ...ation registers Implementations are free to implement translation cache arrays of larger sizes Implementations may also choose to implement additional hierarchies for increased performance At least one translation cache level is required to support all implemented page sizes Additional hierarchy levels may or may not be performance optimized for the preferred page size specified by the virtual reg...

Page 298: ... by consistently displacing a required TC entry through a global or local translation cache purge IA 32 code has more stringent forward progress rules that must be observed by the processor and software IA 32 forward progress rules are defined in Section 10 6 3 IA 32 TLB Forward Progress Requirements on page 2 261 The translation cache can be used to cache TR entries if the TC maintains the instru...

Page 299: ...fier VPN pair in either the translation cache or translation registers may result in a TLB miss if a memory reference is made through a different VRN even if the region identifiers in the two region registers are identical Some processor models may also omit the VRN field of the TLB causing the TLB search on memory references to find an entry independent of VRN bits However all processor models ar...

Page 300: ... g May Insert indicates that the translation specified by the operation may be inserted into a TC However software must not rely on the insert May Musth h Must Machine Check indicates that a processor will cause a Machine Check abort if an attempt is made to insert or purge a partially or fully overlapped translation The Machine Check abort may not be delivered synchronously with the TLB insert or...

Page 301: ... which overlaps with a small page translation in the TR the VHPT insertion can be done but a machine check must be raised Software must not create overlapping translations in the VHPT that are larger than a currently existing TR translation The behavior of VHPT inserts is summarized in Table 4 2 4 1 1 5 Translation Insertion Format Figure 4 5 shows the register interface to insert entries into the...

Page 302: ...t is raised on the mov to IFA or TLB insert instruction Software must issue an instruction serialization operation to ensure installs into the ITLB are observed by dependent instruction fetches and a data serialization operation to ensure installs into the DTLB are observed by dependent memory data references Table 4 3 describes all the translation interface fields Figure 4 5 Translation Insertion...

Page 303: ...ss Rights on page 2 56 for details ppn GR r 49 12 Physical Page Number Most significant bits of the mapped physical address Depending on the page size used in the mapping some of the least significant PPN bits are ignored ig GR r 63 53 IFA 11 0 RR vrn 0 7 2 available Software can use these fields for operating system defined parameters These bits are ignored when inserted into the TLB by the proce...

Page 304: ...means no access R means read access W means write access X means execute access and Pn means promote PSR cpl to privilege level n when an Enter Privileged Code epc instruction is executed Figure 4 6 Translation Insertion Format Not Present 63 32 31 12 11 8 7 2 1 0 GR r ig 0 ITIR rv ci key ps rv ci IFA vpn ig RR vrn rv rid ig rv ig Table 4 4 Page Access Rights TLB ar TLB pl Privilege Levela Descrip...

Page 305: ...statically allocated For example large areas of the virtual address space may be reserved for operating system kernels frame buffers or memory mapped I O regions Software may also elect to pin these translations by placing them in the translation registers Table 4 5 lists insertable and purgeable page sizes that are supported by all processor models Insertable page sizes can be specified in the tr...

Page 306: ... RR Each register contains a Region Identifier RID along with several other region attributes see Figure 4 7 The values placed in the region register by the operating system can be viewed as a collection of process address space identifiers Regions support multiple address space operating systems by avoiding the need to flush the TLB on a context switch Sharing between processes is promoted by map...

Page 307: ...key register format and protection key register fields ps 7 2 Preferred page Size Selects the virtual address bits used in hash functions for set associative TLBs or the VHPT Encoded as 2ps bytes The processor may make significant performance optimizations for the specified preferred page size for the region a rid 31 8 Region Identifier During TLB inserts the region identifier from the select regi...

Page 308: ...ission faults are only raised when memory translations are enabled PSR dt is 1 for data references PSR it is 1 for instruction references PSR rt is 1 for register stack references and protection key checking is enabled PSR pk is one Data TLB protection keys can be acquired with the Translation Access Key tak instruction Instruction TLB key values are not directly readable To acquire instruction ke...

Page 309: ...r3 Insert instruction translation register ITR GR r2 GR r3 IFA ITIR M inst itr d dtr r2 r3 Insert data translation register DTR GR r2 GR r3 IFA ITIR M data probe r1 r3 r2 Probe data TLB for translation M none probe fault r3 imm2 Probe data TLB for translation M none ptc l r3 r2 Purge a translation from local processor instruction and data translation cache M data inst ptc g r3 r2 Globally purge a ...

Page 310: ... region linear page table structure an operating system would typically map the leaf page table nodes with small backing virtual translations The size of the table is expanded to include all possible virtual mappings effectively creating a large per region flat page table within the virtual address space To implement a single large hash page table the entire VHPT is typically mapped with a single ...

Page 311: ...latively executed instructions are not required to be done in program order Therefore if the walker is enabled and if the VHPT contains multiple entries that map the same virtual address range software must set up these entries such that any of them can be used in the translation of any part of this virtual address range Additionally if software inserts a translation into the TLB which is needed f...

Page 312: ... Figure 4 13 Also in some implementations 8 63 32 and 8 31 8 may be ignored as well Figure 4 11 VHPT Not present Short Format 63 1 0 ig 0 64 Figure 4 12 VHPT Long Format offset 63 52 51 50 49 32 31 12 11 9 8 7 6 5 4 2 1 0 0 ig ed rv ppn ar pl d a ma rv p 8 rv key ps rv 16 ti tag 24 ig 64 Table 4 9 VHPT Long format Fields Field Offset Description tag 16 Translation Tag The tag in conjunction with t...

Page 313: ...llision chains associativities defined by the operating system or to perform a search in software The thash instruction is used to generate a VHPT entry s address outside of interruption handlers and provides the same hash function that is used to calculate IHA thash produces a VHPT entry s address for a given virtual address and region identifier depending on the setting of the PTA vf bit When PT...

Page 314: ...hat resides in the region defined by PTA base As defined in Figure 4 15 the VHPT walker uses the virtual address the region identifier the region s preferred page size and the PTA size field to compute a hash index into the long format VHPT PTA 63 15 defines the base address and the region of the long format VHPT PTA size reflects the size of the hash table and is typically set to a number signifi...

Page 315: ...t be generated using the preferred page size assigned to that region 4 To reuse a region identifier with a different preferred page size software must first ensure that the VHPT contains no insertable translations for that rid purge all translations for that rid from all processors that may have used it and then update the region register with the new preferred page size 4 1 7 VHPT Environment The...

Page 316: ...and deliver an Instruction Data TLB Miss fault at any time for implementation specific reasons The processor s VHPT walker is required to read and insert VHPT entries from memory atomically an 8 byte atomic read and insert for short format and a 32 byte atomic read and insert for long format Some implementation strategies for achieving this atomicity are as follows If the walker performs its VHPT ...

Page 317: ...C or DTC Provided the above fault conditions are not detected the processor may load the entry into the ITC or DTC even if an in order execution of the program did not require the translation See Table 4 1 Purge Behavior of TLB Inserts and Purges on page 2 52 for the purge behavior of VHPT walker inserts After the translation entry is loaded additional TLB faults are checked these include in prior...

Page 318: ...Fault Checks Page Not Present NaT Page Consumption Key Miss Key Permission Access Rights Access Memory No Fault Failed Search Alternate Instruction Instruction TLB Miss fault Tag Mismatch or Walker Abort TLB Miss TLB Miss fault Faults TC Insert Instruction TLB VHPT Search Access Bit Debug Virtual Address Search TLB Not Found Data Yes No Search VHPT VHPT Walker VHPT Data fault Found Found Fault Che...

Page 319: ... or The hash address matches a data debug register Instruction Data TLB Miss handlers are essentially software walkers of the VHPT Data Nested TLB Raised when a Data TLB Miss Alternate Data TLB Miss or VHPT Data Translation fault occurs and PSR ic is 0 and not in flight e g fault within a TLB miss handler Data Nested TLB faults enable software to avoid overheads for potential data TLB Miss faults ...

Page 320: ...ive a converted 64 bit pointer can be used as a 32 bit pointer Flat 31 or 32 bit address spaces can be constructed by assigning the same region identifier to contiguous region registers Branches into another 230 byte region are performed by first calculating the target address in the 32 bit virtual space and then converting to a 64 bit pointer by addp4 Otherwise branch targets will extend above th...

Page 321: ...s to unimplemented physical addresses result either in an Unimplemented Instruction Address trap on the last valid instruction or in an Unimplemented Instruction Address fault on the instruction fetch of the unimplemented address Data references to unimplemented physical addresses result in an Unimplemented Data Address fault Memory references to unpopulated address ranges result in an asynchronou...

Page 322: ...ess consists of three fields virtual region number VRN unimplemented and implemented bits All processor models provide three VRN bits in VA 63 61 IMPL_VA_MSB is the implementation specific bit position of the most significant implemented virtual address bit In addition to the three VRN bits all processor models implement at least 54 virtual address bits i e the smallest IMPL_VA_MSB is 53 In a proc...

Page 323: ...d Eager RSE operations to unimplemented addresses do not fault Execution of a taken branch taken chk or an rfi to an unimplemented address or execution of a non branching slot 2 instruction in a bundle at the upper edge of the implemented address space where the next sequential bundle address would be an unimplemented address results either in an Unimplemented Instruction Address trap on the branc...

Page 324: ...f Memory Attributes on Memory Reference Instructions on page 2 86 4 4 2 Physical Addressing Memory Attributes The selection of memory attributes for physical addressing is selected by bit 63 of the address contained in the address base register as shown in Figure 4 20 and Table 4 12 Table 4 11 Virtual Addressing Memory Attribute Encodings Attribute Mnemonic ma Cacheability Write Policy Speculation...

Page 325: ...m ensure that there is a consistent view of memory from each processor Processors support multiprocessor cache coherence based on physical addresses between all processors in the coherence domain tightly coupled multiprocessors Coherency is supported in the presence of virtual aliases although software is recommended to use aliases which are an integer multiple of 1 MB apart to avoid any possible ...

Page 326: ...ache line write backs or platform visible replacements cause the corresponding ALAT entries to be invalidated 4 4 5 Coalescing Attribute For uncacheable pages the coalescing attribute informs the processor that multiple stores to this page may be collected in a coalescing buffer and issued later as a single larger merged transaction The processor may accumulate stores for an indefinite period of t...

Page 327: ...fer flushes have an unordered non sequential memory ordering attribute See Sequentiality Attribute and Ordering on page 2 82 Data that has been read or prefetched into a coalescing buffer prior to execution of an Itanium acquire or fence type instruction is invalidated by the acquire or fence instruction See Table 4 15 for a list of acquire and fence instructions 4 4 6 Speculation Attributes For p...

Page 328: ...eculative pages are performed in program order and only once per program order occurrence the rules in Table 4 13 and Table 4 14 are defined Software should also ensure that RSE spill fill transactions are not performed to non speculative memory that may contain I O devices otherwise system behavior is undefined Table 4 13 Permitted Speculation Memory Attribute Load ld a a Includes the faulting fo...

Page 329: ...ed correctly Data references by instructions when the processor has not yet determined whether prior instructions will raise faults or traps Hardware generated instruction prefetch references Hardware generated data prefetch references Eager RSE data references For an instruction fetch to constitute a verified reference it must only be determined that an in order execution of the program requires ...

Page 330: ...cacheable memory cache synchronization sync i and global TLB purge operations ptc g ptc ga As described in Section 4 4 7 Memory Access Ordering on page 1 73 read after write write after write and write after read dependencies to the same memory location memory dependency are performed in program order by the processor Otherwise all other memory references may be performed in any order unless the r...

Page 331: ...al of an address translation on the local processor Local TLB purge instructions ptc l ptc e ensure that all prior stores are made locally visible before the actual purge operation is performed Itanium memory accesses to sequential pages occur in program order with respect to all other sequential pages in the same peripheral domain but are not necessarily ordered with respect to non sequential pag...

Page 332: ...and a second processor executes this sequence ld acq b ld a if the second processor observes the store to b it will also observe the store to a Unless an ordering constraint from Table 4 16 prevents a memory read1 from becoming visible the read may be satisfied with values found in a store buffer or any logically equivalent structure These values need not be globally visible even when the operatio...

Page 333: ...cular order unless they are constrained w r t each other by the ordering rules defined in Table 4 16 Ordering of loads is further constrained by data dependency That is if one load reads a value written by an earlier load by the same processor either directly or transitively through either registers or memory then the two loads become visible in program order For example when this sequence is exec...

Page 334: ...tion indicator NaT or NaTVal to be written to the load target register and the memory reference is aborted However all other effects of the load instruction such as post increment are performed Instruction fetches loads stores and semaphores including IA 32 but except for Itanium speculative loads pages marked as NaTPage raise a NaT Page Consumption fault A speculative reference to a page marked a...

Page 335: ...tch instructions lfetch and any implicit prefetches to pages that are not cacheable are suppressed No transaction is initiated This allows programs to issue prefetch instructions even if the program is not sure the memory is cacheable 4 4 10 Effects of Memory Attributes on Advanced Check Loads The ALAT behavior of advanced and check loads is dependent on the memory attribute of the page referenced...

Page 336: ...e attribute to another software must perform the following sequence The address of the page whose attribute is being modified is referred to as X Note This sequence is ONLY required if the new mapping and the old mapping do not have the same memory attribute On the processor initiating the transition perform the following steps 1 3 1 PTE X p 0 Mark page as not present This prevents any processors ...

Page 337: ...irtual memory attributes The return argument from this procedure informs the caller if this procedure call is needed on remote processors or not If this procedure call is not needed on remote processors then software may skip the IPI in step 5 and go straight to step 6 below 5 Using the IPI mechanism defined in Inter processor Interrupt Messages on page 2 128 to reach all processors in the coheren...

Page 338: ...with the UC attribute software must first cause all such 4K pages to no longer be verified pages and flush any cached copies from the cache Otherwise an uncacheable reference may hit in cache causing a Machine Check abort On the processor initiating the transition perform the following steps 1 Call PAL_PREFETCH_VISIBILITY Call PAL_PREFETCH_VISIBILITY with the input argument trans_type equal to one...

Page 339: ...ack transactions have completed to memory before re configuring physical memory 4 4 11 3 Memory OLD Attribute Transition Sequence In order to safely delete a memory range online memory OLD all speculative reference and prefetches to that range must be halted and all cache lines returned to the memory being deleted If this is not done an MCA could occur if data were to be delivered back to the memo...

Page 340: ...ssor Interrupt Messages on page 2 128 to reach all processors in the coherency domain If step 2a was performed then steps 2 through 4 must be performed on all processors in the coherency domain Otherwise only step 4 must be performed Wait for all PAL_PREFETCH_VISIBILITY calls to complete on all processors in the coherency domain before continuing After step 5 no more new instruction or data prefet...

Page 341: ...ences ldfe stfe and IA 32 10 byte memory references to non write back cacheable memory In some processor models aligned 10 byte Itanium floating point extended memory references to non write back cacheable memory may raise an Unsupported Data Reference fault See Effects of Memory Attributes on Memory Reference Instructions on page 2 86 for details Loads are allowed to be satisfied with values obta...

Page 342: ...nce fault if the datum referenced by an Itanium instruction spans a 4K aligned boundary and many implementations will raise an Unaligned Data Reference fault if the datum spans a cache line Implementations may also raise an Unaligned Data Reference fault for any other unaligned Itanium memory reference Software is strongly encouraged to align data values to avoid possible performance degradation f...

Page 343: ...efinitions Depending on how an interruption is serviced interruptions are divided into IVA based interruptions and PAL based interruptions IVA based interruptions are serviced by the operating system IVA based interruptions are vectored to the Interruption Vector Table IVT pointed to by CR2 the IVA control register see IVA based Interruption Vectors on page 2 113 PAL based interruptions are servic...

Page 344: ...I on page 2 310 for details External Interrupts INT A processor has received a request to perform a service on behalf of the operating system Typically these requests come from I O devices although the requests could come from any processor in the system including itself The External Interrupt vector is entered to handle the request External Interrupts are distinguished by unique vector numbers in...

Page 345: ...s complete control over the structure of the information communicated and the conventions between the low level handlers and the high level code Such a scheme allows software rather than hardware to dictate how to best optimize performance for each of the interruptions in its environment The same basic mechanisms are used in all interruptions to support efficient low level fault handlers for event...

Page 346: ...rrent set of resources from being overwritten on a nested fault the PSR ic bit is cleared on any interruption This will suppress the writing of critical interruption resources if another interruption occurs while the PSR ic bit is cleared If a data TLB miss occurs while the PSR ic bit is zero then hardware will vector to the Data Nested TLB fault handler For a complete description of interruption ...

Page 347: ...ake the UIA fault if the instruction pointer IP falls outside of the implemented range 6 For IA 32 code IA 32 instruction addresses are checked for possible instruction Figure 5 2 Interruption Processing Note The solid line represents the normal execution path Dashed line boxes describe software activities Fetch current instruction execute current instruction Vector to highest priority trap Vector...

Page 348: ...0 after the execution of each mandatory RSE memory reference that does not raise a fault PSR da PSR ia PSR dd and PSR ed bits are cleared before the first IA 32 instruction starts execution after a br ia or rfi instruction EFLAG rf and PSR id bits are set to 0 after an IA 32 instruction is successfully executed If an rfi instruction is in the current bundle then on the execution of rfi the value f...

Page 349: ...ine check aborts are unmasked 5 5 IVA based Interruption Handling IVA based interruption handling is implemented as a fast context switch On IVA based interruptions instruction and data translation is left unchanged the endian mode is set to the system default and delivery of most PSR controlled interruptions is disabled including delivery of asynchronous events such as external interrupts The pro...

Page 350: ...t PSR rt PSR mc and PSR it are unchanged for all interruptions PSR be is set to the value of the default endian bit DCR be If DCR be is in flight at the time of interruption PSR be may receive either the old value of DCR be or the in flight value PSR pp is set to the value of the default privileged performance monitor bit DCR pp If DCR pp is in flight at the time of interruption PSR pp may receive...

Page 351: ...issue a cover instruction so that the interrupted frame can become part of backing store See Switch from Interrupted Context on page 2 148 It may be desirable to emulate a faulting instruction in the interruption handler and rfi back to the next sequential instruction rather than resuming at the faulting instruction Some Itanium instructions can be emulated without having to read the bundle from m...

Page 352: ...o suppress the instruction debug fault for one IA 32 or Itanium instruction This bit will be cleared in the PSR after the first successfully executed instruction The PSR ia bit is used to suppress the Instruction Access Bit fault for one Itanium instruction This bit will be cleared in the PSR after the first successfully executed instruction The PSR da and PSR dd bits are used to suppress Dirty Bi...

Page 353: ...covery model performance may be increased by deferring additional exceptional conditions The recovery model is used only if the program provides additional recovery code to re execute failed speculative computations When a speculative load is executed with PSR ic equal to 1 and ITLB ed equal to 0 the no recovery model is in effect When PSR ic is 1 and ITLB ed is 1 the recovery model is in effect T...

Page 354: ...loads are automatically deferred by hardware Table 5 3 Exception Qualification Exception Condition Precluded by Concurrent Exception Condition Register NaT Consumption NaT ed address none Unimplemented Data Address Register NaT Consumption Alternate Data TLB Register NaT Consumption Unimplemented Data Address VHPT data Register NaT Consumption Unimplemented Data Address Data TLB Register NaT Consu...

Page 355: ...eatures 17 procedure for details on enabling disabling spontaneous deferral After checking for deferral execution of a speculative load instruction proceeds as follows When PSR ed is 1 then a deferred exception indicator NaT bit or NaTVal is written to the load target register regardless of whether it has an exception or not and regardless of the state of DCR PSR it PSR ic and the ITLB ed bits If ...

Page 356: ...he deferral of non fatal or fatal exceptions the system architecture provides three additional resources ISR sp ISR ed and PSR ed ISR sp indicates whether the exception was the result of a speculative or speculative advanced load The ISR ed bit captures the code page ITLB ed bit and allows deferral of a non fatal exception due to a speculative load If both the ISR sp and ISR ed bit are 1 on an int...

Page 357: ...bug fault Debug vector 19 Unimplemented Instruction Address faultb Lower Privilege Transfer Trap vector Faults IA 32 20 IA 32 Instruction Breakpoint fault IA 32 Exception vector Debug A 21 IA 32 Code Fetch faultc IA 32 Exception vector GPFault IA 32 Intel Itanium 22 Alternate Instruction TLB fault Alternate Instruction TLB vector 23 VHPT Instruction fault VHPT Translation vector 24 Instruction TLB...

Page 358: ...ltf NaT Consumption vector 57 Data Key Miss faultf Data Key Miss vector 58 Data Key Permission faultf Key Permission vector 59 Data Access Rights faultf Data Access Rights vector 60 Data Dirty Bit fault Dirty Bit vector 61 Data Access Bit faultf Data Access Bit vector Intel Itanium 62 Data Debug faultf Debug vector 63 Unaligned Data Reference faultf Unaligned Reference vector IA 32 64 IA 32 Alignm...

Page 359: ...he code segment limit is IA 32 77 IA 32 System Flag Intercept trap IA 32 Intercept vector SystemFlag D 78 IA 32 Gate Intercept trap IA 32 Intercept vector Gate 79 IA 32 INTO trap IA 32 Exception vector Overflow 80 IA 32 Breakpoint INT 3 trap IA 32 Exception vector Break 81 IA 32 Software Interrupt INT trap IA 32 Interrupt vector Vector 82 IA 32 Data Breakpoint trap IA 32 Exception vector Debug 83 ...

Page 360: ...tiple data memory references or a single data memory reference crosses a virtual page If any given IA 32 instruction requires multiple data memory references all possible faults are raised on the first data memory reference before any faults are checked on subsequent data memory references This implies lower priority faults on an earlier memory reference will be raised before higher priority fault...

Page 361: ...ce the IVT Table 5 7 Interruption Vector Table IVT Offset Vector Name Interruption s Page 0x0000 VHPT Translation vector 10 23 53 2 173 0x0400 Instruction TLB vector 24 2 175 0x0800 Data TLB vector 11 54 2 176 0x0c00 Alternate Instruction TLB vector 22 2 177 0x1000 Alternate Data TLB vector 9 52 2 178 0x1400 Data Nested TLB vector 8 51 2 179 0x1800 Instruction Key Miss vector 27 2 180 0x1c00 Data ...

Page 362: ...igned Reference vector 63 2 201 0x5b00 Unsupported Data Reference vector 70 2 202 0x5c00 Floating point Fault vector 71 2 203 0x5d00 Floating point Trap vector 73 2 204 0x5e00 Lower Privilege Transfer Trap vector 72 74 2 205 0x5f00 Taken Branch Trap vector 75 2 207 0x6000 Single Step Trap vector 76 2 208 0x6100 Virtualization vector 48 2 209 0x6200 Reserved 0x6300 Reserved 0x6400 Reserved 0x6500 R...

Page 363: ...ly connected devices These interrupts originate on the processor s interrupt pins LINT INIT PMI 1 and are always directed to the local processor The LINT pins can be connected directly to an Intel 8259A compatible external interrupt controller The LINT pins are programmable to be either edge sensitive or level sensitive and for the kind of interrupt that gets generated If programmed to generate ex...

Page 364: ...ong as the pin is asserted Deassertion of a level sensitive interrupt removes the pending indication see Edge and Level sensitive Interrupts on page 2 131 The processor maintains an individual interrupt pending indication for INITs Since external interrupts and PMIs are also signified by a unique interrupt vector number the processor maintains individual pending indications per vector An occurrenc...

Page 365: ...pplies only to external interrupts Unmasked interrupts are those external interrupts of higher priority than the highest priority external interrupt vector currently in service if any and whose priority level is higher than the current priority masking level specified by the TPR register see Task Priority Register TPR CR66 on page 2 123 Masking conditions are defined in Table 5 8 PSR i does not af...

Page 366: ...y a unique vector number There are 256 distinct vector numbers in the range 0 255 Vector numbers 1 and 3 through 14 are reserved for future use Vector number 0 ExtINT is used to service Intel 8259A compatible external interrupt controllers Vector number 2 is used for the Non Maskable Interrupt NMI The remaining 240 external interrupt vector numbers 16 through 255 are available for general operatin...

Page 367: ...d unmasking conditions are satisfied see Table 5 8 the processor accepts the pending interrupt interrupts the control flow of the processor and transfers control to the External Interrupt handler for external interrupts or to PAL firmware for INITs and PMIs Note The TPR controls the masking of external interrupts TPR is described in Task Priority Register TPR CR66 on page 2 123 Table 5 8 Interrupt...

Page 368: ...of this interrupt external interrupts of equal or lower priority than vector 80 are masked When EOI is issued by software the processor will remove the in service indication for external interrupt vector 80 External interrupt masking will then revert back to the next highest priority in service external interrupt vector 45 External interrupt vectors of equal or lower priority than vector 45 would ...

Page 369: ...ualifying predicate of the rsm is true then external interrupt delivery is disabled immediately following the rsm instruction 3 If the qualifying predicate of the rsm is false then external interrupt delivery may be disabled until the next data serialization operation that follows the rsm instruction The delivery disable window is guaranteed to be no larger than defined by the above criteria but i...

Page 370: ...ecution software must perform a data serialization operation after a LID write and prior to that point The Local ID fields are defined in Figure 5 6 and Table 5 10 Table 5 9 External Interrupt Control Registers Register Name Description CR64 LID Local ID CR65 IVR External Interrupt Vector Register read only CR66 TPR Task Priority Register CR67 EOI End Of External Interrupt CR68 IRR0 External Inter...

Page 371: ...efore the next IVR read software must perform a data serialization operation after a TPR or EOI write and prior to that IVR read Software must be prepared to service any possible external interrupt if it reads IVR since IVR reads are destructive and removes the highest priority pending external interrupt if any IVR is a read only register writes to IVR result in a Illegal Operation fault IVR reads...

Page 372: ...s a read write register Reads return 0 Data associated with the EOI writes is ignored To ensure that the previous in service interrupt indication has been cleared by a given point in program execution software must perform a data serialization operation after an EOI write and prior to that point To ensure that the reported IVR vector is correctly masked before the next IVR read software must perfo...

Page 373: ...l Operation faults 5 8 3 6 Interval Timer Vector ITV CR72 ITV specifies the external interrupt vector number for Interval Timer Interrupts To ensure that subsequent interval timer interrupts reflect the new state of the ITV by a given point in program execution software must perform a data serialization operation after an ITV write and prior to that point See Figure 5 11 and Table 5 12 for the def...

Page 374: ...ferred to as Local Interrupt 0 LINT0 and Local Interrupt 1 LINT1 Software can query the presence of these pins via the PAL_PROC_GET_FEATURES procedure call To ensure that subsequent interrupts from LINT0 and LINT1 reflect the new state of LRR prior to a given point in program execution software must perform a data serialization operation after an LRR write and prior to that point In the case when ...

Page 375: ...r 16 255 All other vector numbers are ignored and reserved for future use 001 reserved 010 PMI pend a Platform Management Interrupt Vector number 0 for system firmware The vector field is ignored 011 reserved 100 NMI pend a Non Maskable Interrupt This interrupt is pended at external interrupt vector number 2 The vector field is ignored 101 INIT pend an Initialization Interrupt for system firmware ...

Page 376: ...ccesses to the XTP byte and 1 byte read access to the INTA byte and references through any memory attribute other than UC Any memory operation targeted at the lower half of the Processor Interrupt Block which does not correspond to any actual processor is undefined 5 8 4 1 Inter processor Interrupt Messages A processor can interrupt any individual processor including itself by issuing an inter pro...

Page 377: ...on vector 7 0 Vector number for the interrupt For INT delivery allowed vector values are 0 2 or 16 255 All other vectors are ignored and reserved for future use For PMI delivery allowed PMI vector values are 0 3 All other PMI vector values are reserved for use by processor firmware dm 10 8 000 INT pend an external interrupt for the specified vector to the processor listed in the destination Allowe...

Page 378: ...nterrupt messages and therefore do not specify an external interrupt vector number when the interrupt request is generated When accepting an external interrupt software must inspect the vector number supplied by the IVR register If the vector matches the vector number assigned to the external controller can be ExtINT or any other vector number based on software convention software must acquire the...

Page 379: ... by external interrupt controllers or devices deassertion of an external interrupt sends no interrupt message to the processor Since the processor removes the pending interrupt when the interrupt is serviced the processor guarantees exactly one interrupt acceptance for each external interrupt message By definition external interrupt messages are edge sensitive Level sensitive external interrupts c...

Page 380: ...2 132 Volume 2 Part 1 Interruptions ...

Page 381: ... fill activity occurs at addresses below what is contained in the BSP since the RSE spills fills the frames of the current procedure s parents The BSPSTORE application register contains the address at which the next RSE spill will occur The address register which corresponds to the next RSE fill operation the BSP load pointer is not architecturally visible The addresses contained in BSP and BSPSTO...

Page 382: ...ween Physical Registers and Backing Store Figure 6 2 Backing Store Memory Format procA procB procC sola sofc solb Unallocated Unallocated Higher Memory Addresses Higher Register Addresses call return AR BSP RSE Loads Stores procA procB Currently Active Frame procA s Ancestors Backing Store Physical Stacked Registers procA calls procB calls procC AR BSPSTORE 00 111111 01 000000 01 111110 01 111111 ...

Page 383: ... directly exposed to the programmer as architecturally visible registers As a consequence RSE internal state does not need to be preserved across context switches or interruptions Instead it is modified as the side effect of register stack related instructions To describe the effects of these instructions a complete definition of the RSE internal state is essential To distinguish them from archite...

Page 384: ...cedure frames The registers in this partition have not yet been spilled to the backing store by the RSE The number of registers contained in the dirty partition distance between RSE StoreReg and RSE BOF is referred to as RSE ndirty Current frame shaded dark stacked registers allocated for computation The position of the current frame in the physical stacked register file is defined by the Bottom o...

Page 385: ... bottom of frame pointer RSE BOF which moves registers from the current frame to the dirty partition An alloc may shrink or grow the current frame by updating CFM sof A br ret or rfi instruction may shrink or grow the current frame by updating both the bottom of frame pointer RSE BOF and CFM sof 6 4 RSE Operation The register stack backing store is organized as a stack in memory that grows from lo...

Page 386: ...ntil the required number of registers have been spilled to the backing store Similarly a br ret or rfi back to a sufficiently large frame or execution of a loadrs instruction may cause the RSE to suspend program execution and issue mandatory RSE loads until the required number of registers have been restored from the backing store The RSE only operates in the foreground and suspends program execut...

Page 387: ...m that decides whether and when to speculatively perform eager register spill or fill operations is implementation dependent Software may not make any assumptions about the RSE load store behavior when the RSC mode is non zero Furthermore access to the BSPSTORE and RNAT application registers and the execution of the loadrs instructions require RSC mode to be zero enforced lazy mode If loadrs move ...

Page 388: ...be made by software when RSC mode is zero Failure to do so results in undefined backing store contents 6 5 2 Register Stack NaT Collection Register As described in Section 6 1 RSE and Backing Store Overview on page 2 133 the RSE is responsible for saving and restoring NaT bits associated with the stacked registers to and from the backing store The RSE writes its NaT collection register the RNAT ap...

Page 389: ... dirty partition is preserved to allow software to change the backing store pointer without having to flush the register stack Writing BSPSTORE causes the contents of the RNAT register to become undefined Therefore software must preserve the contents of RNAT prior to writing BSPSTORE After writing to BSPSTORE the NaT collection bit index RSE RNATBitIndex is set to bits 8 3 of the presented address...

Page 390: ...ions between the current BSP and the tear point BSP RSC loadrs 13 3 3 and no more than that are guaranteed to be present and marked as dirty in the stacked physical registers When loadrs completes BSPSTORE and RSE BspLoad are defined to be equal to the backing store tear point address All other physical stacked registers are marked invalid If the tear point specifies an address below RSE BspLoad t...

Page 391: ...the procedure being returned to to zero Typical procedure call and return sequences that preserve PFS values and that do not use cover or loadrs instructions will not encounter this situation The RSE will detect the above condition on a br ret and update its state as follows The register rename base RSE BOF AR BSP and AR BSPSTORE are updated as required by the return Table 6 5 RSE Control Instruct...

Page 392: ...nted stacked physical registers Note that loadrs may cause registers in the dirty partition to be lost 6 6 RSE Interruptions Although the RSE runs asynchronously to processor execution RSE related interruptions are delivered synchronously with the instruction stream These RSE interruptions are a direct consequence of register stack related instructions such as alloc br ret rfi flushrs loadrs or mo...

Page 393: ...adrs Unimplemented Data Address fault AR BSPSTORE contains an unimplemented address Data Nested TLB fault Alternate Data TLB fault VHPT Data fault Data TLB fault Data Page Not Present fault Data NaT Page Consumption fault AR BSPSTORE pointed to a NaTVal data page Data Key Miss fault Data Key Permission fault Data Access Rights fault Data Dirty Bit fault Data Access Bit fault Data Debug fault br ca...

Page 394: ...ng the original sequence of mandatory RSE loads When the current frame is incomplete the following instructions have undefined behavior alloc br call brl call br ret flushrs loadrs and move to BSPSTORE Software must guarantee that the current frame is complete before executing these instructions 6 9 RSE and ALAT Interaction The ALAT see Data Speculation on page 1 63 uses physical register addresse...

Page 395: ...the location to be modified lies between BSPSTORE and BSP software must issue a flushrs update the backing store location in memory and issue a loadrs instruction with the RSC loadrs set to zero this invalidates the current contents of the physical stacked registers except the current frame which forces the RSE to reload registers from the backing store If the location to be modified lies below BS...

Page 396: ...alue written in step 6 of Section 6 11 1 Switch from Interrupted Context from the BSP value read in step 7 of Section 6 11 1 Switch from Interrupted Context on page 2 148 and deposit the difference into RSC loadrs along with a zero into RSC mode to place the RSE into enforced lazy mode 3 Issue a loadrs instruction to insure that any registers from the interrupted context which were saved on the ne...

Page 397: ... 8 Write the new context s RSC which will set the RSE mode privilege level and byte order 6 12 RSE Initialization At processor reset the RSE is defined to be in enforced lazy mode i e the RSC mode bits are both zero The RSE privilege level RSC pl is defined to be zero RSE BOF points to physical register 32 The values of AR PFS pfm and CR IFS ifm are undefined The current frame marker CFM is set as...

Page 398: ...2 150 Volume 2 Part 1 Register Stack Engine ...

Page 399: ...with the immediate operand from the instruction IIM values are defined by software convention break can be used for profiling debugging and entry into the operating system although Enter Privileged Code epc is recommended since it has lower overhead Execution of the IA 32 INT 3 break instruction results in a IA_32_Exception Break trap Taken Branch trap When PSR tb is 1 a Taken Branch trap occurs o...

Page 400: ...ode details Zero one or more DBR registers may be reported as matching 7 1 1 Data and Instruction Breakpoint Registers Instruction or data memory addresses that match the Instruction or Data Breakpoint Registers IBR DBR shown in Figure 7 1 and Figure 7 2 and Table 7 1 result in an Instruction or Data Debug fault IA 32 Instruction or data memory addresses that match the Instruction or Data Breakpoi...

Page 401: ...ing at the specified privilege level Each bit corresponds to one of the four privilege levels with bit 56 corresponding to privilege level 0 bit 57 with privilege level 1 etc A value of 1 indicates that the debug match is enabled at that privilege level w 62 Write match enable When DBR w is 1 any non nullified mandatory RSE store IA 32 or Intel Itanium store semaphore probe w fault or probe rw fau...

Page 402: ...ndefined or the processor and or hardware debugger may crash 7 1 2 Debug Address Breakpoint Match Conditions For virtual memory accesses breakpoint address registers contain the virtual addresses of the debug breakpoint For physical accesses the addresses in these registers are treated as a physical address Software should be aware that debug registers configured to fault on virtual references may...

Page 403: ...ask fields No spurious data breakpoint events are generated for IA 32 data memory operands that are unaligned nor are breakpoints reported if no bytes of the operand lie within the address range specified by the DBR address and mask fields 7 2 Performance Monitoring Performance monitors allow processor events to be monitored by programmable counters or give an external notification such as a pin o...

Page 404: ...dex 4 At least 4 performance counter register pairs PMC PMD 4 PMC PMD 7 are implemented in all processor models Each counter can be configured to monitor events for any combination of privilege levels and one of several event metrics The number of performance counters is implementation specific The figures and tables use the symbol p to represent the index of the last implemented generic PMC PMD p...

Page 405: ...ount W 1 0 Event Count The counter is defined to overflow when the count field wraps carry out from bit W 1 Figure 7 5 Generic Performance Counter Configuration Register PMC 4 PMC p 63 16 15 8 7 6 5 4 3 0 PMC 4 PMC p implementation specific es ig pm oi ev plm 48 8 1 1 1 1 4 Table 7 4 Generic Performance Counter Configuration Register Fields PMC 4 PMC p Field Bits Description plm 3 0 Privilege Leve...

Page 406: ...topped synchronously by the system mask instructions rsm and ssm by altering PSR pp Table 7 5 summarizes the effects of PSR sp PMC i pm and PSR cpl on reading PMD registers Updates to generic PMC registers and PSR bits up pp is sp cpl require implicit or explicit data serialization prior to accessing an affected PMD register The data serialization ensures that all prior PMD reads and writes as wel...

Page 407: ...plementation dependent PMD registers can only be read reliably when event monitoring is frozen PMC 0 fr is one For accurate PMD reads of disabled counters data serialization implicit or explicit is required between any PMD read and a subsequent ssm or sum that could toggle PSR up or PSR pp from 0 to 1 or a subsequent epc demoting br ret or branch to IA 32 br ia that could affect PSR cpl or PSR is ...

Page 408: ...ent monitoring PMU interrupts are generated by events such as the overflowing of a generic counter pair which is configured to interrupt on overflow Each such event generates one interrupt Provided that software does not clear the freeze bit while either or both of PSR up and pp are 1 before clearing the overflow bits writes to PMCs and PMDs by software do not generate interrupts nor cause a monit...

Page 409: ...nstruction serialization is required to ensure that the behavior specified by PMC 0 fr is observed Figure 7 6 Performance Monitor Overflow Status Registers PMC 0 PMC 3 63 4 3 2 1 0 overflow ig fr 60 3 1 overflow overflow overflow Table 7 7 Performance Monitor Overflow Register Fields PMC 0 PMC 3 Register Field Bits Description PMC 0 fr 0 Performance Monitor freeze bit This bit is volatile state i ...

Page 410: ...er monitors etc 7 2 4 Implementation independent Performance Monitor Code Sequences This section describes implementation independent code sequences for servicing overflow interrupts and context switches of the performance monitors For forward compatibility the code sequences outlined in Section 7 2 4 1 and Section 7 2 4 2 use PAL provided implementation specific information to collect preserve da...

Page 411: ... outgoing context has a pending performance monitor interrupt by reading the freeze bit with the knowledge that it was not generated by software then software also preserves the outgoing context s overflow status registers PMC 0 PMC 3 before all PMC and PMD registers whose mask bit is set Here it is explicitly assumed that software tracks monitored processes and can determine whether a process is ...

Page 412: ...thread switch if outgoing process is monitored 1 Turn off counting and ignore interrupts for context switch of counters 1a if not already done raise interrupt priority above perf mon overflow vector 1b read and preserve PSR up PSR pp PSR sp 1c clear PSR up clear PSR pp 1d srlz d 2 Preserve PMC PMD contents 2a For each PMC whose PALPMCmask bit is set preserve PMC 2b For each PMD whose PALPMDmask bi...

Page 413: ...cept Table 8 1 defines which interruption resources are written are left unmodified or are undefined for each interruption vector The individual vector descriptions below list interruption specific resources for each vector See IVA based Interruption Handling on page 2 101 for details on how the processor handles an interruption See Interruption Control Registers on page 2 36 for the definition of...

Page 414: ...W W x Data Nested TLB vector Data Nested TLB fault N A N A N A N A x N A N A N A IR Data Nested TLB fault N A N A N A N A x N A N A N A Data TLB vector Data TLB fault N A W N A W N A W N A W N A x N A W N A W IR Data TLB fault N A W N A W N A W N A W N A x N A W N A x Debug vector Data Debug fault W W x x x x x x W W W Instruction Debug fault W W x x x x x x W W x IR Data Debug fault W W x x x x x...

Page 415: ...Transfer Trap vector Unimplemented Instruction Address fault W x W x x x x x x W W x Lower Privilege Transfer trap W x x x x x x x x W W W Unimplemented Instruction Address trap W x x x x x x x x W W W NaT Consumption vector Data NaT Page Consumption fault W W x x x x x x W W W Instruction NaT Page Consumption fault W W x x x x x x W W x IR Data NaT Page Consumption fault W W x x x x x x W W x Reg...

Page 416: ... naf r w x Alternate Data TLB vector Alternate Data TLB fault edk ri so nil 0 rs sp na r w 0 IR Alternate Data TLB fault 0 ri 0 nil 1 1 0 0 1 0 0 Alternate Instruction TLB vector Alternate Instruction TLB fault 0 ri 0 ni 0 0 0 0 0 0 1 Break Instruction vector Break Instruction fault 0 ri 0 ni 0 0 0 0 0 0 0 Data Access Rights vector Data Access Rights fault edk ri so ni 0 rs sp na r w 0 IR Data Acc...

Page 417: ... vector 0 0 0 0 0 0 0 0 0 0 x IA 32 Intercept vector 0 0 0 0 0 0 0 0 r w 0 IA 32 Interrupt vector 0 0 0 0 0 0 0 0 0 0 0 Instruction Access Rights vector Instruction Access Rights fault 0 ri 0 ni 0 0 0 0 0 0 1 Instruction Access Bit vector Instruction Access Bit fault 0 ri 0 ni 0 0 0 0 0 0 1 Instruction Key Miss vector Instruction Key Miss fault 0 ri 0 ni 0 0 0 0 0 0 1 Instruction TLB vector Instru...

Page 418: ... and interrupts on the L X instruction of an MLX For traps ISR ei points at the excepting instruction 2 for traps on the L X instruction of an MLX b If ISR ni is 1 the interruption occurred either when PSR ic was 0 or was in flight c ISR ir captures the value of RSE CFLE at the time of an interruption d ISR rs is 1 for interruptions caused by mandatory RSE fills spills and 0 for all others e ISR s...

Page 419: ...ess Rights vector 0x5300 2 191 Data Access Bit vector 0x2800 2 184 Data Key Miss vector 0x1c00 2 181 Data Nested TLB vector 0x1400 2 179 Data TLB vector 0x0800 2 176 Debug vector 0x5900 2 200 Dirty Bit vector 0x2000 2 182 Disabled FP Register vector 0x5500 2 195 External Interrupt vector 0x3000 2 186 Floating Point Fault vector 0x5c00 2 203 Floating Point Trap vector 0x5d00 2 204 General Exception...

Page 420: ...s Unsupported Data Reference vector 0x5b00 2 202 VHPT Translation vector 0x0000 2 173 Virtual External Interrupt vector 0x3400 2 187 Virtualization vector 0x6100 2 209 Table 8 4 Interruption Vectors Sorted Alphabetically Continued Vector Name Offset Page ...

Page 421: ...ruction and data references IFA The faulting address that the hardware VHPT walker was attempting to resolve ISR The ISR bits are set to reflect the original access on whose behalf the VHPT walker was operating If the original operation was a non access instruction then the ISR code bits 3 0 are set to indicate the type of the non access instruction otherwise they are set to 0 For mandatory RSE fi...

Page 422: ...d by the operating system page fault handler in the case where the page containing the VHPT entry has not yet been allocated When the translation for the VHPT is available the handler must first move the address contained in the IHA to the IFA prior to the TLB insert 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 0 0 63 62 61 60 59 58 57 56 55 54 53 52 51 5...

Page 423: ...d to 64 bits IIB0 IIB1 If implemented the IIB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction caused the exception The defined ISR bits are specified below The ISR ni bit is 0 if PSR ic was 1 when the interruption was taken and is...

Page 424: ...he IIB registers ISR If the interruption was due to a non access operation then the ISR code bits 3 0 are set to indicate the type of the non access instruction otherwise they are set to 0 For mandatory RSE fill or spill references ISR ed is always 0 The ISR ni bit is 0 if PSR ic was 1 when the interruption was taken and is 1 if PSR ic was in flight The ISR code ed ei ir rs sp and na bits are alwa...

Page 425: ...plemented the IIB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR For Itanium memory references the ISR ei bits are set to indicate which instruction caused the exception and ISR ni is set to 0 if PSR ic was 1 when the interruption was taken and set to 1 if PSR ic was 0 or in flight ...

Page 426: ...CR26 27 on page 2 42 for details on the IIB registers ISR If the interruption was due to a non access operation then the ISR code bits 3 0 are set to indicate the type of the non access instruction otherwise they are set to 0 For mandatory RSE fill or spill references ISR ed is always 0 The ISR ni bit is 0 if PSR ic was 1 when the interruption was taken and is 1 if PSR ic was in flight For IA 32 m...

Page 427: ...This fault occurs when PSR dt 1 and PSR ic is 0 on a load store semaphore and faulting non access instructions It also occurs when PSR dt is 0 and PSR ic is 0 for a regular_form probe instruction Finally it can occur when PSR rt is 1 and PSR ic is 0 on a RSE mandatory load store operation Since the operating system is in control of the code executing at the time of the nested fault it can by conve...

Page 428: ...he referenced region register The ITIR ps field is set to the RR ps field from the referenced region register All other fields are set to 0 IFA The virtual address of the bundle or the 16 byte aligned IA 32 instruction address zero extended to 64 bits IIB0 IIB1 If implemented the IIB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 o...

Page 429: ...from the referenced region register All other fields are set to 0 IFA Faulting data address IIB0 IIB1 If implemented for Data Key Miss faults the IIB registers contain the instruction bundle pointed to by IIP The IIB registers are undefined for IR Data Key Miss faults Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB regis...

Page 430: ... and are specified below For mandatory RSE spill references ISR ed is always 0 For IA 32 memory references ISR ed ei ni and rs are 0 If the interruption was due to a non access operation then the ISR code bits 3 0 are set to indicate the type of the non access instruction otherwise they are set to 0 Notes Dirty Bit fault can only occur in these situations When PSR dt is 1 on an IA 32 or Itanium st...

Page 431: ...ress of the bundle or the 16 byte aligned IA 32 instruction address zero extended to 64 bits IIB0 IIB1 If implemented the IIB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction caused the exception For IA 32 memory references the ISR...

Page 432: ... faults Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The value for the ISR bits depend on the type of access performed and are specified below For mandatory RSE fill or spill references ISR ed is always 0 For IA 32 memory references ISR code ed ei ni ir rs na and sp are 0 Notes These faults can only occu...

Page 433: ...IB registers contain the instruction bundle pointed to by IIP Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction caused the exception The defined ISR bits are specified below Notes This fault cannot be raised by IA 32 instructions 31 30 29 28 27 26 25 24 23...

Page 434: ... number If there are no unmasked pending interrupts the spurious interrupt vector 15 is reported IIB0 IIB1 If implemented the IIB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction was to be executed when the external interrupt event...

Page 435: ...defined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction was to be executed when the external interrupt event was taken The defined ISR bits are specified below For external interrupts taken in the IA 32 instruction set ISR ei ni and ir bits are 0 Notes S...

Page 436: ...and data original references IFA The virtual address of the data being referenced ISR If the interruption was due to a non access operation then the ISR code bits 3 0 are set to indicate the type of the non access instruction otherwise they are set to 0 The value for the ISR bits depend on the type of access performed and are specified below For mandatory RSE fill or spill references ISR ed is alw...

Page 437: ...ction bundle pointed to by IIP The IIB registers are undefined for IR Data Key Permission and Instruction Key Permission faults Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers If the fault is due to a data key permission fault IFA Faulting data address ISR The value for the ISR bits depend on the type of access...

Page 438: ... ps field is set to the RR ps field from the referenced region register All other fields are set to 0 IFA The virtual address of the bundle or the 16 byte aligned IA 32 instruction address zero extended to 64 bits IIB0 IIB1 If implemented the IIB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB reg...

Page 439: ... detailed description ITIR The ITIR contains default translation information for the address contained in the IFA The access key field within this register is set to the region id value from the referenced region register The ITIR ps field is set to the RR ps field from the referenced region register All other fields are set to 0 IFA Faulting data address IIB0 IIB1 If implemented for Data Access R...

Page 440: ...se refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction caused the exception For IA 32 instruction set faults ISR ei ni na sp rs ir ed bits are always 0 If the fault was caused by a non access instruction ISR code 3 0 specifies which non access instruction See Non a...

Page 441: ...l FR targets or two even numbered physical FR targets Attempts to access an application register from the wrong unit type Attempts to execute a br cloop br ctop br cexit br wtop or br wexit other than in slot 2 of a bundle Attempts to execute an alloc flushrs or loadrs as other than the first instruction in an instruction group The result of such an attempt is undefined and could result in an Ille...

Page 442: ...y violation If the fault is due to a Disabled ISA Transition fault Illegal Dependency fault Illegal Operation fault Privileged Register fault or Reserved Register Field fault Otherwise 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 0 0 code 7 4 0 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 0 0 ei 0 ni 0 0 ...

Page 443: ... Low fault is discarded and not reported in the ISR code Interruptions on this vector Disabled Floating Point Register fault Parameters IIP IPSR IIPA IFS are defined refer to page 2 165 for a detailed description IIB0 IIB1 If implemented the IIB registers contain the instruction bundle pointed to by IIP Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on pa...

Page 444: ...g data address ISR The value for the ISR bits depend on the type of access performed and are specified below For mandatory RSE fill or spill references ISR ed is always 0 For the IA 32 instruction set ISR ed ei ni ir rs and na bits are 0 For probe fault or lfetch fault the ISR na bit is set If the fault is due to an Instruction NaT Page Consumption fault A non speculative Itanium integer FP instru...

Page 445: ...ils ISR The value for the ISR bits depend on the type of access performed and are specified below For the IA 32 instruction set ISR ed ei ni ir rs r w and na bits are 0 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 0 0 1 code 3 0 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 0 0 ei 0 ni 0 0 0 na r w 0 ...

Page 446: ... Notes The Speculative Operation fault handler is required to perform the following steps 1 Read the predicates and the IIM IIP IPSR and ISR control registers into scratch bank 0 general registers 2 Copy the IIP value to IIPA 3 Sign extend the IIM value from 21 bits to 64 shift it left by 4 bits add it to the IIP value and write this value back into IIP 4 Set the IPSR ri field to 0 5 Check whether...

Page 447: ...ds to take a Single Step trap or Taken Branch trap or both the UIA trap will not be raised until after the Single Step and or Taken Branch trap has been handled making it appear that the Unimplemented Instruction Address trap has the wrong priority A Speculative Operation fault handler with this behavior is architecturally compliant On processors which report unimplemented instruction addresses wi...

Page 448: ...data being referenced ISR The value for the ISR bits depend on the type of access performed and are specified below For mandatory RSE fill or spill references ISR ed is always 0 If the fault is due to an instruction debug fault IFA Faulting instruction fetch address ISR The ISR ei bits are set to indicate which instruction caused the exception The defined ISR bits are specified below Notes On an i...

Page 449: ...a reference specified is both unaligned to the natural datum size and unsupported then an Unaligned Data Reference fault is taken Interruptions on this vector Unaligned Data Reference fault Parameters IIP IPSR IIPA IFS are defined refer to page 2 165 for a detailed description IFA The address of the data being referenced IIB0 IIB1 If implemented the IIB registers contain the instruction bundle poi...

Page 450: ... 32 data memory references that require an external atomic lock when DCR lc is 1 raise an IA_32_Intercept Lock fault see Chapter 9 IA 32 Interruption Vector Descriptions Interruptions on this vector Unsupported Data Reference fault Parameters IIP IPSR IIPA IFS are defined refer to page 2 165 for a detailed description IFA The address of the data being referenced IIB0 IIB1 If implemented the IIB re...

Page 451: ...h instruction caused the exception ISR code contains information about the FP exception fault The ISR code field has eight bits defined See Chapter 5 for details ISR code 0 1 IEEE V invalid exception Normal or Parallel FP HI ISR code 1 1 Denormal Unnormal operand exception Normal or Parallel FP HI ISR code 2 1 IEEE Z divide by zero exception Normal or Parallel FP HI ISR code 3 1 Software assist No...

Page 452: ...erruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction caused the exception ISR code contains information about the type of FP exception and IEEE information The ISR code field contains a bit vector see Table 8 3 on page 2 170 for all traps which occurred in the just executed instruction The defi...

Page 453: ...aults and traps Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR For Unimplemented Instruction Address trap and Lower Privilege Transfer trap the ISR ei bits are set to indicate which instruction caused the exception and the ISR code contains a bit vector see Table 8 3 on page 2 170 for all traps which occur...

Page 454: ...p Taken Branch Lower Privilege Transfer FP will be taken first asynchronous interrupts such as External interrupt may be taken with IIP pointing to the unimplemented address before the Unimplemented Instruction Address fault is taken incomplete register stack frame interrupts may be taken with IIP pointing to the unimplemented address before the Unimplemented Instruction Address fault is taken ISR...

Page 455: ...he IIP value for an unimplemented instruction address trap or fault IIB0 IIB1 If implemented the IIB registers contain the instruction bundle pointed to by IIPA Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction caused the exception The ISR code contains a ...

Page 456: ... defined refer to page 2 165 for a detailed description IIB0 IIB1 If implemented the IIB registers contain the instruction bundle pointed to by IIPA Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction caused the exception The ISR code contains a bit vector s...

Page 457: ...B0 IIB1 If implemented the IIB registers contain the instruction bundle pointed to by IIP Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR The ISR ei bits are set to indicate which instruction caused the exception The defined ISR bits are specified below 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 ...

Page 458: ... defined refer to page 2 165 for a detailed description IFA is undefined The faulting IA 32 address is contained in IIPA IIB0 IIB1 If implemented the IIB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR ISR vector contains the IA 32 exception vector number ISR code contains the IA 32 ...

Page 459: ...use of the intercept IIB0 IIB1 If implemented the IIB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR ISR vector contains a number specifying the type of intercept ISR code contains the IA 32 specific intercept information or a trap code listing concurrent trap events for traps Notes...

Page 460: ...IB registers are undefined Please refer to Section 3 3 5 10 Interruption Instruction Bundle Registers IIB0 1 CR26 27 on page 2 42 for details on the IIB registers ISR ISR vector contains the IA 32 defined interruption vector number ISR code contains a trap code listing concurrent trap events Notes See Chapter 9 IA 32 Interruption Vector Descriptions for complete details on this vector and the trap...

Page 461: ...xecution in the Itanium system environment Figure 9 1 IA 32 Trap Code 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 b3 b2 b1 b0 ss tb 0 Figure 9 2 IA 32 Trap Code Bit Name Description 2 tb taken branch trap set if an IA 32 branch is taken and branch traps are enabled PSR tb is 1 3 ss single step trap set after the successful execution of every IA 32 instruction if PSR ss or EFLAG tf is 1 4 7 b0 to b3 Da...

Page 462: ...loper s Manual for a complete definition of this fault Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits ISR vector 0 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv 0 0 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 3...

Page 463: ... and IA 32 Architectures Software Developer s Manual for a complete definition of this fault Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits ISR vector 1 ISR x 1 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv 1 0 63 62 61 60 59 58 57 56 55 54 53 52 51 ...

Page 464: ...this trap Parameters IIPA virtual address of the trapping IA 32 instruction zero extended to 64 bits if there was a taken branch trap On jmpe taken branch traps IIPA contains the address of the jmpe instruction For all other trap events IIPA is undefined IIP next Itanium instruction address or the virtual IA 32 instruction address zero extended to 64 bits ISR vector 1 ISR code Trap Code indicates ...

Page 465: ...f this trap Parameters IIPA trapping virtual IA 32 instruction address zero extended to 64 bits IIP next virtual IA 32 instruction address zero extended to 64 bits ISR vector 3 ISR code Trap Code indicates Concurrent Single Step condition 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv 3 trap_code 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 ...

Page 466: ...definition of this trap Parameters IIPA trapping virtual IA 32 instruction address zero extended to 64 bits IIP next virtual IA 32 instruction address zero extended to 64 bits ISR vector 4 ISR code Trap Code indicates Concurrent Single Step 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv 4 trap_code 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 4...

Page 467: ...for a complete definition of this fault Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits ISR vector 5 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv 5 0 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 rv 0...

Page 468: ...32 invalid opcode faults are delivered to the IA_32_Intercept Instruction handler including IA 32 illegal unimplemented opcodes MMX technology and SSE instructions if CR0 EM is 1 and SSE instructions if CR4 fxsr is 0 All illegal IA 32 floating point opcodes result in an IA_32_Intercept Instruction regardless of the state of CR0 em ...

Page 469: ...ts bit is 1 Refer to the Intel 64 and IA 32 Architectures Software Developer s Manual for a complete definition of this fault Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits ISR vector 7 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv 7 0 63 62 61 60 59...

Page 470: ...2 222 Volume 2 Part 1 IA 32 Interruption Vector Descriptions Name Double Fault Cause IA 32 Double Faults IA 32 vector 8 are not generated by the processor in the Itanium System Environment ...

Page 471: ...Volume 2 Part 1 IA 32 Interruption Vector Descriptions 2 223 Name Invalid TSS Fault Cause IA 32 Invalid TSS Faults IA 32 vector 10 are not generated in the Itanium System Environment ...

Page 472: ...Manual for a complete definition of this fault and error codes Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits ISR vector 11 ISR code IA 32 defined error code See Intel 64 and IA 32 Architectures Software Developer s Manual 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 ...

Page 473: ...2 stack reference instruction e g PUSH POP CALL RET See section Segment Descriptor and Environment Integrity for a list of possible inconsistent register descriptor conditions Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits ISR vector 12 ISR code IA 32 defined ErrorCode Zero if an inconsistent...

Page 474: ...ister descriptor value during an IA 32 code fetch or data memory reference See section Segment Descriptor and Environment Integrity for a list of possible inconsistent register descriptor conditions Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits ISR vector 13 ISR code IA 32 defined ErrorCode ...

Page 475: ...Volume 2 Part 1 IA 32 Interruption Vector Descriptions 2 227 Name Page Fault Cause IA 32 defined page faults IA 32 vector 14 can not be generated in the Itanium System Environment ...

Page 476: ...anium numeric exceptions or the execution of Itanium numeric instructions Refer to the Intel 64 and IA 32 Architectures Software Developer s Manual for a complete definition of this fault Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits FSR FIR FDR and FCR contain the IA 32 floating point envir...

Page 477: ...s Software Developer s Manual for a complete definition of this fault Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits IFA referenced virtual data address byte granular zero extended to 64 bits ISR vector 17 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 r...

Page 478: ...2 230 Volume 2 Part 1 IA 32 Interruption Vector Descriptions Name Machine Check Cause IA 32 Machine Check IA 32 vector 18 is not generated in the Itanium System Environment ...

Page 479: ...ions SSE instructions always ignore CR0 ne and the IGNNE pin Refer to the Intel 64 and IA 32 Architectures Software Developer s Manual for a complete definition of this fault Parameters IIP virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits ISR vector 19 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 ...

Page 480: ...virtual IA 32 instruction address points to the INT instruction zero extended to 64 bits IIP next virtual IA 32 instruction address zero extended to 64 bits ISR vector vector number ISR code TrapCode Indicates Concurrent Single Step Trap condition 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv vector trap_code 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 4...

Page 481: ...Software Developer s Manual The lowest memory address byte is placed in byte 0 of IIM higher memory address bytes are placed in increasingly higher numbered bytes within IIM The 8 byte opcode loaded into IIM is stripped of the following prefixes lock repeat address size operand size and segment override prefixes opcode bytes 0xF3 0xF2 0xF0 0x2E 0x36 0x3E 0x26 0x64 0x65 0x66 and 0x67 The 0x0F opcod...

Page 482: ...t rp 4 REP or REPE REPZ Prefix If 1 indicates a REP REPE REPZ prefix is in effect np 5 REPNE REPNZ Prefix If 1 indicates a REPNE REPNZ prefix is in effect sp 6 Segment Prefix If 1 indicates a Segment Override prefix is present seg 7 9 Segment Value Segment Prefix Override value see Figure 9 2 for encodings If there is no segment prefixes this field is undefined len 12 15 Length of Prefixes Length ...

Page 483: ...e TrapCode Indicates Concurrent Data Debug taken Branch and Single Step Events ISR code 15 14 indicates whether CALL or JMP generated the trap See Table 9 3 for details 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 reserved gate selector 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 reserved 31 30 29 28 27 26...

Page 484: ... interrupts and code breakpoints between Mov Pop SS and the next instruction and to inhibit Single Step and Data Breakpoint traps on the Mov or Pop SS instruction IIP next virtual IA 32 instruction address zero extended to 64 bits IIPA trapping virtual IA 32 instruction address zero extended to 64 bits IIM contains the previous EFLAG value before the trapping instruction ISR vector 2 ISR code Trap...

Page 485: ... an external bus lock the processor raises a Locked Data Reference fault Parameters IIP faulting virtual IA 32 instruction address zero extended to 64 bits IIPA virtual address of the faulting IA 32 instruction zero extended to 64 bits IFA faulting virtual data address byte granular zero extended to 64 bits ISR vector 4 ISR code 0 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9...

Page 486: ...2 238 Volume 2 Part 1 IA 32 Interruption Vector Descriptions ...

Page 487: ...n IA 32 privileged system resources are accessed on an interruption or when the following conditions are detected Instruction Interception IA 32 system level privileged instructions are executed System Flag Interception Various EFLAG system flags are modified e g AC TF and IF bits Gate Interception Control transfers are made through call gate or transfers through a task switch TSS segment or Task ...

Page 488: ...on Table 10 1 IA 32 System Register Mapping Intel Itanium Reg IA 32 Reg Convention Size Description Application Registers EFLAG EFLAG IA 32 state 32 IA 32 System Arithmetic flags writes of some bits are conditioned by PSR cpl and EFLAG iopl CSD CSD 64 IA 32 code segment register format SSD SSD IA 32 stack segment register format CFLG CR0 CR4 64 IA 32 control flags CR0 CFLG 31 0 CR4 CFLG 63 32 a wr...

Page 489: ...rnal interrupt control registers are used to generate prioritize and delivery external interrupts during IA 32 or Intel Itanium instruction set execution Translation Resources TRs shared All Intel Itanium virtual memory registers can be used for memory references including IA 32 TCs RRs PKRs Debug Registers IBRs dr0 3 dr7 shared 64 Intel Itanium debug registers are used memory references including...

Page 490: ...ches or interlevel control transfers If the TSSD is used for I O Permissions Itanium architecture based operating system software must ensure that a valid 286 or 386 Task State Descriptor is loaded otherwise IN OUT operations to the TSSD I O permission bitmap will result in undefined behavior The IDT descriptor is not supported or defined within the Itanium System Environment Table 10 2 IA 32 Syst...

Page 491: ... control the execution of IA 32 instructions These bits do not control any Itanium instructions See Table 10 3 for a complete definition these bits The NT bit does not directly control the execution of any IA 32 or Itanium instructions All IA 32 instructions that modify this bit is intercepted e g IRET Task Switches See Table 10 3 IA 32 EFLAG Field Definition for the behavior on IA 32 and Itanium ...

Page 492: ...f 8 IA 32 Trap Flag In the Intel Itanium System Environment IA 32 instruction single stepping is enabled when EFLAG tf is 1 or PSR ss is 1 EFLAG tf does not control single stepping for Intel Itanium instruction set execution When single stepping is enabled the processor generates a IA_32_Exception Debug trap event after the successful execution of each IA 32 instruction If EFLAG tf is modified by ...

Page 493: ...is set to 1 if a repeat string sequence REP MOVS SCANS CMPS LODS STOS INS OUTS takes an external interrupt trap or fault before the final iteration EFLAG rf and PSR id are set to 0 after the last iteration For all other cases external interrupts faults traps and intercept conditions EFLAG rf is unmodified The RF bit can be modified by Intel Itanium instructions running at any privilege level IA 32...

Page 494: ...nts A IA 32 Code Fetch fault GPFault 0 is generated on every IA 32 instruction including the target of rfi and br ia if the following condition is true EFLAG vip EFLAG vif CFLG pe PSR cpl 3 CFLG pvi EFLAG vm CFLG vme EFLAG vip 20 IA 32 Virtual Interrupt Pending See VME extensions in the Intel 64 and IA 32 Architectures Software Developer s Manual for programming details Affects execution of POPF P...

Page 495: ...lt This bit is ignored by Intel Itanium floating point instructions This bit is supported in both IA 32 and Intel Itanium System Environments See the Intel 64 and IA 32 Architectures Software Developer s Manual for details on this bit CR0 EM CFLG em 2 Emulation When CFLG em is set execution of IA 32 ESC and floating point instructions generates an IA_32_Exception DNA fault When CFLG em is 1 execut...

Page 496: ... external interrupt enabling External interrupts are enabled for the IA 32 instruction set by if PSR i and CLFG if or EFLAG if This bit always returns zero when read by the IA 32 Mov from CR0 instruction This bit is not defined in the IA 32 System Environment CFLG ii 8 IF Intercept When CFLG ii is 1 successful modification of the EFLAG if bit by IA 32 CLI STI or POPF instructions result in a IA_32...

Page 497: ... have any side effects such as flushing the TLBs This bit is supported as defined in the Intel 64 and IA 32 Architectures Software Developer s Manual for the IA 32 System Environment CR2 KR2 63 32 IA 32 Page Fault Virtual Address IA 32 Mov to CR2 result in an interception fault Mov from CR2 returns the value contained in KR2 63 32 CR2 is replaced by IFA in the Intel Itanium System Environment CR3 ...

Page 498: ...SE CFLG pse 36 Page Size Extensions In the Intel Itanium System Environment this bit is ignored by IA 32 or Intel Itanium references In the IA 32 System Environment this bit enables 4M byte page extensions for IA 32 paging Modification of this bit by Itanium architecture based code does have any side effects such as flushing the TLBs CR4 PAE CFLG pae 37 Physical Address Extensions In the IA 32 Sys...

Page 499: ...ns This bit is supported in both the IA 32 and Intel Itanium System Environments See the Intel 64 and IA 32 Architectures Software Developer s Manual for details on these bits CR4 FXSR CFLG FXSR 41 SSE FXSR Enable When 1 enables the SSE register context When 0 execution of all SSE instructions results in an IA_32_Intercept Instruction fault This bit does not control the behavior of Intel Itanium i...

Page 500: ...facility Itanium architecture based code can program the performance monitors for IA 32 and or Itanium events by configuring the PMC registers Count values are accumulated in the PMD registers for both IA 32 and Itanium events See implementation specific documentation for the list of supported events and encodings IA 32 code can sample the performance counters by issuing the RDPMC instruction RDPM...

Page 501: ...e registers must be saved by the RSE before entering an IA 32 process Low Integer registers GR1 31 These registers must be explicitly saved before entering an IA 32 process 10 4 2 Exiting IA 32 Processes High FP registers FR32 127 PSR mfh is unmodified when leaving the IA 32 instruction set IA 32 instruction set execution leaves FR32 127 in an undefined state Software can not rely on register valu...

Page 502: ... FIP FOP FDS and FEA On exit from the IA 32 instruction set Itanium registers defined to hold IA 32 state reflect the results of all IA 32 prior numeric instructions FSR FCR FIR FDR Itanium numeric status and control resources defined to hold IA 32 state are honored by IA 32 numeric instructions when entering the IA 32 instruction set In Table 10 5 unchanged indicates there is no change in behavio...

Page 503: ... Lock Intercept If Locks are disabled DCR lc is 1 and a processor external lock transaction is required CPUID unchanged CWD CDQ CVTPI2PS CVTPS2PI CVTSI2SS CVTSS2SI CVTTPS2PI CVTTSS2SI DAA DAS DEC DIV DIVPS DIVSS ENTER EMMS Table 10 5 IA 32 Instruction Summary Continued IA 32 Instruction Intel Itanium System Environment Comments ...

Page 504: ...LEX FNCLEX FCMOV FCOM FCOMPP FCOMI FCOMIP FUCOMI FUCOMIP FCOS FDECSTP FDIV FDIVP FIDIV FDIVR FDIVRP FDIVR FFREE FICOM FICOMP FILD FINCSTP FINIT FNINIT FIST FISTP FLD FLD constant FLDCW FLDENV FMUL FMULP FIMUL FNOP FPATAN FPTAN FPREM FPREM1 FRNDINT FRSTOR FSAVE FNSAVE FSCALE FSIN FSINCOS FSQRT FST FSTP FSTCW FNSTCW FSTENV FNSTENV FSTSW FNSTSW FSUB FSUBP FISUB FSUBR FSUBRP FISUBR FTST FUCOM FUCOMP F...

Page 505: ...ll forms of IRET result in an instruction intercept Jcc additional taken branch trap If PSR tb is 1 raise a taken branch trap JMP near no change far no change gate task Gate Intercept call gate Gate Intercept additional taken branch trap Intercept fault if through a call or task gate If PSR tb is 1 raise a taken branch trap JMPE Jumps to the Intel Itanium instruction set LAHF unchanged LAR LDMXCSR...

Page 506: ...hanged PADD PADDS PADDUS PAND PANDN PCMPEQ PCMPGT PEXTRW PINSRW PMADD PMULHW PMULLW PMULHUW PMOVMSKB POP POPA POP SS System Flag Intercept System Flag Intercept Trap after instruction completes POPF POPFD Optional System Flag Intercept Intercept if EFLAG if changes state and CFLG ii is 1 Intercept if EFLAG ac or tf change state POR unchanged PREFETCH PSHUFW PSLL PSRA PSRL PSUB PSUBS PSUBUS PUNPCKH...

Page 507: ... SHR SBB SCAS SFENCE SETcc SGDT SLDT Instruction Intercept IA 32 privileged instruction SHLD SHRD unchanged SHUFPS SQRTPS SQRTSS SIDT Instruction Intercept IA 32 privileged instructions SMSW STC STD unchanged STI Optional System Flag Intercept Intercept if EFLAG if changes state and CFLG ii is 1 STMXCSR unchanged STOS STR Instruction Intercept IA 32 privileged instruction SUB unchanged SUBPS SUBSS...

Page 508: ...ructions bypass all of these steps and directly generate addresses within the 64 bit virtual address space For both IA 32 and Itanium instruction set memory references virtual memory management defined by the Itanium architecture is used to map a given virtual address into a physical address Itanium architecture based virtual memory management hardware does not distinguish between Itanium and IA 3...

Page 509: ...ing system TLB Dirty Bit If this bit is zero a Dirty bit fault is generated during any Itanium or IA 32 instruction that stores to a dirty page Note the processor does not automatically set the Dirty bit in the VHPT on every write Dirty bit updates are managed by the operating system 10 6 3 IA 32 TLB Forward Progress Requirements To ensure forward progress while executing IA 32 instructions additi...

Page 510: ...each iteration of an IA 32 string instruction regardless of the state of PSR i and EFLAG if The processor may delay in its response and acknowledgment to a broadcast purge TC transaction until the processor executing an IA 32 instruction has reached a point e g an IA 32 instruction boundary where it is safe to process the purge TC request The amount of the delay is implementation specific and can ...

Page 511: ...e degradation associated with un aligned values and extra over head for unaligned data memory fault handlers The processor provides full functional support for all cases of un aligned IA 32 data memory references If PSR ac is 1 or EFLAG ac is 1 and CR0 am is 1and the effective privilege level is 3 unaligned IA 32 memory references result in an IA 32 Exception AlignmentCheck fault Unaligned process...

Page 512: ...fined as requiring a read modify write operation external to the processor under external bus lock and if DCR lc is set to 1 an IA_32_Intercept Lock fault is generated IA 32 atomic memory references are defined to require an external bus lock for atomicity when the memory transaction is made to non write back memory or are unaligned across an implementation specific non supported alignment boundar...

Page 513: ...specific and can vary across processor generations All IA 32 loads have acquire semantics Some high performance processor implementations may speculatively issue acquire loads into the memory system for speculative memory types if and only if the loads do not appear to pass other loads as observed by the program If there is a coherency action that would result in the appearance to the program of a...

Page 514: ...e is seen to the instruction stream and then discarded Self modifying code due to Itanium stores is not detected by the processor IA 32 instruction fetches can pass prior loads or memory fence operations from the same processor Data memory accesses and memory fences are not ordered with respect to IA 32 instruction fetches IA 32 instruction fetches can not pass any serializing instructions includi...

Page 515: ...tion Set All data dependencies are honored Itanium loads see the results of all prior Itanium and IA 32 stores Itanium stores or loads can not pass prior IA 32 loads acquire Itanium unordered stores or any Itanium load can pass prior IA 32 stores release to a different address Itanium architecture based software can prevent Itanium loads and stores from passing prior IA 32 stores by issuing a rele...

Page 516: ... MB I O port block within the 263 byte physical address space is determined by platform conventions see Section 10 7 2 Physical I O Port Addressing on page 2 270 for details 10 7 1 Virtual I O Port Addressing The IA 32 defined 64 KB I O port space is expanded into 64 MB This effectively places 4 I O ports per each 4KB virtual and physical page Since there are 4 ports per virtual page the TLBs can ...

Page 517: ...essed IA 32 IN and OUT instructions and Itanium or IA 32 load store instructions can reference I O ports in 1 2 or 4 byte transactions References to the legacy I O port space cannot be performed with greater than 4 byte transactions due to bus limitations in most systems Since an IA 32 IN OUT instruction can access up to 4 bytes at port address 0xFFFF the I O port space effectively extends 3 bytes...

Page 518: ...instructions 10 7 2 Physical I O Port Addressing Some processors implementations will provide an M IO pin or bus indication by decoding physical addresses if references are within the 64MB physical I O block If so the 64MB I O port space is compressed back to 64KB Subsequent processor implementations may drop the M IO pin or bus indication and rely on platform or chip set decoding of a range of th...

Page 519: ... bitmap may be consulted as defined below If the Bitmap denies permission or is not consulted an IA_32_Exception GPFault is generated If IOPL permission is denied and CFLG io is 1 the TSS I O permission bitmap is consulted for access permission If the corresponding bit s for the I O port s is 1 indicating permission is denied a GPFault is generated Otherwise access permission is granted The TSS I ...

Page 520: ...p is not checked and stores and loads do not honor IN and OUT memory ordering and acceptance semantics the processor will not automatically wait for a store to be accepted by the platform Virtual addresses for the I O port space should be computed as defined in Section 10 7 1 Virtual I O Port Addressing on page 2 268 If data translations are enabled the TLB is consulted for mappings and permission...

Page 521: ...rf is 1 Instruction Debug faults are temporarily disabled for one IA 32 instruction The successful execution of an IA 32 instruction clears both PSR id and EFLAG rf bits The successful execution of an Itanium instruction only clears PSR id Data Debug traps When PSR db is 1 any IA 32 data memory reference that matches the parameters specified by the DBR registers results in a IA_32_Exception Debug ...

Page 522: ... match with the corresponding data breakpoint register DBR0 3 For IA 32 data debug traps any number of B bits can be set indicating a match The B bits are only set and a data breakpoint trap generated if 1 the breakpoint register precisely matches the specified DBR address and mask 2 it is enabled by the DBR read or write bits for the type of the memory transaction 3 the DBR privilege field matche...

Page 523: ...s the three interruption handlers defined to support IA 32 events See Section 9 2 IA 32 Interruption Vector Definitions on page 2 213 for details on these interruption handlers This grouping of interruption handlers simplifies software handlers such that they do not need to be concerned with behavior of both IA 32 and Itanium instruction sets Interruption registers defined in Chapter 3 record the ...

Page 524: ...rCodea IA 32 Segment Not present fault 12 IA_32_Exception Stack 12 ErrorCode IA 32 Stack Exception fault 13 IA_32_Exception GPFault 13 ErrorCode IA 32 General Protection fault 14 Intel Itanium TLB faults see Data TLB faults below IA 32 Page fault can not be generated in the Intel Itanium System Environment replaced by Intel Itanium TLB faults Intel reserved 15 N A Intel reserved 16 IA_32_Exception...

Page 525: ...e operating system must save and restore FSR FIR and FDR effectively performing an FSTENV and FLDENV to ensure numeric exceptions are correctly reported across a process switch 10 10 Processor Bus Considerations for IA 32 Application Support The section briefly discusses bus and platform considerations when supporting IA 32 applications in the Itanium System Environment Itanium architecture based ...

Page 526: ...1 5 Platform Management Interrupt PMI on page 2 310 for details 10 10 1 IA 32 Compatible Bus Transactions Within the Itanium System Environment the following bus transactions are initiated INTA Interrupt Acknowledge emitted by the operating system via a read to the INTA byte in the processor s Interrupt Block to acquire the interrupt vector number from an external interrupt controller HALT Emitted...

Page 527: ... as possible PAL is Itanium architecture based code PAL code runs at privilege level 0 PAL procedures can be called without backing store except where memory based parameters are returned The processor and platform hardware requirements for PAL This includes minimizing PAL dependencies on platform hardware and clearly stating where those dependencies exist A PAL interface and requirements to suppo...

Page 528: ...sors on a platform Figure 11 1 Firmware Model Non performance criti cal hardware events e g reset machine checks Operating System Software System Abstraction Layer SAL Processor hardware Performance critical hard ware events e g inter rupts Instruction Execution Platform Processor Abstraction Layer PAL Interrupts traps and faults Transfers to SAL entrypoints PAL procedure calls Access to platform ...

Page 529: ...by higher level firmware and software to obtain information about the identification configuration and capabilities of the processor implementation to perform implementation dependent functions such as cache initialization or to allow software to interact with the hardware through such functions as power management or enabling disabling processor features 11 1 2 Firmware Entrypoints Figure 11 2 Fi...

Page 530: ...and reporting PALE_INIT Saves the processor state places the processor in a known state and branches to SALE_ENTRY PALE_INIT is entered as a response to an initialization event PALE_PMI Saves the processor state and branches to SALE_PMI PALE_PMI is entered as a response to a platform management interrupt 11 1 4 SAL Entrypoints There are two entrypoints from PAL into SAL SALE_ENTRY PAL branches to ...

Page 531: ... in Figure 11 4 The first version has one PAL_A component This layout allows for robust recovery of PAL_B and SAL_B components This layout is useful for cases where PAL_A will not need to be upgraded The second version splits the PAL_A block into two components The first component is referred to as the generic PAL_A and the second component is the processor specific PAL_A Splitting the PAL_A up in...

Page 532: ...e Optional PAL_B Block Reserved SAL space Optional SAL_B Block Available Space 16 Bytes 8 Bytes Multiple of 16 Bytes 8 Bytes Multiple of 16 Bytes Multiple of 16 Bytes Multiple of 16 bytes Multiple of 16 Bytes Multiple of 16 Bytes Multiple of 16 bytes SALE_ENTRY CPU Reset Init H W Error PALE_RESET PALE_INIT PALE_CHECK C X 16MB Maximum Protected Bootblock 4GB 48 4GB 64 Reserved 16 Bytes PAL_A FIT En...

Page 533: ...nterface Table FIT Reserved PAL Space Optional PAL_B Block Reserved SAL Space Optional SAL_B Block Available Space 16 Bytes 8 Bytes Multiple of 16 Bytes 8 Bytes Multiple of 16 Bytes Multiple of 16 Bytes Multiple of 16 Bytes Multiple of 16 Bytes Multiple of 16 Bytes Multiple of 16 Bytes CPU Reset Init H W Error PALE_RESET PALE_INIT PALE_CHECK C X 16MB Maximum Protected 4GB 48 4GB 64 Reserved 8 Byte...

Page 534: ...in length The collection of regions above from the beginning of the SAL_A code to 4GB is called the Protected Bootblock The size of the Protected Bootblock is SAL_A size PAL_A size 64 The Firmware Interface Table FIT comprises of 16 byte entries containing starting address and size information for the firmware components The FIT is generated at build time based on the size and location of the firm...

Page 535: ...and places code in one block such as PAL_A cannot branch to code in another block such as PAL_B directly The FIT allows code in one block to find entrypoints in another Figure 11 5 below shows the FIT layout Each FIT entry contains information for the corresponding firmware component The first entry contains size and checksum information for the FIT itself The order of the following FIT entries mu...

Page 536: ...ry FIT type 0 the modulo sum of all the bytes in the FIT table must add up to zero Note The PAL_A FIT entry is not part of the FIT table checksum Address An 8 byte field containing the base address of the component For the FIT header this field contains the ASCII value of _FIT_ sp sp sp sp represents the space character The FIT allows simpler firmware updates Different components may be updated in...

Page 537: ...rmware recovery check This section is referred to as phase one of processor self test and they are generally run early during the processor boot process The second phase is written requiring that external memory is available to execute correctly These tests are run when a call to the PAL procedure PAL_TEST_PROC is made with the correct parameters set up These tests are referred to as phase two of ...

Page 538: ...L after firmware recovery check GR38 Indicates if the PAL_MEMORY_BUFFER procedure is required to be called on this processor implementation for correct behavior Also indicates the minimum buffer size required for the PAL_MEMORY_BUFFER procedure Table 11 2 defines the layout of this register Banked GRs All bank 0 general registers are undefined FRs The contents of all floating point registers are u...

Page 539: ...caches to the processor core The path from external memory to the caches cannot be tested until phase two of the processor self test Note All cache contents will be invalidated when SAL returns to PAL after the RECOVERY_CHECK hand off If the SAL uses the caches in their RECOVERY_CHECK code it is SAL s responsibility to write back any modified data in the caches before returning to PAL TLB The TRs ...

Page 540: ...IT because of a FIT checksum failure and no compatible processor specific PAL_A was found in the alternate FIT if supported PAL_A_Spec Found FIT Checksum Failure 10 A compatible processor specific PAL_A was found in the alternate FIT No compatible processor specific PAL_A was found in the FIT due to a FIT checksum failure PAL_A_Spec Failure Good PAL_A_Spec found in FIT 11 One or more compatible pr...

Page 541: ...assed authentication and checksum 64K Unaligned 17 No PAL_B was found in the FIT and alternate FIT if supported that was correctly aligned to a 64KB boundary Figure 11 8 Geographically Significant Processor Identifier 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 reserved proc_id 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 3...

Page 542: ...cating whether testing has occurred If this field is zero the processor has not been tested and no other fields in the Self Test State Parameter are valid The processor can be tested prior to entering SALE_ENTRY for both RECOVERY CHECK and RESET functions If the state field indicates that the processor is functionally restricted then the fields vm ia fp specify additional information about the fun...

Page 543: ...ets these bits as input parameters on two occasions The first time is when SAL passes control back to PAL after the firmware recovery check The second time is when a call to PAL_TEST_PROC is made When PAL interprets these bits it will only interpret implemented test_control bits and will ignore the values located in the unimplemented test_control bits PAL interprets the implemented bits such that ...

Page 544: ...riptions of all these procedure calls Code for handling machine checks must take into consideration the possibility that nested machine checks may occur A nested machine check is a machine check that occurs while a previous machine check is being handled PALE_CHECK is entered in the following conditions When PSR mc 0 and an error occurs which results in a machine check or When PSR mc changes from ...

Page 545: ... 2 42 for the definition of the bank 0 registers Additional resources required to allow software recovery of machine checks when PSR ic 0 The presence of these resources is processor implementation specific The PAL_PROC_GET_FEATURES procedure described on page 2 440 returns information on the existence of these optional resources XIP XPSR XFS interruption resources implemented to store information...

Page 546: ...ate save area A cover instruction is executed in the PALE_CHECK handler which allocates a new stack frame of zero size BSP will be modified to point to a new location since all the registers from the current frame at the time of interruption were added to the RSE dirty partition by the allocation of a new stack frame The ITC register will not be directly modified by PAL but will continue to count ...

Page 547: ...s not been lost mn 5 Min state save area has been registered with PAL if set to 1 sy 6 Storage integrity synchronized A value of 1 indicates that all loads and stores prior to the instruction on which the machine check occurred completed successfully and that no loads or stores beyond that point occurred See Table 11 8 co 7 Continuable A value of 1 indicates that all in flight operations from the ...

Page 548: ...gisters are valid 1 valid 0 not valid ar 25 Application registers are valid 1 valid 0 not valid br 26 Branch registers are valid 1 valid 0 not valid pr 27 Predicate registers are valid 1 valid 0 not valid fp 28 Floating point registers are valid 1 valid 0 not valid b1 29 Preserved bank one general registers are valid 1 valid 0 not valid b0 30 Preserved bank zero general registers are valid 1 valid...

Page 549: ...y SAL Any processor state not listed below must be either unchanged or restored by SAL before returning to PALE_CHECK SAL will preserve the values in GR4 GR7 and GR17 GR18 SAL will return to PALE_CHECK via the address in GR19 SAL will set up GR19 to indicate the success of the rendezvous before returning to PAL GR19 is zero to indicate the rendezvous was successful GR19 is non zero to indicate tha...

Page 550: ...arked poison preventing any silent data corruption If bit 53 is 1 unconsumed data poisoning events are reported as MCAs To immediately report unconsumed data poisoning events as uncorrected errors in the sense that the data in question has been lost the caller can set bit 53 to 1 PSP settings for a data poisoning event with bit 53 equal to 1 are given in the table below See also Table 11 8 When pr...

Page 551: ...min state save area is shown in Figure 11 2 When SAL registers the area with PAL it passes in a pointer to offset zero of the area When PALE_CHECK is entered as a result of a machine check it fills in processor state processes the machine check and branches to SALE_ENTRY with a pointer to the first available memory location that SAL can use in GR16 SAL may allocate a variable sized area above the ...

Page 552: ...16 Bank 1 GR17 Bank 1 GR18 Bank 1 GR19 Bank 1 GR20 Bank 1 GR21 Bank 1 GR22 Bank 1 GR23 Bank 1 GR24 Bank 1 GR25 Bank 1 GR26 Bank 1 GR27 Bank 1 GR28 Bank 1 GR29 Bank 1 GR30 Bank 1 GR31 Predicate Registers BR0 RSC IIP IPSR IFS XIP or undefined XPSR or undefined XFS or undefined 0x0 0x8 0x10 0x18 0x20 0x28 0x30 0x38 0x40 0x48 0x50 0x58 0x60 0x68 0x70 0x78 0x80 0x88 0x90 0x98 0xa0 0xa8 0xb0 0xb8 0xc0 0...

Page 553: ...17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 NaT bits for Bank 0 GR16 to GR31 NaT bits for GR15 to GR1 UD 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 Undefined not used NaT bits for Bank 1 GR16 to GR31 Table 11 10 NaT Bits for Saved GRs Bits Description 0 Undefined not used 15 1 NaT bits for GR15 to GR1 Bit 1 represents GR1 and subsequent bits fol...

Page 554: ...te the caller must restore all register state that is not stored in the processors min state save area before making the PAL_MC_RESUME procedure call Since BR0 and BR1 are the only two branch registers saved in the min state save area the caller must only use these two branch registers when making the PAL_MC_RESUME procedure call 11 4 PAL Initialization Events 11 4 1 PALE_INIT PALE_INIT is entered...

Page 555: ...cept the RSE control register RSC the RSE backing store pointer BSP and the ITC and RUC counters The RSC register is unchanged except that the RSC mode field will be set to 0 enforced lazy mode and the RSC register at the time of the INIT has been saved in the min state save area A cover instruction is executed in the PALE_INIT handler which allocates a new stack frame of zero size BSP will be mod...

Page 556: ...e Parameter Fields Field Bits INIT value Description rsvd 1 0 Reserved rz 2 xa The attempted processor rendezvous was successful if set to 1 ra 3 xa A processor rendezvous was attempted if set to 1 me 4 0 Distinct multiple errors have occurred not multiple occurrences of a single error Software recovery may be possible if error information has not been lost mn 5 xa Min state save area has been reg...

Page 557: ... RSE is valid 1 valid 0 not valid cm 18 0 The machine check has been corrected 1 corrected 0 not corrected ex 19 0 A machine check was expected 1 expected 0 not expected cr 20 xa Control registers are valid 1 valid 0 not valid pc 21 xa Performance counters are valid 1 valid 0 not valid dr 22 xa Debug registers are valid 1 valid 0 not valid tr 23 xa Translation registers are valid 1 valid 0 not val...

Page 558: ... PMI handler which handles all platform related processing The location of the PALE_PMI and SALE_PMI handlers are programmable The location of the PALE_PMI handler can be programmed by the PAL_COPY_PAL procedure described on page 2 389 The SALE_PMI handler can be programmed by the PAL_PMI_ENTRYPOINT procedure described on page 2 443 If a PMI is taken very early in the boot sequence before PAL has ...

Page 559: ...iority PMI messages can be delivered by an external interrupt controller or as an inter processor interrupt using delivery mode 010 Table 11 15 shows the PMI message vector assignments Vectors 4 15 are reserved for PAL and within these PAL vectors a higher vector number has higher priority Vectors 1 3 are available for SAL to use and within these SAL vectors a higher vector number has higher prior...

Page 560: ...ister shall be preserved by SAL PMI handler GR27 contains the value of saved B0 The contents of this register shall be preserved by SAL PMI handler GR28 contains the value of saved B1 The contents of this register shall be preserved by SAL PMI handler GR29 contains the value of the saved predicate registers The contents of this register shall be preserved by SAL PMI handler GR30 31 are scratch reg...

Page 561: ...isters are unchanged from the time of the interruption DBR IBRs The contents of all breakpoint registers are unchanged from the time of the interruption PMCs PMDs The contents of the PMC registers are unchanged from the time of the PMI The contents of the PMD registers are not modified by PAL code but may be modified if events it is monitoring are encountered Cache The processor internal cache is ...

Page 562: ... power by stopping instruction execution but maintains cache and TLB coherence in response to external requests The processor transitions from this state to the NORMAL state in response to any unmasked external interrupt including NMI machine check reset PMI or INIT An unmasked external interrupt is defined to be an interrupt that is permitted to interrupt the processor based on the current settin...

Page 563: ...on about the power states implemented in a particular processor This information allows the caller to decide which low power states are implemented and which ones to call based on the callers requirements 11 6 1 Power Performance States P states This section describes the power performance states hence to be referred to as P states supported by the Itanium architecture P states enable the caller t...

Page 564: ...3 and a software policy that transitions between the states depending on the current system utilization During times of high utilization the software migrates the processor towards lower numbered P states which increases processor performance and increases the dissipated power When system utilization is low the software policy migrates the processor towards higher numbered P states thereby reducin...

Page 565: ...of the dependencies is so that appropriate coordination techniques can be applied To allow the architecture definition to comprehend multi threaded multi core designs we define the concept of dependency domain and coordination mechanisms A dependency domain is comprised of logical processors that share a common set of implementation dependant domain parameters that affect power consumption and per...

Page 566: ...dware is responsible for the required coordination with other processors in the same HCDD Hardware based coordination mechanisms would be implemented to allow for changes to the logical processor s power and performance local parameters which are implementation dependant in addition to the existing domain parameters Hardware would use a combination of changes to both of these parameters to satisfy...

Page 567: ...AL_GET_PSTATE procedure for details 11 6 1 3 PAL Interfaces for P states The PAL procedure PAL_PROC_GET_FEATURES returns whether an implementation supports P states If an implementation supports P states then the PAL_PROC_SET_FEATURE procedure will allow the caller to enable or disable this feature The Itanium architecture provides three PAL procedures to enable P state functionality PAL_PSTATE_IN...

Page 568: ...s from state P0 to state P3 in partial increments the logical processor may attempt to perform changes at a later time to the local parameters and or domain parameters to transition to the originally requested P state based on P state transition requests on other logical processors Software can also approximate the behavior of a SCDD by forcing P state transitions See the description of the PAL_SE...

Page 569: ...te that this return value may not necessarily correspond to the performance index of the target P state requested by the most recent PAL_SET_PSTATE procedure call For example let s assume that the previous PAL_GET_PSTATE procedure was called at time t0 when the processor was operating in state P0 The previous PAL_SET_PSTATE procedure requested a transition from P0 to P3 The transition happened ove...

Page 570: ...ansition starts The expectation is that any errors in computing the performance_index due to non instantaneous transitions to higher and lower P states will tend to cancel out and to the extent that they do not will be insignificant 11 6 1 4 Variable P state Performance Some processors support variable P state performance which allows the frequency to vary within a given P state in order to achiev...

Page 571: ...nter and exit a HALT state between two consecutive calls to PAL_GET_PSTATE Since the logical processor is not executing any instructions while in the HALT state the performance index contribution during this period is essentially 0 and will not be accounted for in the performance_index value returned when the next PAL_GET_PSTATE procedure call is made For example let us assume that the previous PA...

Page 572: ...policies to manage support virtualization of Itanium processors The PAL extensions for virtualization consist of three main components 1 A set of procedures to support virtualization operations These procedures allow the VMM to configure logical processors for virtualization operations and suspend resume virtual processors on logical processors Details for this component are described in Section 1...

Page 573: ...or Descriptor VPD represents the abstraction of processor resources of a single virtual processor The VPD consists of per virtual processor control information together with performance critical architectural state The VPD is 64K in size and the base must be 32K aligned Table 11 16 shows the fields and layout of the VPD The values in the VPD can be stored in little or big endian format depending o...

Page 574: ...pan sion Reserved vgr 16 31 16 1024 Virtual General Registers Represent the bank 1 general registers 16 31 of the virtual processor When the virtual processor is run ning and vpsr bn is 1 the values in these entries are undefined Architectural State a_bsw vbgr 16 31 16 1152 Virtual Banked General Registers Represent the bank 0 general registers 16 31 of the virtual processor When the virtual proce...

Page 575: ...e contents in this field to be preserved when the virtual processor is running Architectural State always Reserved 76 1440 Reserved Area Reserved for future expan sion This area may also be used by PAL to hold additional machine specific processor state Reserved vcr 0 127 128 2048 Virtual Control Registers Represent the con trol registers of the virtual processor For the reserved control registers...

Page 576: ... User Mask Optimization on page 2 345 for further details up 2 ac 3 mfl 4 mfh 5 System Mask PSR 23 0 ic 13 Always i 14 a_int a_from_psr pk 15 a_from_psr rv 12 6 16 Reserved dt 17 a_from_psr dfl 18 dfh 19 sp 20 pp 21 di 22 si 23 Always PSR l PSR 31 0 db 24 a_from_psr lp 25 tb 26 rt 27 rv 31 28 Reserved PSR 63 0 cpl 33 32 No accelerations require these fields is 34 mc 35 a_from_psr it 36 id 37 No ac...

Page 577: ...5 No accelerations require these virtual control registers VCR16 VIPSR a_from_int_cr a_to_int_cr VCR17 VISR VCR18 No accelerations require this virtual control register VCR19 VIIP a_from_int_cr a_to_int_cr VCR20 VIFA Always VCR21 VITIR Always VCR22 VIIPA a_from_int_cr a_to_int_cr VCR23 VIFS a_cover a_from_int_cr a_to_int_cr VCR24 VIIM a_from_int_cr a_to_int_cr VCR25 VIHA VCR26 VIIB0 VCR27 VIIB1 VC...

Page 578: ...37 36 35 34 33 32 Reserved Table 11 20 Virtualization Disable Control vdc Fields Field Bits Description d_vmsw 0 Disable vmsw instruction If 1 disables vmsw instruction on the logical pro cessor Execution of the vmsw instruction independent of the state of PSR vm will cause a virtualization intercept d_extint 1 Disable external interrupt control register virtualization If 1 accesses reads writes o...

Page 579: ...r to the host IVT specific to the virtual processor as an incoming parameter to the PAL_VP_CREATE or PAL_VP_REGISTER procedures The per virtual processor host IVT is setup to perform long branches to the corresponding vector of the IVT specified in the incoming parameter for all IVA based d_to_pmd 4 Disable PMD write virtualization If 1 writes to the performance monitor data registers PMDs are not...

Page 580: ...the IVA control register on the logical processor is set to point to the per virtual processor host IVT virtualization intercepts will be raised at the Virtualization vector or at an optional virtualization intercept handler specified by the VMM By default virtualization intercepts are delivered to the Virtualization vector of the IVT specified by the VMM during PAL_VP_CREATE PAL_VP_REGISTER If th...

Page 581: ...s the 41 bit opcode in little endian format and the type of the instruction which caused the fault excluding the qualifying predicate qp field See Figure 11 15 PAL Virtualization Intercept Handoff Opcode GR25 on page 2 335 for details GR26 31 are available for the VMM to use FRs The contents of all floating point registers are preserved from the time of the interruption Predicates The contents of ...

Page 582: ... handler but may be modified due to normal cache activity of running the handler code TLB The TRs are unchanged from the time of the interruption Table 11 22 PAL Virtualization Intercept Handoff Cause GR24 Value Cause Description 1 toAR Due to MOV to AR instruction 2 toARimm Due to MOV to AR imm instruction 3 fromAR Due to MOV from AR instruction 4 toCR Due to MOV to CR instruction 5 fromCR Due to...

Page 583: ... optimization settings were specified in the VPD and hence local to each virtual processor The VMM can specify different local optimization settings for different virtual processors The two classes of local virtualization optimizations are Virtualization accelerations Virtualization accelerations optimize the execution of virtualized instructions by supporting fast access to the virtual instance o...

Page 584: ...optimizations for more details on these requirements 11 7 4 1 1 Virtualization Opcode Optimization Virtualization opcode optimization is always enabled Opcode information is provided to the VMM during PAL intercepts in the virtual environment In some processor implementations the opcode provided may not be guaranteed to be the opcode that triggered the intercept virtual machine monitors can determ...

Page 585: ...n is enabled This optimization is enabled by the gitc bit in the config_options parameter of PAL_VP_INIT_ENV The behavior of the guest MOV from AR ITC instruction is affected by the settings of psr ic and vpsr ic as well as shown in Table 11 25 This optimization requires no special synchronization This optimization is not supported on all processor implementations Software can call PAL_VP_ENV_INFO...

Page 586: ...n this optimization is enabled execution of rsm and ssm instructions1 with PSR vm 1 which modify only vpsr i will not intercept to the VMM and vpsr i is updated with the new value unless a fault condition is detected see Table 11 29 for details Table 11 26 Virtualization Accelerations Summary Optimization Virtualization Acceleration Control vac a a The Virtualization Acceleration Control vac field...

Page 587: ...updated with the new value without handling off to the VMM unless a fault condition is detected see Table 11 29 for details A virtual external interrupt is raised if the virtual highest priority pending interrupt vhpi is unmasked by the new vpsr i and vtpr No virtual external interrupt is raised if the virtual highest priority pending interrupt is still masked by vpsr i or vtpr When this optimizat...

Page 588: ...state is accessed as described in Table 11 16 Virtual Processor Descriptor VPD on page 2 326 Table 11 29 Interruptions when Virtual External Interrupt Optimization is Enabled Instructions Interruptions rsm ssm When the virtual external interrupt optimization is enabled execution of rsm and ssm instructions with PSR vm 1 which modify only vpsr i may raise the following faults Privileged Operation f...

Page 589: ...uctions to read Table 11 31 Interruptions when Interruption Control Register Read Optimization is Enabled Instructions Interruptions Move from interruption control registers When the interruption control register read optimization is enabled reads of interruption control registers with PSR vm 1 may raise the following faults Illegal Operation fault if vpsr ic is not zero or the target operand spec...

Page 590: ...zation The MOV from CPUID optimization is enabled by the a_from_cpuid bit in the Virtualization Acceleration Control vac field in the VPD When this optimization is enabled software running with PSR vm 1 will be able to execute MOV from CPUID instruction to read the virtual CPUID registers without any intercepts to the VMM and the corresponding VCPUID value from the VPD will be returned unless a fa...

Page 591: ...ion is enabled execution of the bsw instruction with PSR vm 1 spills the currently active banked registers and the corresponding NaT bits to the VPD and loads the other banked registers and the Table 11 36 Synchronization Requirements for MOV from CPUID Optimization VPD Resource Synchronization Required vcpuid0 4 Write Table 11 37 Interruptions when MOV from CPUID Optimization is Enabled Instructi...

Page 592: ...privilege levels with PSR vm 1 will result in virtualization intercepts When the a_select_probes bit is set to 1 the following probe instructions will raise virtualization intercepts when executed with PSR vm 1 at the most privileged level VPSR cpl 0 probe instructions in immediate form with immediate field equal to privilege level 0 All probe instructions in register form Please refer to the inst...

Page 593: ... virtual processor is running Synchronization is required when this optimization is enabled see Table 11 43 for details This optimization is not supported on all processor implementations Software can call PAL_VP_ENV_INFO to determine the availability of this feature When this optimization is enabled certain VPD state is accessed as described in Table 11 16 Virtual Processor Descriptor VPD on page...

Page 594: ...nd the interruption collection and user mask optimization a_ic_um Software can enable or disable both optimizations together or enable each optimization indepen dently Section 11 7 4 4 1 Virtual External Interrupt Optimization and Interruption Collection and User Mask Optimization on page 2 349 describes the behavior when both optimizations are enabled Table 11 44 Synchronization Requirements for ...

Page 595: ...ion intercepts Note This field cannot be enabled together with the a_int virtualization acceleration control vac described in Section 11 7 4 2 1 Virtual External Interrupt Opti mization on page 2 338 If this control is enabled together with the a_int con trol an error will be returned during PAL_VP_CREATE and PAL_VP_REGISTER See Section 11 7 4 4 Virtualization Optimization Combinations on page 2 3...

Page 596: ...s set to 1 writes to the Interval Timer Match ITM register are not virtualized and code running with PSR vm 1 can write this resource directly without any intercepts to the VMM If this control is set to 0 writes to the ITM register with PSR vm 1 result in virtualization intercepts 11 7 4 3 7 Disable PSR Interrupt bit Virtualization The PSR interrupt bit virtualization disable is controlled by the ...

Page 597: ...hen PSR vm 1 rsm and ssm instructions with a mask targeting any fields other than i ic and user mask fields will result in virtualization intercepts independent of whether these two optimizations are enabled or not 11 7 4 5 Virtualization Synchronizations When certain virtualization accelerations described in Section 11 7 4 2 Virtualization Accelerations on page 2 337 are enabled processor impleme...

Page 598: ... or firmware This severity is for logging purposes only There is no architectural damage caused by the detecting and reporting functions Corrected errors require no operating system intervention to correct the error Corrected Machine Check CMC A corrected machine check is a machine check that as been successfully corrected by hardware and or firmware Information about the cause of the error is rec...

Page 599: ...ue contained in a register at exit from the procedure is the same as the value at entry to the procedure The value may have been changed and restored before exit Processor Abstraction Layer PAL PAL is firmware that abstracts processor implementation differences and provides a consistent interface to higher level firmware and software PAL has no knowledge of platform implementation details Procedur...

Page 600: ...from the time of the hardware event that caused the entrypoint to be invoked until it exited to higher level firmware or software When applied to a procedure unchanged means that the register referenced has not been changed from procedure entry until procedure exit In all cases the value at exit is the same as the value at entry or the occurrence of the hardware event Virtual Machine Monitor VMM T...

Page 601: ...ed function on that particular processor PAL procedures are implemented by a combination of firmware code and hardware The PAL procedures are defined to be relocatable from the firmware address space Higher level firmware and software must perform this relocation during the reset flow The PAL procedures may be called both before and after this relocation occurs but performance will usually be bett...

Page 602: ...is chapter otherwise the results of such procedure calls may be unpredictable 11 10 1 PAL Procedure Summary The following tables summarize the PAL procedures by application area Included are the name of the procedure the index of the procedure the class of the procedure whether required or optional the calling convention used for the procedure static or stacked and whether the procedure can be cal...

Page 603: ...ails Table 11 50 PAL Processor Identification Features and Configuration Procedures Procedure Idx Class Conv Mode Buffer Description PAL_BRAND_INFO 274 Opt Stacked Both No Provides processor branding information PAL_BUS_GET_FEATURES 9 Req Static Phys No Return configurable processor bus interface features and their current settings PAL_BUS_SET_FEATURESa 10 Req Static Phys No Enable or disable conf...

Page 604: ...ep Injects the requested processor error or returns information on the supported injection capabilities for this particular processor implementation PAL_MC_EXPECTED 23 Req Static Phys No Set Reset Expected Machine Check Indicator PAL_MC_HW_TRACKING 51 Opt Static Both Dep Query which hardware structures are performing hardware status tracking PAL_MC_REGISTER_MEM 27 Req Static Phys No Register min s...

Page 605: ...c Phys No Return information needed to relocate PAL procedures and PAL PMI code to memory PAL_COPY_PAL 256 Req Stacked Phys No Relocate PAL procedures and PAL PMI code to memory PAL_MEMORY_BUFFERa a Calling this procedure may affect resources on multiple processors Please refer to implementation specific reference manuals for details 277 Opt Stacked Phys No Provides cacheable memory to PAL for exc...

Page 606: ...emory has been made available and for procedures which require memory pointers as arguments The stacked registers are also used for parameter passing and local variable allocation This convention conforms to the Itanium Software Conventions and Runtime Architecture Guide Thus procedures using the stacked register calling convention can be written in the C language There are two exceptions to the r...

Page 607: ...t interruptible by external interrupt or NMI since PSR i must be 0 when the PAL procedure is called PAL procedures are not interruptible by PMI events if PSR ic is 0 If PSR ic is 1 PAL procedures can be interrupted by PMI events PAL procedures can be interrupted by machine checks and initialization events Generally PAL procedures will not enable interruptions not already enabled by the caller Any ...

Page 608: ...data access and dirty bit fault disable 0 0 unchanged dd data debug fault disable 0 0 unchanged ss single step trap enable 0 0 unchanged ri restart instruction 0 0 preserved ed exception deferral 0 0 preserved bn register bank 1 1 preserved ia instruction access bit fault disable 0 0 unchanged vm processor virtualization 0 0 unchanged a PAL procedures which are called in physical mode must remain ...

Page 609: ... procedure may modify these values as necessary during execution of the procedure The caller cannot rely on these values preserved The PAL procedure may modify these values as necessary during execution of the procedure However they will be restored to their entry values prior to exit from the procedure Table 11 58 System Register Conventions Name Description Class DCR Default Control Register pre...

Page 610: ...mplementation provides a means to read TRs for PAL this should be preserved c The PAL_MC_ERROR_INJECT may modify these registers if the caller is using the triggering capability Refer to PAL_MC_ERROR_INJECT Inject Processor Error 276 on page 2 421 for more information d No PAL procedure writes to the PMD Depending on the PMC the PMD may be kept counting performance monitor events during a procedur...

Page 611: ...nventions are the same as those of the Itanium Software Conventions and Runtime Architecture Guide 11 10 2 2 5 Predicate Registers The conventions for the predicate registers follow the Itanium Software Conventions and Runtime Architecture Guide 11 10 2 2 6 Branch Registers The conventions for the branch registers follow the Itanium Software Conventions and Runtime Architecture Guide 11 10 2 2 7 A...

Page 612: ...CR IA 32 Floating point Control Registers preserved EFLAG IA 32 EFLAG register preserved CSD IA 32 Code Segment Descriptor preserved SSD IA 32 Stack Segment Descriptor preserved CFLG IA 32 Combined CR0 and CR4 Register preserved FSR IA 32 Floating point Status Register preserved FIR IA 32 Floating point Instruction Register preserved FDR IA 32 Floating point Data Register preserved CCV Compare and...

Page 613: ...10 3 PAL Procedure Specifications The following pages provide detailed interface specifications for each of the PAL procedures defined in this document Included in the specification are the input parameters the output parameters and any required behavior ...

Page 614: ...mation was not available on the current processor Argument Description index Index of PAL_BRAND_INFO within the list of PAL procedures info_request Unsigned 64 bit integer specifying the information that is being requested See Table 11 62 address Unsigned 64 bit integer specifying the address of the 128 byte block to which the processor brand string shall be written Reserved 0 Return Value Descrip...

Page 615: ...S For all values in Table 11 63 the Class field indicates whether a feature is required to be available Req or is optional Opt The Control field indicates which features are required to be controllable These features will either be controllable through this PAL call or through other hardware means like forcing bus pins to a certain value during processor reset The control field applies only when t...

Page 616: ...signal 55 Opt Req Disable Response Error Checking When 0 the processor asserts BINIT if it detects a parity error on the signals which identify the transactions to which this is a response When 1 the processor ignores parity on these signals 54 Opt Req Disable Transaction Queuing When 0 the in order transaction queue is limited only by the number of hardware entries When 1 the processor s in order...

Page 617: ... they are software controllable before calling PAL_BUS_SET_FEATURES The list of possible processor features is defined in Table 11 63 Attempting to enable or disable any feature that cannot be changed will be ignored Argument Description index Index of PAL_BUS_SET_FEATURES within the list of PAL procedures feature_select 64 bit vector denoting desired state of each feature 1 select 0 non select Re...

Page 618: ...t argument is used Refer to Section 4 4 3 Cacheability and Coherency Attribute on page 2 77 for more information on stores and their coherency requirements with local instruction caches The effects of flushing data and unified caches is broadcast throughout the coherence domain The effects of flushing instruction caches may or may not be broadcast Argument Description index Index of PAL_CACHE_FLUS...

Page 619: ...bit The modified state represents a cache line that contains modified data The clean state represents a cache line that contains no modified data int 1 bit field indicating if the processor will periodically poll for external interrupts while flushing the specified cache_type s If this bit is a 0 unmasked external interrupts will not be polled The processor will ignore all pending unmasked externa...

Page 620: ...ll data caches then the call ensures all prior data prefetches are flushed If cache_type specifies to flush all caches then the call ensures all prior instruction and data prefetches are flushed from the caches If cache_type specifies to make local instruction caches coherent with the data caches then the call will ensure all prior instruction prefetches are flushed Due to the following conditions...

Page 621: ...rnal interrupts The amount of flushing that occurs before interrupts are polled will vary across implementations PAL_CACHE_FLUSH will return the following values to indicate to the caller the status of the call status When the call returns a 1 it indicates that the call did not have any errors but is returning due to a pending unmasked external interrupt To continue flushing the caches the caller ...

Page 622: ...r the line size of the Argument Description index Index of PAL_CACHE_INFO within the list of PAL procedures cache_level Unsigned 64 bit integer specifying the level in the cache hierarchy for which information is requested This value must be between 0 and one less than the value returned in the cache_levels return value from PAL_CACHE_SUMMARY cache_type Unsigned 64 bit integer with a value of 1 fo...

Page 623: ...lemented by the processor load instruction The config_info_2 return value has the following structure cache_size Unsigned 32 bit integer denoting the size of the cache in bytes alias_boundary Unsigned 8 bit integer indicating the binary log of the minimum number of bytes which must separate aliased addresses in order to obtain the highest performance tag_ls_bit Unsigned 8 bit integer denoting the ...

Page 624: ...ementation dependent side effects will occur cache_type Unsigned 64 bit integer with a value of 1 to initialize the instruction cache 2 to initialize the data cache or 3 to initialize both All other values are reserved restrict Unsigned 64 bit integer with a value of 0 or 1 All other values are reserved If restrict is 1 and initializing the specified level and cache_type of the cache would cause s...

Page 625: ...ialized using data_value repeated until it fills the line This procedure replicates data_value to a size equal to the largest line size in the processor controlled cache hierarchy This procedure call cannot be used where coherency is required Argument Description index Index of PAL_CACHE_LINE_INIT within the list of PAL procedures address Unsigned 64 bit integer value denoting the physical address...

Page 626: ...This value must be between 0 and one less than the value returned in the cache_levels return value from PAL_CACHE_SUMMARY cache_type Unsigned 64 bit integer with a value of 1 for instruction cache and 2 for data or unified cache All other values are reserved Reserved 0 Return Value Description status Return status of the PAL_CACHE_PROT_INFO procedure config_info_1 The format of config_info_1 is sh...

Page 627: ...e tag either to the left more significant or to the right less significant than a unit of data For the values of 2 and 3 the difference of tagprot_msb and tagprot_lsb indicates the number of tag bits that are protected with the data bits The data bits are described by the data_bits field below This field is encoded as follows When obtaining cache information via this call information for the data ...

Page 628: ...oting which portion of the specified cache line to read Argument Description index Index of PAL_CACHE_READ within the list of PAL procedures line_id 8 byte formatted value describing where in the cache to read the data address 64 bit 8 byte aligned physical address from which to read the data The address must be an implemented physical address on the processor model with bit 63 set to zero Reserve...

Page 629: ...otection bits are returned Table 11 75 part Input Values and corresponding data Return Values Part Data 0 64 bit data 1 right justified tag of the specified line 2 right justified protection bits corresponding to the 64 bits of data at address If the cache uses less than 64 bits of data to generate protection data will contain more than one value For example if a cache generates parity for every 8...

Page 630: ...c_n_cache_info2 The format of these return values is shown in Figure 11 9 and Figure 11 10 Argument Description index Index of PAL_CACHE_SHARED_INFO within the list of PAL procedures cache_level Unsigned 64 bit integer specifying the level in the cache hierarchy for which information is requested This value must be between 0 and one less than the value returned in the cache_levels return value fro...

Page 631: ...turned This is the same value that is returned by the PAL_FIXED_ADDR procedure when it is called on the logical processor rv Reserved This procedure must be supported on all implementations that contain more than one logical processor on a physical processor package and returns an unimplemented procedure error code otherwise Figure 11 9 Layout of proc_n_cache_info1 Return Value 31 30 29 28 27 26 2...

Page 632: ...ex of PAL_CACHE_SUMMARY within the list of PAL procedures Reserved 0 Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_CACHE_SUMMARY procedure cache_levels Unsigned 64 bit integer denoting the number of levels of cache implemented by the processor Strictly this is the number of levels for which the cache controller is integrated into the processor the cache SRAMs may b...

Page 633: ...ache within the cache hierarchy to write data This value must be in the range from 0 up to one less than the cache_levels return value from PAL_CACHE_SUMMARY way Unsigned 8 bit integer denoting within which cache way to write data If the cache is direct mapped this argument is ignored part Unsigned 8 bit integer denoting where to write data into the cache Argument Description index Index of PAL_CA...

Page 634: ...re reserved The data argument contains the value to write into the cache Its contents are interpreted based on the part field as follows Table 11 77 part Input Values Value Description 0 data 1 tag 2 data protection 3 tag protection 4 combined data and tag protection Table 11 78 mesi Return Values Value Description 0 invalid 1 shared 2 exclusive 3 modified Table 11 79 Interpretation of data Input ...

Page 635: ...Volume 2 Part 1 Processor Abstraction Layer 2 387 PAL_CACHE_WRITE To guarantee correct behavior for this procedure it is required that there shall be no RSE activity that may cause cache side effects ...

Page 636: ...MI and Itanium architecture based operating systems All other values are reserved If the copy_type is 0 then SAL shall call PAL_COPY_PAL call subsequently to copy the PAL procedures and PAL PMI code to the allocated memory region The buffer_align return value must be a power of two between 4 KB and 1 MB Argument Description index Index of PAL_COPY_INFO within the list of PAL procedures copy_type U...

Page 637: ...data caches if target_addr has a cacheable memory attribute If a PAL procedure makes calls to internal PAL functions that execute only out of the firmware address space that portion of code will continue to execute out of the firmware address space even though the main procedure has been copied to RAM This is true only for some PAL procedures that can be called only in physical mode PAL_COPY_PAL c...

Page 638: ...ementations a hardware debugger may use two or more debug register pairs for its own use When a hardware debugger is attached PAL_DEBUG_INFO may return a value for i_regs and or d_regs less than the implemented number of debug registers When a hardware debugger is attached PAL_DEBUG_INFO may return a minimum value of 2 for d_regs and a minimum of 2 for i_regs Argument Description index Index of PA...

Page 639: ...ificance and is unique for the system interconnect to which the processor is connected If the processor is connected to multiple system interconnects the address return value must be unique among all such interconnects The maximum size of the address returned corresponds to the size of the fields id and eid in the LID register CR64 Argument Description index Index of PAL_FIXED_ADDR call within the...

Page 640: ...ter will be the frequency of this output clock in ticks per second If the processor does not generate an output clock for use by the platform this procedure will return with a status of 1 Argument Description index Index of PAL_FREQ_BASE within the list of PAL procedures Reserved 0 Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_FREQ_BASE procedure base_freq Base fre...

Page 641: ...t integer Argument Description index Index of PAL_FREQ_RATIOS within the list of PAL procedures Reserved 0 Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_FREQ_RATIOS procedure proc_ratio Ratio of the processor frequency to the input clock of the processor if the platform clock is generated externally or to the output clock to the platform if the platform clock is ge...

Page 642: ...he definition of the hardware sharing policies that can be returned in the cur_policy value are defined in Table 11 80 Argument Description index Index of PAL_GET_HW_POLICY within the list of PAL procedures proc_num Unsigned 64 bit integer that specifies for which logical processor information is being requested This input argument must be zero for the first call to this procedure and can be a max...

Page 643: ...t core This procedure is only supported on processors that have multiple logical processors sharing hardware resources that can be configured On all other processor implementations this procedure will return the Unimplemented procedure return status Table 11 80 Hardware policies returned in cur_policy Value Name Description 0 Performance The processor has its hardware resources configured to achie...

Page 644: ...6 indicates whether the processor supports variable P state performance The type argument allows the caller to select the performance_index value that will be returned See Table 11 81 below for details Argument Description index Index of PAL_GET_PSTATE within the list of PAL procedures type Type of performance_index value to be returned by this procedure Reserved 0 Reserved 0 Return Value Descript...

Page 645: ... returned will correspond to the target P state requested by software For SCDD software coordinated dependency domain logical processors this is the P state requested by the most recent PAL_SET_PSTATE procedure call made by any logical processor in the domain For HCDD hardware coordinated dependency domain or HIDD hardware independent dependency domain logical processors this is simply the P state...

Page 646: ...h type 1 and the current call If the processor had transitioned to a HALT state see Section 11 6 1 Power Performance States P states on page 2 315 in between successive invocations to the PAL_GET_PSTATE procedure the performance index computation returned will not take into account the performance of the processor during the time spent in HALT state see Section 11 6 1 5 Interaction of P states wit...

Page 647: ...y the io_detail_ptr is shown Table 11 82 I O size and type information has the format shown in Figure 11 13 Argument Description index Index of PAL_HALT within the list of PAL procedures halt_state Unsigned 64 bit integer denoting low power state requested io_detail_ptr 8 byte aligned physical address pointer to information on the type of I O load store requested Reserved 0 Return Value Descriptio...

Page 648: ...ormal state An unmasked external interrupt is defined to be an interrupt that is permitted to interrupt the processor based on the current setting of the TPR mic and TPR mmi fields in the TPR control register PAL sets the value in the load_return return parameter if the io_type is 1 otherwise this value is set to zero If the processor transitions to normal state via an unmasked external interrupt ...

Page 649: ...te power_consumption 28 bit unsigned integer denoting the typical power consumption of the state measured in milliwatts im 1 bit field denoting whether this low power state is implemented or not A value of 1 indicates that the low power state is implemented a value of 0 indicates that it is not implemented If this value is 0 then all other fields are invalid co 1 bit field denoting if the low powe...

Page 650: ...e the minimum number of processor cycles that will be required to transition the states The maximum or average cannot be determined by PAL due to its dependency on outstanding bus transactions For more information on power management please refer to Section 11 6 Power Management on page 2 313 ...

Page 651: ...based on the current setting of the TPR mic and TPR mmi fields in the TPR control register If the processor transitions to normal state via an unmasked external interrupt execution resumes to the caller If the processor transitions to normal state via a PMI execution resumes to the caller if PMIs are masked otherwise execution will resume to the PMI handler If the processor transitions to the norm...

Page 652: ...ocessor package do not share core pipeline resources but may share caches and bus interfaces A core may support multiple threads of execution The log_overview return value provides an overview of the logical processors on the physical processor package this procedure call was made on The format of the log_overview return argument is shown in Figure 11 15 Argument Description index Index of PAL_LOG...

Page 653: ...ay be called from any logical processor on the physical processor package to gather information about all the logical processors It may also be called to get information about the logical processor on which the procedure is running Information about the logical processors is in the return values proc_n_log_info1 and proc_n_log_info2 The format of these return values is shown in Figure 11 16 and Fi...

Page 654: ...n the logical processor rv Reserved This procedure must be supported on all implementations that contain more than one logical processor on a physical processor package and returns an unimplemented procedure error code otherwise Figure 11 17 Layout of proc_n_log_info2 Return Value 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 rv la 63 62 61 60 59 58 57 56 55...

Page 655: ...re does not clear any pending machine checks The pending return parameter returns a value of 0 if no subsequent event is pending a 1 in bit position 0 if a machine check is pending and or a 1 in bit position 1 if an INIT is pending All other values are reserved Argument Description index Index of PAL_MC_CLEAR_LOG within the list of PAL procedures Reserved 0 Reserved 0 Reserved 0 Return Value Descr...

Page 656: ...lidate lines and or write them back to main memory if this is required to make the instruction caches coherent with the data caches Loads get their data returned Stores either update the cache or issue transactions to the system fabric Prefetches are either completed or cancelled As a result of completing these outstanding transactions Machine Check Aborts MCAs may be taken This call is typically ...

Page 657: ... called and may not always return the maximum size for every call The amount of data returned is provided in the processor state parameter field dsize Please see Table 11 7 for more information on the processor state parameter The caller of the procedure needs to ensure that the buffer is large enough to handle the max_size that is returned by this procedure The contents of the processor dynamic s...

Page 658: ... of PAL_MC_ERROR_INFO within the list of PAL procedures info_index Unsigned 64 bit integer identifying the error information that is being requested See Table 11 86 level_index 8 byte formatted value identifying the structure to return error information on See Figure 11 19 err_type_index Unsigned 64 bit integer denoting the type of error information that is being requested for the structure identi...

Page 659: ...cessor Error Map This info_index value will return the processor error map This return value specifies the processor core identification the processor thread identification and a bit map indicating which structure s of the processor generated the machine check This bit map has the same layout as the level_index A one in the structure bit map indicates that there is error information available for ...

Page 660: ...Error information is available for 1st 2nd 3rd and 4th level instruction TLB edt 23 20 Error information is available for 1st 2nd 3rd and 4th level data unified TLB ebh 27 24 Error information is available for the 1st 2nd 3rd and 4th level processor bus hierarchy erf 31 28 Error information is available on register file structures ems 47 32 Error information is available on micro architectural str...

Page 661: ...le for logging If there is it makes sub sequent calls with err_type_index equal to 1 2 3 and or 4 depending on which valid bits are set Additionally if the inc_err_type return value was set to one the caller knows that this structure logged multiple errors To get the second error of the structure it sets the err_type_index 8 and the structure specific information is returned in error_info The call...

Page 662: ... rsvd level op 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 pi rp rq tv mcc pv pl iv is rsvd index Table 11 90 cache_check Fields Field Bits Description op 3 0 Type of cache operation that caused the machine check 0 unknown or internal error 1 load 2 store 3 instruction fetch or instruction prefetch 4 data prefetch both hardware and software 5 sno...

Page 663: ... of the instruction bundle responsible for generating the machine check pv 58 The pl field of the cache_check parameter is valid mcc 59 Machine check corrected This bit is set to one to indicate that the machine check has been corrected tv 60 Target address is valid This bit is set to one to indicate that a valid target address has been logged rq 61 Requester identifier This bit is set to one to i...

Page 664: ...ry status the hardware damage bit of the PSP see Figure 11 11 Processor State Parameter on page 2 299 will be set as well 11 Reserved reserved 53 32 Reserved is 54 Instruction set If this value is set to zero the instruction that generated the machine check was an Intel Itanium instruction If this bit is set to one the instruction that generated the machine check was IA 32 instruction iv 55 The is...

Page 665: ...implicit or explicit write back operation 6 snoop probe 7 incoming or outgoing ptc g 8 write coalescing transactions 9 I O space read 10 I O space write 11 inter processor interrupt message IPI 12 interrupt acknowledge or external task priority cycle All other values are reserved sev 20 16 Bus error severity The encodings of error severity are platform specific hier 22 21 This value indicates whic...

Page 666: ...54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 pi rsvd mcc pv pl iv is reserved Table 11 93 reg_file_check Fields Field Bits Description id 3 0 Register file identifier 0 unknown unclassified 1 General register bank1 2 General register bank 0 3 Floating point register 4 Branch register 5 Predicate register 6 Application register 7 Control register 8 Region register 9 Protecti...

Page 667: ...described in Figure 11 24 and Table 11 94 pl 57 56 Privilege level The privilege level of the instruction bundle responsible for generating the machine check pv 58 The pl field of the reg_file_check parameter is valid mcc 59 Machine check corrected This bit is set to one to indicate that the machine check has been corrected reserved 62 60 Reserved pi 63 Precise instruction pointer This bit is set ...

Page 668: ... was located wv 22 The way field in the uarch_check parameter is valid xv 23 The index field in the uarch_check parameter is valid reserved 31 24 Reserved index 39 32 Index or set of the micro architectural structure where the error was located reserved 53 40 Reserved is 54 Instruction set If this value is set to zero the instruction that generated the machine check was an Intel Itanium instructio...

Page 669: ...ures err_type_info Unsigned 64 bit integer specifying the first level error information which identifies the error structure and corresponding structure hierarchy and the error severity err_struct_info Unsigned 64 bit integer identifying the optional structure specific information that provides the second level details for the requested error err_data_buffer Unsigned 64 bit integer specifying the ...

Page 670: ...o be specified 2 Cancel outstanding trigger All other fields in err_type_info err_struct_info and err_data_buffer are ignored All other values are reserved err_inj 5 3 Indicates the mode of error injection 0 Error inject only no error consumption 1 Error inject and consume All other values are reserved err_sev 7 6 Indicates the severity desired for error injection query Definitions of the differen...

Page 671: ...ing trigger programmed at a time Subsequent procedure calls that use the trigger functionality will overwrite the previous trigger parameters Once a trigger is programmed it remains active until either the trigger operation is encountered or software cancels the outstanding trigger via this call Software can cancel outstanding triggers by specifying Cancel outstanding trigger via the mode bit in e...

Page 672: ...that IBR0 1 are being used by the procedure for trigger functionality ibr2 1 When 1 indicates that IBR2 3 are being used by the procedure for trigger functionality ibr4 2 When 1 indicates that IBR4 5 are being used by the procedure for trigger functionality ibr6 3 When 1 indicates that IBR6 7 are being used by the procedure for trigger functionality dbr0 4 When 1 indicates that DBR0 1 are being us...

Page 673: ...igger condition The address corresponding to the trigger is specified in the trigger_addr field of the buffer pointed to by err_data_buffer 0 Instruction memory access The trigger match conditions for this operation type are similar to the IBR address breakpoint match conditions as outlined in Section 7 1 2 Debug Address Breakpoint Match Conditions on page 2 154 1 Data memory access The trigger ma...

Page 674: ...9 Buffer pointed to by err_data_buffer Cache Field Bits Description trigger_addr 63 0 64 bit virtual address to be used by the trigger in the err_struct_info input argument This field is ignored if tiv in err_struct_info is 0 The field is defined similar to the addr field in the debug breakpoint registers as specified in Table 7 1 Debug Breakpoint Register Fields DBR IBR on page 2 153 inj_addr 127...

Page 675: ...be injected This field is valid only when tc_tr is 2 else it is ignored Reserved 31 13 Reserved tiv 32 When 1 indicates that the trigger information fields trigger trigger_pl are valid and should be used for error injection When 0 the trigger information fields are ignored and error injection is performed immediately trigger 36 33 Indicates the operation type to be used as the error trigger condit...

Page 676: ...o by err_data_buffer TLB 63 0 trigger_addr 127 115 64 Reserved vpn 191 152151 133132 128 Reserved rid Table 11 102 Buffer pointed to by err_data_buffer TLB Field Bits Description trigger_addr 63 0 64 bit virtual address to be used by the trigger in the err_struct_info input argument The field is defined similar to the addr field in debug breakpoint registers as specified in Table 7 1 Debug Breakpo...

Page 677: ...egfile_id 128 254 Reserved for future use 255 Any register number When selected the actual register number used for error injection is determined by PAL Reserved 31 13 Reserved tiv 32 When 1 indicates that the trigger information fields trigger trigger_pl are valid and should be used for error injection When 0 the trigger information fields are ignored and error injection is performed immediately ...

Page 678: ...ion for Control register is supported rr 7 Error injection for Region register is supported pkr 8 Error injection for Protection key register is supported dbr 9 Error injection for Data breakpoint register is supported ibr 10 Error injection for Instruction breakpoint register is supported pmc 11 Error injection for Performance monitor control register is supported pmd 12 Error injection for Perfo...

Page 679: ...49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 Reserved Table 11 106 err_struct_info Bus Processor Interconnect Field Bits Description Reserved 63 0 Reserved Figure 11 37 capabilities vector for Bus Processor Interconnect 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Reserved 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 3...

Page 680: ...cedure hw_track 64 bit vector denoting which hardware structures are providing hardware status tracking See Figure 11 38 Reserved 0 Reserved 0 Status Value Description 0 Call completed without error 1 Unimplemented procedure 2 Invalid argument 3 Call completed with error Figure 11 38 Layout of hw_track Return Value 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1...

Page 681: ..._MC_HW_TRACKING The convention for the levels in the hw_track field is such that the least significant bit in the field represents the lowest level of the structures hierarchy For example bit 0 of the ICT field represents the first level instruction cache ...

Page 682: ... immediately prior to issuing an instruction which might generated an expected machine check It should then immediately reset the bit to the not expected state after checking the results of the operation The previous return parameter indicates the previous state of the hardware resource to inform PALE_CHECK of an expected machine check A value of 0 indicates that a machine check was not expected A...

Page 683: ...n state area is not registered The procedure will also return the required size of the min state save area in the req_size return value The layout of the min state save area is defined in Section 11 3 2 4 Processor Min state Save Area Layout on page 2 302 The address passed has a minimum alignment requirement of 512 bytes Argument Description index Index of PAL_MC_REGISTER_MEM within the list of P...

Page 684: ...sor behavior is undefined If the caller is resuming to the same context the new_context argument must be set to 0 and the save_ptr argument has to point to a copy of the min state save area written by PAL when the event occurred If the caller is resuming to a new context the new_context argument must be set to 1 and the save_ptr argument must point to a new min state save area set up by the caller...

Page 685: ...e processor The bit field position corresponds to the numeric memory attribute encoding defined in Section 4 4 Memory Attributes on page 2 75 Argument Description index Index of PAL_MEM_ATTRIB within the list of PAL procedures Reserved 0 Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_MEM_ATTRIB procedure attrib 8 bit vector of memory attributes implemented by proces...

Page 686: ...l return an invalid argument and provide the minimum size required in the min_size return argument The control_word input argument specifies if this procedure is being used to register the memory buffer or if it is being used to relocate the memory buffer The format of the control_word is shown in Table 11 109 Argument Description index Index of PAL_MEMORY_BUFFER within the list of PAL procedures ...

Page 687: ...ftware is still required to make this call on all logical processors with the same input arguments when relocating the buffer Once the call has been made on all logical processors in the physical package the old memory can be reclaimed Software can choose if it wants this procedure to periodically poll for interrupts during the execution of the procedure If an interrupt is seen the procedure will ...

Page 688: ... an 8 byte aligned 128 byte memory buffer Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_PERF_MON_INFO procedure pm_info Information about the performance monitors implemented Reserved 0 Reserved 0 Status Value Description 0 Call completed without error 2 Invalid argument 3 Call completed with error Figure 11 40 Layout of pm_info Return Value 31 30 29 28 27 26 25 24...

Page 689: ...bstraction Layer 2 441 PAL_PERF_MON_INFO 0x40 256 bit mask defining which registers can count cycles 0x60 256 bit mask defining which registers can count retired bundles Table 11 111 pm_buffer Layout Continued Offset Description ...

Page 690: ...procedure will return an error status The address specified must also not overlay any firmware addresses in the 16 MB region immediately below the 4GB physical address boundary The Interrupt and I O Block pointers should be initialized by firmware before any Inter Processor Interrupt messages or I O Port accesses Otherwise the default block pointer values will be used Some processor implementation...

Page 691: ...llow initialization of the PMI entrypoint only once Under those situations this procedure may be called only once after a boot to initialize the PMI entrypoint register Subsequent calls will return a status of 3 This call must be made before PMI is enabled by SAL Argument Description index Index of PAL_PMI_ENTRYPOINT within the list of PAL procedures SAL_PMI_entry 256 byte aligned physical address...

Page 692: ...fc instructions from both the local processor and from remote processors This procedure when used for transitioning physical memory attributes will ensure that all prefetches that were initiated by the processor to the cacheable limited speculative memory prior to the call will either not be cached have been aborted or are visible to subsequent fc instructions from both the local processor and fro...

Page 693: ...quire this procedure call to be made on remote processors in the sequences this procedure will return a 1 upon successful completion A return value of 0 upon successful completion of this procedure is an indication to software that the processor implementation requires that this call be performed on all processors in the coherence domain to make prefetches visible in the sequences These return cod...

Page 694: ...ific feature sets are currently supported on a particular processor For each valid feature_set this procedure returns which processor features are implemented in the features_avail return argument the current feature setting is in feature_status return argument and the feature controllability in the feature_control return argument Only the processor features which are implemented and controllable ...

Page 695: ...he When 0 the processor performs cast outs on cacheable pages and issues and responds to coherency requests normally When 1 the processor performs a memory access for each reference regardless of cache contents and issues no coherence requests and responds as if the line were not present Cache contents cannot be relied upon when the cache is disabled WARNING Semaphore instructions may not be atomi...

Page 696: ...on Cache Prefetch When 0 the processor may prefetch into the caches any instruction which has not been executed but whose execution is likely When 1 instructions may not be fetched until needed or hinted for execution Prefetch for a hinted branch is allowed even when dynamic instruction cache prefetch is disabled 45 Opt Opt May Disable Dynamic Data Cache Prefetch When 0 the processor may prefetch ...

Page 697: ...red 37 Opt No RO INIT PMI and LINT pins present Denotes the absence of INIT PMI LINT0 and LINT1 pins on the processor When 1 the pins are absent When 0 the pins are present This feature may only be interrogated by PAL_PROC_GET_FEATURES It may not be enabled or disabled by PAL_PROC_SET_FEATURES The corresponding argument is ignored 36 Opt No RO Unimplemented instruction address reported as fault De...

Page 698: ...es which cannot be set will be ignored Argument Description index Index of PAL_PROC_SET_FEATURES within the list of PAL procedures feature_select 64 bit vector denoting desired state of each feature 1 select 0 non select feature_set Feature set to apply changes to See PAL_PROC_GET_FEATURES for more information on feature sets Reserved 0 Return Value Description status Return status of the PAL_PROC...

Page 699: ...e are placed in this P state measured in milliwatts perf_index is a 7 bit field denoting the performance index of this P state relative to the highest available P state P0 This field is enumerated relative to the index of the highest performing P state A value of 100 represents the minimum processor Argument Description index Index of PAL_PSTATE_INFO within the list of PAL procedures pstate_buffer...

Page 700: ...42 for dd_info layout ddt Dependency Domain Type is a 3 bit unsigned integer denoting the type of dependency domains that exist on the processor package The possible values are shown in Table 11 113 See Section 11 6 1 Power Performance States P states on page 2 315 for details of the values in this field ddid Dependency Domain Identifier is a 6 bit unsigned integer denoting this logical processor ...

Page 701: ...ed 0 Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_PTCE_INFO procedure tc_base Unsigned 64 bit integer denoting the beginning address to be used by the first PTCE instruction in the purge loop tc_counts Two unsigned 32 bit integers denoting the loop counts of the outer loop 1 and inner loop 2 purge loops count1 loop 1 is contained in bits 63 32 of the parameter and...

Page 702: ...mation for registers 64 127 Bit 0 is register 64 bit 63 is register 127 Reserved 0 Status Value Description 0 Call completed without error 2 Invalid argument 3 Call completed with error Table 11 114 info_request Return Value info_request Meaning of Return Bit Vector 0 A 0 bit in the return vector indicates that the corresponding Application Register is not implemented a 1 bit in the return vector ...

Page 703: ...he mode is not implemented The bit field encodings are Lazy is the default RSE mode and must be implemented Hardware is not required to implement any of the other modes Argument Description index Index of PAL_RSE_INFO within the list of PAL procedures Reserved 0 Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_RSE_INFO procedure phys_stacked Number of physical stacked...

Page 704: ...lt the hardware always sets the processor in the performance policy at reset Argument Description index Index of PAL_SET_HW_POLICY within the list of PAL procedures policy Unsigned 64 bit integer specifying the hardware resource sharing policy the caller is setting Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_SET_HW_POLICY procedure Reserved 0 Reserved 0 Reserved ...

Page 705: ...tus is explicitly released by that logical processor This procedure is only supported on processors that have multiple logical processors sharing hardware resources that can be configured On all other processor implementations this procedure will return the Unimplemented procedure return status 2 High priority The processor configures hardware resources to provide the logical processor this proced...

Page 706: ...ed on the nature of the dependencies that exist between the logical processors in the domain In such circumstances the procedure may initiate no transition partial transition or full transition to the requested P state The force_pstate argument may be used for a HCDD when it is necessary to get a deterministic response for the P state transition at the expense of compromising the power performance...

Page 707: ...argument of zero reinstates hardware coordination The force_pstate argument is ignored on SCDD and HIDD logical processors Calling this procedure on some processor implementations may affect P states of other processors in the same dependency domain Please refer to Section 11 6 1 Power Performance States P states on page 2 315 and implementation specific reference manuals for details ...

Page 708: ...latform before entering the shutdown state On receipt of a reset event the logical processor will reset itself and start execution at the PAL reset address All other events will are ignored by the logical processor when in shutdown state Argument Description index Index of PAL_SHUTDOWN within the list of PAL procedures notify_platform 8 byte aligned physical address pointer providing details on ho...

Page 709: ... run during PALE_RESET and do not require external memory to properly execute When information is requested about phase one of the processor self test a memory buffer and alignment argument will be returned as well since these tests may need to save and restore processor state to this memory buffer if executed from the PAL_TEST_PROC procedure Argument Description index Index of PAL_TEST_INFO withi...

Page 710: ...must be greater than or equal in size to the bytes_needed return value from PAL_TEST_INFO otherwise this procedure will return with an invalid argument return value Argument Description index Index of PAL_TEST_PROC within the list of PAL procedures test_address 64 bit physical address of main memory area to be used by processor self test The memory region passed must be cacheable bit 63 must be ze...

Page 711: ...overage and runtime of the processor self tests specified by the test_phase input argument Information about the self test control word can be found in Section 11 2 3 PAL Self test Control Word on page 2 295 and information on if this feature is implemented and the number of bits supported can be obtained by the PAL_TEST_INFO procedure call If this feature is implemented by the processor the calle...

Page 712: ...king this call PAL_TEST_PROC requires that the RSE is set up properly to handle spills and fills to a valid memory location if the contents of the register stack are needed PAL_TEST_PROC requires that the memory buffer passed to it is not shared with other processors running this procedure in the system at the same time PAL_TEST_PROC will use this memory region in a non coherent manner PAL_TEST_PR...

Page 713: ...ersion is not returned by this procedure in the split PAL_A model The version numbers selected for the PAL_A and PAL_B firmware is specific to the PAL_vendor The version numbers selected will always have the property that later versions of firmware will have a higher number than earlier versions of firmware Argument Description index Index of PAL_VERSION within the list of PAL procedures Reserved ...

Page 714: ...For a direct mapped TC num_ways 1 and num_sets num_entries For a fully associative TC num_sets 1 and num_ways num_entries Argument Description index Index of PAL_VM_INFO within the list of PAL procedures tc_level Unsigned 64 bit integer specifying the level in the TLB hierarchy for which information is required This value must be between 0 and one less than the value returned in the vm_info_1 num_...

Page 715: ...age sizes that are supported The insertable_pages returns the page sizes that are supported for TLB insertions and region registers The purge_pages returns the page sizes that are supported for the TLB purge operations Argument Description index Index of PAL_VM_PAGE_SIZE within the list of PAL procedures Reserved 0 Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_VM_P...

Page 716: ...ies 1 max_itr_entry Unsigned 8 bit integer denoting the maximum instruction translation register index number of itr entries 1 num_unique_tcs Unsigned 8 bit integer denoting the number of unique TCs implemented This is a maximum of 2 num_tc_levels num_tc_levels Unsigned 8 bit integer denoting the number of TC levels The vm_info_2 return is an 8 byte quantity in the following format Argument Descri...

Page 717: ...oting the maximum number of concurrent outstanding TLB purges allowed by the processor A value of 0 indicates one outstanding purge allowed A value of 216 1 indicates no limit on outstanding purges All other values indicate the actual number of concurrent outstanding purges allowed Figure 11 49 Layout of vm_info_2 Return Value 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7...

Page 718: ...rty bit is valid mv denotes that the memory attributes are valid A value of 1 denotes a valid field A value of 0 denotes an invalid field Any value returned in an invalid field must be ignored The tr_buffer parameter should be aligned on an 8 byte boundary Note This procedure may have the side effect of flushing all the translation cache entries depending on the implementation Argument Description...

Page 719: ...nal virtualization intercept handler If a non zero value is specified all virtualization intercepts are delivered to this handler If a zero value is specified all virtualization intercepts are delivered to the Virtualization vector in the host IVT If the VMM relocates the IVT specified by the host_iva parameter and or the virtualization intercept handler specified by the opt_handler parameter afte...

Page 720: ...er PAL_VP_CREATE This procedure returns unimplemented procedure when virtual machine features are disabled See Section 3 4 Processor Virtualization on page 2 44 and PAL_PROC_GET_FEATURES Get Processor Dependent Features 17 on page 2 446 for details ...

Page 721: ...hin the list of PAL procedures Reserved 0 Reserved 0 Reserved 0 Return Value Description status Return status of the PAL_VP_ENV_INFO procedure buffer_size Unsigned integer denoting the number of bytes required by the PAL virtual environment buffer during PAL_VP_INIT_ENV vp_env_info 64 bit vector of virtual environment information See Table 11 118 for details Reserved 0 Status Value Description 0 C...

Page 722: ...f 0 this optimization is not supported See Section 11 7 4 2 9 Test Feature Optimization on page 2 345 for details ic_um 34 If 1 guest interruption collection and user mask optimization is supported If 0 this optimization is not supported See Section 11 7 4 2 10 Interruption Collection and User Mask Optimization on page 2 345 for details Reserved 63 35 Reserved a Architecturally an implementation w...

Page 723: ...vironment to complete the procedure before freeing the memory resource allocated to the virtual environment This procedure returns unimplemented procedure when virtual machine features are disabled See Section 3 4 Processor Virtualization on page 2 44 and PAL_PROC_GET_FEATURES Get Processor Dependent Features 17 on page 2 446 for details Argument Description index Index of PAL_VP_EXIT_ENV within t...

Page 724: ... in the memory buffer pointed to by vp_buffer as needed Details for a given implementation specific feature_set of whether information is returned in the buffer the size of the buffer and the representation of this information in the buffer and in vp_info are described in VMM specific documentation Architected feature_set 0 vmm_id 0 index 0 is defined and required to be implemented if this procedu...

Page 725: ...ume 2 Part 1 Processor Abstraction Layer 2 477 PAL_VP_INFO get the vmm_id although vmm_id is also returned for any other implemented feature sets as well For feature_set 0 the vp_buffer argument is ignored ...

Page 726: ...ation service calls and the VMM must be prepared to handle DTLB faults during any PAL virtualization procedure calls Table 11 119 shows the layout of the config_options parameter The config_options parameter configures the global configuration options and global virtualization optimizations for all the logical processors in the virtual environment All logical Argument Description index Index of PA...

Page 727: ...3 2 Processor Status Register Fields on page 2 24 The VMM must have DCR pp equal to 0 when the fr_pmc option is 1 whenever the IVA control register on the logical processor is set to point to the per virtual processor host IVT See Section 11 7 2 Interruption Handling in a Virtual Environment on page 2 331 and Table 11 21 IVA Settings after PAL Virtualiza tion related Procedures and Services on pag...

Page 728: ...zations opcode 8 This bit must be set to 1 opcode information will be provided to the VMM during PAL intercepts within the virtual environment This opcode may or may not be guaranteed to be the opcode that triggered the intercept See Table 11 118 vp_env_info Virtual Environment Infor mation Parameter on page 2 473 for details This procedure returns an error if this bit is not set to 1 cause 9 If 1...

Page 729: ...or each virtual processor The opt_handler specifies an optional virtualization intercept handler If a non zero value is specified all virtualization intercepts are delivered to this handler If a zero value is specified all virtualization intercepts are delivered to the Virtualization vector in the host IVT Upon completion of this procedure the VMM must not relocate the IVT specified by the host_iv...

Page 730: ...sor Specify a different optional virtualization intercept handler for the virtual processor This procedure returns unimplemented procedure when virtual machine features are disabled See Section 3 4 Processor Virtualization on page 2 44 and PAL_PROC_GET_FEATURES Get Processor Dependent Features 17 on page 2 446 for details ...

Page 731: ...rol back to the VMM This procedure performs an implicit PAL_VPS_SYNC_WRITE there is no need for the VMM to invoke PAL_VPS_SYNC_WRITE unless the VPD values are modified before resuming the virtual processor After the procedure the caller is responsible for restoring all of the architectural state before resuming to the new virtual processor through PAL_VPS_RESUME_NORMAL or PAL_VPS_RESUME_HANDLER Up...

Page 732: ...urces before this procedure Upon completion of this procedure the IVA based interruptions will continue to be delivered to the host IVT associated with this virtual processor After this procedure the VMM can setup the IVA control register to use a different host IVT This procedure returns unimplemented procedure when virtual machine features are disabled See Section 3 4 Processor Virtualization on...

Page 733: ...ion of the virtual machine are freed Upon successful execution of PAL_VP_TERMINATE procedure and if the iva parameter is non zero the IVA control register will contain the value from the iva parameter This procedure returns unimplemented procedure when virtual machine features are disabled See Section 3 4 Processor Virtualization on page 2 44 and PAL_PROC_GET_FEATURES Get Processor Dependent Featu...

Page 734: ...is section describes the required parameters applicable to all PAL Virtualization Services Additional parameters are listed in the description section of specific PAL Virtualization Services Architectural state not listed in this section is managed by the VMM and can contain both VMM and or virtual processor state The architectural state not listed is unchanged by PAL virtualization services The s...

Page 735: ...ith PSR ic equal to 0 c i interrupt enable 0 pk protection key validation enable dt data address translation enable 1 dfl disabled FP register f2 to f31 dfh disabled FP register f32 to f127 sp secure performance monitors pp privileged performance monitor enable di disable ISA transition si secure interval timer db debug breakpoint fault enable 0 lp lower privilege transfer trap enable tb taken bra...

Page 736: ...ations for each of the PAL Virtualization Services c Specific PAL services can be invoked with PSR ic equal to 1 or 0 See the description of specific PAL services for details d Most PAL services can be invoked with PSR bn equal to 1 or 0 e Specific PAL services must be invoked with PSR bn equal to 0 See the description of specific PAL services for details ...

Page 737: ...ent on the state of vpsr ic Argument Description GR24 VBR0 GR25 64 bit host virtual pointer to the Virtual Processor Descriptor VPD GR26 Reserved GR27 Reserved GR28 Reserved GR29 Reserved GR30 Reserved GR31 Reserved Table 11 122 Virtual Processor Settings in Architectural Resources for PAL_VPS_RESUME_NORMAL and PAL_VPS_RESUME_HANDLER Resource Description Bank 1 GRs Contains state of bank 0 1 GRs o...

Page 738: ... Otherwise the performance monitor configuration registers are virtualized by the VMM and contain VMM state Performance Monitor Data Registers Contain the state of the virtual processor a Interval Timer Offset register is not supported on all processor implementations See Section 3 3 4 4 Interval Timer Offset ITO CR4 on page 2 34 for details Table 11 123 Processor Status Register Settings for Virt...

Page 739: ...NV was 1 Resume the virtual processor cpl 33 32 Contains the cpl field of the virtual processor is 34 VMM specific mc 35 VMM specific it 36 Must be 1 id 37 VMM specific da 38 VMM specific dd 39 VMM specific ss 40 VMM specific ri 42 41 Contains the ri field of the virtual processor ed 43 Contains the ed bit of the virtual processor bn 44 Must be 1 ia 45 VMM specific vm 46 Must be 1 rv 63 47 Reserve...

Page 740: ...for setting up all the required virtual processor state in the architectural registers as well as in the VPD prior to invoking this service See Table 11 122 Virtual Processor Settings in Architectural Resources for PAL_VPS_RESUME_NORMAL and PAL_VPS_RESUME_HANDLER on page 2 489 for details PAL_VPS_RESUME_HANDLER must be called with PSR bn equal to 0 PAL_VPS_RESUME_HANDLER performs the following act...

Page 741: ... accelerations that are disabled the corresponding resources in the VPD are unchanged The synchronization requirements of the related resources for each acceleration are described in the corresponding sections for each acceleration in Section 11 7 4 2 Virtualization Accelerations on page 2 337 PAL_VPS_SYNC_READ performs the following actions Copy implementation specific control resources of the en...

Page 742: ...re disabled the corresponding resources in the VPD are ignored The synchronization requirements of the related resources for each acceleration are described in the corresponding sections for each acceleration in Section 11 7 4 2 Virtualization Accelerations on page 2 337 PAL_VPS_SYNC_WRITE performs the following actions Copy values of the enabled accelerations in the VPD into implementation specif...

Page 743: ...ual return address GR25 64 bit host virtual pointer to the Virtual Processor Descriptor VPD GR26 Reserved GR27 Reserved GR28 Reserved GR29 Reserved GR30 Reserved GR31 Reserved Return Value Description GR24 Scratch GR25 Scratch GR26 Scratch GR27 Scratch GR28 Scratch GR29 Scratch GR30 Scratch GR31 Scratch Table 11 124 vhpi Virtual Highest Priority Pending Interrupt Value Description 0 Nothing pendin...

Page 744: ...PS_SET_PENDING_INTERRUPT PAL_VPS_SET_PENDING_INTERRUPT performs the following actions Copy the virtual highest priority pending interrupt from the VPD into implementation specific resources Return to VMM by an indirect branch specified in the GR24 parameter ...

Page 745: ...on page 2 35 the vf field is ignored by the service PAL_VPS_THASH returns the same long format VHPT entry address given the same input arguments across different implementations The long format VHPT entry address returned may not be the same as the long format VHPT entry address generated by the thash instruction of the processor PAL_VPS_THASH can be called with PSR ic equal to 1 or 0 Argument Des...

Page 746: ...nored by the service PAL_VPS_TTAG returns the same tag value given the same input arguments across different implementations The tag value returned may not be the same as the tag value generated by the ttag instruction of the processor PAL_VPS_TTAG can be called with PSR ic equal to 1 or 0 Argument Description GR24 64 bit host virtual return address GR25 64 bit virtual address used to compute the ...

Page 747: ...om the VPD and returns control back to the VMM If GR26 is zero this service performs an implicit PAL_VPS_SYNC_WRITE there is no need for the VMM to invoke PAL_VPS_SYNC_WRITE to synchronize the implementation specific control resources before this service If GR26 is one 0x1 no implicit synchronization will be performed by this service Upon completion of this service the IVA based interruptions will...

Page 748: ...licit synchronization will be performed by this service Upon completion of this service the IVA based interruptions will continue to be delivered to the host IVT associated with this virtual processor After this service the VMM can setup the IVA control register to use a different host IVT This service does not save any PAL procedure implementation specific state1 The caller of this service is res...

Page 749: ...2 501 Intel Itanium Architecture Software Developer s Manual Rev 2 3 Part II System Programmer s Guide ...

Page 750: ...2 502 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 751: ...ure neutral i e no assumptions are made about platform architecture capabilities such as busses chipsets or I O devices 1 1 Overview of the System Programmer s Guide The Itanium architecture provides numerous performance enhancing features of interest to the system programmer Many of these instruction set features focus on reducing overhead in common situations The chapters outlined below discuss ...

Page 752: ... chapter is useful for operating system developers Chapter 8 Floating point System Software discusses how processors based on the Itanium architecture handle floating point numeric exceptions and how the Itanium architecture based software stack provides complete IEEE 754 compliance This includes a discussion of the floating point software assist firmware the FP SWA EFI driver This chapter also de...

Page 753: ...tion error logging as well as fault containment capabilities This chapter is of interest to platform firmware and operating system developers 1 2 Related Documents The following documents are referred to fairly often in this document For more details on software conventions and platform firmware please consult these manuals available at http developer intel com SWC Intel Itanium Software Conventio...

Page 754: ...2 506 Volume 2 Part 2 About the System Programmer s Guide ...

Page 755: ...ordering relationships between memory accesses As Section 4 4 7 Memory Access Ordering on page 1 73 describes memory operations in the Itanium architecture come with one of four semantics unordered acquire release or fence Section 2 2 on page 2 510 describes how the memory ordering model uses these semantics to indicate how memory operations can be ordered with respect to each other Section 2 1 1 ...

Page 756: ... 4 7 Memory Access Ordering on page 1 73 and Section 4 4 7 Sequentiality Attribute and Ordering on page 2 82 discuss ordering orderable instructions and visibility in greater depth Section 2 2 on page 2 510 describes how the ordering semantics affect the Itanium memory ordering model 2 1 2 Loads and Stores In the Itanium architecture a load instruction has either unordered or acquire semantics whi...

Page 757: ...tecture based code does work with the following restrictions Itanium architecture based code can only manipulate an IA 32 semaphore if the IA 32 semaphore is aligned Itanium architecture based code can only manipulate an IA 32 semaphore if the IA 32 semaphore is allocated in write back cacheable memory An Itanium architecture based operating system can emulate IA 32 uncacheable or misaligned semap...

Page 758: ...5 Four factors determine how a processor or system based on the Itanium architecture orders a group of memory operations with respect to each other Data dependencies define the relationship between operations from the same processor that have register or memory dependencies on the same address1 This relationship need only be honored by the local processor i e the processor that executes the operat...

Page 759: ... an algorithm that does not require any particular ordering of A and B Although it is always safe to enforce stricter ordering constraints than an algorithm requires doing so may lead to lower performance If the ordering of memory operations is not important software should use unordered ordering semantics whenever possible for best possible performance This section presents multiprocessor executi...

Page 760: ... 2 1 expresses the Itanium ordering semantics from Section 2 1 1 Memory Ordering of Cacheable Memory References on page 2 507 and also Section 4 4 7 Memory Access Ordering on page 1 73 There are no implications regarding the ordering of the visibility for the following pairs of operations a release followed by an unordered operation a release followed by an acquire an unordered operation followed ...

Page 761: ...assumption that the outcome is r1 1 and r2 0 together imply that This contradicts the postulated outcome r1 1 and r2 0 and thus the Itanium ordering model disallows the r1 1 and r2 0 outcome In operational terms if Processor 1 observes M2 the release store to y i e r1 is 1 it must have also observed M1 the unordered store to x i e r2 is 1 as well given the ordering constraints Therefore the Itaniu...

Page 762: ...hat enforce either or 2 2 1 5 Preventing Loads from Passing Stores to Different Locations The only way to prevent the loads from moving ahead of the stores in the Table 2 3 execution is to separate them with a memory fence as the execution in Table 2 4 illustrates The Itanium memory ordering model only disallows the outcome r1 0 and r2 0 in this execution The memory fences on Processor 0 and Proce...

Page 763: ...register and memory dependencies between the instructions on Processor 0 these operations complete in program order on Processor 0 and also become locally visible in this order However the operations need not be made visible to remote processors in program order In this outcome it appears to Processor 0 as if while to Processor 1 it appears that There are two things to note here First the behavior...

Page 764: ...for M3 can avoid this outcome as doing so forces and thus prevents the outcome However this use of acquire is non intuitive given the RAW dependency through register r1 between M3 and M4 That is M3 produces a value that M4 requires in order to execute so how should it be possible for them to go out of order Further using an acquire in this case prevents any memory operation following M3 from movin...

Page 765: ...es through the compare C1 Thus The execution in Table 2 9 is a variation on the execution from Table 2 8 where the loads are truly independent In this execution there is no dependency between M3 and M4 and thus there are no constraints on the relative ordering of M3 and M4 Like the execution in Table 2 8 M4 is data dependent on the value of p2 that the branch B1 uses to resolve However p2 is indep...

Page 766: ... of M1 and M2 or M3 nor on the relative ordering of M4 and M5 or M6 Remember that both dependencies and the memory ordering model place requirements on the manner in which a processor based on the Itanium architecture may re order accesses Even though the Itanium memory ordering model allows loads to pass stores a processor based on the Itanium architecture cannot re order the following sequence s...

Page 767: ...th M2 and M3 or of operation M4 with M5 and M6 as the local visibilities meet the local ordering constraints that the dependencies impose The code in Table 2 10 and these constraints together imply that Thus the outcome r1 1 r3 1 r2 0 and r4 0 is allowed because these statements are consistent with our definition of local and global visibility Specifically a value becomes locally visible before it...

Page 768: ...e early M3 would see M1 before the fence semantics for M2 were met namely that M1 be visible before M2 and thus M3 Without local and global visibility of M1 and M5 the ordering constraints are as this example originally postulated The code in Table 2 11 and these constraints together imply that This contradicts the r1 1 r3 1 r2 0 and r4 0 outcome The visibility of the memory fence M2 implies that ...

Page 769: ...nd thus the outcome r1 1 r3 1 r2 0 and r4 0 is not allowed Because M1 and M4 cannot become locally visible to M2 and M5 before they become globally visible to M6 and M3 as read accesses from semaphores may not bypass from store buffers or other logically equivalent structures it is not possible to avoid this contradiction The Itanium architecture also prohibits local bypass from a semaphore operat...

Page 770: ...r1 1 r3 1 r2 0 and r4 0 in this execution By the definition of the Itanium memory ordering semantics The Itanium memory ordering model does not permit the r1 1 r3 1 r2 0 and r4 0 outcome as this would require that Processors 1 and 3 observe the release stores to x and y in different orders Specifically assuming that the outcome is r1 1 r3 1 r2 0 and r4 0 The final two statements are inconsistent s...

Page 771: ...phore operation then the two operations will become visible to all processors in the causality order Table 2 1 illustrates this behavior Suppose that M2 reads the value written by M1 In this case there is a causal relationship from M1 to M3 a control dependency could also establish such a relationship The fact that the store to x is a release store implies that since there is a causal relationship...

Page 772: ...the accesses Other observers e g processors or other peripheral domains need not see references to UC or UCE memory in sequential order if at all When multiple agents are writing to the same device it is up to software to synchronize the accesses to the device to ensure the proper interleaving The ordering semantics of an access to sequential memory determines how the access becomes visible to the...

Page 773: ...erformance issues many memory ordering models have been developed that relax the constraints of sequential consistency Adve categorizes these memory models by noting how they relax the ordering requirements between reads and writes and if they allow writes to be read early AG95 The Itanium architecture allows for relaxed ordering between reads and writes and also allows writes to be read early und...

Page 774: ...coherence domain A handler that expects that all memory operations that precede the interruption to be visible must enforce this requirement by executing a memory fence at the beginning of the handler 2 4 Synchronization Code Examples There are many synchronization primitives that software uses in multiprocessor or multi threaded environments to coordinate the activities of different code streams ...

Page 775: ...ls and the lock is highly contested the platform may have to generate a number of read for ownership transactions causing lock to move around the system Using the first ld8 cmp br loop avoids this problem by obtaining lock in a shared state In the worst case when lock is not contested this loop adds only the overhead of the additional compare and branch The initial ld8 need not be an acquire load ...

Page 776: ... barrier The last processor to arrive at the barrier provides this signal The signal to leave the barrier is deduced from the value of the release shared variable and the local_sense local variable Upon entering the barrier each processor complements the value in its private local_sense variable Once in the barrier all processors always have the same value in their local_sense variables This varia...

Page 777: ...n a loop Although there is an array of per processor flag variables the code uses flag_me and flag_you to indicate to the flag variables for the processor attempting to obtain the resource and the other remote processor respectively Dekker s algorithm assumes a sequential consistency ordering model Specifically it assumes that loading zero from flag_you implies that a processor s load and stores t...

Page 778: ...ce because of the data dependency from the load y to the compare and branch If the processor reaches the store y in the second sequence the load of y from the first sequence must be visible Likewise it is not possible for memory operations in the critical section to move ahead of the final load x because of the data dependency between this load and the compare and branch that guards the critical s...

Page 779: ...d8 r1 proc_id r1 unique process id start st8 ptr_b_id r1 b id true st8 x r1 x process id mf MUST fence here ld8 r2 y cmp ne p1 p0 0 r2 if y 0 then p1 st8 ptr_b_id r0 b id false p1 br cond sptk wait_y wait until y 0 st8 y r1 y process id mf MUST fence here ld8 r3 x cmp eq p1 p0 r1 r3 if x id then p1 br cond sptk cs_begin enter critical section st8 ptr_b_id r0 b id false ld8 r3 ptr_b_1 r3 b 1 mov ar...

Page 780: ... architecture does not require instruction caches to be coherent with data stores for Itanium architecture based code Next the sync i ensures that the code update is visible to the instruction stream of the local processor and orders the cache flush with respect to subsequent operations by waiting for the prior fc i instructions to be made visible Finally the srlz i instruction forces the pipeline...

Page 781: ...ensure that the branch to new_code is updated atom ically If an 8 byte store is used to update the branch then the programmer needs to ensure that the branch to new_code is either in the first or last slot of the bundle The release store ensures a processor cannot see the new branch at address x and the original code at address new_code That is if a processor encounters branch new_code at address ...

Page 782: ...s only possible with an IP relative branch because in this type of branch the target address is part of the instruction 2 5 3 Programmed I O Programmed I O requires that the CPU copy data from the device controller to main memory using load instructions to read from the device and store instructions to write data into cacheable memory page in To ensure correct operation Itanium architecture based ...

Page 783: ...py of some of the updated code is not present in the pipeline This can be accomplished by explicitly executing a srlz i before executing the updated code or by forcing an event that re initiates any code fetches performed after the fc i is observed to occur such as an interruption or rfi Several optimizations to this code are possible depending on how software uses the updated code Specifically th...

Page 784: ...erency This behavior allows an operating system to page code pages without taking explicit actions to ensure coherency Software must maintain coherency for DMA traffic through explicit action if the platform does not maintain coherency for this traffic Software can provide coherency by using the flush cache instruction fc to invalidate the instruction and data cache lines that a DMA transfer modif...

Page 785: ...current instruction can be executed For example if the current instruction misses the TLBs on a data reference a Data TLB Miss fault may be delivered by the processor Faults are delivered precisely on the instruction that caused the fault The faulting instruction and all subsequent instructions do not update any architectural state with the possible exception of subsequent instructions which viola...

Page 786: ...system 3 2 Interruption Vector Table The Interruption Vector Address IVA control register defines the base address of the interruption vector table IVT Each IVA based interruption has its own architected offset into this table as defined in Section 5 7 IVA based Interruption Vectors on page 2 113 For the remainder of this section interruption refers to an IVA based interruption unless otherwise no...

Page 787: ...al interrupts and interrupt state collection respectively PMI delivery is also disabled while PSR ic is 0 other PAL based interruptions can be delivered at any point during the execution of the interruption handler regardless of the state of PSR i and PSR ic In addition to clearing the PSR i and PSR ic bits the processor also automatically clears the PSR bn bit when an interruption is delivered sw...

Page 788: ...were touched See Section 4 2 2 Preservation of Floating point State in the OS on page 2 553 for details mfl mfh unchanged pp DCR pp Privileged Monitoring is determined by pp bit in DCR register By default user counters are enabled and performance monitors are unsecured in handlers See Chapter 12 Performance Monitoring Support for details up unchanged sp 0 di 0 Instruction set transitions are not i...

Page 789: ...o addressing materialize the default page size and permission key for the region to which the faulting address belongs in this register IIPA Contains the instruction bundle address of the last instruction to retire successfully while PSR ic was 1 In conjunction with ISR ei IIPA can be used by software to locate the instruction that caused a trap or that was executed successfully prior to a fault o...

Page 790: ... performs an instruction and data serialization on all in flight resources As described in Section 3 3 1 and Section 3 3 2 above the following resources determine the execution environment of the interruption handler CR IVA determines new IP CR DCR be determines new value of PSR be CR DCR pp determines new value of PSR pp PSR ic determines whether interruption collection is enabled RR 7 0 determin...

Page 791: ...s need to distinguish the following interruption handler types Lightweight interruptions Lightweight interruption handlers are allocated 1024 bytes 192 instructions per handler in the IVT These are discussed in Section 3 4 1 Heavyweight interruptions Heavyweight interruption handlers are allocated only 256 bytes 48 instructions per handler in the IVT These are discussed in Section 3 4 2 Nested int...

Page 792: ...dler For some heavyweight interruptions e g Data Debug fault these handlers are typically written in a high level programming language for others e g emulation handlers the interruption can be handled efficiently in Itanium architecture based assembly code The sequence given below illustrates the steps that an Itanium architecture based heavyweight handler needs to perform to save the interrupted ...

Page 793: ... however step 6 in that sequence can be omitted In either case the interrupted register stack and RSE state RSC PFS IFS BSPSTORE RNAT and BSP needs to be preserved and should be saved either to the trap frame on the kernel memory stack or to a newly allocated register stack frame 6 Switch banked register to bank one and re enable interruption collection as follows ssm 0x2000 Set PSR ic bsw 1 Switc...

Page 794: ...egister stack and RSE state by following the steps outlined in Section 6 11 2 Return to Interrupted Context on page 2 148 17 Restore interrupted context s interruption state e g IIP IPSR IFS from the trap frame on the kernel memory stack 18 Restore interrupted context s memory stack pointer and predicate registers from the trap frame on the kernel memory stack This step essentially deallocates the...

Page 795: ... does not apply to Data Nested TLB faults When a nested interruption occurs the processor will update ISR as defined in Chapter 8 Interruption Vector Descriptions and it will set the ISR ni bit to 1 A value of 1 in ISR ni is the only indication to an interruption handler that a nested interruption has occurred Since all other interruption registers are not updated there is generally no way for the...

Page 796: ...2 548 Volume 2 Part 2 Interruptions and Serialization ...

Page 797: ...ways 1 0 Special Use Registers GR1 GR12 and GR13 have special uses Additional architectural register usage conventions apply to GR16 31 in register bank 0 which are used by low level interrupt handlers and by processor firmware For details refer to Section 3 3 1 Itanium general registers and floating point registers contain three state components their register value their control speculative NaT ...

Page 798: ... is restored This can be an issue particularly for user context data structures that may be moved around in memory e g a setjmp jump buffer Unlike the st8 spill and ld8 fill instructions the register stack engine RSE preserves not only register values and register NaT bits but it also manages the stacked register s ALAT state by invalidating ALAT that could be reused by software when the physical ...

Page 799: ...a large register set 128 general registers and 128 floating point registers account for approximately 1 KByte and 2 KBytes of state respectively The architecture provides a variety of mechanisms to reduce the amount of state preservation that is needed on commonly executed code paths such as system calls and high frequency exceptions such as TLB miss handlers Additionally Itanium architecture base...

Page 800: ...to flush all stacked registers to memory This allows such kernel entry points to switch from the user s to the kernel s backing store without causing any memory traffic as described in the next section 4 2 1 Preservation of Stacked Registers in the OS A switch from a thread of execution into the operating system kernel whether on behalf of an involuntary interruption or a voluntary system call req...

Page 801: ...hould have paired all function calls br call instructions with function returns br ret instructions or manually manipulated the kernel backing store pointer so that all kernel contents have been removed from the kernel backing store prior to the loadrs After loading the stacked registers the kernel can switch to the backing store of the interrupted frame This code sequence is described in Section ...

Page 802: ...cratch set is also managed here if the context switch was occasioned by an involuntary interruption e g timer interrupt which did not already spill the higher set Use of the modified bits by the OS to determine if the appropriate register set is dirty with previously unsaved data can help avoid needless spills and fills The modified bits are intentionally accessible through the user mask so that a...

Page 803: ...tion 4 4 2 Regardless of whether the epc or break instruction are used an Itanium architecture based operating system needs to check the integrity of system call parameters In addition to traditional integrity checking of the passed parameter values the system call handler should inspect system call parameters for set NaT bits as described in Section 4 4 3 4 4 1 epc Demoting Branch Return To execu...

Page 804: ...rearrange its input parameters to map to the register stack starting at GR33 This register jostling can be avoided by passing the system call number through a scratch static general register or by using the break immediate itself Additionally the system call can utilize all eight input registers of the register stack for system call parameters 4 4 3 NaT Checking for NaTs in System Calls In additio...

Page 805: ...s 1 Stop RSE by setting RSC mode bits to zero 2 Read current BSPSTORE referred to as current_bspstore further down 3 Find setjmp s RNAT collection rnat_value a Compute the backing store location of setjmp s RNAT collection as follows rnat_collection_address 63 0 setjmp_bsp 63 0 0x1F8 The RNAT location is computed by setting bits 8 3 of setjmp s BSP to all ones This is where setjmp s RNAT collectio...

Page 806: ...tween different threads in the same address space the following steps are required 1 Application architecture state associated with each thread GRs FRs PRs BRs ARs are saved and restores as if this were a user level coroutine This is described in Section 4 5 1 2 2 Memory Ordering to preserve correct memory ordering semantics the context switch routine needs to fence all memory references and flush...

Page 807: ...t Management 2 559 5 Restore the default control register DCR of the inbound context if the DCR is maintained on a per process basis 6 Restore the contents of the protection key registers associated with the inbound context ...

Page 808: ...2 560 Volume 2 Part 2 Context Management ...

Page 809: ...or each of the eight regions there is a corresponding region register RR which contains a RID for that region The operating system is responsible for managing the contents of the region registers RIDs are between 18 and 24 bits wide depending on the processor implementation This allows an Itanium architecture based operating system to uniquely address up to 224 address spaces each of which can be ...

Page 810: ... by the recycled RID Some processor implementations support an efficient region wide purge page size such that this can be accomplished with a single ptc ga operation The frequency of these global TLB flushes can be reduced by using a RID allocation strategy that maximizes the time between use and reuse of a RID For example RIDs could be assigned by using a counter that is as wide as the number of...

Page 811: ...ID to which the target location belongs into a different region register and then performing the copy from source to target directly For example assume a MAS OS wishes to copy and 8 byte buffer from virtual address 0x0000000000A00000 of the currently executing process process A to virtual address 0x0000000000A00000 of another process process B movl r2 2 61 mov r3 process_b_rid movl r4 0x0000000000...

Page 812: ...key even if the data TLB access rights permit the read xd execute disable When 1 execute permission is denied to translations which match this protection key even if the instruction TLB access rights give execute permission key protection key An 18 to 24 bit depending on the processor implementation unique key which tags a translation to a particular protection domain When protection key checking ...

Page 813: ...ed into between 218 and 224 61 bit regions up to eight of which may be accessed concurrently Note that there is no SAS OS or MAS OS mode in the Itanium architecture The processor behavior is the same regardless of the address space model used by the OS The difference between a SAS OS and a MAS OS is one of OS policy specifically how the RIDs and protection keys are managed by the OS and whether di...

Page 814: ...le kernel memory areas Two address translations are said to overlap when one or more virtual addresses are mapped by both translations Software must ensure that translations in an instruction TR never overlap other instruction TR or TC translations likewise software must ensure that translations in a data TR never overlap other data TR or TC translations If an overlap is created the processor will...

Page 815: ...egister 1 2 Place the address range in bytes of the purge into bits 7 2 of a second general register 3 Using these two GRs execute the ptr d or ptr i instruction A data or instruction serialization operation must be performed after the purge for ptr d or ptr i respectively before the translation is guaranteed to be purged from the processor s TLBs Note The TR purge instruction operates independent...

Page 816: ...ion must be performed after the insert for itc d or itc i respectively before the inserted translation can be referenced Instruction TC inserts always purge overlapping instruction TCs and may purge overlapping data TCs Likewise data TC inserts always purge overlapping data TCs and may purge overlapping instruction TCs 5 2 2 2 TC Purge There are several types of TC purge instructions Unlike the ot...

Page 817: ..._PTCE_INFO PAL routine at boot time to determine the parameters needed to use the ptc e instruction Specifically PAL_PTCE_INFO returns tc_base an unsigned 64 bit integer denoting the beginning address to be used by the first ptc e instruction in the purge loop tc_counts two unsigned 32 bit integers packed into a 64 bit parameter denoting the loop counts of the outer and inner purge loops count1 ou...

Page 818: ...n program execution software must use another release operation or a memory fence To purge a translation from all TLBs in the coherence domain software performs the following steps 1 Acquire the semaphore 2 Place the base virtual address of the translation to be purged into a general register 3 Place the address range in bytes of the purge into bits 7 2 of a second general register 4 Using these t...

Page 819: ...mains must use a platform specific method for maintaining TLB coherence across coherence domains 5 3 Virtual Hash Page Table The Itanium architecture defines a data structure that allows for the insertion of TLB entries by a hardware mechanism The data structure is called the virtual hash page table VHPT and the hardware mechanism is called the VHPT walker Unlike the IA 32 page tables the Itanium ...

Page 820: ...ical data structure and the last level of the hierarchy is a linear list of translations the VHPT can be mapped directly onto the page table as shown in Figure 5 1 If the VHPT walker tries to access a location in the VHPT for which no translation is present in the TLB a VHPT Translation fault is raised The original address for which the VHPT walker was trying to find an entry in the VHPT is suppli...

Page 821: ... purge has release semantics prior modifications to the VHPT will be visible to operations that occur after the TLB purge operation Atomic updates to short format VHPT entries can easily be done through 8 byte stores For atomic updates of long format VHPT entries the ti flag in bit 63 of the tag field can be utilized as follows Set the ti bit to 1 Issue a memory fence Update the entry Clear the ti...

Page 822: ...pared to handle a nested TLB fault when performing this load 3 Using the general register from step 2 that holds the contents of the VHPT entry perform a TC insert itc i for instruction faults itc d for data faults 4 In an MP environment reload the VHPT entry from step 2 into a third general register and compare the value to the one loaded in step 2 If the values are not the same then the VHPT has...

Page 823: ...at a minimum to handle the fault 1 Move the IHA register into a general register 2 Perform a thash instruction using the general register from step 1 This will produce in the target register the VHPT address of the VHPT entry that maps the VHPT entry corresponding to the original faulting address i e the address in IFA 3 Using the target general register of the thash from step 2 as the load addres...

Page 824: ... virtual address contained in IFA The access key field is set to the region ID from the RR corresponding to the faulting address The page size field is set to the preferred page size RR ps from the RR corresponding to the faulting address IFA the virtual address of the bundle for instruction faults or data reference for data faults which missed the TLB The OS needs to lookup the PTE for the faulti...

Page 825: ...he TLB using itc i rx for instruction pages and itc d rx for data pages or Step over the instruction data access bit fault by setting the IPSR ia or IPSR da bits prior to performing an rfi 5 4 7 Page Not Present Vector Forward the fault to the operating system s virtual memory subsystem 5 4 8 Data Instruction Access Rights Vector Forward the fault to the operating system s virtual memory subsystem...

Page 826: ... only has to insert the PTE for the current fault into the TLB but also the PTEs for up to seven faults that occurred before if these faults originate from the same IA 32 instruction This can be accomplished by maintaining a buffer for the most recent faulting IIP and for the parameters of up to 7 TLB insertions If a TLB fault occurs while executing in IA 32 mode and the IIP matches the most recen...

Page 827: ...Deferral of Control Speculative Loads Exceptions that occur on control speculative loads ld s or ld sa can be handled by the operating system in different ways The operating system can configure a processor based on the Itanium architecture in three ways Hardware Only Deferral automatic hardware deferral of all control speculative exceptions In this case the processor hardware will always defer ex...

Page 828: ...ve loads that have been hoisted from not taken paths and therefore are not needed As a result software handling of control speculative exceptions is recommended only for statistically infrequent light weight fault handlers such as TLB miss or protection key miss handlers If while handling the exception the operating system determines that this instance of the exception may require too much effort ...

Page 829: ...eption handler Furthermore a set ISR sp bit indicates that an exception was caused by a control speculative load 6 3 Speculation Related Exception Handlers 6 3 1 Unaligned Handler Misaligned control and data speculative loads as well as architectural loads are not required to be handled by the processor As a result the operating system s unaligned reference handler has to be prepared to emulate su...

Page 830: ...tion 6 Emulate the memory read of the load instruction by updating the target register as follows a Validate that emulated code has the access rights to the target memory location at the privilege level that it was running prior to taking the alignment fault The regular_form probe instruction can be used on the first and the last byte of the unaligned memory reference If both probes succeed the me...

Page 831: ...tion code in the Itanium architecture 7 1 Unaligned Reference Handler Misaligned memory references that are not supported by the processor cause Unaligned Reference Faults This behavior is implementation specific but typically occurs in cases where the access crosses a cache line or page boundary In cases where the operating system chooses to emulate misaligned operations some special cases need t...

Page 832: ...avior of the instruction taking the appropriate faults if necessary If the instruction is the base register update form update the appropriate base register A number of these steps may require the use of self modifying code to patch instructions with the appropriate operands for example the target register of the inval e must be patched to the destination register of the ldfe or stfe See Section 2...

Page 833: ...tectural load the processor must first re enable PSR ic to be able to handle potential TLB misses when reading the opcode from memory In other words this becomes a heavyweight handler For details see Section 3 4 2 Heavyweight Interruptions on page 2 544 Once the opcode has been read from memory successfully flow of the emulation continues at the next step The 128 bit bundle is moved from the FP re...

Page 834: ...as IPSR it indicates that taken branch traps are not enabled if the branch is taken and that single stepping is not enabled If a trap condition is detected the ISR code and ISR vector fields are set up as appropriate and the handler jumps to the appropriate operating system entry point after restoring the predicates at the time of the fault and setting the IIP to the appropriate address If no trap...

Page 835: ... and information to determine the dynamic precision and range of the result to be produced When a floating point exception occurs execution is transferred to the appropriate interruption vector either the Floating point Fault Vector at vector address 0x5c00 or the Floating point Trap Vector at vector address 0x5d00 There the operating system may handle the exception or save additional processor in...

Page 836: ... if not go to the last step 6 From the FPSR determine if the trap disabled or trap enabled result is wanted 7 Emulate the Itanium instruction to produce the Itanium architecture specified result 8 Place the result s in the correct FR and or PR registers if required 9 Update the flags in the appropriate status field of the FPSR if required 10 Update the ISR code if required This is required if the ...

Page 837: ...range Note The Itanium architecture also allows for SWA Traps to be raised when the result is just Inexact This is a trivial case for the SWA Trap handler since result of the second IEEE rounding is identical to the first IEEE rounding The general flow of the SWA Trap handler is as follows 1 From the interruption instruction previous address IIPA and exception instruction index ISR ei determine th...

Page 838: ...he inputs or result are at the extremes of their range For these special cases the SWA Fault handler must use alternate algorithms to provide the correct quotient or square root and place that result in the floating point destination register The predicate destination register is also cleared to indicate the result is not an approximation that needs to be improved via the iterative algorithm The p...

Page 839: ...filter is to hide the complexities of the parallel instructions from the user If a floating point fault occurs in the high half of a parallel floating point instruction and there is a user handler provided the parallel instruction is split into two scalar instructions The result for the high half comes from the user handler while the low half is emulated by the IEEE Filter The two results are comb...

Page 840: ...he exponent adjustment factors to do the scaling for the various formats are determined as follows 8 bit single exponents are adjusted by 3 26 0xc0 192 11 bit double exponents are adjusted by 3 29 0x600 1536 15 bit double extended exponents are adjusted by 3 213 0x6000 24576 17 bit register exponents are adjusted by 3 215 0x18000 98304 The actual scaling of the result is not performed by the Itani...

Page 841: ...FPSR register The operating system kernel reached via the floating point fault vector will then invoke the user floating point exception handler if one has been registered 8 2 IA 32 Floating point Exceptions IA 32 floating point exceptions may occur when executing code in IA 32 mode When this happens execution is transferred to the Itanium interruption vector for IA 32 Exceptions at vector address...

Page 842: ...2 594 Volume 2 Part 2 Floating point System Software ...

Page 843: ...nsure that all IA 32 code and data is located in the lower 4GBytes of the virtual address space Compute intensive IA 32 applications can improve their performance substantially by migrating compute kernels from IA 32 to Itanium architecture based code while preserving the bulk of the application s IA 32 binary code If mixed IA 32 Itanium architecture based applications are supported care has to be...

Page 844: ...CSD SSD are pointing at valid and correctly aligned memory areas It is also worth noting that the IA 32 GDT and LDT descriptors are maintained in GR30 and GR31 and are unprotected from Itanium architecture based user level code For more details on the IA 32 execution environment please refer to Section 6 2 2 IA 32 Application Register State Model on page 1 113 Some IA 32 execution environments may...

Page 845: ... Itanium instruction set as IP disp16 32 CSD base 0xfffffff0 Targets of the IA 32 JMPE instruction are forced to be 16 byte aligned and are constrained to the lower 4Gbytes of the 64 bit virtual address space The JMPE instruction leaves the IA 32 return address address of the IA 32 instruction following the JMPE itself in IA_64 register GR1 9 1 4 Procedure Calls between Intel Itanium and IA 32 Ins...

Page 846: ...need not be executed exactly in the order presented 1 Caller deposits arguments on memory stack and calls Itanium architecture based transition stub using the JMPE instruction 2 Execute JMPE instruction as an unconditional branch to Itanium architecture based code The JMPE instruction will leave the address of the IA 32 instruction following the JMPE itself in Itanium register GR1 This address may...

Page 847: ...xception vector the IA 32 Intercept and the IA 32 Interrupt vector Within each of these vectors the interrupt status register ISR provides detailed codes as to the origin of this exception Details on the IA 32 vectors is provided in Chapter 9 IA 32 Interruption Vector Descriptions More details on debug related IA 32 exceptions is given in the following section of this document Table 9 1 IA 32 Vect...

Page 848: ...causes an IA 32 Data Breakpoint trap which is delivered to the IA 32 Exception vector Debug In other words the debugger only gets control after the instruction IA 32 Taken Branch trap Debug Relay to debugger IA 32 Single Step trap Debug Relay to debugger IA 32 Invalid Opcode fault Bad Opcode Signal application IA 32 Intercept vector 0x6a00 IA 32 Instruction Intercept fault Attempted to access IA 3...

Page 849: ...ium architecture based applications each instruction that is stepped will stop at the Single Step trap handler When PSR ss or EFLAG tf enable single stepping of IA 32 applications an IA_32_Exception Debug trap is taken after each IA 32 instruction For more details refer to Section 9 1 IA 32 Trap Code on page 2 213 9 3 4 Taken Branch Traps When PSR tb enables taken branch trapping on Itanium archit...

Page 850: ...2 602 Volume 2 Part 2 IA 32 Application Support ...

Page 851: ...e operating system as an IVA based external interrupt This chapter discusses asynchronous external interrupts only PAL based platform management interrupts PMI are not discussed here External interrupts are IVA based and are delivered to the operating system by transferring control to code located at address CR IVA 0x3000 This code location is also known as the external interrupt vector and is des...

Page 852: ...ector priority classes The ITV PMV CMCV LRR0 and LRR1 external interrupt control registers configure the vector number for the processor s local interrupt sources Configuration of the external controllers and devices is controller device specific and is beyond the scope of this document 10 3 External Interrupt Masking The Itanium architecture provides four mechanisms to prevent external interrupts...

Page 853: ...ors in the in service state 10 3 3 Task Priority Register TPR The Task Priority Register TPR provides an additional interrupt masking capability It allows software to mask interrupt priority classes of 16 vectors each by specifying the mask priority class in the TPR mic field The TPR mmi field allows masking of all maskable external interrupts essentially all but NMI An example of TPR use is shown...

Page 854: ...dicated to the processor by writing the EOI register before another interrupt is received on this vector then the processor returns this vector to the Inactive state and all vectors with equal or lower priority are unmasked In Service One Pending an interrupt has been received by the processor on this vector and has been acquired by software by reading the IVR control register and software has not...

Page 855: ...s above is to check the IVR to acquire the vector so the operating system can determine what device the interrupt is associated with The code is setup to loop servicing interrupts until the spurious interrupt vector 15 is returned Looping and harvesting outstanding interrupts reduces the time wasted by returning to the previous state just to get interrupted again The benefit of interrupt harvestin...

Page 856: ...your current critical section We also take the expensive route here by updating not only the processor TPR but the External Task Priority Register used by the chipset if supported as a hint to what processor should receive the next external interrupt routine to set the task priority register to mask interrupts at the specific level or below INPUT SPL level TPR_MIC 4 TPR_MIC_LEN 4 global external_t...

Page 857: ...he interrupt originated This forces the OS to keep track of whether the vector is associated with a level or an edge trigger interrupt line 10 5 4 IRR Usage Example Waiting on an interrupt with interrupts disabled my_interrupt_loop check for vector 192 0xc0 via irr3 mov r3 cr irr3 and r3 0x1 r3 cmp eq p6 p7 0x1 r3 p7 br cond sptk few my_interrupt_loop mov r4 cr ivr read the vector mov cr eoi r0 cl...

Page 858: ...et sptk few rp return endp Since the ITC gets updated at a fixed relation to the processor clock in order to find out the frequency at run time one can use a firmware call to obtain the input frequency information to the interval time Using this frequency information the ITM can be set to deliver an interrupt at a specific time interval i e for operating system scheduling purposes Assuming the fre...

Page 859: ... ITC AR 44 except that it is clocked only when the logical processor is active Optimizations such as hardware multi threading and processor virtualization may cause a logical processor to sometimes be inactive The Resource Utilization Counter allows for better cycle accounting for logical processors given these types of optimizations RUC should only be written by Virtual Machine Monitors other Ope...

Page 860: ...n an invalid operation fault The access must be uncacheable in order to generate an IPI send_ipi_physical dest_id vector inputs processor destination ID vector to send Local ID 8 bits 8 EID 8 bits global ipi_block pointer to processor I O block IPI_DEST_EID 0x4 ENTRY send_ipi_physical alloc r19 ar pfs 2 0 0 0 movl r17 ipi_block ld8 r17 r17 get pointer to processor block shl r21 r32 IPI_DEST_EID ad...

Page 861: ...r31 movl inta_address INTA_PHYS_ADDRESS srlz d make sure everything is up to date mov r14 cr ivr read ivr srlz d serialize before the EOI is written cmp ne p1 p2 EXTINT r14 p1 br cond sptk process_interrupt A single byte load from the INTA address should cause the processor to emit the INTA cycle on the processor system bus Any Intel 8259A compatible external interrupt controller must respond with...

Page 862: ...2 614 Volume 2 Part 2 External Interrupt Architecture ...

Page 863: ...he platform the Itanium architecture provides the mf a instruction Unlike the mf instruction that waits for visibility of prior operations the mf a waits for completion of prior operations on the platform More details in Section 11 1 To fully leverage the large set of existing platform infrastructure and I O devices the architecture also supports the IA 32 platform I O port space The Itanium instr...

Page 864: ...ic signals phase e g completion of response phase on the system bus Architecturally the definition of acceptance is platform dependent The next section discusses the usage of the mf a instruction in the context of the I O port space 11 2 I O Port Space IA 32 processors support two I O models memory mapped I O and the 64KB I O port space To support IA 32 platforms the Itanium architecture allows op...

Page 865: ...nce they enforce the strict memory ordering and platform acceptance requirements that IA 32 IN and OUT instructions are subject to The following Itanium architecture based assembly code outb out byte and inb in byte examples assume that the io_port_base is the virtual address mapping pointer set up by the IA_64 operating system An mf a instruction is used to verify acceptance by the platform befor...

Page 866: ...al 0 out 0 rot movl base_addr io_port_base extr u port_offset in0 2 14 mov mask 0xfff ld8 port_addr base_addr shl port_offset port_offset 12 and in0 mask in0 add port_offset port_offset in0 mf add port_addr port_addr port_offset ld1 acq r8 port_addr mf a mf br ret spnt few rp END inb ...

Page 867: ...rmance Monitoring Mechanisms As defined in Section 7 2 Performance Monitoring on page 2 155 processors based on the Itanium architecture provide a minimum of four generic performance counter pairs PMC PMD 4 7 The performance monitor control PMC registers are used to select the event to be counted and to define under what conditions the event should qualify for being counted for details refer to Se...

Page 868: ...ross all processes and the operating system kernel itself a monitor must be enabled continuously across all contexts This can be achieved by configuring a privileged monitor PMC pm 1 and by ensuring that PSR pp and DCR pp remain set for the duration of the monitor session Since the operating system typically reloads PSR and possibly DCR on context switch this requires the operating system to set P...

Page 869: ...lenges when doing instruction pointer IP profiling is to relate the current IP to an executable binary module and to an instruction within that module If appropriate symbol information is available the IP can be mapped to a line of source code To support this IP to module mapping it is recommended that the OS provide services to enumerate all kernel and user mode modules in memory and to allow a k...

Page 870: ...2 622 Volume 2 Part 2 Performance Monitoring Support ...

Page 871: ...ivered with the processor The SAL UEFI and ACPI firmware is developed by the platform manufacturer and provide a means of supporting value added platform features from different vendors The interaction of the various functional firmware blocks with the processor platform and operating system is shown in Figure 13 1 Firmware Model on page 2 624 13 1 Processor Boot Flow Overview 13 1 1 Firmware Boot...

Page 872: ...om a variety of mass storage devices such as hard disk CD DVD as well as remote boot via a network At a minimum one of the mass storage devices contains an UEFI system partition Figure 13 1 Firmware Model Non performance criti cal hardware events e g reset machine checks Operating System Software System Abstraction Layer SAL Processor hardware Performance critical hard ware events e g inter rupts ...

Page 873: ...er can obtain information about the memory map usage of the firmware by making the UEFI procedure call GetMemoryMap This procedure provides information related to the size and attributes of the memory regions currently used by firmware The OS loader will then jump to the OS kernel that takes control of the system Until this point system firmware retained control of key system resources such as the...

Page 874: ... SAL_RESET BSP Selection Initialization Memory Test PAL Late Self test Wake APs for PAL Late Self test PAL Late Self test Rendezvous_2 Call to OS OS_Loader Yes No Yes No No BSP Handoff to the Itanium based OS Handoff to the Itanium based OS Itanium based OS will Rendez Rendez Interrupt Interrupt PALE_RESET Recovery No Update Firmware Do System Reset Yes BOOT_RENDEZ Rendezvous_1 PAL_RESET PAL Load ...

Page 875: ...4 1 1 5 Translation Insertion Format on page 2 53 A combination of a general register Interruption TLB Insertion Register ITIR and the Interruption Faulting Address register IFA are used to insert entries into the TLB To void TLB faults on specific text and data areas the operating system can lock critical virtual memory translations in the TLB by use of Translation Register TR section of the TLB ...

Page 876: ...lassified into two types static and stacked The static calls are intended for boot time use before main memory is available or in error recovery situations where memory or the RSE may not be reliable All parameters will be passed in the general registers GR28 to GR31 of Bank 1 The stacked registers GR32 to GR127 will not be used for these calls The static calls can be called at both boot time and ...

Page 877: ...tacked PAL calls conform to the calling conventions document SWC with the exception that general register GR28 must also contain the function index input argument The following code example describes how to make a stacked PAL call GetFeaturesCall mov r14 ip Get the ip of the current bundle movl r28 PAL_PROC_GET_FEATURES Index of the PAL procedure movl r4 AddressOfPALProc Address of the PAL proc en...

Page 878: ...ot use the floating point registers FR32 to FR127 This exception eliminates the need for the OS to save these registers across SAL procedure calls SAL procedures are non re entrant The OS is required to enforce single threaded access to the SAL procedures except for the following procedures SAL_MC_RENDEZ SAL_CACHE_INIT SAL_CACHE_FLUSH 13 2 3 UEFI Procedure Calls UEFI procedure calls are classified...

Page 879: ...it is the responsibility of the caller to map these PAL procedures with an ITR as well as either a DTR or DTC If the caller chooses to map the PAL procedures using a DTC it must be able to handle TLB faults that could occur See Section 11 10 1 PAL Procedure Summary for a summary of all PAL procedures and the calling conventions The SAL and UEFI firmware layers have been designed to operate in virt...

Page 880: ...ther to meet this goal This section will provide an overview of the machine check abort handling When the processor detects an error control is transferred to the PAL_CHECK entrypoint PAL_CHECK will perform error analysis and processor error correction where possible Subsequently PAL either returns to the interrupted context or hands off control to the SAL_CHECK component The level of recovery pro...

Page 881: ...oint in time more than one processor in the system may experience a local MCA and handle it without notifying other processors in the system The next sections will provide an overview of the responsibilities that the PAL SAL and OS have for handling machine checks These sections are not an exhaustive description of the functionality of the handlers but provides a high level description of how the ...

Page 882: ...ar log request to clear the processor error logs which enables further logging Log platform state for MCA and retain it until it is retrieved by the OS Attempt to correct processor machine check errors which are not corrected by PAL Attempt to correct platform machine check errors Branch to the OS MCA handler for uncorrected errors or optionally reset the system Return to the interrupted context v...

Page 883: ...to achieve this is to call the PAL_MC_DRAIN procedure when an application makes a system call to the OS This procedure completes all outstanding transactions within the processor and reports any pending machine checks This technique impacts system call and interrupt handling performance significantly but will improve system reliability by allowing the OS to recover from more errors than if this me...

Page 884: ...T Yes No OS_INIT SAL implementation specific save area Yes No SAL_MC_RENDEZ warm boot SAL_RESET or reset event due to failure to respond to rendezvous interrupt Wake up Return to Interrupted Context PAL_MC_RESUME Return value from OS Warm boot OS_INIT procedures valid INIT PAL_INIT INIT Event Interrupt ...

Page 885: ...code s memory area in RAM and the protection of this code space may be through the OS memory management s paging mechanism SAL sets the correct attributes for this memory space and passes this information to the OS through the Memory Descriptor Table from EfiGetMemoryMap UEFI 13 3 4 P state Feedback Mechanism Flow Diagram The example flowchart shown below illustrates how the caller can utilize the...

Page 886: ...istics using P states allows for efficient power utilization the processor consumes additional power by operating at a higher performance level only when the current workload requires it to do so Figure 13 6 Flowchart Showing P state Feedback Policy 1 getperfindex PAL_GET_PSTATE 2 OS computes newpstate index from busy ratio and getperfindex newpstate getperfindex PAL_SET_PSTATE newpstate Check Ret...

Page 887: ...E 0 5 Disable the VHPT 6 Initialize protection key registers 7 Initialize SP 8 Initialize BSP 9 Enable register stack engine 10 Setup IVA 11 Setup virtual physical address translation 12 Setup GP file start s globals global main type main function C function we will return to global __GLOB_DATA_PTR External pointer to Global Data area global IVT_BASE External pointer to IVT_BASE text This is the e...

Page 888: ...op until 8 Setup kernel stack pointer r12 movl sp kstack 64 1024 64K stack Set up the scratch area on stack add sp 32 sp Setup the Register stack backing store 1st deal with Register Stack Configuration register NOTE the RSC mode must be enforced lazy 00 to write to bspstore mode enforced lazy be little endian mov ar rsc r0 Now have to setup the RSE backing store pointer NOTE initializing the bsps...

Page 889: ... ignored 0 where ig ignored bits rv reserved bits p present bit ma memory attribute a accessed bit d dirty bit pl privilege level ar access rights ppn physical page number ed exception deferral ps page size of mapping 2 ps vpn virtual page number Setup virtual page number NOTE The virtual page number depends on a translation s page size Add entry for TEXT section movl r2 0x0 mov cr ifa r2 setup IT...

Page 890: ...time to set the appropriate bits in the PSR processor status register movl r3 1 44 1 36 1 38 1 27 1 17 1 15 1 14 1 13 mov cr ipsr r3 Initialize DCR to defer all speculation faults movl r2 0x7f00 mov cr dcr r2 Initialize the global pointer gp r1 movl gp __GLOB_DATA_PTR Clear out ifs mov cr ifs r0 Need to do a rfi in order to synchronize above instructions and set it and ed bits in the PSR movl r3 m...

Page 891: ......

Page 892: ...Intel Itanium Architecture Software Developer s Manual Volume 3 Intel Itanium Instruction Set Reference Revision 2 3 May 2010 Document Number 323207 ...

Page 893: ...hanges to specifications and product descriptions at any time without notice Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them Intel processors based on the Itanium architec...

Page 894: ...on Encodings 3 300 4 2 1 Integer ALU 3 300 4 2 2 Integer Compare 3 302 4 2 3 Multimedia 3 306 4 3 I Unit Instruction Encodings 3 310 4 3 1 Multimedia and Variable Shifts 3 310 4 3 2 Integer Shifts 3 315 4 3 3 Test Bit 3 316 4 3 4 Miscellaneous I Unit Instructions 3 318 4 3 5 GR BR Moves 3 320 4 3 6 GR Predicate IP Moves 3 321 4 3 7 GR AR Moves I Unit 3 321 4 3 8 Sign Zero Extend Compute Zero Index...

Page 895: ...5 Listing of Rules Referenced in Dependency Tables 3 387 5 4 Support Tables 3 389 Index 3 397 Figures 2 1 Add Pointer 3 15 2 2 Stack Frame 3 16 2 3 Operation of br ctop and br cexit 3 23 2 4 Operation of br wtop and br wexit 3 24 2 5 Deposit Example merge_form 3 51 2 6 Deposit Example zero_form 3 51 2 7 Extract Example 3 54 2 8 Floating point Merge Negative Sign Operation 3 80 2 9 Floating point M...

Page 896: ...ight Pair 3 248 2 45 Unpack Operation 3 271 4 1 Bundle Format 3 293 Tables 2 1 Instruction Page Description 3 11 2 2 Instruction Page Font Conventions 3 11 2 3 Register File Notation 3 12 2 4 C Syntax Differences 3 12 2 5 Pervasive Conditions Not Included in Instruction Description Code 3 13 2 6 Branch Types 3 20 2 7 Branch Whether Hint 3 25 2 8 Sequential Prefetch Hint 3 25 2 9 Branch Cache Deall...

Page 897: ...t Saturation Limits 3 227 2 50 Store Types 3 251 2 51 Store Hints 3 252 2 52 xsz Mnemonic Values 3 258 2 53 Test Bit Relations for Normal and unc tbits 3 261 2 54 Test Bit Relations for Parallel tbits 3 261 2 55 Test Feature Relations for Normal and unc tf 3 263 2 56 Test Feature Relations for Parallel tf 3 263 2 57 Test Feature Features Assignment 3 263 2 58 Test NaT Relations for Normal and unc ...

Page 898: ... Pair Set FR Opcode Extensions 3 327 4 38 Floating point Load Pair Imm Opcode Extensions 3 328 4 39 Load Hint Completer 3 328 4 40 Store Hint Completer 3 328 4 41 Line Prefetch Hint Completer 3 337 4 42 Opcode 0 System Memory Management 3 bit Opcode Extensions 3 345 4 43 Opcode 0 System Memory Management 4 bit 2 bit Opcode Extensions 3 345 4 44 Opcode 1 System Memory Management 3 bit Opcode Extens...

Page 899: ...bit Opcode Extensions 3 366 4 70 Misc X Unit 6 bit Opcode Extensions 3 366 4 71 Move Long 1 bit Opcode Extensions 3 367 4 72 Long Branch Types 3 367 4 73 Misc X Unit 1 bit Opcode Extensions 3 368 4 74 Immediate Formation 3 368 5 1 Semantics of Dependency Codes 3 373 5 2 RAW Dependencies Organized by Resource 3 375 5 3 WAW Dependencies Organized by Resource 3 383 5 4 WAR Dependencies Organized by R...

Page 900: ...he Itanium application architecture including application level resources programming environment and the IA 32 application interface This volume also describes optimization techniques used to generate high performance software 1 1 1 Part 1 Application Architecture Guide Chapter 1 About this Manual provides an overview of all volumes in the Intel Itanium Architecture Software Developer s Manual Ch...

Page 901: ...software 1 2 1 Part 1 System Architecture Guide Chapter 1 About this Manual provides an overview of all volumes in the Intel Itanium Architecture Software Developer s Manual Chapter 2 Intel Itanium System Environment introduces the environment designed to support execution of Itanium architecture based operating systems running IA 32 or Itanium architecture based applications Chapter 3 System Stat...

Page 902: ...ing systems need to preserve Itanium register contents and state This chapter also describes system architecture mechanisms that allow an operating system to reduce the number of registers that need to be spilled filled on interruptions system calls and context switches Chapter 5 Memory Management introduces various memory management strategies Chapter 6 Runtime Support for Control and Data Specul...

Page 903: ...2 Instruction Reference provides a detailed description of all Itanium instructions organized in alphabetical order by assembly language mnemonic Chapter 3 Pseudo Code Functions provides a table of pseudo code functions which are used to define the behavior of the Itanium instructions Chapter 4 Instruction Formats describes the encoding and instruction format instructions Chapter 5 Resource and De...

Page 904: ...Environment The operating system environment that supports the execution of both IA 32 and Itanium architecture based code Itanium Architecture based Firmware The Processor Abstraction Layer PAL and System Abstraction Layer SAL Processor Abstraction Layer PAL The firmware layer which abstracts processor features that are implementation dependent System Abstraction Layer SAL The firmware layer whic...

Page 905: ...irmware 1 7 Revision History Date of Revision Revision Number Description March 2010 2 3 Added information about illegal virtualization optimization combinations and IIPA requirements Added Resource Utilization Counter and PAL_VP_INFO PAL_VP_INIT and VPD vpr changes New PAL_VPS_RESUME_HANDLER parameter to indicate RSE Current Frame Load Enable setting at the target instruction PAL_VP_INIT_ENV impl...

Page 906: ...rocedures Allows IPI redirection feature to be optional Undefined behavior for 1 byte accesses to the non architected regions in the IPI block Modified insertion behavior for TR overlaps See Vol 2 Part I Ch 4 for details Bus parking feature is now optional for PAL_BUS_GET_FEATURES Introduced low power synchronization primitive using hint instruction FR32 127 is now preserved in PAL calling convent...

Page 907: ...n 2 2 and 3 Part I Volume 3 Added Performance Counter Standardization Sections 7 2 3 and 11 6 Part I Volume 2 Added Freeze Bit Functionality in Context Switching and Interrupt Generation Clarification Sections 7 2 1 7 2 2 7 2 4 1 and 7 2 4 2 Part I Volume 2 Added IA_32_Exception Debug IIPA Description Change Section 9 2 Part I Volume 2 Added capability for Allowing Multiple PAL_A_SPEC and PAL_B En...

Page 908: ...S changes extend calls to allow implementation specific feature control Section 11 8 3 Split PAL_A architecture changes Section 11 1 6 Simple barrier synchronization clarification Section 13 4 2 Limited speculation clarification added hardware generated speculative references Section 4 4 6 PAL memory accesses and restrictions clarification Section 11 9 PSP validity on INITs from PAL_MC_ERROR_INFO ...

Page 909: ...plify the call to provide more information regarding machine check Chapter 11 PAL_ENTER_IA_32_Env call changes entry parameter represents the entry order SAL needs to initialize all the IA 32 registers properly before making this call Chapter 11 PAL_CACHE_FLUSH added a new cache_type argument Chapter 11 PAL_SHUTDOWN removed from list of PAL calls Chapter 11 Clarified memory ordering changes Chapte...

Page 910: ...addr field specifies a register address as an assembly language field name or a register mnemonic For the general floating point and predicate register files which undergo register renaming addr is the register address prior to renaming and the renaming is not shown The field option specifies a named bit field within the register If field is absent then all fields of the register are accessed The ...

Page 911: ...ation registers CPUID cpuid Y Data breakpoint registers DBR dbr Y Instruction breakpoint registers IBR ibr Y Data TLB translation cache DTC N A Data TLB translation registers DTR dtr Y Floating point registers FR f General registers GR r Instruction TLB translation cache ITC N A Instruction TLB translation registers ITR itr Y Protection key registers PKR pkr Y Performance monitor configuration reg...

Page 912: ...nium instructions Table 2 5 Pervasive Conditions Not Included in Instruction Description Code Condition Action Read of a register outside the current frame An undefined value is returned no fault Access to a banked general register GR 16 through GR 31 The GR bank specified by PSR bn is accessed PSR ss is set A Single Step trap is raised ...

Page 913: ...tended imm22 encoding field In the imm22_form GR r3 can specify only GRs 0 1 2 and 3 The plus1_form is available only in the register_form although the equivalent effect in the immediate forms can be achieved by adjusting the immediate The immediate form pseudo op chooses the imm14_form or imm22_form based on the size of the immediate operand and the value of r3 Operation if PR qp check_target_reg...

Page 914: ...lt is placed in GR r1 In the register_form the first operand is GR r2 in the imm14_form the first operand is taken from the sign extended imm14 encoding field Operation if PR qp check_target_register r1 tmp_src register_form GR r2 sign_ext imm14 14 tmp_nat register_form GR r2 nat 0 tmp_res tmp_src GR r3 tmp_res zero_ext tmp_res 31 0 32 tmp_res 62 61 GR r3 31 30 GR r1 tmp_res GR r1 nat tmp_nat GR r...

Page 915: ...rame and be a multiple of 8 in number If this instruction attempts to change the size of CFM sor and the register rename base registers CFM rrb gr CFM rrb fr CFM rrb pr are not all zero then the instruction will cause a Reserved Register Field fault Although the assembler does not allow illegal combinations of operands for alloc illegal combinations can be encoded in the instruction Attempting to ...

Page 916: ...ate 0 tmp_sof CFM sof rse_new_frame CFM sof tmp_sof Make room for new registers Mandatory RSE stores can raise faults listed below CFM sof tmp_sof CFM sol tmp_sol CFM sor tmp_sor GR r1 AR PFS GR r1 nat 0 Interruptions Illegal Operation fault Data NaT Page Consumption fault Reserved Register Field fault Data Key Miss fault Unimplemented Data Address fault Data Key Permission fault VHPT Data fault D...

Page 917: ...ically ANDed and the result placed in GR r1 In the register_form the first operand is GR r2 in the imm8_form the first operand is taken from the imm8 encoding field Operation if PR qp check_target_register r1 tmp_src register_form GR r2 sign_ext imm8 8 tmp_nat register_form GR r2 nat 0 GR r1 tmp_src GR r3 GR r1 nat tmp_nat GR r3 nat Interruptions Illegal Operation fault ...

Page 918: ... 1 s complement of the second source operand and the result placed in GR r1 In the register_form the first operand is GR r2 in the imm8_form the first operand is taken from the imm8 encoding field Operation if PR qp check_target_register r1 tmp_src register_form GR r2 sign_ext imm8 8 tmp_nat register_form GR r2 nat 0 GR r1 tmp_src GR r3 GR r1 nat tmp_nat GR r3 nat Interruptions Illegal Operation f...

Page 919: ...ion as a signed immediate displacement imm21 between the target bundle and the bundle containing this instruction imm21 target25 IP 4 For indirect branches the target address is taken from BR b2 There are two pseudo ops for unconditional branches These are encoded like a conditional branch btype cond with the qp field specifying PR 0 and with the bwh hint of sptk The branch type determines how the...

Page 920: ... to the current code segment i e EIP 31 0 BR b2 31 0 CSD base The IA 32 instruction set can be entered at any privilege level provided PSR di is 0 If PSR dfh is 1 a Disabled FP Register fault is raised on the target IA 32 instruction No register bank switch nor change in privilege level occurs during the instruction set transition Software must ensure the code segment descriptor CSD and selector C...

Page 921: ...egister is not equal to zero it is decremented and the branch is taken In addition to these simple branch types there are four types which are used for accelerating modulo scheduled loops see also Section 4 5 1 Modulo scheduled Loop Support on page 1 75 Two of these are for counted loops which use the LC register and two for while loops which use the qualifying predicate These loop types use regis...

Page 922: ...ion of whether to branch or not For br wtop the branch is taken if either the qualifying predicate is one or EC is greater than one For br wexit the opposite is true It is not taken if either the qualifying predicate is one or EC is greater than one and is taken otherwise These branch types also use the qualifying predicate and EC to control register rotation and predicate initialization During th...

Page 923: ...fying predicate and PR 63 cannot be the qualifying predicate for any branch preceding a br wtop or br wexit in the same instruction group For dependency purposes the loop type branches effectively always write their associated resources whether they are taken or not The cloop type effectively always writes LC When LC is 0 a cloop branch leaves it unchanged but hardware may implement this as a re w...

Page 924: ... frame AR PFS pec AR EC AR PFS ppl PSR cpl alat_frame_update CFM sol 0 rse_preserve_frame CFM sol CFM sof CFM sol new frame size is size of outs CFM sol 0 CFM sor 0 CFM rrb gr 0 CFM rrb fr 0 CFM rrb pr 0 break case ret return restores stack frame Table 2 7 Branch Whether Hint bwh Completer Branch Whether Hint spnt Static Not Taken sptk Static Taken dpnt Dynamic Not Taken dptk Dynamic Taken Table 2...

Page 925: ...rse_enable_current_frame_load AR EC AR PFS pec if PSR cpl u AR PFS ppl and restores privilege PSR cpl AR PFS ppl lower_priv_transition 1 break case ia switch to IA mode tmp_taken 1 if PSR ic 0 PSR dt 0 PSR mc 1 PSR it 0 undefined_behavior if qp 0 illegal_operation_fault if AR BSPSTORE AR BSP illegal_operation_fault if PSR di disabled_instruction_set_transition_fault PSR is 1 set IA 32 Instruction ...

Page 926: ...gs else if AR EC 0 AR LC AR LC AR EC PR 63 0 rotate_regs else AR LC AR LC AR EC AR EC PR 63 0 CFM rrb gr CFM rrb gr CFM rrb fr CFM rrb fr CFM rrb pr CFM rrb pr break case wtop case wexit SW pipelined while loop if slot 2 illegal_operation_fault if btype wtop tmp_taken PR qp AR EC u 1 if btype wexit tmp_taken PR qp AR EC u 1 if PR qp AR EC AR EC PR 63 0 rotate_regs else if AR EC 0 AR EC PR 63 0 rot...

Page 927: ...nted_instruction_address_trap lower_priv_transition tmp_IP if lower_priv_transition PSR lp lower_privilege_transfer_trap if PSR tb taken_branch_trap Interruptions Illegal Operation fault Lower Privilege Transfer trap Disabled Instruction Set Transition fault Taken Branch trap Unimplemented Instruction Address trap Additional Faults on IA 32 target instructions IA_32_Exception GPFault Disabled FP R...

Page 928: ...ntrol register IIM For the x_unit_form the lower 21 bits of the value specified by imm62 is zero extended and placed in the Interruption Immediate control register IIM The L slot of the bundle contains the upper 41 bits of imm62 A break i instruction may be encoded in an MLI template bundle in which case the L slot of the bundle is ignored This instruction has five forms each of which can be execu...

Page 929: ... taken and several other actions occur The current values of the Current Frame Marker CFM the EC application register and the current privilege level are saved in the Previous Function State application register The caller s stack frame is effectively saved and the callee is provided with a frame containing only the caller s output region The rotation rename base registers in the CFM are reset to ...

Page 930: ... tmp_taken PR qp break case call call saves a return link tmp_taken PR qp if tmp_taken BR b1 IP 16 AR PFS pfm CFM and saves the stack frame AR PFS pec AR EC AR PFS ppl PSR cpl alat_frame_update CFM sol 0 rse_preserve_frame CFM sol CFM sof CFM sol new frame size is size of outs CFM sol 0 CFM sor 0 CFM rrb gr 0 CFM rrb fr 0 CFM rrb pr 0 break if tmp_taken taken_branch 1 IP tmp_IP set the new value f...

Page 931: ... branch In the indirect_form the target of the presaged branch is given by BR b2 The return_form is used to indicate that the presaged branch will be a return Other hints can be given about the presaged branch Values for various hint completers are shown in the following tables For more details refer to Section 4 5 2 Branch Prediction Hints on page 1 78 The ipwh and indwh completers provide inform...

Page 932: ...3 brp Operation tmp_tag IP sign_ext timm9 4 13 if ip_relative_form tmp_target IP sign_ext imm21 4 25 tmp_wh ipwh else indirect_form tmp_target BR b2 tmp_wh indwh branch_predict tmp_wh ih return_form tmp_target tmp_tag Interruptions None ...

Page 933: ... an instruction group otherwise operation is undefined Instructions in the same instruction group that access GR16 to GR31 reference the previous register bank Subsequent instruction groups reference the new register bank This instruction can only be executed at the most privileged level and when PSR vm is 0 This instruction cannot be predicated Operation if followed_by_stop undefined_behavior if ...

Page 934: ...s is encoded in the instruction as a signed immediate displacement imm21 between the target bundle and the bundle containing this instruction imm21 target25 IP 4 The branching behavior of this instruction can be optionally unimplemented If the instruction would have branched and the branching behavior is not implemented then a Speculative Operation fault is taken and the value specified by imm21 i...

Page 935: ... always_fail alat_index 0 alat_index 1 fail always_fail alat_cmp reg_type alat_index if fail if check_branch_implemented check_type taken_branch 1 IP IP sign_ext imm21 4 25 if impl_uia_fault_supported PSR it unimplemented_virtual_address IP PSR vm PSR it unimplemented_physical_address IP unimplemented_instruction_address_trap 0 IP if PSR tb taken_branch_trap else speculation_fault check_type zero_...

Page 936: ...rb pr are cleared In the pred_form the single register rename base register for the predicates CFM rrb pr is cleared This instruction must be the last instruction in an instruction group otherwise operation is undefined This instruction cannot be predicated Operation if followed_by_stop undefined_behavior if all_form CFM rrb gr 0 CFM rrb fr 0 CFM rrb pr 0 else pred_form CFM rrb pr 0 Interruptions ...

Page 937: ...indicates the presence of the feature on the processor model See Section 3 1 11 Processor Identification Registers on page 1 34 for details This capability may also be determined using the test feature tf instruction using the clz operand Operation if PR qp if instruction_implemented CLZ illegal_operation_fault check_target_register r1 tmp_val 0 do if GR r3 63 tmp_val 0 break while tmp_val 63 GR r...

Page 938: ... Table 2 15 A blank entry indicates the predicate target is left unchanged In the register_form the first operand is GR r2 in the imm8_form the first operand is taken from the sign extended imm8 encoding field and in the parallel_inequality_form the first operand must be GR 0 The parallel_inequality_form is only used when the compare type is one of the parallel types and the relation is an inequal...

Page 939: ...R 0 Comparisons where the second operand is GR 0 are pseudo ops for which the assembler switches the register specifiers and uses the opposite relation Table 2 16 64 bit Comparison Relations for Normal and unc Compares crel Compare Relation a rel b Register Form is a pseudo op of Immediate Form is a pseudo op of Immediate Range eq a b 128 127 ne a b eq p1 p2 eq p1 p2 128 127 lt a b signed 128 127 ...

Page 940: ...mp_rel greater_signed tmp_src GR r3 else if crel ge tmp_rel greater_equal_signed tmp_src GR r3 else if crel ltu tmp_rel lesser tmp_src GR r3 else if crel leu tmp_rel lesser_equal tmp_src GR r3 else if crel gtu tmp_rel greater tmp_src GR r3 else tmp_rel greater_equal tmp_src GR r3 geu switch ctype case and and type compare if tmp_nat tmp_rel PR p1 0 PR p2 0 break case or or type compare if tmp_nat ...

Page 941: ...3 42 Volume 3 Instruction Reference cmp illegal_operation_fault PR p1 0 PR p2 0 Interruptions Illegal Operation fault ...

Page 942: ...m the sign extended imm8 encoding field and in the parallel_inequality_form the first operand must be GR 0 The parallel_inequality_form is only used when the compare type is one of the parallel types and the relation is an inequality See the Compare instruction and Table 2 17 on page 3 40 If the two predicate register destinations are the same p1 and p2 specify the same predicate register the inst...

Page 943: ...rc 32 sign_ext GR r3 32 else if crel gt tmp_rel greater_signed sign_ext tmp_src 32 sign_ext GR r3 32 else if crel ge tmp_rel greater_equal_signed sign_ext tmp_src 32 sign_ext GR r3 32 else if crel ltu tmp_rel lesser zero_ext tmp_src 32 zero_ext GR r3 32 else if crel leu tmp_rel lesser_equal zero_ext tmp_src 32 zero_ext GR r3 32 else if crel gtu tmp_rel greater zero_ext tmp_src 32 zero_ext GR r3 32...

Page 944: ...cmp4 PR p2 0 break case unc unc type compare default normal compare if tmp_nat PR p1 0 PR p2 0 else PR p1 tmp_rel PR p2 tmp_rel break else if ctype unc if p1 p2 illegal_operation_fault PR p1 0 PR p2 0 Interruptions Illegal Operation fault ...

Page 945: ...by the value in GR r3 is not naturally aligned to the size of the value being accessed in memory an Unaligned Data Reference fault is taken independent of the state of the User Mask alignment checking bit UM ac PSR ac in the Processor Status Register For the cmp8xchg16 instruction the address specified must be 8 byte aligned The memory read and write are guaranteed to be atomic For the cmp8xchg16 ...

Page 946: ...orts_semaphores mattr unsupported_data_reference_fault SEMAPHORE GR r3 if sixteen_byte_form if sem acq val mem_xchg16_cond AR CCV GR r2 AR CSD paddr UM be mattr ACQUIRE ldhint else rel val mem_xchg16_cond AR CCV GR r2 AR CSD paddr UM be mattr RELEASE ldhint else if sem acq val mem_xchg_cond AR CCV GR r2 paddr size UM be mattr ACQUIRE ldhint else rel val mem_xchg_cond AR CCV GR r2 paddr size UM be ...

Page 947: ...o then the old value of the Current Frame Marker CFM is copied to the Interruption Function State register IFS and IFS v is set to one A cover instruction must be the last instruction in an instruction group otherwise operation is undefined This instruction cannot be predicated Operation if followed_by_stop undefined_behavior if PSR cpl 0 PSR vm 1 virtualization_fault alat_frame_update CFM sof 0 r...

Page 948: ...0000000000 0 GR r1 2 else if GR r3 0x000000ff00000000 0 GR r1 3 else if GR r3 0x00000000ff000000 0 GR r1 4 else if GR r3 0x0000000000ff0000 0 GR r1 5 else if GR r3 0x000000000000ff00 0 GR r1 6 else if GR r3 0x00000000000000ff 0 GR r1 7 else GR r1 8 else right_form scan from least significant up if GR r3 0x00000000000000ff 0 GR r1 0 else if GR r3 0x000000000000ff00 0 GR r1 1 else if GR r3 0x0000000...

Page 949: ...3 50 Volume 3 Instruction Reference czx else if GR r3 0x0000ffff00000000 0 GR r1 2 else if GR r3 0xffff000000000000 0 GR r1 3 else GR r1 4 GR r1 nat GR r3 nat Interruptions Illegal Operation fault ...

Page 950: ...in the imm_form The pos6 immediate has a range of 0 to 63 In the zero_form a right justified bit field taken from either the value in GR r2 in the register_form or the sign extended value in imm8 in the imm_form is deposited into GR r1 and all other bits in GR r1 are cleared to zero The deposited bit field begins at the bit position specified by the pos6 immediate and extends to the left towards t...

Page 951: ...imm8 8 tmp_nat merge_form GR r3 nat 0 tmp_len len6 else register_form tmp_src GR r2 tmp_nat merge_form GR r3 nat 0 GR r2 nat tmp_len merge_form len4 len6 if pos6 tmp_len u 64 tmp_len 64 pos6 if merge_form GR r1 GR r3 else zero_form GR r1 0 GR r1 pos6 tmp_len 1 pos6 tmp_src tmp_len 1 0 GR r1 nat tmp_nat Interruptions Illegal Operation fault ...

Page 952: ...n the translation for the page containing the epc instruction This instruction can promote but cannot demote and the new privilege comes from the TLB entry If instruction address translation is disabled then the current privilege level is set to 0 most privileged Instructions after the epc in the same instruction group may be executed at the old privilege level or the new higher privilege level In...

Page 953: ...st significant bit of the extracted field If the specified field extends beyond the most significant bit of GR r3 the sign is taken from the most significant bit of GR r3 The immediate value len6 can be any number in the range 1 to 64 and is encoded as len6 1 in the instruction The immediate value pos6 can be any value in the range 0 to 63 The operation of extr r1 r3 7 50 is illustrated in Figure ...

Page 954: ...lue Format qp fabs f1 f3 pseudo op of qp fmerge s f1 f0 f3 Description The absolute value of the value in FR f3 is computed and placed in FR f1 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation See fmerge Floating point Merge on page 3 80 ...

Page 955: ...Val FR f1 is set to NaTVal instead of the computed result The mnemonic values for the opcode s pc are given in Table 2 22 The mnemonic values for sf are given in Table 2 23 For the encodings and interpretation of the status field s pc wre and rc refer to Table 5 5 and Table 5 6 on page 1 90 Operation See fma Floating point Multiply Add on page 3 77 Table 2 22 Specified pc Mnemonic Values pc Mnemon...

Page 956: ...es for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else fminmax_exception_fault_check f2 f3 sf tmp_fp_env if fp_raise_fault tmp_fp_env fp_exception_fault fp_decode_fault tmp_fp_env tmp_right fp_reg_read FR f2 tmp_left fp...

Page 957: ...ues for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else fminmax_exception_fault_check f2 f3 sf tmp_fp_env if fp_raise_fault tmp_fp_env fp_exception_fault fp_decode_fault tmp_fp_env tmp_left fp_reg_read FR f2 tmp_right f...

Page 958: ...ign field of FR f1 is set to positive 0 If either FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else FR f1 significand FR f2 significand FR f3 significand FR f1 exponent FP_INTEGER_...

Page 959: ... for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If either FR f2 or FR f2 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else FR f1 significand FR f2 significand FR f3 significan...

Page 960: ...es aligned on a 32 byte boundary An implementation may flush a larger region When executed at privilege level 0 fc and fc i perform no access rights or protection key checks At other privilege levels fc and fc i perform access rights checks as if they were 1 byte reads but do not perform any protection key checks regardless of PSR pk The memory attribute of the page containing the affected line ha...

Page 961: ...Interruptions Register NaT Consumption fault Data TLB fault Unimplemented Data Address fault Data Page Not Present fault Data Nested TLB fault Data NaT Page Consumption fault Alternate Data TLB fault Data Access Rights fault VHPT Data fault ...

Page 962: ...ion fault is taken and the value specified by imm21 is zero extended and placed in the Interruption Immediate control register IIM The fault handler emulates the branch by sign extending the IIM value adding it to IIP and returning The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp switch sf case s0 tmp_flags AR FPSR sf0 flags break case s1 tmp_flags AR FPSR sf1 fla...

Page 963: ...1 or the number is a quiet NaN and fclass9 7 is 1 or the number is a signaling NaN and fclass9 6 is 1 or the sign of the number agrees with the sign specified by one of the two low order bits of fclass9 and the type of the number disregarding the sign agrees with the number type specified by the next four bits of fclass9 as shown in Table 2 25 Note An fclass9 of 0x1FF is equivalent to testing for ...

Page 964: ...is_zero FR f2 fclass9 3 fp_is_unorm FR f2 fclass9 4 fp_is_normal FR f2 fclass9 5 fp_is_inf FR f2 fclass9 6 fp_is_snan FR f2 fclass9 7 fp_is_qnan FR f2 fclass9 8 fp_is_natval FR f2 tmp_nat fp_is_natval FR f2 fclass9 8 if tmp_nat PR p1 0 PR p2 0 else PR p1 tmp_rel PR p2 tmp_rel else if fctype unc if p1 p2 illegal_operation_fault PR p1 0 PR p2 0 FP Exceptions None Interruptions Illegal Operation faul...

Page 965: ...ting point Clear Flags Format qp fclrf sf F13 Description The status field s 6 bit flags field is reset to zero The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_set_sf_flags sf 0 FP Exceptions None Interruptions None ...

Page 966: ...udo ops For these the assembler simply switches the source operand specifiers and or switches the predicate target specifiers and uses an implemented relation Table 2 26 Floating point Comparison Types fctype PR qp 0 PR qp 1 Result 0 No Source NaTVals Result 1 No Source NaTVals One or More Source NaTVals PR p1 PR p2 PR p1 PR p2 PR p1 PR p2 PR p1 PR p2 none 0 1 1 0 0 0 unc 0 0 0 1 1 0 0 0 Table 2 2...

Page 967: ...han tmp_fr2 tmp_fr3 else if frel le tmp_rel fp_lesser_or_equal tmp_fr2 tmp_fr3 else if frel gt tmp_rel fp_less_than tmp_fr3 tmp_fr2 else if frel ge tmp_rel fp_lesser_or_equal tmp_fr3 tmp_fr2 else if frel unord tmp_rel fp_unordered tmp_fr2 tmp_fr3 else if frel neq tmp_rel fp_equal tmp_fr2 tmp_fr3 else if frel nlt tmp_rel fp_less_than tmp_fr2 tmp_fr3 else if frel nle tmp_rel fp_lesser_or_equal tmp_f...

Page 968: ...ion Reference 3 69 fcmp FP Exceptions Invalid Operation V Denormal Unnormal Operand D Software Assist SWA fault Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 969: ...ation Floating point Exception fault is disabled If FR f2 is a NaTVal FR f1 is set to NaTVal instead of the computed result The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 0 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 FR f1 NATVAL fp_update_psr f1 else tmp_default_result fcvt_ex...

Page 970: ...t fx FP Exceptions Invalid Operation V Inexact I Denormal Unnormal Operand D Software Assist SWA fault Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault Floating point Exception trap ...

Page 971: ...is unaffected by the rounding mode Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 0 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 FR f1 NATVAL else tmp_res FR f2 if tmp_res significand 63 tmp_res significand tmp_res significand 1 tmp_res sign 1 else tmp_res sign 0 tmp_res exponent FP_INTEGER_EXP tmp_res fp_normalize tmp_res FR f1 significand ...

Page 972: ...te Multiplying FR f3 with FR 1 a 1 0 normalizes the canonical representation of an integer in the floating point register file producing a normal floating point value If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result The mnemonic values for the opcode s pc are given in Table 2 22 on page 3 56 The mnemonic values for sf are given in Table 2 23 on page 3 56 For the encodings...

Page 973: ...pendent of the state of the User Mask alignment checking bit UM ac PSR ac in the Processor Status Register Both read and write access privileges for the referenced page are required The write access privilege check is performed whether or not the memory write is performed Only accesses to UCE pages or cacheable pages with write back write policy are permitted Accesses to NaTPages result in a Data ...

Page 974: ...t else rel val mem_xchg_add inc3 paddr size UM be mattr RELEASE ldhint alat_inval_multiple_entries paddr size GR r1 zero_ext val size 8 GR r1 nat 0 Interruptions Illegal Operation fault Data Key Miss fault Register NaT Consumption fault Data Key Permission fault Unimplemented Data Address fault Data Access Rights fault Data Nested TLB fault Data Dirty Bit fault Alternate Data TLB fault Data Access...

Page 975: ...s instruction completes execution BSPSTORE is equal to BSP This instruction must be the first instruction in an instruction group and must either be in instruction slot 0 or in instruction slot 1 of a template having a stop after slot 0 otherwise the results are undefined This instruction cannot be predicated Operation while AR BSPSTORE AR BSP rse_store MANDATORY increments AR BSPSTORE deliver_unm...

Page 976: ...e 2 23 on page 3 56 For the encodings and interpretation of the status field s pc wre and rc refer to Table 5 5 and Table 5 6 on page 1 90 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 f4 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 fp_is_natval FR f4 FR f1 NATVAL fp_update_psr f1 else tmp_default_result fma_exception_fa...

Page 977: ...3 78 Volume 3 Instruction Reference fma Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault Floating point Exception trap ...

Page 978: ...ation The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else fminmax_exception_fault_check f2 f3 sf tmp_fp_env if fp_raise_fault tmp_fp_env fp_exception_fault fp_decode_fault tmp_fp_env tmp_bool_res fp_...

Page 979: ...he same register for FR f2 and FR f3 For the sign_form the sign of FR f2 is concatenated with the exponent and the significand of FR f3 For the sign_exp_form the sign and exponent of FR f2 is concatenated with the significand of FR f3 For all forms if either FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Figure 2 8 Floating point Merge Negative Sign Operation Figu...

Page 980: ... fp_is_natval FR f3 FR f1 NATVAL else FR f1 significand FR f3 significand if neg_sign_form FR f1 exponent FR f3 exponent FR f1 sign FR f2 sign else if sign_form FR f1 exponent FR f3 exponent FR f1 sign FR f2 sign else sign_exp_form FR f1 exponent FR f2 exponent FR f1 sign FR f2 sign fp_update_psr f1 FP Exceptions None Interruptions Illegal Operation fault Disabled Floating point Register fault ...

Page 981: ...ration The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else fminmax_exception_fault_check f2 f3 sf tmp_fp_env if fp_raise_fault tmp_fp_env fp_exception_fault fp_decode_fault tmp_fp_env tmp_bool_res fp...

Page 982: ...sion value in FR f3 For all forms the exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 For all forms if either FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Figure 2 11 Floating point Mix Left Figure 2 12 Floating point Mix Right Figure 2 13 Floating point Mix Left Right 81 0 80 64 63 81 0 80...

Page 983: ...FR f2 significand 63 32 tmp_res_lo FR f3 significand 63 32 else if mix_r_form tmp_res_hi FR f2 significand 31 0 tmp_res_lo FR f3 significand 31 0 else mix_lr_form tmp_res_hi FR f2 significand 63 32 tmp_res_lo FR f3 significand 31 0 FR f1 significand fp_concatenate tmp_res_hi tmp_res_lo FR f1 exponent FP_INTEGER_EXP FR f1 sign FP_SIGN_POSITIVE fp_update_psr f1 FP Exceptions None Interruptions Illeg...

Page 984: ...d FPSR sf wre using the rounding mode specified by FPSR sf rc The rounded result is placed in FR f1 If either FR f3 or FR f4 is a NaTVal FR f1 is set to NaTVal instead of the computed result The mnemonic values for the opcode s pc are given in Table 2 22 on page 3 56 The mnemonic values for sf are given in Table 2 23 on page 3 56 For the encodings and interpretation of the status field s pc wre an...

Page 985: ...ge 3 56 For the encodings and interpretation of the status field s pc wre and rc refer to Table 5 5 and Table 5 6 on page 1 90 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 f4 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 fp_is_natval FR f4 FR f1 NATVAL fp_update_psr f1 else tmp_default_result fms_fnma_exception_fault_che...

Page 986: ...Volume 3 Instruction Reference 3 87 fms Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault Floating point Exception trap ...

Page 987: ...int Negate Format qp fneg f1 f3 pseudo op of qp fmerge ns f1 f3 f3 Description The value in FR f3 is negated and placed in FR f1 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation See fmerge Floating point Merge on page 3 80 ...

Page 988: ...alue Format qp fnegabs f1 f3 pseudo op of qp fmerge ns f1 f0 f3 Description The absolute value of the value in FR f3 is computed negated and placed in FR f1 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation See fmerge Floating point Merge on page 3 80 ...

Page 989: ...able 2 23 on page 3 56 For the encodings and interpretation of the status field s pc wre and rc refer to Table 5 5 and Table 5 6 on page 1 90 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 f4 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 fp_is_natval FR f4 FR f1 NATVAL fp_update_psr f1 else tmp_default_result fms_fnma_exce...

Page 990: ...Volume 3 Instruction Reference 3 91 fnma Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault Floating point Exception trap ...

Page 991: ...R sf pc and FPSR sf wre using the rounding mode specified by FPSR sf rc The rounded result is placed in FR f1 If either FR f3 or FR f4 is a NaTVal FR f1 is set to NaTVal instead of the computed result The mnemonic values for the opcode s pc are given in Table 2 22 on page 3 56 The mnemonic values for sf are given in Table 2 23 on page 3 56 For the encodings and interpretation of the status field s...

Page 992: ...ing the rounding mode specified by FPSR sf rc and placed in FR f1 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result The mnemonic values for the opcode s pc are given in Table 2 22 on page 3 56 The mnemonic values for sf are given in Table 2 23 on page 3 56 For the encodings and interpretation of the status field s pc wre and rc refer to Table 5 5 and Table 5 6 on page 1 90...

Page 993: ...n field of FR f1 is set to positive 0 If either FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else FR f1 significand FR f2 significand FR f3 significand FR f1 exponent FP_INTEGER_EX...

Page 994: ...ir of single precision values in the significand field of FR f3 are computed and stored in the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation See fpmerge Floating point Parallel Merge on page 3 111 ...

Page 995: ... FR f1 is set to NaTVal instead of the computed result Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else tmp_res_hi fp_single FR f2 tmp_res_lo fp_single FR f3 FR f1 significand fp_concatenate tmp_res_hi tmp_res_lo FR f1 exponent FP_INTEGER_EXP FR f1 sign FP_SIG...

Page 996: ...nt instructions The Invalid Operation is signaled in the same manner as for the fpcmp lt operation The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else fpminmax_exception_fault_check f2 f3 sf tmp_fp_e...

Page 997: ...ruction Reference fpamax FP Exceptions Invalid Operation V Denormal Unnormal Operand D Software Assist SWA fault Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 998: ...structions The Invalid Operation is signaled in the same manner as for the fpcmp lt operation The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else fpminmax_exception_fault_check f2 f3 sf tmp_fp_env if...

Page 999: ...ruction Reference fpamin FP Exceptions Invalid Operation V Denormal Unnormal Operand D Software Assist SWA fault Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 1000: ...implemented in hardware Some are actually pseudo ops For these the assembler simply switches the source operand specifiers and or switches the predicate type specifiers and uses an implemented relation If either FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Table 2 29 Floating point Parallel Comparison Results PR qp 0 PR qp 1 Result false No Source NaTVals Result...

Page 1001: ...nlt tmp_rel fp_less_than tmp_fr2 tmp_fr3 else if frel nle tmp_rel fp_lesser_or_equal tmp_fr2 tmp_fr3 else if frel ngt tmp_rel fp_less_than tmp_fr3 tmp_fr2 else if frel nge tmp_rel fp_lesser_or_equal tmp_fr3 tmp_fr2 else tmp_rel fp_unordered tmp_fr2 tmp_fr3 ord tmp_res_hi tmp_rel 0xFFFFFFFF 0x00000000 tmp_fr2 fp_reg_read_lo f2 tmp_fr3 fp_reg_read_lo f3 if frel eq tmp_rel fp_equal tmp_fr2 tmp_fr3 el...

Page 1002: ...mp_res_hi tmp_res_lo FR f1 exponent FP_INTEGER_EXP FR f1 sign FP_SIGN_POSITIVE fp_update_fpsr sf tmp_fp_env fp_update_psr f1 FP Exceptions Invalid Operation V Denormal Unnormal Operand D Software Assist SWA fault Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 1003: ...er the rounding mode specified in the FPSR sf rc or using Round to Zero if the trunc_form of the instruction is used The result is written as a pair of 32 bit integers into the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If the result of the conversion cannot be represented as a 32 bit integ...

Page 1004: ..._INTEGER_EXP tmp_res exponent if signed_form tmp_res sign tmp_res significand tmp_res significand 1 tmp_res_hi tmp_res significand 31 0 if fp_is_nan tmp_default_result_pair lo tmp_res_lo INTEGER_INDEFINITE_32_BIT else tmp_res fp_ieee_rnd_to_int_sp fp_reg_read_lo f2 LOW tmp_fp_env if tmp_res exponent tmp_res significand fp_U64_rsh tmp_res significand FP_INTEGER_EXP tmp_res exponent if signed_form t...

Page 1005: ...3 106 Volume 3 Instruction Reference fpcvt fx Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault Floating point Exception trap ...

Page 1006: ...f rc The pair of rounded results are stored in the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If any of FR f3 FR f4 or FR f2 is a NaTVal FR f1 is set to NaTVal instead of the computed results Note If f2 is f0 in the fpma instruction just the IEEE multiply operation is performed See fpmpy Fl...

Page 1007: ...tmp_res_hi fp_ieee_round_sp tmp_res HIGH tmp_fp_env if fp_is_nan_or_inf tmp_default_result_pair lo tmp_res_lo fp_single tmp_default_result_pair lo else tmp_res fp_mul fp_reg_read_lo f3 fp_reg_read_lo f4 if f2 0 tmp_res fp_add tmp_res fp_reg_read_lo f2 tmp_fp_env tmp_res_lo fp_ieee_round_sp tmp_res LOW tmp_fp_env FR f1 significand fp_concatenate tmp_res_hi tmp_res_lo FR f1 exponent FP_INTEGER_EXP F...

Page 1008: ...Invalid Operation is signaled in the same manner as for the fpcmp lt operation The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else fpminmax_exception_fault_check f2 f3 sf tmp_fp_env if fp_raise_fault...

Page 1009: ...3 110 Volume 3 Instruction Reference fpmax Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 1010: ... and the significands of the pair of single precision values in the significand field of FR f3 and stored in FR f1 For the sign_exp_form the signs and exponents of the pair of single precision values in the significand field of FR f2 are concatenated with the pair of single precision significands in the significand field of FR f3 and stored in the significand field of FR f1 For all forms the expon...

Page 1011: ...nificand 62 32 tmp_res_lo FR f2 significand 31 31 FR f3 significand 30 0 else sign_exp_form tmp_res_hi FR f2 significand 63 55 23 FR f3 significand 54 32 tmp_res_lo FR f2 significand 31 23 23 FR f3 significand 22 0 FR f1 significand fp_concatenate tmp_res_hi tmp_res_lo FR f1 exponent FP_INTEGER_EXP FR f1 sign FP_SIGN_POSITIVE fp_update_psr f1 FP Exceptions None Interruptions Illegal Operation faul...

Page 1012: ...Invalid Operation is signaled in the same manner as for the fpcmp lt operation The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else fpminmax_exception_fault_check f2 f3 sf tmp_fp_env if fp_raise_fault...

Page 1013: ...3 114 Volume 3 Instruction Reference fpmin Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 1014: ...g the rounding mode specified by FPSR sf rc The pair of rounded results are stored in the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If either FR f3 or FR f4 is a NaTVal FR f1 is set to NaTVal instead of the computed results The mnemonic values for sf are given in Table 2 23 on page 3 56 Th...

Page 1015: ...erformed The mnemonic values for sf are given in Table 2 23 on page 3 56 The encodings and interpretation for the status field s rc are given in Table 5 6 on page 1 90 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 f4 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 fp_is_natval FR f4 FR f1 NATVAL fp_update_psr f1 else tmp_de...

Page 1016: ...VE fp_update_fpsr sf tmp_fp_env fp_update_psr f1 if fp_raise_traps tmp_fp_env fp_exception_trap fp_decode_trap tmp_fp_env FP Exceptions Invalid Operation V Underflow U Denormal Unnormal Operand D Overflow O Software Assist SWA fault Inexact I Software Assist SWA trap Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault Floating point Exception...

Page 1017: ...recision values in the significand field of FR f3 are negated and stored in the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation See fpmerge Floating point Parallel Merge on page 3 111 ...

Page 1018: ...he pair of single precision values in the significand field of FR f3 are computed negated and stored in the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation See fpmerge Floating point Parallel Merge on page 3 111...

Page 1019: ...rounded to single precision using the rounding mode specified by FPSR sf rc The pair of rounded results are stored in the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If any of FR f3 FR f4 or FR f2 is a NaTVal FR f1 is set to NaTVal instead of the computed result Note If f2 is f0 in the fpnma...

Page 1020: ...env tmp_res_hi fp_ieee_round_sp tmp_res HIGH tmp_fp_env if fp_is_nan_or_inf tmp_default_result_pair lo tmp_res_lo fp_single tmp_default_result_pair lo else tmp_res fp_mul fp_reg_read_lo f3 fp_reg_read_lo f4 tmp_res sign tmp_res sign if f2 0 tmp_res fp_add tmp_res fp_reg_read_lo f2 tmp_fp_env tmp_res_lo fp_ieee_round_sp tmp_res LOW tmp_fp_env FR f1 significand fp_concatenate tmp_res_hi tmp_res_lo F...

Page 1021: ...ision using the rounding mode specified by FPSR sf rc The pair of rounded results are stored in the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If either FR f3 or FR f4 is a NaTVal FR f1 is set to NaTVal instead of the computed results The mnemonic values for sf are given in Table 2 23 on pa...

Page 1022: ...allel frcpa instruction and merge the results into FR f1 keeping PR p2 cleared The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If either FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result and PR p2 is cleared The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp f...

Page 1023: ..._is_finite den tmp_res FP_INFINITY tmp_res sign num sign den sign tmp_pred_lo 0 else if fp_is_finite num fp_is_inf den tmp_res FP_ZERO tmp_res sign num sign den sign tmp_pred_lo 0 else if fp_is_zero num fp_is_finite den tmp_res FP_ZERO tmp_res sign num sign den sign tmp_pred_lo 0 else tmp_res fp_ieee_recip den if limits_check lo_fr2_or_quot tmp_pred_lo 0 else tmp_pred_lo 1 tmp_res_lo fp_single tmp...

Page 1024: ...lume 3 Instruction Reference 3 125 fprcpa Denormal Unnormal Operand D Software Assist SWA fault Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 1025: ...when PR p2 is cleared user software is expected to compute the square root for each half using the non parallel frsqrta instruction and merge the results in FR f1 keeping PR p2 cleared The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result and PR p2 is cl...

Page 1026: ...is_pos_inf tmp_fr3 tmp_res FP_ZERO tmp_pred_lo 0 else tmp_res fp_ieee_recip_sqrt tmp_fr3 if limits_check lo tmp_pred_lo 0 else tmp_pred_lo 1 tmp_res_lo fp_single tmp_res FR f1 significand fp_concatenate tmp_res_hi tmp_res_lo FR f1 exponent FP_INTEGER_EXP FR f1 sign FP_SIGN_POSITIVE PR p2 tmp_pred_hi tmp_pred_lo fp_update_fpsr sf tmp_fp_env fp_update_psr f1 else PR p2 0 FP Exceptions Invalid Operat...

Page 1027: ...re is expected to compute the IEEE 754 quotient FR f2 FR f3 return the result in FR f1 and set PR p2 to 0 If either FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result and PR p2 is cleared The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tm...

Page 1028: ...x19e 0x19a 0x197 0x193 0x18f 0x18b 0x187 0x183 0x17f 0x17c 0x178 0x174 0x171 0x16d 0x169 0x166 0x162 0x15e 0x15b 0x157 0x154 0x150 0x14d 0x149 0x146 0x142 0x13f 0x13b 0x138 0x134 0x131 0x12e 0x12a 0x127 0x124 0x120 0x11d 0x11a 0x117 0x113 0x110 0x10d 0x10a 0x107 0x103 0x100 0x0fd 0x0fa 0x0f7 0x0f4 0x0f1 0x0ee 0x0eb 0x0e8 0x0e5 0x0e2 0x0df 0x0dc 0x0d9 0x0d6 0x0d3 0x0d0 0x0cd 0x0ca 0x0c8 0x0c5 0x0c2...

Page 1029: ...nce frcpa return tmp_res FP Exceptions Invalid Operation V Zero Divide Z Denormal Unnormal Operand D Software Assist SWA fault Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 1030: ...754 square root result then a Floating point Exception fault for Software Assist occurs System software is expected to compute the IEEE 754 square root return the result in FR f1 and set PR p2 to 0 If FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result and PR p2 is cleared The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp fp_check_target_registe...

Page 1031: ...07 0x005 0x003 0x001 0x3fc 0x3f4 0x3ec 0x3e5 0x3dd 0x3d5 0x3ce 0x3c7 0x3bf 0x3b8 0x3b1 0x3aa 0x3a3 0x39c 0x395 0x38e 0x388 0x381 0x37a 0x374 0x36d 0x367 0x361 0x35a 0x354 0x34e 0x348 0x342 0x33c 0x336 0x330 0x32b 0x325 0x31f 0x31a 0x314 0x30f 0x309 0x304 0x2fe 0x2f9 0x2f4 0x2ee 0x2e9 0x2e4 0x2df 0x2da 0x2d5 0x2d0 0x2cb 0x2c6 0x2c1 0x2bd 0x2b8 0x2b3 0x2ae 0x2aa 0x2a5 0x2a1 0x29c 0x298 0x293 0x28f 0...

Page 1032: ...Volume 3 Instruction Reference 3 133 frsqrta Interruptions Illegal Operation fault Floating point Exception fault Disabled Floating point Register fault ...

Page 1033: ... to the biased exponent for 2 063 0x1003E The sign bit field of FR f1 is set to positive 0 If any of FR f3 FR f4 or FR f2 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 f4 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 fp_is_natval FR f4 FR f1 NATVAL else FR ...

Page 1034: ...ally AND ing the sf0 controls and amask7 immediate field and logically OR ing the omask7 immediate field The mnemonic values for sf are given in Table 2 23 on page 3 56 Operation if PR qp tmp_controls AR FPSR sf0 controls amask7 omask7 if is_reserved_field FSETC sf tmp_controls reserved_register_field_fault fp_set_sf_controls sf tmp_controls FP Exceptions None Interruptions Reserved Register Field...

Page 1035: ...sf wre using the rounding mode specified by FPSR sf rc and placed in FR f1 If either FR f3 or FR f2 is a NaTVal FR f1 is set to NaTVal instead of the computed result The mnemonic values for the opcode s pc are given in Table 2 22 on page 3 56 The mnemonic values for sf are given in Table 2 23 on page 3 56 For the encodings and interpretation of the status field s pc wre and rc refer to Table 5 5 a...

Page 1036: ...on value is negated For the swap_nr_form the left single precision value in FR f2 is concatenated with the right single precision value in FR f3 The concatenated pair is then swapped and the right single precision value is negated For all forms the exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 For all forms if either FR f2 ...

Page 1037: ...icand 31 31 FR f3 significand 30 0 tmp_res_lo FR f2 significand 63 32 else swap_nr_form tmp_res_hi FR f3 significand 31 0 tmp_res_lo FR f2 significand 63 31 FR f2 significand 62 32 FR f1 significand fp_concatenate tmp_res_hi tmp_res_lo FR f1 exponent FP_INTEGER_EXP FR f1 sign FP_SIGN_POSITIVE fp_update_psr f1 FP Exceptions None Interruptions Illegal Operation fault Disabled Floating point Register...

Page 1038: ...n FR f3 For all forms the exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sign field of FR f1 is set to positive 0 For all forms if either FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Figure 2 21 Floating point Sign Extend Left Figure 2 22 Floating point Sign Extend Right 81 0 80 64 81 0 80 64 63 81 0 80 64 63 FR f2 FR f3 FR f1 31...

Page 1039: ...if sxt_l_form tmp_res_hi FR f2 significand 63 0xFFFFFFFF 0x00000000 tmp_res_lo FR f3 significand 63 32 else sxt_r_form tmp_res_hi FR f2 significand 31 0xFFFFFFFF 0x00000000 tmp_res_lo FR f3 significand 31 0 FR f1 significand fp_concatenate tmp_res_hi tmp_res_lo FR f1 exponent FP_INTEGER_EXP FR f1 sign FP_SIGN_POSITIVE fp_update_psr f1 FP Exceptions None Interruptions Illegal Operation fault Disabl...

Page 1040: ... of any prior stores is completed An fwb instruction does not ensure ordering of stores since later stores may be flushed before prior stores To ensure prior coalesced stores are made visible before later stores software must issue a release operation between stores see Table 4 15 on page 2 83 for a list of release operations This instruction can be used to help ensure stores held in write or coal...

Page 1041: ...he sign field of FR f1 is set to positive 0 If either of FR f2 or FR f3 is a NaTVal FR f1 is set to NaTVal instead of the computed result Operation if PR qp fp_check_target_register f1 if tmp_isrcode fp_reg_disabled f1 f2 f3 0 disabled_fp_register_fault tmp_isrcode 0 if fp_is_natval FR f2 fp_is_natval FR f3 FR f1 NATVAL else FR f1 significand FR f2 significand FR f3 significand FR f1 exponent FP_I...

Page 1042: ...d Figure 5 8 on page 1 95 respectively In the single_form the most significant 32 bits of GR r1 are set to 0 In the exponent_form the exponent field of FR f2 is copied to bits 16 0 of GR r1 and the sign bit of the value in FR f2 is copied to bit 17 of GR r1 The most significant 46 bits of GR r1 are set to zero In the significand_form the significand field of the value in FR f2 is copied to GR r1 F...

Page 1043: ...gle_form GR r1 31 0 fp_fr_to_mem_format FR f2 4 0 GR r1 63 32 0 else if double_form GR r1 fp_fr_to_mem_format FR f2 8 0 else if exponent_form GR r1 63 18 0 GR r1 16 0 FR f2 exponent GR r1 17 FR f2 sign else significand_form GR r1 FR f2 significand if fp_is_natval FR f2 GR r1 nat 1 else GR r1 nat 0 Interruptions Illegal Operation fault Disabled Floating point Register fault ...

Page 1044: ...pinning or performing low priority tasks This hint can be used by the processor to allocate more resources or time to another executing stream on the same processor For the case where the currently executing stream is spinning or otherwise waiting for a particular address in memory to change an advanced load to that address should be done before executing a hint pause this hint can be used by the ...

Page 1045: ...T are invalidated In the complete_form all ALAT entries are invalidated In the entry_form the ALAT is queried using the general register specifier r1 gr_form or the floating point register specifier f1 fr_form and if any ALAT entry matches it is invalidated Operation if PR qp if complete_form alat_inval else entry_form if gr_form alat_inval_single_entry GENERAL r1 else fr_form alat_inval_single_en...

Page 1046: ... algorithm The visibility of the itc instruction to externally generated purges ptc g ptc ga must occur before subsequent memory operations From a software perspective this is similar to acquire semantics Serialization is still required to observe the side effects of a translation being present itc must be the last instruction in an instruction group otherwise its behavior including its ordering s...

Page 1047: ..._entries tmp_rid tmp_va tmp_size slot tlb_replacement_algorithm ITC_TYPE tlb_insert_inst slot GR r2 CR ITIR CR IFA tmp_rid TC else data_form tlb_must_purge_dtc_entries tmp_rid tmp_va tmp_size tlb_may_purge_itc_entries tmp_rid tmp_va tmp_size slot tlb_replacement_algorithm DTC_TYPE tlb_insert_data slot GR r2 CR ITIR CR IFA tmp_rid TC Interruptions Machine Check abort Reserved Register Field fault I...

Page 1048: ...xplicit ptr instructions before inserting the new TR entry This instruction can only be executed at the most privileged level and when PSR ic and PSR vm are both 0 Operation if PR qp if PSR ic illegal_operation_fault if PSR cpl 0 privileged_operation_fault 0 if GR r3 nat GR r2 nat register_nat_consumption_fault 0 slot GR r3 7 0 tmp_size CR ITIR ps tmp_va CR IFA 60 0 tmp_rid RR CR IFA 63 61 rid tmp...

Page 1049: ...an instruction serialization operation before a dependent instruction fetch access For the data_form software must issue a data serialization operation before issuing a data access or non access reference dependent on the new translation Notes The processor may use invalid translation registers for translation cache entries Performance can be improved on some processor models by ensuring translati...

Page 1050: ..._form are none and acq For the fill_form an 8 byte value is loaded and a bit in the UNAT application register is copied into the target register NaT bit This instruction is used for reloading a spilled register NaT pair See Section 4 4 4 Control Speculation on page 1 60 for details In the base update forms the value in GR r3 is added to either a signed immediate value imm9 or a value from GR r2 an...

Page 1051: ...ceptions may be deferred Deferral causes the target register s NaT bit to be set and the processor ensures that no ALAT entry exists for the target register The absence of an ALAT entry is later used to detect deferral or collision c nc Check load no clear The ALAT is searched for a matching entry If found no load is done and the target register is unchanged Regardless of ALAT hit or miss base reg...

Page 1052: ...NaTPage is optional On processor models that do not support such ld16 accesses an Unsupported Data Reference fault is raised when an unsupported reference is attempted For the sixteen_byte_form Illegal Operation fault is raised on processor models that do not support the instruction CPUID register 4 indicates the presence of the feature on the processor model See Section 3 1 11 Processor Identific...

Page 1053: ...1 r3 illegal_operation_fault check_target_register r1 if reg_base_update_form imm_base_update_form check_target_register r3 if reg_base_update_form tmp_r2 GR r2 tmp_r2nat GR r2 nat if speculative GR r3 nat fault on NaT address register_nat_consumption_fault itype defer speculative GR r3 nat PSR ed defer exception if spec if check alat_cmp GENERAL r1 translate_address alat_translate_address_on_hit ...

Page 1054: ...m fill NaT on ld8 fill bit_pos GR r3 8 3 GR r1 val GR r1 nat AR UNAT bit_pos else clear NaT on other types if size 16 GR r1 val AR CSD val_ar else GR r1 zero_ext val size 8 GR r1 nat 0 if check_no_clear advanced ma_is_speculative mattr add entry to ALAT alat_write ldtype GENERAL r1 paddr size if imm_base_update_form update base register GR r3 GR r3 sign_ext imm9 9 GR r3 nat GR r3 nat else if reg_b...

Page 1055: ...sumption fault Data Key Miss fault Unimplemented Data Address fault Data Key Permission fault Data Nested TLB fault Data Access Rights fault Alternate Data TLB fault Data Access Bit fault VHPT Data fault Data Debug fault Data TLB fault Unaligned Data Reference fault Data Page Not Present fault Unsupported Data Reference fault ...

Page 1056: ...the fill_form a 16 byte value is loaded and the appropriate fields are placed in FR f1 without conversion This instruction is used for reloading a spilled register See Section 4 4 4 Control Speculation on page 1 60 for details In the base update forms the value in GR r3 is added to either a signed immediate value imm9 or a value from GR r2 and the result is placed back in GR r3 This base register ...

Page 1057: ...r a cacheable page with write back policy nor a NaTPage is optional On processor models that do not support such ldfe accesses an Unsupported Data Reference fault is raised when an unsupported reference is attempted The fault is delivered only on the normal advanced and check load flavors Control speculative flavors of ldfe always defer the Unsupported Data Reference fault sa Speculative Advanced ...

Page 1058: ...t PSR ed defer exception if spec if check alat_cmp FLOAT f1 translate_address alat_translate_address_on_hit fldtype FLOAT f1 read_memory alat_read_memory_on_hit fldtype FLOAT f1 if translate_address if check_clear advanced remove any old ALAT entry alat_inval_single_entry FLOAT f1 else if defer paddr tlb_translate GR r3 size itype PSR cpl mattr defer spontaneous_deferral paddr size UM be mattr UNO...

Page 1059: ... GR r3 ldhint itype fp_update_psr f1 Interruptions Illegal Operation fault Data NaT Page Consumption fault Disabled Floating point Register fault Data Key Miss fault Register NaT Consumption fault Data Key Permission fault Unimplemented Data Address fault Data Access Rights fault Data Nested TLB fault Data Access Bit fault Alternate Data TLB fault Data Debug fault VHPT Data fault Unaligned Data Re...

Page 1060: ...lue in GR r3 is added to an implied immediate value equal to double the data size and the result is placed back in GR r3 This base register update is done after the load and does not affect the load address The value of the ldhint modifier specifies the locality of the memory access The mnemonic values of ldhint are given in Table 2 34 on page 3 152 A prefetch hint is implied in the base update fo...

Page 1061: ... fault on NaT address register_nat_consumption_fault itype defer speculative GR r3 nat PSR ed defer exception if spec if check alat_cmp FLOAT f1 translate_address alat_translate_address_on_hit fldtype FLOAT f1 read_memory alat_read_memory_on_hit fldtype FLOAT f1 if translate_address if check_clear advanced remove any old ALAT entry alat_inval_single_entry FLOAT f1 else if defer paddr tlb_translate...

Page 1062: ... register GR r3 GR r3 size GR r3 nat GR r3 nat if GR r3 nat mem_implicit_prefetch GR r3 ldhint itype fp_update_psr f1 fp_update_psr f2 Interruptions Illegal Operation fault Data Page Not Present fault Disabled Floating point Register fault Data NaT Page Consumption fault Register NaT Consumption fault Data Key Miss fault Unimplemented Data Address fault Data Key Permission fault Data Nested TLB fa...

Page 1063: ...he completer lftype specifies whether or not the instruction raises faults normally associated with a regular load Table 2 37 defines these two options In the base update forms after being used to address memory the value in GR r3 is incremented by either the sign extended value in imm9 in the imm_base_update_form or the value in GR r2 in the reg_base_update_form In the reg_base_update_form if the...

Page 1064: ...andled invisibly e g if handling the fault would involve terminating the program the OS must return to the interrupted program skipping over the data prefetch This can easily be done by setting the IPSR ed bit to 1 before executing an rfi to go back to the process which will allow the lfetch fault to perform its base register post increment if specified but will suppress any prefetch request and h...

Page 1065: ...r if defer mem_promote paddr mattr lfhint excl_hint if imm_base_update_form GR r3 GR r3 sign_ext imm9 9 GR r3 nat GR r3 nat else if reg_base_update_form GR r3 GR r3 GR r2 GR r3 nat GR r2 nat GR r3 nat if reg_base_update_form imm_base_update_form GR r3 nat mem_implicit_prefetch GR r3 lfhint excl_hint itype Interruptions Illegal Operation fault Data Page Not Present fault Register NaT Consumption fa...

Page 1066: ...ion will fault with an Illegal Operation fault under any of the following conditions the RSE is not in enforced lazy mode RSC mode is non zero CFM sof and RSC loadrs are both non zero an attempt is made to load up more registers than are available in the physical stacked register file This instruction must be the first instruction in an instruction group and must either be in instruction slot 0 or...

Page 1067: ...m prevents any subsequent data memory accesses by the processor from initiating transactions to the external platform until all prior loads to sequential pages have returned data and all prior stores to sequential pages have been accepted by the external platform The definition of acceptance is platform dependent The acceptance_form is typically used to ensure the processor has waited until a memo...

Page 1068: ...ix4 r r1 r2 r3 four_byte_form right_form I2 Description The data elements of GR r2 and r3 are mixed as shown in Figure 2 25 and the result placed in GR r1 The data elements in the source registers are grouped in pairs and one element from each pair is selected for the result In the left_form the result is formed from the leftmost elements from each of the pairs In the right_form the result is form...

Page 1069: ...olume 3 Instruction Reference mix Figure 2 25 Mix Examples GR r2 GR r1 GR r3 mix1 l GR r2 GR r1 GR r3 GR r2 GR r1 GR r3 GR r2 GR r1 GR r3 mix1 r GR r2 GR r1 GR r3 mix2 l mix2 r GR r2 GR r1 GR r3 mix4 l mix4 r ...

Page 1070: ...8 x 7 y 7 x 5 y 5 x 3 y 3 x 1 y 1 else right_form GR r1 concatenate8 x 6 y 6 x 4 y 4 x 2 y 2 x 0 y 0 else if two_byte_form two byte elements x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 if left_form GR r1 concatenate4 x 3 y 3 x 1 y 1 else right_form GR r1 concatenate4 x 2 y 2 x 0 y 0 else four byte elements x 0 GR r2 ...

Page 1071: ...gn extended value in imm8 in the immediate_form is placed in AR ar3 In the register_form if the NaT bit corresponding to GR r2 is set then a Register NaT Consumption fault is raised Only a subset of the application registers can be accessed by each execution unit M or I Table 3 3 on page 1 28 indicates which application registers may be accessed from which execution unit type An access to an appli...

Page 1072: ...3 ar3 BSPSTORE ar3 RNAT AR RSC mode 0 illegal_operation_fault if register_form GR r2 nat register_nat_consumption_fault 0 if is_reserved_field AR_TYPE ar3 tmp_val reserved_register_field_fault if is_kernel_reg ar3 ar3 ITC ar3 RUC PSR cpl 0 privileged_register_fault if ar3 ITC ar3 RUC PSR vm 1 virtualization_fault if is_ignored_reg ar3 tmp_val ignored_field_mask AR_TYPE ar3 tmp_val check for illega...

Page 1073: ...the value being moved into BR b1 The return_form is used to provide the hint that this value will be used in a return type branch The values for the mwh whether hint completer are given in Table 2 39 For a description of the ih hint completer see the Branch Prediction instruction and Table 2 13 on page 3 32 A pseudo op is provided for copying a general register into a branch register when there is...

Page 1074: ...l Operation fault Operation if PR qp if is_reserved_reg CR_TYPE cr3 to_form is_read_only_reg CR_TYPE cr3 PSR ic is_interruption_cr cr3 illegal_operation_fault if from_form check_target_register r1 if PSR cpl 0 privileged_operation_fault 0 if from_form if PSR vm 1 virtualization_fault if cr3 IVR check_interrupt_request if cr3 ITIR GR r1 impl_itir_cwi_mask CR ITIR else GR r1 CR cr3 GR r1 nat 0 else ...

Page 1075: ... fault Serialization Reads of control registers reflect the results of all prior instruction groups and interruptions In general writes to control registers do not immediately affect subsequent instructions Software must issue a serialize operation before a dependent instruction uses a modified resource Control register writes are not implicitly synchronized with a corresponding control register r...

Page 1076: ...ion Reference 3 177 mov fr mov Move Floating point Register Format qp mov f1 f3 pseudo op of qp fmerge s f1 f3 f3 Description The value of FR f3 is copied to FR f1 Operation See fmerge Floating point Merge on page 3 80 ...

Page 1077: ... 178 Volume 3 Instruction Reference mov gr mov Move General Register Format qp mov r1 r3 pseudo op of qp adds r1 0 r3 Description The value of GR r3 is copied to GR r1 Operation See add Add on page 3 14 ...

Page 1078: ...n Reference 3 179 mov imm mov Move Immediate Format qp mov r1 imm22 pseudo op of qp addl r1 imm22 r0 Description The immediate value imm22 is sign extended to 64 bits and placed in GR r1 Operation See add Add on page 3 14 ...

Page 1079: ...PU identification registers can only be read There is no to_form of this instruction For move to protection key register the processor ensures uniqueness of protection keys by checking new valid protection keys against all protection key registers If any matching keys are found duplicate protection keys are invalidated Apart from the PMC and PMD register files access of a non existent register res...

Page 1080: ...ex break case PKR_TYPE GR r1 PKR tmp_index break case PMC_TYPE GR r1 pmc_read tmp_index break case RR_TYPE GR r1 RR tmp_index break GR r1 nat 0 else to_form if PSR cpl 0 privileged_operation_fault 0 if GR r2 nat GR r3 nat register_nat_consumption_fault 0 if is_reserved_reg ireg tmp_index ireg CPUID_TYPE is_reserved_field ireg tmp_index GR r2 reserved_register_field_fault if PSR vm 1 virtualization...

Page 1081: ...ng a memory reference dependent on the modified register For move to instruction breakpoint registers software must issue an instruction serialize operation before fetching an instruction dependent on the modified register For move to protection key region performance monitor configuration and performance monitor data registers software must issue an instruction or data serialize operation to ensu...

Page 1082: ...ve Instruction Pointer Format qp mov r1 ip I25 Description The Instruction Pointer IP for the bundle containing this instruction is copied into GR r1 Operation if PR qp check_target_register r1 GR r1 IP GR r1 nat 0 Interruptions Illegal Operation fault ...

Page 1083: ...ferred exception for GR r2 the NaT bit is 1 a Register NaT Consumption fault is taken In the to_rotate_form only the 48 rotating predicates can be written The source operand is taken from the imm44 operand which is encoded in the instruction in an imm28 field such that imm28 imm44 16 The low 16 bits correspond to the static predicates The immediate is sign extended to set the top 21 predicates Bit...

Page 1084: ... vm is 0 The contents of the interruption resources that are overwritten when the PSR ic bit is 1 are undefined if an interruption occurs between the enabling of the PSR ic bit and a subsequent instruction serialize operation Operation if PR qp if from_form check_target_register r1 if PSR cpl 0 privileged_operation_fault 0 if from_form if PSR vm 1 virtualization_fault tmp_val zero_ext PSR 31 0 32 ...

Page 1085: ...wise PSR up is not modified Writing a non zero value into any other parts of the PSR results in a Reserved Register Field fault Operation if PR qp if from_form check_target_register r1 GR r1 zero_ext PSR 5 0 6 GR r1 nat 0 else to_form if GR r2 nat register_nat_consumption_fault 0 if is_reserved_field PSR_TYPE PSR_UM GR r2 reserved_register_field_fault PSR 1 0 GR r2 1 0 if PSR sp 0 unsecured perf m...

Page 1086: ...ong Immediate Format qp movl r1 imm64 X2 Description The immediate value imm64 is copied to GR r1 The L slot of the bundle contains 41 bits of imm64 Operation if PR qp check_target_register r1 GR r1 imm64 GR r1 nat 0 Interruptions Illegal Operation fault ...

Page 1087: ...rands are treated as unsigned values and are multiplied and the result is placed in GR r1 The upper 32 bits of each of the source operands are ignored Operation if PR qp if instruction_implemented mpy4 illegal_operation_fault check_target_register r1 GR r1 zero_ext GR r2 32 zero_ext GR r3 32 GR r1 nat GR r2 nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1088: ...ts of GR r2 and the upper 32 bits of GR r3 are ignored This instruction can be used to perform a 64 bit integer multiply operation producing a 64 bit result rc ra rb mpy4 r1 ra rb partial product low 32 bits low 32 bits mpyshl4 r2 ra rb partial product high 32 bits low 32 bits mpyshl4 r3 rb ra partial product low 32 bits high 32 bits add r1 r1 r2 partial sum add rc r1 r3 final sum Operation if PR ...

Page 1089: ...given in Table 2 41 and shown in Figure 2 26 Table 2 41 Mux Permutations for 8 bit Elements mbtype4 Function rev Reverse the order of the bytes mix Perform a Mix operation on the two halves of GR r2 shuf Perform a Shuffle operation on the two halves of GR r2 alt Perform an Alternate operation on the two halves of GR r2 brcst Perform a Broadcast operation on the least significand byte of GR r2 Figu...

Page 1090: ...opied to corresponding 16 bit positions in the target register GR r1 The indices are encoded in little endian order The 8 bits of mhtype8 7 0 are grouped in pairs of bits and named mhtype8 3 mhtype8 2 mhtype8 1 mhtype8 0 in the Operation section Figure 2 27 Mux2 Examples 16 bit elements GR r1 GR r2 mux2 r1 r2 0x8d shuffle 10 00 11 01 GR r1 GR r2 mux2 r1 r2 0x1b reverse 00 01 10 11 GR r1 GR r2 mux2...

Page 1091: ...R r1 concatenate8 x 7 x 3 x 5 x 1 x 6 x 2 x 4 x 0 break case shuf GR r1 concatenate8 x 7 x 3 x 6 x 2 x 5 x 1 x 4 x 0 break case alt GR r1 concatenate8 x 7 x 5 x 3 x 1 x 6 x 4 x 2 x 0 break case brcst GR r1 concatenate8 x 0 x 0 x 0 x 0 x 0 x 0 x 0 x 0 break else two_byte_form x 0 GR r2 15 0 x 1 GR r2 31 16 x 2 GR r2 47 32 x 3 GR r2 63 48 res 0 x mhtype8 1 0 res 1 x mhtype8 3 2 res 2 x mhtype8 5 4 r...

Page 1092: ...mm62 can be used by software as a marker in program code It is ignored by hardware For the x_unit_form the L slot of the bundle contains the upper 41 bits of imm62 A nop i instruction may be encoded in an MLI template bundle in which case the L slot of the bundle is ignored This instruction has five forms each of which can be executed only on a particular execution unit type The pseudo op can be u...

Page 1093: ...ly ORed and the result placed in GR r1 In the register form the first operand is GR r2 in the immediate form the first operand is taken from the imm8 encoding field Operation if PR qp check_target_register r1 tmp_src register_form GR r2 sign_ext imm8 8 tmp_nat register_form GR r2 nat 0 GR r1 tmp_src GR r3 GR r1 nat tmp_nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1094: ... source element cannot be represented in the result element then saturation clipping is performed The saturation can either be signed or unsigned If an element is larger than the upper limit value the result is the upper limit value If it is smaller than the lower limit value the result is the lower limit value The saturation limits are given in Table 2 42 Table 2 42 Pack Saturation Limits Size So...

Page 1095: ...16 16 temp 6 sign_ext GR r3 47 32 16 temp 7 sign_ext GR r3 63 48 16 for i 0 i 8 i if temp i max temp i max if temp i min temp i min GR r1 concatenate8 temp 7 temp 6 temp 5 temp 4 temp 3 temp 2 temp 1 temp 0 else four_byte_form max sign_ext 0x7fff 16 signed_saturation_form min sign_ext 0x8000 16 temp 0 sign_ext GR r2 31 0 32 temp 1 sign_ext GR r2 63 32 32 temp 2 sign_ext GR r3 31 0 32 temp 3 sign_e...

Page 1096: ...represented in the result element and a saturation completer is specified then saturation clipping is performed The saturation can either be signed or unsigned as given in Table 2 43 If the sum of two elements is larger than the upper limit value the result is the upper limit value If it is smaller than the lower limit value the result is the lower limit value The saturation limits are given in Ta...

Page 1097: ...ext 0x80 8 for i 0 i 8 i temp i sign_ext x i 8 sign_ext y i 8 else if uus_saturation_form max 0xff min 0x00 for i 0 i 8 i temp i zero_ext x i 8 sign_ext y i 8 else if uuu_saturation_form max 0xff min 0x00 for i 0 i 8 i temp i zero_ext x i 8 zero_ext y i 8 else modulo_form for i 0 i 8 i temp i zero_ext x i 8 zero_ext y i 8 if sss_saturation_form uus_saturation_form uuu_saturation_form for i 0 i 8 i...

Page 1098: ...turation_form max 0xffff min 0x0000 for i 0 i 4 i temp i zero_ext x i 16 zero_ext y i 16 else modulo_form for i 0 i 4 i temp i zero_ext x i 16 zero_ext y i 16 if sss_saturation_form uus_saturation_form uuu_saturation_form for i 0 i 4 i if temp i max temp i max if temp i min temp i min GR r1 concatenate4 temp 3 temp 2 temp 1 temp 0 else four byte elements x 0 GR r2 31 0 y 0 GR r3 31 0 x 1 GR r2 63 ...

Page 1099: ...3 200 Volume 3 Instruction Reference padd Interruptions Illegal Operation fault ...

Page 1100: ...e bit position The high order bits of each element are filled with the carry bits of the sums To prevent cumulative round off errors an averaging is performed The unsigned results are placed in GR r1 The averaging operation works as follows In the normal_form the low order bit of each result is set to 1 if at least one of the two least significant bits of the corresponding sum is 1 In the raz_form...

Page 1101: ...lume 3 Instruction Reference pavg Figure 2 31 Parallel Average with Round Away from Zero Example GR r2 GR r1 GR r3 pavg2 raz Sum Bits Carry Bit 1 1 1 1 16 bit Sum Plus Carry Shift Right 1 Bit Shift Right 1 Bit ...

Page 1102: ... else normal form for i 0 i 8 i temp i zero_ext x i 8 zero_ext y i 8 res i shift_right_unsigned temp i 1 temp i 0 GR r1 concatenate8 res 7 res 6 res 5 res 4 res 3 res 2 res 1 res 0 else two_byte_form x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 if raz_form for i 0 i 4 i temp i zero_ext x i 16 zero_ext y i 16 1 res i s...

Page 1103: ... position The high order bits of each element are filled with the borrow bits of the subtraction the complements of the ALU carries To prevent cumulative round off errors an averaging is performed The low order bit of each result is set to 1 if at least one of the two least significant bits of the corresponding difference is 1 The signed results are placed in GR r1 Figure 2 32 Parallel Average Sub...

Page 1104: ...48 x 7 GR r2 63 56 y 7 GR r3 63 56 for i 0 i 8 i temp i zero_ext x i 8 zero_ext y i 8 res i temp i 8 0 u 1 temp i 0 GR r1 concatenate8 res 7 res 6 res 5 res 4 res 3 res 2 res 1 res 0 else two_byte_form x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 for i 0 i 4 i temp i zero_ext x i 16 zero_ext y i 16 res i temp i 16 0 u...

Page 1105: ...GR r2 and GR r3 then the corresponding data element in GR r1 is set to all ones If the comparison condition is false then the corresponding data element in GR r1 is set to all zeros For the relation both operands are interpreted as signed Table 2 45 Pcmp Relations prel Compare Relation r2 prel r3 eq r2 r3 gt r2 r3 signed Figure 2 33 Parallel Compare Examples GR r2 GR r1 GR r3 pcmp2 eq True False T...

Page 1106: ...f tmp_rel res i 0xff else res i 0x00 GR r1 concatenate8 res 7 res 6 res 5 res 4 res 3 res 2 res 1 res 0 else if two_byte_form two byte elements x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 for i 0 i 4 i if prel eq tmp_rel x i y i else gt tmp_rel greater_signed sign_ext x i 16 sign_ext y i 16 if tmp_rel res i 0xffff el...

Page 1107: ...3 208 Volume 3 Instruction Reference pcmp else res i 0x00000000 GR r1 concatenate2 res 1 res 0 GR r1 nat GR r2 nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1108: ...R r2 is compared with the corresponding unsigned 8 bit element of GR r3 and the greater of the two is placed in the corresponding 8 bit element of GR r1 In the two_byte_form each signed 16 bit element of GR r2 is compared with the corresponding signed 16 bit element of GR r3 and the greater of the two is placed in the corresponding 16 bit element of GR r1 Figure 2 34 Parallel Maximum Examples GR r...

Page 1109: ...x 6 GR r2 55 48 y 6 GR r3 55 48 x 7 GR r2 63 56 y 7 GR r3 63 56 for i 0 i 8 i res i zero_ext x i 8 zero_ext y i 8 y i x i GR r1 concatenate8 res 7 res 6 res 5 res 4 res 3 res 2 res 1 res 0 else two byte elements x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 for i 0 i 4 i res i sign_ext x i 16 sign_ext y i 16 y i x i GR...

Page 1110: ...R r2 is compared with the corresponding unsigned 8 bit element of GR r3 and the smaller of the two is placed in the corresponding 8 bit element of GR r1 In the two_byte_form each signed 16 bit element of GR r2 is compared with the corresponding signed 16 bit element of GR r3 and the smaller of the two is placed in the corresponding 16 bit element of GR r1 Figure 2 35 Parallel Minimum Examples GR r...

Page 1111: ...x 6 GR r2 55 48 y 6 GR r3 55 48 x 7 GR r2 63 56 y 7 GR r3 63 56 for i 0 i 8 i res i zero_ext x i 8 zero_ext y i 8 x i y i GR r1 concatenate8 res 7 res 6 res 5 res 4 res 3 res 2 res 1 res 0 else two byte elements x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 for i 0 i 4 i res i sign_ext x i 16 sign_ext y i 16 x i y i GR...

Page 1112: ...32 bit results are placed in GR r1 Operation if PR qp check_target_register r1 if right_form GR r1 31 0 sign_ext GR r2 15 0 16 sign_ext GR r3 15 0 16 GR r1 63 32 sign_ext GR r2 47 32 16 sign_ext GR r3 47 32 16 else left_form GR r1 31 0 sign_ext GR r2 31 16 16 sign_ext GR r3 31 16 16 GR r1 63 32 sign_ext GR r2 63 48 16 sign_ext GR r3 63 48 16 GR r1 nat GR r2 nat GR r3 nat Interruptions Illegal Oper...

Page 1113: ... is then shifted to the right count2 bits and the least significant 16 bits of each shifted product form 4 16 bit results which are placed in GR r1 A count2 of 0 gives the 16 low bits of the results a count2 of 16 gives the 16 high bits of the results The allowed values for count2 are given in Table 2 46 Table 2 46 Parallel Multiply and Shift Right Shift Options count2 Selected Bit Field from Each...

Page 1114: ... r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 for i 0 i 4 i if unsigned_form unsigned multiplication temp i zero_ext x i 16 zero_ext y i 16 else signed multiplication temp i sign_ext x i 16 sign_ext y i 16 res i temp i count2 15 count2 GR r1 concatenate4 res 3 res 2 res 1 res 0 GR r1 nat GR r2 nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1115: ... r3 I9 Description The number of bits in GR r3 having the value 1 is counted and the resulting sum is placed in GR r1 Operation if PR qp check_target_register r1 res 0 Count up all the one bits for i 0 i 64 i res GR r3 i GR r1 res GR r1 nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1116: ... bits 60 0 and the region register indexed by GR r3 bits 63 61 is permitted at the privilege level given by either GR r2 bits 1 0 or imm2 If PSR pk is 1 protection key checks are also performed The read or write form specifies whether the instruction checks for read or write access or both When PSR dt is 0 a regular_form probe uses its address operand as a virtual address to query the DTLB only be...

Page 1117: ...robe Instructions Probe Form Type Faults regular_form Register NaT Consumption fault Virtualization faulta Data Nested TLB fault Alternate Data TLB fault VHPT Data fault Data TLB fault Data Page Not Present fault Data NaT Page Consumption fault Data Key Miss fault a This instruction may optionally raise Virtualization faults see Section 11 7 4 2 8 Probe Instruction Virtualization on page 2 344 for...

Page 1118: ...if fault_form tlb_translate GR r3 1 itype tmp_pl mattr defer else regular_form if impl_probe_intercept check_probe_virtualization_fault itype tmp_pl GR r1 tlb_grant_permission GR r3 itype tmp_pl GR r1 nat 0 Interruptions Illegal Operation fault Data Page Not Present fault Register NaT Consumption fault Data NaT Page Consumption fault Unimplemented Data Address fault Data Key Miss fault Virtualizat...

Page 1119: ... Description The unsigned 8 bit elements of GR r2 are subtracted from the unsigned 8 bit elements of GR r3 The absolute value of each difference is accumulated across the elements and placed in GR r1 Figure 2 38 Parallel Sum of Absolute Difference Example psad1 GR r2 GR r1 GR r3 abs abs abs abs abs abs abs abs ...

Page 1120: ... r2 23 16 y 2 GR r3 23 16 x 3 GR r2 31 24 y 3 GR r3 31 24 x 4 GR r2 39 32 y 4 GR r3 39 32 x 5 GR r2 47 40 y 5 GR r3 47 40 x 6 GR r2 55 48 y 6 GR r3 55 48 x 7 GR r2 63 56 y 7 GR r3 63 56 GR r1 0 for i 0 i 8 i temp i zero_ext x i 8 zero_ext y i 8 if temp i 0 temp i temp i GR r1 temp i GR r1 nat GR r2 nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1121: ...er than 15 for 16 bit quantities or 31 for 32 bit quantities yield all zero results The results are placed in GR r1 Operation if PR qp check_target_register r1 shift_count variable_form GR r3 count5 tmp_nat variable_form GR r3 nat 0 if two_byte_form two_byte_form if shift_count u 16 shift_count 16 GR r1 15 0 GR r2 15 0 shift_count GR r1 31 16 GR r2 31 16 shift_count GR r1 47 32 GR r2 47 32 shift_c...

Page 1122: ...as a signed 16 bit value the final result is saturated The four signed 16 bit results are placed in GR r1 The first operand can be shifted by 1 2 or 3 bits Operation if PR qp check_target_register r1 x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 max sign_ext 0x7fff 16 min sign_ext 0x8000 16 for i 0 i 4 i temp i sign_ex...

Page 1123: ...able_form I5 qp pshr4 u r1 r3 count5 unsigned_form four_byte_form fixed_form I6 Description The data elements of GR r3 are each independently shifted to the right by the scalar shift count in GR r2 or in the immediate field count5 The high order bits of each element are filled with either the initial value of the sign bits of the data elements in GR r3 arithmetic shift or zeros logical shift The s...

Page 1124: ...count else signed shift GR r1 15 0 shift_right_signed sign_ext GR r3 15 0 16 shift_count GR r1 31 16 shift_right_signed sign_ext GR r3 31 16 16 shift_count GR r1 47 32 shift_right_signed sign_ext GR r3 47 32 16 shift_count GR r1 63 48 shift_right_signed sign_ext GR r3 63 48 16 shift_count else four_byte_form if shift_count 32 shift_count 32 if unsigned_form unsigned shift GR r1 31 0 shift_right_un...

Page 1125: ...r2 The add operation is performed with signed saturation The four signed 16 bit results of the add are placed in GR r1 The first operand can be shifted by 1 2 or 3 bits Operation if PR qp check_target_register r1 x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 max sign_ext 0x7fff 16 min sign_ext 0x8000 16 for i 0 i 4 i t...

Page 1126: ... represented in the result element and a saturation completer is specified then saturation clipping is performed The saturation can either be signed or unsigned as given in Table 2 48 If the difference of two elements is larger than the upper limit value the result is the upper limit value If it is smaller than the lower limit value the result is the lower limit value The saturation limits are giv...

Page 1127: ...e if uus_saturation_form uus_saturation_form max 0xff min 0x00 for i 0 i 8 i temp i zero_ext x i 8 sign_ext y i 8 else if uuu_saturation_form uuu_saturation_form max 0xff min 0x00 for i 0 i 8 i temp i zero_ext x i 8 zero_ext y i 8 else modulo_form for i 0 i 8 i temp i zero_ext x i 8 zero_ext y i 8 if sss_saturation_form uus_saturation_form uuu_saturation_form for i 0 i 8 i if temp i max temp i max...

Page 1128: ...0 i 4 i temp i zero_ext x i 16 zero_ext y i 16 else modulo_form for i 0 i 4 i temp i zero_ext x i 16 zero_ext y i 16 if sss_saturation_form uus_saturation_form uuu_saturation_form for i 0 i 4 i if temp i max temp i max if temp i min temp i min GR r1 concatenate4 temp 3 temp 2 temp 1 temp 0 else four byte elements x 0 GR r2 31 0 y 0 GR r3 31 0 x 1 GR r2 63 32 y 1 GR r3 63 32 for i 0 i 2 i modulo_fo...

Page 1129: ...all processor models Software can acquire parameters through a processor dependent layer that is accessed through a procedural interface The selected region registers must remain unchanged during the loop disable_interrupts addr base for i 0 i count1 i for j 0 j count2 j ptc e addr addr stride2 addr stride1 enable_interrupts This instruction can only be executed at the most privileged level and wh...

Page 1130: ...tc g must be the last instruction in an instruction group otherwise its behavior including its ordering semantics is undefined The behavior of the ptc ga instruction is similar to ptc g In addition to the behavior specified for ptc g the ptc ga instruction encodes an extra bit of information in the broadcast transaction This information specifies the purge is due to a page remapping as opposed to ...

Page 1131: ...on fault Register NaT Consumption fault Serialization The broadcast purge TC is not synchronized with the instruction stream on a remote processor Software cannot depend on any such synchronization with the instruction stream Hardware on the remote machine cannot reload an instruction from memory or cache after acknowledging a broadcast purge TC without first retranslating the I side access in the...

Page 1132: ...translation cache This instruction can only be executed at the most privileged level and when PSR vm is 0 This is a local operation no purge broadcast to other processors occurs in a multiprocessor system This instruction ensures that all prior stores are made locally visible before the actual purge operation is performed Operation if PR qp if PSR cpl 0 privileged_operation_fault 0 if GR r3 nat GR...

Page 1133: ...the purge In addition in both forms the instruction and data translation cache may be purged of more translations than specified by the purge parameters up to and including removal of all entries within the translation cache The purge virtual address is specified by GR r3 bits 60 0 and the purge region identifier is selected by GR r3 bits 63 61 GR r2 specifies the address range of the purge as 1 G...

Page 1134: ...ze tlb_may_purge_itc_entries tmp_rid tmp_va tmp_size else instruction_form tlb_must_purge_itr_entries tmp_rid tmp_va tmp_size tlb_must_purge_itc_entries tmp_rid tmp_va tmp_size tlb_may_purge_dtc_entries tmp_rid tmp_va tmp_size Interruptions Privileged Operation fault Unimplemented Data Address fault Register NaT Consumption fault Virtualization fault Serialization For the data form software must i...

Page 1135: ...is instruction can only be executed at the most privileged level and when PSR vm is 0 This instruction can not be predicated Execution of this instruction is undefined if PSR ic or PSR i are 1 Software must ensure that an interruption cannot occur that could modify IIP IPSR or IFS between when they are written and the subsequent rfi Execution of this instruction is undefined if IPSR ic is 0 and th...

Page 1136: ...n EFLAG rf and PSR id are unmodified until the successful completion of the target IA 32 instruction PSR da PSR dd PSR ia and PSR ed are cleared to zero before the target IA 32 instruction begins execution IA 32 instruction set execution leaves the contents of the ALAT undefined Software can not rely on ALAT state across an instruction set transition On entry to IA 32 code existing entries in the ...

Page 1137: ...date CR IFS ifm sof 0 rse_restore_frame CR IFS ifm sof tmp_growth CFM sof CFM CR IFS ifm rse_enable_current_frame_load IP tmp_IP instruction_serialize if unimplemented_address unimplemented_instruction_address_trap 0 tmp_IP Interruptions Privileged Operation fault Unimplemented Instruction Address trap Virtualization fault Additional Faults on IA 32 target instructions IA_32_Exception GPFault Disa...

Page 1138: ...mediately following the rsm instruction If the qualifying predicate of the rsm is false then external interrupts may be disabled until the next data serialization operation that follows the rsm instruction The external interrupt disable window is guaranteed to be no larger than defined by the above criteria but it may be smaller depending on the processor implementation When the current privilege ...

Page 1139: ...tualization fault Reserved Register Field fault Serialization Software must use a data serialize or instruction serialize operation before issuing instructions dependent upon the altered PSR bits except the PSR i bit The PSR i bit is implicitly serialized and the processor ensures that external interrupts are masked by the time the next instruction executes ...

Page 1140: ...PSR up is only cleared if the secure performance monitor bit PSR sp is zero Otherwise PSR up is not modified Operation if PR qp if is_reserved_field PSR_TYPE PSR_UM imm24 reserved_register_field_fault if imm24 1 PSR 1 0 be if imm24 2 PSR sp 0 non secure perf monitor PSR 2 0 up if imm24 3 PSR 3 0 ac if imm24 4 PSR 4 0 mfl if imm24 5 PSR 5 0 mfh Interruptions Reserved Register Field fault Serializat...

Page 1141: ...ure 5 5 on page 1 93 respectively In the exponent_form bits 16 0 of GR r2 are copied to the exponent field of FR f1 and bit 17 of GR r2 is copied to the sign bit of FR f1 The significand field of FR f1 is set to one 0x800 000 In the significand_form the value in GR r2 is copied to the significand field of FR f1 The exponent field of FR f1 is set to the biased exponent for 2 063 0x1003E and the sig...

Page 1142: ...orm FR f1 fp_mem_to_fr_format GR r2 4 0 else if double_form FR f1 fp_mem_to_fr_format GR r2 8 0 else if significand_form FR f1 significand GR r2 FR f1 exponent FP_INTEGER_EXP FR f1 sign 0 else exponent_form FR f1 significand 0x8000000000000000 FR f1 exp GR r2 16 0 FR f1 sign GR r2 17 else FR f1 NATVAL fp_update_psr f1 Interruptions Illegal Operation fault Disabled Floating point Register fault ...

Page 1143: ...and placed in GR r1 The number of bit positions to shift is specified by the value in GR r3 or by an immediate value count6 The shift count is interpreted as an unsigned number If the value in GR r3 is greater than 63 then the result is all zeroes See dep Deposit on page 3 51 for the immediate form Operation if PR qp check_target_register r1 count GR r3 GR r1 count 63 0 GR r2 count GR r1 nat GR r2...

Page 1144: ... first source operand is shifted to the left by count2 bits and then added to the second source operand and the result placed in GR r1 The first operand can be shifted by 1 2 3 or 4 bits Operation if PR qp check_target_register r1 GR r1 GR r2 count2 GR r3 GR r1 nat GR r2 nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1145: ...ts of the result are forced to zero and then bits 31 30 of GR r3 are copied to bits 62 61 of the result This result is placed in GR r1 The first operand can be shifted by 1 2 3 or 4 bits Operation if PR qp check_target_register r1 tmp_res GR r2 count2 GR r3 tmp_res zero_ext tmp_res 31 0 32 tmp_res 62 61 GR r3 31 30 GR r1 tmp_res GR r1 nat GR r2 nat GR r3 nat Interruptions Illegal Operation fault F...

Page 1146: ... to shift is specified by the value in GR r2 or by an immediate value count6 The shift count is interpreted as an unsigned number If the value in GR r2 is greater than 63 then the result is all zeroes for the unsigned_form or if bit 63 of GR r3 was 0 or all ones for the signed_form if bit 63 of GR r3 was 1 If the u completer is specified the shift is unsigned logical otherwise it is signed arithme...

Page 1147: ...the right count6 bits The least significant 64 bits of the result are placed in GR r1 The immediate value count6 can be any number in the range 0 to 63 Operation if PR qp check_target_register r1 temp1 shift_right_unsigned GR r3 count6 temp2 GR r2 64 count6 GR r1 zero_ext temp1 64 count6 temp2 GR r1 nat GR r2 nat GR r3 nat Interruptions Illegal Operation fault Figure 2 44 Shift Right Pair GR r3 GR...

Page 1148: ...n must be in an instruction group after the instruction group containing the srlz i Data serialization srlz d ensures prior modifications to processor register resources that affect subsequent execution or data memory accesses are observed The srlz d instruction must be in an instruction group after the instruction group containing the operation that is to be serialized Operations dependent on the...

Page 1149: ...PSR_SM imm24 reserved_register_field_fault if PSR vm 1 virtualization_fault if imm24 1 PSR 1 1 be if imm24 2 PSR 2 1 up if imm24 3 PSR 3 1 ac if imm24 4 PSR 4 1 mfl if imm24 5 PSR 5 1 mfh if imm24 13 PSR 13 1 ic if imm24 14 PSR 14 1 i if imm24 15 PSR 15 1 pk if imm24 17 PSR 17 1 dt if imm24 18 PSR 18 1 dfl if imm24 19 PSR 19 1 dfh if imm24 20 PSR 20 1 sp if imm24 21 PSR 21 1 pp if imm24 22 PSR 22 ...

Page 1150: ...uction is used for spilling a register NaT pair See Section 4 4 4 Control Speculation on page 1 60 for details In the imm_base_update form the value in GR r3 is added to a signed immediate value imm9 and the result is placed back in GR r3 This base register update is done after the store and does not affect the store address nor the value stored for the case where r2 and r3 specify the same regist...

Page 1151: ...e GR r3 size itype PSR cpl mattr tmp_unused if spill_form GR r2 nat natd_gr_write GR r2 paddr size UM be mattr otype sthint else if sixteen_byte_form mem_write16 GR r2 AR CSD paddr UM be mattr otype sthint else mem_write GR r2 paddr size UM be mattr otype sthint if spill_form bit_pos GR r3 8 3 AR UNAT bit_pos GR r2 nat alat_inval_multiple_entries paddr size if imm_base_update_form GR r3 GR r3 sign...

Page 1152: ...Volume 3 Instruction Reference 3 253 st Data TLB fault Unaligned Data Reference fault Data Page Not Present fault Unsupported Data Reference fault Data NaT Page Consumption fault ...

Page 1153: ...r format In the spill_form a 16 byte value from FR f2 is stored without conversion This instruction is used for spilling a register See Section 4 4 4 Control Speculation on page 1 60 for details In the imm_base_update form the value in GR r3 is added to a signed immediate value imm9 and the result is placed back in GR r3 This base register update is done after the store and does not affect the sto...

Page 1154: ...ite val paddr size UM be mattr UNORDERED sthint alat_inval_multiple_entries paddr size if imm_base_update_form GR r3 GR r3 sign_ext imm9 9 GR r3 nat 0 mem_implicit_prefetch GR r3 sthint WRITE Interruptions Illegal Operation fault Data NaT Page Consumption fault Disabled Floating point Register fault Data Key Miss fault Register NaT Consumption fault Data Key Permission fault Unimplemented Data Add...

Page 1155: ...e register form the first operand is GR r2 in the immediate form the first operand is taken from the sign extended imm8 encoding field The minus1_form is available only in the register_form although the equivalent effect can be achieved by adjusting the immediate Operation if PR qp check_target_register r1 tmp_src register_form GR r2 sign_ext imm8 8 tmp_nat register_form GR r2 nat 0 if minus1_form...

Page 1156: ...only be set if the secure performance monitor bit PSR sp is zero Otherwise PSR up is not modified Operation if PR qp if is_reserved_field PSR_TYPE PSR_UM imm24 reserved_register_field_fault if imm24 1 PSR 1 1 be if imm24 2 PSR sp 0 non secure perf monitor PSR 2 1 up if imm24 3 PSR 3 1 ac if imm24 4 PSR 4 1 mfl if imm24 5 PSR 5 1 mfh Interruptions Reserved Register Field fault Serialization All use...

Page 1157: ...nded from the bit position specified by xsz and the result is placed in GR r1 The mnemonic values for xsz are given in Table 2 52 Operation if PR qp check_target_register r1 GR r1 sign_ext GR r3 xsz 8 GR r1 nat GR r3 nat Interruptions Illegal Operation fault Table 2 52 xsz Mnemonic Values xsz Mnemonic Bit Position 1 7 2 15 4 31 ...

Page 1158: ... the programmer must explicitly insert ordered data references acquire release or fence type to appropriately constrain sync i and hence fc and fc i visibility to the data stream on other processors sync i is used to maintain an ordering relationship between instruction and data caches on local and remote processors An instruction serialize operation must be used to ensure synchronization initiate...

Page 1159: ...cified by GR r3 the value 1 is returned When PSR dt is 0 only the DTLB is searched because the VHPT walker is disabled If no matching present translation is found in the DTLB the value 1 is returned A translation with the NaTPage attribute is not treated differently and returns its key field This instruction can only be executed at the most privileged level and when PSR vm is 0 Operation if PR qp ...

Page 1160: ...o and zero sense of the test For normal and unc types only the z value is directly implemented in hardware the nz value is actually a pseudo op For it the assembler simply switches the predicate target specifiers and uses the implemented relation For the parallel types both relations are implemented in hardware If the two predicate register destinations are the same p1 and p2 specify the same pred...

Page 1161: ...re if GR r3 nat tmp_rel PR p1 0 PR p2 0 break case or or type compare if GR r3 nat tmp_rel PR p1 1 PR p2 1 break case or andcm or andcm type compare if GR r3 nat tmp_rel PR p1 1 PR p2 0 break case unc unc type compare default normal compare if GR r3 nat PR p1 0 PR p2 0 else PR p1 tmp_rel PR p2 tmp_rel break else if ctype unc if p1 p2 illegal_operation_fault PR p1 0 PR p2 0 Interruptions Illegal Op...

Page 1162: ...he test For normal and unc types only the z value is directly implemented in hardware the nz value is actually a pseudo op For it the assembler simply switches the predicate target specifiers and uses the implemented relation For the parallel types both relations are implemented in hardware If the two predicate register destinations are the same p1 and p2 specify the same predicate register the in...

Page 1163: ...rel tmp_rel switch ctype case and and type compare if tmp_rel PR p1 0 PR p2 0 break case or or type compare if tmp_rel PR p1 1 PR p2 1 break case or andcm or andcm type compare if tmp_rel PR p1 1 PR p2 0 break case unc unc type compare default normal compare PR p1 tmp_rel PR p2 tmp_rel break else if ctype unc if p1 p2 illegal_operation_fault PR p1 0 PR p2 0 Interruptions Illegal Operation fault ...

Page 1164: ... implementation specific long format hash function on the virtual address to generate a hash index into the long format VHPT In the long format a translation in the VHPT must be uniquely identified by its hash index generated by this instruction and the hash tag produced from the ttag instruction The hash function must use all implemented region bits and only virtual address bits 60 0 to determine...

Page 1165: ...r normal and unc types only the z value is directly implemented in hardware the nz value is actually a pseudo op For it the assembler simply switches the predicate target specifiers and uses the implemented relation For the parallel types both relations are implemented in hardware If the two predicate register destinations are the same p1 and p2 specify the same predicate register the instruction ...

Page 1166: ...h ctype case and and type compare if tmp_rel PR p1 0 PR p2 0 break case or or type compare if tmp_rel PR p1 1 PR p2 1 break case or andcm or andcm type compare if tmp_rel PR p1 1 PR p2 0 break case unc unc type compare default normal compare PR p1 tmp_rel PR p2 tmp_rel break else if ctype unc if p1 p2 illegal_operation_fault PR p1 0 PR p2 0 Interruptions Illegal Operation fault ...

Page 1167: ...nslation is found in the DTLB an Alternate Data TLB fault is raised if psr ic is one or a Data Nested TLB fault is raised if psr ic is zero If this instruction faults then it will set the non access bit in the ISR The ISR read and write bits are not set This instruction can only be executed at the most privileged level and when PSR vm is 0 Operation if PR qp itype NON_ACCESS TPA check_target_regis...

Page 1168: ...generation function must use all implemented region bits and only virtual address bits 60 0 PTA vf is ignored by this instruction A translation in the long format VHPT must be uniquely identified by its hash index generated by the thash instruction and the tag produced from this instruction This instruction must be implemented on all processor models even processor models that do not implement a V...

Page 1169: ...yte_form low_form I2 qp unpack2 l r1 r2 r3 two_byte_form low_form I2 qp unpack4 l r1 r2 r3 four_byte_form low_form I2 Description The data elements of GR r2 and r3 are unpacked and the result placed in GR r1 In the high_form the most significant elements of each source register are selected while in the low_form the least significant elements of each source register are selected Elements are selec...

Page 1170: ...n Reference 3 271 unpack Figure 2 45 Unpack Operation GR r2 GR r1 GR r3 unpack1 h GR r2 GR r1 GR r3 GR r2 GR r1 GR r3 GR r2 GR r1 GR r3 unpack1 l GR r2 GR r1 GR r3 unpack2 h unpack2 l GR r2 GR r1 GR r3 unpack4 h unpack4 l ...

Page 1171: ...nate8 x 7 y 7 x 6 y 6 x 5 y 5 x 4 y 4 else low_form GR r1 concatenate8 x 3 y 3 x 2 y 2 x 1 y 1 x 0 y 0 else if two_byte_form two byte elements x 0 GR r2 15 0 y 0 GR r3 15 0 x 1 GR r2 31 16 y 1 GR r3 31 16 x 2 GR r2 47 32 y 2 GR r3 47 32 x 3 GR r2 63 48 y 3 GR r3 63 48 if high_form GR r1 concatenate4 x 3 y 3 x 2 y 2 else low_form GR r1 concatenate4 x 1 y 1 x 0 y 0 else four byte elements x 0 GR r2 ...

Page 1172: ...f PSR vm Instructions in subsequent instruction groups will be executed with PSR vm equal to the new value If the above conditions are not met this instruction takes a Virtualization fault This instruction can only be executed at the most privileged level This instruction cannot be predicated Implementation of PSR vm is optional If it is not implemented this instruction takes Illegal Operation fau...

Page 1173: ...write access privileges for the referenced page are required The exchange is performed with acquire semantics i e the memory read write is made visible prior to all subsequent data memory accesses See Section 4 4 7 Sequentiality Attribute and Ordering on page 2 82 for details on memory ordering The memory read and write are guaranteed to be atomic This instruction is only supported to cacheable pa...

Page 1174: ... mattr ACQUIRE ldhint alat_inval_multiple_entries paddr sz GR r1 zero_ext val sz 8 GR r1 nat 0 Interruptions Illegal Operation fault Data Key Miss fault Register NaT Consumption fault Data Key Permission fault Unimplemented Data Address fault Data Access Rights fault Data Nested TLB fault Data Dirty Bit fault Alternate Data TLB fault Data Access Bit fault VHPT Data fault Data Debug fault Data TLB ...

Page 1175: ...d in the significand field of FR f1 In the high_form the significand fields of FR f3 and FR f4 are treated as signed integers and multiplied to produce a full 128 bit signed result The significand field of FR f2 is zero extended and added to the product The most significant 64 bits of the resultant sum are placed in the significand field of FR f1 In the other forms the significand fields of FR f3 ...

Page 1176: ...m tmp_res_128 fp_I64_x_I64_to_I128 FR f3 significand FR f4 significand else high_unsigned_form tmp_res_128 fp_U64_x_U64_to_U128 FR f3 significand FR f4 significand tmp_res_128 fp_U128_add tmp_res_128 fp_U64_to_U128 FR f2 significand if high_form high_unsigned_form FR f1 significand tmp_res_128 hi else low_form FR f1 significand tmp_res_128 lo FR f1 exponent FP_INTEGER_EXP FR f1 sign FP_SIGN_POSITI...

Page 1177: ...ificant 64 bits of the resultant product are placed in the significand field of FR f1 In the high_form the significand fields of FR f3 and FR f4 are treated as signed integers and multiplied to produce a full 128 bit signed result The most significant 64 bits of the resultant product are placed in the significand field of FR f1 In the other forms the significand fields of FR f3 and FR f4 are treat...

Page 1178: ...gically XORed and the result placed in GR r1 In the register_form the first operand is GR r2 in the imm8_form the first operand is taken from the imm8 encoding field Operation if PR qp check_target_register r1 tmp_src register_form GR r2 sign_ext imm8 8 tmp_nat register_form GR r2 nat 0 GR r1 tmp_src GR r3 GR r1 nat tmp_nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1179: ... value in GR r3 is zero extended above the bit position specified by xsz and the result is placed in GR r1 The mnemonic values for xsz are given in Table 2 52 on page 3 258 Operation if PR qp check_target_register r1 GR r1 zero_ext GR r3 xsz 8 GR r1 nat GR r3 nat Interruptions Illegal Operation fault ...

Page 1180: ...at_inval_multiple_entries paddr size The ALAT is queried using the physical memory address specified by paddr and the access size specified by size All matching ALAT entries are invalidated No value is returned alat_inval_single_entry rtype rega The ALAT is queried using the register type specified by rtype and the register address specified by rega At most one matching ALAT entry is invalidated N...

Page 1181: ...e sum ready for rounding fcmp_exception_fault_check f2 f3 frel sf tmp_fp_env Checks for all floating point faulting conditions for the fcmp instruction fcvt_fx_exception_fault_check fr2 signed_form trunc_form sf tmp_fp_env Checks for all floating point faulting conditions for the fcvt fx fcvt fxu fcvt fx trunc and fcvt fxu trunc instructions It propagates NaNs fma_exception_fault_check f2 f3 f4 pc...

Page 1182: ...psr sf tmp_fp_env Copies a floating point instruction s local state into the global FPSR fp_update_psr dest_freg Conditionally sets PSR mfl or PSR mfh based on dest_freg fpcmp_exception_fault_check f2 f3 frel sf tmp_fp_env Checks for all floating point faulting conditions for the fpcmp instruction fpcvt_exception_fault_check f2 signed_form trunc_form sf tmp_fp_env Checks for all floating point fau...

Page 1183: ...tation specific function indicates whether probe interceptions are supported impl_ruc Implementation specific function which indicates whether Resource Utilization Counter RUC application register is implemented impl_uia_fault_supported Implementation specific function that either returns TRUE if the processor reports unimplemented instruction addresses with an Unimplemented Instruction Address fa...

Page 1184: ...ry attributes specified by mattr and access hint specified by hint otype specifies the memory ordering attribute of this access and must be UNORDERED or ACQUIRE mem_read_pair low_value high_value paddr size border mattr otype hint Reads the size 2 bytes of memory starting at the physical memory address specified by paddr into low_value and the size 2 bytes of memory starting at the physical memory...

Page 1185: ... ordering only affects the ordering of bytes within each of the 8 byte values stored otype specifies the memory ordering attribute of this access and has the value ACQUIRE or RELEASE ordering_fence Ensures prior data memory references are made visible before future data memory references are made visible by the processor partially_implemented_ip Implementation dependent routine which returns TRUE ...

Page 1186: ...es by how many registers the top of the current frame will grow growth will generally be negative The number of registers specified by preserved_sol are marked to be restored Register renaming causes the preserved_sol registers before GR 32 to be renamed to GR 32 AR BSP is updated to contain the backing store address where the new GR 32 will be stored If the number of dirty and clean registers is ...

Page 1187: ...ge access rights and the privilege level assigned to the page is higher than numerically less than the current privilege level then the current privilege level is set to the privilege level field in the translation for the page containing the epc instruction tlb_grant_permission vaddr type pl Returns a boolean indicating if read write access is granted for the specified virtual memory address vadd...

Page 1188: ...he tlb_must_purge_itc_entries rid vaddr size Purges all local possibly overlapping ITC entry matching the specified region identifier rid virtual address vaddr and page size size vaddr 63 61 VRN is ignored in the purge i e all entries that match vaddr 60 0 must be purged regardless of the VRN bits If the purge size is not supported an implementation may generate a machine check abort or over purge...

Page 1189: ...ult is generated tlb_translate_nonaccess does not return The following faults are checked Unimplemented Data Address fault Virtualization fault tpa only Data Nested TLB fault Alternate Data TLB fault VHPT Data fault Data TLB fault Data Page Not Present fault Data NaT Page Consumption fault Data Access Rights fault fc only tlb_vhpt_hash vrn vaddr61 rid size Generates a VHPT entry address for the sp...

Page 1190: ...it and virtual machine features are disabled See Section 3 4 Processor Virtualization on page 2 44 in SDM and PAL_PROC_GET_FEATURES Get Processor Dependent Features 17 on page 2 446 in SDM for details vm_select_probes Returns TRUE if the processor is configured to virtualize selected probe instructions when PSR vm is 1 See Section 11 7 4 2 8 Probe Instruction Virtualization on page 2 344 for detai...

Page 1191: ...3 292 Volume 3 Pseudo Code Functions Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 1192: ... for each encoding of the template field A double line to the right of an instruction slot indicates that a stop occurs at that point within the current bundle See Instruction Encoding Overview on page 1 38 for the definition of a stop Within a bundle execution order proceeds from slot 0 to slot 2 Unused template values appearing as empty rows in Table 4 2 are reserved and cause an Illegal Operati...

Page 1193: ...Field Encoding and Instruction Slot Mapping Template Slot 0 Slot 1 Slot 2 00 M unit I unit I unit 01 M unit I unit I unit 02 M unit I unit I unit 03 M unit I unit I unit 04 M unit L unit X unita 05 M unit L unit X unita 06 07 08 M unit M unit I unit 09 M unit M unit I unit 0A M unit M unit I unit 0B M unit M unit I unit 0C M unit F unit I unit 0D M unit F unit I unit 0E M unit M unit F unit 0F M u...

Page 1194: ...on field names used throughout this chapter are described in Table 4 6 on page 3 298 The set of special notations such as whether an instruction is privileged are listed in Table 4 7 on page 3 299 These notations appear in the Instruction column of the opcode tables Most instruction containing immediates encode those immediates in more than one instruction field For example the 14 bit immediate in...

Page 1195: ... len6d r3 cpos6b r1 qp Deposit I15 4 cpos6d len4d r3 r2 r1 qp Test Bit I16 5 tb x2 ta p2 r3 pos6b y c p1 qp Test NaT I17 5 tb x2 ta p2 r3 x y c p1 qp Nop Hint I18 0 i x3 x6 y imm20a qp Break I19 0 i x3 x6 imm20a qp Int Spec Check I20 0 s x3 imm13c r2 imm7a qp Move to BR I21 0 x3 timm9c ih x wh r2 b1 qp Move from BR I22 0 x3 x6 b2 r1 qp Move to Pred I23 0 s x3 mask8c r2 mask7a qp Move to Pred Imm44...

Page 1196: ...btype 0 IP Relative Call B3 5 s d wh imm20b p b1 qp Indirect Branch B4 0 d wh x6 b2 p btype qp Indirect Call B5 1 d wh b2 p b1 qp IP Relative Predict B6 7 s ih t2e imm20b timm7a wh Indirect Predict B7 2 ih t2e x6 b2 timm7a wh Misc B8 0 x6 0 Break Nop Hint B9 0 2 i x6 imm20a qp FP Arithmetic F1 8 D x sf f4 f3 f2 f1 qp Fixed Multiply Add F2 E x x2 f4 f3 f2 f1 qp FP Select F3 E x f4 f3 f2 f1 qp FP Co...

Page 1197: ...opcode extension ccount5c multimedia shift left complemented shift count immediate count5b count6d multimedia shift right shift right pair shift count immediate cposx deposit complemented bit position immediate cr3 control register source target ct2d multimedia multiply shift shift and add shift count immediate d branch cache deallocation hint opcode extension fn floating point register source tar...

Page 1198: ... 0 Reserved if PR qp is 1 B unit instructions medium gray in the gray scale version of the tables cyan in the color version cause an Illegal Operation fault if the predicate register specified by the qp field of the instruction bits 5 0 is 1 and execute as a nop instruction if 0 These differ from the Reserved if PR qp is 1 instructions purple only in their RAW dependency behavior see RAW Dependenc...

Page 1199: ... in this revision of the architecture future architecture revisions may define these fields as hint extensions These hint extensions will be defined such that the 0 value in each field corresponds to the default hint It is expected that assemblers will automatically set these fields to zero by default Unused opcode hint extension values white color entries in Hint Completer tables should not be us...

Page 1200: ... A2 5 6 shladdp4 A2 7 8 9 sub imm8 A3 A B and imm8 A3 andcm imm8 A3 or imm8 A3 xor imm8 A3 C D E F 40 373635343332 29282726 2019 1312 6 5 0 8 x2a ve x4 x2b r3 r2 r1 qp 4 1 2 1 4 2 7 7 7 6 Instruction Operands Opcode Extension x2a ve x4 x2b add r1 r2 r3 8 0 0 0 0 r1 r2 r3 1 1 sub r1 r2 r3 1 1 r1 r2 r3 1 0 addp4 r1 r2 r3 2 0 and 3 0 andcm 1 or 2 xor 3 40 373635343332 29282726 2019 1312 6 5 0 8 x2a v...

Page 1201: ...opcodes C E using a 2 bit opcode extension field x2 in bits 35 34 and two 1 bit opcode extension fields in bits 33 ta and 12 c as shown in Table 4 11 40 373635343332 29282726 2019 1312 6 5 0 8 s x2a ve x4 x2b r3 imm7b r1 qp 4 1 2 1 4 2 7 7 7 6 Instruction Operands Opcode Extension x2a ve x4 x2b sub r1 imm8 r3 8 0 0 9 1 and B 0 andcm 1 or 2 xor 3 40 373635343332 2726 2019 1312 6 5 0 8 s x2a ve imm6...

Page 1202: ... A6 1 cmp4 ne and A6 cmp4 ne or A6 cmp4 ne or andcm A6 1 0 0 cmp4 gt and A7 cmp4 gt or A7 cmp4 gt or andcm A7 1 cmp4 le and A7 cmp4 le or A7 cmp4 le or andcm A7 1 0 cmp4 ge and A7 cmp4 ge or A7 cmp4 ge or andcm A7 1 cmp4 lt and A7 cmp4 lt or A7 cmp4 lt or andcm A7 Table 4 11 Integer Compare Immediate Opcode Extensions x2 Bits 35 34 ta Bit 33 c Bit 12 Opcode Bits 40 37 C D E 2 0 0 cmp lt imm8 A8 cm...

Page 1203: ... Opcode Extension x2 tb ta c cmp lt p1 p2 r2 r3 C 0 0 0 0 cmp ltu D cmp eq E cmp lt unc C 1 cmp ltu unc D cmp eq unc E cmp eq and C 1 0 cmp eq or D cmp eq or andcm E cmp ne and C 1 cmp ne or D cmp ne or andcm E cmp4 lt C 1 0 0 0 cmp4 ltu D cmp4 eq E cmp4 lt unc C 1 cmp4 ltu unc D cmp4 eq unc E cmp4 eq and C 1 0 cmp4 eq or D cmp4 eq or andcm E cmp4 ne and C 1 cmp4 ne or D cmp4 ne or andcm E ...

Page 1204: ...x2 tb ta c cmp gt and p1 p2 r0 r3 C 0 1 0 0 cmp gt or D cmp gt or andcm E cmp le and C 1 cmp le or D cmp le or andcm E cmp ge and C 1 0 cmp ge or D cmp ge or andcm E cmp lt and C 1 cmp lt or D cmp lt or andcm E cmp4 gt and C 1 0 0 cmp4 gt or D cmp4 gt or andcm E cmp4 le and C 1 cmp4 le or D cmp4 le or andcm E cmp4 ge and C 1 0 cmp4 ge or D cmp4 ge or andcm E cmp4 lt and C 1 cmp4 lt or D cmp4 lt or...

Page 1205: ...6 2019 1312 11 6 5 0 C E s x2 ta p2 r3 imm7b c p1 qp 4 1 2 1 6 7 7 1 6 6 Instruction Operands Opcode Extension x2 ta c cmp lt p1 p2 imm8 r3 C 2 0 0 cmp ltu D cmp eq E cmp lt unc C 1 cmp ltu unc D cmp eq unc E cmp eq and C 1 0 cmp eq or D cmp eq or andcm E cmp ne and C 1 cmp ne or D cmp ne or andcm E cmp4 lt C 3 0 0 cmp4 ltu D cmp4 eq E cmp4 lt unc C 1 cmp4 ltu unc D cmp4 eq unc E cmp4 eq and C 1 0...

Page 1206: ...uuu A9 psub1 uus A9 2 pavg1 A9 pavg1 raz A9 3 pavgsub1 A9 4 5 6 7 8 9 pcmp1 eq A9 pcmp1 gt A9 A B C D E F Table 4 14 Multimedia ALU Size 2 4 bit 2 bit Opcode Extensions Opcode Bits 40 37 x2a Bits 35 34 za Bit 36 zb Bit 33 x4 Bits 32 29 x2b Bits 28 27 0 1 2 3 8 1 0 1 0 padd2 A9 padd2 sss A9 padd2 uuu A9 padd2 uus A9 1 psub2 A9 psub2 sss A9 psub2 uuu A9 psub2 uus A9 2 pavg2 A9 pavg2 raz A9 3 pavgsub...

Page 1207: ... Table 4 15 Multimedia ALU Size 4 4 bit 2 bit Opcode Extensions Opcode Bits 40 37 x2a Bits 35 34 za Bit 36 zb Bit 33 x4 Bits 32 29 x2b Bits 28 27 0 1 2 3 8 1 1 0 0 padd4 A9 1 psub4 A9 2 3 4 5 6 7 8 9 pcmp4 eq A9 pcmp4 gt A9 A B C D E F ...

Page 1208: ...padd1 uuu 0 0 2 padd2 uuu 1 padd1 uus 0 0 3 padd2 uus 1 psub1 0 0 1 0 psub2 1 psub4 1 0 psub1 sss 0 0 1 psub2 sss 1 psub1 uuu 0 0 2 psub2 uuu 1 psub1 uus 0 0 3 psub2 uus 1 pavg1 0 0 2 2 pavg2 1 pavg1 raz 0 0 3 pavg2 raz 1 pavgsub1 0 0 3 2 pavgsub2 1 pcmp1 eq 0 0 9 0 pcmp2 eq 1 pcmp4 eq 1 0 pcmp1 gt 0 0 1 pcmp2 gt 1 pcmp4 gt 1 0 40 373635343332 29282726 2019 1312 6 5 0 8 za x2a zb x4 ct2d r3 r2 r1 ...

Page 1209: ...2 bit field in bits 29 28 x2b and most have a 2 bit field in bits 31 30 x2c as shown in Table 4 17 Table 4 16 Multimedia and Variable Shift 1 bit Opcode Extensions Opcode Bits 40 37 za Bit 36 zb Bit 33 ve Bit 32 0 1 7 0 0 Multimedia Size 1 Table 4 17 1 Multimedia Size 2 Table 4 18 1 0 Multimedia Size 4 Table 4 19 1 Variable Shift Table 4 20 Table 4 17 Multimedia Opcode 7 Size 1 2 bit Opcode Extens...

Page 1210: ...ed I6 2 0 pack2 uss I2 unpack2 h I2 mix2 r I2 1 pmpy2 r I2 2 pack2 sss I2 unpack2 l I2 mix2 l I2 3 pmin2 I2 pmax2 I2 pmpy2 l I2 3 0 1 pshl2 fixed I8 2 mux2 I4 3 Table 4 19 Multimedia Opcode 7 Size 4 2 bit Opcode Extensions Opcode Bits 40 37 za Bit 36 zb Bit 33 ve Bit 32 x2a Bits 35 34 x2b Bits 29 28 x2c Bits 31 30 0 1 2 3 7 1 0 0 0 0 pshr4 u var I5 pshl4 var I7 1 mpy4 I2 2 pshr4 var I5 3 mpyshl4 I...

Page 1211: ...zb Bit 33 ve Bit 32 x2a Bits 35 34 x2b Bits 29 28 x2c Bits 31 30 0 1 2 3 7 1 1 0 0 0 shr u var I5 shl var I7 1 2 shr var I5 3 1 0 1 2 3 2 0 1 2 3 3 0 1 2 3 40 373635343332313029282726 2019 1312 6 5 0 7 za x2a zb ve ct2d x2b r3 r2 r1 qp 4 1 2 1 1 2 2 1 7 7 7 6 Instruction Operands Opcode Extension za zb ve x2a x2b pmpyshr2 r1 r2 r3 count2 7 0 1 0 0 3 pmpyshr2 u 1 ...

Page 1212: ...mix2 l 0 1 mix4 l 1 0 pack2 uss 0 1 0 0 pack2 sss 0 1 2 pack4 sss 1 0 unpack1 h 0 0 0 1 unpack2 h 0 1 unpack4 h 1 0 unpack1 l 0 0 2 unpack2 l 0 1 unpack4 l 1 0 pmin1 u 0 0 1 0 pmax1 u 1 pmin2 0 1 3 0 pmax2 1 psad1 0 0 3 2 40 3736353433323130292827 2423 2019 1312 6 5 0 7 za x2a zb ve x2c x2b mbt4c r2 r1 qp 4 1 2 1 1 2 2 4 4 7 7 6 Instruction Operands Opcode Extension za zb ve x2a x2b x2c mux1 r1 r2...

Page 1213: ...282726 201918 141312 6 5 0 7 za x2a zb ve x2c x2b r3 count5b r1 qp 4 1 2 1 1 2 2 1 7 1 5 1 7 6 Instruction Operands Opcode Extension za zb ve x2a x2b x2c pshr2 r1 r3 count5 7 0 1 0 1 3 0 pshr4 1 0 pshr2 u 0 1 1 pshr4 u 1 0 40 373635343332313029282726 2019 1312 6 5 0 7 za x2a zb ve x2c x2b r3 r2 r1 qp 4 1 2 1 1 2 2 1 7 7 7 6 Instruction Operands Opcode Extension za zb ve x2a x2b x2c pshl2 r1 r2 r3 ...

Page 1214: ...Right Pair I10 40 373635343332313029282726 2019 1312 6 5 0 7 za x2a zb ve x2c x2b r3 0 r1 qp 4 1 2 1 1 2 2 1 7 7 7 6 Instruction Operands Opcode Extension za zb ve x2a x2b x2c popcnt r1 r3 7 0 1 0 1 1 2 clz 3 Table 4 21 Integer Shift Test Bit Test NaT 2 bit Opcode Extensions Opcode Bits 40 37 x2 Bits 35 34 x Bit 33 y Bit 13 0 1 5 0 0 Test Bit Table 4 23 Test NaT Test Feature Table 4 23 1 extr u I1...

Page 1215: ...1 7 6 Instruction Operands Opcode Extension x2 x y extr u r1 r3 pos6 len6 5 1 0 0 extr 1 40 373635343332 272625 2019 1312 6 5 0 5 x2 x len6d y cpos6c r2 r1 qp 4 1 2 1 6 1 6 7 7 6 Instruction Operands Opcode Extension x2 x y dep z r1 r2 pos6 len6 5 1 1 0 40 373635343332 272625 2019 1312 6 5 0 5 s x2 x len6d y cpos6c imm7b r1 qp 4 1 2 1 6 1 6 7 7 6 Instruction Operands Opcode Extension x2 x y dep z ...

Page 1216: ...t nz and I17 tf nz and I30 1 0 0 0 tbit z or I16 1 tnat z or I17 tf z or I30 1 0 tbit nz or I16 1 tnat nz or I17 tf nz or I30 1 0 0 tbit z or andcm I16 1 tnat z or andcm I17 tf z or andcm I30 1 0 tbit nz or andcm I16 1 tnat nz or andcm I17 tf nz or andcm I30 40 373635343332 2726 2019 141312 11 6 5 0 5 tb x2 ta p2 r3 pos6b y c p1 qp 4 1 2 1 6 7 6 1 1 6 6 Instruction Operands Opcode Extension x2 ta ...

Page 1217: ...and Table 4 25 summarizes the 6 bit assignments 40 373635343332 2726 201918 141312 11 6 5 0 5 tb x2 ta p2 r3 x y c p1 qp 4 1 2 1 6 7 1 5 1 1 6 6 Instruction Operands Opcode Extension x2 ta tb y x c tnat z p1 p2 r3 5 0 0 0 1 0 0 tnat z unc 1 tnat z and 1 0 tnat nz and 1 tnat z or 1 0 0 tnat nz or 1 tnat z or andcm 1 0 tnat nz or andcm 1 Table 4 24 Misc I Unit 3 bit Opcode Extensions Opcode Bits 40 ...

Page 1218: ...35 33 x6 Bits 30 27 Bits 32 31 0 1 2 3 0 0 0 break i I19 zxt1 I29 mov from ip I25 1 1 bit Ext Table 4 26 zxt2 I29 mov from b I22 2 zxt4 I29 mov i from ar I28 3 mov from pr I25 4 sxt1 I29 5 sxt2 I29 6 sxt4 I29 7 8 czx1 l I29 9 czx2 l I29 A mov i to ar imm8 I27 mov i to ar I26 B C czx1 r I29 D czx2 r I29 E F Table 4 26 Misc I Unit 1 bit Opcode Extensions Opcode Bits 40 37 x3 Bits 35 33 x6 Bits 32 27...

Page 1219: ...ormal form and a 1 bit hint extension in bit 23 ih see Table 4 56 on page 3 354 4 3 5 1 Move to BR I21 40 373635 3332 272625 6 5 0 0 i x3 x6 imm20a qp 4 1 3 6 1 20 6 Instruction Operands Opcode Extension x3 x6 break i i imm21 0 0 00 40 373635 3332 2019 1312 6 5 0 0 s x3 imm13c r2 imm7a qp 4 1 3 13 7 7 6 Instruction Operands Opcode Extension x3 chk s i r2 target25 0 1 Table 4 27 Move to BR Whether ...

Page 1220: ...y management instructions on the M unit See GR AR Moves M Unit on page 3 342 See Miscellaneous I Unit Instructions on page 3 318 for a summary of the I Unit GR AR opcode extensions 40 373635 3332 2726 1615 1312 6 5 0 0 x3 x6 b2 r1 qp 4 1 3 6 11 3 7 6 Instruction Operands Opcode Extension x3 x6 mov r1 b2 0 0 31 40 373635 333231 2423 2019 1312 6 5 0 0 s x3 mask8c r2 mask7a qp 4 1 3 1 8 4 7 7 6 Instr...

Page 1221: ...6 mov i ar3 r2 0 0 2A 40 373635 3332 2726 2019 1312 6 5 0 0 s x3 x6 ar3 imm7b qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extension x3 x6 mov i ar3 imm8 0 0 0A 40 373635 3332 2726 2019 1312 6 5 0 0 x3 x6 ar3 r1 qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extension x3 x6 mov i r1 ar3 0 0 32 40 373635 3332 2726 2019 1312 6 5 0 0 x3 x6 r3 r1 qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extens...

Page 1222: ... on page 3 324 and Table 4 32 on page 3 325 and the semaphore and get FR opcode extensions in Table 4 33 on page 3 325 The floating point load store 40 373635343332 2726 201918 141312 11 6 5 0 5 tb x2 ta p2 0 x imm5b y c p1 qp 4 1 2 1 6 7 1 5 1 1 6 6 Instruction Operands Opcode Extension x2 ta tb y x c tf z p1 p2 imm5 5 0 0 0 1 1 0 tf z unc 1 tf z and 1 0 tf nz and 1 tf z or 1 0 0 tf nz or 1 tf z ...

Page 1223: ...ll M2 7 8 ld1 c clr M2 ld2 c clr M2 ld4 c clr M2 ld8 c clr M2 9 ld1 c nc M2 ld2 c nc M2 ld4 c nc M2 ld8 c nc M2 A ld1 c clr acq M2 ld2 c clr acq M2 ld4 c clr acq M2 ld8 c clr acq M2 B C st1 M6 st2 M6 st4 M6 st8 M6 D st1 rel M6 st2 rel M6 st4 rel M6 st8 rel M6 E st8 spill M6 F Table 4 31 Integer Load Reg Opcode Extensions Opcode Bits 40 37 m Bit 36 x Bit 27 x6 Bits 35 32 Bits 31 30 0 1 2 3 4 1 0 0 ...

Page 1224: ...M3 A ld1 c clr acq M3 ld2 c clr acq M3 ld4 c clr acq M3 ld8 c clr acq M3 B C st1 M5 st2 M5 st4 M5 st8 M5 D st1 rel M5 st2 rel M5 st4 rel M5 st8 rel M5 E st8 spill M5 F Table 4 33 Semaphore Get FR 16 Byte Opcode Extensions Opcode Bits 40 37 m Bit 36 x Bit 27 x6 Bits 35 32 Bits 31 30 0 1 2 3 4 0 1 0 cmpxchg1 acq M16 cmpxchg2 acq M16 cmpxchg4 acq M16 cmpxchg8 acq M16 1 cmpxchg1 rel M16 cmpxchg2 rel M...

Page 1225: ... M9 ldfd c nc M9 A B lfetch M18 lfetch excl M18 lfetch fault M18 lfetch fault excl M18 C stfe M13 stf8 M13 stfs M13 stfd M13 D E stf spill M13 F Table 4 35 Floating point Load Lfetch Reg Opcode Extensions Opcode Bits 40 37 m Bit 36 x Bit 27 x6 Bits 35 32 Bits 31 30 0 1 2 3 6 1 0 0 ldfe M7 ldf8 M7 ldfs M7 ldfd M7 1 ldfe s M7 ldf8 s M7 ldfs s M7 ldfd s M7 2 ldfe a M7 ldf8 a M7 ldfs a M7 ldfd a M7 3 ...

Page 1226: ...f8 c nc M8 ldfs c nc M8 ldfd c nc M8 A B lfetch M22 lfetch excl M22 lfetch fault M22 lfetch fault excl M22 C stfe M10 stf8 M10 stfs M10 stfd M10 D E stf spill M10 F Table 4 37 Floating point Load Pair Set FR Opcode Extensions Opcode Bits 40 37 m Bit 36 x Bit 27 x6 Bits 35 32 Bits 31 30 0 1 2 3 6 0 1 0 ldfp8 M11 ldfps M11 ldfpd M11 1 ldfp8 s M11 ldfps s M11 ldfpd s M11 2 ldfp8 a M11 ldfps a M11 ldf...

Page 1227: ... 37 m Bit 36 x Bit 27 x6 Bits 35 32 Bits 31 30 0 1 2 3 6 1 1 0 ldfp8 M12 ldfps M12 ldfpd M12 1 ldfp8 s M12 ldfps s M12 ldfpd s M12 2 ldfp8 a M12 ldfps a M12 ldfpd a M12 3 ldfp8 sa M12 ldfps sa M12 ldfpd sa M12 4 5 6 7 8 ldfp8 c clr M12 ldfps c clr M12 ldfpd c clr M12 9 ldfp8 c nc M12 ldfps c nc M12 ldfpd c nc M12 A B C D E F Table 4 39 Load Hint Completer hint Bits 29 28 ldhint 0 none 1 nt1 2 3 nt...

Page 1228: ...nt 09 ld4 a ldhint 0A ld8 a ldhint 0B ld1 sa ldhint 0C ld2 sa ldhint 0D ld4 sa ldhint 0E ld8 sa ldhint 0F ld1 bias ldhint 10 ld2 bias ldhint 11 ld4 bias ldhint 12 ld8 bias ldhint 13 ld1 acq ldhint 14 ld2 acq ldhint 15 ld4 acq ldhint 16 ld8 acq ldhint 17 ld8 fill ldhint 1B ld1 c clr ldhint 20 ld2 c clr ldhint 21 ld4 c clr ldhint 22 ld8 c clr ldhint 23 ld1 c nc ldhint 24 ld2 c nc ldhint 25 ld4 c nc ...

Page 1229: ...8 s ldhint 07 ld1 a ldhint 08 ld2 a ldhint 09 ld4 a ldhint 0A ld8 a ldhint 0B ld1 sa ldhint 0C ld2 sa ldhint 0D ld4 sa ldhint 0E ld8 sa ldhint 0F ld1 bias ldhint 10 ld2 bias ldhint 11 ld4 bias ldhint 12 ld8 bias ldhint 13 ld1 acq ldhint 14 ld2 acq ldhint 15 ld4 acq ldhint 16 ld8 acq ldhint 17 ld8 fill ldhint 1B ld1 c clr ldhint 20 ld2 c clr ldhint 21 ld4 c clr ldhint 22 ld8 c clr ldhint 23 ld1 c n...

Page 1230: ... s ldhint 07 ld1 a ldhint 08 ld2 a ldhint 09 ld4 a ldhint 0A ld8 a ldhint 0B ld1 sa ldhint 0C ld2 sa ldhint 0D ld4 sa ldhint 0E ld8 sa ldhint 0F ld1 bias ldhint 10 ld2 bias ldhint 11 ld4 bias ldhint 12 ld8 bias ldhint 13 ld1 acq ldhint 14 ld2 acq ldhint 15 ld4 acq ldhint 16 ld8 acq ldhint 17 ld8 fill ldhint 1B ld1 c clr ldhint 20 ld2 c clr ldhint 21 ld4 c clr ldhint 22 ld8 c clr ldhint 23 ld1 c nc...

Page 1231: ...hint 31 st4 sthint 32 st8 sthint 33 st1 rel sthint 34 st2 rel sthint 35 st4 rel sthint 36 st8 rel sthint 37 st8 spill sthint 3B st16 sthint r3 r2 ar csd 0 1 30 st16 rel sthint 34 40 373635 3029282726 2019 1312 6 5 0 5 s x6 hint i r3 r2 imm7a qp 4 1 6 2 1 7 7 7 6 Instruction Operands Opcode Extension x6 hint st1 sthint r3 r2 imm9 5 30 See Table 4 40 on page 3 328 st2 sthint 31 st4 sthint 32 st8 sth...

Page 1232: ...e 3 328 ldfd ldhint 03 ldf8 ldhint 01 ldfe ldhint 00 ldfs s ldhint 06 ldfd s ldhint 07 ldf8 s ldhint 05 ldfe s ldhint 04 ldfs a ldhint 0A ldfd a ldhint 0B ldf8 a ldhint 09 ldfe a ldhint 08 ldfs sa ldhint 0E ldfd sa ldhint 0F ldf8 sa ldhint 0D ldfe sa ldhint 0C ldf fill ldhint 1B ldfs c clr ldhint 22 ldfd c clr ldhint 23 ldf8 c clr ldhint 21 ldfe c clr ldhint 20 ldfs c nc ldhint 26 ldfd c nc ldhint...

Page 1233: ...le 4 39 on page 3 328 ldfd ldhint 03 ldf8 ldhint 01 ldfe ldhint 00 ldfs s ldhint 06 ldfd s ldhint 07 ldf8 s ldhint 05 ldfe s ldhint 04 ldfs a ldhint 0A ldfd a ldhint 0B ldf8 a ldhint 09 ldfe a ldhint 08 ldfs sa ldhint 0E ldfd sa ldhint 0F ldf8 sa ldhint 0D ldfe sa ldhint 0C ldf fill ldhint 1B ldfs c clr ldhint 22 ldfd c clr ldhint 23 ldf8 c clr ldhint 21 ldfe c clr ldhint 20 ldfs c nc ldhint 26 ld...

Page 1234: ...dhint 05 ldfe s ldhint 04 ldfs a ldhint 0A ldfd a ldhint 0B ldf8 a ldhint 09 ldfe a ldhint 08 ldfs sa ldhint 0E ldfd sa ldhint 0F ldf8 sa ldhint 0D ldfe sa ldhint 0C ldf fill ldhint 1B ldfs c clr ldhint 22 ldfd c clr ldhint 23 ldf8 c clr ldhint 21 ldfe c clr ldhint 20 ldfs c nc ldhint 26 ldfd c nc ldhint 27 ldf8 c nc ldhint 25 ldfe c nc ldhint 24 40 373635 3029282726 2019 1312 6 5 0 6 m x6 hint x ...

Page 1235: ...tfe sthint 30 stf spill sthint 3B 40 373635 3029282726 2019 1312 6 5 0 6 m x6 hint x r3 f2 f1 qp 4 1 6 2 1 7 7 7 6 Instruction Operands Opcode Extension m x x6 hint ldfps ldhint f1 f2 r3 6 0 1 02 See Table 4 39 on page 3 328 ldfpd ldhint 03 ldfp8 ldhint 01 ldfps s ldhint 06 ldfpd s ldhint 07 ldfp8 s ldhint 05 ldfps a ldhint 0A ldfpd a ldhint 0B ldfp8 a ldhint 09 ldfps sa ldhint 0E ldfpd sa ldhint ...

Page 1236: ... hint x r3 f2 f1 qp 4 1 6 2 1 7 7 7 6 Instruction Operands Opcode Extension m x x6 hint ldfps ldhint f1 f2 r3 8 6 1 1 02 See Table 4 39 on page 3 328 ldfpd ldhint f1 f2 r3 16 03 ldfp8 ldhint 01 ldfps s ldhint f1 f2 r3 8 06 ldfpd s ldhint f1 f2 r3 16 07 ldfp8 s ldhint 05 ldfps a ldhint f1 f2 r3 8 0A ldfpd a ldhint f1 f2 r3 16 0B ldfp8 a ldhint 09 ldfps sa ldhint f1 f2 r3 8 0E ldfpd sa ldhint f1 f2 ...

Page 1237: ...29282726 2019 6 5 0 6 m x6 hint x r3 qp 4 1 6 2 1 7 14 6 Instruction Operands Opcode Extension m x x6 hint lfetch excl lfhint r3 6 0 0 2D See Table 4 41 on page 3 337 lfetch fault lfhint 2E lfetch fault excl lfhint 2F 40 373635 3029282726 2019 1312 6 5 0 6 m x6 hint x r3 r2 qp 4 1 6 2 1 7 7 7 6 Instruction Operands Opcode Extension m x x6 hint lfetch lfhint r3 r2 6 1 0 2C See Table 4 41 on page 3 ...

Page 1238: ...n Operands Opcode Extension m x x6 hint cmpxchg1 acq ldhint r1 r3 r2 ar ccv 4 0 1 00 See Table 4 39 on page 3 328 cmpxchg2 acq ldhint 01 cmpxchg4 acq ldhint 02 cmpxchg8 acq ldhint 03 cmpxchg1 rel ldhint 04 cmpxchg2 rel ldhint 05 cmpxchg4 rel ldhint 06 cmpxchg8 rel ldhint 07 cmp8xchg16 acq ldhint r1 r3 r2 ar csd ar ccv 20 cmp8xchg16 rel ldhint 24 xchg1 ldhint r1 r3 r2 08 xchg2 ldhint 09 xchg4 ldhin...

Page 1239: ...ck M21 40 373635 3029282726 2019 1312 6 5 0 6 m x6 x r2 f1 qp 4 1 6 2 1 7 7 7 6 Instruction Operands Opcode Extension m x x6 setf sig f1 r2 6 0 1 1C setf exp 1D setf s 1E setf d 1F 40 373635 3029282726 2019 1312 6 5 0 4 m x6 x f2 r1 qp 4 1 6 2 1 7 7 7 6 Instruction Operands Opcode Extension m x x6 getf sig r1 f2 4 0 1 1C getf exp 1D getf s 1E getf d 1F 40 373635 3332 2019 1312 6 5 0 1 s x3 imm13c ...

Page 1240: ...345 for a summary of the opcode extensions 4 4 6 1 Sync Fence Serialize ALAT Control M24 40 373635 3332 1312 6 5 0 0 s x3 imm20b r1 qp 4 1 3 20 7 6 Instruction Operands Opcode Extension x3 chk a nc r1 target25 0 4 chk a clr 5 40 373635 3332 1312 6 5 0 0 s x3 imm20b f1 qp 4 1 3 20 7 6 Instruction Operands Opcode Extension x3 chk a nc f1 target25 0 6 chk a clr 7 40 373635 33323130 2726 6 5 0 0 x3 x2...

Page 1241: ...1 See System Memory Management on page 3 345 for a summary of the M Unit GR AR opcode extensions 40 373635 33323130 2726 6 5 0 0 x3 x2 x4 0 4 1 3 2 4 21 6 Instruction Opcode Extension x3 x4 x2 flushrs f 0 0 C 0 loadrs f A 40 373635 33323130 2726 1312 6 5 0 0 x3 x2 x4 r1 qp 4 1 3 2 4 14 7 6 Instruction Operands Opcode Extension x3 x4 x2 invala e r1 0 0 2 1 40 373635 33323130 2726 1312 6 5 0 0 x3 x2...

Page 1242: ...x6 ar3 r2 qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extension x3 x6 mov m ar3 r2 1 0 2A 40 373635 33323130 2726 2019 1312 6 5 0 0 s x3 x2 x4 ar3 imm7b qp 4 1 3 2 4 7 7 7 6 Instruction Operands Opcode Extension x3 x4 x2 mov m ar3 imm8 0 0 8 2 40 373635 3332 2726 2019 1312 6 5 0 1 x3 x6 ar3 r1 qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extension x3 x6 mov m r1 ar3 1 0 22 40 373635 3332 2726...

Page 1243: ...Move to PSR M35 4 4 9 3 Move from PSR M36 4 4 9 4 Break M Unit M37 40 373635 33323130 2726 2019 1312 6 5 0 1 x3 sor sol sof r1 qp 4 1 3 2 4 7 7 7 6 Instruction Operands Opcode Extension x3 alloc f r1 ar pfs i l o r 1 6 40 373635 3332 2726 2019 1312 6 5 0 1 x3 x6 r2 qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extension x3 x6 mov p psr l r2 1 0 2D mov psr um r2 29 40 373635 3332 2726 1312 6 5 0 1...

Page 1244: ...r opcode 0 Table 4 44 shows the 3 bit assignments for opcode 1 and Table 4 45 summarizes the 6 bit assignments for opcode 1 Table 4 42 Opcode 0 System Memory Management 3 bit Opcode Extensions Opcode Bits 40 37 x3 Bits 35 33 0 0 System Memory Management 4 bit 2 bit Ext Table 4 43 1 2 3 4 chk a nc int M22 5 chk a clr int M22 6 chk a nc fp M23 7 chk a clr fp M23 Table 4 43 Opcode 0 System Memory Man...

Page 1245: ...be rw fault imm2 M40 2 mov to ibr M42 mov from ibr M43 mov m from ar M31 probe r fault imm2 M40 3 mov to pkr M42 mov from pkr M43 probe w fault imm2 M40 4 mov to pmc M42 mov from pmc M43 mov from cr M33 ptc e M47 5 mov to pmd M42 mov from pmd M43 mov from psr M36 6 7 mov from cpuid M43 8 probe r imm2 M39 probe r M38 9 ptc l M45 probe w imm2 M39 mov to psr um M35 probe w M38 A ptc g M45 thash M46 m...

Page 1246: ...0 373635 3332 2726 2019 15141312 6 5 0 1 x3 x6 r3 i2b qp 4 1 3 6 7 5 2 7 6 Instruction Operands Opcode Extension x3 x6 probe rw fault r3 imm2 1 0 31 probe r fault 32 probe w fault 33 40 373635 3332 2726 2019 1312 6 5 0 1 x3 x6 r2 qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extension x3 x6 itc d l p r2 1 0 2E itc i l p 2F 40 373635 3332 2726 2019 13 12 6 5 0 1 x3 x6 r3 r2 qp 4 1 3 6 7 7 7 6 Inst...

Page 1247: ...sion x3 x6 mov p r1 rr r3 1 0 10 r1 dbr r3 11 r1 ibr r3 12 r1 pkr r3 13 r1 pmc r3 14 mov r1 pmd r3 15 r1 cpuid r3 17 40 373635 33323130 2726 6 5 0 0 i x3 i2d x4 imm21a qp 4 1 3 2 4 21 6 Instruction Operands Opcode Extension x3 x4 sum imm24 0 0 4 rum 5 ssm p 6 rsm p 7 40 373635 3332 2726 2019 1312 6 5 0 1 x3 x6 r3 r2 qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extension x3 x6 ptc l p r3 r2 1 0 0...

Page 1248: ...odings The branch unit includes branch predict and miscellaneous instructions 40 373635 3332 2726 2019 1312 6 5 0 1 x3 x6 r3 r1 qp 4 1 3 6 7 7 7 6 Instruction Operands Opcode Extension x3 x6 thash r1 r3 1 0 1A ttag 1B tpa p 1E tak p 1F 40 373635 3332 2726 2019 6 5 0 1 x3 x6 r3 qp 4 1 3 6 7 14 6 Instruction Operands Opcode Extension x3 x6 ptc e p r3 1 0 34 Table 4 46 Misc M Unit 1 bit Opcode Extens...

Page 1249: ...r opcode 0 using a 6 bit opcode extension field in bits 32 27 x6 Table 4 48 summarizes these assignments Table 4 47 IP Relative Branch Types Opcode Bits 40 37 btype Bits 8 6 4 0 br cond B1 1 e 2 br wexit B1 3 br wtop B1 4 e 5 br cloop B2 6 br cexit B2 7 br ctop B2 Table 4 48 Indirect Miscellaneous Branch Opcode Extensions Opcode Bits 40 37 x6 Bits 30 27 Bits 32 31 0 1 2 3 0 0 break b B9 epc B8 Ind...

Page 1250: ...n field p in bit 12 Table 4 51 summarizes these assignments The IP relative and indirect branch instructions all have a 2 bit branch prediction whether opcode hint extension field in bits 34 33 wh as shown in Table 4 52 Indirect call instructions have a 3 bit whether opcode hint extension field in bits 34 32 wh as shown in Table 4 53 Table 4 49 Indirect Branch Types Opcode Bits 40 37 x6 Bits 32 27...

Page 1251: ...nt Table 4 53 Indirect Call Whether Hint Completer wh Bits 34 32 bwh 0 1 sptk 2 3 spnt 4 5 dptk 6 7 dpnt Table 4 54 Branch Cache Deallocation Hint Completer d Bit 35 dh 0 none 1 clr 40 373635343332 1312 11 9 8 6 5 0 4 s d wh imm20b p btype qp 4 1 1 2 20 1 3 3 6 Instruction Operands Opcode Extension btype p wh d br cond bwh ph dh e target25 4 0 See Table 4 51 on page 3 351 See Table 4 52 on page 3 ...

Page 1252: ...on page 3 352 See Table 4 54 on page 3 352 br cexit bwh ph dh e t 6 br ctop bwh ph dh e t 7 40 373635343332 1312 11 9 8 6 5 0 5 s d wh imm20b p b1 qp 4 1 1 2 20 1 3 3 6 Instruction Operands Opcode Extension p wh d br call bwh ph dh e b1 target25 5 See Table 4 51 on page 3 351 See Table 4 52 on page 3 352 See Table 4 54 on page 3 352 40 373635343332 2726 1615 1312 11 9 8 6 5 0 0 d wh x6 b2 p btype ...

Page 1253: ...combination of the loop or exit whether hint completer with the none importance hint completer is undefined The indirect branch predict instructions have a 2 bit branch prediction whether opcode hint extension field in bits 4 3 wh as shown in Table 4 58 Table 4 55 Indirect Predict Nop Hint Opcode Extensions Opcode Bits 40 37 x6 Bits 30 27 Bits 32 31 0 1 2 3 2 0 nop b B9 brp B7 1 hint b B9 brp ret ...

Page 1254: ...wh 0 sptk 1 2 dptk 3 40 373635343332 1312 6 5 4 3 2 0 7 s ih t2e imm20b timm7a wh 4 1 1 2 20 7 1 2 3 Instruction Operands Opcode Extension ih wh brp ipwh ih target25 tag13 7 See Table 4 56 on page 3 354 See Table 4 57 on page 3 354 40 373635343332 2726 1615 1312 6 5 4 3 2 0 2 ih t2e x6 b2 timm7a wh 4 1 1 2 6 11 3 7 1 2 3 Instruction Operands Opcode Extension x6 ih wh brp indwh ih b2 tag13 2 10 See...

Page 1255: ... either a second 1 bit extension field in bit 36 q or a 6 bit opcode extension field x6 in bits 32 27 Table 4 59 shows the 1 bit x assignments Table 4 62 shows the additional 1 bit q assignments for the reciprocal approximation instructions Table 4 60 and Table 4 61 summarize the 6 bit x6 assignments vmsw 0 p 0 18 vmsw 1 p 19 40 373635 3332 272625 6 5 0 0 2 i x6 imm20a qp 4 1 3 6 1 20 6 Instructio...

Page 1256: ...9 9 fcvt fxu F10 fmix lr F9 A fcvt fx trunc F10 fmix r F9 B fcvt fxu trunc F10 fmix l F9 C fcvt xf F11 fand F9 fsxt r F9 D fandcm F9 fsxt l F9 E for F9 F fxor F9 Table 4 61 Opcode 1 Miscellaneous Floating point 6 bit Opcode Extensions Opcode Bits 40 37 x Bit 33 x6 Bits 30 27 Bits 32 31 0 1 2 3 1 0 0 fpmerge s F9 fpcmp eq F8 1 fpmerge ns F9 fpcmp lt F8 2 fpmerge se F9 fpcmp le F8 3 fpcmp unord F8 4...

Page 1257: ...it opcode extension field x in bit 36 The fixed point arithmetic instructions also have a 2 bit opcode extension field x2 in bits 35 34 These assignments are shown in Table 4 65 Table 4 62 Reciprocal Approximation 1 bit Opcode Extensions Opcode Bits 40 37 x Bit 33 q Bit 36 0 1 0 frcpa F6 1 frsqrta F7 1 0 fprcpa F6 1 fprsqrta F7 Table 4 63 Floating point Status Field Completer sf Bits 35 34 sf 0 s0...

Page 1258: ...ble 4 63 on page 3 358 The parallel floating point compare instructions are described on page 3 362 40 3736353433 2726 2019 1312 6 5 0 8 D x sf f4 f3 f2 f1 qp 4 1 2 7 7 7 7 6 Instruction Operands Opcode Extension x sf fma sf f1 f3 f4 f2 8 0 See Table 4 63 on page 3 358 fma s sf 1 fma d sf 9 0 fpma sf 1 fms sf A 0 fms s sf 1 fms d sf B 0 fpms sf 1 fnma sf C 0 fnma s sf 1 fnma d sf D 0 fpnma sf 1 40...

Page 1259: ...fcmp unord F4 fcmp unord unc F4 Table 4 67 Floating point Class 1 bit Opcode Extensions Opcode Bits 40 37 ta Bit 12 5 0 fclass m F5 1 fclass m unc F5 40 373635343332 2726 2019 1312 11 6 5 0 4 rb sf ra p2 f3 f2 ta p1 qp 4 1 2 1 6 7 7 1 6 6 Instruction Operands Opcode Extension ra rb ta sf fcmp eq sf p1 p2 f2 f3 4 0 0 0 See Table 4 63 on page 3 358 fcmp lt sf 1 fcmp le sf 1 0 fcmp unord sf 1 fcmp eq...

Page 1260: ...ciprocal Square Root Approximation instructions The first in major op 0 encodes the full register variant The second in major op 1 encodes the parallel variant F7 40 373635343332 2726 2019 1312 6 5 0 0 1 q sf x p2 f3 f2 f1 qp 4 1 2 1 6 7 7 7 6 Instruction Operands Opcode Extension x q sf frcpa sf f1 p2 f2 f3 0 1 0 See Table 4 63 on page 3 358 fprcpa sf 1 40 373635343332 2726 2019 1312 6 5 0 0 1 q ...

Page 1261: ...The parallel compare instructions are all encoded in major op 1 F8 40 373635343332 2726 2019 1312 6 5 0 0 1 sf x x6 f3 f2 f1 qp 4 1 2 1 6 7 7 7 6 Instruction Operands Opcode Extension x x6 sf fmin sf f1 f2 f3 0 0 14 See Table 4 63 on page 3 358 fmax sf 15 famin sf 16 famax sf 17 fpmin sf 1 14 fpmax sf 15 fpamin sf 16 fpamax sf 17 fpcmp eq sf 30 fpcmp lt sf 31 fpcmp le sf 32 fpcmp unord sf 33 fpcmp...

Page 1262: ...erge se 12 fmix lr 39 fmix r 3A fmix l 3B fsxt r 3C fsxt l 3D fpack 28 fswap 34 fswap nl 35 fswap nr 36 fand 2C fandcm 2D for 2E fxor 2F fpmerge s 1 10 fpmerge ns 11 fpmerge se 12 40 373635343332 2726 2019 1312 6 5 0 0 1 sf x x6 f2 f1 qp 4 1 2 1 6 7 7 7 6 Instruction Operands Opcode Extension x x6 sf fcvt fx sf f1 f2 0 0 18 See Table 4 63 on page 3 358 fcvt fxu sf 19 fcvt fx trunc sf 1A fcvt fxu t...

Page 1263: ...e Extension x x6 fcvt xf f1 f2 0 0 1C 40 373635343332 2726 2019 1312 6 5 0 0 sf x x6 omask7c amask7b qp 4 1 2 1 6 7 7 7 6 Instruction Operands Opcode Extension x x6 sf fsetc sf amask7 omask7 0 0 04 See Table 4 63 on page 3 358 40 373635343332 2726 6 5 0 0 sf x x6 qp 4 1 2 1 6 21 6 Instruction Opcode Extension x x6 sf fclrf sf 0 0 05 See Table 4 63 on page 3 358 40 373635343332 272625 6 5 0 0 s sf ...

Page 1264: ...ction slot For brl the imm39 field and a 2 bit Ignored field occupy the L instruction slot 4 7 1 Miscellaneous X Unit Instructions The miscellaneous X unit instructions are encoded in major opcode 0 using a 3 bit opcode extension field x3 in bits 35 33 and a 6 bit opcode extension field x6 in bits 32 27 Table 4 69 shows the 3 bit assignments and Table 4 70 summarizes the 6 bit assignments These in...

Page 1265: ...uted by an I unit Table 4 69 Misc X Unit 3 bit Opcode Extensions Opcode Bits 40 37 x3 Bits 35 33 0 0 6 bit Ext Table 4 70 1 2 3 4 5 6 7 Table 4 70 Misc X Unit 6 bit Opcode Extensions Opcode Bits 40 37 x3 Bits 35 33 x6 Bits 30 27 Bits 32 31 0 1 2 3 0 0 0 break x X1 1 1 bit Ext Table 4 73 2 3 4 5 6 7 8 9 A B C D E F 40 373635 3332 272625 6 5 0 40 0 0 i x3 x6 imm20a qp imm41 4 1 3 6 1 20 6 41 Instruc...

Page 1266: ... 351 Table 4 52 on page 3 352 and Table 4 54 on page 3 352 4 7 3 1 Long Branch X3 Table 4 71 Move Long 1 bit Opcode Extensions Opcode Bits 40 37 vc Bit 20 6 0 movl X2 1 40 373635 2726 22212019 1312 6 5 0 40 0 6 i imm9d imm5c ic vc imm7b r1 qp imm41 4 1 9 5 1 1 7 7 6 41 Instruction Operands Opcode Extension vc movl i r1 imm64 6 0 Table 4 72 Long Branch Types Opcode Bits 40 37 btype Bits 8 6 C 0 brl...

Page 1267: ...ruction encoding 40 373635343332 1312 11 9 8 6 5 0 40 2 1 0 D i d wh imm20b p b1 qp imm39 4 1 1 2 20 1 3 3 6 39 2 Instruction Operands Opcode Extension p wh d brl call bwh ph dh e l b1 target64 D See Table 4 51 on page 3 351 See Table 4 52 on page 3 352 See Table 4 54 on page 3 352 Table 4 73 Misc X Unit 1 bit Opcode Extensions Opcode Bits 40 37 x3 Bits 35 33 x6 Bits 32 27 y Bit 26 0 0 01 0 nop x ...

Page 1268: ... sign_ext s 8 i 7 imm7b 9 M5 M10 imm9 sign_ext s 8 i 7 imm7a 9 M17 inc3 sign_ext s 1 1 i2b 3 1 1 4 i2b 6 I20 M20 M21 target25 IP sign_ext s 20 imm13c 7 imm7a 21 4 M22 M23 target25 IP sign_ext s 20 imm20b 21 4 M34 il sol o sof sol r sor 3 M39 M40 imm2 i2b M44 imm24 i 23 i2d 21 imm21a B1 B2 B3 target25 IP sign_ext s 20 imm20b 21 4 B6 target25 IP sign_ext s 20 imm20b 21 4 tag13 IP sign_ext t2e 7 timm...

Page 1269: ...3 370 Volume 3 Instruction Formats a This encoding causes an Illegal Operation fault if the value of the qualifying predicate is 1 ...

Page 1270: ...oded as an immediate in the instruction Since the PSR bits to be written can be determined by examining the encoded instruction the instruction is treated as only writing those bits which have a corresponding mask bit set All exceptions to these general rules are described in this appendix 5 2 Dependencies and Serialization A RAW Read After Write dependency is a sequence of two events where the fi...

Page 1271: ... some cases other than CRs where pairs of instructions explicitly encode the same resource serialization is implied There are cases where it is architecturally allowed to omit a serialization and that the response from the CPU must be atomic act as if either the old or the new state were fully in place The tables in this appendix indicate dependency requirements under the assumption that the desir...

Page 1272: ...red resources are represented in a single row of a table using the notation as a substitute for the number of the resource In such cases the semantics of the table are as if each numbered resource had its own row in that table and is thus an independent resource The range of values that the can take are given in the Resource Name column An asterisk in the Resource Name column indicates that this r...

Page 1273: ...g register base registers in CFM and always write AR EC the rotating register bases in CFM and PR 63 even if they do not change their values or if their PR qp is false Instructions of type mod sched brs counted always read and write AR LC even if they do not change its value For instructions of type pr or writers or pr and writers if their completer is or andcm then only the first target predicate...

Page 1274: ...EC mod sched brs br ret mov to AR EC br call brl call br ia mod sched brs mov from AR EC impliedF AR EFLAG mov to AR EFLAG br ia mov from AR EFLAG impliedF AR FCR mov to AR FCR br ia mov from AR FCR impliedF AR FDR mov to AR FDR br ia mov from AR FDR impliedF AR FIR mov to AR FIR br ia mov from AR FIR impliedF AR FPSR sf0 controls mov to AR FPSR fsetc s0 br ia fp arith s0 fcmp s0 fpcmp s0 fsetc mo...

Page 1275: ...liedF AR SSD mov to AR SSD br ia mov from AR SSD impliedF AR UNAT in 0 63 mov to AR UNAT st8 spill br ia ld8 fill mov from AR UNAT impliedF AR in 8 15 20 22 23 31 33 35 37 39 41 43 46 47 67 111 none br ia mov from AR rv1 none AR in 48 63 112 127 mov to AR ig1 br ia mov from AR ig1 impliedF BR in 0 7 br call1 brl call1 indirect brs1 indirect brp1 mov from BR1 impliedF mov to BR1 indirect brs1 none ...

Page 1276: ...v from CR IVR mov from CR IRR1 data CR ISR mov to CR ISR mov from CR ISR data CR ITIR mov to CR ITIR mov from CR ITIR data itc i itc d itr i itr d implied CR ITM mov to CR ITM mov from CR ITM data CR ITO mov to CR ITO mov from AR ITC mov from CR ITO data CR ITV mov to CR ITV mov from CR ITV data CR IVA mov to CR IVA mov from CR IVA instr CR IVR none mov from CR IVR SC Section 5 8 3 2 External Inte...

Page 1277: ...on access data ptc g ptc ga ptc l ptr d itr d impliedF ptr d mem readers mem writers non access data ptc g ptc ga ptc l ptr d none itr d itc d impliedF FR in 0 1 none fr readers1 none FR in 2 127 fr writers1 ldf c1 ldfp c1 fr readers1 impliedF ldf c1 ldfp c1 fr readers1 none GR0 none gr readers1 none GR in 1 127 ld c1 13 gr readers1 none gr writers1 ld c1 13 gr readers1 impliedF IBR mov to IND IBR...

Page 1278: ... nomovpr1 mov from PR12 mov to PR12 none PR in 1 15 pr writers1 mov to PR allreg7 pr readers nobr nomovpr1 mov from PR mov to PR12 impliedF pr writers fp1 pr readers br1 impliedF pr writers int1 mov to PR allreg7 pr readers br1 none PR in 16 62 pr writers1 mov to PR allreg7 mov to PR rotreg pr readers nobr nomovpr1 mov from PR mov to PR12 impliedF pr writers fp1 pr readers br1 impliedF pr writers ...

Page 1279: ...SR mov from PSR um impliedF PSR bn bsw rfi gr readers10 gr writers10 impliedF PSR cpl epc br ret priv ops br call brl call epc mov from AR ITC mov from AR RUC mov to AR ITC mov to AR RSC mov to AR RUC mov to AR K mov from IND PMD probe all mem readers mem writers lfetch all implied rfi priv ops br call brl call epc mov from AR ITC mov from AR RUC mov to AR ITC mov to AR RSC mov to AR RUC mov to AR...

Page 1280: ... mov from PSR impliedF PSR ia rfi all none PSR ic sys mask writers partial7 mov to PSR l mov from PSR impliedF cover itc i itc d itr i itr d mov from interruption CR mov to interruption CR data rfi mov from PSR cover itc i itc d itr i itr d mov from interruption CR mov to interruption CR impliedF PSR id rfi all none PSR is br ia rfi none none PSR it rfi branches mov from PSR chk epc fchkf vmsw imp...

Page 1281: ...v from IND PMD mov from PSR mov to PSR um rum sum impliedF PSR ss rfi all impliedF PSR tb mov to PSR l branches chk fchkf data mov from PSR impliedF rfi branches chk fchkf mov from PSR impliedF PSR up user mask writers partial7 mov to PSR um sys mask writers partial7 mov to PSR l rfi mov from PSR um mov from PSR impliedF PSR vm vmsw mem readers mem writers mov from AR ITC mov from AR RUC mov from ...

Page 1282: ...ushrs mov to AR BSPSTORE impliedF AR CCV mov to AR CCV impliedF AR CFLG mov to AR CFLG impliedF AR CSD ld16 mov to AR CSD impliedF AR EC br ret mod sched brs mov to AR EC impliedF AR EFLAG mov to AR EFLAG impliedF AR FCR mov to AR FCR impliedF AR FDR mov to AR FDR impliedF AR FIR mov to AR FIR impliedF AR FPSR sf0 controls mov to AR FPSR fsetc s0 impliedF AR FPSR sf1 controls mov to AR FPSR fsetc ...

Page 1283: ... call1 brl call1 none CFM mod sched brs br call brl call br ret alloc clrrrb cover rfi impliedF CPUID none none CR CMCV mov to CR CMCV impliedF CR DCR mov to CR DCR impliedF CR EOI mov to CR EOI SC Section 5 8 3 4 End of External Interrupt Register EOI CR67 on page 2 124 CR IFA mov to CR IFA impliedF CR IFS mov to CR IFS cover impliedF CR IHA mov to CR IHA impliedF CR IIB in 0 1 mov to CR IIB impl...

Page 1284: ...impliedF DTR itr d impliedF itr d ptr d impliedF ptr d none FR in 0 1 none none FR in 2 127 fr writers1 ldf c1 ldfp c1 impliedF GR0 none none GR in 1 127 ld c1 gr writers1 impliedF IBR mov to IND IBR3 impliedF InService mov to CR EOI mov from CR IVR SC IP all none ITC ptc e ptc g ptc ga ptc l ptr i ptr d none ptc e ptc g ptc ga ptc l ptr i ptr d itc i itc d itr i itr d itc i itc d itr i itr d impl...

Page 1285: ...ters fp1 pr unc writers int1 pr norm writers fp1 pr norm writers int1 pr or writers1 mov to PR allreg7 mov to PR rotreg impliedF PSR ac user mask writers partial7 mov to PSR um sys mask writers partial7 mov to PSR l rfi impliedF PSR be user mask writers partial7 mov to PSR um sys mask writers partial7 mov to PSR l rfi impliedF PSR bn bsw rfi impliedF PSR cpl epc br ret rfi impliedF PSR da rfi impl...

Page 1286: ...instruction or encoded as its PR qp PSR mfh fr writers9 none user mask writers partial7 mov to PSR um fr writers9 sys mask writers partial7 mov to PSR l rfi user mask writers partial7 mov to PSR um sys mask writers partial7 mov to PSR l rfi impliedF PSR mfl fr writers9 none user mask writers partial7 mov to PSR um fr writers9 sys mask writers partial7 mov to PSR l rfi user mask writers partial7 mo...

Page 1287: ...he PSR bn bit is only accessed when one of GR16 31 is specified in the instruction Rule 11 The target predicates are written independently of PR qp but source registers are only read if PR qp is true Rule 12 This instruction only reads the specified predicate register when that register is the PR qp Rule 13 This reference to ld c only applies to the GR whose value is loaded with data returned from...

Page 1288: ...fpcmp Field sf s0 fpcmp s1 fpcmp Field sf s1 fpcmp s2 fpcmp Field sf s2 fpcmp s3 fpcmp Field sf s3 fr readers fp arith fp non arith mem writers fp pr writers fp chk s Format in M21 getf fr writers fp arith fp non arith fclass mem readers fp setf gr readers gr readers writers mem readers mem writers chk s cmp cmp4 fc itc i itc d itr i itr d mov to AR gr mov to BR mov to CR mov to IND mov from IND m...

Page 1289: ...dfp8 sa lfetch all lfetch lfetch fault lfetch Field lftype fault lfetch nofault lfetch Field lftype lfetch postinc lfetch Format in M20 M22 mem readers mem readers fp mem readers int mem readers alat ld a ldf a ldfp a ld sa ldf sa ldfp sa ld c ldf c ldfp c mem readers fp ldf ldfp mem readers int cmpxchg fetchadd xchg ld mem readers spec ld s ld sa ldf s ldf sa ldfp s ldfp sa mem writers mem writer...

Page 1290: ...v from CR Field cr3 CMCV mov from CR DCR mov from CR Field cr3 DCR mov from CR EOI mov from CR Field cr3 EOI mov from CR IFA mov from CR Field cr3 IFA mov from CR IFS mov from CR Field cr3 IFS mov from CR IHA mov from CR Field cr3 IHA mov from CR IIB mov from CR Field cr3 in IIB0 IIB1 mov from CR IIM mov from CR Field cr3 IIM mov from CR IIP mov from CR Field cr3 IIP mov from CR IIPA mov from CR F...

Page 1291: ... mov to AR CFLG mov to AR M Field ar3 CFLG mov to AR CSD mov to AR M Field ar3 CSD mov to AR EC mov to AR I Field ar3 EC mov to AR EFLAG mov to AR M Field ar3 EFLAG mov to AR FCR mov to AR M Field ar3 FCR mov to AR FDR mov to AR M Field ar3 FDR mov to AR FIR mov to AR M Field ar3 FIR mov to AR FPSR mov to AR M Field ar3 FPSR mov to AR FSR mov to AR M Field ar3 FSR mov to AR gr mov to AR M Format i...

Page 1292: ...R Field cr3 in LRR0 LRR1 mov to CR PMV mov to CR Field cr3 PMV mov to CR PTA mov to CR Field cr3 PTA mov to CR TPR mov to CR Field cr3 TPR mov to IND mov_indirect Format in M42 mov to IND CPUID mov to IND Field ireg cpuid mov to IND DBR mov to IND Field ireg dbr mov to IND IBR mov to IND Field ireg ibr mov to IND PKR mov to IND Field ireg pkr mov to IND PMC mov to IND Field ireg pmc mov to IND PMD...

Page 1293: ...rom IND mov ip mov to PSR l mov to PSR um mov from PSR mov from PSR um movl mux nop f nop i nop m nop x or pack padd pavg pavgsub pcmp pmax pmin pmpy pmpyshr popcnt probe all psad pshl pshladd pshr pshradd psub ptc e ptc g ptc ga ptc l ptr d ptr i ReservedQP rsm setf shl shladd shladdp4 shr shrp srlz i srlz d ssm st stf sub sum sxt sync tak tbit tf thash tnat tpa ttag unpack xchg xma xmpy xor zxt ...

Page 1294: ...mov to AR BSPSTORE rfi st st1 st2 st4 st8 st8 spill st16 st postinc stf Format in M10 st Format in M5 stf stfs stfd stfe stf8 stf spill sxt sxt1 sxt2 sxt4 sys mask writers partial rsm ssm unpack unpack1 unpack2 unpack4 unpredicatable instructions alloc br cloop br ctop br cexit br ia brp bsw clrrrb cover epc flushrs loadrs rfi vmsw user mask writers partial rum sum xchg xchg1 xchg2 xchg4 xchg8 zxt...

Page 1295: ...3 396 Volume 3 Resource and Dependency Semantics ...

Page 1296: ......

Page 1297: ...Intel Itanium Architecture Software Developer s Manual Volume 4 IA 32 Instruction Set Reference Revision 2 3 May 2010 Document Number 323208 ...

Page 1298: ...hanges to specifications and product descriptions at any time without notice Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them Intel processors based on the Itanium architec...

Page 1299: ...n 4 15 2 2 3 Flags Affected 4 18 2 2 4 FPU Flags Affected 4 18 2 2 5 Protected Mode Exceptions 4 19 2 2 6 Real address Mode Exceptions 4 19 2 2 7 Virtual 8086 Mode Exceptions 4 19 2 2 8 Floating point Exceptions 4 20 2 3 IA 32 Base Instruction Reference 4 20 3 IA 32 Intel MMX Technology Instruction Reference 4 399 4 IA 32 SSE Instruction Reference 4 463 4 1 IA 32 SSE Instructions 4 463 4 2 About t...

Page 1300: ... 3 17 Operation of the PSRAW Instruction 4 440 3 18 Operation of the PSRLW Instruction 4 443 3 19 Operation of the PSUBW Instruction 4 446 3 20 Operation of the PSUBSW Instruction 4 449 3 21 Operation of the PSUBUSB Instruction 4 452 3 22 High order Unpacking and Interleaving of Bytes with the PUNPCKHBW Instruction 4 455 3 23 Low order Unpacking and Interleaving of Bytes with the PUNPCKLBW Instruc...

Page 1301: ...ses 4 218 2 15 LAR Descriptor Validity 4 253 2 16 LEA Address and Operand Sizes 4 258 2 17 Repeat Conditions 4 338 4 1 Real Number Notation 4 476 4 2 Denormalization Process 4 478 4 3 Results of Operations with NAN Operands 4 481 4 4 Precision and Range of SSE Datatype 4 482 4 5 Real Number and NaN Encodings 4 482 4 6 SSE Instruction Behavior with Prefixes 4 483 4 7 SIMD Integer Instructions Behav...

Page 1302: ...402 Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 1303: ...he Itanium application architecture including application level resources programming environment and the IA 32 application interface This volume also describes optimization techniques used to generate high performance software 1 1 1 Part 1 Application Architecture Guide Chapter 1 About this Manual provides an overview of all volumes in the Intel Itanium Architecture Software Developer s Manual Ch...

Page 1304: ...software 1 2 1 Part 1 System Architecture Guide Chapter 1 About this Manual provides an overview of all volumes in the Intel Itanium Architecture Software Developer s Manual Chapter 2 Intel Itanium System Environment introduces the environment designed to support execution of Itanium architecture based operating systems running IA 32 or Itanium architecture based applications Chapter 3 System Stat...

Page 1305: ...ing systems need to preserve Itanium register contents and state This chapter also describes system architecture mechanisms that allow an operating system to reduce the number of registers that need to be spilled filled on interruptions system calls and context switches Chapter 5 Memory Management introduces various memory management strategies Chapter 6 Runtime Support for Control and Data Specul...

Page 1306: ...2 Instruction Reference provides a detailed description of all Itanium instructions organized in alphabetical order by assembly language mnemonic Chapter 3 Pseudo Code Functions provides a table of pseudo code functions which are used to define the behavior of the Itanium instructions Chapter 4 Instruction Formats describes the encoding and instruction format instructions Chapter 5 Resource and De...

Page 1307: ...itectures Software Developer s Manual Itanium System Environment The operating system environment that supports the execution of both IA 32 and Itanium architecture based code IA 32 System Environment The operating system privileged environment and resources as defined by the Intel Architecture Software Developer s Manual Resources include virtual paging control registers debugging performance mon...

Page 1308: ... the Unified EFI Forum website at http www uefi org Unified Extensible Firmware Interface Specification This document defines a new model for the interface between operating systems and platform firmware 1 7 Revision History Date of Revision Revision Number Description March 2010 2 3 Added information about illegal virtualization optimization combinations and IIPA requirements Added Resource Utili...

Page 1309: ...rocedures Allows IPI redirection feature to be optional Undefined behavior for 1 byte accesses to the non architected regions in the IPI block Modified insertion behavior for TR overlaps See Vol 2 Part I Ch 4 for details Bus parking feature is now optional for PAL_BUS_GET_FEATURES Introduced low power synchronization primitive using hint instruction FR32 127 is now preserved in PAL calling convent...

Page 1310: ...n 2 2 and 3 Part I Volume 3 Added Performance Counter Standardization Sections 7 2 3 and 11 6 Part I Volume 2 Added Freeze Bit Functionality in Context Switching and Interrupt Generation Clarification Sections 7 2 1 7 2 2 7 2 4 1 and 7 2 4 2 Part I Volume 2 Added IA_32_Exception Debug IIPA Description Change Section 9 2 Part I Volume 2 Added capability for Allowing Multiple PAL_A_SPEC and PAL_B En...

Page 1311: ...S changes extend calls to allow implementation specific feature control Section 11 8 3 Split PAL_A architecture changes Section 11 1 6 Simple barrier synchronization clarification Section 13 4 2 Limited speculation clarification added hardware generated speculative references Section 4 4 6 PAL memory accesses and restrictions clarification Section 11 9 PSP validity on INITs from PAL_MC_ERROR_INFO ...

Page 1312: ...plify the call to provide more information regarding machine check Chapter 11 PAL_ENTER_IA_32_Env call changes entry parameter represents the entry order SAL needs to initialize all the IA 32 registers properly before making this call Chapter 11 PAL_CACHE_FLUSH added a new cache_type argument Chapter 11 PAL_SHUTDOWN removed from list of PAL calls Chapter 11 Clarified memory ordering changes Chapte...

Page 1313: ...registers by IA 32 instructions see the state of the Itanium registers defined to hold the IA 32 state after entering the IA 32 instruction set State mappings IA 32 numeric instructions are controlled by and reflect their status in FCW FSW FTW FCS FIP FOP FDS and FEA On exit from the IA 32 instruction set Itanium numeric status and control resources defined to hold IA 32 state reflect the results ...

Page 1314: ... operands from the IA 32 instruction set including MMX technology and SSE instructions Nested TLB fault Alternative data TLB fault VHPT data fault Data TLB fault Data Page Not Present fault Data NaT Page Consumption Abort Data Key Miss fault Data Key Permission fault Data Access Rights fault Data Dirty bit fault Data Access bit fault 2 2 Interpreting the IA 32 Instruction Reference Pages This sect...

Page 1315: ...byte first rb rw rd A register code from 0 through 7 added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte The register codes are given in Table 2 1 i A number used in floating point instructions when one of the operands is ST i from the FPU register stack The number i which can range from 0 to 7 is added to the hexadecimal byte given at the left of the plus...

Page 1316: ...ster AL BL CL DL AH BH CH and DH or a byte from memory r m16 A word general purpose register or memory operand used for instructions whose operand size attribute is 16 bits The word general purpose registers are AX BX CX DX SP BP SI and DI The contents of memory are found at the address provided by the effective address computation r m32 A doubleword general purpose register or memory operand used...

Page 1317: ... 0 through 7 mm An MMX technology register The 64 bit MMX technology registers are MM0 through MM7 mm m32 The low order 32 bits of an MMX technology register or a 32 bit memory operand The 64 bit MMX technology registers are MM0 through MM7 The contents of memory are found at the address provided by the effective address computation mm m64 An MMX technology register or a 64 bit memory operand The ...

Page 1318: ...y by the number of bits indicated by the count operand The following identifiers are used in the algorithmic descriptions OperandSize and AddressSize The OperandSize identifier represents the operand size attribute of the instruction which is either 16 or 32 bits The AddressSize identifier represents the address size attribute which is either 16 or 32 bits For example the following pseudo code ind...

Page 1319: ...ts the result of an operation as a signed 16 bit value If the result is less than 32768 it is represented by the saturated value 32768 8000H if it is greater than 32767 it is represented by the saturated value 32767 7FFFH SaturateToUnsignedByte Represents the result of an operation as a signed 8 bit value If the result is less than zero it is represented by the saturated value zero 00H if it is gr...

Page 1320: ... by the instruction When a flag is cleared it is equal to 0 when it is set it is equal to 1 The arithmetic and logical instructions usually assign values to the status flags in a uniform manner see Appendix A EFLAGS Cross Reference in the Intel Architecture Software Developer s Manual Volume 1 Non conventional assignments are described in the Operation section The values of flags listed as undefin...

Page 1321: ... lists the exceptions that can occur when the instruction is executed in virtual 8086 mode Table 2 2 Exception Mnemonics Names and Vector Numbers Vector No Mnemonic Name Source 0 DE Divide Error DIV and IDIV instructions 1 DB Debug Any code or data reference 3 BP Breakpoint INT 3 instruction 4 OF Overflow INTO instruction 5 BR BOUND Range Exceeded BOUND instruction 6 UD Invalid Opcode Undefined Op...

Page 1322: ...description of these exceptions 2 3 IA 32 Base Instruction Reference The remainder of this chapter provides detailed descriptions of each of the Intel architecture instructions Table 2 3 Floating point Exception Mnemonics and Names Vector No Mnemonic Name Source 16 IS IA Floating point invalid operation Stack overflow or underflow Invalid arithmetic operation FPU stack overflow or underflow Invali...

Page 1323: ... the addition produces a decimal carry the AH register is incremented by 1 and the CF and AF flags are set If there was no decimal carry the CF and AF flags are cleared and the AH register is unchanged In either case bits 4 through 7 of the AL register are cleared to 0 Operation IF AL AND FH 9 OR AF 1 THEN AL AL 6 AH AH 1 AF 1 CF 1 ELSE AF 0 CF 0 FI AL AL AND FH Flags Affected The AF and CF flags ...

Page 1324: ... and then clears the AH register to 00H The value in the AX register is then equal to the binary equivalent of the original unpacked two digit number in registers AH and AL Operation tempAL AL tempAH AH AL tempAL tempAH imm8 AND FFH AH 0 The immediate value imm8 is taken from the second byte of the instruction which under normal assembly is 0AH 10 decimal However this immediate value can be change...

Page 1325: ... instruction then adjusts the contents of the AX register to contain the correct 2 digit unpacked BCD result Operation tempAL AL AH tempAL imm8 AL tempAL MOD imm8 The immediate value imm8 is taken from the second byte of the instruction which under normal assembly is 0AH 10 decimal However this immediate value can be changed to produce a different result Flags Affected The SF ZF and PF flags are s...

Page 1326: ... digit unpacked BCD result If the subtraction produced a decimal carry the AH register is decremented by 1 and the CF and AF flags are set If no decimal carry occurred the CF and AF flags are cleared and the AH register is unchanged In either case the AL register is left with its top nibble set to 0 Operation IF AL AND FH 9 OR AF 1 THEN AL AL 6 AH AH 1 AF 1 CF 1 ELSE CF 0 AF 0 FI AL AL AND FH Flag...

Page 1327: ...EST SRC CF Flags Affected The OF SF ZF AF CF and PF flags are set according to the result Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Ri...

Page 1328: ... page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address ...

Page 1329: ...F and PF flags are set according to the result Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty...

Page 1330: ...fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is out...

Page 1331: ...Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination operand points to a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effe...

Page 1332: ...a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignm...

Page 1333: ...system by an application program to match the privilege level of the application program Here the segment selector passed to the operating system is placed in the destination operand and segment selector for the application program s code segment is placed in the source operand The RPL field in the source operand represents the privilege level of the application program Execution of the ARPL instr...

Page 1334: ...d Mode Exceptions GP 0 If the destination is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is...

Page 1335: ...instruction The bounds limit data structure two words or doublewords containing the lower and upper limits of the array is usually placed just before the array itself making the limits addressable via a constant offset from the beginning of the array Because the address of the array already will be present in a register this practice avoids extra bus cycles to obtain the effective address of the a...

Page 1336: ...f alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions BR If the bounds test fails GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions BR If the bounds test fails GP 0 If a me...

Page 1337: ...SRC 0 THEN ZF 1 DEST is undefined ELSE ZF 0 temp 0 WHILE Bit SRC temp 0 DO temp temp 1 DEST temp OD FI Flags Affected The ZF flag is set to 1 if all the source operand is 0 otherwise the ZF flag is cleared The CF OF SF AF and PF flags are undefined Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault D...

Page 1338: ...ng is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment li...

Page 1339: ...HEN ZF 1 DEST is undefined ELSE ZF 0 temp OperandSize 1 WHILE Bit SRC temp 0 DO temp temp 1 DEST temp OD FI Flags Affected The ZF flag is set to 1 if all the source operand is 0 otherwise the ZF flag is cleared The CF OF SF AF and PF flags are undefined Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fa...

Page 1340: ...ng is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment li...

Page 1341: ...lt is undefined Operation TEMP DEST DEST 7 0 TEMP 31 24 DEST 15 8 TEMP 23 16 DEST 23 16 TEMP 15 8 DEST 31 24 TEMP 7 0 Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Exceptions All Operating Modes None Intel Architecture Compatibility Information The BSWAP instruction is not supported on Intel architecture processors earlier th...

Page 1342: ... bit offset are stored in the immediate bit offset field and the high order bits are shifted and combined with the byte displacement in the addressing mode by the assembler The processor will ignore the high order bits if they are not zero When accessing a bit in memory the processor may access 4 bytes starting from the memory address for a 32 bit operand size using by the following relationship E...

Page 1343: ...register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a me...

Page 1344: ... 0 to 31 for an immediate offset Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand See BT Bit Test on page 4 40 for more information on this addressing mechanism Operation CF Bit BitBase BitOffset Bit BitBase BitOffset NOT Bit BitBase BitOffset Flags Affected The CF flag contains the...

Page 1345: ...lt occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outsid...

Page 1346: ... a register offset and 0 to 31 for an immediate offset Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand See BT Bit Test on page 4 40 for more information on this addressing mechanism Operation CF Bit BitBase BitOffset Bit BitBase BitOffset 0 Flags Affected The CF flag contains the v...

Page 1347: ...occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside t...

Page 1348: ... for a register offset and 0 to 31 for an immediate offset Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand See BT Bit Test on page 4 40 for more information on this addressing mechanism Operation CF Bit BitBase BitOffset Bit BitBase BitOffset 1 Flags Affected The CF flag contains t...

Page 1349: ... occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP If a memory operand effective address is outside th...

Page 1350: ...tion When executing a near call the processor pushes the value of the EIP register which contains the address of the instruction following the CALL instruction onto the procedure stack for use later as a return instruction pointer The processor then jumps to the address specified with the target operand for the called procedure The target operand specifies either an absolute address in the code se...

Page 1351: ...be pushed on the stack When the processor is operating in protected mode a far call can also be used to access a code segment at a different privilege level or to switch tasks Here the processor uses the segment selector part of the far address to access the segment descriptor for the segment being jumped to Depending on the value of the type and access rights information in the segment selector t...

Page 1352: ...T is rel32 ELSE OperandSize 16 IF stack not large enough for a 2 byte return address THEN SS 0 FI Push IP EIP EIP DEST AND 0000FFFFH DEST is rel16 FI FI ELSE near absolute call IF the instruction pointer is not within code segment limit THEN GP 0 FI IF OperandSize 32 THEN IF stack not large enough for a 4 byte return address THEN SS 0 FI Push EIP EIP DEST DEST is r m32 ELSE OperandSize 16 IF stack...

Page 1353: ...NG CODE SEGMENT GO TO CALL GATE GO TO TASK GATE GO TO TASK STATE SEGMENT FI CONFORMING CODE SEGMENT IF DPL CPL THEN GP new code segment selector FI IF not present THEN NP selector FI IF OperandSize 32 THEN IF stack not large enough for a 6 byte return address THEN SS 0 FI IF the instruction pointer is not within code segment limit THEN GP 0 FI Push CS padded with 16 high order bits Push EIP CS DES...

Page 1354: ...ate DPL CPL or RPL THEN GP call gate selector FI IF not present THEN NP call gate selector FI IF Itanium System Environment THEN IA 32_Intercept Gate CALL IF call gate code segment selector is null THEN GP 0 FI IF call gate code segment selector index is outside descriptor table limits THEN GP code segment selector FI Read code segment descriptor IF code segment segment descriptor does not indicat...

Page 1355: ...N GP 0 FI SS newSS segment descriptor information also loaded ESP newESP CS EIP CallGate CS InstructionPointer segment descriptor information also loaded Push oldSS oldESP from calling procedure temp parameter count from call gate masked to 5 bits Push parameters from calling procedure s stack temp Push oldCS oldEIP return address to calling procedure ELSE CallGateSize 16 IF stack does not have ro...

Page 1356: ... selector FI IF Itanium System Environment THEN IA 32_Intercept Gate CALL Read the TSS segment selector in the task gate descriptor IF TSS segment selector local global bit is set to local OR index not within GDT limits THEN GP TSS selector FI Access TSS descriptor in GDT IF TSS descriptor specifies that the TSS is busy low order 5 bits set to 00001 THEN GP TSS selector FI IF TSS not present THEN ...

Page 1357: ...gment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector GP selector If code segment or gate or TSS selector index is outside descriptor table limits If the segment descriptor pointed to by the segment selector in the destination operand is not for a conforming code segment nonconforming code segment call gate task gate or task state segment If th...

Page 1358: ...and ESP are beyond the end of the TSS If the new stack segment selector is null If the RPL of the new stack segment selector in the TSS is not equal to the DPL of the code segment being accessed If DPL of the stack segment descriptor for the new stack segment is not equal to the DPL of the code segment descriptor If the new stack segment is not a writable data segment If segment selector index for...

Page 1359: ...may force the operand size to 16 when CBW is used and to 32 when CWDE is used Others may treat these mnemonics as synonyms CBW CWDE and use the current setting of the operand size attribute to determine the size of values to be converted regardless of the mnemonic used The CWDE instruction is different from the CWD convert word to double instruction The CWD instruction uses the DX AX register pair...

Page 1360: ...4 58 Volume 4 Base IA 32 Instruction Reference CDQ Convert Double to Quad See entry for CWD CDQ Convert Word to Double Convert Double to Quad ...

Page 1361: ...r Carry Flag Description Clears the CF flag in the EFLAGS register Operation CF 0 Flags Affected The CF flag is cleared to 0 The OF ZF SF AF and PF flags are unaffected Exceptions All Operating Modes None Opcode Instruction Description F8 CLC Clear CF flag ...

Page 1362: ...n the EFLAGS register When the DF flag is set to 0 string operations increment the index registers ESI and or EDI Operation DF 0 Flags Affected The DF flag is cleared to 0 The CF OF ZF SF AF and PF flags are unaffected Exceptions All Operating Modes None Opcode Instruction Description FC CLD Clear DF flag ...

Page 1363: ...he following decision table indicates the action of the CLI instruction bottom of the table depending on the processor s mode of operating and the CPL and IOPL of the currently running program or procedure top of the table Notes XDon t care NAction in column 1 not taken YAction in column 1 taken Operation OLD_IF IF IF PE 0 Executing in real address mode THEN IF 0 ELSE IF VM 0 Executing in protecte...

Page 1364: ... cleared to 0 if the CPL is equal to or less than the IOPL otherwise the it is not affected The other flags in the EFLAGS register are unaffected Additional Itanium System Environment Exceptions IA 32_Intercept System Flag Intercept Trap if CFLG ii is 1 and the IF flag changes state Protected Mode Exceptions GP 0 If the CPL is greater has less privilege than the IOPL of the current program or proc...

Page 1365: ...to synchronize the saving of FPU context in multitasking applications See the description of the TS flag in the Intel Architecture Software Developer s Manual Volume 3 for more information about this flag Operation IF Itanium System Environment THEN IA 32_Intercept INST CLTS CR0 TS 0 Flags Affected The TS flag in CR0 register is cleared Additional Itanium System Environment Exceptions IA 32_Interc...

Page 1366: ...tion Complements the CF flag in the EFLAGS register Operation CF NOT CF Flags Affected The CF flag contains the complement of its original value The OF ZF SF AF and PF flags are unaffected Exceptions All Operating Modes None Opcode Instruction Description F5 CMC Complement CF flag ...

Page 1367: ... Move if less or equal ZF 1 or SF OF 0F 4E cw cd CMOVLE r32 r m32 Move if less or equal ZF 1 or SF OF 0F 46 cw cd CMOVNA r16 r m16 Move if not above CF 1 or ZF 1 0F 46 cw cd CMOVNA r32 r m32 Move if not above CF 1 or ZF 1 0F 42 cw cd CMOVNAE r16 r m16 Move if not above or equal CF 1 0F 42 cw cd CMOVNAE r32 r m32 Move if not above or equal CF 1 0F 43 cw cd CMOVNB r16 r m16 Move if not below CF 0 0F...

Page 1368: ...d below are used for unsigned integers Because a particular state of the status flags can sometimes be interpreted in two ways two mnemonics are defined for some opcodes For example the CMOVA conditional move if above instruction and the CMOVNBE conditional move if not below or equal instruction are alternate mnemonics for the opcode 0F 47H Opcode Instruction Description 0F 41 cw cd CMOVNO r16 r m...

Page 1369: ...Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a ...

Page 1370: ...ode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1371: ...CF OF SF ZF AF and PF flags are set according to the result Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fau...

Page 1372: ...king is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment ...

Page 1373: ...ting of the DF flag in the EFLAGS register If the DF flag is 0 the ESI and EDI register are incremented if the DF flag is 1 the ESI and EDI registers are decremented The registers are incremented or decremented by 1 for byte operations by 2 for word operations or by 4 for doubleword operations The CMPS CMPSB CMPSW and CMPSD instructions can be preceded by the REP prefix for block comparisons of EC...

Page 1374: ...ed TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment se...

Page 1375: ...rtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1376: ...s a locked read without also producing a locked write Operation accumulator AL AX or EAX depending on whether a byte word or doubleword comparison is being performed IF Itanium System Environment AND External_Atomic_Lock_Required AND DCR lc THEN IA 32_Intercept LOCK CMPXCHG IF accumulator DEST THEN ZF 1 DEST SRC ELSE ZF 0 accumulator DEST FI Flags Affected The ZF flag is set if the values in the d...

Page 1377: ...writable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Rea...

Page 1378: ...locked write Operation IF Itanium System Environment AND External_Atomic_Lock_Required AND DCR lc THEN IA 32_Intercept LOCK CMPXCHG IF EDX EAX DEST ZF 1 DEST ECX EBX ELSE ZF 0 EDX EAX DEST FI Flags Affected The ZF flag is set if the destination operand and EDX EAX are equal otherwise it is cleared The CF PF AF SF and OF flags are unaffected Additional Itanium System Environment Exceptions Itanium ...

Page 1379: ...cking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment...

Page 1380: ...gister is 80000000H the processor returns the highest value the CPUID instruction recognizes in the EAX register for returning extended function information Always use an EAX parameter value that is equal to or greater than zero and less than or equal to this highest EAX return value for extended function information The CPUID instruction can be executed at any privilege level to serialize instruc...

Page 1381: ...ssors Extended Function CPUID Information 8000000H EAX EBX ECX EDX Maximum Input Value for Extended Function CPUID Information Reserved Reserved Reserved 8000001H EAX EBX ECX EDX Extended Processor Signature and Extended Feature Bits Currently reserved Reserved Reserved Reserved 8000002H EAX EBX ECX EDX Processor Brand String Processor Brand String Continued Processor Brand String Continued Proces...

Page 1382: ... instruction in addition to loading the processor signature in the EAX register loads the EDX register with the feature flags The feature flags when a Flag 1 indicate what features the processor supports Table 2 5 lists the currently defined feature flag values A feature flag set to 1 indicates the corresponding feature is supported Software should identify Intel as the vendor to properly interpre...

Page 1383: ... 13 PGE PTE Global Bit The global bit in page directory entries PDEs and page table entries PTEs is supported indicating TLB entries that are common to different processes and need not be flushed The CR4 PGE bit controls this feature 14 MCA Machine Check Architecture The Machine Check Architecture which provides a compatible mechanism for error reporting is supported The MCG_CAP MSR contains featu...

Page 1384: ...e supported for fast save and restore of the floating point context Presence of this bit also indicates that CR4 OSFXSR is available for an operating system to indicate that it supports the FXSAVE and FXRSTOR instructions 25 SSE SSE The processor supports the SSE extensions 26 SSE2 SSE2 The processor supports the SSE2 extensions 27 SS Self Snoop The processor supports the management of conflicting...

Page 1385: ...6 23 Number of logical processors per physical processor EBX 31 24 Initial APIC ID Reserved for processors based on Itanium architecture ECX Reserved EDX Feature flags BREAK EAX 2H EAX Cache and TLB information EBX Cache and TLB information ECX Cache and TLB information EDX Cache and TLB information BREAK EAX 80000000H EAX Highest extended function input value understood by CPUID EBX Reserved ECX ...

Page 1386: ..._serialize Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Exceptions All Operating Modes None Intel Architecture Compatibility The CPUID instruction is not supported in early models of the Intel486 processor or in any Intel architecture processor earlier than the Intel486 processor The ID flag in the EFLAGS register can be use...

Page 1387: ...end from a doubleword before doubleword division The CWD and CDQ mnemonics reference the same opcode The CWD instruction is intended for use when the operand size attribute is 16 and the CDQ instruction for when the operand size attribute is 32 Some assemblers may force the operand size to 16 when CWD is used and to 32 when CDQ is used Others may treat these mnemonics as synonyms CWD CDQ and use t...

Page 1388: ...4 86 Volume 4 Base IA 32 Instruction Reference CWDE Convert Word to Doubleword See entry for CBW CWDE Convert Byte to Word Convert Word to Doubleword ...

Page 1389: ... IF AL AND 0FH 9 or AF 1 THEN AL AL 6 CF CF OR CarryFromLastAddition CF OR carry from AL AL 6 AF 1 ELSE AF 0 FI IF AL AND F0H 90H or CF 1 THEN AL AL 60H CF 1 ELSE CF 0 FI Example ADD AL BL Before AL 79H BL 35H EFLAGS OSZAPC XXXXXX After AL AEH BL 35H EFLAGS 0SZAPC 110000 DAA Before AL 79H BL 35H EFLAGS OSZAPC 110000 After AL AEH BL 35H EFLAGS 0SZAPC X00111 Flags Affected The CF and AF flags are se...

Page 1390: ... accordingly Operation IF AL AND 0FH 9 OR AF 1 THEN AL AL 6 CF CF OR BorrowFromLastSubtraction CF OR borrow from AL AL 6 AF 1 ELSE AF 0 FI IF AL 9FH or CF 1 THEN AL AL 60H CF 1 ELSE CF 0 FI Example SUB AL BL Before AL 35H BL 47H EFLAGS OSZAPC XXXXXX After AL EEH BL 47H EFLAGS 0SZAPC 010111 DAA Before AL EEH BL 47H EFLAGS OSZAPC 010111 After AL 88H BL 47H EFLAGS 0SZAPC X10111 Flags Affected The CF ...

Page 1391: ...Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is...

Page 1392: ...e Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1393: ...on rather than with the CF flag Operation IF SRC 0 THEN DE divide error FI IF OpernadSize 8 word byte operation THEN temp AX SRC IF temp FFH THEN DE divide error ELSE AL temp AH AX MOD SRC FI ELSE IF OpernadSize 16 doubleword word operation THEN temp DX AX SRC IF temp FFFFH THEN DE divide error ELSE AX temp DX DX AX MOD SRC FI Opcode Instruction Description F6 6 DIV r m8 Unsigned divide AX by r m8...

Page 1394: ... Fault Data Dirty Bit Fault Protected Mode Exceptions DE If the source operand divisor is 0 If the quotient is too large for the designated register GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a pag...

Page 1395: ...is 0 If the quotient is too large for the designated register GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1396: ...frame for an already called procedure An ENTER instruction is commonly followed by a CALL JMP or Jcc instruction to transfer program control to the procedure being called If the nesting level is 0 the processor pushes the frame pointer from the EBP register onto the stack copies the current stack pointer from the ESP register into the EBP register and loads the ESP register with the current stack ...

Page 1397: ... doubleword push ELSE OperandSize 16 Push FrameTemp word push FI GOTO CONTINUE FI CONTINUE IF StackSize 32 THEN EBP FrameTemp ESP EBP Size ELSE StackSize 16 BP FrameTemp SP BP Size FI END Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Da...

Page 1398: ...Stack Frame for Procedure Parameters Continued Protected Mode Exceptions SS 0 If the new value of the SP or ESP register is outside the stack segment limit PF fault code If a page fault occurs Real Address Mode Exceptions None Virtual 8086 Mode Exceptions None ...

Page 1399: ...onentiated using the following formula xy 2 y log 2 x Operation ST 0 2ST 0 1 FPU Flags Affected C1 Set to 0 if stack underflow occurred Indicates rounding direction if the inexact result exception P is generated 0 not roundup 1 roundup C0 C2 C3 Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Flo...

Page 1400: ... IA 32 Instruction Reference F2XM1 Compute 2x 1 Continued Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1401: ...f stack underflow occurred otherwise cleared to 0 C0 C2 C3 Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Floating point Exceptions IS Stack underflow occurred Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions...

Page 1402: ...ng popped In some assemblers the mnemonic for this instruction is FADD rather than FADDP The FIADD instructions convert an integer source operand to extended real format before performing the addition The table on the following page shows the results obtained when adding various classes of numbers assuming that neither overflow nor underflow occurs When the sum of two operands with opposite signs ...

Page 1403: ...if the inexact result exception P is generated 0 not roundup 1 roundup C0 C2 C3 Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fau...

Page 1404: ...ory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective addres...

Page 1405: ...em Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode E...

Page 1406: ... operand effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned mem...

Page 1407: ...memory The following table shows the results obtained when storing various classes of numbers in packed BCD format Notes Fmeans finite real number Dmeans packed BCD number indicates floating point invalid operation IA exception 0 or 1 depending on the rounding mode If the source value is too large for the destination format and the invalid operation exception is not masked an invalid operation exc...

Page 1408: ...Source operand is empty contains a NaN or unsupported format or contains value that exceeds 18 BCD digits in length P Value cannot be represented exactly in destination format Protected Mode Exceptions GP 0 If a segment register is being loaded with a segment selector that points to a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS...

Page 1409: ...ptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1410: ...nBit ST 0 FPU Flags Affected C1 Set to 0 if stack underflow occurred otherwise cleared to 0 C0 C2 C3 Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Floating point Exceptions IS Stack underflow occurred Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS ...

Page 1411: ...OE ZE DE IE ES SF and B flags in the FPU status word are cleared The C0 C1 C2 and C3 flags are undefined Floating point Exceptions None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mo...

Page 1412: ...mation with the CPUID instruction see CPUID CPU Identification on page 4 78 If both the CMOV and FPU feature bits are set the FCMOVcc instructions are supported Operation IF condition TRUE ST 0 ST i FI FPU Flags Affected C1 Set to 0 if stack underflow occurred C0 C2 C3 Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Regis...

Page 1413: ...on Reference 4 111 FCMOVcc Floating point Conditional Move Continued Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1414: ...ssor marks the ST 0 register as empty and increments the stack pointer TOP by 1 The FCOM instructions perform the same operation as the FUCOM instructions The only difference is how they handle QNaN operands The FCOM instructions raise an invalid arithmetic operand exception IA when either or both of the operands is a NaN value or is in an unsupported format The FUCOM instructions perform the same...

Page 1415: ...ed otherwise cleared to 0 C0 C2 C3 See table on previous page Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault...

Page 1416: ...nabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES ...

Page 1417: ... generate an invalid arithmetic operand exception for QNaNs If invalid operation exception is unmasked the status flags are not set if the invalid arithmetic operand exception is generated The FCOMIP and FUCOMIP instructions also pop the register stack following the comparison operation To pop the register stack the processor marks the ST 0 register as empty and increments the stack pointer TOP by...

Page 1418: ... ZF PF CF 111 FI FI FI IF instruction is FUCOMI or FUCOMIP THEN IF ST 0 or ST i QNaN but not SNaN or unsupported format THEN ZF PF CF 111 ELSE ST 0 or ST i is SNaN or unsupported format IA IF FPUControlWord IM 1 THEN ZF PF CF 111 FI FI FI IF instruction is FCOMIP or FUCOMIP THEN PopRegisterStack FI FPU Flags Affected C1 Set to 0 if stack underflow occurred otherwise cleared to 0 C0 C2 C3 Not affec...

Page 1419: ...ne or both operands are NaN values or have unsupported formats FUCOMI or FUCOMIP instruction One or both operands are SNaN values but not QNaNs or have undefined formats Detection of a QNaN value does not raise an invalid operand exception Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is se...

Page 1420: ...ource operand is outside the acceptable range the C2 flag in the FPU status word is set and the value in register ST 0 remains unchanged The instruction does not raise an exception when the source operand is out of range It is up to the program to check the C2 flag for out of range conditions Source values outside the range 263 to 263 can be reduced to the range of the instruction by subtracting a...

Page 1421: ...d Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Floating point Exceptions IS Stack underflow occurred IA Source operand is an SNaN value or unsupported format D Result is a denormal value U Result is too small for destination format P Value cannot be represented exactly in destination format Protected M...

Page 1422: ...lags Affected The C1 flag is set to 0 otherwise cleared to 0 The C0 C2 and C3 flags are undefined Floating point Exceptions None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exce...

Page 1423: ...the floating point divide instructions always results in the register stack being popped In some assemblers the mnemonic for this instruction is FDIV rather than FDIVP The FIDIV instructions convert an integer source operand to extended real format before performing the division When the source operand is an integer 0 it is treated as a 0 If an unmasked divide by zero exception Z is generated no r...

Page 1424: ...struction is FIDIV THEN DEST DEST ConvertExtendedReal SRC ELSE source operand is real number DEST DEST SRC FI FI IF instruction FDIVP THEN PopRegisterStack FI FPU Flags Affected C1 Set to 0 if stack underflow occurred Indicates rounding direction if the inexact result exception P is generated 0 not roundup 1 roundup C0 C2 C3 Undefined DEST F 0 0 F NaN 0 0 0 0 NaN F F 0 0 F NaN I F 0 0 F NaN SRC 0 ...

Page 1425: ...ation format Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory referen...

Page 1426: ... marks the ST 0 register as empty and increments the stack pointer TOP by 1 The no operand version of the floating point divide instructions always results in the register stack being popped In some assemblers the mnemonic for this instruction is FDIVR rather than FDIVRP The FIDIVR instructions convert an integer source operand to extended real format before performing the division If an unmasked ...

Page 1427: ...Operation IF DEST 0 THEN Z ELSE IF instruction is FIDIVR THEN DEST ConvertExtendedReal SRC DEST ELSE source operand is real number DEST SRC DEST FI FI IF instruction FDIVRP THEN PopRegisterStack FI FPU Flags Affected C1 Set to 0 if stack underflow occurred Indicates rounding direction if the inexact result exception P is generated 0 not roundup 1 roundup C0 C2 C3 Undefined DEST F 0 0 F NaN NaN SRC...

Page 1428: ...estination format Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory re...

Page 1429: ...ted Operation TAG i 11B FPU Flags Affected C0 C1 C2 C3 undefined Floating point Exceptions None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set Opcode Instruction Description ...

Page 1430: ...0 The FICOMP instructions pop the register stack following the comparison To pop the register stack the processor marks the ST 0 register empty and increments the stack pointer TOP by 1 Operation CASE relation of operands OF ST 0 SRC C3 C2 C0 000 ST 0 SRC C3 C2 C0 001 ST 0 SRC C3 C2 C0 100 Unordered C3 C2 C0 111 ESAC IF instruction FICOMP THEN PopRegisterStack FI FPU Flags Affected C1 Set to 0 if ...

Page 1431: ...ess is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions...

Page 1432: ...Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Floating point Exceptions IS Stack overflow occurred Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register c...

Page 1433: ...nd effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory re...

Page 1434: ... marked empty Operation IF TOP 7 THEN TOP 0 ELSE TOP TOP 1 FI FPU Flags Affected The C1 flag is set to 0 otherwise generates an IS fault The C0 C2 and C3 flags are undefined Floating point Exceptions IS Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM...

Page 1435: ...ting point exceptions before performing the initialization the FNINIT instruction does not Operation FPUControlWord 037FH FPUStatusWord 0 FPUTagWord FFFFH FPUDataPointer 0 FPUInstructionPointer 0 FPULastInstructionOpcode 0 FPU Flags Affected C0 C1 C2 C3 cleared to 0 Floating point Exceptions None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR ...

Page 1436: ...l number Imeans integer indicates floating point invalid operation IA exception 0 or 1 depending on the rounding mode If the source value is a non integral value it is rounded to an integer value according to the rounding mode specified by the RC field of the FPU control word If the value being stored is too large for the destination format is an is a NaN or is in an unsupported format and if the ...

Page 1437: ...ey Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Floating point Exceptions IS Stack underflow occurred IA Source operand is too large for the destination format Source operand is a NaN value or unsupported format P Value cannot be represented exactly in destination format Protected Mode Exceptions GP 0 If the destination is located in a no...

Page 1438: ...erand effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory...

Page 1439: ...em Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Floating point E...

Page 1440: ...s AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand effective...

Page 1441: ...ANT FPU Flags Affected C1 Set to 1 if stack overflow occurred otherwise cleared to 0 C0 C2 C3 Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Floating point Exceptions IS Stack overflow occurred Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Opcode Instruction Descri...

Page 1442: ...FLDLN2 FLDZ Load Constant Continued Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set Intel Architecture Compatibility Information When the RC field is set to round to nearest the FPU produces the same constants that is produced by the Intel 8087 and Intel287 math coprocessors ...

Page 1443: ...ight unmask a pending exception in the FPU status word That exception is then generated upon execution of the next waiting floating point instruction Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consum...

Page 1444: ...erand effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory...

Page 1445: ...ception will be generated upon execution of the next floating point instruction except for the no wait floating point instructions To avoid generating exceptions when loading a new environment clear all the exception flags in the FPU status word that is being loaded Operation FPUControlWord SRC FPUControlWord FPUStatusWord SRC FPUStatusWord FPUTagWord SRC FPUTagWord FPUDataPointer SRC FPUDataPoint...

Page 1446: ... checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside...

Page 1447: ...the floating point multiply instructions always results in the register stack being popped In some assemblers the mnemonic for this instruction is FMUL rather than FMULP The FIMUL instructions convert an integer source operand to extended real format before performing the multiplication The sign of the result is always the exclusive OR of the source signs even if one or more of the values being mu...

Page 1448: ...occurred Indicates rounding direction if the inexact result exception P fault is generated 0 not roundup 1 roundup C0 C2 C3 Undefined Floating point Exceptions IS Stack underflow occurred IA Operand is an SNaN value or unsupported format One operand is 0 and the other is D Source operand is a denormal value U Result is too small for destination format O Result is too large for destination format P...

Page 1449: ... memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES F...

Page 1450: ...P register FPU Flags Affected C0 C1 C2 C3 undefined Floating point Exceptions None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set Opcode Instruction Description D9 D0 FNOP No...

Page 1451: ...restriction on the range of source operands that FPATAN can accept Operation ST 1 arctan ST 1 ST 0 PopRegisterStack FPU Flags Affected C1 Set to 0 if stack underflow occurred Indicates rounding direction if the inexact result exception P is generated 0 not roundup 1 roundup C0 C2 C3 Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl ...

Page 1452: ...U Result is too small for destination format P Value cannot be represented exactly in destination format Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set Intel Architecture Compatibility Information The source operands for this instruction are restricted for the 80287 math coprocessor t...

Page 1453: ...ollowing table shows the results obtained when computing the remainder of various classes of numbers assuming that underflow does not occur Notes Fmeans finite real number indicates floating point invalid arithmetic operand IA exception indicates floating point zero divide Z exception When the result is 0 its sign is the same as that of the dividend When the modulus is the result is equal to the v...

Page 1454: ...h in between the instructions in the loop An important use of the FPREM instruction is to reduce the arguments of periodic functions When reduction is complete the instruction stores the three least significant bits of the quotient in the C3 C1 and C0 flags of the FPU status word This information is important in argument reduction for the tangent function using a modulus of 4 because it locates th...

Page 1455: ...urred IA Source operand is an SNaN value modulus is 0 dividend is or unsupported format D Source operand is a denormal value U Result is too small for destination format Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1456: ...ntrol has no effect The following table shows the results obtained when computing the remainder of various classes of numbers assuming that underflow does not occur Notes Fmeans finite real number indicates floating point invalid arithmetic operand IA exception indicates floating point zero divide Z exception When the result is 0 its sign is the same as that of the dividend When the modulus is the...

Page 1457: ...An important use of the FPREM1 instruction is to reduce the arguments of periodic functions When reduction is complete the instruction stores the three least significant bits of the quotient in the C3 C1 and C0 flags of the FPU status word This information is important in argument reduction for the tangent function using a modulus of 4 because it locates the original angle in the correct one of ei...

Page 1458: ...ed IA Source operand is an SNaN value modulus divisor is 0 dividend is or unsupported format D Source operand is a denormal value U Result is too small for destination format Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1459: ...and is out of range It is up to the program to check the C2 flag for out of range conditions Source values outside the range 263 to 263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2 or by using the FPREM instruction with a divisor of 2 The value 1 0 is pushed onto the register stack after the tangent has been computed to maintain compatibility w...

Page 1460: ... Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Floating point Exceptions IS Stack underflow occurred IA Source operand is an SNaN value or unsupported format D Source operand is a denormal value U Result is too small for destination format P Value cannot be represented exactly in destination f...

Page 1461: ...tack underflow occurred Indicates rounding direction if the inexact result exception P is generated 0 not roundup 1 roundup C0 C2 C3 Undefined Floating point Exceptions IS Stack underflow occurred IA Source operand is an SNaN value or unsupported format D Source operand is a denormal value P Source operand is not an integral value Additional Itanium System Environment Exceptions Itanium Reg Faults...

Page 1462: ...ruction should be executed in the same operating mode as the corresponding FSAVE FNSAVE instruction If one or more unmasked exception bits are set in the new FPU status word a floating point exception will be generated To avoid raising exceptions when loading a new operating environment clear all the exception flags in the FPU status word that is being loaded Operation FPUControlWord SRC FPUContro...

Page 1463: ...ontains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment ...

Page 1464: ...VE instruction in the instruction stream have been executed After the FPU state has been saved the FPU is reset to the same default values it is set to with the FINIT FNINIT instructions see FINIT FNINIT Initialize Floating point Unit on page 4 133 The FSAVE FNSAVE instructions are typically used when the operating system needs to perform a context switch an exception handler needs to use the FPU ...

Page 1465: ...mission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If destination is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outside t...

Page 1466: ...limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made Intel Architecture Compatibility Information For Intel math coprocessors and FPUs prior to the Pentium processor an FWAIT instruction should be executed before attempting to read from the memory image stored with a prior FSAVE FNSAVE instruction This ...

Page 1467: ...eal number Nmeans integer In most cases only the exponent is changed and the mantissa significand remains unchanged However when the value being scaled in ST 0 is a denormal value the mantissa is also changed and the result may turn out to be a normalized number Similarly if overflow or underflow results from a scale operation the resulting mantissa will differ from the source s mantissa The FSCAL...

Page 1468: ...e operand is an SNaN value or unsupported format D Source operand is a denormal value U Result is too small for destination format O Result is too large for destination format P Value cannot be represented exactly in destination format Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Protected Mode Excepti...

Page 1469: ...operand is outside the acceptable range the C2 flag in the FPU status word is set and the value in register ST 0 remains unchanged The instruction does not raise an exception when the source operand is out of range It is up to the program to check the C2 flag for out of range conditions Source values outside the range 263 to 263 can be reduced to the range of the instruction by subtracting an appr...

Page 1470: ...ndefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Floating point Exceptions IS Stack underflow occurred IA Source operand is an SNaN value or unsupported format D Source operand is a denormal value P Value cannot be represented exactly in destination format Protected Mode Exceptions NM EM or TS in ...

Page 1471: ...f the source operand is outside the acceptable range the C2 flag in the FPU status word is set and the value in register ST 0 remains unchanged The instruction does not raise an exception when the source operand is out of range It is up to the program to check the C2 flag for out of range conditions Source values outside the range 263 to 263 can be reduced to the range of the instruction by subtra...

Page 1472: ... Undefined Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Floating point Exceptions IS Stack underflow occurred IA Source operand is an SNaN value or unsupported format D Source operand is a denormal value U Result is too small for destination format P Value cannot be represented exactly in destination f...

Page 1473: ...IA exception Operation ST 0 SquareRoot ST 0 FPU Flags Affected C1 Set to 0 if stack underflow occurred Indicates rounding direction if inexact result exception P is generated 0 not roundup 1 roundup C0 C2 C3 Undefined Floating point Exceptions IS Stack underflow occurred IA Source operand is an SNaN value or unsupported format Source operand is a negative value except for 0 D Source operand is a d...

Page 1474: ...anium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1475: ...nding mode specified by the RC field of the FPU control word and the exponent is converted to the width and bias of the destination format If the value being stored is too large for the destination format a numeric overflow exception O is generated and if the exception is unmasked no value is stored in the destination operand If the value being stored is a denormal value the denormal exception D i...

Page 1476: ...ult Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS regi...

Page 1477: ... GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1478: ...n Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a m...

Page 1479: ...y operand effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned me...

Page 1480: ...nt instructions preceding the FSTENV FNSTENV instruction in the instruction stream have been executed These instructions are often used by exception handlers because they provide access to the FPU instruction and data pointers The environment is typically saved in the procedure stack Masking all exceptions after saving the environment prevents floating point exceptions from interrupting the except...

Page 1481: ...is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outsi...

Page 1482: ...r instructions The status stored in the AX register is thus guaranteed to be from the completion of the prior FPU instruction Operation DEST FPUStatusWord FPU Flags Affected The C0 C1 C2 and C3 are undefined Floating point Exceptions None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Itanium Mem FaultsVHPT Data Fault Nested TLB Fault...

Page 1483: ...ault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions GP 0 If a memory operand...

Page 1484: ...r this instruction is FSUB rather than FSUBP The FISUB instructions convert an integer source operand to extended real format before performing the subtraction The following table shows the results obtained when subtracting various classes of numbers from one another assuming that neither overflow nor underflow occurs Here the SRC value is subtracted from the DEST value DEST SRC result When the di...

Page 1485: ...tes rounding direction if the inexact result exception P fault is generated 0 not roundup 1 roundup C0 C2 C3 Undefined Floating point Exceptions IS Stack underflow occurred IA Operand is an SNaN value or unsupported format Operands are infinities of like sign D Source operand is a denormal value U Result is too small for destination format O Result is too large for destination format P Value canno...

Page 1486: ... memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES F...

Page 1487: ...op the register stack the processor marks the ST 0 register as empty and increments the stack pointer TOP by 1 The no operand version of the floating point reverse subtract instructions always results in the register stack being popped In some assemblers the mnemonic for this instruction is FSUBR rather than FSUBRP The FISUBR instructions convert an integer source operand to extended real format b...

Page 1488: ...ion is generated Notes Fmeans finite real number Imeans integer indicates floating point invalid arithmetic operand IA exception Operation IF instruction is FISUBR THEN DEST ConvertExtendedReal SRC DEST ELSE source operand is real number DEST SRC DEST FI IF instruction FSUBRP THEN PopRegisterStack FI FPU Flags Affected C1 Set to 0 if stack underflow occurred Indicates rounding direction if the ine...

Page 1489: ...ault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit NM EM or TS in CR0 is set PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an ...

Page 1490: ...eration CASE relation of operands OF Not comparable C3 C2 C0 111 ST 0 0 0 C3 C2 C0 000 ST 0 0 0 C3 C2 C0 001 ST 0 0 0 C3 C2 C0 100 ESAC FPU Flags Affected C1 Set to 0 if stack underflow occurred otherwise cleared to 0 C0 C2 C3 See above table Floating point Exceptions IS Stack underflow occurred IA One or both operands are NaN values or have unsupported formats D One or both operands are denormal ...

Page 1491: ...Volume 4 Base IA 32 Instruction Reference 4 189 FTST TEST Continued Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1492: ...operands is a NaN value of any kind or is in an unsupported format As with the FCOM instructions if the operation results in an invalid arithmetic operand exception being raised the condition code flags are set only if the exception is masked The FUCOMP instructions pop the register stack following the comparison operation and the FUCOMPP instructions pops the register stack twice following the co...

Page 1493: ...0 C2 C3 See table on previous page Floating point Exceptions IS Stack underflow occurred IA One or both operands are SNaN values or have unsupported formats Detection of a QNaN value in and of itself does not raise an invalid operand exception D One or both operands are denormal values Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Na...

Page 1494: ...4 192 Volume 4 Base IA 32 Instruction Reference FWAIT Wait See entry for WAIT ...

Page 1495: ... class of value or number in ST 0 OF Unsupported C3 C2 C0 000 NaN C3 C2 C0 001 Normal C3 C2 C0 010 Infinity C3 C2 C0 011 Zero C3 C2 C0 100 Empty C3 C2 C0 101 Denormal C3 C2 C0 110 ESAC FPU Flags Affected C1 Sign of value in ST 0 C0 C2 C3 See table above Floating point Exceptions None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT ...

Page 1496: ...Base IA 32 Instruction Reference FXAM Examine Continued Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1497: ...s the square root of the third register from the top of the register stack FXCH ST 3 FSQRT FXCH ST 3 Operation IF number of operands is 1 THEN temp ST 0 ST 0 SRC SRC temp ELSE temp ST 0 ST 0 ST 1 ST 1 temp FI FPU Flags Affected C1 Set to 0 if stack underflow occurred otherwise cleared to 0 C0 C2 C3 Undefined Floating point Exceptions IS Stack underflow occurred Additional Itanium System Environmen...

Page 1498: ...4 196 Volume 4 Base IA 32 Instruction Reference FXCH Exchange Register Contents Continued Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1499: ...ling operations The FXTRACT instruction is also useful for converting numbers in extended real format to decimal representations e g for printing or displaying If the floating point zero divide exception Z is masked and the source operand is zero an exponent value of is stored in register ST 1 and 0 with the sign of the source operand is stored in register ST 0 Operation TEMP Significand ST 0 ST 0...

Page 1500: ...truction Reference FXTRACT Extract Exponent and Significand Continued Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode Exceptions NM EM or TS in CR0 is set Virtual 8086 Mode Exceptions NM EM or TS in CR0 is set ...

Page 1501: ...s masked and register ST 0 contains 0 the instruction returns with a sign that is the opposite of the sign of the source operand in register ST 1 The FYL2X instruction is designed with a built in multiplication to optimize the calculation of logarithms with an arbitrary positive base b logbx log2b 1 log2x Operation ST 1 ST 1 log2ST 0 PopRegisterStack FPU Flags Affected C1 Set to 0 if stack underfl...

Page 1502: ...rand is an SNaN or unsupported format Source operand in register ST 0 is a negative finite value not 0 Z Source operand in register ST 0 is 0 D Source operand is a denormal value U Result is too small for destination format O Result is too large for destination format P Value cannot be represented exactly in destination format Protected Mode Exceptions NM EM or TS in CR0 is set Real Address Mode E...

Page 1503: ...ST 0 that are close to 0 When the epsilon value is small more significant digits can be retained by using the FYL2XP1 instruction than by using 1 as an argument to the FYL2X instruction The 1 expression is commonly found in compound interest and annuity calculations The result can be simply converted into a value in another logarithm base by including a scale factor in the ST 1 source operand The ...

Page 1504: ...Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Floating point Exceptions IS Stack underflow occurred IA Either operand is an SNaN value or unsupported format D Source operand is a denormal value U Result is too small for destination format O Result is too large for destination format P Value cannot be represented exactly in destination format Protected Mode Exceptions NM...

Page 1505: ...rivileged instruction When the processor is running in protected or virtual 8086 mode the privilege level of a program or procedure must to 0 to execute the HLT instruction Operation IF Itanium System Environment THEN IA 32_Intercept INST HALT Enter Halt state Flags Affected None Additional Itanium System Environment Exceptions IA 32_Intercept Mandatory Instruction Intercept Protected Mode Excepti...

Page 1506: ...F OpernadSize 8 word byte operation THEN temp AX SRC signed division IF temp 7FH OR temp 80H if a positive result is greater than 7FH or a negative result is less than 80H THEN DE divide error ELSE AL temp AH AX SignedModulus SRC FI ELSE IF OpernadSize 16 doubleword word operation THEN Opcode Instruction Description F6 7 IDIV r m8 Signed divide AX where AH must contain sign extension of AL by r m ...

Page 1507: ...ons Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions DE If the source operand divisor is 0 The signed result...

Page 1508: ...limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions DE If the source operand divisor is 0 The signed result quotient is too large for the destination GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault...

Page 1509: ...cond source operand an immediate value The product is then stored in the destination operand a general purpose register When an immediate value is used as an operand it is sign extended to the length of the destination operand format The CF and OF flags are set when significant bits are carried into the upper half of the result The CF and OF flags are cleared when the result fits exactly in the lo...

Page 1510: ... the operands are signed or unsigned The CF and OF flags however cannot be used to determine if the upper half of the result is non zero Operation IF NumberOfOperands 1 THEN IF OperandSize 8 THEN AX AL SRC signed multiplication IF AH 00H OR AH FFH THEN CF 0 OF 0 ELSE CF 1 OF 1 FI ELSE IF OperandSize 16 THEN DX AX AX SRC signed multiplication IF DX 0000H OR DX FFFFH THEN CF 0 OF 0 ELSE CF 1 OF 1 FI...

Page 1511: ...ault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code ...

Page 1512: ...onment I O port references are mapped into the 64 bit virtual address pointed to by the IOBase register with four ports per 4K byte virtual page Operating systems can utilize the TLB in the Itanium architecture to grant or deny permission to any four I O ports The I O port space can be mapped into any arbitrary 64 bit physical memory location by operating system code If CFLG io is 1 and CPL IOPL t...

Page 1513: ...LB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault IA_32_Exception Debug traps for data breakpoints and single step IA_32_Exception Alignment faults GP 0 Referenced Port is to an unimplemented virtual address or PSR dt is zero Protected Mode E...

Page 1514: ...mission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the operand is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outside t...

Page 1515: ...e Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1516: ... The EDI register is incremented or decremented by 1 for byte operations by 2 for word operations or by 4 for doubleword operations The INS INSB INSW and INSD instructions can be preceded by the REP prefix for block input of ECX bytes words or doublewords This instruction is only useful for accessing I O ports located in the processor s I O address space I O transactions are performed after all pr...

Page 1517: ...PL THEN Protected mode or virtual 8086 mode with CPL IOPL IF CFLG io AND Any I O Permission Bit for I O port being accessed 1 THEN GP 0 FI ELSE I O operation is allowed FI IF Itanium_System_Environment THEN SRC_VA IOBase Port 15 2 12 Port 11 0 SRC_PA translate SRC_VA DEST SRC_PA Reads from I O port FI memory_fence DEST SRC memory_fence IF byte transfer THEN IF DF 0 THEN E DI 1 ELSE E DI 1 FI ELSE ...

Page 1518: ...is greater than has less privilege the I O privilege level IOPL and any of the corresponding I O permission bits in TSS for the I O port being accessed is 1 and when CFLG io is 1 If the destination is located in a nonwritable segment If an illegal memory operand effective address in the ES segments is given PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned...

Page 1519: ...ing the INTO and INT3 instructions is similar to that of a far call made with the CALL instruction The primary difference is that with the INTn instruction the EFLAGS register is pushed onto the stack before the return address The return address is a far address consisting of the current values of the CS and EIP registers Returns from interrupt procedures are handled with the IRET instruction whic...

Page 1520: ...ase linear address and limit of the IDT The initial base address value of the IDTR after the processor is powered up or reset is 0 Operation The following operational description applies not only to the INTn and INTO instructions but also to external interrupts and exceptions IF Itanium System EnvironmentTHEN IF INT3 Form THEN IA_32_Exception 3 IF INTO Form THEN IA_32_Exception 4 IF INT Form THEN ...

Page 1521: ...ECTED MODE IF DEST 8 7 is not within IDT limits OR selected IDT descriptor is not an interrupt trap or task gate type THEN GP DEST 8 2 EXT EXT is bit 0 in error code FI IF software interrupt generated by INTn INT3 or INTO THEN IF gate descriptor DPL CPL THEN GP vector number 8 2 PE 1 DPL CPL software interrupt FI FI IF gate not present THEN NP vector number 8 2 EXT FI IF task gate specified in the...

Page 1522: ...T FI Read trap or interrupt handler descriptor IF descriptor does not indicate a code segment OR code segment descriptor DPL CPL THEN GP selector EXT FI IF trap or interrupt gate segment is not present THEN NP selector EXT FI IF code segment is non conforming AND DPL CPL THEN IF VM 0 THEN GOTO INTER PRIVILEGE LEVEL INTERRUPT PE 1 interrupt or trap gate nonconforming code segment DPL CPL VM 0 ELSE ...

Page 1523: ...elector s RPL DPL of code segment THEN TS SS selector EXT FI Read segment descriptor for stack segment in GDT or LDT IF stack segment DPL DPL of code segment OR stack segment does not indicate writable data segment THEN TS SS selector EXT FI IF stack segment not present THEN SS SS selector EXT FI IF 32 bit gate THEN IF new stack does not have room for 24 bytes error code pushed OR 20 bytes no erro...

Page 1524: ...ackAddress 7 TSS limit THEN TS current TSS selector FI NewSS TSSstackAddress 4 NewESP stack address ELSE TSS is 16 bit TSSstackAddress new code segment DPL 4 2 IF TSSstackAddress 4 TSS limit THEN TS current TSS selector FI NewESP TSSstackAddress NewSS TSSstackAddress 2 FI IF segment selector is null THEN TS EXT FI IF segment selector index is not within its descriptor table limits OR segment selec...

Page 1525: ...ERURPT_TO_PRIV0 FI ELSE Gate DPL 3 GP 0 FI ELSE IOPL 3 GP 0 FI ELSE VME 1 Check whether interrupt is directed for INT n instruction only executes virtual 8086 interupt protected mode interrupt or faults Ptr TSS 66 Fetch IO permission bitmpa pointer IF BIT Ptr 32 N 0 software redirection bitmap is 32 bytes below IO Permission THEN Interrupt redirected Goto VM86_INTERRUPT_TO_VM86 ELSE IF IOPL 3 THEN...

Page 1526: ...d mode FS 0 DS 0 ES 0 CS Gate CS IF OperandSize 32 THEN EIP Gate instruction pointer ELSE OperandSize is 16 EIP Gate instruction pointer AND 0000FFFFH FI Starts execution of new routine in Protected Mode END VM86_INTERRUPT_TO_VM86 IF IOPL 3 THEN push FLAGS OR 3000H Push FLAGS w IOPL bits as 11B or IOPL 3 push CS push IP CS N 4 2 N is vector num read from interrupt table IP N 4 FLAGS FLAGS AND 7CD5...

Page 1527: ...truction 3 words padded to 4 CS EIP Gate CS EIP segment descriptor information also loaded Push ErrorCode if any ELSE 16 bit gate Push FLAGS Push far pointer to return location 2 words CS IP Gate CS IP segment descriptor information also loaded Push ErrorCode if any FI CS RPL CPL IF interrupt gate THEN IF 0 FI TF 0 NT 0 VM 0 RF 0 FI END Flags Affected The EFLAGS register is pushed onto stack The I...

Page 1528: ...of the stack segment and no stack switch occurs SS selector If the SS register is being loaded and the segment pointed to is marked not present If pushing the return address flags error code or stack segment pointer exceeds the bounds of the stack segment NP selector If code segment interrupt trap or task gate or TSS is not present TS selector If the RPL of the stack segment selector in the TSS is...

Page 1529: ... not point to a segment descriptor for a code segment If the segment selector for a TSS has its local global bit set for local SS selector If the SS register is being loaded and the segment pointed to is marked not present If pushing the return address flags error code stack segment pointer or data segments exceeds the bounds of the stack segment NP selector If code segment interrupt trap or task ...

Page 1530: ...implemented differently on future Intel architecture processors Use this instruction with care Data cached internally and not written back to main memory will be lost Unless there is a specific requirement or benefit to flushing caches without writing back modified cache lines for example testing or fault recovery where cache coherency with main memory is not a concern software should use the WBIN...

Page 1531: ...e IA 32 Instruction Reference 4 229 INVD Invalidate Internal Caches Continued Intel Architecture Compatibility This instruction is not supported on Intel architecture processors earlier than the Intel486 processor ...

Page 1532: ...he INVLPG instruction normally flushes the TLB entry only for the specified page however in some cases it flushes the entire TLB Operation IF Itanium System Environment THEN IA 32_Intercept INST INVLPG Flush RelevantTLBEntries Continue Continue execution Flags Affected None Additional Itanium System Environment Exceptions IA 32_Intercept Mandatory Instruction Intercept Protected Mode Exceptions GP...

Page 1533: ...ng on the setting of these flags the processor performs the following types of interrupt returns Real Mode Return from virtual 8086 mode Return to virtual 8086 mode Intra privilege level return Inter privilege level return Return from nested task task switch All forms of IRET result in an IA 32_Intercept Inst IRET in the Itanium System Environment If the NT flag EFLAGS register is cleared the IRET...

Page 1534: ...st IRET IF PE 0 THEN GOTO REAL ADDRESS MODE ELSE GOTO PROTECTED MODE FI REAL ADDRESS MODE IF OperandSize 32 THEN IF top 12 bytes of stack not within stack limits THEN SS FI IF instruction pointer not within code segment limits THEN GP 0 FI EIP Pop CS Pop 32 bit pop high order 16 bits discarded tempEFLAGS Pop EFLAGS tempEFLAGS AND 257FD5H OR EFLAGS AND 1A0000H ELSE OperandSize 16 IF top 6 bytes of ...

Page 1535: ...RET is executed and stays in virtual 8086 mode IF CR4 VME 0 THEN IF IOPL 3 Virtual mode PE 1 VM 1 IOPL 3 THEN IF OperandSize 32 THEN IF top 12 bytes of stack not within stack limits THEN SS 0 FI IF instruction pointer not within code segment limits THEN GP 0 FI EIP Pop CS Pop 32 bit pop high order 16 bits discarded EFLAGS Pop VM IOPL VIP and VIF EFLAGS bits are not modified by pop ELSE OperandSize...

Page 1536: ...GP 0 ELSE IP Pop Word Pops CS Pop 0 TempFlags Pop FLAGS IOPL IF and TF are not modified FLAGS FLAGS AND 3302H OR TempFlags AND 4CD5H EFLAGS VIF TempFlags IF FI ELSE OperandSize 32 GP 0 FI FI END RETURN TO VIRTUAL 8086 MODE Interrupted procedure was in virtual 8086 mode PE 1 VM 1 in flags image IF top 24 bytes of stack are not within stack segment limits THEN SS 0 FI IF instruction pointer not with...

Page 1537: ...0 FI IF return code segment selector addrsses descriptor beyond descriptor table limit THEN GP selector FI Read segment descriptor pointed to by the return code segment selector IF return code segment descriptor is not a code segment THEN GP selector FI IF return code segment selector RPL CPL THEN GP selector FI IF return code segment descriptor is conforming AND return code segment DPL return cod...

Page 1538: ...F stack segment selector RPL RPL of the return code segment selector IF stack segment selector RPL RPL of the return code segment selector OR the stack segment descriptor does not indicate a a writable data segment OR stack segment DPL RPL of the return code segment selector THEN GP SS selector FI IF stack segment is not present THEN NP SS selector FI IF tempEIP is not within code segment limit TH...

Page 1539: ...n pointer is not within the return code segment limit GP selector If a segment selector index is outside its descriptor table limits If the return code segment selector RPL is greater than the CPL If the DPL of a conforming code segment is greater than the return code segment selector RPL If the DPL for a nonconforming code segment is not equal to the RPL of the code segment selector If the stack ...

Page 1540: ...e segment limit SS If the top bytes of stack are not within stack limits Virtual 8086 Mode Exceptions GP 0 If the return instruction pointer is not within the return code segment limit IF IOPL not equal to 3 PF fault code If a page fault occurs SS 0 If the top bytes of stack are not within stack limits AC 0 If an unaligned memory reference occurs and alignment checking is enabled ...

Page 1541: ...y CF 0 75 cb JNE rel8 Jump short if not equal ZF 0 7E cb JNG rel8 Jump short if not greater ZF 1 or SF OF 7C cb JNGE rel8 Jump short if not greater or equal SF OF 7D cb JNL rel8 Jump short if not less SF OF 7F cb JNLE rel8 Jump short if not less or equal ZF 0 and SF OF 71 cb JNO rel8 Jump short if not overflow OF 0 7B cb JNP rel8 Jump short if not parity PF 0 79 cb JNS rel8 Jump short if not sign ...

Page 1542: ...less and greater are used for comparisons of signed integers and the terms above and below are used for unsigned integers Opcode Instruction Description 0F 8D cw cd JGE rel16 32 Jump near if greater or equal SF OF 0F 8C cw cd JL rel16 32 Jump near if less SF OF 0F 8E cw cd JLE rel16 32 Jump near if less or equal ZF 1 or SF OF 0F 86 cw cd JNA rel16 32 Jump near if not above CF 1 or ZF 1 0F 82 cw cd...

Page 1543: ...FARLABEL BEYOND The JECXZ and JCXZ instructions differs from the other Jcc instructions because they do not check the status flags Instead they check the contents of the ECX and CX registers respectively for 0 These instructions are useful at the beginning of a conditional loop that terminates with a conditional loop instruction such as LOOPNE They prevent entering the loop when the ECX or CX regi...

Page 1544: ...the CS segment or is outside of the effective address space from 0 to FFFFH This condition can occur if 32 address size override prefix is used Virtual 8086 Mode Exceptions GP 0 If the offset being jumped to is beyond the limits of the CS segment or is outside of the effective address space from 0 to FFFFH This condition can occur if 32 address size override prefix is used ...

Page 1545: ...set from the base of the code segment or a relative offset a signed offset relative to the current value of the instruction pointer in the EIP register An absolute address is specified directly in a register or indirectly in a memory location r m16 or r m32 operand form A relative offset rel8 rel16 or rel32 is generally specified as a label in assembly code but at the machine code level it is enco...

Page 1546: ...he previous paragraph except that the processor checks the access rights of the code segment being jumped to An far jump through a call gate A task switch Results in an IA 32_Intercept Gate in Itanium System Environment The JMP instruction cannot be used to perform inter privilege level jumps When executing an far jump through a call gate the segment selector specified by the target operand identi...

Page 1547: ...32 or m16 32 IF OperandSize 32 THEN EIP tempEIP DEST is ptr16 32 or m16 32 ELSE OperandSize 16 EIP tempEIP AND 0000FFFFH clear upper 16 bits FI IF Itanium System Environment AND PSR tb THEN IA_32_Exception Debug FI IF far call AND PE 1 AND VM 0 Protected mode not virtual 8086 mode THEN IF effective address in the CS DS ES FS GS or SS segment is illegal OR segment selector in target operand null TH...

Page 1548: ...P tempEIP IF Itanium System Environment AND PSR tb THEN IA_32_Exception Debug END CALL GATE IF call gate DPL CPL OR call gate DPL call gate segment selector RPL THEN GP call gate selector FI IF call gate not present THEN NP call gate selector FI IF Itanium System Environment THEN IA 32_Intercept Gate JMP IF call gate code segment selector is null THEN GP 0 FI IF call gate code segment selector ind...

Page 1549: ...selector FI IF Itanium System Environment THENIA 32_Intercept Gate JMP SWITCH TASKS to TSS IF EIP not within code segment limit THEN GP 0 FI END Flags Affected All flags are affected if a task switch occurs no flags are affected if a task switch does not occur Additional Itanium System Environment Exceptions Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault...

Page 1550: ...he segment selector for a TSS has its local global bit set for local If a TSS segment descriptor specifies that the TSS is busy or not available SS 0 If a memory operand effective address is outside the SS segment limit NP selector If the code segment being accessed is not present If call gate task gate or TSS not present PF fault code If a page fault occurs AC 0 If alignment checking is enabled a...

Page 1551: ...4 bit virtual address space within virtual region 0 GR 1 is loaded with the next sequential instruction address following JMPE If PSR di is 1 the instruction is nullified and a Disabled Instruction Set Transition fault is generated If Itanium branch debugging is enabled an IA_32_Exception Debug trap is taken after JMPE completes execution JMPE can be performed at any privilege level and does not c...

Page 1552: ...3 32 0 zero extend from 32 bits to 64 bits GR 1 31 0 EIP AR CSD base next sequential instruction address GR 1 63 32 0 PSR id EFLAG rf 0 IF PSR tb taken branch trap IA_32_Exception Debug Flags Affected None other than EFLAG rf Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Fault Disabled ISA Disabled Instruction Set Transition Fault if PSR di is 1 IA_32...

Page 1553: ...nd 5 of the EFLAGS register are set in the AH register as shown in the Operation below Operation AH EFLAGS SF ZF 0 AF 0 PF 1 CF Flags Affected None that is the state of the flags in the EFLAGS register are not affected Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Exceptions All Operating Modes None Opcode Instruction Description 9F LAHF Load AH...

Page 1554: ...pe and DPL fields Here the two lower order bytes of the doubleword are masked by FF00H before being loaded into the destination operand This instruction performs the following checks before it loads the access rights in the destination register Checks that the segment selector is not null Checks that the segment selector points to a descriptor that is within the limits of the GDT or LDT being acce...

Page 1555: ...tem Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Table 2 15 LAR Descriptor Validity Type Name Valid 0 Reser...

Page 1556: ...ull segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Only occurs when fetching target from memory Real Address Mode Exceptions UD The LAR instruction is not recognized in real address mode Virtual 8086...

Page 1557: ... ES FS or GS registers without causing a protection exception Any subsequent reference to a segment whose corresponding segment register is loaded with a null selector causes a general protection exception GP and no memory reference to the segment occurs Operation IF ProtectedMode THEN IF SS is loaded THEN IF SegementSelector null THEN GP 0 FI ELSE IF Segment selector index is not within descripto...

Page 1558: ...tor SRC FI DEST Offset SRC Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty...

Page 1559: ...being loaded with a non null segment selector and the segment is marked not present PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is...

Page 1560: ...RC 16 bit address ELSE IF OperandSize 16 AND AddressSize 32 THEN temp EffectiveAddress SRC 32 bit address DEST temp 0 15 16 bit address ELSE IF OperandSize 32 AND AddressSize 16 THEN temp EffectiveAddress SRC 16 bit address DEST ZeroExtend temp 32 bit address ELSE IF OperandSize 32 AND AddressSize 32 THEN DEST EffectiveAddress SRC 32 bit address FI FI Opcode Instruction Description 8D r LEA r16 m ...

Page 1561: ...nal Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Protected Mode Exceptions UD If source operand is not a memory location Real Address Mode Exceptions UD If source operand is not a memory location Virtual 8086 Mode Exceptions UD If source operand is not a memory location ...

Page 1562: ...e calling procedure and remove any arguments pushed onto the stack by the procedure being returned from Operation IF StackAddressSize 32 THEN ESP EBP ELSE StackAddressSize 16 SP BP FI IF OperandSize 32 THEN EBP Pop ELSE OperandSize 16 BP Pop FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Ne...

Page 1563: ... Exit Continued Real Address Mode Exceptions GP If the EBP register points to a location outside of the effective address space from 0 to 0FFFFH Virtual 8086 Mode Exceptions GP 0 If the EBP register points to a location outside of the effective address space from 0 to 0FFFFH ...

Page 1564: ...4 262 Volume 4 Base IA 32 Instruction Reference LES Load Full Pointer See entry for LDS LES LFS LGS LSS ...

Page 1565: ...Volume 4 Base IA 32 Instruction Reference 4 263 LFS Load Full Pointer See entry for LDS LES LFS LGS LSS ...

Page 1566: ...operand is not used and the high order byte of the base address in the GDTR or IDTR is filled with zeros The LGDT and LIDT instructions are used only in operating system software they are not used in application programs They are the only instructions that directly load a linear address that is not a segment relative address and a limit in protected mode They are commonly executed in real address ...

Page 1567: ...ccess memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs Real Address Mode Exceptions UD If source operand is not a memory location GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit...

Page 1568: ...4 266 Volume 4 Base IA 32 Instruction Reference LGS Load Full Pointer See entry for LDS LES LFS LGS LSS ...

Page 1569: ...he LAR VERR VERW or LSL instructions cause a general protection exception GP The operand size attribute has no effect on this instruction The LLDT instruction is provided for use in operating system software it should not be used in application programs Also this instruction can only be executed in protected mode Operation IF Itanium System Environment THEN IA 32_Intercept INST LLDT IF SRC Offset ...

Page 1570: ...T is not a Local Descriptor Table Segment selector is beyond GDT limit SS 0 If a memory operand effective address is outside the SS segment limit NP selector If the LDT descriptor is not present PF fault code If a page fault occurs Real Address Mode Exceptions UD The LLDT instruction is not recognized in real address mode Virtual 8086 Mode Exceptions UD The LLDT instruction is recognized in virtua...

Page 1571: ...Volume 4 Base IA 32 Instruction Reference 4 269 LIDT Load Interrupt Descriptor Table Register See entry for LGDT LIDT Load Global Descriptor Table Register Load Interrupt Descriptor Table Register ...

Page 1572: ...ot be used in application programs In protected or virtual 8086 mode it can only be executed at CPL 0 This instruction is provided for compatibility with the Intel 286 processor programs and procedures intended to run on processors more recent than the Intel 286 should use the MOV control registers instruction to load the machine status word This instruction is a serializing instruction Operation ...

Page 1573: ...nd effective address is outside the CS DS ES FS or GS segment limit Virtual 8086 Mode Exceptions GP 0 If the current privilege level is not 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs ...

Page 1574: ...d by the alignment of the memory field Memory locking is observed for arbitrarily misaligned fields Operation IF Itanium System Environment AND External_Bus_Lock_Required AND DCR lc THEN IA 32_Intercept LOCK AssertLOCK DurationOfAccompaningInstruction Flags Affected None Additional Itanium System Environment Exceptions Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Dat...

Page 1575: ...ruction not listed in the Description section above Other exceptions can be generated by the instruction that the LOCK prefix is being applied to Virtual 8086 Mode Exceptions UD If the LOCK prefix is used with an instruction not listed in the Description section above Other exceptions can be generated by the instruction that the LOCK prefix is being applied to ...

Page 1576: ...ister is decremented The ESI register is incremented or decremented by 1 for byte operations by 2 for word operations or by 4 for doubleword operations The LODS LODSB LODSW and LODSD instructions can be preceded by the REP prefix for block loads of ECX bytes words or doublewords More often however these instructions are used within a LOOP construct because further processing of the data moved into...

Page 1577: ...ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outsid...

Page 1578: ...the instruction pointer Offsets of 128 to 127 are allowed with this instruction Some forms of the loop instruction LOOPcc also accept the ZF flag as a condition for terminating the loop before the count reaches zero With these forms of the instruction a condition code cc is associated with each instruction to indicate the condition being tested for Here the LOOPcc instruction itself does not affec...

Page 1579: ...d DEST IF OperandSize 16 THEN EIP EIP AND 0000FFFFH FI IF Itanium System Environment AND PSR tb THEN IA_32_Exception Debug ELSE Terminate loop and continue program execution at EIP FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort IA_32_Exception Taken Branch Debug Exception if PSR tb is 1 Protected Mode Exceptions GP 0 If the...

Page 1580: ...t raw limit left 12 bits and filling the low order 12 bits with 1s When the operand size is 32 bits the 32 bit byte limit is stored in the destination operand When the operand size is 16 bits a valid 32 bit limit is computed however the upper 16 bits are truncated and only the low order 16 bits are loaded into the destination operand This instruction performs the following checks before it loads t...

Page 1581: ...OR 00000FFFH FI IF OperandSize 32 THEN DEST temp ELSE OperandSize 16 DEST temp AND FFFFH FI FI Flags Affected The ZF flag is set to 1 if the segment limit is loaded successfully otherwise it is cleared to 0 Type Name Valid 0 Reserved No 1 Available 16 bit TSS Yes 2 LDT Yes 3 Busy 16 bit TSS Yes 4 16 bit call gate No 5 16 bit 32 bit task gate No 6 16 bit trap gate No 7 16 bit interrupt gate No 8 Re...

Page 1582: ... Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory ref...

Page 1583: ...Volume 4 Base IA 32 Instruction Reference 4 281 LSS Load Full Pointer See entry for LDS LES LFS LGS LSS ...

Page 1584: ...n initialization code to establish the first task to be executed The operand size attribute has no effect on this instruction Operation IF Itanium System Environment THEN IA 32_Intercept INST LTR IF SRC Offset descriptor table limit OR IF SRC type global THEN GP segment selector FI Reat segment descriptor IF segment descriptor is not for an available TSS THEN GP segment selector FI IF segment desc...

Page 1585: ... If the selector points to LDT or is beyond the GDT limit NP selector If the TSS is marked not present SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs Real Address Mode Exceptions UD The LTR instruction is not recognized in real address mode Virtual 8086 Mode Exceptions UD The LTR instruction is not recognized in virtual 8086 mode ...

Page 1586: ...The MOV instruction cannot be used to load the CS register Attempting to do so results in an invalid opcode exception UD To load the CS register use the RET instruction Opcode Instruction Description 88 r MOV r m8 r8 Move r8 to r m8 89 r MOV r m16 r16 Move r16 to r m16 89 r MOV r m32 r32 Move r32 to r m32 8A r MOV r8 r m8 Move r m8 to r8 8B r MOV r16 r m16 Move r m16 to r16 8B r MOV r32 r m32 Move...

Page 1587: ...System Environment MOV to SS results in a IA 32_Intercept SystemFlag trap after the instruction completes This operation allows a stack pointer to be loaded into the ESP register with the next instruction MOV ESP stack pointer value before an interrupt occurs The LSS instruction offers a more efficient method of loading the SS and ESP registers When moving data in 32 bit mode between a segment reg...

Page 1588: ...ister Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If attempt is made to load SS register with null segment selector If the destina...

Page 1589: ... alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 UD If attempt is made to load the CS register Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit UD If attempt is made to load the CS register Virtua...

Page 1590: ...in the control registers PE and PG in register CR0 and PGE PSE and PAE in register CR4 all TLB entries are flushed including global entries This operation is implementation specific for the Pentium Pro processor Software should not depend on this functionality in future Intel architecture processors If the PG flag is set to 1 and control register CR4 is written to set the PAE flag to 1 to enable t...

Page 1591: ... Protected Mode Exceptions GP 0 If the current privilege level is not 0 If an attempt is made to write a 1 to any reserved bit in CR4 If an attempt is made to write reserved bits in the page directory pointers table used in the extended physical addressing mode when the PAE flag in control register CR4 and the PG flag in control register CR0 are set to 1 Real Address Mode Exceptions GP If an attem...

Page 1592: ...in an undefined opcode UD exception At the opcode level the reg field within the ModR M byte specifies which of the debug registers is loaded or read The two bits in the mod field are always 11 The r m field specifies the general purpose register loaded or read Operation IF Itanium System Environment THEN IA 32_Intercept INST MOVDR IF DE 1 and SRC or DEST DR4 or DR5 THEN UD ELSE DEST SRC Flags Aff...

Page 1593: ...n debug register DR7 is set Real Address Mode Exceptions UD If the DE debug extensions bit of CR4 is set and a MOV instruction is executed involving DR4 or DR5 DB If any debug register is accessed while the GD flag in debug register DR7 is set Virtual 8086 Mode Exceptions GP 0 The debug registers cannot be loaded or read when in virtual 8086 mode ...

Page 1594: ...ters are incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register If the DF flag is 0 the ESI and EDI register are incremented if the DF flag is 1 the ESI and EDI registers are decremented The registers are incremented or decremented by 1 for byte operations by 2 for word operations or by 4 for doubleword operations The MOVS MOVSB MOVSW and MOVSD inst...

Page 1595: ...rand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If ...

Page 1596: ...address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operan...

Page 1597: ...ot Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the...

Page 1598: ...6 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1599: ...otherwise the flags are set Operation IF byte operation THEN AX AL SRC ELSE word or doubleword operation IF OperandSize 16 THEN DX AX AX SRC ELSE OperandSize 32 EDX EAX EAX SRC FI FI Flags Affected The OF and CF flags are cleared to 0 if the upper half of the result is 0 otherwise they are set to 1 The SF ZF AF and PF flags are undefined Additional Itanium System Environment Exceptions Itanium Reg...

Page 1600: ...nment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or G...

Page 1601: ...iss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS ...

Page 1602: ...86 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1603: ...yte instruction that takes up space in the instruction stream but does not affect the machine context except the EIP register The NOP instruction performs no operation no registers are accessed and no faults are generated Flags Affected None Exceptions All Operating Modes None Opcode Instruction Description 90 NOP No operation ...

Page 1604: ...ata Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination operand points to a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If...

Page 1605: ...86 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1606: ...Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Opcode Instruction Description 0C ib OR AL imm8 AL OR imm8 0D iw OR AX imm16 AX OR imm16 0D i...

Page 1607: ... occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside ...

Page 1608: ...mapped into the 64 bit virtual address pointed to by the IOBase register with four ports per 4K byte virtual page Operating systems can utilize TLBs in the Itanium architecture to grant or deny permission to any four I O ports The I O port space can be mapped into any arbitrary 64 bit physical memory location by operating system code If CFLG io is 1 and CPL IOPL the TSS is consulted for I O permis...

Page 1609: ...Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault IA_32_Exception Debug traps for data breakpoints and single step IA_32_Exception Alignment faults GP 0 Referenced Port is to an unimplemented virtual address or PSR dt is zero Protected Mode Exceptions...

Page 1610: ...r is incremented if the DF flag is 1 the EDI register is decremented The ESI register is incremented or decremented by 1 for byte operations by 2 for word operations or by 4 for doubleword operations The OUTS OUTSB OUTSW and OUTSD instructions can be preceded by the REP prefix for block input of ECX bytes words or doublewords See REP REPE REPZ REPNE REPNZ Repeat String Operation Prefix on page 4 3...

Page 1611: ...ermission bitmap the bitmap is not referenced If the referenced I O port is mapped to an unimplemented virtual address via the I O Base register or if data translations are disabled PSR dt is 0 a GPFault is generated on the referencing OUTS instruction Operation IF PE 1 AND VM 1 OR CPL IOPL THEN Protected mode or virtual 8086 mode with CPL IOPL IF CFLG io AND Any I O Permission Bit for I O port be...

Page 1612: ...vel IOPL and any of the corresponding I O permission bits in TSS for the I O port being accessed is 1 and when CFLG io is 1 If the destination is located in a nonwritable segment If a memory operand effective address is outside the limit of the ES segment If the ES register contains a null segment selector If an illegal memory operand effective address in the ES segments is given PF fault code If ...

Page 1613: ...S FS or GS register without causing a general protection fault However any subsequent attempt to reference a segment whose corresponding segment register is loaded with a null value causes a general protection exception GP In this situation no memory reference occurs and the saved value of the segment register is null The POP instruction cannot pop a value into the CS register To load the CS regis...

Page 1614: ...estination Operation IF StackAddrSize 32 THEN IF OperandSize 32 THEN DEST SS ESP copy a doubleword ESP ESP 4 ELSE OperandSize 16 DEST SS ESP copy a word ESP ESP 2 FI ELSE StackAddrSize 16 IF OperandSize 16 THEN DEST SS SP copy a word SP SP 2 ELSE OperandSize 32 DEST SS SP copy a doubleword SP SP 4 FI FI Loading a segment register while in protected mode results in special checks and actions as des...

Page 1615: ...S Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If attempt is made to load SS register with null seg...

Page 1616: ... SS selector If the SS register is being loaded and the segment pointed to is marked not present NP If the DS ES FS or GS register is being loaded and the segment pointed to is marked not present PF fault code If a page fault occurs AC 0 If an unaligned memory reference is made while the current privilege level is 3 and alignment checking is enabled Real Address Mode Exceptions GP If a memory oper...

Page 1617: ...e the same opcode The POPA instruction is intended for use when the operand size attribute is 16 and the POPAD instruction for when the operand size attribute is 32 Some assemblers may force the operand size to 16 when POPA is used and to 32 when POPAD is used Others may treat these mnemonics as synonyms POPA POPAD and use the current setting of the operand size attribute to determine the size of ...

Page 1618: ...ghts Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions SS 0 If the starting or ending stack address is not within the stack segment PF fault code If a page fault occurs Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mod...

Page 1619: ...or in real address mode which is equivalent to privilege level 0 all the non reserved flags in the EFLAGS register except the VIP and VIF flags can be modified The VIP and VIF flags are cleared When operating in protected mode but with a privilege level greater an 0 all the flags can be modified except the IOPL field and the VIP and VIF flags Here the IOPL flags are masked and the VIP and VIF flag...

Page 1620: ...d VIF can be modified IOPL is masked ELSE OperandSize 16 EFLAGS 15 0 Pop All non reserved bits except IOPL can be modified IOPL is masked FI FI ELSE In Virtual 8086 Mode IF IOPL 3 THEN IF OperandSize 32 THEN EFLAGS Pop All non reserved bits except VM RF IOPL VIP and VIF can be modified VM RF IOPL VIP and VIF are masked ELSE EFLAGS 15 0 Pop All non reserved bits except IOPL can be modified IOPL is ...

Page 1621: ...Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault IA 32_Intercept System Flag Intercept Trap if CFLG ii is 1 and the IF flag changes state or if the AC RF or TF changes state Protected Mode Exceptions SS 0 If the top of stack is not within the stack segment Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS ...

Page 1622: ...pushes the value of the ESP register as it existed before the instruction was executed Thus if a PUSH instruction uses a memory operand in which the ESP register is used as a base register for computing the operand address the effective address of the operand is computed before the ESP register is decremented In the real address mode if the ESP or SP register is 1 when the PUSH instruction is exec...

Page 1623: ...Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the c...

Page 1624: ...d and an unaligned memory reference is made Intel Architecture Compatibility For Intel architecture processors from the Intel 286 on the PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed This is also true in the real address and virtual 8086 modes For the Intel 8086 processor the PUSH SP instruction pushes the new value of the SP register t...

Page 1625: ... is 16 and the PUSHAD instruction for when the operand size attribute is 32 Some assemblers may force the operand size to 16 when PUSHA is used and to 32 when PUSHAD is used Others may treat these mnemonics as synonyms PUSHA PUSHAD and use the current setting of the operand size attribute to determine the size of values to be pushed from the stack regardless of the mnemonic used In the real addres...

Page 1626: ...ta Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions SS 0 If the starting or ending stack address is outside the stack segment limit PF fault code If a page fault occurs Real Address Mode Exceptions GP If the ESP or SP register contains 7 9 11 13 or 15 Vi...

Page 1627: ...e mnemonics as synonyms PUSHF PUSHFD and use the current setting of the operand size attribute to determine the size of values to be pushed from the stack regardless of the mnemonic used When the I O privilege level IOPL is less than 3 in virtual 8086 mode the PUSHF PUSHFD instructions causes a general protection exception GP The IOPL is altered only when executing at privilege level 0 The interru...

Page 1628: ...pFlags FI FI FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault ...

Page 1629: ...times D1 3 RCR r m32 1 Rotate 33 bits CF r m32 right once D3 3 RCR r m32 CL Rotate 33 bits CF r m32 right CL times C1 3 ib RCR r m32 imm8 Rotate 33 bits CF r m32 right imm8 times D0 0 ROL r m8 1 Rotate 8 bits r m8 left once D2 0 ROL r m8 CL Rotate 8 bits r m8 left CL times C0 0 ib ROL r m8 imm8 Rotate 8 bits r m8 left imm8 times D1 0 ROL r m16 1 Rotate 16 bits r m16 left once D3 0 ROL r m16 CL Rot...

Page 1630: ...The RCL and RCR instructions include the CF flag in the rotation The RCL instruction shifts the CF flag into the least significant bit and shifts the most significant bit into the CF flag The RCR instruction shifts the CF flag into the most significant bit and shifts the least significant bit into the CF flag For the ROL and ROR instructions the original value of the CF flag is not a part of the r...

Page 1631: ...pCOUNT 1 OD IF COUNT 1 IF COUNT 1 THEN OF MSB DEST XOR MSB 1 DEST ELSE OF is undefined FI Flags Affected The CF flag contains the value of the bit shifted into it The OF flag is affected only for single bit rotates see Description above it is undefined for multi bit rotates The SF ZF AF and PF flags are not affected Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register C...

Page 1632: ...e current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS...

Page 1633: ...eption The MSRs control functions for testability execution tracing performance monitoring and machine check errors The CPUID instruction should be used to determine whether MSRs are supported EDX 5 1 before using this instruction See model specific instructions for all the MSRs that can be written to with this instruction and their addresses Operation IF Itanium System Environment THEN IA 32_Inte...

Page 1634: ...on is not recognized in virtual 8086 mode Intel Architecture Compatibility The MSRs and the ability to read them with the RDMSR instruction were introduced into the Intel architecture with the Pentium processor Execution of this instruction by an Intel architecture processor earlier than the Pentium processor results in an invalid opcode exception UD ...

Page 1635: ...oded number of interrupts received or number of cache loads The RDPMC instruction does not serialize instruction execution That is it does not imply that all the events caused by the preceding instructions have been completed or that events caused by subsequent instructions have not begun If an exact event count is desired software must use a serializing instruction such as the CPUID instruction b...

Page 1636: ...rent privilege level is not 0 and the PCE flag in the CR4 register is clear In IA 32 System Environment If the value in the ECX register does not match an implemented performance counter Real Address Mode Exceptions GP If the PCE flag in the CR4 register is clear In the IA 32 System Environment If the value in the ECX register does not match an implemented performance counter Virtual 8086 Mode Exc...

Page 1637: ...he Itanium System Environment PSR si and CR4 TSD restricts the use of the RDTSC instruction When PSR si is clear and CR4 TSD is clear the RDTSC instruction can be executed at any privilege level when PSR si is set or CR4 TSD is set the instruction can only be executed at privilege level 0 The RDTSC instruction is not serializing instruction Thus it does not necessarily wait until all previous inst...

Page 1638: ... 1 or CR4 TSD is 1 and the CPL is greater than 0 Protected Mode Exceptions GP 0 If the TSD flag in register CR4 is set and the CPL is greater than 0 For the IA 32 System Environment only Real Address Mode Exceptions GP If the TSD flag in register CR4 is set For the IA 32 System Environment only Virtual 8086 Mode Exceptions GP 0 If the TSD flag in register CR4 is set For the IA 32 System Environmen...

Page 1639: ...rom DS ESI to ES EDI F3 A5 REP MOVS m16 m16 Move ECX words from DS ESI to ES EDI F3 A5 REP MOVS m32 m32 Move ECX doublewords from DS ESI to ES EDI F3 6E REP OUTS DX r m8 Output ECX bytes from DS ESI to port DX F3 6F REP OUTS DX r m16 Output ECX words from DS ESI to port DX F3 6F REP OUTS DX r m32 Output ECX doublewords from DS ESI to port DX F3 AC REP LODS AL Load ECX bytes from DS ESI to AL F3 AD...

Page 1640: ...rom the exception or interrupt handler The source and destination registers point to the next string elements to be operated on the EIP register points to the string instruction and the ECX register has the value it held following the last successful iteration of the instruction This mechanism allows long string operations to proceed without affecting the interrupt response time of the system When...

Page 1641: ...t the status flags in the EFLAGS register Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit ...

Page 1642: ...ting a near return the processor pops the return instruction pointer offset from the top of the procedure stack into the EIP register and begins program execution at the new instruction pointer The CS register is unchanged When executing a far return the processor pops the return instruction pointer from the top of the procedure stack into the EIP register then pops the segment selector from the t...

Page 1643: ...anium System Environment AND PSR tb THEN IA_32_Exception Debug FI Real address mode or virtual 8086 mode IF PE 0 OR PE 1 AND VM 1 AND instruction far return THEN IF OperandSize 32 THEN IF top 12 bytes of stack not within stack limits THEN SS 0 FI EIP Pop CS Pop 32 bit pop high order 16 bits discarded ELSE OperandSize 16 IF top 6 bytes of stack not within stack limits THEN SS 0 FI tempEIP Pop tempE...

Page 1644: ...N OUTER PRIVILEGE LEVEL ELSE GOTO RETURN TO SAME PRIVILEGE LEVEL FI END FI RETURN SAME PRIVILEGE LEVEL IF the return instruction pointer is not within ther return code segment limit THEN GP 0 FI IF OperandSize 32 THEN EIP Pop CS Pop 32 bit pop high order 16 bits discarded ESP ESP SRC ELSE OperandSize 16 EIP Pop EIP EIP AND 0000FFFFH CS Pop 16 bit pop ESP ESP SRC FI IF Itanium System Environment AN...

Page 1645: ... pop segment descriptor information also loaded CS RPL CPL ESP ESP SRC tempESP Pop tempSS Pop 16 bit pop segment descriptor information also loaded segment descriptor information also loaded ESP tempESP SS tempSS FI FOR each of segment register ES FS GS and DS DO IF segment register points to data or non conforming code segment AND CPL segment descriptor DPL DPL in hidden part of segment register ...

Page 1646: ...e RPL of the code segment s segment selector If the return code segment is conforming and the segment selector s DPL greater than the RPL of the code segment s segment selector If the stack segment is not a writable data segment If the stack segment selector RPL is not equal to the RPL of the return code segment selector If the stack segment descriptor DPL is not equal to the RPL of the return cod...

Page 1647: ...Volume 4 Base IA 32 Instruction Reference 4 345 ROL ROR Rotate See entry for RCL RCR ROL ROR ...

Page 1648: ... a 32 KByte aligned address The contents of the model specific registers are not affected by a return from SMM See Chapter 9 in the Intel Architecture Software Developer s Manual Volume 3 for more information about SMM and the behavior of the RSM instruction Operation IF Itanium System Environment THEN IA 32_Intercept INST RSM ReturnFromSSM ProcessorState Restore SSMDump Flags Affected All Additio...

Page 1649: ... 3 and 5 in the EFLAGS registers are set as shown in the Operation below Operation EFLAGS SF ZF 0 AF 0 PF 1 CF AH Flags Affected The SF ZF AF PF and CF flags are loaded with values from the AH register Bits 1 3 and 5 of the EFLAGS register are set to 1 0 and 0 respectively Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Exceptions All Operating Mo...

Page 1650: ...d divide r m16 by 2 imm8 times D1 7 SAR r m32 1 Signed divide r m32 by 2 once D3 7 SAR r m32 CL Signed divide r m32 by 2 CL times C1 7 ib SAR r m32 imm8 Signed divide r m32 by 2 imm8 times D0 4 SHL r m8 1 Multiply r m8 by 2 once D2 4 SHL r m8 CL Multiply r m8 by 2 CL times C0 4 ib SHL r m8 imm8 Multiply r m8 by 2 imm8 times D1 4 SHL r m16 1 Multiply r m16 by 2 once D3 4 SHL r m16 CL Multiply r m16...

Page 1651: ...ion clears the most significant bit the SAR instruction sets or clears the most significant bit to correspond to the sign most significant bit of the original value in the destination operand In effect the SAR instruction fills the empty bit position s shifted value with the sign of the unshifted value The SAR and SHR instructions can be used to perform signed or unsigned division respectively of ...

Page 1652: ...ow for the various instructions IF COUNT 1 THEN IF instruction is SAL or SHL THEN OF MSB DEST XOR CF ELSE IF instruction is SAR THEN OF 0 ELSE instruction is SHR OF MSB tempDEST FI FI ELSE OF undefined FI Flags Affected The CF flag contains the value of the last bit shifted out of the destination operand it is undefined for SHL and SHR instructions count is greater than or equal to the size of the...

Page 1653: ...mory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segme...

Page 1654: ... of a multibyte or multiword subtraction in which a SUB instruction is followed by a SBB instruction Operation DEST DEST SRC CF Flags Affected The OF SF ZF AF PF and CF flags are set according to the result Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Opcode Instruction Description 1C ib SBB AL imm8 Subtract with borrow imm8 from AL 1D iw SBB A...

Page 1655: ...ontains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory opera...

Page 1656: ...F flag is 1 the EDI register is decremented The EDI register is incremented or decremented by 1 for byte operations by 2 for word operations or by 4 for doubleword operations The SCAS SCASB SCASW and SCASD instructions can be preceded by the REP prefix for block comparisons of ECX bytes words or doublewords More often however these instructions will be used in a LOOP construct that takes some acti...

Page 1657: ...ult Protected Mode Exceptions GP 0 If a memory operand effective address is outside the limit of the ES segment If the ES register contains a null segment selector If an illegal memory operand effective address in the ES segment is given PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real A...

Page 1658: ...r m8 Set byte if greater ZF 0 and SF OF 0F 9D SETGE r m8 Set byte if greater or equal SF OF 0F 9C SETL r m8 Set byte if less SF OF 0F 9E SETLE r m8 Set byte if less or equal ZF 1 or SF OF 0F 96 SETNA r m8 Set byte if not above CF 1 or ZF 1 0F 92 SETNAE r m8 Set byte if not above or equal CF 1 0F 93 SETNB r m8 Set byte if not below CF 0 0F 97 SETNBE r m8 Set byte if not below or equal CF 0 and ZF 0...

Page 1659: ...nt Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination is located in a nonwr...

Page 1660: ...6 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1661: ...ith 0s The SGDT and SIDT instructions are useful only in operating system software however they can be used in application programs Operation IF Itanium System Environment THEN IA 32_Intercept INST SGDT SIDT IF instruction is IDTR THEN IF OperandSize 16 THEN DEST 0 15 IDTR Limit DEST 16 39 IDTR Base 24 bits of base address loaded DEST 40 47 0 ELSE 32 bit Operand Size DEST 0 15 IDTR Limit DEST 16 4...

Page 1662: ...ress Mode Exceptions UD If the destination operand is a register GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions UD If the destination operand is a register GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a m...

Page 1663: ...Volume 4 Base IA 32 Instruction Reference 4 361 SHL SHR Shift Instructions See entry for SAL SAR SHL SHR ...

Page 1664: ...t of the destination operand For a 1 bit shift the OF flag is set if a sign change occurred otherwise it is cleared If the count operand is 0 the flags are not affected The SHLD instruction is useful for multi precision shifts of 64 bits or more Operation COUNT COUNT MOD 32 SIZE OperandSize IF COUNT 0 THEN no operation ELSE IF COUNT SIZE THEN Bad parameters DEST is undefined CF OF SF ZF AF PF are ...

Page 1665: ...bort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is ...

Page 1666: ...ination operand For a 1 bit shift the OF flag is set if a sign change occurred otherwise it is cleared If the count operand is 0 the flags are not affected The SHRD instruction is useful for multiprecision shifts of 64 bits or more Operation COUNT COUNT MOD 32 SIZE OperandSize IF COUNT 0 THEN no operation ELSE IF COUNT SIZE THEN Bad parameters DEST is undefined CF OF SF ZF AF PF are undefined ELSE...

Page 1667: ...lt Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If the destination is located in a nonwritable segment If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment...

Page 1668: ...4 366 Volume 4 Base IA 32 Instruction Reference SIDT Store Interrupt Descriptor Table Register See entry for SGDT SIDT ...

Page 1669: ...ams Also this instruction can only be executed in protected mode Operation IF Itanium System Environment THEN IA 32_Intercept INST SLDT DEST LDTR SegmentSelector Flags Affected None Additional Itanium System Environment Exceptions IA 32_Intercept SLDT results in an IA 32 Intercept Protected Mode Exceptions GP 0 If the destination is located in a nonwritable segment If a memory operand effective ad...

Page 1670: ...erence SLDT Store Local Descriptor Table Register Continued Real Address Mode Exceptions UD The SLDT instruction is not recognized in real address mode Virtual 8086 Mode Exceptions UD The SLDT instruction is not recognized in virtual 8086 mode ...

Page 1671: ...286 processor programs and procedures intended to run on processors more recent than the Intel 286 should use the MOV control registers instruction to load the machine status word Operation IF Itanium System Environment THEN IA 32_Intercept INST SMSW DEST CR0 15 0 MachineStatusWord Flags Affected None Additional Itanium System Environment Exceptions IA 32_Intercept Mandatory Instruction Intercept ...

Page 1672: ... If a memory operand effective address is outside the CS DS ES FS or GS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1673: ...STC Set Carry Flag Description Sets the CF flag in the EFLAGS register Operation CF 1 Flags Affected The CF flag is set The OF ZF SF AF and PF flags are unaffected Exceptions All Operating Modes None Opcode Instruction Description F9 STC Set CF flag ...

Page 1674: ...e EFLAGS register When the DF flag is set to 1 string operations decrement the index registers ESI and or EDI Operation DF 1 Flags Affected The DF flag is set The CF OF ZF SF AF and PF flags are unaffected Operation DF 1 Exceptions All Operating Modes None Opcode Instruction Description FD STD Set DF flag ...

Page 1675: ...LAG if The IF flag and the STI and CLI instruction have no affect on the generation of exceptions and NMI interrupts The following decision table indicates the action of the STI instruction bottom of the table depending on the processor s mode of operating and the CPL and IOPL of the currently running program or procedure top of the table Notes XDon t care NAction in Column 1 not taken YAction in ...

Page 1676: ...L 3 THEN IF 1 ELSE IF CR4 VME 0 THEN GP 0 ELSE IF VIP 1 virtual interrupt is pending THEN GP 0 ELSE VIF 1 FI FI FI FI FI FI IF Itanium System Environment AND CFLG ii AND IF OLD_IF THEN IA 32_Intercept System_Flag STI Flags Affected The IF flag is set to 1 Additional Itanium System Environment Exceptions IA 32_Intercept System Flag Intercept Trap if CFLG ii is 1 and the IF flag changes state Protec...

Page 1677: ... Instruction Reference 4 375 STI Set Interrupt Flag Continued Real Address Mode Exceptions None Virtual 8086 Mode Exceptions GP 0 If the CPL is greater has less privilege than the IOPL of the current program or procedure ...

Page 1678: ...ag in the EFLAGS register If the DF flag is 0 the EDI register is incremented if the DF flag is 1 the EDI register is decremented The EDI register is incremented or decremented by 1 for byte operations by 2 for word operations or by 4 for doubleword operations The STOS STOSB STOSW and STOSD instructions can be preceded by the REP prefix for block loads of ECX bytes words or doublewords More often ...

Page 1679: ...onwritable segment If a memory operand effective address is outside the limit of the ES segment If the ES register contains a null segment selector PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS...

Page 1680: ...peration IF Itanium System Environment THEN IA 32_Intercept INST STR DEST TR SegmentSelector Flags Affected None Additional Itanium System Environment Exceptions IA 32_Intercept Mandatory Instruction Intercept Protected Mode Exceptions GP 0 If the destination is a memory operand that is located in a nonwritable segment or if the effective address is outside the CS DS ES FS or GS segment limit If t...

Page 1681: ... result Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Opcode Instruction Descripti...

Page 1682: ...ccurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside th...

Page 1683: ...TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Opcode Instruction Description A8 ib TEST AL imm8 AND imm8 with AL set SF ZF PF according to result A9 iw TEST AX imm16 AND imm16 with AX set SF ZF PF according to result A9 id...

Page 1684: ...ing is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment l...

Page 1685: ...valid opcode exception this instruction is the same as the NOP instruction Operation IF Itanium System Environment THEN IA 32_Intercept INST 0F0B UD Generates invalid opcode exception Flags Affected None Additional Itanium System Environment Exceptions IA 32_Intercept Mandatory Instruction Intercept Exceptions All Operating Modes UD Instruction is guaranteed to raise an invalid opcode exception in...

Page 1686: ...For the VERR instruction the segment must be readable the VERW instruction the segment must be a writable data segment If the segment is not a conforming code segment the segment s DPL must be greater than or equal to have less or the same privilege as both the CPL and the segment selector s RPL The validation performed is the same as if the segment were loaded into the DS ES FS or GS register and...

Page 1687: ...Data Dirty Bit Fault Protected Mode Exceptions The only exceptions generated for these instructions are those related to illegal addressing of the source operand GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit If the DS ES FS or GS register is used to access memory and it contains a null segment selector SS 0 If a memory operand effective address is outsid...

Page 1688: ...ny unmasked floating point exceptions the instruction may raise are handled before the processor can modify the instruction s results Operation CheckPendingUnmaskedFloatingPointExceptions FPU Flags Affected The C0 C1 C2 and C3 flags are undefined Floating point Exceptions None Protected Mode Exceptions NM MP and TS in CR0 is set Real Address Mode Exceptions NM MP and TS in CR0 is set Virtual 8086 ...

Page 1689: ...ction When the processor is running in protected mode the CPL of a program or procedure must be 0 to execute this instruction This instruction is also a serializing instruction In situations where cache coherency with main memory is not a concern software can use the INVD instruction Operation IF Itanium System Environment THEN IA 32_Intercept INST WBINVD WriteBack InternalCaches Flush InternalCac...

Page 1690: ...The WBINVD instruction cannot be executed at the virtual 8086 mode Intel Architecture Compatibility The WDINVD instruction implementation dependent its function may be implemented differently on future Intel architecture processors The instruction is not supported on Intel architecture processors earlier than the Intel486 processor ...

Page 1691: ... Manual Volume 3 The MSRs control functions for testability execution tracing performance monitoring and machine check errors See model specific instructions for all the MSRs that can be written to with this instruction and their addresses The WRMSR instruction is a serializing instruction The CPUID instruction should be used to determine whether MSRs are supported EDX 5 1 before using this instru...

Page 1692: ...on is not recognized in virtual 8086 mode Intel Architecture Compatibility The MSRs and the ability to read them with the WRMSR instruction were introduced into the Intel architecture with the Pentium processor Execution of this instruction by an Intel architecture processor earlier than the Pentium processor results in an invalid opcode exception UD ...

Page 1693: ...y Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault IA 32_Intercept Lock Intercept If an external atomic bus lock is required to complete this operation and DCR lc is 1 no atomic transaction occurs this instruction is faulted and an IA 32_Intercept Lock fault is generated The software lock handler is responsible for the emulation of this instr...

Page 1694: ...memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made Intel Architecture Compatibility Intel architecture processors earlier than the Intel486 processor do not recognize this instruc...

Page 1695: ...of the BSWAP instruction for 16 bit operands Operation IF Itanium System Environment AND External_Atomic_Lock_Required AND DCR lc THEN IA 32_Intercept LOCK XCHG TEMP DEST DEST SRC SRC TEMP Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault D...

Page 1696: ... or GS register contains a null segment selector SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS ...

Page 1697: ...EBX ZeroExtend AL FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit F...

Page 1698: ... segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1699: ...aults NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Opcode Instruction Description 34 ib XOR AL imm8 AL XOR imm8 35 iw XOR AX imm16 AX XOR imm16 35 id...

Page 1700: ... occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS If a memory operand effective address is outside the SS segment limit Virtual 8086 Mode Exceptions GP 0 If a memory operand effective address is outside ...

Page 1701: ... MMX Technology Instruction Reference 4 399 IA 32 Intel MMX Technology Instruction Reference 3 This section lists the IA 32 MMX technology instructions designed to increase performance of multimedia intensive applications ...

Page 1702: ...subroutines that may execute floating point instructions If a floating point instruction loads one of the registers in the FPU register stack before the FPU tag word has been reset by the EMMS instruction a floating point stack overflow can occur that will result in a floating point exception or incorrect result Operation FPUTagWord FFFFH Flags Affected None Additional Itanium System Environment E...

Page 1703: ...lue is written to the low order 32 bits of the 64 bit MMX technology register and zero extended to 64 bits see Figure 3 1 When the source operand is an MMX technology register the low order 32 bits of the MMX technology register are written to the 32 bit general purpose register or 32 bit memory location selected with the destination operand Operation IF DEST is MMX register THEN DEST ZeroExtend S...

Page 1704: ...ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand...

Page 1705: ...Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit...

Page 1706: ...t occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outs...

Page 1707: ...00H the saturated word value of 7FFFH or 8000H respectively is stored into the destination The destination operand for either the PACKSSWB or PACKSSDW instruction must be an MMX technology register the source operand may be either an MMX technology register or a quadword memory location Operation IF instruction is PACKSSWB THEN DEST 7 0 SaturateSignedWordToSignedByte DEST 15 0 DEST 15 8 SaturateSi...

Page 1708: ...ata TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit UD If EM in CR0 is se...

Page 1709: ... Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1710: ...rdToUnsignedByte DEST 47 32 DEST 31 24 SaturateSignedWordToUnsignedByte DEST 63 48 DEST 39 32 SaturateSignedWordToUnsignedByte SRC 15 0 DEST 47 40 SaturateSignedWordToUnsignedByte SRC 31 16 DEST 55 48 SaturateSignedWordToUnsignedByte SRC 47 32 DEST 63 56 SaturateSignedWordToUnsignedByte SRC 63 48 Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Re...

Page 1711: ...alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effect...

Page 1712: ...estination operand and therefore the result wraps around The PADDW instruction adds the words of the source operand to the words of the destination operand and stores the results to the destination operand When an individual result is too large to be represented in 16 bits the lower 16 bits of the result are written to the destination operand and therefore the result wraps around The PADDD instruc...

Page 1713: ...C 15 8 DEST 23 16 DEST 23 16 SRC 23 16 DEST 31 24 DEST 31 24 SRC 31 24 DEST 39 32 DEST 39 32 SRC 39 32 DEST 47 40 DEST 47 40 SRC 47 40 DEST 55 48 DEST 55 48 SRC 55 48 DEST 63 56 DEST 63 56 SRC 63 56 ELSEIF instruction is PADDW THEN DEST 15 0 DEST 15 0 SRC 15 0 DEST 31 16 DEST 31 16 SRC 31 16 DEST 47 32 DEST 47 32 SRC 47 32 DEST 63 48 DEST 63 48 SRC 63 48 ELSE instruction is PADDD DEST 31 0 DEST 31...

Page 1714: ...ment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective a...

Page 1715: ...perand to the signed words of the destination operand and stores the results to the destination operand When an individual result is beyond the range of a signed word that is greater than 7FFFH or less than 8000H the saturated word value of 7FFFH or 8000H respectively is written to the destination operand Operation IF instruction is PADDSB THEN DEST 7 0 SaturateToSignedByte DEST 7 0 SRC 7 0 DEST 1...

Page 1716: ...Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit UD If EM in CR0 is set NM If TS in ...

Page 1717: ...rtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1718: ...he destination operand When an individual result is beyond the range of an unsigned byte that is greater than FFH the saturated unsigned byte value of FFH is written to the destination operand The PADDUSW instruction adds the unsigned words of the source operand to the unsigned words of the destination operand and stores the results to the destination operand When an individual result is beyond th...

Page 1719: ...eToUnsignedWord DEST 47 32 SRC 47 32 DEST 63 48 SaturateToUnsignedWord DEST 63 48 SRC 63 48 FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumptio...

Page 1720: ...pace from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligne...

Page 1721: ...ted None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data ...

Page 1722: ...hecking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address...

Page 1723: ... must be an MMX technology register Operation DEST NOT DEST AND SRC Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault...

Page 1724: ...t checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective addr...

Page 1725: ...tion compares the words in the destination operand to the corresponding words in the source operand with the words in the destination operand being set according to the results The PCMPEQD instruction compares the doublewords in the destination operand to the corresponding doublewords in the source operand with the doublewords in the destination operand being set according to the results Opcode In...

Page 1726: ...FFFFFFFFH ELSE DEST 31 0 0 IF DEST 63 32 SRC 63 32 THEN DEST 63 32 FFFFFFFFH ELSE DEST 63 32 0 FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consump...

Page 1727: ...art of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault oc...

Page 1728: ...results The PCMPGTW instruction compares the signed words in the destination operand to the corresponding signed words in the source operand with the words in the destination operand being set according to the results The PCMPGTD instruction compares the signed doublewords in the destination operand to the corresponding signed doublewords in the source operand with the doublewords in the destinati...

Page 1729: ...nd third bytes in DEST and SRC IF DEST 63 48 SRC 63 48 THEN DEST 63 48 FFFFH ELSE DEST 63 48 0 ELSE instruction is PCMPGTD IF DEST 31 0 SRC 31 0 THEN DEST 31 0 FFFFFFFFH ELSE DEST 31 0 0 IF DEST 63 32 SRC 63 32 THEN DEST 63 32 FFFFFFFFH ELSE DEST 63 32 0 FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Registe...

Page 1730: ... AC 0 If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of t...

Page 1731: ...000000H only when all four words of both the source and destination operands are 8000H Operation DEST 31 0 DEST 15 0 SRC 15 0 DEST 31 16 SRC 31 16 DEST 63 32 DEST 47 32 SRC 47 32 DEST 63 48 SRC 63 48 Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nest...

Page 1732: ...gnment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective...

Page 1733: ...7 32 SRC 47 32 DEST 63 48 HighOrderWord DEST 63 48 SRC 63 48 Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data K...

Page 1734: ...ment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective a...

Page 1735: ...OrderWord DEST 47 32 SRC 47 32 DEST 63 48 LowOrderWord DEST 63 48 SRC 63 48 Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Mi...

Page 1736: ...ent checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective ad...

Page 1737: ...Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit...

Page 1738: ...t checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective addr...

Page 1739: ...ination operand to the left by the number of bits specified in the count operand the PSLLD instruction shifts each of the two doublewords of the destination operand and the PSLLQ instruction shifts the 64 bit quadword in the destination operand As the individual data elements are shifted left the empty low order bit positions are filled with zeros Opcode Instruction Description 0F F1 r PSLLW mm mm...

Page 1740: ...ternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit UD If EM in ...

Page 1741: ...Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1742: ...ogy register a 64 bit memory location or an 8 bit immediate The PSRAW instruction shifts each of the four words in the destination operand to the right by the number of bits specified in the count operand the PSRAD instruction shifts each of the two doublewords in the destination operand As the individual data elements are shifted right the empty high order bit positions are filled with the sign v...

Page 1743: ...ault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit UD If...

Page 1744: ...irtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1745: ...words of the destination operand to the right by the number of bits specified in the count operand the PSRLD instruction shifts each of the two doublewords of the destination operand and the PSRLQ instruction shifts the 64 bit quadword in the destination operand As the individual data elements are shifted right the empty high order bit positions are filled with zeros Opcode Instruction Description...

Page 1746: ...ternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a memory operand effective address is outside the CS DS ES FS or GS segment limit SS 0 If a memory operand effective address is outside the SS segment limit UD If EM in ...

Page 1747: ...Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned memory reference is made ...

Page 1748: ...n operand and therefore the result wraps around The PSUBW instruction subtracts the words of the source operand from the words of the destination operand and stores the results to the destination operand When an individual result is too large to be represented in 16 bits the lower 16 bits of the result are written to the destination operand and therefore the result wraps around The PSUBD instructi...

Page 1749: ... SRC 15 8 DEST 23 16 DEST 23 16 SRC 23 16 DEST 31 24 DEST 31 24 SRC 31 24 DEST 39 32 DEST 39 32 SRC 39 32 DEST 47 40 DEST 47 40 SRC 47 40 DEST 55 48 DEST 55 48 SRC 55 48 DEST 63 56 DEST 63 56 SRC 63 56 ELSEIF instruction is PSUBW THEN DEST 15 0 DEST 15 0 SRC 15 0 DEST 31 16 DEST 31 16 SRC 31 16 DEST 47 32 DEST 47 32 SRC 47 32 DEST 63 48 DEST 63 48 SRC 63 48 ELSE instruction is PSUBD DEST 31 0 DEST...

Page 1750: ...ignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effectiv...

Page 1751: ...yond the range of a signed byte that is greater than 7FH or less than 80H the saturated byte value of 7FH or 80H respectively is written to the destination operand The PSUBSW instruction subtracts the signed words of the source operand from the signed words of the destination operand and stores the results to the destination operand When an individual result is beyond the range of a signed word th...

Page 1752: ...d DEST 47 32 SRC 47 32 DEST 63 48 SaturateToSignedWord DEST 63 48 SRC 63 48 FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key...

Page 1753: ...e from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligned m...

Page 1754: ...and stores the results to the destination operand When an individual result is less than zero a negative value the saturated unsigned byte value of 00H is written to the destination operand The PSUBUSW instruction subtracts the unsigned words of the source operand from the unsigned words of the destination operand and stores the results to the destination operand When an individual result is less ...

Page 1755: ...ateToUnsignedWord DEST 47 32 SRC 47 32 DEST 63 48 SaturateToUnsignedWord DEST 63 48 SRC 63 48 FI Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumpt...

Page 1756: ... space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unalig...

Page 1757: ... the destination operand and writes them to the destination operand The PUNPCKHDQ instruction interleaves the high order doubleword of the source operand and the high order doubleword of the destination operand and writes them to the destination operand If the source operand is all zeros the result stored in the destination operand contains zero extensions of the high order data elements from the ...

Page 1758: ...ceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If...

Page 1759: ...space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unalign...

Page 1760: ...operand and writes them to the destination operand The PUNPCKLDQ instruction interleaves the low order doubleword of the source operand and the low order doubleword of the destination operand and writes them to the destination operand If the source operand is all zeros the result stored in the destination operand contains zero extensions of the high order data elements from the original value in t...

Page 1761: ...ons Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Protected Mode Exceptions GP 0 If a me...

Page 1762: ...pace from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception PF fault code If a page fault occurs AC 0 If alignment checking is enabled and an unaligne...

Page 1763: ...T XOR SRC Flags Affected None Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Abort Itanium Mem FaultsVHPT Data Fault Nested TLB Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data A...

Page 1764: ...ent checking is enabled and an unaligned memory reference is made while the current privilege level is 3 Real Address Mode Exceptions GP If any part of the operand lies outside of the effective address space from 0 to FFFFH UD If EM in CR0 is set NM If TS in CR0 is set MF If there is a pending FPU exception Virtual 8086 Mode Exceptions GP If any part of the operand lies outside of the effective ad...

Page 1765: ...l MMX technology data types These include ability to stream data into and from the processor while minimizing pollution of the caches and the ability to prefetch data before it is actually used The main focus of packed floating point instructions is the acceleration of 3D geometry The new definition also contains additional SIMD Integer instructions to accelerate 3D rendering and video encoding an...

Page 1766: ...E architecture is 100 compatible with the IEEE Standard 754 for Binary Floating point Arithmetic The SSE instructions are accessible from all IA execution modes Protected mode Real address mode and Virtual 8086 mode New Features The Intel SSE architecture provides the following new features while maintaining backward compatibility with all existing Intel architecture microprocessors IA application...

Page 1767: ... is a new control status register MXCSR which is used to mask unmask numerical exception handling to set rounding modes to set flush to zero mode and to view status flags 4 6 Extended Instruction Set The Intel SSE architecture supplies a rich set of instructions that operate on either all or the least significant pairs of packed data operands in parallel The packed instructions operate on a pair o...

Page 1768: ... floating point operands the upper three fields are passed through from the source operand Packed Scalar Multiplication and Division The MULPS Multiply packed single precision floating point instruction multiplies four pairs of packed single precision floating point operands The MULSS Multiply scalar single precision floating point instruction multiplies the least significant pair of packed single...

Page 1769: ...MAXSS Maximum scalar single precision floating point instructions returns the maximum of the least significant pair of packed single precision floating point numbers into the destination register the upper three fields are passed through from the source operand to the destination register The MINPS Minimum packed single precision floating point instruction returns the minimum of each pair of packe...

Page 1770: ...g point numbers and sets the ZF PF CF bits in the EFLAGS register the OF SF and AF bits are cleared The UCOMISS Unordered compare scalar single precision floating point ordered and set EFLAGS instruction compares the least significant pairs of packed single precision floating point numbers and sets the ZF PF CF bits in the EFLAGS register as described above the OF SF and AF bits are cleared 4 6 1 ...

Page 1771: ... but only the low order 64 bits are utilized by the instruction 4 6 1 5 Conversion Instructions These instructions support packed and scalar conversions between 128 bit SSE registers and either 64 bit integer MMX technology registers or 32 bit integer IA 32 registers The packed versions behave identically to original MMX technology instructions in the presence of x87 FP instructions including Tran...

Page 1772: ...uction converts the least significant single precision floating point number to a 32 bit signed integer in an Intel architecture 32 bit integer register when the conversion is inexact the rounded value according to the rounding mode in MXCSR is returned The CVTTSS2SI Convert truncate scalar single precision floating point to scalar 32 bit integer instruction is similar to CVTSS2SI except if the co...

Page 1773: ...t by one bit position The high order bits of each element are filled with the carry bits of the sums To prevent cumulative round off errors an averaging is performed The low order bit of each final shifted result is set to 1 if at least one of the two least significant bits of the intermediate unshifted shifted sum is 1 The PEXTRW Extract 16 bit word from MMX technology register instruction moves ...

Page 1774: ...s allow the programmer to prefetch data long before it s final use These instructions are not architectural since they do not update any architectural state and are specific to each implementation The programmer may have to tune his application for each implementation to take advantage of these instructions These instructions merely provide a hint to the hardware and they will not generate excepti...

Page 1775: ...WC the transaction will be weakly ordered and is subject to all WC memory semantics The non temporal store will not write allocate Different implementations may choose to collapse and combine these stores In general WC semantics require software to ensure coherence with respect to other processors and other system agents such as graphics cards Appropriate use of synchronization and a fencing opera...

Page 1776: ...oint values are represented identically both internally and in memory and are of the following form This is a change from x87 floating point which internally represents all numbers in 80 bit extended format This change implies that x87 FP libraries re written to use SSE instructions may not produce results that are identical to the those of the x87 FP implementation Real Numbers and Floating point...

Page 1777: ... number has three parts a sign a significand and an exponent Figure 4 9 shows the binary floating point format that SSE data uses This format conforms to the IEEE standard The sign is a binary value that indicates whether the number is positive 0 or negative 1 The significand has two parts a 1 bit binary integer also referred to as the J bit and a binary fraction The J bit is often not represented...

Page 1778: ...width To summarize a normalized real number consists of a normalized significand that represents a real number between 1 and 2 and an exponent that specifies the number s binary point 4 7 1 3 Biased Exponent The processor represents exponents in a biased form This means that a constant is added to the actual exponent so that the biased exponent is always a positive number The value of the biasing ...

Page 1779: ...the type of computation being performed The following sections describe these number and non number classes 4 7 1 5 Signed Zeros Zero can be represented as a 0 or a 0 depending on the sign bit Both encodings are equal in value The sign of a zero result depends on the operation being performed and the rounding mode being used Signed zeros have been provided to aid in implementing interval arithmeti...

Page 1780: ...on A denormalized number is computed through a technique called gradual underflow Table 4 2 gives an example of gradual underflow in the denormalization process Here the single real format is being used so the minimum exponent unbiased is 12610 The true result in this example requires an exponent of 12910 in order to have a normalized number Since 12910 is beyond the allowable exponent range the r...

Page 1781: ...e of an infinity as a source operand constitutes an invalid operation Whereas denormalized numbers represent an underflow condition the two infinity numbers represent the result of an overflow condition Here the normalized result of a computation has a biased exponent greater than the largest allowable exponent for the selected result format 4 7 1 8 NaNs Since NaNs are non numbers they are not par...

Page 1782: ... QNaN by setting the most significant fraction bit of the value to 1 The result is then stored in the destination operand and the invalid operation flag is set If the invalid operation mask is clear an invalid operation fault is signaled and no result is stored in the destination operand When a real operation or exception delivers a QNaN result the value of the result depends on the source operand...

Page 1783: ...nge of this data type Only the fraction part of the significand is encoded The integer is assumed to be 1 for all numbers except 0 and denormalized finite numbers The exponent of the single precision data type is encoded in biased format The biasing constant is 127 for the single precision format Table 4 3 Results of Operations with NAN Operands Source Operands NaN Result invalid operation excepti...

Page 1784: ...uctions There are sixty eight new instructions in SSE instruction set This chapter describes the packed and scalar floating point instructions in alphabetical order with a full description of each instruction The last two sections of this chapter describe the SIMD Integer instructions and the cacheability control instructions Table 4 4 Precision and Range of SSE Datatype Data Type Length Precision...

Page 1785: ...ory operand Operand Size 66H Reserved and may result in unpredictable behavior Segment Override 2EH 36H 3EH 26H 64H 65H Affects SSE instructions with mem operand Ignored by SSE instructions without mem operand Repeat Prefix F3H Affects SSE instructions Repeat NE Prefix F2H Reserved and may result in unpredictable behavior Lock Prefix 0F0H Generates invalid opcode exception Table 4 7 SIMD Integer I...

Page 1786: ...cified manner in which the processor handles this behavior and risks incompatibility with future processors 4 12 Notations Besides opcodes two kinds of notations are found which both describe information found in the ModR M byte 1 digit digit between 0 and 7 indicates that the instruction uses only the r m register and memory operand The reg field contains the digit that provides an extension to t...

Page 1787: ...xing byte When there is ambiguity xmm1 indicates the first source operand and xmm2 the second source operand Table 4 9 describes the naming conventions used in the SSE instruction mnemonics Table 4 9 Key to SSE Naming Convention Mnemonic Description PI Packed integer qword e g mm0 PS Packed single FP e g xmm0 SI Scalar integer e g eax SS Scalar single FP e g low 32 bits of xmm0 ...

Page 1788: ...for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCP...

Page 1789: ...CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if C...

Page 1790: ...ctive address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT...

Page 1791: ...e address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Dat...

Page 1792: ...g the comparison predicate specified by imm8 note that a subsequent computational instruction which uses this mask as an input operand will not generate a fault since a mask of all 0 s corresponds to a FP value of 0 0 and a mask of all 1 s corresponds to a FP value of qNaN Some of the comparisons can be achieved only through software emulation For these comparisons the programmer must swap the ope...

Page 1793: ... is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault Predicate Descriptiona a The greater than greater than or equal not greater than and not greater than or equal relations...

Page 1794: ... in hardware require more than one instruction to emulate in software and therefore should not be implemented as pseudo ops For these the programmer should reverse the operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact Bits 7 4 of the immediate field are reserved...

Page 1795: ...Note that a subsequent computational instruction which uses this mask as an input operand will not generate a fault since a mask of all 0 s corresponds to a FP value of 0 0 and a mask of all 1 s corresponds to a FP value of qNaN Some of the comparisons can be achieved only through software emulation For these comparisons the programmer must swap the operands copying registers when necessary to pro...

Page 1796: ...XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode AC for unaligned memory reference if the current privilege level is 3 PF fault code for a page fault Predicate Descriptiona a The greater than greater than or ...

Page 1797: ... in hardware require more than one instruction to emulate in software and therefore should not be implemented as pseudo ops For these the programmer should reverse the operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact Bits 7 4 of the immediate field are reserved...

Page 1798: ... 0 for an illegal address in the SS segment PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD i...

Page 1799: ...Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Comments COMISS differs from UCOMISS in that it signals an invalid numeric exception when a source operand is either a qNaN or sNaN UCOMISS signals invalid only if a source operand is an sNaN The usage of Repeat F2H F3H and Operand Size 66H prefixes with COMISS is rese...

Page 1800: ...n CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception AC for unaligned memory reference XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmaske...

Page 1801: ...nent part of the corresponding x87 FP register However the use of a memory source operand with this instruction will not result in the above transition from x87 FP to MMX technology Prioritization for fault and assist behavior for CVTPI2PS is as follows Memory source 1 Invalid opcode CR0 EM 1 2 DNA CR0 TS 1 3 SS or GP for limit violation 4 PF page fault 5 SSE numeric fault i e precision Register s...

Page 1802: ... if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OS...

Page 1803: ...oritization for fault and assist behavior for CVTPS2PI is as follows Memory source 1 Invalid opcode CR0 EM 1 2 DNA CR0 TS 1 3 MF pending x87 FP fault signalled 4 After returning from MF x87 FP MMX technology transition 5 SS or GP for limit violation 6 PF page fault 7 SSE numeric fault i e invalid precision Register source 1 Invalid opcode CR0 EM 1 2 DNA CR0 TS 1 3 MF pending x87 FP fault signalled...

Page 1804: ...c exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit...

Page 1805: ... numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSF...

Page 1806: ...SXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 ...

Page 1807: ...e ones 1 s to the exponent part of the corresponding x87 FP register Prioritization for fault and assist behavior for CVTTPS2PI is as follows Memory source 1 Invalid opcode CR0 EM 1 2 DNA CR0 TS 1 3 MF pending x87 FP fault signalled 4 After returning from MF x87 FP MMX technology transition 5 SS or GP for limit violation 6 PF page fault 7 SSE numeric fault i e invalid precision Register source 1 I...

Page 1808: ...SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 ...

Page 1809: ...SE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 O...

Page 1810: ...xception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 ...

Page 1811: ...little endian byte order as arranged in memory with byte offset into row described by right column Opcode Instruction Description 0F AE 1 FXRSTOR m512byte Load FP Intel MMX technology and SSE state from m512byte 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Rsrvd CS IP FOP FTW FSW FCW 0 Reserved MXCSR Rsrvd DS DP 16 Reserved ST0 MM0 32 Reserved ST1 MM1 48 Reserved ST2 MM2 64 Reserved ST3 MM3 80 Reserved S...

Page 1812: ...rotection exception is signalled if the address is not aligned on 16 byte boundary Note that if AC is enabled and CPL is 3 signalling of AC is not guaranteed and may vary with implementation in all implementations where AC is not signalled a general protection fault will instead be signalled In addition the width of the alignment check when AC is enabled may also vary with implementation for insta...

Page 1813: ...Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Notes State saved with FXSAVE and restored with FRSTOR and vice versa will result in incorrect restoration of state in the processor The address size prefix will have the usual effect on address calculation but will ...

Page 1814: ...he state has been saved This instruction has been optimized to maximize floating point save performance The save data structure is as follows little endian byte order as arranged in memory with byte offset into row described by right column Opcode Instruction Description 0F AE 0 FXSAVE m512byte Store FP and Intel MMX technology state and SSE state to m512byte 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 ...

Page 1815: ... TOS relative order which means that FR0 is always saved first followed by FR1 FR2 and so forth As an example if TOS 4 and only ST0 ST1 and ST2 are valid FSAVE saves the FTW field in the following format ST3 ST2 ST1 ST0 ST7 ST6 ST5 ST4 TOS 4 FR7 FR6 FR5 FR4 FR3 FR2 FR1 FR0 11 xx xx xx 11 11 11 11 where xx is one of 00 01 10 11 indicates an empty stack elements and the 00 01 and 10 indicate Valid Z...

Page 1816: ...on a 16 byte boundary FXSAVE generates a general protection exception FP Exceptions If AC exception detection is disabled a general protection exception is signalled if the address is not aligned on 16 byte boundary Note that if AC is enabled and CPL is 3 signalling of AC is not guaranteed and may vary with implementation in all implementations where AC is not signalled a general protection fault ...

Page 1817: ...s Bit Fault Data Dirty Bit Fault Notes State saved with FXSAVE and restored with FRSTOR and vice versa will result in incorrect restoration of state in the processor The address size prefix will have the usual effect on address calculation but will have no effect on the format of the FXSAVE image If there is a pending unmasked FP exception at the time FXSAVE is executed the sequence of FXSAVE FWAI...

Page 1818: ...ed These flags are cleared upon reset Bits 12 7 configure numerical exception masking an exception type is masked if the corresponding bit is set and it is unmasked if the bit is clear These enables are set upon reset meaning that all numerical exceptions are masked Bits 14 13 encode the rounding control which provides for the common round to nearest mode as well as directed rounding and true chop...

Page 1819: ...d bit 6 are defined as reserved and cleared attempting to write a non zero value to these bits using either the FXRSTOR or LDMXCSR instructions will result in a general protection exception The linear address corresponds to the address of the least significant byte of the referenced memory data FP Exceptions General protection fault if reserved bits are loaded with non zero values Numeric Exceptio...

Page 1820: ...TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Comments The usage of Repeat F2H F3H and Operand Size 66H prefixes with LDMXCSR is reserved Different processor implementations may handle this prefix differently Usage of this prefix with LDMXCSR risks incompati...

Page 1821: ...d are both zeros source2 xmm2 m128 would be returned If source2 xmm2 m128 is an sNaN this sNaN is forwarded unchanged to the destination i e a quieted version of the sNaN is not returned FP Exceptions General protection exception if not aligned on 16 byte boundary regardless of segment Numeric Exceptions Invalid including qNaN source operand Denormal Protected Mode Exceptions GP 0 for an illegal m...

Page 1822: ...Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Comments Note that if only one source is a NaN for these instructions the Src2 operand either NaN or real value is written to the result this differs from...

Page 1823: ... in the SS segment PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 ...

Page 1824: ...ne source is a NaN for these instructions the Src2 operand either NaN or real value is written to the result this differs from the behavior for other instructions as defined in Table 4 3 which is to always write the NaN to the result regardless of which source operand contains the NaN The upper three operands are still bypassed from the src1 operand as in all other scalar operations This approach ...

Page 1825: ...red are both zeros source2 xmm2 m128 would be returned If source2 xmm2 m128 is an sNaN this sNaN is forwarded unchanged to the destination i e a quieted version of the sNaN is not returned FP Exceptions General protection exception if not aligned on 16 byte boundary regardless of segment Numeric Exceptions Invalid including qNaN source operand Denormal Protected Mode Exceptions GP 0 for an illegal...

Page 1826: ...Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Comments Note that if only one source is a NaN for these instructions the Src2 operand either NaN or real value is written to the result this differs from...

Page 1827: ...an illegal address in the SS segment PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID...

Page 1828: ...ne source is a NaN for these instructions the Src2 operand either NaN or real value is written to the result this differs from the behavior for other instructions as defined in Table 4 3 which is to always write the NaN to the result regardless of which source operand contains the NaN The upper three operands are still bypassed from the src1 operand as in all other scalar operations This approach ...

Page 1829: ...d memory data When a memory address is indicated the 16 bytes of data at memory location m128 are loaded or stored When the register register form of this operation is used the content of the 128 bit source register is copied into 128 bit destination register FP Exceptions General protection exception if not aligned on 16 byte boundary regardless of segment Numeric Exceptions None Opcode Instructi...

Page 1830: ...stem Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Comments MOVAPS should be used when dealing with 16...

Page 1831: ...l Address Mode Exceptions UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Comments The usage of Repeat F2H F3H and Operand Size 66H prefixes with MOVHLPS is reserved Diff...

Page 1832: ...for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1...

Page 1833: ...ts VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Comments The usage of Repeat Prefixes F2H F3H with MOVHPS is reserved Different processor implementations may handle this prefix differently Usage of this prefix with ...

Page 1834: ...ess Mode Exceptions UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 Comments Example The usage of Repeat F2H F3H and Operand Size 66H prefixes with MOVLHPS is reserved Di...

Page 1835: ...for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1...

Page 1836: ...s VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Comments The usage of Repeat Prefixes F2H F3H with MOVLPS is reserved Different processor implementations may handle this prefix differently Usage of this prefix with M...

Page 1837: ...f CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Comments T...

Page 1838: ...xmm1 63 32 xmm1 63 32 xmm1 95 64 xmm1 95 64 xmm1 127 96 xmm1 127 96 else if destination m32 store instruction m32 xmm1 31 0 else move instruction xmm2 31 0 xmm1 31 0 xmm2 63 32 xmm2 63 32 xmm2 95 64 xmm2 95 64 Opcode Instruction Description F3 0F 10 r F3 0F 11 r MOVSS xmm1 xmm2 m32 MOVSS xmm2 m32 xmm1 Move 32 bits representing one scalar SP operand from XMM2 Mem to XMM1 register Move 32 bits repre...

Page 1839: ...or unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CP...

Page 1840: ...a When a memory address is indicated the 16 bytes of data at memory location m128 are loaded to the 128 bit multimedia register xmm or stored from the 128 bit multimedia register xmm When the register register form of this operation is used the content of the 128 bit source register is copied into 128 bit register xmm No assumption is made about alignment FP Exceptions None Numeric Exceptions None...

Page 1841: ...e Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Comments MOVUPS should be used with SP FP numbers when that data is known to be unaligned The usage of this instruction should be limited to the cases where the aligned restriction is hard or impossible to meet SSE implementations gua...

Page 1842: ...ic exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 Virtual 8086 Mode E...

Page 1843: ... OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 Virtual 8086 Mode Exceptions Same e...

Page 1844: ...effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB Fault ...

Page 1845: ...he operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault Data...

Page 1846: ... 00000000000110000000001B 2126 The decimal approximations of the single precision numbers that delimit the three intervals specified above are as follows 1 11111111110100000000000B 2125 8 5039437 1037 1 11111111110100000000001B 2125 8 5039443 1037 1 00000000000110000000000B 2126 4 2550872 1037 1 00000000000110000000001B 2126 4 2550877 1037 The hexadecimal representations of the single precision nu...

Page 1847: ...from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode AC for unaligned memory reference if the current privilege level is 3 PF fault code for a page fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data ...

Page 1848: ... 00000000000110000000001B 2126 The decimal approximations of the single precision numbers that delimit the three intervals specified above are as follows 1 11111111110100000000000B 2125 8 5039437 1037 1 11111111110100000000001B 2125 8 5039443 1037 1 00000000000110000000000B 2126 4 2550872 1037 1 00000000000110000000001B 2126 4 2550877 1037 The hexadecimal representations of the single precision nu...

Page 1849: ... is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault Add...

Page 1850: ...C is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode AC for unaligned memory reference if the current privilege level is 3 PF fault code for a page fault Additional Itanium System E...

Page 1851: ... 64 xmm2 m128 127 96 Description The SHUFPS instruction is able to shuffle any of the four SP FP numbers from xmm1 to the lower 2 destination fields the upper 2 destination fields are generated from a shuffle of any of the four SP FP numbers from xmm2 m128 By using the same register for both sources SHUFPS can return any combination of the four SP FP numbers from this register Bits 0 and 1 of the ...

Page 1852: ...0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB...

Page 1853: ...R4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CP...

Page 1854: ...CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 V...

Page 1855: ...is 3 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault AC for...

Page 1856: ...ric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR b...

Page 1857: ... CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 ...

Page 1858: ...S segment PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 XM for an unmasked SSE numeric exception CR4 OSXMMEXCPT 1 UD for an unmasked SSE numeric exception CR4 OSXMMEXCPT 0 UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Addr...

Page 1859: ...ent Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Comments UCOMISS differs from COMISS in that it signals an invalid numeric exception when a source operand is an sNaN COMISS signals invalid if a source operand is either a qNaN or an sNaN The usage of Repeat F2H F3H and Operand Size prefixes with UCOMISS is reserv...

Page 1860: ...ctive address in the CS DS ES FS or GS segments SS 0 for an illegal address in the SS segment PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set ...

Page 1861: ... Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Comments When unpacking from a memory operand an implementation may decide to fetch only the appropriate 64 bits Alignment to 16 byte boundary and normal segment checking will still be enforced The usage of Repeat Prefixes F2H F3H with UNPCKHPS is reserved Different p...

Page 1862: ...ctive address in the CS DS ES FS or GS segments SS 0 for an illegal address in the SS segment PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set UD if CRCR4 OSFXSR bit 9 0 UD if CPUID XMM EDX bit 25 0 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set ...

Page 1863: ... Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Comments When unpacking from a memory operand an implementation may decide to fetch only the appropriate 64 bits Alignment to 16 byte boundary and normal segment checking will still be enforced The usage of Repeat Prefixes F2H F3H with UNPCKLPS is reserved Different p...

Page 1864: ...l 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key ...

Page 1865: ...7 mm2 m64 63 56 for i 0 i 8 i temp i zero_ext x i 8 zero_ext y i 8 res i temp i 1 1 mm1 7 0 res 0 mm1 63 56 res 7 else if instruction PAVGW x 0 mm1 15 0 y 0 mm2 m64 15 0 x 1 mm1 31 16 y 1 mm2 m64 31 16 x 2 mm1 47 32 y 2 mm2 m64 47 32 x 3 mm1 63 48 y 3 mm2 m64 63 48 for i 0 i 4 i Opcode Instruction Description 0F E0 r PAVGB mm1 mm2 m64 Average with rounding packed unsigned bytes from MM2 Mem to pac...

Page 1866: ...e CS DS ES FS or GS segments SS 0 for an illegal address in the SS segment PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would li...

Page 1867: ...t PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode P...

Page 1868: ...igned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception Virtual 8086 Mode Exceptions Same exceptions as in Real Add...

Page 1869: ... enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code f...

Page 1870: ...if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if the...

Page 1871: ... enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code f...

Page 1872: ...if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if the...

Page 1873: ...it in CR0 is set MF if there is a pending FPU exception AC for unaligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU ex...

Page 1874: ...tions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception Virtual 8086 Mode Exceptions Same exceptions as in Real Address Mode PF fault code for a page fault ...

Page 1875: ...is a MMX technology register The source operand can either be a MMX technology register or a 64 bit memory operand Numeric Exceptions None Protected Mode Exceptions GP 0 for an illegal memory operand effective address in the CS DS ES FS or GS segments SS 0 for an illegal address in the SS segment PF fault code for a page fault UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU ...

Page 1876: ...or a page fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault ...

Page 1877: ...ligned memory reference To enable AC exceptions three conditions must be true CR0 AM is set EFLAGS AC is set current CPL is 3 Real Address Mode Exceptions Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH UD if CR0 EM 1 NM if TS bit in CR0 is set MF if there is a pending FPU exception Virtual 8086 Mode Exceptions Same exceptions as in Real Ad...

Page 1878: ...aults are signalled irrespective of the value of the mask Signalling of breakpoints code or data is not guaranteed different processor implementations may signal or not signal these breakpoints If the destination memory region is mapped as UC or WP enforcement of associated semantics for these memory types is not guaranteed i e is reserved and is implementation specific Dependency on the behavior ...

Page 1879: ... unnecessary bandwidth since data is to be written directly using the byte mask without allocating old data prior to the store Similar to the SSE non temporal store instructions MASKMOVQ minimizes pollution of the cache hierarchy MASKMOVQ implicitly uses weakly ordered write combining stores WC See Section 4 6 1 9 Cacheability Control Instructions for further information about non temporal stores ...

Page 1880: ...fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fault Data Key Permission Fault Data Access Rights Fault Data Access Bit Fault Data Dirty Bit Fault Com...

Page 1881: ...Mode AC for unaligned memory reference if the current privilege level is 3 PF fault code for a page fault Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR dfl is 1 NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault Data TLB Fault Alternate Data TLB Fault Data Page Not Present Fault Data NaT Page Consumption Abort Data Key Miss Fau...

Page 1882: ...dent It will however be a minimum of 32 bytes Prefetches to uncacheable memory UC or WC memory types will be ignored Additional ModRM encodings besides those specified above are defined to be reserved and the use of reserved encodings risks future incompatibility Numeric Exceptions None Protected Mode Exceptions None Real Address Mode Exceptions None Virtual 8086 Mode Exceptions None Additional It...

Page 1883: ... As a result the SFENCE instruction provides a performance efficient way of ensuring ordering between routines that produce weakly ordered results and routines that consume this data SFENCE uses the following ModRM encoding Mod 7 6 11B Reg Opcode 5 3 111B R M 2 0 000B All other ModRM encodings are defined to be reserved and use of these encodings risks incompatibility with future processors Numeri...

Page 1884: ...4 582 Volume 4 IA 32 SSE Instruction Reference ...

Page 1885: ...Index Intel Itanium Architecture Software Developer s Manual Rev 2 3 Index ...

Page 1886: ...Index Intel Itanium Architecture Software Developer s Manual Rev 2 3 ...

Page 1887: ... Fault 2 151 brl Instruction 3 30 brp Instruction 3 32 BSF Instruction 4 35 BSP RSE Backing Store Pointer Register 1 29 BSPSTORE RSE Backing Store Pointer for Memory Stores Register 1 30 BSR Instruction 4 37 bsw Instruction 3 34 BSWAP Instruction 4 39 BT Instruction 4 40 BTC Instruction 4 42 BTR Instruction 4 44 BTS Instruction 4 46 Bundle Format 1 38 Bundles 1 38 1 141 Byte Ordering 1 36 C CALL I...

Page 1888: ...ER Instruction 4 94 EOI End of External Interrupt Register 2 124 epc Instruction 2 555 3 53 Epilog Count Register EC 1 33 Explicit Prefetch 1 70 External Controller Interrupts 2 96 External Interrupt 2 96 2 538 External Interrupt Control Registers CR64 81 2 42 External Interrupt Request Registers IRR0 3 2 125 External Interrupt Vector Register IVR 2 123 External Task Priority Cycle XTP 2 130 Exter...

Page 1889: ...nstruction 3 93 FNSAVE Instruction 4 162 FNSTCW Instruction 4 176 FNSTENV Instruction 4 178 FNSTSW Instruction 4 180 for Instruction 3 94 fpabs Instruction 3 95 fpack Instruction 3 96 fpamax Instruction 3 97 fpamin Instruction 3 99 FPATAN Instruction 4 149 fpcmp Instruction 3 101 fpcvt Instruction 3 104 fpma Instruction 3 107 fpmax Instruction 3 109 fpmerge Instruction 3 111 fpmin Instruction 3 11...

Page 1890: ...are Trap 2 232 IA 32 Interruption 2 111 IA 32 Interruption Vector Definitions 2 213 IA 32 Interruption Vector Descriptions 2 213 IA 32 Memory Ordering 2 265 IA 32 Physical Memory References 2 262 IA 32 SSE Extensions 1 20 1 130 IA 32 System Registers 2 246 IA 32 System Segment Registers 2 241 IA 32 Trap Code 2 213 IA 32 Virtual Memory References 2 261 IBR Index Breakpoint Register 2 151 2 152 IDIV...

Page 1891: ...rruptions 2 95 2 537 Interrupts 2 96 2 114 External Interrupt Architecture 2 603 Interval Time Counter ITC 1 31 Interval Timer Match Register ITM 2 32 Interval Timer Offset ITO 2 34 Interval Timer Vector ITV 2 125 INTn Instruction 4 217 INTO Instruction 4 217 invala Instruction 3 146 INVD instructions 4 228 INVLPG Instruction 4 230 IP Instruction Pointer 1 27 1 140 IPI Inter processor Interrupt 2 ...

Page 1892: ...a 2 615 MINPS Instruction 4 523 MINSS Instruction 4 525 mix Instruction 3 169 MMX technology 1 20 MOV Instruction 4 284 mov Instruction 3 172 MOVAPS Instruction 4 527 MOVD Instruction 4 401 MOVHLPS Instruction 4 529 MOVHPS Instruction 4 530 movl Instruction 3 187 MOVLHPS Instruction 4 532 MOVLPS Instruction 4 533 MOVMSKPS Instruction 4 535 MOVNTPS Instruction 4 578 MOVNTQ Instruction 4 579 MOVQ In...

Page 1893: ...83 PAL_BRAND_INFO 2 366 PAL_BUS_GET_FEATURES 2 367 PAL_BUS_SET_FEATURES 2 369 PAL_CACHE_FLUSH 2 370 PAL_CACHE_INFO 2 374 PAL_CACHE_INIT 2 376 PAL_CACHE_LINE_INIT 2 377 PAL_CACHE_PROT_INFO 2 378 PAL_CACHE_READ 2 380 PAL_CACHE_SHARED_INFO 2 382 PAL_CACHE_SUMMARY 2 384 PAL_CACHE_WRITE 2 385 PAL_COPY_INFO 2 388 PAL_COPY_PAL 2 389 PAL_DEBUG_INFO 2 390 PAL_FIXED_ADDR 2 391 PAL_FREQ_BASE 2 392 PAL_FREQ_R...

Page 1894: ...MINSW Instruction 4 569 PMINUB Instruction 4 570 PMOVMSKB Instruction 4 571 pmpy Instruction 3 213 pmpyshr Instruction 3 214 PMULHUW Instruction 4 572 PMULHW Instruction 4 431 PMULLW Instruction 4 433 PMV Performance Monitoring Vector 2 126 POP Instruction 4 311 POPA Instruction 4 315 POPAD Instruction 4 315 popcnt Instruction 3 216 POPF Instruction 4 317 POPFD Instruction 4 317 POR Instruction 4 ...

Page 1895: ...on 4 337 REPNZ Instruction 4 337 REPZ Instruction 4 337 Reserved Variables 2 351 Reset Event 2 95 2 351 Resource Utilization Counter RUC 1 31 2 33 RET Instruction 4 340 rfi Instruction 2 543 3 236 RID Region Identifier 2 561 RNAT RSE NaT Collection Register 1 30 ROL Instruction 4 327 ROR Instruction 4 327 Rotating Registers 1 145 RR Region Register 2 58 2 561 RSC Register Stack Configuration Regis...

Page 1896: ...ion Cache 2 49 2 567 Template Field Encoding 1 38 Templates 1 141 TEST Instruction 4 381 tf Instruction 3 263 thash Instruction 3 265 TLB Translation Lookaside Buffer 2 47 2 565 tnat Instruction 3 266 tpa Instruction 3 268 TPR Task Priority Register 2 123 2 605 TR Translation Register 2 48 2 566 Translation Cache TC 2 49 2 567 purge 2 568 Translation Instructions 2 60 Translation Lookaside Buffer ...

Page 1897: ...ite Dependency 1 149 WRMSR Instruction 4 389 X XADD Instruction 4 391 XCHG Instruction 4 393 xchg Instruction 2 508 3 274 XLAT Instruction 4 395 XLATB Instruction 4 395 xma Instruction 3 276 xmpy Instruction 3 278 XOR Instruction 4 397 xor Instruction 3 279 XORPS Instruction 4 562 XTP External Task Priority Cycle 2 130 XTPR External Task Priority Register 2 605 Z zxt Instruction 3 280 ...

Page 1898: ...INDEX Index 12 Index for Volumes 1 2 3 and 4 ...

Reviews:

No comments

Related manuals for ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS VOLUME 3 REV 2.3

Brand: ICP Pages: 77

SNAP DEPLOY 2.0 -

Brand: ACRONIS Pages: 68

Bluetooth for Microsoft Windows

Brand: Broadcom Pages: 30

Brand: Promise Technology Pages: 84

AVGPREP 8.5 - V 85.1

Brand: AVG Pages: 4

Mentor Time-Server

Brand: Sonifex Pages: 4

Brand: Gallo Pages: 9

Doctor Pro TM-2430-13

Brand: A&D Pages: 69

GPS Information

Brand: Globalsat Pages: 7

Brand: AMX Pages: 24

POLICY MANAGER PROXY -

Brand: F-SECURE Pages: 43

00128-051462-9310 - AUTOCAD 2008 COMM UPG FRM 2005 DVD

Brand: Autodesk Pages: 2268

Brand: Bay Networks Pages: 88

CONTRIBUTE 3-DEPLOYING CONTRIBUTE

Brand: MACROMEDIA Pages: 38

DREAMWEAVER 8-DREAMWEAVER API

Brand: MACROMEDIA Pages: 628

Instrument Driver NI-DMM

Brand: National Instruments Pages: 12

MN-LGD011-XX-US

Brand: LapLink Pages: 166

Brand: Campbell Pages: 22

Brands by name

0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Popular brands

Load more brands