Summary of Contents for i86W

Page 2: ...DED CONTROLLERS; 16-BIT EMBEDDED CONTROLLERS; 16-/32-BIT EMBEDDED PROCESSORS; MEMORY; MICROCOMMUNICATIONS (2 volume set); MICROCOMPUTER SYSTEMS; MICROPROCESSORS; PERIPHERALS; PRODUCT GUIDE: Overview of Intel's co...

Page 3: ...ocal Sales Tax ______ Postage add 10 of subtotal i Postage _______ Total _____ Pay by check money order or include company purchase order with this form 100 minimum We also accept VISA MasterCard or A...

Page 5: ...i860 64-BIT MICROPROCESSOR PROGRAMMER'S REFERENCE MANUAL 1990...

Page 6: ...K iRMX iSBC iSBX iSDM iSXM Library Manager MAPNET MCS Megachassis MICROMAINFRAME MULTIBUS MULTICHANNEL MULTIMODULE MultiSERVER ONCE OpenNET OTP PR0750 PROMPT Promware QUEST QueX Quick Erase Quick Puls...

Page 7: ...by the instructions of the i860 microprocessor. Chapter 3, Registers, presents the processor's database. A detailed knowledge of the registers is important to programmers, but this chapter may be skimmed...

Page 8: ...ed to A Compound statements are enclosed between the keywords of the if statement IF THEN ELSE FI or of the do statement DO OD The operator indicates autoincrement addressing Register names and instru...

Page 9: ...from the same register. NOTE: Depending upon the values of reserved or undefined bits makes software dependent upon the unspecified manner in which the i860 microprocessor handles these bits. Depending...

Page 10: ...tandard support includes TIPS (Technical Information Phone Service), updates and subscription service, product-specific troubleshooting guides, and COMMENTS Magazine. Basic support consists of updates and...

Page 11: ...REGISTER 3-2; 3.4 EXTENDED PROCESSOR STATUS REGISTER 3-4; 3.5 DATA BREAKPOINT REGISTER 3-6; 3.6 DIRECTORY BASE REGISTER 3-6; 3.7 FAULT INSTRUCTION REGISTER 3-8; 3.8 FLOATING-POINT STATUS REGISTER 3-8; 3.9...

Page 12: ...TIONS 6-5; 6.3.1 Floating-Point Multiply 6-7; 6.3.2 Floating-Point Multiply Low 6-8; 6.3.3 Floating-Point Reciprocals 6-9; 6.4 ADDER INSTRUCTIONS 6-9; 6.4.1 Floating-Point Add and Subtract 6-10; 6.4.2 Float...

Page 13: ...sts 8-3; 8.2 DATA ALIGNMENT 8-4; 8.3 IMPLEMENTING A STACK 8-4; 8.3.1 Stack Entry and Exit Code 8-5; 8.3.2 Dynamic Memory Allocation on the Stack 8-6; 8.4 MEMORY ORGANIZATION 8-6; CHAPTER 9 PROGRAMMING EXAMP...

Page 14: ...Address Translation 4-4; 4-5 Format of a Page Table Entry 4-5; 4-6 Invalid Page Table Entry 4-5; 6-1 Pipelined Instruction Execution 6-3; 6-2 Dual-Operation Data Paths 6-16; 6-3 Data Paths by Instruction...

Page 15: ...DP MERGE Update A-5; Register Encoding B-1. Examples (Title): Example of bla Usage; Cache Flush Procedure; Examples of lock and unlock Usage; Saving Pipeline States; Restoring Pipeline States (1 of 2); Reading Mi...

Page 16: ...TABLE OF CONTENTS, Examples (example number, title, page): 9-18 Construction of Color Interpolants 9-28; 9-19 Z-Mask Procedure 9-28; 9-20 Accumulator Initialization 9-30; 9-21 3-D Rendering (1 of 2) 9-31...

Page 17: ...Architectural Overview 1...

Page 19: ...r programmers can explicitly use the data cache as if it were a large block of vector registers. To sustain high performance, the i860 microprocessor incorporates wide information paths that include 64...

Page 20: ...Software can switch between scalar and pipelined modes. Large register set: 32 general-purpose integer registers, each 32 bits wide; 32 floating-point registers, each 32 bits wide, which can also be configu...

Page 21: ...ntains the integer register file and decodes and executes load, store, integer, bit, and control-transfer operations. Its pipelined organization, with extensive bypassing and scoreboarding, maximizes perform...

Page 22: ...0 microprocessor does not have high-level math macro-instructions. High-level math and other functions are implemented in software macros and libraries. For example, the i860 microprocessor does not hav...

Page 23: ...ecially useful for high resolution distance interpolation In addition to the special support provided by the graphics unit many 3 D graphics applications directly benefit from the parallelism of the c...

Page 24: ...n also begins immediately Compilers designed for the vector model can treat the i860 microprocessor as a vector machine New instruction scheduling technology for compilers can compare the processing r...

Page 25: ...orm that can be utilized by a variety of compilers. Simulator and debugger. 1.8.1 Multiprocessing for High Performance with Compatibility: Memory organization of the i860 microprocessor is compatible wit...

Page 27: ...Data Types 2...

Page 29: ...form. A 32-bit integer can represent a value in the range -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1). Arithmetic operations on 8- and 16-bit integers can be performed by sign-extending the 8- or 16-bit va...

Page 30: ...results. Refer to Table 2-2 for encoding of these special values. 2.4 DOUBLE-PRECISION REAL [Figure: double-precision real format, with SIGN, EXPONENT, and FRACTION fields in bits 63..0] A double-precision real (also called double real) data type is...

Page 31: ...defines only the field sizes, not the specific use of each field. Other ways of using the fields of pixels are possible. 2.6 REAL NUMBER ENCODING: Table 2-2 presents the complete range of values that can...

Page 32: ...ENSITY R RED INTENSITY G GREEN INTENSITY B BLUE INTENSITY C COLOR T TEXTURE 3 THESE ASSIGNMENTS OF SPECIFIC MEANINGS TO THE FIELDS OF PIXELS ARE FOR ILLUSTRATION PURPOSES ONLY ONLY THE FIELD SIZES ARE...

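Page 33: ...[Table 2-2 residue; each row is sign, biased exponent, fraction] Positive normals: 0 11..10 11..11 down to 0 00..01 00..00. Positive denormals: 0 00..00 11..11 down to 0 00..00 00..01. Positive zero: 0 00..00 00..00. Negative zero: 1 00..00 00..00. Negative denormals: 1 00..00 00..01 to 1 00..00 11..11. Negative normals: from 1 00..01 00..00...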

Page 35: ...Registers 3...

Page 37: ...dirbase, fir, and fsr. Four special-purpose registers: KR, KI, T, and MERGE. [Figure: register set, showing integer registers r0, r1, r2, ... (bits 31..0) beside floating-point registers f0, f2, f4, ... (bits 63..0)]...

Page 38: ...read independently of what is stored in them The floating point registers are also used by a set of integer operations primarily for graphics computations The floating point registers act as buffer re...

Page 39: ...status bits are changed when a trap occurs They are restored into their corresponding status bits when returning from a trap handler with a branch indirect instruction when a trap flag is set in the...

Page 40: ...when the instruction has been successfully bypassed It is possible that the core instruction may cause a trap when the floating point instruction is suppressed In this case KNF remains set permitting...

Page 41: ...nterrupt is the value of the INT input pin. DCS (Data Cache Size) is a read-only field that tells the size of the on-chip data cache. The number of bytes actually available is 2^(12+DCS); therefore a value of...
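The DCS size rule can be sketched in a few lines of Python. This is an illustration only: the function name `data_cache_bytes` is hypothetical, and the sketch assumes the size expression reads 2^(12+DCS), which is how the garbled exponent in the excerpt appears to resolve.

```python
def data_cache_bytes(dcs):
    """Return the data cache size in bytes for a given DCS field value.

    Assumes size = 2 ** (12 + DCS), as read from the excerpt above.
    """
    if dcs < 0:
        raise ValueError("DCS is an unsigned field")
    return 1 << (12 + dcs)


if __name__ == "__main__":
    for dcs in range(3):
        print(dcs, data_cache_bytes(dcs))
```

Under that assumption, each increment of DCS doubles the reported cache size.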

Page 42: ...omparing to db, a 32-bit access ignores the low-order two bits. This ensures that any access that overlaps the address contained in the register will generate a trap. The trap occurs before the register...

Page 43: ...bit in a PTE that is not itself in the data cache. When CS8 (Code Size 8-Bit) is set, instruction cache misses are processed as 8-bit bus cycles. When this bit is clear, instruction cache misses are proces...

Page 44: ...mode status for the current process. Figure 3-5 shows its format. If FZ (Flush Zero) is clear and underflow occurs, a result-exception trap is generated. When FZ is set and underflow occurs, the result is se...

Page 45: ...Mode / Rounding Action. Round to nearest or even: closer to b of a or c; if equally close, select even number (the one whose least significant bit is zero). Round down (toward -∞): a. Round up (toward +∞): c...

Page 46: ...aced into the first stage of the adder and multiplier pipelines. When the processor executes pipelined operations, it propagates the result-status bits of a particular unit (multiplier or adder) one stag...

Page 47: ...nd the T (Temporary) register are special-purpose registers used by the dual-operation floating-point instructions described in Chapter 6. The MERGE register is used only by the graphics instructions als...

Page 49: ...Addressing 4...

Page 51: ...ry page table accesses are always done with little-endian addressing. Figure 4-1 shows the difference between the two storage modes. Figure 4-2 defines by example how data is transferred from memory ove...

Page 52: ...or a data access trap occurs. A 32-bit value is aligned to an address divisible by four when referenced in memory (i.e. the two least significant address bits must be zero) or a data access trap occurs. A...
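The alignment rule for a 32-bit value can be illustrated with a small Python check. This is a sketch, not the processor's logic: `DataAccessTrap` and `check_alignment` are hypothetical names, and the sketch assumes the same "address divisible by operand size" rule for the other power-of-two operand sizes, which the truncated excerpt only partly shows.

```python
class DataAccessTrap(Exception):
    """Hypothetical stand-in for the data access trap described in the text."""


def check_alignment(address, size_bytes):
    """Return address if it is aligned for an operand of size_bytes, else raise.

    size_bytes must be a power of two; for a 32-bit value (4 bytes) this
    tests that the two least significant address bits are zero.
    """
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("size_bytes must be a power of two")
    if address & (size_bytes - 1):
        raise DataAccessTrap(f"address {address:#x} not aligned to {size_bytes}")
    return address
```

Masking with `size_bytes - 1` is equivalent to testing divisibility when the size is a power of two.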

Page 53: ...address into the physical address by consulting two levels of page tables. The addressing mechanism uses the DIR field as an index into a page directory, uses the PAGE field as an index into the page ta...
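The two-level lookup can be sketched in Python as below. The field widths are an assumption not stated in this excerpt: DIR in bits 31..22, PAGE in bits 21..12, and OFFSET in bits 11..0, which corresponds to 4-Kbyte pages. The dictionaries standing in for the page directory and page tables, and the names `split_virtual_address` and `translate`, are hypothetical; present bits, protection checks, and traps are omitted.

```python
def split_virtual_address(va):
    """Split a 32-bit virtual address into (DIR, PAGE, OFFSET).

    Assumed layout: DIR = bits 31..22, PAGE = bits 21..12, OFFSET = bits 11..0.
    """
    va &= 0xFFFFFFFF
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def translate(va, page_directory):
    """Toy two-level walk: page_directory[DIR] -> page table -> frame base."""
    dir_index, page_index, offset = split_virtual_address(va)
    page_table = page_directory[dir_index]   # first-level lookup
    frame_base = page_table[page_index]      # second-level lookup
    return frame_base | offset


if __name__ == "__main__":
    directory = {1: {3: 0x0009A000}}
    print(hex(translate(0x00403025, directory)))
```

The offset passes through unchanged; only the upper bits of the address are replaced by the page frame address.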

Page 54: ...for each process or some combination of the two 4 2 4 Page Table Entries Page table entries PTEs in either level of page tables have the same format Figure 4 5 illustrates this format 4 2 4 1 PAGE FR...

Page 55: ...y is not valid for address translation and the rest of the entry is available for software use none of the other bits in the entry is tested by the hardware Figure 4 6 illustrates the format of a page...

Page 56: ...irectory entries is not referenced by the processor but is reserved To control external caches the PTB output pin reflects either CD or WT depending on the PBM bit of epsr refer to Chapter 3 4 2 4 5 A...

Page 57: ...sor clears the psr U bit to indicate supervisor level when a trap occurs including when the trap instruction causes the trap The prior value of U is copied into PU The trap mechanism is described in C...

Page 58: ...ess. Let DIR, PAGE, and OFFSET be the fields of the virtual address; let PFA1 and PFA2 be the page frame address fields of the first- and second-level page tables respectively; DTB is the page directory tab...

Page 59: ...is zero and if the TLB miss occurred while the bus was not locked assert LOCK refetch the PTE set A and store the PTE deasserting LOCK during the store 7 Locate the PTE at the physical address formed...

Page 60: ...Flushing Instruction and Address Translation Caches Storing to the dirbase register with the ITI bit set invalidates the contents of the instruction and address translation caches This bit should be...

Page 61: ...Core Instructions 5...

Page 63: ...address offset. The immediate value is zero-extended for logical operations and is sign-extended for add and subtract operations (including addu and subu) and for all addressing calculations. Same as src...
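The difference between the two extensions of a 16-bit immediate can be shown with a short Python sketch. The helper names are hypothetical, and results are shown as unsigned 32-bit register images.

```python
MASK32 = 0xFFFFFFFF


def zero_extend16(imm):
    """16-bit immediate as used by logical operations: upper 16 bits are zero."""
    return imm & 0xFFFF


def sign_extend16(imm):
    """16-bit immediate as used by add/subtract and addressing: bit 15 is copied upward."""
    imm &= 0xFFFF
    if imm & 0x8000:
        imm |= 0xFFFF0000
    return imm & MASK32


if __name__ == "__main__":
    for imm in (0x0001, 0x7FFF, 0x8000, 0xFFFF):
        print(f"{imm:#06x} zero={zero_extend16(imm):#010x} sign={sign_extend16(imm):#010x}")
```

The two forms agree whenever bit 15 of the immediate is clear and differ only in the upper half of the register otherwise.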

Page 64: ...rent instruction pointer plus four The resulting target address may lie anywhere within the address space The contents of the memory location indicated by address with a size of x The comments regardi...

Page 65: ...urce operand by the next instruction 2 A load instruction should not directly follow a store that is expected to hit in the data cache Even though immediate address offsets are limited to 16 bits load...

Page 66: ...formance, a load instruction should not directly follow a store that is expected to hit in the data cache. Even though immediate address offsets are limited to 16 bits, a store using a 32-bit immediate...

Page 67: ...rc1ni Transfer Integer to F P Register The ixfr instruction transfers a 32 bit value from an integer register to a floating point register Programming Notes For best performance the destination of an...

Page 68: ...pipeline has three stages A pfld returns the data from the address calculated by the third previous pfld thereby allowing three loads to be outstanding on the external bus When the data is already in...

Page 69: ...o hit in the data cache. There is no performance impact for a pfld following a store instruction. 3. A string of successive pfld instructions causes internal delays due to the fact that the bandwidth of the...

Page 70: ...c2. Traps: If the operand is misaligned, a data access trap results. Programming Notes: For the autoincrementing form of the instruction, the register coded as isrc1 must not be the same register as isrc2. F...

Page 71: ...to be updated are selected by the low order bits of the PM field in the psr Each bit of PM corresponds to one pixel with bit 0 corresponding to the pixel at the lowest address This instruction is typ...

Page 72: ...he add and subtract instructions are also used to implement comparisons. For this use, r0 is specified as the destination, so that the result is effectively discarded. Equal and not-equal comparisons are...

Page 73: ...6 Note that the only difference between the signed and the unsigned forms is in the setting of the condition code CC and the overflow flag OF The various forms of comparison between variables and cons...

Page 74: ...d isrc2 the low-order 32 bits. The shift count for shrd is taken from the shift count of the last shr instruction, which is saved in the SC field of the psr. Shift left is identical for integers and ordi...
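A double shift of this kind can be modeled in Python as follows. This is a sketch under assumptions: it treats isrc1 as the high-order half and isrc2 as the low-order half of a 64-bit value, takes the shift count as an explicit argument standing in for the SC field, and masks it to five bits. The function name `shrd` simply mirrors the mnemonic.

```python
MASK32 = 0xFFFFFFFF


def shrd(isrc1, isrc2, sc):
    """Low-order 32 bits of the 64-bit value isrc1:isrc2 shifted right by sc.

    sc plays the role of the shift count saved by the last shr
    (assumed here to be a 5-bit field, so it is masked to 0..31).
    """
    combined = ((isrc1 & MASK32) << 32) | (isrc2 & MASK32)
    return (combined >> (sc & 31)) & MASK32


def rotate_right(value, count):
    """Rotate built from the double shift by using the same value for both halves."""
    return shrd(value, value, count)
```

Passing the same value as both halves turns the double shift into a 32-bit rotate, which is one common use of such an instruction.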

Page 75: ...trap as described in Chapter 7. The trap instruction can be used to implement supervisor calls and code breakpoints. The idest should be zero, because its contents are undefined after the operation. The i...

Page 76: ...isrc2 CC set if result is zero cleared otherwise xor isrc1 isrc2 idest Logical XOR idest isrc1 XOR isrc2 CC set if result is zero cleared otherwise xorh const isrc2 idest Logical XOR high idest const...

Page 77: ...CORE INSTRUCTIONS. Bit Operation / Equivalent Logical Operation: Set bit = or; Clear bit = andnot; Complement bit = xor; Test bit = and (CC set if bit is clear)...

Page 78: ...ntrol transfer instruction before actually transferring control During the time used to execute the additional instruction the i860 microprocessor refills the instruction pipeline by fetching instruct...

Page 79: ...ontinue execution at brx sbroff FI Branch on not CC taken Branch if equal Branch if not equal bla isrc1ni isrc2 sbroff Branch on LCC and add LCC_temp clear if isrc2 comp2 isrc1nil signed LCC_temp set...

Page 80: ...rations is the value of isrc2 before the first bla instruction plus one Example 5 1 illustrates this use of bla Programmers should avoid calling subroutines from within a bla loop because a subrou tin...

Page 81: ...dual instruction mode ELSE IF DIM is set FI THEN enter dual instruction mode for next instruction pair ELSE enter single instruction mode for next instruction pair FI Continue execution at address in...

Page 82: ...ccurs saves the address of the ld.c instruction. After a scalar floating-point operation, a st.c to fsr should not change the value of RR, RM, or FZ until the point at which result exceptions are reported...

Page 83: ...lush is suppressed; use it only in supervisor mode. Example 5-2 shows how to use the flush instruction. The addresses used by the flush instruction refer to a reserved 4-Kbyte memory area that is not use...

Page 84: ...ear RC and RB II Change DrB ATE or ITI fields here if necessary st c Rz dirbase D_fLUSH orh fLUSH_P_H r0 Rw II Rw address minus 32 or fLUSH_P_L Rw Rw II of flush area or 127 r0 Ry II Ry loop count ld...

Page 85: ...d not span a page boundary After a lock instruction the location is not locked until the first data access that misses the data cache Software in a multiprocessing system should ensure that the first...

Page 86: ...Notes: In a locked sequence, a transition to or from dual-instruction mode is not permitted. // LOCKED TEST AND SET // Value to put in semaphore is in r23 lock // ld.b semaphore, r22 // Put current value...

Page 87: ...Floating Point Instructions 6...

Page 89: ...perand is to be placed. src1: The first of the two source-register designators. src2: The second of the two source-register designators. dest: The destination-register designator. Thus the operand specifier...

Page 90: ...e holds status information pertaining to those results The figure assumes that the instruction stream consists of a series of consecutive floating point instructions all of one type i e all adder inst...

Page 91: ...STRUCTIONS [Figure residue: Pipelined Instruction Execution, showing the results and status of instructions i, i+1, i+2, ... advancing through the pipeline stages on successive clocks m, m+1, m+2, ...]...

Page 92: ...e normal case 2 It is propagated from the first stage of the pipeline This method is used when restoring the state of the pipeline after a preemption When a store instruction updates the fsr and the t...

Page 93: ...nstored results from the affected pipeline After a scalar operation the values of all pipeline stages of the affected unit except the last are undefined No spurious result exception traps result when...

Page 94: ...w the last stage. The second previous operation (old second stage) is discarded. The next pipelined multiplier operation stores the single-precision result. Double to Single Transitions: When a pipelined m...

Page 95: ...Multiply; Pipelined Floating-Point Multiply; Three-Stage Pipelined Multiply. These instructions perform a standard multiply operation. Programming Notes: Fsrc1 must not be the same as fdest for pipelined...

Page 96: ...es the low order bits of its operands It operates only on double precision operands The high order 10 bits of the result are undefined An fmlow can perform 32 bit integer multiplies Two 64 bit values...

Page 97: ...d compilers must encode fsrc1 as f0. A Newton-Raphson approximation may produce a result that is different from the IEEE standard in the two least significant bits of the mantissa. A library routine sup...

Page 98: ...ese instructions perform standard addition and subtraction operations. The famov and pfamov instructions send fsrc1 through the floating-point adder, preserving the value of -0 (minus zero) when fsrc1 is...

Page 99: ...mov sd In assembly language this conversion can be specified by the fmov or pfmov pseudo operation with the sd suffix fmov sd fsrc1 fdest Equivalent to famov sd fsrc1 fdest pfmov sd fsrc1 fdest Equiva...

Page 100: ...re instructions The pipelined instructions can be used either within a sequence of pipelined instructions or within a sequence of nonpipelined scalar instructions pfgt p should be used for A B and A B...

Page 101: ...dest last stage adder result Advance A pipeline one stage A pipeline first stage 64 bit value with low order 32 bits equal to integer part of fsrc1 The instructions fix pfix ftrunc and pftrunc must sp...

Page 102: ...e first stage M op1 x M op2. pfmsm.p fsrc1, fsrc2, fdest: fdest = last-stage multiplier result. Pipelined Floating-Point Multiply with Subtract. Advance A and M pipeline one stage (operands accessed before adv...

Page 103: ...perand 1 of the adder can be fsrc1, the T register, the last-stage result of the multiplier pipeline, or the last-stage result of the adder pipeline. 4. Operand 2 of the adder can be fsrc2, the last-stage r...

Page 104: ...ID ID instructions and 1 i2apt ss ID ID ID Because single precision values are stored in these 64 bit registers in a format which does not conform to the standard for double precision numbers leaving...

Page 105: ...1 src2 No No 1111 m12tpa m12tsa src1 src2 T A result No No OPC PFMAM PFMSM M Unit M Unit A Unit A Unit T K Mnemonic Mnemonic op1 op2 op1 op2 Load Load 0000 mr2p1 mr2s1 KR src2 src1 M result No No 0001...

Page 106: ...[Figure residue: data paths by instruction for r2p1/r2s1 and r2pt/r2st, each routing fsrc1, fsrc2, and fdest through the multiplier unit (op1, op2, result) and the adder unit (op1, op2, result)]...

Page 107: ...[Figure residue: data paths by instruction for ...i2s1, i2ap1/i2as1, and i2pt/i2st, multiplier unit and adder unit]...

Page 108: ...[Figure residue: data paths by instruction, including m12apm/m12asm, multiplier unit and adder unit]...

Page 109: ...[Figure residue: data paths by instruction, including iat1p2/iat1s2 and m12tpm/m12tsm, multiplier unit and adder unit]...

Page 110: ...[Figure residue: data paths by instruction for ...2s1, mr2mp1/mr2ms1, and mr2pt/mr2st, multiplier unit and adder unit]...

Page 111: ...[Figure residue: data paths by instruction for ...p1/mi2s1, mi2mp1/mi2ms1, and mi2pt/mi2st, multiplier unit and adder unit]...

Page 112: ...[Figure residue: data paths by instruction, including mrmt1p2/mrmt1s2 and mm12mpm/mm12msm, multiplier unit and adder unit]...

Page 113: ...[Figure residue: data paths by instruction, including mimt1p2/mimt1s2, mm12tpm/mm12tsm, and mim1p2, multiplier unit and adder unit]...

Page 114: ...and op2 ra rrn la 1m mt2 loadT t nUll I I L KI M result KI A result KR M result KR A result A unit opt I a m t Add Subtract p s A unit op2 2 m a IT L A result LM result IIrc2 subtract add plus M resu...

Page 115: ...aphics operation, if fdest is not f0, then fdest must not be the same as fsrc1 or fsrc2. For best performance, the result of a scalar operation should not be a source operand in the next instruction unles...

Page 116: ...for floating point operations These instructions do not set CC nor do they cause floating point traps due to overflow Programming Notes In assembly language fiadd and pfiadd are used to implement the...

Page 117: ...nd an OR instruction use the MERGE register The addition instructions are designed to add interpolation values to each color intensity field in an array of pixels or to each distance value in a Z buff...

Page 118: ...fsrc2(i) and fsrc1(i) 00 MERGE 0. fzchkl fsrc1, fsrc2, fdest: 32-Bit Z-Buffer Check. Consider fsrc1, fsrc2, and fdest as arrays of two 32-bit fields fsrc1(0)..fsrc1(1), fsrc2(0)..fsrc2(1), and fdest(0)..fdest(1), where ze...

Page 119: ...he instructions compare the distances of the points to be drawn against the values in the Z-buffer and set bits of PM to indicate which distances are smaller than those in the Z-buffer. Previously calc...
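The comparison described here can be sketched in Python as a simplified model. It is not the instruction's exact definition: it ignores how existing PM bits are shifted or positioned, assumes unsigned distances, and treats a tie as "not farther" (an assumption). The function name `z_check` is hypothetical.

```python
def z_check(new_z, z_buffer):
    """Compare new distances against Z-buffer entries.

    Returns (mask, merged): bit i of mask is set when new_z[i] is not farther
    than z_buffer[i] (ties count as visible, which is an assumption), and
    merged holds the smaller distance for each position.
    """
    if len(new_z) != len(z_buffer):
        raise ValueError("both arrays must have the same number of fields")
    mask = 0
    merged = []
    for i, (candidate, stored) in enumerate(zip(new_z, z_buffer)):
        if candidate <= stored:
            mask |= 1 << i
        merged.append(min(candidate, stored))
    return mask, merged


if __name__ == "__main__":
    print(z_check([5, 9, 3, 7], [6, 8, 3, 10]))
```

A mask of this kind is what a later masked pixel store would consult to decide which pixels to update.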

Page 120: ...ion implements interpolation of color intensities The 8 and 16 bit pixel formats use 16 bit intensity interpolation Being a 64 bit instruction faddp does four 16 bit interpolations at a time The 32 bi...

Page 121: ...h faddp instruction the MERGE register is shifted right by 8 bits Two faddp instructions should be executed consecutively one to interpolate for even numbered pixels the next to interpolate for odd nu...

Page 122: ...bits. Normally, three faddp instructions are executed consecutively, one for each color represented in a pixel. The shifting of MERGE causes the results of consecutive faddp instructions to be accumulate...

Page 123: ...en they are loaded into the MERGE register. With each faddp, the MERGE register is shifted right by 8 bits. Normally, three faddp instructions are executed consecutively, one for each color represented in...

Page 124: ...hose that form a Z buffer With faddz 16 bit Z buffers can use 32 bit distance interpolation as Figure 6 9 illustrates Since faddz adds 32 bit values each value can be treated as a fixed point real num...

Page 125: ...struction The fact that data is carried from the low order 32 bits into the high order 32 bits may introduce an insignificant distortion into the interpolation For 32 bit Z buffers 64 bit distance int...

Page 126: ...els from the MERGE register, sets any additional bits that may be needed in the pixels (e.g. texture values), and loads the result into a floating-point register. Fsrc1 (when a register) and fdest are floatin...

Page 127: ...O Programming Notes This scalar instruction is performed by the graphics unit When it is executed the result in the graphics unit pipeline is lost However executing this instruction does not impact pe...

Page 128: ...ion mode and encounters a floating point instruction with the D bit set one more 32 bit instruction is executed before dual mode execution begins If the i860 microprocessor is executing in dual instru...

Page 129: ...ot reported on fnop Because it is a core instruction d fnop cannot be used to initiate entry into dual instruction mode 6 8 1 Core and Floating Point Instruction Interaction 1 If one of the branch an...

Page 130: ...tion is fst or pst the store should not reference the result register of the floating point operation When the core operation is pst the floating point instruction cannot be p fzchks or p fzchkl 4 Whe...

Page 131: ...anion floating-point instruction, unless the destination is f0 or f1. No overlap of register destinations is permitted; for example, the following instructions must not be paired: d fmul ss f9 fl f5 fld q...

Page 133: ...Traps and Interrupts 7...

Page 135: ...ts U to zero supervisor mode Table 7 1 Types of Traps Indication Caused by Type psr epsr fsr Condition Instruction Instruction IT OF Software traps trap intovr Fault IL Missing unlock Any SE Floating...

Page 136: ...n mode when a data access fault occurs in the absence of other trap conditions the floating point half of the dual instruction will already have been executed 9 Clears the BL bit of dirbase and deasse...

Page 137: ...of the next trap. 7.2.3 Returning from the Trap Handler: Returning from a trap handler involves the following steps: 1. Restoring the pipeline states, including the fsr, KR, KI, T, and MERGE registers, where n...

Page 138: ...curred. To implement the IEEE standard for unordered compares, the trap handler may need to change the value of CC. In this case it cannot resume at fir 4 because the new value of CC might cause an inco...

Page 139: ...the intovr instruction The trap occurs only if OF in epsr is set when intovr is executed The trap handler should clear OF before returning Refer to the intovr instruction in Chapter 5 3 By the lack o...

Page 140: ...source operands are stored in and inspect all four source operands to see if one or both operations need to be fixed up It can then compute the appropriate result and store the result in des in the ca...

Page 141: ...been lost The point at which a result exception is reported depends upon whether pipelined operations are being used Scalar nonpipelined operations Result exceptions are reported on the next floating...

Page 142: ...spect the result compute the result appropriate for that instruction a NaN or an infinity for example and store the correct result The result is either stored in the register specified by RR if nonzer...

Page 143: ...zed by the value at the INT pin just before the end of RESET. The read-only fields of the epsr are set to identify the processor, while the IL, WP, PBM, and BE bits are cleared. The bits U, IM, BR, and BW in p...

Page 144: ...he following items 1 The current contents of the floating point status register fsr including the third stage result status 2 Unstored results from the first second and third stages The number of stag...

Page 145: ...how to restore the pipeline state Trap handlers manipulate the result status bits in the floating point pipelines while preparing for pipeline resumption When storing to fsr with the U bit set the res...

Page 146: ...0 f0 f0 f0 Dummy Lres Lres Lres1m Ares3 II II II II Mres2 II Ares2 II II II II Mres1 II Ares1 II II f0 f0 Ires1 II II T and MERGE results get double precision 1 0 save third stage result status clear...

Page 147: ...L1 I Lres3m r31 I Lres3m r31 IIlxll1lllllll Fsr3 L2 Mres3 Mres3 IIlxll1l f2 f4 Fsr3 IIIx2III Temp Temp fsr I I clear FTE rl II move low 16 bits to high 16 rl II move low 16 bits to high 16 f4 f5 fill...

Page 148: ...age andh xl Fsrl r II test multiplier result precision MRP bc t Lb II skip next if double pfmul ss Mresl f2 f II insert single result pfmu13 dd Mresl f4 f II insert double result Lb andh x2 Fsrl r II...

Page 149: ...Programming Model 8...

Page 151: ...g point registers is now set at 8. Earlier software used a dividing point at 16. Table 8-1. Register Allocation (Register / Purpose / Left Unchanged by a Subroutine): r0, always zero, yes; r1, return address, no; r2...

Page 152: ...n integer, the rest in successively higher-numbered registers. If fewer parameters are required, the remaining registers can be used for temporary variables. If more than 12 parameters are required, the o...

Page 153: ...int value or 64 bit integer A subroutine may need to save the first parameter to make room for the return value 8 1 3 Passing Mixed Integer and Floating Point Parameters in Registers Integer and float...

Page 154: ...ENTING A STACK In general compilers and programmers have to maintain a software stack Register r2 called sp in assembly language is the suggested stack pointer Register r2 is set by the operating syst...

Page 155: ...point to a 16 byte boundary as long as the compiler keeps data correctly aligned when assigning positions relative to fp Figure 8 2 shows the stack frame format A fixed format is necessary to allow s...

Page 156: ...nter Languages such as Pascal that need to maintain activation records on the stack can put them below the frame pointer in the program specific area The frame pointer is optional All stack references...

Page 157: ...I Set return value to allocated space Example 8 4 Possible Implementation of alloca OxFFFFFFFF OPERATING SYSTEM CODE AREA EMPTY USER CODE AREA OxF0400000 FIXED SUBROUTINE ENTRIES OxFOOOOOOO OPERATING...

Page 158: ...space for shared memory areas with other tasks. UNIX System V allows such shared memory areas. The empty areas on the diagram in Figure 8-3 would normally be marked as not present in the page table ent...

Page 159: ...4 even in case a trap occurs on the first instruction of a section. The memory-mapped I/O devices should also be placed in the upper operating system data space. The paging hardware allows logical addr...

Page 161: ...Programming Examples 9...

Page 163: ...ly not loaded from memory. Example 9-1 shows how. // SIGN-EXTEND 8-BIT INTEGER TO 32 BITS // Assume the operand is already in r16 shl 24, r16, r16 // left justify shra 24, r16, r16 // right justify all but...
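The shift-left, shift-right-arithmetic idiom in this example can be checked with a short Python model. The functions `shl` and `shra` are hypothetical emulations of 32-bit logical left shift and arithmetic right shift on unsigned register images; they are not taken from the manual.

```python
MASK32 = 0xFFFFFFFF


def shl(count, value):
    """32-bit logical shift left; bits shifted out of bit 31 are lost."""
    return (value << count) & MASK32


def shra(count, value):
    """32-bit arithmetic shift right; the sign bit (bit 31) is replicated."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative Python int
    return (value >> count) & MASK32


def sign_extend_byte(value):
    """Left-justify the low byte, then shift back arithmetically."""
    return shra(24, shl(24, value))


if __name__ == "__main__":
    print(hex(sign_extend_byte(0x80)), hex(sign_extend_byte(0x7F)))
```

The left shift moves bit 7 of the byte into the sign position, so the arithmetic shift back fills the upper 24 bits with copies of it.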

Page 164: ...s algorithm is optimized for high performance and does not produce results that are rounded according to the IEEE standard Worst case error is about two least significant bits If the result is referen...

Page 165: ...rform the divide. // DOUBLE-PRECISION DIVIDE // The dividend X is in f2 // The divisor Y is in f4 // The result Z is left in f8 frcp.dd f4 f6 fmul.dd f4 f6 fld.d flttwo f1 // The fld.d is free. It fsub...
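The general shape of a reciprocal-based divide can be sketched in Python with the standard Newton-Raphson step r = r * (2 - y * r). This is a sketch under assumptions, not a transcription of the manual's listing: `crude_reciprocal` is a hypothetical stand-in for a low-precision hardware reciprocal (it keeps only about 8 significant bits), and the choice of three refinement steps is an assumption chosen because each step roughly doubles the number of correct bits.

```python
import math


def crude_reciprocal(y):
    """Hypothetical low-precision seed: 1/y rounded to about 8 significant bits."""
    mantissa, exponent = math.frexp(1.0 / y)
    mantissa = round(mantissa * 256) / 256
    return math.ldexp(mantissa, exponent)


def newton_divide(x, y, iterations=3):
    """Approximate x / y by refining a reciprocal seed with Newton-Raphson."""
    r = crude_reciprocal(y)
    for _ in range(iterations):
        r = r * (2.0 - y * r)     # error is roughly squared on each step
    return x * r


if __name__ == "__main__":
    print(newton_divide(1.0, 3.0), 1.0 / 3.0)
```

As the text notes for the real routine, a result computed this way is not guaranteed to be correctly rounded; it can differ from a true IEEE divide in the last bits.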

Page 166: ...ocks can be overlapped with other operations. // INTEGER MULTIPLY // The multiplier is in r4 // The multiplicand is in r5 // The product is left in r6 // The registers f2, f4, and f6 are used as temporar...

Page 167: ...ision format properly normalized by the i860 microprocessor. The value of Be BN is 2^52 + 2^31 (0x4330_0000_8000_0000). The conversion requires 7 clocks if the result is referenced in the next instruction. Thr...
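One common way a constant of this form is used for integer-to-double conversion can be demonstrated in Python. This is a sketch of the general bit trick, assuming the constant is 2^52 + 2^31 with bit pattern 0x4330_0000_8000_0000; it is not a transcription of the manual's instruction sequence, and the function names are hypothetical.

```python
import struct

MAGIC_BITS = 0x4330000080000000   # bit pattern of 2**52 + 2**31 as a double


def bits_to_double(bits):
    """Reinterpret a 64-bit integer bit pattern as an IEEE double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def int32_to_double(n):
    """Convert a signed 32-bit integer to a double without an int-to-float op.

    The integer, with its sign bit flipped, is placed in the low 32 bits of a
    double whose high word is 0x43300000; that double equals
    2**52 + 2**31 + n, so subtracting the magic constant leaves n.
    """
    low = (n & 0xFFFFFFFF) ^ 0x80000000
    biased = bits_to_double((0x43300000 << 32) | low)
    return biased - bits_to_double(MAGIC_BITS)


if __name__ == "__main__":
    for n in (0, 1, -1, 2147483647, -2147483648):
        print(n, int32_to_double(n))
```

The subtraction is exact because both operands share the same exponent, so the result is the integer value with no rounding.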

Page 168: ...dd f4 fb fsub dd f10 f8 fmul dd fb f8 fmul dd f4 fb fsub dd f10 f8 fmul dd fb f8 fmul dd f4 fb fsub dd f10 f8 fmul dd fb f2 fmul dd f8 fb II Convert Quotient to fld d onepluseps fmul dd f8 f10 ixfr r...

Page 169: ...nknown // End of string indicated by NUL // r17: address of source string // r16: address of destination string copy_string: ld.b 0(r17), r26 // Load one character bte 0, r26, done // Test for NUL character...

Page 170: ...iscards them by specifying register f0 as the destination of the first three instructions. After performing the intended calculations, it flushes the pipeline by executing three dummy addition instructi...

Page 171: ...s 1.0*8.0 + 2.0*7.0 + ... + 8.0*1.0, a series of multiplications followed by additions. The dual-operation instructions are designed precisely to execute this type of calculation efficiently by using the adder a...

Page 172: ...f0 // 6 3 5 4 20 18 0 14 0 8 Discard m12apm.ss f10, f18, f0 // 7 2 6 3 20 20 8 18 0 14 Discard m12apm.ss f11, f19, f0 // 8 1 7 2 18 20 14 20 8 18 Discard // For larger matrices include more instructions h...

Page 173: ...nes assume that the actual matrices to be multiplied have the following values: A: 1.0 2.0 3.0 4.0 5.0 6.0; B: 6.0 5.0 4.0 3.0 2.0 1.0. Assume further that the two matrices are already loaded into register...

Page 174: ...10 0 6 0 0 Discard m12apm.dd f12, f24, f0 // 5 2 4 3 12 0 10 0 6 Discard m12apm.dd f14, f26, f0 // 6 1 5 2 12 6 12 0 10 Discard // For larger vectors include more instructions here // Flushing phase m12a...

Page 175: ...rocedure uses dual-instruction mode to overlap loading, decision making, and branching with the basic pipelined floating-point add instruction pfadd.ss. To make obvious the pairing of core and floating...

Page 176: ...f20 f30 f30 br S d.pfadd.ss f21 f31 nop d.pfadd.ss f22 f30 bla r21 r17 d.pfadd.ss f23 f31 fld.d 8 r16 // If we reach this point // r17 is either 4 or 3 // Exit loop after adding f31 // f21 to the pipe...

Page 177: ...ough straightforward programming techniques. Each example uses dual-instruction mode to perform the loading and loop control operations in parallel with the basic floating-point calculations. The exam...

Page 178: ...8 88 f19 // matrix A row values // matrix B column vals // temporary results T1 f20 T2 f21 T3 f22 shl 2 adds 8 adds 8 adds 4 d.fiadd.dd f0 adds 1 d.fnop M r0 M C f0 L SIZ DEC RC C f0 Ar bla d.fnop subs...

Page 179: ...f f T1 // adds 8 M RC // Reinitialize row/column counter d.m12apm.ss f f T2 // nop // d.pfadd.ss f f T3 // bla DEC RC inner_loop // Won't branch; initializes LCC d.pfadd.ss f f T1 // fld.q 16 A A5 // Lo...

Page 180: ...s from matrix B and the loop control with the eight m12apm instructions in the inner loop. The strategy of Example 9-14 is suitable for larger matrices than the strategy in Example 9-13 because even in...

Page 181: ...adds 8 r0 DEC // Set decrementor for bla adds 8 M RC // Initialize row/column counter d.fiadd.dd f0 f0 f0 // Initiate dual-instruction mode adds 4 C C // Start C index one entry low d.fnop // First d...

Page 182: ...ch initializes LCC d.pfadd.ss f0 f0 T2 // nop // d.fadd.ss T1 T3 T3 // nop // d.fadd.ss T2 T3 T3 // adds 1 Bc Bc // Decrement column counter d.pfadd.ss f0 f0 f0 // fst.l T3 4 C // Store row/column pro...

Page 183: ...color intensities are determined by higher-level graphics software. The points represent the intersection of the scan line with two edges of the projected image of a polygon. For a given scan line, the r...

Page 184: ...iZ1 iZ1h iZ3 iZ3h oldz newz newzh newi iR iRh aR aRh iG iGh aG aGh iB iBh aB aBh lZmask lZmaskh rZmask rZmaskh f2 // Accumulated Z values f3 // f4 // Z interpolant coefficient 1 0 f5 // f6 // Z inter...

Page 185: ...r all scan lines that intersect the polygon; therefore, mZ needs to be calculated only once for each polygon. Example 9-21 assumes that dX and mZ have already been calculated and all that remains is to...

Page 186: ...e way of constructing the operands before starting the distance interpolations. The initial value given to fsrc1 depends on the alignment of the first pixel. Table 9-1 helps to visualize the process. Aft...

Page 187: ...tine; the numbers shown here are the values of the coefficient N, where the actual operands have the values Z1 + N*mZ. For each execution of faddz, fsrc1 is the same as fdest of the prior faddz. After every...

Page 188: ...X1 + N: C1 + mC, C1 + 2*mC, C1 + N*mC; C(X1 + dX) = C1 + dX*mC = C(X2). Figure 9-3 illustrates Gouraud shading of a triangle. The faddp instruction performs the above calculations 64 bits at a time. Because a pixel is 16 bits...
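The interpolation formulas on this page step a color value by a constant slope for each pixel along the scan line. The Python sketch below shows the same arithmetic in scalar form. It assumes the slope is mC = (C2 - C1) / dX, which follows from requiring the last value to equal C(X2); the function name `interpolate_span` is hypothetical, and the sketch does not model the 64-bit packed operation of faddp.

```python
def interpolate_span(c1, c2, dx):
    """Return the color values C1 + N*mC for N = 0 .. dX along one scan line."""
    m_c = (c2 - c1) / dx                 # slope, computed once per span
    return [c1 + n * m_c for n in range(dx + 1)]
```

For instance, `interpolate_span(0.0, 8.0, 4)` steps from 0.0 to 8.0 in increments of 2.0.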

Page 189: ...The i860 microprocessor operates on 64-bit quantities that are aligned on 8-byte boundaries. The code in this example takes full advantage of this design, handling four 16-bit pix...

Page 190: ...shift by 16 to put the significant shl 16, mB, Rc // bits into the high-order half shr 16, Ra, mR // Return significant 16 bits shr 16, Rb, mG // to low-order half. Any sign bits shr 16, Rc, mB // in high ord...

Page 191: ...The left and right ends of the line segment go through different logic paths, so that the Z-buffer masks can be applied by the form instruction. All the interior points are handled by the tight inner l...

Page 192: ...Rtab // 4 5 6 7 shl 5 Lalign Lalign // Multiply by row width 1 2 3 2 3 4 3 4 5 4 5 6 adds Lalign Rtab Rtab // Index row corresponding to alignment fld.d aZi(Rtab) aZ // Z ixfr Z1 Fx // Z fld.d aRi(Rtab...

Page 193: ...rm f0 newi // Move 4 new pixels to 64-bit reg adds 5 dX r0 // Are there any whole sets (dX 5) L1 d.fzchks oldz newz newz // Mark closer points in PM[7..4] bc short_segment // Get out now if no whole set...

Page 194: ...aB // Interpolate 4 blue intensities 8 FBP // Store pixels indicated by PM 3 iG aG // Interpolate 4 green intensities iR // aR // Interpolate // red intensities // No special boundary conditions f ne...

Page 195: ...Appendix A: Instruction Set Summary...

Page 197: ...nd subu and for all addressing calculations. Same as src1 except that no immediate constant or address offset value is permitted. Same as src1 except that the immediate constant is a 5-bit value that is...

Page 198: ...ts .s (16 bits) or .l (32 bits); .l (32 bits), .d (64 bits), or .q (128 bits); .l (32 bits) or .d (64 bits). mem.x(address): The contents of the memory location indicated by address with a size of x. PM: The pixel mask, which is co...

Page 199: ...CC. IF CC = 1 THEN continue execution at brx(lbroff) FI. bc.t lbroff: Branch on CC, Taken. IF CC = 1 THEN execute one more sequential instruction; continue execution at brx(lbroff) ELSE skip next sequential instr...

Page 200: ...e for next instructions pair. Continue execution at address in isrc1ni. The original contents of isrc1ni is used even if the next instruction modifies isrc1ni. Does not trap if isrc1ni is misaligned. bte...

Page 201: ...Subtract. fdest ← fsrc1 - fsrc2. fix.p fsrc1, fdest: Floating-Point to Integer Conversion. fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1 rounded. Floating-Point Load: fld.y isrc1 isr...

Page 202: ...Operation. Assembler pseudo-operation: fnop = shrd r0, r0, r0. form fsrc1, fdest: OR with MERGE Register. fdest ← fsrc1 OR MERGE; MERGE ← 0. frcp.p fsrc2, fdest: Floating-Point Reciprocal. fdest ← 1 / fsrc2 with maximum mantis...

Page 203: ...3, where zero denotes the least significant field. PM ← PM shifted right by 4 bits. FOR i = 0 to 3 DO PM[i + 4] ← fsrc2(i) ≤ fsrc1(i) (unsigned); fdest(i) ← smaller of fsrc2(i) and fsrc1(i) OD. MERGE ← 0. intovr: Software Trap...
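The loop above describes a Z-buffer check on four 16-bit fields packed into 64-bit operands. A Python sketch of that pseudocode follows. It assumes PM is an 8-bit mask and that field i occupies bits 16*i+15 down to 16*i; the function name `fzchks` mirrors the instruction mnemonic, but the code is only an illustrative model, and the clearing of MERGE is not modeled.

```python
def fzchks(fsrc1, fsrc2, pm):
    """Model of the 16-bit Z-buffer check: returns (fdest, new_pm)."""
    pm = (pm & 0xFF) >> 4                      # PM shifted right by 4 bits
    fdest = 0
    for i in range(4):                         # field 0 is least significant
        z1 = (fsrc1 >> (16 * i)) & 0xFFFF
        z2 = (fsrc2 >> (16 * i)) & 0xFFFF
        if z2 <= z1:                           # unsigned comparison
            pm |= 1 << (i + 4)                 # mark the closer point in PM[i + 4]
        fdest |= min(z1, z2) << (16 * i)       # keep the smaller Z value
    return fdest, pm
```

After two such checks, all eight PM bits describe eight consecutive pixels, which fits the mask width assumed above.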

Page 204: ...fsrc1 + fsrc2. Shift MERGE right 16 and load fields 31..16 and 63..48 from fsrc1 + fsrc2. pfam.p fsrc1, fsrc2, fdest: Pipelined Floating-Point Add and Multiply. fdest ← last-stage adder result. Advance A and M pipel...

Page 205: ...Identical to pfgt.p except that assembler sets R bit of instruction. fdest ← last-stage adder result. CC clear if fsrc1 ≤ fsrc2, else set. Advance A pipeline one stage. A pipeline first stage is undefined, bu...

Page 206: ...ipeline. A pipeline first stage: A op1 A op2. M pipeline first stage: M op1 x M op2. pfsub.p fsrc1, fsrc2, fdest: Pipelined Floating-Point Subtract. fdest ← last-stage adder result. Advance A pipeline one stage. A...

Page 207: ...nst fdest. Shift PM right by 8/pixel size (in bytes) bits. IF autoincrement THEN isrc2 ← const + isrc2 FI. shl isrc1, isrc2, idest: Shift Left. idest ← isrc2 shifted left by isrc1 bits. shr isrc1, isrc2, idest: Shift Rig...

Page 208: ...st: Software Trap. Generate trap with IT set in psr. unlock: End Interlocked Sequence. Clear BL in dirbase. The next load or store unlocks the bus. Interrupts are enabled. xor isrc1, isrc2, idest: Logical Exclus...

Page 209: ...Appendix B: Instruction Format and Encoding...

Page 211: ...own in Table B-1 are used. Among the core instructions, there are two general formats: REG format and CTRL format. Within the REG format are several variations. Table B-1. Register Encoding: Register Encodin...

Page 212: ...pst, ixfr. For instructions where src1 is optionally an immediate constant or address offset, bit 26 of the opcode (I-bit) indicates whether src1 is immediate. If bit 26 is clear, an integer register is use...

Page 213: ...it 0 selects autoincrement addressing if set. Bits one and two select the operand size as follows (Bit 1, Bit 2: Operand Size): 0 0: 64 bits; 0 1: 128 bits; 1 0: 32 bits; 1 1: 32 bits. When src1 is immediate, bits z...
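The bit assignments listed above can be decoded with a small lookup table. The Python sketch below assumes the three bits are the low-order bits of a single instruction field; which field that is cannot be determined from this truncated page, so the function name `decode_low_bits` and its input convention are hypothetical.

```python
# (bit 1, bit 2) -> operand size in bits, as listed in the text
OPERAND_SIZE = {
    (0, 0): 64,
    (0, 1): 128,
    (1, 0): 32,
    (1, 1): 32,
}

def decode_low_bits(field):
    """Return (autoincrement, operand_size_in_bits) from bits 0..2 of a field."""
    autoincrement = bool(field & 1)      # bit 0 selects autoincrement if set
    bit1 = (field >> 1) & 1
    bit2 = (field >> 2) & 1
    return autoincrement, OPERAND_SIZE[(bit1, bit2)]
```

Note that the table is indexed in the order (bit 1, bit 2), matching the column order given in the text, so the value 0b100 selects 128 bits.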

Page 214: ...nch LCC Set and Add; Arithmetic Shift; AND; ANDNOT; OR; XOR; reserved. 1: 16 or 32 bits (selected by bit 0). LS (Load/Store): 0 = Load, 1 = Store. SO (Signed/Ordinal): 0 = Ordinal, 1 = Signed. H (High): 0 = and, or, andnot, xor; 1 = andh, or...

Page 215: ...cape Opcodes (bits 4 3 2 1 0): reserved 0 0 0 0 0; lock (Begin Interlocked Sequence) 0 0 0 0 1; calli (Indirect Subroutine Call) 0 0 0 1 0; reserved 0 0 0 1 1; intovr (Trap on Integer Overflow) 0 0 1 0 0; reserved 0 0 1 0...

Page 216: ...ODING. CTRL-Format Instructions (field: BROFFSET). CTRL-Format Opcodes (bits 28 27 26): br (Branch Direct) 0 1 0; call (Call) 0 1 1; bc[.t] (Branch on CC Set) 1 0 T; bnc[.t] (Branch on CC Clear) 1 1 T. T (Taken): 0 = bc o...

Page 217: ...nstructions other than fxfr: one of 32 floating-point registers; fxfr: one of 32 integer registers. Pipelining: 1 = Pipelined instruction mode, 0 = Scalar instruction mode. Dual-Instruction Mode: 1 = Dual instructi...

Page 218: ...Equal 0 1 p ftrunc Truncate 0 1 fxfr Transfer to Integer Register 1 0 p fiadd Long Integer Add 1 0 p fisub Long Integer Subtract 1 0 p fzchkl Z Check Long 1 0 p fzchks Z Check Short 1 0 p faddp Add wi...

Page 219: ...Appendix C: Instruction Timings...

Page 221: ...eturned. ld, st, pfld, fld, fst, or ixfr and data cache load miss processing in progress: One plus number of clocks until last READY returned. Reference to dest of ld, call, calli, fxfr, or ld.c in: One clock. the...

Page 222: ...e full and ld, fld, pfld, st, fst: ...dress can be issued (i.e., an address which is not the 2nd-4th cycle of a cache fill or the 2nd-8th cycle of a CS8 mode instruction fetch or the 2nd cycle of a 128-bit write...

Page 223: ...Appendix D: Instruction Characteristics...

Page 225: ...fault is reported on the subsequent floating-point instruction plus pst, fst, and sometimes fld, pfld, and ixfr. See Section 7.4.2 for more information on result exception reporting. The instruction access...

Page 226: ...trol-transfer instruction, nor a trap instruction, nor the target of a control-transfer instruction. b. When using a bri to return from a trap handler, programmers should take care to prevent traps from oc...

Page 227: ...DAT 5 f fsub.p A SE RE ftrunc.p A SE RE fxfr G 6 8 fzchkl G 8 fzchks G 8 intovr E IT ixfr E 2 ld.c E ld.x E DAT 6 lock E or E CC orh E CC pfadd.p A P SE RE pfaddp G P 8 e pfaddz G P 8 e pfamov.r A P S...

Page 228: ...ipelined Sets Faults Performance Programming Unit Delayed CC Notes Restrictions pftrunc.p A P SE RE pfzchkl G P 8 pfzchks G P 8 pst.d E DAT f shl E shr E shra E shrd E st.c E st.x E DAT subs E CC 1 su...
