Revision 1.0

DMA

99

Figure 4-3

DMA Wait Example

 ############################################
 # Procedure to do DMA waits.
 #
 # Registers:
 #
 #       $11     used as tmp
 #
.name   tmp,    $11

DMAwait:
        # request DMA access: (get semaphore)
                mfc0    tmp, SP_RESERVED
                bne     tmp, zero, DMAwait
                # note delay slot: the mfc0 at WaitSpin executes in it
        WaitSpin:
                mfc0    tmp, DMA_BUSY
                bne     tmp, zero, WaitSpin
                nop
                jr      return
                # clear semaphore, delay slot
                mtc0    zero, SP_RESERVED
.unname tmp
 #
 #
 #############################################
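The control flow of the DMA wait procedure can be modeled on a host machine. The following Python sketch is not RSP microcode. It is a small simulation in which a hypothetical `FakeCop0` class stands in for the coprocessor 0 registers. It assumes that reading SP_RESERVED returns the old value and leaves the semaphore set, that writing 0 releases it, and that DMA_BUSY reads nonzero while a transfer is in flight. The extra DMA_BUSY read that the real code performs in the first branch delay slot is harmless and is not modeled.

```python
class FakeCop0:
    """Hypothetical stand-in for the coprocessor 0 registers used by DMAwait."""

    def __init__(self, cpu_holds_for=0, dma_busy_for=0):
        self.cpu_holds_for = cpu_holds_for  # semaphore reads during which the CPU owns it
        self.dma_busy_for = dma_busy_for    # DMA_BUSY reads that still report busy
        self.semaphore = 1 if cpu_holds_for else 0

    def read_sp_reserved(self):
        # Test-and-set: return the old value and leave the semaphore set.
        if self.cpu_holds_for > 0:
            self.cpu_holds_for -= 1
            if self.cpu_holds_for == 0:
                self.semaphore = 0  # the CPU releases after this poll
            return 1
        old, self.semaphore = self.semaphore, 1
        return old

    def read_dma_busy(self):
        if self.dma_busy_for > 0:
            self.dma_busy_for -= 1
            return 1
        return 0

    def write_sp_reserved(self, value):
        self.semaphore = value  # writing 0 releases the semaphore


def dma_wait(cop0):
    """Mirror the DMAwait / WaitSpin loops; return how many polls each loop took."""
    sem_polls = 1
    while cop0.read_sp_reserved() != 0:   # DMAwait: spin until the semaphore is ours
        sem_polls += 1
    busy_polls = 1
    while cop0.read_dma_busy() != 0:      # WaitSpin: spin until the DMA engine is idle
        busy_polls += 1
    cop0.write_sp_reserved(0)             # mtc0 zero, SP_RESERVED in the jr delay slot
    return sem_polls, busy_polls
```

For example, `dma_wait(FakeCop0(cpu_holds_for=2, dma_busy_for=3))` polls the semaphore three times and the busy flag four times, then leaves the semaphore released.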

Summary of Contents for Ultra64

Page 1: ... Version 1.1 Nintendo Ultra64 RSP Programmer's Guide. Silicon Graphics Computer Systems, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94043-1389. 1996 Silicon Graphics Computer Systems, Inc. All Rights Reserved ...

Page 3: ...pment Tools 19 rspasm 19 cpp 20 m4 21 buildtask 21 rsp2elf 21 rsp rspg 21 Gameshop Debugger gvd 22 2 RSP Architecture 23 Overview 24 Slave to the CPU 24 Part of the RCP 24 R4000 Core 25 Clock Speed 26 Vector Processor 26 Major R4000 Differences 27 Pipeline Depth 27 No Interrupts Exceptions or Traps 27 Coprocessors 27 Missing Instructions 27 ...

Page 4: ...U Control Registers 33 Vector Unit Registers 34 VU Register Format 34 VU Register Addressing 34 Computational Instructions 34 Loads Stores and Moves 35 Accumulator 36 VU Control Registers 36 Vector Compare Code Register VCC 36 Vector Carry Out Register VCO 37 Vector Compare Extension Register VCE 38 SU and VU Interaction 39 Dual Issue of Instructions 39 RSP Instruction Set 40 Instruction Formats 4...

Page 5: ... 44 Coprocessor 0 45 Interrupts Exceptions and Processor Status 46 Interrupts 46 Exceptions 46 Processor Status 46 3 Vector Unit Instructions 47 VU Loads and Stores 48 Normal 50 Packed 52 Transpose 54 VU Register Moves 56 VU Computational Instructions 57 Using Scalar Elements of a Vector Register 58 VU Multiply Instructions 61 Vector Multiply Examples 64 VU Add Instructions 67 Vector Add Examples ...

Page 6: ...ble Lookup 77 Higher Precision Results 78 Vector Divide Examples 78 4 RSP Coprocessor 0 81 Register Descriptions 82 RSP Point of View 82 c0 83 c1 83 c2 c3 83 c4 85 c5 88 c6 88 c7 88 c8 88 c9 89 c10 89 c11 90 c12 92 c13 92 c14 93 c15 93 CPU Point of View 93 Other RSP Addresses 95 DMA 96 Alignment Restrictions 96 Timing 96 ...

Page 7: ...rom Other MIPS Assembly Languages 106 Why 106 Major Differences from the R4000 Instruction Set 106 Syntax 107 Tokens 107 Identifiers 107 Constants 107 Operators 108 Comments 108 Program Sections 109 Labels 109 Keywords 109 Expressions 110 Expression Operators 110 Precedence 111 Expression Restrictions 111 Registers 112 Vector Register Element Syntax 112 Program Statements 113 Assembly Directives 1...

Page 8: ...f the RSP Assembly Language 119 6 Advanced Information 125 DMEM Organization and Usage 126 Jump Tables 126 Constants 126 Labels in DMEM 127 Dynamic Data 127 Diagnostic Information 127 Performance Tips 128 Dual Execution 128 Vectorization 128 Software Pipelining 130 Loop Inversion 131 Loop Unrolling 132 Program Flow of Control 132 Profiling RSP Code 133 ...

Page 9: ...ode 141 Controlling the RSP from the CPU 142 Starting RSP Tasks 142 RSP Boot Microcode 142 Hidden OS Functions 143 __osSpDeviceBusy 143 __osSpRawStartDma 143 __osSpRawReadIo 143 __osSpRawWriteIo 144 __osSpGetStatus 144 __osSpSetStatus 144 __osSpSetPc 144 Microcode Debugging Tips 145 RSP Yielding 147 Requesting a Yield 148 Checking for Yield 148 Yielding 148 Saving a Yielded Process 149 Restarting ...

Page 11: ...Loads and Stores 55 Figure 3 6 VU Coprocessor Moves 56 Figure 3 7 VU Computational Instruction Format 57 Figure 3 8 Scalar Half and Scalar Quarter Vector Register Elements 59 Figure 3 9 VU Multiply Opcode Encoding 61 Figure 3 10 Double precision VU Multiply 64 Figure 3 11 VU Add Opcode Encoding 67 Figure 3 12 VU Select Opcode Encoding 70 Figure 3 13 VU Logical Opcode Encoding 74 Figure 3 14 VU Div...

Page 12: ...12 Figure 6 2 buildtask Operation 137 ...

Page 13: ...Logical Type Encoding 74 Table 3 8 VU Divide Type Encoding 75 Table 3 9 VU Divide Instruction Summary 76 Table 4 1 RSP Coprocessor 0 Registers 82 Table 4 2 RSP Status Register 85 Table 4 3 RSP Status Write Bits 86 Table 4 4 RDP Status Register 90 Table 4 5 RSP Status Write Bits CPU VIEW 91 Table 4 6 RSP Coprocessor 0 Registers CPU VIEW 94 Table 4 7 Other RSP Addresses CPU VIEW 95 Table 5 1 Express...

Page 15: ...ning on the RSP microcode implements the graphics geometry pipeline transformations clipping lighting etc and audio processing wavetable synthesis sampled sound etc The RSP acts as a slave processor to the host CPU and as such programming the RSP requires a conspiracy of RSP microcode R4300 interfaces and mastery of the features of the RCP This document addresses the first two of these necessary s...

Page 16: ... to speak we have adopted several specific non goals of this document Basic assembly language programming concepts are not discussed The reader is assumed to have a thorough technical background Basic concepts of vector processing architectures are not discussed however some specific issues relating to the RSP are discussed briefly A good reference for computer architecture which discusses RISC pr...

Page 17: ...other documents a thorough background knowledge of the Ultra64 is assumed in this document Information Presentation Mastery of the information presented in this document will occur slowly as the information is both voluminous and of tremendous breadth Some concepts such as the hardware architecture of the RSP and the microcode assembly language are of course thoroughly intertwined discussion of on...

Page 18: ...ivity RDP synchronization and host CPU interaction Chapter 5 RSP Assembly Language details the assembly language of the RSP including assembler directives and some programming conventions Chapter 6 Advanced Information builds on information in the previous chapters in order to address sophisticated issues including RSP performance microcode overlays host CPU interactions and additional programming...

Page 19: ... unique to the RSP The language explained in more detail in Chapter 5 RSP Assembly Language has the following major features Mnemonic opcode syntax for all SU and VU instructions Support for labels in the text section for branching and the data section for referencing DMEM Simple expression parsing The language also includes a rich set of assembler directives used to instruct the assembler during ...

Page 20: ...l file used by the rsp2elf utility in order to build an ELF object that can be used with makerom and the gvd debugger. The RSP assembler has no provisions for linking separately compiled objects. Since IMEM only holds 1024 instructions and assembling is so fast, the lack of a sophisticated linker is not a problem. Source code can be broken up into separate files and include'd to enforce modularity. Fac...

Page 21: ...er provided on the command line and updates a table in DMEM with offsets and code sizes This allows the microcode to find a piece of code and overlay it into IMEM during execution Additional details and examples of code overlays are described in Chapter 6 Advanced Information rsp2elf Since ELF files are required by makerom and gvd this tool is necessary to construct final microcode objects out of ...

Page 22: ...cy window interface rspg The window interface supports source level debugging which is extremely useful Gameshop Debugger gvd The Gameshop debugger gvd can be used to debug RSP microcode running on the real hardware Detailed instructions are beyond the scope of this document but if you open the Coprocessor View on gvd and set the program counter appropriately you will be looking at IMEM From here ...

Page 23: ...ond [1]. As part of the RCP, the RSP is an integral part of the graphics, audio, and video processing pipelines. Recommended background for this chapter includes a solid foundation in computer architecture, including RISC processors and SIMD (Single Instruction, Multiple Data) machines. [1] This is not a misprint. At 62.5 MHz, with an 8-element vector pipeline, the RSP could perform 500,000,000 multiply-accumulate oper...

Page 24: ...l booting IMEM DMEM etc Part of the RCP Figure 2 1 reproduced from the Nintendo 64 Programming Manual illustrates the major functional blocks of the RCP The RSP along with the RDP and the IO subsystem comprise the RCP chip The RSP and RDP operate independently and are connected with the XBUS The IO block of the RCP also includes memory interfaces and separate DMA engines for the RSP and RDP ...

Page 25: ... an R4000 core instruction set with additional extensions. The core instruction unit, without the extensions, is referred to as the Scalar Unit (SU). [Figure: block diagram of the RCP. Labels: RSP, SU, VU, IMEM, DMEM, IO, RDP, CPU, VI, AI, PI, SI, R4300, Audio, Game Controllers, Video, Cartridge, RDRAM (Rambus Memory), RCP, STATE, RS, TX, CC, BL, MEM, TMEM, TF, CP0.] ...

Page 26: ...tor registers which can also be accessed as 8 vector slices a vector accumulator which also has 8 vector slices and several special purpose vector control registers The VU instruction set includes all useful computational instructions add multiply logical reciprocal etc plus additional multimedia instructions which are well suited for graphics and audio processing These instructions are thoroughly...

Page 27: ...sors The RSP implements the following MIPS Coprocessors Coprocessor 0 system control The RSP coprocessor 0 is not compatible with the R4000 coprocessor 0 The RSP coprocessor 0 is explained in Chapter 4 RSP Coprocessor 0 Coprocessor 2 VU implements the vector unit Other MIPS coprocessors including coprocessor 1 floating point processor are not implemented Missing Instructions The following R4000 in...

Page 28: ...iprocessor systems BCzF BCzT all branch on coprocessor instructions TGE TGEU TLT TLTU TEQ TNE TGEI TGEIU TLTI TLTIU TEQI TNEI all TRAP instructions Modified Instructions Some RSP instructions do not behave precisely like their R4000 counterparts Some major differences ADD ADDU ADDI ADDIU SLTI SLTIU SUB SUBU Each pair of these is synonymous with each other since the RSP does not signal overflow exc...

Page 29: ...ddressing. The RSP PC is only 12 bits; only the lowest 12 bits of any address or branch target are used. Other address bits are ignored. Explicitly Managed: IMEM must be explicitly managed by the RSP program. IMEM contents can only be loaded with a DMA operation or programmed I/O write from the CPU ...
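The 12-bit program counter behavior can be illustrated with a short host-side Python sketch. The function names are hypothetical, and the sample address `0x04001080` is used purely as an illustration of an address with upper bits set. The instruction index assumes 4-byte instructions, which is consistent with IMEM holding 1024 instructions.

```python
IMEM_MASK = 0xFFF  # only the lowest 12 bits of an address or branch target are used


def effective_pc(target):
    """Return the address the RSP PC actually uses for a given branch target."""
    return target & IMEM_MASK


def instruction_index(target):
    """Return the IMEM instruction slot (0..1023), assuming 4-byte instructions."""
    return effective_pc(target) >> 2
```

With these definitions, `effective_pc(0x04001080)` and `effective_pc(0x080)` name the same location, which is why the upper address bits can be ignored.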

Page 30: ...owest 12 bits of addresses are used to address DMEM. Other address bits are ignored. Explicitly Managed Resource: DMEM must be managed by the RSP program. All RSP loads/stores can only access DMEM; data must first be transferred between DMEM and external DRAM using a DMA operation or programmed I/O write from the CPU ...

Page 31: ...venient to use this address map with the RSP assembler rspasm and RSP simulator rsp Since only the lower 12 bits of addresses and branch targets are used the upper bits are safely ignored Chapter 4 RSP Coprocessor 0 details this address space in particular Table 4 6 RSP Coprocessor 0 Registers CPU VIEW on page 94 and Table 4 7 Other RSP Addresses CPU VIEW on page 95 General purpose SU and VU regis...

Page 32: ...nd cannot be modified Attempting to modify 0 is a null operation Since DMEM addresses are only 12 bits it can be convenient to use 0 as the base register for loads stores the entire DMEM address will fit in the 16 bit offset field Register 31 Register 31 31 is a special register The jal and jalr instructions store their return address in this register If these instructions are avoided this registe...

Page 33: ...sion 1 0 Scalar Unit Registers 33 SU Control Registers RSP control registers are part of Coprocessor 0 and are explained in Chapter 4 RSP Coprocessor 0 particularly Table 4 2 RSP Status Register on page 85 ...

Page 34: ...umbered similarly little endian VU Register Addressing VU registers can be accessed in a variety of formats depending on the instruction being executed Computational Instructions Most computational instructions operate on VU registers as vectors performing the same operation on 8 16 bit vector elements on an element by element basis with the 8 elements corresponding to the vector slices 127 0 byte...

Page 35: ... best understood with an illustrated example see Figure 3 8 Scalar Half and Scalar Quarter Vector Register Elements on page 59 RSP assembly language syntax for vector registers is explained in the section Vector Register Element Syntax in Chapter 5 Loads Stores and Moves VU loads stores and moves always reference data within VU registers by their bytes So if you want to load a short 2 bytes into e...

Page 36: ...e multiply-accumulate instructions. For these instructions, 16 bits of the accumulator are written out after accumulation. Which 16 bits are to be written is usually an accumulator element. Consult "VU Multiply Instructions" in Chapter 3 for more information. One VU instruction, vsar, can reference the accumulator directly. VU Control Registers. Vector Compare Code Register (VCC): This 16-bit register conta...

Page 37: ...ase of vsubc The upper 8 bits are NOT EQUAL set by vaddc or vsubc if the operands are not equal vadd vsub and select compare instructions vlt veq vne vge use VCO as inputs and clear VCO Select compare instructions use VCO which was previously set by a vsubc instruction Figure 2 6 VCO Register Format 0 elem 7 elem 6 elem 5 elem 4 elem 3 elem 2 elem 1 elem 0 elem 7 elem 6 elem 5 elem 4 elem 3 elem 2...

Page 38: ...wise Expressed in a high level language if vs elem 0 vt elem 0 vs elem 0 vt elem 0 if vs elem vt elem 1 VCE elem 1 else VCE elem 0 else VCE elem 0 This is used for double precision clip compares by vcl in addition to VCC and VCO vcl clears VCE Figure 2 7 VCE Register Format 0 elem 7 elem 6 elem 5 elem 4 elem 3 elem 2 elem 1 elem 0 1 2 3 4 5 6 7 compare is 1 ...

Page 39: ...Instructions The instruction fetch cycle can fetch at most two instructions one SU and one VU If there are no register conflicts both instructions can be issued in parallel Instructions are paired in order they are not re ordered to facilitate dual issue They do not need to be aligned as one SU and one VU in a 64 bit word If the pipeline stalls due to register conflicts see Register Hazards on pag...

Page 40: ...struction Format VU instructions are implemented as coprocessor instructions as defined by the MIPS ISA Detailed discussion of VU instructions can be found in Chapter 3 Distinguishing SU and VU Instructions If the opcode mnemonic starts with a v it is a vector unit instruction It is important to re iterate that VU loads stores and moves are SU instructions they are executed in the scalar unit poss...

Page 41: ...sult is calculated for loads stores branches the address is calculated DF Data Fetch For loads the data is fetched store data is stored WB Write Back Results are written back to registers The vector unit also has a five stage pipeline IF Instruction Fetch Nothing happens in the VU during this stage RD Register Access and Instruction Decode Muxing for scalar mode MUL Multiply During this stage comp...

Page 42: ... [Figure: RSP block diagram. Recoverable labels: 16-bit slice replicated 8 times; DMEM, 4 KByte (128b x 256); IMEM, 4 KByte (64b x 512); SU pipeline stages IF, RD, EX, DF, WB; VU pipeline stages IF, RD, MUL, ACC, WB; PC logic (next_pc, reset_pc, branch_pc); VCO; shifter; rounding; clamp; reciprocal ROM; load/store muxes.] ...

Page 43: ...d on page 44 3 Any load followed by any store 2 cycles later causes a one cycle bubble Coprocessor moves mtc0 mfc0 mtc2 mfc2 ctc2 cfc2 count as both loads and stores 4 A branch target not 64 bit aligned always single issues 5 Branches a Can dual issue with preceding instruction b No branch instruction permitted in a delay slot c Delay slot always single issues d Taken branch causes a 1 cycle bubbl...

Page 44: ...itten to its destination register a subsequent instruction can use the correct value which is residing in a temporary register in the arithmetic and logical unit Figure 2 9 Pipeline Bypassing For software this means that results from SU instructions are available in the next clock cycle removing the concern of preventing pipeline stalls 1 1 An obvious question is why isn t the VU bypassed As illus...

Page 45: ...00 architecture is designated as the system control coprocessor Since the RSP is a slave processor the system control functions are greatly reduced and therefore the usage of coprocessor 0 does not conform to the MIPS R4000 architecture specification The RSP does use coprocessor 0 for system control functions these functions and their registers are explained in Chapter 4 ...

Page 46: ...e a single interrupt MI_INTR_SP triggered by the break instruction Exceptions No RSP instruction can cause an exception and there are no exception handling facilities in the RSP Processor Status The RSP has a processor status register in coprocessor 0 this register can be used to communicate with the CPU See page 85 for more information ...

Page 47: ...e are actually scalar unit instructions executed in the SU possibly in parallel with VU computational instructions which load store modify vector unit general purpose or control registers Vector Computational Instructions These instructions are executed in the vector unit in parallel with any scalar instructions All of these instructions are implemented with the MIPS coprocessor extensions to the ...

Page 48: ...containing a DMEM memory address Only the lower 12 bits of this register are used other bits are ignored VT is the VU register to or from which memory data is written The opcode is the memory item type and operation being performed Element is the byte element of the VU register being accessed Offset is a 7 bit constant shifted by the memory item size and added to the memory address in base This me...

Page 49: ...memory and VU registers with memory byte alignment and VU element alignment to the size of the item The packed operations support access to memory byte data and two and four byte per pixel image data such as YUV or RGBA Transpose accesses are discussed in a subsequent section and include a transposed or wrapped store and a transposed and wrapped load Table 3 1 VU Load Store Instruction Summary Opc...

Page 50: ...ary that is address to address 15 15 to from VU register element 0 to address 15 Rest is used to move a byte aligned quad word up to the byte address that is address 15 to address 1 to from VU register element 16 address 15 to 15 A rest with a byte address of zero writes no bytes The quad and rest pair can then move a byte aligned quad word to from an entire vector register in two instructions Thi...

Page 51: ...s [Figure: normal loads and stores. Three panels (long item, quad item crossing memory word, rest item crossing memory word), each relating a VU register element to a memory word by byte address, 128b alignment, and item size.] ...

Page 52: ...ecutive bytes to or from a memory luv suv unsigned pack is similar to lpv spv except the memory byte MSB is aligned to bit 14 of the VU short for unsigned data lhv shv half moves every other memory byte and the selection of odd or even bytes is controlled by the memory byte address lfv sfv fourth moves every fourth memory byte and the selection of which bytes is controlled by the memory byte addre...

Page 53: ... Figure 3-3 Packed Loads and Stores [Panels: Pack, Unsigned Pack, Half, Fourth; each relates a VU register to a memory word by byte address and 128b alignment.] ...

Page 54: ...nd ltv. Transpose loads and stores move a 128-bit VU register to and from an aligned 128-bit memory word as eight 16-bit values, one from each VU slice. The VU register number of each slice is computed as (VT & 0x18) | ((Slice + (Element >> 1)) & 0x7), which is to say vt specifies a base register of an 8-register group. Within that group, the register address is a function of the slice number and the element number treated as...
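The register selection for transpose accesses can be sketched in Python. This is a host-side illustration that assumes the register-number expression reads `(VT & 0x18) | ((Slice + (Element >> 1)) & 0x7)`, with `Element` a byte element; the function name is hypothetical.

```python
def transpose_register(vt, slice_index, element):
    """Return the VU register number touched by one slice of a transpose access.

    vt          -- register named in the instruction; its upper bits pick the 8-register group
    slice_index -- VU slice, 0..7
    element     -- byte element from the instruction (assumed even: 0, 2, ..., 14)
    """
    group_base = vt & 0x18                                 # base of the 8-register group
    within_group = (slice_index + (element >> 1)) & 0x7    # rotates with the element number
    return group_base | within_group
```

With `vt = 8` and `element = 0`, slices 0 through 7 map to registers 8 through 15. Raising the element by 2 rotates that assignment by one register, wrapping inside the same group.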

Page 55: ... to memory transpose, the instructions used are ltv and swv, and for a register-to-register transpose, stv and ltv. Interlock is performed by enabling the source and destination register comparisons on only the upper two register number bits; that is, making any interlock comparison to the 8 registers within a transpose block true. Figure 3-5 Transpose Loads and Stores [figure content not recoverable] ...

Page 56: ...U register is sign extended when moved from the VU register For general VU register moves element is a byte element which must be one of 0 2 4 6 8 10 12 14 For control register moves the vs field specifies the VCO VCC or VCE control registers and element is ignored See VU Control Registers on page 36 for explanation of each control register Moves to VU registers have the same load delay characteri...

Page 57: ... maximum values of the element (-32768 and 32767 for 16-bit signed elements) before being written. A vector accumulator register (see "Accumulator" on page 36) is available to accumulate results over several instructions. The accumulator is modified by all multiply and some add instructions, but its contents are unchanged after other VU instructions. The major types of VU computational instructions are multi...
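The clamping of signed 16-bit results can be illustrated with a short Python model. This is a sketch, not a description of the hardware datapath; the function names are hypothetical and the element-wise add is only a stand-in for a clamped VU add.

```python
S16_MIN = -32768
S16_MAX = 32767


def clamp_s16(value):
    """Clamp an intermediate result to the signed 16-bit element range."""
    return max(S16_MIN, min(S16_MAX, value))


def add_clamped(vs, vt):
    """Element-wise add of two vectors, clamping each result before it is written."""
    return [clamp_s16(a + b) for a, b in zip(vs, vt)]
```

A sum such as 30000 + 10000 therefore saturates at 32767 instead of wrapping around to a negative value.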

Page 58: ...2 xa xb 2 ya yb 2 za zb 2 Assumes single precision all in range etc vsub v3 v1 v2 calc xa xb ya yb za zb vmudh v3 v3 v3 square the differences vadd v3 v3 v3 1q collect the terms vadd v3 v3 v3 2h In this example scalar half and scalar quarter element references are used in the vadd instructions to collect the intermediate terms We can also compute the distance between two groups of point pairs at o...

Page 59: ...3-8 Figure 3-8 Scalar Half and Scalar Quarter Vector Register Elements [Figure shows the contents of v1, v2, and v3 across the sequence vsub v3, v1, v2; vmudh v3, v3, v3; vadd v3, v3, v3[1q]; vadd v3, v3, v3[2h].] ...

Page 60: ...nt usage of the vector registers could have been used to direct the final result to be in a different element Replacing vadd v3 v3 v3 1q with vadd v3 v3 v3 0q would leave the final result in element 1h instead of 0h This might be important in order to align the results for the next computation ...

Page 61: ...6 bits of the accumulator written to vd Double precision 32 bit operands are supported by multiplying and accumulating the low 16 bits from one vector operand and the upper 16 bits from another vector operand in several multiply instructions Formats for various product and result options are shown in Table 3 4 Table 3 4 VU Multiply Instruction Summary Fmt S T signed Prod Shift Round Value Result C...

Page 62: ...g 32 16 if the accumulator is negative and bit 21 is zero adding 32 16 if positive and bit 21 is zero or adding zero if the accumulator bits 47 21 is zero or bit 21 is one The clamp and shift is the same as vmulq vrnd is intended to specifically support MPEG DCT rounding1 The vt operand is conditionally added to the accumulator For vrndn vt is added if the accumulator is negative otherwise zero is...

Page 63: ...ions vmulu supports signed fractions with clamping to an unsigned result such as for pixel color values For double precision vmudl performs the low partial product vmudm and vmudn the middle partial products and vmudh the high partial product Ignoring clamping the multiply instructions are equivalent to for i 0 i 8 i VD i ACC i VS i VT i 1 Round 16 and the multiply accumulate instructions are equi...

Page 64: ...precision that is a 16x32 multiply can be performed with different combinations of multiply instructions In some instances it is necessary to use an additional multiply instruction to extract the rest of the answer from the accumulator This is necessary because one of the partial product multiplies may change the sign of the result requiring you to retrieve a portion of the result from the accumul...

Page 65: ...s contained in a second register _int is a named vector register holding a signed 16 bit number _frac is a named vector register holding an unsigned 16 bit fraction dev_null is a named vector register containing all zeros IFxI mixed precision multiply IF I IF vmudn res_frac s_frac t_int vmadh res_int s_int t_int vmadn res_frac dev_null dev_null 0 IxIF mixed precision multiply I IF IF vmudm res_fra...

Page 66: ...adn res_frac dev_null dev_null 0 IxI single precision integer multiply I I I vmudh res_int s_int t_int IxF single precision multiply I F IF vmudm res_int s_int t_frac vmadn res_frac dev_null dev_null 0 Other combinations are left as an exercise to the reader ...

Page 67: ...or scalar element of vt except vsar where it selects the accumulator portion Type One of the following types of add instructions Table 3 5 VU Add Type Encoding Type Instruction 0 0 0 0 vadd 0 0 0 1 vsub 0 0 1 0 reserved 0 0 1 1 vabs 0 1 0 0 vaddc 0 1 0 1 vsubc 0 1 1 0 reserved 0 1 1 1 reserved 1 0 0 0 reserved 1 0 0 1 reserved 1 0 1 0 reserved 1 0 1 1 reserved 1 1 0 0 reserved 0 3 5 0 1 type 4 ...

Page 68: ...VT set VCO with carry out and not equal vsubc VD VS VT set VCO with borrow out and not equal vsar read the accumulator and write to vd and write the accumulator with the contents of vs vt is ignored The high middle or low 16 bits of the accumulator are selected by the element corresponding to element values of 0 1 and 2 respectively No clamping is performed vsar is useful for diagnostics and exten...

Page 69: ... register holding an unsigned 16-bit fraction. dev_null is a named vector register containing all zeros. This code demonstrates a double precision add:
        vaddc   res_frac, s_frac, t_frac
        vadd    res_int, s_int, t_int
This code demonstrates a double precision subtract:
        vsubc   res_frac, s_frac, t_frac
        vsub    res_int, s_int, t_int
This code demonstrates reading the accumulator using vsar following a multiply: vmadh res_int...
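The vaddc/vadd pairing for a double precision add can be modeled per element in Python. This sketch assumes that vaddc produces the low 16 bits and records the carry out in VCO, and that vadd consumes that carry and clamps its signed result. All function names are hypothetical.

```python
def clamp_s16(value):
    """Clamp to the signed 16-bit range (assumed behavior of vadd)."""
    return max(-32768, min(32767, value))


def double_precision_add(s_int, s_frac, t_int, t_frac):
    """Add two numbers held as (signed 16-bit integer, unsigned 16-bit fraction) pairs."""
    frac_sum = s_frac + t_frac
    carry = frac_sum >> 16                       # vaddc: carry out, recorded in VCO
    res_frac = frac_sum & 0xFFFF                 # vaddc: low 16 bits of the sum
    res_int = clamp_s16(s_int + t_int + carry)   # vadd: adds the carry, then clamps
    return res_int, res_frac


def to_float(int_part, frac_part):
    """Interpret an (integer, fraction) pair as a real number, for checking results."""
    return int_part + frac_part / 65536.0
```

For example, 1.5 is held as `(1, 0x8000)` and 2.75 as `(2, 0xC000)`; their sum comes back as `(4, 0x4000)`, which `to_float` reads as 4.25.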

Page 70: ...ector or scalar element of vt Type One of the following operations Table 3 6 VU Select Type Encoding Select compares perform an element by element comparison of vs and vt using VCO as input clearing VCO setting VCC with the result of comparison and storing the element for which the comparison is true to vd vlt VS VT veq VS VT Type Instruction 0 0 0 vlt 0 0 1 veq 0 1 0 vne 0 1 1 vge 1 0 0 vcl 1 0 1...

Page 71: ... VCC and write the element to vd Merge is useful for selecting several different operands from one comparison or after loading VCC with a bit field Double precision comparisons are supported in combination with the VCO register set by vsubc The compare operations use the contents of VCO as input and clear VCO Usually VCO was previously set by a vsubc instruction with a negative carry or not equal ...

Page 72: ...The vch is used for single precision 16-bit operands. For double precision, vch is performed first on the upper 16 bits, followed by a vcl instruction on the lower 16 bits. vcl reads and writes VCO, VCC, and VCE. Because only one of the two comparisons per element can be true, vch/vcl can be executed in one comparison per vector element. The XOR of the sign of vs and vt is used to select the arithmetic op...

Page 73: ...rallel elements within three vectors, finding the min, mid, and max of 8 triples. After executing this code, min will contain the smallest elements, max will contain the largest, and mid will contain the intermediate elements:
        vge     tmp1, min, mid
        vlt     min, min, mid
        vge     tmp2, min, max
        vlt     min, min, max
        vge     max, tmp1, tmp2
        vlt     mid, tmp1, tmp2
This code demonstrates the generation of 3D clip codes for trivial rejection te...
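The six-instruction min/mid/max sequence can be checked with a small Python model. It is a sketch that treats vge as an element-wise "keep the greater or equal operand" and vlt as "keep the lesser operand", and it assumes VCO is clear on entry so that the carry inputs do not affect the comparisons. The Python function names mirror the mnemonics but are otherwise hypothetical.

```python
def vge(vs, vt):
    """Model of vge with VCO clear: per element, keep vs if vs >= vt, else vt."""
    return [a if a >= b else b for a, b in zip(vs, vt)]


def vlt(vs, vt):
    """Model of vlt with VCO clear: per element, keep vs if vs < vt, else vt."""
    return [a if a < b else b for a, b in zip(vs, vt)]


def sort_triples(vmin, vmid, vmax):
    """Replay the vge/vlt sequence; returns (min, mid, max) vectors."""
    tmp1 = vge(vmin, vmid)   # larger of the first two
    vmin = vlt(vmin, vmid)   # smaller of the first two
    tmp2 = vge(vmin, vmax)   # larger of that minimum and the third
    vmin = vlt(vmin, vmax)   # overall minimum
    vmax = vge(tmp1, tmp2)   # overall maximum
    vmid = vlt(tmp1, tmp2)   # what remains is the middle value
    return vmin, vmid, vmax
```

Each of the 8 element positions is sorted independently, so unsorted input vectors come back with every triple ordered.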

Page 74: ...ations on vs and vt writing the result to vd Figure 3 13 VU Logical Opcode Encoding Instruction fields are Element Vector or scalar element of vt Type One of the following operations Table 3 7 VU Logical Type Encoding Type Instruction 0 0 0 vand 0 0 1 vnand 0 1 0 vor 0 1 1 vnor 1 0 0 vxor 1 0 1 vnxor 0 2 3 5 1 0 1 type ...

Page 75: ...t specification must be provided for each operand selecting the source and destination elements for example vmov v1 5 v2 0 Instruction fields are Element Must be a single scalar element of the whole vector vt vs The scalar element of vd is encoded as vs Type One of the following operations Table 3 8 VU Divide Type Encoding Type Instruction 0 0 0 vrcp 0 0 1 vrcpl 0 1 0 vrcph 0 1 1 vmov 1 0 0 vrsq 1...

Page 76: ... 16 bits of the result is written to vd vs The upper 16 bits of the result is written by a subsequent vrcph vrsqh For double precision sources vrcph vrsqh supplies the upper 16 bits of the source and writes the upper 16 bits of a previous vrcp vrsq or vrcpl vrsql A subsequent vrcpl vrsql supplies the low 16 bits of the source and writes the low 16 bits of the result The vmov type simply copies vt ...

Page 77: ...t the result which is obtained by shifting down an appropriate number of bits and possibly complementing for negative input For rcp the radix point of the output is shifted right compared to the input For example for double precision rcp with input format S15 16 the output result will be S16 15 requiring the result to be multiplied by 2 in order to maintain the same format For rsq the radix point ...

Page 78: ...nd low double precision reciprocal for parallel Newton s iteration Square root can be performed by multiplying the result of vrsq by the source operand sqrt X X 1 sqrt X Vector Divide Examples The following code illustrates several vector divide operations In this section the following notation is used I is a signed 16 bit integer F is an unsigned 16 bit fraction IF is a 32 bit number with the sig...

Page 79: ... 0 vrcpl tres_frac 0 t_frac 0 vrcph tres_int 0 dev_null 0 In the above cases the input format was S15 16 so after the reciprocal the radix point moves to the right so we must shift by 1 multiply by 2 0 in order to correct the result vmudn sres_frac sres_frac vconst 2 constant of 2 vmadm sres_int sres_int vconst 2 vmadn sres_frac dev_null dev_null 0 Square root reciprocals are similar Note the adju...

Page 80: ...80 Vector Unit Instructions vmadm dres_int dres_int vconst 3 vmadn dres_frac vconst vconst 0 ...

Page 81: ... Coprocessor 0 or system control coprocessor The RSP Coprocessor 0 does not perform the same functions or have the same registers as the R4000 series Coprocessor 0 In the RSP Coprocessor 0 is used to control the DMA Direct Memory Access engine RSP status RDP status and RDP I O ...

Page 82: ...ddress for DMA
        c1      DMA_DRAM                RW      DRAM address for DMA
        c2      DMA_READ_LENGTH         RW      DMA READ length (DRAM -> I/DMEM)
        c3      DMA_WRITE_LENGTH        RW      DMA WRITE length (DRAM <- I/DMEM)
        c4      SP_STATUS               RW      RSP Status
        c5      DMA_FULL                R       DMA full
        c6      DMA_BUSY                R       DMA busy
        c7      SP_RESERVED             RW      CPU-RSP Semaphore
        c8      CMD_START               RW      RDP command buffer START
        c9      CMD_END                 RW      RDP command buffer END
        c10     CMD_CURRENT             R       RDP command buffer CURRENT
        c11     CMD_STATUS ...

Page 83: ...0 c1: This register holds the DRAM address for a DMA transfer. This is a physical memory address. On power-up, this register is 0x0. c2, c3: These registers hold the DMA transfer length; c2 is used for a READ, c3 is used for a WRITE. [Register formats: c0 holds a 12-bit IMEM or DMEM address in bits 11:0, with bit 12 selecting the memory (0 = DMEM, 1 = IMEM); c1 holds a 24-bit DRAM address in bits 23:0; c2 and c3 hold a 12-bit length in bits 11:0, an 8-bit count in bits 19:12, and a 12-bit skip in bits 31:20.] ...

Page 84: ...gth and line count are encoded as (value - 1); that is, a line count of 0 means 1 line, a byte length of 7 means 8 bytes, etc. A straightforward linear transfer will have a count of 0 and skip of 0, transferring (length + 1) bytes. The amount of data transferred must be a multiple of 8 bytes (64 bits), hence the lower three bits of length are ignored and assumed to be all 1's. DMA transfer begins when the length reg...
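The length register encoding can be sketched as a pair of Python helpers. This is a host-side illustration. It assumes the field layout given for c2 and c3 (length in bits 11:0, count in bits 19:12, skip in bits 31:20) and the "value minus one" encoding for length and line count. The function names are hypothetical, and the skip field is passed through as a raw value because its units are not stated here.

```python
def encode_dma_length(nbytes, lines=1, skip=0):
    """Build the value written to a DMA length register.

    nbytes -- bytes per line; must be a multiple of 8, from 8 to 4096
    lines  -- number of lines, from 1 to 256
    skip   -- raw 12-bit skip field
    """
    if nbytes < 8 or nbytes > 0x1000 or nbytes % 8 != 0:
        raise ValueError("byte count must be a multiple of 8 between 8 and 4096")
    if not 1 <= lines <= 256:
        raise ValueError("line count must be between 1 and 256")
    if not 0 <= skip <= 0xFFF:
        raise ValueError("skip must fit in 12 bits")
    return (skip << 20) | ((lines - 1) << 12) | (nbytes - 1)


def decode_dma_length(word):
    """Return (bytes per line, lines, skip) as the hardware would interpret the word."""
    length = (word & 0xFFF) | 0x7   # low three bits are ignored and assumed to be 1
    count = (word >> 12) & 0xFF
    skip = (word >> 20) & 0xFFF
    return length + 1, count + 1, skip
```

A plain 64-byte linear transfer is therefore written as 63 (`0x3F`). Writing `0x38` describes the same transfer, because the low three bits are forced to 1.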

Page 85: ...DMA is busy
3   df  R   DMA is full
4   if  R   IO is full
5   ss  RW  RSP is in single step mode
6   ib  RW  Interrupt on break
7   s0  RW  signal 0 is set
8   s1  RW  signal 1 is set
9   s2  RW  signal 2 is set
10  s3  RW  signal 3 is set
11  s4  RW  signal 4 is set
12  s5  RW  signal 5 is set
13  s6  RW  signal 6 is set
14  s7  RW  signal 7 is set
[Figure: RSP status register layout, each field 1 bit wide: h (bit 0), b (1), db (2), df (3), if (4), ss (5), ib (6), s0 (7), s3 (10), s5 (12), s6 (13), s7 (14) ...]
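
As an illustration of how the status bits above can be read, here is a minimal Python sketch that splits a status value into named flags. The helper name is hypothetical. The field names for bits 2 through 14 come from the table; the names h and b for bits 0 and 1 are taken from the register layout figure, since their table rows are cut off in this excerpt.

```python
# Field names in bit order, bit 0 first.
SP_STATUS_FIELDS = [
    "h", "b", "db", "df", "if", "ss", "ib",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
]


def decode_sp_status(value):
    """Map each status field name to True if its bit is set (hypothetical helper)."""
    return {
        name: bool((value >> bit) & 1)
        for bit, name in enumerate(SP_STATUS_FIELDS)
    }
```

A caller polling for DMA completion would test the db entry of the returned dictionary.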

Page 86: ...ns 0x0001 When writing the RSP status register, the following bits are used:
Table 4-3 RSP Status Write Bits
bit  Description
0    0x00000001  clear HALT
1    0x00000002  set HALT
2    0x00000004  clear BROKE
3    0x00000008  clear RSP interrupt
4    0x00000010  set RSP interrupt
5    0x00000020  clear SINGLE STEP
6    0x00000040  set SINGLE STEP
7    0x00000080  clear INTERRUPT ON BREAK
8    0x00000100  set INTERRUPT ON BREAK
9    0x0000...

Page 87: ...clear SIGNAL 2
14  0x00004000  set SIGNAL 2
15  0x00008000  clear SIGNAL 3
16  0x00010000  set SIGNAL 3
17  0x00020000  clear SIGNAL 4
18  0x00040000  set SIGNAL 4
19  0x00080000  clear SIGNAL 5
20  0x00100000  set SIGNAL 5
21  0x00200000  clear SIGNAL 6
22  0x00400000  set SIGNAL 6
23  0x00800000  clear SIGNAL 7
24  0x01000000  set SIGNAL 7 ...
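
The signal rows of Table 4-3 follow a regular pattern: each signal has a clear bit followed by a set bit. A minimal Python sketch of that pattern is shown below. The function names are hypothetical, and the formula is only confirmed by the rows visible here (signals 2 through 7); applying it to signals 0 and 1 is an assumption, because those rows are cut off in this excerpt.

```python
SP_CLR_HALT = 0x00000001  # bit 0, from Table 4-3
SP_SET_HALT = 0x00000002  # bit 1, from Table 4-3


def sp_clear_signal(n):
    """Write mask that clears SIGNAL n (assumed to hold for n = 0..7)."""
    if not 0 <= n <= 7:
        raise ValueError("signal number must be 0..7")
    return 1 << (9 + 2 * n)


def sp_set_signal(n):
    """Write mask that sets SIGNAL n (assumed to hold for n = 0..7)."""
    if not 0 <= n <= 7:
        raise ValueError("signal number must be 0..7")
    return 1 << (10 + 2 * n)
```

Several masks can be OR'd together into one write, for example clearing HALT while setting SIGNAL 2.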

Page 88: ...e for synchronization with the CPU, typically used to share the DMA activity. If this register is 0, the semaphore may be acquired. This register is set on read, so the test-and-set is atomic. Writing 0 to this register releases the semaphore.
GetSema:
        mfc0    $1, $c7
        bne     $1, $0, GetSema
        nop
        # do critical work
ReleaseSema:
        mtc0    $0, $c7
On power-up, this register is 0x0. c8 This register holds the RDP command buffer START ad...
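
The set-on-read behavior can be modeled in software to make the protocol clear. The following Python sketch is a toy model, not a description of the hardware: the class and function names are hypothetical, and only the documented write of 0 is modeled.

```python
class ReadSetSemaphore:
    """Toy model of a register that is set as a side effect of being read."""

    def __init__(self):
        self._value = 0  # on power-up the register is 0x0

    def read(self):
        # Return the old value and set the register in one step,
        # which is what makes the test-and-set atomic.
        old = self._value
        self._value = 1
        return old

    def write(self, value):
        # Writing 0 releases the semaphore; other writes are not modeled.
        if value == 0:
            self._value = 0


def try_acquire(sem):
    """One iteration of the GetSema loop: True if we now own the semaphore."""
    return sem.read() == 0
```

A second reader sees a nonzero value until the owner writes 0, which mirrors the spin in the GetSema loop.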

Page 89: ...r END address Depending on the state of the RDP STATUS register this address is interpreted by the RDP as either a 24 bit physical DRAM address or a 12 bit DMEM address see c11 On power up this register is undefined c10 This register holds the RDP command buffer CURRENT address This register is READ ONLY Depending on the state of the RDP STATUS 23 0 24 RDP Command Start 23 0 24 RDP Command End ...

Page 90: ...his register holds the RDP status.
Table 4-4 RDP Status Register
bit  field  Access Mode  Description
0    x      RW           Use XBUS DMEM DMA or DRAM DMA
1    f      RW           RDP is frozen
2    fl     RW           RDP is flushed
3    g      RW           GCLK is alive
4    tb     R            TMEM is busy
5    pb     R            RDP PIPELINE is busy
6    cb     R            RDP COMMAND unit is busy
[Register layouts: bits 23..0 RDP Command Current; RDP status, each field 1 bit wide: x (bit 0), f (1), fl (2), g (3), tb (4), pb (5), cb (6), cr (7), db (8), ev (9), sv (10)] ...

Page 91: ...writing the RDP status register the following bits are used Table 4 5 RSP Status Write Bits CPU VIEW 7 cr R RDP COMMAND buffer is ready 8 db R RDP DMA is busy 9 ev R RDP COMMAND END register is valid 10 sv R RDP COMMAND START register is valid bit Description 0 0x0001 clear XBUS DMEM DMA 1 0x0002 set XBUS DMEM DMA 2 0x0004 clear FREEZE 3 0x0008 set FREEZE 4 0x0010 clear FLUSH 5 0x0020 set FLUSH 6 ...

Page 92: ...ed c13 This register holds a RDP command buffer busy counter incremented on each cycle of the RDP clock while the RDP command buffer is busy This register is READ ONLY On power up this register is undefined 7 0x0080 clear PIPE COUNTER 8 0x0100 clear COMMAND COUNTER 9 0x0200 clear CLOCK COUNTER bit Description 23 0 24 RDP Clock Counter 23 0 24 RDP Command Busy Counter ...

Page 93: ...his register is undefined c15 This register holds a RDP TMEM load counter incremented on each cycle of the RDP clock while the TMEM is loading This register is READ ONLY On power up this register is undefined CPU Point of View The RSP Coprocessor 0 registers and certain other RSP registers are memory mapped into the host CPU address space 23 0 24 RDP Pipe Busy Counter 23 0 24 RDP TMEM Load Counter...

Page 94: ...4040008  RW  DMA READ length (DRAM -> I/DMEM)
c3   0x0404000c  RW  DMA WRITE length (DRAM <- I/DMEM)
c4   0x04040010  RW  RSP Status
c5   0x04040014  R   DMA full
c6   0x04040018  R   DMA busy
c7   0x0404001c  RW  CPU-RSP Semaphore
c8   0x04100000  RW  RDP command buffer START
c9   0x04100004  RW  RDP command buffer END
c10  0x04100008  R   RDP command buffer CURRENT
c11  0x0410000c  RW  RDP Status
c12  0x04100010  R   RDP clock counter
c13  0x041000...
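
The addresses in this table are evenly spaced, so the CPU address of a Coprocessor 0 register can be computed from its number. The Python sketch below shows the arithmetic. The function name is hypothetical, and the formula is only confirmed by the rows visible in this excerpt (c3 through c12); extending it to c0, c1, c14 and c15 is an assumption.

```python
SP_REG_BASE = 0x04040000   # c0..c7 block, inferred from the c3..c7 rows
DP_REG_BASE = 0x04100000   # c8..c15 block, from the c8 row


def cp0_cpu_address(n):
    """CPU-side address of RSP Coprocessor 0 register cN (hypothetical helper)."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    if n < 8:
        return SP_REG_BASE + 4 * n
    return DP_REG_BASE + 4 * (n - 8)
```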

Page 95: ...her RSP Addresses These are also memory mapped for the CPU.
Table 4-7 Other RSP Addresses (CPU VIEW)
Address     Access Mode  Description
0x04000000  RW           RSP DMEM (4096 bytes)
0x04001000  RW           RSP IMEM (4096 bytes)
0x04080000  RW           RSP Program Counter (PC), 12 bits ...

Page 96: ...ate is 8 bytes (64 bits) per cycle. There is a DMA setup overhead of 6-12 clocks, so longer transfers are more efficient. IMEM and DMEM are single-ported memories, so accesses during DMA transfers will impact performance. DMA Full The DMA registers are double buffered, having one pending request and one current active request. The DMA FULL condition means that there is an active request and a pending reque...
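
To see why longer transfers are more efficient, the figures above can be turned into a rough cycle estimate. This Python sketch is a back-of-the-envelope model only: the function names are hypothetical, and it ignores any stalls caused by contention for IMEM, DMEM or DRAM.

```python
def dma_cycle_estimate(nbytes, setup_clocks):
    """Setup overhead plus one clock per 8 bytes moved (idealized)."""
    if nbytes <= 0 or nbytes % 8 != 0:
        raise ValueError("transfer size must be a positive multiple of 8 bytes")
    return setup_clocks + nbytes // 8


def dma_efficiency(nbytes, setup_clocks):
    """Fraction of the estimated clocks spent actually moving data."""
    return (nbytes // 8) / dma_cycle_estimate(nbytes, setup_clocks)
```

With a 12-clock setup, an 8-byte transfer spends 1 clock in 13 moving data, while a 4096-byte transfer spends 512 clocks in 524.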

Page 97: ...of memory, such as a portion of an image. See Figure 4-1, "DMA Transfer Length Encoding," on page 84 for more information. CPU Semaphore The CPU-RSP semaphore should be used to share DMA resources. Since the CPU could possibly DMA data to/from the RSP while the RSP is running, this semaphore is necessary to share the DMA engine. Note: The current graphics and audio microcode assume the CPU will not be DMA in...

Page 98: ...ss get semaphore mfc0 tmp SP_RESERVED bne tmp zero DMAproc note delay slot DMAFull wait for not FULL mfc0 tmp DMA_FULL bne tmp zero DMAFull nop set DMA registers mtc0 mem_addr DMA_CACHE handle writes bgtz iswrite DMAWrite mtc0 dram_addr DMA_DRAM j DMADone mtc0 dma_len DMA_READ_LENGTH DMAWrite mtc0 dma_len DMA_WRITE_LENGTH DMADone jr return clear semaphore delay slot mtc0 zero SP_RESERVED unname me...

Page 99: ...MA waits.
 #
 # Registers:
 #
 #       $11     used as tmp
 #
.name   tmp,    $11

DMAwait:
        # request DMA access: (get semaphore)
                mfc0    tmp, SP_RESERVED
                bne     tmp, zero, DMAwait
                # note delay slot
        WaitSpin:
                mfc0    tmp, DMA_BUSY
                bne     tmp, zero, WaitSpin
                nop
                jr      return
                # clear semaphore, delay slot
                mtc0    zero, SP_RESERVED
.unname tmp ...

Page 100: ...nd CMD_END registers are double buffered, so they can be updated asynchronously by the RSP or CPU while the RDP is transferring data. Writing to these registers will set the START_VALID and/or END_VALID bits in the RDP status register, signaling the RDP to use the new pointers once the current transfer is complete. When a new CMD_START pointer is used, CMD_CURRENT is reset to CMD_START. Algorithm to pro...

Page 101: ...uffer registers The first code fragment illustrates the initial conditions for the RDP command buffer registers Figure 4 4 RDP Initialization Using the XBUS The OutputOpen function contains the most complicated part of the algorithm handling the wrapping condition of the circular FIFO The wrapping condition waits for CMD_CURRENT to advance before re programming new CMD_START and CMD_END registers ...

Page 102: ...tsz sub dramp dramp dmemp bgez dramp CurrentFit nop WrapBuffer packet won t fit wait for current to wrap mfc0 dramp CMD_STATUS andi dramp dramp 0x0400 bne dramp zero WrapBuffer AdvanceCurrent wait for current to advance mfc0 dramp CMD_CURRENT addi outp zero RSP_OUTPUT_OFFSET beq dramp outp AdvanceCurrent nop mtc0 outp CMD_START reset START CurrentFit done if current_address outp mfc0 dramp CMD_CUR...

Page 103: ... commands to DMEM advancing outp Once the complete RDP command is written to DMEM OutputClose is called Figure 4 6 OutputClose Function Using the XBUS OutputClose ent OutputClose OutputClose XBUS RDP output jr return mtc0 outp CMD_END end OutputClose unname outsz unname dramp unname dmemp ...

Page 105: ...007 2418 001 The reader is encouraged to be familiar with this document as we will occasionally use it as a frame of reference to describe the RSP assembly language The machine language format of the RSP instructions is based on the R4000 instruction set the reader is referred to the MIPS R4000 Microprocessor User s Manual 1 for additional information In the following chapter the assembler refers ...

Page 106: ...is well suited. The RSP is also a proprietary processor; its implementation and programming interface are not publicly available. The RSP programming interface is designed to be incompatible with other MIPS products. Major Differences from the R4000 Instruction Set The scalar unit (SU) instruction set uses only a subset of the R4000 instruction set. See "Missing Instructions" on page 27. The pseudo opcodes o...

Page 107: ...tted, as are single statements which span multiple lines. Identifiers An identifier consists of a case-sensitive sequence of alphanumeric characters plus the underscore (_) character. Identifiers can be up to 31 characters long, and the first character must be alphabetic. The value of an identifier can be set explicitly with the symbol directive. Constants The assembler supports the following types of con...
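
The identifier rules above are simple enough to express as a regular expression. The Python sketch below is an illustration, not the assembler's actual lexer; the function name is hypothetical and "alphabetic" is assumed to mean the ASCII letters listed in the grammar.

```python
import re

# First character alphabetic, then up to 30 more letters, digits or
# underscores, for a maximum of 31 characters in total.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,30}")


def is_identifier(text):
    """True if text is a valid identifier under the rules above."""
    return _IDENTIFIER.fullmatch(text) is not None
```

Because identifiers are case sensitive, DMAwait and dmawait are accepted as two different names.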

Page 108: ...to the opcodes listed in Appendix A RSP Instruction Set Details Directive mnemonics a sequence of lowercase alphabetic characters that correspond to the list in Assembly Directives on page 114 Expression operators Other character sequences that make up the instruction syntax such as square brackets parentheses the colon the comma and the period Comments The assembler accepts three forms of comment...

Page 109: ...s A label is an identifier with a colon appended There can be no whitespace between the identifier and the colon Labels can be used as program labels targets of branching instructions or in the data segment to define DMEM addresses and later used as constants or in expressions Multiple consecutive labels in the data section are permitted they evaluate to the same value Multiple consecutive labels ...

Page 110: ...essions evaluate to an integer data type The assembler does arithmetic with two s complement integers using 32 bits of precision Expressions follow precedence rules and consist of Expression Operators Identifiers Constants Expression Operators The list of expression operators include Table 5 1 Expression Operators Operator Meaning Addition Subtraction Multiplication Division Remainder or Modulo Sh...

Page 111: ...annot be delayed until the value of a forward referencing symbol is determined Identifiers cannot be used in expressions used as a branch target or as a vector register element Identifiers cannot be used in expressions used in conjunction with the data initialization directives word half byte Note Identifiers by themselves can be used as values for the word and half directives including forward re...

Page 112: ...ollowed by a v, followed by an integer in the range of 0-31. No whitespace between the dollar sign, the v, and the integer is permitted. The syntax for referring to the coprocessor 0 control registers is a dollar sign, followed by a c, followed by an integer in the range of 0-31. No whitespace between the dollar sign, the c, and the integer is permitted. Registers can be named using the name directive associ...
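
The register syntax can be checked with a short pattern match. The following Python sketch is an illustration only, with a hypothetical function name. The vector and control register forms come from the text above; the plain scalar form (a dollar sign followed by an integer, as in $11) and its 0-31 range are assumptions, since that part of the description is cut off here. Named registers are not handled.

```python
import re

_REGISTER = re.compile(r"\$(v|c)?([0-9]+)")
_KINDS = {"v": "vector", "c": "control", None: "scalar"}


def parse_register(text):
    """Return (kind, number) for a register reference such as $v3 or $c7."""
    match = _REGISTER.fullmatch(text)
    if match is None:
        # Also rejects whitespace between the dollar sign, letter and integer.
        raise ValueError("not a register reference: %r" % text)
    number = int(match.group(2))
    if number > 31:
        raise ValueError("register number out of range: %d" % number)
    return _KINDS[match.group(1)], number
```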

Page 113: ...f the 4 16 bit vector elements of the register halves an integer or integer expression in the range 0 1 followed by the letter q enclosed by square brackets representing the ordinal index of one of the 2 16 bit vector elements of the register quarters For vector loads stores and moves the vector register element syntax is as follows an integer or integer expression in the range 0 15 enclosed by sq...

Page 114: ...ssion is an integer expression an expression composed solely of integers and no identifiers Optional parameters are enclosed in square brackets Conditional parameters are denoted with a vertical bar align align iexpression The current location within the text or data section is aligned to the next multiple byte boundary corresponding to the evaluated iexpression possibly adding padding For the tex...
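
The effect of the align directive on the current location is a round-up to the next multiple of the boundary. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and we assume that a location which is already aligned is left unchanged.

```python
def align_up(location, boundary):
    """Round location up to a multiple of boundary, returning (new location, padding)."""
    if boundary <= 0:
        raise ValueError("boundary must be positive")
    aligned = -(-location // boundary) * boundary  # ceiling division
    return aligned, aligned - location
```

The second value in the result is the number of padding bytes that would be added.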

Page 115: ...s the base address to continue packing the data section Only the least significant 12 bits of the base address is used since DMEM is only 4K bytes Multiple base addresses are permitted any holes between initialized data will remain un initialized all 0 s The assembler keeps track of the maximum address initialized and all data up to that point including any holes will be output dmax dmax iexpressi...

Page 116: ...er or the iexpression The identifier may be a forward referencing symbol which is not defined yet This is useful for building program jump tables which must be filled in during the second pass of the assembler In order to accommodate this useful feature we accept the restriction that any expression used to initialize this data be an iexpression not an expression Since there are only 4K bytes of IM...

Page 117: ...ted space space expression If we are in the data section expression number of bytes are allocated and filled with zeros The new current location in the data section will be equal to the previous location plus expression bytes If we are in the text section expression 2 number of instructions are padded and filled with nop s and the new program counter for assembly will be equal to the old program c...

Page 118: ...tifier is removed from the symbol table Usually this is used to free up a named register when you are finished using it but it could be used to free up another program identifier word word identifier iexpression Four bytes one word of the data section are allocated and initialized to the value of the identifier or the iexpression The identifier may be a forward referencing symbol which is not defi...

Page 119: ...ram instruction program instruction instruction directive label directive label label directive scalarInstruction label scalarInstruction vectorInstruction label vectorInstruction directive align iexpression bound iexpression byte iexpression data data iexpression dmax iexpression end end identifier ent identifier ent identifier integer half identifier half iexpression name identifier scalarRegist...

Page 120: ...calarRegister scalarRegister regRegOp scalarRegister controlRegister regRegRegOp scalarRegister scalarRegister scalarRegister regImmOp scalarRegister expression regRegImmOp scalarRegister expression regRegImmOp scalarRegister scalarRegister expression regOffsetOp scalarRegister expression regOffsetOp expression regRegOffsetOp scalarRegister scalarRegister expression regOffsetBaseOp scalarRegister ...

Page 121: ... vectorRegister vectorRegister vectorRegister veRegvRegvRegOp vectorRegister vectorRegister vectorRegister element vdRegvRegOp vectorRegister element vectorRegister element regOp jr regRegRegOp add addu and nor or slt sltu sub subu xor regImmOp lui regRegImmOp addi addiu andi ori slti sltiu xori regOffsetOp bgez bgezal bgtz blez bltz bltzal regRegOffsetOp beq bne regOffsetBaseOp lb lbu lw lh lhu s...

Page 122: ... vmudh vmadh vmudm vmadm vmudn vmadn vmudl vmadl vadd vsub vabs vaddc vsubc vsar vand vnand vor vnor vxor vnxor vlt veq vne vge vcl vch vcr vmrg vdRegvRegOp vmov vrcp vrsq vrcph vrsqh vrcpl vrsql expression expression integer identifier expression expression expression expression expression expression expression expression expression expression expression expression expression expression expressio...

Page 123: ...iexpression iexpression iexpression iexpression iexpression iexpression iexpression iexpression iexpression iexpression iexpression iexpression iexpression scalarRegister identifier integer sp s8 at ra vectorRegister identifier v integer vco vcc vce controlRegister identifier c integer element iexpression iexpression h iexpression q identifier alpha alphanumeric alphanumeric alpha digit _ qstring ...

Page 124: ... c d e f g h i j k l m n o p r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z integer digit 0x hexdigit 0X hexdigit 0 octdigit digit 0 1 2 3 4 5 6 7 8 9 hexdigit digit a b c d e f A B C D E F octdigit 0 1 2 3 4 5 6 7 ...
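
The integer forms in the grammar above (decimal, hexadecimal with a 0x or 0X prefix, and octal with a leading 0) can be parsed as sketched below in Python. This is an illustration, not the assembler's implementation. The function name is hypothetical, and treating every multi-digit token with a leading 0 as octal follows the C convention, which is an assumption here.

```python
_DIGITS = {
    8: "01234567",
    10: "0123456789",
    16: "0123456789abcdefABCDEF",
}


def parse_integer(token):
    """Convert an integer token to its value (hypothetical helper)."""
    if token[:2] in ("0x", "0X"):
        digits, base = token[2:], 16
    elif len(token) > 1 and token[0] == "0":
        digits, base = token[1:], 8
    else:
        digits, base = token, 10
    if not digits or any(ch not in _DIGITS[base] for ch in digits):
        raise ValueError("malformed integer: %r" % token)
    return int(digits, base)
```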

Page 125: ...ome advanced topics such as DMEM usage RSP performance code overlays and the CPU RSP relationship Examples and information presented in this chapter are often one of many possible approaches the reader is encouraged to treat this chapter as inspiration not rigorous instruction ...

Page 126: ...should be loaded into DMEM as part of the task loading effort If you make this data section as small as it can be and keep it near the top of DMEM 0x04000000 this task loading can be as fast as possible Be sure to compare the size of the data that must be initialized with the size of the data loaded into DMEM via the task structure Most programs use the value SP_UCODE_DATA_SIZE which is defined in...

Page 127: ...e initialized however it may be useful to allocate space in your global DMEM map at compile time Truncating the dat file before building the ELF object to a size that includes the static data but not the dynamic data which does not need to be initialized will result in a smaller ELF object and therefore less ROM and DRAM usage Diagnostic Information The assembler provides several useful directives...

Page 128: ...sistent coding style helps improve the chance of finding a bug that would otherwise be hidden in an unreadable section of code. This optimization technique is best left for last. As code is reorganized during development and testing, the dual-issue pattern will change. Hint: Keeping both halves of the RSP busy is going to be one of your keys to maximum performance. Vectorization The computational po...

Page 129: ... is because computing structure offsets is a simple addition, rather than another memory access. This is not a major point for the RSP, as we lack a vectorizing C compiler. There is another important lesson worth mentioning from the body of previous vectorization work. Most of the recent efforts in compiler design and high-level software engineering for SIMD systems are designed to be scalable as...

Page 130: ... your keys to maximum performance. Software Pipelining SIMD processing achieves maximum performance when there is a high degree of data parallelism. This simply means that there are lots of independent data items that can all be operated on at once. An important idea in vector processing is that data recurrence is not allowed. Consider this code fragment: for i 0 i n i a i a i 1 2 0 In this example we ...

Page 131: ...ing the pipeline full is going to be one of your keys to maximum performance Loop Inversion A common trick used in vector programming is loop inversion This means swapping inner and outer loops in order to create the simplest loop with the largest number of iterations so we can maximize vectorization Consider the following code fragment which could be used for vertex translation for i 0 i num_pts ...

Page 132: ...nd unknown CPU resources on the RSP we must vectorize the loop by hand breaking up the iterations into 8 elements at a time the size of our vector unit Careful evaluation of each loop should include trying to maximize the vector elements keeping them filled as well as avoiding unnecessary loop start up and loop overhead Loop Unrolling Unrolling a loop or section of code while consuming precious IM...

Page 133: ...e RSP clock CLK of the simulator is always available as a register Note Although it is accurate within a few percent the RSP simulator is not cycle accurate with the actual hardware The differences are mainly in VU loads and moves It is also useful to use the RDP Command Counter to profile code on the actual hardware This value can be sampled saved to DMEM or DMA d to DRAM for later analysis A sam...

Page 134: ... experiencing execution stalls due to data dependencies and or not keeping both execution units busy Inserting dummy display list instructions temporarily customizing the microcode to mark coarse timing boundaries is another useful trick In the RSP microcode Checkpoint the clock before the critical section mfc0 1 c12 sw 1 0 0 Perform the critical section Checkpoint the clock after the critical sec...
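
Once two clock checkpoints have been saved, the elapsed time is their difference. Since the RDP clock counter (c12) is shown as a 24-bit field, the subtraction should be done modulo 2^24 so that a single wrap between checkpoints is handled. A small Python sketch for offline analysis follows; the function name is hypothetical, and it cannot detect more than one wrap.

```python
COUNTER_BITS = 24  # width of the RDP clock counter field
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def elapsed_clocks(start, end):
    """Clocks between two checkpoints, allowing for one counter wrap."""
    return (end - start) & COUNTER_MASK
```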

Page 135: ...er speeds Like all DMA transfers the source and destination must be 64 bit aligned some care must be taken planning microcode overlays to meet this restriction The assembler provides several directives to guarantee code alignment Since IMEM is single ported memory only one control unit can access it at a time if microcode is loaded while a program is currently executing IMEM accesses are shared be...

Page 136: ...y in the second pass of the assembler External Symbol Tables The S option to the assembler allows you to specify another microcode object to use as an external symbol table This allows you to branch to locations outside the current microcode object A Sample RSP Linker While not a true linker the program buildtask can be used to combine multiple RSP objects both text and data sections into a larger...

Page 137: ...set field is 32 bits the size and destination are both 16 bits The destination field is not generated by buildtask it is generated by the assembler usually as an IMEM label At run time the DRAM address of the microcode object part of the OSTask structure must be added to the offset field to generate the correct DRAM address for each overlay Data objects for subsequent overlays may be redundant and...
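
The run-time fix-up described above (adding the DRAM address of the microcode object to each offset field) can be sketched as follows in Python. This is an illustration only: the function name is hypothetical, and both the field order (offset, size, destination) and the big-endian byte order are assumptions, not something this excerpt states.

```python
import struct

# One overlay table entry: 32-bit offset, 16-bit size, 16-bit destination.
OVERLAY_ENTRY = struct.Struct(">IHH")


def relocate_overlay_table(table, ucode_dram_addr):
    """Return a copy of the table with each offset turned into a DRAM address."""
    out = bytearray()
    for offset, size, dest in OVERLAY_ENTRY.iter_unpack(table):
        dram_addr = (offset + ucode_dram_addr) & 0xFFFFFFFF
        out += OVERLAY_ENTRY.pack(dram_addr, size, dest)
    return bytes(out)
```

The size and destination fields are copied through unchanged, since only the offset depends on where the microcode was loaded.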

Page 138: ...ects use the rsp2elf program to construct the debug executables and library objects gspLine3D gspLine3D u newt u BUILDTASK f 1 o gspLine3D u newt u gspLine3D o gspLine3D RSP2ELF p r build the individual objects newt u gspLine3D u newt s COMMON_GFX_CODE RSPASM LCINCS LCDEFS DNEWT_ALONE S gspLine3D u o newt s gspLine3D u COMMON_GFX_CODE LINE_CODE RSPASM LCINCS LCDEFS o gmain s In this example there ...

Page 139: ...om Overlay DMEM Initialization This code fragment shows the initialization of DMEM for this example OVERLAY TABLE Program module overlay table Offsets and sizes are filled in by the buildtask utility destinations are the responsibility of the ucode OVERLAY_OFFSET offset from beginning of microcode in RDRAM and in o file filled in by buildtask OVERLAY_SIZE length of overlay in bytes filled in by bu...

Page 140: ...sk see Figure 6 2 buildtask Operation on page 137 Overlay Initialization Code Before we load the overlay we must update the overlay table with the correct DRAM address for the start of the code This is usually done immediately at the beginning of the program since we require the OSTask structure which has been copied into DMEM and may need to be overwritten by the program code overlays update tabl...

Page 141: ...tually overlaying the new microcode is the same as any other DMA transfer See DMA on page 96 we use the information from the overlay table to set the source destination and length of the transfer overp points to the proper entry in the overlay table loadOverlay lw dram_addr OVERLAY_OFFSET overp lh dma_len OVERLAY_SIZE overp lh imem_addr OVERLAY_DEST overp jal DMAproc addi iswrite zero 0 delay slot...

Page 142: ...low part of DMEM 0x1000 sizeof OSTask DMA the RSP boot microcode into IMEM at 0x0 Set the RSP PC to 0x0 Clear the HALT bit of the RSP status register Once the HALT bit is cleared the RSP begins execution using the current PC and contents of IMEM RSP Boot Microcode The boot microcode copies the task microcode into IMEM at 0x80 and the task data into DMEM at 0x0 Since the task data might overwrite t...

Page 143: ...forming IO operations. __osSpRawStartDma: s32 __osSpRawStartDma(s32 direction, u32 devAddr, void *dramAddr, u32 size); Based on the input direction (OS_READ or OS_WRITE), set up a DMA transfer between RDRAM and RSP memory address space. devAddr and dramAddr specify the DMA buffer address of RSP memory and RDRAM, respectively. size contains the number of bytes to transfer. Note that these addresses must be 64 b...

Page 144: ...ce is busy, return a -1 and abort the operation. __osSpGetStatus: u32 __osSpGetStatus(void); Return the RSP status register. __osSpSetStatus: void __osSpSetStatus(u32 data); Update the RSP status register. __osSpSetPc: s32 __osSpSetPc(u32 data); Set the RSP program counter (PC). If the RSP is not halted, return a -1 and abort the operation. Address spaces used as parameters to these functions are defined in the file ...

Page 145: ...ything is mostly working and you progress to integrating the new microcode with an application running on the CPU using the RSP simulator becomes a little trickier In order to use the RSP simulator you must create a DRAM image containing all the necessary pieces for the RSP task and an OSTask structure Briefly the technique is Run the RSP simulator Copy the DRAM image into memory at 0x0 Copy the O...

Page 146: ... to the Indy in order to simulate the RSP task gbi2mem This tool takes the file dumped by guDumpGbiDL and creates the mem and tsk files containing the DRAM image and OSTask structure respectively gbi2mem works by reading the ASCII file and creating a binary DRAM image with all objects located at the proper address Since rmonPrintf writes to the terminal the proper invocation is to pipe the output ...

Page 147: ...display the old frame Audio processing on the other hand is usually a function of sample rate number of voices or other data which is more constant and easier to predict Audio processing is more susceptible to discontinuities caused by processor starvation however If the next frame of audio is not computed the audio circuitry will not have any data to play and the sound will stop or click or pop T...

Page 148: ... only a few cycles and this guarantees that we will test every several hundred clock cycles at the most we re done with this command do the next one if available GfxDone stick our head up see if we need to yield the SP If so checkpoint everything then exit mfc0 yield SP_STATUS need to yield andi yield yield SP_STATUS_YIELD bne yield zero RSPYield lh overeturn TASKYIELD zero return where Yielding T...

Page 149: ...s Restarting a previously yielded task is conceptually simple the previously saved DMEM data from the yield buffer is used as the ucode_data field in the task header and the OS_TASK_YIELDED bit in the task header is set The microcode will detect the OS_TASK_YIELDED bit in the task header flags and perform the proper initialization before resuming execution This initialization should include restor...

Page 151: ...format such as rs rt immediate etc are shown in lowercase names For the sake of clarity we sometimes use an alias for a variable subfield in the formats of specific instructions For example we use rs base in the format for load and store instructions Such an alias is always lower case since it refers to a variable subfield In the instruction descriptions that follow the Operation section describes...

Page 152: ...on 2 s complement or floating point subtraction 2 s complement or floating point multiplication div 2 s complement integer division mod 2 s complement modulo Floating point division 2 s complement less than comparison and Bit wise logical AND or Bit wise logical OR xor Bit wise logical XOR nor Bit wise logical NOR GPR x General Register x The content of GPR 0 is always zero Attempts to alter the c...

Page 153: ... to be executed in sequential order as modified by conditional and loop constructs Operations which are marked T i are executed at instruction cycle i relative to the start of execution of the instruction Thus an instruction which starts at time j executes operations marked T i at time i j The interpretation of the order of execution between two instructions or two operations which execute at the ...

Page 154: ...bits 15 through 0 of the immediate value to form a 32 bit sign extended value immediate 016 immediate15 16 immediate15 0 Example 3 VR vt e 15 0 dmem Addr 7 0 08 Eight zero bits are concatenated with the byte of DMEM at Addr and assigned to the 16 bit element at byte e of VU register vt Example 4 The 16 bit element at byte 2 of VU register vs is AND d with the 16 bit element at byte 2 of VU registe...
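
The sign extension in the example above, where bit 15 of the immediate is replicated 16 times and concatenated with bits 15 through 0, can be written out as a short Python sketch. The function names are hypothetical. The zero-extending variant is included for contrast, since it corresponds to concatenating sixteen zero bits instead.

```python
def sign_extend16(immediate):
    """Replicate bit 15 into bits 31..16 of a 32-bit result."""
    immediate &= 0xFFFF
    if immediate & 0x8000:
        return immediate | 0xFFFF0000
    return immediate


def zero_extend16(immediate):
    """Concatenate sixteen zero bits with the 16-bit immediate."""
    return immediate & 0xFFFF
```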

Page 156: ... form the result. The result is placed into general register rd. Since the RSP does not signal an overflow exception for ADD, this command behaves identically to ADDU. Exceptions: None. ADD (Add). Encoding (bits 31..26, 25..21, 20..16, 15..11, 10..6, 5..0; widths 6, 5, 5, 5, 5, 6): SPECIAL (000000), rs, rt, rd, 0 (00000), ADD (100000). Operation: T: GPR[rd] <- GPR[rs] + GPR[rt] ...
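
As a worked example of the format above, the following Python sketch packs the six fields of an ADD instruction into a 32-bit word and models the result of the operation. The function names are hypothetical. The field widths and the bit patterns for SPECIAL and ADD are the ones shown in the format; the 32-bit wraparound reflects the fact that no overflow exception is signaled.

```python
OPCODE_SPECIAL = 0b000000
FUNCT_ADD = 0b100000


def encode_rtype(rs, rt, rd, funct, opcode=OPCODE_SPECIAL, shamt=0):
    """Pack opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6) into one word."""
    for field in (rs, rt, rd, shamt):
        if not 0 <= field < 32:
            raise ValueError("5-bit field out of range")
    if not 0 <= opcode < 64 or not 0 <= funct < 64:
        raise ValueError("6-bit field out of range")
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def add_result(rs_value, rt_value):
    """GPR[rd] after ADD: the sum truncated to 32 bits, no overflow trap."""
    return (rs_value + rt_value) & 0xFFFFFFFF
```

For instance, an ADD with rs = 1, rt = 2 and rd = 3 encodes as 0x00221820.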

Page 157: ...ister rs to form the result The result is placed into general register rt Since the RSP does not signal an overflow exception for ADDI this command behaves identically to ADDIU Operation Exceptions None ADDI Add Immediate 31 25 26 20 21 15 16 0 ADDI rs rt immediate 6 5 5 16 0 0 1 0 0 0 ADDI T GPR rt GPR rs immediate15 16 immediate15 0 ...

Page 158: ... form the result The result is placed into general register rt Since the RSP does not signal an overflow exception for ADDI this command behaves identically to ADDI Operation Exceptions None ADDIU Add Immediate Unsigned 31 25 26 20 21 15 16 0 ADDIU rs rt immediate 6 5 5 16 0 0 1 0 0 1 ADDIU T GPR rt GPR rs immediate15 16 immediate15 0 ...

Page 159: ...o form the result The result is placed into general register rd Since the RSP does not signal an overflow exception for ADD this command behaves identically to ADD Operation Exceptions None ADDU Add Unsigned 31 25 26 20 21 15 16 SPECIAL rs rt 6 5 5 rd 0 ADDU 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 ADDU T GPR rd GPR rs GPR rt ...

Page 160: ...ith the contents of general register rt in a bit wise logical AND operation The result is placed into general register rd Operation Exceptions None AND And 31 25 26 20 21 15 16 SPECIAL rs rt 6 5 5 rd 0 AND 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 AND T GPR rd GPR rs and GPR rt ...

Page 161: ...ded and combined with the contents of general register rs in a bit wise logical AND operation The result is placed into general register rt Operation Exceptions None ANDI And Immediate 31 25 26 20 21 15 16 0 ANDI rs rt immediate 6 5 5 16 0 0 1 1 0 0 ANDI T GPR rt 016 immediate and GPR rs 15 0 ...

Page 162: ...f general register rt are compared If the two registers are equal then the program branches to the target address with a delay of one instruction Since the RSP program counter is only 12 bits only 12 bits of the calculated address are used Operation Exceptions None BEQ Branch On Equal BEQ 31 25 26 20 21 15 16 0 BEQ rs rt offset 6 5 5 16 0 0 0 1 0 0 T target offset15 14 offset 02 condition GPR rs G...
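
The branch target calculation, including the 12-bit truncation, can be sketched in Python as follows. The function name is hypothetical. The offset is sign-extended and shifted left by two, as in the operation shown above. We assume, as on the R4000, that the result is added to the address of the delay slot instruction; the excerpt is cut off before that step.

```python
def branch_target(delay_slot_pc, offset):
    """Branch destination, truncated to the 12-bit RSP program counter."""
    offset &= 0xFFFF
    if offset & 0x8000:          # sign-extend the 16-bit offset
        offset -= 0x10000
    return (delay_slot_pc + (offset << 2)) & 0xFFF
```

Because only 12 bits are kept, a branch past the end of IMEM wraps around to the beginning.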

Page 163: ...ave the sign bit cleared, then the program branches to the target address, with a delay of one instruction. Since the RSP program counter is only 12 bits, only 12 bits of the calculated address are used. Exceptions: None. BGEZ (Branch On Greater Than Or Equal To Zero). Encoding (bits 31..26, 25..21, 20..16, 15..0; widths 6, 5, 5, 16): REGIMM (000001), rs, BGEZ (00001), offset. Operation: T: target <- (offset15)^14 || offset || 0^2; condition <- GPR[rs] 3...

Page 164: ...cleared, then the program branches to the target address, with a delay of one instruction. General register rs may not be general register 31, because such an instruction is not restartable. Since the RSP program counter is only 12 bits, only 12 bits of the calculated address are used. Exceptions: None. BGEZAL (Branch On Greater Than Or Equal To Zero And Link). Encoding (bits 31..26, 25..21, 20..16, 15..0): REGIMM, rs, BGEZA...

Page 165: ...al register rs have the sign bit cleared and are not equal to zero then the program branches to the target address with a delay of one instruction Since the RSP program counter is only 12 bits only 12 bits of the calculated address are used Operation Exceptions None BGTZ Branch On Greater Than Zero 31 25 26 20 21 15 16 0 BGTZ rs 0 offset 6 5 5 16 0 0 0 1 1 1 0 0 0 0 0 BGTZ T target offset15 14 off...

Page 166: ...ister rs have the sign bit set, or are equal to zero, then the program branches to the target address, with a delay of one instruction. Since the RSP program counter is only 12 bits, only 12 bits of the calculated address are used. Exceptions: None. BLEZ (Branch On Less Than Or Equal To Zero). Encoding (bits 31..26, 25..21, 20..16, 15..0; widths 6, 5, 5, 16): BLEZ (000110), rs, 0 (00000), offset. Operation: T: target <- (offset15)^14 || offset || 0^2...

Page 167: ...ster rs have the sign bit set then the program branches to the target address with a delay of one instruction Since the RSP program counter is only 12 bits only 12 bits of the calculated address are used Operation Exceptions None BLTZ Branch On Less Than Zero 31 25 26 20 21 15 16 0 REGIMM rs BLTZ offset 6 5 5 16 0 0 0 0 0 1 0 0 0 0 0 BLTZ T target offset15 14 offset 02 condition GPR rs 31 1 T 1 if...

Page 168: ... sign bit set, then the program branches to the target address with a delay of one instruction. General register rs may not be general register 31, because such an instruction is not restartable. Since the RSP program counter is only 12 bits, only 12 bits of the calculated address are used. Exceptions: None. BLTZAL (Branch On Less Than Zero And Link). Format, bits 31..26 | 25..21 | 20..16 | 15..0: REGIMM | rs | BLTZAL | offset ...

Page 169: ...s of general register rt are compared If the two registers are not equal then the program branches to the target address with a delay of one instruction Since the RSP program counter is only 12 bits only 12 bits of the calculated address are used Operation Exceptions None BNE Branch On Not Equal 31 25 26 20 21 15 16 0 BNE rs rt offset 6 5 5 16 0 0 0 1 0 1 BNE T target offset15 14 offset 02 conditi...

Page 170: ... the SP_STATUS_BROKE bit in the RSP status register When the SP_STATUS_INTR_BREAK is set in the RSP status register the RSP interrupt is signaled MI_INTR_SP Operation Exceptions None BREAK Breakpoint 31 25 26 SPECIAL 6 0 BREAK code 6 5 6 20 0 0 0 0 0 0 0 0 1 1 0 1 BREAK T break ...

Page 171: ...ocessor 2 VU control register rd are loaded into general register rt Operation Exceptions None Coprocessor 2 VU CFC2 11 Move Control From 31 25 26 20 21 15 16 COP2 CF rt 6 5 5 rd 0 5 11 10 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 CFC2 T data CCR rd T 1 GPR rt data ...

Page 172: ... rt are loaded into control register rd of the VU coprocessor unit 2 Operation Exceptions None CTC2 11 Move Control to 31 25 26 20 21 15 16 COP2 CT rt 6 5 5 rd 0 5 11 10 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 CTC2 Coprocessor 2 VU T data GPR rt T 1 CCR rd data ...

Page 173: ...its of the address of the delay slot The program unconditionally jumps to this calculated address with a delay of one instruction Since the RSP program counter is only 12 bits only 12 bits of the calculated address are used Operation Exceptions None J Jump 31 25 26 J 6 0 target 26 0 0 0 0 1 0 J T temp target T 1 PC11 0 temp11 2 02 ...

Page 174: ...y jumps to this calculated address with a delay of one instruction The address of the instruction after the delay slot is placed in the link register r31 Since the RSP program counter is only 12 bits only 12 bits of the calculated address are used Operation Exceptions None JAL Jump And Link 31 25 26 JAL 6 0 target 26 0 0 0 0 1 1 JAL GPR 31 PC 8 T temp target T 1 PC11 0 temp11 2 02 ...

Page 175: ... have the same effect when re executed However an attempt to execute this instruction is not trapped and the result of executing such an instruction is undefined Since instructions must be word aligned a Jump and Link Register instruction must specify a target register rs whose two low order bits are zero Since the RSP program counter is only 12 bits only 12 bits of the calculated address are used...

Page 176: ...aligned a Jump Register instruction must specify a target register rs whose two low order bits are zero Since the RSP program counter is only 12 bits only 12 bits of the calculated address are used Operation Exceptions None JR Jump Register 21 20 31 25 26 SPECIAL 6 0 JR rs 0 6 5 5 15 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 JR T temp GPR rs T 1 PC11 0 temp11 0 ...

Page 177: ...yte at the DMEM location specified by the effective address are sign extended and loaded into general register rt Since DMEM is only 4K bytes only the lower 12 bits of the effective address are used Operation Exceptions None LB Load Byte 31 25 26 20 21 15 16 0 LB base rt offset 6 5 5 16 1 0 0 0 0 0 LB T Addr offset15 16 offset15 0 GPR base GPR rt 31 0 dmem Addr 7 24 dmem Addr11 0 7 0 ...

Page 178: ...the DMEM location specified by the effective address are zero extended and loaded into general register rt Since DMEM is only 4K bytes only the lower 12 bits of the effective address are used Operation Exceptions None LBU Load Byte Unsigned 31 25 26 20 21 15 16 0 LBU base rt offset 6 5 5 16 1 0 0 1 0 0 LBU T Addr offset15 16 offset15 0 GPR base GPR rt 31 0 024 dmem Addr11 0 7 0 ...
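LB and LBU differ only in how the loaded byte is widened to 32 bits. The following Python sketch illustrates that difference together with the 12-bit DMEM address wrap. The helper names and the use of a `bytearray` as a stand-in for DMEM are assumptions made for the example, not part of the guide.

```python
DMEM_MASK = 0xFFF  # DMEM is 4K bytes, so only address bits 11..0 are used


def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset):
    """Sign-extended 16-bit offset plus base register, truncated to 12 bits."""
    return (base + sign_extend16(offset)) & DMEM_MASK


def lbu(dmem, base, offset):
    """Load Byte Unsigned: the byte is zero-extended."""
    return dmem[effective_address(base, offset)]


def lb(dmem, base, offset):
    """Load Byte: the byte is sign-extended to 32 bits."""
    byte = lbu(dmem, base, offset)
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte
```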

Page 179: ...tion following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation Exceptions None LBV into Vector Register Load Byte 31 26 20 21 15 16 ...

Page 180: ...f 8 bytes This instruction has three load delay slots results are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computati...

Page 181: ...omputed by adding the offset to the contents of the base register a SU GPR This instruction has three load delay slots results are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element is the byte element of the...

Page 182: ...Operation: T: Addr ← ((offset15)^16 || offset15..0) + GPR[base]; for i in 0...3: Addr ← Addr + i*4; VR[vt][element + i*2]15..0 ← 0^1 || dmem[Addr11..0]7..0 || 0^7; endfor. Exceptions: None ...

Page 183: ...rd at the DMEM location specified by the effective address are sign extended and loaded into general register rt Since DMEM is only 4K bytes only the lower 12 bits of the effective address are used Operation Exceptions None LH Load Halfword 31 25 26 20 21 15 16 0 LH base rt offset 6 5 5 16 1 0 0 0 0 1 LH T Addr offset15 16 offset15 0 GPR base GPR rt 31 0 dmem Addr 7 16 dmem Addr11 0 15 0 ...

Page 184: ...the DMEM location specified by the effective address are zero extended and loaded into general register rt Since DMEM is only 4K bytes only the lower 12 bits of the effective address are used Operation Exceptions None LHU Load Halfword Unsigned 31 25 26 20 21 15 16 0 LHU base rt offset 6 5 5 16 1 0 0 1 0 1 LHU T Addr offset15 16 offset15 0 GPR base GPR rt 31 0 016 dmem Addr11 0 15 0 ...

Page 185: ... SU GPR This instruction has three load delay slots results are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element should be 0 This instruction could be used for unpacking pixel chroma UV values as required b...

Page 186: ...Operation: T: Addr ← ((offset15)^16 || offset15..0) + GPR[base]; for i in 0...7: Addr ← Addr + i*2; VR[vt][i*2]15..0 ← 0^1 || dmem[Addr11..0]7..0 || 0^7; endfor. Exceptions: None ...

Page 187: ...iple of 4 bytes This instruction has three load delay slots results are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU com...

Page 188: ...ults are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element should be 0 Operation Exceptions None LPV into Vector Register Load Packed Bytes 31 26 20 21 15 16 0 LWC2 base vt 6 5 5 1 1 0 0 1 0 LPV 4 5 element ...

Page 189: ...tores on page 51. The effective address is computed by adding the offset to the contents of the base register (an SU GPR). This instruction has three load delay slots; results are available in the fourth instruction following this load. If an attempt is made to use the target register vt in a delay slot, hardware register interlocking will stall the processor until the load is completed. Operation Excepti...

Page 190: ...omputed by adding the offset to the contents of the base register a SU GPR This instruction has three load delay slots results are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Operation Exceptions None LRV into Vector Register Load Quad ...

Page 191: ...iple of 2 bytes This instruction has three load delay slots results are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU com...

Page 192: ...ents of the base register a SU GPR This instruction has three load delay slots results are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element is the byte element of the vector register not the ordinal element...

Page 193: ...ediate is shifted left 16 bits and concatenated to 16 bits of zeros. The result is placed into general register rt. Exceptions: None. LUI (Load Upper Immediate). Format, bits 31..26 | 25..21 | 20..16 | 15..0: LUI (001111) | 0 (00000) | rt | immediate. Operation: T: GPR[rt] ← immediate15..0 || 0^16 ...

Page 194: ...ster a SU GPR This instruction has three load delay slots results are available in the fourth instruction following this load If an attempt is made to use the target register vt in a delay slot hardware register interlocking will stall the processor until the load is completed Note The element specifier element should be 0 This instruction could be used to unpack 8 bit pixel data such as RGBA or l...

Page 195: ...Operation: T: Addr ← ((offset15)^16 || offset15..0) + GPR[base]; for i in 0...7: Addr ← Addr + i; VR[vt][i*2]15..0 ← 0^1 || dmem[Addr11..0]7..0 || 0^7; endfor. Exceptions: None ...
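The unpack loop on this page places each loaded byte in bits 14..7 of a 16-bit element, with a zero above it and seven zeros below. A minimal Python sketch of that placement follows. It assumes that byte i comes from the effective address plus i, that the element specifier is 0, and that addresses wrap at 12 bits. The function name `luv` and the `bytearray` DMEM model are illustrative only.

```python
DMEM_MASK = 0xFFF  # only address bits 11..0 reach DMEM


def luv(dmem, addr):
    """Unpack 8 consecutive bytes into 8 halfword elements (byte << 7)."""
    elements = []
    for i in range(8):
        byte = dmem[(addr + i) & DMEM_MASK]
        elements.append((byte << 7) & 0xFFFF)  # 0 || byte || 0000000
    return elements
```

With this placement an 8-bit value of 0xFF becomes 0x7F80, so the unpacked pixel is always a non-negative signed halfword.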

Page 196: ... of the word at the DMEM location specified by the effective address are loaded into general register rt Since DMEM is only 4K bytes only the lower 12 bits of the effective address are used Operation Exceptions None LW Load Word 31 25 26 20 21 15 16 0 LW base rt offset 6 5 5 16 1 0 0 0 1 1 LW T Addr offset15 16 offset15 0 GPR base GPR rt 31 0 dmem Addr11 0 31 0 ...

Page 197: ...cessor register rd of the CP0 are loaded into general register rt Operation Exceptions None MFC0 Move From rd 11 10 5 31 25 26 20 21 15 16 0 COP0 MF rt 0 6 5 5 11 System Control Coprocessor 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 MFC0 T data CPR 0 rd T 1 GPR rt data ...

Page 198: ...gister vd are sign extended and loaded into general register rt Operation Exceptions None MFC2 7 Move From 31 25 26 20 21 15 16 COP2 MF rt 6 5 5 rd 0 5 11 10 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 MFC2 Coprocessor 2 VU 7 6 e 4 T data15 0 VR vd e 15 0 T 1 GPR rt 31 0 data15 16 data15 0 ...

Page 199: ...neral register rt are loaded into coprocessor register rd of CP0 Operation Exceptions None MTC0 Move To rd 11 10 5 31 25 26 20 21 15 16 0 COP0 MT rt 0 6 5 5 11 System Control Coprocessor 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 00 MTC0 T data GPR rt T 1 CPR 0 rd data ...

Page 200: ...neral register rt are loaded at byte element e of VU register vd Operation Exceptions None MTC2 7 Move To 31 25 26 20 21 15 16 COP2 MT rt 6 5 5 rd 0 5 11 10 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 MTC2 Coprocessor 2 VU 7 6 e 4 T data15 0 GPR rt 15 0 T 1 VR vd e 15 0 data15 0 ...

Page 201: ...s no internal RSP state It is useful for program instruction padding or insertion into branch delay slots when no useful work can be done Operation Exceptions None NOP Null Operation 31 25 26 20 21 15 16 SPECIAL rs rt 6 5 5 rd 0 NOP 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NOP T nothing happens ...

Page 202: ...ith the contents of general register rt in a bit wise logical NOR operation The result is placed into general register rd Operation Exceptions None NOR Nor 31 25 26 20 21 15 16 SPECIAL rs rt 6 5 5 rd 0 NOR 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 NOR T GPR rd GPR rs nor GPR rt ...

Page 203: ...ombined with the contents of general register rt in a bit wise logical OR operation The result is placed into general register rd Operation Exceptions None OR Or 31 25 26 20 21 15 16 SPECIAL rs rt 6 5 5 rd 0 OR 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 OR T GPR rd GPR rs or GPR rt ...

Page 204: ...ombined with the contents of general register rs in a bit-wise logical OR operation. The result is placed into general register rt. Exceptions: None. ORI (Or Immediate). Format, bits 31..26 | 25..21 | 20..16 | 15..0: ORI (001101) | rs | rt | immediate. Operation: T: GPR[rt] ← GPR[rs]31..16 || (immediate or GPR[rs]15..0) ...
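ORI is commonly paired with LUI to build a full 32-bit constant: LUI fills the upper halfword and ORI merges in the lower one. The Python sketch below models the two operations as described on their pages. The function names are hypothetical and registers are modeled as plain integers.

```python
def lui(immediate):
    """Load Upper Immediate: immediate15..0 || 16 zero bits."""
    return (immediate & 0xFFFF) << 16


def ori(rs_value, immediate):
    """Or Immediate: upper halfword of rs is kept, lower halfword is ORed."""
    return (rs_value & 0xFFFFFFFF) | (immediate & 0xFFFF)


# Building the constant 0x12345678 in two steps.
constant = ori(lui(0x1234), 0x5678)
```

Because the ORI immediate is not sign-extended, a lower halfword with bit 15 set leaves the upper halfword untouched.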

Page 205: ...MEM address The least significant byte of register rt is stored at the DMEM address Since DMEM is only 4K bytes only the lower 12 bits of the effective address are used Operation Exceptions None SB Store Byte 31 25 26 20 21 15 16 0 SB base rt offset 6 5 5 16 1 0 1 0 0 0 SB T Addr offset15 16 offset15 0 GPR base data GPR7 0 StoreDMEM BYTE data Addr11 0 ...

Page 206: ... a SU GPR Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation Exceptions None SBV from Vector Register Store Byte 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SBV 4 5 element 6 10 7 11 7 SBV 0 0 0 0 0 25 offset T Addr offset15 16 offset15 0 GPR base data VR vt element 7 0 StoreDMEM BYTE data...

Page 207: ...e register a SU GPR Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation Exceptions None SDV from Vector Register Store Double 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SDV 4 5 element 6 10 7 11 7 SDV 0 0 0 1 1 25 offset T Addr offset15 16 offset15 0 GPR base data VR vt element 63 0 StoreD...

Page 208: ...3 Packed Loads and Stores on page 53 The effective address is computed by adding the offset to the contents of the base register a SU GPR Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation Exceptions None SFV from Vector Register Store Packed Fourth 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 ...

Page 209: ...M address The least significant halfword of register rt is stored at the DMEM address Since DMEM is only 4K bytes only the lower 12 bits of the effective address are used Operation Exceptions None SH Store Halfword 31 25 26 20 21 15 16 0 SH base rt offset 6 5 5 16 1 0 1 0 0 1 SH T Addr offset15 16 offset15 0 GPR base data GPR15 0 StoreDMEM HALFWORD data Addr11 0 ...

Page 210: ...s is computed by adding the offset to the contents of the base register a SU GPR Note The element specifier element should be 0 This instruction could be used to pack pixel chroma UV values as required for MPEG compression Operation Exceptions None SHV from Vector Register Store Packed Half 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SHV 4 5 element 6 10 7 11 7 SHV 0 1 0 0 0 25 offset T Add...

Page 211: ...are shifted left by sa bits inserting zeros into the low order bits The result is placed in register rd Operation Exceptions None SLL Shift Left Logical 31 25 26 20 21 15 16 SPECIAL rt 6 5 5 rd sa SLL 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 0 0 SLL 0 0 0 0 0 0 T GPR rd GPR rt 31 sa 0 0sa ...

Page 212: ... order five bits contained in general register rs, inserting zeros into the low order bits. The result is placed in register rd. Exceptions: None. SLLV (Shift Left Logical Variable). Format, bits 31..26 | 25..21 | 20..16 | 15..11 | 10..6 | 5..0: SPECIAL (000000) | rs | rt | rd | 0 (00000) | SLLV (000100). Operation: T: s ← GPR[rs]4..0; GPR[rd] ← GPR[rt](31-s)..0 || 0^s ...

Page 213: ...ntents of general register rs are less than the contents of general register rt, the result is set to one; otherwise the result is set to zero. The result is placed into general register rd. Exceptions: None. SLT (Set On Less Than). Format, bits 31..26 | 25..21 | 20..16 | 15..11 | 10..6 | 5..0: SPECIAL (000000) | rs | rt | rd | 0 (00000) | SLT (101010). Operation: T: if GPR[rs] < GPR[rt] then GPR[rd] ← 0^31 || 1 else GPR[rd] ← 0^32 endif ...

Page 214: ...mediate the result is set to one otherwise the result is set to zero The result is placed into general register rt Since the RSP does not signal an overflow exception for SLTI this command behaves identically to SLTIU Operation Exceptions None SLTI Set On Less Than Immediate 31 25 26 20 21 15 16 0 SLTI rs rt immediate 6 5 5 16 0 0 1 0 1 0 SLTI T if GPR rs immediate15 16 immediate15 0 then GPR rd 0...

Page 215: ...mmediate the result is set to one otherwise the result is set to zero The result is placed into general register rt Since the RSP does not signal an overflow exception for SLTI this command behaves identically to SLTI Operation Exceptions None SLTIU Immediate Unsigned Set On Less Than 31 25 26 20 21 15 16 0 SLTIU rs rt immediate 6 5 5 16 0 0 1 0 1 1 SLTIU T if 0 GPR rs immediate15 16 immediate15 0...

Page 216: ...eral register rs are less than the contents of general register rt, the result is set to one; otherwise the result is set to zero. The result is placed into general register rd. Exceptions: None. SLTU (Set On Less Than Unsigned). Format, bits 31..26 | 25..21 | 20..16 | 15..11 | 10..6 | 5..0: SPECIAL (000000) | rs | rt | rd | 0 (00000) | SLTU (101011). Operation: T: if (0 || GPR[rs]) < (0 || GPR[rt]) then GPR[rd] ← 0^31 || 1 else GPR[rd] ← 0^32 endi...
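SLT and SLTU perform the same comparison on differently interpreted operands: SLT treats both registers as two's-complement values, while SLTU prefixes each with a zero bit and so compares them as unsigned. A short Python sketch of the distinction, with hypothetical helper names and registers modeled as integers:

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def slt(rs_value, rt_value):
    """Set On Less Than: signed comparison."""
    return 1 if to_signed32(rs_value) < to_signed32(rt_value) else 0


def sltu(rs_value, rt_value):
    """Set On Less Than Unsigned: operands compared as unsigned values."""
    return 1 if (rs_value & MASK32) < (rt_value & MASK32) else 0
```

The bit pattern 0xFFFFFFFF is -1 for SLT but the largest possible value for SLTU, so the two instructions disagree whenever exactly one operand has its sign bit set.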

Page 217: ...ase register a SU GPR Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation Exceptions None SLV from Vector Register Store Long 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SLV 4 5 element 6 10 7 11 7 SLV 0 0 0 1 0 25 offset T Addr offset15 16 offset15 0 GPR base data VR vt element 31 0 StoreD...

Page 218: ...tive address is computed by adding the offset to the contents of the base register a SU GPR Note The element specifier element should be 0 Operation Exceptions None SPV from Vector Register Store Packed Bytes 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SPV 4 5 element 6 10 7 11 7 SPV 0 0 1 1 0 25 offset T Addr offset15 16 offset15 0 GPR base for i in 0 7 Addr Addr i data VR vt i 2 15 8 Stor...
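The store loop excerpted here writes the upper byte (bits 15..8) of each of the eight halfword elements to consecutive DMEM bytes. The Python sketch below models that under stated assumptions: the element specifier is 0, byte i goes to the effective address plus i, and addresses wrap at 12 bits. The name `spv` and the `bytearray` DMEM model are illustrative.

```python
DMEM_MASK = 0xFFF  # only address bits 11..0 reach DMEM


def spv(dmem, addr, vt):
    """Store bits 15..8 of each of the 8 elements as consecutive bytes."""
    for i in range(8):
        dmem[(addr + i) & DMEM_MASK] = (vt[i] >> 8) & 0xFF
```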

Page 219: ...rd can be stored with the appropriate SRV instruction See Figure 3 2 Long Quad and Rest Loads and Stores on page 51 The effective address is computed by adding the offset to the contents of the base register a SU GPR Note The element specifier element should be 0 Operation Exceptions None SQV from Vector Register Store Quad 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SQV 4 5 element 6 10 7 ...

Page 220: ...d right by sa bits, sign-extending the high order bits. The result is placed in register rd. Exceptions: None. SRA (Shift Right Arithmetic). Format, bits 31..26 | 25..21 | 20..16 | 15..11 | 10..6 | 5..0: SPECIAL (000000) | 0 (00000) | rt | rd | sa | SRA (000011). Operation: T: GPR[rd] ← (GPR[rt]31)^sa || GPR[rt]31..sa ...
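An arithmetic right shift replicates the sign bit into the vacated high-order positions, whereas the logical shift (SRL) inserts zeros. The Python sketch below contrasts the two. The function names are hypothetical and registers are modeled as 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def sra(rt_value, sa):
    """Shift Right Arithmetic: the sign bit fills the high-order bits."""
    value = rt_value & MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative number
    return (value >> (sa & 31)) & MASK32


def srl(rt_value, sa):
    """Shift Right Logical: zeros fill the high-order bits."""
    return (rt_value & MASK32) >> (sa & 31)
```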

Page 221: ...by the low order five bits of general register rs sign extending the high order bits The result is placed in register rd Operation Exceptions None SRAV Shift Right 31 25 26 20 21 15 16 SPECIAL rs rt 6 5 5 rd 0 SRAV 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 SRAV Arithmetic Variable T s GPR rs 4 0 GPR rd GPR rt 31 s GPR rt 31 s ...

Page 222: ...fted right by sa bits inserting zeros into the high order bits The result is placed in register rd Operation Exceptions None SRL Shift Right Logical 31 25 26 20 21 15 16 SPECIAL rt 6 5 5 rd sa SRL 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 1 0 SRL 0 0 0 0 0 0 T GPR rd 0 sa GPR rt 31 sa ...

Page 223: ...ed by the low order five bits of general register rs inserting zeros into the high order bits The result is placed in register rd Operation Exceptions None SRLV Shift Right Logical Variable 31 25 26 20 21 15 16 SPECIAL rs rt 6 5 5 rd 0 SRLV 5 5 6 11 10 6 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 SRLV T s GPR rs 4 0 GPR rd 0s GPR rt 31 s ...

Page 224: ...tes no bytes The effective address is computed by adding the offset to the contents of the base register a SU GPR Note The element specifier e is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation Exceptions None SRV from Vector Register Store Quad Rest 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SRV 4 5 element 6 10 7 11 7 SR...

Page 225: ...e register a SU GPR Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation Exceptions None SSV from Vector Register Store Short 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SSV 4 5 element 6 10 7 11 7 SSV 0 0 0 0 1 25 offset T Addr offset15 16 offset15 0 GPR base data VR vt element 15 0 StoreDM...

Page 226: ...specifies the beginning of an 8 register group The effective address is computed by adding the offset to the contents of the base register a SU GPR Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation See Transpose on page 54 Exceptions None STV from Vector Register Store Transpose 31 26 20 21 15 1...

Page 227: ...er rs to form a result. The result is placed into general register rd. Since the RSP does not signal an overflow exception for SUB, this command behaves identically to SUBU. Exceptions: None. SUB (Subtract). Format, bits 31..26 | 25..21 | 20..16 | 15..11 | 10..6 | 5..0: SPECIAL (000000) | rs | rt | rd | 0 (00000) | SUB (100010). Operation: T: GPR[rd] ← GPR[rs] - GPR[rt] ...

Page 228: ...m a result. The result is placed into general register rd. Since the RSP does not signal an overflow exception for SUB, SUBU behaves identically to SUB. Exceptions: None. SUBU (Subtract Unsigned). Format, bits 31..26 | 25..21 | 20..16 | 15..11 | 10..6 | 5..0: SPECIAL (000000) | rs | rt | rd | 0 (00000) | SUBU (100011). Operation: T: GPR[rd] ← GPR[rs] - GPR[rt] ...

Page 229: ...ive address is computed by adding the offset to the contents of the base register a SU GPR Note The element specifier element should be 0 This instruction could be used to pack 8 bit pixel data such as RGBA or luma Y values Operation Exceptions None SUV from Vector Register Store Unsigned Packed 31 26 20 21 15 16 0 SWC2 base vt 6 5 5 1 1 1 0 1 0 SUV 4 5 element 6 10 7 11 7 SUV 0 0 1 1 1 25 offset ...

Page 230: ...ents of general register rt are stored at the DMEM location specified by the DMEM address Since DMEM is only 4K bytes only the lower 12 bits of the effective address are used Operation Exceptions None SW Store Word 31 25 26 20 21 15 16 0 SW base rt offset 6 5 5 16 1 0 1 0 1 1 SW T Addr offset15 16 offset15 0 GPR base data GPR31 0 StoreDMEM WORD data Addr11 0 ...

Page 231: ... a circular shift of the 8 shorts by element 1 which is equivalent to dest_short Slice source_short Slice Element 1 0x7 The effective address is computed by adding the offset to the contents of the base register a SU GPR Note The element specifier element is the byte element of the vector register not the ordinal element count as in VU computational instructions Operation See Transpose on page 54 ...

Page 232: ...vector register vs and placed into vector register vd If vs is equal to 0 vs is placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VABS of Short Elements Vector Absolute Value 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VABS 1 1 5 5 vd vs 5 10 6 11 6 VABS 0 1 0 0 1 1 24 ...

Page 233: ... i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif if VR vs i 2 15 0 0 then result15 0 VR vt j 2 15 0 elseif VR vs i 2 15 0 015 0 then result15 0 015 0 elseif VR vs i 2 15 0 0 then result15 0 VR vt j 2 15 0 endif VR vd i 2 15 0 result15 0 ACC i 15 0 result15 0 endfor ...
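Despite its name, VABS is a conditional negate: the sign of each vs element selects whether the matching vt element is negated, zeroed or passed through. The Python sketch below models the three-way test per element. It assumes the comparison operators lost from the excerpt are "less than", "equal to" and "greater than" zero, ignores the element specifier, and truncates results to 16 bits because the excerpt shows no clamp. Behavior for a vt element of -32768 on real hardware may differ.

```python
def s16(value):
    """Interpret the low 16 bits of value as a signed halfword."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vabs(vs, vt):
    """Per element: -vt if vs < 0, 0 if vs == 0, vt if vs > 0."""
    result = []
    for s, t in zip(vs, vt):
        sign = s16(s)
        if sign < 0:
            element = -s16(t)
        elif sign == 0:
            element = 0
        else:
            element = s16(t)
        result.append(element & 0xFFFF)
    return result
```

Passing the same register as vs and vt yields the usual absolute value of each element.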

Page 234: ...egister VCO is used as carry in and VCO is cleared The results are clamped to 16 bit signed values and placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VADD of Short Elements Vector Add 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VADD 1 1 5 5 vd vs 5 10 6 11 6 VADD 0 1 0 0 0 0 24 ...

Page 235: ...010 then (scalar quarter of vector) j ← (e3..0 & 0001) | (i & 1110); elseif (e3..0 & 1100) = 0100 then (scalar half of vector) j ← (e3..0 & 0011) | (i & 1100); elseif (e3..0 & 1000) = 1000 then (scalar whole of vector) j ← (e3..0 & 0111); endif; result15..0 ← VR[vs][i*2]15..0 + VR[vt][j*2]15..0 + VCO(i); VR[vd][i*2]15..0 ← Clamp_Signed(result15..0); ACC[i]15..0 ← result15..0; endfor; VCO15..0 ← 0^16 ...
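The add loop on this page sums each pair of elements plus a carry-in bit from VCO, writes the clamped sum to vd and the unclamped low 16 bits to the accumulator, then clears VCO. The Python sketch below follows that reading. The function names are hypothetical, element selection is left out (vt is used element by element), and VCO is modeled as a 16-bit integer whose bit i is the carry-in for element i.

```python
def s16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def clamp_signed16(value):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def vadd(vs, vt, vco):
    """Return (vd, acc_low, new_vco) for a clamped add with carry-in."""
    vd, acc_low = [], []
    for i in range(8):
        total = s16(vs[i]) + s16(vt[i]) + ((vco >> i) & 1)
        acc_low.append(total & 0xFFFF)              # accumulator keeps the raw sum
        vd.append(clamp_signed16(total) & 0xFFFF)   # destination is saturated
    return vd, acc_low, 0                           # VCO is cleared
```

Note that vd and the accumulator differ exactly when saturation occurs.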

Page 236: ...ontrol register VCO is used as carry out The results are not clamped The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VADDC With Carry Vector Add Short Elements 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VADDC 1 1 5 5 vd vs 5 10 6 11 6 VADDC 0 1 0 1 0 0 24 ...

Page 237: ...10 0010 then (scalar quarter of vector) j ← (e3..0 & 0001) | (i & 1110); elseif (e3..0 & 1100) = 0100 then (scalar half of vector) j ← (e3..0 & 0011) | (i & 1100); elseif (e3..0 & 1000) = 1000 then (scalar whole of vector) j ← (e3..0 & 0111); endif; result16..0 ← VR[vs][i*2]15..0 + VR[vt][j*2]15..0; ACC[i]15..0 ← result15..0; VR[vd][i*2]15..0 ← result15..0; VCO(i+8) ← 0; VCO(i) ← result16; endfor ...
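VADDC produces a 17-bit sum per element: the low 16 bits go to vd and the accumulator without clamping, and bit 16 becomes the carry-out bit for that element in VCO. A Python sketch of this reading follows. The names are hypothetical, element selection is omitted, and VCO is modeled as a 16-bit integer with bit i as the carry and bit i+8 cleared.

```python
def vaddc(vs, vt):
    """Return (vd, acc_low, vco) for an unclamped add with carry-out."""
    vd, vco = [], 0
    for i in range(8):
        total = (vs[i] & 0xFFFF) + (vt[i] & 0xFFFF)  # 17-bit result
        vd.append(total & 0xFFFF)
        vco |= ((total >> 16) & 1) << i              # carry-out into VCO bit i
    return vd, list(vd), vco
```

The carry bits left in VCO are what a following VADD on the upper halfwords would consume as carry-in, which is how a double-precision add can be split across the two instructions.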

Page 238: ...he elements of vector register vs The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VAND of Short Elements Vector AND 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VAND 1 1 5 5 vd vs 5 10 6 11 6 VAND 1 0 1 0 0 0 24 ...

Page 239: ...f (e3..0 & 1110) = 0010 then (scalar quarter of vector) j ← (e3..0 & 0001) | (i & 1110); elseif (e3..0 & 1100) = 0100 then (scalar half of vector) j ← (e3..0 & 0011) | (i & 1100); elseif (e3..0 & 1000) = 1000 then (scalar whole of vector) j ← (e3..0 & 0111); endif; result15..0 ← VR[vs][i*2]15..0 and VR[vt][j*2]15..0; ACC[i]15..0 ← result15..0; VR[vd][i*2]15..0 ← result15..0; endfor ...
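The element specifier decoding that precedes every VU computational operation picks which vt element j is paired with vs element i. The Python sketch below implements the three scalar cases visible in the excerpt and applies them to VAND. The first case (e of 0 or 1 selecting the whole vector, j = i) is cut off in the excerpt and is assumed here. The function names are hypothetical.

```python
def select_index(e, i):
    """Map destination element i to the vt element chosen by specifier e."""
    if (e & 0b1110) == 0b0010:    # scalar quarter of vector
        return (e & 0b0001) | (i & 0b1110)
    if (e & 0b1100) == 0b0100:    # scalar half of vector
        return (e & 0b0011) | (i & 0b1100)
    if (e & 0b1000) == 0b1000:    # scalar whole of vector
        return e & 0b0111
    return i                      # assumed: plain vector operand


def vand(vs, vt, e=0):
    """Bit-wise AND of vs with the selected elements of vt."""
    return [(vs[i] & vt[select_index(e, i)]) & 0xFFFF for i in range(8)]
```

For example, a specifier of 9 broadcasts vt element 1 to all eight lanes, while a specifier of 3 uses the odd element of each adjacent pair.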

Page 240: ...n vt or the vector vt such as comparing w to xyz or clamping a vector to a range VCH performs VT VS VT generating 16 bits in VCC and updating VCO and VCE with equal and sign values The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VCH Test High Vector Select Clip 31 25 26 20 ...

Page 241: ...0 1000 1000 then scalar whole of vector j e3 0 0111 endif sign VR vs i 2 15 0 xor VR vt j 2 15 0 0 if sign then ge VR vt j 2 15 0 0 le VR vs i 2 15 0 VR vt j 2 15 0 0 vce VR vs i 2 15 0 VR vt j 2 15 0 1 eq VR vs i 2 15 0 VR vt j 2 15 0 0 di15 0 le VR vt j 2 15 0 VR vs i 2 15 0 ACC i 15 0 di15 0 else le VR vt j 2 15 0 0 ge VR vs i 2 15 0 VR vt j 2 15 0 0 vce 0 eq VR vs i 2 15 0 VR vt j 2 15 0 0 di1...

Page 242: ...242 Exceptions None VR vd i 2 15 0 di15 0 neq eq and 1 VCC15 0 VCC15 0 or ge i 8 or le i VCO15 0 VCO15 0 or neq i 8 or sign i VCE7 0 VCE7 0 or vce i 8 endfor ...

Page 243: ...ement in vt or the vector vt such as comparing w to xyz or clamping a vector to a range VCL performs VT VS VT generating 16 bits in VCC and updating VCO and VCE with equal and sign values The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VCL Test Low Vector Select Clip 31 25 ...

Page 244: ...eif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif le VCC15 0 i and 1 ge VCC15 0 i 8 and 1 vce VCE7 0 i and 1 eq VCO15 0 i 8 and 1 sign VCO15 0 i and 1 if sign then di15 0 VR vs i 2 15 0 VR vt j 2 15 0 carry di15 0 116 if eq then le not vce and di15 0 and 116 0 and not carry or vce and di15 0 and 116 0 or not carry endif di15 0 le VR vt j 2 15 0 VR vs i 2 15 0 ACC i 15 0 di15 0 VCEi...

Page 245: ...None di15 0 VR vs i 2 15 0 VR vt j 2 15 0 if eq then ge di15 0 0 endif di15 0 ge VR vt j 2 15 0 VR vs i 2 15 0 ACC i 15 0 di15 0 endif VR vd i 2 15 0 di15 0 VCC15 0 VCC15 0 and 1 07 1 i or ge i 8 or le i endfor VCO15 0 0 VCE7 0 0 ...

Page 246: ... a range VCR performs VT VS VT generating 16 bits in VCC and updating VCO and VCE with equal and sign values It interprets vt as a 1 s complement number useful for clamping to a power of 2 VCR is a single precision instruction and ignores VCO for input The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt ...

Page 247: ...e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif sign VR vs i 2 15 0 xor VR vt j 2 15 0 0 if sign then ge VR vt j 2 15 0 0 le VR vs i 2 15 0 VR vt j 2 15 0 1 0 di15 0 le VR vt j 2 15 0 VR vs i 2 15 0 ACC i 15 0 di15 0 else le VR vt j 2 15 0 0 ge VR vs i 2 15 0 VR vt j 2 15 0 0 di15 0 ge VR vt j 2 15 0 VR vs i 2 15 0 ACC i 15 0 di15 0 endif VR vd i 2 15 0 di15 0 VCC15 0 VCC15 0 or ge i ...

Page 248: ...Exceptions None ...

Page 249: ...sed as input VCO and VCE are cleared on output and VCC is set with the results of the comparison the element which is equal The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VEQ Equal Vector Select 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VEQ 1 1 5 5 vd vs 5 10 6 11...

Page 250: ... 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif if VR vs i 2 15 0 VR vt j 2 15 0 and VCEi then VCCi 1 else VCCi 0 endif if VCCi then result15 0 VR vs i 2 15 0 else result15 0 VR vt j 2 15 0 endif ACC i 15 0 result15 0 VR vd i 2 15 0 result15 0 VCOi 0 VCOi 8 0 VCEi 0 endfor ...

Page 251: ...Exceptions None ...

Page 252: ...CE are cleared on output and VCC is set with the results of the comparison the element which is greater than or equal The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VGE Greater Than or Equal Vector Select 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VGE 1 1 5 5 vd vs...

Page 253: ...lar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif if VR vs i 2 15 0 VR vt j 2 15 0 then VCCi 1 elseif VR vs i 2 15 0 VR vt j 2 15 0 and VCOi VCEi then VCCi 1 else VCCi 0 endif if VCCi then result15 0 VR vs i 2 15 0 else result15 0 VR vt j 2 15 0 endif ACC i 15 0 result15 0 VR vd i 2 15 0 result15 0 VCOi 0 VCOi 8 0 VCEi 0 endfor ...

Page 254: ...Exceptions None ...

Page 255: ...as input VCO and VCE are cleared on output and VCC is set with the results of the comparison the element which is less than The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VLT Less Than Vector Select 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VLT 1 1 5 5 vd vs 5 10 ...

Page 256: ...r half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif if VR vs i 2 15 0 VR vt j 2 15 0 then VCCi 1 elseif VR vs i 2 15 0 VR vt j 2 15 0 and VCOi and VCEi then VCCi 1 else VCCi 0 endif if VCCi then result15 0 VR vs i 2 15 0 else result15 0 VR vt j 2 15 0 endif ACC i 15 0 result15 0 VR vd i 2 15 0 result15 0 endfor VCO 0 VCE 0 ...

Page 257: ...Revision 1 0 257 Exceptions None ...

Page 258: ...6 of the accumulator. Bits 31..16 of the accumulator are clamped to 16-bit signed values and placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VMACF: Vector Multiply-Accumulate of Signed Fractions. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VMACF = 00100... (5..0)

Page 259: ...10 0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC47 16 ACC47 16 product30 0 0 VR vd i 2 15 0 Clamp_Signed ACC31 16 endfor ...
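A single-lane sketch of the VMACF multiply-accumulate may help here. It assumes a 48-bit signed accumulator, that the doubled product is added at bit 0, and that the clamp looks at everything above bit 16 to detect overflow. The bit indices in the excerpt are garbled, so this is one interpretation, and all function names are hypothetical.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def wrap_signed48(x):
    """Wrap an integer to a signed 48-bit accumulator value."""
    x &= (1 << 48) - 1
    return x - (1 << 48) if x & (1 << 47) else x


def clamp_signed16(x):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def vmacf_lane(acc, vs, vt):
    """One lane of a signed-fraction multiply-accumulate (assumed model).

    The signed product is doubled (product30..0 || 0 in the excerpt),
    added to the accumulator, and the value above bit 16 is clamped.
    """
    product = to_signed16(vs) * to_signed16(vt)
    acc = wrap_signed48(acc + (product << 1))
    vd = clamp_signed16(acc >> 16) & 0xFFFF
    return acc, vd


acc, vd = vmacf_lane(0, 0x4000, 0x4000)    # 0.5 * 0.5 in Q15
print(hex(vd))                             # 0x2000, i.e. 0.25
acc, vd = vmacf_lane(acc, 0x4000, 0x4000)  # accumulate another 0.25
print(hex(vd))                             # 0x4000, i.e. 0.5
```

Multiplying -1.0 by -1.0 (0x8000 by 0x8000) overflows the Q15 range, which is where the signed clamp matters: the lane saturates to 0x7FFF.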

Page 260: ...f ACC47 21 are zero or ACC21 is 1 Bits 32 17 of the accumulator are clamped to 16 bit signed values and placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below 1 Oddification is performed as described in the MPEG1 specification ISO IEC 11172 2 VMACQ Oddification Vector Accumulator 31 25 26 20 21 1...

Page 261: ... e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif if ACC47 0 0 and not ACC21 then ACC47 0 ACC47 0 026 1 021 else if ACC47 0 0 and not ACC21 then ACC47 0 ACC47 0 126 1 021 else ACC47 0 ACC47 0 048 endif VR vd i 2 15 0 Clamp_Signed ACC32 17 endfor ...

Page 262: ...of the accumulator Bits 31 16 of the accumulator are clamped to 16 bit unsigned values and placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VMACU of Unsigned Fractions Vector Multiply Accumulate 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VMACU 1 1 5 5 vd vs 5 10 6 11 6 VMACU 0 0 1 0...

Page 263: ...0 0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC47 16 ACC47 16 product30 0 0 VR vd i 2 15 0 Clamp_Unsigned ACC31 16 endfor ...

Page 264: ...ed for the high partial product multiplying an integer vs times an integer vt Bits 31 16 of the accumulator are clamped to 16 bit signed values and placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VMADH of High Partial Products Vector Multiply Accumulate 31 25 26 20 21 15 16 0 COP2 e vt 6 4...

Page 265: ...10 0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC31 0 ACC31 0 product31 16 016 VR vd i 2 15 0 Clamp_Signed ACC31 16 endfor ...
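The element-specifier decode that opens every one of these operation blocks can be sketched as a small function. The quarter, half, and whole cases follow the masks visible in the excerpt. The "vector operand" fallback (`j = i`) is an assumption, since that branch is cut off in the excerpts, and `select_element` is a hypothetical name.

```python
def select_element(e, i):
    """Return the vt element index j used for output lane i (0..7).

    e is the 4-bit element specifier from the instruction word.
    """
    e &= 0xF
    if (e & 0b1110) == 0b0010:      # scalar quarter of vector
        return (e & 0b0001) + (i & 0b1110)
    if (e & 0b1100) == 0b0100:      # scalar half of vector
        return (e & 0b0011) + (i & 0b1100)
    if (e & 0b1000) == 0b1000:      # scalar whole of vector
        return e & 0b0111
    return i                        # assumed: plain vector operand


for e in (0b0000, 0b0011, 0b0110, 0b1101):
    print(format(e, "04b"), [select_element(e, i) for i in range(8)])
```

Because the two masks in each branch never overlap, it does not matter whether the excerpt's stripped operator was an addition or a bitwise OR; both give the same index.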

Page 266: ...igned for the low partial product multiplying a fraction vs times a fraction vt Bits 15 0 of the accumulator are clamped to 16 bit signed values and placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VMADL of Low Partial Products Vector Multiply Accumulate 31 25 26 20 21 15 16 0 COP2 e vt 6 4...

Page 267: ...1110 0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC31 0 ACC31 0 product31 16 VR vd i 2 15 0 Clamp_Signed ACC15 0 endfor ...

Page 268: ...the mid partial product multiplying an integer vs times a fraction vt Bits 31 16 of the accumulator are clamped to 16 bit signed values and placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VMADM of Mid Partial Products Vector Multiply Accumulate 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 ...

Page 269: ...1110 0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC31 0 ACC31 0 product31 0 VR vd i 2 15 0 Clamp_Signed ACC31 16 endfor ...

Page 270: ...the mid partial product multiplying a fraction vs times an integer vt Bits 15 0 of the accumulator are clamped to 16 bit signed values and placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VMADN of Mid Partial Products Vector Multiply Accumulate 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0...

Page 271: ... 1110 0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC31 0 ACC31 0 product31 0 VR vd i 2 15 0 Clamp_Signed ACC15 0 endfor ...

Page 272: ...ister vt is moved to the scalar 16-bit element de of vector register vd. VMOV: Vector Element Scalar Move. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | de (15..11) | vd (10..6) | VMOV = 110011 (5..0). Operation: T: VR[vd][de]15..0 ← VR[vt][e]15..0; ACC15..0 ← VR[vt][e]15..0. Exceptions: None ...

Page 273: ... VCC for that element. The values of VCC, VCO, and VCE remain unchanged. The results are placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VMRG: Vector Select Merge. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VMRG = 100111 (5..0) ...

Page 274: ...n /* scalar quarter of vector */ j ← (e3..0 & 0001) + (i & 1110) elseif (e3..0 & 1100) = 0100 then /* scalar half of vector */ j ← (e3..0 & 0011) + (i & 1100) elseif (e3..0 & 1000) = 1000 then /* scalar whole of vector */ j ← (e3..0 & 0111) endif; if VCCi then result15..0 ← VR[vs][i*2]15..0 else result15..0 ← VR[vt][j*2]15..0 endif; VR[vd][i*2]15..0 ← result15..0; ACC15..0 ← result15..0 endfor ...

Page 275: ... designed for the high partial product, multiplying an integer vs times an integer vt. Bits 31..16 of the accumulator are clamped to 16-bit signed values and placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VMUDH: Vector Multiply of High Partial Products. 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0...

Page 276: ...0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC31 0 product31 16 016 VR vd i 2 15 0 Clamp_Signed ACC31 16 endfor ...

Page 277: ... is designed for the low partial product, multiplying a fraction vs times a fraction vt. Bits 15..0 of the accumulator are clamped to 16-bit signed values and placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VMUDL: Vector Multiply of Low Partial Products. 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0...

Page 278: ... then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC31 0 product31 16 product31 16 VR vd i 2 15 0 Clamp_Signed ACC15 0 endfor ...

Page 279: ...ed for the mid partial product, multiplying an integer vs times a fraction vt. Bits 31..16 of the accumulator are clamped to 16-bit signed values and placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VMUDM: Vector Multiply of Mid Partial Products. 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 ...

Page 280: ...10 0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC31 0 product31 0 VR vd i 2 15 0 Clamp_Signed ACC31 16 endfor ...

Page 281: ...ed for the mid partial product, multiplying a fraction vs times an integer vt. Bits 15..0 of the accumulator are clamped to 16-bit signed values and placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VMUDN: Vector Multiply of Mid Partial Products. 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0...

Page 282: ...10 0010 then scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC31 0 product31 0 VR vd i 2 15 0 Clamp_Signed ACC15 0 endfor ...

Page 283: ...oaded into the accumulator. Bits 31..16 of the accumulator are clamped to 16-bit signed values and placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VMULF: Vector Multiply of Signed Fractions. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VMULF = 000000 (5..0)...

Page 284: ...scalar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC47 16 product30 0 0 ACC47 0 ACC47 0 1 015 VR vd i 2 15 0 Clamp_Signed ACC31 16 endfor ...
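The rounding step in VMULF is easier to see with numbers. The sketch below assumes the doubled product sits at bit 0 of the accumulator, so that adding `1 || 0^15` (0x8000) rounds the value read from bits 31..16. The excerpt's indices are garbled, so treat this placement as an assumption. Function names are hypothetical.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def clamp_signed16(x):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def vmulf_lane(vs, vt):
    """One lane of a signed-fraction multiply with rounding (assumed model)."""
    product = to_signed16(vs) * to_signed16(vt)
    acc = (product << 1) + 0x8000          # double, then add the rounding bit
    vd = clamp_signed16(acc >> 16) & 0xFFFF
    return acc, vd


print(hex(vmulf_lane(0x4000, 0x4000)[1]))  # 0.5 * 0.5   -> 0x2000
print(hex(vmulf_lane(0x0001, 0x4000)[1]))  # rounds up   -> 0x1
print(hex(vmulf_lane(0xC000, 0x4000)[1]))  # -0.5 * 0.5  -> 0xe000
```

Without the 0x8000 term the second case would truncate to zero; with it, half of the smallest Q15 step rounds up to one step.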

Page 285: ... rounded if the product is negative otherwise zero is added Bits 32 17 of the accumulator are clamped to 16 bit signed values and AND d with 0xFFF0 producing a result from 2048 to 2047 aligned to the short MSB writing the results into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VMULQ MPEG Quantiza...

Page 286: ...j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 if product31 0 0 then ACC47 16 product15 0 010 1 05 else ACC47 16 product15 0 endif VR vd i 2 15 0 Clamp_Signed ACC32 17 and 112 04 endfor ...
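The VMULQ result path (round negative products, take bits 32..17, clamp, then mask with 0xFFF0) can be sketched for one lane. This follows the visible pieces of the excerpt: the rounding constant `0^10 || 1^5` is read as 31 and the mask `1^12 || 0^4` as 0xFFF0. Which product bits feed the accumulator is garbled in the source, so using the full 32-bit product is an assumption, and `vmulq_lane` is a hypothetical name.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def clamp_signed16(x):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def vmulq_lane(vs, vt):
    """One lane of the MPEG quantization multiply (assumed model).

    Returns the value written to accumulator bits 47..16 and the 16-bit
    result, which is that value shifted right by one (bits 32..17),
    clamped, and ANDed with 0xFFF0.
    """
    product = to_signed16(vs) * to_signed16(vt)
    if product < 0:
        product += 0x1F                    # round negative products
    vd = clamp_signed16(product >> 1) & 0xFFF0
    return product, vd


print(hex(vmulq_lane(100, 100)[1]))
print(hex(vmulq_lane(0xFF9C, 100)[1]))     # 0xFF9C is -100
```

Clearing the low four bits is what yields the described range of -2048 to 2047 aligned to the MSB of the short.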

Page 287: ...ded into the accumulator Bits 31 16 of the accumulator are clamped to 16 bit unsigned values and placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VMULU of Unsigned Fractions Vector Multiply 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VMULU 1 1 5 5 vd vs 5 10 6 11 6 VMULU 0 0 0 0 0 1 ...

Page 288: ...calar quarter of vector j e3 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif product31 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC47 16 product30 0 0 ACC47 0 ACC47 0 1 015 VR vd i 2 15 0 Clamp_Unsigned ACC31 16 endfor ...

Page 289: ...with the elements of vector register vs. The results are placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VNAND: Vector NAND of Short Elements. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VNAND = 101001 (5..0) ...
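The instruction word layout listed for VNAND (6-bit COP2 major opcode 010010, a set bit 25, a 4-bit e field, three 5-bit register fields, and a 6-bit function code) can be packed and unpacked with plain shifts. The helper names below are hypothetical and this is a sketch, not the assembler's own encoder.

```python
COP2 = 0b010010          # major opcode shared by the vector unit instructions
VNAND = 0b101001         # function code from the VNAND encoding above


def encode_vu(funct, vd, vs, vt, e=0):
    """Pack a vector computational instruction into a 32-bit word."""
    return ((COP2 << 26) | (1 << 25) | ((e & 0xF) << 21) |
            ((vt & 0x1F) << 16) | ((vs & 0x1F) << 11) |
            ((vd & 0x1F) << 6) | (funct & 0x3F))


def decode_vu(word):
    """Split a 32-bit word back into its fields."""
    return {
        "e": (word >> 21) & 0xF,
        "vt": (word >> 16) & 0x1F,
        "vs": (word >> 11) & 0x1F,
        "vd": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


word = encode_vu(VNAND, vd=1, vs=2, vt=3)
print(hex(word))         # 0x4a031069
print(decode_vu(word))
```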

Page 290: ...1110) = 0010 then /* scalar quarter of vector */ j ← (e3..0 & 0001) + (i & 1110) elseif (e3..0 & 1100) = 0100 then /* scalar half of vector */ j ← (e3..0 & 0011) + (i & 1100) elseif (e3..0 & 1000) = 1000 then /* scalar whole of vector */ j ← (e3..0 & 0111) endif; result15..0 ← VR[vs][i*2]15..0 nand VR[vt][j*2]15..0; ACC[i]15..0 ← result15..0; VR[vd][i*2]15..0 ← result15..0 endfor ...

Page 291: ...as input VCO and VCE are cleared on output and VCC is set with the results of the comparison the element which is not equal The results are placed into vector register vd If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VNE Not Equal Vector Select 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VNE 1 1 5 5 vd vs 5 10 ...

Page 292: ... 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif if VR vs i 2 15 0 VR vt j 2 15 0 then VCCi 1 elseif VR vs i 2 15 0 VR vt j 2 15 0 then VCCi 1 elseif VR vs i 2 15 0 VR vt j 2 15 0 and VCEi then VCCi 1 else VCCi 0 endif if VCCi then result15 0 VR vs i 2 15 0 else result15 0 VR vt j 2 15 0 endif VR vd i 2 15 0 result15 0 ACC i 15 0 result15 0 VCOi 0 VCEi 0 endfor ...

Page 293: ...Revision 1 0 293 Exceptions None ...

Page 294: ...al RSP state It is useful for program instruction padding or insertion into branch delay slots when no useful work can be done The Operation Exceptions None VNOP Null Instruction Vector 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VNOP 1 1 5 5 vd vs 5 10 6 11 6 VNOP 1 1 0 1 1 1 24 T nothing happens ...

Page 295: ... with the elements of vector register vs. The results are placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VNOR: Vector NOR of Short Elements. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VNOR = 101011 (5..0) ...

Page 296: ... 1110) = 0010 then /* scalar quarter of vector */ j ← (e3..0 & 0001) + (i & 1110) elseif (e3..0 & 1100) = 0100 then /* scalar half of vector */ j ← (e3..0 & 0011) + (i & 1100) elseif (e3..0 & 1000) = 1000 then /* scalar whole of vector */ j ← (e3..0 & 0111) endif; result15..0 ← VR[vs][i*2]15..0 nor VR[vt][j*2]15..0; ACC[i]15..0 ← result15..0; VR[vd][i*2]15..0 ← result15..0 endfor ...

Page 297: ...with the elements of vector register vs. The results are placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VNXOR: Vector NXOR of Short Elements. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VNXOR = 101101 (5..0) ...

Page 298: ...1110) = 0010 then /* scalar quarter of vector */ j ← (e3..0 & 0001) + (i & 1110) elseif (e3..0 & 1100) = 0100 then /* scalar half of vector */ j ← (e3..0 & 0011) + (i & 1100) elseif (e3..0 & 1000) = 1000 then /* scalar whole of vector */ j ← (e3..0 & 0111) endif; result15..0 ← VR[vs][i*2]15..0 nxor VR[vt][j*2]15..0; ACC[i]15..0 ← result15..0; VR[vd][i*2]15..0 ← result15..0 endfor ...
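The negated logical operations (VNAND, VNOR, VNXOR) all reduce to a bitwise operation followed by a 16-bit complement. A short sketch follows, with hypothetical helper names and with element selection left out so that lane i of vs pairs with lane i of vt.

```python
MASK16 = 0xFFFF


def nand16(a, b):
    """16-bit NAND of two short elements."""
    return ~(a & b) & MASK16


def nor16(a, b):
    """16-bit NOR of two short elements."""
    return ~(a | b) & MASK16


def nxor16(a, b):
    """16-bit NXOR (bitwise equivalence) of two short elements."""
    return ~(a ^ b) & MASK16


def lanewise(op, vs, vt):
    """Apply op to each pair of lanes; the result goes to both vd and ACC."""
    return [op(a, b) for a, b in zip(vs, vt)]


print([hex(x) for x in lanewise(nxor16, [0xAAAA, 0xAAAA], [0xAAAA, 0x5555])])
```

The mask is needed because Python integers are unbounded, so a bare `~` would produce a negative number rather than a 16-bit pattern.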

Page 299: ... with the elements of vector register vs. The results are placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VOR: Vector OR of Short Elements. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VOR = 101010 (5..0) ...

Page 300: ... 1110) = 0010 then /* scalar quarter of vector */ j ← (e3..0 & 0001) + (i & 1110) elseif (e3..0 & 1100) = 0100 then /* scalar half of vector */ j ← (e3..0 & 0011) + (i & 1100) elseif (e3..0 & 1000) = 1000 then /* scalar whole of vector */ j ← (e3..0 & 0111) endif; result15..0 ← VR[vs][i*2]15..0 or VR[vt][j*2]15..0; ACC[i]15..0 ← result15..0; VR[vd][i*2]15..0 ← result15..0 endfor ...

Page 301: ...alar 16 bit element de of vector register vd Operation VRCP Reciprocal Single Precision Vector Element Scalar 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VRCP 1 1 5 5 vd de 5 10 6 11 6 VRCP 1 1 0 0 0 0 24 T if VR vt e 15 0 0 then DivIn31 0 016 VR vt e 15 0 else DivIn31 0 016 VR vt e 15 0 endif lshift 0 i 0 while i 32 and found if DivIni 1 lshift 0 found 1 endif i i 1 endwhile ...

Page 302: ... addr15 0 result31 0 0 1 romData15 0 014 rshift lshift and 15 result31 0 0rshift result31 32 rshift if VR vt e 15 0 0 then result31 0 result31 0 endif if VR vt e 15 0 0 then result31 0 0 131 DivOut31 0 result31 0 internal register used by vrcph for i in 0 7 ACC i 15 0 VR vt e 15 0 endfor VR vd de 2 15 0 DivOut15 0 ...

Page 303: ...ter vt is loaded as the upper 16 bits for a pending double precision reciprocal operation Operation Exceptions None VRCPH Reciprocal Double Prec High Vector Element Scalar 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VRCPH 1 1 5 5 vd de 5 10 6 11 6 VRCPH 1 1 0 0 1 0 24 T DivIn31 0 VR vt e 15 0 016 for i in 0 7 ACC i 15 0 VR vt e 15 0 endfor VR vd de 2 15 0 DivOut31 16 internal register set b...

Page 304: ...is calculated and the lower 16 bits are stored in the scalar 16 bit element de of vector register vd Operation VRCPL Reciprocal Double Prec Low Vector Element Scalar 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VRCPL 1 1 5 5 vd de 5 10 6 11 6 VRCPL 1 1 0 0 0 1 24 T DivIn31 0 DivIn31 16 VR vt e 15 0 lshift 0 i 0 while i 32 and found if DivIni 1 lshift 0 found 1 endif i i 1 endwhile if DivIn31...

Page 305: ...ult31 0 0 1 romData15 0 014 rshift lshift and 15 result31 0 0rshift result31 32 rshift if VR vt e 15 0 0 then result31 0 result31 0 endif if VR vt e 15 0 0 then result31 0 0 131 DivOut31 0 result31 0 internal register used by vrcph for i in 0 7 ACC i 15 0 VR vt e 15 0 endfor VR vd de 2 15 0 DivOut15 0 ...

Page 306: ...ut the vs instruction field bits and conditionally added to the accumulator If the accumulator is negative vt is added otherwise zero is added If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VRNDN DCT Rounding Negative Vector Accumulator 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VRNDN 1 1 5 5 vd vs 5 10 6 11 6 ...

Page 307: ...seif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif if vs and 1 then product31 0 VR vt i 2 15 0 016 else product31 0 VR vt i 2 15 16 VR vt i 2 15 0 endif if ACC47 0 0 then ACC47 0 ACC47 0 product31 16 product31 0 else ACC47 0 ACC47 0 048 endif VR vd i 2 15 0 Clamp_Signed ACC31 16 endfor ...

Page 308: ...ut the vs instruction field bits and conditionally added to the accumulator If the accumulator is positive vt is added otherwise zero is added If an element specification e is present for vector register vt the selected scalar element s of vt is used as described below VRNDP DCT Rounding Positive Vector Accumulator 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VRNDP 1 1 5 5 vd vs 5 10 6 11 6 ...

Page 309: ...seif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif if vs and 1 then product31 0 VR vt i 2 15 0 016 else product31 0 VR vt i 2 15 16 VR vt i 2 15 0 endif if ACC47 0 0 then ACC47 0 ACC47 0 product31 16 product31 0 else ACC47 0 ACC47 0 048 endif VR vd i 2 15 0 Clamp_Signed ACC31 16 endfor ...

Page 310: ...in the scalar 16 bit element de of vector register vd Operation VRSQ SQRT Reciprocal Vector Element Scalar 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VRSQ 1 1 5 5 vd de 5 10 6 11 6 VRSQ 1 1 0 1 0 0 24 T if VR vt e 15 0 0 then DivIn31 0 016 VR vt e 15 0 else DivIn31 0 016 VR vt e 15 0 endif lshift 0 i 0 while i 32 and found if DivIni 1 lshift 0 found 1 endif i i 1 endwhile ...

Page 311: ...or lshift mod 2 romData15 0 rsqRom addr15 0 result31 0 0 1 romData15 0 014 rshift lshift and 15 2 result31 0 0rshift result31 32 rshift if VR vt e 15 0 0 then result31 0 result31 0 endif if VR vt e 15 0 0 then result31 0 0 131 DivOut31 0 result31 0 internal register used by vrsqh for i in 0 7 ACC i 15 0 VR vt e 15 0 endfor VR vd de 2 15 0 DivOut15 0 ...

Page 312: ...t is loaded as the upper 16 bits for a pending double precision reciprocal of a square root operation Operation Exceptions None VRSQH Reciprocal Double Prec High Vector Element Scalar SQRT 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VRSQH 1 1 5 5 vd de 5 10 6 11 6 VRSQH 1 1 0 1 1 0 24 T DivIn31 0 VR vt e 15 0 016 for i in 0 7 ACC i 15 0 VR vt e 15 0 endfor VR vd de 2 15 0 DivOut31 16 intern...

Page 313: ...root reciprocal is calculated and the lower 16 bits are stored in the scalar 16 bit element de of vector register vd Operation VRSQL Reciprocal Double Prec Low Vector Element Scalar SQRT 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VRSQL 1 1 5 5 vd de 5 10 6 11 6 VRSQL 1 1 0 1 0 1 24 T DivIn31 0 DivIn31 16 VR vt e 15 0 lshift 0 i 0 while i 32 and found if DivIni 1 lshift 0 found 1 endif i i ...

Page 314: ...0 rsqRom addr15 0 result31 0 0 1 romData15 0 014 rshift lshift and 15 2 result31 0 0rshift result31 32 rshift if VR vt e 15 0 0 then result31 0 result31 0 endif if VR vt e 15 0 0 then result31 0 0 131 DivOut31 0 result31 0 internal register used by vrsqh for i in 0 7 ACC i 15 0 VR vt e 15 0 endfor VR vd de 2 15 0 DivOut15 0 ...

Page 315: ...me portion of the accumulator Operation VSAR Read and Write Vector Accumulator 31 25 26 20 21 15 16 0 COP2 e vt 6 4 5 0 1 0 0 1 0 VSAR 1 1 5 5 vd vs 5 10 6 11 6 VSAR 0 1 1 1 0 1 24 T for i in 0 7 if e 0 then VR vd i 2 15 0 ACC i 47 32 ACC i 47 32 VR vs i 2 15 0 else if e 1 then VR vd i 2 15 0 ACC i 31 16 ACC i 31 16 VR vs i 2 15 0 else if e 2 then VR vd i 2 15 0 ACC i 15 0 ACC i 15 0 VR vs i 2 15 ...

Page 316: ...316 Exceptions None ...

Page 317: ...rol register VCO is used as borrow in, and VCO is cleared. The results are clamped to 16-bit signed values and placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VSUB: Vector Subtraction of Short Elements. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VSUB = 01 ... (5..0)

Page 318: ...en /* scalar quarter of vector */ j ← (e3..0 & 0001) + (i & 1110) elseif (e3..0 & 1100) = 0100 then /* scalar half of vector */ j ← (e3..0 & 0011) + (i & 1100) elseif (e3..0 & 1000) = 1000 then /* scalar whole of vector */ j ← (e3..0 & 0111) endif; result15..0 ← VR[vs][i*2]15..0 - VR[vt][j*2]15..0 - VCOi; ACC[i]15..0 ← result15..0; VR[vd][i*2]15..0 ← Clamp_Signed(result15..0) endfor; VCO15..0 ← 0^16 ...
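The difference between what VSUB writes to the accumulator and what it writes to vd shows up only on overflow. The single-lane sketch below assumes signed 16-bit operands and a one-bit borrow-in taken from VCO; `vsub_lane` is a hypothetical name.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def clamp_signed16(x):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def vsub_lane(vs, vt, borrow_in):
    """One lane of a clamped subtract with borrow-in (assumed model).

    Returns (acc_low, vd): the accumulator keeps the wrapped low 16 bits,
    while vd receives the saturated result.
    """
    result = to_signed16(vs) - to_signed16(vt) - borrow_in
    acc_low = result & 0xFFFF
    vd = clamp_signed16(result) & 0xFFFF
    return acc_low, vd


print([hex(x) for x in vsub_lane(0x8000, 0x0001, 0)])  # ['0x7fff', '0x8000']
print([hex(x) for x in vsub_lane(0x0005, 0x0003, 1)])  # ['0x1', '0x1']
```

In the first call, -32768 minus 1 wraps to 0x7FFF in the accumulator but saturates to 0x8000 in vd.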

Page 319: ...or control register VCO is used as borrow out. The results are not clamped. The results are placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VSUBC: Vector Subtraction of Short Elements With Carry. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VSUBC = 01010... (5..0)

Page 320: ... 0 0001 i 1110 elseif e3 0 1100 0100 then scalar half of vector j e3 0 0011 i 1100 elseif e3 0 1000 1000 then scalar whole of vector j e3 0 0111 endif result16 0 VR vs i 2 15 0 VR vt j 2 15 0 ACC i 15 0 result15 0 VR vd i 2 15 0 result15 0 if result16 0 0 then VCOi 1 VCOi 8 1 else if result16 0 0 then VCOi 0 VCOi 8 1 else VCOi 0 VCOi 8 0 endif endfor ...
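The way VSUBC fills the two halves of VCO can be illustrated for one lane. The comparison operators are missing from the excerpt, so the sketch assumes `result < 0` sets both the borrow bit (VCOi) and the not-equal bit (VCOi+8), `result > 0` sets only the not-equal bit, and zero clears both. Treating the operands as unsigned, which the 17-bit `result16..0` suggests, is also an assumption.

```python
def vsubc_lane(vs, vt):
    """One lane of an unclamped subtract that produces borrow-out flags.

    Returns (vd, borrow, not_equal), where borrow models VCOi and
    not_equal models VCOi+8.
    """
    result = (vs & 0xFFFF) - (vt & 0xFFFF)   # 17-bit signed difference
    vd = result & 0xFFFF                     # low 16 bits, not clamped
    if result < 0:
        borrow, not_equal = 1, 1
    elif result > 0:
        borrow, not_equal = 0, 1
    else:
        borrow, not_equal = 0, 0
    return vd, borrow, not_equal


print(vsubc_lane(3, 5))   # (65534, 1, 1)
print(vsubc_lane(5, 3))   # (2, 0, 1)
print(vsubc_lane(7, 7))   # (0, 0, 0)
```

A following VSUB on the upper halves would consume the borrow bit as its borrow-in, which is how a double-precision subtract is chained.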

Page 321: ... with the elements of vector register vs. The results are placed into vector register vd. If an element specification e is present for vector register vt, the selected scalar element(s) of vt are used, as described below. VXOR: Vector XOR of Short Elements. Encoding: COP2 = 010010 (bits 31..26) | 1 (bit 25) | e (24..21) | vt (20..16) | vs (15..11) | vd (10..6) | VXOR = 101100 (5..0) ...

Page 322: ... 1110) = 0010 then /* scalar quarter of vector */ j ← (e3..0 & 0001) + (i & 1110) elseif (e3..0 & 1100) = 0100 then /* scalar half of vector */ j ← (e3..0 & 0011) + (i & 1100) elseif (e3..0 & 1000) = 1000 then /* scalar whole of vector */ j ← (e3..0 & 0111) endif; result15..0 ← VR[vs][i*2]15..0 xor VR[vt][j*2]15..0; ACC[i]15..0 ← result15..0; VR[vd][i*2]15..0 ← result15..0 endfor ...

Page 323: ... the contents of general register rt in a bit-wise logical exclusive OR operation. The result is placed into general register rd. XOR: Exclusive Or. Encoding: SPECIAL = 000000 (bits 31..26) | rs (25..21) | rt (20..16) | rd (15..11) | 00000 (10..6) | XOR = 100110 (5..0). Operation: T: GPR[rd] ← GPR[rs] xor GPR[rt]. Exceptions: None ...

Page 324: ...ned with the contents of general register rs in a bit-wise logical exclusive OR operation. The result is placed into general register rt. XORI: Exclusive OR Immediate. Encoding: XORI = 001110 (bits 31..26) | rs (25..21) | rt (20..16) | immediate (15..0). Operation: T: GPR[rt] ← GPR[rs] xor (0^16 || immediate). Exceptions: None ...
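The XORI operation zero-extends its 16-bit immediate before the exclusive OR (the `0^16 || immediate` term), so it can never flip the upper half of a register. A minimal sketch, with `xori` as a hypothetical helper operating on 32-bit register values:

```python
def xori(rs_value, immediate):
    """Model GPR[rt] = GPR[rs] xor (zero-extended 16-bit immediate)."""
    zero_extended = immediate & 0xFFFF       # upper 16 bits are zero
    return (rs_value ^ zero_extended) & 0xFFFFFFFF


print(hex(xori(0xFFFF0000, 0xFFFF)))   # 0xffffffff
print(hex(xori(0xFFFFFFFF, 0x8000)))   # 0xffff7fff
```

The second call shows the difference from a sign-extended immediate, which would have cleared the upper 16 bits as well.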

Page 325: ...22 108 110 122 108 108 110 122 123 108 108 align 119 127 136 bound 119 127 136 byte 111 119 dat 20 126 127 data 109 119 136 dbg 20 dmax 119 end 119 ent 119 half 111 119 136 lst 20 name 112 119 print 108 119 120 127 space 120 sym 20 symbol 107 111 120 127 text 109 120 unname 120 word 111 120 136 108 110 122 108 108 109 108 108 110 122 108 110 122 108 108 110 122 _ 107 __osSpDeviceBusy 143 __osSpGet...

Page 326: ...9 BCzF 28 BCzT 28 beq 121 162 BEQL 28 bgez 121 163 bgezal 121 164 168 BGEZALL 28 BGEZL 28 bgtz 121 165 BGTZALL 28 BGTZL 28 big endian 32 34 bitwise and 110 bitwise exclusive or 110 bitwise or 110 blez 121 166 BLEZL 28 bltz 121 167 bltzal 121 BLTZALL 28 BLTZL 28 bne 121 169 BNEL 28 BNF 119 BNF Specification of the RSP Assembly Language 119 borrow in 37 branch 132 branch target 43 break 28 46 122 17...

Page 327: ...97 DMA FULL 96 DMA LENGTH 97 DMA READ length 82 DMA setup 96 DMA transfer 84 135 141 143 DMA WRITE length 82 DMA_BUSY 82 88 91 96 DMA_CACHE 82 DMA_DRAM 82 DMA_FULL 82 88 DMA_READ_LENGTH 82 DMA_WRITE_LENGTH 82 DMEM 24 30 48 95 109 126 DMULT 28 DMULTU 28 Doherty Mary Jo 43 double precision add 37 double precision compare 37 71 double precision multiply 63 double precision reciprocal 79 DPC_SET_XBUS_...

Page 328: ...ommittee 62 JISC 62 jr 121 176 J type instruction 40 jump tables 126 K keywords 109 L label 109 119 127 labels 109 lb 121 177 lbu 121 178 lbv 49 122 179 LD 27 LDC1 27 LDC2 27 LDL 27 LDR 27 ldv 49 122 180 lfv 49 52 122 181 lh 121 183 lhu 121 184 lhv 49 52 122 185 linker 21 136 linking RSP objects 20 listing 20 LL 27 LLD 27 llv 49 122 187 load delay 56 load delay slot 48 loop inversion 131 loop unro...

Page 329: ...ntheses 111 Patterson D 16 PC 29 95 PIPE_BUSY 91 pipeline delay 130 134 pipeline depth 27 pipeline stall 39 43 44 plus unary 110 precedence assembler expressions 111 profiling 133 program 119 program sections RSP 109 programmed IO 144 pseudo opcode 106 Q q 113 quad 50 quarters 58 R R4000 25 R4000 instruction set 27 40 105 Rambus 135 RCP 24 rcp h 31 144 148 RD pipeline stage 41 RDP clock counter 82...

Page 330: ... 7 85 SIMD 16 23 128 129 130 single issue 43 single step 85 slave processor 27 45 sll 121 211 sllv 121 212 slt 121 213 slti 28 121 214 215 sltiu 28 121 214 215 sltu 121 216 slv 49 122 217 software pipelining 130 SP_RESERVED 82 SP_SET_YIELD 148 SP_STATUS 82 SP_STATUS_BROKE 170 SP_STATUS_INTR_BREAK 170 SP_UCODE_DATA_SIZE 126 SP_YIELDED 148 sptask h 142 spv 49 52 122 218 square root 76 sqv 49 122 219...

Page 331: ...ctor select 37 73 vector slice 34 vector unit 26 34 vectorization 128 veq 37 70 122 249 vge 37 70 122 252 vlt 37 70 122 255 vmacf 61 122 258 vmacq 61 62 122 260 vmacu 61 122 262 vmadh 62 122 264 vmadl 61 122 266 vmadm 61 122 268 vmadn 61 122 270 vmov 75 76 122 272 vmrg 37 70 122 273 vmudh 62 63 122 275 vmudl 61 63 122 277 vmudm 61 63 122 279 vmudn 61 63 122 281 vmulf 61 62 63 122 283 vmulq 61 62 1...
