Texas Instruments TMS320C67 DSP Series Reference Manual

Pipeline Operation Overview

TMS320C62x Pipeline

The pipeline operation is based on CPU cycles. A CPU cycle is the period
during which a particular execute packet is in a particular pipeline phase.
CPU cycle boundaries always occur at clock cycle boundaries.
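
As a rough illustration of the definition above, the following Python sketch advances packets by one pipeline phase per CPU cycle. The phase names (PG, PS, PW, PR, DP, DC, E1) are the ones used in this chapter. The function name `pipeline_snapshot`, the numeric packet labels, and the assumptions that every fetch packet holds exactly one execute packet and that no stalls occur are illustrative simplifications, not part of the manual.

```python
# Pipeline phases in the order a packet passes through them:
# four fetch phases, two decode phases, and the first execute phase.
PHASES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1"]


def pipeline_snapshot(cycle, num_packets):
    """Return {phase: packet_index} for one CPU cycle (0-based).

    Simplifying assumptions (hypothetical, for illustration only):
    packet k enters PG on CPU cycle k and advances exactly one phase
    per CPU cycle, i.e. no stalls and one execute packet per fetch packet.
    """
    snapshot = {}
    for stage, phase in enumerate(PHASES):
        packet = cycle - stage  # packet that has reached this phase
        if 0 <= packet < num_packets:
            snapshot[phase] = packet
    return snapshot


if __name__ == "__main__":
    # Print which packet occupies each phase as the pipeline fills.
    for cycle in range(9):
        snap = pipeline_snapshot(cycle, num_packets=10)
        row = " ".join(f"{p}:{snap.get(p, '-')}" for p in PHASES)
        print(f"cycle {cycle}: {row}")
```

From CPU cycle 6 onward, every phase in this model holds a different packet, which corresponds to the "full pipeline" situation described next.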

As code flows through the pipeline phases, it is processed by different parts
of the ’C62x. Figure 5–7 shows a full pipeline with a fetch packet in every
phase of fetch. One execute packet of eight instructions is being dispatched
at the same time that a 7-instruction execute packet is in decode. The arrows
between DP and DC correspond to the functional units identified in the code
in Example 5–1.

Example 5–1. Execute Packet in Figure 5–7

        SADD    .L1     A2,A7,A2        ; E1 Phase
||      SADD    .L2     B2,B7,B2
||      SMPYH   .M2X    B3,A3,B2
||      SMPY    .M1X    B3,A3,A2
||      B       .S1     LOOP1
||      MVK     .S2     117,B1

        LDW     .D2     *B4++,B3        ; DC Phase
||      LDW     .D1     *A4++,A3
||      MV      .L2X    A1,B0
||      SMPYH   .M1     A2,A2,A0
||      SMPYH   .M2     B2,B2,B10
||      SHR     .S1     A2,16,A5
||      SHR     .S2     B2,16,B5

LOOP1:
        STH     .D1     A5,*A8++[2]     ; DP, PW, and PG Phases
||      STH     .D2     B5,*B8++[2]
||      SADD    .L1     A2,A7,A2
||      SADD    .L2     B2,B7,B2
||      SMPYH   .M2X    B3,A3,B2
||      SMPY    .M1X    B3,A3,A2
|| [B1] B       .S1     LOOP1
|| [B1] SUB     .S2     B1,1,B1

        LDW     .D2     *B4++,B3        ; PR and PS Phases
||      LDW     .D1     *A4++,A3
||      SADD    .L1     A0,A1,A1
||      SADD    .L2     B10,B0,B0
||      SMPYH   .M1     A2,A2,A0
||      SMPYH   .M2     B2,B2,B10
||      SHR     .S1     A2,16,A5
||      SHR     .S2     B2,16,B5
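
The packet sizes mentioned in the text can be checked against the listing. The Python sketch below is a hypothetical helper, not a TI tool: it groups the instructions of Example 5–1 into execute packets using the `||` parallel marker and records the functional unit each instruction names. The names `LISTING`, `UNIT_RE`, and `execute_packets` are illustrative, and the parsing rules (a line without `||` starts a new execute packet, labels and comments are ignored) are assumptions that suffice for this listing only.

```python
import re

# Example 5-1, with comments removed; "||" marks parallel execution.
LISTING = """
   SADD .L1 A2,A7,A2
|| SADD .L2 B2,B7,B2
|| SMPYH .M2X B3,A3,B2
|| SMPY .M1X B3,A3,A2
|| B .S1 LOOP1
|| MVK .S2 117,B1
   LDW .D2 *B4++,B3
|| LDW .D1 *A4++,A3
|| MV .L2X A1,B0
|| SMPYH .M1 A2,A2,A0
|| SMPYH .M2 B2,B2,B10
|| SHR .S1 A2,16,A5
|| SHR .S2 B2,16,B5
LOOP1:
   STH .D1 A5,*A8++[2]
|| STH .D2 B5,*B8++[2]
|| SADD .L1 A2,A7,A2
|| SADD .L2 B2,B7,B2
|| SMPYH .M2X B3,A3,B2
|| SMPY .M1X B3,A3,A2
|| [B1] B .S1 LOOP1
|| [B1] SUB .S2 B1,1,B1
   LDW .D2 *B4++,B3
|| LDW .D1 *A4++,A3
|| SADD .L1 A0,A1,A1
|| SADD .L2 B10,B0,B0
|| SMPYH .M1 A2,A2,A0
|| SMPYH .M2 B2,B2,B10
|| SHR .S1 A2,16,A5
|| SHR .S2 B2,16,B5
"""

# Matches a functional unit such as .L1 or .M2X (X = cross path).
UNIT_RE = re.compile(r"\.([LSMD][12])X?\b")


def execute_packets(listing):
    """Group instructions into execute packets; return a list of unit lists."""
    packets = []
    for raw in listing.splitlines():
        line = raw.split(";")[0].strip()  # drop any trailing comment
        if not line or line.endswith(":"):  # skip blank lines and labels
            continue
        match = UNIT_RE.search(line)
        if match is None:  # no unit named (not the case in this listing)
            continue
        if line.startswith("||") and packets:
            packets[-1].append(match.group(1))  # parallel with previous
        else:
            packets.append([match.group(1)])  # starts a new execute packet
    return packets


if __name__ == "__main__":
    for units in execute_packets(LISTING):
        print(len(units), sorted(units))
```

Under these assumptions the listing splits into four execute packets of 6, 7, 8, and 8 instructions. The 7-instruction packet is the one marked as being in DC and the first 8-instruction packet is the one in DP, which agrees with the description of Figure 5–7. No functional unit appears twice within a packet.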

Source: TMS320C6000 CPU and Instruction Set Reference Guide, Literature Number SPRU189D, March 1999.

Page 2: ...ONDUCTOR PRODUCTS MAY INVOLVE POTENTIAL RISKS OF DEATH PERSONAL INJURY OR SEVERE PROPERTY OR ENVIRONMENTAL DAMAGE CRITICAL APPLICATIONS TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED AUTHORIZED OR WARRANT...

Page 3: ...al as a reference for the architecture of the TMS320C6000 CPU First time readers should read Chapter 1 for general information about TI DSPs the features of the C6000 and the applications for which th...

Page 4: ...S320C67x Floating Point Instruction Set Chapter 5 TMS320C62x Pipeline Chapter 6 TMS320C67x Pipeline General purpose register files Chapter 2 CPU Data Paths and Control Instruction set Chapter 3 TMS320...

Page 5: ...ypes as defined in Chapter 3 TMS320C62x C67x Fixed Point Instruction Set Although the instruction mnemonic MPY in this example is in capital let ters the C6x assembler is not case sensitive it can ass...

Page 6: ...ed point DSP and provides pinouts electrical specifications and timings for the de vice TMS320C6701 Digital Signal Processor Data Sheet literature number SPRS067 describes the features of the TMS320C6...

Page 7: ...de literature number SPRU052 alphabetically lists over 100 third parties that provide various products that serve the family of TMS320 digital signal processors A myriad of products and applications a...

Page 8: ...22 25 40 Europe Customer Training Helpline Fax 49 81 61 80 40 10 Asia Pacific Literature Response Center 852 2 956 7288 Fax 852 2 956 2200 Hong Kong DSP Hotline 852 2 956 7268 Fax 852 2 956 1002 Korea...

Page 9: ...Architecture 1 7 1 4 1 Central Processing Unit CPU 1 8 1 4 2 Internal Memory 1 8 1 4 3 Peripherals 1 9 2 CPU Data Paths and Control 2 1 Summarizes the TMS320C62x C67x architecture and describes the p...

Page 10: ...s on Register Reads 3 19 3 7 6 Constraints on Register Writes 3 19 3 8 Addressing Modes 3 21 3 8 1 Linear Addressing Mode 3 21 3 8 2 Circular Addressing Mode 3 21 3 8 3 Syntax for Load Store Address G...

Page 11: ...struction Types 6 13 6 3 Functional Unit Hazards 6 20 6 3 1 S Unit Hazards 6 21 6 3 2 M Unit Hazards 6 25 6 3 3 L Unit Hazards 6 30 6 3 4 D Unit Instruction Hazards 6 34 6 3 5 Single Cycle Instruction...

Page 12: ...pts Interrupt Flag Set and Clear Registers IFR ISR ICR 7 14 7 3 3 Returning From Interrupt Servicing 7 16 7 4 Interrupt Detection and Processing 7 18 7 4 1 Setting the Nonreset Interrupt Flag 7 18 7 4...

Page 13: ...9 5 1 Fixed Point Pipeline Stages 5 2 5 2 Fetch Phases of the Pipeline 5 3 5 3 Decode Phases of the Pipeline 5 4 5 4 Execute Phases of the Pipeline and Functional Block Diagram of the TMS320C62x 5 5...

Page 14: ...on Block Diagram 6 43 6 16 Branch Instruction Phases 6 44 6 17 Branch Execution Block Diagram 6 45 6 18 2 Cycle DP Instruction Phases 6 46 6 19 4 Cycle Instruction Phases 6 47 6 20 INTDP Instruction P...

Page 15: ...Return Pointer NRP 7 16 7 11 Interrupt Return Pointer IRP 7 17 7 12 TMS320C62x Nonreset Interrupt Detection and Processing Pipeline Operation 7 19 7 13 TMS320C67x Nonreset Interrupt Detection and Pro...

Page 16: ...Unit Latency Summary 3 12 3 6 Registers That Can Be Tested by Conditional Operations 3 16 3 7 Indirect Address Generation for Load Store 3 23 3 8 Relationships Between Operands Operand Size Signed Un...

Page 17: ...Multiply M Unit Instruction Hazards 6 25 6 8 4 Cycle M Unit Instruction Hazards 6 26 6 9 MPYI M Unit Instruction Hazards 6 27 6 10 MPYID M Unit Instruction Hazards 6 28 6 11 MPYDP M Unit Instruction H...

Page 18: ...viii 7 1 Interrupt Priorities 7 3 7 2 Interrupt Service Table Pointer ISTP Field Descriptions 7 8 7 3 Interrupt Control Registers 7 10 7 4 Control Status Register CSR Interrupt Control Field Descripti...

Page 19: ...ble Interrupts Globally 7 12 7 3 Code Sequence to Enable Maskable Interrupts Globally 7 12 7 4 Code Sequence to Enable an Individual Interrupt INT9 7 14 7 5 Code Sequence to Disable an Individual Inte...

Page 20: ...sts of multiple execution units running in parallel performing multiple instructions during a single clock cycle Parallelism is the key to extremely high performance taking these DSPs well beyond the...

Page 21: ...ct of the Year Today the TMS320 family consists of many generations C1x C2x C2xx C5x and C54x fixed point DSPs C3x and C4x floating point DSPs and C8x multipro cessor DSPs Now there is a new generatio...

Page 22: ...control Power line monitoring Robotics Security access Instrumentation Medical Military Digital filtering Function generation Pattern matching Phase locked loops Seismic processing Spectrum analysis T...

Page 23: ...function applications such as Pooled modems Wireless local loop base stations Beam forming base stations Remote access servers RAS Digital subscriber loop DSL systems Cable modems Multichannel telepho...

Page 24: ...Access Port and Boundary Scan Architecture Features of the C62x C67x include Advanced VLIW CPU with eight functional units including two multipliers and six arithmetic units J Executes up to eight ins...

Page 25: ...Peak 688M FLOPS at 167 MHz for multiply and accumulate operations Hardware support for single precision 32 bit and double precision 64 bit IEEE floating point operations 32 32 bit integer multiply wi...

Page 26: ...U while peripherals such as serial ports and host ports are on only certain devices Check the data sheet for your device to determine the specific peripheral configurations you have Figure 1 1 TMS320C...

Page 27: ...16 32 bit general purpose registers The data paths are described in more detail in Chapter 2 CPU Data Paths and Control A control register file provides the means to configure and control various proc...

Page 28: ...ces have a subset of these peripherals but may not have all of them Serial ports Timers External memory interface EMIF that supports synchronous and asynchronous SRAM and synchronous DRAM DMA controll...

Page 29: ...consist of Two general purpose register files A and B Eight functional units L1 L2 S1 S2 M1 M2 D1 and D2 Two load from memory paths LD1 and LD2 Two store to memory paths ST1 and ST2 Two register file...

Page 30: ...src1 src1 src1 src1 src1 src1 src1 8 8 8 8 8 8 long dst long dst dst dst dst dst dst dst dst src2 src2 src2 src2 src2 src2 src2 long src Control register file DA1 DA2 ST1 LD1 LD2 ST2 32 32 Data path A...

Page 31: ...src1 src1 src1 src1 src1 src1 8 8 long dst long dst dst dst dst dst dst dst dst src2 src2 src2 src2 src2 src2 src2 long src Control register file DA1 DA2 ST1 LD1 32 LSB LD2 32 LSB LD2 32 MSB 32 32 Dat...

Page 32: ...bit data is contained across two registers the 32 LSBs of the data are placed in an even register and the remaining eight MSBs are placed in the eight LSBs of the next upper register which is always...

Page 33: ...4 MSBs of the odd register Operations producing a long result zero fill the 24 MSBs of the odd register The even register is encoded in the opcode Figure 2 3 Storage Scheme for 40 Bit Data in a Regist...

Page 34: ...SP DP conversion operations M unit M1 M2 16 16 bit multiply operations 32 32 bit fixed point multiply operations Floating point multiply operations D unit D1 D2 32 bit add subtract linear and circula...

Page 35: ...le The L1 and L2 units src1 and src2 inputs are also multiplex selectable between the cross path and the same side register file Only two cross paths 1X and 2X exist in the C62x C67x CPUs This limits...

Page 36: ...ht registers also contains sizes for circular addressing 2 9 CSR Control status register Contains the global interrupt enable bit cache control bits and other miscellaneous control and status bits 2 1...

Page 37: ...ds and block size fields are shown in Figure 2 4 and the mode select field encoding is shown in Table 2 4 Figure 2 4 Addressing Mode Register AMR 31 26 16 25 21 20 BK0 R W 0 Reserved R 0 R W 0 BK1 Blo...

Page 38: ...2 5 Block Size Calculations N Block Size N Block Size 00000 2 10000 131 072 00001 4 10001 262 144 00010 8 10010 524 288 00011 16 10011 1 048 576 00100 32 10100 2 097 152 00101 64 10101 4 194 304 00110...

Page 39: ...1 24 8 CPU ID CPU ID defines which CPU CPU ID 00b indicates C62x CPU ID 10b indicates C67x 23 16 8 Revision ID Revision ID defines silicon revision of the CPU 15 10 6 PWRD Control power down modes the...

Page 40: ...e PCE1 shown in Figure 2 6 contains the 32 bit address of the execute packet in the E1 pipeline phase Figure 2 6 E1 Phase Program Counter PCE1 31 PCE1 R W x 16 15 PCE1 R W x 0 Legend R Readable by the...

Page 41: ...ttempted with a NaN source Table 2 7 shows the addi tional registers used by the C67x The OVER UNDER INEX INVAL DENn NANn INFO UNORD and DIV0 bits within these registers will not be modified by a cond...

Page 42: ...cific to each of the L units L1 and L2 Figure 2 7 shows the layout of FADCR The functions of the fields in the FADCR are shown in Table 2 8 Figure 2 7 Floating Point Adder Configuration Register FADCR...

Page 43: ...to integer conversion or when infinity is subtracted from infinity 19 1 DEN2 L2 src2 is a denormalized number 18 1 DEN1 L2 src1 is a denormalized number 17 1 NAN2 L2 src2 is NaN 16 1 NAN1 L2 src1 is N...

Page 44: ...c to each of the S units S1 and S2 Figure 2 8 shows the layout of FAUCR The functions of the fields in the FAUCR are shown in Table 2 9 Figure 2 8 Floating Point Auxiliary Configuration Register FAUCR...

Page 45: ...point to integer conversion or when infinity is subtracted from infinity 19 1 DEN2 S2 src2 is a denormalized number 18 1 DEN1 S2 src1 is a denormalized number 17 1 NAN2 S2 src2 is NaN 16 1 NAN1 S2 sr...

Page 46: ...ds specific to each of the M units M1 and M2 Figure 2 9 shows the layout of FMCR The functions of the fields in the FMCR are shown in Table 2 10 Figure 2 9 Floating Point Multiplier Configuration Regi...

Page 47: ...nt to integer conversion or when infinity is subtracted from infinity 19 1 DEN2 M2 src2 is a denormalized number 18 1 DEN1 M2 src1 is a denormalized number 17 1 NAN2 M2 src2 is NaN 16 1 NAN1 M2 src1 i...

Page 48: ...C67x digital sig nal processors Also described are parallel operations conditional operations resource constraints and addressing modes Instructions unique to the C67x floating point addition subtract...

Page 49: ...tring b cond Check for either creg equal to 0 or creg not equal to 0 creg 3 bit field specifying a conditional register cstn n bit constant field for example cst5 int 32 bit integer value lmb0 x Leftm...

Page 50: ...bit x ext l r Extract and sign extend a field in x specified by l shift left value and r shift right value x extu l r Extract an unsigned field in x specified by l shift left value and r shift right...

Page 51: ...ADDK SHL ADDAB STH 15 bit offset ADDU MPYUS ADD2 SHR ADDAH STW 15 bit offset AND MPYSU AND SHRU ADDAW SUB CMPEQ MPYH B disp SSHL LDB SUBAB CMPGT MPYHU B IRP SUB LDBU SUBAH CMPGTU MPYHUS B NRP SUBU LD...

Page 52: ...l Unit to Instruction Mapping C62x C67x Functional Units Instruction L Unit M Unit S Unit D Unit ABS n ADD n n n ADDU n ADDAB n ADDAH n ADDAW n ADDK n ADD2 n AND n n B n B IRP n B NRP n B reg n CLR n...

Page 53: ...ction D Unit S Unit M Unit L Unit LDW mem n LDB mem 15 bit offset n LDBU mem 15 bit offset n LDH mem 15 bit offset n LDHU mem 15 bit offset n LDW mem 15 bit offset n LMBD n MPY n MPYU n MPYUS n MPYSU...

Page 54: ...ing Continued C62x C67x Functional Units Instruction D Unit S Unit M Unit L Unit MVKH n MVKLH n NEG n n NOP NORM n NOT n n OR n n SADD n SAT n SET n SHL n SHR n SHRU n SMPY n SMPYH n SMPYHL n SMPYLH n...

Page 55: ...d Functional Units 3 8 Table 3 3 Functional Unit to Instruction Mapping Continued C62x C67x Functional Units Instruction D Unit S Unit M Unit L Unit SUBU n n SUBAB n SUBAH n SUBAW n SUBC n SUB2 n XOR...

Page 56: ...baseR base address register creg 3 bit field specifying a conditional register cst constant csta constant a cstb constant b dst destination h MVK or MVKH bit ld st load store opfield mode addressing m...

Page 57: ...op 0 0 0 s p Operations on the D unit 3 5 5 5 6 7 6 1 0 src2 src1 cst 31 29 28 27 23 22 creg z dst src 4 3 2 1 0 1 1 s p Load store with 15 bit offset on the D unit 3 5 15 6 ld st ucst15 7 8 y 3 Load...

Page 58: ...5 5 2 Field operations immediate forms on the S unit src2 31 29 28 27 23 22 creg z dst 7 6 5 4 3 2 1 0 1 0 1 0 s p 3 5 16 h cst MVK and MVKH on the S unit Bcond disp on the S unit 31 29 28 27 creg z...

Page 59: ...are equivalent to an execution or result latency All of the instruc tions that are common to the C62x and C67x have a functional unit latency of 1 This means that a new instruction can be started on...

Page 60: ...bits are scanned from left to right lower to higher address If the p bit of instruction i is 1 then instruction i 1 is to be executed in parallel with in the the same cycle as instruction i If the p...

Page 61: ...nce Cycle Execute Packet Instructions 1 A 2 B 3 C 4 D 5 E 6 F 7 G 8 H The eight instructions are executed sequentially Example 3 2 Fully Parallel p Bit Pattern in a Fetch Packet This p bit pattern 1 1...

Page 62: ...s signify that an instruction is to execute in parallel with the pre vious instruction The code for the fetch packet in Example 3 3 would be rep resented as this instruction A instruction B instructio...

Page 63: ...Can Be Tested by Conditional Operations Specified C diti l creg z Conditional Register Bit 31 30 29 28 Unconditional 0 0 0 0 Reserved 0 0 0 1 B0 0 0 1 z B1 0 1 0 z B2 0 1 1 z A1 1 0 0 z A2 1 0 1 z Res...

Page 64: ...per data path per execute packet can read a source operand from its opposite register file via the cross paths 1X and 2X For example S1 can read both of an instruction s operands from the A register f...

Page 65: ...or storing from the same register file cannot be issued in the same execute packet The following execute packet is invalid LDW D1 A4 A5 Loading to and storing from the STW D2 A6 B4 same register file...

Page 66: ...ional registers are not included in this count The following code sequences are invalid MPY M1 A1 A1 A4 five reads of register A1 ADD L1 A1 A1 A5 SUB D1 A1 A2 A3 MPY M1 A1 A1 A4 five reads of register...

Page 67: ...n L2 and L3 might not be detected by the assembler The instructions in L4 do not constitute a write conflict because they are mutually exclusive In con trast because the instructions in L5 may or may...

Page 68: ...action instructions linear mode simply shifts the src1 cst operand to the left by 2 1 or 0 for word halfword or byte data sizes respectively and then performs the add or subtract specified 3 8 2 Circu...

Page 69: ...s borrows propagate as usual If you specify src1 greater than the circular buffer size 2 N 1 the effective offsetR cst is modulo the circular buffer size see Example 3 5 The circular buffer size in th...

Page 70: ...ore In this case you can use the B14 or B15 register as the base register and use a 15 bit constant ucst15 as the offset Table 3 7 Indirect Address Generation for Load Store Addressing Type No Modific...

Page 71: ...llowing information Assembler syntax Functional units Operands Opcode Description Execution Instruction type Delay slots Functional Unit Latency Examples The ADD instruction is used as an example to f...

Page 72: ...s situation is documented for the ADD instruction This instruction has three opcode map fields src1 src2 and dst In the seventh row the operands have the types cst5 long and long for src1 src2 and dst...

Page 73: ...2 dst sint xsint slong L1 L2 0100011 ADD src1 src2 dst uint xuint ulong L1 L2 0101011 ADDU src1 src2 dst xsint slong slong L1 L2 0100001 ADD src1 src2 dst xuint ulong ulong L1 L2 0101001 ADDU src1 src...

Page 74: ...defined in Table 3 1 on page 3 2 Pipeline This section contains a table that shows the sources read from the destina tions written to and the functional unit used during each execution cycle of the i...

Page 75: ...olute value of src2 is placed in dst Execution if cond abs src2 dst else nop The absolute value of src2 when src2 is an sint is determined as follows 1 If src2 w 0 then src2 dst 2 If src2 t 0 and src2...

Page 76: ...BS L1 A1 A5 Before instruction 1 cycle after instruction A1 8000 4E3Dh 2147463619 A1 8000 4E3Dh 2147463619 A5 XXXX XXXXh A5 7FFF B1C3h 2147463619 Example 2 ABS L1 A1 A5 Before instruction 1 cycle afte...

Page 77: ...1 L2 0000011 src1 src2 dst sint xsint slong L1 L2 0100011 src1 src2 dst uint xuint ulong L1 L2 0101011 src1 src2 dst xsint slong slong L1 L2 0100001 src1 src2 dst xuint ulong ulong L1 L2 0101001 src1...

Page 78: ...2 Description for L1 L2 and S1 S2 Opcodes src2 is added to src1 The result is placed in dst Execution for L1 L2 and S1 S2 Opcodes if cond src1 src2 dst else nop Opcode D unit 31 29 28 27 23 22 18 17 c...

Page 79: ...A4 Before instruction 1 cycle after instruction A1 0000 325Ah 12890 A1 0000 325Ah A3 A2 0000 00FFh FFFF FF12h 1099511627538 A3 A2 0000 00FFh FFFF FF12h A5 A4 0000 0000h 0000 0000h 0 A5 A4 0000 0000h...

Page 80: ...er Addition Without Saturation ADD U 3 33 TMS320C62x C67x Fixed Point Instruction Set Example 6 ADD D1 26 A1 A6 Before instruction 1 cycle after instruction A1 0000 325Ah 12890 A1 0000 325Ah A6 XXXX X...

Page 81: ...0 0 s p 3 5 5 5 6 7 6 1 0 src2 src1 cst Description src1 is added to src2 using the addressing mode specified for src2 The addi tion defaults to linear mode However if src2 is one of A4 A7 or B4 B7 th...

Page 82: ...0001h BK0 2 size 8 A4 in circular addressing mode using BK0 Example 2 ADDAH D1 A4 A2 A4 Before instruction 1 cycle after instruction A2 0000 000Bh A2 0000 000Bh A4 0000 0100h A4 0000 0106h AMR 0002 0...

Page 83: ...s p 31 creg 29 28 27 23 22 7 1 3 1 1 Description A 16 bit signed constant is added to the dst register specified The result is placed in dst Execution if cond cst dst dst else nop Pipeline Stage E1 R...

Page 84: ...The upper and lower halves of the src1 operand are added to the upper and lower halves of the src2 operand Any carry from the lower half add does not affect the upper half add Execution if cond lsb16...

Page 85: ...111 src1 src2 dst scst5 xuint uint S1 S2 011110 Opcode L unit form 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3 2 1 0 x op 1 1 0 s p 3 5 5 5 7 src2 src1 cst S unit form 31 29 28 27 23 22 18 17 cr...

Page 86: ...use L or S Instruction Type Single cycle Example 1 AND L1X A1 B1 A2 Before instruction 1 cycle after instruction A1 F7A1 302Ah A1 F7A1 302Ah A2 XXXX XXXXh A2 02A0 2020h B1 02B6 E724h B1 02B6 E724h Exa...

Page 87: ...2 If two branches are in the same execute packet and both are taken behavior is undefined Two conditional branches can be in the same execute packet if one branch uses a displacement and the other use...

Page 88: ...L1 A1 A2 A3 0000 0008 ADD L2 B1 B2 B3 0000 000C LOOP MPY M1X A3 B3 A4 0000 0010 SUB D1 A5 A6 A6 0000 0014 MPY M1 A3 A6 A5 0000 0018 MPY M1 A6 A7 A8 0000 001C SHR S1 A4 15 A4 0000 0020 ADD D1 A4 A6 A4...

Page 89: ...n be in the same execute packet if one branch uses a displacement and the other uses a register IRP or NRP As long as onlly one branch has a true condition the code executes in a well defined way Exec...

Page 90: ...DD L2 B1 B2 B3 1000 000C MPY M1X A3 B3 A4 1000 0010 SUB D1 A5 A6 A6 1000 0014 MPY M1 A3 A6 A5 1000 0018 MPY M1 A6 A7 A8 1000 001C SHR S1 A4 15 A4 1000 0020 ADD D1 A4 A6 A4 Table 3 10 Program Counter V...

Page 91: ...nches can be in the same execute packet if one branch uses a displacement and the other uses a register IRP or NRP As long as only one branch has a ture condition the code executes in a well defined w...

Page 92: ...P 0000 1000 0000 0020 B S2 IRP 0000 0024 ADD S1 A0 A2 A1 0000 0028 MPY M1 A1 A0 A1 0000 002C NOP 0000 0030 SHR S1 A1 15 A1 0000 0034 ADD L1 A1 A2 A1 0000 0038 ADD L2 B1 B2 B3 Table 3 11 Program Counte...

Page 93: ...can be in the same execute packet if one branch uses a displacement and the other uses a register IRP or NRP As long as only one branch has a true condition the code executes in a well defined way Ex...

Page 94: ...00 1000 0000 0020 B S2 NRP 0000 0024 ADD S1 A0 A2 A1 0000 0028 MPY M1 A1 A0 A1 0000 002C NOP 0000 0030 SHR S1 A1 15 A1 0000 0034 ADD L1 A1 A2 A1 0000 0038 ADD L2 B1 B2 B3 Table 3 12 Program Counter Va...

Page 95: ...it Opfield src2 csta cstb dst uint ucst5 ucst5 uint S1 S2 11 src2 src1 dst xuint uint uint S1 S2 111111 Opcode Constant form 5 z cstb 6 5 0 dst 0 0 1 0 s p 31 creg 29 28 27 7 1 3 18 17 23 22 src2 5 cs...

Page 96: ...are valid for the register version of the instruction If any of the 22 MSBs are non zero the result is invalid src2 dst 0 x x x x x x x x x x x x x x x x x x x x x x x 1 1 1 1 1 0 0 0 0 x x x x x x x...

Page 97: ...CLR Clear a Bit Field 3 50 Example 2 CLR S2 B1 B3 B2 Before instruction 1 cycle after instruction B1 03B6 E7D5h B1 03B6 E7D5h B2 XXXX XXXXh B2 03B0 0001h B3 0000 0052h B3 0000 0052h...

Page 98: ...2 dst xsint slong uint L1 L2 1010001 src1 src2 dst scst5 slong uint L1 L2 1010000 Opcode 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3 2 1 0 x op 1 1 0 s p 3 5 5 5 7 src2 src1 cst Description This...

Page 99: ...0h false B1 0000 4B7h 1207 B1 0000 4B7h Example 2 CMPEQ L1 Ch A1 A2 Before instruction 1 cycle after instruction A1 0000 000Ch 12 A1 0000 000Ch A2 XXXX XXXXh A2 0000 0001h true Example 3 CMPEQ L2X A1...

Page 100: ...L2 1000111 CMPGT src1 src2 dst scst5 xsint uint L1 L2 1000110 CMPGT src1 src2 dst xsint slong uint L1 L2 1000101 CMPGT src1 src2 dst scst5 slong uint L1 L2 1000100 CMPGT src1 src2 dst uint xuint uint...

Page 101: ...else 0 dst else nop Pipeline Stage E1 Read src1 src2 Written dst Unit in use L Instruction Type Single cycle Delay Slots 0 Example 1 CMPGT L1X A1 B1 A2 Before instruction 1 cycle after instruction A1...

Page 102: ...Before instruction 1 cycle after instruction A1 0000 0128h 296 A1 0000 0128h A2 FFFF FFDEh 4294967262 A2 FFFF FFDEh A3 XXXX XXXXh A3 0000 0000h false Example 6 CMPGTU L1 0Ah A1 A2 Before instruction 1...

Page 103: ...c2 dst scst5 xsint uint L1 L2 1010110 CMPLT src1 src2 dst xsint slong uint L1 L2 1010101 CMPLT src1 src2 dst scst5 slong uint L1 L2 1010100 CMPLT src1 src2 dst uint xuint uint L1 L2 1011111 CMPLTU src...

Page 104: ...2 Written dst Unit in use L Instruction Type Single cycle Delay Slots 0 Example 1 CMPLT L1 A1 A2 A3 Before instruction 1 cycle after instruction A1 0000 07E2h 2018 A1 0000 07E2h A2 0000 0F6Bh 3947 A2...

Page 105: ...1h true Unsigned 32 bit integer Example 5 CMPLTU L1 14 A1 A2 Before instruction 1 cycle after instruction A1 0000 000Fh 15 A1 0000 000Fh A2 XXXX XXXXh A2 0000 0001h true Example 6 CMPLTU L1 A1 A5 A4 A...

Page 106: ...1 1 1 1 Description The field in src2 specified by csta and cstb is extracted and sign extended to 32 bits The extract is performed by a shift left followed by a signed shift right csta and cstb are t...

Page 107: ...2 3 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 31 30 29 28 27 26 25 24...

Page 108: ...t Example 1 EXT S1 A1 10 19 A2 Before instruction 1 cycle after instruction A1 07A4 3F2Ah A1 07A4 3F2Ah A2 XXXX XXXXh A2 FFFF F21Fh Example 2 EXT S1 A1 A2 A3 Before instruction 1 cycle after instructi...

Page 109: ...11 Description The field in src2 specified by csta and cstb is extracted and zero extended to 32 bits The extract is performed by a shift left followed by an unsigned shift right csta and cstb are th...

Page 110: ...to produce 1 2 3 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 31 30 29 28...

Page 111: ...10 19 A2 Before instruction 1 cycle after instruction A1 07A4 3F2Ah A1 07A4 3F2Ah A2 XXXX XXXXh A2 0000 121Fh Example 2 EXTU S1 A1 A2 A3 Before instruction 1 cycle after instruction A1 03B6 E7D5h A1...

Page 112: ...0 0 0 0 0 s p 31 Reserved 18 17 16 14 15 1 14 13 12 11 10 9 8 7 6 0 0 0 0 0 0 0 0 1 1 1 1 1 4 3 2 Description This instruction performs an infinite multicycle NOP that terminates upon servicing an in...

Page 113: ...unsigned constant ucst5 If an offset is not given the assembler assigns an offset of zero offsetR and baseR must be in the same register file and on the same side as the D unit used The y bit in the...

Page 114: ...and s 1 indicates dst will be loaded in the B register file The r bit should be set to zero Table 3 13 Data Types Supported by Loads Mnemonic ld st Field Load Data Type SIze Left Shift of Offset LDB 0...

Page 115: ...ou must type either brackets or parentheses around the specified offset if you use the optional offset parameter Word and halfword addresses must be aligned on word two LSBs are 0 and halfword LSB is...

Page 116: ...D1 A4 A1 A8 Before LDH 1 cycle after LDH 5 cycles after LDH A1 0000 0002h A1 0000 0002h A1 0000 0002h A4 0000 0020h A4 0000 0024h A4 0000 0024h A8 1103 51FFh A8 1103 51FFh A8 FFFF A21Fh AMR 0000 0000h...

Page 117: ...gister Offset 3 70 Example 5 LDW D1 A4 1 A6 Before LDW 1 cycle after LDW 5 cycles after LDW A4 0000 0100h A4 0000 0104h A4 0000 0104h A6 1234 5678h A6 1234 5678h A6 0217 6991h AMR 0000 0000h 0000 0000...

Page 118: ...ucst15 is added to baseR Subtraction is not supported The result of the calculation is the address sent to memory The ad dressing arithmetic is always performed in linear mode For LDH U and LDB U the...

Page 119: ...ld st Field Load Data Type SIze Left Shift of Offset LDB 0 1 0 Load byte 8 0 bits LDBU 0 0 1 Load byte unsigned 8 0 bits LDH 1 0 0 Load halfword 16 1 bit LDHU 0 0 0 Load halfword unsigned 16 1 bit LD...

Page 120: ...oint Instruction Set Example LDB D2 B14 36 B1 Before LDB 1 cycle after LDB B1 XXXX XXXXh B1 XXXX XXXXh B14 0000 0100h B14 0000 0100h mem 124 127h 4E7A FF12h mem 124 127h 4E7A FF12h mem 124h 12h mem 12...

Page 121: ...following diagram illustrates the operation of LMBD for several cases 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 x 0 1 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x...

Page 122: ...n Set Pipeline Stage E1 Read src1 src2 Written dst Unit in use L Instruction Type Single cycle Delay Slots 0 Example LMBD L1 A1 A2 A3 Before instruction 1 cycle after instruction A1 0000 0001h A1 0000...

Page 123: ...16 sint M1 M2 11101 MPYUS src1 src2 dst slsb16 xulsb16 sint M1 M2 11011 MPYSU src1 src2 dst scst5 xslsb16 sint M1 M2 11000 MPY src1 src2 dst scst5 xulsb16 sint M1 M2 11110 MPYSU Opcode 31 29 28 27 23...

Page 124: ...A1 0000 0123h A2 01E0 FA81h 1407 A2 01E0 FA81h A3 XXXX XXXXh A3 FFF9 C0A3 409437 Example 2 MPYU M1 A1 A2 A3 Before instruction 2 cycles after instruction A1 0000 0123h 291 A1 0000 0123h A2 0F12 FA81h...

Page 125: ...fore instruction 2 cycles after instruction A1 3497 FFF3h 13 A1 3497 FFF3h A2 XXXX XXXXh A2 FFFF FF57h 163 Example 5 MPYSU M1 13 A1 A2 Before instruction 2 cycles after instruction A1 3497 FFF3h 65523...

Page 126: ...1 src2 dst umsb16 xumsb16 uint M1 M2 00111 MPYHU src1 src2 dst umsb16 xsmsb16 sint M1 M2 00101 MPYHUS src1 src2 dst smsb16 xumsb16 sint M1 M2 00011 MPYHSU Opcode 31 29 28 27 23 22 18 17 creg z dst src...

Page 127: ...1234h 89 A2 FFA7 1234h A3 XXXX XXXXh A3 FFFF F3D5h 3115 Example 2 MPYHU M1 A1 A2 A3 Before instruction 2 cycles after instruction A1 0023 0000h 35 A1 0023 0000h A2 FFA7 1234h 65447 A2 FFA7 1234h A3 X...

Page 128: ...HL src1 src2 dst umsb16 xulsb16 uint M1 M2 01111 MPYHLU src1 src2 dst umsb16 xslsb16 sint M1 M2 01101 MPYHULS src1 src2 dst smsb16 xulsb16 sint M1 M2 01011 MPYHSLU Opcode 31 29 28 27 23 22 18 17 creg...

Page 129: ...src1 src2 Written dst Unit in use M Instruction Type Multiply 16 16 Delay Slots 1 Example MPYHL M1 A1 A2 A3 Before instruction 2 cycles after instruction A1 008A 003Eh 138 A1 008A 003Eh A2 21FF 00A7h...

Page 130: ...LH src1 src2 dst ulsb16 xumsb16 uint M1 M2 10111 MPYLHU src1 src2 dst ulsb16 xsmsb16 sint M1 M2 10101 MPYLUHS src1 src2 dst slsb16 xumsb16 sint M1 M2 10011 MPYLSHU Opcode 31 29 28 27 23 22 18 17 creg...

Page 131: ...d src1 src2 Written dst Unit in use M Instruction Type Multiply 16 16 Delay Slots 1 Example MPYLH M1 A1 A2 A3 Before instruction 2 cycles after instruction A1 0900 000Eh 14 A1 0900 000Eh A2 0029 00A7h...

Page 132: ...src dst xsint sint L1 L2 0000010 src dst sint sint D1 D2 010010 src dst slong slong L1 L2 0100001 src dst xsint sint S1 S2 000110 Opcode See ADD instruction Description This is a pseudo operation that...

Page 133: ...scription The src2 register is moved from the control register file to the register file Valid values for src2 are any register listed in the control register file Operands when moving from the regist...

Page 134: ...nterrupt enable register 00100 R W ISTP Interrupt service table pointer 00101 R W IRP Interrupt return pointer 00110 R W NRP Nonmaskable interrupt return pointer 00111 R W PCE1 Program counter E1 phas...

Page 135: ...has one delay slot because the results cannot be read by the MVC instruction in the IFR until two cycles after the write to the ISR or ICR Delay Slots 0 Example MVC S2 B1 AMR Before instruction 1 cyc...

Page 136: ...creg 29 28 27 23 22 7 1 3 1 1 Description The 16 bit constant is sign extended and placed in dst Execution if cond scst16 dst else nop Pipeline Stage E1 Read Written dst Unit in use S Instruction Type...

Page 137: ...93 A1 Before instruction 1 cycle after instruction A1 XXXX XXXXh A1 0000 0125h 293 Example 2 MVK S2 125h B1 Before instruction 1 cycle after instruction B1 XXXX XXXXh B1 0000 0125h 293 Example 3 MVK S...

Page 138: ...on The 16 bit constant cst is loaded into the upper 16 bits of dst The 16 LSBs of dst are unchanged The assembler encodes the 16 MSBs of a 32 bit constant into the cst field of the opcode for the MVKH...

Page 139: ...uctions MVK 0x5678 MVKLH 0x1234 You could also use MVK 0x12345678 MVKH 0x12345678 If you are loading the address of a label use MVK label MVKH label Example 1 MVKH S1 0A329123h A1 Before instruction 1...

Page 140: ...Opfield src dst xsint sint S1 S2 010110 src dst xsint sint L1 L2 0000110 src dst slong slong L1 L2 0100100 Opcode See SUB instruction Description This is a pseudo operation used to negate src and pla...

Page 141: ...A multicycle NOP will not finish if a branch is completed first For example if a branch is initiated on cycle n and a NOP 5 instruction is initiated on cycle n 3 the branch is complete on cycle n 6 an...

Page 142: ...TMS320C62x C67x Fixed Point Instruction Set Example 2 MVK S1 1 A1 MVKLH S1 0 A1 NOP 5 ADD L1 A1 A2 A1 Before NOP 5 1 cycle after ADD instruction 6 cycles after NOP 5 A1 0000 0001h A1 0000 0004h A2 000...

Page 143: ...x x x x x x x x x x x x x x x x x x x x x x x x In this case NORM returns 3 In this case NORM returns 30 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 31 30 29...

Page 144: ...E1 Read src2 Written dst Unit in use L Delay Slots 0 Example 1 NORM L1 A1 A2 Before instruction 1 cycle after instruction A1 02A3 469Fh A1 02A3 469Fh A2 XXXX XXXXh A2 0000 0005h 5 Example 2 NORM L1 A1...
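As an illustration of what NORM computes (the number of redundant sign bits), the following Python sketch reproduces Example 1 above. The helper name `norm32` is hypothetical, and the model covers only the 32-bit form.

```python
def norm32(x):
    """Count the redundant sign bits of a 32-bit value (a model of NORM)."""
    x &= 0xFFFFFFFF
    if x & 0x80000000:          # negative: invert so leading ones become leading zeros
        x = ~x & 0xFFFFFFFF
    return 31 - x.bit_length()  # leading zeros, not counting the sign bit itself


print(norm32(0x02A3469F))  # 5, as in Example 1
```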

Page 145: ...uint uint L1 L2 1101110 src dst xuint uint S1 S2 001010 Opcode See XOR instruction Description This is a pseudo operation used to bitwise NOT the src operand and place the result in dst The assembler...

Page 146: ...src1 src2 dst uint xuint uint S1 S2 011011 src1 src2 dst scst5 xuint uint S1 S2 011010 Opcode L unit form 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3 2 1 0 x op 1 1 0 s p 3 5 5 5 7 src2 src1 cs...

Page 147: ...truction Type Single cycle Delay Slots 0 Example 1 OR L1X A1 B1 A2 Before instruction 1 cycle after instruction A1 08A3 A49Fh A1 08A3 A49Fh A2 XXXX XXXXh A2 08FF B7DFh B1 00FF 375Ah B1 00FF 375Ah Exam...

Page 148: ...eg z dst 13 12 11 5 4 3 2 1 0 x op 1 1 0 s p 3 5 5 5 7 src2 src1 cst Description src1 is added to src2 and saturated if an overflow occurs according to the following rules 1 If the dst is an int and...

Page 149: ...A1 5A2E 51A3h A2 012A 3FA2h 19546018 A2 012A 3FA2h A2 012A 3FA2h A3 XXXX XXXXh A3 5B58 9145h 1532531013 A3 5B58 9145h CSR 0001 0100h CSR 0001 0100h CSR 0001 0100h Not saturated Example 2 SADD L1 A1 A...
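The saturating add can be sketched as follows for the 32-bit (int) case. This is an illustrative Python model under the assumption that saturation clamps to the signed 32-bit range; the names `to_signed32` and `sadd32` are not from the manual, and the returned flag stands in for the SAT bit in the CSR.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def sadd32(src1, src2):
    """Return (32-bit result pattern, saturated flag) for a saturating add."""
    total = to_signed32(src1) + to_signed32(src2)
    clamped = max(INT32_MIN, min(INT32_MAX, total))
    return clamped & 0xFFFFFFFF, clamped != total


result, saturated = sadd32(0x5A2E51A3, 0x012A3FA2)  # operands from Example 1
```

With the Example 1 operands the sum fits in 32 bits, so the model returns 0x5B589145 with the flag clear, matching the "Not saturated" note.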

Page 150: ...instruction A5 A4 0000 0000h 7C83 39B1h 1922644401 A5 A4 0000 0000h 7C83 39B1h A7 A6 XXXX XXXXh XXXX XXXXh A7 A6 0000 0000h 8DAD 7953h 2376956243 B2 112A 3FA2h 287981474 B2 112A 3FA2h CSR 0001 0100h C...

Page 151: ...ription A 40 bit src2 value is converted to a 32 bit value If the value in src2 is greater than what can be represented in 32 bits src2 is saturated The result is placed in dst If a saturate occurs th...

Page 152: ...aturated Example 2 SAT L2 B1 B0 B5 Before instruction 1 cycle after instruction 2 cycles after instruction B1 B0 0000 0000h A190 7321h B1 B0 0000 0000h A190 7321h B1 B0 0000 0000h A190 7321h B5 XXXX X...
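A 40-bit to 32-bit saturation like the one SAT performs might be modeled as below. This Python sketch assumes the 40-bit operand is held in a register pair with the upper 8 bits in the low byte of the odd register; `sat40` is a hypothetical helper name.

```python
def sat40(hi, lo):
    """Saturate the 40-bit value hi[7:0]:lo to 32 bits; return (pattern, saturated flag)."""
    raw = ((hi & 0xFF) << 32) | (lo & 0xFFFFFFFF)
    value = raw - (1 << 40) if raw & (1 << 39) else raw  # sign-extend from bit 39
    clamped = max(-0x80000000, min(0x7FFFFFFF, value))
    return clamped & 0xFFFFFFFF, clamped != value


# Operand pair from Example 2: positive as a 40-bit value, but too large for 32 bits.
b5, saturated = sat40(0x00000000, 0xA1907321)
```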

Page 153: ...nd type Unit src2 csta cstb dst uint ucst5 ucst5 uint S1 S2 src2 src1 dst xuint uint uint S1 S2 Opcode Constant form 5 5 z dst cstb 6 5 0 src2 1 0 0 0 1 0 s p 31 creg 29 28 27 23 22 7 1 3 18 13 1 1 17...

Page 154: ...rc2 is 31 In the example below csta is 15 and cstb is 23 Only the ten LSBs are valid for the register version of the instruction If any of the 22 MSBs are non zero the result is invalid src2 dst 0 x x...

Page 155: ...efore instruction 1 cycle after instruction A0 4B13 4A1Eh A0 4B13 4A1Eh A1 XXXX XXXXh A1 4B3F FF9Eh Example 2 SET S2 B0 B1 B2 Before instruction 1 cycle after instruction B0 9ED3 1A31h B0 9ED3 1A31h B...

Page 156: ...S2 110000 src2 src1 dst xuint ucst5 ulong S1 S2 010010 Opcode 31 29 28 27 23 22 18 17 creg z dst 13 12 5 4 3 2 1 0 op 0 0 0 s p 3 5 5 5 6 6 1 11 x src1 cst src2 Description The src2 operand is shifte...

Page 157: ...tion A0 29E3 D31Ch A0 29E3 D31Ch A1 XXXX XXXXh A1 9E3D 31C0h Example 2 SHL S2 B0 B1 B2 Before instruction 1 cycle after instruction B0 4197 51A5h B0 4197 51A5h B1 0000 0009h B1 0000 0009h B2 XXXX XXXX...

Page 158: ...0 s p 3 5 5 5 6 6 1 11 x src1 cst src2 Description The src2 operand is shifted to the right by the src1 operand The sign extended result is placed in dst When a register is used the six LSBs specify t...

Page 159: ...Example 2 SHR S2 B0 B1 B2 Before instruction 1 cycle after instruction B0 1492 5A41h B0 1492 5A41h B1 0000 0012h B1 0000 0012h B2 XXXX XXXXh B2 0000 0524h Example 3 SHR S2 B1 B0 B2 B3 B2 Before instr...

Page 160: ...0 s p 3 5 5 5 6 6 1 11 x src1 cst src2 Description The src2 operand is shifted to the right by the src1 operand The zero extended result is placed in dst When a register is used the six LSBs specify t...

Page 161: ...SHRU Logical Shift Right 3 114 Delay Slots 0 Example SHRU S1 A0 8 A1 Before instruction 1 cycle after instruction A0 F123 63D1h A0 F123 63D1h A1 XXXX XXXXh A1 00F1 2363h...
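The difference between the arithmetic (SHR) and logical (SHRU) right shifts is easiest to see side by side. The Python sketch below models only the 32-bit forms with a shift amount of 0 to 31; the function names are assumptions for illustration.

```python
def shru32(src2, amount):
    """Logical shift right: vacated upper bits are filled with zeros."""
    return (src2 & 0xFFFFFFFF) >> amount


def shr32(src2, amount):
    """Arithmetic shift right: vacated upper bits are filled with the sign bit."""
    src2 &= 0xFFFFFFFF
    signed = src2 - 0x100000000 if src2 & 0x80000000 else src2
    return (signed >> amount) & 0xFFFFFFFF


print(hex(shru32(0xF12363D1, 8)))    # 0xf12363, as in the SHRU example
print(hex(shr32(0x14925A41, 0x12)))  # 0x524, as in SHR Example 2
```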

Page 162: ...PY src1 src2 dst smsb16 xslsb16 sint M1 M2 01010 SMPYHL src1 src2 dst slsb16 xsmsb16 sint M1 M2 10010 SMPYLH src1 src2 dst smsb16 xsmsb16 sint M1 M2 00010 SMPYH Opcode 31 29 28 27 23 22 18 17 creg z d...

Page 163: ...1 SMPY M1 A1 A2 A3 Before instruction 2 cycles after instruction A1 0000 0123h 291 A1 0000 0123h A2 01E0 FA81h 1407 A2 01E0 FA81h A3 XXXX XXXXh A3 FFF3 8146h 818874 CSR 0001 0100h CSR 0001 0100h Not sa...

Page 164: ...Point Instruction Set Example 3 SMPYLH M1 A1 A2 A3 Before instruction 2 cycles after instruction A1 0000 8000h 32768 A1 0000 8000h A2 8000 0000h 32768 A2 8000 0000h A3 XXXX XXXXh A3 7FFF FFFFh 214748...

Page 165: ...a register is used to specify the shift the five least significant bits specify the shift amount Valid values are 0 through 31 and the result of the shift is invalid if the shift amount is greater tha...

Page 166: ...031Ch A0 02E3 031Ch A1 XXXX XXXXh A1 0B8C 0C70h A1 0B8C 0C70h CSR 0001 0100h CSR 0001 0100h CSR 0001 0100h Not saturated Example 2 SSHL S1 A0 A1 A2 Before instruction 1 cycle after instruction 2 cycl...
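A saturating left shift can be modeled as an ordinary shift followed by a clamp to the signed 32-bit range. The sketch below is an assumption-laden Python illustration (hypothetical name `sshl32`, shift amount 0 to 31), checked against the first SSHL example above.

```python
def sshl32(src2, amount):
    """Shift left with saturation; return (32-bit result pattern, saturated flag)."""
    src2 &= 0xFFFFFFFF
    signed = src2 - 0x100000000 if src2 & 0x80000000 else src2
    shifted = signed << amount                      # exact, unbounded result
    clamped = max(-0x80000000, min(0x7FFFFFFF, shifted))
    return clamped & 0xFFFFFFFF, clamped != shifted


a1, saturated = sshl32(0x02E3031C, 2)  # first SSHL example: no bits of significance lost
```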

Page 167: ...5 4 3 2 1 0 x op 1 1 0 s p 3 5 5 5 7 src2 src1 cst Description src2 is subtracted from src1 and is saturated to the result size according to the following rules 1 If the result is an int and src1 src2...

Page 168: ...512984995 B1 5A2E 51A3h B1 5A2E 51A3h B2 802A 3FA2h 2144714846 B2 802A 3FA2h B2 802A 3FA2h B3 XXXX XXXXh B3 7FFF FFFFh 2147483647 B3 7FFF FFFFh CSR 0001 0100h CSR 0001 0100h CSR 0001 0300h Saturated E...

Page 169: ...selects the D2 unit and baseR and offsetR from the B register file offsetR ucst5 is scaled by a left shift of 0 1 or 2 for STB STH and STW respectively After scaling offsetR ucst5 is added to or subt...

Page 170: ...R ucst5 Preincrement 1 0 0 0 R ucst5 Predecrement 1 0 1 1 R ucst5 Postincrement 1 0 1 0 R ucst5 Postdecrement Increments and decrements default to 1 and offsets default to zero when no bracketed regi...

Page 171: ...Chapter 6 TMS320C67x Pipeline Example 1 STB D1 A1 A10 Before instruction 1 cycle after instruction 3 cycles after instruction A1 9A32 7634h A1 9A32 7634h A1 9A32 7634h A10 0000 0100h A10 0000 0100h A1...

Page 172: ...0000 0100h A10 0000 0104h A10 0000 0104h mem 100h 1111 1134h mem 100h 1111 1134h mem 100h 1111 1134h mem 104h 0000 1111h mem 104h 0000 1111h mem 104h 9A32 7634h Example 4 STH D1 A1 A10 A11 Before inst...

Page 173: ...dded to baseR. The result of the calculation is the address that is sent to memory. The addressing arithmetic is always performed in linear mode. For STB and STH, the 8 and 16 LSBs of the src register ar...

Page 174: ...6 1 bit STW 1 1 1 Store word 32 2 bits Execution if cond src mem else nop Pipeline Stage E1 E2 E3 Read B14 B15 src Written Unit in use D2 Instruction Type Store Delay Slots 0 Note This instruction exe...

Page 175: ...nt L1 L2 0000111 SUB src1 src2 dst xsint sint sint L1 L2 0010111 SUB src1 src2 dst sint xsint slong L1 L2 0100111 SUB src1 src2 dst xsint sint slong L1 L2 0110111 SUB src1 src2 dst uint xuint ulong L1...

Page 176: ...5 5 7 src2 src1 cst S unit form 31 29 28 27 23 22 18 17 creg z dst 13 12 5 4 3 2 1 0 op 0 0 0 s p 3 5 5 5 6 6 1 11 x src1 cst src2 Description for L1 L2 and S1 S2 Opcodes src2 is subtracted from src1...

Page 177: ...ant ucst5 allows a greater offset for addressing with the D unit Pipeline Stage E1 Read src1 src2 Written dst Unit in use L S or D Instruction Type Single cycle Delay Slots 0 Example 1 SUB L1 A1 A2 A3...

Page 178: ...creg z dst 13 12 5 4 3 2 1 0 op 0 0 0 s p 3 5 5 5 6 7 6 1 0 src2 src1 cst Description src1 is subtracted from src2. The subtraction defaults to linear mode. However, if src2 is one of A4-A7 or B4-B7 th...

Page 179: ...0004h A0 0000 0004h A5 0000 4000h A5 0000 400Ch AMR 0003 0004h AMR 0003 0004h BK0 3 size 16 A5 in circular addressing mode using BK0 Example 2 SUBAW D1 A5 2 A3 Before instruction 1 cycle after instru...

Page 180: ...1 5 4 3 2 1 0 x 1 0 0 1 0 1 1 1 1 0 s p 3 5 5 5 7 src2 src1 Description Subtract src2 from src1. If result is greater than or equal to 0, left shift result by 1, add 1 to it, and place it in dst. If resul...

Page 181: ...BC L1 A0 A1 A0 Before instruction 1 cycle after instruction A0 0000 125Ah 4698 A0 0000 24B4h 9396 A1 0000 1F12h 7954 A1 0000 1F12h Example 2 SUBC L1 A0 A1 A0 Before instruction 1 cycle after instruct...
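The conditional subtract-and-shift step described for SUBC is the building block of a restoring divide. The Python sketch below models one step and then, as an illustration only, chains 16 steps into an unsigned 16-bit division. The names `subc` and `divide_u16` and the divisor alignment by 15 bits are assumptions for this example rather than code from the manual, and a zero denominator is not handled.

```python
MASK32 = 0xFFFFFFFF


def subc(src1, src2):
    """One SUBC step on 32-bit unsigned patterns."""
    diff = src1 - src2
    if diff >= 0:
        return ((diff << 1) + 1) & MASK32  # subtraction succeeded: shift in a 1
    return (src1 << 1) & MASK32            # subtraction failed: shift in a 0


def divide_u16(numerator, denominator):
    """Unsigned 16-bit divide from 16 SUBC steps; returns (quotient, remainder)."""
    acc = numerator & 0xFFFF
    aligned = (denominator & 0xFFFF) << 15  # line the divisor up with the top quotient bit
    for _ in range(16):
        acc = subc(acc, aligned)
    return acc & 0xFFFF, acc >> 16          # quotient in the low half, remainder above it


print(hex(subc(0x125A, 0x1F12)))  # 0x24b4, as in Example 1
```

Each step produces one quotient bit in the LSB, which is why the quotient and remainder end up packed into a single register in this scheme.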

Page 182: ...wer halves of src2 are subtracted from the upper and lower halves of src1 Any borrow from the lower half subtraction does not affect the upper half subtraction Execution if cond lsb16 src1 lsb16 src2...
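The independence of the two halves in SUB2 can be shown with a short model. This Python sketch assumes 32-bit inputs treated as two packed 16-bit fields; `sub2` is a hypothetical helper name.

```python
def sub2(src1, src2):
    """Subtract the upper and lower 16-bit halves separately; no borrow crosses bit 16."""
    upper = (((src1 >> 16) & 0xFFFF) - ((src2 >> 16) & 0xFFFF)) & 0xFFFF
    lower = ((src1 & 0xFFFF) - (src2 & 0xFFFF)) & 0xFFFF
    return (upper << 16) | lower


# The lower half wraps (0 - 1 = 0xFFFF) while the upper half is left undisturbed.
packed = sub2(0x00010000, 0x00000001)
```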

Page 183: ...uint uint S1 S2 001011 src1 src2 dst scst5 xuint uint S1 S2 001010 Opcode L unit form 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3 2 1 0 x op 1 1 0 s p 3 5 5 5 7 src2 src1 cst S unit form 31 29 2...

Page 184: ...Unit in use L or S Instruction Type Single cycle Delay Slots 0 Example 1 XOR L1 A1 A2 A3 Before instruction 1 cycle after instruction A1 0721 325Ah A1 0721 325Ah A2 0019 0F12h A2 0019 0F12h A3 XXXX X...

Page 185: ...dst sint S1 S2 010111 dst slong L1 L2 0110111 Description This is a pseudo operation used to fill the dst register with 0s by subtracting the dst from itself and placing the result in the dst The ass...

Page 186: ...cluding addition, subtraction, and multiplication. This chapter describes these C67x specific instructions. Instructions that are common to both the C62x and C67x are described in Chapter 3 Topic Page 4...

Page 187: ...sion floating point register value dp x Convert x to dp dst_h msb32 of dst dst_l lsb32 of dst int 32 bit integer value int x Convert x to integer lsbn or LSBn n least significant bits for example lsb3...

Page 188: ...mple ucstn5 uint Unsigned 32 bit integer value dp Double precision floating point register value xsint Signed 32 bit integer value that can optionally use cross path sp Single precision floating point...

Page 189: ...S Unit D Unit ADDDP MPYDP ABSDP ADDAD ADDSP MPYI ABSSP LDDW DPINT MPYID CMPEQDP DPSP MPYSP CMPEQSP INTDP CMPGTDP INTDPU CMPGTSP INTSP CMPLTDP INTSPU CMPLTSP SPINT RCPDP SPTRUNC RCPSP SUBDP RSQRDP SUB...

Page 190: ...t S Unit M Unit L Unit CMPLTDP n DP compare CMPLTSP n Single cycle DPINT n 4 cycle DPSP n 4 cycle DPTRUNC n 4 cycle INTDP n INTDP INTDPU n INTDP INTSP n 4 cycle INTSPU n 4 cycle LDDW n Load MPYDP n MP...

Page 191: ...produce a double precision result write the low 32 bit word one cycle before writing the high 32 bit word If an instruction that writes a DP result is followed by an instruction that uses the result...

Page 192: ...issa field x Can have value of 0 or 1 don t care NaN Not a Number SNaN or QNaN SNaN Signal NaN QNaN Quiet NaN NaN_out QNaN with all bits in the f field 1 Inf Infinity LFPN Largest floating point numbe...

Page 193: ...int fields represent floating point numbers within two ranges normalized e is between 0 and 255 and denormalized e is 0 The following formulas define how to translate the s e and f fields into a singl...

Page 194: ...x0000 0001 1.40129846e-45 Figure 4-2 shows the fields of a double precision floating point number represented within a pair of 32 bit registers Figure 4-2 Double Precision Floating Point Fields 31 e...
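The translation from the s, e, and f fields to a single-precision value can be sketched as follows. The excerpts above truncate the manual's own formulas, so this Python model uses the standard IEEE 754 single-precision interpretation as an assumption; `decode_sp` is a hypothetical name.

```python
def decode_sp(bits):
    """Decode a 32-bit pattern into a float using the s (1), e (8), and f (23) fields."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 0:                           # denormalized (or zero): no implicit leading 1
        return sign * (f / 2**23) * 2.0**-126
    if e == 255:                         # infinity when f is 0, otherwise NaN
        return sign * float("inf") if f == 0 else float("nan")
    return sign * (1 + f / 2**23) * 2.0**(e - 127)  # normalized


smallest = decode_sp(0x00000001)  # smallest denormalized value, about 1.4e-45
```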

Page 195: ...le 4 8 shows hex and decimal values for some double precision floating point numbers Table 4 8 Hex and Decimal Representation for Selected Double Precision Values Symbol Hex Value Decimal Value NaN_ou...

Page 196: ...es the functional unit read ports For example the ADDDP instruction has a functional unit latency of 2 Operands are read on cycle i and cycle i 1 Therefore a new instruction cannot begin until cycle i...

Page 197: ...tional unit on cycles i, i + 1, i + 2, and i + 3. If a cross path is used to read a source in an instruction with a multicycle functional unit latency, you must ensure that no other instructions executing on th...

Page 198: ...on cycle i + 3 or i + 4 due to a write hazard on cycle i + 3 or i + 4, respectively. An INTDP instruction cannot be scheduled on that functional unit on cycle i + 1 due to a write hazard on cycle i + 1. A 4-cycle i...

Page 199: ...nit on cycle i 2 or i 3 due to a write hazard on cycle i 5 or i 6 respectively All of the above cases deal with double precision floating point instructions or the MPYI or MPYID instructions except fo...

Page 200: ...idual Instruction Descriptions This section gives detailed information on the floating point instruction set for the C67x Each instruction presents the following information Assembler syntax Functiona...

Page 201: ...2 port for the 32 MSBs and the src1 port for the 32 LSBs Execution if cond abs src2 dst else nop The absolute value of src2 is determined as follows 1 If src2 ≥ 0 then src2 dst 2 If src2 < 0 then -src2...

Page 202: ...ay slots can be reduced by one because these instructions read the lower word of the DP source one cycle before the upper word of the DP source Instruction Type 2 cycle DP Delay Slots 1 Functional Uni...

Page 203: ...abs src2 dst else nop The absolute value of src2 is determined as follows 1 If src2 ≥ 0 then src2 dst 2 If src2 < 0 then -src2 dst Notes 1 If src2 is SNaN NaN_out is placed in dst and the INVAL and NA...

Page 204: ...Absolute Value ABSSP 4 19 TMS320C67x Floating Point Instruction Set Functional Unit Latency 1 Example ABSSP S1X B1 A5 Before instruction 1 cycle after instruction B1 c020 0000h 2 5 B1 c020 0000h 2 5...

Page 205: ...leword addressing mode specified for src2. The addition defaults to linear mode. However, if src2 is one of A4-A7 or B4-B7, the mode can be changed to circular mode by writing the appropriate value to th...

Page 206: ...DDAD 4 21 TMS320C67x Floating Point Instruction Set Functional Unit Latency 1 Example ADDAD D1 A1 A2 A3 Before instruction 1 cycle after instruction A1 0000 1234h 4660 A1 0000 1234h 4660 A2 0000 0002h...

Page 207: ...or L2 Opcode map field used For operand type Unit src1 src2 dst dp xdp dp L1 L2 Opcode 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3 2 1 0 x 0 0 1 1 0 0 0 1 1 0 s p 3 5 5 5 7 src2 src1 Description...

Page 208: ...int number Overflow Output Rounding Mode Result Sign Nearest Even Zero Infinity Infinity infinity LFPN infinity LFPN infinity LFPN LFPN infinity 6 If underflow occurs the INEX and UNDER bits are set a...

Page 209: ...be reduced by one because these instructions read the lower word of the DP source one cycle before the upper word of the DP source Instruction Type ADDDP SUBDP Delay Slots 6 Functional Unit Latency 2...

Page 210: ...rc1 src2 dst unit L1 or L2 Opcode map field used For operand type Unit src1 src2 dst sp xsp sp L1 L2 Opcode 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3 2 1 0 x 0 0 1 0 0 0 0 1 1 0 s p 3 5 5 5 7...

Page 211: ...put Rounding Mode Result Sign Nearest Even Zero Infinity Infinity infinity LFPN infinity LFPN infinity LFPN LFPN infinity 6 If underflow occurs the INEX and UNDER bits are set and the results are roun...

Page 212: ...ge E1 E2 E3 E4 Read src1 src2 Written dst Unit in use L Instruction Type 4 cycle Delay Slots 3 Functional Unit Latency 1 Example ADDSP L1 A1 A2 A3 Before instruction 4 cycles after instruction A1 C020...

Page 213: ...src1 src2 Description This instruction compares src1 to src2 If src1 equals src2 1 is written to dst Otherwise 0 is written to dst Execution if cond if src1 src2 1 dst else 0 dst else nop Special cas...

Page 214: ...xcept the NaNn and DENn bits when appropriate Pipeline Stage E1 E2 Read src1_l src2_l src1_h src2_h Written dst Unit in use S S Instruction Type DP compare Delay Slots 1 Functional Unit Latency 2 Exam...

Page 215: ...src2 Description This instruction compares src1 to src2 If src1 equals src2 1 is written to dst Otherwise 0 is written to dst Execution if cond if src1 src2 1 dst else 0 dst else nop Special cases of...

Page 216: ...hose shown in the preceding table are set except for the NaNn and DENn bits when appropriate Pipeline Stage E1 Read src1 src2 Written dst Unit in use S Instruction Type Single cycle Delay Slots 0 Func...

Page 217: ...c2 Description This instruction compares src1 to src2 If src1 is greater than src2 1 is written to dst Otherwise 0 is written to dst Execution if cond if src1 src2 1 dst else 0 dst else nop Special ca...

Page 218: ...s when appropriate Pipeline Stage E1 E2 Read src1_l src2_l src1_h src2_h Written dst Unit in use S S Instruction Type DP compare Delay Slots 1 Functional Unit Latency 2 Example CMPGTDP S1 A1 A0 A3 A2...

Page 219: ...c2 Description This instruction compares src1 to src2 If src1 is greater than src2 1 is written to dst Otherwise 0 is written to dst Execution if cond if src1 src2 1 dst else 0 dst else nop Special ca...

Page 220: ...e are set except for the NaNn and DENn bits when appropriate Pipeline Stage E1 Read src1 src2 Written dst Unit in use S Instruction Type Single cycle Delay Slots 0 Functional Unit Latency 1 Example C...

Page 221: ...2 Description This instruction compares src1 to src2 If src1 is less than src2 1 is written to dst Otherwise 0 is written to dst Execution if cond if src1 < src2 1 dst else 0 dst else nop Special case...

Page 222: ...s when appropriate Pipeline Stage E1 E2 Read src1_l src2_l src1_h src2_h Written dst Unit in use S S Instruction Type DP compare Delay Slots 1 Functional Unit Latency 2 Example CMPLTDP S1X A1 A0 B3 B2...

Page 223: ...2 Description This instruction compares src1 to src2 If src1 is less than src2 1 is written to dst Otherwise 0 is written to dst Execution if cond if src1 < src2 1 dst else 0 dst else nop Special case...

Page 224: ...e are set except for the NaNn and DENn bits when appropriate Pipeline Stage E1 Read src1 src2 Written dst Unit in use S Instruction Type Single cycle Delay Slots 0 Functional Unit Latency 1 Example C...

Page 225: ...MSBs and the src1 port for the 32 LSBs Execution if cond int src2 dst else nop Notes 1 If src2 is NaN the maximum signed integer 7FFF FFFFh or 8000 0000h is placed in dst and the INVAL bit is set 2 If...

Page 226: ...PINT 4 41 TMS320C67x Floating Point Instruction Set Delay Slots 3 Functional Unit Latency 1 Example DPINT L1 A1 A0 A4 Before instruction 4 cycles after instruction A1 A0 4021 3333h 3333 3333h 8 6 A1 A...

Page 227: ...st dp sp L1 L2 Opcode 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3 2 1 0 x 0 0 0 1 0 0 1 1 1 0 s p 3 5 5 5 7 src2 0 0 0 0 0 Description The double precision 64 bit value in src2 is converted to a...

Page 228: ...the INEX and DEN2 bits are set 5 If src2 is signed infinity the result is signed infinity and the INFO bit is set 6 If overflow occurs the INEX and OVER bits are set and the results are set as follows...

Page 229: ...Stage E1 E2 E3 E4 Read src2_l src2_h Written dst Unit in use L Instruction Type 4 cycle Delay Slots 3 Functional Unit Latency 1 Example DPSP L1 A1 A0 A4 Before instruction 4 cycles after instruction...

Page 230: ...ero truncate is always used The 64 bit operand is read in one cycle by using the src2 port for the 32 MSBs and the src1 port for the 32 LSBs Execution if cond int src2 dst else nop Notes 1 If src2 is...

Page 231: ...Value to Integer With Truncation 4 46 Delay Slots 3 Functional Unit Latency 1 Example DPTRUNC L1 A1 A0 A4 Before instruction 4 cycles after instruction A1 A0 4021 3333h 3333 3333h 8 6 A1 A0 4021 3333...

Page 232: ...0 Description The integer value in src2 is converted to a double precision value and placed in dst Execution if cond dp src2 dst else nop You cannot set configuration bits with this instruction Pipeli...

Page 233: ...ter instruction B4 1965 1127h 426053927 B4 1965 1127h 426053927 A1 A0 XXXX XXXXh XXXX XXXXh A1 A0 41B9 6511h 2700 0000h 4 2605393 E08 Example 2 INTDPU L1 A4 A1 A0 Before instruction 5 cycles after ins...

Page 234: ...L2 1001001 Opcode 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3 2 1 0 x op 1 1 0 s p 3 5 5 5 7 src2 0 0 0 0 0 Description The integer value in src2 is converted to single precision value and plac...

Page 235: ...ore instruction 4 cycles after instruction A1 1965 1127h 426053927 A1 1965 1127h 426053927 A2 XXXX XXXXh A2 4DCB 2889h 4 2605393 E08 Example 2 INTSPU L1X B1 A2 Before instruction 4 cycles after instru...

Page 236: ...other load and store instructions. The dst field must always be an even value because LDDW loads register pairs. Therefore, bit 23 is always zero. Furthermore, the value of the ld st field is 110. The brac...

Page 237: ...pending on the mode selected. When LDDW is used to load two 32 bit single precision floating point values or two 32 bit integer values, the order is dependent on the endian mode used. In little endian m...

Page 238: ...0 XXXX XXXXh XXXX XXXXh A1 A0 4021 3333h 3333 3333h 8 6 B10 0000 0010h 16 B10 0000 0010h 16 mem 0x18 3333 3333h 4021 3333h 8 6 mem 0x18 3333 3333h 4021 3333h 8 6 Little endian mode Example 2 LDDW D1 A...

Page 239: ...ut signs 2 Signed infinity multiplied by signed infinity or a normalized number other than signed 0 returns signed infinity Signed infinity multiplied by signed 0 returns a signed NaN_out and sets the...

Page 240: ...BDP instruction the number of delay slots can be reduced by one because these instructions read the lower word of the DP source one cycle before the upper word of the DP source Instruction Type MPYDP...

Page 241: ...Description The src1 operand is multiplied by the src2 operand The lower 32 bits of the result are placed in dst Execution if cond lsb32 src1 src2 dst else nop Pipeline Stage E1 E2 E3 E4 E5 E6 E7 E8...

Page 242: ...7 creg z dst src2 13 12 11 5 4 3 2 1 0 x op 0 0 0 s p 3 5 5 5 5 7 6 0 0 src1 cst Description The src1 operand is multiplied by the src2 operand The 64 bit result is placed in the dst register pair Exe...

Page 243: ...Bits 4 58 Example MPYID M1 A1 A2 A5 A4 Before instruction 10 cycles after instruction A1 0034 5678h 3430008 A1 0034 5678h 3430008 A2 0011 2765h 1124197 A2 0011 2765h 1124197 A5 A4 XXXX XXXXh XXXX XXXX...

Page 244: ...xclusive or of the input signs 2 Signed infinity multiplied by signed infinity or a normalized number other than signed 0 returns signed infinity Signed infinity multiplied by signed 0 returns a signe...

Page 245: ...the number of delay slots can be reduced by one because these instructions read the lower word of the DP source one cycle before the upper word of the DP source Instruction Type 4 cycle Delay Slots 3...

Page 246: ...ort for the 32 MSBs The RCPDP instruction provides the correct exponent and the mantissa is accurate to the eighth binary position therefore mantissa error is less than 2 8 This estimate can be used a...

Page 247: ...ws signed 0 is placed in dst and the INEX and UNDER bits are set Underflow occurs when 2^1022 < src2 < infinity Pipeline Stage E1 E2 Read src2_l src2_h Written dst_l dst_h Unit in use S If dst is used...

Page 248: ...instruction provides the correct exponent and the mantissa is accurate to the eighth binary position therefore mantissa error is less than 2 8 This estimate can be used as a seed value for an algorith...
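One common way to refine a seed of this accuracy is Newton-Raphson iteration. The Python sketch below uses the textbook reciprocal update x = x * (2 - v * x). Whether this matches the algorithm the manual goes on to give is an assumption, since the excerpt is truncated. The hardware estimate is simulated with a relative error of 2^-9, which is within the stated 2^-8 bound.

```python
def refine_reciprocal(v, seed, iterations=2):
    """Newton-Raphson refinement of a reciprocal estimate: x <- x * (2 - v * x)."""
    x = seed
    for _ in range(iterations):
        x = x * (2.0 - v * x)  # the relative error is squared on every pass
    return x


v = 8.6
seed = (1.0 / v) * (1.0 + 2.0**-9)  # stand-in for an 8-bit-accurate hardware seed
x = refine_reciprocal(v, seed)
```

Because the relative error squares each pass, a seed good to about 8 bits reaches roughly 16 bits after one iteration and roughly 32 bits after two, subject to the precision of the arithmetic used.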

Page 249: ...If src2 is signed 0 signed infinity is placed in dst and the DIV0 and INFO bits are set 5 If src2 is signed infinity signed 0 is placed in dst 6 If the result underflows signed 0 is placed in dst and...

Page 250: ...the 32 MSBs The RSQRDP instruction provides the correct exponent and the mantissa is accurate to the eighth binary position therefore mantissa error is less than 2 8 This estimate can be used as a se...

Page 251: ...igned 0 signed infinity is placed in dst and the DIV0 and INFO bits are set The Newton-Raphson approximation cannot be used to calculate the square root of 0 because infinity multiplied by 0 is inval...

Page 252: ...imation RSQRDP 4 67 TMS320C67x Floating Point Instruction Set Example RCPDP S1 A1 A0 A3 A2 Before instruction 2 cycles after instruction A1 A0 4010 0000h 0000 0000h 4 0 A1 A0 4010 0000h 0000 0000h 4 0...

Page 253: ...e correct exponent and the mantissa is accurate to the eighth binary position therefore mantissa error is less than 2 8 This estimate can be used as a seed value for an algorithm to compute the recipr...

Page 254: ...signed infinity is placed in dst and the DIV0 INEX and DEN2 bits are set 5 If src2 is signed 0 signed infinity is placed in dst and the DIV0 and INFO bits are set The Newton-Raphson approximation can...

Page 255: ...Precision Floating Point Square Root Reciprocal Approximation 4 70 Example 2 RSQRSP S2X A1 B2 Before instruction 1 cycle after instruction A1 4109 999Ah 8 6 A1 4109 999Ah 8 6 B2 XXXX XXXXh B2 3EAE 800...

Page 256: ...p src2 dst else nop Notes 1 If src2 is SNaN NaN_out is placed in dst and the INVAL and NAN2 bits are set 2 If src2 is QNaN NaN_out is placed in dst and the NAN2 bit is set 3 If src2 is a signed denorm...

Page 257: ...sion Floating Point Value 4 72 Instruction Type 2 cycle DP Delay Slots 1 Functional Unit Latency 1 Example SPDP S1X B2 A1 A0 Before instruction 2 cycles after instruction B2 4109 999Ah 8 6 B2 4109 999...

Page 258: ...if cond int src2 dst else nop Notes 1 If src2 is NaN the maximum signed integer 7FFF FFFFh or 8000 0000h is placed in dst and the INVAL bit is set 2 If src2 is signed infinity or if overflow occurs t...

Page 259: ...INT Convert Single Precision Floating Point Value to Integer 4 74 Example SPINT L1 A1 A2 Before instruction 4 cycles after instruction A1 4109 999Ah 8 6 A1 4109 999Ah 8 6 A2 XXXX XXXXh A2 0000 0009h...

Page 260: ...nding modes in the FADCR are ignored and round toward zero truncate is always used Execution if cond int src2 dst else nop Notes 1 If src2 is NaN the maximum signed integer 7FFF FFFFh or 8000 0000h is...

Page 261: ...ecision Floating Point Value to Integer With Truncation 4 76 Functional Unit Latency 1 Example SPTRUNC L1X B1 A2 Before instruction 4 cycles after instruction B1 4109 999Ah 8 6 B1 4109 999Ah 8 6 A2 X...

Page 262: ...BDP unit src1 src2 dst unit L1 or L2 Opcode map field used For operand type Unit Opfield src1 src2 dst dp xdp dp L1 L2 0011001 src1 src2 dst xdp dp dp L1 L2 0011101 Opcode 31 29 28 27 23 22 18 17 creg...

Page 263: ...verflow Output Rounding Mode Result Sign Nearest Even Zero Infinity Infinity infinity LFPN infinity LFPN infinity LFPN LFPN infinity 6 If underflow occurs the INEX and UNDER bits are set and the resul...

Page 264: ...er of delay slots can be reduced by one because these instructions read the lower word of the DP source one cycle before the upper word of the DP source Instruction Type ADDDP SUBDP Delay Slots 6 Func...

Page 265: ...t unit L1 or L2 Opcode map field used For operand type Unit Opfield src1 src2 dst sp xsp sp L1 L2 0010001 src1 src2 dst xsp sp sp L1 L2 0010101 Opcode 31 29 28 27 23 22 18 17 creg z dst 13 12 11 5 4 3...

Page 266: ...oating point number Overflow Output Rounding Mode Result Sign Nearest Even Zero Infinity Infinity infinity LFPN infinity LFPN infinity LFPN LFPN infinity 6 If underflow occurs the INEX and UNDER bits...

Page 267: ...d src1 src2 Written dst Unit in use L Instruction Type 4 cycle Delay Slots 3 Functional Unit Latency 1 Example SUBSP L1X A2 B1 A3 Before instruction 4 cycles after instruction A2 4109 999Ah A2 4109 99...

Page 268: ...y during the same pipeline phase eliminating read after write memory conflicts All instructions require the same number of pipeline phases for fetch and decode but require a varying number of execute...

Page 269: ...PR Program fetch packet receive The C62x uses a fetch packet FP of eight instructions. All eight of the instructions proceed through fetch processing together through the PG, PS, PW, and PR phases Figur...

Page 270: ...Fetch Phases of the Pipeline PR PW PS PG PW Memory PS PR PG Registers units Functional a b CPU PR PW PS PG 256 MVK LDW LDW SHL ADD MVK LDW LDW NOP MVK MV B SADD SMPYH SADD SHR SMPY SHR SMPYH LDW LDW...

Page 271: ...e pipeline. The last six instructions of the fetch packet FP are parallel and form an execute packet EP. This EP is in the dispatch phase DP of the decode stage. The arrows indicate each instruction s a...

Page 272: ...peline Execution of Instruction Types Figure 5 4 a shows the execute phases of the pipeline in sequential order from left to right Figure 5 4 b shows the portion of the functional block diagram in whi...

Page 273: ...e pipeline For example examine cycle 7 in Figure 5 6 When the instructions from FP n reach E1 the instructions in the execute packet from FPn 1 are being decoded FP n 2 is in dispatch while FPs n 3 n...

Page 274: ...ddress generation is performed and address modifications are written to a register file For branch instructions branch fetch packet in PG phase is affected For single cycle instructions results are w...

Page 275: ...DD SADD STH LDW STH LDW B SUB SMPY SMPYH SADD SADD STH STH B SUB SMPY SMPYH SADD SADD STH STH Register file A Register file B Data 2 Data 1 32 32 32 32 byte addressable Internal data memory Data addre...

Page 276: ...instruction execute packet is in decode The arrows between DP and DC correspond to the functional units identified in the code in Example 5 1 Example 5 1 Execute Packet in Figure 5 7 SADD L1 A2 A7 A2...

Page 277: ...nstructions in E1 are shaded in Figure 5-7. The multiplexers used for the input operands to the functional units are also shaded in the figure. The bold crosspaths are used by the MPY instructions. Most...

Page 278: ...ack to CPU E5 Write data into register Delay slots 0 1 0 4 5 See section 5 2 3 and 5 2 4 for more information on execution and delay slots for stores and loads See section 5 2 5 for more information o...

Page 279: ...execution diagram The operands are read the operation is performed and the results are written to a register all during E1 Single cycle instructions have no delay slots Figure 5 9 Single Cycle Execut...

Page 280: ...ctions Store instructions require phases E1 through E3 to complete their operations Figure 5 12 shows the pipeline phases the store instructions use Figure 5 12 Store Instruction Phases PG PS PW PR DP...

Page 281: ...ycle. When a load is executed before a store, the old value is loaded and the new value is stored (i: LDW, i + 1: STW). When a store is executed before a load, the new value is stored and the new value is loaded...

Page 282: ...e data address pointer is modified in its register In the E2 phase the data address is sent to data memory In the E3 phase a memory read at that address is performed Figure 5 15 Load Execution Block D...

Page 283: ...load following a store accesses the value placed in memory by that store in the cycle after the store is completed This is why the store is considered to have zero delay slots 5 2 5 Branch Instructio...

Page 284: ...Because the branch target has to wait until it reaches the E1 phase to begin execution the branch takes five delay slots before the branch target code executes Figure 5 17 Branch Execution Block Diagr...

Page 285: ...struction has a fixed number of execute cycles that determines when this instruction s operations are complete Section 5 3 2 covers the effect of including a multicycle NOP in an individual EP Finally...

Page 286: ...es 1 4 During these cycles a program fetch phase is started for each of the fetch packets that follow In cycle 5 the program dispatch DP phase the CPU scans the p bits and detects that there are three...

Page 287: ...packet in parallel with other code The results of the LD ADD and MPY will all be available during the proper cycle for each instruction Hence NOP has no effect on the execute packet Figure 5 19 b show...

Page 288: ...e NOPs EP7 Normal Cycle 11 10 9 8 7 6 5 4 3 2 1 Target E1 DC DP PR PW PS PG Branch E1 EP6 EP5 EP4 EP3 EP2 EP1 NOP5 ADD MPY LD EP without branch EP without branch B EP without branch EP without branch...

Page 289: ...loads and instruction fetches dispatches. The comparison is valid because data loads and program fetches operate on internal memories of the same speed on the C62x and perform the same types of operat...

Page 290: ...ata memory access The memory stall causes all of the pipeline phases to lengthen beyond a single clock cycle caus ing execution to take additional clock cycles to finish The results of the program exe...

Page 291: ...ycle result in a memory stall that halts all pipeline operation for one cycle while the second value is read from memory Two memory operations per cycle are allowed without any stall as long as they d...
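The bank rule in this excerpt can be modelled in a few lines. The Python sketch below is illustrative only, not DSP code. It assumes the 4-bank layout suggested by the Figure 5–24 excerpt (16-bit-wide banks, so bytes 6 and 7 fall in bank 3), a single memory space, and aligned accesses. The names `banks_touched` and `stalls` are hypothetical.

```python
NUM_BANKS = 4    # assumed: 4-bank interleaved memory, as in Figure 5-24
BANK_WIDTH = 2   # assumed: each bank is 16 bits (2 bytes) wide


def banks_touched(addr, size):
    """Return the set of bank numbers used by an access of `size` bytes at `addr`."""
    return {(a // BANK_WIDTH) % NUM_BANKS for a in range(addr, addr + size)}


def stalls(access_a, access_b):
    """True if two same-cycle accesses, given as (addr, size), share a bank.

    Models a single memory space only; accesses to different memory
    spaces are outside this sketch.
    """
    return bool(banks_touched(*access_a) & banks_touched(*access_b))
```

Under these assumptions, two halfword accesses at byte addresses 0 and 8 both map to bank 0 and would stall, while accesses at 0 and 2 use banks 0 and 1 and would not.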

Page 292: ...re with an access to bank 0 in another memory space, and no pipeline stall occurs. Figure 5–24. 4-Bank Interleaved Memory With Two Memory Spaces: 6 7 14 15 8N+6 8N+7 Bank 3 Bank 2 8N+5 8N+4 13 12 5 4 2 3...

Page 293: ...me pipeline phase, eliminating read-after-write memory conflicts. All instructions require the same number of pipeline phases for fetch and decode but require a varying number of execute phases. This cha...

Page 294: ...gram fetch packet receive. The C67x uses a fetch packet (FP) of eight instructions. All eight of the instructions proceed through fetch processing together, through the PG, PS, PW, and PR phases. Figure 6–2(a...

Page 295: ...Fetch Phases of the Pipeline PR PW PS PG PW Memory PS PR PG Registers units Functional a b CPU PR PW PS PG 256 MVK LDW LDW SHL ADD MVK LDW LDW NOP MVK MV B SADD SMPYH SADD SHR SMPY SHR SMPYH LDW LDW...

Page 296: ...e pipeline. The last six instructions of the fetch packet (FP) are parallel and form an execute packet (EP). This EP is in the dispatch phase (DP) of the decode stage. The arrows indicate each instruction's a...

Page 297: ...ed in section 6.2, Pipeline Execution of Instruction Types. Figure 6–4(a) shows the execute phases of the pipeline in sequential order, from left to right. Figure 6–4(b) shows the portion of the functional...

Page 298: ...ructions from FP n reach E1, the instructions in the execute packet from FP n+1 are being decoded. FP n+2 is in dispatch, while FPs n+3, n+4, n+5, and n+6 are each in one of four phases of program fetch. See...

Page 299: ...Decode (DC): Instructions are decoded in functional units. Execute: Execute 1 (E1): For all instruction types, the conditions for the instructions are evaluated and operands are read. For load and store instruc...

Page 300: ...instruction that saturates results sets the SAT bit in the CSR if saturation occurs. For the MPYDP instruction, the upper 32 bits of src1 and the lower 32 bits of src2 are read. For MPYI and MPYID instruct...

Page 301: ...ritten to a register file ADDDP SUBDP. Execute 8 (E8): Nothing is read or written. Execute 9 (E9): For the MPYI instruction, the result is written to a register file. For MPYDP and MPYID instructions, the lower...

Page 302: ...SP ADDSP MV LDDW B MPYSP SUBSP LDDW Register file A Register file B Data 2 Data 1 32 32 32 32 byte addressable Internal data memory Data address 2 Data address 1 9 8 7 6 5 4 3 2 1 0 16 16 16 16 Data m...

Page 303: ...he code in Example 6–1. In the DC phase portion of Figure 6–7, one box is empty because a NOP was the eighth instruction in the fetch packet in DC and no functional unit is needed for a NOP. Finally, the...

Page 304: ...A15 A8 A1 ABSSP S2 B12 B15 LOOP B2 LDDW D1 A0 2 A5 A4 DP and PS Phases B2 ZERO D2 B0 SUBSP L1 A12 A2 A12 ADDSP L2 B9 B12 B12 MPYSP M1X A5 B7 A10 MPYSP M2 B4 B7 B10 B0 B S1 LOOP B1 CMPLTSP S2 B15 B8 B1...

Page 305: ...ases E1 Compute result and write to register Read operands and start computations Compute address Compute address Target code in PG E2 Compute result and write to register Send address and data to mem...

Page 306: ...utation Read upper sources, finish computation, and write results to register E3 Continue computation Continue computation E4 Complete computation and write results to register Continue computation a...

Page 307: ...ontinue computation E5 Continue computation Continue computation Continue computation Continue computation E6 Compute the lower results and write to register Continue computation Continue computation...

Page 308: ...ew instruction dispatched to that functional unit during this locking period causes undefined results. If an instruction with a multicycle functional unit latency has a condition that is evaluated as...

Page 309: ...on the same functional unit attempt to read or write, respectively, to the register file on the same cycle. An instruction scheduled on cycle i has the following constraints: 2-cycle DP: A single-cycle in...

Page 310: ...same functional unit on cycle i + 4, i + 5, or i + 6. A MPYI instruction cannot be scheduled on the same functional unit on cycle i + 4, i + 5, or i + 6. A MPYID instruction cannot be scheduled on the same functional...

Page 311: ...point instructions. The .S and .L units share their long write port with the load port for the 32 most significant bits of an LDDW load. Therefore, the LDDW instruction and the .S or .L unit writing a long r...

Page 312: ...en differently for each instruction. If you analyze these differences, you can make further optimization improvements by considering what happens during the execution phases of instructions that use th...

Page 313: ...RW Instruction Type Subsequent Same Unit Instruction Executable Single cycle n DP compare n 2 cycle DP n Branch n Instruction Type Same Side Different Unit Both Using Cross Path Executable Single cycl...

Page 314: ...it Both Using Cross Path Executable Single cycle Xr n Load Xr n Store Xr n INTDP Xr n ADDDP SUBDP Xr n 16 16 multiply Xr n 4 cycle Xr n MPYI Xr n MPYID Xr n MPYDP Xr n Legend E1 phase of the single cy...

Page 315: ...ble Single cycle Xw n DP compare n n 2 cycle DP Xw n Branch n n Instruction Type Same Side Different Unit Both Using Cross Path Executable Single cycle n n Load n n Store n n INTDP n n ADDDP SUBDP n n...

Page 316: ...n Branch n n n n n n n Instruction Type Same Side Different Unit Both Using Cross Path Executable Single cycle n n n n n n n Load n n n n n n n Store n n n n n n n INTDP n n n n n n n ADDDP SUBDP n n...

Page 317: ...uction Type Subsequent Same Unit Instruction Executable 16 16 multiply n n 4 cycle n n MPYI n n MPYID n n MPYDP n n Instruction Type Same Side Different Unit Both Using Cross Path Executable Single cy...

Page 318: ...n n n n MPYI n n n n MPYID n n n n MPYDP n n n n Instruction Type Same Side Different Unit Both Using Cross Path Executable Single cycle n n n n Load n n n n Store n n n n DP compare n n n n 2 cycle D...

Page 319: ...erent Unit Both Using Cross Path Executable Single cycle Xr Xr Xr n n n n n n Load Xr Xr Xr n n n n n n Store Xr Xr Xr n n n n n n DP compare Xr Xr Xr n n n n n n 2 cycle DP Xr Xr Xr n n n n n n Branc...

Page 320: ...it Both Using Cross Path Executable Single cycle Xr Xr Xr n n n n n n n Load Xr Xr Xr n n n n n n n Store Xr Xr Xr n n n n n n n DP compare Xr Xr Xr n n n n n n n 2 cycle DP Xr Xr Xr n n n n n n n Bra...

Page 321: ...ifferent Unit Both Using Cross Path Executable Single cycle Xr Xr Xr n n n n n n n Load Xr Xr Xr n n n n n n n Store Xr Xr Xr n n n n n n n DP compare Xr Xr Xr n n n n n n n 2 cycle DP Xr Xr Xr n n n...

Page 322: ...Type Subsequent Same Unit Instruction Executable Single cycle n 4 cycle n INTDP n ADDDP SUBDP n Instruction Type Same Side Different Unit Both Using Cross Path Executable Single cycle n DP compare n 2...

Page 323: ...n n n INTDP n n n n ADDDP SUBDP n n n n Instruction Type Same Side Different Unit Both Using Cross Path Executable Single cycle n n n n DP compare n n n n 2 cycle DP n n n n 4 cycle n n n n Load n n n...

Page 324: ...n ADDDP SUBDP n n n n n Instruction Type Same Side Different Unit Both Using Cross Path Executable Single cycle n n n n n DP compare n n n n n 2 cycle DP n n n n n 4 cycle n n n n n Load n n n n n Sto...

Page 325: ...Both Using Cross Path Executable Single cycle Xr n n n n n n DP compare Xr n n n n n n 2 cycle DP Xr n n n n n n 4 cycle Xr n n n n n n Load Xr n n n n n n Store Xr n n n n n n Branch Xr n n n n n n 1...

Page 326: ...gle cycle n n n n n Load n n n n n Store n n n n n Instruction Type Same Side Different Unit Both Using Cross Path Executable 16 16 multiply n n n n n MPYI n n n n n MPYID n n n n n MPYDP n n n n n Si...

Page 327: ...ction Executable Single cycle n n n Load n n n Store n n n Instruction Type Same Side Different Unit Both Using Cross Path Executable 16 16 multiply n n n MPYI n n n MPYID n n n MPYDP n n n Single cyc...

Page 328: ...Subsequent Same Unit Instruction Executable Single cycle n Load n Store n Instruction Type Same Side Different Unit Both Using Cross Path Executable 16 16 multiply n MPYI n MPYID n MPYDP n Single cycl...

Page 329: ...Hazards Instruction Execution Cycle 1 2 3 4 5 6 LDDW RW W Instruction Type Subsequent Same Unit Instruction Executable Instruction with long result n n n Xw n Legend E1 phase of the single-cycle instr...

Page 330: ...s the single-cycle execution diagram. The operands are read, the operation is performed, and the results are written to a register, all during E1. Single-cycle instructions have no delay slots. Table 6–20...

Page 331: ...ing in the pipeline for a multiply. In the E1 phase, the operands are read and the multiply begins. In the E2 phase, the multiply finishes and the result is written to the destination register. Multiply in...

Page 332: ...s of the data to be stored is computed. In the E2 phase, the data and destination addresses are sent to data memory. In the E3 phase, a memory write is performed. The address modification is performed in...

Page 333: ...cycle. When a load is executed before a store, the old value is loaded and the new value is stored (i: LDW; i + 1: STW). When a store is executed before a load, the new value is stored and the new value is load...

Page 334: ...ipeline Stage E1 E2 E3 E4 E5 Read baseR offsetR Written baseR dst Unit in use .D. Figure 6–14. Load Instruction Phases: PG PS PW PR DP DC E1 E2 E3 E4 E5 (4 delay slots, address modification). Figure 6–15 show...

Page 335: ...inter results are written to the register in E1, there are no delay slots associated with the address modification. In the following code, pointer results are written to the A4 register in the first exec...

Page 336: ...of the target code (see Table 6–24). Figure 6–16 shows the pipeline phases used by the branch instruction and branch target code. The delay slots are shaded. Table 6–24. Branch Execution: Pipeline Stage E1 P...

Page 337: ...the branch target has to wait until it reaches the E1 phase to begin execution, the branch takes five delay slots before the branch target code executes. Figure 6–17. Branch Execution Block Diagram: DP PR...

Page 338: ...E1 using the src1 and src2 ports, respectively. The lower 32 bits of the DP source are written on E1, and the upper 32 bits of the DP source are written on E2. The 2-cycle DP instructions are executed on...

Page 339: ...Figure 6–19 shows the pipeline phases the 4-cycle instructions use. Table 6–26. 4-Cycle Execution: Pipeline Stage E1 E2 E3 E4 Read src1 src2 Written dst Unit in use .L or .M. Figure 6–19. 4-Cycle Instructi...

Page 340: ...e read on E1, the upper 32 bits of the sources are read on E2, and the results are written on E2. The following instructions are DP compare instructions: CMPEQDP, CMPLTDP, CMPGTDP. The DP compare instruction...

Page 341: ...t are written on E7. The ADDDP/SUBDP instructions are executed on the .L unit. The functional unit latency for ADDDP/SUBDP instructions is 2. The status is written to the FADCR on E6. Figure 6–22 shows the...

Page 342: ...Unit in use .M .M .M .M. Figure 6–23. MPYI Instruction Phases: PG PS PW PR DP DC E1 E2 E3 E4 E5 E6 E7 E8 E9 (8 delay slots). 6.3.16 MPYID Instructions. The MPYID instruction uses the E1 through E10 phases of th...
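The phase listings in these excerpts follow a simple pattern: a load writes its result in E5 and has 4 delay slots, and MPYI writes in E9 and has 8. The Python sketch below tabulates that relationship for the instruction types whose result phase is stated in the excerpts above. It is a model for checking schedules by hand, not DSP code. The table `RESULT_PHASE` and both function names are hypothetical, and the rule "delay slots = result phase minus one" is assumed from those two figures. It does not cover branches or stores, which the manual treats separately.

```python
# Execute phase in which each instruction type writes its result,
# as read from the page excerpts (illustrative subset only).
RESULT_PHASE = {
    "single-cycle": 1,    # results written during E1
    "multiply 16x16": 2,  # result written in E2
    "4-cycle": 4,         # phases E1 through E4
    "load": 5,            # phases E1 through E5
    "MPYI": 9,            # phases E1 through E9
}


def delay_slots(kind):
    """Assumed rule: delay slots = result-write phase minus one."""
    return RESULT_PHASE[kind] - 1


def first_cycle_result_usable(issue_cycle, kind):
    """Cycle in which a dependent instruction may enter E1.

    `issue_cycle` is the cycle in which the producer is in E1.
    """
    return issue_cycle + delay_slots(kind) + 1
```

For example, under this model a load whose E1 phase falls in cycle 1 has a result that a dependent instruction can first read in cycle 6.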

Page 343: ...the upper 32 bits of src2 are read on E2 and E4. The lower 32 bits of the result are written on E9, and the upper 32 bits of the result are written on E10. The MPYDP instruction is executed on the .M unit...

Page 344: ...struction has a fixed number of execute cycles that determines when this instruction's operations are complete. Section 6.4.2 covers the effect of including a multicycle NOP in an individual EP. Finally...

Page 345: ...etch packet n goes through the program fetch phases during cycles 1–4. During these cycles, a program fetch phase is started for each of the fetch packets that follow. In cycle 5, the program dispatch (DP)...

Page 346: ...packet in parallel with other code. The results of the LD, ADD, and MPY will all be available during the proper cycle for each instruction. Hence, NOP has no effect on the execute packet. Figure 6–27(b) sho...

Page 347: ...e NOPs EP7 Normal Cycle 11 10 9 8 7 6 5 4 3 2 1 Target E1 DC DP PR PW PS PG Branch E1 EP6 EP5 EP4 EP3 EP2 EP1 NOP5 ADD MPY LD EP without branch EP without branch B EP without branch EP without branch...

Page 348: ...loads and instruction fetches/dispatches. The comparison is valid because data loads and program fetches operate on internal memories of the same speed on the C67x and perform the same types of opera...

Page 349: ...data memory access. The memory stall causes all of the pipeline phases to lengthen beyond a single clock cycle, causing execution to take additional clock cycles to finish. The results of the program exe...

Page 350: ...ch of these banks is single-ported memory, only one access to each bank is allowed per cycle. Two accesses to a single bank in a given cycle result in a memory stall that halts all pipeline operation fo...

Page 351: ...m device to device. See the TMS320C62x/C67x Peripherals Reference Guide to determine the memory spaces in your particular device. Figure 6–32. 8-Bank Interleaved Memory With Two Memory Spaces: Bank 7 Bank...

Page 352: ...automatically the presence of interrupts and divert program execution flow to your interrupt service code. Finally, the chapter describes the programming implications of interrupts. Topic Page 7.1 Overv...

Page 353: ...ets the pending status of the interrupt within the interrupt flag register (IFR). If the interrupt is properly enabled, the CPU begins processing the interrupt and redirecting program flow to the interrup...

Page 354: ...errupt service fetch packet must be located at address 0. RESET is not affected by branches. 7.1.1.2 Nonmaskable Interrupt (NMI). NMI is the second-highest priority interrupt and is generally used to alert...

Page 355: ...register (CSR) is set to 1. The NMIE bit in the interrupt enable register (IER) is set to 1. The corresponding interrupt enable (IE) bit in the IER is set to 1. The corresponding interrupt occurs, which sets the...

Page 356: ...outine may fit in an individual fetch packet. The addresses and contents of the IST are shown in Figure 7–1. Because each fetch packet contains eight 32-bit instruction words (or 32 bytes), each address in...
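Because each interrupt service fetch packet occupies 32 bytes, the address of the packet for a given interrupt can be computed from the table base. The Python sketch below illustrates that arithmetic and is not DSP code. The function name `isfp_address` is hypothetical. The 0–15 interrupt range and the 1024-byte base alignment are assumptions inferred from the IST spanning 0h to 200h and from ISTP bits 0–9 not being part of the base address.

```python
FETCH_PACKET_BYTES = 8 * 4  # eight 32-bit instruction words per fetch packet


def isfp_address(ist_base, int_num):
    """Byte address of the interrupt service fetch packet for `int_num`.

    `ist_base` is the base address of the interrupt service table.
    """
    if not 0 <= int_num <= 15:
        raise ValueError("interrupt number must be in the range 0..15")
    if ist_base % 1024 != 0:
        # Assumed: the low 10 bits of the table pointer are not base bits.
        raise ValueError("IST base is assumed to be 1024-byte aligned")
    return ist_base + int_num * FETCH_PACKET_BYTES
```

With the table at address 0 this gives 080h for INT4 and 0C0h for INT6, consistent with the addresses listed in the figure excerpts in this summary. With the table relocated to 800h, the last entry lands at 9E0h.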

Page 357: ...nstr3 Interrupt service table IST Instr2 Instr4 Instr5 Instr6 B IRP NOP 5 ISFP for INT6 000h 020h 040h 060h 080h 0A0h 0C0h 0E0h 100h 120h 140h 160h 180h 1A0h 1C0h 1E0h 0C0h 0C4h 0C8h 0CCh 0D0h 0D4h 0D...

Page 358: ...nterrupt ISFP Instr1 Instr2 B 1234h Instr4 Instr5 Instr6 Instr7 Instr8 ISFP for INT4 080h 084h 088h 08Ch 090h 094h 098h 09Ch Program memory Instr9 Instr11 1224h 1228h 122Ch 1230h 1234h 1238h 123Ch B I...

Page 359: ...ns Bits, Field Name, Description. Bits 0–4: Set to 0 (fetch packets must be aligned on 8-word (32-byte) boundaries). Bits 5–9, HPEINT: Highest priority enabled interrupt. This field gives the number related bit position in...

Page 360: ...880h 8A0h 8C0h 8E0h 900h 920h 940h 960h 980h 9A0h 9C0h 9E0h Program memory 800h RESET ISFP. 1) Copy the IST, located between 0h and 200h, to the memory location between 800h and A00h. 2) Write 800h to the...

Page 361: ...ted in the table. Table 7–3. Interrupt Control Registers: Abbreviation, Name, Description, Page Number. CSR: Control status register. Allows you to globally set or disable interrupts (page 7-11). IER: Interrupt enable...

Page 362: ...rrupts globally enabled; GIE = 0: maskable interrupts globally disabled. Bit 1, PGIE: Previous GIE; saves the value of GIE when an interrupt is taken. This value is used on return from an interrupt. The global inte...

Page 363: ...ck to GIE, resulting in GIE being cleared as directed by your code. Example 7–2 shows how to disable maskable interrupts globally, and Example 7–3 shows how to enable maskable interrupts globally. Exampl...

Page 364: ...ot writeable and is always read as 1, so the reset interrupt is always enabled. You cannot disable the reset interrupt. Bits IE4–IE15 can be written as 1 or 0, enabling or disabling the associated interr...

Page 365: ...check the status of interrupts, use the MVC instruction to read the IFR. Figure 7–7 shows the IFR. Figure 7–7. Interrupt Flag Register (IFR): 31 16 Reserved R 0 15 0 IF15 IF14 IF13 IF12 IF11 IF10 IF9 IF8 IF7...

Page 366: ...rrupts. Figure 7–8. Interrupt Set Register (ISR): 31 16 Reserved 15 0 IS15 IS14 IS13 IS12 IS11 IS10 IS9 IS8 IS7 IS6 IS5 IS4 W Rsv Rsv Rsv Rsv. Legend: W = Writeable by the MVC instruction; Rsv = Reserved. Figure 7...

Page 367: ...I return pointer register (NRP) contains the return pointer that directs the CPU to the proper location to continue program execution after NMI processing. A branch using the address in the NRP (B NRP) in...

Page 368: ...ing is complete. Example 7–9 shows how to return from a maskable interrupt. Example 7–9. Code to Return from a Maskable Interrupt: B IRP ; return, moves PGIE to GIE. NOP 5 ; delay slots. The IRP contains the 32...

Page 369: ...pt signal enters the CPU, it has been detected (cycle 4). Two clock cycles after detection, the interrupt's corresponding flag bit in the IFR is set (cycle 6). In Figure 7–12 and Figure 7–13, IFm is set dur...

Page 370: ...E5 E4 E5 E3 E4 E5 DC E1 E2 E3 E4 DP DC E1 E2 E3 PR DP DC E1 E2 PW PR DP DC E1 PS PW PR DP DC E5 E4 E3 E2 E1 n 5 n 4 n 3 n 2 n 1 n Execute packet INUM IACK IFm External INTm Clock cycle 0 0 0 0 0 0 0...

Page 371: ...External INTm at pin 0 0 IACK INUM 0 E2 E1 DC E1 DC DP DP PR PW PS PR PW PS PG n n 1 n 2 n 3 n 4 n 5 n 6 DC DP PR PW PS PG Execute packet PR 12 11 PW PS 10 9 8 PG DP PW PR PS PG PR PS PW PS PG PG PW P...

Page 372: ...ction to be annulled in future pipeline stages. The address of the first annulled execute packet (n + 5) is loaded into the NRP (in the case of NMI) or IRP (for all other interrupts). A branch to the address h...

Page 373: ...le Figure 7 14 RESET Interrupt Detection and Processing Pipeline Operation Reset ISFP n 7 n 6 Pipeline flush E1 DC DP PR PW PS PG PG PS PW PR DP DC E1 n 5 n 4 n 3 n 2 n 1 n Execute packet INUM IACK IF...

Page 374: ...3.3.1 on page 7-16. During CPU cycles 15–21 of Figure 7–14, the following reset processing actions occur: Processing of subsequent nonreset interrupts is disabled because GIE and NMIE are cleared. A bran...

Page 375: ...occur only every second cycle. However, the frequency of interrupt processing depends on the time required for interrupt service and whether you reenable interrupts during processing, thereby allowing n...

Page 376: ...ar to the instructions after the interrupt to have fewer delay slots than they actually have. For example, suppose that register A1 contains 0 and register A0 points to a memory location containing a va...

Page 377: ...bit, which would reenable interrupts inside the interrupt service routine. 7.6.3 Manual Interrupt Processing. You can poll the IFR and IER to detect interrupts manually and then branch to the value held...

Page 378: ...a branch using a displacement, the MVKH instructions could be eliminated, thus shortening the code sequence. The trap is processed with the code located at the address pointed to by the label TRAP_HAND...

Page 379: ...er up. C clock cycles: Cycles based on the input from the external clock. code: A set of instructions written to perform a task; a computer program or part of a program. CPU cycle: The period during which a...

Page 380: ...ce fetch packet (ISFP): See also fetch packet (FP). A fetch packet used to service interrupts. If eight instructions are insufficient, the user must branch out of this block for additional interrupt service. I...

Page 381: ...upt: A higher-priority interrupt that must be serviced before completion of the current interrupt service routine. nonmaskable interrupt: An interrupt that can be neither masked nor manually disabled. O...

Page 382: ...mber with the sign bit. W wait state: A period of time that the CPU must wait for external program, data, or I/O memory to respond when reading from or writing to that external memory. The CPU waits one e...

Page 383: ...for load store 3 23 address paths 2 7 addressing mode circular mode 3 21 definition A 1 linear mode 3 21 addressing mode register AMR 2 8 2 9 field encoding table 2 9 figure 2 9 ADDSP instruction 4 25...

Page 384: ...upts 7 13 of interrupts 7 11 control register file extension C67x 2 13 interrupt 7 10 list of 2 8 register addresses for accessing 3 87 control status register CSR 7 10 description 2 8 2 11 figure 2 1...

Page 385: ...derations C67x 6 52 pipeline operation 5 18 execute phases of the pipeline 5 22 6 56 figure 5 5 6 5 execution notations fixed point instructions 3 2 floating point instructions 4 2 execution table ADD...

Page 386: ...ts 4 12 instruction operation fixed point notations for 3 2 floating point notations for 4 2 instruction to functional unit mapping 3 4 4 4 instruction types 2 cycle DP instructions 6 46 4 cycle instr...

Page 387: ...10 performance considerations 7 24 priorities 7 3 processing 7 18 to 7 23 programming considerations 7 25 to 7 28 setting 7 14 signals used 7 2 traps 7 27 types of 7 2 INTSP instruction 4 49 to 4 50 I...

Page 388: ...ing functional unit to instruction 3 5 4 4 instruction to functional unit 3 4 4 4 maskable interrupt description 7 4 return from 7 17 memory considerations 5 22 internal 1 8 paths 2 7 pipeline phases...

Page 389: ...code example 3 15 parallel fetch packets 3 14 parallel operations 3 13 partially serial fetch packets 3 15 PCC field CSR 2 11 PCE1 See program counter PCE1 performance considerations pipeline 5 18 6 5...

Page 390: ...ter IRP ISR See interrupt set register ISR ISTP See interrupt service table pointer ISTP NRP See nonmaskable interrupt return pointer NRP PCE1 See program counter PCE1 read constraints 3 19 write cons...

Page 391: ...uction 15 bit offset 3 126 to 3 127 register offset or 5 bit unsigned constant offset 3 122 to 3 125 using circular addressing 3 21 SUB instruction 3 128 to 3 130 SUB2 instruction 3 135 SUBAB instruct...
