
www.ti.com

CSR Field   Use                       Type
EN          Current CPU endian mode   Read-only (global)
PWRD        Power-down modes          Not accessible (global)
PCC         Program cache control     Not accessible (global)
DCC         Data cache control        Not accessible (global)

Note that the GIE and PGIE bits of the CSR are read-only. Algorithms that need to create non-interruptible
sections must use the DSP/BIOS operations HWI_disable() and HWI_restore(); they must never directly
manipulate the GIE or PGIE bits.

5.3.6 Interrupt Latency

Although there are no additional rules for C6x algorithms that deal with interrupt latency, it is important to
note that all instructions in the delay slots of branches are non-interruptible; that is, once a branch is fetched,
interrupts are blocked until the branch completes. Since these delay slots may themselves contain other branch
instructions, care must be taken to avoid long chains of non-interruptible instructions. In particular, tightly
coded loops often result in unacceptably long non-interruptible sequences.

Note that the C compiler has an option to limit how long loops may run without being interruptible. Even when
this option is used, you must be careful to limit the length of loops whose trip count is not a simple constant.

5.4 TMS320C54xx Rules and Guidelines

This section describes the rules and guidelines that are specific to the TMS320C5400 family of DSPs.

5.4.1 Data Models

The C54x has just one data model, so there are no special data memory requirements for this processor.

5.4.2 Program Models

Some variants of the TMS320C54xx support an extended program address space. Since code can be
compiled for either standard or extended (near or far) addresses, it is possible to have incompatible
mixtures of code.

We need to ensure that calls made from an algorithm to external support functions will be compatible, and
that calls made from the application to an algorithm will be compatible. We also need to ensure that calls
to independently relocatable object modules within an algorithm will be compatible.

Rule 28

On processors that support large program model compilation, all function accesses to independently
relocatable object modules must be far references. For example, inter-section function references within an
algorithm and external function references to other eXpressDSP-compliant modules must be far on the
C54x; that is, the calling function must push both the XPC and the current PC.

Rule 29

On processors that support large program model compilation, all independently relocatable object
module functions must be declared as far functions; for example, on the C54x, callers must push both
the XPC and the current PC and the algorithm functions must perform a far return.

This requires that the top-level interface to the algorithm functions be declared as "far." Function calls
within the algorithm may still be near calls; however, calls within the algorithm to independently relocatable
object modules must be far calls, since any relocatable object module may be loaded into a "far" page of
memory.
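The reasoning behind Rules 28 and 29 can be sketched with a toy reachability model. It assumes a simplified C54x-style program space in which an address is a page (the value XPC must hold) plus a 16-bit offset, a near call stays in the caller's current page, and a far call also loads XPC; it deliberately ignores overlays and any region shared by all pages, and it assumes a module never straddles a page. NUM_PAGES, ProgAddr, CallKind, and the helper functions are hypothetical names for illustration only, not part of any TI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PAGES 4  /* hypothetical number of program pages in the model */

typedef struct {
    uint8_t  page;    /* value XPC must hold to execute this address */
    uint16_t offset;  /* 16-bit address within the page */
} ProgAddr;

typedef enum { CALL_NEAR, CALL_FAR } CallKind;

/* A near call supplies only a 16-bit target, so it stays in the caller's
 * current page; a far call also loads XPC, so it can reach any page. */
static bool call_reaches(ProgAddr caller, ProgAddr callee, CallKind kind)
{
    assert(caller.page < NUM_PAGES && callee.page < NUM_PAGES);
    return kind == CALL_FAR || caller.page == callee.page;
}

/* Calls inside one relocatable module: the module is loaded as a unit,
 * so caller and callee share a page wherever the loader puts it. */
static bool intra_module_call_ok(CallKind kind, uint16_t caller_off, uint16_t callee_off)
{
    bool ok = true;
    for (uint8_t p = 0; p < NUM_PAGES; p++) {
        ProgAddr a = { p, caller_off };
        ProgAddr b = { p, callee_off };
        ok = ok && call_reaches(a, b, kind);
    }
    return ok;
}

/* Calls between independently relocatable modules: the loader may place
 * caller and callee on any combination of pages. */
static bool inter_module_call_ok(CallKind kind, uint16_t caller_off, uint16_t callee_off)
{
    bool ok = true;
    for (uint8_t pc = 0; pc < NUM_PAGES; pc++) {
        for (uint8_t pe = 0; pe < NUM_PAGES; pe++) {
            ProgAddr a = { pc, caller_off };
            ProgAddr b = { pe, callee_off };
            ok = ok && call_reaches(a, b, kind);
        }
    }
    return ok;
}
```

In this model, intra_module_call_ok(CALL_NEAR, 0x100, 0x200) is true while inter_module_call_ok(CALL_NEAR, 0x100, 0x200) is false and inter_module_call_ok(CALL_FAR, 0x100, 0x200) is true, which mirrors the rule that near calls are acceptable inside a module but only far references are safe between independently relocatable modules.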

What about existing applications that do not support far calls to algorithms? It is possible for such an
application to make a near call into a far algorithm: create a small "near stub" that the application calls
using a near call; the stub then makes the appropriate far call and finally performs a near return to the application.
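The near-stub technique can be traced on a toy model of the return stack. This is a sketch under stated assumptions: a near call saves only the return PC, a far call saves both the return PC and XPC, and the matching returns pop one or two words respectively. The push order, stack size, and every name and address (Cpu, near_stub, ALG_PAGE, and so on) are hypothetical and do not reproduce exact C54x instruction semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU state; the word order on the stack is illustrative only. */
typedef struct {
    uint16_t words[32];
    int      sp;   /* number of words currently on the stack */
    uint16_t pc;   /* 16-bit program counter */
    uint8_t  xpc;  /* extended program counter (page) */
} Cpu;

static void push(Cpu *c, uint16_t w) { assert(c->sp < 32); c->words[c->sp++] = w; }
static uint16_t pop(Cpu *c) { assert(c->sp > 0); return c->words[--c->sp]; }

/* Near call/return: save and restore the PC only. */
static void near_call(Cpu *c, uint16_t ret_pc, uint16_t target) { push(c, ret_pc); c->pc = target; }
static void near_return(Cpu *c) { c->pc = pop(c); }

/* Far call/return: save and restore both the PC and XPC. */
static void far_call(Cpu *c, uint16_t ret_pc, uint8_t page, uint16_t target)
{
    push(c, ret_pc);
    push(c, c->xpc);
    c->xpc = page;
    c->pc = target;
}
static void far_return(Cpu *c) { c->xpc = (uint8_t)pop(c); c->pc = pop(c); }

/* Hypothetical placement of the far algorithm and of the near stub. */
#define ALG_PAGE   2
#define ALG_ENTRY  0x1000
#define STUB_ENTRY 0x0400  /* stub lives in the application's page */

/* A far algorithm function always ends with a far return. */
static void alg_body(Cpu *c) { far_return(c); }

/* Near stub: entered by a near call, far-calls the algorithm, then near-returns. */
static void near_stub(Cpu *c)
{
    uint16_t after_far_call = (uint16_t)(c->pc + 2);  /* hypothetical resume point in the stub */
    far_call(c, after_far_call, ALG_PAGE, ALG_ENTRY);
    alg_body(c);     /* algorithm runs and far-returns into the stub */
    near_return(c);  /* back to the application */
}

/* Correct: the application reaches the far algorithm through the stub. */
static Cpu app_calls_via_stub(void)
{
    Cpu c = { .sp = 0, .pc = 0x0200, .xpc = 0 };
    near_call(&c, 0x0202, STUB_ENTRY);
    near_stub(&c);
    return c;
}

/* Broken: a near call straight into a function that performs a far return.
 * (A near call cannot load XPC, so in a real system it could not even reach
 * ALG_PAGE; the model only tracks the stack.) */
static Cpu app_calls_far_directly_with_near(void)
{
    Cpu c = { .sp = 0, .pc = 0x0200, .xpc = 0 };
    push(&c, 0xBEEF);                  /* caller's own data already on the stack */
    near_call(&c, 0x0202, ALG_ENTRY);
    alg_body(&c);                      /* pops two words, only one was pushed */
    return c;
}
```

In app_calls_via_stub() the stack depth, XPC, and PC all return to the application's values (depth 0, XPC 0, PC 0x0202), because every call is paired with a return of the same kind. In app_calls_far_directly_with_near() the far return consumes the caller's 0xBEEF word as the return PC and loads a bogus XPC, which is exactly the mismatch the near stub exists to prevent.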

SPRU352G – June 2005 – Revised February 2007

DSP-Specific Guidelines

49

Submit Documentation Feedback

TMS320 DSP Algorithm Standard Rules and Guidelines User's Guide, Literature Number SPRU352G, June 2005, Revised February 2007

Page 2: ...2 SPRU352G June 2005 Revised February 2007 Submit Documentation Feedback ...

Page 3: ...ersus Persistent 20 2 3 3 Algorithm versus Application 22 2 4 Program Memory 23 2 5 ROM ability 23 2 6 Use of Peripherals 24 3 Algorithm Component Model 25 3 1 Interfaces and Modules 26 3 1 1 External Identifiers 27 3 1 2 Naming Conventions 28 3 1 3 Module Initialization and Finalization 28 3 1 4 Module Instance Objects 28 3 1 5 Design Time Object Creation 29 3 1 6 Run Time Object Creation and Del...

Page 4: ...ines 49 5 4 1 Data Models 49 5 4 2 Program Models 49 5 4 3 Register Conventions 51 5 4 4 Status Registers 51 5 4 5 Interrupt Latency 52 5 5 TMS320C55x Rules and Guidelines 52 5 5 1 Stack Architecture 52 5 5 2 Data Models 52 5 5 3 Program Models 53 5 5 4 Relocatability 53 5 5 5 Register Conventions 54 5 5 6 Status Bits 55 5 6 TMS320C24xx Guidelines 57 5 6 1 General 57 5 6 2 Data Models 57 5 6 3 Pro...

Page 5: ... Rules and Guidelines 70 6 14 1 Supporting Packed Burst Mode DMA Transfers 70 6 14 2 Minimizing Logical Channel Reconfiguration Overhead 71 6 14 3 Addressing Automatic Endianism Conversion Issues 71 6 15 Inter Algorithm Synchronization 71 6 15 1 Non Preemptive System 71 6 15 3 Preemptive System 72 A Rules and Guidelines 75 A 1 General Rules 76 A 2 Performance Characterization Rules 77 A 3 DMA Rule...

Page 6: ...ace and Implementation 26 3 2 Module Object Creation 29 3 3 Example Module Object 29 3 4 Example Implementation of IALG Interface 33 4 1 Execution Timeline for Two Periodic Tasks 42 5 1 Register Types 46 6 1 Transfer Properties for a 1 D Frame 64 6 2 Frame Index and 2 D Transfer of N 1 Frames 64 6 List of Figures SPRU352G June 2005 Revised February 2007 Submit Documentation Feedback ...

Page 7: ...rithm can coexist with other algorithms in a single system and how to package an algorithm for deployment into a wide variety of systems System integrators learn how to incorporate multiple algorithms from separate sources into a complete system Document Overview Throughout this document the rules and guidelines of the TMS320 DSP Algorithm Standard referred to as XDAIS are highlighted Rules must b...

Page 8: ...lementing the XDAIS interface and to create a test application Using DMA with Framework Components for C64x Application Report SPRAAG1 Describes the standard DMA software abstractions and interfaces for TMS320 DSP Algorithm Standard XDAIS compliant algorithms designed for the C64x EDMA3 controller using DMA Framework Components utilities Although these documents are largely self contained there ar...

Page 9: ...verview of the TMS320 DSP Algorithm Standard Topic Page 1 1 Scope of the Standard 10 1 2 Requirements of the Standard 11 1 3 Goals of the Standard 12 1 4 Intentional Omissions 12 1 5 System Architecture 13 SPRU352G June 2005 Revised February 2007 Overview 9 Submit Documentation Feedback ...

Page 10: ...iles per hour Such algorithms are often the result of many years of doctoral research However because of the lack of consistent standards it is not possible to use an algorithm in more than one system without significant reengineering Since few companies can afford a team of DSP PhDs and the reuse of DSP algorithms is so labor intensive the time to market for a new DSP based product is measured in...

Page 11: ...ork Algorithms can be deployed in purely static as well as dynamic run time environments Algorithms can be distributed in binary form Integration of algorithms does not require recompilation of the client application although reconfiguration and relinking may be required A huge number of DSP algorithms are needed in today s marketplace including modems vocoders speech recognizers echo cancellation...

Page 12: ...require the use of an architecture independent language such as C in the implementation of algorithms Wherever possible this standard tries to anticipate the needs of the system integrator and provide rules for the development of algorithms that allow host tools to be created that will assist the integration of these algorithms For example rules related to algorithm naming conventions enable tools...

Page 13: ...orithms with the real time data sources and links using the core run time support to create a complete DSP sub system Frameworks for the DSP often interact with the real time peripherals including other processors in the system and often define the I O interfaces for the algorithm components Unfortunately for performance reasons many DSP systems do not enforce a clear line between algorithm code a...

Page 14: ...pending on the system designed the system integrator may prefer an algorithm with lower quality and smaller footprint to one with higher quality detection and larger footprint e g an electronic toy doll verses a corporate voice mail system Thus multiple implementations of exactly the same algorithm sometimes make sense there is no single best implementation of many algorithms Unfortunately the sys...

Page 15: ... apply to all algorithms on all DSP architectures regardless of application area Topic Page 2 1 Use of C Language 16 2 2 Threads and Reentrancy 16 2 3 Data Memory 19 2 4 Program Memory 23 2 5 ROM ability 23 2 6 Use of Peripherals 24 SPRU352G June 2005 Revised February 2007 General Programming Guidelines 15 Submit Documentation Feedback ...

Page 16: ...re reentrancy requirements In this section we try to precisely define the types of threads supported by this standard and the reentrancy requirements of algorithms A thread is an encapsulation of the flow of control in a program Most people are accustomed to writing single threaded programs i e programs that only execute one path through their code at a time Multi threaded programs may have severa...

Page 17: ...ithin a specified maximum amount of time or explicitly relinquish control of the CPU to the framework or operating system at some minimum periodic rate By itself this is not a problem since most DSP threads are periodic with real time deadlines However this minimum rate is a function of the other threads in the system and consequently non preemptive threads are not completely independent of one an...

Page 18: ...ve Since algorithms are not permitted to directly manipulate the interrupt state of the processor the allowed DSP BIOS HWI module functions or equivalent implementations must be called to create these critical sections Rule 2 All algorithms must be reentrant within a preemptive environment including time sliced preemption In the remainder of this section we consider several implementations of a si...

Page 19: ... while performance is comparable to our original implementation it is slightly larger and slower because of the state object redirection Directly referencing global data is often more efficient than referencing data via an address register On the other hand the decrease in efficiency can usually be factored out of the time critical loop and into the loop setup Code Thus the incremental performance...

Page 20: ...ocations Note that algorithms can directly access data contained in a static data structure located by the linker This rule only requires that all such references be done symbolically i e via a relocatable label rather than a fixed numerical address In systems where the set of algorithms is not known in advance or when there is insufficient on chip memory for the worst case working set of algorith...

Page 21: ...that all subsequent accesses by the algorithm s processing to write once buffers are strictly read only Additionally the algorithm can link its own statically allocated write once buffers and provide the buffer addresses to the client The client is free to use provided buffers or allocate its own Frameworks can optimize memory allocation by arranging multiple instances of the same algorithm create...

Page 22: ...algorithm Do the algorithms that share the block preempt one another The first question is determined by the implementation of the algorithm the algorithm must be written with assumptions about the contents of certain memory buffers We ve argued that there is significant benefit to distinguish between scratch memory and persistent memory but it is up to the algorithm implementation to trade the be...

Page 23: ...d be defined in a separate object module these modules must not contain any other code In some cases it is awkward to place each function in a separate file Doing so may require making some identifiers globally visible or require significant changes to an existing Code base The TI C compiler supports a pragma directive that allows you to place specified functions in distinct COFF output sections T...

Page 24: ...hms must never directly access any peripheral device This includes but is not limited to on chip DMAs timers I O devices and cache control registers Note however algorithms can utilize the DMA resource by implementing the IDMA2 interface on C64x and C5000 devices and the IDMA3 interface on C64x devices using the EDMA3 controller See Chapter 6 for details In order for an algorithm to be framework i...

Page 25: ...evelop additional rules and guidelines that apply to all algorithms on all DSP architectures regardless of application area Topic Page 3 1 Interfaces and Modules 26 3 2 Algorithms 33 3 3 Packaging 34 SPRU352G June 2005 Revised February 2007 Algorithm Component Model 25 Submit Documentation Feedback ...

Page 26: ... simply a collection of related type definitions functions constants and variables In the C language an interface is typically specified by a header file It is important to note that not all modules implement algorithms but all algorithm implementations must be modules For example the DSP BIOS is a collection of modules and none of these are eXpressDSP compliant algorithms All eXpressDSP compliant...

Page 27: ...s from the set A Z0 9 vendor is the name of the vendor containing characters from the set A Z0 9 For example TI s implementation of the FIR module must only contain external identifiers of the form FIR_TI_ a zA Z0 9 On the other hand external identifiers that are common to all implementations do not have the vendor component of the name For example if the FIR module interface defined a constant st...

Page 28: ...et_buffer This avoids ambiguity when parsing module and vendor prefixes Before a module can be used by an application it must first be initialized i e the module s init method must be run Similarly when an application terminates any module that was initialized must be finalized i e its exit method must be executed Initialization methods are often used to initialize global data used by the module t...

Page 29: ... Even if the entire system cannot be static often certain sub systems can be fixed at design time It is important therefore that all modules efficiently support static system designs Guideline 3 All modules that support object creation should support design time object creation In practice this simply means that all functions that are only required for run time object creation be placed either in ...

Page 30: ...A filter module may include a global configuration parameter that specifies that the system will only use all zero filters with aligned data By making this a design time global configuration parameter systems that are willing to accept constraints in their use of the API are rewarded by smaller faster operation of the module that implements the API Modules that have one or more global configuratio...

Page 31: ...applications which require more than one FIR filter Modern component programming models support the ability of a single component to implement more than one interface This allows a single component to be used concurrently by a variety of different applications For example in addition to a component s concrete interface defined by its header a component might also support a debug interface that all...

Page 32: ...want to define a new interface that requires additional methods beyond those defined by IALG we define a new interface that derives from or inherits from the IALG interface Interface inheritance is implemented by simply defining the new interface s Fxns structure so that its first field is the Fxns structure from which the interface is inherited Thus any pointer to the new interface s Fxns structu...

Page 33: ...tions such as FFT or dot product which do not maintain state between consecutive operations and do not require internal workspaces to perform their computation are not good eXpressDSP compliant candidates These algorithms encapsulate larger computations that require internal working memory and typically operate on conceptually infinite data streams Figure 3 4 Example Implementation of IALG Interfa...

Page 34: ...erface Thus every algorithm has considerable flexibility to define the methods that are appropriate for the algorithm By deriving from IALG we can ensure that all implementations of any algorithm implement the IALG interface Rule 14 All abstract algorithm interfaces must derive from the IALG interface In this section we cover the details necessary for a developer to bundle a module into a form tha...

Page 35: ...pliant module includes one or more interface headers In order to ensure that no name conflicts occur we must adopt a naming convention for all header files C language headers should be named as follows module vers _ vendor h Assembly language headers should be named as follows module vers _ vendor h arch A single vendor may produce more than one implementation of an algorithm For example a debug v...

Page 36: ...uld be fir_ti_debug l62 and fir_ti l62 To avoid having to make changes to source Code only one header file must suffice for all variants supplied by a vendor Since different algorithm implementations can be interchanged without recompilation of client programs it should not be necessary to have different debug versus release definitions in a module s header However a vendor may elect to include ve...

Page 37: ...at should be provided by algorithm components to enable system integrators to assemble combinations of algorithms into reliable products Topic Page 4 1 Data Memory 38 4 2 Program Memory 40 4 3 Interrupt Latency 41 4 4 Execution Time 41 SPRU352G June 2005 Revised February 2007 Algorithm Performance Characterization 37 Submit Documentation Feedback ...

Page 38: ... is allocated and freed at run time and is managed using a LIFO Last In First Out allocation policy Finally static data is any data that is allocated at design time i e program build time and whose location is fixed during run time In the remainder of this section we define performance metrics that describe an algorithm s data memory requirements Heap memory is run time re allocable bulk memory th...

Page 39: ...e not required to be a constant it may be function of the algorithm s instance creation parameters One way to achieve reentrancy in a function is to declare all scratch data objects on the local stack If the stack is in on chip memory this provides easy access to fast scratch memory The problem with this approach to reentrancy is that if carried too far it may require a very large stack While this...

Page 40: ... algorithm using the eXpressDSP compliant IALG interface The implementation of interfaces is described in Section 3 2 and a detailed description of the IALG interface is provided in the TMS320 DSP Algorithm Standard API Reference Guideline 7 Algorithms should never have any scratch static memory Algorithm code can often be partitioned into two distinct types frequently accessed code and infrequent...

Page 41: ...pe of memory allocated to an algorithm instance Since this relationship can be extremely complex interrupt latency should be measured for a single fixed configuration Thus this number must be the latency imposed by an algorithm instance using the same memory configuration used to specify worst case MIPS and memory requirements In this section we examine the execution time information that should b...

Page 42: ...d on a mathematical model of the software and as with any model it may not correspond 100 with reality Moreover the model is dependent on each component accurately characterizing its performance if a component underestimates its CPU requirements by even 1 clock cycle it is possible for the system to fail Finally designing with worst case CPU requirements often prevents one from creating viable com...

Page 43: ...Suppose for example that an audio encoder consumes 10 milliseconds frames of data at a time but only outputs encoded data on every 20 milliseconds In this case the encoder s worst case execution time on even frames will differ perhaps significantly from the worst case execution time for odd numbered frames the output of data only occurs on odd frames In these situations it is important to characte...

Page 44: ...www ti com Algorithm Performance Characterization 44 SPRU352G June 2005 Revised February 2007 Submit Documentation Feedback ...

Page 45: ... DSP families Topic Page 5 1 CPU Register Types 46 5 2 Use of Floating Point 47 5 3 TMS320C6xxx Rules and Guidelines 47 5 4 TMS320C54xx Rules and Guidelines 49 5 5 TMS320C55x Rules and Guidelines 52 5 6 TMS320C24xx Guidelines 57 5 7 TMS320C28x Rules and Guidelines 58 SPRU352G June 2005 Revised February 2007 DSP Specific Guidelines 45 Submit Documentation Feedback ...

Page 46: ...algorithm to the value it had at entry Initialized register these registers may be used by an algorithm contain a specified initial value upon entry to an algorithm function as stated next to the register and must be restored upon exit from the algorithm Read only register these registers may be read but must not be modified by an algorithm In addition to the categories defined above all registers...

Page 47: ...e of which data format to use is often decided based on the presence of other processors in the system the data format of the other processors which may not be configurable determines the setting of the C6x data format Thus it is not possible to simply choose a single data format for all eXpressDSP compliant algorithms Rule 25 All C6x algorithms must be supplied in little endian format Guideline 1...

Page 48: ... enable register Read only global IFR Interrupt flag register Read only global IRP 1 Interrupt return pointer Scratch global ISR Interrupt set register Not accessible global ISTP Interrupt service table pointer Read only global NRP Non maskable Interrupt return pointer Read only global PCE1 Program counter Read only local FADCR C67xx floating point control register Preserve local FAUCR C67xx float...

Page 49: ...ded near or far addresses it is possible to have incompatible mixtures of code We need to ensure that calls made from an algorithm to external support functions will be compatible and that calls made from the application to an algorithm will be compatible We also need to ensure that calls to independently relocatable object modules within an algorithm will be compatible Rule 28 On processors that ...

Page 50: ... the usable program space on each page is reduced To ensure algorithm usability the code size for each loadable object must be limited Rule 30 On processors that support an extended program address space paged memory the code size of any independently relocatable object module should never exceed the code space available on a page when overlays are enabled Note here that the algorithm can be large...

Page 51: ...hift operand Scratch local TRN Viterbi transition register Scratch local XPC Extended Program Counter Scratch local The C54xx contains three status registers ST0 ST1 and PMST Each status register is further divided into several distinct fields Although each field is often thought of as a separate register it is not possible to access these fields individually In order to set one field it is necess...

Page 52: ... rules and guidelines that are specific to the TMS320C5500 family of DSPs The C55X CPU supports different stack configurations and the stack configuration register 4 bits selects the stack architecture The selection of the stack architecture can be done only on a hardware or software reset To facilitate integration each algorithm must publish the stack configuration that it uses Rule 31 All C55x a...

Page 53: ...t the data accessed with the B bus coefficient addressing must come from on chip memory The data that is accessed by B bus can be static data or heap data All C55x algorithms that access data static or heap with the B bus must adhere to the following rule Rule 34 All C55x algorithms that access data by B bus must document the instance number of the IALG_MemRec structure that is accessed by the B b...

Page 54: ...C compiler register variables Preserve local AC0 AC1 AC2 AC3 16 bit 32 bit and 40 bit data or 24 bit code pointers Scratch local T0 T1 Function arguments 16 bit data values Scratch local T2 T3 C compiler expression registers Preserve local SSP System Stack Pointer Preserve local SP Stack Pointer Preserve local ST0 ST1 ST2 ST3 Status registers Preserve local IFR0 IMR0 IFR1 IMR1 Interrupt flag and m...

Page 55: ...e bit for D unit Init local C16 0 Dual 16 bit math bit Init local FRCT 0 Fractional mode bit Init local LEAD 0 Lead bit Init local T2 bits 0 to 4 Accumulator shift mode Scratch local The following table describes the attributes for the ST2 register ST2 Field Name Use Type ARMS 0 AR Modifier Switch Init local XCNA Conditional Execute Control Address Read only local XCND Conditional Execute Control ...

Page 56: ...als Read only global CBERR CPU bus error Read only global MPNMC Microprocessor Microcomputer mode Read only global SATA 0 Saturation control bit for A unit Init local AVIS Address visibility bit Read only global CLKOFF CLKOUT disable bit Read only global SMUL 0 Saturation on multiply bit Init local SST Saturation on store Init local DSP Specific Guidelines 56 SPRU352G June 2005 Revised February 20...

Page 57: ...5 6 2 Data Models 5 6 3 Program Models 5 6 4 Register Conventions TMS320C24xx Guidelines This section describes the rules and guidelines that are specific to the TMS320C24xx family of digital signal processors DSPs Note that 24xx here refers to the following DSPs C240 C241 C242 C243 and C240x As per all other eXpressDSP compliant algorithms C24xx eXpressDSP compliant algorithms also referred to as...

Page 58: ...d only global TC Test control flag Scratch local SXM Sign extension mode Scratch local C Carry Scratch local XF XF pin status Read only global PM Product shift mode Init local The C24xx CPU has only one non interruptible loop instruction namely RPT Once started the RPT instruction blocks interrupts until the entire number of repeats are completed Thus the length of these loops can have a significa...

Page 59: ...AR3 Pointers and expressions Preserve local XAR4 Pointers expressions argument passing and returns 16 and 22 bit Scratch local pointer values from functions XAR5 Pointers expressions and arguments Scratch local XAR6 Pointers and expressions Scratch local XAR7 Pointers expressions indirect calls and branches Scratch local SP Stack pointer Preserve local T Multiply and shift expressions Scratch loca...

Page 60: ...status bit Scratch local SPA Stack pointer alignment bit Init local VMAP Vector map bit Read Only global PAGE0 PAGE0 addressing mode configuration Read Only global DBGM Debug enable mask bit Read Only global INTM Interrupt mode Preserve global The TMS320C28x CPU has only one non interruptible loop instruction namely RPT Once started the RPT instruction blocks interrupts until the entire number of ...

Page 61: ...Framework 62 6 3 Requirements for the Use of the DMA Resource 63 6 4 Logical Channel 63 6 5 Data Transfer Properties 64 6 6 Data Transfer Synchronization 64 6 7 Abstract Interface 65 6 8 Resource Characterization 66 6 9 Runtime APIs 67 6 10 Strong Ordering of DMA Transfer Requests 67 6 11 Submitting DMA Transfer Requests 68 6 12 Device Independent DMA Optimization Guideline 68 6 13 C6xxx Specific ...

Page 62: ...ed recommendations The algorithm standard looks upon algorithms as pure data transducers They are among other things not allowed to perform any operations that can affect scheduling or memory management All these operations must be controlled by the framework to ensure easy integration of algorithms possibly from different vendors In general the framework must be in command of managing the system ...

Page 63: ...guration must therefore be determined on the fly An algorithm might schedule a fixed number of DMA data transfers into its program flow and the configuration of these transfers might be the same It is only necessary to provide the source and destination information to execute these data transfers since the configuration is fixed This type of data transfer is not data dependent its configuration ca...

Page 64: ... shared across both the source and the destination element size the number of bytes per element 1 2 4 for IDMA2 and 1 bytes 65535 for IDMA3 number of elements the number of elements per frame 1 elements 65535 number of frames the number of frames in the block 1 frames 65535 The following parameters may be shared between source and destination and if supported by hardware can also be set independen...

Page 65: ...or example an algorithm can not start a data transfer in algActivate by calling ACPY2_start or ACPY2_startAligned and then check for completion of the data transfer in the algorithm s process function by calling ACPY2_complete or wait for the completion by calling ACPY2_wait The algorithm must ensure the data transfer is complete in aalgActivate by using either the ACPY2_complete or the ACPY2_wait...

Page 66: ...the queue of DMA jobs number of concurrent transfers on each logical channel DMA Rule 4 All algorithms must state the maximum number of concurrent DMA transfers for each logical channel This can be accomplished by filling out a table such as that shown below Logical channel number Number of concurrent transfers depth of queue 0 3 1 1 In the example above that algorithm requires two DMA logical cha...

Page 67: ... Using DMA with Framework Components for C64x SPRAAG1 Use of the ACPY3 library is not mandatory when using the IDMA3 interfaces algorithms are free to use their own DMA functions to program the physical DMA resources acquired through the IDMA3 protocol An important enhancement that was introduced through the ACPY2 APIs over the deprecated ACPY APIs is the strict FIFO ordering property of DMA trans...

Page 68: ...es at any alignment and when allowed by the architecture adjusts the transfer parameters including element size number of elements transfer type to transparently perform the desired transfer using the given alignment It is intended to simplify algorithm development in the initial states ACPY2_start thus strives to maintain simplicity while maintaining reasonable levels of performance The ACPY2_sta...

Page 69: ...t size constraints for internal buffers they request through the IALG interface To deal with these coherency problems the following new guidelines and rules have been added DMA Guideline 3 To ensure correctness All C6000 algorithms that implement IDMA2 need to be supplied with the internal memory they request from the client application using algAlloc This guideline applies to the client applicati...

Page 70: ...smaller buffers and then use these smaller buffers in DMA transfers In this case the transfer must also occur on buffers aligned on a cache line boundary Note that this does not mean the transfer size needs to be a multiple of the cache line length in size Instead the buffer containing memory locations involved in the transfer must be considered a single buffer the algorithm must not directly acce...

Page 71: ...ct operation of general C55x algorithms on hardware with automatic endianism conversion following rules regarding alignment size and access all rules for data buffers that may reside in external memory must be followed DMA Rule 10 C55x algorithms must request all data buffers in external memory with 32 bit alignment and sizes in multiples of 4 bytes DMA Rule 11 C55x algorithms must use the same da...

Page 72: ...sfer has completed The framework checks to see that the data has been transferred Algorithm B can process the transferred data Notice that algorithm A must wait for the transfer to complete because the parallel CPU processing takes less time than the data transfer whereas algorithm B s data transfer has completed at the time of synchronization In summary we can see from Section 6 15 2 that sharing...

Page 73: ...policy. However, in this scenario it is more important to grant the DMA channel to the higher-priority algorithm. Scenario 2: The system policy is to let the current DMA transfer issued by the lower-priority algorithm finish before starting a DMA transfer issued by the higher-priority algorithm. See Section 6.15.5. Events: 1. Algorithm A requests a data transfer by calling ACPY2_start. The framework execut...

Page 74: ...void the scenarios described in Section 6.15.4 and Section 6.15.5. This, of course, requires at least one physical channel for each priority level, which might not always be the case. In summary, sharing a DMA device among algorithms at different priorities can be accomplished in several different ways. In the end, it is the system integrator's choice, based on its available resources. ...

Page 75: ...together all rules and guidelines into one compact reference. Topic (Page): A.1 General Rules (76); A.2 Performance Characterization Rules (77); A.3 DMA Rules (77); A.4 General Guidelines (78); A.5 DMA Guidelines (79). SPRU352G June 2005 Revised February 2007 Rules and Guidelines 75 Submit Documentation Feedback ...

Page 76: ...ons within a single source file. See Section 3.1. Rule 8: All external definitions must be either API identifiers or API and vendor prefixed. See Section 3.1.1. Rule 9: All undefined references must refer either to the operations specified in Appendix B (a subset of C runtime support library functions and a subset of the DSP/BIOS HWI API functions), or to TI's DSPLIB or IMGLIB functions, or other eXpressDSP co...
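
Rule 8 can be illustrated with a simple check. The sketch below accepts an external symbol if it appears in a caller-supplied list of API identifiers. Otherwise, it assumes vendor-prefixed names take the form MODULE_VENDOR_name and checks for that prefix. The function name and the example module FIR and vendor TI are hypothetical. A real compliance check would also have to extract the symbol list from the object files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical Rule 8 style check on one external symbol name.
 * api_ids lists the exact API identifiers that are allowed; any other
 * symbol must start with "<module>_<vendor>_". */
static int is_external_name_ok(const char *sym, const char *module,
                               const char *vendor,
                               const char *const *api_ids, size_t n_api)
{
    size_t i, mlen, vlen;

    assert(sym != NULL && module != NULL && vendor != NULL);
    for (i = 0; i < n_api; i++) {
        if (strcmp(sym, api_ids[i]) == 0) {
            return 1;                       /* exact API identifier */
        }
    }

    mlen = strlen(module);
    vlen = strlen(vendor);
    /* Each test only runs if the previous one matched, so no index
     * reads past the end of sym. */
    return strncmp(sym, module, mlen) == 0 && sym[mlen] == '_' &&
           strncmp(sym + mlen + 1, vendor, vlen) == 0 &&
           sym[mlen + 1 + vlen] == '_';
}
```

With module FIR and vendor TI, FIR_TI_filter passes. A bare filter fails, and so does FIR_TIfilter, which is missing the second underscore.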

Page 77: ... Rule 32: All C55x algorithms must access all static and global data as far data; also, the algorithms should be instantiable in a large memory model. See Section 5.5.2. Rule 33: C55x algorithms must never assume placement in on-chip program memory; i.e., they must properly operate with program memory operated in instruction cache mode. See Section 5.5.3. Rule 34: All C55x algorithms that access data by B-bus...

Page 78: ...nation of any DMA transfer. See Section 6.13.1. DMA Rule 10: C55x algorithms must request all data buffers in external memory with 32-bit alignment and sizes in multiples of 4 bytes. See Section 6.14.3. DMA Rule 11: C55x algorithms must use the same data types, access modes, and DMA transfer settings when reading from or writing to data stored in external memory or in application-passed data buffers. See S...

Page 79: ...DMA channel for each distinct type of DMA transfer it issues, and avoid calling ACPY2_configure, preferring the new fast configuration APIs where possible. See Section 6.12. DMA Guideline 3: To ensure correctness, all C6000 algorithms that implement IDMA2 need to be supplied with the internal memory they request from the client application using algAlloc(). See Section 6.13.1. DMA Guideline 4: To facilit...
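
One way to picture this guideline is to give each distinct transfer type its own logical channel. Each channel is configured once at setup, and the per-transfer path only selects the pre-configured channel. The types and functions below are hypothetical and model the idea only. They are not the ACPY2 or IDMA2 API.

```c
#include <assert.h>

/* Hypothetical transfer types an algorithm might issue. */
enum XferType { XFER_IN_BLOCK, XFER_OUT_BLOCK, XFER_COEFFS, XFER_NUM_TYPES };

/* Hypothetical per-channel settings. */
typedef struct {
    unsigned elem_size;   /* bytes per element */
    unsigned num_frames;  /* frames per transfer */
    int      configured;  /* set once chan_configure() has run */
} Channel;

typedef struct {
    Channel  chan[XFER_NUM_TYPES];
    unsigned configure_calls;   /* counts the (costly) full configurations */
} DmaCtx;

/* Stand-in for an expensive full configuration of one logical channel. */
static void chan_configure(DmaCtx *ctx, enum XferType t,
                           unsigned elem, unsigned frames)
{
    ctx->chan[t].elem_size = elem;
    ctx->chan[t].num_frames = frames;
    ctx->chan[t].configured = 1;
    ctx->configure_calls++;
}

/* Setup time: exactly one configuration per distinct transfer type. */
static void chan_setup(DmaCtx *ctx)
{
    ctx->configure_calls = 0;
    chan_configure(ctx, XFER_IN_BLOCK, 2u, 1u);
    chan_configure(ctx, XFER_OUT_BLOCK, 2u, 1u);
    chan_configure(ctx, XFER_COEFFS, 4u, 1u);
}

/* Per-transfer path: select the already-configured channel. A real
 * implementation would start the transfer here without reconfiguring. */
static const Channel *channel_for(const DmaCtx *ctx, enum XferType t)
{
    assert(ctx->chan[t].configured);
    return &ctx->chan[t];
}
```

After chan_setup(), any number of channel_for() lookups leave configure_calls at 3. This is the property the guideline aims for: configuration cost is paid once per transfer type rather than once per transfer.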

Page 81: ...erates all acceptable core run-time APIs that may be referenced by an eXpressDSP-compliant algorithm. Topic (Page): B.1 TI C Language Run-Time Support Library (82); B.2 DSP/BIOS Run-Time Support Library (82). ...

Page 82: ... _addd, _subd, _mpyd, _divd 2 3 log10, cosh, etc. allowed Conversion functions atoi, ftoi, itof, etc. 2 disallowed Heap management functions malloc, free, realloc, alloc 4 disallowed I/O functions printf, open, read, write, etc. 5 disallowed misc. non-reentrant functions printf, sprintf, ctime, etc. 4 6 1 Exceptions: strtok is not reentrant, and strdup allocates memory with malloc. 2 Some of these are issued by the compile...
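
The strtok exception exists because strtok keeps its scan position in hidden static state. Two callers tokenizing at the same time, such as two algorithm instances in different threads, would therefore overwrite each other's position. A common fix is to keep that position in caller-owned storage, similar in spirit to POSIX strtok_r. tok_next below is a hypothetical sketch of that approach.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reentrant tokenizer: the scan position lives in *save, which each
 * caller owns, instead of in a hidden static variable as with strtok().
 * Pass the string on the first call and NULL on later calls. */
static char *tok_next(char *str, const char *delims, char **save)
{
    char *start;
    char *end;

    assert(delims != NULL && save != NULL);
    start = (str != NULL) ? str : *save;
    if (start == NULL) {
        return NULL;                       /* previous call hit the end */
    }
    start += strspn(start, delims);        /* skip leading delimiters */
    if (*start == '\0') {
        *save = NULL;
        return NULL;
    }
    end = start + strcspn(start, delims);  /* find end of this token */
    if (*end != '\0') {
        *end = '\0';                       /* terminate token in place */
        *save = end + 1;                   /* resume after the delimiter */
    } else {
        *save = NULL;                      /* token runs to end of string */
    }
    return start;
}
```

Each caller passes its own save pointer, so interleaved tokenizations of different strings do not interfere.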

Page 83: ...uary 1973, 40-61. Massey, Tim, and Iyer, Ramesh. DSP Solutions for Telephony and Data/Facsimile Modems, SPRA073, 1997. Texas Instruments, TMS320C54x Optimizing C Compiler User's Guide, SPRU103C, 1998. Texas Instruments, TMS320C6x Optimizing C Compiler User's Guide, SPRU187C, 1998. Texas Instruments, TMS320C62xx CPU and Instruction Set, SPRU189B, 1997. Texas Instruments, TMS320C55x Optimizing C/C++ Compiler User's Guide, SP...

Page 85: ...benefits for real-time, compute-intensive applications. Client: The term client is often used to denote any piece of software that uses a function, module, or interface; for example, if the function a calls the function b, a is a client of b. Similarly, if an application App uses module MOD, App is a client of MOD. COFF: Common Object File Format. The file format of the files produced by the TI compiler, assembl...

Page 86: ...s. Multithreading: Multithreading is the management of multiple concurrent uses of the same program. Most operating systems and modern computer languages also support multithreading. Preemptive: A property of a scheduler that allows one task to asynchronously interrupt the execution of the currently executing task and switch to another task; the interrupted task is not required to call any scheduler fun...

Page 87: ...thout loss; i.e., prior contents need not be saved and restored after each use. Scratch Register: A register that can be overwritten without loss; i.e., prior contents need not be saved and restored after each use. Thread: The program state managed by the operating system that defines a logically independent sequence of program instructions. This state may be as little as the Program Counter (PC) value, but oft...

Page 88: ...om TI to use such products or services, or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the third party, or a license from TI under the patents or other intellectual property of TI. Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alterati...
