ATI CTM Guide v. 1.01

4 CTM Units

2.1.1 The ATI Data Parallel Processor Array

The ATI Data Parallel Processor (DPP) Array comprises one or more processors, each a programmable unit that can
execute a series of instructions.

Each processor in the array is directed by the Processor Execution Unit (PE; see Section 2.1.2). When a processor is idle, the
PE may request that it execute a program. It does so by passing the processor an identifier pair (i, j), where i and j
are non-negative (range-limited) integers, together with a conditional value. On receiving the identifier pair and
conditional value, the processor informs the PE that it is busy, resets its program counter to zero, and begins program
execution. The processor remains busy until its internal program counter reaches the end of the program, as specified
in the Application Binary Interface. After executing the instruction at the end-of-program address, the
processor halts and informs the PE that it is idle again.
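The busy/idle handshake above can be modeled as a small state machine. This is a minimal sketch, not part of CTM: the ToyProcessor class and its method names are hypothetical, and a program is simplified to a list of instruction strings whose last entry stands in for the end-of-program address (in CTM that address comes from the Application Binary Interface).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyProcessor:
    """Hypothetical model of one DPP-array processor's busy/idle lifecycle."""
    program: List[str]  # last entry stands in for the end-of-program address
    busy: bool = False
    pc: int = 0
    ident: Optional[Tuple[int, int]] = None
    cond_value: Optional[float] = None
    trace: List[str] = field(default_factory=list)

    def start(self, ident: Tuple[int, int], cond_value: float) -> None:
        """Called by the PE: hand over the (i, j) pair and a conditional value."""
        if self.busy:
            raise RuntimeError("the PE only dispatches to idle processors")
        self.ident, self.cond_value = ident, cond_value
        self.busy = True  # report "busy" to the PE
        self.pc = 0       # reset the program counter to zero

    def step(self) -> None:
        """Execute one instruction; halt after the end-of-program address."""
        self.trace.append(self.program[self.pc])
        if self.pc == len(self.program) - 1:
            self.busy = False  # report "idle" to the PE
        else:
            self.pc += 1

    def run(self) -> None:
        """Step until the processor halts."""
        while self.busy:
            self.step()
```

For example, ToyProcessor(program=["load", "mad", "end"]) started with start((3, 4), 1.0) and then run() records all three placeholder instructions and ends idle; calling start a second time while it is still busy raises an error.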

Instructions for a program, as well as the constants, inputs, and outputs to which the program may refer, are stored in
memory and are read or written through the Memory Controller (MC; see Section 2.1.4). Conceptually, each processor maintains
a separate interface to the MC. This interface consists of two non-negative (range-limited) integer indices (x, y) and a field
identifying the type of memory access the processor is requesting: a read of a program instruction, a floating-point
constant, an integer constant, a boolean constant, or an input; or an output write.

The (x, y) pair is different for each type of memory access a processor may request. For instructions, (x, y) is
equal to (pc, 0), where pc is the current program counter. For each kind of constant, the index pair is (c, 0), where c is
the index specified by the program instruction requesting the constant. For inputs, both the index pair and the identifier are
specified by the program instruction requesting the input value. For outputs, the index pair is always the pair (i, j) assigned
to the processor by the PE, while the output identifier is specified in the requesting program instruction.
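The rules for choosing (x, y) amount to a lookup on the access type. The sketch below is illustrative only: the Access enum, the request_index function, and its keyword parameters are hypothetical names, and the separate input/output identifier field is omitted to keep the focus on the index pair.

```python
from enum import Enum, auto
from typing import Tuple


class Access(Enum):
    """Hypothetical labels for the access types a processor can request."""
    INSTRUCTION = auto()
    FLOAT_CONST = auto()
    INT_CONST = auto()
    BOOL_CONST = auto()
    INPUT_READ = auto()
    OUTPUT_WRITE = auto()


def request_index(access: Access, *, pc: int = 0, const_index: int = 0,
                  input_xy: Tuple[int, int] = (0, 0),
                  proc_ij: Tuple[int, int] = (0, 0)) -> Tuple[int, int]:
    """Return the (x, y) pair a processor would send to the MC for one access."""
    if access is Access.INSTRUCTION:
        return (pc, 0)            # instructions: (pc, 0)
    if access in (Access.FLOAT_CONST, Access.INT_CONST, Access.BOOL_CONST):
        return (const_index, 0)   # constants: (c, 0), c taken from the instruction
    if access is Access.INPUT_READ:
        return input_xy           # inputs: pair given by the requesting instruction
    if access is Access.OUTPUT_WRITE:
        return proc_ij            # outputs: always the (i, j) assigned by the PE
    raise ValueError(f"unknown access type: {access}")
```

Note that only the output case depends on the processor's assigned (i, j); instructions and constants are indexed identically for every processor, which is why all processors see the same program and constants.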

If conditional output is enabled, a processor's output write requests are generated conditionally, based on a value
returned by the Conditional Output Unit (CO; see Section 2.1.3). The processor sends a conditional value (v) and its (i, j)
index pair to the CO, and the CO then performs a conditional test based on the value and index pair. If the test passes,
the processor dispatches all of its output write requests to the MC; otherwise, no output write requests are generated. The
conditional value, v, depends on the program currently being executed: it may be specified directly by an
instruction in the program, or it may equal the conditional value sent to the processor by the PE. If conditional output
is disabled, the processor behaves as if the conditional output test always passes. Conditional output is enabled by
setting the condition location to the processors with the set_cond_loc command (see page 22).
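The gating rule is all-or-nothing: either every pending output write for (i, j) is issued, or none is. A minimal sketch follows. The CO's actual test is defined in Section 2.1.3 and is not reproduced here, so co_test is a placeholder predicate; output_writes and all of its parameters are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Index = Tuple[int, int]
CondTest = Callable[[float, Index], bool]  # placeholder for the CO's test


def output_writes(ij: Index,
                  pending: List[Tuple[int, float]],  # (output identifier, value)
                  cond_enabled: bool,
                  co_test: CondTest,
                  v_from_instruction: Optional[float],
                  v_from_pe: float) -> List[Tuple[Index, int, float]]:
    """Return the output write requests a processor would send to the MC."""
    if cond_enabled:
        # v is either given by the program or equal to the PE's conditional value.
        v = v_from_instruction if v_from_instruction is not None else v_from_pe
        if not co_test(v, ij):
            return []  # test failed: no output writes at all
    # Test passed (or conditional output disabled): issue every write at (i, j).
    return [(ij, out_id, value) for out_id, value in pending]
```

With a placeholder test such as lambda v, ij: v > 0.5, a processor whose PE-supplied value is 0.1 issues nothing, while disabling conditional output issues every pending write regardless of v.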

All processors refer to the same instructions and constants, but may index different input, output, and conditional data.
Thus, if multiple processors are working simultaneously, CTM exposes a SIMD programming model. Individual
processors, however, may or may not execute in SIMD lock-step in a particular CTM implementation; the behavior
of individual processors relative to other processors is unspecified.

2.1.2 Processor Execution Unit

The Processor Execution Unit interprets commands from a command buffer, forwarding them to other units in CTM
if necessary. Under normal operation, the PE consumes commands as fast as it can process them or pass them along.
If, however, the PE receives a wait_for_idle command (see page 11), it stops reading commands until all processors
in the processor array are idle. Once the processors are idle, the PE resumes reading commands, beginning with the
one following the wait_for_idle command.
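The effect of wait_for_idle on command consumption can be illustrated with a toy loop. This is a sketch under simplifying assumptions, not CTM's implementation: ToyArray and run_command_buffer are hypothetical, each start_program keeps the array busy for a fixed number of "ticks", and time only advances while the PE is stalled.

```python
from typing import Iterable, List


class ToyArray:
    """Hypothetical processor array: busy for a fixed number of ticks per program."""

    def __init__(self, ticks_per_program: int) -> None:
        self.ticks_per_program = ticks_per_program
        self.remaining = 0

    def start(self) -> None:
        self.remaining += self.ticks_per_program

    def all_idle(self) -> bool:
        return self.remaining == 0

    def tick(self) -> None:
        self.remaining = max(0, self.remaining - 1)


def run_command_buffer(commands: Iterable[str], array: ToyArray) -> List[str]:
    """Toy PE loop: consume commands in order, stalling on wait_for_idle."""
    log: List[str] = []
    for cmd in commands:
        if cmd == "wait_for_idle":
            stalled = 0
            while not array.all_idle():  # read no further commands until idle
                array.tick()
                stalled += 1
            log.append(f"wait_for_idle (stalled {stalled} ticks)")
        elif cmd == "start_program":
            array.start()                # hand work to the processors
            log.append(cmd)
        else:
            log.append(cmd)              # other commands: handle or forward
    return log
```

Running ["set_domain", "start_program", "wait_for_idle", "start_program"] against ToyArray(3) stalls for three ticks at wait_for_idle before the second start_program is read.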

In addition to parsing the command buffer, the PE is responsible for distributing work to the processors in the
processor array. The PE's distribution of work is based on the rectangular domain $D \subset \mathbb{Z}^2$, where

$$D = \{ (i, j) \mid i_0 \le i \le i_1,\ j_0 \le j \le j_1 \}.$$

The parameters $i_0$, $j_0$, $i_1$, and $j_1$ are specified to the PE through the set_domain command (see page 10).
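Because both bounds are inclusive, the domain contains $(i_1 - i_0 + 1)(j_1 - j_0 + 1)$ index pairs when $i_1 \ge i_0$ and $j_1 \ge j_0$, and none otherwise. A short sketch follows; the function names are hypothetical, and the row-by-row enumeration order is an arbitrary choice, since CTM does not specify the order in which pairs are scheduled.

```python
from typing import Iterator, Tuple


def domain(i0: int, j0: int, i1: int, j1: int) -> Iterator[Tuple[int, int]]:
    """Yield every (i, j) with i0 <= i <= i1 and j0 <= j <= j1 (inclusive bounds)."""
    for j in range(j0, j1 + 1):
        for i in range(i0, i1 + 1):
            yield (i, j)


def domain_size(i0: int, j0: int, i1: int, j1: int) -> int:
    """Number of index pairs in D; zero if either range is empty."""
    return max(0, i1 - i0 + 1) * max(0, j1 - j0 + 1)
```

For instance, a domain covering a 640 x 480 grid (i0 = j0 = 0, i1 = 639, j1 = 479) contains 307,200 index pairs, so the current program would run that many times.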

When the PE receives a start_program command (see page 11), it begins allocating work to the processors. If
conditional program execution is disabled, the PE schedules the processors to run the current program once for each
index pair (i, j) in the current domain. The specific partitioning of work among the processors and the order in which
the index pairs are scheduled are unspecified. As described in Section 2.1.1, the PE sends the corresponding index pair
and its conditional value to an idle processor in order to execute the program for that index pair. The result of the
entire computation is as if the program were executed in SIMD across all index pairs.
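Why the scheduling order can be left unspecified is easiest to see in a toy dispatcher: since each run writes only to its own (i, j), any order and any partitioning among processors yields the same output. The sketch below models only the case where conditional program execution is disabled; run_domain, the shuffled order, and the round-robin choice of idle processor are all hypothetical simplifications, and each run is treated as completing instantly.

```python
import random
from typing import Callable, Dict, List, Tuple

Index = Tuple[int, int]


def run_domain(program: Callable[[Index], float],
               i0: int, j0: int, i1: int, j1: int,
               num_processors: int,
               seed: int = 0) -> Tuple[Dict[Index, float], Dict[int, List[Index]]]:
    """Toy PE: run `program` once per (i, j) in D, in a shuffled order."""
    if num_processors < 1:
        raise ValueError("need at least one processor")
    work = [(i, j) for j in range(j0, j1 + 1) for i in range(i0, i1 + 1)]
    random.Random(seed).shuffle(work)   # stands in for the unspecified order
    idle = list(range(num_processors))
    assigned: Dict[int, List[Index]] = {p: [] for p in idle}
    output: Dict[Index, float] = {}
    while work:
        proc = idle.pop(0)              # take an idle processor
        ij = work.pop()                 # hand it the next index pair
        assigned[proc].append(ij)
        output[ij] = program(ij)        # the run writes only at its own (i, j)
        idle.append(proc)               # the processor reports idle again
    return output, assigned
```

Calling run_domain with the same program and domain but different seeds produces different assignments of index pairs to processors, yet identical output dictionaries, matching the "as if executed in SIMD" guarantee.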