ATI CTM Guide v. 1.01
© 2006 Advanced Micro Devices, Inc.
34 Flow Control
In the above example, note that instruction 2 waits on the semaphore before acquiring it, which ensures that the
semaphore is available.
Using TEX_SEM_ACQUIRE on an uncached write instruction is very similar to using TEX_SEM_ACQUIRE on a
texture lookup. If you are performing uncached reads and writes in the same program, you should use one of the
TEX_SEM_ACQUIRE options on the last uncached write before an uncached read, and you should use
TEX_SEM_WAIT on the uncached read. The uncached write mechanism will not release the semaphore until that
particular uncached write, and all prior uncached writes, are complete.
Remember that the last instruction of the program must set TEX_SEM_WAIT, to ensure that the texture unit is ready
to process the next fragment. It is invalid to terminate the program while holding the texture semaphore, either from
a texture lookup or from an uncached write.
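The termination rule above can be sketched as a small checker. This is a simplified model, not part of CTM: the `Instr` class, its `sem_wait` and `sem_acquire` fields, and `check_semaphore_termination` are hypothetical names introduced for illustration, and the model conservatively treats the semaphore as held until a later instruction waits on it.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical model of one instruction and its two semaphore flags."""
    text: str
    sem_wait: bool = False     # models TEX_SEM_WAIT
    sem_acquire: bool = False  # models any of the TEX_SEM_ACQUIRE options


def check_semaphore_termination(program):
    """Return a list of problems with how the program ends (empty means OK)."""
    problems = []
    if not program:
        return problems
    if not program[-1].sem_wait:
        problems.append("last instruction does not set TEX_SEM_WAIT")
    # Index of the last instruction that acquires the semaphore, if any.
    last_acquire = max(
        (i for i, ins in enumerate(program) if ins.sem_acquire), default=None
    )
    # Assumption: an acquire with no later wait may still be held at exit.
    if last_acquire is not None and not any(
        ins.sem_wait for ins in program[last_acquire + 1:]
    ):
        problems.append("program may terminate while holding the texture semaphore")
    return problems


ok = [Instr("r0 = texture lookup", sem_acquire=True),
      Instr("r1 = r0 + 1", sem_wait=True)]
bad = [Instr("r0 = texture lookup", sem_acquire=True),
       Instr("r1 = r0 + 1")]
print(check_semaphore_termination(ok))   # []
print(len(check_semaphore_termination(bad)))  # 2
```

A last instruction that both waits and acquires is also flagged, because the wait happens before the acquire and nothing waits afterwards.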
Warning: See the Errata section at the end of this guide for a known issue with texture semaphores.
3.5 Flow Control
Each flow control instruction is essentially a conditional jump. Various optional stack operations allow all the
different kinds of traditional flow control statements. In particular, flow control instructions allow branch statements
(if/else/endif blocks), loop statements (with an optional loop register, aL), and subroutine calls. Optimizers may be
able to combine these basic types of instructions, and utilize more esoteric flow control modes.
The hardware supports two flow control modes, "partial" and "full". Partial flow control mode enables twice as many
contexts as full mode, but it limits the nesting depth of branch statements and does not support loops or subroutine
calls. Use partial flow control mode unless the program requires branch statements nested more than 6 deep, or
requires loops or subroutines.
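The selection rule above can be written as a short decision function. This is a minimal sketch for a tool that generates programs; `choose_flow_control_mode` and its parameters are hypothetical names, not part of CTM, and the only number taken from the text is the nesting limit of 6.

```python
# From the text: partial mode suffices for branches nested up to 6 deep.
PARTIAL_MAX_BRANCH_DEPTH = 6


def choose_flow_control_mode(max_branch_depth, uses_loops, uses_subroutines):
    """Return "partial" or "full" for a program with the given properties."""
    if uses_loops or uses_subroutines:
        return "full"  # partial mode supports neither loops nor calls
    if max_branch_depth > PARTIAL_MAX_BRANCH_DEPTH:
        return "full"  # nesting is too deep for partial mode
    return "partial"   # preferred: twice as many contexts


print(choose_flow_control_mode(3, False, False))  # partial
print(choose_flow_control_mode(7, False, False))  # full
print(choose_flow_control_mode(1, True, False))   # full
```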
In CTM, partial or full flow control mode is selected based on information supplied as part of the program itself. The
application has no control over the selection (except via the program it loads).
See the Fields section below for descriptions of fields that affect the jump condition and the various flow control
stacks. Following that are the values of those fields for the most common types of flow control operations.
3.5.1 Dynamic Flow Control
Because the X1K DPP is a SIMD engine that applies the same instruction to a group of processors, dynamic flow
control must be implemented with processor masks. If a processor wants to take a jump because it failed an IF
condition, but its neighbors in the processor group do not want to jump, that processor must be masked off until
that branch of the IF statement is completed. The program counter is actually changed only if all processors fail the
IF condition. Conversely, if some processors do not want to jump to a subroutine, they must be masked off for the
entire subroutine. The call is skipped only if none of the processors want to jump. A break statement within a loop
masks off passing processors until the loop is complete, and the program counter is changed only if all processors
want to jump.
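The IF/ELSE behavior described above can be illustrated with a small software simulation. This is only a sketch of the masking idea, not the hardware's implementation: `run_if_else` is a hypothetical function, and each list element stands for one processor in the group.

```python
def run_if_else(values, cond, then_op, else_op):
    """Simulate one IF/ELSE/ENDIF over a SIMD group.

    Returns (new_values, trace), where trace records whether each block was
    executed under a mask or skipped by changing the program counter.
    """
    passed = [cond(v) for v in values]  # per-processor IF result
    out = list(values)
    trace = []

    if not any(passed):
        trace.append("jump over IF block")  # every processor failed
    else:
        trace.append("execute IF block")    # failing processors masked off
        out = [then_op(v) if p else v for v, p in zip(out, passed)]

    if all(passed):
        trace.append("jump over ELSE block")  # every processor passed
    else:
        trace.append("execute ELSE block")    # passing processors masked off
        out = [v if p else else_op(v) for v, p in zip(out, passed)]

    return out, trace


result, trace = run_if_else(
    [1, -2, 3, -4], lambda v: v > 0, lambda v: v * 10, lambda v: -v
)
print(result)  # [10, 2, 30, 4]
print(trace)   # ['execute IF block', 'execute ELSE block']
```

When the group diverges, both blocks are executed and each processor keeps only the result of its own branch; a block is skipped outright only when no processor in the group needs it.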
These processor masks are organized into stacks so flow control blocks may be nested. The operations on these stacks
are encoded in the flow control instructions as flags, instead of having one set of opcodes which hard-wire the stack
behavior. This orthogonality allows for more creative control of the program behavior, and provides opportunity for
optimizations in programs that use a lot of flow control.
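To show why a stack of masks allows nesting, here is a minimal software model. It assumes one fixed push/flip/pop behavior per block, whereas the hardware encodes stack operations as flags on each flow control instruction; the `MaskStack` class and its method names are hypothetical.

```python
class MaskStack:
    """Hypothetical model of nested processor masks for IF/ELSE/ENDIF."""

    def __init__(self, width):
        self.active = [True] * width  # processors currently enabled
        self.stack = []               # saved masks of enclosing blocks

    def push_if(self, cond):
        """Enter an IF block: save the mask, keep only passing processors."""
        self.stack.append(self.active)
        self.active = [a and c for a, c in zip(self.active, cond)]

    def flip_else(self):
        """Enter the ELSE block: enable processors that skipped the IF block."""
        parent = self.stack[-1]
        self.active = [p and not a for p, a in zip(parent, self.active)]

    def pop_endif(self):
        """Leave the block: restore the enclosing mask."""
        self.active = self.stack.pop()


m = MaskStack(4)
m.push_if([True, True, False, False])  # outer IF
m.push_if([True, False, True, False])  # inner IF
inner_if = m.active                    # [True, False, False, False]
m.flip_else()
inner_else = m.active                  # [False, True, False, False]
m.pop_endif()
m.flip_else()
outer_else = m.active                  # [False, False, True, True]
m.pop_endif()
print(inner_if, inner_else, outer_else, m.active)
```

The inner ELSE mask is computed against the saved outer mask, so processors disabled by the outer IF stay disabled throughout the inner block.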
INSTRUCTION          TEX_SEM_WAIT    TEX_SEM_ACQUIRE
4: r2 = r2 + 1       0
5: r3 = r3 + 1       0
6: r4 = r4 + 1       1