© 2006 Advanced Micro Devices, Inc.
ATI CTM Guide v. 1.01
Jump conditions can be based on a boolean constant, the result of the previous ALU operation, and/or a predicate
bit. Booleans are constant across all processors, so dynamic (per-processor) flow control can be achieved only with
predicates and conditionals (the ALU result). Any ALU instruction can specify whether to write the ALU result and which
channel supplies the data for the result. The ALU result remains valid only until another ALU instruction writes the
result or a flow control instruction is encountered. The predicate bits can be set anywhere and are preserved across
flow control instructions, but there are only 4 of them.
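The validity rules above can be modelled with a small state object. This is a minimal sketch under stated assumptions, not the hardware's register layout: the class and method names (FlowConditionState, alu_op, flow_control, alu_condition) are hypothetical, the ALU result is treated as one scalar taken from a 4-channel ALU output, and a nonzero value is assumed to count as "true".

```python
from typing import List, Optional, Sequence


class FlowConditionState:
    """Hypothetical model of the per-processor inputs to a jump condition.

    Assumptions: the ALU result is one scalar picked from a 4-channel ALU
    output, and a nonzero value counts as "true".
    """

    NUM_PREDICATES = 4  # the text states there are only 4 predicate bits

    def __init__(self) -> None:
        self.alu_result: Optional[float] = None  # None means "not valid"
        self.predicates: List[bool] = [False] * self.NUM_PREDICATES

    def alu_op(self, output: Sequence[float], write_result: bool, channel: int = 0) -> None:
        # An ALU instruction chooses whether to write the ALU result and
        # which channel of its output supplies the data.
        if write_result:
            self.alu_result = output[channel]

    def set_predicate(self, index: int, value: bool) -> None:
        if not 0 <= index < self.NUM_PREDICATES:
            raise IndexError("there are only 4 predicate bits")
        self.predicates[index] = value

    def flow_control(self) -> None:
        # Any flow control instruction invalidates the ALU result;
        # the predicate bits are preserved.
        self.alu_result = None

    def alu_condition(self) -> bool:
        if self.alu_result is None:
            raise RuntimeError("ALU result is no longer valid")
        return self.alu_result != 0.0
```

An ALU instruction that does not write the result leaves the previous result valid, while any flow control instruction discards it. This is why a predicate bit, not the ALU result, is the way to carry a condition past a flow control instruction.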
Flow control predication cannot be per-channel. One of the replicate swizzles must be used for predication of flow
control instructions (all other types of instructions can be predicated per channel). Flow control instructions use the
RGB_PRED_SEL and RGB_PRED_INV fields to compute the predicate.
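The difference between per-channel predication and flow control predication can be sketched as follows. This is an illustration, not the instruction encoding: the 4 predicate bits are assumed to be addressable as channels x, y, z, w, and the hypothetical `swizzle` string and `invert` flag stand in for RGB_PRED_SEL and RGB_PRED_INV, whose actual bit layouts are not described here.

```python
from typing import List, Sequence

CHANNELS = "xyzw"  # assumed names for the 4 predicate bits


def per_channel_predicate(pred: Sequence[bool], swizzle: str, invert: bool = False) -> List[bool]:
    # Ordinary instructions: each destination channel selects its own predicate bit.
    # "!= invert" acts as an optional XOR inversion.
    return [pred[CHANNELS.index(c)] != invert for c in swizzle]


def flow_control_predicate(pred: Sequence[bool], swizzle: str, invert: bool = False) -> bool:
    # Flow control: only a replicate swizzle (xxxx, yyyy, zzzz or wwww) is allowed,
    # so every channel sees the same bit and the result collapses to one boolean.
    if len(swizzle) != 4 or len(set(swizzle)) != 1:
        raise ValueError("flow control predication needs a replicate swizzle")
    return per_channel_predicate(pred, swizzle, invert)[0]
```

Requiring a replicate swizzle is what guarantees a single yes/no decision per processor, which a jump needs.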
3.5.2 Stacks and Branch Counters
The hardware maintains two separate stacks for flow control:
• Address Stack: Purely an address stack; no other state is maintained. Popping the address stack overrides the
instruction address field of the flow control instruction. The address stack is modified only if the flow control
instruction decides to jump.
• Loop Stack: Stores an internal iteration count, the loop variable (aL), and a processor mask per frame. The
iteration count can be accessed only through the LOOP/ENDLOOP and REP/ENDREP operations. The aL variable can be
altered only by the LOOP/ENDLOOP operations and can be read only through relative addressing. The processor mask can
be altered only by the BREAK or CONTINUE instructions.
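The address stack rules can be sketched as a function that returns the next instruction address. This is a minimal model under assumptions not stated in the text: the hypothetical `push` flag models a call-like instruction that saves the address after itself, capacity is taken as 4 in full mode and 0 in partial mode (listed as n/a), and overflow and underflow raise errors here even though the hardware leaves them undefined.

```python
from typing import List


class AddressStack:
    """Hypothetical model of the flow control address stack.

    Assumptions: a calling instruction pushes the address after itself,
    capacity is 4 in full mode and 0 in partial mode, and overflow or
    underflow raise here although the hardware behaviour is undefined.
    """

    def __init__(self, full_mode: bool = True) -> None:
        self.capacity = 4 if full_mode else 0
        self.entries: List[int] = []

    def execute(self, pc: int, addr_field: int, jump: bool,
                push: bool = False, pop: bool = False) -> int:
        """Return the next instruction address after one flow control instruction."""
        if push and pop:
            raise ValueError("this sketch does not model push and pop together")
        if not jump:
            # No jump: the stack is left untouched and execution falls through.
            return pc + 1
        if push:
            if len(self.entries) >= self.capacity:
                raise OverflowError("address stack overflow")
            self.entries.append(pc + 1)
            return addr_field
        if pop:
            if not self.entries:
                raise IndexError("address stack underflow")
            # The popped address overrides the instruction's own address field.
            return self.entries.pop()
        return addr_field
```

For example, a jump-taken push at address 10 targeting 50 leaves 11 on the stack; a later pop that does not jump returns the fall-through address and leaves the stack unchanged, while a pop that jumps returns 11.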
Each stack's size depends on whether the program is in partial or full flow control mode. Stack overflows and
underflows produce undefined behaviour in the hardware. The stack sizes are listed in the table at the end of this
section.
The loop stack is maintained so that an inner REP block continues to see the loop variable of an enclosing LOOP
block, while a nested LOOP block shadows the outer loop variable with its own. The loop variable is valid only inside
at least one LOOP block.
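The visibility rules for aL can be sketched with a stack of frames. This is an illustrative model, not the hardware's frame format: LoopFrame, LoopStack and their methods are hypothetical names, the per-frame processor mask is omitted, and REP frames are assumed to copy the enclosing aL so that inner REP blocks keep seeing it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopFrame:
    iterations_left: int
    aL: Optional[int]  # loop variable visible inside this frame (None = not valid)
    kind: str          # "LOOP" or "REP"


class LoopStack:
    """Hypothetical model of loop-variable visibility on the loop stack.

    Assumption: REP frames copy the enclosing aL; LOOP frames install a
    fresh aL that shadows the outer one. The processor mask is omitted.
    """

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity  # 4 frames in full flow control mode
        self.frames: List[LoopFrame] = []

    def push_loop(self, count: int, start: int) -> None:
        self._push(LoopFrame(count, start, "LOOP"))

    def push_rep(self, count: int) -> None:
        outer_aL = self.frames[-1].aL if self.frames else None
        self._push(LoopFrame(count, outer_aL, "REP"))

    def pop(self) -> LoopFrame:
        if not self.frames:
            raise IndexError("loop stack underflow (undefined in hardware)")
        return self.frames.pop()

    def read_aL(self) -> int:
        aL = self.frames[-1].aL if self.frames else None
        if aL is None:
            raise RuntimeError("aL is not valid outside a LOOP block")
        return aL

    def _push(self, frame: LoopFrame) -> None:
        if len(self.frames) >= self.capacity:
            raise OverflowError("loop stack overflow (undefined in hardware)")
        self.frames.append(frame)
```

In this model a REP nested in a LOOP reads the LOOP's aL, a LOOP nested inside that REP hides it until the inner LOOP ends, and a REP with no enclosing LOOP has no valid aL.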
In addition to the two stacks, the hardware maintains an Active Bit and a Branch Counter for each processor. The
Active Bit indicates whether the processor is active; if the processor was disabled by a conditional statement (if,
else), the Branch Counter indicates how long it must wait before it can be reactivated. While the active bit is unset,
the processor is inactive and the branch counter holds the number of conditional blocks the processor must exit before
it can be activated again. The maximum value of this counter depends on whether the program is in partial or full flow
control mode. The limits, which determine the maximum safe nesting depth, are listed in the branch counter table at
the end of this section.
The branch counter can be incremented or decremented directly by any flow control instruction, depending on whether
the processor agrees with the jump decision. Manipulating the branch counter may affect the active bit: incrementing
the counter on an active processor disables the processor by clearing its active bit and sets the branch counter to
zero, while decrementing the counter of an inactive processor to a negative value sets the active bit, reactivating
the processor. The hardware ignores the branch counter while the active bit is set.
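The increment and decrement rules can be modelled per processor, and they explain why a counter range of 0..3 gives a maximum nesting depth of 4. In the sketch below, the increment/decrement behaviour follows the text. The mapping onto `if_`, `else_` and `endif` helpers is a hypothetical assumption about how a compiler might drive the counter, not a documented instruction sequence. Exceeding the counter limit raises an error here, although the hardware behaviour is not defined.

```python
class ProcessorFlowState:
    """Per-processor Active Bit and Branch Counter (hypothetical model).

    max_counter is 3 in partial flow control mode and 31 in full mode.
    """

    def __init__(self, max_counter: int = 31) -> None:
        self.max_counter = max_counter
        self.active = True
        self.counter = 0  # ignored while active

    def increment(self) -> None:
        if self.active:
            # Disabling an active processor clears the bit and zeroes the counter.
            self.active = False
            self.counter = 0
        elif self.counter < self.max_counter:
            self.counter += 1
        else:
            raise OverflowError("nesting deeper than the branch counter allows")

    def decrement(self) -> None:
        if self.active:
            return  # the counter is ignored while the active bit is set
        self.counter -= 1
        if self.counter < 0:
            # Going negative reactivates the processor.
            self.active = True
            self.counter = 0


# Hypothetical mapping of structured conditionals onto the counter.
def if_(p: ProcessorFlowState, cond: bool) -> None:
    # Inactive processors go one level deeper; active ones that fail are disabled.
    if not p.active or not cond:
        p.increment()


def else_(p: ProcessorFlowState) -> None:
    if p.active:
        p.increment()          # took the if-branch: disable for the else-branch
    elif p.counter == 0:
        p.decrement()          # disabled by this very if: reactivate


def endif(p: ProcessorFlowState) -> None:
    if not p.active:
        p.decrement()
```

A processor that fails an outer `if` and then sits through a nested `if`/`else`/`endif` stays inactive until the outer `else`, which brings its counter from 0 to -1 and reactivates it.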
Processors disabled by looping statements (BREAKLOOP, BREAKREP, and CONTINUE) are also tracked with "loop inactive"
counters; however, unlike the branch counter, these loop counters cannot be manipulated directly.
Stack sizes:

                  PARTIAL   FULL
  Loop stack      n/a       4
  Address stack   n/a       4

Branch counter limits:

                   PARTIAL   FULL
  Branch counter   0..3      0..31
  Maximum depth    4         32