ATI CTM Guide v. 1.01
© 2006 Advanced Micro Devices, Inc.
CTM Units
1. Address Translation for Linear Memory Format
2. Address Translation for Tiled Memory Formats
The 2x2 superfine tiling option augments the linear and tiled format addresses described above. When applied to single-channel inputs, it operates as if four independent data elements were requested, with index pairs (x+1, y), (x, y+1), (x+1, y+1), and (x, y). These four values are packed, in that order, into the four channels (c0, c1, c2, c3) of the register specified by the program making the memory request. Without the 2x2 superfine tiling option, a program would need four independent input memory requests, issued by four separate instructions, to achieve the same result. The 2x2 superfine tiling option is ignored for all memory clients other than inputs, and its behavior is undefined if the input memory format has more than one channel.
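A CPU-side emulation can make the channel ordering concrete. The Python sketch below is a hypothetical illustration and not part of CTM. It makes three assumptions: `superfine_fetch` is an invented name, the input is a row-major single-channel array indexed as data[y][x], and the caller keeps x+1 and y+1 in range. The text does not describe edge handling, so the sketch does not attempt it.

```python
def superfine_fetch(data, x, y):
    """Hypothetical emulation of a 2x2 superfine read of a single-channel input.

    Returns (c0, c1, c2, c3) holding the elements at (x+1, y), (x, y+1),
    (x+1, y+1) and (x, y), in that order. `data` is assumed to be indexed
    as data[y][x]; out-of-range coordinates are not handled.
    """
    offsets = [(1, 0), (0, 1), (1, 1), (0, 0)]  # (dx, dy) for c0, c1, c2, c3
    return tuple(data[y + dy][x + dx] for dx, dy in offsets)


if __name__ == "__main__":
    # 3x3 input where each value encodes its position as 10*y + x.
    image = [[10 * y + x for x in range(3)] for y in range(3)]
    print(superfine_fetch(image, 1, 1))  # (12, 21, 22, 11)
```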
The index pair (x, y) and a unique identifier are sent to the MC by the client requesting the memory read or write. The
pitch, offset, tiling, and format parameters associated with the client identifier are maintained in the MC (commands
to set these parameters are summarized below), and they are accessed when a client requests a memory transaction.
The parameters supplied to the MC differ from client to client; the following subsections detail them for each possible identifier.
Input Parameters
The MC supports clients (processors in the data parallel processor array) requesting a memory read for up to 16
distinct program inputs. The (x, y) index pair for a given request is specified in an instruction being executed on one
of the processors. The index pair is sent to the MC along with the program input identifier, also specified in the
requesting instruction. The pitch, offset, tiling, and format for each input identifier are shared among all processors
in the processor array. These values are provided to the MC with the
set_inp_fmt
command (see page 15).
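Each request carries only an index pair and an identifier, while the per-identifier state is held once in the MC. This division can be modeled as a small shared table. The Python sketch below is hypothetical throughout:

- `InputFormat`, `MemoryController`, `set_input_format` and `params_for` are invented names.
- The fields only mirror the four parameters named in the text.
- `set_input_format` models the effect of set_inp_fmt, not that command's actual encoding.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_INPUTS = 16  # the MC supports up to 16 distinct program inputs


@dataclass(frozen=True)
class InputFormat:
    """Hypothetical record of the per-input parameters kept in the MC."""
    pitch: int              # row pitch, in elements (assumed unit)
    offset: int             # base byte address
    tiled: bool             # False: linear format, True: tiled format
    bytes_per_element: int  # stands in for the format parameter


class MemoryController:
    """Toy model: one parameter slot per input identifier, shared by every processor."""

    def __init__(self) -> None:
        self.inputs: List[Optional[InputFormat]] = [None] * NUM_INPUTS

    @staticmethod
    def _check_id(input_id: int) -> None:
        if not 0 <= input_id < NUM_INPUTS:
            raise ValueError(f"input identifier must be in 0..{NUM_INPUTS - 1}")

    def set_input_format(self, input_id: int, fmt: InputFormat) -> None:
        """Stand-in for the effect of the set_inp_fmt command (not its encoding)."""
        self._check_id(input_id)
        self.inputs[input_id] = fmt

    def params_for(self, input_id: int) -> InputFormat:
        """Parameters used for any read request naming this input identifier.

        The request itself carries only (x, y) and the identifier; the issuing
        processor does not matter, because the table is shared.
        """
        self._check_id(input_id)
        fmt = self.inputs[input_id]
        if fmt is None:
            raise LookupError(f"no format has been set for input {input_id}")
        return fmt


if __name__ == "__main__":
    mc = MemoryController()
    mc.set_input_format(3, InputFormat(pitch=256, offset=0x10000, tiled=False, bytes_per_element=4))
    # Requests from any processor that name input 3 see the same parameters.
    print(mc.params_for(3))
```

Because the parameters live in the MC rather than in each request, a read instruction only needs to name an input identifier and an index pair.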
The MC may service any number of requests from the processors during program execution. If the requested data is in the MC input read cache, the MC satisfies the request from the cache; otherwise, it pulls the data into the cache (from GPU or system memory, as appropriate) while servicing the request. The input read cache is shared among all 16 program inputs and must be invalidated to guarantee that data changed in memory is read correctly. The input read cache is invalidated with the inv_inp_cache
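Bytes  Bits [31:5]                                Bits [4:0]
1      y[11:0]*pitch[13:5]+x[11:5]+offset[31:5]   x[4:0]
2      y[11:0]*pitch[13:4]+x[11:4]+offset[31:5]   x[3:0],0
4      y[11:0]*pitch[13:3]+x[11:3]+offset[31:5]   x[2:0],00
8      y[11:0]*pitch[13:2]+x[11:2]+offset[31:5]   x[1:0],000
16     y[11:0]*pitch[13:1]+x[11:1]+offset[31:5]   x[0],0000

Bytes is the element size in bytes. Comma-separated fields are concatenated, most significant first; for example, x[3:0],0 places x[3:0] in address bits [4:1] and a zero in bit 0.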
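Each row of the linear table computes a 32-byte block index in bits [31:5] and a byte position within the block in bits [4:0]. The Python sketch below transcribes the bit slices literally. It is an illustration, not driver code; `linear_address` is a hypothetical name. It assumes the following:

- x and y are 12-bit.
- pitch is 14-bit and counted in elements.
- offset is a byte address.
- The sum simply wraps at 32 bits.

```python
def linear_address(x: int, y: int, pitch: int, offset: int, bytes_per_element: int) -> int:
    """Byte address of element (x, y) in the linear memory format.

    Literal transcription of the linear-format table. Assumptions: x and y
    are 12-bit, pitch is 14-bit and counted in elements, offset is a byte
    address, and the result wraps at 32 bits.
    """
    if bytes_per_element not in (1, 2, 4, 8, 16):
        raise ValueError("bytes_per_element must be 1, 2, 4, 8 or 16")
    s = bytes_per_element.bit_length() - 1  # log2(bytes per element)
    k = 5 - s                               # a 32-byte block holds 2**k elements
    x &= 0xFFF                              # x[11:0]
    y &= 0xFFF                              # y[11:0]
    pitch &= 0x3FFF                         # pitch[13:0]
    # Bits [31:5]: y*pitch[13:k] + x[11:k] + offset[31:5], the 32-byte block index.
    block = y * (pitch >> k) + (x >> k) + (offset >> 5)
    # Bits [4:0]: x[k-1:0] followed by s zero bits.
    low = (x & ((1 << k) - 1)) << s
    return ((block << 5) | low) & 0xFFFFFFFF


if __name__ == "__main__":
    # 4-byte elements, pitch of 64 elements, surface at byte offset 0x1000.
    print(linear_address(3, 2, 64, 0x1000, 4))  # 4620 == 0x1000 + 4 * (2 * 64 + 3)
```

In some cases the result equals offset + bytes * (y * pitch + x):

- pitch must be a multiple of 32 / bytes elements, so that each row spans whole 32-byte blocks.
- offset must be 32-byte aligned.

Otherwise, the slicing silently drops the low pitch and offset bits. The text here does not say whether such values are legal, so treat alignment as a likely requirement rather than a stated one.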
Bytes  Bits [31:11]                               Bits [10:9]          Bits [8:7]           Bits [6:5]  Bits [4:0]
1      y[11:5]*pitch[13:6]+x[11:6]+offset[31:11]  y[4]^x[6],x[5]^y[5]  y[3]^x[5],x[4]^y[4]  y[2],x[3]   y[1:0],x[2:0]
2      y[11:5]*pitch[13:5]+x[11:5]+offset[31:11]  y[4]^x[5],x[4]^y[5]  y[3]^x[4],x[3]^y[4]  y[2],x[2]   y[1:0],x[1:0],0
4      y[11:4]*pitch[13:5]+x[11:5]+offset[31:11]  y[3]^x[5],x[4]^y[4]  y[2]^x[4],x[3]^y[3]  y[1],x[2]   y[0],x[1:0],00
8      y[11:4]*pitch[13:4]+x[11:4]+offset[31:11]  y[3]^x[4],x[3]^y[4]  y[2]^x[3],x[2]^y[3]  y[1],x[1]   y[0],x[0],000
16     y[11:3]*pitch[13:4]+x[11:4]+offset[31:11]  y[2]^x[4],x[3]^y[3]  y[1]^x[3],x[2]^y[2]  y[0],x[1]   x[0],0000

Bytes is the element size in bytes. ^ denotes bitwise XOR, and comma-separated fields are concatenated, most significant first. For example, bits [10:9] = y[4]^x[6],x[5]^y[5] sets bit 10 to y[4] XOR x[6] and bit 9 to x[5] XOR y[5].
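A direct transcription of the tiled table can help when checking a surface layout on the host. In the Python sketch below, `tiled_address`, `LOW_BITS` and `HIGH_SHIFTS` are hypothetical names. Each row is encoded as the list of sources for address bits 10 down to 0, so the sketch can be compared with the table line by line. It assumes the same 12-bit x and y, 14-bit pitch in elements and byte offset as the linear case. Because the table uses only offset[31:11], exact results presumably also need a 2 KB-aligned offset.

```python
# Sources for address bits 10 down to 0, one string per element size, copied
# from the tiled-format table: "y4^x6" means y[4] XOR x[6], "0" is a constant 0.
LOW_BITS = {
    1: "y4^x6 x5^y5 y3^x5 x4^y4 y2 x3 y1 y0 x2 x1 x0",
    2: "y4^x5 x4^y5 y3^x4 x3^y4 y2 x2 y1 y0 x1 x0 0",
    4: "y3^x5 x4^y4 y2^x4 x3^y3 y1 x2 y0 x1 x0 0 0",
    8: "y3^x4 x3^y4 y2^x3 x2^y3 y1 x1 y0 x0 0 0 0",
    16: "y2^x4 x3^y3 y1^x3 x2^y2 y0 x1 x0 0 0 0 0",
}

# (ys, ps, xs) for bits [31:11] = y[11:ys]*pitch[13:ps] + x[11:xs] + offset[31:11].
HIGH_SHIFTS = {1: (5, 6, 6), 2: (5, 5, 5), 4: (4, 5, 5), 8: (4, 4, 4), 16: (3, 4, 4)}


def tiled_address(x: int, y: int, pitch: int, offset: int, bytes_per_element: int) -> int:
    """Byte address of element (x, y) in the tiled memory format (2048-byte tiles).

    Assumptions as in the linear sketch: 12-bit x and y, 14-bit pitch in
    elements, byte offset, result wraps at 32 bits.
    """
    if bytes_per_element not in LOW_BITS:
        raise ValueError("bytes_per_element must be 1, 2, 4, 8 or 16")
    x &= 0xFFF
    y &= 0xFFF
    pitch &= 0x3FFF
    ys, ps, xs = HIGH_SHIFTS[bytes_per_element]
    # Bits [31:11]: index of the 2048-byte tile.
    high = (y >> ys) * (pitch >> ps) + (x >> xs) + (offset >> 11)
    coords = {"x": x, "y": y}
    low = 0
    for field in LOW_BITS[bytes_per_element].split():  # bit 10 first
        bit = 0
        if field != "0":
            for term in field.split("^"):  # XOR the listed coordinate bits
                bit ^= (coords[term[0]] >> int(term[1:])) & 1
        low = (low << 1) | bit
    return ((high << 11) | low) & 0xFFFFFFFF


if __name__ == "__main__":
    # Neighbors of (0, 0) for 4-byte elements in a surface 64 elements wide.
    for x, y in [(1, 0), (0, 1), (32, 0)]:
        print((x, y), tiled_address(x, y, 64, 0, 4))  # prints 4, 16 and 3072
```

The XOR terms touch only bits [10:7], so they reorder elements inside a 2048-byte tile without moving any element to another tile. Tile sizes range from 64 x 32 elements at 1 byte to 16 x 8 at 16 bytes. Within the first tile, the mapping should be a permutation of the tile's byte offsets, which is a cheap check on any transcription.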