Parallel Computing Architecture
Figure 5 depicts a high-level view of the GeForce GTX 280 GPU parallel
computing architecture. A hardware-based thread scheduler at the top of the
diagram manages the scheduling of threads across the TPCs. Note that in
compute mode the architecture also includes texture caches and memory
interface units. The texture caches are used to combine memory accesses for
more efficient, higher-bandwidth memory read/write operations. The elements
labeled “atomic” refer to the ability to perform atomic read-modify-write
operations to memory. Atomic operations provide fine-grained access to
individual memory locations and facilitate parallel reductions and parallel
data-structure management.
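As a brief illustrative sketch (the kernel name, bin count, and data layout here are hypothetical, not taken from this document), atomic read-modify-write operations let many threads safely update shared data structures, such as the bins of a histogram:

```cuda
// Hypothetical sketch: each thread atomically increments a histogram bin.
// atomicAdd performs the read-modify-write as one indivisible operation,
// so concurrent updates from different threads are never lost.
__global__ void histogram256(const unsigned int *input,
                             unsigned int *bins, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&bins[input[i] % 256], 1u);
    }
}
```

Without the atomic operation, two threads incrementing the same bin could each read the old value and write back the same result, losing one of the updates.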
Figure 5: GeForce GTX 280 GPU Parallel Computing Architecture
A TPC in compute mode is represented in Figure 6 below. Each of the three SMs
includes its own local shared memory. Each processing core in an SM can share
data with the other processing cores in that SM through the shared memory,
without reading from or writing to an external memory subsystem. This
contributes greatly to increased computational speed and efficiency for a
variety of algorithms.
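To sketch how on-chip shared memory avoids round trips to external memory (the kernel name and block size of 256 threads are illustrative assumptions, not from this document), threads in a block can stage data in shared memory and cooperate on a partial sum there:

```cuda
// Hypothetical sketch: a per-block sum computed entirely in shared memory.
// Each thread loads one element from global memory; all further data
// exchange between cores happens on chip.
__global__ void blockSum(const float *in, float *blockSums, int n)
{
    __shared__ float tile[256];          // one slot per thread in the block
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    tile[tid] = (i < n) ? in[i] : 0.0f;  // single global-memory read
    __syncthreads();                     // make all loads visible

    // Tree reduction using only shared memory between steps.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        blockSums[blockIdx.x] = tile[0]; // one global-memory write per block
}
```

Each element crosses the external memory interface only twice (one read, one write per block), while the intermediate reduction traffic stays entirely within the SM's shared memory.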
May, 2008 | TB-04044-001_v01