Chip                    TPCs   SMs per TPC   Threads per SM   Total Threads per Chip
GeForce 8 & 9 Series    8      2             768              12,288
GeForce GTX 200 GPUs    10     3             1,024            30,720
Table 3: Maximum Number of Threads
Doing the math results in 32 warps × 32 threads per warp, or 1,024 maximum
concurrent threads that can be managed by each SM. With 30 SMs in total
(10 TPCs × 3 SMs/TPC), the GeForce GTX 280 supports up to 30,720 concurrent
threads in hardware (versus 768 threads/SM × 2 SMs/TPC × 8 TPCs = 12,288
maximum concurrent threads in GeForce 8800 GTX).
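The arithmetic behind Table 3 can be checked with a short sketch. The function name and the tuple it returns are illustrative choices made here, not part of any NVIDIA tool; the input figures are the ones quoted in the table.

```python
def max_concurrent_threads(tpcs, sms_per_tpc, threads_per_sm):
    """Return (total SMs, total hardware-managed threads) for a chip."""
    total_sms = tpcs * sms_per_tpc
    return total_sms, total_sms * threads_per_sm


# Inputs taken from Table 3.
geforce_8800_gtx = max_concurrent_threads(tpcs=8, sms_per_tpc=2, threads_per_sm=768)
# 32 warps x 32 threads per warp = 1,024 threads per SM.
geforce_gtx_280 = max_concurrent_threads(tpcs=10, sms_per_tpc=3, threads_per_sm=32 * 32)

print(geforce_8800_gtx)  # (16, 12288)
print(geforce_gtx_280)   # (30, 30720)
```

The ratio of the two totals, 30,720 / 12,288, works out to 2.5× more threads in flight on the GeForce GTX 280.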
Larger Register File
The local register file size has doubled per SM in GeForce GTX 200 GPUs
compared to GeForce 8 & 9 Series GPUs. The older GPUs could run into
situations with long shaders where registers would be exhausted, generating the
need to swap to memory. A much larger register file permits larger and more
complex shaders to be run on the GeForce GTX 200 GPUs faster and more
efficiently. The additional register file occupies only a small fraction of the
SM die area.
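One way to see why register file size matters is to estimate how many threads can stay resident on an SM before registers run out. The sketch below is a simplified model under stated assumptions: the register counts (8,192 for the 1× case and 16,384 for the 2× case) are not given in this document and are assumed here, the per-thread register usage is a hypothetical shader, and real hardware allocates registers at a coarser granularity than this model does.

```python
def resident_threads(registers_per_sm, registers_per_thread, max_threads_per_sm):
    """Threads an SM can keep resident, limited by registers or by the thread cap."""
    register_limit = registers_per_sm // registers_per_thread
    return min(register_limit, max_threads_per_sm)


REGS_1X = 8192     # assumed 1x register file size (not stated in this document)
REGS_2X = 16384    # assumed 2x register file size (not stated in this document)
MAX_THREADS = 1024 # per-SM thread limit quoted for GeForce GTX 200 GPUs

shader_regs = 32   # hypothetical register usage per thread for a long shader

print(resident_threads(REGS_1X, shader_regs, MAX_THREADS))  # 256
print(resident_threads(REGS_2X, shader_regs, MAX_THREADS))  # 512
```

In this model, doubling the register file doubles the number of resident threads for a register-heavy shader, which gives the SM more work to switch to while other threads wait on memory.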
Games are employing more and more complex shaders that require more register
space. Figure 7 below highlights the performance improvement from the 2× register
file size in 3D Mark Vantage.
[Bar chart: "2x vs 1x Register File Size", 3D Mark Vantage, Extreme Preset. Bars for Overall Score and GPU Total; series: Normal LRF (2x) and Decreased LRF (1x); vertical axis from 3600 to 4800.]
Figure 7: Local Register File 2× versus 1×
May, 2008 | TB-04044-001_v01