
GeForce GTX 980 Whitepaper
GM204 HARDWARE ARCHITECTURE IN-DEPTH
In GeForce GTX 980, each GPC ships with a dedicated raster engine and four SMMs. Each SMM has 128
CUDA cores, a PolyMorph Engine, and eight texture units. With 16 SMMs, the GeForce GTX 980 ships
with a total of 2048 CUDA cores and 128 texture units.
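The chip-wide totals quoted above follow from the per-SMM counts. A minimal Python sketch of that arithmetic is shown here; the GPC count of four is inferred from 16 SMMs at four SMMs per GPC rather than stated directly, and the constant and function names are purely illustrative.

```python
# Per-unit counts as stated in the text for GeForce GTX 980 (GM204).
SMMS_TOTAL = 16                # SMMs on the full chip
SMMS_PER_GPC = 4               # each GPC contains four SMMs
CUDA_CORES_PER_SMM = 128
TEXTURE_UNITS_PER_SMM = 8
POLYMORPH_ENGINES_PER_SMM = 1


def gm204_unit_totals():
    """Return chip-wide totals derived from the per-unit counts."""
    return {
        # Inferred, not stated in the text: 16 SMMs / 4 SMMs per GPC.
        "gpcs": SMMS_TOTAL // SMMS_PER_GPC,
        "cuda_cores": SMMS_TOTAL * CUDA_CORES_PER_SMM,
        "texture_units": SMMS_TOTAL * TEXTURE_UNITS_PER_SMM,
        "polymorph_engines": SMMS_TOTAL * POLYMORPH_ENGINES_PER_SMM,
    }


if __name__ == "__main__":
    for name, value in gm204_unit_totals().items():
        print(f"{name}: {value}")
```

The CUDA core and texture unit totals produced this way match the 2048 and 128 figures given in the text.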
The GeForce GTX 980 features four 64-bit memory controllers (256-bit total). Tied to each memory
controller are 16 ROP units and 512KB of L2 cache. The full chip ships with a total of 64 ROPs and
2048KB of L2 cache (compared to 32 ROPs and 512KB of L2 cache on GK104).
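The memory-subsystem totals can be derived the same way, since ROPs and L2 cache are tied to each memory controller. The sketch below is illustrative only: the names are hypothetical, and the GK104 figures are the two values quoted in the parenthetical above.

```python
# Per-controller figures as stated in the text for GeForce GTX 980.
MEMORY_CONTROLLERS = 4
BITS_PER_CONTROLLER = 64
ROPS_PER_CONTROLLER = 16
L2_KB_PER_CONTROLLER = 512

# GK104 totals quoted in the text for comparison.
GK104_ROPS = 32
GK104_L2_KB = 512


def gm204_memory_totals():
    """Return chip-wide memory-subsystem totals for GM204."""
    return {
        "bus_width_bits": MEMORY_CONTROLLERS * BITS_PER_CONTROLLER,
        "rops": MEMORY_CONTROLLERS * ROPS_PER_CONTROLLER,
        "l2_cache_kb": MEMORY_CONTROLLERS * L2_KB_PER_CONTROLLER,
    }


def ratio_vs_gk104(totals):
    """Return how many times larger the GM204 totals are than GK104's."""
    return {
        "rops": totals["rops"] / GK104_ROPS,
        "l2_cache": totals["l2_cache_kb"] / GK104_L2_KB,
    }


if __name__ == "__main__":
    totals = gm204_memory_totals()
    print(totals)
    print(ratio_vs_gk104(totals))
```

With these inputs the ROP count doubles and the L2 cache quadruples relative to GK104.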
The following table provides a high-level comparison of Maxwell vs. our previous-generation GK104
GPU:
GPU                     GeForce GTX 680 (Kepler)   GeForce GTX 980 (Maxwell)
SMs                     8                          16
CUDA Cores              1536                       2048
Base Clock              1006 MHz                   1126 MHz
GPU Boost Clock         1058 MHz                   1216 MHz
GFLOPs¹                 3090                       4612
Texture Units           128                        128
Texel fill-rate         128.8 Gigatexels/sec       144.1 Gigatexels/sec
Memory Clock            6000 MHz                   7000 MHz
Memory Bandwidth        192 GB/sec                 224 GB/sec
ROPs                    32                         64
L2 Cache Size           512KB                      2048KB
TDP                     195 Watts                  165 Watts
Transistors             3.54 billion               5.2 billion
Die Size                294 mm²                    398 mm²
Manufacturing Process   28-nm                      28-nm
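The GFLOPs, texel fill-rate, and memory bandwidth rows can be cross-checked against the other rows of the table. The Python sketch below does so under stated assumptions that are not spelled out in the table itself: each CUDA core is counted as one fused multiply-add (2 FLOPs) per clock, each texture unit as one texel per clock, the memory clock is treated as the effective per-pin data rate, and the GTX 680 is assumed to use a 256-bit bus like the GTX 980. Function names are illustrative.

```python
def gflops(cuda_cores, clock_mhz, flops_per_core_per_clock=2):
    """Peak single-precision GFLOPs, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * flops_per_core_per_clock * clock_mhz / 1000.0


def texel_fill_rate(texture_units, clock_mhz):
    """Texel fill rate in Gigatexels/sec, assuming one texel per unit per clock."""
    return texture_units * clock_mhz / 1000.0


def memory_bandwidth(bus_width_bits, data_rate_mhz):
    """Memory bandwidth in GB/sec from bus width and effective data rate."""
    return bus_width_bits / 8 * data_rate_mhz / 1000.0


if __name__ == "__main__":
    # (name, CUDA cores, base clock MHz, texture units, bus bits, memory clock MHz)
    # The 256-bit bus for GTX 680 is an assumption; the table does not list it.
    gpus = [
        ("GeForce GTX 680", 1536, 1006, 128, 256, 6000),
        ("GeForce GTX 980", 2048, 1126, 128, 256, 7000),
    ]
    for name, cores, base_mhz, tex_units, bus_bits, mem_mhz in gpus:
        print(name)
        print(f"  GFLOPs:           {gflops(cores, base_mhz):.0f}")
        print(f"  Texel fill-rate:  {texel_fill_rate(tex_units, base_mhz):.1f} Gigatexels/sec")
        print(f"  Memory bandwidth: {memory_bandwidth(bus_bits, mem_mhz):.0f} GB/sec")
```

Using the base clock rather than the boost clock is what reproduces the table's GFLOPs and texel fill-rate figures.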
The GeForce GTX 980 has twice as many SMs as the GK104 GPU used in the GeForce GTX 680
released two years ago. Because of the changes implemented in GTX 980’s new Maxwell SM, we were
able to integrate twice as many SMs without doubling the die size. Since each SM contains its own
dedicated PolyMorph Engine, GeForce GTX 980 also has twice as many geometry units as its direct
predecessor. We discuss the new SM design in more detail in the next section of the
whitepaper.
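The claim that the SM count doubled without doubling the die size can be quantified from the comparison table. The sketch below computes generation-over-generation ratios from the table's values; the dictionary keys and function name are illustrative, and the ratios describe whole-die figures (including memory controllers, ROPs, and L2 cache), not the area of the SMs alone.

```python
# Values taken from the comparison table.
GTX_680 = {"sms": 8, "die_mm2": 294.0, "transistors_bn": 3.54}
GTX_980 = {"sms": 16, "die_mm2": 398.0, "transistors_bn": 5.2}


def generation_ratios(old, new):
    """Return new/old for every metric present in the old GPU's record."""
    return {key: new[key] / old[key] for key in old}


if __name__ == "__main__":
    for key, ratio in generation_ratios(GTX_680, GTX_980).items():
        print(f"{key}: {ratio:.2f}x")
```

On these inputs the SM count grows by 2x while die area grows by roughly 1.35x and transistor count by roughly 1.47x.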
Based on efficiency and workload analysis, and on the math vs. texture processing requirements of modern
games, NVIDIA engineers determined that eight texture units per SMM is the best architectural balance
for Maxwell; therefore, the total number of texture units is the same as on GK104: 128. However, thanks to
GeForce GTX 980’s higher base clock, texel fill rate improves by about 12% from one generation to the next
(128.8 to 144.1 Gigatexels/sec). To
improve performance in high AA/high resolution gaming scenarios, we doubled the number of ROPs
¹ The GFLOPS and texel fill rates in this chart are based on GPU Base Clock.