Improved Dual Issue
Special function units (SFUs) in the SMs compute transcendental math, perform attribute
interpolation (interpolating pixel attributes from a primitive’s vertex attributes), and
execute floating-point MUL instructions. The individual streaming processing cores
of GeForce GTX 200 GPUs can now perform near full-speed dual-issue of
multiply-add operations (MADs) and MULs (3 flops/SP per clock) by using the SP’s MAD
unit to perform a MUL and ADD per clock, and using the SFU to perform another
MUL in the same clock. Optimized and directed tests can measure around 93-94%
efficiency.
The entire GeForce GTX 200 GPU SPA delivers nearly one teraflop of peak,
single-precision, IEEE 754, floating-point performance.
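
The "nearly one teraflop" figure can be reproduced with simple arithmetic. The sketch below is illustrative only: the 8 SPs per SM and the 1.296 GHz shader clock are assumed GeForce GTX 280 figures that this section does not state, and `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(cores: int, flops_per_core_per_clock: int, clock_ghz: float) -> float:
    """Peak rate in gigaflops: cores x flops per core per clock x clock in GHz."""
    return cores * flops_per_core_per_clock * clock_ghz


# Assumed GeForce GTX 280 configuration (not stated in this section).
SM_COUNT = 30             # one double-precision unit per SM, 30 in total
SP_PER_SM = 8             # assumption: streaming processors per SM
SHADER_CLOCK_GHZ = 1.296  # assumption: shader clock of the GTX 280

# A MAD counts as 2 flops (multiply + add); the SFU MUL adds 1 more per clock.
MAD_FLOPS = 2
DUAL_ISSUE_FLOPS = MAD_FLOPS + 1

sp_count = SM_COUNT * SP_PER_SM
sp_peak = peak_gflops(sp_count, DUAL_ISSUE_FLOPS, SHADER_CLOCK_GHZ)
mad_only = peak_gflops(sp_count, MAD_FLOPS, SHADER_CLOCK_GHZ)

print(f"MAD + MUL dual issue: {sp_peak:.1f} GFLOPS")
print(f"MAD only:             {mad_only:.1f} GFLOPS")
```

Under these assumptions the dual-issue peak is about 933 gigaflops, and the extra SFU MUL accounts for one third of it.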
Double Precision Support
A very important new addition to the GeForce GTX 200 GPU architecture is
double-precision, 64-bit floating-point computation support. This benefits various
high-end scientific, engineering, and financial computing applications, or any
computational task requiring very high accuracy of results. Each SM incorporates a
double-precision 64-bit floating-point math unit, for a total of 30 double-precision 64-bit
processing cores.
The double-precision unit performs a fused MAD, which is a high-precision
implementation of a MAD instruction (the product is not rounded before the
addition) that is also fully compliant with the IEEE 754R floating-point
specification. The overall double-precision performance of all 10 TPCs of
a GeForce GTX 280 GPU is roughly equivalent to an eight-core Xeon CPU,
yielding up to 78 gigaflops.
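
To illustrate why a fused MAD is described as high-precision, the following sketch emulates the two behaviors on the CPU. It is not the GPU's hardware implementation: `mad_unfused` and `mad_fused` are hypothetical names, and the fused version is modeled with exact rational arithmetic followed by a single rounding, assuming finite inputs.

```python
from fractions import Fraction


def mad_unfused(a: float, b: float, c: float) -> float:
    """Multiply and round to double, then add and round again (two roundings)."""
    return a * b + c


def mad_fused(a: float, b: float, c: float) -> float:
    """Emulate a fused MAD: compute a*b + c exactly, then round once to double.

    Emulation for finite inputs only; Fraction holds the exact value and
    float() performs the single, correctly rounded conversion.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0 when the multiply is rounded on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("unfused:", mad_unfused(a, b, c))
print("fused:  ", mad_fused(a, b, c))
```

With these inputs the unfused form loses the result entirely (it returns zero), while the fused form keeps the small residual, -2^-60.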
Improved Texturing Performance
The eight TPCs of the GeForce 8800 GTX allowed for 64 pixels per clock of
texture filtering, 32 pixels per clock of texture addressing, 32 pixels per clock of 2×
anisotropic bilinear filtering (8-bit integer), or 32 bilinear-filtered pixels per clock
(8-bit integer or 16-bit floating point). Subsequent GeForce 8 and 9 Series GPUs
balanced texture addressing and filtering.
For example, the GeForce 9800 GTX can address and filter 64 pixels
per clock, supporting 64 bilinear-filtered pixels per clock (8-bit integer)
or 32 bilinear-filtered pixels per clock (16-bit floating point).
GeForce GTX 200 GPUs also provide balanced texture addressing and filtering.
Each of the 10 TPCs includes a dual-quad texture unit capable of addressing and
filtering eight bilinear pixels/clock, or four 2:1 anisotropic filtered pixels/clock, or
four FP16 bilinear-filtered pixels/clock. Total bilinear texture addressing and
filtering capability for an entire high-end GeForce GTX 200 GPU is 80 pixels per
clock.
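
The chip-wide texture rates follow directly from the per-TPC figures above. The short sketch below works them out and compares the bilinear rate with the 64 pixels per clock quoted for the GeForce 9800 GTX. The variable names are illustrative, and the comparison is per clock only, so it says nothing about differences in clock speed.

```python
TPC_COUNT = 10  # TPCs in a high-end GeForce GTX 200 GPU

# Pixels addressed and filtered per clock by one TPC's dual-quad texture unit.
PER_TPC_RATE = {
    "bilinear": 8,
    "2:1 anisotropic": 4,
    "FP16 bilinear": 4,
}

# Chip-wide rate for each filtering mode, in pixels per clock.
gtx200_rate = {mode: TPC_COUNT * rate for mode, rate in PER_TPC_RATE.items()}

# Bilinear rate quoted in the text for the GeForce 9800 GTX.
GEFORCE_9800_GTX_BILINEAR = 64
per_clock_ratio = gtx200_rate["bilinear"] / GEFORCE_9800_GTX_BILINEAR

for mode, rate in gtx200_rate.items():
    print(f"{mode}: {rate} pixels/clock")
print(f"bilinear vs. GeForce 9800 GTX: {per_clock_ratio:.2f}x per clock")
```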
GeForce GTX 200 GPUs employ a more efficient scheduler, allowing the chips to
attain close to theoretical peak performance in texture filtering. In real-world
measurements, texture filtering is 22% more efficient than on the GeForce 9 Series.
May 2008 | TB-04044-001_v01