been modified to improve efficiency of data transfer between the driver and the
front end.
The memory crossbar between the data assembler and the frame buffer units has
been optimized, allowing the GeForce GTX 200 GPUs to run at full speed when
performing indexed primitive fetches (unlike the prior generation, which suffered
some contention between the front end and the data assembler).
The post-transform cache size has been increased, resulting in fewer pipeline stalls
and faster communication from the geometry and vertex stages to the viewport
clip/cull stage. (Setup rates are similar to the prior generation, supporting up to
one primitive per clock.)
Z-Culling performance has also been improved, especially at high resolutions.
Early-Z rejection rates have been increased because the number of ZROPs was
increased. The maximum ZROP cull rate is 256 samples/clock, or 32 pixels/clock.
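
The two ZROP cull-rate figures above are related by simple arithmetic, sketched below. The 8 samples-per-pixel factor is an inference from the ratio of the two quoted rates rather than something the text states, and the function and constant names are hypothetical.

```python
# Sketch relating the two quoted ZROP cull rates.
# ASSUMPTION: both figures describe the same peak rate, so their ratio
# gives the number of samples counted per pixel.

ZROP_SAMPLES_PER_CLOCK = 256  # quoted maximum cull rate in samples/clock
ZROP_PIXELS_PER_CLOCK = 32    # quoted maximum cull rate in pixels/clock


def pixels_per_clock(samples_per_clock: int, samples_per_pixel: int) -> float:
    """Convert a per-sample cull rate into a per-pixel cull rate."""
    return samples_per_clock / samples_per_pixel


# Samples per pixel implied by the two quoted figures (inferred, not quoted).
implied_samples_per_pixel = ZROP_SAMPLES_PER_CLOCK // ZROP_PIXELS_PER_CLOCK

print(implied_samples_per_pixel)
print(pixels_per_clock(ZROP_SAMPLES_PER_CLOCK, implied_samples_per_pixel))
```

Under that assumption, the pixel rate is the sample rate divided by the samples counted per pixel.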
GeForce GTX 200 GPUs also include significant micro-architectural improvements
in register allocation, instruction scheduling, and instruction issue. The GPUs can
now feed the execution units more swiftly. These improvements enable the
dual-issue of instructions to SPs and SFUs, as previously discussed.
Scheduling of work between texture units and the SM controller has also been
improved.
May 2008 | TB-04044-001_v01