A Detailed Look Inside the Intel® NetBurst™ Micro-Architecture of the Intel Pentium® 4 Processor
Introduction
The Intel® Pentium® 4 processor, utilizing the Intel® NetBurst™ micro-architecture, is a complete processor
redesign that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution”, introduced on prior Intel® micro-architectural
generations. Many of these innovations and advances were made possible by improvements in processor
technology, process technology, and circuit design, and could not previously be implemented in high-volume,
manufacturable solutions. The features and resulting benefits of the new micro-architecture are described in the
following sections.
This paper begins with a brief introduction to three generations of single-instruction, multiple-data (SIMD)
technology. The rest of the paper describes the principles of operation of the innovations in the Intel Pentium 4
processor with respect to the Intel NetBurst micro-architecture, and the implementation characteristics of the
Pentium 4 processor.
SIMD Technology and Streaming SIMD Extensions 2
One way to increase processor performance is to execute several computations in parallel, so that multiple
computations are done with a single instruction. The way to achieve this type of parallel execution is to use the
single-instruction, multiple-data (SIMD) computation technique.
Figure 1 shows a typical SIMD computation. Here two
sets of four packed data elements (X1, X2, X3, and X4,
and Y1, Y2, Y3, and Y4) are operated on in parallel, with
the same operation being performed on each
corresponding pair of data elements (X1 and Y1, X2 and
Y2, X3 and Y3, and X4 and Y4). The results of the four
parallel computations are a set of four packed data
elements.
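The computation in Figure 1 can be modeled in portable C as a minimal sketch. The names `LANES`, `lane_op`, `simd_apply`, and `add_op` are hypothetical and chosen only for illustration. The loop is a scalar stand-in: on SIMD hardware the four lane operations would be carried out by a single instruction rather than by four loop iterations.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* four packed data elements, as in Figure 1 */

/* The single operation ("op") applied to every pair of elements. */
typedef int (*lane_op)(int, int);

/* Apply the same operation to each corresponding pair
 * (X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4) and store the
 * four results as a packed set. */
static void simd_apply(const int x[LANES], const int y[LANES],
                       int result[LANES], lane_op op)
{
    assert(x != NULL && y != NULL && result != NULL && op != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        result[i] = op(x[i], y[i]);  /* lanes do not interact */
    }
}

/* Example operation: addition. */
static int add_op(int a, int b) { return a + b; }
```

Calling `simd_apply(x, y, r, add_op)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` fills `r` with the four pairwise sums.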
SIMD computations like those shown in Figure 1 were
introduced into the Intel IA-32 architecture with the Intel
MMX™ technology. The Intel MMX technology allows
SIMD computations to be performed on packed byte,
word, and doubleword integers that are contained in a set
of eight 64-bit registers called the MMX registers (see Figure 2).
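To make the packed integer formats concrete, the sketch below treats a `uint64_t` as a stand-in for one 64-bit MMX register. The helper names `mmx_lane_count` and `packed_add_bytes` are hypothetical, and the wraparound (modulo 256) behavior of the byte add is an assumption made for this illustration. The point is only that each lane is computed independently, with no carry crossing a lane boundary.

```c
#include <assert.h>
#include <stdint.h>

enum { MMX_BITS = 64 };  /* width of one MMX register */

/* Number of packed elements of the given width in one 64-bit register:
 * 8 bytes, 4 words (16-bit), or 2 doublewords (32-bit). */
static int mmx_lane_count(int element_bits)
{
    assert(element_bits == 8 || element_bits == 16 || element_bits == 32);
    return MMX_BITS / element_bits;
}

/* Scalar model of a packed byte add: the eight byte lanes are added
 * independently, and each sum wraps within its own lane. */
static uint64_t packed_add_bytes(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; ++lane) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        uint8_t sum = (uint8_t)(x + y);  /* wraps modulo 256 */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}
```

With `a = 0x01FF` and `b = 0x0001`, the lowest byte lane wraps from 0xFF to 0x00 without carrying into the next lane, so the packed result is `0x0100`, whereas an ordinary 64-bit add would give `0x0200`.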
The Pentium III processor extended this initial SIMD computation model with the introduction of the Streaming
SIMD Extensions (SSE). The Streaming SIMD
Extensions allow SIMD computations to be
performed on operands that contain four packed
single-precision floating-point data elements. The
operands can be either in memory or in a set of eight
128-bit registers called the XMM registers (see
Figure 2). The SSE also extended SIMD
computational capability with additional 64-bit MMX
instructions.
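As a hedged illustration of an operation on four packed single-precision elements, the sketch below uses the compiler intrinsics `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` from `<xmmintrin.h>` when SSE is available, and falls back to a scalar loop otherwise. The function name `add4_ps` and the macro `HAVE_SSE_SKETCH` are hypothetical; the intrinsics are a C-level interface and are not described in this paper.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_SKETCH 1
#else
#define HAVE_SSE_SKETCH 0
#endif

/* Add four packed single-precision values: out[i] = x[i] + y[i]. */
static void add4_ps(const float x[4], const float y[4], float out[4])
{
    assert(x != NULL && y != NULL && out != NULL);
#if HAVE_SSE_SKETCH
    __m128 vx = _mm_loadu_ps(x);             /* load 4 floats into one XMM value */
    __m128 vy = _mm_loadu_ps(y);
    _mm_storeu_ps(out, _mm_add_ps(vx, vy));  /* one packed add covers all 4 lanes */
#else
    for (int i = 0; i < 4; ++i) {            /* scalar fallback without SSE */
        out[i] = x[i] + y[i];
    }
#endif
}
```

The unaligned load and store intrinsics are used so that the caller's arrays need no particular alignment. That is a convenience choice for the sketch, not a recommendation from the source.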
The Pentium 4 processor further extends the SIMD
computation model with the introduction of the
Streaming SIMD Extensions 2 (SSE2). The SSE2
extensions also work with operands either in memory
or in the XMM registers. The SSE2 extends SIMD
Figure 1 Typical SIMD Operations
(Two sets of four packed elements, X1 to X4 and Y1 to Y4, are combined pairwise by op to give X1 op Y1, X2 op Y2,
X3 op Y3, and X4 op Y4.)
Figure 2 Registers available to SIMD Instructions
(Eight 64-bit MMX™ registers, MM0 to MM7, and eight 128-bit XMM registers, XMM0 to XMM7.)