EMBEDDED Intel486™ PROCESSOR HARDWARE REFERENCE MANUAL
for 16-bit programs, which generate more writes than 32-bit programs. The cycle time of the
write can limit system performance as the total bus usage approaches the maximum allowed.
A common method of improving memory system performance is to add a cache. The Intel486
processor has an on-chip cache (known as the L1 cache), which handles most of the read requests.
The performance gain from adding an external cache is highly dependent on the application:
some applications gain less than 5%, most gain 10-15%, and a few gain as much as 40%.
Many Intel486 processor applications do not require an external cache.
A high-performance Intel486 processor design needs to consider all of these issues in the memory
design. The following sections provide more detail on the activity of the Intel486 processor dur-
ing typical program execution. The memory activity of the CPU needs to be understood to best
design the memory subsystem.
9.2 INSTRUCTION EXECUTION PERFORMANCE
The Intel486 processor was designed to execute instructions in fewer clocks than earlier
Intel386™ family microprocessors. The reduced clock counts increase performance relative to
earlier products. This section reviews how the Intel486 processor accomplishes this and com-
pares it to earlier Intel microprocessors.
The instruction execution rate and internal design are important to understand when designing
memory systems. They account for the heavy write traffic of the Intel486 processor as compared to
earlier microprocessors, and they explain how memory bandwidth and latency affect performance.
9.2.1 Intel486™ Processor Execution Times
The Intel486 processor uses several techniques to execute many frequent instructions in a single
clock. The processor has an on-chip code/data cache and a five-stage pipelined execution unit.
The Intel486 processor decodes many simple instructions directly into hardware actions and uses
write buffers to match the execution rate to memory bus speed.
One high-level way to examine the impact of these techniques is to compare the execution time
of a typical application. To do so, Intel has measured the frequency of instruction usage in a
set of applications. For each instruction, we multiply its frequency by the number of clocks
required to execute it. The sum of these products yields the number of clocks required to
execute a typical instruction.
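The frequency-weighted sum described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the three-entry instruction mix are hypothetical, chosen for the example, and are not the measured data behind Table 9-1.

```python
def average_clocks_per_instruction(mix):
    """Return the frequency-weighted average clock count.

    mix is a list of (frequency, clocks) pairs, where frequency is the
    fraction of executed instructions (the fractions must sum to 1) and
    clocks is the clock count for that instruction.
    """
    total_frequency = sum(frequency for frequency, _ in mix)
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    # Sum of (frequency x clocks) over all instructions in the mix.
    return sum(frequency * clocks for frequency, clocks in mix)


# Hypothetical mix, invented for illustration only (not Intel's data).
example_mix = [
    (0.50, 1),  # instruction class A: 50% of the mix, 1 clock each
    (0.30, 2),  # instruction class B: 30% of the mix, 2 clocks each
    (0.20, 4),  # instruction class C: 20% of the mix, 4 clocks each
]

print(f"{average_clocks_per_instruction(example_mix):.2f} clocks per instruction")
```

A real measurement would have one entry per instruction (or per instruction and addressing form) rather than three coarse classes.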
Table 9-1 shows such a comparison. The Intel486 processor requires 1.95 clocks for a typical
instruction while the Intel386 microprocessor requires 4.919 clocks. This is a 2.5x improvement
for integer programs. The floating-point instructions have an even larger improvement, as
discussed later. The numbers in Table 9-1 do not include effects of cache misses for the Intel486
processor.
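As a quick check of the 2.5x figure, the ratio of the two clock counts quoted above can be computed directly. The sketch below assumes both processors run at the same clock frequency, so that a ratio of clocks per instruction equals a ratio of execution times. The constant and function names are illustrative.

```python
# Typical clocks per instruction, as quoted in the text above.
CLOCKS_INTEL486 = 1.95
CLOCKS_INTEL386 = 4.919


def speedup(old_clocks, new_clocks):
    """Ratio of clock counts; equals the speedup only at equal clock frequency."""
    return old_clocks / new_clocks


ratio = speedup(CLOCKS_INTEL386, CLOCKS_INTEL486)
print(f"{ratio:.2f}x")  # prints "2.52x", which the text rounds to 2.5x
```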
One implication of these numbers is that the Intel486 processor cannot sustain that rate of execu-
tion with the cache disabled. The bus bandwidth required for the Intel486 processor with cache
disabled would be 2.5 times that of the Intel386 CPU. The Intel486 processor bus has 60% more
data bandwidth for reads than the Intel386 CPU, but the same bandwidth for writes. The on-chip
cache of the Intel486 processor handles most (90-95%) of the read requests. The external bus