EMBEDDED Intel486™ PROCESSOR HARDWARE REFERENCE MANUAL
Write posting can improve average write latency to under 3 clocks for many applications. This improvement is important in Intel486 processor-based systems because approximately 70% of all bus cycles are writes. Without a latency-reduction technique such as write posting, average write latency is above 15 clocks. From this data, we can conclude that write posting yields a performance improvement of approximately 9%.
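Write posting works by having the memory controller latch the address and data of a CPU write into a buffer and end the CPU bus cycle right away. The DRAM write is completed later. The sketch below models such a posted-write buffer as a small in-order FIFO. The names `write_buffer`, `wb_post`, and `wb_drain`, and the depth of four entries, are hypothetical and are not taken from this manual or from any specific Intel design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffer depth; real memory controllers differ. */
#define WB_DEPTH 4

struct posted_write {
    uint32_t addr;
    uint32_t data;
};

struct write_buffer {
    struct posted_write entry[WB_DEPTH];
    unsigned head;   /* index of the oldest entry (next one to drain) */
    unsigned count;  /* number of writes still waiting for the DRAM */
};

/* CPU side: accept the write so the CPU bus cycle can end quickly.
 * Returns false when the buffer is full; the CPU would then be held
 * in wait states until an entry drains to DRAM. */
bool wb_post(struct write_buffer *wb, uint32_t addr, uint32_t data)
{
    assert(wb->count <= WB_DEPTH);
    if (wb->count == WB_DEPTH)
        return false;
    unsigned tail = (wb->head + wb->count) % WB_DEPTH;
    wb->entry[tail].addr = addr;
    wb->entry[tail].data = data;
    wb->count++;
    return true;
}

/* DRAM side: retire the oldest posted write, preserving program order.
 * Returns false when nothing is pending. */
bool wb_drain(struct write_buffer *wb, struct posted_write *out)
{
    assert(wb->count <= WB_DEPTH);
    if (wb->count == 0)
        return false;
    *out = wb->entry[wb->head];
    wb->head = (wb->head + 1) % WB_DEPTH;
    wb->count--;
    return true;
}
```

As long as the buffer has a free entry, a write costs the CPU only the time to post it. The full DRAM latency reaches the CPU only when writes arrive faster than the DRAM can retire them.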
Other effects can increase this improvement further. Write cycles, particularly DRAM page misses, can be overlapped with read hit cycles in the L2 cache. This overlap greatly reduces the delay seen by read cycles that immediately follow write cycles.
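The benefit of overlapping can be shown with a simple timing model. In the sketch below, a read that hits in the L2 cache is serviced while the DRAM is still completing a posted write. A read miss has to wait for the DRAM first. The function name and its parameters are hypothetical. The model also assumes that a read miss cannot bypass the pending write, which is a simplification of real controllers.

```c
#include <assert.h>
#include <stdbool.h>

/* Clocks until a read that immediately follows a posted write receives
 * its first data. All inputs are hypothetical, in CPU clocks:
 *   dram_busy     - clocks the DRAM still needs to finish the posted write
 *   l2_hit_clocks - latency of a read serviced by the L2 cache
 *   dram_clocks   - latency of a read serviced by the DRAM
 * Assumption: a read miss is serialized behind the pending write. */
unsigned read_after_write_latency(bool l2_hit, unsigned dram_busy,
                                  unsigned l2_hit_clocks,
                                  unsigned dram_clocks)
{
    assert(l2_hit_clocks > 0 && dram_clocks > 0);
    if (l2_hit)
        return l2_hit_clocks;          /* L2 answers while DRAM finishes the write */
    return dram_busy + dram_clocks;    /* miss waits for DRAM, then accesses it */
}
```

With illustrative values `dram_busy = 5` and `dram_clocks = 7`, a read miss waits 12 clocks. A read hit returns after `l2_hit_clocks`, whatever the value of `dram_busy`.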
Analysis of this memory subsystem design shows that these features give the CPU low-latency responses. The following characteristics were measured across several important applications. The first read requires an average of 3.5 clocks to complete, and each subsequent transfer of a burst always completes in one clock. Write cycles average 2.5 clocks. These averages result from the DRAM access latencies listed in Table 5-2. Read accesses that are served from the cache always complete in zero wait states.
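The relationship between the Table 5-2 latencies and these averages comes down to simple weighted arithmetic. The sketch below hard-codes the Table 5-2 read latencies. It computes the expected first-access latency for a given DRAM page-hit rate and the total clocks needed to burst one 16-byte line, which is four 32-bit transfers on the Intel486 data bus. The function names are hypothetical, and so is the page-hit rate input, because the manual does not state one.

```c
#include <assert.h>

/* Read latencies in clocks, from Table 5-2. */
#define PAGE_HIT_FIRST   3   /* first access of a burst, DRAM page hit  */
#define PAGE_MISS_FIRST  7   /* first access of a burst, DRAM page miss */
#define BURST_NEXT       1   /* each subsequent transfer of a burst     */

/* A 16-byte line moves as four 4-byte transfers on the 32-bit bus. */
#define TRANSFERS_PER_LINE 4

/* Expected first-access latency for a hypothetical page-hit rate (0..1). */
double avg_first_read(double page_hit_rate)
{
    assert(page_hit_rate >= 0.0 && page_hit_rate <= 1.0);
    return page_hit_rate * PAGE_HIT_FIRST
         + (1.0 - page_hit_rate) * PAGE_MISS_FIRST;
}

/* Clocks to burst a full line: first access plus three 1-clock transfers. */
unsigned line_fill_clocks(unsigned first_access)
{
    return first_access + (TRANSFERS_PER_LINE - 1) * BURST_NEXT;
}

/* Page-hit rate that would yield a given average first-access latency,
 * assuming every first read goes to DRAM with one of the two latencies. */
double implied_page_hit_rate(double avg_first)
{
    assert(avg_first >= PAGE_HIT_FIRST && avg_first <= PAGE_MISS_FIRST);
    return (PAGE_MISS_FIRST - avg_first) / (PAGE_MISS_FIRST - PAGE_HIT_FIRST);
}
```

With these values, a line fill takes 3 + 1 + 1 + 1 = 6 clocks on a page hit and 10 clocks on a page miss. Suppose the measured 3.5-clock average reflects only DRAM reads with these two latencies. That average would then correspond to a page-hit rate of about 87.5%, though other factors could also contribute.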
5.2.5 Second-Level Cache
Several types of L2 cache architecture are possible candidates for use with the Intel486 processor. In single-CPU systems, the different architectures offer similar performance benefits in most cases because they all improve performance through the same mechanism. The primary benefit of an L2 cache is reduced bus cycle latency.
In most systems that incorporate a single Intel486 processor, bus traffic from other bus masters is minimal, and with most memory systems the CPU uses at most 50% to 70% of the bus. Because the bus is not saturated, extra bus bandwidth gains little. Reducing bus cycle latency is therefore the only performance benefit external logic can offer.
An L2 cache is an economical way to reduce read cycle latency, and it can be implemented as a system option. To provide this capability, a cache device can be configured as a look-aside cache that monitors the CPU address and control signals. When a cycle occurs for which the cache can supply the data, the cache intervenes and can then supply an entire 16-byte line with no wait states.
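The decision a look-aside cache makes on each read cycle it observes can be sketched as a tag lookup. The sketch below is minimal. It assumes a direct-mapped organization with 1024 lines of 16 bytes (16 KB), but the manual does not specify an organization or size, so both are hypothetical, as are the names `l2_snoop_read` and `l2_fill`. Bus timing, wait-state generation, and write handling are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16u    /* Intel486 cache line size */
#define NUM_LINES  1024u  /* hypothetical: 16 KB direct-mapped L2 */

struct l2_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_BYTES];
};

struct l2_cache {
    struct l2_line line[NUM_LINES];
};

/* Called for each CPU read cycle the cache observes on the bus.
 * On a hit, the cache intervenes: it copies out the whole line and
 * returns true. On a miss, it returns false and main memory services
 * the cycle. */
bool l2_snoop_read(const struct l2_cache *c, uint32_t addr,
                   uint8_t out[LINE_BYTES])
{
    assert(c != NULL);
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;
    uint32_t tag   = addr / (LINE_BYTES * NUM_LINES);
    const struct l2_line *l = &c->line[index];

    if (!l->valid || l->tag != tag)
        return false;
    memcpy(out, l->data, LINE_BYTES);
    return true;
}

/* Record a line after main memory has serviced a miss. */
void l2_fill(struct l2_cache *c, uint32_t addr, const uint8_t in[LINE_BYTES])
{
    assert(c != NULL);
    struct l2_line *l = &c->line[(addr / LINE_BYTES) % NUM_LINES];
    l->valid = true;
    l->tag   = addr / (LINE_BYTES * NUM_LINES);
    memcpy(l->data, in, LINE_BYTES);
}
```

The cache never blocks a cycle it cannot serve, so a system can work without the cache device installed. That is why a look-aside L2 cache fits naturally as a system option.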
The performance improvement offered by an L2 cache is substantial in some environments. It is most noticeable when executing multitasking, multiuser operating systems such as UNIX*, OS/2*, Windows 95*, Windows NT*, and Windows CE*. Some applications, however, may not need the extra performance the cache provides. For these cases, implementing the L2 cache as a system option is attractive, because systems that do not benefit from it can omit it.
Table 5-2. Clock Latencies for DRAM Functions

DRAM Function   First Burst Access   Subsequent Burst Access   Write Cycle
Page hit        3                    1                         2
Page miss       7                    1                         5*

*Latency only incurred for back-to-back cycles.