PERFORMANCE CONSIDERATIONS
an external bus cycle. In addition, the internal cache is updated when the address written to is
contained in the cache. This policy ensures consistency between the on-chip cache and external
memory. The IntelDX4 processor can also be configured to update main memory using the write-back
policy. During writes, the cache is updated when the address being written to is contained in the
cache. The write is not propagated through the system to memory; instead, the modified data is held
in the cache and written to memory during a later update.
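The difference between the two write policies can be illustrated with a toy model. This is a sketch under stated assumptions, not the IntelDX4 cache design: the class name `TinyCache`, the per-address (rather than per-line) bookkeeping, the explicit `flush()` method, and the choice not to allocate a line on a write miss are all hypothetical simplifications. The model only counts how many writes reach the external bus under each policy.

```python
class TinyCache:
    """Hypothetical toy model of write-through vs. write-back (not the real cache organization)."""

    def __init__(self, cached_addresses, write_back=False):
        self.lines = {addr: 0 for addr in cached_addresses}  # address -> cached value
        self.dirty = set()        # addresses modified in the cache but not yet in memory
        self.write_back = write_back
        self.memory = {}          # external memory contents seen so far
        self.bus_writes = 0       # number of external bus write cycles

    def write(self, addr, value):
        hit = addr in self.lines
        if hit:
            self.lines[addr] = value      # both policies update the cache on a write hit
        if self.write_back and hit:
            self.dirty.add(addr)          # write-back: defer the memory update
        else:
            # Write-through, or a write miss (assumed: no line is allocated on a write miss).
            self._bus_write(addr, value)

    def flush(self):
        """Write every modified (dirty) address back to memory in one pass."""
        for addr in sorted(self.dirty):
            self._bus_write(addr, self.lines[addr])
        self.dirty.clear()

    def _bus_write(self, addr, value):
        self.memory[addr] = value
        self.bus_writes += 1
```

Ten writes to the same cached address cost ten bus writes with write-through, but none with write-back until `flush()` writes the final value once.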
9.3.2 Performance Effects of the On-Chip Cache
If all program operations can be served by on-chip resources, the fastest possible execution is
achieved, because the on-chip registers and cache satisfy every request. However, cache read misses
and memory writes (under the write-through policy) require an external bus access, which reduces
system performance.
The on-chip cache achieves a hit rate of approximately 95%, depending on the application.
This high hit rate has three main effects:
1. Performance is improved. The Intel486 processor can access data from its on-chip cache
   every clock. This high bandwidth allows the execution unit of the Intel486 processor to
   execute many common instructions in one clock.
2. System bus utilization decreases. Because a high percentage of reads are satisfied by
   the cache, the Intel486 processor bus is idle a large percentage of the time. Additional bus
   masters can therefore reside in the system without saturating the bus and degrading
   performance.
3. The ratio of writes to reads on the external bus increases. The number of external reads
   decreases, but the number of writes stays the same (with the write-through policy, every
   write reaches the bus). Therefore, main memory systems should have low latency on write
   operations.
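The three effects can be put in numbers with a short back-of-the-envelope sketch. The one-clock hit cost comes from the text; the miss cost, the workload sizes, and the function names below are hypothetical, and `external_bus_mix` assumes the write-through policy, so every write reaches the bus.

```python
def average_read_clocks(hit_rate, hit_clocks, miss_clocks):
    """Expected clocks per read: hits cost hit_clocks, misses cost miss_clocks."""
    return hit_rate * hit_clocks + (1.0 - hit_rate) * miss_clocks


def external_bus_mix(reads, writes, hit_rate):
    """Return (external reads, external writes, write-to-read ratio) on the bus.

    Assumes write-through: reads reach the bus only on a miss, writes always do.
    """
    external_reads = reads * (1.0 - hit_rate)   # only read misses use the bus
    external_writes = writes                    # every write uses the bus
    ratio = external_writes / external_reads if external_reads else float("inf")
    return external_reads, external_writes, ratio
```

For example, with a 95% hit rate, a 1-clock hit, and an assumed 5-clock miss, `average_read_clocks(0.95, 1, 5)` comes to about 1.2 clocks per read. For an assumed workload of 1000 reads and 300 writes, `external_bus_mix(1000, 300, 0.95)` leaves about 50 external reads against 300 writes, so the write-to-read ratio on the bus rises from 0.3 (no cache) to about 6. That shift is why write latency matters for main memory design.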
Internally, two separate 128-bit (16-byte) prefetch buffers interface to the L1 cache unit. Each can
be filled with data fetched from the on-chip cache in one clock cycle, or from external memory in
as few as four clock cycles. Because each wide buffer holds several instructions, one fill satisfies
multiple prefetches, so the usual degradation caused by a combined code and data cache is avoided.
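To see why one wide fill can satisfy several prefetches, the following sketch counts how many 16-byte buffer fills a straight-line instruction stream needs. It is a simplified, hypothetical model: it assumes sequential code with no branches and 16-byte-aligned fill boundaries, and it ignores how the two buffers are actually scheduled. `count_buffer_fills` is an illustrative name, not part of any Intel interface.

```python
def count_buffer_fills(instr_lengths, start=0, buffer_bytes=16):
    """Count aligned buffer fills needed to prefetch a sequential instruction stream.

    Hypothetical model: no branches, fills are buffer_bytes wide and aligned.
    """
    fills = 0
    current_block = None   # aligned block most recently loaded into a buffer
    addr = start
    for length in instr_lengths:
        first = addr // buffer_bytes                 # block holding the first byte
        last = (addr + length - 1) // buffer_bytes   # block holding the last byte
        for block in range(first, last + 1):
            if block != current_block:   # sequential code only moves forward
                fills += 1
                current_block = block
        addr += length
    return fills
```

Ten 3-byte instructions starting at address 0 span 30 bytes, so `count_buffer_fills([3] * 10)` needs only two fills for ten prefetches.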
To optimize performance during cache line fills, the processor uses a technique called bypassing.
The first cycle of a cache line fill satisfies the original request: the data read in during that
cycle is sent directly to the requesting unit. As a result, the requested data can be used without
waiting for the entire cache line to fill.
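The benefit of bypassing can be expressed as the clock at which the requested data becomes usable. The sketch below assumes a four-transfer line fill with hypothetical transfer times of 2, 1, 1, and 1 clocks, and, as the text states, that the first transfer carries the originally requested data. `data_ready_clock` is a hypothetical helper.

```python
from itertools import accumulate


def data_ready_clock(transfer_clocks, bypass=True):
    """Clock at which the originally requested data reaches the requesting unit."""
    if not transfer_clocks:
        raise ValueError("a line fill needs at least one transfer")
    arrival = list(accumulate(transfer_clocks))  # completion clock of each transfer
    # With bypassing, the first transfer (the requested data) is forwarded at once;
    # without it, the unit waits until the last transfer completes the line.
    return arrival[0] if bypass else arrival[-1]
```

With those assumed timings, `data_ready_clock([2, 1, 1, 1])` makes the requested data usable after 2 clocks, whereas waiting for the whole line (`bypass=False`) takes 5 clocks.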
Figure 9-1 shows the on-chip hit rates for prefetch and read operations when running the programs
shown in Table 9-2.