

www.xilinx.com

March 2002 Release

1-800-255-7778

Virtex-II Pro™ Platform FPGA Documentation

Appendix D: Programming Considerations

Any two memory addresses are considered congruent if address bits 19:26 (the cache
index) are the same but address bits 0:18 (the cache tag) are different. Address bits 27:31
select a byte within the 32-byte cacheline, which is the smallest object that can be brought
into the cache. Because the caches are two-way set-associative, only two congruent
cachelines can be in the cache simultaneously. Accessing a third congruent line causes one
of the two lines already in the cache to be removed.
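The field boundaries above can be illustrated with a few C helpers that split a 32-bit effective address into tag, index, and byte offset, and that test two addresses for congruence. This is a minimal sketch, not Xilinx-supplied code, and the helper names are hypothetical. PowerPC numbers bits from 0 (most significant) to 31 (least significant), so bits 27:31 are the five low-order bits and bits 19:26 are the eight bits above them. Eight index bits and a 32-byte line mean that addresses 8 KB (256 × 32 bytes) apart share an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PowerPC numbers address bits 0 (MSB) through 31 (LSB). */
#define LINE_SHIFT 5u  /* bits 27:31: byte offset within a 32-byte cacheline */
#define INDEX_BITS 8u  /* bits 19:26: 256 congruence classes */

/* Tag starts above offset + index, so one way covers 2^13 bytes. */
static_assert((1u << (LINE_SHIFT + INDEX_BITS)) == 8192u, "one way spans 8 KB");

/* Byte offset within the cacheline (bits 27:31). */
static uint32_t cache_offset(uint32_t ea) { return ea & ((1u << LINE_SHIFT) - 1u); }

/* Cache index (bits 19:26). */
static uint32_t cache_index(uint32_t ea) {
    return (ea >> LINE_SHIFT) & ((1u << INDEX_BITS) - 1u);
}

/* Cache tag (bits 0:18). */
static uint32_t cache_tag(uint32_t ea) { return ea >> (LINE_SHIFT + INDEX_BITS); }

/* Congruent: same cache index but different cache tag. */
static bool congruent(uint32_t a, uint32_t b) {
    return cache_index(a) == cache_index(b) && cache_tag(a) != cache_tag(b);
}
```

For example, 0x00010040 and 0x00012040 differ by 8 KB and are congruent, while 0x00010040 and 0x00010060 fall into adjacent congruence classes.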

Software can minimize congruence conflicts by organizing frequently used addresses so
that they are uniformly distributed across address bits 19:26.
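One common way to follow this guideline, sketched here under stated assumptions, is to pad buffers whose size is a multiple of 8 KB so that corresponding elements land in different congruence classes. The start address, buffer size, and the `buffer_base` helper are hypothetical. Without padding, the i-th element of each of three back-to-back 8 KB buffers maps to the same cache index, so a loop touching all three keeps evicting one of the lines; a 32-byte pad shifts each successive buffer by one cacheline.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 32u

/* Cache index (address bits 19:26) of a 32-bit effective address. */
static uint32_t index_of(uint32_t ea) { return (ea >> 5) & 0xFFu; }

/* Hypothetical layout helper: base address of buffer k when buffers of
   'size' bytes are placed back to back with 'pad' extra bytes after each. */
static uint32_t buffer_base(uint32_t start, uint32_t size, uint32_t pad, uint32_t k) {
    assert(pad % LINE_BYTES == 0u);  /* keep every buffer cacheline-aligned */
    return start + k * (size + pad);
}
```

With `pad` set to 0, buffers 0, 1, and 2 all start at the same cache index; with `pad` set to 32, they start at three consecutive indexes.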

Alignment

Misaligned memory accesses are usually handled by the processor and do not cause an
alignment exception. However, the fastest possible memory-access performance is
obtained when operands are properly aligned. If an unaligned load or store operand
crosses a word boundary, the processor accesses that operand using two memory
references.
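The word-boundary rule can be expressed as a small C check. This sketch models only the number of memory references for 1-, 2-, and 4-byte operands; it does not model the cases in which a misaligned access does raise an alignment exception, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of memory references for a load/store of 'size' bytes at effective
   address 'ea': one if the operand stays within a single 4-byte word,
   two if it crosses a word boundary. */
static unsigned mem_refs(uint32_t ea, unsigned size) {
    assert(size == 1u || size == 2u || size == 4u);
    return ((ea & 3u) + size > 4u) ? 2u : 1u;
}
```

A word at 0x1001 or a halfword at 0x1003 straddles a word boundary and needs two references; a halfword at 0x1002 does not.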

Branch targets should be aligned on a cache-line boundary if the target is unlikely to be
reached because of a default prediction or a prediction override. This helps minimize the
number of unused instructions present in the instruction cache.
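The padding needed to start a branch target on the next 32-byte boundary is simple address arithmetic, sketched below with hypothetical helper names. In hand-written assembly the same effect is usually requested with an alignment directive, for example the GNU assembler's `.balign 32`; how a particular toolchain fills the gap should be checked against its documentation.

```c
#include <assert.h>
#include <stdint.h>

#define ICACHE_LINE 32u  /* 32-byte cacheline (address bits 27:31) */
#define INSN_BYTES   4u  /* every PowerPC instruction is 4 bytes */

/* Bytes of padding needed so code placed at 'addr' starts on the next
   cache-line boundary (0 if 'addr' is already aligned). */
static uint32_t pad_to_line(uint32_t addr) {
    return (ICACHE_LINE - (addr % ICACHE_LINE)) % ICACHE_LINE;
}

/* The same padding expressed as a count of 4-byte filler instructions. */
static uint32_t filler_insns(uint32_t addr) {
    assert(addr % INSN_BYTES == 0u);  /* code addresses are word-aligned */
    return pad_to_line(addr) / INSN_BYTES;
}
```

A target that would otherwise land at 0x1004 needs 28 bytes (seven instructions) of padding to start at 0x1020.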

Instruction Performance

The following performance descriptions consider only the “first-order” effects of cache
misses. The performance penalty associated with a cache miss also involves a number of
second-order effects. These include PLB contention between the instruction and data
caches and the time associated with performing cache-line fills and flushes. Unless stated
otherwise, the number of cycles described applies to systems having zero-wait-state
memory access.

General Rules

The following rules apply to instruction execution in the PPC405:

•  Instructions execute in order.

•  Assuming cache hits, all instructions execute in one cycle except the following:
   -  Divide instructions execute in 35 clock cycles.
   -  Branches execute in one to three clock cycles, as described in "Branches" below.
   -  Multiply-accumulate and multiply instructions execute in one to five cycles, as
      described in "Multiplies" below.
   -  Aligned load/store instructions that hit in the data cache execute in one clock
      cycle. See "Alignment" above for information on the access penalty associated
      with unaligned loads and stores.

•  A data cache-control instruction requires two cycles to execute. However, subsequent
   data-cache accesses stall until the cache-control instruction finishes accessing the data
   cache. Those accesses do not remain stalled while transfers associated with previous
   data cache-control instructions continue on the PLB.

Branches

The performance of a branch instruction depends on how quickly it is resolved. A branch
is resolved when all conditions it depends on are known and the branch target is known.
Generally, the greater the separation (in instructions) between a branch and the last
instruction it depends on, the earlier the branch is resolved. If the branch is resolved early,
it can be executed in fewer cycles.
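The notion of separation can be made concrete with a small model: for a branch, find the last earlier instruction that produces something the branch reads (a condition-register field set by a compare such as `cmpw`, or a target loaded into the link or count register with `mtlr` or `mtctr`) and count the instructions in between. This is a hypothetical sketch; the resource bits and structure are illustrative only, and the mapping from separation to cycle count is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource bits a branch may depend on. */
#define RES_CR0 (1u << 0)  /* condition-register field 0 (branch condition) */
#define RES_LR  (1u << 1)  /* link register (branch target) */
#define RES_CTR (1u << 2)  /* count register (branch target or loop count) */

struct insn {
    uint32_t reads;   /* resources this instruction depends on */
    uint32_t writes;  /* resources this instruction produces */
};

/* Number of instructions between the branch at 'branch_idx' and the last
   earlier instruction that writes something the branch reads.
   Returns -1 if no earlier instruction in 'code' produces a dependency. */
static long branch_separation(const struct insn *code, size_t branch_idx) {
    uint32_t deps = code[branch_idx].reads;
    for (size_t j = branch_idx; j-- > 0;) {
        if (code[j].writes & deps)
            return (long)(branch_idx - j - 1);
    }
    return -1;
}
```

In the sequence `cmpw`, `add`, `add`, `bne`, the branch is separated from the compare by two instructions; moving the compare earlier increases the separation.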
