March 2002 Release
Virtex-II Pro™ Platform FPGA Documentation
Appendix D: Programming Considerations
Any two memory addresses are considered congruent if address bits 19:26 (the cache
index) are the same but address bits 0:18 (the cache tag) are different. Address bits 27:31
select a byte within the 32-byte cacheline, which is the smallest object that can be brought into the cache.
Only two congruent cachelines can be in the cache simultaneously. Accessing a third
congruent line causes one of the two lines already in the cache to be removed.
Software can minimize the number of congruent addresses by organizing used addresses
such that they are uniformly distributed across address bits 19:26.
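The congruence rule can be sketched in C as follows. This is a minimal illustration, not code from the manual: the function names are hypothetical, and the shifts assume 32-bit addresses with PowerPC bit numbering, where bit 0 is the most-significant bit.

```c
#include <stdint.h>
#include <stdbool.h>

/* PowerPC numbers bits from the most-significant end, so bit 0 is the MSB
 * and bit 31 is the LSB of a 32-bit address. All names below are
 * hypothetical helpers for illustration. */

/* Address bits 0:18, the cache tag (the upper 19 bits). */
static uint32_t cache_tag(uint32_t addr) { return addr >> 13; }

/* Address bits 19:26, the cache index (8 bits, 256 possible values). */
static uint32_t cache_index(uint32_t addr) { return (addr >> 5) & 0xFFu; }

/* Address bits 27:31, the byte offset within the 32-byte cacheline. */
static uint32_t line_offset(uint32_t addr) { return addr & 0x1Fu; }

/* Two addresses are congruent when the index matches but the tag differs. */
static bool is_congruent(uint32_t a, uint32_t b) {
    return cache_index(a) == cache_index(b) && cache_tag(a) != cache_tag(b);
}
```

Under these assumptions, addresses that differ by a multiple of 8 KB (0x2000) share an index, so `is_congruent(0x00000000, 0x00002000)` is true. Three such addresses accessed in rotation would compete for the two cachelines available at that index.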
Alignment
Misaligned memory accesses are usually handled by the processor and do not cause an
alignment exception. However, the fastest possible memory-access performance is
obtained when operands are properly aligned. If an unaligned load or store operand
crosses a word boundary, the processor accesses that operand using two memory
references.
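The word-boundary condition can be checked with a small C helper. This is a sketch under stated assumptions: the function names are hypothetical, a word is taken to be 4 bytes, and only operands of 1 to 4 bytes are considered.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if an operand of `size` bytes starting at `addr` spans two
 * 4-byte words. Hypothetical helper; intended for sizes 1 through 4 only. */
static bool crosses_word_boundary(uint32_t addr, uint32_t size) {
    /* (addr & 3) is the byte position inside the word; the operand spills
     * into the next word when position + size exceeds 4. */
    return (addr & 3u) + size > 4u;
}

/* Number of memory references for the operand, following the rule above:
 * two if it crosses a word boundary, otherwise one. */
static unsigned memory_references(uint32_t addr, uint32_t size) {
    return crosses_word_boundary(addr, size) ? 2u : 1u;
}
```

For example, a 4-byte operand at 0x1002 crosses a word boundary and needs two references, while a 2-byte operand at the same address fits within one word and needs one.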
Branch targets should be aligned on a cache-line boundary if that target is unlikely to be
accessed due to a default prediction or a prediction override. This helps minimize the
number of unused instructions present in the instruction cache.
Instruction Performance
The following performance descriptions consider only the “first order” effects of cache
misses. The performance penalty associated with a cache miss involves a number of
second-order effects. These include PLB contention between the instruction and data
caches and the time associated with performing cache-line fills and flushes. Unless stated
otherwise, the number of cycles described applies to systems having zero-wait-state
memory access.
General Rules
The following rules apply to instruction execution in the PPC405:
• Instructions execute in order.
• Assuming cache hits, all instructions execute in one cycle except the following:
  - Divide instructions execute in 35 clock cycles.
  - Branches execute in one to three clock cycles, as described in Branches below.
  - Multiply-accumulate and multiply instructions execute in one to five cycles, as described below.
  - Aligned load/store instructions that hit in the data cache execute in one clock cycle. See Alignment above for information on the access penalty associated with unaligned load/stores.
• A data cache-control instruction requires two cycles to execute. However, subsequent data-cache accesses stall until the cache-control instruction finishes accessing the data cache. Those accesses do not remain stalled when transfers associated with previous data cache-control instructions continue on the PLB.
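The rules above can be collected into a rough first-order cycle estimator. The C sketch below is illustrative only: the type and function names are hypothetical, it assumes cache hits and zero-wait-state memory, and it simply sums per-instruction cycle counts, ignoring stalls, dependencies, and the second-order effects mentioned earlier.

```c
/* Hypothetical instruction classes, grouped by the timing rules above. */
typedef enum {
    INSN_OTHER,               /* any instruction not listed below: 1 cycle */
    INSN_DIVIDE,              /* 35 cycles */
    INSN_BRANCH,              /* 1 to 3 cycles */
    INSN_MULTIPLY,            /* multiply and multiply-accumulate: 1 to 5 */
    INSN_LOAD_STORE_ALIGNED,  /* aligned, data-cache hit: 1 cycle */
    INSN_DCACHE_CONTROL       /* data cache-control: 2 cycles */
} insn_class;

typedef struct { unsigned min; unsigned max; } cycle_range;

/* Best-case and worst-case cycles for one instruction, assuming cache hits. */
static cycle_range base_cycles(insn_class c) {
    switch (c) {
    case INSN_DIVIDE:         return (cycle_range){35, 35};
    case INSN_BRANCH:         return (cycle_range){1, 3};
    case INSN_MULTIPLY:       return (cycle_range){1, 5};
    case INSN_DCACHE_CONTROL: return (cycle_range){2, 2};
    default:                  return (cycle_range){1, 1};
    }
}

/* Sums the ranges over an instruction sequence. Because instructions
 * execute in order, the sum serves as a rough first-order estimate. */
static cycle_range sequence_cycles(const insn_class *seq, unsigned n) {
    cycle_range total = {0, 0};
    for (unsigned i = 0; i < n; i++) {
        cycle_range r = base_cycles(seq[i]);
        total.min += r.min;
        total.max += r.max;
    }
    return total;
}
```

For a sequence of one ordinary instruction, one divide, and one branch, this estimate gives 37 to 39 cycles. The width of the range comes entirely from the branch, whose cost depends on how early it is resolved.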
Branches
The performance of a branch instruction depends on how quickly it is resolved. A branch
is resolved when all conditions it depends on are known and the branch target is known.
Generally, the greater the separation (in instructions) between a branch and the last
instruction it depends on, the earlier the branch is resolved. If the branch is resolved early,
it can be executed in fewer cycles.
The execution time of branches on the PPC405 can be determined as follows:
• A known not taken branch does not have condition dependencies (they are resolved) or