Level 1 Memory System
ARM DDI 0500D
Copyright © 2013-2014 ARM. All rights reserved.
6-12
ID021414
Non-Confidential
6.6 Data prefetching
This section describes:
• Preload instructions.
• Data prefetching and monitoring.
• Non-temporal loads.
• Data Cache Zero.
6.6.1 Preload instructions
The Cortex-A53 processor supports the PLD and PRFM prefetch hint instructions. PLD and PRFM instructions look up the address in the cache, and start a linefill if they miss and the address is cacheable.
Unlike a load, a PLD or PRFM instruction retires as soon as its linefill is started, rather than waiting for the data to be returned. This enables other instructions to execute while the linefill continues in the background.
A PRFM instruction with a PST (prefetch for store) operation, or PLDW in AArch32, is similar to PLD, except that if it misses, it requests an exclusive linefill instead of a shared one. If the memory type is Shareable, a linefill started by such an instruction also causes the data to be invalidated in other cores, so that the line is ready for writing.
PRFM instructions can also target a prefetch at the L2 cache. In this case, a request is sent to L2 to start a linefill, and the instruction can then retire without any data being returned to L1.
PLI is implemented as a NOP.
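The prefetch variants described above can be sketched in C as follows, assuming a GCC- or Clang-compatible compiler. On AArch64 the helpers emit PRFM PLDL1KEEP (read intent, L1), PRFM PSTL1KEEP (store intent, exclusive linefill) and PRFM PLDL2KEEP (L2 target) through inline assembly; on other targets they fall back to __builtin_prefetch so the sketch also builds on a host machine. The helper names, scale_copy, and the lookahead distances are hypothetical illustrations, not values recommended for the Cortex-A53.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values for this sketch only. */
#define LINE_BYTES 64u          /* Cortex-A53 cache line size */
#define PREFETCH_LINES 4        /* L1 lookahead, in cache lines */
#define L2_PREFETCH_LINES 16    /* L2 lookahead, in cache lines */

/* Read-intent prefetch into L1 (AArch64: PRFM PLDL1KEEP). */
static inline void prefetch_load_l1(const void *p)
{
#if defined(__aarch64__)
    __asm__ volatile("prfm pldl1keep, [%0]" : : "r"(p));
#else
    __builtin_prefetch(p, 0, 3);
#endif
}

/* Store-intent prefetch: requests an exclusive linefill (AArch64: PRFM PSTL1KEEP). */
static inline void prefetch_store_l1(const void *p)
{
#if defined(__aarch64__)
    __asm__ volatile("prfm pstl1keep, [%0]" : : "r"(p));
#else
    __builtin_prefetch(p, 1, 3);
#endif
}

/* Prefetch targeted at L2: linefill into L2, no data returned to L1 (AArch64: PRFM PLDL2KEEP). */
static inline void prefetch_load_l2(const void *p)
{
#if defined(__aarch64__)
    __asm__ volatile("prfm pldl2keep, [%0]" : : "r"(p));
#else
    __builtin_prefetch(p, 0, 2);
#endif
}

/* dst[i] = src[i] * k, issuing one set of hints per cache line ahead of use. */
void scale_copy(int32_t *dst, const int32_t *src, size_t n, int32_t k)
{
    const size_t per_line = LINE_BYTES / sizeof(int32_t); /* 16 elements per line */
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        if (i % per_line == 0) {
            size_t a = i + PREFETCH_LINES * per_line;
            size_t b = i + L2_PREFETCH_LINES * per_line;
            if (a < n) {                /* stay inside the arrays */
                prefetch_load_l1(&src[a]);
                prefetch_store_l1(&dst[a]);
            }
            if (b < n)
                prefetch_load_l2(&src[b]);
        }
        dst[i] = src[i] * k;
    }
}
```

Because the hints retire without waiting for data, the loop body continues immediately; the results are identical with or without them, only the timing changes.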
6.6.2 Data prefetching and monitoring
The data cache implements an automatic prefetcher that monitors cache misses in the core. When it detects a pattern, the prefetcher starts linefills in the background. It recognizes a sequence of data cache misses with a fixed stride that lies within four cache lines, in either the positive or the negative direction. Intervening loads or stores that hit in the data cache do not interfere with recognition of the cache miss pattern.
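To make the stride rule concrete, the following C model sketches how a fixed-stride miss detector of this kind can behave. It is a simplified, hypothetical software model, not the Cortex-A53 hardware algorithm: the names, the trigger_len and max_outstanding parameters (standing in for the sequence length and outstanding request controls that the CPUACTLR exposes), and the policy of restarting when the stride exceeds four lines are all assumptions. Each call feeds one miss address and reports the line addresses the model would prefetch. Hits are simply never passed in, which mirrors the statement that they do not disturb pattern recognition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u        /* Cortex-A53 L1 data cache line size */
#define MAX_STRIDE_LINES 4    /* strides within +/-4 lines are recognized */

/* Hypothetical knobs, analogous to the CPUACTLR prefetcher controls. */
typedef struct {
    bool enabled;             /* prefetcher active */
    unsigned trigger_len;     /* misses in a same-stride sequence before prefetching */
    unsigned max_outstanding; /* prefetch requests issued per trigger */
} prefetch_cfg;

typedef struct {
    int64_t last_line;        /* line index of the previous miss */
    int64_t stride;           /* candidate stride in lines (0 = none) */
    unsigned run;             /* length of the current same-stride sequence */
    bool have_last;
} stride_state;

/* Feed one miss; writes predicted line addresses to out and returns the count. */
size_t on_miss(stride_state *st, const prefetch_cfg *cfg,
               uint64_t addr, uint64_t *out)
{
    assert(st != NULL && cfg != NULL);
    int64_t line = (int64_t)(addr / LINE_BYTES);

    if (st->have_last) {
        int64_t delta = line - st->last_line;
        if (delta == 0)
            return 0;                          /* same line: ignore */
        if (delta >= -MAX_STRIDE_LINES && delta <= MAX_STRIDE_LINES) {
            if (delta == st->stride) {
                st->run++;                     /* stride confirmed again */
            } else {
                st->stride = delta;            /* new candidate stride */
                st->run = 2;
            }
        } else {
            st->stride = 0;                    /* stride too large: restart */
            st->run = 1;
        }
    } else {
        st->have_last = true;
        st->run = 1;
    }
    st->last_line = line;

    if (!cfg->enabled || st->stride == 0 || st->run < cfg->trigger_len)
        return 0;

    size_t n = 0;
    for (unsigned k = 1; k <= cfg->max_outstanding; k++) {
        int64_t next = line + (int64_t)k * st->stride;
        if (next < 0)
            break;                             /* do not predict below address 0 */
        out[n++] = (uint64_t)next * LINE_BYTES;
    }
    return n;
}
```

For example, with trigger_len = 3 and max_outstanding = 2, misses at 0x1000, 0x1080 and 0x1100 (a stride of two lines) cause the model to predict 0x1180 and 0x1200 on the third miss. A real prefetcher would also track which lines are already in flight, which this sketch omits.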
The CPUACTLR, see the CPU Auxiliary Control Register, EL1 on page 4-124, enables you to:
• Deactivate the prefetcher.
• Alter the sequence length required to trigger the prefetcher.
• Alter the number of outstanding requests that the prefetcher can make.
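Because the automatic prefetcher triggers only on a sufficiently long sequence of misses at a fixed stride, use PLD or PRFM instructions for data prefetching where short sequences or irregular access patterns must be fetched.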
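As an illustration, the following sketch sums values gathered through an index array, an access pattern with no fixed stride that the automatic prefetcher cannot follow. It assumes GCC or Clang, where __builtin_prefetch(addr, 0, 3) requests a read prefetch with high temporal locality; on AArch64 these compilers typically lower it to PRFM PLDL1KEEP. The function name gather_sum and the GATHER_AHEAD distance are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookahead, in loop iterations. */
#define GATHER_AHEAD 8

/* Sum table[idx[i]] for an irregular index stream, prefetching future elements. */
int64_t gather_sum(const int32_t *table, size_t table_len,
                   const uint32_t *idx, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* The future address is known from idx, so hint it explicitly. */
        if (i + GATHER_AHEAD < n && idx[i + GATHER_AHEAD] < table_len)
            __builtin_prefetch(&table[idx[i + GATHER_AHEAD]], 0, 3);
        assert(idx[i] < table_len);
        sum += table[idx[i]];
    }
    return sum;
}
```

The prefetch distance trades latency hiding against the risk of evicting lines before they are used; it has to be tuned per workload.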
6.6.3 Non-temporal loads
Cache requests made by a non-temporal load instruction (LDNP) are allocated to the L2 cache only. The allocation policy makes it likely that the line is replaced sooner than other lines.
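A minimal sketch of reading a buffer that is used only once with LDNP, assuming GCC or Clang inline assembly on AArch64. On other targets it falls back to ordinary loads so the logic can be checked on a host. The function name sum_streaming and the requirement that the element count be even are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n 64-bit values that are read once; n must be even (two per LDNP). */
uint64_t sum_streaming(const uint64_t *p, size_t n)
{
    assert(n % 2 == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i += 2) {
        uint64_t a, b;
#if defined(__aarch64__)
        /* Load pair with non-temporal hint: on Cortex-A53 the line is allocated in L2 only. */
        __asm__ volatile("ldnp %0, %1, [%2]"
                         : "=r"(a), "=r"(b)
                         : "r"(p + i)
                         : "memory");
#else
        a = p[i];         /* host fallback: ordinary loads */
        b = p[i + 1];
#endif
        sum += a + b;
    }
    return sum;
}
```

Keeping streamed data out of L1 leaves L1 capacity for data that is reused, which is the main reason to prefer LDNP for single-pass reads.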
6.6.4 Data Cache Zero
The ARMv8-A architecture introduces the Data Cache Zero by Virtual Address (DC ZVA) instruction. This instruction sets a 64-byte block of memory, aligned to a 64-byte boundary, to zero. If the DC ZVA instruction misses in the cache, it clears main memory without causing an L1 or L2 cache allocation.
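The sketch below zeroes a buffer with DC ZVA, assuming GCC or Clang on AArch64 and an operating system that permits DC ZVA at EL0. Rather than hard-coding 64 bytes, it reads the block size from DCZID_EL0 (BS, bits [3:0], is log2 of the block size in 4-byte words, so BS = 4 gives 4 << 4 = 64 bytes) and falls back to memset when bit 4 (DZP) prohibits the instruction, or on non-AArch64 hosts. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DC ZVA block size in bytes, or 0 if DC ZVA is unavailable or prohibited. */
static size_t dczva_block_size(void)
{
#if defined(__aarch64__)
    uint64_t dczid;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(dczid));
    if (dczid & (1u << 4))                 /* DZP set: DC ZVA prohibited */
        return 0;
    return (size_t)4 << (dczid & 0xf);     /* BS = log2(block size in words) */
#else
    return 0;
#endif
}

/* Zero [p, p + len): whole aligned blocks with DC ZVA, head and tail with memset. */
void zero_region(void *p, size_t len)
{
    unsigned char *c = p;
    size_t bs = dczva_block_size();
    assert(c != NULL || len == 0);
    if (bs == 0) {
        memset(c, 0, len);
        return;
    }
    /* Head: bytes up to the first block boundary (bs is a power of two). */
    size_t head = (size_t)(-(uintptr_t)c & (bs - 1));
    if (head > len)
        head = len;
    memset(c, 0, head);
    c += head;
    len -= head;
#if defined(__aarch64__)
    while (len >= bs) {                    /* DC ZVA ignores low address bits */
        __asm__ volatile("dc zva, %0" : : "r"(c) : "memory");
        c += bs;
        len -= bs;
    }
#endif
    memset(c, 0, len);                     /* tail */
}
```

Only whole aligned blocks are passed to DC ZVA, because the instruction always zeroes the entire block that contains the address, including bytes before it.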