Level One Memory System
ARM DDI 0301H
Copyright © 2004-2009 ARM Limited. All rights reserved.
7-13
ID012310
Non-Confidential, Unrestricted Access
7.5.3 Instruction accesses to TCM
If both the Instruction TCM and the Instruction Cache contain the requested instruction address,
the processor returns the instruction from the Instruction TCM. The instruction prefetch port of
the processor cannot access the Data TCM. If an instruction prefetch misses both the Instruction
TCM and the Instruction Cache but hits the Data TCM, the result is an access to level two memory.
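The priority rules above can be summarized as a small decision function. This is a minimal sketch for illustration only: the names `route_instruction_fetch` and `fetch_source` are hypothetical and are not part of any ARM interface, and the model ignores timing entirely.

```c
#include <stdbool.h>

/* Hypothetical model of where an instruction fetch is serviced.
 * These names are illustrative and are not part of any ARM API. */
typedef enum { FETCH_FROM_ITCM, FETCH_FROM_ICACHE, FETCH_FROM_L2 } fetch_source;

fetch_source route_instruction_fetch(bool hits_itcm, bool hits_icache,
                                     bool hits_dtcm)
{
    (void)hits_dtcm;             /* the prefetch port cannot read the Data TCM */
    if (hits_itcm)
        return FETCH_FROM_ITCM;  /* TCM has priority over an Instruction Cache hit */
    if (hits_icache)
        return FETCH_FROM_ICACHE;
    return FETCH_FROM_L2;        /* includes the case where only the Data TCM hits */
}
```

For example, a fetch that hits only the Data TCM, `route_instruction_fetch(false, false, true)`, is routed to level two memory.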
An Instruction Memory Barrier (IMB) must be inserted between a write to the Instruction TCM and
the execution of the instructions that were written. In addition, if a branch in the Instruction
TCM is overwritten, any branch prediction mechanism must be invalidated or disabled.
7.5.4 Data accesses to the Instruction TCM
If both the Data TCM and the Data Cache contain the requested data address for a read, the
processor returns data from the Data TCM. For a write, the write occurs to the Data TCM. Most
data accesses are expected to go to the Data Cache or to the Data TCM, but the Instruction TCM
must occasionally be read or written through the data port.
The processor data port checks the Instruction TCM base addresses for every memory access,
because the Instruction TCM is a possible source of data for any access. As a result, level one
memory hit generation on the data side requires more address comparisons than the instruction
memory lookup does. This functionality is required for reading literal values and for debug
purposes, such as setting software breakpoints.
A data port access to the Instruction TCM incurs a delay of 5-12 cycles to read or write the data.
This delay enables the Instruction TCM access to be scheduled only when a hit to the Instruction
TCM is known to be present. This saves power and avoids inserting unnecessary delays on the
instruction-fetch side. For multiple transfers, the delay applies to every access performed by an
LDM, LDCL, STM, or STCL instruction.
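To get a feel for the cost of a multiple transfer that targets the Instruction TCM, the following sketch computes a cycle range. It rests on a stated assumption: each transfer in the LDM, STM, LDCL, or STCL pays the full 5-12 cycle delay independently, with no overlap between transfers. The function and type names are hypothetical, and the result is a rough estimate, not a timing specification.

```c
/* Documented range for one data-port access to the Instruction TCM. */
#define ITCM_DATA_MIN_CYCLES 5u
#define ITCM_DATA_MAX_CYCLES 12u

/* Hypothetical result type: inclusive lower and upper cycle estimates. */
typedef struct {
    unsigned min;
    unsigned max;
} cycle_range;

/* ASSUMPTION: each of the n transfers incurs the full Instruction TCM
 * delay, and the delays do not overlap. */
cycle_range itcm_multiple_transfer_cycles(unsigned n_transfers)
{
    cycle_range r = { n_transfers * ITCM_DATA_MIN_CYCLES,
                      n_transfers * ITCM_DATA_MAX_CYCLES };
    return r;
}
```

Under this assumption, a four-register LDM from the Instruction TCM would take roughly 20 to 48 cycles.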
Literal pool accesses
It can take 5-12 cycles for the data port to read data from the Instruction TCM.
This path can sometimes add latency so that the processor can achieve higher
clock speeds. Therefore, avoid literal pool accesses to the Instruction TCM inside
critical loops. This does not affect code that executes from cache, because its
literal pool is loaded into the Data Cache.
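The advice to keep literal pool accesses out of critical loops can be quantified with simple arithmetic. This sketch assumes (hypothetically) that every literal load from the Instruction TCM pays the full 5-12 cycle delay and that hoisting means loading each constant into a register once before the loop. The function name `itcm_literal_cost` is illustrative only.

```c
#include <stdbool.h>

/* Hypothetical estimate of extra data-port cycles spent reading
 * literal-pool words from the Instruction TCM. */
typedef struct {
    unsigned long min;
    unsigned long max;
} literal_cost;

/* ASSUMPTION: each literal load costs the full 5-12 cycle Instruction TCM
 * delay and nothing else in the loop changes. */
literal_cost itcm_literal_cost(unsigned long iterations,
                               unsigned long literals_per_iteration,
                               bool hoisted_out_of_loop)
{
    /* Hoisted: each constant is read once before the loop.
     * Not hoisted: each constant is re-read on every iteration. */
    unsigned long loads = hoisted_out_of_loop
                              ? literals_per_iteration
                              : iterations * literals_per_iteration;
    literal_cost c = { loads * 5ul, loads * 12ul };
    return c;
}
```

With 1000 iterations and two literals per iteration, the model estimates 10000 to 24000 extra cycles inside the loop, compared with at most 24 cycles when the constants are loaded into registers before the loop starts.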
Switching penalty between cache & TCM
Normally, an access to the cache or TCM takes a single cycle. However, it can
take three cycles in certain cases.
To perform a cache or TCM read in a single cycle, the processor speculatively
reads the RAM contents. It does not know whether it read the correct RAM until
the read is complete. To save power, the processor speculatively reads only one
of the TCM or the cache, not both. If it read the wrong one, the processor must
repeat the access to the correct location.
There is a penalty of three clock cycles when the core switches between accessing
cache and TCM, for example if it predicts that an access is to TCM, but the address
is in fact in cache. So the first non-sequential access to TCM takes three cycles
when the previous access on the same side, I-side or D-side, was to cache.
Similarly, the first non-sequential access to cache takes three cycles when the
previous access on that side was to TCM. This is not typically an issue on the
I-side, where code does not usually branch between TCM and cacheable areas, but it
can be an issue for data.
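The switching rule above can be modeled per side as a counter that remembers which RAM the previous access used. This is a minimal sketch under two assumptions: the speculative read always targets the same RAM as the previous access on that side, and a mispredicted non-sequential access costs three cycles in total while every other access costs one. All names are hypothetical and do not describe the actual hardware implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the cache/TCM switching penalty on one side
 * (I-side or D-side). Names are illustrative, not an ARM interface. */
typedef enum { TARGET_CACHE, TARGET_TCM } ram_target;

typedef struct {
    ram_target target; /* RAM that actually holds the address */
    bool sequential;   /* continues the previous access's sequence */
} mem_access;

/* ASSUMPTION: the speculative read goes to the RAM used by the previous
 * access on this side; a normal access costs 1 cycle, and the first
 * non-sequential access after a cache/TCM switch costs 3 cycles. */
unsigned side_access_cycles(const mem_access *seq, size_t n,
                            ram_target previous)
{
    unsigned cycles = 0;
    for (size_t i = 0; i < n; i++) {
        bool switched = seq[i].target != previous;
        cycles += (switched && !seq[i].sequential) ? 3u : 1u;
        previous = seq[i].target; /* next speculation follows this access */
    }
    return cycles;
}
```

In this model, a loop body that alternates non-sequential D-TCM reads with cacheable accesses pays the penalty on every access, so four such accesses cost 12 cycles instead of 4.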
For example, in the following code:
Loop LDR r0, [r2],#4 ; reads an item from D-TCM