The instruction cache can be partially reconfigured into ITIM, which occupies a fixed address
range in the memory map. ITIM provides high-performance, predictable instruction delivery.
Fetching an instruction from ITIM is as fast as an instruction-cache hit, with no possibility of a
cache miss. ITIM can hold data as well as instructions, though loads and stores from a core to
its ITIM are not as performant as loads and stores to its Data Tightly Integrated Memory (DTIM).
All instruction cache ways except one can be configured as ITIM, in units of cache lines
(64 bytes). A single instruction cache way must remain an instruction cache. ITIM is allocated
simply by storing to it. A store to the nth byte of the ITIM memory map reallocates the first n+1
bytes of instruction cache as ITIM, rounded up to the next cache line.
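The rounding rule above can be sketched in C as a small host-side calculation. The helper name itim_allocated_bytes and the ITIM_MAX_BYTES limit are illustrative assumptions (the limit is inferred from the 8 KiB deallocation offset described in this section); they are not part of any SiFive API.

```c
#include <assert.h>
#include <stddef.h>

/* Cache line size stated in the text. */
#define ITIM_LINE_BYTES 64u
/* Assumed ITIM region size, inferred from the 8 KiB deallocation offset. */
#define ITIM_MAX_BYTES (8u * 1024u)

/* Hypothetical helper: number of bytes of instruction cache that become
 * ITIM after a store to byte offset n (0-based) from the ITIM base. */
static size_t itim_allocated_bytes(size_t n)
{
    assert(n < ITIM_MAX_BYTES);
    size_t needed = n + 1; /* a store to byte n claims the first n+1 bytes */
    /* Round up to a whole number of cache lines. */
    return (needed + ITIM_LINE_BYTES - 1) / ITIM_LINE_BYTES * ITIM_LINE_BYTES;
}
```

For example, a store to offset 63 claims a single 64-byte line, while a store to offset 64 claims two lines.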
ITIM is deallocated by storing zero to the first byte after the ITIM region, that is, 8 KiB after the
base address of ITIM as indicated in the Memory Map in Chapter 4. The deallocated ITIM space
is automatically returned to the instruction cache.
For determinism, software must clear the contents of ITIM after allocating it. It is unpredictable
whether ITIM contents are preserved between deallocation and allocation.
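The allocate, clear, and deallocate sequence described above might look like the following minimal sketch. The function names are hypothetical, and the ITIM base address is passed in as a parameter because it must come from the Memory Map in Chapter 4; the 8 KiB region size is taken from the deallocation offset given in the text. Byte stores are used for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

#define ITIM_LINE_BYTES   64u           /* cache line size from the text */
#define ITIM_REGION_BYTES (8u * 1024u)  /* region size implied by the text */

/* Hypothetical helper: allocate enough ITIM to hold nbytes and clear it.
 * itim_base must point at the ITIM base address from the memory map. */
static void itim_alloc_and_clear(volatile uint8_t *itim_base, size_t nbytes)
{
    if (nbytes == 0 || nbytes > ITIM_REGION_BYTES)
        return;
    /* Allocation is rounded up to whole cache lines, so clear whole lines. */
    size_t total = (nbytes + ITIM_LINE_BYTES - 1) / ITIM_LINE_BYTES
                   * ITIM_LINE_BYTES;
    /* Storing to the last byte allocates everything up to and including it. */
    itim_base[total - 1] = 0;
    /* Clear the rest so the contents are deterministic. */
    for (size_t i = 0; i + 1 < total; i++)
        itim_base[i] = 0;
}

/* Hypothetical helper: deallocate ITIM by storing zero to the first byte
 * after the ITIM region, returning the space to the instruction cache. */
static void itim_deallocate(volatile uint8_t *itim_base)
{
    itim_base[ITIM_REGION_BYTES] = 0;
}
```

Clearing the full rounded-up allocation, rather than only the requested bytes, avoids leaving stale contents in the tail of the last cache line.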
The E31 instruction fetch unit contains branch prediction hardware to improve performance of
the processor core. The branch predictor comprises a 28-entry branch target buffer (BTB), which
predicts the target of taken branches; a 512-entry branch history table (BHT), which predicts the
direction of conditional branches; and a 6-entry return-address stack (RAS), which predicts the
target of procedure returns. The branch predictor has a one-cycle latency, so correctly
predicted control-flow instructions incur no penalty. Mispredicted control-flow instructions incur a
three-cycle penalty.
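As a rough illustration of the penalty figures above, the cycles lost to branch misprediction can be estimated from a misprediction count. This is a simple arithmetic sketch with a hypothetical helper name; it assumes every misprediction costs exactly the three-cycle penalty stated in the text and that correct predictions cost nothing.

```c
#include <stdint.h>

/* Misprediction penalty stated in the text. */
#define MISPREDICT_PENALTY_CYCLES 3u

/* Hypothetical helper: cycles lost to mispredicted control-flow
 * instructions. Correctly predicted ones incur no penalty. */
static uint64_t branch_penalty_cycles(uint64_t mispredicted)
{
    return mispredicted * MISPREDICT_PENALTY_CYCLES;
}
```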
The E31 implements the standard Compressed (C) extension to the RISC‑V architecture, which
allows for 16-bit RISC‑V instructions.
The E31 execution unit is a single-issue, in-order pipeline. The pipeline comprises five stages:
instruction fetch, instruction decode and register fetch, execute, data memory access, and
register writeback.
The pipeline has a peak execution rate of one instruction per clock cycle, and is fully bypassed
so that most instructions have a one-cycle result latency. There are several exceptions:
• LW has a two-cycle result latency, assuming a cache hit.
• LH, LHU, LB, and LBU have a three-cycle result latency, assuming a cache hit.
• CSR reads have a three-cycle result latency.
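The result latencies listed above can be captured in a small lookup, together with an estimate of how long a dependent instruction would wait. The enum and function names are hypothetical. The stall formula is an interpretation rather than something the text states: it assumes a consumer issued `distance` instructions after the producer stalls until the result is available, and that loads hit in the cache.

```c
/* Instruction classes with the result latencies listed in the text. */
enum result_class {
    RES_DEFAULT,      /* most instructions: one cycle */
    RES_LW,           /* LW: two cycles (cache hit) */
    RES_SUBWORD_LOAD, /* LH, LHU, LB, LBU: three cycles (cache hit) */
    RES_CSR_READ      /* CSR reads: three cycles */
};

/* Result latency in cycles for a given producer class. */
static unsigned result_latency(enum result_class c)
{
    switch (c) {
    case RES_LW:           return 2;
    case RES_SUBWORD_LOAD: return 3;
    case RES_CSR_READ:     return 3;
    default:               return 1;
    }
}

/* Assumed model: stall cycles seen by a consumer that issues `distance`
 * instructions after the producer (distance >= 1; 1 means the consumer
 * immediately follows the producer). */
static unsigned stall_cycles(enum result_class producer, unsigned distance)
{
    unsigned lat = result_latency(producer);
    return distance >= lat ? 0 : lat - distance;
}
```

Under this model, a compiler or programmer can hide load latency by placing independent instructions between a load and its first use.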
Copyright © 2017–2018, SiFive Inc. All rights reserved.