
VLC Architecture

High-speed / low latency Intra-Cell cache-to-cache data transfer

The Express5800/1000 series server 
implements the VLC architecture, which 
allows for low latency cache-to-cache 
data transfer between multiple CPUs 
within a cell.

In a split BUS architecture, a cache-to-cache
data transfer must pass through the chipset.
In the VLC architecture, however, the CPUs
can access the data in one another's cache
memory directly, bypassing the chipset. This
lowers the latency between the cache
memories, which results in faster data
transfers.
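
The difference between the two bus layouts can be sketched as a toy path model. This is only an illustration, not a description of the actual hardware: the CPU numbering, the assignment of CPUs to front-side buses (FSB), and the function name `transfer_path` are all assumptions, and the model assumes that in a split BUS system only transfers between CPUs on different FSBs cross the chipset.

```python
# Toy model: which components a cache line crosses between two CPUs.
# CPU numbers and bus assignments are hypothetical, for illustration only.

def transfer_path(src, dst, bus_of):
    """Return the components a cache-to-cache transfer traverses."""
    if bus_of[src] == bus_of[dst]:
        # Both CPUs share a front-side bus: the transfer is direct.
        return [f"cpu{src}", f"fsb{bus_of[src]}", f"cpu{dst}"]
    # Different buses: the chipset has to relay the data.
    return [f"cpu{src}", f"fsb{bus_of[src]}", "chipset",
            f"fsb{bus_of[dst]}", f"cpu{dst}"]

split_bus = {0: 0, 1: 0, 2: 1, 3: 1}  # assumed: two CPUs per FSB
vlc = {0: 0, 1: 0, 2: 0, 3: 0}        # assumed: all CPUs on one FSB

for name, layout in (("split bus", split_bus), ("VLC", vlc)):
    paths = [transfer_path(0, dst, layout) for dst in (1, 2, 3)]
    print(name, ["chipset" in p for p in paths])
```

In this sketch the split BUS layout yields paths of different lengths depending on where the target CPU sits, while the VLC layout yields the same direct path for every pair.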

Dedicated Cache Coherency Interface (CCI)

High-speed / low latency Inter-Cell cache-to-cache data transfer

Another technology implemented in the Express5800/1000 series 
server to improve cache-to-cache data transfer is the Cache 
Coherency Interface (CCI). CCI, the inter-Cell counterpart of the 
VLC architecture, allows for a lower latency cache-to-cache data 
transfer between Cells.

Information containing the location and state of cached data is 
required for the CPU to access the specific data stored in cache 
memory. By accessing the cache memory according to this 
information, the CPU is able to retrieve the desired data.

Two main mechanisms exist for cache-to-cache data transfer 
between Cells: directory based and TAG based cache coherency. 
The cache information described above is stored in external 
memory (DIR memory) in the directory based mechanism, and 
within the chipset in the TAG based mechanism.

In a directory based system, the requestor CPU first accesses the 
external memory to confirm the location of the cached data, and 
then accesses the appropriate cache memory. In a TAG based 
system, on the other hand, the requestor CPU broadcasts a request 
to all other caches simultaneously via the TAG.
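
The two lookup flows can be contrasted with a small sketch. Everything here is hypothetical (the cell names, the line addresses, and the functions `directory_lookup` and `tag_lookup`); it only mirrors the order of the steps described above and says nothing about real latencies.

```python
# Toy model of the two lookup styles; all names and data are hypothetical.

# Which Cell currently caches each line.
cell_caches = {
    "cell0": {"0x100"},
    "cell1": {"0x200"},
    "cell2": set(),
}

def directory_lookup(line, directory):
    """Directory based: read DIR memory first, then visit one cache."""
    steps = ["read DIR"]
    owner = directory.get(line)
    if owner is not None:
        steps.append(f"read cache of {owner}")
    return owner, steps

def tag_lookup(line, requester):
    """TAG based: broadcast to the TAG of every other Cell at once."""
    others = [c for c in cell_caches if c != requester]
    steps = [f"broadcast to TAG of {', '.join(others)}"]
    owner = next((c for c in others if line in cell_caches[c]), None)
    if owner is not None:
        steps.append(f"read cache of {owner}")
    return owner, steps

# Build the directory contents from the cache state above.
directory = {line: cell for cell, lines in cell_caches.items() for line in lines}

print(directory_lookup("0x200", directory))
print(tag_lookup("0x200", "cell0"))
```

Both functions find the same owner; the difference the sketch highlights is that the directory flow is a serialized read of external memory followed by a cache access, whereas the TAG flow starts with one simultaneous broadcast.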

Crossbar-less configuration 

Improved data transfer latency through direct attached Cell configuration

Within the Express5800/1000 series server lineup, the 1080Rf 
has been able to lower the data transfer latency by removing the 
crossbar and directly connecting Cell to Cell, and Cell to PCI box.

Even with the crossbar-less configuration, virtualization of the Cell 
card and I/O box has been retained so as not to diminish computing 
and I/O resources.

[Figure: Very Large Cache (VLC) Architecture compared with Split BUS Architecture. Each panel plots latency against data size for the Intel® Itanium® 2 processor (Madison: L3 9MB) and the Dual-Core Intel® Itanium® processor (Montvale: L3 24MB), covering the CPU's own L3, the L3 of another CPU on the same FSB, the L3 of another CPU on a different FSB, and memory.
VLC architecture: direct CPU-to-CPU transfers over the FSB give high-speed cache-to-cache transfers; reduced cache memory access latency increases enterprise application performance.
Split BUS architecture: transferring data through the chipset adds overhead, giving higher latency (approx. 3x), non-uniform cache-to-cache data transfer, and inconsistent performance; the affected area grows as the cache size increases.
This image does not depict actual numbers.]

The benefit of the TAG based mechanism, which is the one 
implemented in the Express5800/1000 series server, is that 
accessing the TAG filters out unnecessary inquiries to the cache 
memory, for a smoother transfer of data. Furthermore, the 
Express5800/1000 series server includes a dedicated high-speed 
cache coherency interface (CCI), which connects the Cells directly 
to one another without using a crossbar. This interface is used for 
broadcasting and other cache coherency transactions to allow for 
even faster cache-to-cache data transfer.
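
The filtering effect of the TAG can be illustrated with a minimal sketch. The TAG contents, CPU names, and function names below are hypothetical; the sketch only assumes that the TAG records which CPU caches on a Cell hold a given line, so that caches without a copy need not be queried.

```python
# Hypothetical TAG contents for one Cell: cache line -> CPUs holding a copy.
tag = {"0x100": {"cpu1"}, "0x200": {"cpu0", "cpu3"}}
cpus = ["cpu0", "cpu1", "cpu2", "cpu3"]

def probes_without_tag(line):
    """No filter: every CPU cache on the Cell is queried."""
    return list(cpus)

def probes_with_tag(line):
    """TAG filter: only caches recorded as holding the line are queried."""
    holders = tag.get(line, set())
    return [c for c in cpus if c in holders]

for line in ("0x100", "0x200", "0x300"):
    print(line, len(probes_without_tag(line)), len(probes_with_tag(line)))
```

In this toy setup a request for a line that no CPU on the Cell holds reaches no cache at all once the TAG is consulted, which is the kind of unnecessary inquiry the paragraph above refers to.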

[Figure: Tag Based Cache Coherency compared with Directory Based Cache Coherency.
Legend: CPU requesting the information; CPU storing the newest information; memory storing the location information; TAG memory (manages cache line information for all of the CPUs loaded on a Cell card); DIR memory (manages cache line information for all of the memory loaded on a Cell card).
Tag based: the request is broadcast to all CPUs simultaneously. The Express5800/1000 series server implements a dedicated connection (CCI) for snooping.
Directory based: the directory is accessed first to confirm the location of the data, then the appropriate cache memory is accessed.
Callout: performance increase with the A3 chipset.]
