ESB that provides PCI bus and memory connectivity. The second is a more recent architecture in which
multiple processors are interconnected via QPI, and each processor itself integrates the MCH and PCI
connectivity directly.
There is a perceived advantage in keeping the allocation of port objects, such as queues, as close as possible
to the NUMA node or collection of CPUs where they are most likely to be accessed. As hinted at in this
example, if the port's queues use CPUs and memory from one socket while the PCI device actually hangs off
another socket, undesirable QPI processor-to-processor bus bandwidth is consumed. This highlights the need
to understand the specific platform architecture you are working with when you are utilizing these
performance options.
Figure: Shared Single Root PCI/Memory Architecture
Figure: Distributed Multi-Root PCI/Memory Architecture
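As a minimal sketch (assuming a Linux host; the sysfs paths and the example PCI address below are
illustrative and not taken from this document), you could confirm which NUMA node a given PCI function
hangs off of, and which CPUs are local to that node, before deciding where the port's queues should live:

    # Illustrative only: reads Linux sysfs to report the NUMA node a PCI
    # device is attached to and the CPUs local to that node.
    from pathlib import Path

    PCI_ADDR = "0000:81:00.0"  # hypothetical bus:device.function; substitute your port's address

    def pci_numa_node(pci_addr: str) -> int:
        """Return the NUMA node the PCI device is attached to (-1 if unknown)."""
        return int(Path(f"/sys/bus/pci/devices/{pci_addr}/numa_node").read_text())

    def node_cpus(node: int) -> str:
        """Return the CPU list (for example '0-5,12-17') local to the given node."""
        return Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()

    node = pci_numa_node(PCI_ADDR)
    print(f"PCI device {PCI_ADDR} is local to NUMA node {node}")
    if node >= 0:
        print(f"CPUs local to node {node}: {node_cpus(node)}")

If the CPUs assigned to the port's queues are not in that list, queue traffic will cross the processor-to-
processor QPI link between sockets.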
Example 4:
Consider what happens if the number of available NUMA node CPUs is not sufficient for queue allocation. If
your platform has a processor that does not support a power-of-2 number of CPUs (for example, it supports 6
cores) and the software runs out of CPUs on one socket during queue allocation, it will by default reduce the
number of queues to a lower power of 2 until the allocation succeeds.
For example, if a 6-core processor is being used and there is only a single NUMA node, the software will
allocate only 4 FCoE queues. If there are multiple NUMA nodes, you can set the NUMA node count
to 2 or more in order to have all 8 queues created.
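This fallback can be pictured with a short sketch (illustrative only, not the driver's actual code): the
requested queue count is halved until it fits the CPUs available on the selected NUMA node(s).

    # Illustrative model of the fallback described above: halve the queue
    # count (staying at a power of 2) until it fits the available CPUs.
    def queues_allocated(requested_queues: int, cpus_available: int) -> int:
        queues = requested_queues
        while queues > cpus_available and queues > 1:
            queues //= 2  # drop to the next lower power of 2
        return queues

    print(queues_allocated(8, 6))   # single 6-core node -> 4 queues
    print(queues_allocated(8, 12))  # two 6-core nodes   -> all 8 queues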
Determining Active Queue Location
When you are using these performance options, you will want to determine the affinity of FCoE queues to
CPUs in order to verify their actual effect on queue allocation. You can do this by running a fairly heavy
small-packet workload with an I/O application such as IoMeter and monitoring the per-CPU utilization using the built-