Performance optimization
The following sections describe how to optimize the performance of the appliance. For now, we only provide recommendations for the dataplane, as it is the most performance-critical part of the appliance and responsible for forwarding packets. The dataplane is built on top of VPP and mainly uses DPDK for packet I/O. Therefore, general performance optimizations for VPP and DPDK can often be applied to the appliance as well.
System
CPU
If not explicitly configured, the appliance automatically configures itself according to the recommendations given in this section. Hence, manual configuration is usually not required.
This applies to releases v0.36 and later.
The appliance is designed to run on a multi-core CPU with different cores assigned to different tasks. We recommend that core 0 be assigned to Linux and control plane services. Core 1 should be used as VPP's main core, which handles management functions (CLI, API, stats collection). The remaining cores should be used as VPP's worker cores, which perform the packet processing.
To configure multiple workers, the network interface must support multiple queues, as each worker gets a unique pair of RX/TX queues. In other words, it does not make sense to configure more workers than the number of queues supported by the network interface.
These configuration options can be set under the config.system.vpp.cpu section of the
configuration file. Assuming that the appliance is running on a 4-core CPU, the following
configuration is recommended:
{
  "system": {
    "vpp": {
      "cpu": {
        "main_core": 1,
        "workers": 2
      }
    }
  }
}
If the system only has two cores, we recommend that core 0 is used for Linux and core 1 for VPP. The work usually performed by the worker cores is then handled by VPP's main core. To explicitly achieve this configuration, the number of workers can be set to 0 as follows:
{
  "system": {
    "vpp": {
      "cpu": {
        "main_core": 1,
        "workers": 0
      }
    }
  }
}
It is also possible to run the appliance on a single-core CPU and have VPP share the core with Linux.
If packet processing is not performed on a separate core, the performance of the appliance will be significantly reduced. In particular, jitter and latency will be negatively affected.
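For illustration, a sketch of such a shared-core setup is shown below. It assumes that VPP's main core can be placed on core 0 alongside Linux and that no worker cores are configured; the exact core assignment depends on your deployment:
{
  "system": {
    "vpp": {
      "cpu": {
        "main_core": 0,
        "workers": 0
      }
    }
  }
}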
Hugepages
The dataplane makes use of hugepages (see the Linux kernel documentation) for its packet buffers.
Hence, it is important that enough hugepages are available. The size and number of hugepages
allocated by the appliance can be configured in the config.system.kernel section of the appliance
configuration:
{
  "system": {
    "kernel": {
      "hugepage_size": "2M",
      "hugepages": 256
    }
  }
}
By default, 256 2MB hugepages are allocated. This is sufficient for most deployments. If there is a large number of fast interfaces (e.g., 25 Gbps), it might be necessary to increase the number of hugepages.
If there are not enough hugepages available, the dataplane will either fail to start or log an error message.
Currently, the appliance only supports 2MB hugepages.
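For illustration, a deployment with many fast interfaces could allocate more hugepages as sketched below. This is only an example; 512 is a hypothetical value and the appropriate count depends on the deployment:
{
  "system": {
    "kernel": {
      "hugepage_size": "2M",
      "hugepages": 512
    }
  }
}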
Buffers
If there is a large number of interfaces and worker threads, it might be necessary to increase
the number of buffers. By default, the appliance uses a fixed portion of the hugepages to allocate
buffers. Hence, it is recommended to increase the number of buffers by increasing the number of
hugepages. Alternatively, the number of buffers can be configured explicitly in the
config.system.vpp.buffers section:
{
  "system": {
    "vpp": {
      "buffers": {
        "data_size": 9000,
        "num_buffers": 32400
      }
    }
  }
}
The memory allocated for buffers must fit into the allocated hugepages. As a rough check, the example above allocates 32400 buffers with a data size of 9000 bytes, i.e., about 32400 × 9000 B ≈ 280 MB of buffer memory (plus per-buffer metadata), compared to the 512 MB provided by the default 256 2MB hugepages.
RX/TX queues and descriptors
By default, the appliance configures each interface with one RX queue per worker core. The default RSS hash function over the 5-tuple of the packet is used to distribute the incoming traffic among the workers.
The number of RX/TX queues cannot be independently configured through the appliance API. This will be added in a future release.
The number of descriptors per RX/TX queue is by default set to 1024, which is sufficient for most deployments. If there is a large number of fast interfaces (e.g., 25 Gbps), it might be necessary to increase the number of descriptors. We recommend 2048 queue descriptors for a 25 Gbps interface and 4096 descriptors for a 100 Gbps interface.
The number of queue descriptors can be configured individually for each interface in the
config.interfaces.<type> section of the appliance configuration:
{
  "interfaces": {
    "ethernets": [
      {
        "name": "eth0",
        "rx_queue_size": 2048,
        "tx_queue_size": 2048
      }
    ]
  }
}
The number of RX/TX queue descriptors can only be configured for VPP interfaces of
the type ethernets and virtual_functions.
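For illustration, an interface expected to carry 100 Gbps could be given 4096 descriptors per queue, following the recommendation above. This sketch reuses the ethernets schema shown earlier with a hypothetical interface name (eth1); virtual_functions entries are configured analogously:
{
  "interfaces": {
    "ethernets": [
      {
        "name": "eth1",
        "rx_queue_size": 4096,
        "tx_queue_size": 4096
      }
    ]
  }
}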
SCION
SCION RSS
SCION RSS (Receive Side Scaling) is a performance optimization feature that enables multi-core systems to process SCION traffic in parallel across multiple CPU cores. This significantly improves throughput on multi-core systems.
What is RSS?
RSS is a network driver technology that distributes incoming network traffic across multiple CPU cores based on a hash of the packet header. This allows the system to process packets in parallel, utilizing all available CPU cores instead of bottlenecking on a single core.
How does SCION RSS work?
SCION packets are encapsulated in UDP/IP packets on the underlay network. By default, all packets between two SCION nodes use the same source/destination IP/port information, which causes the network interface to direct all traffic to a single receive queue handled by one CPU core. This creates a bottleneck on multi-core systems.
SCION RSS solves this by varying the UDP source port for each SCION flow on the underlay. The network interface can then use its RSS hash function (typically a hash over the 5-tuple: source IP, destination IP, source port, destination port, and protocol) to distribute packets across multiple receive queues, each processed by a different CPU core, enabling parallel packet processing.
Requirements
For SCION RSS to work properly, both sides of a link must support it, i.e., must be running at least release v0.34. In the following, we describe how to configure SCION RSS on links between neighbor appliances and sibling appliances.
A neighbor appliance is an appliance that is directly connected to the appliance but located in a different AS.
A sibling appliance is an appliance that is located in the same AS as the appliance.
Traffic between neighbor appliances
To enable SCION RSS for traffic forwarded to a particular neighbor, enable it on all interfaces
connected to that neighbor. This can be done by setting the enable_scion_rss option to true
in the config.scion.ases.neighbors.interfaces sections:
{
  "scion": {
    "ases": [
      {
        "neighbors": [
          {
            "interfaces": [
              {
                "interface_id": 1,
                "address": "[fd02:e8a2:c9e2:03e6::2]:30100",
                "remote": {
                  "address": "[fd02:e8a2:c9e2:03e6::1]:30100",
                  "interface_id": 201
                },
                "enable_scion_rss": true
              }
            ]
          }
        ]
      }
    ]
  }
}
Traffic between sibling appliances
With topology synchronization, the appliance automatically detects sibling appliances that support SCION RSS and enables the feature accordingly.
Without topology synchronization, whether a sibling appliance supports SCION RSS can be defined
statically, and the feature will be enabled accordingly. This is done by setting
scion_rss to true or false in the config.cluster.peers.features section:
{
  "cluster": {
    "peers": [
      {
        "features": {
          "scion_rss": true
        }
      }
    ]
  }
}
By default, SCION RSS is not enabled for traffic sent from a gateway to a sibling router on a
different appliance. To enable SCION RSS, set scion_tunneling.endpoint.enable_scion_rss to true.
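For illustration, a minimal sketch of this setting is shown below, assuming scion_tunneling is a top-level section of the configuration like the other examples in this document:
{
  "scion_tunneling": {
    "endpoint": {
      "enable_scion_rss": true
    }
  }
}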