
Performance optimization

The following sections describe how to optimize the performance of the appliance. For now, we only provide recommendations for the dataplane, as it is the most performance-critical part of the appliance: it is responsible for forwarding packets. The dataplane is built on top of VPP and mainly uses DPDK for packet I/O. Therefore, general performance optimizations for VPP and DPDK can often be applied to the appliance as well.

System

CPU

note

If not explicitly configured, the appliance automatically configures itself according to the recommendations given in this section. Hence, it is usually not required to manually configure this section.

This applies to releases v0.36 and later.

The appliance is designed to run on a multi-core CPU with different cores assigned to different tasks. We recommend that core 0 be assigned to Linux and control-plane services. Core 1 should be used as VPP's main core, which handles management functions (CLI, API, stats collection). The remaining cores should be used as VPP's worker cores, which perform the packet processing.

note

To configure multiple workers, the network interface must support multiple queues, as each worker gets a unique pair of RX/TX queues. In other words, it does not make sense to configure more workers than the number of queues supported by the network interface.

These configuration options can be set under the config.system.vpp.cpu section of the configuration file. Assuming that the appliance is running on a 4-core CPU, the following configuration is recommended:

{
  "system": {
    "vpp": {
      "cpu": {
        "main_core": 1,
        "workers": 2
      }
    }
  }
}

If the system only has two cores, we recommend using core 0 for Linux and core 1 for VPP. The work usually performed by the worker cores is then performed by VPP's main core. To explicitly achieve this configuration, set the number of workers to 0 as follows:

{
  "system": {
    "vpp": {
      "cpu": {
        "main_core": 1,
        "workers": 0
      }
    }
  }
}

It is also possible to run the appliance on a single-core CPU and have VPP share the core with Linux.
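For illustration only, a minimal sketch of such a single-core setup, assuming VPP's main core can be pinned to core 0 alongside Linux (verify the supported values for your release and platform):

{
  "system": {
    "vpp": {
      "cpu": {
        "main_core": 0,
        "workers": 0
      }
    }
  }
}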

note

If packet processing is not performed on a dedicated core, the performance of the appliance is significantly reduced. In particular, jitter and latency are negatively affected.

Hugepages

The dataplane uses hugepages (see the Linux kernel documentation) for its packet buffers. Hence, it is important that enough hugepages are available. The size and number of hugepages allocated by the appliance can be configured in the config.system.kernel section of the appliance configuration:

{
  "system": {
    "kernel": {
      "hugepage_size": "2M",
      "hugepages": 256
    }
  }
}

By default, 256 2MB hugepages are allocated. This is sufficient for most deployments. If there is a large number of fast interfaces (e.g., 25 Gbps), it might be necessary to increase the number of hugepages.
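For example, to double the default allocation (the value 512 is purely illustrative; the required number depends on the number and speed of interfaces and on the buffer configuration):

{
  "system": {
    "kernel": {
      "hugepage_size": "2M",
      "hugepages": 512
    }
  }
}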

note

If there are not enough hugepages available, the dataplane will not start and will log an error message.

note

Currently, the appliance only supports 2MB hugepages.

Buffers

If there is a large number of interfaces and worker threads, it might be necessary to increase the number of buffers. By default, the appliance uses a fixed portion of the hugepages to allocate buffers. Hence, it is recommended to increase the number of buffers by increasing the number of hugepages. Alternatively, the number of buffers can be configured explicitly in the config.system.vpp.buffers section:

{
  "system": {
    "vpp": {
      "buffers": {
        "data_size": 9000,
        "num_buffers": 32400
      }
    }
  }
}

note

The memory allocated for buffers must fit into the allocated hugepages.
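As a rough check (ignoring per-buffer metadata overhead), the example above requires about 32400 × 9000 bytes ≈ 290 MB of buffer memory, which must fit, together with any other hugepage usage, within the allocated hugepages (256 × 2 MB = 512 MB by default).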

RX/TX queues and descriptors

By default, the appliance configures each interface with one RX queue per worker core. The default RSS hash function over the 5-tuple of the packet is used to distribute the incoming traffic among the workers.

note

The number of RX/TX queues cannot be independently configured through the appliance API. Support for this will be added in a future release.

The number of descriptors per RX/TX queue is by default set to 1024, which is sufficient for most deployments. If there is a large number of fast interfaces (e.g., 25 Gbps), it might be necessary to increase the number of descriptors. We recommend 2048 queue descriptors for a 25 Gbps interface and 4096 descriptors for a 100 Gbps interface.

The number of queue descriptors can be configured individually for each interface in the config.interfaces.<type> section of the appliance configuration:

{
  "interfaces": {
    "ethernets": [
      {
        "name": "eth0",
        "rx_queue_size": 2048,
        "tx_queue_size": 2048
      }
    ]
  }
}

note

The number of RX/TX queue descriptors can only be configured for VPP interfaces of the type ethernets and virtual_functions.

SCION

SCION RSS

SCION RSS (Receive Side Scaling) is a performance optimization feature that enables multi-core systems to process SCION traffic in parallel across multiple CPU cores. This significantly improves throughput on multi-core systems.

What is RSS?

RSS is a network driver technology that distributes incoming network traffic across multiple CPU cores based on a hash of the packet header. This allows the system to process packets in parallel, utilizing all available CPU cores instead of bottlenecking on a single core.

How does SCION RSS work?

SCION packets are encapsulated in UDP/IP packets on the underlay network. By default, all packets between two SCION nodes use the same source/destination IP/port information, which causes the network interface to direct all traffic to a single receive queue handled by one CPU core. This creates a bottleneck on multi-core systems.

SCION RSS solves this by varying the UDP source port for each SCION flow on the underlay. The network interface can then use its RSS hash function (typically a hash over the 5-tuple: source IP, destination IP, source port, destination port, and protocol) to distribute packets across multiple receive queues, each processed by a different CPU core, enabling parallel packet processing.

Requirements

For SCION RSS to work properly, both sides of a link must support it, i.e., both must be running release v0.34 or later. In the following, we describe how to configure SCION RSS on links between neighbor appliances and sibling appliances.

note

A neighbor appliance is an appliance that is directly connected to the appliance but located in a different AS.

A sibling appliance is an appliance that is located in the same AS as the appliance.

Traffic between neighbor appliances

To enable SCION RSS for traffic forwarded to a particular neighbor, enable it on all interfaces connected to that neighbor. This can be done by setting the enable_scion_rss option to true in the config.scion.ases.neighbors.interfaces sections:

{
  "scion": {
    "ases": [
      {
        "neighbors": [
          {
            "interfaces": [
              {
                "interface_id": 1,
                "address": "[fd02:e8a2:c9e2:03e6::2]:30100",
                "remote": {
                  "address": "[fd02:e8a2:c9e2:03e6::1]:30100",
                  "interface_id": 201
                },
                "enable_scion_rss": true
              }
            ]
          }
        ]
      }
    ]
  }
}

Traffic between sibling appliances

With topology synchronization, the appliance automatically detects sibling appliances that support SCION RSS and enables the feature accordingly.

Without topology synchronization, whether a sibling appliance supports SCION RSS can be defined statically, and the feature will be enabled accordingly. This can be done by setting scion_rss to true or false in the config.cluster.peers.features section:

{
  "cluster": {
    "peers": [
      {
        "features": {
          "scion_rss": true
        }
      }
    ]
  }
}

By default, SCION RSS is not enabled for traffic sent from a gateway to a sibling router on a different appliance. To enable SCION RSS, set scion_tunneling.endpoint.enable_scion_rss to true.
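Assuming this option maps to the configuration file in the same way as the other settings above (a sketch derived from the documented option path), the configuration could look as follows:

{
  "scion_tunneling": {
    "endpoint": {
      "enable_scion_rss": true
    }
  }
}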