April 1, 2026

High-Speed Packet Processing with DPDK 25.11 on Microsoft Azure


Explore how DPDK 25.11 achieves line-rate packet processing on Azure VMs with architectural insights and best practices for tuning high-performance networking workloads.

Tags: ["Azure", "DPDK", "Networking", "Performance", "Linux"]

The demand for ultra-high-speed networking in the cloud continues to accelerate as workloads like virtual network functions (VNFs), cloud-native packet processing, and network appliances grow in scale and complexity. However, unlocking consistent, low-latency, and high-throughput packet processing in virtualized Azure environments requires more than just deploying advanced software—it demands careful engineering, rigorous validation, and informed tuning.

Microsoft’s recent performance report on the Data Plane Development Kit (DPDK) version 25.11 brings clarity to this challenge by providing a comprehensive analysis of how DPDK performs on Azure VMs with Accelerated Networking enabled. The findings are invaluable for developers and network architects aiming to run packet workloads effectively in Azure, showcasing near line-rate throughput, impressive multi-core scaling, and low jitter when best practices are followed.

In this post, we dive into the architecture supporting this performance, highlight key technical insights from the report, and walk through recommended configurations for running DPDK on Azure at scale. Whether you are designing network appliances or optimizing virtual network functions, these insights will help you harness Azure’s infrastructure to its fullest.

Architecture Overview

┌─────────────────────────────────────────────────────┐
│           Enterprise Networking Workloads           │
├─────────────────────────────────────────────────────┤
│  Virtual Network Functions (VNFs)                   │
│  Network Appliances & Packet Brokers                │
│  Security & Monitoring Agents                       │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                Azure Infrastructure                 │
├─────────────────────────────────────────────────────┤
│  Azure VMs with Accelerated Networking (SR-IOV)     │
│  NUMA-aware CPU & Memory Architecture               │
│  Hugepage-Backed Memory Allocation                  │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│          Data Plane Development Kit (DPDK)          │
├─────────────────────────────────────────────────────┤
│  PMD Threads & RX/TX Queue Alignment                │
│  Multi-core Scaling and NUMA Pinning                │
│  Packet Forwarding with Poll Mode Drivers (PMDs)    │
└─────────────────────────────────────────────────────┘

This architecture underscores the importance of aligning Azure VM hardware features—like Accelerated Networking, which uses SR-IOV—and Linux NUMA topology with DPDK’s high-speed packet processing capabilities. Correct placement of poll mode driver (PMD) threads and proper hugepage configuration are foundational to achieving the predictable throughput and low latency detailed in the report.


Key Technical Observations

  • Accelerated Networking Enables SR-IOV for Bare-Metal Performance — Leveraging Azure SKUs with Accelerated Networking offloads networking to hardware, minimizing hypervisor overhead and enabling near line-rate packet processing.

  • NUMA-Aware CPU and Memory Placement is Critical — Pinning PMD threads to dedicated cores on the same NUMA node as their assigned queues and memory prevents cross-node latencies that degrade packet throughput and increase jitter.

  • Hugepage-Backed Memory Allocation Minimizes TLB Misses — Configuring large memory pages reduces translation lookaside buffer (TLB) thrashing, a common bottleneck in packet processing workloads with frequent memory accesses.

  • Per-Queue RX/TX Pairing Maximizes Parallelism — One-to-one mapping of RX and TX queues to PMD threads dramatically reduces queue contention, resulting in more consistent throughput and scale across multiple cores.

  • Packet Size Impacts Performance Scaling — Smaller packets require more intensive processing at line rate, but DPDK on Azure scales near-linearly across cores, enabling high throughput even for 64-byte frames.

  • Real-World Traffic Generation Enhances Benchmark Validity — Using pktgen-dpdk or testpmd with controlled, repeatable traffic profiles ensures the performance results reflect practical network scenarios, not synthetic or overly idealized conditions.
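The hugepage observation above is easy to verify on any Linux VM before tuning further. This read-only check is a sketch; the exact counter values will depend on how the system was booted and configured:

```shell
# Inspect hugepage availability system-wide (read-only; safe to run anywhere).
# HugePages_Total / HugePages_Free show how many pages are reserved and unused;
# Hugepagesize shows the default page size (typically 2048 kB).
grep Huge /proc/meminfo
```

If HugePages_Total is 0, DPDK's memory pools cannot be hugepage-backed and the TLB benefits described above will not materialize.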

How It Works

Selecting Azure VMs with Accelerated Networking

Azure offers VM SKUs such as the D-series, Fsv2, and Eav4 equipped with Accelerated Networking—a crucial enabler for high-performance DPDK workloads. Accelerated Networking leverages Single Root I/O Virtualization (SR-IOV) to provide direct hardware access for network packets, bypassing much of the virtualization stack’s overhead.

Choosing the right SKU ensures your VM can handle millions of packets per second without being CPU-bound on networking interrupts or context switches.
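As a command sketch, a VM with Accelerated Networking can be provisioned from the Azure CLI. The resource group, VM name, and size below are illustrative placeholders, not values from the report:

```shell
# Sketch: create a VM with Accelerated Networking (SR-IOV) enabled.
# Resource names and the SKU are hypothetical; pick a supported size.
az vm create \
  --resource-group dpdk-perf-rg \
  --name dpdk-node-01 \
  --image Ubuntu2204 \
  --size Standard_F16s_v2 \
  --accelerated-networking true
```

Not every size supports the flag, so check the SKU's documentation before provisioning.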

NUMA-Aware CPU Pinning and Memory Configuration

DPDK’s performance hinges on reducing cross-CPU memory access latencies. On Azure VMs, CPU cores and memory are partitioned into NUMA nodes. Aligning PMD threads to dedicated cores and allocating their memory buffers (hugepages) on the local NUMA node decreases remote memory access delays.

The official guidance advises:

# Pin PMD threads to specific cores within a NUMA node:
taskset -c <core-list> ./dpdk-testpmd ...

# Allocate 1024 x 2 MB hugepages bound to NUMA node 0 (run as root):
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
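The same placement can be expressed through DPDK's EAL arguments when launching the application. A minimal sketch, assuming the PMD cores and the VM's NIC PCI address (illustrative here) both live on NUMA node 0:

```shell
# Sketch: run testpmd with lcores 2-5 and 1024 MB of hugepage memory
# preallocated on NUMA node 0 only (none on node 1).
# The PCI address is a placeholder; find yours with: dpdk-devbind.py --status
dpdk-testpmd -l 2-5 -n 4 --socket-mem 1024,0 -a 0002:00:02.0 -- --nb-cores=3
```

Keeping the core list, the --socket-mem allocation, and the NIC on the same node avoids the cross-node memory accesses the report warns about.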

Configuring RX/TX Queue Mapping

Each PMD thread is assigned one RX queue and one TX queue. This tight coupling prevents contention and synchronization overhead that occurs when multiple threads share queues.

Within DPDK applications like testpmd, queues and threads are linked through explicit command-line parameters or configuration files, which mirror the Azure VM's virtual NIC capabilities.
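For example, a one-to-one mapping of four forwarding cores to four RX/TX queue pairs can be requested on the testpmd command line. The core numbers are illustrative; testpmd reserves the first lcore in the list as its main (control) core:

```shell
# Sketch: 5 lcores total (lcore 2 = main, lcores 3-6 = forwarding),
# each forwarding lcore polling exactly one RX queue and one TX queue.
dpdk-testpmd -l 2-6 -n 4 -- --nb-cores=4 --rxq=4 --txq=4 --forward-mode=io
```

The queue counts must not exceed what the VM's virtual NIC exposes, so verify the NIC's capabilities first.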

Achieving Packet Forwarding at Scale

DPDK forwards incoming packets from RX queues to TX queues in user space with poll mode drivers, avoiding kernel overhead from interrupts and context switches.

The report demonstrates this approach scales almost linearly with additional cores, maintaining low jitter and consistent latency even at millions of packets per second, validated with real packet generators like pktgen-dpdk.

while (running) {
    nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
    if (nb_rx > 0) {
        // Process received packets, then hand the burst to the TX queue
        nb_tx = rte_eth_tx_burst(port_id, queue_id, bufs, nb_rx);
        // tx_burst may accept fewer packets than offered; free the
        // remainder so mbufs are not leaked under TX back-pressure
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}

This busy-poll RX/TX loop ensures predictable performance by eliminating interrupt-driven packet handling.

Quick Tips & Tricks

  1. Choose Accelerated Networking-Enabled Azure VMs — Opt for D-series, Fsv2, or Eav4 SKUs to get SR-IOV offload and hardware packet acceleration for maximum throughput.

  2. Pin PMD Threads to Dedicated Cores — Avoid sharing CPU cores with other processes or hypervisor tasks to minimize contention and jitter.

  3. Allocate Hugepages per NUMA Node — Match hugepage memory allocation to the local NUMA node of the pinned cores to reduce remote memory access latencies.

  4. Map RX and TX Queues One-to-One — Maintaining pairings between RX/TX queues and PMD threads maximizes queue efficiency and scaling.

  5. Use Realistic Packet Generators for Testing — Simulate real workload profiles with pktgen-dpdk or testpmd to validate performance under production-like conditions.

  6. Continuously Monitor Jitter and Latency — Even with line-rate throughput, look out for latency variance using statistical metrics to maintain quality of service.

Conclusion

DPDK 25.11 on Azure delivers compelling performance for demanding packet processing workloads when using carefully tuned Azure VMs and DPDK configurations. The combination of hardware-assisted networking, NUMA-aware resource allocation, and proven packet forwarding code paths achieves near line-rate throughput with predictable latency and nearly linear multi-core scaling.

These insights empower developers to architect cloud-native networking solutions that do not compromise on performance despite the virtualization layer. As Azure continues investing in network acceleration and optimized tooling, DPDK-based workloads stand to benefit from ongoing platform advancements and community collaboration.

For anyone building or optimizing network functions in the cloud, embracing these recommended patterns and leveraging the detailed Microsoft Azure DPDK Performance Report unlocks real-world, production-grade performance at scale.

References

  1. DPDK 25.11 Performance on Azure for High-Speed Packet Workloads | Microsoft Community Hub — Official blog post with insights and best practices.

  2. Microsoft Azure DPDK Performance Report (PDF) — Comprehensive benchmarking and analysis document.

  3. GitHub Repository: dpdk-perf — Scripts and configurations for reproducing performance tests.

  4. Linux on Azure Tech Community — broader community insights on Linux workloads in Azure.

  5. Azure VM SKUs with Accelerated Networking — Details on available Azure VM types and networking features.

  6. DPDK Official Documentation — Deep dive into DPDK internals, APIs, and optimization guides.