// TRANSMISSION_ID: 002 :: TYPE: TUTORIAL

ZERO-COPY NETWORKING WITH IO_URING

AUTHOR: SARAH M. :: DATE: DEC 15, 2025 :: TIME: 12 MIN READ

We hit a wall. Our NGINX reverse proxy was maxing out CPU usage at 4Gbps of throughput. The bottleneck wasn't the network card; it was the cost of switching into the Linux kernel for every operation.

To achieve our goal of 10Gbps on a single core, we had to get the per-packet syscalls out of the hot path entirely. We moved to io_uring.

1. The Syscall Overhead

In traditional networking (using epoll), every time you read a packet, you make a system call. The CPU pauses your application, switches to kernel mode, does the work, and switches back.

This "mode switching" takes nanoseconds, but when you are processing millions of packets per second, it burns 40% of your CPU cycles just "asking" to do work.

2. Implementing io_uring in C

io_uring sets up a pair of ring buffers, a submission queue and a completion queue, shared between user space (your app) and the kernel. You drop a "submission" entry into the ring and the kernel picks it up; with the `IORING_SETUP_SQPOLL` flag used below, a kernel polling thread does this without your application making a syscall at all.

Here is the C implementation of the setup:

// setup_ring.c
#include <liburing.h>
#include <stdio.h>
#include <string.h>

struct io_uring ring;

int setup_io_uring(unsigned entries) {
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));
    params.flags |= IORING_SETUP_SQPOLL; // Kernel poller thread picks up submissions

    int ret = io_uring_queue_init_params(entries, &ring, &params);
    if (ret < 0) {
        fprintf(stderr, "Queue init failed: %s\n", strerror(-ret));
        return -1;
    }
    return 0;
}
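
Once the ring is initialized, submitting work is just writing an entry and later reaping the completion. Here is a minimal sketch of one round trip; `read_once` and its arguments are illustrative, not part of our proxy:

// One round trip through the ring: queue a read, wait for its completion.
int read_once(int fd, void *buf, unsigned len) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    if (!sqe)
        return -1;                            // submission queue is full

    io_uring_prep_read(sqe, fd, buf, len, 0); // describe the work; no syscall yet
    io_uring_submit(&ring);                   // with SQPOLL this often avoids the syscall entirely

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);           // block until the completion arrives
    int res = cqe->res;                       // bytes read, or -errno on failure
    io_uring_cqe_seen(&ring, cqe);            // mark the CQE as consumed
    return res;
}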

3. The Zero-Copy Trick (SPLICE)

The real magic is combining `io_uring` with SPLICE. Usually, a proxy reads data into a user buffer, then writes it out to another socket.

Kernel -> User RAM -> Kernel -> Network Card

With zero-copy splice, we tell the kernel to move the data from Socket A into a pipe, and from that pipe into Socket B. The payload never touches our application's memory.

// Submitting a SPLICE request: Socket A -> pipe (the drain side is a second splice)
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_splice(sqe,
    fd_in, -1,        // source socket; -1 means no offset (sockets aren't seekable)
    pipe_write, -1,   // write end of the intermediary pipe
    len,
    SPLICE_F_MOVE | SPLICE_F_MORE
);
io_uring_submit(&ring);
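
The snippet above only covers the first hop, Socket A into the pipe; a second splice drains the pipe into Socket B. A sketch of the full chain, assuming `sock_a`, `sock_b`, and a `pipefd` pair created earlier with `pipe2()`, could look like this:

// Full chain: sock_a -> pipe -> sock_b, linked so the drain runs after the fill.
struct io_uring_sqe *fill = io_uring_get_sqe(&ring);
io_uring_prep_splice(fill, sock_a, -1, pipefd[1], -1, len,
                     SPLICE_F_MOVE | SPLICE_F_MORE);
fill->flags |= IOSQE_IO_LINK;      // don't start the next SQE until this one completes

struct io_uring_sqe *drain = io_uring_get_sqe(&ring);
io_uring_prep_splice(drain, pipefd[0], -1, sock_b, -1, len,
                     SPLICE_F_MOVE | SPLICE_F_MORE);

io_uring_submit(&ring);            // one submission, and the payload never enters user space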

THE RESULT

We dropped CPU usage by 60%. We hit 10Gbps line rate on a single core. The latency distribution flattened, eliminating the p99 spikes caused by syscall blocking.

4. Why Rust?

While the core logic is C (via `liburing`), managing the lifetime of these buffers and pointers by hand is a nightmare. Free a buffer while the kernel is still holding it and you get silent memory corruption that is brutal to debug.

We wrap this C logic in Rust abstractions. The borrow checker ensures that we don't drop a buffer while the kernel is still holding it. It is unsafe code wrapped in safe interfaces.


WE LIKE HARD PROBLEMS

If you know what a ring buffer is, we want to talk to you.

VIEW SYSTEMS ROLES