Getting data to user-space

In previous chapters we logged packets with aya-log. But what if the user-space program needs additional data about the packet, or any other kind of information? In this chapter, we will use perf buffers and data structures defined in the -common crate to transfer data from the eBPF program to user space.

Source code

Full code for the example in this chapter is available here

Sharing data

In this chapter, we will send data from kernel space to user space by writing it into a struct and emitting that struct through an eBPF map. There are different types of maps available, but in this case, we will use PerfEventArray.

PerfEventArray is a collection of per-CPU circular buffers that enable the kernel to emit events (defined as custom structs) to user space. Each CPU has its own buffer, and the eBPF program emits an event to the buffer of the CPU it's currently running on. Events are not ordered across CPUs, meaning that they may arrive in user space in a different order than the one in which the eBPF program created and sent them.

To gather events from all CPUs, we are going to spawn a task for each CPU to poll for the events and then iterate over them.
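
If a global order matters, one common approach is to carry a timestamp in each event and sort after draining the per-CPU buffers. The following is a minimal user-space sketch of that idea, not part of this chapter's example: `TimedEvent`, its `timestamp_ns` field and `merge_by_timestamp` are hypothetical names, and each inner `Vec` merely stands in for the events read from one CPU's buffer.

```rust
/// Hypothetical event type: `timestamp_ns` is an assumption for this sketch
/// and is not part of the chapter's `PacketLog`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TimedEvent {
    timestamp_ns: u64,
    cpu: u32,
    port: u32,
}

/// Flattens the per-CPU streams and restores a global order by timestamp.
fn merge_by_timestamp(per_cpu: Vec<Vec<TimedEvent>>) -> Vec<TimedEvent> {
    let mut all: Vec<TimedEvent> = per_cpu.into_iter().flatten().collect();
    // Stable sort: events with equal timestamps keep their relative order.
    all.sort_by_key(|e| e.timestamp_ns);
    all
}

fn main() {
    // Each inner Vec stands in for the events drained from one CPU's buffer.
    let cpu0 = vec![
        TimedEvent { timestamp_ns: 10, cpu: 0, port: 443 },
        TimedEvent { timestamp_ns: 40, cpu: 0, port: 80 },
    ];
    let cpu1 = vec![
        TimedEvent { timestamp_ns: 20, cpu: 1, port: 53 },
        TimedEvent { timestamp_ns: 30, cpu: 1, port: 22 },
    ];
    for e in merge_by_timestamp(vec![cpu0, cpu1]) {
        println!("{} ns cpu{} port {}", e.timestamp_ns, e.cpu, e.port);
    }
}
```

Within a single CPU's buffer the events keep their emission order, so sorting is only needed when streams from several CPUs are combined.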

The data structure we'll be using needs to hold an IPv4 address and a port.

xdp-perfbuf-custom-data-common/src/lib.rs
#![no_std]

#[repr(C)]
#[derive(Clone, Copy)]
pub struct PacketLog {
    pub ipv4_address: u32,
    pub port: u32,
}

#[cfg(feature = "user")]
unsafe impl aya::Pod for PacketLog {} // (1)

(1) We implement the aya::Pod trait for our struct since it is Plain Old Data and can be safely converted to a byte slice and back.
Events emitted with PerfEventArray are copied from kernel memory to user memory, therefore they must implement the aya::Pod trait (where Pod stands for "plain old data") which expresses that it's safe to convert them into a sequence of bytes.
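
To make "safe to convert into a sequence of bytes" concrete, here is a small sketch that round-trips a struct with the same layout as PacketLog through a byte buffer using only the standard library. It does not use aya; the helper names `as_bytes` and `from_bytes` are made up for this illustration.

```rust
use std::{mem, ptr, slice};

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct PacketLog {
    ipv4_address: u32,
    port: u32,
}

/// Views a `PacketLog` as raw bytes. This is sound here because the struct is
/// `#[repr(C)]`, holds only integers, and contains no padding bytes.
fn as_bytes(log: &PacketLog) -> &[u8] {
    unsafe {
        slice::from_raw_parts(
            log as *const PacketLog as *const u8,
            mem::size_of::<PacketLog>(),
        )
    }
}

/// Rebuilds a `PacketLog` from bytes; returns `None` if the slice is too short.
fn from_bytes(bytes: &[u8]) -> Option<PacketLog> {
    if bytes.len() < mem::size_of::<PacketLog>() {
        return None;
    }
    // The byte buffer may not be 4-byte aligned, hence `read_unaligned`.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const PacketLog) })
}

fn main() {
    let original = PacketLog { ipv4_address: 0x0A00_0001, port: 443 };
    let bytes = as_bytes(&original).to_vec();
    let decoded = from_bytes(&bytes).expect("buffer holds a full PacketLog");
    println!("{} bytes, round-trip ok: {}", bytes.len(), decoded == original);
}
```

A type containing references, pointers or padding would not be a good candidate for this treatment, which is why implementing the trait is `unsafe` and left to the author of the type.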

Alignment, padding and verifier errors

At program load time, the eBPF verifier checks that all the memory used is properly initialized. This can be a problem if - to ensure alignment - the compiler inserts padding bytes between fields in your types.

Example:

#[repr(C)]
struct SourceInfo {
    source_port: u16,
    source_ip: u32,
}

let source_port = ...;
let source_ip = ...;
let si = SourceInfo { source_port, source_ip };

In the example above, the compiler will insert two extra bytes between the struct fields source_port and source_ip to make sure that source_ip is correctly aligned to a 4-byte address (assuming mem::align_of::<u32>() == 4). Since padding bytes are typically not initialized by the compiler, this will result in the infamous invalid indirect read from stack verifier error.
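
The padding can be observed from ordinary user-space Rust. The sketch below compares the sum of the field sizes with the size of the struct and computes the offset of source_ip; the helper `source_ip_offset` is a hypothetical name introduced only for this illustration.

```rust
use std::mem;

#[repr(C)]
struct SourceInfo {
    source_port: u16,
    source_ip: u32,
}

/// Distance in bytes from the start of the struct to `source_ip`.
fn source_ip_offset(si: &SourceInfo) -> usize {
    let base = si as *const SourceInfo as usize;
    let field = &si.source_ip as *const u32 as usize;
    field - base
}

fn main() {
    let si = SourceInfo { source_port: 443, source_ip: 0x0A00_0001 };
    let field_bytes = mem::size_of::<u16>() + mem::size_of::<u32>();

    println!("port: {}", si.source_port);
    println!("sum of field sizes: {}", field_bytes);
    println!("size of SourceInfo: {}", mem::size_of::<SourceInfo>());
    println!("offset of source_ip: {}", source_ip_offset(&si));
}
```

On targets where u32 is 4-byte aligned, the fields add up to 6 bytes while the struct occupies 8, and source_ip sits at offset 4. The two bytes in between are the padding the verifier complains about.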

To avoid the error, you can either manually ensure that all the fields in your types are correctly aligned (e.g. by explicitly adding padding or by making field types larger to enforce alignment) or use #[repr(packed)]. Since the latter comes with its own foot-guns and can perform less efficiently, explicitly adding padding or tweaking alignment is recommended.

Solution ensuring alignment using larger types:

#[repr(C)]
pub struct SourceInfo {
    pub source_port: u32,
    pub source_ip: u32,
}

let source_port = ...;
let source_ip = ...;
let si = SourceInfo { source_port, source_ip };

Solution with explicit padding:

#[repr(C)]
pub struct SourceInfo {
    pub source_port: u16,
    _padding: u16,
    pub source_ip: u32,
}

let source_port = ...;
let source_ip = ...;
let si = SourceInfo { source_port, _padding: 0, source_ip };
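
One way to keep such a layout from regressing is a compile-time check that the struct is exactly as large as the sum of its fields. This is a sketch under the assumption that a const assertion fits your build setup; the `new` constructor is hypothetical and simply guarantees that the padding field is always zeroed.

```rust
use std::mem::size_of;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct SourceInfo {
    pub source_port: u16,
    _padding: u16,
    pub source_ip: u32,
}

// Fails to compile if the struct is larger than the sum of its fields,
// that is, if the compiler had to insert implicit padding somewhere.
const _: () = assert!(
    size_of::<SourceInfo>() == size_of::<u16>() + size_of::<u16>() + size_of::<u32>()
);

impl SourceInfo {
    /// Hypothetical constructor that always initializes the padding field.
    pub fn new(source_port: u16, source_ip: u32) -> Self {
        Self { source_port, _padding: 0, source_ip }
    }
}

fn main() {
    let si = SourceInfo::new(443, 0x0A00_0001);
    println!(
        "port {} ip {:#010x} size {}",
        si.source_port,
        si.source_ip,
        size_of::<SourceInfo>()
    );
}
```

If someone later reorders or resizes a field so that implicit padding reappears, the build breaks instead of the verifier rejecting the program at load time.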

Extracting packet data from the context and into the map

The eBPF program code in this section is similar to the one in the previous chapters. It extracts the source IP address and port information from packet headers.

The difference is that after obtaining the data from the headers, we create a PacketLog struct and output it to our PerfEventArray instead of logging data directly.

The resulting code looks like this:

xdp-perfbuf-custom-data-ebpf/src/main.rs
#![no_std]
#![no_main]

use core::mem;

use aya_bpf::{
    bindings::xdp_action,
    macros::{map, xdp},
    maps::PerfEventArray,
    programs::XdpContext,
};
use network_types::{
    eth::{EthHdr, EtherType},
    ip::{IpProto, Ipv4Hdr},
    tcp::TcpHdr,
    udp::UdpHdr,
};

use xdp_perfbuf_custom_data_common::PacketLog;

#[map(name = "EVENTS")] // (1)
static EVENTS: PerfEventArray<PacketLog> =
    PerfEventArray::<PacketLog>::with_max_entries(1024, 0);

#[xdp]
pub fn xdp_perfbuf_custom_data(ctx: XdpContext) -> u32 {
    match try_xdp_perfbuf_custom_data(ctx) {
        Ok(ret) => ret,
        Err(_) => xdp_action::XDP_ABORTED,
    }
}

#[inline(always)]
fn ptr_at<T>(ctx: &XdpContext, offset: usize) -> Result<*const T, ()> {
    let start = ctx.data();
    let end = ctx.data_end();
    let len = mem::size_of::<T>();

    if start + offset + len > end {
        return Err(());
    }

    Ok((start + offset) as *const T)
}

fn try_xdp_perfbuf_custom_data(ctx: XdpContext) -> Result<u32, ()> {
    let ethhdr: *const EthHdr = ptr_at(&ctx, 0)?;
    match unsafe { (*ethhdr).ether_type } {
        EtherType::Ipv4 => {}
        _ => return Ok(xdp_action::XDP_PASS),
    }

    let ipv4hdr: *const Ipv4Hdr = ptr_at(&ctx, EthHdr::LEN)?;
    // The header stores the address in network byte order; convert it to host order.
    let source_addr = u32::from_be(unsafe { (*ipv4hdr).src_addr });

    let source_port = match unsafe { (*ipv4hdr).proto } {
        IpProto::Tcp => {
            let tcphdr: *const TcpHdr =
                ptr_at(&ctx, EthHdr::LEN + Ipv4Hdr::LEN)?;
            u16::from_be(unsafe { (*tcphdr).source })
        }
        IpProto::Udp => {
            let udphdr: *const UdpHdr =
                ptr_at(&ctx, EthHdr::LEN + Ipv4Hdr::LEN)?;
            u16::from_be(unsafe { (*udphdr).source })
        }
        _ => return Err(()),
    };

    let log_entry = PacketLog {
        ipv4_address: source_addr,
        port: source_port as u32,
    };
    EVENTS.output(&ctx, &log_entry, 0); // (2)

    Ok(xdp_action::XDP_PASS)
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}

(1) Create our map.
(2) Output the event to the map.
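
The bounds check in ptr_at and the u16::from_be conversion of the port can be tried out in user space on a plain byte slice. The following sketch is only an analogue of the eBPF code, not a replacement for it: `bytes_at` and `read_be_u16` are hypothetical helpers, and the header bytes are made up.

```rust
/// User-space analogue of `ptr_at`: returns the `len` bytes at `offset`, or
/// `Err(())` when the read would run past the end of the packet.
fn bytes_at(packet: &[u8], offset: usize, len: usize) -> Result<&[u8], ()> {
    let end = offset.checked_add(len).ok_or(())?;
    packet.get(offset..end).ok_or(())
}

/// Reads a big-endian (network order) u16, such as a TCP or UDP source port.
fn read_be_u16(packet: &[u8], offset: usize) -> Result<u16, ()> {
    let b = bytes_at(packet, offset, 2)?;
    Ok(u16::from_be_bytes([b[0], b[1]]))
}

fn main() {
    // First four bytes of a made-up TCP header:
    // source port 443 (0x01BB), destination port 8080 (0x1F90).
    let tcp_header: [u8; 4] = [0x01, 0xBB, 0x1F, 0x90];

    println!("source port: {:?}", read_be_u16(&tcp_header, 0));
    println!("destination port: {:?}", read_be_u16(&tcp_header, 2));
    // Only one byte is left at offset 3, so this read is rejected.
    println!("past the end: {:?}", read_be_u16(&tcp_header, 3));
}
```

In the eBPF program the same check is what convinces the verifier that every packet access stays between ctx.data() and ctx.data_end().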

Reading data

To read from the perf event array in user space, we need to choose one of the following types:

AsyncPerfEventArray which is designed for use with async Rust.
PerfEventArray, intended for synchronous Rust.

By default, our project template is written in async Rust and uses the Tokio runtime. Therefore, we will use AsyncPerfEventArray in this chapter.

To read from the AsyncPerfEventArray, we must call AsyncPerfEventArray::open() for each online CPU and poll the file descriptor for events.

Additionally, we need to add a dependency on bytes to xdp-perfbuf-custom-data/Cargo.toml. This library simplifies handling the chunks of bytes yielded by the AsyncPerfEventArray.

Here's the code:

xdp-perfbuf-custom-data/src/main.rs
use std::net;

use anyhow::Context;
use aya::{
    include_bytes_aligned,
    maps::perf::AsyncPerfEventArray,
    programs::{Xdp, XdpFlags},
    util::online_cpus,
    Bpf,
};
use bytes::BytesMut;
use clap::Parser;
use log::info;
use tokio::{signal, task};

use xdp_perfbuf_custom_data_common::PacketLog;

#[derive(Debug, Parser)]
struct Opt {
    #[clap(short, long, default_value = "eth0")]
    iface: String,
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    let opt = Opt::parse();

    env_logger::init();

    // This will include your eBPF object file as raw bytes at compile-time and load it at
    // runtime. This approach is recommended for most real-world use cases. If you would
    // like to specify the eBPF program at runtime rather than at compile-time, you can
    // reach for `Bpf::load_file` instead.
    #[cfg(debug_assertions)]
    let mut bpf = Bpf::load(include_bytes_aligned!(
        "../../target/bpfel-unknown-none/debug/xdp-perfbuf-custom-data"
    ))?;
    #[cfg(not(debug_assertions))]
    let mut bpf = Bpf::load(include_bytes_aligned!(
        "../../target/bpfel-unknown-none/release/xdp-perfbuf-custom-data"
    ))?;
    let program: &mut Xdp = bpf
        .program_mut("xdp_perfbuf_custom_data")
        .unwrap()
        .try_into()?;
    program.load()?;
    program.attach(&opt.iface, XdpFlags::default())
        .context("failed to attach the XDP program with default flags - try changing XdpFlags::default() to XdpFlags::SKB_MODE")?;

    // (1)
    let mut perf_array = AsyncPerfEventArray::try_from(bpf.map_mut("EVENTS")?)?;

    let cpus = online_cpus()?;
    for cpu_id in cpus {
        // (2)
        let mut buf = perf_array.open(cpu_id, None)?;

        // (3)
        task::spawn(async move {
            // (4)
            let mut buffers = (0..10)
                .map(|_| BytesMut::with_capacity(1024))
                .collect::<Vec<_>>();

            loop {
                // (5)
                let events = buf.read_events(&mut buffers).await.unwrap();
                for i in 0..events.read {
                    let buf = &mut buffers[i];
                    let ptr = buf.as_ptr() as *const PacketLog;
                    // (6)
                    let data = unsafe { ptr.read_unaligned() };
                    let src_addr = net::Ipv4Addr::from(data.ipv4_address);
                    // (7)
                    info!("SRC IP: {}, SRC_PORT: {}", src_addr, data.port);
                }
            }
        });
    }

    info!("Waiting for Ctrl-C...");
    signal::ctrl_c().await?;
    info!("Exiting...");

    Ok(())
}

(1) Define our map.
(2) Call open() for each online CPU.
(3) Spawn a tokio::task.
(4) Create buffers.
(5) Read events into buffers.
(6) Use read_unaligned to read the event data into a PacketLog.
(7) Log the packet data.
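
The logging step formats the address with Ipv4Addr::from(u32), which treats the most significant byte of the integer as the first octet, so it expects a host-order value. The sketch below shows how a value that is still in network byte order is converted before formatting; `ipv4_from_network_order` is a hypothetical helper used only for this illustration.

```rust
use std::net::Ipv4Addr;

/// Converts a u32 holding an address in network byte order (as it sits in the
/// IPv4 header) to the host-order value that `Ipv4Addr::from` expects.
fn ipv4_from_network_order(raw: u32) -> Ipv4Addr {
    Ipv4Addr::from(u32::from_be(raw))
}

fn main() {
    // The four address bytes as they appear on the wire for 192.168.0.1.
    let wire = [192u8, 168, 0, 1];
    // Reading those bytes as a native integer mimics loading the header field
    // without any byte-order conversion.
    let raw = u32::from_ne_bytes(wire);

    println!("converted: {}", ipv4_from_network_order(raw));
    println!("host-order literal: {}", Ipv4Addr::from(0xC0A8_0001u32));
}
```

Whichever side performs the conversion, kernel or user space, it must happen exactly once; otherwise the octets are printed in reverse order on little-endian machines.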

Running the program

As before, you can override the interface by providing the interface name as a parameter, for example, RUST_LOG=info cargo xtask run -- --iface wlp2s0.

$ RUST_LOG=info cargo xtask run
[2023-01-25T08:57:41Z INFO  xdp_perfbuf_custom_data] SRC IP: 60.235.240.157, SRC_PORT: 443
[2023-01-25T08:57:41Z INFO  xdp_perfbuf_custom_data] SRC IP: 98.21.76.76, SRC_PORT: 443
[2023-01-25T08:57:41Z INFO  xdp_perfbuf_custom_data] SRC IP: 95.194.217.172, SRC_PORT: 443
[2023-01-25T08:57:41Z INFO  xdp_perfbuf_custom_data] SRC IP: 95.194.217.172, SRC_PORT: 443
[2023-01-25T08:57:41Z INFO  xdp_perfbuf_custom_data] SRC IP: 95.10.251.142, SRC_PORT: 443
