At my company, we run a Wireguard server for VPN access, specifically Firezone. While we’re quite happy with it so far, one missing feature (though I’ve heard they’re working on it for the paid tier!) is more robust logging. In particular, I’d like it so that we can log all the remote IP addresses for a given user. Hopefully the security use-case for this is somewhat clear - if an employee gets their VPN key compromised, we’ll notice when their most recent login IP is in an AS131279 range.

Under Firezone, every user can register multiple devices. Each device gets its own Wireguard keypair, and it gets added as a peer on the wg-firezone interface.

So ideally we’d like a way to not only log the source of Wireguard traffic (this could be handled with network-level monitoring), but - since we want to know which user or device got compromised - to associate it to a specific peer (this is a bit harder).

The Goal

A log that looks something like this:

= TIMESTAMP     USER      REMOTE IP
  08:02:25       Alice     1.2.3.4
  08:32:49       Bob       4.5.6.7
  09:12:01       Alice     1.2.3.4
  09:16:53       Alice     66.66.66.66     *Whoa, that's weird!*
  09:23:44       Alice     1.2.3.4

Existing Solutions

If you Google “wireguard logging”, you’ll probably see this page from Pro Custodibus. It serves as a nice summary of some options. I’ll go over them, a bit out-of-order.

tcpdump

As mentioned (“network level monitoring”), this wouldn’t associate remote IPs to specific peers. :(

iptables

This one feels kind of hairy. At first glance, I don’t believe it would associate traffic to peers? Now, in Firezone, each peer has a set internal IP address. But with iptables rules, IIUC, one would see the encrypted traffic on the “physical” interface (with its remote IP), and the cleartext traffic on the wg interface (with its internal IP)… but those wouldn’t really correlate together? Unless you do it manually somehow?

I wouldn’t be surprised if there’s some magical iptables incantation that you can do to get it to wrestle the remote IP ↔ internal IP association out of the depths of the kernel networking stack (fwmark? -j TRACE? Something something?). But this is beyond my abilities, and doesn’t feel super maintainable.

Pro Custodibus

I’m giving them a mention just because I’m genuinely curious how they do it; they’re closed source (I think?) and the website doesn’t really go into details. Yet they seem pretty fully-featured, with a clean admin UI and good logging. The Agent seems to be open source, as well as some docker images, so maybe that’d be interesting to look at someday.

Also, the font and naming choice feel very… ancient Roman? It’s very distinct from any other software company I’ve ever seen. It’s kind of delightful, honestly. Their blog is also really cool.

Just watch the config?

Like, just run wg show repeatedly really fast, and log if the endpoint: for a given peer changes? This one isn’t even in the page, but it’s worth a mention. It’s actually kind of what Firezone does already! In apps/fz_vpn/lib/fz_vpn/stats_push_service.ex, there seems to be a timer that runs every 60 seconds and ultimately calls Devices.StatsUpdater to refresh device information from the Wireguard config.

Honestly, this would probably work fine. I’m not sure what Wireguard does when it rapidly recieves valid peer traffic from two different IP addresses - does the Endpoint address rapidly bounce around to whichever one it most recently heard from? That’s what I’d expect, but it’s purely a guess.

So my worry with this solution was that someone could “sneak in” their traffic really fast at the same time the legitimate user is using the VPN, and the monitoring script would never quite catch the moments when the Endpoint was the evildoer’s IP address.

Is this a realistic concern? Meh. Probably not. Again, this probably would have worked fine. But it’s not as cool!

Dynamic Debug

This one also pops up elsewhere on the Internet. Basically, you run echo module wireguard +p | sudo tee /sys/kernel/debug/dynamic_debug/control, and within the kernel, Wireguard starts logging stuff to dmesg.

The logs look something like this.

[6305964.569633] wireguard: wg-firezone: Receiving handshake initiation from peer 18 (93.184.216.34:61795)
[6305964.569642] wireguard: wg-firezone: Sending handshake response to peer 18 (93.184.216.34:61795)
[6305964.569891] wireguard: wg-firezone: Keypair 42365 destroyed for peer 18
[6305964.569893] wireguard: wg-firezone: Keypair 42371 created for peer 18
[6305964.586554] wireguard: wg-firezone: Receiving keepalive packet from peer 18 (93.184.216.34:61795)

[6305983.760595] wireguard: wg-firezone: Receiving handshake initiation from peer 14 (130.85.12.141:54719)
[6305983.760604] wireguard: wg-firezone: Sending handshake response to peer 14 (130.85.12.141:54719)
[6305983.760823] wireguard: wg-firezone: Keypair 42366 destroyed for peer 14
[6305983.760825] wireguard: wg-firezone: Keypair 42372 created for peer 14
[6305983.774605] wireguard: wg-firezone: Receiving keepalive packet from peer 14 (130.85.12.141:54719)

Pro Custodibus admit the problem themselves:

Peers will be numbered in the order added since the WireGuard module was loaded. This can make identifying peers difficult, particularly if you have multiple WireGuard interfaces, or if you ever bring an interface down and up (or otherwise change the configuration for a peer).

And yeah, there’s the rub. Firezone does its own management of adding peers and changing configurations, so the peer numbers just aren’t reliable. Looking at the source seems to indicate that these numbers come from peer->internal_id, which doesn’t appear to be exposed anywhere.

Looking Really Hard at the Source

It really is a shame. That Sending handshake response to peer debug statement is almost perfect. It’s a good event to log on; any newly-connecting peer will need to do a handshake and we’ll have to send it a handshake response. It prints out the peer’s remote IP. It logs to dmesg, which goes to journald. It just doesn’t contain a good peer identifier.

But what if we could add our own debugging to that wg_packet_send_handshake_response function?

At long last, an excuse to use bpftrace.

Hello, bpftrace

As their Github describes, “bpftrace is a high-level tracing language for Linux enhanced Berkeley Packet Filter (eBPF) available in recent Linux kernels.” Basically, you can insert various types of probes into the kernel, and safely execute code (eBPF is guaranteed to be safe) when they are triggered. Their README has a Tools section which is effectively a massive flex of all the cool things you can do with it. bashreadline.bt is probably by far my favorite.

I love bpftrace. I hate their logo, though.

Let’s start by listing out all the possible things we can probe, and searching for Wireguard-related… things.

$ sudo ./bpftrace -l | grep wireguard
$ sudo ./bpftrace -l | grep wg
...
kprobe:wg_packet_receive
kprobe:wg_packet_rx_poll
kprobe:wg_packet_send_handshake_cookie
kprobe:wg_packet_send_handshake_initiation
kprobe:wg_packet_send_handshake_response
kprobe:wg_packet_send_keepalive
kprobe:wg_packet_send_queued_handshake_initiation
kprobe:wg_packet_send_staged_packets
kprobe:wg_packet_tx_worker
kprobe:wg_peer_create
kprobe:wg_peer_get_maybe_zero
kprobe:wg_peer_put
kprobe:wg_peer_remove
...

Seems there’s roughly 116 available probes, and they’re all of the kprobe type. Let’s look more into that.

The Humble kprobe

kprobes allow us to probe on the execution of almost any kernel function. Personally, I’m amazed that such a thing is possible, and even more amazed at how they do it while making it reliably disconnect when I Ctrl+C bpftrace. The kernel really is magic.

Thankfully, that wg_packet_send_handshake_response function from earlier is kprobeable! Let’s give it a try. We’ll run the program, then disconnect and reconnect the VPN.

$ sudo ./bpftrace -e 'kprobe:wg_packet_send_handshake_response { printf("sent handshake\n"); }'
Attaching 1 probe...
sent handshake
^C

When we reconnected, sent handshake popped up. The probe seems to be working.

Let’s take a look at the top of that function again.

void wg_packet_send_handshake_response(struct wg_peer *peer)
{
	struct message_handshake_response packet;

	atomic64_set(&peer->last_sent_handshake, ktime_get_coarse_boottime_ns());
	net_dbg_ratelimited("%s: Sending handshake response to peer %llu (%pISpfsc)\n",
			    peer->device->dev->name, peer->internal_id,
			    &peer->endpoint.addr);

The first and only argument, *peer, has all the info we need inside it. We can access arguments in bpftrace with the keywords arg0, arg1, etc. All we’ll need here is arg0.

But how do we access the fields of the struct? We’ll need to get the struct definition into bpftrace. While you can generally just paste C structs into it and the syntax will align enough that it can understand them, we can’t just paste in struct wg_peer from peer.h, because it has references to various kernel and Wireguard structs within it.

Thankfully, bpftrace has an #include statement, which operates similarly to how you’d expect it to in C. The typical use is to include Linux headers (you need to install these) with something like #include <linux/socket.h>, but I guess the Linux headers didn’t expose these Wireguard internal structs, like the stuff in wireguard/peer.h.

My initial workaround was - don’t laugh - to download the entire Linux source, put it next to my bpftrace program, and then #include "linux/drivers/net/wireguard/peer.h". I later realized (with the help of sudo strace -f -e openat ./bpftrace test.bt 2>&1 | grep "\./linux" to see which headers it was reading locally) that we only need to have a few Wireguard header files locally, and the rest of the structs will be pulled from the Linux headers over in /lib/modules/5.15.0-52-generic/build/include/.

#!/bin/bash

mkdir wireguard
cd wireguard

for filename in allowedips.h cookie.h device.h messages.h noise.h peer.h peerlookup.h; do
    echo Getting $filename...
    wget https://raw.githubusercontent.com/torvalds/linux/master/drivers/net/wireguard/$filename
done

Now that that’s done, let’s crack that wg_peer argument open. The syntax of bpftrace is similar to C, so we can pretty much copy over code from the kernel. Variables we define need to be prefixed with a $, though.

#include "wireguard/peer.h"

kprobe:wg_packet_send_handshake_response {
    $peer = (struct wg_peer *) arg0;

    printf("Sent handshake response \t Interface: %s \n",
        $peer->device->dev->name
    );
}
$ sudo ./bpftrace test.bt
Attaching 1 probe...
Sent handshake response          Interface: wg0

Wasn’t that easy? It’s a pretty cool piece of software.

Printing Remote IPs

Here things get less copy-pastey. The kernel code uses some fancy printf facility (%pISpfsc) to print the endpoint address. bpftrace presumably does not have that.

It does, however, conveniently have a function called ntop(). “ntop returns the string representation of an IPv4 or IPv6 address.

What do we feed to it, though? Digging into peer->endpoint reveals it to be a struct with some union members.

struct endpoint {
	union {
		struct sockaddr addr;
		struct sockaddr_in addr4;
		struct sockaddr_in6 addr6;
	};
	union {
		struct {
			struct in_addr src4;
			/* Essentially the same as addr6->scope_id */
			int src_if4;
		};
		struct in6_addr src6;
	};
};

There are probably more elegant approaches here, but sockaddr, sockaddr_in, and sockaddr_in6 will always start with the socket family, so we can have a conditional based on that (IPv4 vs IPv6):

kprobe:wg_packet_send_handshake_response {
    $peer = (struct wg_peer *) arg0;

    // Check if endpoint is IPv4 or IPv6
    $addr_family = $peer->endpoint.addr.sa_family;
    if ($addr_family == AF_INET) {
        // IPv4 address
    } else if ($addr_family == AF_INET6) {
        // IPv6 address
    }

And then we just do a bunch of field accesses to get to the addressy meat of the struct:

kprobe:wg_packet_send_handshake_response {
    $peer = (struct wg_peer *) arg0;

    // Check if endpoint is IPv4 or IPv6
    $addr_family = $peer->endpoint.addr.sa_family;
    if ($addr_family == AF_INET) {
        // IPv4 address
        $ip = ntop(AF_INET, $peer->endpoint.addr4.sin_addr.s_addr);
    } else if ($addr_family == AF_INET6) {
        // IPv6 address (UNTESTED)
        $ip = ntop(AF_INET6, $peer->endpoint.addr6.sin6_addr.in6_u.u6_addr8);
    }

$ip can then be printed with %s.

Printing Peer Pubkeys

A peer’s identifier within Wireguard is generally its public key. This is the info that’s missing from the existing debug output. Let’s see if we can get it. But where is it?

struct wg_peer {
	struct wg_device *device;
	struct prev_queue tx_queue, rx_queue;
	struct sk_buff_head staged_packet_queue;
	int serial_work_cpu;
	bool is_dead;
	struct noise_keypairs keypairs;
	struct endpoint endpoint;
	struct dst_cache endpoint_cache;
	rwlock_t endpoint_lock;
	struct noise_handshake handshake;
	atomic64_t last_sent_handshake;
	struct work_struct transmit_handshake_work, clear_peer_work, transmit_packet_work;
	struct cookie latest_cookie;
	struct hlist_node pubkey_hash;
	u64 rx_bytes, tx_bytes;
	struct timer_list timer_retransmit_handshake, timer_send_keepalive;
	struct timer_list timer_new_handshake, timer_zero_key_material;
	struct timer_list timer_persistent_keepalive;
	unsigned int timer_handshake_attempts;
	u16 persistent_keepalive_interval;
	bool timer_need_another_keepalive;
	bool sent_lastminute_handshake;
	struct timespec64 walltime_last_handshake;
	struct kref refcount;
	struct rcu_head rcu;
	struct list_head peer_list;
	struct list_head allowedips_list;
	struct napi_struct napi;
	u64 internal_id;
};

There’s pubkey_hash, but I want the real thing. Scrolling down just a little bit conveniently shows the peer creation function:

struct wg_peer *wg_peer_create(struct wg_device *wg,
			       const u8 public_key[NOISE_PUBLIC_KEY_LEN],
			       const u8 preshared_key[NOISE_SYMMETRIC_KEY_LEN]);

In that function’s definition, we can trace the public_key argument through to wg_noise_handshake_init(&peer->handshake, &wg->static_identity, public_key, preshared_key, peer); which looks something like this:

void wg_noise_handshake_init(struct noise_handshake *handshake,
			     struct noise_static_identity *static_identity,
			     const u8 peer_public_key[NOISE_PUBLIC_KEY_LEN],
			     const u8 peer_preshared_key[NOISE_SYMMETRIC_KEY_LEN],
			     struct wg_peer *peer)
{
	// ...
	memcpy(handshake->remote_static, peer_public_key, NOISE_PUBLIC_KEY_LEN);
	// ...
}

So peer->handshake.remote_static will hold our public key. Let’s try printing it! Except it’s an array and bpftra-

printf("Sent handshake response \t Interface: %s \t Endpoint IP: %s \t Pubkey: %02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x \n",
    $peer->device->dev->name,
    $ip,

    $peer->handshake.remote_static[0],
    $peer->handshake.remote_static[1],
    $peer->handshake.remote_static[2],
    $peer->handshake.remote_static[3],
    $peer->handshake.remote_static[4],
    $peer->handshake.remote_static[5],
    $peer->handshake.remote_static[6],
    $peer->handshake.remote_static[7],

    $peer->handshake.remote_static[8],
    $peer->handshake.remote_static[9],
    $peer->handshake.remote_static[10],
    $peer->handshake.remote_static[11],
    $peer->handshake.remote_static[12],
    $peer->handshake.remote_static[13],
    $peer->handshake.remote_static[14],
    $peer->handshake.remote_static[15],

    $peer->handshake.remote_static[16],
    $peer->handshake.remote_static[17],
    $peer->handshake.remote_static[18],
    $peer->handshake.remote_static[19],
    $peer->handshake.remote_static[20],
    $peer->handshake.remote_static[21],
    $peer->handshake.remote_static[22],
    $peer->handshake.remote_static[23],

    $peer->handshake.remote_static[24],
    $peer->handshake.remote_static[25],
    $peer->handshake.remote_static[26],
    $peer->handshake.remote_static[27],
    $peer->handshake.remote_static[28],
    $peer->handshake.remote_static[29],
    $peer->handshake.remote_static[30],
    $peer->handshake.remote_static[31]
);

Yeah, whatever, good enough.

(bpftrace seems to require literal integers in array indices; presumably it’s to avoid annoying the eBPF verifier. There are definitely far more elegant ways of doing this, I’m just lazy.)

Wrapper Script

There’s just one problem now.

$ sudo ./bpftrace test.bt
Attaching 1 probe...
Sent handshake response          Interface: wg0          Endpoint IP: 1.2.3.4      Pubkey: 70fe4aa237c9fc2373e51b4ef95d80134bceadd8b9d23fa62af582dbd5b4fb90

This prints it in hex, but standard Wireguard syntax is Base64. We could write a Base64 encoder in bpftrace, but that seems like a lot of work.

#!/bin/bash

IFS=$'\n'
while read line
do
    IFS=$' '
    arr=($line)
    pubkey=$(echo ${arr[-1]} | xxd -r -p | base64)
    left_part=$(echo -n "$line" | sed "s/Pubkey.*//")
    echo "$left_part" Pubkey: $pubkey
done < <(./bpftrace kprobe_wg_handshake.bt)

This wrapper script will split up each line that comes out of the bpftrace execution, take the last field, hex-decode it, base64-encode it, and reprint it.

Finally, we can spruce up the script a little bit to support both PATH and “local” bpftrace, and avoid parsing the “Attaching 1 probe” message…

#!/bin/bash

runcmd="bpftrace"
if ! command -v bpftrace &> /dev/null
then
    echo "bpftrace not on path. trying ./bpftrace"
    runcmd="./bpftrace"
fi

IFS=$'\n'
while read line
do
    if echo "$line" | grep -q "Attaching"; then
        echo "Starting Wireguard handshake logger."
        continue
    fi
    IFS=$' '
    arr=($line)
    pubkey=$(echo ${arr[-1]} | xxd -r -p | base64)
    left_part=$(echo -n "$line" | sed "s/Pubkey.*//")
    echo "$left_part" Pubkey: $pubkey
done < <($runcmd kprobe_wg_handshake.bt)

echo Exiting Wireguard handshake logger.

As a systemd Service

And at long last, to come back to the original goal of logging Wireguard connections, we can run the wrapper as a systemd service:

[Unit]
Description=Wireguard Handshake Logger
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/home/ubuntu/wireguard-handshake-logger
ExecStart=/home/ubuntu/wireguard-handshake-logger/wireguard-handshake-logger.sh
Restart=on-failure
ProtectSystem=yes
ProtectHome=read-only

[Install]
WantedBy=multi-user.target

(Sadly, bpftrace demands UID 0, CAP_BPF doesn’t work.)

This now logs to journald as expected:

May 11 05:07:52 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 1.2.3.4           Pubkey: DKpyuLmt+Cb3s18qp20in1vw2ESYdnbDeZGUlBYoSEw=
May 11 05:09:49 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 88.88.88.88           Pubkey: HMeFUZDq0FBcnL1ndHzfByrVv/UUsTRSLrR6yeHfWQU=
May 11 05:09:52 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 1.2.3.4           Pubkey: DKpyuLmt+Cb3s18qp20in1vw2ESYdnbDeZGUlBYoSEw=
May 11 05:11:49 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 88.88.88.88           Pubkey: HMeFUZDq0FBcnL1ndHzfByrVv/UUsTRSLrR6yeHfWQU=
May 11 05:11:52 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 1.2.3.4           Pubkey: DKpyuLmt+Cb3s18qp20in1vw2ESYdnbDeZGUlBYoSEw=
May 11 05:13:50 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 88.88.88.88           Pubkey: HMeFUZDq0FBcnL1ndHzfByrVv/UUsTRSLrR6yeHfWQU=
May 11 05:13:52 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 1.2.3.4           Pubkey: DKpyuLmt+Cb3s18qp20in1vw2ESYdnbDeZGUlBYoSEw=
May 11 05:15:52 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 1.2.3.4           Pubkey: DKpyuLmt+Cb3s18qp20in1vw2ESYdnbDeZGUlBYoSEw=
May 11 05:17:52 hostname wireguard-handshake-logger.sh[791642]: Sent handshake response          Interface: wg-firezone          Endpoint IP: 1.2.3.4           Pubkey: DKpyuLmt+Cb3s18qp20in1vw2ESYdnbDeZGUlBYoSEw=

… where promtail can collect it and forward it to Loki, and we can have a nice dashboard of endpoint IPs.

Future Work

We’re not done yet, though. There’s one blatant thing we’ve missed from our goal, and that’s that we’re printing out public keys instead of user names (or device names)! It’s still usable like this since you can look up pubkeys in the Firezone UI, but that’s not ideal. Firezone has an API which should be able to help accomplish this, but this post is already far too long; I’ll leave that for a later day.

For now, the code is available at https://github.com/nikolabura/wireguard-handshake-logger.