Under the Hood with Go TLS and eBPF

By Shaun Duncan

There are many reasons why you might want to leverage eBPF for plaintext inspection of TLS traffic, such as security monitoring, debugging, and traffic analysis. For applications that use OpenSSL, the formula is largely the same: attach probes to the entry and return of the SSL_read and SSL_write symbols in libssl.so, then process the unencrypted data however you please. Go, however, makes this straightforward approach completely unusable.

But why?

In this post we’ll discuss the challenges faced in instrumenting Go applications with eBPF probes, and present options for overcoming them in order to capture plaintext TLS payloads.


A Quick eBPF Refresher

eBPF is a feature of the Linux kernel that allows you to run sandboxed programs at a variety of kernel-space and user-space hooks, giving you the ability to observe, and even influence, system or application behavior with very little overhead.

There are many program types available that serve different purposes. For this post, we are most interested in BPF_PROG_TYPE_KPROBE, which provides the ability to attach to a user-space function in order to observe when it is called and when it returns; these attachments are called a uprobe and a uretprobe, respectively. This can be done for any ELF executable or shared object, so long as its symbols aren’t stripped.

Additionally, eBPF programs can, and usually do, make use of generic storage mechanisms called maps in order to share data between kernel-space eBPF programs or between kernel-space and user-space.

For more detailed information, the following are excellent resources:


How TLS Capture Usually Works (Non-Go Applications)

Many, if not most, applications terminate TLS through user-space libraries whose higher-level APIs implicitly handle the complexities of TLS. Because encryption and decryption happen in user-space, anything observed from kernel-space (for example, data passing through socket system calls) is already encrypted.

Let’s briefly revisit the OpenSSL pattern mentioned earlier. OpenSSL provides two functions, SSL_read and SSL_write, that allow applications to work only with plaintext data. By creating eBPF probes to attach to these functions’ entry and exit points, we’re able to observe plaintext data before it’s encrypted and after it’s decrypted, without the need for any complex setup involving trusted CA certificates or MITM proxies. Here’s a structural example of eBPF programs for OpenSSL using libbpf.

#include "vmlinux.h"  // see: https://docs.kernel.org/bpf/libbpf/libbpf_overview.html#bpf-co-re-compile-once-run-everywhere

#include <bpf/bpf_helpers.h>  // SEC()
#include <bpf/bpf_tracing.h>  // BPF_UPROBE(), BPF_URETPROBE()

char LICENSE[] SEC("license") = "GPL";

SEC("uprobe/SSL_read")
int BPF_UPROBE(ssl_read_call)
{
    // process SSL_read call
    return 0;
}

SEC("uretprobe/SSL_read")
int BPF_URETPROBE(ssl_read_return)
{
    // process SSL_read return
    return 0;
}

SEC("uprobe/SSL_write")
int BPF_UPROBE(ssl_write_call)
{
    // process SSL_write call
    return 0;
}

SEC("uretprobe/SSL_write")
int BPF_URETPROBE(ssl_write_return)
{
    // process SSL_write return
    return 0;
}

Note that in trivial cases like these, the uprobe and uretprobe are attached at specific, known locations: function entry and function return. Uprobes, however, aren’t limited to these locations. For reasons we will see in a moment, you can also attach a uprobe at a specific offset from the start of the function, effectively defining “custom” hooks within the function body.


Why Go is Different (and Tricky)

First and foremost, Go uses its own TLS implementation, the crypto/tls package, rather than linking against a library such as libssl.so, which preserves the statically linked nature of compiled binaries. As a result, there’s no one-size-fits-all solution for Go applications like there is for applications that dynamically link to libssl.so. This is made even more challenging by the fact that the functions we want to attach to, the Read and Write methods of *tls.Conn, live at different addresses in each compiled Go binary.

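Also, Go uses its own register-based calling convention rather than the C calling convention that tools such as libbpf’s PT_REGS_* macros assume. On top of that, goroutine stacks start small and grow by being copied to a new, larger location. A uretprobe works by replacing the function’s return address with a trampoline address, and the Go runtime can crash when it encounters that unexpected return address while copying or walking the stack. As a result, a normal uretprobe is unreliable, and effectively useless, for instrumenting Go applications.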

Although not necessarily a Go-specific limitation, target binaries must be unstripped. Building with go build -ldflags="-s" strips the symbol table from the compiled ELF binary, and without the symbol table there’s no way to look up a function’s address by name, so we can’t attach a uprobe to it. To demonstrate, take the following simple Go program that performs a single HTTPS request and prints the response:

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("https://httpbingo.org/uuid")
	if err != nil {
		log.Fatalf("failed to get page: %v", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("failed to read body: %v", err)
	}

	fmt.Println(string(body))
}

If the binary is built without -ldflags="-s", its symbol table can be inspected with objdump -t. For our use case, we want to find the Read and Write methods of *tls.Conn, but more on this later. If the binary is stripped, objdump has nothing useful to show:

$ objdump -t stripped-binary

stripped-binary:     file format elf64-littleaarch64

SYMBOL TABLE:
no symbols

This ABI or That ABI

Regarding the Go ABI, one thing that affects how you might support an arbitrary Go application is the Go version it was compiled with. This matters because the ABI used by the application you intend to monitor determines where you need to look for function arguments and return values (on the stack or in registers).

In recent years, Go changed its internal ABI from a Plan 9-influenced convention that passed function arguments purely on the stack (referred to as ABI0) to a hybrid approach that uses both registers and the stack, preferring registers (referred to as ABIInternal). The change was a marked performance improvement for Go applications; it was first released for amd64 in Go 1.17 and later extended to arm64 (and more) in Go 1.18.

The intricacies of both ABI versions are outside the scope of this post, but let’s assume we only want to support the newer ABIInternal convention, the details of which can be found in the specification.

If you are building a Go application to attach eBPF probes to other Go applications, you can use the debug/buildinfo package in the Go standard library to enforce version checks before you attempt to attach eBPF probes to a Go binary:

if info, err := buildinfo.ReadFile(pathToBinary); err != nil {
    // handle error (e.g. not a Go binary)
} else {
    // check info.GoVersion, a string like "go1.24.2", for ABI compatibility
}
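One way to enforce that check is to parse the GoVersion string and compare it against the release that introduced ABIInternal for the target architecture (Go 1.17 for amd64, Go 1.18 for arm64). The sketch below is a hypothetical helper, not part of any library: it only understands release strings of the form go1.N[.P] (including release candidates like go1.21rc2) and treats anything else as unsupported. The architecture name could come from the Machine field of the ELF header, as shown later in this post:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minABIInternalMinor maps a GOARCH-style name to the first Go 1.x minor
// release that compiled code for that architecture with ABIInternal.
// Only amd64 and arm64 are covered in this sketch.
var minABIInternalMinor = map[string]int{
	"amd64": 17,
	"arm64": 18,
}

// goMinorVersion extracts N from strings like "go1.24.2" or "go1.21rc2".
func goMinorVersion(goVersion string) (int, error) {
	if !strings.HasPrefix(goVersion, "go1.") {
		return 0, fmt.Errorf("unrecognized Go version %q", goVersion)
	}
	rest := strings.TrimPrefix(goVersion, "go1.")

	// Keep only the leading digits: "24.2" -> "24", "21rc2" -> "21".
	end := 0
	for end < len(rest) && rest[end] >= '0' && rest[end] <= '9' {
		end++
	}
	if end == 0 {
		return 0, fmt.Errorf("unrecognized Go version %q", goVersion)
	}
	return strconv.Atoi(rest[:end])
}

// usesABIInternal reports whether a binary built by goVersion for arch
// passes arguments and results in registers.
func usesABIInternal(goVersion, arch string) (bool, error) {
	minMinor, ok := minABIInternalMinor[arch]
	if !ok {
		return false, fmt.Errorf("unsupported architecture %q", arch)
	}
	minor, err := goMinorVersion(goVersion)
	if err != nil {
		return false, err
	}
	return minor >= minMinor, nil
}

func main() {
	for _, v := range []string{"go1.16.15", "go1.17.13", "go1.24.2"} {
		ok, err := usesABIInternal(v, "arm64")
		fmt.Printf("%s on arm64: ABIInternal=%v err=%v\n", v, ok, err)
	}
}
```

Rejecting unrecognized version strings, rather than guessing, keeps the probes from reading arguments out of the wrong registers on an unexpected binary.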

Attaching to Go Symbols for TLS

As previously mentioned, Go binaries are statically linked and utilize an internal TLS implementation rather than dynamically linking to libssl.so (or another TLS library). Although it’s a bit of an inconvenience, the approach is still the same: attach to the address of the symbol we want to instrument. The difference is that we have to do this on a per-binary basis rather than once per shared object.

For TLS instrumentation, we’re primarily interested in two symbols: the Read and Write methods of *tls.Conn. These correspond to the symbols crypto/tls.(*Conn).Read and crypto/tls.(*Conn).Write, respectively. You can see them using objdump:

$ objdump -t your-binary | grep -E 'crypto/tls.\(\*Conn\).(Read|Write)$'
0000000000173070 g     F .text  0000000000000710              crypto/tls.(*Conn).Write
0000000000174560 g     F .text  0000000000000380              crypto/tls.(*Conn).Read

You can also do this programmatically in Go by using the debug/elf package in the standard library:

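f, err := elf.Open(pathToBinary)
if err != nil {
    // handle error
}
defer f.Close()

symbols, err := f.Symbols()
if err != nil {
    // handle error (elf.ErrNoSymbols means the binary is stripped)
}

for _, symbol := range symbols {
    switch symbol.Name {
    case "crypto/tls.(*Conn).Read", "crypto/tls.(*Conn).Write":
        fmt.Printf("%s at %#x (%d bytes)\n", symbol.Name, symbol.Value, symbol.Size)
    }
}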

However, Cilium’s eBPF package for Go makes both the symbol lookup and the attachment much more ergonomic via its link subpackage:

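ex, err := link.OpenExecutable(targetPath)
if err != nil { /* handle error */ }
up, err := ex.Uprobe("crypto/tls.(*Conn).Read", ebpfEntryProg, nil)
if err != nil { /* handle error */ }
defer up.Close() // up is a link.Link; closing it detaches the probe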

What About uretprobe?

A uretprobe is practically unusable for Go applications for the reasons outlined above. So how do we work around this limitation? Answering that means getting our hands a little dirty with assembly, mostly so that we can locate every RET instruction. The goal is to create our own “custom” function return probes for the Go application we want to monitor.

Let’s take a quick peek at the disassembled contents of the crypto/tls.(*Conn).Read symbol in a Go ELF binary via a partial output of objdump -d:

0000000000174560 <crypto/tls.(*Conn).Read>:
  174560:       f9400b90        ldr     x16, [x28, #16]
  174564:       eb3063ff        cmp     sp, x16
  174568:       54001aa9        b.ls    1748bc <crypto/tls.(*Conn).Read+0x35c>  // b.plast
  17456c:       f8190ffe        str     x30, [sp, #-112]!
  174570:       f81f83fd        stur    x29, [sp, #-8]
  174574:       d10023fd        sub     x29, sp, #0x8
  174578:       f90033ff        str     xzr, [sp, #96]
  17457c:       f9003fe0        str     x0, [sp, #120]
  174580:       f90047e2        str     x2, [sp, #136]
  174584:       f90043e1        str     x1, [sp, #128]
  174588:       3900bfff        strb    wzr, [sp, #47]
  17458c:       f9001bff        str     xzr, [sp, #48]
  174590:       a9047fff        stp     xzr, xzr, [sp, #64]
  174594:       d503201f        nop
  174598:       d0000e81        adrp    x1, 346000 <go:itab.*crypto/internal/fips140/aes.CBCEncrypter,crypto/cipher.BlockMode+0x8>
  17459c:       91214021        add     x1, x1, #0x850
  1745a0:       90001ee2        adrp    x2, 550000 <runtime.mheap_+0x189a0>
  1745a4:       91100042        add     x2, x2, #0x400
... snipped ...
  174650:       f9401be0        ldr     x0, [sp, #48]
  174654:       aa1f03e1        mov     x1, xzr
  174658:       aa1f03e2        mov     x2, xzr
  17465c:       f85f83fd        ldur    x29, [sp, #-8]
  174660:       f84707fe        ldr     x30, [sp], #112
  174664:       d65f03c0        ret
  174668:       f9001bff        str     xzr, [sp, #48]
  17466c:       f90023e0        str     x0, [sp, #64]
  174670:       f90027e1        str     x1, [sp, #72]
  174674:       f9401be3        ldr     x3, [sp, #48]
  174678:       aa0103e2        mov     x2, x1
  17467c:       aa0003e1        mov     x1, x0
  174680:       aa0303e0        mov     x0, x3
  174684:       f85f83fd        ldur    x29, [sp, #-8]
  174688:       f84707fe        ldr     x30, [sp], #112

If you look closely, there’s a specific instruction to take note of: ret. It appears at several points in the Read function (and also in the Write function). More importantly, we can use it to our advantage and mimic the behavior of a uretprobe by attaching a plain uprobe at each ret instruction, using its offset from the start of the function symbol.

First, let’s take a look at how we can programmatically discover ret instructions when we want to monitor a Go application in our user-space program. There are two Go packages that can be extremely helpful here, depending on the architecture you are targeting: golang.org/x/arch/x86/x86asm for amd64 binaries and golang.org/x/arch/arm64/arm64asm for arm64 binaries.

The approach for both is largely the same, differing only slightly in how instruction lengths are accounted for: arm64 instructions are always 4 bytes, whereas amd64 (x86_64) instructions vary from 1 to 15 bytes. In both cases, we first need to extract the symbol’s bytes from its section in the ELF binary:

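import (
    "debug/elf"
    "fmt"
)

// getELFSymbolData returns the machine code bytes of the named function symbol.
func getELFSymbolData(binaryPath, symbolName string) ([]byte, error) {
    f, err := elf.Open(binaryPath)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    symbols, err := f.Symbols()
    if err != nil {
        return nil, err
    }

    for _, symbol := range symbols {
        // "crypto/tls.(*Conn).Read" or "crypto/tls.(*Conn).Write"
        if symbol.Name != symbolName {
            continue
        }

        // reject undefined or special section indices (SHN_UNDEF, SHN_ABS, ...)
        if symbol.Section == elf.SHN_UNDEF || int(symbol.Section) >= len(f.Sections) {
            return nil, fmt.Errorf("symbol %q has no section data", symbolName)
        }
        section := f.Sections[symbol.Section]

        // the symbol is part of a larger section, so read only the bytes belonging to it
        start := symbol.Value - section.Addr
        data := make([]byte, symbol.Size)
        if _, err := section.ReadAt(data, int64(start)); err != nil {
            return nil, err
        }

        return data, nil
    }

    return nil, fmt.Errorf("symbol %q not found", symbolName)
}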

Now comes the architecture-dependent disassembly work to find the offsets of the ret instructions. Starting with arm64, which has 4-byte opcodes as mentioned earlier:

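import "golang.org/x/arch/arm64/arm64asm"

// getArm64ReturnOffsets returns the offsets of RET instructions relative to
// the start of the function's machine code.
func getArm64ReturnOffsets(data []byte) ([]int, error) {
    var offsets []int

    // arm64 instructions are fixed-width, so step through the code 4 bytes at a time
    for i := 0; i+4 <= len(data); i += 4 {
        instruction, err := arm64asm.Decode(data[i : i+4])
        if err != nil {
            // not a decodable instruction (e.g. padding); the next word is still aligned
            continue
        }

        if instruction.Op == arm64asm.RET {
            offsets = append(offsets, i)
        }
    }

    return offsets, nil
}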

Moving on to amd64 (x86_64), whose opcode sizes are variable 1-15 bytes:

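import (
    "fmt"

    "golang.org/x/arch/x86/x86asm"
)

// getAmd64ReturnOffsets returns the offsets of RET instructions relative to
// the start of the function's machine code.
func getAmd64ReturnOffsets(data []byte) ([]int, error) {
    var offsets []int

    for i := 0; i < len(data); {
        instruction, err := x86asm.Decode(data[i:], 64)
        if err != nil {
            // instructions are variable-length, so we can't safely resynchronize
            return nil, fmt.Errorf("decoding instruction at offset %#x: %w", i, err)
        }

        // record the offset of the RET itself, before advancing past it
        if instruction.Op == x86asm.RET {
            offsets = append(offsets, i)
        }

        i += instruction.Len
    }

    return offsets, nil
}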

Again, the two are nearly identical, but you’ll need to know the architecture of the ELF binary you’re dealing with. That’s easily done by checking the Machine field of the *elf.File returned by elf.Open(): it will be elf.EM_AARCH64 for arm64 or elf.EM_X86_64 for amd64.

Now that we have these offset values, what do we do with them? Once again, Cilium’s ebpf package makes this very easy. Putting everything together to attach both our function entry uprobe and the ret specific ones, we might have something that looks like the following:

import (
    "debug/elf"
    "fmt"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/link"
)

// loaded elsewhere, e.g. from your compiled eBPF object
var (
    uprobeEntryProgram *ebpf.Program
    uprobeExitProgram  *ebpf.Program
)

func attachGo(binaryPath string) ([]link.Link, error) {
    const symbol = "crypto/tls.(*Conn).Read" // or "crypto/tls.(*Conn).Write"

    var links []link.Link

    ex, err := link.OpenExecutable(binaryPath)
    if err != nil {
        return links, err
    }

    // attach the entry uprobe
    l, err := ex.Uprobe(symbol, uprobeEntryProgram, nil)
    if err != nil {
        return links, err
    }
    links = append(links, l)

    // get the function's machine code so we can find its RET instructions
    data, err := getELFSymbolData(binaryPath, symbol)
    if err != nil {
        return links, err
    }

    f, err := elf.Open(binaryPath)
    if err != nil {
        return links, err
    }
    defer f.Close()

    var offsets []int

    switch f.Machine {
    case elf.EM_AARCH64:
        offsets, err = getArm64ReturnOffsets(data)
    case elf.EM_X86_64:
        offsets, err = getAmd64ReturnOffsets(data)
    default:
        return links, fmt.Errorf("unsupported architecture: %v", f.Machine)
    }
    if err != nil {
        return links, err
    }

    // attach a uprobe at each RET offset to emulate a uretprobe
    for _, offset := range offsets {
        opts := &link.UprobeOptions{
            Offset: uint64(offset), // relative to the symbol's address
        }

        l, err := ex.Uprobe(symbol, uprobeExitProgram, opts)
        if err != nil {
            return links, fmt.Errorf("attaching uprobe at %s+%#x: %w", symbol, offset, err)
        }

        links = append(links, l)
    }

    return links, nil
}

And we’re done! We’ve now attached our TLS probes at the entry and at every return of (*tls.Conn).Read. Keep in mind, though, that the example above attaches to every process running the executable at binaryPath. You can limit this to a single process by setting the PID field of link.UprobeOptions.


But Wait, There’s More (C Code)!

So far you’ve seen only a rough skeleton of the eBPF C code, followed by Go code that attaches uprobe programs to a specific binary, but that’s only part of the picture. The real work happens on the eBPF side, in the probes themselves. The approach is largely formulaic: in the function entry uprobe, save any pointers you’ll need after the function returns in a map; then, in the return probe, fetch those pointers from the map and process the data as needed. Let’s start with the entry uprobe for Read(), keeping in mind that we are limiting our program to ABIInternal binaries built with Go >= 1.18:

#include "vmlinux.h"  // see: https://docs.kernel.org/bpf/libbpf/libbpf_overview.html#bpf-co-re-compile-once-run-everywhere

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";  // needed by GPL-only helpers such as bpf_probe_read_user()

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __type(key, __u64);
    __type(value, uintptr_t);
    __uint(max_entries, 1024);  // or whatever suits your needs
} go_read_map SEC(".maps");

SEC("uprobe/go_tls_conn_read")
int BPF_UPROBE(go_tls_conn_read_call)
{
    // store the pointer to the []byte argument for later. refer to the abi specification for details on
    // register layout: https://go.googlesource.com/go/+/refs/heads/master/src/cmd/compile/abi-internal.md
    uintptr_t byte_slice_ptr = (uintptr_t)GO_REGS_PARM2(ctx);

    // identifier
    __u64 pid_tgid = bpf_get_current_pid_tgid();

    bpf_map_update_elem(&go_read_map, &pid_tgid, &byte_slice_ptr, BPF_ANY);
    
    return 0;
}

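The entry uprobe is really just about setting things up for the return probe. Keep in mind that keying go_read_map by pid_tgid assumes the goroutine returns on the same OS thread it entered on; because the Go scheduler can resume a blocked goroutine on a different thread, a more robust key is the current goroutine pointer, which ABIInternal keeps in R14 on amd64 and R28 on arm64. One other thing you may have noticed here is GO_REGS_PARM2, which isn’t in any kernel eBPF header or libbpf header. It’s a utility macro I’ve set up to select the correct register for the architecture the eBPF program is compiled for. Here’s what it looks like: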

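#include <bpf/bpf_tracing.h>  // __PT_REGS_CAST, __PT_PARMn_REG

#if defined(__TARGET_ARCH_x86)
// ABIInternal integer registers on amd64: RAX, RBX, RCX, RDI, ...
#define __GO_PARM1_REG ax
#define __GO_PARM2_REG bx
#elif defined(__TARGET_ARCH_arm64)
// ABIInternal integer registers on arm64: R0, R1, R2, ... (these match the C ABI)
#define __GO_PARM1_REG __PT_PARM1_REG
#define __GO_PARM2_REG __PT_PARM2_REG
#else
#error "GO_REGS_PARM* only supports x86 and arm64"
#endif

#define GO_REGS_PARM1(x) (__PT_REGS_CAST(x)->__GO_PARM1_REG)
#define GO_REGS_PARM2(x) (__PT_REGS_CAST(x)->__GO_PARM2_REG)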
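Where do bx and __PT_PARM2_REG come from? Under ABIInternal, integer arguments are assigned to a fixed sequence of registers (RAX, RBX, RCX, RDI, RSI, R8, R9, R10, R11 on amd64; R0 through R15 on arm64), and multi-word values such as slices consume consecutive registers. For (*Conn).Read(b []byte), the receiver takes the first register and the slice’s pointer, length, and capacity take the next three, so the data pointer lands in the second register. Results are assigned starting from the first register again. The sketch below, with the hypothetical helper assignIntRegs, models only this simple all-integer case; the full algorithm in the ABI specification also handles floating-point values, structs, and stack spilling:

```go
package main

import "fmt"

// Integer argument/result registers used by Go's ABIInternal, in order.
var abiIntRegs = map[string][]string{
	"amd64": {"RAX", "RBX", "RCX", "RDI", "RSI", "R8", "R9", "R10", "R11"},
	"arm64": {"R0", "R1", "R2", "R3", "R4", "R5", "R6", "R7",
		"R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"},
}

// assignIntRegs assigns registers to a list of values, where each entry is the
// number of pointer-sized integer words the value occupies (1 for a pointer or
// int, 2 for an interface, 3 for a slice). It returns one register list per
// value, or ok=false if the values don't all fit (the real ABI would then
// place some of them on the stack).
func assignIntRegs(arch string, words []int) (assigned [][]string, ok bool) {
	regs := abiIntRegs[arch]
	next := 0
	for _, w := range words {
		if next+w > len(regs) {
			return nil, false
		}
		assigned = append(assigned, regs[next:next+w])
		next += w
	}
	return assigned, true
}

func main() {
	// Arguments of (*tls.Conn).Read: receiver (1 word), b []byte (3 words).
	args, _ := assignIntRegs("amd64", []int{1, 3})
	fmt.Println("args:", args) // args: [[RAX] [RBX RCX RDI]]

	// Results: n int (1 word), err error (2 words); assignment restarts at the first register.
	results, _ := assignIntRegs("arm64", []int{1, 2})
	fmt.Println("results:", results) // results: [[R0] [R1 R2]]
}
```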

Moving on, the real work lies in processing the return. At this point, we need to retrieve the []byte pointer we stored when the function was called and process the bytes that were read or written, a count given by the first return value, n. Because Go can return multiple values in registers, the libbpf macro PT_REGS_RC, which describes the single C return-value register, isn’t a general solution. Instead, it’s more reliable to refer to Go’s ABIInternal spec and pull the return values we need from registers (or the stack). Luckily for us, n is in register R0 on arm64 and RAX on amd64 (x86_64), the same registers GO_REGS_PARM1 reads, since Go assigns results starting from the first register:

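// attached as a plain uprobe at each RET offset, not as a uretprobe
SEC("uprobe/go_tls_conn_read")
int BPF_UPROBE(go_tls_conn_read_return)
{
    // identifier
    __u64 pid_tgid = bpf_get_current_pid_tgid();

    uintptr_t* byte_slice_ptr = bpf_map_lookup_elem(&go_read_map, &pid_tgid);
    if (!byte_slice_ptr)
        return 0;

    // copy the saved pointer, then drop the map entry so it doesn't go stale
    uintptr_t buf_addr = *byte_slice_ptr;
    bpf_map_delete_elem(&go_read_map, &pid_tgid);

    // get the return value "n" (a signed Go int). only process if more than 0 bytes were read
    long n = (long)GO_REGS_PARM1(ctx);
    if (n <= 0)
        return 0;

    // process the n bytes starting at (void *)buf_addr
    
    return 0;
}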

Processing the data is a little more involved. eBPF programs are limited to a 512-byte stack by design, so buffering and processing payloads on the eBPF side is out of the question. Instead, the path forward is to send the data from the kernel-space eBPF program to a user-space program that doesn’t have the same strict limitations. This is commonly done with a specific type of eBPF map: BPF_MAP_TYPE_PERF_EVENT_ARRAY. Although the name can be misleading, since we’re dealing with data rather than events, it lets eBPF programs send larger amounts of data from kernel-space to user-space. With that in mind, we can configure a perf event map that will allow us to send chunks of TLS data to user-space.

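struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    // keys are CPU indices and values are perf event FDs, so both are 32-bit;
    // the payload type is set by each bpf_perf_event_output() call instead
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
    // max_entries omitted: libbpf and cilium/ebpf size it to the number of CPUs
} go_tls_chunk_map SEC(".maps");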

A caveat to this is that a perf event array is indexed by CPU rather than by an identifier such as pid_tgid, so chunks from different threads, processes, or connections arrive interleaved in user-space. For the sake of simplicity, let’s assume that we’re only monitoring a single PID that isn’t handling multiple concurrent TLS operations (supporting that would require struct chunk to include additional context identifying the specific stream and whether the data came from Read() or Write()).
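If you do extend struct chunk with that kind of context, a natural field to include is the 64-bit value returned by bpf_get_current_pid_tgid(), whose upper 32 bits hold the thread group ID (what user-space calls the process ID) and whose lower 32 bits hold the ID of the current thread. A minimal sketch of splitting it apart again in user-space, where splitPidTgid is a hypothetical helper name:

```go
package main

import "fmt"

// splitPidTgid splits a bpf_get_current_pid_tgid() value into the process ID
// (thread group ID, upper 32 bits) and the thread ID (lower 32 bits).
func splitPidTgid(v uint64) (pid, tid uint32) {
	return uint32(v >> 32), uint32(v)
}

func main() {
	pid, tid := splitPidTgid(uint64(4242)<<32 | 4250)
	fmt.Println(pid, tid) // 4242 4250
}
```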

So why am I referring to the map value as a “chunk”? Recall that eBPF programs are tightly sandboxed, with strict requirements on what they can and cannot do. They cannot allocate memory dynamically, so malloc() is out of the question, and a buffer of any useful size won’t fit on the 512-byte stack either. As it turns out, there’s a map “trick” for this: create a single-entry per-CPU array map that pre-allocates a struct chunk containing a char array of a known size:

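#define CHUNK_SIZE 1024

// len is the number of valid bytes in buf
struct chunk { __u32 len; char buf[CHUNK_SIZE]; };

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __type(key, __u32);
    __type(value, struct chunk);
    __uint(max_entries, 1);
} chunk_map SEC(".maps");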

The idea is to create chunks of data and send them to user-space. As a side note, this can be a hurdle by itself, because the eBPF verifier must be able to prove that a program terminates before it will load it. Historically, this meant loops had to be unrolled or bounded by a maximum number of iterations the verifier could determine. Kernel 6.4 added open-coded iterators, which make it easier to write loops whose bounds the verifier can’t otherwise prove. The details of handling the iteration are outside the scope of this post, so we’ll assume that what we have is acceptable to the eBPF verifier.

Let’s proceed to processing our data in the return probe. We have the pointer to the []byte function argument, the return value n (the number of bytes read or written), a perf event array to push chunks to, and a single-entry per-CPU array of pre-allocated space to hold data chunks. Here’s what we do:

#define CHUNK_SIZE 1024
#define MAX_CHUNKS 512
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

SEC("uprobe/go_tls_conn_read")
int BPF_UPROBE(go_tls_conn_read_return)
{
    // identifier
    __u64 pid_tgid = bpf_get_current_pid_tgid();

    uintptr_t* byte_slice_ptr = bpf_map_lookup_elem(&go_read_map, &pid_tgid);
    if (!byte_slice_ptr)
        return 0;

    // copy the saved []byte data pointer (not the map value's address), then drop the entry
    void* data = (void*)*byte_slice_ptr;
    bpf_map_delete_elem(&go_read_map, &pid_tgid);

    // get the return value "n" (a signed Go int). only process if more than 0 bytes were read
    long n = (long)GO_REGS_PARM1(ctx);
    if (n <= 0)
        return 0;

    // process the data: ceil(n / CHUNK_SIZE) chunks, capped at MAX_CHUNKS
    __u32 num_chunks = MIN((n + CHUNK_SIZE - 1) / CHUNK_SIZE, MAX_CHUNKS);

    // get pre-allocated chunk buffer
    __u32 zero = 0;
    struct chunk* c = bpf_map_lookup_elem(&chunk_map, &zero);
    if (!c)
        return 0;

    long offset = 0;

    for (__u32 i = 0; i < num_chunks && i < MAX_CHUNKS; i++) {
        long chunk_size = MIN(n - offset, CHUNK_SIZE);
        // explicit bounds check so the verifier knows the read fits in c->buf
        if (chunk_size <= 0 || chunk_size > CHUNK_SIZE)
            break;

        c->len = chunk_size;

        if (bpf_probe_read_user(c->buf, chunk_size, data + offset) < 0)
            break;

        if (bpf_perf_event_output(ctx, &go_tls_chunk_map, BPF_F_CURRENT_CPU, c, sizeof(struct chunk)) < 0)
            break;

        offset += chunk_size;
    }
    
    return 0;
}

At this point, we should be able to consume the plaintext payloads in our user-space Go application:

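import (
    "bytes"
    "encoding/binary"
    "errors"
    "os"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/perf"
)

// Chunk mirrors the C struct chunk. If you use bpf2go, you'd typically use
// the generated type instead (e.g. generatedPkg.Chunk).
type Chunk struct {
    Len uint32
    Buf [1024]byte
}

// m is the go_tls_chunk_map perf event array
func readPerfEvents(m *ebpf.Map) error {
    // the buffer size argument is per CPU
    perfReader, err := perf.NewReader(m, os.Getpagesize())
    if err != nil {
        return err
    }
    defer perfReader.Close()

    var record perf.Record

    for {
        if err := perfReader.ReadInto(&record); err != nil {
            if errors.Is(err, perf.ErrClosed) {
                return nil
            }
            return err
        }

        // LostSamples > 0 means the kernel dropped samples because the buffer was full
        if record.LostSamples > 0 || len(record.RawSample) == 0 {
            continue
        }

        // amd64 and arm64 Linux are little-endian
        var chunk Chunk
        if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &chunk); err != nil {
            continue
        }

        if int(chunk.Len) > len(chunk.Buf) {
            continue
        }
        payload := chunk.Buf[:chunk.Len]

        // do what you wish with the payload
        _ = payload
    }
}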

Wrapping Up

To summarize what we’ve covered about the TLS capture process for Go applications:

  1. Go applications present a unique challenge for eBPF TLS capture.
  2. Supporting two different Go ABIs is tricky, but limiting support to Go >= 1.18 simplifies things.
  3. Go applications you want to monitor must be unstripped.
  4. A simple uprobe works fine but not a uretprobe. For the TLS symbols you want to monitor (crypto/tls.(*Conn).Read or crypto/tls.(*Conn).Write), disassemble the symbol’s data and walk through it to find RET instructions. Attach a uprobe at the offsets of those RET instructions relative to the beginning of the function.
  5. Set up a pointer handoff from the function entry uprobe to the return uprobe.
  6. Retrieve the stored pointer in the return uprobe, then process the data by splitting it into chunks and sending them to user-space by way of a perf event array.
  7. Pull entries from the perf event array in user-space and process them as you need to.

Instrumenting Go applications with eBPF for TLS capture is significantly more involved than the typical OpenSSL approach, but it’s far from impossible. The complexity stems primarily from Go’s design decisions (e.g., static linking and dynamic stack sizing), which prioritize performance and operational simplicity over instrumentation convenience.

The techniques covered here amount to a compromise between the eBPF world and the Go runtime. The hurdles are unlikely to go away, and yes, you’ll need to handle some disassembly and manual probe attachment, but the core patterns are reusable across different Go applications.

That having been said, the juice is worth the squeeze if you need deep visibility into TLS traffic without the use of a proxy-based solution. Just remember: Go applications you want to monitor must be unstripped.