LLVM Alloca in Loops: The Silent Stack Overflow

The problem

You’re building a compiler that emits LLVM IR. Everything works in tests with small inputs, but when the program tries to process 500 thousand elements in a loop, the process dies with SIGSEGV (exit code 139) — no error message, no stack trace, nothing.

The culprit is an alloca instruction emitted inside the body of a loop.

What is alloca

In LLVM IR, alloca reserves space on the stack frame of the current function. It’s the equivalent of declaring a local variable in C:

define i32 @example() {
entry:
  %x = alloca i32, align 4    ; reserves 4 bytes on the stack
  store i32 42, ptr %x, align 4
  %val = load i32, ptr %x, align 4
  ret i32 %val
}

The crucial detail is that memory allocated by alloca is only freed when the function returns. There is no free for the stack — everything is reclaimed at once in the function epilogue, when the stack pointer is restored.

In straight-line code this is perfectly safe. The problem appears when the same alloca instruction is executed repeatedly within a single call.

The bug: alloca inside a loop

Consider a code generator that needs to create a temporary variable for each iteration of a loop. If the alloca is emitted inside the loop body, each iteration consumes more stack without ever freeing it:

; BUG: alloca inside the loop — stack grows with each iteration
define void @fill_vector(ptr %vector, i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [0, %entry], [%next, %loop]
  %tmp = alloca i32, align 4         ; ← a new stack slot on every iteration
  store i32 1, ptr %tmp, align 4
  call void @vector_push(ptr %vector, ptr %tmp)
  %next = add i32 %i, 1
  %done = icmp eq i32 %next, %n
  br i1 %done, label %end, label %loop

end:
  ret void
}

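With n = 800,000, the loop requests 3.2 MB of i32 slots (800,000 x 4 bytes), but each allocation is typically rounded up to the stack alignment, which is 16 bytes on x86-64. The real cost is therefore closer to 12.8 MB (800,000 x 16 bytes), well past the default 8 MB stack limit on typical Linux systems. The process receives SIGSEGV and dies silently.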

The behavior is treacherous because with n = 100 it works perfectly. The bug only manifests with inputs large enough to exceed the stack limit.
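A back-of-the-envelope calculation shows how quickly the stack runs out. The sketch below is only a rough model: it assumes every executed alloca moves the stack pointer by the allocation size rounded up to the stack alignment (16 bytes on x86-64; other targets may differ), and it ignores everything else on the stack. The function names are made up for illustration.

```typescript
// Rough model of stack growth caused by a fixed-size alloca executed in a loop.
// Assumption: each allocation is rounded up to the stack alignment
// (16 bytes on x86-64). Frames, spills and other locals are ignored.

function roundUp(value: number, multiple: number): number {
  return Math.ceil(value / multiple) * multiple;
}

// Total bytes consumed after `iterations` executions of the alloca.
function stackBytesUsed(iterations: number, allocSize: number, stackAlign: number = 16): number {
  return iterations * roundUp(allocSize, stackAlign);
}

// Largest iteration count that still fits within `stackLimitBytes`.
function maxIterations(stackLimitBytes: number, allocSize: number, stackAlign: number = 16): number {
  return Math.floor(stackLimitBytes / roundUp(allocSize, stackAlign));
}

const MiB = 1024 * 1024;
console.log(stackBytesUsed(800_000, 4)); // 12800000 bytes, about 12.8 MB
console.log(maxIterations(8 * MiB, 4));  // 524288 iterations fit in 8 MiB
```

Under these assumptions the crash threshold for a single i32 temporary sits at roughly half a million iterations, which is why small test inputs never trigger it.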

How to diagnose

When you see an unexplained segfault in LLVM-generated code, especially with large inputs, follow these steps:

1. Compile with AddressSanitizer

clang -fsanitize=address program.ll -o program
./program

If the issue is stack overflow, ASan reports it clearly:

ERROR: AddressSanitizer: stack-overflow on address 0x7ffc2a400000

2. Inspect the generated .ll file

Look for alloca instructions outside the entry block. Any alloca inside a block that can be reached more than once (such as a loop body) is suspect:

# List every alloca with its line number, then check which block each one is in
grep -n "alloca" program.ll

If an alloca appears in any block other than the function's first block (the entry block, whatever its label), investigate whether that block is part of a loop.
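Checking each grep hit by hand gets tedious on large modules. The following is a minimal sketch of a script that does the block bookkeeping automatically. It is a simplified textual scan, not a real IR parser: it assumes one instruction per line, unquoted block labels, and a closing brace on its own line. The name findStrayAllocas is hypothetical.

```typescript
// Hypothetical helper: report allocas that sit outside a function's entry block.
// Simplified line-based scan of textual LLVM IR, not a full parser.

interface StrayAlloca {
  func: string;  // enclosing function name
  block: string; // label of the block containing the alloca
  line: number;  // 1-based line number in the input text
}

function findStrayAllocas(ir: string): StrayAlloca[] {
  const found: StrayAlloca[] = [];
  const lines = ir.split("\n");
  let func: string | null = null;
  let inEntry = true;       // still inside the first block of the function?
  let entryStarted = false; // has the entry block shown a label or instruction yet?
  let block = "entry";

  for (let i = 0; i < lines.length; i++) {
    const line = (lines[i] ?? "").trim();
    if (line === "" || line.startsWith(";")) continue; // skip blanks and comments

    const def = line.match(/^define\b[^@]*@([\w.$-]+)/);
    if (def) {
      func = def[1] ?? "";
      inEntry = true;
      entryStarted = false;
      block = "entry";
      continue;
    }
    if (func === null) continue; // outside any function body
    if (line === "}") {
      func = null;
      continue;
    }

    const label = line.match(/^([\w.$-]+):/);
    if (label) {
      // A label seen after the entry block has started opens a later block.
      if (entryStarted) inEntry = false;
      entryStarted = true;
      block = label[1] ?? "";
      continue;
    }

    entryStarted = true; // any instruction means the entry block has begun
    if (!inEntry && /=\s*alloca\b/.test(line)) {
      found.push({ func, block, line: i + 1 });
    }
  }
  return found;
}

const sample = [
  "define void @f(i32 %n) {",
  "entry:",
  "  %a = alloca i32, align 4",
  "  br label %loop",
  "loop:",
  "  %tmp = alloca i32, align 4",
  "  br label %loop",
  "}",
].join("\n");

console.log(findStrayAllocas(sample)); // one hit: function "f", block "loop", line 6
```

A hit does not prove the block is inside a loop, but every reported alloca is a candidate worth moving.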

3. Compare small N vs large N

If the program works with N=100 but crashes with N=100000, suspect memory use that grows with N. Uncontrolled stack growth is a leading candidate.

The fix

The solution is to move every fixed-size alloca to the entry block of the function. Since the entry block executes exactly once per call, the stack allocation happens only once, and the same space is reused on each iteration:

; CORRECT: alloca in the entry block, reused each iteration
define void @fill_vector(ptr %vector, i32 %n) {
entry:
  %tmp = alloca i32, align 4         ; ← allocated only once
  br label %loop

loop:
  %i = phi i32 [0, %entry], [%next, %loop]
  store i32 1, ptr %tmp, align 4     ; reuses the same slot
  call void @vector_push(ptr %vector, ptr %tmp)
  %next = add i32 %i, 1
  %done = icmp eq i32 %next, %n
  br i1 %done, label %end, label %loop

end:
  ret void
}

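In a code generator, the typical implementation is a helper that saves the current insertion point, moves to the start of the entry block, emits the alloca, and restores the original insertion point. Inserting at the start matters: by the time loop code is being generated, the entry block usually already ends in a terminator, and an instruction appended after a terminator makes the IR invalid:

function createAllocaInEntryBlock(
  func: llvm.Function,
  builder: llvm.IRBuilder,
  type: llvm.Type,
  name: string
): llvm.AllocaInst {
  const entryBlock = func.getEntryBlock();
  const currentBlock = builder.getInsertBlock();

  // Move to the start of the entry block, ahead of any existing terminator
  // (assumes the binding exposes begin() alongside end())
  builder.setInsertionPoint(entryBlock, entryBlock.begin());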

  const alloca = builder.createAlloca(type, name);

  // Restore the original insertion point
  if (currentBlock) {
    builder.setInsertionPoint(currentBlock, currentBlock.end());
  }

  return alloca;
}

This is the same approach as the CreateEntryBlockAlloca helper in the Kaleidoscope tutorial, and mature compilers like Clang also place their local-variable allocas in the entry block.
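The same idea can be illustrated without any LLVM binding. The sketch below is a toy text-based emitter, a hypothetical stand-in for a real IR builder: allocas are collected in their own list and always printed at the top of the entry block, no matter which block is being filled when the slot is requested. The class and method names are invented for this example.

```typescript
// Toy text-based emitter (hypothetical, not an LLVM binding). It shows the
// pattern only: allocas always land at the top of the entry block.

interface Block {
  label: string;
  body: string[];
}

class FunctionEmitter {
  private readonly signature: string;
  private readonly entry: Block = { label: "entry", body: [] };
  private readonly blocks: Block[] = [this.entry];
  private readonly entryAllocas: string[] = [];
  private current: Block = this.entry;

  constructor(signature: string) {
    this.signature = signature;
  }

  // Start a new basic block; later emit() calls append to it.
  startBlock(label: string): void {
    this.current = { label, body: [] };
    this.blocks.push(this.current);
  }

  // Append an instruction to the block currently being filled.
  emit(instruction: string): void {
    this.current.body.push(instruction);
  }

  // Reserve a stack slot in the entry block without touching the current
  // insertion point. Returns the register that holds the slot's address.
  createEntryAlloca(type: string, name: string): string {
    const reg = `%${name}.${this.entryAllocas.length}`;
    this.entryAllocas.push(`${reg} = alloca ${type}`);
    return reg;
  }

  toString(): string {
    const out: string[] = [`define ${this.signature} {`];
    for (const block of this.blocks) {
      out.push(`${block.label}:`);
      if (block === this.entry) {
        // Allocas are printed before everything else in the entry block.
        out.push(...this.entryAllocas.map((a) => "  " + a));
      }
      out.push(...block.body.map((instr) => "  " + instr));
    }
    out.push("}");
    return out.join("\n");
  }
}

const fn = new FunctionEmitter("void @fill_vector(ptr %vector, i32 %n)");
fn.emit("br label %loop");
fn.startBlock("loop");
fn.emit("%i = phi i32 [0, %entry], [%next, %loop]");
const tmp = fn.createEntryAlloca("i32", "tmp"); // requested while emitting the loop body
fn.emit(`store i32 1, ptr ${tmp}`);
fn.emit(`call void @vector_push(ptr %vector, ptr ${tmp})`);
fn.emit("%next = add i32 %i, 1");
fn.emit("%done = icmp eq i32 %next, %n");
fn.emit("br i1 %done, label %end, label %loop");
fn.startBlock("end");
fn.emit("ret void");
console.log(fn.toString());
```

Although the slot is requested in the middle of the loop body, the printed function has its only alloca right after the entry label and before the branch into the loop.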

Why -O2 won’t save you

It's tempting to think that the LLVM optimizer would fix this automatically. It won't. The passes that turn stack slots into SSA registers, mem2reg and SROA, only consider allocas in the entry block, so an alloca in a loop body is never even a candidate. In this example promotion would be impossible in any case, because the variable's address escapes the function when it is passed to an external call:

call void @vector_push(ptr %vector, ptr %tmp)  ; address of %tmp escapes

When the address is passed to another function, LLVM must ensure it points to valid memory, so the slot has to stay in memory and cannot be promoted to a register. The result: each loop iteration allocates more stack, even with -O2 or -O3.

The rule is clear: don’t rely on the optimizer to fix a bug in your code generator.

The lesson for code generator authors

If you’re building a compiler or transpiler that emits LLVM IR, the rule is simple:

Every call to CreateAlloca (or the equivalent in whatever API you use) must emit the instruction in the entry block of the function, never at the point of use. This applies to temporary variables, loop variables, parameters copied to the stack — everything.

This is one of those traps that doesn’t show up in any compilation error and passes unnoticed through every unit test with small inputs. It only manifests in production, with real data, and when it happens, the only symptom is a mysterious segfault.