Stack Machine & Compiler Correctness

Compiler Verification Systems

End-to-end verification of a compiler from arithmetic expressions to stack machine instructions — proving that compiled programs always produce correct results.


Overview

Compiler correctness is one of the most important problems in formal verification. An incorrect compiler can silently transform correct source code into buggy executables, undermining all source-level reasoning.

This case study presents two verified systems:

  1. A verified compiler — compiling arithmetic expressions to stack machine instructions, with a proof that compiled code always evaluates to the correct result
  2. A verified stack machine program — proving that a specific loop program correctly computes the sum 1 + 2 + ... + n

Part 1: Verified Compiler

The complete development is self-contained IML you can paste into ImandraX.

Source Language

type expr =
  | Int of int
  | Plus of expr * expr
  | Times of expr * expr
  | Sub of expr * expr
  | Div of expr * expr
 
let rec eval_expr e =
  match e with
  | Int i -> i
  | Plus (x,y) ->
    let e1, e2 = eval_expr x, eval_expr y in e1 + e2
  | Times (x,y) ->
    let e1, e2 = eval_expr x, eval_expr y in e1 * e2
  | Sub (x,y) ->
    let e1, e2 = eval_expr x, eval_expr y in e1 - e2
  | Div (x,y) ->
    let e1, e2 = eval_expr x, eval_expr y in e1 / e2

Target: Stack Machine

type instr = Add | Mult | Minus | Quot | Push of int
 
type state = { stack : int list }
 
let apply (s : state) (f : int -> int -> int) =
  match s.stack with
  | a :: b :: rst -> { stack = (f a b) :: rst }
  | _ -> s
 
let step s inst =
  match inst with
  | Add -> apply s (+)
  | Mult -> apply s ( * )
  | Minus -> apply s (fun x y -> y - x)
  | Quot -> apply s (fun x y -> y / x)
  | Push i -> { stack = i :: s.stack }
 
let rec run s program =
  match program with
  | [] -> s
  | i :: i' -> run (step s i) i'
[@@adm program]

The Compiler

let rec compile (e : expr) : instr list =
  match e with
  | Int i -> [Push i]
  | Plus (x,y) -> compile x @ compile y @ [Add]
  | Times (x,y) -> compile x @ compile y @ [Mult]
  | Sub (x,y) -> compile x @ compile y @ [Minus]
  | Div (x,y) -> compile x @ compile y @ [Quot]

Correctness Theorem

The main theorem: running the compiled code on any initial stack always pushes the correct value:

theorem compiler_correct s e =
  run s (compile e) = { stack = (eval_expr e) :: s.stack }
[@@by induction ~id:[%id compiler_induct] () @>>| auto]

The proof uses a custom induction scheme compiler_induct that captures the recursive structure of compilation — crucially, the inductive hypothesis for the second operand uses a stack already containing the first operand's result:

let rec compiler_induct s e =
  let ih s x y =
    compiler_induct s x
    && compiler_induct { stack = (eval_expr x) :: s.stack } y
  in
  match e with
  | Plus (x,y) | Times (x,y) | Sub (x,y) | Div (x,y) -> ih s x y
  | _ -> true

Why Custom Induction Is Needed

Standard structural induction on expressions doesn't work for the compiler correctness theorem. The problem is that after compiling the first operand of Plus(x,y), the machine state has changed — eval_expr x is now on the stack. The inductive hypothesis for y must account for this modified state.

The compiler_induct scheme captures this by threading the evaluation result through:

  • For Plus(x,y): prove correctness for x with current stack s, then prove correctness for y with stack eval_expr(x) :: s.stack
  • This ensures the IH for the second operand matches the actual machine state after the first operand's code has run

This is a classic challenge in compiler verification — the "accumulator threading" problem.

A key enabling lemma establishes that running concatenated programs is equivalent to running them sequentially:

lemma run_assoc s program_1 program_2 =
  run s (program_1 @ program_2) = run (run s program_1) program_2
[@@by auto] [@@rw]

Part 2: Stack Machine Program Verification

A more detailed verification of a specific stack machine program that computes sum(n) = 1 + 2 + ... + n:

let ex : prog = [
    Const 0;     (* 0: initialize accumulator *)
    Store RA;    (* 1: RA = 0 *)
    Load RN;     (* 2: push n *)
    If_leq 10;   (* 3: if n <= 0, jump to halt *)
    Load RN;     (* 4: push n *)
    Load RA;     (* 5: push accumulator *)
    Add;         (* 6: n + acc *)
    Store RA;    (* 7: RA = n + acc *)
    Load RN;     (* 8: push n *)
    Const 1;     (* 9: push 1 *)
    Sub;         (* 10: n - 1 *)
    Store RN;    (* 11: RN = n - 1 *)
    Goto (-10);  (* 12: jump back to loop head *)
    Halt;        (* 13: done *)
]

The Correctness Theorem

theorem main stack ra rn s steps =
  steps >= 11*rn + 5
  && rn >= 0
  && s = { halted=false; prog=ex; pc=0; stack; ra; rn }
  ==>
  match run s steps with
  { halted; ra; _ } ->
    halted=true && ra = sum rn

The proof proceeds by:

  1. Initialization — proving lines 0–1 set up the accumulator
  2. Loop body — proving each iteration correctly adds rn to the accumulator and decrements rn
  3. Loop invariant — by induction with a custom scheme, proving the loop correctly computes sum(rn)
  4. Composition — combining initialization, loop, and termination

Verification Methodology

The stack machine proof proceeds in layers:

  1. Instruction table lemma — establishes what instruction lives at each program counter address
  2. Initialization (lines 0–1) — proves RA is set to 0 in 2 steps
  3. Base case (lines 2–3) — when rn ≤ 0, the loop exits to Halt in 2 more steps
  4. Loop body (lines 2–12) — one iteration adds rn to ra and decrements rn, taking exactly 11 steps
  5. Loop invariant — by induction with a custom measure Ordinal.of_int rn, proves the loop correctly computes sum(rn)
  6. Final assembly — combines initialization, loop, and termination

Each layer uses unroll to verify fixed-length instruction sequences and %use to compose the layers.


End-to-End Correctness

The compiler is proved correct for all expressions and all initial machine states — not just tested on examples.

Custom Induction Schemes

Both proofs use carefully crafted induction schemes that capture the recursive structure of the problem.

Two Levels of Verification

From high-level compiler correctness to low-level instruction-by-instruction loop reasoning.


References