Verification Pipeline

perlchecker processes each annotated function through an eight-stage pipeline. This page describes what each stage does and how they connect.

Pipeline Overview

┌─────────────────────────────────────────────────────┐
│  Perl Source File                                   │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────┐
│  1. Function Extraction                             │
│     Scan for # sig: blocks → ExtractedFunction      │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────┐
│  2. Annotation Parsing                              │
│     Parse sig, pre, post → FunctionSpec             │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────┐
│  3. PEG Parsing                                     │
│     Parse body → AST (loops unrolled here)          │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────┐
│  4. Type Checking                                   │
│     Validate types, declarations, expressions       │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────┐
│  5. SSA / IR Lowering                               │
│     Version variables → SsaFunction                 │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────┐
│  6. CFG Construction                                │
│     Basic blocks + terminators → ControlFlowGraph   │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────┐
│  7. Symbolic Execution                              │
│     Explore all paths with symbolic inputs          │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌──────────────────────────────────────────────────────┐
│  8. SMT Encoding (Z3)                               │
│     For each path: is ¬postcondition satisfiable?   │
└──────────────────────┬──────────────────────────────┘
                       ▼
           ┌───────────┴───────────┐
           │                       │
      ✔ Verified            ✘ Counterexample

Stage 1: Function Extraction

Module: src/extractor/mod.rs

Scans the source file for contiguous comment blocks starting with # sig:, followed immediately by sub name { ... }. Extracts:

  • Function name

  • Annotation lines (the #-prefixed comments)

  • Function body text (everything between { and the matching })

  • Source line number

Non-annotated functions are silently skipped.

Stage 2: Annotation Parsing

Module: src/annotations/mod.rs

Parses the extracted annotation lines into a FunctionSpec:

  • # sig: (T1, T2) -> R → parameter types and return type

  • # pre: expr → precondition expression (defaults to true)

  • # post: expr (or # pos:) → postcondition expression

Validates that variable references in pre/post match the declared parameters and that the parameter count in the signature matches the function body.

Stage 3: PEG Parsing

Module: src/parser/mod.rs + src/parser/perl_subset.pest

Parses the function body using a PEG (Parsing Expression Grammar) via the pest library. Produces a FunctionAst containing a list of statements and expressions.

Loop unrolling happens here. All loops (while, for, until, do-while, do-until) are converted to nested if/else chains up to max_loop_unroll depth. For example, with a depth of 3:

while ($x > 0) { $x--; }

becomes:

if ($x > 0) {
    $x--;
    if ($x > 0) {
        $x--;
        if ($x > 0) {
            $x--;
            // LoopBoundExceeded marker
        }
    }
}

Desugaring also occurs at this stage:

  • $x += 1$x = $x + 1

  • $x++$x = $x + 1

  • unless (C)if (!C)

  • until (C)while (!C)

  • Statement modifiers → if blocks

  • 2-arg substr($s, $off)substr($s, $off, length($s) - $off)

Stage 4: Type Checking

Module: src/ast/mod.rs

Performs flow-sensitive type checking on the AST:

  • Every variable must be declared with my before use

  • Declared but uninitialized variables cannot be read

  • Arithmetic operators require I64 operands

  • String operators require Str operands

  • Array indices must be I64; hash keys must be Str

  • Precondition and postcondition expressions are type-checked too

  • Reference aliases are tracked through branches and propagated to the postcondition (aliases created in both branches of an if/else are preserved after the merge)

Stage 5: SSA / IR Lowering

Module: src/ir/mod.rs

Converts the AST to Static Single Assignment (SSA) form:

  • Each variable assignment creates a new versioned name (x__0, x__1, x__2, …)

  • if/else branches produce merge (phi) nodes that select the correct version based on which branch was taken

  • Function calls in expressions are lifted to separate assignments

  • Array operations become store/select expressions

Stage 6: CFG Construction

Module: src/ir/mod.rs

Builds a Control Flow Graph from the SSA form:

  • Each basic block contains straight-line assignments

  • Blocks end with a terminator: Branch (conditional), Goto, Return, Die, or LoopBoundExceeded

  • Branch targets reference other block IDs

  • Merge points use block parameters (phi functions)

Stage 7: Symbolic Execution

Module: src/symexec/mod.rs

Executes the CFG symbolically, treating all function inputs as unconstrained symbolic variables:

  • Maintains a path condition — accumulated constraints from branches

  • At each Branch terminator, forks into two states (then/else)

  • Only explores feasible paths (path condition is satisfiable)

  • Tracks a path budget (default: 1024 paths) to prevent explosion

  • Array length invariants are injected: all array length companions are constrained to be ≥ 0

  • Extern contracts are strictly evaluated: if a precondition or postcondition expression cannot be evaluated, it is an error (not silently assumed true)

  • For each path reaching a Return:

    • Checks if the path is valid via a well-definedness condition (reflexivity check on the result expression, including collections)

    • Encodes the verification query for the SMT solver

Path pruning: If the path condition conjoined with a branch condition is unsatisfiable, that branch is not explored.

Stage 8: SMT Encoding and Solving

Module: src/smt/mod.rs

For each completed path, translates symbolic expressions to Z3 formulas:

  • I64 expressions → Z3 I64 sort

  • Str expressions → Z3 String sort

  • Bool expressions → Z3 Bool sort

  • Arrays → Z3 Array sort with Select/Store

  • Hashes → Z3 Array<String, Value> sort

The verification query for each path is:

path_condition ∧ precondition ∧ ¬postcondition
  • If UNSAT: the postcondition holds on this path (verified)

  • If SAT: the solver model is a counterexample — concrete input values that violate the postcondition

  • If UNKNOWN: the solver timed out or hit a theory limitation

Safety constraints are automatically injected:

  • Division and modulo: divisor 0

  • String length: all strings bounded to 32 characters

  • Substring bounds: offset 0, offset length(str), len 0