Verification Pipeline ===================== perlchecker processes each annotated function through an eight-stage pipeline. This page describes what each stage does and how they connect. Pipeline Overview ----------------- .. code-block:: text ┌─────────────────────────────────────────────────────┐ │ Perl Source File │ └──────────────────────┬──────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────┐ │ 1. Function Extraction │ │ Scan for # sig: blocks → ExtractedFunction │ └──────────────────────┬──────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────┐ │ 2. Annotation Parsing │ │ Parse sig, pre, post → FunctionSpec │ └──────────────────────┬──────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────┐ │ 3. PEG Parsing │ │ Parse body → AST (loops unrolled here) │ └──────────────────────┬──────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────┐ │ 4. Type Checking │ │ Validate types, declarations, expressions │ └──────────────────────┬──────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────┐ │ 5. SSA / IR Lowering │ │ Version variables → SsaFunction │ └──────────────────────┬──────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────┐ │ 6. CFG Construction │ │ Basic blocks + terminators → ControlFlowGraph │ └──────────────────────┬──────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────┐ │ 7. Symbolic Execution │ │ Explore all paths with symbolic inputs │ └──────────────────────┬──────────────────────────────┘ ▼ ┌──────────────────────────────────────────────────────┐ │ 8. SMT Encoding (Z3) │ │ For each path: is ¬postcondition satisfiable? │ └──────────────────────┬──────────────────────────────┘ ▼ ┌───────────┴───────────┐ │ │ ✔ Verified ✘ Counterexample Stage 1: Function Extraction ----------------------------- **Module:** ``src/extractor/mod.rs`` Scans the source file for contiguous comment blocks starting with ``# sig:``, followed immediately by ``sub name { ... }``. Extracts: - Function name - Annotation lines (the ``#``-prefixed comments) - Function body text (everything between ``{`` and the matching ``}``) - Source line number Non-annotated functions are silently skipped. Stage 2: Annotation Parsing ----------------------------- **Module:** ``src/annotations/mod.rs`` Parses the extracted annotation lines into a ``FunctionSpec``: - ``# sig: (T1, T2) -> R`` → parameter types and return type - ``# pre: expr`` → precondition expression (defaults to ``true``) - ``# post: expr`` (or ``# pos:``) → postcondition expression Validates that variable references in pre/post match the declared parameters and that the parameter count in the signature matches the function body. Stage 3: PEG Parsing ---------------------- **Module:** ``src/parser/mod.rs`` + ``src/parser/perl_subset.pest`` Parses the function body using a PEG (Parsing Expression Grammar) via the `pest `_ library. Produces a ``FunctionAst`` containing a list of statements and expressions. **Loop unrolling happens here.** All loops (``while``, ``for``, ``until``, ``do-while``, ``do-until``) are converted to nested ``if``/``else`` chains up to ``max_loop_unroll`` depth. For example, with a depth of 3:: while ($x > 0) { $x--; } becomes:: if ($x > 0) { $x--; if ($x > 0) { $x--; if ($x > 0) { $x--; // LoopBoundExceeded marker } } } **Desugaring** also occurs at this stage: - ``$x += 1`` → ``$x = $x + 1`` - ``$x++`` → ``$x = $x + 1`` - ``unless (C)`` → ``if (!C)`` - ``until (C)`` → ``while (!C)`` - Statement modifiers → ``if`` blocks - 2-arg ``substr($s, $off)`` → ``substr($s, $off, length($s) - $off)`` Stage 4: Type Checking ------------------------ **Module:** ``src/ast/mod.rs`` Performs flow-sensitive type checking on the AST: - Every variable must be declared with ``my`` before use - Declared but uninitialized variables cannot be read - Arithmetic operators require ``I64`` operands - String operators require ``Str`` operands - Array indices must be ``I64``; hash keys must be ``Str`` - Precondition and postcondition expressions are type-checked too - Reference aliases are tracked through branches and propagated to the postcondition (aliases created in both branches of an ``if``/``else`` are preserved after the merge) Stage 5: SSA / IR Lowering ---------------------------- **Module:** ``src/ir/mod.rs`` Converts the AST to Static Single Assignment (SSA) form: - Each variable assignment creates a new versioned name (``x__0``, ``x__1``, ``x__2``, ...) - ``if``/``else`` branches produce **merge** (phi) nodes that select the correct version based on which branch was taken - Function calls in expressions are **lifted** to separate assignments - Array operations become ``store``/``select`` expressions Stage 6: CFG Construction --------------------------- **Module:** ``src/ir/mod.rs`` Builds a Control Flow Graph from the SSA form: - Each basic block contains straight-line assignments - Blocks end with a **terminator**: ``Branch`` (conditional), ``Goto``, ``Return``, ``Die``, or ``LoopBoundExceeded`` - Branch targets reference other block IDs - Merge points use **block parameters** (phi functions) Stage 7: Symbolic Execution ----------------------------- **Module:** ``src/symexec/mod.rs`` Executes the CFG symbolically, treating all function inputs as unconstrained symbolic variables: - Maintains a **path condition** — accumulated constraints from branches - At each ``Branch`` terminator, forks into two states (then/else) - Only explores **feasible** paths (path condition is satisfiable) - Tracks a **path budget** (default: 1024 paths) to prevent explosion - **Array length invariants** are injected: all array length companions are constrained to be ≥ 0 - **Extern contracts** are strictly evaluated: if a precondition or postcondition expression cannot be evaluated, it is an error (not silently assumed true) - For each path reaching a ``Return``: - Checks if the path is *valid* via a well-definedness condition (reflexivity check on the result expression, including collections) - Encodes the verification query for the SMT solver **Path pruning:** If the path condition conjoined with a branch condition is unsatisfiable, that branch is not explored. Stage 8: SMT Encoding and Solving ------------------------------------ **Module:** ``src/smt/mod.rs`` For each completed path, translates symbolic expressions to Z3 formulas: - ``I64`` expressions → Z3 ``I64`` sort - ``Str`` expressions → Z3 ``String`` sort - ``Bool`` expressions → Z3 ``Bool`` sort - Arrays → Z3 ``Array`` sort with ``Select``/``Store`` - Hashes → Z3 ``Array`` sort The verification query for each path is: .. code-block:: text path_condition ∧ precondition ∧ ¬postcondition - If **UNSAT**: the postcondition holds on this path (verified) - If **SAT**: the solver model is a counterexample — concrete input values that violate the postcondition - If **UNKNOWN**: the solver timed out or hit a theory limitation **Safety constraints** are automatically injected: - Division and modulo: ``divisor ≠ 0`` - String length: all strings bounded to 32 characters - Substring bounds: ``offset ≥ 0``, ``offset ≤ length(str)``, ``len ≥ 0``