Perlchecker: Symbolic Verification for Perl via SMT Solving
(AI Generated)
What if you could prove that your Perl function is correct — not just test it with a few examples, but mathematically verify that it satisfies its specification for all valid inputs? That’s what perlchecker does.
perlchecker is a Rust tool that formally verifies annotated Perl functions using symbolic execution and SMT solving (Z3). You write preconditions and postconditions as comments above your functions, and perlchecker either proves they hold universally or returns a concrete counterexample.
The Idea
Formal verification tools exist for languages like C (CBMC), Java (KeY), and Rust (Kani), but Perl has been left out of the party. Perl’s dynamic nature makes full verification intractable, but a restricted subset — with static types, bounded loops, and no heap — is perfectly amenable to SMT reasoning.
The key insight: if you constrain the language to what Z3 can reason about, you get push-button verification for free. No proof assistants, no manual invariants, just annotations and an automated solver.
How It Works
The verification pipeline:
Perl Source → Function Extraction → Annotation Parsing → PEG Parsing
→ Type Checking → SSA/IR Lowering → CFG Construction
→ Symbolic Execution → SMT Encoding (Z3) → Verified / Counterexample
- Parse — A PEG grammar (via pest) parses the Perl subset into an AST
- Type check — Static types are inferred from `# sig:` annotations
- Lower to IR — The AST is converted to SSA form for symbolic execution
- Build CFG — Control flow graph with bounded loop unrolling
- Symbolic execution — Explore all paths, collecting path conditions
- SMT encoding — Translate to Z3 assertions, check the negation of the postcondition
If Z3 finds a satisfying assignment to the negated postcondition, that assignment is a counterexample — a concrete input that violates the spec.
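The "negate the postcondition" trick can be sketched in miniature. The toy Rust program below is not perlchecker's code — Z3 reasons algebraically rather than by enumeration, and Perl's `/` is modeled here as integer division — but it shows the core question the pipeline poses to the solver: is `pre && !post` satisfiable?

```rust
// Toy stand-in for the solver query: a model of pre(x, y) && !post(x, y, f(x, y))
// is a counterexample. Z3 searches the whole integer domain symbolically;
// this sketch brute-forces a small range instead.
fn pre(_x: i64, y: i64) -> bool {
    y != 0
}

fn post(x: i64, y: i64, result: i64) -> bool {
    result == x / y
}

fn safe_division(x: i64, y: i64) -> i64 {
    x / y
}

fn find_counterexample() -> Option<(i64, i64)> {
    for x in -10..=10 {
        for y in -10..=10 {
            // Only inputs satisfying the precondition are candidates.
            if pre(x, y) && !post(x, y, safe_division(x, y)) {
                return Some((x, y));
            }
        }
    }
    None // no model found: the spec holds over this search domain
}

fn main() {
    // safe_division meets its spec, so the search comes back empty.
    assert_eq!(find_counterexample(), None);
}
```

Here the spec holds, so the search finds no model — the analogue of Z3 answering "unsat" and the tool reporting the function verified.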
Annotations
Functions are annotated with three comment directives:
# sig: (Int, Int) -> Int
# pre: $y != 0
# post: $result == $x / $y
sub safe_division {
    my ($x, $y) = @_;
    return $x / $y;
}
- `# sig:` declares the type signature (required)
- `# pre:` states preconditions — assumptions about inputs (optional)
- `# post:` states the postcondition — what must hold about `$result` (required)
Supported types include Int, Str, Array<Int>, Array<Str>, Hash<Str, Int>, and Hash<Str, Str>.
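Extracting these directives is plain comment scanning. The sketch below is hypothetical — the function name and return shape are illustrative, not perlchecker's real API — but it shows the kind of pass that runs before type checking:

```rust
// Illustrative annotation scanner (not perlchecker's actual code): collect
// the sig/pre/post directive bodies from the comment lines above a `sub`.
fn parse_annotations(source: &str) -> (Option<String>, Option<String>, Option<String>) {
    let (mut sig, mut pre, mut post) = (None, None, None);
    for line in source.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("# sig:") {
            sig = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("# pre:") {
            pre = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("# post:") {
            post = Some(rest.trim().to_string());
        }
    }
    (sig, pre, post)
}

fn main() {
    let src = "# sig: (Int, Int) -> Int\n# pre: $y != 0\n# post: $result == $x / $y\nsub safe_division { }";
    let (sig, pre, post) = parse_annotations(src);
    assert_eq!(sig.as_deref(), Some("(Int, Int) -> Int"));
    assert_eq!(pre.as_deref(), Some("$y != 0"));
    assert_eq!(post.as_deref(), Some("$result == $x / $y"));
}
```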
Counterexamples
When verification fails, perlchecker produces a concrete witness:
# sig: (Int) -> Int
# post: $result > $x
sub broken {
    my ($x) = @_;
    if ($x >= 0) {
        return $x;  # Bug: $x is not > $x
    } else {
        return $x + 1;
    }
}
Running perlchecker check broken.pl will report that $x = 0 (or any
non-negative value) is a counterexample: the function returns $x itself, which
is not strictly greater than $x.
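Conceptually, the counterexample is a model of the negated postcondition. Translating `broken` to Rust by hand (the real tool does this symbolically through Z3, with no enumeration) makes the search easy to replay:

```rust
// Hand-translated body of the Perl `broken` above.
fn broken(x: i64) -> i64 {
    if x >= 0 { x } else { x + 1 }
}

fn main() {
    // post: $result > $x — look for an input where its negation holds.
    let cex = (-5..=5).find(|&x| !(broken(x) > x));
    // Every negative x satisfies the spec (x + 1 > x); x = 0 is the
    // first witness where the `then` branch returns x itself.
    assert_eq!(cex, Some(0));
}
```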
What Can It Verify?
The Perl subset is restricted but expressive enough for real logic:
- Arithmetic: `+`, `-`, `*`, `/`, `%`, `**`, bitwise ops, shifts
- Control flow: `if/elsif/else`, `unless`, ternary, `while`, `for`, `do-while`, `until`
- Loop control: `last`, `next`, bounded unrolling
- Data structures: arrays (push, pop, index), hashes (read, write, exists)
- Strings: `length`, `substr`, `index`, `contains`, `starts_with`, `ends_with`, `chr`, `ord`
- Functions: calls between verified functions, `# extern:` contracts for external functions
- Specification features: ghost variables, mid-function assertions, die-as-unreachable
Here’s a more complex example — a bounded countdown loop:
# sig: (Int) -> Int
# pre: $x >= 0 && $x <= 5
# post: $result == 0
sub countdown {
    my ($x) = @_;
    while ($x > 0) {
        $x = $x - 1;
    }
    return $x;
}
perlchecker unrolls the loop up to the configured bound and proves that for any
$x in [0, 5], the while loop terminates with $x == 0.
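Bounded unrolling can be pictured as rewriting the `while` into at most `max_loop_unroll` guarded steps. This Rust sketch assumes that semantics (it is an illustration, not the real IR lowering): if the loop guard can still hold after the last unrolled step, the honest verdict is "unknown" rather than "verified".

```rust
// Sketch of bounded unrolling: execute at most `bound` guarded iterations.
// If the guard survives the bound, report "unknown" instead of a false proof.
fn countdown_unrolled(mut x: i64, bound: u32) -> Result<i64, &'static str> {
    for _ in 0..bound {
        if x > 0 {
            x -= 1;
        } else {
            break;
        }
    }
    if x > 0 {
        Err("unknown: loop bound exhausted")
    } else {
        Ok(x)
    }
}

fn main() {
    // pre: 0 <= x <= 5 — with bound 9, every such input reaches x == 0,
    // which is how the postcondition `$result == 0` gets discharged.
    for x in 0..=5 {
        assert_eq!(countdown_unrolled(x, 9), Ok(0));
    }
    // A bound that is too small degrades safely to "unknown".
    assert!(countdown_unrolled(5, 3).is_err());
}
```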
Architecture
The implementation is about 5,000 lines of Rust, structured as:
| Module | Role |
|---|---|
| parser/ | PEG grammar + AST construction |
| ast/ | AST types and type checker |
| ir/ | SSA/IR lowering |
| symexec/ | Symbolic execution engine |
| smt/ | Z3 encoding and model extraction |
| limits.rs | Configurable safety bounds |
The safety limits prevent path explosion and infinite loops:
--max_loop_unroll 9 # loop unrolling depth
--max_paths 1024 # symbolic execution path limit
--solver_timeout_ms 5000 # Z3 timeout per query
Why Rust + Z3?
Rust gives us the performance to handle path-explosive symbolic execution without
garbage collection pauses. The z3 crate provides safe Rust bindings to
Microsoft’s Z3 solver — the same engine used in production verification tools
like KLEE, Dafny, and F*.
The PEG parser (via pest) makes grammar evolution painless: adding a new Perl
construct means adding a rule to the .pest file and a corresponding AST node.
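For flavor, a pest rule looks roughly like this — an illustrative fragment only, not perlchecker's actual grammar (rule names, and the `expr` and `block` rules it references, are assumed):

```pest
// Illustrative pest fragment: adding `unless` support would mean one rule
// here plus a matching AST node on the Rust side.
unless_stmt = { "unless" ~ "(" ~ expr ~ ")" ~ block }
block       = { "{" ~ stmt* ~ "}" }
```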
Limitations and Tradeoffs
This is verification of a subset, not full Perl:
- No objects or closures; references are excluded, apart from recently added scalar references
- No unbounded recursion (rejected at parse time)
- No dynamic `eval`, regexes as executable code, or string interpolation
- Loop unrolling is bounded — infinite loops produce “unknown” verdicts
- No heap modeling — arrays and hashes are modeled as SMT arrays
These restrictions are deliberate. Each one removes a source of undecidability while preserving the ability to express real algorithmic logic.
Trying It Out
git clone https://github.com/dhilst/perlchecker
cd perlchecker
cargo build --release
./target/release/perlchecker check examples/09_safe_division.pl
You’ll need libz3 installed (the z3 crate links against it).
How It Was Built: ChatGPT + Claude Code
perlchecker was not written by hand in a single sitting. It was developed through a collaboration between ChatGPT (for research and planning) and Claude Code (for implementation), orchestrated in iterative rounds.
Step 1: Research in ChatGPT
The project started with exploratory conversations in ChatGPT — figuring out
which libraries to use (pest for parsing, z3 crate for SMT), how to
structure the pipeline, and what subset of Perl would be tractable for symbolic
verification. This was pure problem exploration: no code yet, just deciding
the approach.
Step 2: Generate a Plan for Claude Code
Once the design was clear, I asked ChatGPT to produce a detailed implementation plan formatted as input for Claude Code. This became the bootstrap document that Claude Code would follow to scaffold the project — the parser, AST, IR, symbolic execution engine, and SMT backend, all wired together from day one.
Step 3: Bootstrap with Claude Code
I fed the plan into Claude Code and let it bootstrap the project: create the Cargo workspace, implement the initial pipeline end-to-end, and get the first trivial example verifying. This was the PLAN → IMPLEMENT → TEST cycle for the foundation.
Step 4: Iterative Expansion (~118 Rounds)
With the foundation in place, the real work was expanding the Perl subset. This happened in two phases:
Phase A (~50 rounds): I went back to ChatGPT to explore which Perl constructs could be added while staying SMT-tractable. Each round followed a protocol: propose a feature, validate it won’t blow up Z3, implement it in Claude Code, run the test suite. Fifty rounds of this got us from basic arithmetic to loops, arrays, hashes, string builtins, and compound expressions.
Phase B (rounds ~50–118): To push further, I ran an adversarial debate inside Claude Code using multi-agents — a CRITIC and a DEFENDER. The CRITIC’s job was to dismiss the project — pointing out real Perl patterns that perlchecker couldn’t handle, arguing the restrictions made it useless. The DEFENDER argued back, showing what could be verified and why bounded verification is a valid engineering tradeoff. The debate traces were recorded and mined for expansion ideas. Features like ghost variables, extern contracts, and mid-function assertions came directly from this adversarial process — the CRITIC would say “you can’t even express X” and the DEFENDER would figure out how to add it tractably.
Each expansion round followed the same meta-plan: propose one feature, check SMT tractability, implement, run QA, commit. The protocol ensured no round broke existing functionality and every new feature shipped with a verifiable example.
The Takeaway
The interesting pattern here is using LLMs in complementary roles:
- ChatGPT for open-ended research, library selection, and generating structured plans
- Claude Code for disciplined implementation within a defined protocol
- Adversarial debate (multi-agent in Claude Code) to stress-test the design and generate non-obvious feature ideas
The result is a tool that would have taken weeks to implement by hand, built incrementally through roughly 118 automated rounds with human oversight at the steering level.
Conclusion
perlchecker shows that formal verification doesn’t have to be a heavyweight process reserved for safety-critical systems. By choosing a tractable subset and leveraging mature SMT infrastructure, we get push-button proofs for Perl functions — write annotations, run the tool, get either a proof or a counterexample. No PhDs required.
Source: github.com/dhilst/perlchecker