POPL 2026 (series) / Dafny 2026 (series) / Dafny 2026 /
DafnyPro: LLM-Assisted Automated Verification for Dafny Programs
Large language models (LLMs) have demonstrated significant potential for automating formal program verification, particularly for inductive loop invariant generation. We present DafnyPro, a novel inference-time framework for generating verification annotations in Dafny. Using Claude Sonnet 3.5, DafnyPro achieves 86% correct proofs on the challenging DafnyBench benchmark, a 16 percentage point improvement over the previous state of the art with the same model. We also introduce a suite of smaller, cost-efficient fine-tuned models: our 14B model achieves 70% correct proofs and our 7B model reaches 68% on DafnyBench. These results demonstrate that high verification accuracy can be maintained even with compact models.