A benchmark for vericoding: formally verified program synthesis
This program is tentative and subject to change.
We present a benchmark for vericoding, generation of formally verified code from formal specifications — in contrast to vibe coding, which generates potentially buggy code from natural language descriptions. Rapid AI progress has popularized LLM-based generation of computer programs from natural language descriptions. Unfortunately, the resulting code can be buggy, and traditional test case analysis can typically only demonstrate the presence and not the absence of bugs. Fortunately, rigorous correctness guarantees can be created via formal verification, by generating a machine-checkable proof that code meets its human-written specifications.
To support automation of formal verification and vericoding, we present an extensive suite of formal specifications for Lean, Rust/Verus and Dafny. We discuss how we assembled and curated these tasks, as well as vericoding results from straightforward LLM-prompting experiments
| extended abstract (main.pdf) | 442KiB |
This program is tentative and subject to change.
Sun 11 JanDisplayed time zone: Brussels, Copenhagen, Madrid, Paris change
16:00 - 18:00 | |||
16:00 18mTalk | ATLAS: Automated Toolkit for Large-Scale Verified Code Synthesis Dafny Mantas Bakšys University of Cambridge, Stefan Zetzsche Amazon Web Services, Olivier Bouissou Amazon Web Services, Soonho Kong Amazon Web Services, Remi Delmas Amazon Web Services Pre-print | ||
16:18 18mTalk | A benchmark for vericoding: formally verified program synthesis Dafny Sergiu Bursuc Beneficial AI Foundation, Theodore Ehrenborg Beneficial AI Foundation, Shaowei Lin Beneficial AI Foundation, Lăcrămioara Aștefănoaei Beneficial AI Foundation, Ionel Emilian Chiosa MIT, Jure Kukovec Beneficial AI Foundation, Alok Singh Beneficial AI Foundation, Oliver Butterley Beneficial AI Foundation, Adem Bizid MIT, Quinn Dougherty Beneficial AI Foundation, Miranda Zhao MIT, Max Tan MIT, Max Tegmark Massachusetts Institute of Technology Pre-print File Attached | ||
16:36 18mTalk | DafnyPro: LLM-Assisted Automated Verification for Dafny Programs Dafny | ||
16:54 18mTalk | MiniF2F-Dafny: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification Dafny Mantas Bakšys University of Cambridge, Stefan Zetzsche Amazon Web Services, Olivier Bouissou Amazon Web Services Pre-print | ||
17:12 18mTalk | Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs Dafny Valentina Wu Faculdade de Engenharia, Universidade do Porto, Alexandra Mendes Faculty of Engineering, University of Porto & INESC TEC, Alexandre Abreu INESC TEC, Faculdade de Engenharia, Universidade do Porto | ||
17:30 18mTalk | Toward Automated, Contamination-free Dafny Benchmark Generation Dafny Changjie Wang KTH Royal Institute of Technology, Mariano Scazzariello RISE Research Institutes of Sweden, Dejan Kostic KTH Royal Institute of Technology, Marco Chiesa KTH Royal Institute of Technology | ||
17:48 12mDay closing | Day closing Dafny | ||