We present a benchmark for vericoding: the generation of formally verified code from formal specifications, in contrast to vibe coding, which generates potentially buggy code from natural-language descriptions. Rapid AI progress has popularized LLM-based generation of computer programs from natural-language descriptions. Unfortunately, the resulting code can be buggy, and traditional testing can typically demonstrate only the presence of bugs, not their absence. Formal verification, by contrast, provides rigorous correctness guarantees: a machine-checkable proof that code meets its human-written specification.
To support automation of formal verification and vericoding, we present an extensive suite of formal specifications for Lean, Rust/Verus, and Dafny. We describe how we assembled and curated these tasks, and we report vericoding results from straightforward LLM-prompting experiments.
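To make the vericoding setting concrete, the following is a minimal sketch of what such a task might look like in Lean 4 (the function name `myMax` and the theorem statements are illustrative, not drawn from the benchmark): the specification is the theorem statement, the generated artifact is the function together with its proof, and the Lean kernel machine-checks the result.

```lean
-- Hypothetical vericoding task: implement a maximum function on naturals
-- and prove it satisfies the given specification.
def myMax (a b : Nat) : Nat := if a ≤ b then b else a

-- Specification: the result is an upper bound of both arguments.
theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax
  split
  next h => exact h                 -- case a ≤ b: goal is a ≤ b
  next _ => exact Nat.le_refl a     -- case ¬ a ≤ b: goal is a ≤ a

theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax
  split
  next _ => exact Nat.le_refl b     -- case a ≤ b: goal is b ≤ b
  next h => exact Nat.le_of_lt (Nat.gt_of_not_le h)  -- ¬ a ≤ b gives b < a
```

Unlike a test suite, which samples inputs, these theorems hold for all inputs once the proof checker accepts them; a vericoding benchmark scores whether an LLM can produce such proof-carrying code from the specification alone.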
Valentina Wu (Faculty of Engineering, University of Porto), Alexandra Mendes (Faculty of Engineering, University of Porto & INESC TEC), Alexandre Abreu (University of Porto & INESC TEC)