A Modular Static Cost Analysis for GPU Warp-Level Parallelism (POPL 2026 - POPL Research Papers)

Sun 11 - Sat 17 January 2026 Rennes, France

Who

Gregory Blike, Hannah Zicarelli, Udaya Sathiyamoorthy, Julien Lange, Tiago Cogumbreiro

Track

POPL 2026

Time Zone

The program is currently displayed in (GMT+01:00) Brussels, Copenhagen, Madrid, Paris.

Use conference time zone: (GMT+01:00) Brussels, Copenhagen, Madrid, ParisSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Wed 14 Jan 2026 10:30 - 10:55 at Dortoirs - Program Analysis Chair(s): Xavier Rival

Abstract

Graphics Processing Units (GPUs) are the accelerator of choice for performance-critical applications, yet optimizing for performance requires mastery of the complex interactions between its memory architecture and its execution model. Existing static analysis tools for GPU kernels either identify performance bugs without quantifying costs or cannot handle thread-divergent control flow, leading to significant over-approximations. We present the first static relational-cost analysis for GPU warp-level parallelism that can give exact bounds even in the presence of thread divergence. Our analysis is general and flexible, as it is parametric on the resource metric (uncoalesced accesses, bank conflicts) and on the cost relation (=, ≤, ≥). We establish a soundness theorem for our technique, provide mechanized proofs in Rocq and implement our theory in a tool called Pico. In a reproducibility experiment, Pico produced the tightest bounds in every input, outperforming the state-of-the-art tool RaCUDA in 10 kernels (1.7×better), while RaCUDA produced 4 incorrect bounds and crashed on 2 kernels. In an experiment to measure the accuracy of Pico, we studied the impact of thread-divergence in control-flow in a dataset of 226 kernels. We found that at least 75.3% of conditionals and 85.4% of loops can be captured exactly, without introducing approximation.

DOI

https://doi.org/10.1145/3776693

Gregory Blike

University of Massachusetts at Boston

United States

Hannah Zicarelli

University of Massachusetts at Boston

United States

Udaya Sathiyamoorthy

University of Massachusetts at Boston

United States

Julien Lange

Royal Holloway University of London

United Kingdom

Tiago Cogumbreiro

University of Massachusetts at Boston

United States

Time Zone

The program is currently displayed in (GMT+01:00) Brussels, Copenhagen, Madrid, Paris.

Use conference time zone: (GMT+01:00) Brussels, Copenhagen, Madrid, ParisSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Wed 14 Jan
Displayed time zone: Brussels, Copenhagen, Madrid, Paris change

10:30 - 12:10	Program AnalysisPOPL at Dortoirs Chair(s): Xavier Rival Inria - CNRS - Ecole Normale Superieure de Paris - PSL University

10:30 25m Talk		A Modular Static Cost Analysis for GPU Warp-Level Parallelism POPL Gregory Blike University of Massachusetts at Boston, Hannah Zicarelli University of Massachusetts at Boston, Udaya Sathiyamoorthy University of Massachusetts at Boston, Julien Lange Royal Holloway University of London, Tiago Cogumbreiro University of Massachusetts at Boston DOI
10:55 25m Talk		ChiSA: Static Analysis for Lightweight Chisel Verification POPL Jiacai Cui Nanjing University, Qinlin Chen Nanjing University, Zhongsheng Zhan Nanjing University, Tian Tan Nanjing University, Yue Li Nanjing University DOI
11:20 25m Talk		Miri: Practical Undefined Behavior Detection for Rust POPL Ralf Jung ETH Zurich, Benjamin Kimock Lansweeper NV, Christian Poveda Unaffiliated, Eduardo Sánchez Muñoz Unaffiliated, Oli Scherer Unaffiliated, Qian (Andy) Wang ETH Zurich and Imperial College London DOI
11:45 25m Talk		Piecewise Analysis of Probabilistic Programs via k-Induction POPL Tengshun Yang Institute of Software at Chinese Academy of Sciences, Shenghua Feng Institute of Software at Chinese Academy of Sciences, Hongfei Fu Shanghai Jiao Tong University, Naijun Zhan Peking University; Zhongguancun Lab, Jingyu Ke Shanghai Jiao Tong University, Shiyang Wu Shanghai Jiao Tong University DOI