A Modular Static Cost Analysis for GPU Warp-Level Parallelism
Graphics Processing Units (GPUs) are the accelerator of choice for performance-critical applications, yet optimizing for performance requires mastery of the complex interactions between its memory architecture and its execution model. Existing static analysis tools for GPU kernels either identify performance bugs without quantifying costs or cannot handle thread-divergent control flow, leading to significant over-approximations. We present the first static relational-cost analysis for GPU warp-level parallelism that can give exact bounds even in the presence of thread divergence. Our analysis is general and flexible, as it is parametric on the resource metric (uncoalesced accesses, bank conflicts) and on the cost relation (=, ≤, ≥). We establish a soundness theorem for our technique, provide mechanized proofs in Rocq and implement our theory in a tool called Pico. In a reproducibility experiment, Pico produced the tightest bounds in every input, outperforming the state-of-the-art tool RaCUDA in 10 kernels (1.7×better), while RaCUDA produced 4 incorrect bounds and crashed on 2 kernels. In an experiment to measure the accuracy of Pico, we studied the impact of thread-divergence in control-flow in a dataset of 226 kernels. We found that at least 75.3% of conditionals and 85.4% of loops can be captured exactly, without introducing approximation.
Wed 14 JanDisplayed time zone: Brussels, Copenhagen, Madrid, Paris change
10:30 - 12:10 | Program AnalysisPOPL at Dortoirs Chair(s): Xavier Rival Inria - CNRS - Ecole Normale Superieure de Paris - PSL University | ||
10:30 25mTalk | A Modular Static Cost Analysis for GPU Warp-Level Parallelism POPL Gregory Blike University of Massachusetts at Boston, Hannah Zicarelli University of Massachusetts at Boston, Udaya Sathiyamoorthy University of Massachusetts at Boston, Julien Lange Royal Holloway University of London, Tiago Cogumbreiro University of Massachusetts at Boston DOI | ||
10:55 25mTalk | ChiSA: Static Analysis for Lightweight Chisel Verification POPL Jiacai Cui Nanjing University, Qinlin Chen Nanjing University, Zhongsheng Zhan Nanjing University, Tian Tan Nanjing University, Yue Li Nanjing University DOI | ||
11:20 25mTalk | Miri: Practical Undefined Behavior Detection for Rust POPL Ralf Jung ETH Zurich, Benjamin Kimock Lansweeper NV, Christian Poveda Unaffiliated, Eduardo Sánchez Muñoz Unaffiliated, Oli Scherer Unaffiliated, Qian (Andy) Wang ETH Zurich and Imperial College London DOI | ||
11:45 25mTalk | Piecewise Analysis of Probabilistic Programs via k-Induction POPL Tengshun Yang Institute of Software at Chinese Academy of Sciences, Shenghua Feng Institute of Software at Chinese Academy of Sciences, Hongfei Fu Shanghai Jiao Tong University, Naijun Zhan Peking University; Zhongguancun Lab, Jingyu Ke Shanghai Jiao Tong University, Shiyang Wu Shanghai Jiao Tong University DOI | ||