Vectorized probabilistic programming languages (PPLs) support high-performance, data-parallel programmable inference. To expose high-level programming models to users, vectorization in these systems hides the concerns of memory management and parallel threading, resulting in black-box parallel compilation and restrictions on custom optimizations. We present a design for GraPPL, a GPU-programmable PPL that exposes high-level features, including traces and probabilistic generative function interfaces, while enabling GPU-programmable control over low-level runtime and memory profiles. GraPPL allows models to be expressed as sequential C++ functions and/or vectorized CUDA GPU kernels that support random choice expressions; GraPPL's template-specialized interpreters transform these expressions into various probabilistic semantics while automatically maintaining coherent execution traces of the probabilistic program across CPU and GPU execution contexts. We demonstrate GraPPL's efficiency on an example of blocked Gibbs sampling on factor graphs, achieving a 3× speedup over JAX-based implementations with equivalent levels of automation and modularity.
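To make the template-specialized interpreter idea concrete, the following is a minimal C++ sketch, not the actual GraPPL API: a model is written once as an ordinary function over random choice expressions, and a template parameter selects its probabilistic semantics. One hypothetical interpreter (`Simulate`) samples each choice and records it in a trace; another (`Score`) replays a fixed trace and accumulates its log-density. All names here are illustrative assumptions.

```cpp
#include <cmath>
#include <random>
#include <string>
#include <unordered_map>

// A trace maps random-choice addresses to sampled values.
using Trace = std::unordered_map<std::string, double>;

// Log-density of a normal distribution, shared by both interpreters.
inline double normal_logpdf(double x, double mu, double sigma) {
    const double log2pi = std::log(2.0 * 3.141592653589793);
    double z = (x - mu) / sigma;
    return -0.5 * z * z - std::log(sigma) - 0.5 * log2pi;
}

// Interpreter #1: sample each random choice, record it in the trace,
// and accumulate the joint log-density of the sampled values.
struct Simulate {
    std::mt19937 rng{42};
    Trace trace;
    double logp = 0.0;
    double normal(const std::string& addr, double mu, double sigma) {
        std::normal_distribution<double> d(mu, sigma);
        double x = d(rng);
        trace[addr] = x;
        logp += normal_logpdf(x, mu, sigma);
        return x;
    }
};

// Interpreter #2: replay a fixed trace (no sampling) and accumulate
// the log-density of the recorded values under the model.
struct Score {
    const Trace& trace;
    double logp = 0.0;
    explicit Score(const Trace& t) : trace(t) {}
    double normal(const std::string& addr, double mu, double sigma) {
        double x = trace.at(addr);
        logp += normal_logpdf(x, mu, sigma);
        return x;
    }
};

// The model is written once; template specialization picks the semantics.
template <typename Interp>
double model(Interp& in) {
    double mu = in.normal("mu", 0.0, 1.0);  // latent location
    double y  = in.normal("y", mu, 0.5);    // downstream choice
    return y;
}
```

Because both interpreters share one addressing scheme for random choices, a trace produced by `Simulate` can be handed directly to `Score`, which is the same coherence property the abstract describes across CPU and GPU execution contexts.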