We present a design for a probabilistic programming system that enables massively parallel Gibbs sampling on GPUs through workload-aware scheduling of hierarchical parallelism. The workload characteristics of a Gibbs sampler dictate where parallelism is most effective: across variables, within individual variable updates, or hierarchically at both levels; hence there is no one-size-fits-all solution for efficient parallel Gibbs sampling. Our system addresses this challenge through two key components: a factor-based intermediate representation that enables static analysis to detect parallelism across variables via graph coloring, and reified schedule objects that allow both manual control and automatic performance tuning of hierarchical parallelism. Preliminary results demonstrate orders-of-magnitude performance improvements on popular Bayesian networks, Ising models, and hidden Markov models. By combining static analysis with dynamic auto-tuning, our design significantly reduces the development cost of efficient GPU-accelerated Gibbs samplers while preserving high-level abstractions for model specification.
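To illustrate the graph-coloring idea mentioned above, the following sketch (not the system's actual implementation, and using hypothetical helper names) greedily colors the dependency graph of an Ising model on a lattice. Variables assigned the same color share no edge, so their full conditionals are mutually independent and can be sampled in one parallel step, which is the basis of chromatic Gibbs sampling.

```python
def ising_grid_edges(n):
    """Undirected edges of an n x n lattice (4-neighbor Ising model)."""
    edges = []
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < n:
                edges.append(((i, j), (i, j + 1)))
    return edges

def greedy_coloring(nodes, edges):
    """Greedy graph coloring: give each node the smallest color
    not already used by one of its colored neighbors."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = {}
    for v in nodes:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

n = 4
nodes = [(i, j) for i in range(n) for j in range(n)]
coloring = greedy_coloring(nodes, ising_grid_edges(n))
# Row-major greedy coloring of a lattice yields the familiar
# 2-color checkerboard: all same-colored spins update in parallel.
num_colors = max(coloring.values()) + 1
```

On a GPU, each color class would then be mapped to one parallel sweep, with within-variable parallelism available inside each conditional update when the variable's factor neighborhood is large.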