Work Packets: A New Abstraction for GC Software Engineering, Optimization, and Innovation
Garbage collection (GC) implementations must meet efficiency and maintainability requirements, which are often perceived to be at odds. Moreover, the desire for efficiency is often at odds with agility, undermining rapid development and innovation, which harms longer-term performance aspirations. Prior GC implementations struggle to: i) maximize efficiency, parallelism, and hardware utilization, while ii) correctly and elegantly implementing complex algorithm-specific optimizations and scheduling constraints. This struggle is reflected in today’s monolithic implementations with simple phase-based synchronization.
This paper presents a new design for GC implementations that emphasizes agility and efficiency. The design simplifies and unifies all GC tasks into work packets which define: i) work items, ii) kernels that process them, and iii) scheduling constraints.
Our simple insights are that GC implementations are high-level algorithms that orchestrate vast numbers of performance-critical work items, and that execution is dominated by a few very small, heavily executed kernels. Work packets comprise groups of like work items, such as the scanning of a thread’s stack or the tracing of a single object in a multi-million object heap. The kernel attached to a packet specifies how to process items within the packet, such as how to scan a stack, or how to trace an object. The scheduling constraints express dependencies, e.g. all mutators must stop before copying any objects. Fully parallel activities, such as scanning roots and performing a transitive closure, proceed with little synchronization. The implementation of a GC algorithm reduces to declaring required work packets, their kernels, and dependencies. The execution model operates transparently of GC algorithms and work packet type. We broaden the scope of work-stealing, applying it to any type of GC work and introduce a novel two-tier work-stealing algorithm to further optimize parallelism at fine granularity.
We show the software engineering benefits of this design via eight collectors that use 23 common work packet implementations in the MMTk GC framework. We use the LXR collector to show that the work packet abstraction supports innovation and high performance, outperforming the latest (OpenJDK 24) instance of the highly-tuned, state-of-the-art G1 garbage collector. We thus demonstrate that work packets achieve high performance, while simplifying GC implementation, making them inherently easier to optimize and verify.
Sat 18 OctDisplayed time zone: Perth change
16:00 - 17:30 | AbstractionOOPSLA at Orchid East Chair(s): Steve Blackburn Google and Australian National University | ||
16:00 15mTalk | Abstract Interpretation of Temporal Safety Effects of Higher Order Programs OOPSLA Mihai Nicola Stevens Institute of Technology, Chaitanya Agarwal New York University, Eric Koskinen Stevens Institute of Technology, Thomas Wies New York University | ||
16:15 15mTalk | A Hoare Logic For Symmetry Properties OOPSLA | ||
16:30 15mTalk | Efficient Abstract Interpretation via Selective Widening OOPSLA | ||
16:45 15mTalk | Encode the $\forall\exists$ Relational Hoare Logic into Standard Hoare Logic OOPSLA Shushu Wu Shanghai Jiao Tong University, Xiwei Wu Shanghai Jiao Tong University, Qinxiang Cao Shanghai Jiao Tong University | ||
17:00 15mTalk | Structural Abstraction and Refinement for Probabilistic Programs OOPSLA Guanyan Li University of Oxford, Juanen Li Beijing Normal University, Zhilei Han Tsinghua University, Peixin Wang East China Normal University, Hongfei Fu Shanghai Jiao Tong University, Fei He Tsinghua University | ||
17:15 15mTalk | Work Packets: A New Abstraction for GC Software Engineering, Optimization, and Innovation OOPSLA Wenyu Zhao Australian National University, Stephen M. Blackburn Google; Australian National University, Kathryn S McKinley Google | ||