An Empirical Evaluation of Property-Based Testing in Python (SPLASH 2025 - OOPSLA)

Sun 12 - Sat 18 October 2025 Singapore

co-located with ICFP/SPLASH 2025

Who

Savitha Ravi, Michael Coblenz

Track

SPLASH 2025 OOPSLA

When

Fri 17 Oct 2025 13:45 - 14:00 at Orchid Plenary Ballroom - Testing 1

Abstract

Property-based testing (PBT) is a testing methodology with origins in the functional programming community. In recent years, PBT libraries have been developed for non-functional languages, including Python. However, to date, there is little evidence regarding how effective property-based tests are at finding bugs, and whether some kinds of property-based tests might be more effective than others. To gather this evidence, we conducted a corpus study of 426 Python programs that use Hypothesis, Python’s most popular library for PBT. We developed formal definitions for 12 categories of property-based test and implemented an intraprocedural static analysis that categorizes tests. Then, we evaluated the efficacy of test suites of 40 projects using mutation testing, and found that on average, each property-based test finds about 50 times as many mutations as the average unit test. We also identified the categories with the tests most effective at finding mutations, finding that tests that look for exceptions, that test inclusion in collections, and that check types are over 19 times more effective at finding mutations than other kinds of property-based tests. Finally, we conducted a parameter sweep study to assess the strength of property-based tests as a function of the number of random inputs generated, finding that 76% of mutations found were found within the first 20 inputs.
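The idea of checking one property against many random inputs, and of asking how many inputs are needed before a mutation is detected, can be sketched in a few lines. This is a minimal standard-library illustration only: it does not use Hypothesis and is not the authors' analysis or mutation-testing tooling. The functions `dedupe`, `dedupe_mutant`, `inclusion_property`, and `inputs_until_failure` are hypothetical names introduced for this example, and the mutant is hand-written rather than produced by a mutation tool.

```python
import random


def dedupe(items):
    """Reference implementation: drop duplicates, keep first-seen order."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def dedupe_mutant(items):
    """Hypothetical hand-written mutant: ignores the last input element."""
    return dedupe(items[:-1])


def inclusion_property(fn, items):
    """Inclusion-in-collection property: every input element is in the output."""
    result = fn(items)
    return all(x in result for x in items)


def inputs_until_failure(fn, max_inputs, seed=0):
    """Try up to max_inputs random lists against the property.

    Returns the 1-based index of the first input that violates the
    property, or None if no violation is seen within the budget.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    for i in range(1, max_inputs + 1):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 5))]
        if not inclusion_property(fn, items):
            return i
    return None


if __name__ == "__main__":
    # Sweep the input budget, loosely mirroring a parameter sweep.
    for budget in (1, 5, 20, 100):
        print(budget, inputs_until_failure(dedupe_mutant, budget))
```

In Hypothesis itself, the input budget corresponds to the `max_examples` setting, and inputs come from strategies rather than a hand-rolled generator as assumed here.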

Savitha Ravi

University of California, San Diego

Michael Coblenz

University of California, San Diego

United States