Correct Black-Box Monitors for Distributed Deadlock Detection: Formalisation and Implementation (SPLASH 2025 - OOPSLA)

Sun 12 - Sat 18 October 2025 Singapore

co-located with ICFP/SPLASH 2025

Who

Radosław Jan Rowicki, Adrian Francalanza, Alceste Scalas

Track

SPLASH 2025 OOPSLA

Time Zone

The program is currently displayed in (GMT+08:00) Perth.

Use conference time zone: (GMT+08:00) PerthSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 16 Oct 2025 14:15 - 14:30 at Orchid West - Reasoning Chair(s): Adam Chlipala

Abstract

Many software applications rely on concurrent and distributed (micro)services that interact via message- passing and various forms of remote procedure calls (RPC). As these systems organically evolve and grow in scale and complexity, the risk of introducing deadlocks increases and their impact may worsen: even if only a few services deadlock, many other services may block while awaiting responses from the deadlocked ones. As a result, the “core” of the deadlock can be obfuscated by its consequences on the rest of the system, and diagnosing and fixing the problem can be challenging.

In this work we tackle the challenge by proposing distributed black-box monitors that are deployed alongside each service and can detect deadlocks by only observing the incoming and outgoing messages. We present a formal model that captures popular RPC-based application styles (e.g., gen_servers in Erlang/OTP), and we formalise a distributed black-box monitoring algorithm that we prove sound and complete (i.e., identifies deadlocked services without false positives nor false negatives). We also implement our result in a tool called DDMon for the monitoring of Erlang/OTP applications, and we evaluate its performance.

This is the first work that formalises, proves the correctness, and implements distributed black-box monitors for deadlock detection. Our formalisation and theoretical results are mechanised in Coq. DDMon is the companion artifact of this paper.

Link to Preprint

https://arxiv.org/abs/2508.14851

DOI

https://doi.org/10.1145/3763069

Radosław Jan Rowicki

Technical University of Denmark

Denmark

Adrian Francalanza

University of Malta

Malta

Alceste Scalas