Static Inference of Regular Grammars for Ad Hoc Parsers (SPLASH 2025 - OOPSLA)

Sun 12 - Sat 18 October 2025 Singapore

co-located with ICFP/SPLASH 2025

Who

Michael Schröder, Jürgen Cito

Track

SPLASH 2025 OOPSLA

Time Zone

The program is currently displayed in (GMT+08:00) Perth.

Use conference time zone: (GMT+08:00) PerthSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Fri 17 Oct 2025 11:30 - 11:45 at Orchid Plenary Ballroom - Compilation 3 Chair(s): Hidehiko Masuhara

Abstract

Parsing, the process of structuring a linear representation according to a given grammar, is a fundamental activity in software engineering. While formal language theory has provided theoretical foundations for parsing, the most common kind of parsers used in practice are written \emph{ad hoc}. They use common string operations for parsing, without explicitly defining an input grammar. These ad hoc parsers are often intertwined with application logic and can result in subtle semantic bugs. Grammars, which are complete formal descriptions of input languages, can enhance program comprehension, facilitate testing and debugging, and provide formal guarantees for parsing code. But writing grammars—e.g., in the form of regular expressions—can be tedious and error-prone. Inspired by the success of type inference in programming languages, we propose a general approach for static inference of regular input string grammars from unannotated ad hoc parser source code. We approach this problem as an intersection of refinement typing and abstract interpretation. We use refinement type inference to synthesize logical and string constraints that represent regular parsing operations, which we then interpret with an abstract semantics into regular expressions. Our contributions include a core calculus $\lambda_\Sigma$ for representing ad hoc parsers, a formulation of (regular) grammar inference as refinement inference, an abstract interpretation framework for solving refinement variables, and a set of abstract domains for efficiently representing the kinds of numeric and string values encountered during regular ad hoc parsing. We implement our approach in the PANINI system and evaluate its efficacy on a benchmark of 204 Python ad hoc parsers.

Link to Preprint

https://mcschroeder.github.io/files/panini_preprint.pdf

DOI

https://doi.org/10.1145/3763054

Michael Schröder

TU Wien

Austria

Jürgen Cito