Static Inference of Regular Grammars for Ad Hoc Parsers
This program is tentative and subject to change.
Parsing, the process of structuring a linear representation according to a given grammar, is a fundamental activity in software engineering. While formal language theory has provided theoretical foundations for parsing, the most common kind of parsers used in practice are written \emph{ad hoc}. They use common string operations for parsing, without explicitly defining an input grammar. These ad hoc parsers are often intertwined with application logic and can result in subtle semantic bugs. Grammars, which are complete formal descriptions of input languages, can enhance program comprehension, facilitate testing and debugging, and provide formal guarantees for parsing code. But writing grammars—e.g., in the form of regular expressions—can be tedious and error-prone. Inspired by the success of type inference in programming languages, we propose a general approach for static inference of regular input string grammars from unannotated ad hoc parser source code. We approach this problem as an intersection of refinement typing and abstract interpretation. We use refinement type inference to synthesize logical and string constraints that represent regular parsing operations, which we then interpret with an abstract semantics into regular expressions. Our contributions include a core calculus $\lambda_\Sigma$ for representing ad hoc parsers, a formulation of (regular) grammar inference as refinement inference, an abstract interpretation framework for solving refinement variables, and a set of abstract domains for efficiently representing the kinds of numeric and string values encountered during regular ad hoc parsing. We implement our approach in the PANINI system and evaluate its efficacy on a benchmark of 204 Python ad hoc parsers.
This program is tentative and subject to change.
Fri 17 OctDisplayed time zone: Perth change
10:30 - 12:15 | |||
10:30 15mTalk | Efficient Algorithms for the Uniform Tokenization Problem OOPSLA | ||
10:45 15mTalk | REPTILE: Performant Tiling of Recurrences OOPSLA Muhammad Usman Tariq Stanford University, Shiv Sundram Stanford University, Fredrik Kjolstad Stanford University | ||
11:00 15mTalk | SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention OOPSLA Ahan Gupta University of Illinois at Urbana-Champaign, Yueming Yuan University of Illinois Urbana-Champaign, Devansh Jain University of Illinois at Urbana-Champaign, Yuhao Ge University of Illinois at Urbana-Champaign, David Aponte Microsoft, Yanqi Zhou Google, Charith Mendis University of Illinois at Urbana-Champaign | ||
11:15 15mTalk | Statically Analyzing the Dataflow of R Programs OOPSLA | ||
11:30 15mTalk | Static Inference of Regular Grammars for Ad Hoc Parsers OOPSLA Pre-print | ||
11:45 15mTalk | Syntactic Completions with Material Obligations OOPSLA David Moon University of Michigan, Andrew Blinn University of Michigan, Thomas J. Porter University of Michigan, Cyrus Omar University of Michigan DOI Pre-print |