Statically Analyzing the Dataflow of R Programs
This program is tentative and subject to change.
The R programming language is primarily designed for statistical computing and mostly used by researchers without a background in computer science. R provides a wide range of dynamic features and peculiarities that are difficult to analyze statically like dynamic scoping and lazy evaluation with dynamic side-effects. At the same time, the R ecosystem lacks sophisticated analysis tools that support researchers in understanding and improving their code. In this paper, we present a novel static dataflow analysis framework for the R programming language that is capable of handling the dynamic nature of R programs and produces the dataflow graph of given R programs. This graph can be essential in a range of analyses, including program slicing, which we implement as a proof of concept. The core analysis works as a stateful fold over a normalized version of the abstract syntax tree of the R program, which tracks (re-)definitions, values, function calls, side effects, external files, and a dynamic control flow to produce one dataflow graph per program. We evaluate the correctness of our analysis using output equivalence testing on a manually curated dataset of 779 sensible slicing points from executable real-world R scripts. Additionally, we use a set of systematic test cases based on the capabilities of the R language and the implementation of the R interpreter and measure the runtime as well as the memory consumption on a set of 4,230 real-world R scripts and 20,815 packages available on R’s package manager CRAN. Furthermore, we evaluate the recall of our program slicer, its accuracy using shrinking, and its improvement over the state of the art. We correctly analyze almost all programs in our equivalence test suite, preserving the identical output for 99.7 % of the manually curated slicing points. On average, we require 153.2 ms to analyze the dataflow of a given file and around 211.6 kB for the dataflow graph. This shows that our analysis is capable of analyzing real-world sources quickly and correctly. Our slicer achieves an average reduction of 84.8 % of tokens indicating its potential to improve program comprehension.
This program is tentative and subject to change.
Fri 17 OctDisplayed time zone: Perth change
10:30 - 12:15 | |||
10:30 15mTalk | Efficient Algorithms for the Uniform Tokenization Problem OOPSLA | ||
10:45 15mTalk | REPTILE: Performant Tiling of Recurrences OOPSLA Muhammad Usman Tariq Stanford University, Shiv Sundram Stanford University, Fredrik Kjolstad Stanford University | ||
11:00 15mTalk | SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention OOPSLA Ahan Gupta University of Illinois at Urbana-Champaign, Yueming Yuan University of Illinois Urbana-Champaign, Devansh Jain University of Illinois at Urbana-Champaign, Yuhao Ge University of Illinois at Urbana-Champaign, David Aponte Microsoft, Yanqi Zhou Google, Charith Mendis University of Illinois at Urbana-Champaign | ||
11:15 15mTalk | Statically Analyzing the Dataflow of R Programs OOPSLA | ||
11:30 15mTalk | Static Inference of Regular Grammars for Ad Hoc Parsers OOPSLA Pre-print | ||
11:45 15mTalk | Syntactic Completions with Material Obligations OOPSLA David Moon University of Michigan, Andrew Blinn University of Michigan, Thomas J. Porter University of Michigan, Cyrus Omar University of Michigan DOI Pre-print |