SRC Grand Finalists 2024

GRADUATE CATEGORY

First Place:
Stefan Klessinger - University of Passau
SIGMOD/PODS 2023
Title of Submission: Capturing Data-inherent Dependencies in JSON Schema Extraction

Second Place:
Zhewen Pan - University of Wisconsin-Madison
SIGMICRO 2023
Title of Submission: The XOR Cache: A Catalyst for Compression

Third Place:
Chengjie Lu - Simula Research Laboratory
ICSE 2023
Title of Submission: Test Scenario Generation for Autonomous Driving Systems with Reinforcement Learning

UNDERGRADUATE CATEGORY

First Place:
Jakub Bachurski - University of Cambridge
POPL 2024
Title of Submission: Embedding Pointful Array Programming in Python

Second Place:
Amar Shah - University of California,Berkeley
PLDI 2023
Title of Submission: An Eager SMT Solver for Algebraic Data Type Queries

Third Place:
Rhett Olson - University of Minnesota
SIGSPATIAL 2023
Title of Submission: An Automatic Approach to Finding Geographic Name Changes on Historical Maps

ACM Student Research Competition

The ACM Student Research Competition is an internationally recognized venue enabling undergraduate and graduate students to experience the research world, share research results and exchange ideas, rub shoulders with academic and industry luminaries, understand the practical applications of their research and gain recognition.

2024 SRC Winner, 1st Place, Graduate Category

Stefan Klessinger, University of Passau

"Capturing Data-inherent Dependencies in JSON Schema Extraction" (SIGMOD/PODS 2023)

1 PROBLEM AND MOTIVATION JSON is a popular semi-structured data exchange format widely used across various technological domains. It describes data as keyvalue pairs, often referred to as properties. JSON is essential in web applications for data transmission and in document stores such as MongoDB or Couchbase. Even relational database management systems such as PostgreSQL and MySQL support JSON data types. A sample JSON instance from log data generated in the gameWorld of Warcraft [4] is shown in Fig. 1a. It describes two kinds of events, discriminated by the value of property type: depending on the value of type, either properties resourceChange and resourceChangeType or property overheal are present. Although JSON instances are selfdescribing, they may be accompanied by an explicitly declared schema, commonly encoded in the JSON Schema language. JSON Schema [1] allows to describe and constrain JSON data. It is the de-facto standard for schema description in JSON and adopted across many different use cases. Schemastore.org [2] lists over 800 curated and publicly available schemas, providing specifications ranging from configuration files, workflows, and pipelines, to components of content management systems, and video games. JSONSchema is supported by a wide range of tools and libraries in many different programming languages. It allows data analysts to define and enforce constraints on the data, which aids in identifying and correcting errors in JSON data sets. The conformity of a JSON instance to a JSON Schema can be analyzed with a wide range of validation tools. This improves the reliability and quality of data. Furthermore, the schema provides a documentation of the data structure to data consumers. [Read more]

2024 SRC Winner, 2nd Place, Graduate Category

Zhewen Pan, University of Wisconsin-Madison

"The XOR Cache: A Catalyst for Compression" (SIGMICRO 2023)

Abstract—Modern computing systems allocate significant amounts of resources for caching, especially for the last level cache (LLC). We observe that there is untapped potential for compression by leveraging redundancy due to private caching and inclusion that are common in today’s systems. We introduce the XOR Cache to exploit this redundancy via XOR compression. Unlike conventional cache architectures, XOR Cache stores bitwise XOR values of line pairs, halving the number of stored lines via a form of inter-line compression. When combined with other compression schemes, XOR Cache can further boost intra-line compression ratios by XORing lines of similar value, reducing the entropy of the data prior to compression. Evaluation results show that the XOR Cache can save LLC area by 1.32× and power by 1.67× at a cost of 3.58% performance overhead compared to a 2× larger uncompressed cache. I. PROBLEM AND MOTIVATION Today’s computing systems dedicate tens to hundreds of megabytes of SRAM to caching, which contributes to a significant portion of die area, e.g, AMD’s Zen3’s 32 MB L3 cache occupies around 40% of die area [8]. Additionally, the power consumption of these systems also surges, further straining the overall energy efficiency. The demand for resources in thecache hierarchy will continue to increase due to the growthin dataset size and memory wall problem. However, large caches do not necessarily translate into better performance despite having more capacity; additionally, they come at the cost of high access latency, usually in tens of cycles. Given their resource-demanding nature, these factors combined maketraditional large caches inefficient for future systems.) [Read more]

2024 SRC Winner 3rd Place, Graduate Category

Chengjie Lu, Simula Research Laboratory

"Test Scenario Generation for Autonomous Driving Systems with Reinforcement Learning" (ICSE 2023)

Abstract—We have seen rapid development of autonomous driving systems (ADSs) in recent years. These systems place high requirements on safety and reliability for their mass adoption, and ADS testing is one crucial approach to ensure their success.However, it is impossible to test all the scenarios due to theinherent complexity and uncertainty of ADSs and their driving tasks. Besides, the operating environment of ADSs is dynamic, continuously evolving, and full of uncertainties, which requires a testing approach adaptive to the environment. Reinforcement learning (RL) has shown great potential in various complex tasks requiring constant adaptation to dynamic environments. To this end, this paper presents RLTester, a novel ADS testing approach, that adopts reinforcement learning (RL) to learn critical environment configurations (i.e., test scenarios) of the operating environment of ADSs that could reveal their unsafe behaviors. To generate diverse and critical test scenarios, we defined 142 environment configuration actions and adopted the Time-To-Collision metric to construct the reward function. Our evaluation shows that RLTester discovered a total of 256 collisions, of which 192 are unique, and took an average of 11.59 seconds for each collision. Further, RLTester is effective in generating more diverse test scenarios compared to a stateof- the-art approach, DeepCollision. We also introduce an opensource driving scenario dataset, DeepScenario, which consists of over 30K driving scenarios. Index Terms—Autonomous Driving System Testing, Critical Scenario, Reinforcement Learning, Scenario Dataset [Read more]

2024 SRC Winner, 1st Place, Undergraduate Category

Jakub Bachurski, University of Cambridge

"Embedding Pointful Array Programming in Python" (POPL 2024)

Abstract Multidimensional array operations are ubiquitous in machine learning. The dominant ecosystem in this field is centred around Python and NumPy, where programs are expressed with elaborate and error-prone calls in the point-free array programming model. Such code is difficult to statically analyse and maintain. Other array programming paradigms offer to solve these problems, in particular the pointful style of Dex. However, only limited approaches – based on Einstein summation – have been embedded in Python. We introduce Ein, a pointful array DSL embedded in Python. We also describe a novel connection between pointful and point-free array programming. Thanks to this connection, Ein generates performant, optimised and type-safe calls to NumPy. Ein reconciles the readability of comprehension-style definitions with the capabilities of existing array frameworks.[Read more]

2024 SRC Winner, 2nd Place, Undergraduate Category

Amar Shah, University of California, Berkeley

"An Eager SMT Solver for Algebraic Data Type Queries" (PLDI 2023)

1 Introduction and Motivation Algebraic Data Types (ADTs) are a programming construct classically found in functional programming languages but are increasingly found in all kinds of modern languages. ADTs are a convenient generalization of structures like enumerated types, lists, and binary trees. A natural problem is the satisfiability of formulas over the theory ADT. This has applications in modelling languages [Milner 1978], proof assistants [Gonthier 2005] and program verification [Bjørner et al. 2013]. While the need to reason about ADTs have grown, the techniques to do so have not. Satisfiability Modulo Theory (SMT) extends the Boolean Satisfiability (SAT) problem to include additional theories, in this case ADT. Most SMT solvers for ADT apply the same lazy approach that use a theory solver [Oppen 1980] in a loop with a SAT solver. We propose a fundamentally different approach: an eager solver for ADT satisfiability modulo theory (SMT) queries via a quantifier free reduction to Equality and Uninterpreted Functions (EUF) SMT queries. [Read more]

2024 SRC Winner, 3rd Place, Undergraduate Category

Rhett Olson, University of Minnesota

"An Automatic Approach to Finding Geographic Name Changes on Historical Maps" (SIGSPATIAL 2023)

ABSTRACT Changes in place names offer insight into regions’ culture, politics, and geographical characteristics. This paper proposes an automatic approach to retrieve time-sequenced maps that show place name changes on many maps from across history. The proposed approach utilizes gazetteers (i.e., indexes for geographic names) to retrieve a place’s coordinates and name variants and searches for text labels from maps matching those coordinates and names. To search for multiple-word place names, the approach constructs minimum spanning trees from an edge cost function to link text labels into phrases. We present two experiments: one to evaluate the effectiveness of the minimum spanning tree approach at linking multiple word place names, and the other to evaluate the maps retrieved by the query approach. The resulting maps give rich visual insight into how place names change over time and could facilitate scholarly investigation of geographic name changes at a large scale. [Read more]