cs.PL (49 posts)
arXiv:2501.12093v1 Announce Type: new Abstract: Static analysis is an essential component of many modern software development tools. Unfortunately, the ever-increasing complexity of static analyzers makes their coding error-prone. Even analysis tools based on rigorous mathematical techniques, such as abstract interpretation, are not immune to bugs. Ensuring the correctness and reliability of software analyzers is critical if they are to be inserted in production compilers and development environments. While compiler validation has seen notable success, formal validation of static analysis tools remains relatively unexplored. In this paper, we propose a method for testing abstract interpretation-based static analyzers. Broadly, it consists in checking, over a suite of benchmarks, that the properties inferred statically are satisfied dynamically. The main advantage of our approach lies in its simplicity, which stems directly from framing it within the Ciao assertion-based validation framework, and its blended static/dynamic assertion checking approach. We demonstrate that in this setting, the analysis can be tested with little effort by combining the following components already present in the framework: 1) the static analyzer, which outputs its results as the original program source with assertions interspersed; 2) the assertion run-time checking mechanism, which instruments a program to ensure that no assertion is violated at run time; 3) the random test case generator, which generates random test cases satisfying the properties present in assertion preconditions; and 4) the unit-test framework, which executes those test cases. We have applied our approach to the CiaoPP static analyzer, resulting in the identification of many bugs with reasonable overhead. Most of these bugs have been either fixed or confirmed, helping us detect a range of errors not only related to analysis soundness but also within other aspects of the framework.
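A minimal Python sketch of the testing loop this abstract describes, assuming a toy function and a hand-written property standing in for one inferred by the analyzer; CiaoPP itself works on Ciao/Prolog code with its own assertion language, so every name below is illustrative only.

```python
# Minimal sketch (not CiaoPP itself): check a statically inferred property
# dynamically by instrumenting the code and feeding it random inputs that
# satisfy the assertion's precondition.
import random

def abs_diff(x, y):
    return abs(x - y)

# Property assumed to have been inferred by the analyzer (in the spirit of a
# Ciao pred assertion: ":- pred abs_diff(X,Y,Z) : (int(X),int(Y)) => (int(Z), Z >= 0)."):
precondition = lambda x, y: isinstance(x, int) and isinstance(y, int)
postcondition = lambda x, y, z: isinstance(z, int) and z >= 0

def run_time_checked(f, pre, post):
    """Wrap f so that any violation of the inferred assertion raises."""
    def wrapper(*args):
        assert pre(*args), "precondition violated (bad test case)"
        result = f(*args)
        assert post(*args, result), "analysis result falsified at run time"
        return result
    return wrapper

def random_test_cases(n):
    """Stand-in for the random test case generator: ints satisfying the precondition."""
    for _ in range(n):
        yield random.randint(-1000, 1000), random.randint(-1000, 1000)

checked = run_time_checked(abs_diff, precondition, postcondition)
for case in random_test_cases(100):
    checked(*case)   # an AssertionError here would signal an analyzer bug
print("no discrepancy between static and dynamic results")
```

The point of the framework is that the analyzer's output, the run-time checker, the random generator, and the test driver are all reused as-is; a failing assertion signals a discrepancy between static and dynamic results.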
arXiv:2501.11501v1 Announce Type: new Abstract: Programmers regularly use strings to encode many types of data, such as Unix file paths, URLs, and email addresses, that are conceptually different. However, existing mainstream programming languages use a unified string type to represent them all. As a result, their type systems will keep quiet when a function requiring an email address is instead fed an HTML text, which may cause unexceptional failures or vulnerabilities. To let the type system distinguish such conceptually different string types, in this paper, we propose to regard \emph{formal languages as types} (FLAT), thereby restricting the set of valid strings by context-free grammars and semantic constraints if needed. Under this view, email addresses and HTML text are treated as different types. We realize this idea in Python as a testing framework FLAT-PY. It contains user annotations, all directly attached to the user's code, to (1) define such \emph{language types}, (2) specify pre-/post-conditions serving as \emph{semantic oracles} or contracts for functions, and (3) fuzz functions via random string inputs generated from a \emph{language-based fuzzer}. From these annotations, FLAT-PY \emph{automatically} checks type correctness at runtime via \emph{code instrumentation}, and reports any detected type error as soon as possible, preventing bugs from flowing deeply into other parts of the code. Case studies on real Python code fragments show that FLAT-PY is able to catch logical bugs from random inputs while requiring a reasonable amount of user annotations.
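A rough illustration of the "formal languages as types" idea in plain Python, with a regex standing in for a context-free grammar and a hand-rolled decorator standing in for FLAT-PY's annotations (the real framework's API differs).

```python
# Minimal sketch of language types: a string type is defined by a formal
# language, membership is enforced at run time, and random members of the
# language can be fuzzed as test inputs.
import random
import re

class LanguageType:
    """A string type defined by a formal language (a regex stands in for a CFG here)."""
    def __init__(self, name, pattern, producer):
        self.name, self.pattern, self.producer = name, re.compile(pattern), producer

    def __contains__(self, s):
        return isinstance(s, str) and self.pattern.fullmatch(s) is not None

    def fuzz(self):
        return self.producer()

Email = LanguageType(
    "Email", r"[a-z]+@[a-z]+\.[a-z]{2,}",
    lambda: f"{random.choice(['ann', 'bob'])}@{random.choice(['example', 'test'])}.org",
)

def typed(arg_type):
    """Instrument a function so feeding it a string outside the language is a type error."""
    def deco(f):
        def wrapper(s):
            if s not in arg_type:
                raise TypeError(f"{s!r} is not a valid {arg_type.name}")
            return f(s)
        return wrapper
    return deco

@typed(Email)
def mailbox_of(address: str) -> str:
    return address.split("@")[0]

print(mailbox_of(Email.fuzz()))          # fuzzed input drawn from the language
try:
    mailbox_of("<html>oops</html>")      # rejected: HTML text is not an Email
except TypeError as e:
    print("caught:", e)
```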
arXiv:2312.02664v3 Announce Type: replace Abstract: The tensor notation used in several areas of mathematics is a useful one, but it is not widely available to the functional programming community. In a practical sense, the (embedded) domain-specific languages (DSLs) currently in use for tensor algebra are either 1. array-oriented languages that do not enforce or take advantage of tensor properties and algebraic structure or 2. languages that follow the categorical structure of tensors but require the programmer to manipulate tensors in an unwieldy point-free notation. A deeper issue is that, for tensor calculus, the dominant pedagogical paradigm either assumes an audience comfortable with notational liberties which programmers cannot afford, or focuses on the applied mathematics of tensors, largely leaving their linguistic aspects (behaviour of variable binding, syntax and semantics, etc.) for the reader to figure out by themselves. This state of affairs is hardly surprising because, as we highlight, several properties of standard tensor notation are somewhat exotic from the perspective of lambda calculi. We bridge the gap by defining a DSL, embedded in Haskell, whose syntax closely captures the index notation for tensors in wide use in the literature. The semantics of this EDSL is defined in terms of the algebraic structures which define tensors in their full generality. This way, we believe that our EDSL can be used both as a tool for scientific computing and as a vehicle to express and present the theory and applications of tensors.
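The paper's EDSL is embedded in Haskell; the short NumPy snippet below is only meant to convey the contrast it draws between index notation and a point-free formulation of the same contraction.

```python
# Illustration only: numpy's einsum gives a feel for the index notation the
# EDSL models, as opposed to a point-free/categorical formulation.
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
g = np.eye(3)                      # a metric tensor
v = np.random.rand(3)

# Index notation: C[i,k] = sum_j A[i,j] * B[j,k]
C = np.einsum("ij,jk->ik", A, B)

# The same contraction written point-free, as composition of linear maps:
C_pointfree = A @ B
assert np.allclose(C, C_pointfree)

# Lowering an index with the metric: v_i = g[i,j] * v^j
v_lowered = np.einsum("ij,j->i", g, v)
print(C.shape, v_lowered.shape)
```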
arXiv:2501.08496v2 Announce Type: replace Abstract: Contrary to the conventional emphasis on dataset size, we explore the role of data alignment -- an often overlooked aspect of data quality -- in training capable Large Language Models (LLMs). To do so, we use the Task2Vec-based alignment coefficient, a quantitative measure of the similarity between two datasets, to quantify the impact of alignment between training data and evaluation data on downstream performance. In particular, we conduct controlled \textit{interventional} experiments for two settings: 1. the impact of increased alignment coefficients between various pre-training (pt) datasets and evaluation datasets, and 2. the impact of increased alignment coefficients between domain-specific fine-tuning (ft) data and domain-specific evaluation data. The domain-specific task we explore is Autoformalization -- the machine translation task between natural language and code for formal verification. In both settings, we find a strong, predictable negative correlation between the alignment coefficient of a model's training and evaluation data and the model's loss/perplexity on the respective downstream task. These findings suggest a re-evaluation of LLM training approaches, demonstrating the relevance of data alignment compared to data quantity, especially in specialized downstream tasks such as Autoformalization.
arXiv:2501.10802v1 Announce Type: new Abstract: Authenticated data structures allow untrusted third parties to carry out operations which produce proofs that can be used to verify an operation's output. Such data structures are challenging to develop and implement correctly. This paper gives a formal proof of security and correctness for a library that generates authenticated versions of data structures automatically. The proof is based on a new relational separation logic for reasoning about programs that use collision-resistant cryptographic hash functions. This logic provides a basis for constructing two semantic models of a type system, which are used to justify how the library makes use of type abstraction to enforce security and correctness. Using these models, we also prove the correctness of several optimizations to the library and then show how optimized, hand-written implementations of authenticated data structures can be soundly linked with automatically generated code. All of the results in this paper have been mechanized in the Coq proof assistant using the Iris framework.
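As a rough intuition for what "authenticated" means here (the paper's library, proofs, and optimizations are far more general and are mechanized in Coq/Iris), the following sketch builds a Merkle tree whose proofs let a client holding only the root hash check answers from an untrusted prover.

```python
# Minimal sketch of an authenticated data structure: a Merkle tree over a list,
# where an untrusted prover returns a value plus a proof, and a client holding
# only the root hash verifies it.
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"|".join(parts)).digest()

def build(leaves):
    """Return all tree levels, leaves first."""
    level = [h(b"leaf", x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]            # pad odd levels
        level = [h(b"node", level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Untrusted side: the value's sibling hashes from leaf to root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((index % 2, level[index ^ 1]))
        index //= 2
    return proof

def verify(root, value, proof):
    """Trusted side: recompute the root from the claimed value and the proof."""
    acc = h(b"leaf", value)
    for is_right, sibling in proof:
        acc = h(b"node", sibling, acc) if is_right else h(b"node", acc, sibling)
    return acc == root

levels = build([b"a", b"b", b"c", b"d"])
root = levels[-1][0]
assert verify(root, b"c", prove(levels, 2))
assert not verify(root, b"x", prove(levels, 2))    # tampering is detected
print("proof verified against root")
```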
arXiv:2501.11001v1 Announce Type: new Abstract: Reverse engineering tools are required to handle the complexity of software products and the unique requirements of many different tasks, like software analysis and visualization. Thus, reverse engineering tools should adapt to a variety of cases. Static Code Analysis (SCA) is a technique for analyzing and exploring software source code without running it. Manual review of software source code puts additional effort on software developers and is a tedious, error-prone, and costly job. This paper proposes an original approach (called ScaMaha) for Object-Oriented (OO) source code analysis and visualization based on SCA. ScaMaha is a modular, flexible, and extensible reverse engineering tool. ScaMaha revolves around a new meta-model and a new code parser, analyzer, and visualizer. The ScaMaha parser extracts software source code based on the Abstract Syntax Tree (AST) and stores this code as a code file. The code file includes all software code identifiers, relations, and structural information. The ScaMaha analyzer studies and exploits the code files to generate useful information regarding software source code. The software metrics file gives unique metrics regarding software systems, such as the number of method access relations. Software source code visualization plays an important role in software comprehension. Thus, the ScaMaha visualizer exploits code files to visualize different aspects of software source code. The visualizer generates unique graphs about software source code, like the visualization of inheritance relations. The ScaMaha tool was applied to several case studies, from small to large software systems, such as Drawing Shapes, Mobile Photo, Health Watcher, Rhino, and ArgoUML. Results show the scalability, performance, soundness, and accuracy of the ScaMaha tool. Evaluation metrics, such as precision and recall, demonstrate the accuracy of ScaMaha ...
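A conceptual Python sketch of the parse, code-file, and metrics steps described above, using the standard ast module on a toy source fragment; ScaMaha itself defines its own meta-model and targets OO systems such as ArgoUML, so the structure below is illustrative only.

```python
# Conceptual sketch of the parse -> "code file" -> metrics pipeline.
import ast
import json

source = """
class Shape:
    def area(self): ...

class Circle(Shape):
    def __init__(self, r): self.r = r
    def area(self): return 3.14159 * self.r ** 2
"""

tree = ast.parse(source)
code_file = {"classes": [], "inheritance": [], "methods": []}

for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        code_file["classes"].append(node.name)
        for base in node.bases:
            if isinstance(base, ast.Name):
                code_file["inheritance"].append((node.name, base.name))
        for item in node.body:
            if isinstance(item, ast.FunctionDef):
                code_file["methods"].append(f"{node.name}.{item.name}")

metrics = {
    "NOC": len(code_file["classes"]),            # number of classes
    "NOM": len(code_file["methods"]),            # number of methods
    "inheritance_relations": len(code_file["inheritance"]),
}
print(json.dumps({"code_file": code_file, "metrics": metrics}, indent=2))
```

The inheritance pairs collected here are exactly the kind of relation a visualizer can render as a graph.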
arXiv:2501.12313v1 Announce Type: new Abstract: Software verification witnesses are a common exchange format for software verification tools. They were developed to provide arguments supporting the verification result, allowing other tools to reproduce the verification results. Correctness witnesses in the current format (version 2.0) allow only for the encoding of loop and location invariants using C expressions. This limits the correctness arguments that verifiers can express in the witness format. One particular limitation is the inability to express function contracts, which consist of a pre-condition and a post-condition for a function. We propose an extension to the existing witness format 2.0 to allow for the specification of function contracts. Our extension includes support for several features inspired by ACSL (\result, \old, \at). This allows for the export of more information from tools and for the exchange of information with tools that require function contracts.
arXiv:2208.14739v5 Announce Type: replace Abstract: The class of Basic Feasible Functionals (BFF) is the second-order counterpart of the class of first-order functions computable in polynomial time. We present several implicit characterizations of BFF based on a typed programming language of terms. These terms may perform calls to non-recursive imperative procedures. The type discipline has two layers: the terms follow a standard simply-typed discipline and the procedures follow a standard tier-based type discipline. BFF consists exactly of the second-order functionals that are computed by typable and terminating programs. The completeness of this characterization surprisingly still holds in the absence of lambda-abstraction. Moreover, the termination requirement can be specified as a completeness-preserving instance, which can be decided in time quadratic in the size of the program. As typing is decidable in polynomial time, we obtain the first tractable (i.e., decidable in polynomial time), sound, complete, and implicit characterization of BFF, thus solving a problem opened for more than 20 years.
arXiv:2404.04132v2 Announce Type: replace Abstract: Symbolic execution is an SMT-based software verification and testing technique. Symbolic execution requires tracking performed computations during software simulation to reason about branches in the software under test. The prevailing approach to symbolic execution of binary code tracks computations by transforming the code to be tested to an architecture-independent IR and then symbolically executes this IR. However, the resulting IR must be semantically equivalent to the binary code, making this process complex and error-prone. The semantics of the binary code are specified by the targeted ISA, commonly given in natural language and requiring a manual implementation of the transformation to an IR. In recent years, the use of formal languages to describe ISA semantics in a machine-readable way has gained increased popularity. We investigate the utilization of such formal semantics for symbolic execution of binary code, achieving an accurate representation of instruction semantics. We present a prototype for the RISC-V ISA and conduct a case study to demonstrate that it can be easily extended to additional instructions. Furthermore, we perform an experimental comparison with prior work, which resulted in the discovery of five previously unknown bugs in the ISA implementation of the popular IR-based symbolic executor angr.
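A toy sketch of executing an instruction's semantics symbolically, with hand-written stand-ins for a formal RISC-V specification and the z3-solver package for path feasibility; the paper's prototype instead derives the semantics from a machine-readable ISA description.

```python
# Sketch: drive symbolic execution directly from executable instruction
# semantics instead of a hand-written IR lifter. Requires the z3-solver package.
from z3 import BitVec, BitVecVal, Solver, sat

XLEN = 32

def exec_addi(regs, rd, rs1, imm):
    """ADDI semantics: x[rd] = x[rs1] + sext(imm); x0 stays hard-wired to zero."""
    regs = dict(regs)
    if rd != 0:
        regs[rd] = regs[rs1] + BitVecVal(imm, XLEN)
    return regs

def exec_beq(regs, rs1, rs2):
    """BEQ semantics: the branch condition as a symbolic expression."""
    return regs[rs1] == regs[rs2]

# Symbolic initial state: x1 unconstrained, x0 hard-wired to zero.
regs = {0: BitVecVal(0, XLEN), 1: BitVec("x1", XLEN)}

regs = exec_addi(regs, rd=2, rs1=1, imm=7)   # x2 = x1 + 7
taken = exec_beq(regs, rs1=2, rs2=0)         # taken iff x1 + 7 == 0

solver = Solver()
solver.add(taken)
if solver.check() == sat:
    print("branch reachable with x1 =", solver.model()[regs[1]])
```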
arXiv:2405.11361v2 Announce Type: replace Abstract: Scripting languages are widely used to compose external calls, such as foreign functions that perform expensive computations, remote APIs, and more recently, machine learning systems such as large language models (LLMs). The execution time of scripts is often dominated by waiting for these external calls, and large speedups can be achieved via parallelization and streaming. However, doing this manually is challenging, even for expert programmers. To address this, we propose a novel opportunistic evaluation strategy for scripting languages based on a core lambda calculus that automatically executes external calls in parallel, as early as possible. We prove that our approach is confluent, ensuring that it preserves the programmer's original intent, and that our approach eventually executes every external call. We implement this approach in a framework called EPIC, embedded in Python. We demonstrate its versatility and performance on several applications drawn from the LLM literature, including Tree-of-Thoughts and tool use. Our experiments show that opportunistic evaluation improves total running time (up to $6.2\times$) and latency (up to $12.7\times$) compared to several state-of-the-art baselines, while staying very close (between $1.3\%$ and $18.5\%$ running-time overhead) to hand-tuned, manually optimized parallel Rust implementations.
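The snippet below is not EPIC's API; it only illustrates, with explicit futures, the behavior the framework automates: independent external calls start as early as possible and the script blocks only when a result is actually demanded.

```python
# Rough illustration of opportunistic evaluation of external calls.
import time
from concurrent.futures import ThreadPoolExecutor

def external_call(name, seconds):
    """Stand-in for a slow foreign function, remote API, or LLM query."""
    time.sleep(seconds)
    return f"{name}-result"

start = time.time()
with ThreadPoolExecutor() as pool:
    # Both calls are independent, so they are started immediately...
    a = pool.submit(external_call, "summarize", 1.0)
    b = pool.submit(external_call, "translate", 1.0)
    # ...and the script only blocks when a result is demanded.
    combined = f"{a.result()} + {b.result()}"
print(combined, f"({time.time() - start:.1f}s instead of ~2.0s sequentially)")
```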
arXiv:2411.10659v3 Announce Type: replace Abstract: Latency is a major concern for web rendering engines like those in Chrome, Safari, and Firefox. These engines reduce latency by using an incremental layout algorithm to redraw the page when the user interacts with it. In such an algorithm, elements that change frame-to-frame are marked dirty; only the dirty elements need be processed to draw the next frame, dramatically reducing latency. However, the standard incremental layout algorithm must search the page for dirty elements, accessing a number of auxiliary elements in the process. These auxiliary elements add cache misses and stalled cycles, and are responsible for a sizable fraction of all layout latency. We introduce a new, faster incremental layout algorithm called Spineless Traversal. Spineless Traversal uses a more computationally demanding priority queue algorithm to avoid the need to access auxiliary nodes and thus reduces cache traffic and stalls. This leads to dramatic speedups on the most latency-critical interactions such as hovering, typing, or animations. Moreover, thanks to numerous low-level optimizations, we are able to make Spineless Traversal competitive across the whole spectrum of incremental layout workloads. As a result, across 2216 benchmarks, Spineless Traversal is faster on 78.2% of the benchmarks, with a mean speedup of 3.23x concentrated in the most latency-critical interactions such as hovering, typing, and animations.
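A small sketch of the core data-structure change, assuming made-up node names and priorities: dirty nodes go into a priority queue keyed by document position and are processed directly, instead of being rediscovered by walking dirty bits down from the root.

```python
# Sketch of processing dirty layout nodes from a priority queue rather than by
# searching the tree; real engines add many low-level optimizations on top.
import heapq
from dataclasses import dataclass

@dataclass
class Node:
    order: int          # pre-order/document position, used as the priority
    name: str

    def layout(self):
        print(f"re-laying out {self.name}")

def incremental_layout(dirty_nodes):
    heap = [(n.order, n.name, n) for n in dirty_nodes]
    heapq.heapify(heap)
    while heap:
        _, _, node = heapq.heappop(heap)
        node.layout()   # a real engine may mark further nodes dirty here and push them

# Only the two dirty nodes are ever touched, no matter how large the tree is;
# the classic algorithm would instead walk down from the root checking dirty bits.
incremental_layout([Node(9, "tooltip"), Node(5, "button:hover")])  # runs in document order
```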
arXiv:2501.11252v1 Announce Type: new Abstract: Logic bugs are bugs that can cause database management systems (DBMSs) to silently produce incorrect results for given queries. Such bugs are severe, because they can easily be overlooked by both developers and users, and can cause applications that rely on the DBMSs to malfunction. In this work, we propose Constant-Optimization-Driven Database Testing (CODDTest) as a novel approach for detecting logic bugs in DBMSs. This method draws inspiration from two well-known optimizations in compilers: constant folding and constant propagation. Our key insight is that for a certain database state and query containing a predicate, we can apply constant folding on the predicate by replacing an expression in the predicate with a constant, anticipating that the results of this predicate remain unchanged; any discrepancy indicates a bug in the DBMS. We evaluated CODDTest on five mature and extensively-tested DBMSs (SQLite, MySQL, CockroachDB, DuckDB, and TiDB) and found 45 unique, previously unknown bugs in them. Out of these, 24 are unique logic bugs. Our manual analysis of the state-of-the-art approaches indicates that 11 logic bugs are detectable only by CODDTest. We believe that CODDTest is easy to implement, and can be widely adopted in practice.
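A toy instance of the approach against SQLite, with a hand-picked query and state: a subexpression is folded to the constant it evaluates to under the current database state, and any difference between the two result sets would indicate a logic bug. CODDTest automates the query/state generation and the careful handling of NULL semantics.

```python
# Toy constant-optimization-driven test against SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INT, b INT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# Under this database state the subexpression (SELECT MIN(b) FROM t) evaluates
# to the constant 10, so folding it into the predicate must not change results.
with_subexpr = "SELECT a FROM t WHERE b - (SELECT MIN(b) FROM t) > a"
folded       = "SELECT a FROM t WHERE b - 10 > a"

r1 = sorted(con.execute(with_subexpr).fetchall())
r2 = sorted(con.execute(folded).fetchall())
print("logic bug!" if r1 != r2 else f"results agree: {r1}")
```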
arXiv:2501.10560v1 Announce Type: new Abstract: Ensuring the proper use of sensitive data in analytics under complex privacy policies is an increasingly critical challenge. Many existing approaches lack portability, verifiability, and scalability across diverse data processing frameworks. We introduce Picachv, a novel security monitor that automatically enforces data use policies. It works on relational algebra as an abstraction for program semantics, enabling policy enforcement on query plans generated by programs during execution. This approach simplifies analysis across diverse analytical operations and supports various front-end query languages. By formalizing both data use policies and relational algebra semantics in Coq, we prove that Picachv correctly enforces policies. Picachv also leverages Trusted Execution Environments (TEEs) to enhance trust at runtime, providing stakeholders with provable guarantees that analytical tasks comply with their data use policies. We integrated Picachv into Polars, a state-of-the-art data analytics framework, and evaluated its performance using the TPC-H benchmark. We also applied our approach to real-world use cases. Our work demonstrates the practical application of formal methods in securing data analytics, addressing key challenges.
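A conceptual stand-in (not Picachv's formalization, which lives in Coq and is enforced inside the engine and a TEE) showing how a policy can be checked by walking a relational-algebra plan.

```python
# Conceptual sketch of policy enforcement over a relational-algebra query plan.
from dataclasses import dataclass

@dataclass
class Scan:
    table: str

@dataclass
class Select:
    child: object
    predicate: str

@dataclass
class Project:
    child: object
    columns: list

# Hypothetical policy: per-table, per-column allow/deny decisions.
POLICY = {"patients": {"ssn": "deny", "age": "allow", "diagnosis": "allow"}}

def enforce(plan):
    """Walk the query plan and reject any projection of a denied column."""
    if isinstance(plan, Project):
        table = enforce(plan.child)
        for col in plan.columns:
            if POLICY.get(table, {}).get(col) == "deny":
                raise PermissionError(f"policy violation: projecting {table}.{col}")
        return table
    if isinstance(plan, Select):
        return enforce(plan.child)
    if isinstance(plan, Scan):
        return plan.table
    raise ValueError("unknown operator")

plan = Project(Select(Scan("patients"), "age > 65"), ["age", "diagnosis"])
print("plan admitted for table:", enforce(plan))
```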
arXiv:2501.10668v1 Announce Type: new Abstract: Existing precise pointer tracing methods introduce substantial runtime overhead to the program being traced and are applicable only at specific program execution points. We propose MappedTrace, which leverages compiler-generated read-only maps to accurately identify all pointers in any given snapshot of a program's execution state. The maps record the locations and types of pointers, allowing the tracer to precisely identify pointers without requiring the traced program to maintain bookkeeping data structures or poll at safe points, thereby reducing runtime overhead. By running the tracer from a different address space or machine, MappedTrace presents new opportunities to improve memory management techniques like memory leak detection and enables novel use cases such as infinite memory abstraction for resource-constrained environments.
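A simplified picture of the mechanism, with an entirely made-up object layout and snapshot: a compiler-emitted read-only map records which offsets hold pointers, so a tracer in another address space can enumerate them without any bookkeeping in the traced program.

```python
# Simplified sketch: identify every pointer in a raw memory snapshot using a
# read-only layout map; all layouts and addresses here are invented.
import struct

# Hypothetical map emitted at compile time: type name -> pointer-holding offsets.
POINTER_MAP = {"list_node": [8]}              # {value: int64 at 0, next: pointer at 8}

# A fake snapshot: two 16-byte list nodes at "addresses" 0x1000 and 0x1010.
snapshot = {
    0x1000: struct.pack("<qQ", 41, 0x1010),   # value=41, next -> 0x1010
    0x1010: struct.pack("<qQ", 42, 0),        # value=42, next = NULL
}
heap_objects = {0x1000: "list_node", 0x1010: "list_node"}

def trace_pointers(snapshot, heap_objects):
    """Yield (object_address, field_offset, target) for every non-null pointer."""
    for addr, type_name in heap_objects.items():
        for off in POINTER_MAP[type_name]:
            (target,) = struct.unpack_from("<Q", snapshot[addr], off)
            if target:
                yield addr, off, target

for obj, off, target in trace_pointers(snapshot, heap_objects):
    print(f"pointer at {obj:#x}+{off} -> {target:#x}")
```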
arXiv:2411.14802v2 Announce Type: replace Abstract: Hierarchical graph rewriting is a highly expressive computational formalism that manipulates graphs enhanced with box structures for representing hierarchies. It has provided the foundations of various graph-based modeling tools, but the design of high-level declarative languages based on hierarchical graph rewriting is still a challenge. For a solid design choice, well-established formalisms with backgrounds other than graph rewriting would provide useful guidelines. Proof nets of Multiplicative Exponential Linear Logic (MELL) are such a framework because their original formulation of cut elimination is essentially graph rewriting involving box structures, where so-called Promotion Boxes with an indefinite number of non-local edges may be cloned, migrated and deleted. This work builds on LMNtal as a declarative language based on hierarchical (port) graph rewriting, and discusses how it can be extended to support the above operations on Promotion Boxes of MELL proof nets. LMNtal thus extended turns out to be a practical graph rewriting language that has strong affinity with MELL proof nets. The language features provided are general enough to encode other well-established models of concurrency. Using the toolchain of LMNtal that provides state-space search and model checking, we implemented cut elimination rules of MELL proof nets in extended LMNtal and demonstrated that the platform could serve as a useful workbench for proof nets.
arXiv:2501.11613v1 Announce Type: new Abstract: This study introduces Conversation Routines (CR), a structured prompt engineering framework for developing task-oriented dialog systems using Large Language Models (LLMs). While LLMs demonstrate remarkable natural language understanding capabilities, engineering them to reliably execute complex business workflows remains challenging. The proposed CR framework enables the development of Conversation Agentic Systems (CAS) through natural language specifications, embedding task-oriented logic within LLM prompts. This approach provides a systematic methodology for designing and implementing complex conversational workflows while maintaining behavioral consistency. We demonstrate the framework's effectiveness through two proof-of-concept implementations: a Train Ticket Booking System and an Interactive Troubleshooting Copilot. These case studies validate CR's capability to encode sophisticated behavioral patterns and decision logic while preserving natural conversational flexibility. Results show that CR enables domain experts to design conversational workflows in natural language while leveraging custom enterprise functionalities (tools) developed by software engineers, creating an efficient division of responsibilities where developers focus on core API implementation and domain experts handle conversation design. While the framework shows promise in accessibility and adaptability, we identify key challenges including computational overhead, non-deterministic behavior, and domain-specific logic optimization. Future research directions include enhancing system robustness, improving scalability for complex multi-agent interactions, and addressing the identified limitations across diverse business applications.
arXiv:2501.06579v1 Announce Type: new Abstract: We consider the problem of refuting equivalence of probabilistic programs, i.e., the problem of proving that two probabilistic programs induce different output distributions. We study this problem in the context of programs with conditioning (i.e., with observe and score statements), where the output distribution is conditioned by the event that all the observe statements along a run evaluate to true, and where the probability densities of different runs may be updated via the score statements. Building on a recent work on programs without conditioning, we present a new equivalence refutation method for programs with conditioning. Our method is based on weighted restarting, a novel transformation, introduced in this work, of probabilistic programs with conditioning into output-equivalent probabilistic programs without conditioning. Our method is the first to be both a) fully automated, and b) providing provably correct answers. We demonstrate the applicability of our method on a set of programs from the probabilistic inference literature.
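An operational sketch of what weighted restarting means for a single run, using a toy model; the paper defines it as a formal program transformation with correctness guarantees and uses it to refute equivalence automatically.

```python
# Sketch: a program with observe/score becomes an unconditioned sampler that
# restarts whenever an observe fails and carries the accumulated score as a weight.
import random

def model():
    """A tiny probabilistic program with conditioning."""
    x = random.randint(1, 6)            # roll a die
    if not (x % 2 == 0):                # observe(x is even)
        return None                     # observation failed
    weight = 1.0 / x                    # score(1/x)
    return x, weight

def weighted_restarting_sample():
    """Keep restarting until every observe along the run succeeds."""
    while True:
        outcome = model()
        if outcome is not None:
            return outcome

samples = [weighted_restarting_sample() for _ in range(10000)]
total = sum(w for _, w in samples)
posterior = {v: sum(w for x, w in samples if x == v) / total for v in (2, 4, 6)}
print(posterior)   # approximately proportional to 1/2 : 1/4 : 1/6
```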
arXiv:2312.09982v3 Announce Type: replace Abstract: The key to performance optimization of a program is to decide correctly when a certain transformation should be applied by a compiler. This is an ideal opportunity to apply machine-learning models to speed up the tuning process; while this realization has been around since the late 90s, only recent advancements in ML enabled a practical application of ML to compilers as an end-to-end framework. This paper presents ACPO (an AI-Enabled Compiler Framework), a novel framework that provides LLVM with simple and comprehensive tools to benefit from employing ML models for different optimization passes. We first showcase the high-level view, class hierarchy, and functionalities of ACPO, subsequently demonstrate a couple of use cases of ACPO by ML-enabling the Loop Unroll and Function Inlining passes used in LLVM's O3, and finally describe how ACPO can be leveraged to optimize other passes. Experimental results reveal that the ACPO model for Loop Unroll can gain on average 4\%, 3\%, 5.4\%, and 0.2\% compared to LLVM's vanilla O3 optimization when deployed on Polybench, Coral-2, CoreMark, and Graph-500, respectively. Furthermore, by including both Function Inlining and Loop Unroll models, ACPO can provide a combined speedup of 4.5\% on Polybench and 2.4\% on Cbench when compared with LLVM's O3.
arXiv:2501.07047v1 Announce Type: new Abstract: Cloud-based services are making the outsourcing of sensitive client data increasingly common. Although homomorphic encryption (HE) offers strong privacy guarantees, it requires substantially more resources than computing on plaintext, often leading to unacceptably large latencies in getting the results. HE accelerators have emerged to mitigate this latency issue, but with the high cost of ASICs. In this paper we show that HE primitives can be converted to AI operators and accelerated on existing ASIC AI accelerators, like TPUs, which are already widely deployed in the cloud. Adapting such accelerators for HE requires (1) supporting modular multiplication, (2) high-precision arithmetic in software, and (3) efficient mapping on matrix engines. We introduce the CROSS compiler, which (1) adopts Barrett reduction to provide modular reduction support using only multipliers and adders, (2) uses Basis Aligned Transformation (BAT) to convert high-precision multiplication into low-precision matrix-vector multiplication, and (3) uses Matrix Aligned Transformation (MAT) to convert vectorized modular operations with reduction into matrix multiplications that can be efficiently processed on a 2D spatial matrix engine. Our evaluation of CROSS on a Google TPUv4 demonstrates significant performance improvements, with up to 161x and 5x speedups compared to the previous work on many-core CPUs and V100 GPUs, respectively. The kernel-level codes are open-sourced at https://github.com/google/jaxite.git.
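As background for step (1), a short sketch of Barrett reduction, which replaces the modulo operation with multiplications, additions, and shifts that map onto existing arithmetic units; the parameters are illustrative and unrelated to the paper's kernels.

```python
# Sketch of Barrett modular reduction using only *, +, - and shifts.
q = 65537                       # an NTT-friendly prime modulus (illustrative)
k = q.bit_length()              # q < 2**k
mu = (1 << (2 * k)) // q        # precomputed once per modulus

def barrett_reduce(x: int) -> int:
    """Compute x mod q for 0 <= x < q*q without a division or modulo unit."""
    t = (x * mu) >> (2 * k)     # approximate quotient floor(x / q)
    r = x - t * q               # off by at most a couple of multiples of q
    while r >= q:
        r -= q
    return r

for x in [0, 1, q - 1, q, 123456789, (q - 1) * (q - 1)]:
    assert barrett_reduce(x) == x % q
print("Barrett reduction matches x % q on all test values")
```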
arXiv:2501.06283v1 Announce Type: new Abstract: Using large language models (LLMs) to generate source code from natural language prompts is a popular and promising idea with a wide range of applications. One of its limitations is that the generated code can be faulty at times, often in a subtle way, despite being presented to the user as correct. In this paper, we explore ways in which formal methods can assist with increasing the quality of code generated by an LLM. Instead of emitting code in a target language directly, we propose that the user guides the LLM to first generate an opaque intermediate representation, in the verification-aware language Dafny, that can be automatically validated for correctness against agreed-upon specifications. The correct Dafny program is then compiled to the target language and returned to the user. All user-system interactions throughout the procedure occur via natural language; Dafny code is never exposed. We describe our current prototype and report on its performance on the HumanEval Python code generation benchmarks.
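A rough sketch of the generate-verify-compile loop described above, not the paper's prototype: ask_llm() is a placeholder for whatever model client is used, and the dafny CLI invocations follow the Dafny 4 tool and may need adjusting for other versions.

```python
# Sketch: the LLM drafts Dafny, the verifier gatekeeps it, and only verified
# code is compiled to the target language for the user.
import subprocess
import tempfile

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def generate_verified_code(task: str, max_attempts: int = 3) -> str:
    feedback = ""
    for _ in range(max_attempts):
        dafny_src = ask_llm(f"Write a verified Dafny program for: {task}\n{feedback}")
        with tempfile.NamedTemporaryFile(suffix=".dfy", mode="w", delete=False) as f:
            f.write(dafny_src)
        result = subprocess.run(["dafny", "verify", f.name],
                                capture_output=True, text=True)
        if result.returncode == 0:
            # Verified: compile to the target language (output layout depends on
            # the Dafny release); the Dafny intermediate program is never exposed.
            subprocess.run(["dafny", "translate", "py", f.name], check=True)
            return f.name
        feedback = f"The verifier reported:\n{result.stdout}"   # retry with feedback
    raise RuntimeError("could not produce a verified program")
```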