cs.MS
12 posts
arXiv:2401.01921v2 Announce Type: replace Abstract: We introduce a tensor network library designed for classical and quantum physics simulations called Cytnx (pronounced as sci-tens). This library provides an almost identical interface and syntax for both C++ and Python, allowing users to switch effortlessly between the two languages. Aiming at a quick learning process for new users of tensor network algorithms, the interfaces resemble those of popular Python scientific libraries like NumPy, SciPy, and PyTorch. Not only can multiple global Abelian symmetries be easily defined and implemented, but Cytnx also provides a new tool called Network that allows users to store large tensor networks and perform tensor network contractions in an optimal order automatically. With the integration of cuQuantum, tensor calculations can also be executed efficiently on GPUs. We present benchmark results for tensor operations on both devices, CPU and GPU. We also discuss features and higher-level interfaces to be added in the future.
arXiv:2501.12349v1 Announce Type: new Abstract: Robust and scalable function evaluation at an arbitrary point in a finite/spectral element mesh is required for querying the partial differential equation solution at points of interest, comparing solutions between different meshes, and Lagrangian particle tracking. This is a challenging problem, particularly for high-order unstructured meshes partitioned in parallel with MPI, as it requires identifying the element that overlaps a given point and computing the corresponding reference space coordinates. We present a robust and efficient technique for general field evaluation in large-scale high-order meshes with quadrilaterals and hexahedra. In the proposed method, a combination of globally partitioned and processor-local maps is used to first determine a list of candidate MPI ranks, and then, locally, the candidate elements that could contain a given point. Next, element-wise bounding boxes further reduce the list of candidate elements. Finally, Newton's method with trust region is used to determine the overlapping element and corresponding reference space coordinates. Since GPU-based architectures have become popular for accelerating computational analyses using meshes with tensor-product elements, specialized kernels have been developed to utilize the proposed methodology on GPUs. The method is also extended to enable general field evaluation on surface meshes. The paper concludes by demonstrating the use of the proposed method in various applications ranging from mesh-to-mesh transfer during r-adaptivity to Lagrangian particle tracking.
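The last two stages described in this abstract (bounding-box filtering, then a Newton solve for the reference coordinates) can be illustrated on a single element. The following is a minimal sketch, assuming a 2D bilinear quadrilateral rather than the high-order elements the paper targets; the function names `shape`, `bilinear_map`, and `locate_point` are hypothetical, and the step-length cap is only a crude stand-in for a real trust-region strategy.

```python
import math

def shape(r, s):
    """Bilinear shape functions and their derivatives on [-1, 1]^2."""
    n = [(1 - r) * (1 - s) / 4, (1 + r) * (1 - s) / 4,
         (1 + r) * (1 + s) / 4, (1 - r) * (1 + s) / 4]
    dn_dr = [-(1 - s) / 4, (1 - s) / 4, (1 + s) / 4, -(1 + s) / 4]
    dn_ds = [-(1 - r) / 4, -(1 + r) / 4, (1 + r) / 4, (1 - r) / 4]
    return n, dn_dr, dn_ds

def bilinear_map(quad, r, s):
    """Map reference coordinates (r, s) to physical (x, y).

    quad is a list of four (x, y) vertices in counter-clockwise order.
    """
    n, _, _ = shape(r, s)
    x = sum(w * v[0] for w, v in zip(n, quad))
    y = sum(w * v[1] for w, v in zip(n, quad))
    return x, y

def locate_point(quad, point, tol=1e-12, max_iter=50, max_step=0.5):
    """Return (r, s) if point lies in quad, else None (hypothetical helper)."""
    # Stage 1: cheap axis-aligned bounding-box rejection.
    xs = [v[0] for v in quad]
    ys = [v[1] for v in quad]
    if not (min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)):
        return None

    # Stage 2: Newton iteration on x(r, s) - point = 0, starting at the center.
    r = s = 0.0
    for _ in range(max_iter):
        n, dn_dr, dn_ds = shape(r, s)
        ex = point[0] - sum(w * v[0] for w, v in zip(n, quad))
        ey = point[1] - sum(w * v[1] for w, v in zip(n, quad))
        if math.hypot(ex, ey) < tol:
            inside = abs(r) <= 1 + 1e-10 and abs(s) <= 1 + 1e-10
            return (r, s) if inside else None
        # Jacobian entries: a = dx/dr, b = dx/ds, c = dy/dr, d = dy/ds.
        a = sum(g * v[0] for g, v in zip(dn_dr, quad))
        b = sum(g * v[0] for g, v in zip(dn_ds, quad))
        c = sum(g * v[1] for g, v in zip(dn_dr, quad))
        d = sum(g * v[1] for g, v in zip(dn_ds, quad))
        det = a * d - b * c
        if det == 0:
            return None
        dr = (d * ex - b * ey) / det
        ds = (-c * ex + a * ey) / det
        # Cap the step length (simplified substitute for a trust region).
        step = math.hypot(dr, ds)
        if step > max_step:
            dr, ds = dr * max_step / step, ds * max_step / step
        r, s = r + dr, s + ds
    return None
```

A point produced by `bilinear_map(quad, 0.3, -0.2)` should be recovered by `locate_point` to within solver tolerance, while a point outside the bounding box is rejected before any Newton iteration runs.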
arXiv:2412.12361v2 Announce Type: replace Abstract: Fundamental mathematical constants appear in nearly every field of science, from physics to biology. Formulas that connect different constants often bring great insight by hinting at connections between previously disparate fields. Discoveries of such relations, however, have remained scarce events, relying on sporadic strokes of creativity by human mathematicians. Recent developments of algorithms for automated conjecture generation have accelerated the discovery of formulas for specific constants. Yet, the discovery of connections between constants has not been addressed. In this paper, we present the first library dedicated to mathematical constants and their interrelations. This library can serve as a central repository of knowledge for scientists from different areas, and as a collaborative platform for development of new algorithms. The library is based on a new representation that we propose for organizing the formulas of mathematical constants: a hypergraph, with each node representing a constant and each edge representing a formula. Using this representation, we propose and demonstrate a systematic approach for automatically enriching this library using PSLQ, an integer relation algorithm based on QR decomposition and lattice construction. During its development and testing, our strategy led to the discovery of 75 previously unknown connections between constants, including a new formula for the `first continued fraction' constant $C_1$, novel formulas for natural logarithms, and new formulas connecting $\pi$ and $e$. The latter formulas generalize a century-old relation between $\pi$ and $e$ by Ramanujan, which until now was considered a singular formula and is now found to be part of a broader mathematical structure. The code supporting this library is a public, open-source API that can serve researchers in experimental mathematics and other fields of science.
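The hypergraph representation described here (constants as nodes, formulas as edges) can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the library's API: it replaces PSLQ with an exhaustive search over small integer coefficients, which is only feasible for tiny inputs, and the names `find_integer_relation` and `hyperedges` are hypothetical.

```python
import itertools
import math

def find_integer_relation(values, bound=2, tol=1e-12):
    """Brute-force search (NOT PSLQ) for integers a_i with sum(a_i * x_i) ~ 0.

    Returns the relation with the smallest sum of absolute coefficients,
    normalized so that its first nonzero coefficient is positive, or None.
    """
    best = None
    for coeffs in itertools.product(range(-bound, bound + 1), repeat=len(values)):
        first = next((c for c in coeffs if c != 0), 0)
        if first <= 0:
            continue  # skips the zero vector and sign-flipped duplicates
        if abs(sum(c * x for c, x in zip(coeffs, values))) < tol:
            if best is None or sum(map(abs, coeffs)) < sum(map(abs, best)):
                best = coeffs
    return best

# Nodes: a few constants, keyed by illustrative names.
constants = {"log2": math.log(2), "log3": math.log(3), "log6": math.log(6)}
names = list(constants)
relation = find_integer_relation([constants[n] for n in names])

# Hyperedges: each formula connects the set of constants it involves.
hyperedges = {}
if relation is not None:
    members = frozenset(n for n, c in zip(names, relation) if c != 0)
    hyperedges[members] = dict(zip(names, relation))
```

Here the search recovers the relation log 2 + log 3 - log 6 = 0 and stores it as a single hyperedge joining three nodes. A production system would use a high-precision integer relation algorithm such as PSLQ instead of double-precision enumeration.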
arXiv:2501.03398v1 Announce Type: new Abstract: Space-filling experimental design techniques are commonly used in many computer modeling and simulation studies to explore the effects of inputs on outputs. This research presents raxpy, a Python package that leverages expressive annotation of Python functions and classes to simplify space-filling experimentation. It incorporates code introspection to derive a Python function's input space and novel algorithms to automate the design of space-filling experiments for spaces with optional and hierarchical input dimensions. In this paper, we review the criteria for design evaluation given these types of dimensions and compare the proposed algorithms with numerical experiments. The results demonstrate the ability of the proposed algorithms to create improved space-filling experiment designs. The package includes support for parallelism and distributed execution. raxpy is available as free and open-source software under an MIT license.
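The general idea of deriving an input space from annotations and then sampling it can be sketched with the standard library alone. This is not raxpy's API: the `Range` marker, `input_space`, and `latin_hypercube` are hypothetical names, the design is a plain Latin hypercube rather than the paper's algorithms, and optional or hierarchical dimensions are not handled.

```python
import inspect
import random
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints

@dataclass(frozen=True)
class Range:
    """Hypothetical annotation marker giving a parameter's bounds."""
    low: float
    high: float

def input_space(func):
    """Collect {parameter name: Range} from a function's annotations."""
    hints = get_type_hints(func, include_extras=True)
    space = {}
    for name in inspect.signature(func).parameters:
        # For Annotated[T, m1, m2, ...], get_args yields (T, m1, m2, ...).
        for meta in get_args(hints.get(name))[1:]:
            if isinstance(meta, Range):
                space[name] = meta
    return space

def latin_hypercube(space, n, seed=0):
    """Draw n points so each dimension has one point per equal-width stratum."""
    rng = random.Random(seed)
    columns = {}
    for name, bounds in space.items():
        strata = list(range(n))
        rng.shuffle(strata)
        width = (bounds.high - bounds.low) / n
        columns[name] = [bounds.low + (k + rng.random()) * width for k in strata]
    return [{name: columns[name][i] for name in space} for i in range(n)]

def simulate(x: Annotated[float, Range(0.0, 10.0)],
             y: Annotated[float, Range(-1.0, 1.0)]) -> float:
    """Toy model standing in for an expensive simulation."""
    return x * y

design = latin_hypercube(input_space(simulate), n=5)
results = [simulate(**point) for point in design]
```

Keeping the bounds in the signature means the experiment design stays in sync with the function it exercises, which is the convenience the abstract attributes to annotation-driven introspection.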
arXiv:2501.00279v1 Announce Type: new Abstract: BLAS is a fundamental building block of advanced linear algebra libraries and many modern scientific computing applications. GPUs are known for their strong arithmetic computing capabilities and are highly suited for BLAS operations. However, porting code to GPUs often requires significant effort, especially for large, complex codes or legacy codes, even for BLAS-heavy applications. While various tools exist to automatically offload BLAS to GPUs, they are often impractical due to the high costs associated with mandatory data transfers. The advent of unified memory architectures in recent GPU designs, such as the NVIDIA Grace-Hopper, allows cache-coherent memory access across all types of memory for both CPU and GPU, potentially eliminating the bottlenecks faced in conventional architectures. This breakthrough paves the way for innovative application developments and porting strategies. Building on our preliminary work demonstrating the potential of automatic *gemm offload, this paper extends the framework to all level-3 BLAS operations and introduces SCILIB-Accel, a novel tool for automatic BLAS offload. SCILIB-Accel leverages the memory coherency in Grace-Hopper and introduces a Device First-Use data movement policy inspired by the OpenMP First-Touch approach in multi-socket CPU programming, minimizing CPU-GPU data transfers for typical scientific computing codes. Additionally, utilizing dynamic binary instrumentation, the tool intercepts BLAS symbols directly from a CPU binary, requiring no code modifications or recompilation. SCILIB-Accel has been evaluated using multiple quantum physics codes on up to a few hundred GPU nodes, yielding promising speedups. Notably, for the LSMS method in the MuST suite, a 3x speedup was achieved on Grace-Hopper compared to Grace-Grace.
arXiv:2403.02237v2 Announce Type: replace-cross Abstract: We investigate two-variable hypergeometric series, namely the Appell $F_{1}$ and $F_{3}$ series, and obtain a comprehensive list of their analytic continuations sufficient to cover the whole real $(x,y)$ plane, except on their singular loci. We also derive analytic continuations of their 3-variable generalizations, the Lauricella $F_{D}^{(3)}$ series and the Lauricella-Saran $F_{S}^{(3)}$ series, leveraging the analytic continuations of $F_{1}$ and $F_{3}$, which ensures that the whole real $(x,y,z)$ space is covered, except on the singular loci of these functions. While these studies are motivated by the frequent occurrence of these multivariable hypergeometric functions in Feynman integral evaluation, they can also be used whenever these functions appear in other branches of mathematical physics. To facilitate their practical use, we provide four packages: $\texttt{AppellF1.wl}$, $\texttt{AppellF3.wl}$, $\texttt{LauricellaFD.wl}$, and $\texttt{LauricellaSaranFS.wl}$ in $\textit{Mathematica}$. These packages are applicable for generic as well as non-generic values of parameters, keeping in mind their utility in the evaluation of Feynman integrals. We explicitly present various physical applications of these packages in the context of Feynman integral evaluation and compare the results with those of other packages such as $\texttt{FIESTA}$. Upon applying the appropriate conventions for numerical evaluation, we find that the results obtained from our packages are consistent. Various $\textit{Mathematica}$ notebooks demonstrating different numerical results are also provided along with this paper.
arXiv:2408.07843v2 Announce Type: replace Abstract: There is a continuing interest in using standard language constructs for accelerated computing in order to avoid (sometimes vendor-specific) external APIs. For Fortran codes, the {\tt do concurrent} (DC) loop has been successfully demonstrated on the NVIDIA platform. However, support for DC on other platforms has taken longer to implement. Recently, Intel has added DC GPU offload support to its compiler, as has HPE for AMD GPUs. In this paper, we explore the current portability of using DC across GPU vendors using the in-production solar surface flux evolution code, HipFT. We discuss implementation and compilation details, including when/where using directive APIs for data movement is needed/desired compared to using a unified memory system. The performance achieved on both data center and consumer platforms is shown.
arXiv:2412.16161v1 Announce Type: new Abstract: In this short article I introduce the evitaicossa package which provides functionality for antiassociative algebras in the R programming language; it is available on CRAN at https://CRAN.R-project.org/package=evitaicossa.
arXiv:2412.17265v1 Announce Type: new Abstract: Xiaomai is an intelligent tutoring system (ITS) designed to help Chinese college students in learning advanced mathematics and preparing for the graduate school math entrance exam. This study investigates two distinctive features within Xiaomai: the incorporation of free-response questions with automatic feedback and the metacognitive element of reflecting on self-made errors.
arXiv:2410.10908v2 Announce Type: replace Abstract: Julia has been heralded as a potential successor to Python for scientific machine learning and numerical computing, boasting ergonomic and performance improvements. Since Julia's inception in 2012 and declaration of language goals in 2017, its ecosystem and language-level features have grown tremendously. In this paper, we take a modern look at Julia's features and ecosystem, assess the current state of the language, and discuss its viability and pitfalls as a replacement for Python as the de-facto scientific machine learning language. We call for the community to address Julia's language-level issues that are preventing further adoption.
arXiv:2412.15221v1 Announce Type: new Abstract: The gps2gtfs package addresses a critical need for converting raw Global Positioning System (GPS) trajectory data from public transit vehicles into the widely used GTFS (General Transit Feed Specification) format. This transformation enables various software applications to efficiently utilize real-time transit data for purposes such as tracking, scheduling, and arrival time prediction. Developed in Python, gps2gtfs employs techniques like geo-buffer mapping, parallel processing, and data filtering to manage challenges associated with raw GPS data, including high volume, discontinuities, and localization errors. This open-source package, available on GitHub and PyPI, enhances the development of intelligent transportation solutions and fosters improved public transit systems globally.
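One plausible reading of "geo-buffer mapping" is assigning each GPS ping to the nearest stop within a fixed radius. The sketch below illustrates that reading only; it is not taken from gps2gtfs, and the function names, the haversine distance, and the 50 m default buffer are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def match_to_stops(pings, stops, buffer_m=50.0):
    """Map each (lat, lon) ping to the nearest stop id within buffer_m, else None."""
    matches = []
    for lat, lon in pings:
        best_id, best_dist = None, buffer_m
        for stop_id, (stop_lat, stop_lon) in stops.items():
            dist = haversine_m(lat, lon, stop_lat, stop_lon)
            if dist <= best_dist:
                best_id, best_dist = stop_id, dist
        matches.append(best_id)
    return matches
```

Pings that match no stop come back as `None`, so a downstream step can drop them as in-transit points or flag them as localization errors. With many stops, a spatial index would replace the inner loop.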
arXiv:2310.19051v4 Announce Type: replace-cross Abstract: The Hurst exponent is a significant metric for characterizing time sequences with the long-term memory property, and it arises in many fields. The available methods for estimating the Hurst exponent can be categorized into time-domain and spectrum-domain methods. Although there are various estimation methods for the Hurst exponent, some disadvantages remain to be overcome: first, the estimation methods are mathematics-oriented instead of engineering-oriented; second, the accuracy and effectiveness of the estimation algorithms are inadequately assessed; third, the classification framework for the estimation methods is insufficient; and last, there is a lack of clear guidance for selecting a proper estimation method in practical data-analysis problems. The contributions of this paper lie in four aspects: 1) the optimal sequence partition method is proposed for designing the estimation algorithms for the Hurst exponent; 2) algorithmic pseudo-codes are adopted to describe the estimation algorithms, which improves the understandability and usability of the estimation methods and also reduces the difficulty of implementation in computer programming languages; 3) performance assessment is carried out for the typical estimation algorithms via an ideal time sequence with a given Hurst exponent and a practical time sequence captured in applications; 4) guidance for selecting proper algorithms for estimating the Hurst exponent is presented and discussed. It is expected that the systematic survey of available estimation algorithms will help users understand the principles, and that the assessment of the various estimation methods will help users select, implement, and apply the estimation algorithms of interest in practical situations.
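As a concrete instance of a time-domain estimator, the sketch below implements the aggregated variance method: the variance of block means of a stationary series scales as m^(2H-2) with block size m, so H follows from a log-log regression slope. This is a textbook formulation chosen for illustration, not necessarily the form given in the paper; the function name and the block sizes are assumptions.

```python
import math
import random
import statistics

def hurst_aggregated_variance(series, block_sizes):
    """Estimate the Hurst exponent H from the scaling of block-mean variance."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # need at least two blocks for a variance
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        var = statistics.pvariance(means)
        if var <= 0:
            continue  # log undefined for a constant series
        log_m.append(math.log(m))
        log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    # Ordinary least-squares slope of log(var) against log(m).
    mean_x = sum(log_m) / len(log_m)
    mean_y = sum(log_var) / len(log_var)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_m, log_var))
    sxx = sum((x - mean_x) ** 2 for x in log_m)
    slope = sxy / sxx
    return 1 + slope / 2  # slope = 2H - 2

# Uncorrelated Gaussian noise has no long-term memory, so H should be near 0.5.
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
h_noise = hurst_aggregated_variance(noise, [4, 8, 16, 32, 64, 128])
```

The estimate is sensitive to the choice of block sizes and to series length, which is the kind of practical issue the abstract's performance assessment is meant to address.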