math.DS

33 posts

cs.CL cs.LG math.DS

Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models

arXiv:2503.21929v1 Announce Type: new Abstract: Advances in hardware and language model architecture have spurred a revolution in natural language generation. However, autoregressive models compute probability distributions over next-token choices, and sampling from these distributions, known as decoding, has received significantly less attention than other design choices. Existing decoding strategies are largely based on heuristics, resulting in methods that are hard to apply or improve in a principled manner. We develop the theory of decoding strategies for language models by expressing popular decoding algorithms as equilibrium states in the language of ergodic theory and stating the functions they optimize. Using this, we analyze the effect of the local normalization step of top-k, nucleus, and temperature sampling, used to make probabilities sum to one. We argue that local normalization distortion is a fundamental defect of decoding strategies and quantify the size of this distortion and its effect on mathematical proxies for the quality and diversity of generated text. Contrary to the prevailing explanation, we argue that the major cause of the under-performance of top-k sampling relative to nucleus sampling is local normalization distortion. This yields conclusions for the future design of decoding algorithms and the detection of machine-generated text.

Tom Kempton, Stuart Burrell3/31/2025

math.DS cs.CE cs.SY eess.SY math.DG math.OC

Data-Driven Nonlinear Model Reduction to Spectral Submanifolds via Oblique Projection

arXiv:2503.21895v1 Announce Type: cross Abstract: The dynamics in a primary Spectral Submanifold (SSM) constructed over the slowest modes of a dynamical system provide an ideal reduced-order model for nearby trajectories. Modeling the dynamics of trajectories further away from the primary SSM, however, is difficult if the linear part of the system exhibits strong non-normal behavior. Such non-normality implies that simply projecting trajectories onto SSMs along directions normal to the slow linear modes will not pair those trajectories correctly with their reduced counterparts on the SSMs. In principle, a well-defined nonlinear projection along a stable invariant foliation exists and would exactly match the full dynamics to the SSM-reduced dynamics. This foliation, however, cannot realistically be constructed from practically feasible amounts and distributions of experimental data. Here we develop an oblique projection technique that is able to approximate this foliation efficiently, even from a single experimental trajectory of a significantly non-normal and nonlinear beam.

Leonardo Bettini, B\'alint Kasz\'as, Bernhard Zybach, J\"urg Dual, George Haller3/31/2025

eess.SY cs.NA cs.SY math.DS math.NA

PI-Controlled Variable Time-Step Power System Simulation Using an Adaptive Order Differential Transformation Method

arXiv:2503.09898v1 Announce Type: new Abstract: Dynamic simulation plays a crucial role in power system transient stability analysis, but traditional numerical integration-based methods are time-consuming due to the small time step sizes. Other semi-analytical solution methods, such as the Differential Transformation method, often struggle to select proper orders and steps, leading to slow performance and numerical instability. To address these challenges, this paper proposes a novel adaptive dynamic simulation approach for power system transient stability analysis. The approach adds feedback control and optimization to selecting the step and order, utilizing the Differential Transformation method and a proportional-integral control strategy to control truncation errors. Order selection is formulated as an optimization problem resulting in a variable-step-optimal-order method that achieves significantly larger time step sizes without violating numerical stability. It is applied to three systems: the IEEE 9-bus, 3-generator system, IEEE 39-bus, 10-generator system, and a Polish 2383-bus, 327-generator system, promising computational efficiency and numerical robustness for large-scale power system is demonstrated in comprehensive case studies.

Kaiyang Huang, Yang Liu, Kai Sun, Feng Qiu3/14/2025

math.OC cs.SY eess.SY math.DS

Analysis of a continuous opinion and discrete action dynamics model coupled with an external observation dynamics

arXiv:2403.09473v2 Announce Type: cross Abstract: We consider a set of consumers in a city or town (who thus generate pollution) whose opinion is governed by a continuous opinion and discrete action (CODA) dynamics model. This dynamics is coupled with an observation signal dynamics, which defines the information the consumers have access to regarding the common pollution. We show that the external observation signal has a significant impact on the asymptotic behavior of the CODA model. When the coupling is strong, it induces either a chaotic behavior or convergence towards a limit cycle. When the coupling is weak, a more classical behavior characterized by local agreements in polarized clusters is observed. In both cases, conditions under which clusters of consumers don't change their actions are provided.Numerical examples are provided to illustrate the derived analytical results.

Anthony Couthures, Thomas Mongaillard, Vineeth S. Varma, Samson Lasaulce, Irinel-Constantin Morarescu3/14/2025

physics.soc-ph cs.MA cs.SY eess.SY math.DS math.OC

Coalescing Force of Group Pressure: Consensus in Nonlinear Opinion Dynamics

arXiv:2410.04301v3 Announce Type: replace-cross Abstract: This work extends the recent opinion dynamics model from Cheng et al., emphasizing the role of group pressure in consensus formation. We generalize the findings to incorporate social influence algorithms with general time-varying, opinion-dependent weights and multidimensional opinions, beyond bounded confidence dynamics. We demonstrate that, with uniformly positive conformity levels, group pressure consistently drives consensus and provide a tighter estimate for the convergence rate. Unlike previous models, the common public opinion in our framework can assume arbitrary forms within the convex hull of current opinions, offering flexibility applicable to real-world scenarios such as opinion polls with random participant selection. This analysis provides deeper insights into how group pressure mechanisms foster consensus under diverse conditions.

Iryna Zabarianska, Anton V. Proskurnikov3/14/2025

math.DS cs.NA math.NA math.PR

Rigorous enclosure of Lyapunov exponents of stochastic flows

arXiv:2411.07064v2 Announce Type: replace-cross Abstract: We develop a powerful and general method to provide rigorous and accurate upper and lower bounds for Lyapunov exponents of stochastic flows. Our approach is based on computer-assisted tools, the adjoint method and established results on the ergodicity of diffusion processes. We do not require any structural assumptions on the stochastic system, work under mild hypoellipticity conditions and outside of perturbative regimes. Therefore, our method allows for the treatment of systems that were so far out of reach from existing mathematical tools. We demonstrate our method to exhibit the chaotic nature of three different systems. Finally, we show the robustness of our approach by combining it with continuation methods to produce bounds on Lyapunov exponents over large parameter regions.

Maxime Breden, Hugo Chu, Jeroen S. W. Lamb, Martin Rasmussen3/14/2025

cs.MA cs.AI cs.GT math.DS

Multi-Agent Q-Learning Dynamics in Random Networks: Convergence due to Exploration and Sparsity

arXiv:2503.10186v1 Announce Type: new Abstract: Beyond specific settings, many multi-agent learning algorithms fail to converge to an equilibrium solution, and instead display complex, non-stationary behaviours such as recurrent or chaotic orbits. In fact, recent literature suggests that such complex behaviours are likely to occur when the number of agents increases. In this paper, we study Q-learning dynamics in network polymatrix games where the network structure is drawn from classical random graph models. In particular, we focus on the Erdos-Renyi model, a well-studied model for social networks, and the Stochastic Block model, which generalizes the above by accounting for community structures within the network. In each setting, we establish sufficient conditions under which the agents' joint strategies converge to a unique equilibrium. We investigate how this condition depends on the exploration rates, payoff matrices and, crucially, the sparsity of the network. Finally, we validate our theoretical findings through numerical simulations and demonstrate that convergence can be reliably achieved in many-agent systems, provided network sparsity is controlled.

Aamal Hussain, Dan Leonte, Francesco Belardinelli, Raphael Huser, Dario Paccagnan3/14/2025

eess.SY cs.RO cs.SY math.DS

Optimizing AUV speed dynamics with a data-driven Koopman operator approach

arXiv:2503.09628v1 Announce Type: new Abstract: Autonomous Underwater Vehicles (AUVs) play an essential role in modern ocean exploration, and their speed control systems are fundamental to their efficient operation. Like many other robotic systems, AUVs exhibit multivariable nonlinear dynamics and face various constraints, including state limitations, input constraints, and constraints on the increment input, making controller design challenging and requiring significant effort and time. This paper addresses these challenges by employing a data-driven Koopman operator theory combined with Model Predictive Control (MPC), which takes into account the aforementioned constraints. The proposed approach not only ensures the performance of the AUV under state and input limitations but also considers the variation in incremental input to prevent rapid and potentially damaging changes to the vehicle's operation. Additionally, we develop a platform based on ROS2 and Gazebo to validate the effectiveness of the proposed algorithms, providing new control strategies for underwater vehicles against the complex and dynamic nature of underwater environments.

Zhiliang Liu, Xin Zhao, Peng Cai, Bing Cong3/14/2025

eess.SY cs.DM cs.SY math.DS math.PR math.SP

Analysis and Mitigation of Cascading Failures Using a Stochastic Interaction Graph with Eigen-analysis

arXiv:2503.09904v1 Announce Type: new Abstract: In studies on complex network systems using graph theory, eigen-analysis is typically performed on an undirected graph model of the network. However, when analyzing cascading failures in a power system, the interactions among failures suggest the need for a directed graph beyond the topology of the power system to model directions of failure propagation. To accurately quantify failure interactions for effective mitigation strategies, this paper proposes a stochastic interaction graph model and associated eigen-analysis. Different types of modes on failure propagations are defined and characterized by the eigenvalues of a stochastic interaction matrix, whose absolute values are unity, zero, or in between. Finding and interpreting these modes helps identify the probable patterns of failure propagation, either local or widespread, and the participating components based on eigenvectors. Then, by lowering the failure probabilities of critical components highly participating in a mode of widespread failures, cascading can be mitigated. The validity of the proposed stochastic interaction graph model, eigen-analysis and the resulting mitigation strategies is demonstrated using simulated cascading failure data on an NPCC 140-bus system.

Zhenping Guo, Xiaowen Su, Kai Sun, Byungkwon Park, Srdjan Simunovic3/14/2025

math.DS cs.NA math.NA

Entropic transfer operators for stochastic systems

arXiv:2503.05308v1 Announce Type: cross Abstract: Dynamical systems can be analyzed via their Frobenius-Perron transfer operator and its estimation from data is an active field of research. Recently entropic transfer operators have been introduced to estimate the operator of deterministic systems. The approach is based on the regularizing properties of entropic optimal transport plans. In this article we generalize the method to stochastic and non-stationary systems and give a quantitative convergence analysis of the empirical operator as the available samples increase. We introduce a way to extend the operator's eigenfunctions to previously unseen samples, such that they can be efficiently included into a spectral embedding. The practicality and numerical scalability of the method are demonstrated on a real-world fluid dynamics experiment.

Hancheng Bi, Cl\'ement Sarrazin, Bernhard Schmitzer, Thilo D. Stier3/10/2025

eess.SY cs.MA cs.RO cs.SY math.DS math.OC

Cooperative Nonlinear Guidance Strategies for Guaranteed Pursuit-Evasion

arXiv:2402.06176v2 Announce Type: replace Abstract: This paper addresses the pursuit-evasion problem involving three agents -- a purser, an evader, and a defender. We develop cooperative guidance laws for the evader-defender team that guarantee that the defender intercepts the pursuer before it reaches the vicinity of the evader. Unlike heuristic methods, optimal control, differential game formulation, and recently proposed time-constrained guidance techniques, we propose a geometric solution to safeguard the evader from the pursuer's incoming threat. The proposed strategy is computationally efficient and expected to be scalable as the number of agents increases. Another alluring feature of the proposed strategy is that the evader-defender team does not require the knowledge of the pursuer's strategy and that the pursuer's interception is guaranteed from arbitrary initial engagement geometries. We further show that the necessary error variables for the evader-defender team vanish within a time that can be exactly prescribed prior to the three-body engagement. Finally, we demonstrate the efficacy of the proposed cooperative defense strategy via simulation in diverse engagement scenarios.

Saurabh Kumar, Shashi Ranjan Kumar, Abhinav Sinha3/10/2025

math.GR cs.CC cs.FL math.DS

Word problems and embedding-obstructions in cellular automata groups on groups

arXiv:2503.05572v1 Announce Type: cross Abstract: We study groups of reversible cellular automata, or CA groups, on groups. More generally, we consider automorphism groups of subshifts of finite type on groups. It is known that word problems of CA groups on virtually nilpotent groups are in co-NP, and can be co-NP-hard. We show that under the Gap Conjecture of Grigorchuk, their word problems are PSPACE-hard on all other groups. On free and surface groups, we show that they are indeed always in PSPACE. On a group with co-NEXPTIME word problem, CA groups themselves have co-NEXPTIME word problem, and on the lamplighter group (which itself has polynomial-time word problem) we show they can be co-NEXPTIME-hard. We show also two nonembeddability results: the group of cellular automata on a non-cyclic free group does not embed in the group of cellular automata on the integers (this solves a question of Barbieri, Carrasco-Vargas and Rivera-Burgos); and the group of cellular automata in dimension $D$ does not embed in a group of cellular automata in dimension $d$ if $D \geq 3d+2$ (this solves a question of Hochman).

Ville Salo3/10/2025

physics.soc-ph cs.SI math.DS

An Approximate-Master-Equation Formulation of the Watts Threshold Model on Hypergraphs

arXiv:2503.04020v1 Announce Type: cross Abstract: In traditional models of behavioral or opinion dynamics on social networks, researchers suppose that all interactions occur between pairs of individuals. However, in reality, social interactions also occur in groups of three or more individuals. A common way to incorporate such polyadic interactions is to study dynamical processes on hypergraphs. In a hypergraph, interactions can occur between any number of the individuals in a network. The Watts threshold model (WTM) is a well-known model of a simplistic social spreading process. Very recently, Chen et al. extended the WTM from dyadic networks (i.e., graphs) to polyadic networks (i.e., hypergraphs). In the present paper, we extend their discrete-time model to continuous time using approximate master equations (AMEs). By using AMEs, we are able to model the system with very high accuracy. We then reduce the high-dimensional AME system to a system of three coupled differential equations without any detectable loss of accuracy. This much lower-dimensional system is more computationally efficient to solve numerically and is also easier to interpret. We linearize the reduced AME system and calculate a cascade condition, which allows us to determine when a large spreading event occurs. We then apply our model to a social contact network of a French primary school and to a hypergraph of computer-science coauthorships. We find that the AME system is accurate in modelling the polyadic WTM on these empirical networks; however, we expect that future work that incorporates structural correlations between nearby nodes and groups into the model for the dynamics will lead to more accurate theory for real-world networks.

Leah A. Keating, Kwang-Il Goh, Mason A. Porter3/10/2025

math.DS cs.AI cs.LG

On the Dimension of Pullback Attractors in Recurrent Neural Networks

arXiv:2501.11357v1 Announce Type: cross Abstract: Recurrent Neural Networks (RNNs) are high-dimensional state space models capable of learning functions on sequence data. Recently, it has been conjectured that reservoir computers, a particular class of RNNs, trained on observations of a dynamical systems can be interpreted as embeddings. This result has been established for the case of linear reservoir systems. In this work, we use a nonautonomous dynamical systems approach to establish an upper bound for the fractal dimension of the subset of reservoir state space approximated during training and prediction phase. We prove that when the input sequences comes from an Nin-dimensional invertible dynamical system, the fractal dimension of this set is bounded above by Nin. The result obtained here are useful in dimensionality reduction of computation in RNNs as well as estimating fractal dimensions of dynamical systems from limited observations of their time series. It is also a step towards understanding embedding properties of reservoir computers.

Muhammed Fadera1/22/2025

math.AP cs.NA math.DS math.NA

A rigorous integrator and global existence for higher-dimensional semilinear parabolic PDEs via semigroup theory

arXiv:2402.00406v2 Announce Type: replace-cross Abstract: In this paper, we introduce a general constructive method to compute solutions of initial value problems of semilinear parabolic partial differential equations on hyper-rectangular domains via semigroup theory and computer-assisted proofs. Once a numerical candidate for the solution is obtained via a finite dimensional projection, Chebyshev series expansions are used to solve the linearized equations about the approximation from which a solution map operator is constructed. Using the solution operator (which exists from semigroup theory), we define an infinite dimensional contraction operator whose unique fixed point together with its rigorous bounds provide the local inclusion of the solution. Applying this technique for multiple time steps leads to constructive proofs of existence of solutions over long time intervals. As applications, we study the 3D/2D Swift-Hohenberg, where we combine our method with explicit constructions of trapping regions to prove global existence of solutions of initial value problems converging asymptotically to nontrivial equilibria. A second application consists of the 2D Ohta-Kawasaki equation, providing a framework for handling derivatives in nonlinear terms.

Gabriel William Duchesne, Jean-Philippe Lessard, Akitoshi Takayasu1/22/2025

math.NA cs.NA math.DS math.OC

Changing the ranking in eigenvector centrality of a weighted graph by small perturbations

arXiv:2501.10745v1 Announce Type: new Abstract: In this article, we consider eigenvector centrality for the nodes of a graph and study the robustness (and stability) of this popular centrality measure. For a given weighted graph $\G$ (both directed and undirected), we consider the associated weighted adiacency matrix $A$, which by definition is a non-negative matrix. Eigenvector centrality consists of ranking the elements of the graph according to the corresponding entries of the Perron eigenvector of $A$, which is associated with the positive eigenvalue with largest modulus. An indicator of the robustness of eigenvector centrality consists in looking for a nearby perturbed graph $\widetilde{\G}$, with the same structure as $\G$ (i.e., with the same vertices and edges), but with a weighted adiacency matrix $\widetilde A$ such that the highest $m$ entries ($m \ge 2$) of the Perron eigenvector of $\widetilde A$ coalesce, making the ranking at the highest level ambiguous. To compute a solution to this matrix nearness problem, a nested iterative algorithm is proposed that makes use of a constrained gradient system of matrix differential equations (possibly on a low-rank manifold) in the inner iteration and a one-dimensional optimization of the perturbation size in the outer iteration. The proposed algorithm produces the {\em optimal} perturbation (i.e., the one with smallest Frobenius norm) of the graph, which causes the looked-for coalescence, which is a measure of the sensitivity of the graph. The methodology is formulated in terms of graphs but applies to any nonnegative matrix, with potential applications in fields like population models, consensus dynamics, economics, etc.

Michele Benzi, Nicola Guglielmi1/22/2025

eess.SY cs.SY math.DS math.OC

Characterization of Invariance, Periodic Solutions and Optimization of Dynamic Financial Networks

arXiv:2501.12156v1 Announce Type: new Abstract: Cascading failures, such as bankruptcies and defaults, pose a serious threat for the resilience of the global financial system. Indeed, because of the complex investment and cross-holding relations within the system, failures can occur as a result of the propagation of a financial collapse from one organization to another. While this problem has been studied in depth from a static angle, namely, when the system is at an equilibrium, we take a different perspective and study the corresponding dynamical system. The contribution of this paper is threefold. First, we carry out a systematic analysis of the regions of attraction and invariance of the system orthants, defined by the positive and negative values of the organizations' equity. Second, we investigate periodic solutions and show through a counterexample that there could exist periodic solutions of period greater than 2. Finally, we study the problem of finding the smallest cash injection that would bring the system to the maximal invariant region of the positive orthant.

Leonardo Stella, Dario Bauso, Franco Blanchini, Patrizio Colaneri1/22/2025

Monochromatic arithmetic progressions in the Fibonacci word

arXiv:2501.05830v1 Announce Type: cross Abstract: We investigate the lengths and starting positions of the longest monochromatic arithmetic progressions for a fixed difference in the Fibonacci word. We provide a complete classification for their lengths in terms of a simple formula. Our strongest results are proved using methods from dynamical systems, especially the dynamics of circle rotations. We also employ computer-based methods in the form of the automatic theorem-proving software Walnut. This allows us to extend recent results concerning similar questions for the Thue-Morse sequence and the Rudin-Shapiro sequence. This also allows us to obtain some results for the Fibonacci word that do not seem to be amenable to dynamical methods.

Gandhar Joshi, Dan Rust1/14/2025

cs.AI cs.NE math.DS q-bio.NC

A Computational Model of Learning and Memory Using Structurally Dynamic Cellular Automata

arXiv:2501.06192v1 Announce Type: new Abstract: In the fields of computation and neuroscience, much is still unknown about the underlying computations that enable key cognitive functions including learning, memory, abstraction and behavior. This paper proposes a mathematical and computational model of learning and memory based on a small set of bio-plausible functions that include coincidence detection, signal modulation, and reward/penalty mechanisms. Our theoretical approach proposes that these basic functions are sufficient to establish and modulate an information space over which computation can be carried out, generating signal gradients usable for inference and behavior. The computational method used to test this is a structurally dynamic cellular automaton with continuous-valued cell states and a series of recursive steps propagating over an undirected graph with the memory function embedded entirely in the creation and modulation of graph edges. The experimental results show: that the toy model can make near-optimal choices to re-discover a reward state after a single training run; that it can avoid complex penalty configurations; that signal modulation and network plasticity can generate exploratory behaviors in sparse reward environments; that the model generates context-dependent memory representations; and that it exhibits high computational efficiency because of its minimal, single-pass training requirements combined with flexible and contextual memory representation.

Jeet Singh1/14/2025

cs.LG math.AT math.DS math.OC stat.ML

Barcodes as Summary of Loss Function Topology

arXiv:1912.00043v3 Announce Type: replace Abstract: We propose to study neural networks' loss surfaces by methods of topological data analysis. We suggest to apply barcodes of Morse complexes to explore topology of loss surfaces. An algorithm for calculations of the loss function's barcodes of local minima is described. We have conducted experiments for calculating barcodes of local minima for benchmark functions and for loss surfaces of small neural networks. Our experiments confirm our two principal observations for neural networks' loss surfaces. First, the barcodes of local minima are located in a small lower part of the range of values of neural networks' loss function. Secondly, increase of the neural network's depth and width lowers the barcodes of local minima. This has some natural implications for the neural network's learning and for its generalization properties.

Serguei Barannikov, Alexander Korotin, Dmitry Oganesyan, Daniil Emtsev, Evgeny Burnaev1/14/2025