math.FA
17 posts

arXiv:2408.13114v2 Announce Type: replace-cross Abstract: We present a general variational framework for the training of freeform nonlinearities in layered computational architectures subject to some slope constraints. The regularization that we add to the traditional training loss penalizes the second-order total variation of each trainable activation. The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility. These properties are crucial to ensure the proper functioning of certain classes of signal-processing algorithms (e.g., plug-and-play schemes, unrolled proximal gradient, invertible flows). We prove that the global optimum of the stated constrained-optimization problem is achieved with nonlinearities that are adaptive nonuniform linear splines. We then show how to solve the resulting function-optimization problem numerically by representing the nonlinearities in a suitable (nonuniform) B-spline basis. Finally, we illustrate the use of our framework with the data-driven design of (weakly) convex regularizers for the denoising of images and the resolution of inverse problems.
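As a concrete illustration of the penalty in question: for a piecewise-linear activation on uniform knots, the second-order total variation reduces to the total absolute slope jump across the interior knots. Below is a minimal NumPy sketch (not the authors' code); the knot grid and toy values are illustrative.

```python
import numpy as np

def tv2_penalty(coeffs, h):
    """Second-order total variation of a linear spline with uniform knot
    spacing h and nodal values coeffs: the sum of absolute slope jumps
    across the interior knots."""
    second_diff = coeffs[2:] - 2.0 * coeffs[1:-1] + coeffs[:-2]
    return np.abs(second_diff).sum() / h

def spline_activation(x, coeffs, knots):
    """Piecewise-linear activation with nodal values coeffs at the sorted
    uniform knots (np.interp extends by constants outside the knot range)."""
    return np.interp(x, knots, coeffs)

# toy usage: the nodal values of ReLU give a single slope jump of size 1
knots = np.linspace(-2.0, 2.0, 9)
coeffs = np.maximum(knots, 0.0)
print(tv2_penalty(coeffs, knots[1] - knots[0]))  # -> 1.0
```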
arXiv:2301.06765v4 Announce Type: replace-cross Abstract: The Chernoff approximation method is a powerful and flexible tool of functional analysis, which in many cases allows one to express $e^{tL}$ in terms of the variable coefficients of a linear differential operator $L$. In this paper, we prove a theorem that allows us to apply this method to find the resolvent of $L$. Our theorem states that the Laplace transforms of Chernoff approximations of a $C_0$-semigroup converge to the resolvent of the generator of this semigroup. We demonstrate the proposed method on a second-order differential operator with variable coefficients. As a consequence, we obtain a new representation of the solution of a nonhomogeneous linear ordinary differential equation of the second order in terms of functions that are the coefficients of this equation, playing the role of parameters of the problem. For the Chernoff function based on the shift operator, we give an estimate of the rate of convergence of the approximations to the solution.
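Schematically, the result rests on the Laplace-transform representation of the resolvent: with $C$ a Chernoff function for the semigroup $e^{tL}$ and $\operatorname{Re}\lambda$ large enough, the theorem lets the Chernoff limit pass under the integral,

$$(\lambda I - L)^{-1}x \;=\; \int_0^\infty e^{-\lambda t}\, e^{tL}x \,\mathrm{d}t \;=\; \lim_{n\to\infty} \int_0^\infty e^{-\lambda t}\, C\!\left(\tfrac{t}{n}\right)^{n} x \,\mathrm{d}t.$$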
arXiv:2503.10428v1 Announce Type: new Abstract: In this work, we establish that the Langevin Monte Carlo algorithm can learn depth-2 neural nets of any size, for any data, and we give non-asymptotic convergence rates for it. We achieve this by showing that, in total variation distance and $q$-Rényi divergence, the iterates of Langevin Monte Carlo converge to the Gibbs distribution of Frobenius-norm-regularized losses for any of these nets, when using smooth activations, in both classification and regression settings. Most critically, the amount of regularization needed for our results is independent of the size of the net. Our key observation is that two-layer neural loss functions can always be regularized by a constant amount such that they satisfy the Villani conditions, and thus their Gibbs measures satisfy a Poincaré inequality.
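For readers who want the algorithm spelled out, here is a self-contained NumPy sketch of the unadjusted Langevin iterates on a Frobenius-regularized depth-2 tanh-net least-squares loss. The toy data, sizes, step size, and regularization strength are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lam, step = 64, 3, 8, 0.1, 1e-3      # toy sizes (illustrative)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
W = rng.standard_normal((m, d))                # hidden-layer weights
a = rng.standard_normal(m)                     # outer-layer weights

def grads(W, a):
    """Gradient of V = mean((a . tanh(W x) - y)^2) + lam(||W||_F^2 + ||a||^2)."""
    H = np.tanh(X @ W.T)                       # (n, m) hidden activations
    r = H @ a - y                              # residuals
    gA = 2.0 * H.T @ r / n + 2.0 * lam * a
    gW = 2.0 * ((r[:, None] * (1.0 - H**2) * a).T @ X) / n + 2.0 * lam * W
    return gW, gA

# unadjusted Langevin: x_{k+1} = x_k - step*grad V(x_k) + sqrt(2*step)*noise,
# whose iterates target the Gibbs measure proportional to exp(-V)
for _ in range(1000):
    gW, gA = grads(W, a)
    W = W - step * gW + np.sqrt(2.0 * step) * rng.standard_normal(W.shape)
    a = a - step * gA + np.sqrt(2.0 * step) * rng.standard_normal(a.shape)
```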
arXiv:2503.10360v1 Announce Type: cross Abstract: The time-frequency concentration and resolution of the Cohen's class time-frequency distribution (CCTFD) have attracted much attention in time-frequency analysis. A variety of uncertainty principles for the CCTFD have therefore been derived, including the weak Heisenberg type, the Hardy type, the Nazarov type, and the local type. However, the standard Heisenberg type still remains unresolved. In this study, we address the question of how the standard Heisenberg uncertainty principle of the CCTFD is affected by fundamental distribution properties. The properties investigated are Parseval's relation and the concise frequency-domain definition (i.e., only frequency variables appear explicitly in the tensor product), on the basis of which we confine our attention to the CCTFD with certain specific kernels: those of unit modulus that are $v$-independent and invariant under time translation, reversal, and scaling (UMITRSK-CCTFD). We then extend the standard Heisenberg uncertainty principles of the Wigner distribution to the UMITRSK-CCTFD, yielding various types of attainable lower bounds on the uncertainty product in the UMITRSK-CCTFD domain. The derived results strengthen the existing weak Heisenberg type and fill gaps in the standard Heisenberg type.
arXiv:2503.10274v1 Announce Type: cross Abstract: This paper combines the chirp basis function transformation and the symplectic coordinate transformation to yield a novel Wigner distribution (WD) associated with the linear canonical transform (LCT), named the symplectic WD in the LCT domain (SWDL). It incorporates the merits of the symplectic WD (SWD) and the WD in the LCT domain (WDL), achieving a stronger capability for extracting the frequency-rate feature of linear frequency-modulated (LFM) signals while maintaining the same level of computational complexity. Some essential properties of the SWDL are derived, including marginal distributions, energy conservation, unique reconstruction, the Moyal formula, complex conjugate symmetry, time-reversal symmetry, the scaling property, the time-translation property, the frequency-modulation property, and the joint time-translation and frequency-modulation property. Heisenberg uncertainty principles for the SWDL are formulated, giving rise to three kinds of lower bounds attained, respectively, by a Gaussian-enveloped complex exponential signal, a Gaussian signal, and a Gaussian-enveloped chirp signal. The optimal symplectic matrices corresponding to the highest time-frequency resolution are obtained by solving the lower-bound optimization (minimization) problem. The time-frequency resolution of the SWDL is compared with those of the SWD and WDL to demonstrate its superiority in time-frequency energy concentration for LFM signals. A synthesis example is also carried out to verify the feasibility and reliability of the theoretical analysis.
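Since the construction builds on the classical WD, a minimal NumPy sketch of the discrete (pseudo-)Wigner distribution applied to an LFM signal may help fix ideas. This is the plain WD, not the SWDL itself, and the signal length and chirp rate are arbitrary illustrative values.

```python
import numpy as np

def pseudo_wigner(x):
    """Discrete pseudo-Wigner distribution W[t, k]: FFT over the lag m of
    the instantaneous autocorrelation x[t+m] conj(x[t-m]), with circular
    indexing; real up to numerical error since the lag sequence is Hermitian."""
    N = len(x)
    W = np.empty((N, N))
    m = np.arange(-N // 2, N // 2)
    for t in range(N):
        acf = x[(t + m) % N] * np.conj(x[(t - m) % N])
        W[t] = np.real(np.fft.fft(np.fft.ifftshift(acf)))
    return W

# an LFM (chirp) signal: its WD concentrates along a straight line whose
# slope is the chirp rate -- the feature the SWDL is designed to extract
t = np.arange(256) / 256.0
x = np.exp(1j * np.pi * 80.0 * t**2)
W = pseudo_wigner(x)
```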
arXiv:2104.11977v2 Announce Type: replace-cross Abstract: Motivated by the problem of robustness to deformations of the input for deep convolutional neural networks, we identify signal classes which are inherently stable to irregular deformations induced by distortion fields $\tau\in L^\infty(\mathbb{R}^d;\mathbb{R}^d)$, to be characterized in terms of a generalized modulus of continuity associated with the deformation operator. Resorting to ideas of harmonic and multiscale analysis, we prove that for signals in multiresolution approximation spaces $U_s$ at scale $s$, stability in $L^2$ holds in the regime $\|\tau\|_{L^\infty}/s\ll 1$, essentially as an effect of the uncertainty principle. Instability occurs when $\|\tau\|_{L^\infty}/s\gg 1$, and we provide a sharp upper bound for the asymptotic growth rate. The stability results are then extended to signals in the Besov space $B^{d/2}_{2,1}$ tailored to the given multiresolution approximation. We also consider the case of more general time-frequency deformations. Finally, we provide stochastic versions of the aforementioned results, namely we study the issue of stability in mean when $\tau(x)$ is modeled as a random field (not bounded, in general) with identically distributed variables $|\tau(x)|$, $x\in\mathbb{R}^d$.
arXiv:2408.00933v2 Announce Type: replace-cross Abstract: The bad science matrix problem consists in finding, among all matrices $A \in \mathbb{R}^{n \times n}$ with rows having unit $\ell^2$ norm, one that maximizes $\beta(A) = \frac{1}{2^n} \sum_{x \in \{-1, 1\}^n} \|Ax\|_\infty$. Our main contribution is an explicit construction of an $n \times n$ matrix $A$ showing that $\beta(A) \geq \sqrt{\log_2(n+1)}$, which is only 18% smaller than the asymptotic rate. We prove that every entry of any optimal matrix is a square root of a rational number, and we find provably optimal matrices for $n \leq 4$.
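The objective $\beta(A)$ is simple to state and, for small $n$, to evaluate by brute force. A short NumPy sketch (illustrative, not the paper's construction):

```python
import numpy as np
from itertools import product

def beta(A):
    """beta(A) = 2^{-n} * sum over sign vectors x in {-1,1}^n of ||A x||_inf,
    for a square matrix A with unit-l2-norm rows (brute force, small n only)."""
    n = A.shape[0]
    total = sum(np.max(np.abs(A @ np.array(x))) for x in product((-1, 1), repeat=n))
    return total / 2**n

# example: the identity gives beta = 1; a normalized 2x2 Hadamard does better
I2 = np.eye(2)
H2 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
print(beta(I2), beta(H2))   # 1.0 and sqrt(2) ~ 1.414
```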
arXiv:2501.11451v1 Announce Type: new Abstract: Tomographic investigations are a central tool in medical applications, allowing doctors to image the interior of patients. The corresponding measurement process is commonly modeled by the Radon transform. In practice, the solution of the tomographic problem requires discretization of the Radon transform and of its adjoint (called the backprojection). There are various discretization schemes, often structured around three discretization parameters: spatial, detector, and angular resolution. The most widespread approach uses the ray-driven Radon transform and the pixel-driven backprojection in a balanced resolution setting, i.e., the spatial resolution roughly equals the detector resolution. The use of these particular discretization approaches is based on anecdotal reports of their approximation performance, but there is little rigorous analysis of these methods' approximation errors. This paper presents a novel interpretation of ray-driven and pixel-driven methods as convolutional discretizations, illustrating that from an abstract perspective these methods are similar. Moreover, we announce statements concerning the convergence of the ray-driven Radon transform and the pixel-driven backprojection under balanced resolutions. Our considerations are supported by numerical experiments highlighting aspects of the discussed methods.
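As a point of reference, a minimal NumPy sketch of a pixel-driven backprojection: each pixel center is projected onto the detector axis for each angle, and the sinogram is linearly interpolated. The function and argument names are ours, and the paper's resolution bookkeeping is omitted.

```python
import numpy as np

def pixel_driven_backprojection(sino, thetas, det_pos, grid):
    """Pixel-driven discrete backprojection. sino has shape
    (len(thetas), len(det_pos)); grid holds the pixel-center coordinates."""
    X, Y = np.meshgrid(grid, grid, indexing="xy")
    out = np.zeros_like(X)
    for i, th in enumerate(thetas):
        s = X * np.cos(th) + Y * np.sin(th)    # detector coordinate of pixel
        out += np.interp(s, det_pos, sino[i])  # linear interpolation
    return out * (np.pi / len(thetas))         # quadrature weight in angle

# toy usage on a constant sinogram
thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
det_pos = np.linspace(-1.5, 1.5, 128)
sino = np.ones((len(thetas), len(det_pos)))
img = pixel_driven_backprojection(sino, thetas, det_pos, np.linspace(-1, 1, 64))
```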
arXiv:2501.11871v1 Announce Type: new Abstract: In $\mathbb{R}^3$, the primal and dual constructions yield completely different discrete Laplacians for tetrahedral meshes. In this article, we prove that the discrete Laplacian satisfies the Euler-Lagrange equation of the Dirichlet energy in terms of the associated discrete Laplacian corresponding to the dual construction. Specifically, for a 3-simplex immersed in $\mathbb{R}^3$, the associated discrete Laplacian on the tetrahedron can be expressed through the discrete Laplacian of the faces of the tetrahedron and the associated discrete mean-curvature term given by the ambient space $\mathbb{R}^3$. Based on geometric foundations, we provide a mathematical proof showing that the dual construction gives an optimal Laplacian in $\mathbb{R}^3$ compared to the primal construction. Moreover, we show that the associated discrete mean curvature is more sensitive to the initial mesh than other state-of-the-art discrete mean curvatures when the angle changes instantaneously. Instead of improving the angular transient accuracy through mesh subdivision, we can improve the accuracy by providing a higher-order approximation of the instantaneous change in angle to reduce the solution error.
arXiv:2501.10607v1 Announce Type: cross Abstract: Given $N$ geodesic caps on the normalized unit sphere in $\mathbb{R}^d$ whose total surface area sums to one, what is the maximal surface area their union can cover? We show that when these caps have equal surface area, as both the dimension $d$ and the number of caps $N$ tend to infinity, the maximum proportion covered approaches $1 - e^{-1} \approx 0.632$. Furthermore, this maximum is achieved by a random partial sphere covering. Our result refines a classical estimate for the covering density of $\mathbb{R}^d$ by Erd\H{o}s, Few, and Rogers (Mathematika, 11(2):171--184, 1964).
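The limiting constant has a one-line heuristic: $N$ independent uniform caps of fractional area $1/N$ leave a given point uncovered with probability $(1-1/N)^N \to e^{-1}$, so the expected covered fraction is exactly $1-(1-1/N)^N \to 1-e^{-1}$. A Monte Carlo sketch on $S^2$, where a cap of angular radius $r$ covers the fraction $(1-\cos r)/2$ of the sphere, illustrates this (the dimension and sample sizes are illustrative, not the paper's asymptotic regime):

```python
import numpy as np

rng = np.random.default_rng(1)

def covered_fraction_s2(N, n_test=50000):
    """Fraction of S^2 covered by N i.i.d. uniform caps, each of fractional
    surface area 1/N (so cos r = 1 - 2/N), estimated at random test points."""
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    centers = unit(rng.standard_normal((N, 3)))
    points = unit(rng.standard_normal((n_test, 3)))
    cos_r = 1.0 - 2.0 / N
    hit = (points @ centers.T >= cos_r).any(axis=1)
    return hit.mean()

print(covered_fraction_s2(200))   # ~ 1 - (1 - 1/200)^200 ~ 0.632
```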
arXiv:2408.11612v2 Announce Type: replace Abstract: Computations in high-dimensional spaces can often be realized only approximately, using a certain number of projections onto lower dimensional subspaces or sampling from distributions. In this paper, we are interested in pairs of real-valued functions $(F,f)$ on $[0,\infty)$ that are related by the projection/slicing formula $F (\| x \|) = \mathbb E_{\xi} \big[ f \big(|\langle x,\xi \rangle| \big) \big]$ for $x\in\mathbb R^d$, where the expectation value is taken over uniformly distributed directions in $\mathbb R^d$. While it is known that $F$ can be obtained from $f$ by an Abel-like integral formula, we construct conversely $f$ from given $F$ using their Fourier transforms. First, we consider the relation between $F$ and $f$ for radial functions $F(\| \cdot\| )$ that are Fourier transforms of $L^1$ functions. Besides $d$- and one-dimensional Fourier transforms, it relies on a rotation operator, an averaging operator and a multiplication operator to manage the walk from $d$ to one dimension in the Fourier space. Then, we generalize the results to tempered distributions, where we are mainly interested in radial regular tempered distributions. Based on Bochner's theorem, this includes positive definite functions $F(\| \cdot\| )$ and, by the theory of fractional derivatives, also functions $F$ whose derivative of order $\lfloor d/2\rfloor$ is slowly increasing and continuous.
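A quick sanity check of the slicing formula on a known pair: with $f(t)=t^2$ one has $\mathbb{E}_{\xi}[\langle x,\xi\rangle^2]=\|x\|^2\,\mathbb{E}[\xi_1^2]=\|x\|^2/d$, i.e. $F(r)=r^2/d$. A Monte Carlo sketch (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)

def sliced_expectation(f, x, n_dirs=200000):
    """Monte Carlo estimate of E_xi[ f(|<x, xi>|) ] over uniformly
    distributed directions xi on the unit sphere in R^d."""
    d = len(x)
    xi = rng.standard_normal((n_dirs, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    return np.mean(f(np.abs(xi @ x)))

# known pair (F, f): f(t) = t^2  <->  F(r) = r^2 / d
x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])       # d = 5, ||x||^2 = 15
print(sliced_expectation(lambda t: t**2, x))   # ~ 15 / 5 = 3.0
```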
arXiv:2410.00355v2 Announce Type: replace-cross Abstract: Finding eigenvalue distributions for a number of sparse random matrix ensembles can be reduced to solving nonlinear integral equations of the Hammerstein type. While a systematic mathematical theory of such equations exists, it has not been previously applied to sparse matrix problems. We close this gap in the literature by showing how one can employ numerical solutions of Hammerstein equations to accurately recover the spectra of adjacency matrices and Laplacians of random graphs. While our treatment focuses on random graphs for concreteness, the methodology has broad applications to more general sparse random matrices.
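For concreteness, a generic damped fixed-point solver for a Hammerstein-type equation $u(x)=g(x)+\int K(x,y)\,\varphi(y,u(y))\,dy$ on a uniform grid with trapezoidal quadrature. The kernel, nonlinearity, and damping below are illustrative placeholders, not the paper's spectral equations.

```python
import numpy as np

def solve_hammerstein(K, phi, g, grid, n_iter=200, damping=0.5):
    """Damped fixed-point iteration u <- (1-a)u + a(g + K phi(u)) for the
    Hammerstein equation, discretized with the trapezoidal rule."""
    h = grid[1] - grid[0]
    w = np.full(len(grid), h)
    w[0] = w[-1] = h / 2                       # trapezoid quadrature weights
    Kmat = K(grid[:, None], grid[None, :])
    u = g(grid).copy()
    for _ in range(n_iter):
        u_new = g(grid) + Kmat @ (w * phi(grid, u))
        u = (1 - damping) * u + damping * u_new
    return u

# toy usage with a smooth kernel and a mild nonlinearity
grid = np.linspace(0.0, 1.0, 101)
u = solve_hammerstein(lambda x, y: np.exp(-(x - y)**2),
                      lambda y, u: np.tanh(u),
                      lambda x: np.sin(np.pi * x), grid)
```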
arXiv:2501.06539v1 Announce Type: new Abstract: We construct a Neural Network that approximates the matrix multiplication operator for any activation function for which there exists a Neural Network that can approximate the scalar multiplication function. In particular, we use the Strassen algorithm to reduce the number of weights and layers needed for such a Neural Network. This allows us to define another Neural Network for approximating the inverse matrix operator. Also, relying on the Galerkin method, we apply those Neural Networks to solve parametric elliptic PDEs for a whole set of parameters at the same time. Finally, we discuss the improvements with respect to prior results.
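The structural ingredient here is classical: Strassen's recursion performs a $2\times 2$ block multiplication with seven recursive products instead of eight. For reference, a compact NumPy implementation (power-of-two sizes only; this illustrates the algorithm, not the paper's network construction):

```python
import numpy as np

def strassen(A, B):
    """Strassen multiplication for square matrices whose size is a power
    of 2: seven recursive multiplications instead of eight."""
    n = A.shape[0]
    if n == 1:
        return A * B
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    top = np.hstack([M1 + M4 - M5 + M7, M3 + M5])
    bot = np.hstack([M2 + M4, M1 - M2 + M3 + M6])
    return np.vstack([top, bot])

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(strassen(A, B), A @ B)
```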
arXiv:2501.03697v1 Announce Type: new Abstract: Identifying an appropriate function space for deep neural networks remains a key open question. While shallow neural networks are naturally associated with Reproducing Kernel Banach Spaces (RKBS), deep networks present unique challenges. In this work, we extend RKBS to chain RKBS (cRKBS), a new framework that composes kernels rather than functions, preserving the desirable properties of RKBS. We prove that any deep neural network function is a neural cRKBS function, and conversely, any neural cRKBS function defined on a finite dataset corresponds to a deep neural network. This approach provides a sparse solution to the empirical risk minimization problem, requiring no more than $N$ neurons per layer, where $N$ is the number of data points.
arXiv:2501.01929v1 Announce Type: cross Abstract: This paper extends the sample complexity theory for ill-posed inverse problems developed in a recent work by the authors [`Compressed sensing for inverse problems and the sample complexity of the sparse Radon transform', J. Eur. Math. Soc., to appear], which was originally focused on the sparse Radon transform. We demonstrate that the underlying abstract framework, based on infinite-dimensional compressed sensing and generalized sampling techniques, can effectively handle a variety of practical applications. Specifically, we analyze three case studies: (1) The reconstruction of a sparse signal from a finite number of pointwise blurred samples; (2) The recovery of the (sparse) source term of an elliptic partial differential equation from finite samples of the solution; and (3) A moderately ill-posed variation of the classical sensing problem of recovering a wavelet-sparse signal from finite Fourier samples, motivated by magnetic resonance imaging. For each application, we establish rigorous recovery guarantees by verifying the key theoretical requirements, including quasi-diagonalization and coherence bounds. Our analysis reveals that careful consideration of balancing properties and optimized sampling strategies can lead to improved reconstruction performance. The results provide a unified theoretical foundation for compressed sensing approaches to inverse problems while yielding practical insights for specific applications.
arXiv:2410.08361v2 Announce Type: replace-cross Abstract: In this paper, we study a Markov chain-based stochastic gradient algorithm in general Hilbert spaces, aiming to approximate the optimal solution of a quadratic loss function. We establish probabilistic upper bounds on its convergence. We further extend these results to an online regularized learning algorithm in reproducing kernel Hilbert spaces, where the samples are drawn along a Markov chain trajectory and are hence non-i.i.d.
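A minimal sketch of such an online regularized update in an RKHS, with samples generated along a Markov chain (an AR(1) chain here). The estimate is a kernel expansion $f_t=\sum_{s<t} c_s K(x_s,\cdot)$, updated by $f_{t+1}=f_t-\eta\,[(f_t(x_t)-y_t)K(x_t,\cdot)+\lambda f_t]$. The kernel, step size, and chain parameters are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def gauss_kernel(x, y, s=0.5):
    return np.exp(-(x - y)**2 / (2 * s**2))

rng = np.random.default_rng(3)
T = 500

# samples along a Markov chain trajectory (AR(1)), hence non-i.i.d.
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + 0.3 * rng.standard_normal()
y = np.sin(2 * x) + 0.1 * rng.standard_normal(T)

eta, lam = 0.2, 1e-3
c = np.zeros(T)                                  # kernel expansion coefficients
for t in range(T):
    f_xt = c[:t] @ gauss_kernel(x[:t], x[t])     # evaluate f_t at x_t
    c[:t] *= (1 - eta * lam)                     # shrinkage: -eta*lam*f_t
    c[t] = -eta * (f_xt - y[t])                  # new coefficient on K(x_t, .)
```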
arXiv:2412.18468v1 Announce Type: cross Abstract: Matrix concentration inequalities and their recently discovered sharp counterparts provide powerful tools to bound the spectrum of random matrices whose entries are linear functions of independent random variables. However, in many applications in theoretical computer science and in other areas one encounters more general random matrix models, called matrix chaoses, whose entries are polynomials of independent random variables. Such models have often been studied on a case-by-case basis using ad-hoc methods that can yield suboptimal dimensional factors. In this paper we provide general matrix concentration inequalities for matrix chaoses, which enable the treatment of such models in a systematic manner. These inequalities are expressed in terms of flattenings of the coefficients of the matrix chaos. We further identify a special family of matrix chaoses of combinatorial type for which the flattening parameters can be computed mechanically by a simple rule. This allows us to provide a unified treatment of and improved bounds for matrix chaoses that arise in a variety of applications, including graph matrices, Khatri-Rao matrices, and matrices that arise in average case analysis of the sum-of-squares hierarchy.