physics.geo-ph
13 posts

arXiv:2501.00738v1 Announce Type: cross Abstract: The multiscale and turbulent nature of Earth's atmosphere has historically rendered accurate weather modeling a hard problem. Recently, there has been an explosion of interest in data-driven approaches to weather modeling, which in many cases show improved forecasting accuracy and computational efficiency compared to traditional methods. However, many current data-driven approaches employ highly parameterized neural networks, often resulting in uninterpretable models and limited gains in scientific understanding. In this work, we address the interpretability problem by explicitly discovering partial differential equations governing various weather phenomena, identifying symbolic mathematical models with direct physical interpretations. The purpose of this paper is to demonstrate that, in particular, the Weak form Sparse Identification of Nonlinear Dynamics (WSINDy) algorithm can learn effective weather models from both simulated and assimilated data. Our approach adapts the standard WSINDy algorithm to work with high-dimensional fluid data of arbitrary spatial dimension. Moreover, we develop an approach for handling terms that cannot be integrated by parts, such as advection operators.
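For readers unfamiliar with weak-form system identification, the following is a minimal sketch of the idea behind WSINDy in one spatial dimension, not the authors' implementation: the PDE u_t = sum_k w_k f_k(u) is tested against compactly supported test functions so the time derivative lands on the (smooth) test function, and the sparse weights are found by thresholded least squares. The bump test function, library, and STLSQ settings are illustrative assumptions.

```python
# Hedged sketch of a 1D weak-form SINDy assembly; u is a snapshot array u[x, t]
# on a uniform grid with spacings dx, dt. All names and settings are illustrative.
import numpy as np

def bump(z):
    """Smooth test profile supported on (-1, 1)."""
    w = np.zeros_like(z)
    inside = np.abs(z) < 1
    w[inside] = np.exp(-1.0 / (1.0 - z[inside] ** 2))
    return w

def weak_sindy_1d(u, dx, dt, library, hw=10, threshold=0.1):
    """Recover weights w in u_t = sum_k w_k f_k(u) from the weak form
    <psi, u_t> = -<psi_t, u> (integration by parts in time)."""
    nx, nt = u.shape
    s = np.linspace(-1, 1, 2 * hw + 1)
    px, pt = bump(s), bump(s)            # separable test function psi = px(x) pt(t)
    dpt = np.gradient(pt, dt)            # d/dt of the temporal profile
    rows_b, rows_G = [], []
    for i in range(hw, nx - hw, hw):     # slide psi over interior query points
        for j in range(hw, nt - hw, hw):
            patch = u[i - hw:i + hw + 1, j - hw:j + hw + 1]
            # Left-hand side: -<px * dpt, u> (time derivative moved onto psi).
            rows_b.append(-np.sum(np.outer(px, dpt) * patch) * dx * dt)
            # Right-hand side: <px * pt, f_k(u)> for each candidate term.
            rows_G.append([np.sum(np.outer(px, pt) * f(patch)) * dx * dt
                           for f in library])
    G, b = np.asarray(rows_G), np.asarray(rows_b)
    # Sequentially thresholded least squares (STLSQ) promotes sparsity.
    w = np.linalg.lstsq(G, b, rcond=None)[0]
    for _ in range(10):
        small = np.abs(w) < threshold
        w[small] = 0.0
        if (~small).any():
            w[~small] = np.linalg.lstsq(G[:, ~small], b, rcond=None)[0]
    return w

# Example library of pointwise candidate terms; terms with spatial derivatives
# (e.g. advection) would additionally be integrated by parts in x.
library = [lambda u: u, lambda u: u ** 2, lambda u: u ** 3]
```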
arXiv:2403.02774v2 Announce Type: replace-cross Abstract: Accurate and high-resolution Earth system model (ESM) simulations are essential to assess the ecological and socio-economic impacts of anthropogenic climate change, but are computationally too expensive to be run at sufficiently high spatial resolution. Recent machine learning approaches have shown promising results in downscaling ESM simulations, outperforming state-of-the-art statistical approaches. However, existing methods require computationally costly retraining for each ESM and extrapolate poorly to climates unseen during training. We address these shortcomings by learning a consistency model (CM) that efficiently and accurately downscales arbitrary ESM simulations in a zero-shot manner, without retraining. Our approach yields probabilistic downscaled fields at a resolution only limited by the observational reference data. We show that the CM outperforms state-of-the-art diffusion models at a fraction of the computational cost while maintaining high controllability on the downscaling task. Further, our method generalizes to climate states unseen during training without explicitly formulated physical constraints.
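As a rough illustration of why consistency models are cheap at inference time, here is a hedged sketch of multistep consistency sampling for downscaling, assuming a hypothetical trained network consistency_fn(x, sigma, coarse) that maps any noisy fine-resolution field directly to a clean sample conditioned on the coarse ESM field; the noise schedule and function names are assumptions, not the paper's.

```python
# Minimal multistep consistency sampling sketch: one network call per noise level,
# far fewer calls than a full diffusion reverse process.
import torch

def cm_downscale(consistency_fn, coarse, shape, sigmas=(80.0, 10.0, 2.0)):
    x = sigmas[0] * torch.randn(shape)               # start from pure noise
    sample = consistency_fn(x, sigmas[0], coarse)    # one-shot prediction
    for sigma in sigmas[1:]:
        x = sample + sigma * torch.randn(shape)      # re-noise to a lower level
        sample = consistency_fn(x, sigma, coarse)    # refine the downscaled field
    return sample
```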
arXiv:2501.00887v1 Announce Type: new Abstract: In this work, we develop a fast and accurate method for the scattering of flexural-gravity waves by a thin plate of varying thickness overlying a fluid of infinite depth. This problem commonly arises in the study of sea ice and ice shelves, which can have complicated heterogeneities that include ridges and rolls. With certain natural assumptions on the thickness, we present an integral equation formulation for solving this class of problems and analyze its mathematical properties. The integral equation is then discretized and solved using a high-order-accurate, FFT-accelerated algorithm. The speed, accuracy, and scalability of this approach are demonstrated through a variety of illustrative examples.
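The following is an illustrative sketch (not the paper's formulation) of the generic pattern behind an FFT-accelerated integral-equation solver: a second-kind equation (I + K)σ = f on a periodic grid, where the kernel is convolutional so the matrix-vector product costs O(N log N), combined with an iterative Krylov solver. The kernel and right-hand side below are made up for demonstration.

```python
# Hedged sketch: FFT-accelerated matvec inside GMRES for (I + K) sigma = f.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

N = 2048
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
h = x[1] - x[0]

# Smooth, periodic stand-in kernel K(x - y); khat is its precomputed spectrum.
kernel = 0.1 * np.exp(-2.0 * np.minimum(x, 2.0 * np.pi - x) ** 2)
khat = np.fft.fft(kernel) * h

def apply_operator(sigma):
    """(I + K) sigma, with K applied as a periodic convolution via the FFT."""
    conv = np.real(np.fft.ifft(khat * np.fft.fft(sigma)))
    return sigma + conv

A = LinearOperator((N, N), matvec=apply_operator, dtype=float)
f = np.cos(3.0 * x)                      # illustrative right-hand side
sigma, info = gmres(A, f)                # Krylov solve; never forms the dense matrix
assert info == 0
```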
arXiv:2501.00941v1 Announce Type: new Abstract: Recently, the advent of generative AI technologies has had a transformational impact on our daily lives, yet its use in scientific applications remains in its early stages. Data scarcity is a major, well-known barrier in data-driven scientific computing, so physics-guided generative AI holds significant promise. In scientific computing, most tasks involve multiple data modalities that describe the same physical phenomena: for example, spatial and waveform data in seismic imaging, time and frequency in signal processing, and temporal and spectral data in climate modeling. As such, multi-modal paired data generation is required, rather than the single-modal generation typically used for natural images (e.g., faces, scenery). Moreover, in real-world applications, the available data are commonly imbalanced across modalities; for example, the spatial data (i.e., velocity maps) in seismic imaging can be easily simulated, but real-world seismic waveforms are largely lacking. While the most recent efforts enable powerful diffusion models to generate multi-modal data, how to leverage such imbalanced data is still unclear. In this work, we use seismic imaging in subsurface geophysics as a vehicle to present ``UB-Diff'', a novel diffusion model for multi-modal paired scientific data generation. One major innovation is a one-in-two-out encoder-decoder network structure, which ensures that paired data are obtained from a co-latent representation. This co-latent representation is then used by the diffusion process for paired data generation. Experimental results on the OpenFWI dataset show that UB-Diff significantly outperforms existing techniques in terms of Fr\'{e}chet Inception Distance (FID) score and pairwise evaluation, indicating the generation of reliable and useful multi-modal paired data.
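To make the "one-in-two-out" idea concrete, here is a hedged sketch of such an encoder-decoder in PyTorch, assuming flattened inputs for brevity; the layer sizes, dimensions, and class names are illustrative and not UB-Diff's actual architecture.

```python
# Hedged sketch: one encoder, two decoders reading the same co-latent code, so the
# generated velocity map and waveform are paired by construction.
import torch
import torch.nn as nn

class OneInTwoOut(nn.Module):
    def __init__(self, in_dim=4096, latent_dim=256, vel_dim=4900, wav_dim=5000):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 1024), nn.GELU(),
                                     nn.Linear(1024, latent_dim))
        self.decode_velocity = nn.Sequential(nn.Linear(latent_dim, 1024), nn.GELU(),
                                             nn.Linear(1024, vel_dim))
        self.decode_waveform = nn.Sequential(nn.Linear(latent_dim, 1024), nn.GELU(),
                                             nn.Linear(1024, wav_dim))

    def forward(self, x):
        z = self.encoder(x)                              # co-latent representation
        return self.decode_velocity(z), self.decode_waveform(z)
```

A diffusion model trained to sample the co-latent z would then yield a paired (velocity map, waveform) output from a single draw, which is the mechanism the abstract describes.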
arXiv:2501.00149v1 Announce Type: cross Abstract: Identifying tropical cyclones (TCs) that generate destructive storm tides for risk assessment, such as from large downscaled storm catalogs for climate studies, is often intractable because it entails many expensive Monte Carlo hydrodynamic simulations. Here, we show that surrogate models are promising from accuracy, recall, and precision perspectives, and that they ``generalize'' to novel climate scenarios. We then present an informative online learning approach to rapidly search for extreme storm tide-producing cyclones using only a few hydrodynamic simulations. Starting from a minimal subset of TCs with detailed storm tide hydrodynamic simulations, a surrogate model selects informative data to retrain online and iteratively improves its predictions of damaging TCs. Results on an extensive catalog of downscaled TCs indicate 100% precision in retrieving the rare destructive storms while using less than 20% of the simulations for training. The informative sampling approach is efficient, scalable to large storm catalogs, and generalizable to novel climate scenarios.
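The loop below is a hedged sketch of this kind of informative online learning, using a random-forest surrogate and a simple "extreme and uncertain" acquisition rule as stand-ins; the feature representation, acquisition rule, and run_hydrodynamic_simulation are placeholders, not the paper's method.

```python
# Hedged active-search sketch: train a surrogate on a few simulated TCs, then
# iteratively pick the storms most likely to be extreme (and most uncertain),
# simulate only those, and retrain.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def run_hydrodynamic_simulation(tc_features):
    """Placeholder for one expensive hydrodynamic run returning peak storm tide."""
    raise NotImplementedError

def active_search(tc_pool, n_init=50, n_rounds=20, batch=10):
    rng = np.random.default_rng(0)
    labeled = list(rng.choice(len(tc_pool), n_init, replace=False))
    tides = {i: run_hydrodynamic_simulation(tc_pool[i]) for i in labeled}
    for _ in range(n_rounds):
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(tc_pool[labeled], [tides[i] for i in labeled])
        pool = [i for i in range(len(tc_pool)) if i not in tides]
        preds = np.stack([t.predict(tc_pool[pool]) for t in model.estimators_])
        score = preds.mean(axis=0) + preds.std(axis=0)   # extreme AND uncertain
        picks = [pool[j] for j in np.argsort(score)[-batch:]]
        for i in picks:
            tides[i] = run_hydrodynamic_simulation(tc_pool[i])
        labeled.extend(picks)
    return model, tides
```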
arXiv:2412.18099v1 Announce Type: new Abstract: Earthquake early warning systems play a crucial role in reducing the risk of seismic disasters. Previously, the dominant modeling approach was the single-station model. Such models digest signal data received at a given station and predict earthquake parameters, such as the p-phase arrival time, intensity, and magnitude at that location. Various methods have demonstrated adequate performance. However, most of these methods struggle to shorten the alarm time, to provide early warning for distant areas, and to exploit global information to enhance performance. Recently, deep learning has significantly impacted many fields, including seismology. Thus, this paper proposes a deep learning-based framework, called SENSE, for the intensity prediction task of earthquake early warning systems. To explicitly consider global information from a regional or national perspective, the input to SENSE comprises statistics from a set of stations in a given region or country. The SENSE model is designed to learn the relationships among the set of input stations and the locality-specific characteristics of each station. Thus, SENSE is not only expected to provide more reliable forecasts by considering multistation data but also to provide early warnings to distant areas that have not yet received signals. This study conducted extensive experiments on datasets from Taiwan and Japan. The results revealed that SENSE can deliver competitive or even better performance compared with other state-of-the-art methods.
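One plausible way to realize "relationships among stations plus locality-specific characteristics" is attention over station tokens with a learned station-identity embedding. The sketch below is in that spirit only; the dimensions, layer counts, and class names are assumptions, not SENSE's published architecture.

```python
# Hedged sketch: per-station features + station embedding, mixed by a Transformer
# encoder so every station's intensity forecast can use region-wide context.
import torch
import torch.nn as nn

class MultiStationIntensity(nn.Module):
    def __init__(self, n_stations=250, feat_dim=16, d_model=128):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.station_embed = nn.Embedding(n_stations, d_model)   # locality-specific
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.mixer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, 1)                        # intensity per station

    def forward(self, feats, station_ids):
        # feats: (batch, n_stations, feat_dim); station_ids: (batch, n_stations)
        h = self.proj(feats) + self.station_embed(station_ids)
        h = self.mixer(h)                          # attention across all stations
        return self.head(h).squeeze(-1)            # (batch, n_stations)
```

Because every station attends to every other, a station that has not yet recorded the wavefront can still receive a prediction informed by stations closer to the source, which is the mechanism behind early warnings for distant areas.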
arXiv:2412.18032v1 Announce Type: cross Abstract: There is growing concern about our vulnerability to space weather hazards and the disruption that critical infrastructure failures could cause to society and the economy. However, the socio-economic impacts of space weather hazards, such as from geomagnetic storms, remain under-researched. This study introduces a novel framework to estimate the economic impacts of electricity transmission infrastructure failure due to space weather. By integrating existing geophysical and geomagnetically induced current (GIC) estimation models with a newly developed geospatial model of the Continental United States power grid, GIC vulnerabilities are assessed for a range of space weather scenarios. The approach evaluates multiple power network architectures, incorporating input-output economic modeling to translate business and population disruptions into macroeconomic impacts from GIC-related thermal heating failures. The results indicate a daily GDP loss ranging from 6 billion USD to over 10 billion USD. Even under conservative GIC thresholds (75 A/ph) aligned with thermal withstand limits from the North American Electric Reliability Corporation (NERC), significant economic disruptions are evident. This study is limited by its restriction to thermal heating analysis, though GICs can also affect the grid through other pathways, such as voltage instability and harmonic distortions. Addressing these other failure mechanisms should be the focus of future research.
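For readers unfamiliar with input-output economic modeling, the toy calculation below shows the standard Leontief mechanism, x = (I - A)^{-1} d, by which a power-sector disruption propagates into total output loss. The sectors, technical coefficients, and shock magnitudes are made up for illustration and are not the study's values.

```python
# Hedged, purely illustrative Leontief input-output calculation.
import numpy as np

A = np.array([[0.10, 0.20, 0.05],    # electricity
              [0.15, 0.10, 0.20],    # manufacturing
              [0.05, 0.10, 0.10]])   # services
leontief = np.linalg.inv(np.eye(3) - A)              # (I - A)^{-1}

demand_baseline = np.array([100.0, 400.0, 600.0])    # illustrative final demand
demand_outage = demand_baseline * np.array([0.2, 0.6, 0.7])  # GIC-driven disruption

output_loss = leontief @ (demand_baseline - demand_outage)
print(f"Total output loss (illustrative units): {output_loss.sum():.1f}")
```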
arXiv:2412.16375v1 Announce Type: new Abstract: NOAA's Deep-ocean Assessment and Reporting of Tsunamis (DART) data are critical for NASA-JPL's tsunami detection, real-time operations, and oceanographic research. However, these time-series data often contain spikes, steps, and drifts that degrade data quality and obscure essential oceanographic features. To address these anomalies, this work introduces an Iterative Encoding-Decoding Variational Autoencoder (Iterative Encoding-Decoding VAE) model to improve the quality of DART time series. Unlike traditional filtering and thresholding methods that risk distorting inherent signal characteristics, the Iterative Encoding-Decoding VAE progressively removes anomalies while preserving the data's latent structure. A hybrid thresholding approach further retains genuine oceanographic features near boundaries. Applied to complex DART datasets, this approach yields reconstructions that better maintain key oceanic properties compared to classical statistical techniques, offering improved robustness in removing spikes and handling subtle step changes. The resulting high-quality data support critical verification and validation efforts for the GRACE-FO mission at NASA-JPL, where accurate surface measurements are essential to modeling Earth's gravitational field and global water dynamics. Ultimately, this data processing method enhances tsunami detection and underpins future climate modeling with improved interpretability and reliability.
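A hedged sketch of the iterative encode-decode idea follows, assuming a hypothetical trained 1D VAE with encode/decode methods; the robust threshold, boundary handling, and iteration count are illustrative, not the paper's settings.

```python
# Hedged sketch: repeatedly reconstruct the series through a VAE and replace only
# samples with anomalously large residuals, so structure captured by the latent
# space is preserved while spikes/steps are progressively removed.
import torch

@torch.no_grad()
def iterative_clean(vae, series, n_iter=5, k=4.0):
    x = series.clone()
    for _ in range(n_iter):
        recon = vae.decode(vae.encode(x))
        resid = x - recon
        scale = 1.4826 * resid.abs().median()     # robust residual scale (MAD-style)
        mask = resid.abs() > k * scale
        mask[:10] = False                         # hybrid rule: keep features near
        mask[-10:] = False                        # the window boundaries untouched
        x = torch.where(mask, recon, x)
    return x
```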
arXiv:2412.17333v1 Announce Type: new Abstract: Earthquakes are rare. Hence, there is a fundamental need for reliable methods to generate realistic ground motion data for data-driven approaches in seismology. Recent GAN-based methods fall short of this need, as they either require special information such as geological traits or generate subpar waveforms that fail to satisfy seismological constraints such as phase arrival times. We propose a specialized Latent Diffusion Model (LDM) that reliably generates realistic waveforms after learning from real earthquake data with minimal conditions: location and magnitude. We also design a domain-specific training method that exploits the traits of earthquake datasets: multiple observed waveforms that are time-aligned and paired to each earthquake source, tagged with seismological metadata comprising earthquake magnitude, depth of focus, and the locations of the epicenter and seismometers. We construct the time-aligned earthquake dataset using the Southern California Earthquake Data Center (SCEDC) API, and train our model on this dataset with the proposed training method for performance evaluation. Our model surpasses all comparable data-driven methods on various test criteria, not only from the waveform generation domain but also from seismology, such as phase arrival times, GMPE analysis, and spectrum analysis. Our results open new research directions for deep learning applications in seismology.
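To illustrate what "generation with minimal conditions" looks like mechanically, here is a hedged sketch of conditional ancestral sampling in a latent diffusion setup; the denoiser, condition embedding, noise schedule, and the step of decoding the latent back to a waveform are placeholders, not the paper's architecture.

```python
# Hedged sketch: DDPM-style reverse sampling in latent space, conditioned on
# (magnitude, epicenter, station location) via a small embedding MLP.
import torch
import torch.nn as nn

class ConditionEmbed(nn.Module):
    def __init__(self, cond_dim=5, d=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(cond_dim, d), nn.SiLU(), nn.Linear(d, d))
    def forward(self, c):        # c = [mag, epi_lat, epi_lon, sta_lat, sta_lon]
        return self.net(c)

@torch.no_grad()
def sample_waveform_latent(denoiser, cond_embed, cond, shape, T=1000):
    betas = torch.linspace(1e-4, 2e-2, T)
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    z = torch.randn(shape)
    c = cond_embed(cond)
    for t in reversed(range(T)):                  # standard DDPM ancestral sampling
        eps = denoiser(z, torch.tensor([t]), c)   # predicted noise given condition
        z = (z - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn(shape)
    return z                                      # decode with an autoencoder afterwards
```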
arXiv:2412.17506v1 Announce Type: cross Abstract: Accurate uncertainty information associated with essential climate variables (ECVs) is crucial for reliable climate modeling and understanding the spatiotemporal evolution of the Earth system. In recent years, geoscience and climate scientists have benefited from rapid progress in deep learning to advance the estimation of ECV products with improved accuracy. However, the quantification of uncertainties associated with the output of such deep learning models has yet to be thoroughly adopted. This survey explores the types of uncertainties associated with ECVs estimated from deep learning and the techniques to quantify them. The focus is on highlighting the importance of quantifying uncertainties inherent in ECV estimates, considering the dynamic and multifaceted nature of climate data. The survey starts by clarifying the definition of aleatoric and epistemic uncertainties and their roles in a typical satellite observation processing workflow, followed by bridging the gap between conventional statistical and deep learning views on uncertainties. Then, we comprehensively review the existing techniques for quantifying uncertainties associated with deep learning algorithms, focusing on their application in ECV studies. We also discuss the specific modifications needed to fit the requirements of both the Earth observation and deep learning sides of such interdisciplinary tasks. Finally, we demonstrate our findings with two ECV examples, snow cover and terrestrial water storage, and provide our perspectives for future research.
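As a concrete reference point for the aleatoric/epistemic distinction the survey reviews, here is a hedged sketch of the two most common recipes: a heteroscedastic output head trained with a Gaussian negative log-likelihood (aleatoric), and Monte Carlo dropout at test time (epistemic). The architecture and sample counts are illustrative, not drawn from the survey.

```python
# Hedged sketch: aleatoric uncertainty via a heteroscedastic head, epistemic
# uncertainty via Monte Carlo dropout.
import torch
import torch.nn as nn

class HeteroscedasticNet(nn.Module):
    def __init__(self, in_dim=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                  nn.Dropout(0.1), nn.Linear(128, 2))
    def forward(self, x):
        mu, log_var = self.body(x).chunk(2, dim=-1)
        return mu, log_var                        # aleatoric variance = exp(log_var)

def gaussian_nll(mu, log_var, y):
    return (0.5 * (log_var + (y - mu) ** 2 / log_var.exp())).mean()

def mc_dropout_predict(model, x, n_samples=50):
    model.train()                                 # keep dropout active at test time
    mus = torch.stack([model(x)[0] for _ in range(n_samples)])
    return mus.mean(dim=0), mus.var(dim=0)        # mean prediction, epistemic variance
```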
arXiv:2412.17685v1 Announce Type: cross Abstract: Space weather impact assessment is constrained by the lack of available asset information to undertake modeling of Geomagnetically Induced Currents (GICs) in Extra High Voltage electricity infrastructure networks. The U.S. National Space Weather Strategy and Action Plan identifies underutilized data as a central issue for improving risk assessment, motivating this research. Accurate GIC prediction is generally not possible without information on the electrical circuit; therefore, we define a reproducible method based on open-source data, which enables risk analysts to collect their own substation component data. This process converts OpenStreetMap (OSM) substation locations to high-resolution, component-level mapping of electricity transmission assets by utilizing an innovative web-browser platform to facilitate component annotation. As a case study example, we convert an initial set of 1,313 high-voltage (>115 kV) substations to 52,273 substation components via Google Earth APIs utilizing low-altitude, satellite, and Street View imagery. We find that a total of 41,642 substation components (79.6%) connect to the highest substation voltage levels (>345 kV) and are possibly susceptible to GIC, with a total of 7,949 transformers identified. Compared to the initial OSM baseline, we provide new detailed insights on voltage levels, line capacities, and substation configurations. Two validation workshops were undertaken to align the method and data with GIC assessment needs. The approach ensures consistency and rapid scalability, enabling users to quickly count components via a flexible web-browser application.
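For context on the open-source starting point of such a workflow, the snippet below is a hedged sketch of pulling raw OSM substation candidates via the public Overpass API; the bounding box (roughly CONUS) and the voltage-tag filter are illustrative assumptions and are not the paper's query.

```python
# Hedged sketch: fetch OSM elements tagged power=substation with a voltage tag.
import requests

QUERY = """
[out:json][timeout:120];
nwr["power"="substation"]["voltage"](24.5,-125.0,49.5,-66.9);
out center;
"""

resp = requests.post("https://overpass-api.de/api/interpreter", data={"data": QUERY})
resp.raise_for_status()
substations = resp.json()["elements"]
print(f"Fetched {len(substations)} candidate substations for annotation")
```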
arXiv:2412.17793v1 Announce Type: cross Abstract: Singular Spectrum Analysis (SSA) occupies a prominent place in the real-signal analysis toolkit alongside Fourier and Wavelet analysis. In addition to the two aforementioned analyses, SSA allows patterns to be separated directly from the data space back into the data space, with data that need not be strictly stationary, continuous, or even evenly sampled. In most cases, SSA relies on a combination of Hankel or Toeplitz matrices and Singular Value Decomposition (SVD). Like Fourier and Wavelet analysis, SSA has its limitations. The main bottlenecks of the method can be summarized in three points. The first is the diagonalization of the Hankel/Toeplitz matrix, which can become a major problem from a memory and/or computational point of view if the time series to be analyzed is very long or heavily sampled. The second concerns the size of the analysis window, typically denoted 'L', which affects both the detection of patterns in the time series and the dimensions of the Hankel/Toeplitz matrix. The third concerns pattern reconstruction: how to easily identify, in the eigenvector/eigenvalue space, which patterns should be grouped together. We address each of these issues by describing a hopefully effective approach that we have been developing for over ten years and that has yielded good results in our research work.
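For readers new to SSA, here is a minimal sketch of the basic Hankel/SVD formulation the abstract refers to (embedding, decomposition, and diagonal averaging); it deliberately exposes the pain points discussed above, since the window length L and the grouping of components are left to the user.

```python
# Minimal basic SSA sketch: embed -> SVD -> one elementary series per component.
import numpy as np
from scipy.linalg import hankel

def ssa_decompose(x, L):
    N = len(x)
    K = N - L + 1
    X = hankel(x[:L], x[L - 1:])                  # L x K trajectory matrix, X[i,j]=x[i+j]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = []
    for i in range(len(s)):
        Xi = s[i] * np.outer(U[:, i], Vt[i])      # rank-1 elementary matrix
        # Diagonal averaging (Hankelization) maps Xi back to a length-N series.
        comp = np.array([Xi[::-1, :].diagonal(k).mean() for k in range(-L + 1, K)])
        components.append(comp)
    return np.array(components), s

# Grouping is then manual or heuristic, e.g. trend = components[:2].sum(axis=0).
```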
arXiv:2412.10184v2 Announce Type: replace Abstract: Acquiring, processing, and visualizing geospatial data requires significant computing resources, especially for large spatio-temporal domains. This challenge hinders the rapid discovery of predictive features, which is essential for advancing geospatial modeling. To address this, we developed Similarity Search (Sims), a no-code web tool that allows users to perform clustering and similarity search over defined regions of interest using Google Earth Engine as a backend. Sims is designed to complement existing modeling tools by focusing on feature exploration rather than model creation. We demonstrate the utility of Sims through a case study analyzing simulated maize yield data in Rwanda, where we evaluate how different combinations of soil, weather, and agronomic features affect the clustering of yield response zones. Sims is open source and available at https://github.com/microsoft/Sims
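For a sense of the kind of backend operation a tool like Sims wraps, the snippet below is a hedged sketch of unsupervised clustering over a region of interest with the Google Earth Engine Python API; the asset ID, band choice, region, and clustering parameters are illustrative stand-ins, not what Sims actually issues.

```python
# Hedged sketch: sample a feature image over an ROI, train k-means, map clusters.
import ee

ee.Initialize()

roi = ee.Geometry.Rectangle([28.8, -2.9, 30.9, -1.0])        # roughly Rwanda
features = ee.Image("USGS/SRTMGL1_003").select("elevation")  # stand-in feature stack

training = features.sample(region=roi, scale=500, numPixels=5000, seed=0)
clusterer = ee.Clusterer.wekaKMeans(6).train(training)
zones = features.cluster(clusterer).clip(roi)                 # cluster-id image (zones)
```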