cs.LG

1880 posts

arXiv:2501.06247v1 Announce Type: new Abstract: Optimal Transport (OT) has established itself as a robust framework for quantifying differences between distributions, with applications that span fields such as machine learning, data science, and computer vision. This paper offers a detailed examination of the OT problem, beginning with its theoretical foundations, including the classical formulations of Monge and Kantorovich and their extensions to modern computational techniques. It explores cutting-edge algorithms, including Sinkhorn iterations, primal-dual strategies, and reduction-based approaches, emphasizing their efficiency and scalability in addressing high-dimensional problems. The paper also highlights emerging trends, such as integrating OT into machine learning frameworks, the development of novel problem variants, and ongoing theoretical advancements. Applications of OT are presented across a range of domains, with particular attention to its innovative application in time series data analysis via Optimal Transport Warping (OTW), a robust alternative to methods like Dynamic Time Warping. Despite the significant progress made, challenges related to scalability, robustness, and ethical considerations remain, necessitating further research. The paper underscores OT's potential to bridge theoretical depth and practical utility, fostering impactful advancements across diverse disciplines.

Sina Moradi1/14/2025

arXiv:2501.06248v1 Announce Type: new Abstract: Current methods that train large language models (LLMs) with reinforcement learning feedback, often resort to averaging outputs of multiple rewards functions during training. This overlooks crucial aspects of individual reward dimensions and inter-reward dependencies that can lead to sub-optimal outcomes in generations. In this work, we show how linear aggregation of rewards exhibits some vulnerabilities that can lead to undesired properties of generated text. We then propose a transformation of reward functions inspired by economic theory of utility functions (specifically Inada conditions), that enhances sensitivity to low reward values while diminishing sensitivity to already high values. We compare our approach to the existing baseline methods that linearly aggregate rewards and show how the Inada-inspired reward feedback is superior to traditional weighted averaging. We quantitatively and qualitatively analyse the difference in the methods, and see that models trained with Inada-transformations score as more helpful while being less harmful.

Roberto-Rafael Maura-Rivero, Chirag Nagpal, Roma Patel, Francesco Visin1/14/2025

arXiv:2501.06244v1 Announce Type: new Abstract: With the growing demand for Earth observation, it is important to provide reliable real-time remote sensing inference services to meet the low-latency requirements. The Space Computing Power Network (Space-CPN) offers a promising solution by providing onboard computing and extensive coverage capabilities for real-time inference. This paper presents a remote sensing artificial intelligence applications deployment framework designed for Low Earth Orbit satellite constellations to achieve real-time inference performance. The framework employs the microservice architecture, decomposing monolithic inference tasks into reusable, independent modules to address high latency and resource heterogeneity. This distributed approach enables optimized microservice deployment, minimizing resource utilization while meeting quality of service and functional requirements. We introduce Robust Optimization to the deployment problem to address data uncertainty. Additionally, we model the Robust Optimization problem as a Partially Observable Markov Decision Process and propose a robust reinforcement learning algorithm to handle the semi-infinite Quality of Service constraints. Our approach yields sub-optimal solutions that minimize accuracy loss while maintaining acceptable computational costs. Simulation results demonstrate the effectiveness of our framework.

Zhiyong Yu, Yuning Jiang, Xin Liu, Yuanming Shi, Chunxiao Jiang, Linling Kuang1/14/2025

arXiv:2501.06210v1 Announce Type: new Abstract: This study explores using Natural Language Processing in aviation safety, focusing on machine learning algorithms to enhance safety measures. There are currently May 2024, 34 Scopus results from the keyword search natural language processing and aviation safety. Analyzing these studies allows us to uncover trends in the methodologies, findings and implications of NLP in aviation. Both qualitative and quantitative tools have been used to investigate the current state of literature on NLP for aviation safety. The qualitative analysis summarises the research motivations, objectives, and outcomes, showing how NLP can be utilized to help identify critical safety issues and improve aviation safety. This study also identifies research gaps and suggests areas for future exploration, providing practical recommendations for the aviation industry. We discuss challenges in implementing NLP in aviation safety, such as the need for large, annotated datasets, and the difficulty in interpreting complex models. We propose solutions like active learning for data annotation and explainable AI for model interpretation. Case studies demonstrate the successful application of NLP in improving aviation safety, highlighting its potential to make aviation safer and more efficient.

Aziida Nanyonga, Keith Joiner, Ugur Turhan, Graham Wild1/14/2025

arXiv:2501.06215v1 Announce Type: new Abstract: This paper is the first-place solution for ICASSP MEIJU@2025 Track I, which focuses on low-resource multimodal emotion and intention recognition. How to effectively utilize a large amount of unlabeled data, while ensuring the mutual promotion of different difficulty levels tasks in the interaction stage, these two points become the key to the competition. In this paper, pseudo-label labeling is carried out on the model trained with labeled data, and samples with high confidence and their labels are selected to alleviate the problem of low resources. At the same time, the characteristic of easy represented ability of intention recognition found in the experiment is used to make mutually promote with emotion recognition under different attention heads, and higher performance of intention recognition is achieved through fusion. Finally, under the refined processing data, we achieve the score of 0.5532 in the Test set, and win the championship of the track.

Xinger Li, Zhiqiang Zhong, Bo Huang, Yang Yang1/14/2025

arXiv:2501.06219v1 Announce Type: new Abstract: The rodent vibrissal system is pivotal in advancing neuroscience research, particularly for studies of cortical plasticity, learning, decision-making, sensory encoding, and sensorimotor integration. Despite the advantages, curating touch events is labor intensive and often requires >3 hours per million video frames, even after leveraging automated tools like the Janelia Whisker Tracker. We address this limitation by introducing Whisker Automatic Contact Classifier (WhACC), a python package designed to identify touch periods from high-speed videos of head-fixed behaving rodents with human-level performance. WhACC leverages ResNet50V2 for feature extraction, combined with LightGBM for Classification. Performance is assessed against three expert human curators on over one million frames. Pairwise touch classification agreement on 99.5% of video frames, equal to between-human agreement. Finally, we offer a custom retraining interface to allow model customization on a small subset of data, which was validated on four million frames across 16 single-unit electrophysiology recordings. Including this retraining step, we reduce human hours required to curate a 100 million frame dataset from ~333 hours to ~6 hours.

Phillip Maire, Samson G. King, Jonathan Andrew Cheung, Stefanie Walker, Samuel Andrew Hires1/14/2025

arXiv:2501.06220v1 Announce Type: new Abstract: Vision Transformers (ViTs) have demonstrated remarkable success on large-scale datasets, but their performance on smaller datasets often falls short of convolutional neural networks (CNNs). This paper explores the design and optimization of Tiny ViTs for small datasets, using CIFAR-10 as a benchmark. We systematically evaluate the impact of data augmentation, patch token initialization, low-rank compression, and multi-class token strategies on model performance. Our experiments reveal that low-rank compression of queries in Multi-Head Latent Attention (MLA) incurs minimal performance loss, indicating redundancy in ViTs. Additionally, introducing multiple CLS tokens improves global representation capacity, boosting accuracy. These findings provide a comprehensive framework for optimizing Tiny ViTs, offering practical insights for efficient and effective designs. Code is available at https://github.com/erow/PoorViTs.

Gent Wu1/14/2025

arXiv:2501.06221v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) have emerged as transformative tools for modeling complex relational data, offering unprecedented capabilities in tasks like forecasting and optimization. This study investigates the application of GNNs to demand forecasting within supply chain networks using the SupplyGraph dataset, a benchmark for graph-based supply chain analysis. By leveraging advanced GNN methodologies, we enhance the accuracy of forecasting models, uncover latent dependencies, and address temporal complexities inherent in supply chain operations. Comparative analyses demonstrate that GNN-based models significantly outperform traditional approaches, including Multilayer Perceptrons (MLPs) and Graph Convolutional Networks (GCNs), particularly in single-node demand forecasting tasks. The integration of graph representation learning with temporal data highlights GNNs' potential to revolutionize predictive capabilities for inventory management, production scheduling, and logistics optimization. This work underscores the pivotal role of forecasting in supply chain management and provides a robust framework for advancing research and applications in this domain.

Chi-Sheng Chen, Ying-Jung Chen1/14/2025

arXiv:2501.06222v1 Announce Type: new Abstract: Acknowledging the effects of outdoor air pollution, the literature inadequately addresses indoor air pollution's impacts. Despite daily health risks, existing research primarily focused on monitoring, lacking accuracy in pinpointing indoor pollution sources. In our research work, we thoroughly investigated the influence of indoor activities on pollution levels. A survey of 143 participants revealed limited awareness of indoor air pollution. Leveraging 65 days of diverse data encompassing activities like incense stick usage, indoor smoking, inadequately ventilated cooking, excessive AC usage, and accidental paper burning, we developed a comprehensive monitoring system. We identify pollutant sources and effects with high precision through clustering analysis and interpretability models (LIME and SHAP). Our method integrates Decision Trees, Random Forest, Naive Bayes, and SVM models, excelling at 99.8% accuracy with Decision Trees. Continuous 24-hour data allows personalized assessments for targeted pollution reduction strategies, achieving 91% accuracy in predicting activities and pollution exposure.

Pritisha Sarkar, Kushalava reddy Jala, Mousumi Saha1/14/2025

arXiv:2501.06225v1 Announce Type: new Abstract: Medical images are characterized by intricate and complex features, requiring interpretation by physicians with medical knowledge and experience. Classical neural networks can reduce the workload of physicians, but can only handle these complex features to a limited extent. Theoretically, quantum computing can explore a broader parameter space with fewer parameters, but it is currently limited by the constraints of quantum hardware.Considering these factors, we propose a distributed hybrid quantum convolutional neural network based on quantum circuit splitting. This model leverages the advantages of quantum computing to effectively capture the complex features of medical images, enabling efficient classification even in resource-constrained environments. Our model employs a quantum convolutional neural network (QCNN) to extract high-dimensional features from medical images, thereby enhancing the model's expressive capability.By integrating distributed techniques based on quantum circuit splitting, the 8-qubit QCNN can be reconstructed using only 5 qubits.Experimental results demonstrate that our model achieves strong performance across 3 datasets for both binary and multiclass classification tasks. Furthermore, compared to recent technologies, our model achieves superior performance with fewer parameters, and experimental results validate the effectiveness of our model.

Yangyang Li, Zhengya Qia, Yuelin Lia, Haorui Yanga, Ronghua Shanga, Licheng Jiaoa1/14/2025

arXiv:2501.06226v1 Announce Type: new Abstract: Machine learning (ML) has become crucial in modern life, with growing interest from researchers and the public. Despite its potential, a significant entry barrier prevents widespread adoption, making it challenging for non-experts to understand and implement ML techniques. The increasing desire to leverage ML is counterbalanced by its technical complexity, creating a gap between potential and practical application. This work introduces asanAI, an offline-first, open-source, no-code machine learning toolkit designed for users of all skill levels. It allows individuals to design, debug, train, and test ML models directly in a web browser, eliminating the need for software installations and coding. The toolkit runs on any device with a modern web browser, including smartphones, and ensures user privacy through local computations while utilizing WebGL for enhanced GPU performance. Users can quickly experiment with neural networks and train custom models using various data sources, supported by intuitive visualizations of network structures and data flows. asanAI simplifies the teaching of ML concepts in educational settings and is released under an open-source MIT license, encouraging modifications. It also supports exporting models in industry-ready formats, empowering a diverse range of users to effectively learn and apply machine learning in their projects. The proposed toolkit is successfully utilized by researchers of ScaDS.AI to swiftly draft and test machine learning ideas, by trainers to effectively educate enthusiasts, and by teachers to introduce contemporary ML topics in classrooms with minimal effort and high clarity.

Norman Koch, Siavash Ghiasvand1/14/2025

arXiv:2501.06227v1 Announce Type: new Abstract: This paper reviews the state-of-the-art in deepfake generation and detection, focusing on modern deep learning technologies and tools based on the latest scientific advancements. The rise of deepfakes, leveraging techniques like Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion models and other generative models, presents significant threats to privacy, security, and democracy. This fake media can deceive individuals, discredit real people and organizations, facilitate blackmail, and even threaten the integrity of legal, political, and social systems. Therefore, finding appropriate solutions to counter the potential threats posed by this technology is essential. We explore various deepfake methods, including face swapping, voice conversion, reenactment and lip synchronization, highlighting their applications in both benign and malicious contexts. The review critically examines the ongoing "arms race" between deepfake generation and detection, analyzing the challenges in identifying manipulated contents. By examining current methods and highlighting future research directions, this paper contributes to a crucial understanding of this rapidly evolving field and the urgent need for robust detection strategies to counter the misuse of this powerful technology. While focusing primarily on audio, image, and video domains, this study allows the reader to easily grasp the latest advancements in deepfake generation and detection.

Arash Dehghani, Hossein Saberi1/14/2025

arXiv:2501.06232v1 Announce Type: new Abstract: Predicting the lateral pile response is challenging due to the complexity of pile-soil interactions. Machine learning (ML) techniques have gained considerable attention for their effectiveness in non-linear analysis and prediction. This study develops an interpretable ML-based model for predicting p-y curves of monopile foundations. An XGBoost model was trained using a database compiled from existing research. The results demonstrate that the model achieves superior predictive accuracy. Shapley Additive Explanations (SHAP) was employed to enhance interpretability. The SHAP value distributions for each variable demonstrate strong alignment with established theoretical knowledge on factors affecting the lateral response of pile foundations.

Biao Li, Qing-Kai Song, Wen-Gang Qi, Fu-Ping Gao1/14/2025

arXiv:2501.06233v1 Announce Type: new Abstract: Metastructured auxetic patches, characterized by negative Poisson's ratios, offer unique mechanical properties that closely resemble the behavior of human tissues and organs. As a result, these patches have gained significant attention for their potential applications in organ repair and tissue regeneration. This study focuses on neural networks-based computational modeling of auxetic patches with a sinusoidal metastructure fabricated from silk fibroin, a bio-inspired material known for its biocompatibility and strength. The primary objective of this research is to introduce a novel, data-driven framework for patch design. To achieve this, we conducted experimental fabrication and mechanical testing to determine material properties and validate the corresponding finite element models. Finite element simulations were then employed to generate the necessary data, while greedy sampling, an active learning technique, was utilized to reduce the computational cost associated with data labeling. Two neural networks were trained to accurately predict Poisson's ratios and stresses for strains up to 15\%, respectively. Both models achieved $R^2$ scores exceeding 0.995, which indicates highly reliable predictions. Building on this, we developed a neural network-based design model capable of tailoring patch designs to achieve specific mechanical properties. This model demonstrated superior performance when compared to traditional optimization methods, such as genetic algorithms, by providing more efficient and precise design solutions. The proposed framework represents a significant advancement in the design of bio-inspired metastructures for medical applications, paving the way for future innovations in tissue engineering and regenerative medicine.

Yingbin Chen, Milad Arzani, Xuan Mu, Sophia Jin, Shaoping Xiao1/14/2025

arXiv:2501.06236v1 Announce Type: new Abstract: Modeling radio propagation is essential for wireless network design and performance optimization. Traditional methods rely on physics models of radio propagation, which can be inaccurate or inflexible. In this work, we propose using graph neural networks to learn radio propagation behaviors directly from real-world network data. Our approach converts the radio propagation environment into a graph representation, with nodes corresponding to locations and edges representing spatial and ray-tracing relationships between locations. The graph is generated by converting images of the environment into a graph structure, with specific relationships between nodes. The model is trained on this graph representation, using sensor measurements as target data. We demonstrate that the graph neural network, which learns to predict radio propagation directly from data, achieves competitive performance compared to traditional heuristic models. This data-driven approach outperforms classic numerical solvers in terms of both speed and accuracy. To the best of our knowledge, we are the first to apply graph neural networks to real-world radio propagation data to generate coverage maps, enabling generative models of signal propagation with point measurements only.

Adrien Bufort, Laurent Lebocq, Stefan Cathabard1/14/2025

arXiv:2501.06237v1 Announce Type: new Abstract: In the evolving landscape of data privacy, the anonymization of electric load profiles has become a critical issue, especially with the enforcement of the General Data Protection Regulation (GDPR) in Europe. These electric load profiles, which are essential datasets in the energy industry, are classified as personal behavioral data, necessitating stringent protective measures. This article explores the implications of this classification, the importance of data anonymization, and the potential of forecasting using microaggregated data. The findings underscore that effective anonymization techniques, such as microaggregation, do not compromise the performance of forecasting models under certain conditions (i.e., forecasting aggregated). In such an aggregated level, microaggregated data maintains high levels of utility, with minimal impact on forecasting accuracy. The implications for the energy sector are profound, suggesting that privacy-preserving data practices can be integrated into smart metering technology applications without hindering their effectiveness.

Joaquin Delgado Fernandez, Sergio Potenciano Menci, Alessio Magitteri1/14/2025

arXiv:2501.06238v1 Announce Type: new Abstract: Feature level sets (FLS) have shown significant potential in the analysis of multi-field data by using traits defined in attribute space to specify features in the domain. In this work, we address key challenges in the practical use of FLS: trait design and feature selection for rendering. To simplify trait design, we propose a Cartesian decomposition of traits into simpler components, making the process more intuitive and computationally efficient. Additionally, we utilize dictionary learning results to automatically suggest point traits. To enhance feature selection, we introduce trait-induced merge trees (TIMTs), a generalization of merge trees for feature level sets, aimed at topologically analyzing tensor fields or general multi-variate data. The leaves in the TIMT represent areas in the input data that are closest to the defined trait, thereby most closely resembling the defined feature. This merge tree provides a hierarchy of features, enabling the querying of the most relevant and persistent features. Our method includes various query techniques for the tree, allowing the highlighting of different aspects. We demonstrate the cross-application capabilities of this approach through five case studies from different domains.

Danhua Lei, Jochen Jankowai, Petar Hristov, Hamish Carr, Leif Denby, Talha Bin Masood, Ingrid Hotz1/14/2025

arXiv:2501.06240v1 Announce Type: new Abstract: Capsule networks(CapsNet) are recently proposed neural network models with new processing layers, specifically for entity representation and discovery of images. It is well known that CapsNet have some advantages over traditional neural networks, especially in generalization capability. At the same time, some studies report negative experimental results. The causes of this contradiction have not been thoroughly analyzed. The preliminary experimental results show that the behavior of routing algorithms does not always produce good results as expected, and in most cases, different routing algorithms do not change the classification results, but simply polarize the link strength, especially when they continue to repeat without stopping. To realize the true potential of the CapsNet, deep mathematical analysis of the routing algorithms is crucial. In this paper, we will give the objective function that is minimized by the dynamic routing algorithm, which is a concave function. The dynamic routing algorithm can be regarded as nonlinear gradient method to solving an optimization algorithm under linear constraints, and its convergence can be strictly proved mathematically. Furthermore, the mathematically rigorous proof of the convergence is given for this class of iterative routing procedures. We analyze the relation between the objective function and the constraints solved by the dynamic routing algorithm in detail, and perform the corresponding routing experiment to analyze the effect of our convergence proof.

Daoyuan Ye, Juntao Li, Yiting Shen1/14/2025

arXiv:2501.06241v1 Announce Type: new Abstract: This study investigates the efficacy of machine learning models for predicting house rental prices in Ghana, addressing the need for accurate and accessible housing market information. Utilising a comprehensive dataset of rental listings, we trained and evaluated various models, including CatBoost, XGBoost, and Random Forest. CatBoost emerged as the best-performing model, achieving an $R^2$ of 0.876, demonstrating its ability to effectively capture complex relationships within the housing market. Feature importance analysis revealed that location-based features, number of bedrooms, bathrooms, and furnishing status are key drivers of rental prices. Our findings provide valuable insights for stakeholders, including real estate professionals, investors, and policymakers, while also highlighting opportunities for future research, such as incorporating temporal data and exploring regional variations.

Philip Adzanoukpe1/14/2025

arXiv:2501.06252v1 Announce Type: new Abstract: Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks. We introduce \implname, a novel self-adaptation framework that adapts LLMs for unseen tasks in real-time by selectively adjusting only the singular components of their weight matrices. During inference, \implname employs a two-pass mechanism: first, a dispatch system identifies the task properties, and then task-specific "expert" vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt. Our method outperforms ubiquitous approaches such as LoRA, with fewer parameters and greater efficiency. \implname demonstrates versatility across different LLM architectures and modalities, including vision-language tasks. \implname represents a significant leap forward, offering a scalable, efficient solution for enhancing the adaptability and task-specific performance of LLMs, paving the way for truly dynamic, self-organizing AI systems.

Qi Sun, Edoardo Cetin, Yujin Tang1/14/2025