Arxiv
arXiv:2501.01449v1 Announce Type: new Abstract: Human motion synthesis conditioned on textual input has gained significant attention in recent years due to its potential applications in various domains such as gaming, film production, and virtual reality. Conditioned Motion synthesis takes a text input and outputs a 3D motion corresponding to the text. While previous works have explored motion synthesis using raw motion data and latent space representations with diffusion models, these approaches often suffer from high training and inference times. In this paper, we introduce a novel framework that utilizes Generative Adversarial Networks (GANs) in the latent space to enable faster training and inference while achieving results comparable to those of the state-of-the-art diffusion methods. We perform experiments on the HumanML3D, HumanAct12 benchmarks and demonstrate that a remarkably simple GAN in the latent space achieves a FID of 0.482 with more than 91% in FLOPs reduction compared to latent diffusion model. Our work opens up new possibilities for efficient and high-quality motion synthesis using latent space GANs.
arXiv:2501.01470v1 Announce Type: new Abstract: To address the modality imbalance caused by data heterogeneity, existing multi-modal learning (MML) approaches primarily focus on balancing this difference from the perspective of optimization objectives. However, almost all existing methods ignore the impact of sample sequences, i.e., an inappropriate training order tends to trigger learning bias in the model, further exacerbating modality imbalance. In this paper, we propose Balance-aware Sequence Sampling (BSS) to enhance the robustness of MML. Specifically, we first define a multi-perspective measurer to evaluate the balance degree of each sample. Via the evaluation, we employ a heuristic scheduler based on curriculum learning (CL) that incrementally provides training subsets, progressing from balanced to imbalanced samples to rebalance MML. Moreover, considering that sample balance may evolve as the model capability increases, we propose a learning-based probabilistic sampling method to dynamically update the training sequences at the epoch level, further improving MML performance. Extensive experiments on widely used datasets demonstrate the superiority of our method compared with state-of-the-art (SOTA) MML approaches.
arXiv:2501.01438v1 Announce Type: new Abstract: DC motors have been widely used in many industrial applications, from small jointed robots with multiple degrees of freedom to household appliances and transportation vehicles such as electric cars and trains. The main function of these motors is to ensure stable positioning performance and speed for mechanical systems based on pre-designed control methods. However, achieving optimal speed performance for servo motors faces many challenges due to the impact of internal and external loads, which affect output stability. To optimize the speed performance of DC Servo motors, a control method combining PID controllers and artificial neural networks has been proposed. Traditional PID controllers have the advantage of a simple structure and effective control capability in many systems, but they face difficulties when dealing with nonlinear and uncertain changes. The neural network is integrated to adjust the PID parameters in real time, helping the system adapt to different operating conditions. Simulation and experimental results have demonstrated that the proposed method significantly improves the speed tracking capability and stability of the motor while ensuring quick response, zero steady-state error, and eliminating overshoot. This method offers high potential for application in servo motor control systems requiring high precision and performance.
arXiv:2501.01445v1 Announce Type: new Abstract: We establish optimal error bounds on an exponential wave integrator (EWI) for the space fractional nonlinear Schr\"{o}dinger equation (SFNLSE) with low regularity potential and/or nonlinearity. For the semi-discretization in time, under the assumption of $L^\infty$-potential, $C^1$-nonlinearity, and $H^\alpha$-solution with $1<\alpha \leq 2$ being the fractional index of $(-\Delta)^\frac{\alpha}{2}$, we prove an optimal first-order $L^2$-norm error bound $O(\tau)$ and a uniform $H^\alpha$-norm bound of the semi-discrete numerical solution, where $\tau$ is the time step size. We further discretize the EWI in space by the Fourier spectral method and obtain an optimal error bound without introducing any CFL-type time step size restrictions. In particular, the spatial convergence is optimal with respect to the regularity of the exact solution. Moreover, under slightly stronger regularity assumptions, we obtain optimal error bounds in $H^\frac{\alpha}{2}$-norm, which is the norm associated to the energy. Extensive numerical examples are provided to validate the optimal error bounds and show their sharpness. We also find distinct evolving patterns between the SFNLSE and the classical nonlinear Schr\"{o}dinger equation.
arXiv:2501.01457v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit great potential in complex multi-step reasoning through inference-time thinking but still struggle with deciding when to stop thinking due to limited self-awareness about their knowledge boundaries. While human preference alignment has shown extraordinary opportunities, expensive labeling challenges adherence to scaling law. Language model self-critique, as an alternative to using human-labeled reasoning data, is questioned with its inherited biases. This work addresses these challenges by distilling the LLM's own reasoning processes into synthetic behavioral data, eliminating the need for manual labeling of intermediate steps. Building on this concept, we propose Distillation-Reinforcement-Reasoning (DRR), a three-step framework that leverages the LLM's inherent behaviors as external feedback by first generating behavioral data using the Reasoner (LLM) to reflect its reasoning capabilities, then training a lightweight discriminative reward model (DM) on behavioral data, and finally deploying the DM at inference time to assist the Reasoner's decision-making. Experiments on multiple benchmarks show that the DRR framework outperforms self-critique approaches without relying on additional complex data annotation. Benefiting from lightweight design, ease of replication, and adaptability, DRR is applicable to a wide range of LLM-centric tasks.
arXiv:2501.01463v1 Announce Type: new Abstract: Goal Recognition aims to infer an agent's goal from a sequence of observations. Existing approaches often rely on manually engineered domains and discrete representations. Deep Recognition using Actor-Critic Optimization (DRACO) is a novel approach based on deep reinforcement learning that overcomes these limitations by providing two key contributions. First, it is the first goal recognition algorithm that learns a set of policy networks from unstructured data and uses them for inference. Second, DRACO introduces new metrics for assessing goal hypotheses through continuous policy representations. DRACO achieves state-of-the-art performance for goal recognition in discrete settings while not using the structured inputs used by existing approaches. Moreover, it outperforms these approaches in more challenging, continuous settings at substantially reduced costs in both computing and memory. Together, these results showcase the robustness of the new algorithm, bridging traditional goal recognition and deep reinforcement learning.
arXiv:2501.01431v1 Announce Type: new Abstract: Reaping the benefits of multi-antenna communication systems in frequency division duplex (FDD) requires channel state information (CSI) reporting from mobile users to the base station (BS). Over the last decades, the amount of CSI to be collected has become very challenging owing to the dramatic increase of the number of antennas at BSs. To mitigate the overhead associated with CSI reporting, compressed CSI techniques have been proposed with the idea of recovering the original CSI at the BS from its compressed version sent by the mobile users. Channel charting is an unsupervised dimensionality reduction method that consists in building a radio-environment map from CSIs. Such a method can be considered in the context of the CSI compression problem, since a chart location is, by definition, a low-dimensional representation of the CSI. In this paper, the performance of channel charting for a task-based CSI compression application is studied. A comparison of the proposed method against baselines on realistic synthetic data is proposed, showing promising results.
arXiv:2501.01435v1 Announce Type: new Abstract: General Purpose AI - such as Large Language Models (LLMs) - have seen rapid deployment in a wide range of use cases. Most surprisingly, they have have made their way from plain language models, to chat-bots, all the way to an almost ``operating system''-like status that can control decisions and logic of an application. Tool-use, Microsoft co-pilot/office integration, and OpenAIs Altera are just a few examples of increased autonomy, data access, and execution capabilities. These methods come with a range of cybersecurity challenges. We highlight some of the work we have done in terms of evaluation as well as outline future opportunities and challenges.
arXiv:2501.01441v1 Announce Type: new Abstract: Representation bias is one of the most common types of biases in artificial intelligence (AI) systems, causing AI models to perform poorly on underrepresented data segments. Although AI practitioners use various methods to reduce representation bias, their effectiveness is often constrained by insufficient domain knowledge in the debiasing process. To address this gap, this paper introduces a set of generic design guidelines for effectively involving domain experts in representation debiasing. We instantiated our proposed guidelines in a healthcare-focused application and evaluated them through a comprehensive mixed-methods user study with 35 healthcare experts. Our findings show that involving domain experts can reduce representation bias without compromising model accuracy. Based on our findings, we also offer recommendations for developers to build robust debiasing systems guided by our generic design guidelines, ensuring more effective inclusion of domain experts in the debiasing process.
arXiv:2501.01443v1 Announce Type: new Abstract: This MS thesis outlines my contributions to the closed loop control and system integration of two robotic platforms: 1) Aerobat, a flapping wing robot stabilized by air jets, and 2) Harpy, a bipedal robot equipped with dual thrusters. Both systems share a common theme of the integration of posture manipulation and thrust vectoring to achieve stability and controlled movement. For Aerobat, I developed the software and control architecture that enabled its first untethered flights. The control system combines flapping wing dynamics with multiple air jet stabilization to maintain roll, pitch and yaw stability. These results were published in the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). For Harpy, I implemented a closed-loop control framework that incorporates active thruster assisted frontal dynamics stabilization . My work led to preliminary untethered dynamic walking. This approach demonstrates how thrust assisted stability can enhance locomotion in legged robots which has not been explored before.
arXiv:2501.01451v1 Announce Type: new Abstract: Recently, there is an increasing interest in using artificial intelligence (AI) to automate aspects of the research process, or even autonomously conduct the full research cycle from idea generation, over data analysis, to composing and evaluation of scientific manuscripts. Examples of working AI scientist systems have been demonstrated for computer science tasks and running molecular biology labs. While some approaches aim for full autonomy of the scientific AI, others rather aim for leveraging human-AI teaming. Here, we address how to adapt such approaches for boosting Brain-Computer Interface (BCI) development, as well as brain research resp. neuroscience at large. We argue that at this time, a strong emphasis on human-AI teaming, in contrast to fully autonomous AI BCI researcher will be the most promising way forward. We introduce the collaborative workspaces concept for human-AI teaming based on a set of Janusian design principles, looking both ways, to the human as well as to the AI side. Based on these principles, we present ChatBCI, a Python-based toolbox for enabling human-AI collaboration based on interaction with Large Language Models (LLMs), designed for BCI research and development projects. We show how ChatBCI was successfully used in a concrete BCI project on advancing motor imagery decoding from EEG signals. Our approach can be straightforwardly extended to broad neurotechnological and neuroscientific topics, and may by design facilitate human expert knowledge transfer to scientific AI systems in general.
arXiv:2501.01453v1 Announce Type: new Abstract: Rapid yet accurate simulations of fluid dynamics around complex geometries is critical in a variety of engineering and scientific applications, including aerodynamics and biomedical flows. However, while scientific machine learning (SciML) has shown promise, most studies are constrained to simple geometries, leaving complex, real-world scenarios underexplored. This study addresses this gap by benchmarking diverse SciML models, including neural operators and vision transformer-based foundation models, for fluid flow prediction over intricate geometries. Using a high-fidelity dataset of steady-state flows across various geometries, we evaluate the impact of geometric representations -- Signed Distance Fields (SDF) and binary masks -- on model accuracy, scalability, and generalization. Central to this effort is the introduction of a novel, unified scoring framework that integrates metrics for global accuracy, boundary layer fidelity, and physical consistency to enable a robust, comparative evaluation of model performance. Our findings demonstrate that foundation models significantly outperform neural operators, particularly in data-limited scenarios, and that SDF representations yield superior results with sufficient training data. Despite these advancements, all models struggle with out-of-distribution generalization, highlighting a critical challenge for future SciML applications. By advancing both evaluation methodologies and modeling capabilities, this work paves the way for robust and scalable ML solutions for fluid dynamics across complex geometries.
arXiv:2501.01458v1 Announce Type: new Abstract: Identifying druggable genes is essential for developing effective pharmaceuticals. With the availability of extensive, high-quality data, computational methods have become a significant asset. Protein Interaction Network (PIN) is valuable but challenging to implement due to its high dimensionality and sparsity. Previous methods relied on indirect integration, leading to resolution loss. This study proposes GAN-TAT, a framework utilizing an advanced graph embedding technology, ImGAGN, to directly integrate PIN for druggable gene inference work. Tested on three Pharos datasets, GAN-TAT achieved the highest AUC-ROC score of 0.951 on Tclin. Further evaluation shows that GAN-TAT's predictions are supported by clinical evidence, highlighting its potential practical applications in pharmacogenomics. This research represents a methodological attempt with the direct utilization of PIN, expanding potential new solutions for developing drug targets. The source code of GAN-TAT is available at (https://github.com/george-yuanji-wang/GAN-TAT).
arXiv:2501.01462v1 Announce Type: new Abstract: Host-response-based diagnostics can improve the accuracy of diagnosing bacterial and viral infections, thereby reducing inappropriate antibiotic prescriptions. However, the existing cohorts with limited sample size and coarse infections types are unable to support the exploration of an accurate and generalizable diagnostic model. Here, we curate the largest infection host-response transcriptome data, including 11,247 samples across 89 blood transcriptome datasets from 13 countries and 21 platforms. We build a diagnostic model for pathogen prediction starting from a pan-infection model as foundation (AUC = 0.97) based on the pan-infection dataset. Then, we utilize knowledge distillation to efficiently transfer the insights from this "teacher" model to four lightweight pathogen "student" models, i.e., staphylococcal infection (AUC = 0.99), streptococcal infection (AUC = 0.94), HIV infection (AUC = 0.93), and RSV infection (AUC = 0.94), as well as a sepsis "student" model (AUC = 0.99). The proposed knowledge distillation framework not only facilitates the diagnosis of pathogens using pan-infection data, but also enables an across-disease study from pan-infection to sepsis. Moreover, the framework enables high-degree lightweight design of diagnostic models, which is expected to be adaptively deployed in clinical settings.
arXiv:2501.01429v1 Announce Type: new Abstract: Sequential recommendation refers to recommending the next item of interest for a specific user based on his/her historical behavior sequence up to a certain time. While previous research has extensively examined Markov chain-based sequential recommendation models, the majority of these studies has focused on the user's historical behavior sequence but has paid little attention to the overall correlation between items. This study introduces a sequential recommendation algorithm known as Item Association Factorization Mixed Markov Chains, which incorporates association information between items using an item association graph, integrating it with user behavior sequence information. Our experimental findings from the four public datasets demonstrate that the newly introduced algorithm significantly enhances the recommendation ranking results without substantially increasing the parameter count. Additionally, research on tuning the prior balancing parameters underscores the significance of incorporating item association information across different datasets.
arXiv:2501.01430v1 Announce Type: new Abstract: Developing excavation autonomy is challenging given the environments where excavators operate, the complexity of physical interaction and the degrees of freedom of operation of the excavator itself. Simulation is a useful tool to build parts of the autonomy without the complexity of experimentation. Traditional excavator simulators are geared towards high fidelity interactions between the joints or between the terrain but do not incorporate other challenges such as perception required for end to end autonomy. A complete simulator should be capable of supporting real time operation while providing high fidelity simulation of the excavator(s), the environment, and their interaction. In this paper we present TERA (Terrain Excavation Robot Autonomy), a simulator geared towards autonomous excavator applications based on Unity3D and AGX that provides the extensibility and scalability required to study full autonomy. It provides the ability to configure the excavator and the environment per the user requirements. We also demonstrate realistic dynamics by incorporating a time-varying model that introduces variations in the system's responses. The simulator is then evaluated with different scenarios such as track deformation, velocities on different terrains, similarity of the system with the real excavator and the overall path error to show the capabilities of the simulation.
arXiv:2501.01432v1 Announce Type: new Abstract: Control systems are critical to modern technological infrastructure, spanning industries from aerospace to healthcare. This survey explores the landscape of safe robot learning, investigating methods that balance high-performance control with rigorous safety constraints. By examining classical control techniques, learning-based approaches, and embedded system design, the research seeks to understand how robotic systems can be developed to prevent hazardous states while maintaining optimal performance across complex operational environments.
arXiv:2501.01433v1 Announce Type: new Abstract: While logic puzzles have engaged individuals through problem-solving and critical thinking, the creation of new puzzle rules has largely relied on ad-hoc processes. Pencil puzzles, such as Slitherlink and Sudoku, represent a prominent subset of these games, celebrated for their intellectual challenges rooted in combinatorial logic and spatial reasoning. Despite extensive research into solving techniques and automated problem generation, a unified framework for systematic and scalable rule design has been lacking. Here, we introduce a mathematical framework for defining and systematizing pencil puzzle rules. This framework formalizes grid elements, their positional relationships, and iterative composition operations, allowing for the incremental construction of structures that form the basis of puzzle rules. Furthermore, we establish a formal method to describe constraints and domains for each structure, ensuring solvability and coherence. Applying this framework, we successfully formalized the rules of well-known Nikoli puzzles, including Slitherlink and Sudoku, demonstrating the formal representation of a significant portion (approximately one-fourth) of existing puzzles. These results validate the potential of the framework to systematize and innovate puzzle rule design, establishing a pathway to automated rule generation. By providing a mathematical foundation for puzzle rule creation, this framework opens avenues for computers, potentially enhanced by AI, to design novel puzzle rules tailored to player preferences, expanding the scope of puzzle diversity. Beyond its direct application to pencil puzzles, this work illustrates how mathematical frameworks can bridge recreational mathematics and algorithmic design, offering tools for broader exploration in logic-based systems, with potential applications in educational game design, personalized learning, and computational creativity.
arXiv:2501.01439v1 Announce Type: new Abstract: Advanced Air Mobility (AAM) is a growing field that demands accurate modeling of legal concepts and restrictions in navigating intelligent vehicles. In addition, any implementation of AAM needs to face the challenges posed by inherently dynamic and uncertain human-inhabited spaces robustly. Nevertheless, the employment of Unmanned Aircraft Systems (UAS) beyond visual line of sight (BVLOS) is an endearing task that promises to enhance significantly today's logistics and emergency response capabilities. To tackle these challenges, we present a probabilistic and neuro-symbolic architecture to encode legal frameworks and expert knowledge over uncertain spatial relations and noisy perception in an interpretable and adaptable fashion. More specifically, we demonstrate Probabilistic Mission Design (ProMis), a system architecture that links geospatial and sensory data with declarative, Hybrid Probabilistic Logic Programs (HPLP) to reason over the agent's state space and its legality. As a result, ProMis generates Probabilistic Mission Landscapes (PML), which quantify the agent's belief that a set of mission conditions is satisfied across its navigation space. Extending prior work on ProMis' reasoning capabilities and computational characteristics, we show its integration with potent machine learning models such as Large Language Models (LLM) and Transformer-based vision models. Hence, our experiments underpin the application of ProMis with multi-modal input data and how our method applies to many important AAM scenarios.
arXiv:2501.01471v1 Announce Type: new Abstract: This paper investigates physiological markers of Social Anxiety Disorder (SAD) by examining the relationship between Electrocardiogram (ECG) measurements and speech, a known anxiety-inducing activity. Specifically, we analyze changes in heart rate variability (HRV) and heart rate (HR) during four distinct phases: baseline, anticipation, speech activity, and reflection. Our study, involving 51 participants (31 with SAD and 20 without), found that HRV decreased and HR increased during the anticipation and speech activity phases compared to baseline. In contrast, during the reflection phase, HRV increased and HR decreased. Additionally, participants with SAD exhibited lower HRV, higher HR, and reported greater self-perceived anxiety compared to those without SAD. These findings have implications for developing wearable technology to monitor SAD. We also provide our dataset, which captures anxiety across multiple stages, to support further research in this area.