cs.HC
arXiv:2501.00597v1 Announce Type: new Abstract: Eye movement prediction is a promising area of research for compensating for the latency introduced by eye-tracking systems in virtual reality devices. In this study, we comprehensively analyze the complexity of the eye movement prediction task as it relates to individual subjects. We use three fundamentally different models in the analysis: a lightweight Long Short-Term Memory network (LSTM), a transformer-based network for multivariate time series representation learning (TST), and the Oculomotor Plant Mathematical Model wrapped in a Kalman Filter framework (OPKF). Each solution is assessed following a sample-to-event evaluation strategy and employing new event-to-subject metrics. Our results show that the different models maintained similar prediction performance trends across subjects. We refer to these outcomes as per-subject complexity, since some subjects' data pose a significantly greater challenge for the models. Along with a detailed correlation analysis, this report investigates the source of the per-subject complexity and discusses potential solutions to overcome it.
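As a rough illustration of the lightest of these model families, a minimal LSTM gaze predictor might look like the following PyTorch sketch; the window length, hidden size, and prediction horizon are assumptions, not the paper's configuration:

```python
# Minimal sketch (not the authors' implementation): a lightweight LSTM that
# maps a window of past 2D gaze samples to a predicted future sample.
import torch
import torch.nn as nn

class GazeLSTM(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # predict (x, y) at t + horizon

    def forward(self, gaze_window):            # (batch, seq_len, 2)
        out, _ = self.lstm(gaze_window)
        return self.head(out[:, -1])           # (batch, 2)

model = GazeLSTM()
window = torch.randn(8, 50, 2)                 # 8 sequences of 50 gaze samples
pred = model(window)                           # predicted gaze positions
```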
arXiv:2501.00935v1 Announce Type: new Abstract: Dynamic gesture recognition is a challenging research area due to variations in the pose, size, and shape of the signer's hand. In this letter, a Multiscaled Multi-Head Attention Video Transformer Network (MsMHA-VTN) for dynamic hand gesture recognition is proposed. A pyramidal hierarchy of multiscale features is extracted using the transformer's multiscaled head attention. The proposed model employs a different attention dimension for each head of the transformer, which enables it to attend at multiple scales. Further, in addition to a single modality, recognition performance using multiple modalities is examined. Extensive experiments demonstrate the superior performance of the proposed MsMHA-VTN, with overall accuracies of 88.22% and 99.10% on the NVGesture and Briareo datasets, respectively.
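The per-head idea can be sketched as follows; this is a hedged illustration of attention heads with different dimensions, not the paper's exact MsMHA-VTN design:

```python
# Sketch: multi-head attention where each head has its own dimension,
# giving a pyramid of attention scales that are fused at the output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleHeadAttention(nn.Module):
    def __init__(self, d_model=256, head_dims=(16, 32, 64, 128)):
        super().__init__()
        self.q = nn.ModuleList(nn.Linear(d_model, d) for d in head_dims)
        self.k = nn.ModuleList(nn.Linear(d_model, d) for d in head_dims)
        self.v = nn.ModuleList(nn.Linear(d_model, d) for d in head_dims)
        self.out = nn.Linear(sum(head_dims), d_model)

    def forward(self, x):                          # x: (batch, tokens, d_model)
        heads = []
        for q, k, v in zip(self.q, self.k, self.v):
            Q, K, V = q(x), k(x), v(x)
            attn = F.softmax(Q @ K.transpose(-2, -1) / K.shape[-1] ** 0.5, dim=-1)
            heads.append(attn @ V)                 # one head, one scale
        return self.out(torch.cat(heads, dim=-1))  # fuse multiscale heads

x = torch.randn(2, 49, 256)
y = MultiScaleHeadAttention()(x)                   # (2, 49, 256)
```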
arXiv:2501.00190v1 Announce Type: new Abstract: Sepsis is organ dysfunction caused by a dysregulated immune response to infection. Early sepsis prediction and identification allow for timely intervention, leading to improved clinical outcomes. Clinical calculators (e.g., SOFA, the six-organ dysfunction assessment) play a vital role in sepsis identification within clinicians' workflow, providing evidence-based risk assessments essential for sepsis diagnosis. However, artificial intelligence (AI) sepsis prediction models typically generate a single sepsis risk score without incorporating clinical calculators for assessing organ dysfunction, making the models less convincing and transparent to clinicians. To bridge this gap, we propose to mimic clinicians' workflow with a novel framework, SepsisCalc, which integrates clinical calculators into the predictive model, yielding a clinically transparent and precise model for use in clinical settings. In practice, clinical calculators usually combine information from multiple component variables in Electronic Health Records (EHRs) and may not be applicable when those variables are (partially) missing. We mitigate this issue by representing EHRs as temporal graphs and integrating a learning module that dynamically adds accurately estimated calculators to the graphs. Experimental results on real-world datasets show that the proposed model outperforms state-of-the-art methods on sepsis prediction tasks. Moreover, we developed a system that identifies organ dysfunctions and potential sepsis risks, providing a human-AI interaction tool for deployment. It can help clinicians understand the prediction outputs and prepare timely interventions for the corresponding dysfunctions, paving the way for actionable clinical decision support for early intervention.
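To make the calculator-integration gap concrete, here is a simplified sketch of how a SOFA-style calculator combines component EHR variables and breaks when one is missing; the two components and thresholds follow commonly published SOFA tables but are illustrative only, neither a clinical tool nor the paper's code:

```python
# Illustrative only: two SOFA-style component scores and the missing-data
# failure mode that SepsisCalc's estimation module is designed to address.
def sofa_coagulation(platelets_k_per_uL):
    if platelets_k_per_uL is None:
        return None                               # missing component variable
    for cutoff, score in ((20, 4), (50, 3), (100, 2), (150, 1)):
        if platelets_k_per_uL < cutoff:
            return score
    return 0

def sofa_liver(bilirubin_mg_dL):
    if bilirubin_mg_dL is None:
        return None
    for cutoff, score in ((12.0, 4), (6.0, 3), (2.0, 2), (1.2, 1)):
        if bilirubin_mg_dL >= cutoff:
            return score
    return 0

components = [sofa_coagulation(85), sofa_liver(None)]  # one variable missing
total = sum(c for c in components if c is not None)    # only a partial score
```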
arXiv:2501.00476v1 Announce Type: new Abstract: This paper employs Bluetooth technology to convert an existing wired Programmable Logic Controller (PLC) into a wireless one. Two Bluetooth devices are employed as a transceiver pair to transmit and receive the input signal, realizing a wireless PLC. The main advantage of a PLC is that it controls the outputs according to the status of the inputs. Handshaking takes place between the two Bluetooth modules, which are interfaced with a microcontroller board (an Arduino) and then with the PLC, so that field devices can be controlled without wires.
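A minimal host-side sketch of the transceiver idea follows; the paper's logic runs on an Arduino with PLC hardware, so this Python/pyserial analog, with a hypothetical port name and message protocol, only illustrates the read-input-status/relay-command loop:

```python
# Hedged sketch (assumptions: an HC-05-style Bluetooth module exposed as a
# serial port; port name, baud rate, and message format are hypothetical).
import serial

link = serial.Serial("/dev/rfcomm0", 9600, timeout=1)  # paired Bluetooth link

while True:
    status = link.readline().strip()   # input status sent by the remote module
    if status == b"ON":
        link.write(b"OUTPUT:1\n")      # command the PLC-side output on
    elif status == b"OFF":
        link.write(b"OUTPUT:0\n")      # command the PLC-side output off
```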
arXiv:2501.00822v1 Announce Type: new Abstract: In robotic bimanual teleoperation, multimodal sensory feedback plays a crucial role, providing operators with a more immersive operating experience, reducing cognitive burden, and improving operating efficiency. In this study, we develop an immersive bilateral isomorphic bimanual telerobotic system, comprising dual arms and dual dexterous hands, with visual and haptic force feedback. To assess the performance of this system, we carried out a series of experiments and investigated the users' teleoperation experience. The results demonstrate that haptic force feedback enhances physical perception capabilities and the ability to operate in complex tasks. In addition, it compensates for visual perception deficiencies and reduces the operator's workload. Consequently, our proposed system achieves more intuitive, realistic, and immersive teleoperation, improves operating efficiency, and expands the complexity of tasks that robots can perform through teleoperation.
arXiv:2501.00867v1 Announce Type: new Abstract: We introduce Interactionalism as a new set of guiding principles and heuristics for the design and architecture of learning, now made possible by Generative AI (GenAI) platforms. Specifically, we articulate interactional intelligence as a net-new skill set that is increasingly important when core cognitive tasks are automatable and augmentable by GenAI functions. We break these skills down into core sets of meta-cognitive and meta-emotional components and show how working with Large Language Model (LLM)-based agents can be used proactively to help develop them in learners. Interactionalism is advanced not as a theory of learning but as a blueprint for the practice of learning, in coordination with GenAI.
arXiv:2501.00074v1 Announce Type: new Abstract: As technology advances, the integration of physical, virtual, and social worlds has led to a complex landscape of "Realities" such as Virtual Reality (VR), Augmented Reality (AR), the metaverse, spatial computing, and other emerging paradigms. This paper builds upon and refines the concept of eXtended Reality (XR) as the unifying framework that not only interpolates across these diverse realities but also extrapolates (extends) to create entirely new possibilities. XR is the "physical spatial metaverse," bridging the physical world, the virtual world of artificial intelligence, and the social world of human interaction. These three worlds define the Socio-Cyber-Physical Taxonomy of XR, which allows us to identify underexplored research areas such as Diminished Reality (DR) and chart future directions to advance technology for people and planet. We highlight the six core properties of XR for applications in sustainability, healthcare, frontline work, and daily life. Central to this vision is the development of AI-driven wearable technologies, such as the smart eyeglass, that sustainably extend human capabilities.
arXiv:2501.00168v1 Announce Type: new Abstract: We present a virtual reality (VR) environment featuring conversational avatars powered by a locally deployed LLM, integrated with automatic speech recognition (ASR), text-to-speech (TTS), and lip-syncing. Through a pilot study, we explored the effects of three types of avatar status indicators during response generation. Our findings reveal design considerations for improving responsiveness and realism in LLM-driven conversational systems. We also detail two system architectures: one using an LLM-based state machine to control avatar behavior, and another integrating retrieval-augmented generation (RAG) for context-grounded responses. Together, these contributions offer practical insights to guide future work on task-oriented conversational AI in VR environments.
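The first architecture can be illustrated with a toy state machine that switches the avatar's status indicator as the pipeline moves between ASR, LLM generation, and TTS; the state and event names here are assumptions, not the paper's implementation:

```python
# Hedged sketch of an avatar-control state machine with illustrative states.
from enum import Enum, auto

class AvatarState(Enum):
    IDLE = auto()
    LISTENING = auto()     # ASR is capturing the user's speech
    THINKING = auto()      # LLM is generating; show a status indicator
    SPEAKING = auto()      # TTS + lip-sync are playing the response

TRANSITIONS = {
    (AvatarState.IDLE, "speech_detected"): AvatarState.LISTENING,
    (AvatarState.LISTENING, "utterance_done"): AvatarState.THINKING,
    (AvatarState.THINKING, "response_ready"): AvatarState.SPEAKING,
    (AvatarState.SPEAKING, "audio_finished"): AvatarState.IDLE,
}

def step(state, event):
    return TRANSITIONS.get((state, event), state)  # ignore invalid events

state = AvatarState.IDLE
for event in ("speech_detected", "utterance_done", "response_ready", "audio_finished"):
    state = step(state, event)                     # ends back at IDLE
```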
arXiv:2501.00383v1 Announce Type: new Abstract: One of the long-standing aspirations for conversational AI is to have it autonomously take the initiative in conversations, i.e., to be proactive. This is especially challenging in multi-party conversations. Prior NLP research has focused mainly on predicting the next speaker from contexts such as preceding turns. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that, just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thought running in parallel to the overt communication process, which enables it to engage proactively by modeling its intrinsic motivation to express these thoughts. We instantiated this framework in two real-time systems: an AI playground web app and a chatbot. In a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects such as anthropomorphism, coherence, intelligence, and turn-taking appropriateness.
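The core loop of such a framework might be sketched as follows, with stand-in functions where the paper uses an LLM and a learned motivation model; everything here is an illustrative assumption:

```python
# Speculative sketch: covert thoughts generated in parallel with the
# conversation; the agent speaks only when an intrinsic-motivation score
# crosses a threshold. Scoring and threshold are assumptions.
import random

def generate_thought(history):
    # stand-in for an LLM call that produces a candidate covert thought
    return f"reflection on: {history[-1]}"

def motivation(thought, history):
    # stand-in for the framework's intrinsic-motivation model
    return random.random()

history, THRESHOLD = ["Alice: what should we build?"], 0.8
for turn in ["Bob: maybe a game?", "Alice: or a tool?"]:
    history.append(turn)
    thought = generate_thought(history)        # covert, not yet expressed
    if motivation(thought, history) > THRESHOLD:
        history.append(f"AI: {thought}")       # proactively contribute
```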
arXiv:2501.00449v1 Announce Type: new Abstract: Past research shows that personality traits are strong predictors of one's academic performance. Today, mature and verified marker systems for assessing personality traits already exist. However, marker-system-based assessment methods have their own limitations; for example, dishonest responses cannot be avoided. In this research, the goal is to develop a method that can overcome these limitations. The proposed method relies on physiological signals for the assessment. Thirty participants took part in this experiment. Based on the statistical results, we found correlations between students' personality traits and changes in their physiological signals when learning via videos. Specifically, we found that participants' degrees of extraversion, agreeableness, conscientiousness, and openness to experience are correlated with the variance of heart rates, the variance of GSR values, and the skewness of voice frequencies, among other features.
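The named signal features are straightforward to compute; a sketch with synthetic data (sampling rates and windowing details are assumptions) might look like this:

```python
# Sketch of the three feature types named above, computed with NumPy/SciPy
# on synthetic stand-in signals.
import numpy as np
from scipy.stats import skew

heart_rate = np.random.normal(72, 5, 300)      # per-second HR samples (stub)
gsr = np.random.normal(2.0, 0.3, 300)          # galvanic skin response (stub)
voice_freq = np.random.normal(180, 40, 300)    # per-frame voice pitch (stub)

features = {
    "hr_variance": np.var(heart_rate),
    "gsr_variance": np.var(gsr),
    "voice_freq_skewness": skew(voice_freq),
}
```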
arXiv:2501.00775v1 Announce Type: new Abstract: Traditional qualitative analysis requires significant effort and collaboration to reach consensus through formal coding processes, including open coding, discussions, and codebook merging. However, in scenarios where such rigorous and time-intensive methods are unnecessary, such as summarizing meetings or personal ideation, quick yet structured insights are more practical. To address this need, we propose MindCoder, a tool inspired by the "codes-to-theory" model and developed through an iterative design process to support flexible and structured inductive qualitative analysis. Built on OpenAI's GPT-4o model, MindCoder supports data preprocessing, automatic open coding, automatic axial coding, and automatic concept development, ultimately producing a report to support the presentation of insights. An evaluation with 12 participants highlights its effectiveness in enabling flexible yet structured analysis, as well as its advantages over ChatGPT and the Atlas.ti Web AI coding function.
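One stage of such a pipeline, automatic open coding, could be sketched with the OpenAI Python client as follows; the prompt wording is an assumption, not MindCoder's actual prompt:

```python
# Hedged sketch of a GPT-4o open-coding call; not MindCoder's implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def open_code(excerpt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a qualitative analyst. Assign a short, "
                        "descriptive open code to the excerpt."},
            {"role": "user", "content": excerpt},
        ],
    )
    return response.choices[0].message.content

code = open_code("I only skim the meeting notes when I'm pressed for time.")
```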
arXiv:2501.00791v1 Announce Type: new Abstract: This study illustrates a first step in ongoing work aimed at developing a dataset of dialogues potentially useful for managing customer service conversations between humans and AI chatbots. The approach uses ChatGPT 3.5 to generate the dialogues. One requirement is that the dialogue is characterized by a specific language proficiency level of the user; the other is that the user expresses a specific emotion during the interaction. The generated dialogues were then evaluated for overall quality. The complexity of the language used by both human and AI agents was evaluated using standard complexity measures. Furthermore, the attitudes and interaction patterns exhibited by the chatbot at each turn were stored for later detection of common conversation patterns in specific emotional contexts. The methodology could improve human-AI dialogue effectiveness and serve as a basis for systems that learn from user interactions.
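The two generation constraints could be encoded in a prompt template along these lines; the use of the CEFR scale and the wording are illustrative assumptions, not the study's actual prompt:

```python
# Hedged sketch of a constrained dialogue-generation prompt.
def dialogue_prompt(proficiency: str, emotion: str) -> str:
    return (
        "Generate a customer-service dialogue between a user and a chatbot.\n"
        f"- The user writes at language proficiency level {proficiency}.\n"
        f"- The user expresses {emotion} during the interaction.\n"
        "Label each turn with the speaker."
    )

prompt = dialogue_prompt("CEFR B1", "frustration")
```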
arXiv:2501.00825v1 Announce Type: new Abstract: Studies have indicated that personality is related to achievement, and several personality assessment models have been developed. However, most are either questionnaires or based on marker systems, both of which entail limitations. We propose a physiological-signal-based model, thereby ensuring the objectivity of the data and preventing unreliable responses. Thirty participants were recruited from the Department of Electrical Engineering of Yuan Ze University in Taiwan. Wearable sensors were used to collect physiological signals as the participants watched and summarized a video. They then completed a personality questionnaire based on the Big Five factor markers system. The results were used to construct a personality prediction model, which revealed that galvanic skin response and heart rate variance were key factors in predicting extroversion; heart rate variance also predicted agreeableness and conscientiousness. The results of this experiment can elucidate students' personality traits, which can help educators select appropriate pedagogical methods.
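The prediction step can be sketched as a simple regression on the two named features; the data below are synthetic, and the paper's actual model construction may differ:

```python
# Minimal sketch of trait prediction from physiological features.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))            # [GSR variance, heart-rate variance]
extroversion = rng.normal(size=30)      # questionnaire scores (synthetic)

model = LinearRegression().fit(X, extroversion)
predicted = model.predict(X[:5])        # predicted trait scores
```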
arXiv:2501.00861v1 Announce Type: new Abstract: In light of the growing proportion of older individuals in our society, the timely diagnosis of Alzheimer's disease has become a crucial aspect of healthcare. In this paper, we propose a non-invasive and cost-effective detection method based on speech technology. The method employs a pre-trained language model in conjunction with techniques such as prompt fine-tuning and conditional learning, thereby enhancing the accuracy and efficiency of the detection process. To address the issue of limited computational resources, this study employs the efficient LoRA fine-tuning method to construct the classification model. Following multiple rounds of training and rigorous 10-fold cross-validation, the prompt fine-tuning strategy based on the LLaMA2 model demonstrated an accuracy of 81.31%, a 4.46% improvement over the control group employing the BERT model. This study offers a novel technical approach for the early diagnosis of Alzheimer's disease and provides valuable insights into model optimization and resource utilization under similar conditions. It is anticipated that this method will prove beneficial in clinical practice and applied research, facilitating more accurate and efficient screening and diagnosis of Alzheimer's disease.
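With the Hugging Face PEFT library, a LoRA classification setup of this general kind might be configured as below; the rank, target modules, and label count are assumptions, not the paper's hyperparameters:

```python
# Hedged sketch of a LoRA fine-tuning setup for binary speech-transcript
# classification (AD vs. control); hyperparameters are illustrative.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf", num_labels=2)
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"],
                    task_type="SEQ_CLS")
model = get_peft_model(base, config)
model.print_trainable_parameters()   # only the small adapters are trained
```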
arXiv:2501.00017v1 Announce Type: new Abstract: This study investigates students' perceptions of Generative Artificial Intelligence (AI), with a focus on Higher Education institutions in Northern Ireland and India. We collect quantitative Likert ratings and qualitative comments from 1,211 students on their awareness and perceptions of AI, and we investigate variations in attitudes toward AI across institutions and subject areas, as well as interactions between these variables and demographic variables (focusing on gender). We find that: (a) while perceptions varied across institutions, responses from Computer Sciences students were similar; and (b) after controlling for institution and subject area, we observed no effect of gender. These results are consistent with previous studies, which find that students' perceptions are predicted by prior experience. We consider the implications of this relationship and the role of experience.
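One conventional way to test for a gender effect while controlling for institution and subject area is a linear model with categorical covariates; the sketch below uses hypothetical column names and data, not the study's analysis code:

```python
# Hedged sketch: OLS with categorical controls via statsmodels formulas.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "rating": [4, 3, 5, 2, 4, 3],
    "institution": ["NI", "NI", "IN", "IN", "NI", "IN"],
    "subject": ["CS", "Arts", "CS", "Arts", "CS", "Arts"],
    "gender": ["F", "M", "F", "M", "M", "F"],
})
model = smf.ols("rating ~ C(institution) + C(subject) + C(gender)", data=df).fit()
print(model.summary())   # gender coefficient tested with controls in place
```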
arXiv:2501.00038v1 Announce Type: new Abstract: Emotion recognition and touch gesture decoding are crucial for advancing human-robot interaction (HRI), especially in social environments where emotional cues and tactile perception play important roles. However, many humanoid robots, such as Pepper, Nao, and Furhat, lack full-body tactile skin, limiting their ability to engage in touch-based emotional and gesture interactions. In addition, vision-based emotion recognition methods usually face strict GDPR compliance challenges due to the need to collect personal facial data. To address these limitations and avoid privacy issues, this paper studies the potential of using the sounds produced by touch during HRI to recognise tactile gestures and classify emotions along the arousal and valence dimensions. Using a dataset of tactile gestures and emotional interactions between 28 participants and the humanoid robot Pepper, we design an audio-only, lightweight touch gesture and emotion recognition model with only 0.24M parameters, a 0.94MB model size, and 0.7G FLOPs. Experimental results show that the proposed sound-based model effectively recognises the arousal and valence states of different emotions, as well as various tactile gestures, even when the input audio length varies. The model is low-latency and achieves results similar to those of well-known pretrained audio neural networks (PANNs), but with far fewer FLOPs and parameters and a much smaller model size.
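A sub-million-parameter audio classifier of this general shape can be sketched as follows; the layers, input features, and head sizes are illustrative, not the paper's architecture:

```python
# Hedged sketch: tiny two-head classifier over log-mel spectrograms; the
# adaptive pooling lets input audio length vary, as described above.
import torch
import torch.nn as nn

class TinyTouchAudioNet(nn.Module):
    def __init__(self, n_gestures=6, n_emotion_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.gesture_head = nn.Linear(32, n_gestures)
        self.emotion_head = nn.Linear(32, n_emotion_classes)  # arousal/valence bins

    def forward(self, logmel):                   # (batch, 1, mels, frames)
        z = self.features(logmel)
        return self.gesture_head(z), self.emotion_head(z)

x = torch.randn(4, 1, 64, 101)                   # any frame count pools to 1x1
gestures, emotions = TinyTouchAudioNet()(x)
```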
arXiv:2501.00078v1 Announce Type: new Abstract: Artificial intelligence (AI) has enabled agents to master complex video games, from first-person shooters like Counter-Strike to real-time strategy games such as StarCraft II and racing games like Gran Turismo. While these achievements are notable, applying these AI methods in commercial video game production remains challenging due to computational constraints. In commercial scenarios, the majority of computational resources are allocated to 3D rendering, leaving limited capacity for AI methods, which often demand high computational power, particularly those relying on pixel-based sensors. Moreover, the gaming industry prioritizes creating human-like behavior in AI agents to enhance player experience, unlike academic models that focus on maximizing game performance. This paper introduces a novel methodology for training neural networks via imitation learning to play a complex, commercial-standard, VALORANT-like 2v2 tactical shooter game, requiring only modest CPU hardware during inference. Our approach leverages an innovative, pixel-free perception architecture using a small set of ray-cast sensors, which capture essential spatial information efficiently. These sensors allow AI to perform competently without the computational overhead of traditional methods. Models are trained to mimic human behavior using supervised learning on human trajectory data, resulting in realistic and engaging AI agents. Human evaluation tests confirm that our AI agents provide human-like gameplay experiences while operating efficiently under computational constraints. This offers a significant advancement in AI model development for tactical shooter games and possibly other genres.
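The pixel-free setup can be illustrated with a toy behavior-cloning step over a ray-cast observation vector; the sensor count, state features, and action space are assumptions, not the paper's design:

```python
# Hedged sketch: a small policy over ray-cast distances plus agent state,
# trained with a supervised (imitation) loss on recorded human actions.
import torch
import torch.nn as nn

N_RAYS = 32                                      # distances from ray casts
policy = nn.Sequential(
    nn.Linear(N_RAYS + 4, 128), nn.ReLU(),       # rays + agent state features
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 8),                           # discrete action logits
)

obs = torch.randn(64, N_RAYS + 4)                # batch of recorded observations
human_actions = torch.randint(0, 8, (64,))       # labels from human trajectories
loss = nn.functional.cross_entropy(policy(obs), human_actions)
loss.backward()                                  # one behavior-cloning step
```

A network this small runs comfortably on modest CPU hardware at inference time, which is the constraint the abstract emphasizes.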
arXiv:2501.00081v1 Announce Type: new Abstract: This paper provides a comprehensive review of the design and implementation of automatically generated assessment reports (AutoRs) for formative use in K-12 Science, Technology, Engineering, and Mathematics (STEM) classrooms. With the increasing adoption of technology-enhanced assessments, there is a critical need for human-computer interactive tools that efficiently support teachers in interpreting and applying assessment data. AutoRs are designed to provide synthesized, interpretable, and actionable insights into students' performance, learning progress, and areas for improvement. Guided by cognitive load theory, this study emphasizes the importance of reducing teachers' cognitive demands through user-centered and intuitive designs. It highlights the potential of diverse information presentation formats, such as text, visual aids, and plots, and of advanced functionalities, such as live and interactive features, to enhance usability. However, the findings also reveal that many existing AutoRs fail to fully utilize these approaches, leading to high initial cognitive demands and limited engagement. This paper proposes a conceptual framework to inform the design, implementation, and evaluation of AutoRs, balancing the trade-offs between usability and functionality. The framework aims to address challenges in engaging teachers with technology-enhanced assessment results, facilitating data-driven decision-making, and providing personalized feedback to improve the teaching and learning process.
arXiv:2501.00359v1 Announce Type: new Abstract: Visitors to cultural heritage sites often encounter only official information, while local people's unofficial stories remain invisible. To explore the expression of local narratives, we conducted a workshop in which 20 participants used Generative AI (GenAI) to support visual storytelling, asking them to use Stable Diffusion to create images of familiar cultural heritage sites, as well as images of unfamiliar ones for comparison. The results revealed three narrative strategies and highlighted GenAI's strengths in illuminating, amplifying, and reinterpreting personal narratives. However, GenAI showed limitations in meeting detailed requirements, portraying cultural features, and avoiding bias, which were particularly pronounced for unfamiliar sites due to participants' lack of local knowledge. To address these challenges, we recommend providing detailed explanations, prompt engineering, and fine-tuning AI models to reduce uncertainties; using objective references to mitigate inaccuracies arising from participants' inability to recognize errors or misconceptions; and curating datasets to train AI models capable of accurately portraying cultural features.
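The image-generation step itself is a standard Stable Diffusion call; a minimal sketch with the Hugging Face Diffusers library follows, with the model id and prompt as illustrative assumptions rather than the workshop's exact configuration:

```python
# Hedged sketch of text-to-image generation for a personal site narrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
image = pipe("a lantern festival at a riverside heritage town, "
             "told from a local resident's memory").images[0]
image.save("local_narrative.png")
```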
arXiv:2501.00939v1 Announce Type: new Abstract: The Web has drastically simplified our access to knowledge and learning, and fact-checking online resources has become a part of our daily routine. Studying online knowledge consumption is thus critical for understanding human behavior and informing the design of future platforms. In this Chapter, we approach this subject by describing the navigation patterns of the readers of Wikipedia, the world's largest platform for open knowledge. We provide a comprehensive overview of what is known about the three steps that characterize navigation on Wikipedia: (1) how readers reach the platform, (2) how readers navigate the platform, and (3) how readers leave the platform. Finally, we discuss open problems and opportunities for future research in this field.