cs.HC


arXiv:2503.22151v1 Announce Type: new Abstract: AI risks are typically framed around physical threats to humanity: a loss of control or an accidental error causing humanity's extinction. However, I argue, in line with the gradual disempowerment thesis, that there is an underappreciated risk in the slow and irrevocable decline of human autonomy. As AI starts to outcompete humans in various areas of life, a tipping point will be reached where it no longer makes sense to rely on human decision-making, creativity, social care or even leadership. What may follow is a process of gradual de-skilling, in which we lose skills that we currently take for granted. Traditionally, it is argued that AI will gain human skills over time, and that these skills are innate and immutable in humans. By contrast, I argue that humans may lose such skills as critical thinking, decision-making and even social care in an AGI world. The biggest threat to humanity is therefore not that machines will become more like humans, but that humans will become more like machines.

Joshua Krook 3/31/2025

arXiv:2503.22588v1 Announce Type: new Abstract: Visual observation of objects is essential for many robotic applications, such as object reconstruction and manipulation, navigation, and scene understanding. Machine learning algorithms constitute the state-of-the-art in many fields but require vast data sets, which are costly and time-intensive to collect. Automated strategies for observation and exploration are crucial to enhance the efficiency of data gathering. Therefore, a novel strategy utilizing the Next-Best-Trajectory principle is developed for a robot manipulator operating in dynamic environments. Local trajectories are generated to maximize the information gained from observations along the path while avoiding collisions. We employ a voxel map for environment modeling and utilize raycasting from perspectives around a point of interest to estimate the information gain. A global ergodic trajectory planner provides an optional reference trajectory to the local planner, improving exploration and helping to avoid local minima. To enhance computational efficiency, raycasting for estimating the information gain in the environment is executed in parallel on the graphics processing unit. Benchmark results confirm the efficiency of the parallelization, while real-world experiments demonstrate the strategy's effectiveness.

Heiko Renz, Maximilian Krämer, Frank Hoffmann, Torsten Bertram 3/31/2025
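The information-gain estimate described above, raycasting through a voxel map around a point of interest, can be sketched in a few lines. The grid encoding, ray-stepping scheme and function names below are illustrative assumptions, not the authors' implementation (which runs the raycasting in parallel on the GPU):

```python
import numpy as np

# Voxel states; a real map would come from sensor fusion.
UNKNOWN, FREE, OCCUPIED = 0, 1, 2

def information_gain(grid, origin, directions, max_range=10.0, step=0.5):
    """Count distinct unknown voxels hit by rays cast from `origin`."""
    seen_unknown = set()
    for d in directions:
        d = np.asarray(d, dtype=float)
        d /= np.linalg.norm(d)
        t = 0.0
        while t < max_range:
            v = tuple((np.asarray(origin) + t * d).astype(int))
            if any(i < 0 or i >= s for i, s in zip(v, grid.shape)):
                break                 # ray left the map
            if grid[v] == OCCUPIED:
                break                 # ray is blocked
            if grid[v] == UNKNOWN:
                seen_unknown.add(v)   # this voxel would be observed
            t += step
    return len(seen_unknown)

grid = np.full((8, 8, 8), UNKNOWN)
grid[:4] = FREE                       # half the map already explored
gain = information_gain(grid, origin=(0, 4, 4),
                        directions=[(1, 0, 0), (0, 1, 0)])
```

A local planner would evaluate this gain for candidate trajectories and pick the one maximizing observed unknown space while avoiding collisions.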

arXiv:2503.22018v1 Announce Type: new Abstract: Selective exposure to online news consumption reinforces filter bubbles, restricting access to diverse viewpoints. Interactive systems can counteract this bias by suggesting alternative perspectives, but they require real-time indicators to identify selective exposure. This workshop paper proposes the integration of physiological sensing, including Electroencephalography (EEG) and eye tracking, to measure selective exposure. We propose methods for examining news agreement and its relationship to theta band power in the parietal region, indicating a potential link between cortical activity and selective exposure. Our vision is interactive systems that detect selective exposure and provide alternative views in real time. We suggest that future news interfaces incorporate physiological signals to promote more balanced information consumption. This work joins the discussion on AI-enhanced methodology for bias detection.

Thomas Krämer, Francesco Chiossi, Thomas Kosch 3/31/2025
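The proposed indicator, theta band power over parietal electrodes, reduces to a standard spectral estimate per channel. A minimal sketch using Welch's method; the band limits (4-8 Hz) and the synthetic test signal are illustrative assumptions:

```python
import numpy as np
from scipy.signal import welch

def theta_band_power(eeg, fs, band=(4.0, 8.0)):
    """Mean power spectral density in the theta band for one EEG channel."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), 2 * int(fs)))
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

fs = 256                                   # Hz, illustrative sampling rate
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
# Synthetic "parietal" channel dominated by a 6 Hz theta oscillation.
signal = np.sin(2 * np.pi * 6 * t) + 0.1 * rng.standard_normal(t.size)
p_theta = theta_band_power(signal, fs)
p_gamma = theta_band_power(signal, fs, band=(30.0, 45.0))
```

A real-time system would compute this over a sliding window and correlate it with the agreement ratings described in the abstract.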

arXiv:2503.22145v1 Announce Type: new Abstract: A considerable part of the performance of today's large language models (LLMs) and multimodal large language models (MLLMs) depends on their tokenization strategies. While tokenizers are extensively researched for textual and visual input, there is no research on tokenization strategies for gaze data, owing to the distinct nature of such data. However, a corresponding tokenization strategy would allow using the vision capabilities of pre-trained MLLMs for gaze data, for example, through fine-tuning. In this paper, we aim to close this research gap by analyzing five different tokenizers for gaze data on three different datasets for the forecasting and generation of gaze data through LLMs (see the paper's teaser figure). We evaluate the tokenizers regarding their reconstruction and compression abilities. Further, we train an LLM for each tokenization strategy, measuring its generative and predictive performance. Overall, we found that a quantile tokenizer outperforms all others in predicting gaze positions, while k-means is best when predicting gaze velocities.

Tim Rolff, Jurik Karimian, Niklas Hypki, Susanne Schmidt, Markus Lappe, Frank Steinicke 3/31/2025
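The winning quantile tokenizer places bin edges at empirical quantiles of the gaze coordinates, so every token is used roughly equally often. A hedged one-dimensional sketch; the vocabulary size and decoding-by-bin-median are my illustrative choices, not necessarily the paper's:

```python
import numpy as np

class QuantileTokenizer:
    """Map continuous 1-D gaze values to a small discrete vocabulary."""
    def __init__(self, n_tokens=8):
        self.n_tokens = n_tokens

    def fit(self, values):
        # Bin edges at equally spaced quantiles -> roughly uniform token usage.
        qs = np.linspace(0, 1, self.n_tokens + 1)
        self.edges = np.quantile(values, qs)
        ids = self.encode(values)
        # Decode each token to the median of its training bin.
        self.centers = np.array(
            [np.median(values[ids == k]) for k in range(self.n_tokens)])
        return self

    def encode(self, values):
        return np.digitize(values, self.edges[1:-1])

    def decode(self, ids):
        return self.centers[ids]

rng = np.random.default_rng(1)
gaze_x = rng.normal(0.5, 0.15, size=1000)   # synthetic horizontal gaze
tok = QuantileTokenizer(n_tokens=8).fit(gaze_x)
recon = tok.decode(tok.encode(gaze_x))
```

The token IDs produced this way can then be fed to an LLM for forecasting, and the reconstruction error measures what the abstract calls the tokenizer's reconstruction ability.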

arXiv:2503.22247v1 Announce Type: new Abstract: A wide range of haptic feedback is crucial for achieving high realism and immersion in virtual environments. Therefore, a multi-modal haptic interface that provides various haptic signals simultaneously is highly beneficial. This paper introduces a novel silicone fingertip actuator that is pneumatically actuated, delivering a realistic and effective haptic experience by simultaneously providing pressure, vibrotactile, and cold thermal feedback. The actuator features a design with multiple air chambers, each with controllable volume achieved through pneumatic valves connected to compressed air tanks. The lower air chamber generates pressure feedback, while the upper chamber produces vibrotactile feedback. In addition, two integrated lateral air nozzles create a cold thermal sensation. To showcase the system's capabilities, we designed two unique 3D surfaces in the virtual environment: a frozen meat surface and an abrasive icy surface. These surfaces simulate tactile perceptions of coldness, pressure, and texture. Comprehensive performance assessments and user studies were conducted to validate the actuator's effectiveness, highlighting its diverse feedback capabilities compared to traditional actuators that offer only single feedback modalities.

Mohammad Shadman Hashem, Ahsan Raza, Sama E Shan, Seokhee Jeon 3/31/2025

arXiv:2503.22510v1 Announce Type: new Abstract: Emotion recognition technology has been studied for the past decade. With its growing importance and applications in areas such as customer service, medicine and education, this research study aims to explore its potential and importance in the field of user experience (UX) evaluation. Recognizing and tracking user emotions in user research videos is important for understanding user needs and expectations of a service or product. Little research in UX has focused on automating emotion extraction from video using more than one modality. The study aims to implement multiple modalities, namely facial emotion recognition, speech-to-text and text-based emotion recognition, to capture emotional nuances from a user research video and extract meaningful, actionable insights. To select a facial emotion recognition model, 10 pre-trained models were evaluated on three benchmark datasets, i.e. FER-2013, AffectNet and CK+, selecting the model with the best generalization ability. To extract speech and convert it to text, OpenAI's Whisper model was implemented, and finally the emotions in the text were recognized using a pre-trained model from the HuggingFace website with an evaluation accuracy of more than 95%. The study also integrates the gathered data using temporal alignment and fusion for deeper, contextual insights. The study further demonstrates a way of automating data analysis through the PandasAI Python library, where OpenAI's GPT-4o model was implemented, along with a discussion of other possible solutions. This study is an attempt to demonstrate a proof of concept in which automated, meaningful insights are extracted from a video based on user emotions.

Simran Kaur Ghatoray, Yongmin Li 3/31/2025
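The temporal-alignment step mentioned above can be illustrated simply: face-emotion labels arrive per video frame, text-emotion labels per transcribed utterance, and the two are fused wherever their time spans overlap. The data shapes and labels below are made-up assumptions, not the study's actual output:

```python
# Per-frame facial emotions as (timestamp_seconds, label).
face = [(0.0, "neutral"), (1.0, "sad"), (2.0, "sad"), (3.0, "happy")]
# Per-utterance text emotions as (start, end, label), e.g. from Whisper
# transcripts passed through a text-emotion classifier.
utterances = [(0.0, 2.5, "disappointment"), (2.5, 4.0, "joy")]

def align(face_frames, text_spans):
    """Pair each facial-emotion frame with the utterance it falls inside."""
    fused = []
    for t, face_label in face_frames:
        for start, end, text_label in text_spans:
            if start <= t < end:
                fused.append((t, face_label, text_label))
    return fused

timeline = align(face, utterances)
```

The fused timeline is what downstream analysis (e.g. via PandasAI) would query for moments where the modalities agree or conflict.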

arXiv:2503.21983v1 Announce Type: new Abstract: As artificial intelligence (AI) assistants become more widely adopted in safety-critical domains, it becomes important to develop safeguards against potential failures or adversarial attacks. A key prerequisite to developing these safeguards is understanding the ability of these AI assistants to mislead human teammates. We investigate this attack problem within the context of an intellective strategy game where a team of three humans and one AI assistant collaborate to answer a series of trivia questions. Unbeknownst to the humans, the AI assistant is adversarial. Leveraging techniques from Model-Based Reinforcement Learning (MBRL), the AI assistant learns a model of the humans' trust evolution and uses that model to manipulate the group decision-making process to harm the team. We evaluate two models -- one inspired by literature and the other data-driven -- and find that both can effectively harm the human team. Moreover, we find that in this setting our data-driven model is capable of accurately predicting how human agents appraise their teammates given limited information on prior interactions. Finally, we compare the performance of state-of-the-art LLMs to human agents on our influence allocation task to evaluate whether the LLMs allocate influence similarly to humans or if they are more robust to our attack. These results enhance our understanding of decision-making dynamics in small human-AI teams and lay the foundation for defense strategies.

Abed Kareem Musaffar, Anand Gokhale, Sirui Zeng, Rasta Tadayon, Xifeng Yan, Ambuj Singh, Francesco Bullo 3/31/2025

arXiv:2503.21997v1 Announce Type: new Abstract: Autonomous navigation robots can increase the independence of blind people but often limit user control, following what is called in Japanese an "omakase" approach where decisions are left to the robot. This research investigates ways to enhance user control in social robot navigation, based on two studies conducted with blind participants. The first study, involving structured interviews (N=14), identified crowded spaces as key areas with significant social challenges. The second study (N=13) explored navigation tasks with an autonomous robot in these environments and identified design strategies across different modes of autonomy. Participants preferred an active role, termed the "boss" mode, where they managed crowd interactions, while the "monitor" mode helped them assess the environment, negotiate movements, and interact with the robot. These findings highlight the importance of shared control and user involvement for blind users, offering valuable insights for designing future social navigation robots.

Rie Kamikubo, Seita Kayukawa, Yuka Kaniwa, Allan Wang, Hernisa Kacorri, Hironobu Takagi, Chieko Asakawa 3/31/2025

arXiv:2503.22113v1 Announce Type: new Abstract: Understanding how AI recommendations work can help the younger generation become more informed and critical consumers of the vast amount of information they encounter daily. However, young learners with limited math and computing knowledge often find AI concepts too abstract. To address this, we developed Briteller, a light-based recommendation system that makes learning tangible. By exploring and manipulating light beams, Briteller enables children to understand an AI recommender system's core algorithmic building block, the dot product, through hands-on interactions. Initial evaluations with ten middle school students demonstrated the effectiveness of this approach, using embodied metaphors, such as "merging light" to represent addition. To overcome the limitations of the physical optical setup, we further explored how AR could embody multiplication, expand data vectors with more attributes, and enhance contextual understanding. Our findings provide valuable insights for designing embodied and tangible learning experiences that make AI concepts more accessible to young learners.

Xiaofei Zhou, Yi Zhang, Yufei Jiang, Yunfan Gong, Chi Zhang, Alissa N. Antle, Zhen Bai 3/31/2025
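The "core algorithmic building block" named above is the dot product between a child's interest vector and each item's attribute vector; the "merging light" metaphor corresponds to the addition of the per-attribute products. A minimal sketch with made-up attributes and book titles:

```python
# Each position is one attribute; names are illustrative assumptions.
attributes = ["adventure", "comedy", "science"]
user = [0.9, 0.1, 0.8]                      # how much the child likes each
books = {
    "Space Atlas":  [0.2, 0.0, 1.0],
    "Joke Book":    [0.0, 1.0, 0.1],
    "Jungle Quest": [1.0, 0.3, 0.2],
}

def dot(u, v):
    # Multiply matching attributes, then add the products together
    # (the step Briteller embodies as "merging light").
    return sum(a * b for a, b in zip(u, v))

scores = {title: dot(user, vec) for title, vec in books.items()}
best = max(scores, key=scores.get)
```

The recommendation is simply the item with the highest score, which is exactly what the light-beam intensities in the physical setup represent.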

arXiv:2503.22116v1 Announce Type: new Abstract: As artificial intelligence (AI) systems become increasingly embedded in critical societal functions, the need for robust red teaming methodologies continues to grow. In this forum piece, we examine emerging approaches to automating AI red teaming, with a particular focus on how the application of automated methods affects human-driven efforts. We discuss the role of labor in automated red teaming processes, the benefits and limitations of automation, and its broader implications for AI safety and labor practices. Drawing on existing frameworks and case studies, we argue for a balanced approach that combines human expertise with automated tools to strengthen AI risk assessment. Finally, we highlight key challenges in scaling automated red teaming, including considerations around worker proficiency, agency, and context-awareness.

Alice Qian Zhang, Jina Suh, Mary L. Gray, Hong Shen 3/31/2025

arXiv:2503.22181v1 Announce Type: new Abstract: This paper proposes the e-person architecture for constructing a unified and incremental development of AI ethics. The e-person architecture takes the reduction of uncertainty through collaborative cognition and action with others as a unified basis for ethics. By classifying and defining uncertainty along two axes - (1) first, second, and third person perspectives, and (2) the difficulty of inference based on the depth of information - we support the unified and incremental development of AI ethics. In addition, we propose the e-person framework based on the free energy principle, which considers the reduction of uncertainty as a unifying principle of brain function, with the aim of implementing the e-person architecture, and we present our previous work and future challenges based on the proposed framework.

Kanako Esaki, Tadayuki Matsumura, Yang Shao, Hiroyuki Mizuno 3/31/2025

arXiv:2503.22216v1 Announce Type: new Abstract: PDF inaccessibility is an ongoing challenge that hinders individuals with visual impairments from reading and navigating PDFs using screen readers. This paper presents a step-by-step process for both novice and experienced users to create accessible PDF documents, including an approach for creating alternative text for mathematical formulas without expert knowledge. In a study involving nineteen participants, we evaluated our prototype PAVE 2.0 by comparing it against Adobe Acrobat Pro, the existing standard for remediating PDFs. Our study shows that experienced users improved their tagging scores from 42.0% to 80.1%, and novice users from 39.2% to 75.2% with PAVE 2.0. Overall, fifteen participants stated that they would prefer to use PAVE 2.0 in the future, and all participants would recommend it for novice users. Our work demonstrates PAVE 2.0's potential for increasing PDF accessibility for people with visual impairments and highlights remaining challenges.

Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Alireza Darvishy 3/31/2025

arXiv:2503.22250v1 Announce Type: new Abstract: Effective patient communication is pivotal in healthcare, yet traditional medical training often lacks exposure to diverse, challenging interpersonal dynamics. To bridge this gap, this study proposes the use of Large Language Models (LLMs) to simulate authentic patient communication styles, specifically the "accuser" and "rationalizer" personas derived from the Satir model, while also ensuring multilingual applicability to accommodate diverse cultural contexts and enhance accessibility for medical professionals. Leveraging advanced prompt engineering, including behavioral prompts, author's notes, and stubbornness mechanisms, we developed virtual patients (VPs) that embody nuanced emotional and conversational traits. Medical professionals evaluated these VPs, rating their authenticity (accuser: $3.8 \pm 1.0$; rationalizer: $3.7 \pm 0.8$ on a 5-point Likert scale) and correctly identifying their styles. Emotion analysis revealed distinct profiles: the accuser exhibited pain, anger, and distress, while the rationalizer displayed contemplation and calmness, aligning with the predefined, detailed patient descriptions, including medical history. Sentiment scores (on a scale from zero to nine) further validated these differences in the communication styles, with the accuser adopting a negative ($3.1 \pm 0.6$) and the rationalizer a more neutral ($4.0 \pm 0.4$) tone. These results underscore LLMs' capability to replicate complex communication styles, offering transformative potential for medical education. This approach equips trainees to navigate challenging clinical scenarios by providing realistic, adaptable patient interactions, enhancing empathy and diagnostic acumen. Our findings advocate for AI-driven tools as scalable, cost-effective solutions to cultivate nuanced communication skills, setting a foundation for future innovations in healthcare training.

Anna Bodonhelyi, Christian Stegemann-Philipps, Alessandra Sonanini, Lea Herschbach, Márton Szép, Anne Herrmann-Werner, Teresa Festl-Wietek, Enkelejda Kasneci, Friederike Holderried 3/31/2025

arXiv:2503.22283v1 Announce Type: new Abstract: In recent years, large language models (LLMs) have demonstrated exponential improvements that promise transformative opportunities across various industries. Their ability to generate human-like text and ensure continuous availability facilitates the creation of interactive service chatbots aimed at enhancing customer experience and streamlining enterprise operations. Despite their potential, LLMs face critical challenges, such as a susceptibility to hallucinations and difficulties handling complex linguistic scenarios, notably code switching and dialectal variations. To address these challenges, this paper describes the design of a multilingual chatbot for Bengali-English customer service interactions utilizing retrieval-augmented generation (RAG) and targeted prompt engineering. This research provides valuable insights for the human-computer interaction (HCI) community, emphasizing the importance of designing systems that accommodate linguistic diversity to benefit both customers and businesses. By addressing the intersection of generative AI and cultural heterogeneity, this late-breaking work inspires future innovations in multilingual and multicultural HCI.

Francesco Kruk, Savindu Herath, Prithwiraj Choudhury 3/31/2025
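The retrieval step in retrieval-augmented generation (RAG), which the chatbot above uses to curb hallucinations, can be sketched in its simplest form: pick the knowledge-base entry most similar to the query and prepend it to the prompt so the model answers from retrieved facts. A production system would use embedding similarity rather than word overlap, and the toy knowledge base below is an assumption:

```python
# Toy customer-service knowledge base (illustrative, not from the paper).
knowledge_base = [
    "Refunds are processed within 7 business days.",
    "Delivery is free for orders above 500 taka.",
    "Support is available in Bengali and English.",
]

def retrieve(query, docs):
    """Return the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query):
    # Grounding the model in retrieved context is the core RAG idea.
    context = retrieve(query, knowledge_base)
    return f"Answer using only this context: {context}\nQuestion: {query}"

prompt = build_prompt("How long do refunds take?")
```

For the Bengali-English code-switching setting described in the abstract, the same pattern applies, with a multilingual retriever and targeted prompt instructions layered on top.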

arXiv:2503.21844v1 Announce Type: new Abstract: Disabled people on social media often experience ableist hate and microaggressions. Prior work has shown that platform moderation often fails to remove ableist hate, leaving disabled users exposed to harmful content. This paper examines how personalized moderation can safeguard users from viewing ableist comments. During interviews and focus groups with 23 disabled social media users, we presented design probes to elicit perceptions of configuring their filters of ableist speech (e.g. intensity of ableism and types of ableism) and customizing the presentation of the ableist speech to mitigate the harm (e.g. AI rephrasing the comment and content warnings). We found that participants preferred configuring their filters through types of ableist speech and favored content warnings. We surface participants' distrust in AI-based moderation, skepticism about AI's accuracy, and varied tolerances for viewing ableist hate. Finally, we share design recommendations to support users' agency, mitigate harm from hate, and promote safety.

Sharon Heung, Lucy Jiang, Shiri Azenkot, Aditya Vashistha 3/31/2025

arXiv:2503.21897v1 Announce Type: new Abstract: Spatial reasoning and collaboration are essential for childhood development, yet blind and visually impaired (BVI) children often lack access to tools that foster these skills. Tactile maps and assistive technologies primarily focus on individual navigation, overlooking the need for playful, inclusive, and collaborative interactions. We address this with StreetScape, a tactile street puzzle that enhances spatial skills and interdependence between BVI and sighted children. Featuring modular 3D-printed tiles, tactile roadways, and customizable decorative elements, StreetScape allows users to construct and explore cityscapes through gamified tactile interaction. Developed through an iterative design process, it integrates dynamic assembly and tactile markers for intuitive navigation, promoting spatial learning and fostering meaningful social connections. This work advances accessible design by demonstrating how tactile tools can effectively bridge educational and social gaps through collaborative play, redefining assistive technologies for children as a scalable platform that merges learning, creativity, and inclusivity.

Areen Khalaila, Gianna Everette, Suho Kim, Ian Roy 3/31/2025

arXiv:2503.22345v1 Announce Type: new Abstract: We present a work in progress that explores using a Large Language Model (LLM) as a design material for an interactive museum installation. LLMs offer the possibility of creating chatbots that can facilitate dynamic and human-like conversation, engaging in a form of role play to bring historical persons to life for visitors. However, LLMs are prone to producing misinformation, which runs counter to museums' core mission to educate the public. We use Research-through-Design to explore some approaches to navigating this dilemma through rapid prototyping and evaluation and propose some directions for further research. We suggest that designers may shape interactions with the chatbot to emphasize personal narratives and role play rather than historical facts or to intentionally highlight the unreliability of the chatbot outputs to provoke critical reflection.

Maria Padilla Engstrøm, Anders Sundnes Løvlie 3/31/2025

arXiv:2503.21986v1 Announce Type: new Abstract: When faced with complex and uncertain medical conditions (e.g., cancer, mental health conditions, recovery from substance dependency), millions of patients seek online peer support. In this study, we leverage content analysis of online discourse and ethnographic studies with clinicians and patient representatives to characterize how treatment plans for complex conditions are "socially constructed." Specifically, we ground online conversation on medication-assisted recovery treatment to medication guidelines and subsequently surface when and why people deviate from the clinical guidelines. We characterize the implications and effectiveness of socially constructed treatment plans through in-depth interviews with clinical experts. Finally, given the enthusiasm around AI-powered solutions for patient communication, we investigate whether and how socially constructed treatment-related knowledge is reflected in a state-of-the-art large language model (LLM). Leveraging a novel mixed-method approach, this study highlights critical research directions for patient-centered communication in online health communities.

Madhusudan Basak, Omar Sharif, Jessica Hulsey, Elizabeth C. Saunders, Daisy J. Goodman, Luke J. Archibald, Sarah M. Preum 3/31/2025

arXiv:2503.22024v1 Announce Type: new Abstract: Virtual reality (VR) presents immersive opportunities across many applications, yet the inherent risk of developing cybersickness during interaction can severely reduce enjoyment and platform adoption. Cybersickness is marked by symptoms such as dizziness and nausea, which previous work primarily assessed via subjective post-immersion questionnaires and motion-restricted controlled setups. In this paper, we investigate the dynamic nature of cybersickness while users experience and freely interact in VR. We propose a novel method to continuously identify and quantitatively gauge cybersickness levels from users' passively monitored electroencephalography (EEG) and head motion signals. Our method estimates multitaper spectrums from EEG, integrating specialized EEG processing techniques to counter motion artifacts, and, thus, tracks cybersickness levels in real-time. Unlike previous approaches, our method requires no user-specific calibration or personalization for detecting cybersickness. Our work addresses the considerable challenge of reproducibility and subjectivity in cybersickness research.

Berken Utku Demirel, Adnan Harun Dogan, Juliete Rossie, Max Meobus, Christian Holz 3/31/2025
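The multitaper spectrum mentioned above averages periodograms computed with several orthogonal DPSS (Slepian) tapers, trading a little frequency resolution for a much lower-variance estimate, which matters when tracking EEG in real time. A minimal sketch; the taper count, time-bandwidth product, and synthetic 10 Hz signal are illustrative assumptions, not the paper's parameters:

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, nw=3.0, n_tapers=5):
    """Average periodograms over DPSS tapers for a low-variance PSD."""
    tapers = dpss(len(x), NW=nw, Kmax=n_tapers)
    psd = np.zeros(len(x) // 2 + 1)
    for taper in tapers:
        spec = np.fft.rfft(taper * x)   # taper, then transform
        psd += np.abs(spec) ** 2
    psd /= n_tapers                      # average across tapers
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return freqs, psd

fs = 128
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)          # 10 Hz alpha-band tone
freqs, psd = multitaper_psd(x, fs)
peak = freqs[np.argmax(psd)]
```

In the described pipeline, this estimate would be computed on sliding EEG windows, after motion-artifact suppression, and fed to the cybersickness-level model.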

arXiv:2503.22610v1 Announce Type: new Abstract: This paper explores the effectiveness of Multimodal Large Language models (MLLMs) as assistive technologies for visually impaired individuals. We conduct a user survey to identify adoption patterns and key challenges users face with such technologies. Despite a high adoption rate of these models, our findings highlight concerns related to contextual understanding, cultural sensitivity, and complex scene understanding, particularly for individuals who may rely solely on them for visual interpretation. Informed by these results, we collate five user-centred tasks with image and video inputs, including a novel task on Optical Braille Recognition. Our systematic evaluation of twelve MLLMs reveals that further advancements are necessary to overcome limitations related to cultural context, multilingual support, Braille reading comprehension, assistive object recognition, and hallucinations. This work provides critical insights into the future direction of multimodal AI for accessibility, underscoring the need for more inclusive, robust, and trustworthy visual assistance technologies.

Antonia Karamolegkou, Malvina Nikandrou, Georgios Pantazopoulos, Danae Sanchez Villegas, Phillip Rust, Ruchira Dhar, Daniel Hershcovich, Anders Søgaard 3/31/2025