cs.CY
arXiv:2501.07570v1 Announce Type: new Abstract: With rapid digitization and digitalization, drawing a fine line between the digital and the physical world has become nearly impossible. It has become more essential than ever to integrate all spheres of life into a single Digital Thread to address a pressing challenge of modern society: healthcare that is accessible and inclusive in terms of both equality and equity. Techno-social advancements and mutual acceptance have enabled the use of digital models to simulate social settings with minimal resource utilization and support effective decision-making. However, a significant gap exists in feeding appropriate real-time changes back into these models. In other words, active behavioral modeling of modern society is lacking, which affects community healthcare as a whole. By creating virtual replicas of (physical) behavioral systems, digital twins can enable real-time monitoring, simulation, and optimization of urban dynamics. This paper explores the potential of digital twins to promote inclusive healthcare in evolving smart cities. We argue that digital twins can be used to: identify and address disparities in access to healthcare services, facilitate community participation, simulate the impact of urban policies and interventions on different groups of people, and aid policy-making bodies in improving access to healthcare. This paper proposes several ways to use digital twins to stitch together the actual and virtual societies. Several concepts discussed within this framework envision an active, integrated, and synchronized community that is aware of data privacy and security. The proposal also provides high-level, step-wise transitions to enable this transformation.
arXiv:2501.06366v1 Announce Type: cross Abstract: When applied in healthcare, reinforcement learning (RL) seeks to dynamically match the right interventions to subjects to maximize population benefit. However, the learned policy may disproportionately allocate efficacious actions to one subpopulation, creating or exacerbating disparities in other socioeconomically disadvantaged subgroups. These biases tend to occur in multi-stage decision making and can be self-perpetuating; if unaccounted for, they could cause serious unintended consequences that limit access to care or treatment benefit. Counterfactual fairness (CF) offers a promising statistical tool grounded in causal inference to formulate and study fairness. In this paper, we propose a general framework for fair sequential decision making. We theoretically characterize the optimal CF policy and prove its stationarity, which greatly simplifies the search for optimal CF policies by leveraging existing RL algorithms. The theory also motivates a sequential data preprocessing algorithm that achieves CF decision making under an additive noise assumption. We prove, and then validate through simulations, that our policy learning approach controls unfairness while attaining the optimal value. Analysis of a digital health dataset designed to reduce opioid misuse shows that our proposal greatly enhances fair access to counseling.
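To make the additive-noise preprocessing idea concrete, here is a minimal sketch, not the authors' algorithm: under an additive noise model, regressing each stage's features on the sensitive attribute and earlier features leaves residuals (the exogenous noise terms) that are invariant across counterfactual values of the sensitive attribute, so a standard RL learner can be run on them. All variable names and the linear functional form below are illustrative assumptions.

```python
# A minimal sketch of sequential preprocessing for counterfactual fairness
# under an additive-noise assumption. Hypothetical variables; not the paper's code.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy longitudinal data: sensitive attribute A, stage-wise features X1, X2.
n = 1000
A = rng.integers(0, 2, size=n)                  # sensitive attribute
X1 = 1.5 * A + rng.normal(size=n)               # stage-1 feature with additive noise
X2 = 0.8 * A + 0.5 * X1 + rng.normal(size=n)    # stage-2 feature depends on A and X1

def residualize(X, parents):
    """Under additive noise, the residual of X given its causal parents is the
    exogenous noise term, which is counterfactually invariant in A."""
    model = LinearRegression().fit(parents, X)
    return X - model.predict(parents)

# Sequentially strip the influence of A (and earlier features) stage by stage.
U1 = residualize(X1, A.reshape(-1, 1))
U2 = residualize(X2, np.column_stack([A, X1]))

# U1, U2 can now feed any standard RL policy learner: decisions based on them
# coincide across counterfactual values of A under the additive-noise model.
print(np.corrcoef(A, U1)[0, 1], np.corrcoef(A, U2)[0, 1])  # both ~0
```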
arXiv:2501.07392v1 Announce Type: new Abstract: We describe the development of a one-credit course to promote AI literacy at The University of Texas at Austin. In response to a call for the rapid deployment of a class to serve a broad audience in Fall 2023, we designed a 14-week seminar-style course with an interdisciplinary group of speakers who lectured on topics ranging from the fundamentals of AI to societal concerns, including disinformation and employment. University students, faculty, and staff, and even community members outside of the University, were invited to enroll in this online offering: The Essentials of AI for Life and Society. We collected feedback from course participants through weekly reflections and a final survey. Satisfyingly, we found that attendees reported gains in their AI literacy. We sought critical feedback through quantitative and qualitative analysis, which uncovered challenges in designing a course for this general audience. We used this feedback to design a three-credit version of the course that is being offered in Fall 2024. The lessons we learned and our plans for the new iteration may serve as a guide for instructors designing AI courses for a broad audience.
arXiv:2501.07557v1 Announce Type: new Abstract: Music has always been central to human culture, reflecting and shaping traditions, emotions, and societal changes. Technological advancements have transformed how music is created and consumed, influencing tastes and the music itself. In this study, we use Network Science to analyze musical complexity. Drawing on $\approx20,000$ MIDI files across six macro-genres spanning nearly four centuries, we represent each composition as a weighted directed network to study its structural properties. Our results show that Classical and Jazz compositions have higher complexity and melodic diversity than recently developed genres. However, a temporal analysis reveals a trend toward simplification, with even Classical and Jazz nearing the complexity levels of modern genres. This study highlights how digital tools and streaming platforms shape musical evolution, fostering new genres while driving homogenization and simplicity.
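As a rough illustration of the representation this abstract describes, the sketch below builds a weighted directed network from a toy note sequence (standing in for pitches parsed from a MIDI file) and computes the entropy of the transition distribution as one simple complexity proxy; the paper's actual metrics are not assumed.

```python
# A minimal sketch: a composition as a weighted directed network of note
# transitions, with transition entropy as a toy complexity measure.
import math
import networkx as nx

notes = [60, 62, 64, 60, 62, 64, 65, 67, 65, 64, 62, 60]  # toy melody (MIDI pitches)

# Each directed edge (a, b) is a transition from note a to note b;
# its weight counts how often that transition occurs.
G = nx.DiGraph()
for a, b in zip(notes, notes[1:]):
    if G.has_edge(a, b):
        G[a][b]["weight"] += 1
    else:
        G.add_edge(a, b, weight=1)

# Shannon entropy of the transition distribution: higher = more melodic diversity.
total = sum(d["weight"] for _, _, d in G.edges(data=True))
entropy = -sum((d["weight"] / total) * math.log2(d["weight"] / total)
               for _, _, d in G.edges(data=True))
print(f"nodes={G.number_of_nodes()} edges={G.number_of_edges()} "
      f"transition entropy={entropy:.2f} bits")
```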
arXiv:2501.06753v1 Announce Type: new Abstract: Fairness in machine learning (ML) has garnered significant attention in recent years. While existing research has predominantly focused on the distributive fairness of ML models, procedural fairness has received limited exploration. This paper proposes a novel method to achieve procedural fairness during the model training phase. The effectiveness of the proposed method is validated through experiments on one synthetic and six real-world datasets. Additionally, this work studies the relationship between procedural fairness and distributive fairness in ML models. On one hand, we examine the impact of dataset bias and of the ML model's procedural fairness on its distributive fairness. The results highlight a significant influence of both dataset bias and procedural fairness on distributive fairness. On the other hand, we analyze the distinctions between optimizing procedural and distributive fairness metrics. Experimental results demonstrate that optimizing procedural fairness metrics mitigates biases introduced or amplified by the decision-making process, ensuring fairness in the decision-making process itself as well as improving distributive fairness. In contrast, optimizing distributive fairness metrics encourages the ML model's decision-making process to favor disadvantaged groups, counterbalancing the inherent preferences for advantaged groups present in the dataset and ultimately achieving distributive fairness.
arXiv:2501.06932v1 Announce Type: new Abstract: Large language models (LLMs) have revolutionized scientific research with their exceptional capabilities and transformed various fields. Among their practical applications, LLMs have been playing a crucial role in mitigating threats to human life, infrastructure, and the environment. Despite growing research on LLMs for disaster management, there remains a lack of systematic review and in-depth analysis of their use in natural disaster management. To address this gap, this paper presents a comprehensive survey of existing LLMs in natural disaster management, along with a taxonomy that categorizes existing works by disaster phase and application scenario. By collecting public datasets and identifying key challenges and opportunities, this study aims to guide the professional community in developing advanced LLMs for disaster management, thereby enhancing resilience to natural disasters.
arXiv:2501.07148v1 Announce Type: new Abstract: Bandwidth constraints limit LoRa implementations, and contemporary IoT applications require higher throughput than LoRa provides. This work introduces a LoRa Multiple Input Multiple Output (MIMO) system and a spatial multiplexing algorithm to address LoRa's bandwidth limitation. The transceivers in the proposed approach modulate signals on distinct frequencies of the same LoRa band. A Frequency Division Multiplexing (FDM) method is used at the transmitters to provide a wider MIMO channel. Unlike conventional Orthogonal Frequency Division Multiplexing (OFDM) techniques, this work exploits the orthogonality of LoRa signals, facilitated by LoRa's proprietary Chirp Spread Spectrum (CSS) modulation, to perform OFDM in the proposed LoRa MIMO system. By varying the Spreading Factor (SF) and bandwidth of LoRa signals, orthogonal signals can be transmitted on the same frequency irrespective of the FDM. Although channel correlation is already minimal across different spreading factors and bandwidths, distinct Carrier Frequencies (CF) ensure the signals do not overlap and provide additional degrees of freedom. This work assesses the proposed model's performance and conducts an extensive analysis of the resources the proposed system consumes. Finally, it reports detailed results from a thorough evaluation of the model on test hardware.
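The quasi-orthogonality claim can be illustrated numerically: baseband LoRa chirps with different spreading factors have low normalized cross-correlation even when they occupy the same band at the same time. The sketch below is illustrative only; the sampling rate and bandwidth are assumed values, and channel effects are ignored.

```python
# A rough sketch (not from the paper) of why LoRa chirps with different
# spreading factors are nearly orthogonal on the same band.
import numpy as np

def lora_upchirp(sf, bw=125e3, fs=1e6):
    """Baseband LoRa up-chirp: 2**sf chips, sweeping -bw/2..bw/2 over 2**sf/bw s."""
    T = (2 ** sf) / bw                       # symbol duration
    t = np.arange(int(T * fs)) / fs
    k = bw / T                               # chirp rate in Hz/s
    return np.exp(1j * 2 * np.pi * (-bw / 2 * t + 0.5 * k * t ** 2))

c7, c9 = lora_upchirp(7), lora_upchirp(9)
n = min(len(c7), len(c9))                    # compare over the shorter symbol

def norm_xcorr(a, b):
    """Normalized magnitude of the inner product (1.0 = identical signals)."""
    return abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))

print("SF7 vs SF7:", norm_xcorr(c7[:n], c7[:n]))   # ~1.0 (same chirp)
print("SF7 vs SF9:", norm_xcorr(c7[:n], c9[:n]))   # <<1  (quasi-orthogonal)
```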
arXiv:2501.07368v1 Announce Type: new Abstract: Social media play a key role in mobilizing collective action, holding the potential for studying the pathways that lead individuals to actively engage in addressing global challenges. However, quantitative research in this area has been limited by the absence of granular and large-scale ground truth about the level of participation in collective action among individual social media users. To address this limitation, we present a novel suite of text classifiers designed to identify expressions of participation in collective action from social media posts, in a topic-agnostic fashion. Grounded in the theoretical framework of social movement mobilization, our classification captures participation and categorizes it into four levels: recognizing collective issues, engaging in calls-to-action, expressing intention of action, and reporting active involvement. We constructed a labeled training dataset of Reddit comments through crowdsourcing, which we used to train BERT classifiers and fine-tune Llama3 models. Our findings show that smaller language models can reliably detect expressions of participation (weighted F1=0.71), and rival larger models in capturing nuanced levels of participation. By applying our methodology to Reddit, we illustrate its effectiveness as a robust tool for characterizing online communities in innovative ways compared to topic modeling, stance detection, and keyword-based methods. Our framework contributes to Computational Social Science research by providing a new source of reliable annotations useful for investigating the social dynamics of collective action.
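A minimal sketch of what the BERT-based classification stage could look like follows; the checkpoint, label names, and example comment are illustrative, and the authors' trained models and data are not assumed.

```python
# A minimal sketch of a BERT-style classifier over the four participation
# levels named in the abstract; illustrative labels, untrained head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["recognizes_issue", "call_to_action", "intention", "active_involvement"]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)  # in practice, fine-tuned on the crowdsourced Reddit comments

def classify(comment: str) -> str:
    inputs = tok(comment, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify("I'll be at the climate march downtown this Saturday."))
```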
arXiv:2501.07473v1 Announce Type: new Abstract: Political polarization, a key driver of social fragmentation, has drawn increasing attention for its role in shaping online and offline discourse. Despite significant efforts, accurately measuring polarization within ideological distributions remains a challenge. This study evaluates five widely used polarization measures, testing their strengths and weaknesses with synthetic datasets and a real-world case study on YouTube discussions during the 2020 U.S. Presidential Election. Building on these findings, we present a novel adaptation of Kleinberg's burst detection algorithm to improve mode detection in polarized distributions. By offering both a critical review and an innovative methodological tool, this work advances the analysis of ideological patterns in social media discourse.
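For readers unfamiliar with it, the sketch below implements the standard two-state Kleinberg burst detector (a Viterbi pass over a low-rate and a high-rate exponential state). One hedged way to connect it to mode detection, not necessarily the paper's adaptation, is to treat sorted ideology scores as event "times," so that dense runs of scores (bursts) mark modes.

```python
# A sketch of Kleinberg's two-state burst detector; parameters s and gamma
# follow Kleinberg (2002). Not the paper's adapted algorithm.
import math

def kleinberg_bursts(times, s=2.0, gamma=1.0):
    """Label each inter-event gap 0 (baseline) or 1 (burst) via 2-state Viterbi."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    n = len(gaps)
    a0 = n / (times[-1] - times[0])              # baseline event rate
    rates = (a0, s * a0)                         # burst state fires s times faster
    up_cost = gamma * math.log(n)                # penalty for entering the burst state

    def emit(state, x):                          # -log exponential gap density
        return rates[state] * x - math.log(rates[state])

    INF = float("inf")
    cost, back = [0.0, INF], []                  # start in the baseline state
    for x in gaps:
        step, new_cost = [], []
        for j in (0, 1):
            cands = [cost[i] + (up_cost if i == 0 and j == 1 else 0.0) for i in (0, 1)]
            best = min(range(2), key=lambda i: cands[i])
            step.append(best)
            new_cost.append(cands[best] + emit(j, x))
        cost, back = new_cost, back + [step]
    state = min((0, 1), key=lambda j: cost[j])   # backtrack the cheapest path
    labels = [state]
    for step in reversed(back[1:]):
        state = step[state]
        labels.append(state)
    return labels[::-1]

# The dense run of "events" near t=5 is flagged as a burst, i.e. a mode
# under the sorted-scores reading sketched above.
ts = [0, 1, 2, 3, 5.0, 5.1, 5.2, 5.3, 5.4, 7, 8, 9]
print(kleinberg_bursts(ts))
```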
arXiv:2501.07486v1 Announce Type: new Abstract: This article explores the evolution of constructionism as an educational framework, tracing its relevance and transformation across three pivotal eras: the advent of personal computing, the networked society, and the current era of generative AI. Rooted in Seymour Papert's constructionist philosophy, this study examines how constructionist principles align with the expanding role of digital technology in personal and collective learning. We discuss the transformation of educational environments from hierarchical instructionism to constructionist models that emphasize learner autonomy and interactive, creative engagement. Central to this analysis is the concept of an expanded personality, wherein digital tools and AI integration fundamentally reshape individual self-perception and social interactions. By integrating constructionism into the paradigm of smart education, we propose it as a foundational approach to personalized and democratized learning. Our findings underscore constructionism's enduring relevance in navigating the complexities of technology-driven education, providing insights for educators and policymakers seeking to harness digital innovations to foster adaptive, student-centered learning experiences.
arXiv:2501.06274v1 Announce Type: new Abstract: The rise of misinformation and fake news in online political discourse poses significant challenges to democratic processes and public engagement. While debunking efforts aim to counteract misinformation and foster fact-based dialogue, these discussions often involve language toxicity and emotional polarization. We examined over 86 million debunking tweets and more than 4 million Reddit debunking comments to investigate the relationship between language toxicity, pessimism, and social polarization in debunking efforts. Focusing on discussions of the 2016 and 2020 U.S. presidential elections and the QAnon conspiracy theory, our analysis reveals three key findings: (1) peripheral participants (1-degree users) play a disproportionate role in shaping toxic discourse, driven by lower community accountability and emotional expression; (2) platform mechanisms significantly influence polarization, with Twitter amplifying partisan differences and Reddit fostering higher overall toxicity due to its structured, community-driven interactions; and (3) a negative correlation exists between language toxicity and pessimism, with increased interaction reducing toxicity, especially on Reddit. We show that platform architecture affects the informational complexity of user interactions, with Twitter promoting concentrated, uniform discourse and Reddit encouraging diverse, complex communication. Our findings highlight the importance of user engagement patterns, platform dynamics, and emotional expressions in shaping polarization in debunking discourse. This study offers insights for policymakers and platform designers to mitigate harmful effects and promote healthier online discussions, with implications for understanding misinformation, hate speech, and political polarization in digital environments.
arXiv:2501.06707v1 Announce Type: new Abstract: ELIZA, created by Joseph Weizenbaum at MIT in the early 1960s, is usually considered the world's first chatbot. It was developed in MAD-SLIP on MIT's CTSS, the world's first time-sharing system, on an IBM 7094. We discovered an original ELIZA printout in Prof. Weizenbaum's archives at MIT, including an early version of the famous DOCTOR script, a nearly complete version of the MAD-SLIP code, and various support functions in MAD and FAP. Here we describe the reanimation of this original ELIZA on a restored CTSS, itself running on an emulated IBM 7094. The entire stack is open source, so that any user of a unix-like OS can run the world's first chatbot on the world's first time-sharing system.
arXiv:2501.06913v1 Announce Type: new Abstract: Predictive analytics is widely used in learning analytics, but many resource-constrained institutions lack the capacity to develop their own models or rely on proprietary ones trained in different contexts with little transparency. Transfer learning holds promise for expanding equitable access to predictive analytics but remains underexplored due to legal and technical constraints. This paper examines transfer learning strategies for retention prediction at U.S. two-year community colleges. We envision a scenario where community colleges collaborate with each other and four-year universities to develop retention prediction models under privacy constraints and evaluate risks and improvement strategies of cross-institutional model transfer. Using administrative records from 4 research universities and 23 community colleges covering over 800,000 students across 7 cohorts, we identify performance and fairness degradation when external models are deployed locally without adaptation. Publicly available contextual information can forecast these performance drops and offer early guidance for model portability. For developers under privacy regulations, sequential training selecting institutions based on demographic similarities enhances fairness without compromising performance. For institutions lacking local data to fine-tune source models, customizing evaluation thresholds for sensitive groups outperforms standard transfer techniques in improving performance and fairness. Our findings suggest the value of transfer learning for more accessible educational predictive modeling and call for judicious use of contextual information in model training, selection, and deployment to achieve reliable and equitable model transfer.
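As an illustration of the threshold-customization strategy for institutions that cannot fine-tune a transferred model, the sketch below picks per-group score cutoffs on a small local validation set so that true-positive rates roughly equalize across groups; the toy data, variable names, and equal-TPR criterion are assumptions, not the paper's exact procedure.

```python
# A minimal sketch (not the authors' code) of group-specific decision
# thresholds for a transferred retention model.
import numpy as np

def group_thresholds(scores, y, group, target_tpr=0.8):
    """Per-group score cutoffs achieving roughly the same TPR in each group."""
    cutoffs = {}
    for g in np.unique(group):
        pos_scores = scores[(group == g) & (y == 1)]
        # the score below which (1 - target_tpr) of that group's positives fall
        cutoffs[g] = np.quantile(pos_scores, 1 - target_tpr)
    return cutoffs

# Toy local validation data: model scores, retention outcomes, group labels.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, 500)
y = rng.integers(0, 2, 500)
scores = rng.random(500) + 0.2 * y - 0.1 * group  # transferred model biased by group

cut = group_thresholds(scores, y, group)
pred = np.array([scores[i] >= cut[group[i]] for i in range(len(y))])
for g in (0, 1):
    tpr = pred[(group == g) & (y == 1)].mean()
    print(f"group {g}: threshold={cut[g]:.2f}, TPR={tpr:.2f}")  # TPRs ~equal
```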
arXiv:2501.06861v1 Announce Type: new Abstract: The integration of AI systems into the military domain is changing the way war-related decisions are made. It binds together three disparate groups of actors - developers, integrators, users - and creates a relationship between these groups and the machine, embedded in the (pre-)existing organisational and system structures. In this article, we focus on the important, but often neglected, group of integrators within such a sociotechnical system. In complex human-machine configurations, integrators carry responsibility for linking the disparate groups of developers and users in the political and military system. To act as the mediating group requires a deep understanding of the other groups' activities, perspectives and norms. We thus ask which challenges and shortcomings emerge from integrating AI systems into resort-to-force (RTF) decision-making processes, and how to address them. To answer this, we proceed in three steps. First, we conceptualise the relationship between different groups of actors and AI systems as a sociotechnical system. Second, we identify challenges within such systems for human-machine teaming in RTF decisions. We focus on challenges that arise a) from the technology itself, b) from the integrators' role in the sociotechnical system, c) from the human-machine interaction. Third, we provide policy recommendations to address these shortcomings when integrating AI systems into RTF decision-making structures.
arXiv:2501.06281v1 Announce Type: new Abstract: Zero Trust Architectures (ZTA) fundamentally redefine network security by adopting a "trust nothing, verify everything" approach that requires identity verification for all access. Conventional discrete access control measures have proven inadequate because they do not consider evolving user activities and contextual threats, leaving systems exposed to insider threats and increasingly sophisticated attacks. This research proposes AI-driven, autonomous, identity-based threat segmentation in ZTA, combined with real-time identity analytics for fine-grained, real-time access control. Key techniques include behavioral analytics that produce real-time risk scores by analyzing login patterns, the access sought, and the resources used. Permissions are adjusted using machine learning models that take into account context-aware factors such as geolocation, device type, and access time. Automated threat segmentation helps analysts identify multiple compromised identities in real time, minimizing the likelihood that a breach advances. The system's use cases are based on real scenarios; for example, an insider-threat scenario across global offices demonstrates how compromised accounts are detected and locked. This work outlines measures to address privacy issues, false positives, and scalability concerns. The research enhances the security of critical computer systems by providing dynamic access governance, minimizing insider threats, and supporting dynamic policy enforcement, while keeping the balance between security and user productivity a top priority. Comparative analyses show that the model is precise and scalable.
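A toy sketch of context-aware, identity-based risk scoring of the kind described is shown below; the signals, weights, and thresholds are invented for illustration and are not from the paper.

```python
# A toy sketch of context-aware risk scoring for identity-based access
# decisions; all feature names and weights are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class AccessRequest:
    failed_logins_last_hour: int
    new_device: bool
    geo_distance_km: float       # distance from the user's usual location
    off_hours: bool              # outside the user's normal access window

def risk_score(r: AccessRequest) -> float:
    """Combine behavioral and contextual signals into a 0..1 risk score."""
    score = 0.0
    score += min(r.failed_logins_last_hour, 5) * 0.08
    score += 0.25 if r.new_device else 0.0
    score += min(r.geo_distance_km / 5000, 1.0) * 0.20
    score += 0.15 if r.off_hours else 0.0
    return min(score, 1.0)

def decide(r: AccessRequest) -> str:
    s = risk_score(r)
    if s < 0.3:
        return "allow"
    if s < 0.6:
        return "step-up authentication"   # e.g. require MFA re-verification
    return "deny and segment identity"    # isolate the identity for review

print(decide(AccessRequest(0, False, 3.0, False)))   # allow
print(decide(AccessRequest(4, True, 8000.0, True)))  # deny and segment identity
```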
arXiv:2501.06981v1 Announce Type: new Abstract: The global AI surge demands crowdworkers from diverse languages and cultures, who are pivotal in labeling the data that enables global AI systems. Despite this global significance, research has primarily focused on the perspectives and experiences of crowdworkers in the US and India, leaving a notable gap. To bridge it, we conducted a survey of 100 crowdworkers across 16 Latin American and Caribbean countries. We found that these workers exhibited pride in and respect for their digital labor, with strong support and admiration from their families. Notably, crowd work was also seen as a stepping stone to financial and professional independence. Surprisingly, despite wanting more connection, these workers also felt isolated from peers and doubtful of others' labor quality. They resisted collaboration and gender-based tools, valuing gender neutrality. Our work advances the HCI understanding of Latin American and Caribbean crowdwork and offers insights for digital resistance tools for the region.
arXiv:2501.06267v1 Announce Type: new Abstract: We describe a protocol for creating, updating, and revoking digital diplomas that we anticipate would make use of the protocol for transferring digital assets elaborated by Goodell, Toliver, and Nakib. Digital diplomas would maintain their own state and make use of a distributed ledger as a mechanism for verifying their integrity. The use of a distributed ledger enables verification of the state of an asset without the need to contact the issuing institution, and we describe how the integrity of a diploma issued in this way can persist even in the absence of the issuing institution.
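A minimal sketch of the integrity mechanism, with a plain dictionary standing in for the distributed ledger and no signatures or asset-transfer protocol, might look like this:

```python
# A minimal sketch: the diploma carries its own state, and only a digest of
# that state is anchored on a ledger, so anyone can verify it without
# contacting the issuer. The dict is a stand-in for a real distributed ledger.
import hashlib
import json

def digest(diploma_state: dict) -> str:
    """Canonical hash of the diploma's current state."""
    canon = json.dumps(diploma_state, sort_keys=True).encode()
    return hashlib.sha256(canon).hexdigest()

ledger = {}  # stand-in ledger: diploma_id -> latest anchored digest

# Issue: the institution anchors the digest of the initial state.
diploma = {"id": "dip-001", "holder": "A. Student", "degree": "BSc",
           "status": "valid", "version": 1}
ledger[diploma["id"]] = digest(diploma)

# Update / revoke: bump the version and re-anchor.
diploma["status"] = "revoked"
diploma["version"] += 1
ledger[diploma["id"]] = digest(diploma)

def verify(diploma_state: dict) -> bool:
    """Check a presented diploma against the ledger, without the issuer."""
    return ledger.get(diploma_state["id"]) == digest(diploma_state)

print(verify(diploma))  # True: matches the anchored state
```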
arXiv:2501.06581v1 Announce Type: new Abstract: Prospective students face the challenging task of selecting a university program that will shape their academic and professional careers. For decision-makers and support services, it is often time-consuming and extremely difficult to match personal interests with suitable programs due to the vast and complex catalogue information available. This paper presents the first information system that provides students with efficient recommendations based on both program content and personal preferences. We use BERTopic, a powerful topic modeling algorithm that leverages text embedding techniques to generate topic representations. It enables us to mine interest topics from all course descriptions, representing the full body of knowledge taught at the institution. Underpinned by the student's individual choice of topics, a shortlist of the most relevant programs is computed through statistical backtracking in the knowledge map, a novel characterization of the program-course relationship. This approach can be applied to a wide range of educational settings, including professional and vocational training. A case study at a post-secondary school with 80 programs and over 5,000 courses shows that the system provides immediate and effective decision support. The presented interest topics are meaningful, leading to positive effects such as serendipity, personalization, and fairness, as revealed by a qualitative study involving 65 students. Over 98% of users indicated that the recommendations aligned with their interests, and about 94% stated they would use the tool in the future. Quantitative analysis shows the system can be configured to ensure fairness, achieving 98% program coverage while maintaining a personalization score of 0.77. These findings suggest that this real-time, user-centered, data-driven system could improve the program selection process.
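A minimal BERTopic sketch of the topic-mining step follows; the six toy course descriptions stand in for the case study's 5,000+, and the tiny UMAP and cluster settings exist only so this miniature corpus runs at all.

```python
# A minimal sketch of mining interest topics from course descriptions with
# BERTopic; placeholder texts, and settings scaled down for a toy corpus.
from bertopic import BERTopic
from umap import UMAP

course_descriptions = [
    "Introduction to programming with Python: data types, loops, functions.",
    "Object-oriented software design and testing in Python.",
    "Algorithms and data structures for efficient computation.",
    "Fundamentals of marketing strategy and consumer behavior.",
    "Brand management and digital advertising campaigns.",
    "Market research methods and customer segmentation.",
]

topic_model = BERTopic(
    umap_model=UMAP(n_neighbors=2, n_components=2, random_state=42),
    min_topic_size=2,
)
topics, probs = topic_model.fit_transform(course_descriptions)

# Topic keywords become the interest topics students choose from; programs
# are then shortlisted by how strongly their courses load on chosen topics.
print(topic_model.get_topic_info())
for t in sorted(set(topics)):
    if t != -1:                          # -1 is BERTopic's outlier topic
        print(t, topic_model.get_topic(t)[:5])
```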
arXiv:2501.06399v1 Announce Type: new Abstract: From a simple text prompt, generative-AI image models can create stunningly realistic and creative images bounded, it seems, by only our imagination. These models have achieved this remarkable feat thanks, in part, to the ingestion of billions of images collected from nearly every corner of the internet. Many creators have understandably expressed concern over how their intellectual property has been ingested without their permission or a mechanism to opt out of training. As a result, questions of fair use and copyright infringement have quickly emerged. We describe a method that allows us to determine if a model was trained on a specific image or set of images. This method is computationally efficient and assumes no explicit knowledge of the model architecture or weights (so-called black-box membership inference). We anticipate that this method will be crucial for auditing existing models and, looking ahead, ensuring the fairer development and deployment of generative AI models.
arXiv:2305.03123v4 Announce Type: replace Abstract: ChatGPT is another large language model (LLM) widely available to consumers on their devices, and owing to its performance and ability to converse effectively, it has gained huge popularity in both the research and industrial communities. Recently, many studies have been published on the effectiveness, efficiency, integration, and sentiments of ChatGPT and other LLMs. In contrast, this study focuses on important aspects that are mostly overlooked, i.e., sustainability, privacy, digital divide, and ethics, and suggests that not only ChatGPT but every subsequent entry in the category of conversational bots should undergo Sustainability, PrivAcy, Digital divide, and Ethics (SPADE) evaluation. This paper discusses in detail the issues and concerns raised over ChatGPT in line with the aforementioned characteristics. We also briefly discuss the recent EU AI Act in relation to the SPADE evaluation. We support our hypothesis with preliminary data collection and visualizations, along with hypothesized facts. We also suggest mitigations and recommendations for each of the concerns. Furthermore, we suggest policies and recommendations for the EU AI Act concerning ethics, the digital divide, and sustainability.