cs.CY

116 posts

arXiv:2501.00790v1 Announce Type: new Abstract: The rapid proliferation of Industrial Internet of Things (IIoT) systems necessitates advanced, interpretable, and scalable intrusion detection systems (IDS) to combat emerging cyber threats. Traditional IDS face challenges such as high computational demands, limited explainability, and inflexibility against evolving attack patterns. To address these limitations, this study introduces the Lightweight Explainable Network Security framework (LENS-XAI), which combines robust intrusion detection with enhanced interpretability and scalability. LENS-XAI integrates knowledge distillation, variational autoencoder models, and attribution-based explainability techniques to achieve high detection accuracy and transparency in decision-making. By leveraging a training set comprising 10% of the available data, the framework optimizes computational efficiency without sacrificing performance. Experimental evaluation on four benchmark datasets: Edge-IIoTset, UKM-IDS20, CTU-13, and NSL-KDD, demonstrates the framework's superior performance, achieving detection accuracies of 95.34%, 99.92%, 98.42%, and 99.34%, respectively. Additionally, the framework excels in reducing false positives and adapting to complex attack scenarios, outperforming existing state-of-the-art methods. Key strengths of LENS-XAI include its lightweight design, suitable for resource-constrained environments, and its scalability across diverse IIoT and cybersecurity contexts. Moreover, the explainability module enhances trust and transparency, critical for practical deployment in dynamic and sensitive applications. This research contributes significantly to advancing IDS by addressing computational efficiency, feature interpretability, and real-world applicability. Future work could focus on extending the framework to ensemble AI systems for distributed environments, further enhancing its robustness and adaptability.

Muhammet Anil Yagiz, Polat Goktas1/3/2025

arXiv:2501.00987v1 Announce Type: new Abstract: In light of Phillips' contention regarding the impracticality of Search Neutrality, asserting that non-epistemic factors presently dictate result prioritization, our objective in this study is to confront this constraint by questioning prevailing design practices in search engines. We posit that the concept of prioritization warrants scrutiny, along with the consistent hierarchical ordering that underlies this lack of neutrality. We introduce the term Search Plurality to encapsulate the idea of emphasizing the various means a query can be approached. This is demonstrated in a design that prioritizes the display of categories over specific search items, helping users grasp the breadth of their search. Whether a query allows for multiple interpretations or invites diverse opinions, the presentation of categories highlights the significance of organizing data based on relevance, importance, and relative significance, akin to traditional methods. However, unlike previous approaches, this method enriches our comprehension of the overall information landscape, countering the potential bias introduced by ranked lists.

Shiran Dudy1/3/2025

arXiv:2501.00164v1 Announce Type: new Abstract: Since the launch of ChatGPT in late 2022, the capacities of Large Language Models and their evaluation have been in constant discussion and evaluation both in academic research and in the industry. Scenarios and benchmarks have been developed in several areas such as law, medicine and math (Bommasani et al., 2023) and there is continuous evaluation of model variants. One area that has not received sufficient scenario development attention is journalism, and in particular journalistic sourcing and ethics. Journalism is a crucial truth-determination function in democracy (Vincent, 2023), and sourcing is a crucial pillar to all original journalistic output. Evaluating the capacities of LLMs to annotate stories for the different signals of sourcing and how reporters justify them is a crucial scenario that warrants a benchmark approach. It offers potential to build automated systems to contrast more transparent and ethically rigorous forms of journalism with everyday fare. In this paper we lay out a scenario to evaluate LLM performance on identifying and annotating sourcing in news stories on a five-category schema inspired from journalism studies (Gans, 2004). We offer the use case, our dataset and metrics and as the first step towards systematic benchmarking. Our accuracy findings indicate LLM-based approaches have more catching to do in identifying all the sourced statements in a story, and equally, in matching the type of sources. An even harder task is spotting source justifications.

Subramaniam Vincent, Phoebe Wang, Zhan Shi, Sahas Koka, Yi Fang1/3/2025

arXiv:2501.00778v1 Announce Type: new Abstract: Long-sequence causal reasoning seeks to uncover causal relationships within extended time series data but is hindered by complex dependencies and the challenges of validating causal links. To address the limitations of large-scale language models (e.g., GPT-4) in capturing intricate emotional causality within extended dialogues, we propose CauseMotion, a long-sequence emotional causal reasoning framework grounded in Retrieval-Augmented Generation (RAG) and multimodal fusion. Unlike conventional methods relying only on textual information, CauseMotion enriches semantic representations by incorporating audio-derived features-vocal emotion, emotional intensity, and speech rate-into textual modalities. By integrating RAG with a sliding window mechanism, it effectively retrieves and leverages contextually relevant dialogue segments, thus enabling the inference of complex emotional causal chains spanning multiple conversational turns. To evaluate its effectiveness, we constructed the first benchmark dataset dedicated to long-sequence emotional causal reasoning, featuring dialogues with over 70 turns. Experimental results demonstrate that the proposed RAG-based multimodal integrated approach, the efficacy of substantially enhances both the depth of emotional understanding and the causal inference capabilities of large-scale language models. A GLM-4 integrated with CauseMotion achieves an 8.7% improvement in causal accuracy over the original model and surpasses GPT-4o by 1.2%. Additionally, on the publicly available DiaASQ dataset, CauseMotion-GLM-4 achieves state-of-the-art results in accuracy, F1 score, and causal reasoning accuracy.

Yuxuan Zhang, Yulong Li, Zichen Yu, Feilong Tang, Zhixiang Lu, Chong Li, Kang Dang, Jionglong Su1/3/2025

arXiv:2501.00957v1 Announce Type: new Abstract: The rise of Generative AI (GAI) and Large Language Models (LLMs) has transformed industrial landscapes, offering unprecedented opportunities for efficiency and innovation while raising critical ethical, regulatory, and operational challenges. This study conducts a text-based analysis of 160 guidelines and policy statements across fourteen industrial sectors, utilizing systematic methods and text-mining techniques to evaluate the governance of these technologies. By examining global directives, industry practices, and sector-specific policies, the paper highlights the complexities of balancing innovation with ethical accountability and equitable access. The findings provide actionable insights and recommendations for fostering responsible, transparent, and safe integration of GAI and LLMs in diverse industry contexts.

Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar1/3/2025

arXiv:2501.00964v1 Announce Type: new Abstract: Recent advances in Generative AI (GenAI) are transforming multiple aspects of society, including education and foreign language learning. In the context of English as a Foreign Language (EFL), significant research has been conducted to investigate the applicability of GenAI as a learning aid and the potential negative impacts of new technologies. Critical questions remain about the future of AI, including whether improvements will continue at such a pace or stall and whether there is a true benefit to implementing GenAI in education, given the myriad costs and potential for negative impacts. Apart from the ethical conundrums that GenAI presents in EFL education, there is growing consensus that learners and teachers must develop AI literacy skills to enable them to use and critically evaluate the purposes and outputs of these technologies. However, there are few formalised frameworks available to support the integration and development of AI literacy skills for EFL learners. In this article, we demonstrate how the use of a general, all-purposes framework (the AI Assessment Scale) can be tailored to the EFL writing and translation context, drawing on existing empirical research validating the scale and adaptations to other contexts, such as English for Academic Purposes. We begin by engaging with the literature regarding GenAI and EFL writing and translation, prior to explicating the use of three levels of the updated AIAS for structuring EFL writing instruction which promotes academic literacy and transparency and provides a clear framework for students and teachers.

Jasper Roe (Durham University), Mike Perkins (British University Vietnam), Leon Furze (Deakin University)1/3/2025

arXiv:2501.00097v1 Announce Type: new Abstract: This paper introduces CaseSumm, a novel dataset for long-context summarization in the legal domain that addresses the need for longer and more complex datasets for summarization evaluation. We collect 25.6K U.S. Supreme Court (SCOTUS) opinions and their official summaries, known as "syllabuses." Our dataset is the largest open legal case summarization dataset, and is the first to include summaries of SCOTUS decisions dating back to 1815. We also present a comprehensive evaluation of LLM-generated summaries using both automatic metrics and expert human evaluation, revealing discrepancies between these assessment methods. Our evaluation shows Mistral 7b, a smaller open-source model, outperforms larger models on most automatic metrics and successfully generates syllabus-like summaries. In contrast, human expert annotators indicate that Mistral summaries contain hallucinations. The annotators consistently rank GPT-4 summaries as clearer and exhibiting greater sensitivity and specificity. Further, we find that LLM-based evaluations are not more correlated with human evaluations than traditional automatic metrics. Furthermore, our analysis identifies specific hallucinations in generated summaries, including precedent citation errors and misrepresentations of case facts. These findings demonstrate the limitations of current automatic evaluation methods for legal summarization and highlight the critical role of human evaluation in assessing summary quality, particularly in complex, high-stakes domains. CaseSumm is available at https://huggingface.co/datasets/ChicagoHAI/CaseSumm

Mourad Heddaya, Kyle MacMillan, Anup Malani, Hongyuan Mei, Chenhao Tan1/3/2025

arXiv:2501.00158v1 Announce Type: new Abstract: Accurate water consumption forecasting is a crucial tool for water utilities and policymakers, as it helps ensure a reliable supply, optimize operations, and support infrastructure planning. Urban Water Distribution Networks (WDNs) are divided into District Metered Areas (DMAs), where water flow is monitored to efficiently manage resources. This work focuses on short-term forecasting of DMA consumption using deep learning and aims to address two key challenging issues. First, forecasting based solely on a DMA's historical data may lack broader context and provide limited insights. Second, DMAs may experience sensor malfunctions providing incorrect data, or some DMAs may not be monitored at all due to computational costs, complicating accurate forecasting. We propose a novel method that first identifies DMAs with correlated consumption patterns and then uses these patterns, along with the DMA's local data, as input to a deep learning model for forecasting. In a real-world study with data from five DMAs, we show that: i) the deep learning model outperforms a classical statistical model; ii) accurate forecasting can be carried out using only correlated DMAs' consumption patterns; and iii) even when a DMA's local data is available, including correlated DMAs' data improves accuracy.

Kleanthis Malialis, Nefeli Mavri, Stelios G. Vrachimis, Marios S. Kyriakou, Demetrios G. Eliades, Marios M. Polycarpou1/3/2025

arXiv:2501.00192v1 Announce Type: new Abstract: Image content safety has become a significant challenge with the rise of visual media on online platforms. Meanwhile, in the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content, such as images containing sexual or violent material. Thus, it becomes crucial to identify such unsafe images based on established safety rules. Pre-trained Multimodal Large Language Models (MLLMs) offer potential in this regard, given their strong pattern recognition abilities. Existing approaches typically fine-tune MLLMs with human-labeled datasets, which however brings a series of drawbacks. First, relying on human annotators to label data following intricate and detailed guidelines is both expensive and labor-intensive. Furthermore, users of safety judgment systems may need to frequently update safety rules, making fine-tuning on human-based annotation more challenging. This raises the research question: Can we detect unsafe images by querying MLLMs in a zero-shot setting using a predefined safety constitution (a set of safety rules)? Our research showed that simply querying pre-trained MLLMs does not yield satisfactory results. This lack of effectiveness stems from factors such as the subjectivity of safety rules, the complexity of lengthy constitutions, and the inherent biases in the models. To address these challenges, we propose a MLLM-based method includes objectifying safety rules, assessing the relevance between rules and images, making quick judgments based on debiased token probabilities with logically complete yet simplified precondition chains for safety rules, and conducting more in-depth reasoning with cascaded chain-of-thought processes if necessary. Experiment results demonstrate that our method is highly effective for zero-shot image safety judgment tasks.

Zhenting Wang, Shuming Hu, Shiyu Zhao, Xiaowen Lin, Felix Juefei-Xu, Zhuowei Li, Ligong Han, Harihar Subramanyam, Li Chen, Jianfa Chen, Nan Jiang, Lingjuan Lyu, Shiqing Ma, Dimitris N. Metaxas, Ankit Jain1/3/2025

arXiv:2501.00367v1 Announce Type: new Abstract: This paper investigates the performance of several representative large models in the tasks of literature recommendation and explores potential biases in research exposure. The results indicate that not only LLMs' overall recommendation accuracy remains limited but also the models tend to recommend literature with greater citation counts, later publication date, and larger author teams. Yet, in scholar recommendation tasks, there is no evidence that LLMs disproportionately recommend male, white, or developed-country authors, contrasting with patterns of known human biases.

Yifan Tian, Yixin Liu, Yi Bu, Jiqun Liu1/3/2025

arXiv:2501.00855v1 Announce Type: new Abstract: Chatter on social media is 20% bots and 80% humans. Chatter by bots and humans is consistently different: bots tend to use linguistic cues that can be easily automated while humans use cues that require dialogue understanding. Bots use words that match the identities they choose to present, while humans may send messages that are not related to the identities they present. Bots and humans differ in their communication structure: sampled bots have a star interaction structure, while sampled humans have a hierarchical structure. These conclusions are based on a large-scale analysis of social media tweets across ~200mil users across 7 events. Social media bots took the world by storm when social-cybersecurity researchers realized that social media users not only consisted of humans but also of artificial agents called bots. These bots wreck havoc online by spreading disinformation and manipulating narratives. Most research on bots are based on special-purposed definitions, mostly predicated on the event studied. This article first begins by asking, "What is a bot?", and we study the underlying principles of how bots are different from humans. We develop a first-principle definition of a social media bot. With this definition as a premise, we systematically compare characteristics between bots and humans across global events, and reflect on how the software-programmed bot is an Artificial Intelligent algorithm, and its potential for evolution as technology advances. Based on our results, we provide recommendations for the use and regulation of bots. Finally, we discuss open challenges and future directions: Detect, to systematically identify these automated and potentially evolving bots; Differentiate, to evaluate the goodness of the bot in terms of their content postings and relationship interactions; Disrupt, to moderate the impact of malicious bots.

Lynnette Hui Xian Ng, Kathleen M. Carley1/3/2025

arXiv:2501.00939v1 Announce Type: new Abstract: The Web has drastically simplified our access to knowledge and learning, and fact-checking online resources has become a part of our daily routine. Studying online knowledge consumption is thus critical for understanding human behavior and informing the design of future platforms. In this Chapter, we approach this subject by describing the navigation patterns of the readers of Wikipedia, the world's largest platform for open knowledge. We provide a comprehensive overview of what is known about the three steps that characterize navigation on Wikipedia: (1) how readers reach the platform, (2) how readers navigate the platform, and (3) how readers leave the platform. Finally, we discuss open problems and opportunities for future research in this field.

Tiziano Piccardi, Robert West1/3/2025

arXiv:2501.00959v1 Announce Type: new Abstract: This paper introduces IGGA, a dataset of 160 industry guidelines and policy statements for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in industry and workplace settings, collected from official company websites, and trustworthy news sources. The dataset contains 104,565 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering, such as model synthesis, abstraction identification, and document structure assessment. Additionally, IGGA can be further annotated to function as a benchmark for various tasks, including ambiguity detection, requirements categorization, and the identification of equivalent requirements. Our methodologically rigorous approach ensured a thorough examination, with a selection of reputable and influential companies that represent a diverse range of global institutions across six continents. The dataset captures perspectives from fourteen industry sectors, including technology, finance, and both public and private institutions, offering a broad spectrum of insights into the integration of GAIs and LLMs in industry.

Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar1/3/2025

arXiv:2501.00962v1 Announce Type: new Abstract: Images generated by text-to-image (T2I) models often exhibit visual biases and stereotypes of concepts such as culture and profession. Existing quantitative measures of stereotypes are based on statistical parity that does not align with the sociological definition of stereotypes and, therefore, incorrectly categorizes biases as stereotypes. Instead of oversimplifying stereotypes as biases, we propose a quantitative measure of stereotypes that aligns with its sociological definition. We then propose OASIS to measure the stereotypes in a generated dataset and understand their origins within the T2I model. OASIS includes two scores to measure stereotypes from a generated image dataset: (M1) Stereotype Score to measure the distributional violation of stereotypical attributes, and (M2) WALS to measure spectral variance in the images along a stereotypical attribute. OASIS also includes two methods to understand the origins of stereotypes in T2I models: (U1) StOP to discover attributes that the T2I model internally associates with a given concept, and (U2) SPI to quantify the emergence of stereotypical attributes in the latent space of the T2I model during image generation. Despite the considerable progress in image fidelity, using OASIS, we conclude that newer T2I models such as FLUX.1 and SDv3 contain strong stereotypical predispositions about concepts and still generate images with widespread stereotypical attributes. Additionally, the quantity of stereotypes worsens for nationalities with lower Internet footprints.

Sepehr Dehdashtian, Gautam Sreekumar, Vishnu Naresh Boddeti1/3/2025

arXiv:2501.00081v1 Announce Type: new Abstract: This paper provides a comprehensive review of the design and implementation of automatically generated assessment reports (AutoRs) for formative use in K-12 Science, Technology, Engineering, and Mathematics (STEM) classrooms. With the increasing adoption of technology-enhanced assessments, there is a critical need for human-computer interactive tools that efficiently support the interpretation and application of assessment data by teachers. AutoRs are designed to provide synthesized, interpretable, and actionable insights into students' performance, learning progress, and areas for improvement. Guided by cognitive load theory, this study emphasizes the importance of reducing teachers' cognitive demands through user-centered and intuitive designs. It highlights the potential of diverse information presentation formats such as text, visual aids, and plots and advanced functionalities such as live and interactive features to enhance usability. However, the findings also reveal that many existing AutoRs fail to fully utilize these approaches, leading to high initial cognitive demands and limited engagement. This paper proposes a conceptual framework to inform the design, implementation, and evaluation of AutoRs, balancing the trade-offs between usability and functionality. The framework aims to address challenges in engaging teachers with technology-enhanced assessment results, facilitating data-driven decision-making, and providing personalized feedback to improve the teaching and learning process.

Ehsan Latif, Ying Chen, Xiaoming Zhai, Yue Yin1/3/2025

arXiv:2501.00083v1 Announce Type: new Abstract: The development of large language models has ushered in new paradigms for education. This paper centers on the multi-Agent system in education and proposes the von Neumann multi-Agent system framework. It breaks down each AI Agent into four modules: control unit, logic unit, storage unit, and input-output devices, defining four types of operations: task deconstruction, self-reflection, memory processing, and tool invocation. Furthermore, it introduces related technologies such as Chain-of-Thought, Reson+Act, and Multi-Agent Debate associated with these four types of operations. The paper also discusses the ability enhancement cycle of a multi-Agent system for education, including the outer circulation for human learners to promote knowledge construction and the inner circulation for LLM-based-Agents to enhance swarm intelligence. Through collaboration and reflection, the multi-Agent system can better facilitate human learners' learning and enhance their teaching abilities in this process.

Yuan-Hao Jiang, Ruijia Li, Yizhou Zhou, Changyong Qi, Hanglei Hu, Yuang Wei, Bo Jiang, Yonghe Wu1/3/2025

arXiv:2501.00017v1 Announce Type: new Abstract: This study investigates students' perceptions of Generative Artificial Intelligence (AI), with a focus on Higher Education institutions in Northern Ireland and India. We collect quantitative Likert ratings and qualitative comments from 1,211 students on their awareness and perceptions of AI and investigate variations in attitudes toward AI across institutions and subject areas, as well as interactions between these variables with demographic variables (focusing on gender). We find that: (a) while perceptions varied across institutions, responses for Computer Sciences students were similar; and (b) after controlling for institution and subject area, we observed no effect of gender. These results are consistent with previous studies, which find that students' perceptions are predicted by prior experience. We consider the implications of this relation and some considerations for the role of experience.

Juliana Gerard, Sahajpreet Singh, Morgan Macleod, Michael McKay, Antoine Rivoire, Tanmoy Chakraborty, Muskaan Singh1/3/2025

arXiv:2501.00056v1 Announce Type: new Abstract: Air pollution in cities, especially NO\textsubscript{2}, is linked to numerous health problems, ranging from mortality to mental health challenges and attention deficits in children. While cities globally have initiated policies to curtail emissions, real-time monitoring remains challenging due to limited environmental sensors and their inconsistent distribution. This gap hinders the creation of adaptive urban policies that respond to the sequence of events and daily activities affecting pollution in cities. Here, we demonstrate how city CCTV cameras can act as a pseudo-NO\textsubscript{2} sensors. Using a predictive graph deep model, we utilised traffic flow from London's cameras in addition to environmental and spatial factors, generating NO\textsubscript{2} predictions from over 133 million frames. Our analysis of London's mobility patterns unveiled critical spatiotemporal connections, showing how specific traffic patterns affect NO\textsubscript{2} levels, sometimes with temporal lags of up to 6 hours. For instance, if trucks only drive at night, their effects on NO\textsubscript{2} levels are most likely to be seen in the morning when people commute. These findings cast doubt on the efficacy of some of the urban policies currently being implemented to reduce pollution. By leveraging existing camera infrastructure and our introduced methods, city planners and policymakers could cost-effectively monitor and mitigate the impact of NO\textsubscript{2} and other pollutants.

Mohamed R. Ibrahim, Terry Lyons1/3/2025

arXiv:2501.00074v1 Announce Type: new Abstract: As technology advances, the integration of physical, virtual, and social worlds has led to a complex landscape of ``Realities'' such as Virtual Reality (VR), Augmented Reality (AR), metaverse, spatial computing, and other emerging paradigms. This paper builds upon and refines the concept of eXtended Reality (XR) as the unifying framework that not only interpolates across these diverse realities but also extrapolates (extends) to create entirely new possibilities. XR is the ``physical spatial metaverse,'' bridging the physical world, the virtual world of artificial intelligence, and the social world of human interaction. These three worlds define the Socio-Cyber-Physical Taxonomy of XR that allows us to identify underexplored research areas such as Diminished Reality (DR), and chart future directions to {\bf advance technology for people and planet}. We highlight the six core properties of XR for applications in sustainability, healthcare, frontline work, and daily life. Central to this vision is the development of AI-driven wearable technologies, such as the smart eyeglass, that sustainably extend human capabilities.

Steve Mann, Martin Cooper, Bran Ferren, Thomas M. Coughlin, Paul Travers1/3/2025

arXiv:2501.01128v1 Announce Type: new Abstract: Winter road maintenance is a critical priority for the Indiana Department of Transportation, which manages an extensive fleet across thousands of lane miles. The current manual tracking of snowplow workloads is inefficient and prone to errors. To address these challenges, we developed an in-browser web application that automates the creation and verification of work orders using a large-scale GPS dataset from telematics systems. The application processes millions of GPS data points from hundreds of vehicles over winter, significantly reducing manual labor and minimizing errors. Key features include geohashing for efficient road segment identification, detailed segment-level work records, and robust visualization of vehicle movements, even on repeated routes. Our proposed solution has the potential to enhance the accuracy and granularity of work records, support more effective resource allocation, ensure timely compensation for drivers, alleviate administrative burdens, and allow managers to focus on strategic planning and real-time challenges.

Anugunj Naman, Yaguang Zhang, Aaron Ault, James Krogmeier1/3/2025