cs.CR

208 posts

arXiv:2501.00261v1 Announce Type: new Abstract: The introduction sets the stage for exploring collaborative approaches to bolstering smart vehicle cybersecurity through AI-driven threat detection. As the automotive industry increasingly adopts connected and automated vehicles (CAVs), the need for robust cybersecurity measures becomes paramount. With the emergence of new vulnerabilities and security requirements, the integration of advanced technologies such as 5G networks, blockchain, and quantum computing presents promising avenues for enhancing CAV cybersecurity . Additionally, the roadmap for cybersecurity in autonomous vehicles emphasizes the importance of efficient intrusion detection systems and AI-based techniques, along with the integration of secure hardware, software stacks, and advanced threat intelligence to address cybersecurity challenges in future autonomous vehicles.

Syed Atif Ali, Salwa Din1/3/2025

arXiv:2501.00446v1 Announce Type: new Abstract: As the scope and impact of cyber threats have expanded, analysts utilize audit logs to hunt threats and investigate attacks. The provenance graphs constructed from kernel logs are increasingly considered as an ideal data source due to their powerful semantic expression and attack historic correlation ability. However, storing provenance graphs with traditional databases faces the challenge of high storage overhead, given the high frequency of kernel events and the persistence of attacks. To address this, we propose Dehydrator, an efficient provenance graph storage system. For the logs generated by auditing frameworks, Dehydrator uses field mapping encoding to filter field-level redundancy, hierarchical encoding to filter structure-level redundancy, and finally learns a deep neural network to support batch querying. We have conducted evaluations on seven datasets totaling over one billion log entries. Experimental results show that Dehydrator reduces the storage space by 84.55%. Dehydrator is 7.36 times more efficient than PostgreSQL, 7.16 times than Neo4j, and 16.17 times than Leonard (the work most closely related to Dehydrator, published at Usenix Security'23).

Jie Ying, Tiantian Zhu, Mingqi Lv, Tieming Chen1/3/2025

arXiv:2501.00193v1 Announce Type: new Abstract: Pseudo-random number generators (PRNGs) are essential in a wide range of applications, from cryptography to statistical simulations and optimization algorithms. While uniform randomness is crucial for security-critical areas like cryptography, many domains, such as simulated annealing and CMOS-based Ising Machines, benefit from controlled or non-uniform randomness to enhance solution exploration and optimize performance. This paper presents a hardware PRNG that can simultaneously generate multiple uncorrelated sequences with programmable statistics tailored to specific application needs. Designed in 65nm process, the PRNG occupies an area of approximately 0.0013mm^2 and has an energy consumption of 0.57pJ/bit. Simulations confirm the PRNG's effectiveness in modulating the statistical distribution while demonstrating high-quality randomness properties.

Jianan Wu, Ahmet Yusuf Salim, Eslam Elmitwalli, Sel\c{c}uk K\"ose, Zeljko Ignjatovic1/3/2025

arXiv:2501.00260v1 Announce Type: new Abstract: Phishing is an online identity theft technique where attackers steal users personal information, leading to financial losses for individuals and organizations. With the increasing adoption of smartphones, which provide functionalities similar to desktop computers, attackers are targeting mobile users. Smishing, a phishing attack carried out through Short Messaging Service (SMS), has become prevalent due to the widespread use of SMS-based services. It involves deceptive messages designed to extract sensitive information. Despite the growing number of smishing attacks, limited research focuses on detecting these threats. This work presents a smishing detection model using a content-based analysis approach. To address the challenge posed by slang, abbreviations, and short forms in text communication, the model normalizes these into standard forms. A machine learning classifier is employed to classify messages as smishing or ham. Experimental results demonstrate the model effectiveness, achieving classification accuracies of 97.14% for smishing and 96.12% for ham messages, with an overall accuracy of 96.20%.

Diksha Goel1/3/2025

arXiv:2501.00337v1 Announce Type: new Abstract: In the almost-everywhere reliable message transmission problem, introduced by [Dwork, Pippenger, Peleg, Upfal'86], the goal is to design a sparse communication network $G$ that supports efficient, fault-tolerant protocols for interactions between all node pairs. By fault-tolerant, we mean that that even if an adversary corrupts a small fraction of vertices in $G$, then all but a small fraction of vertices can still communicate perfectly via the constructed protocols. Being successful to do so allows one to simulate, on a sparse graph, any fault-tolerant distributed computing task and secure multi-party computation protocols built for a complete network, with only minimal overhead in efficiency. Previous works on this problem achieved either constant-degree networks tolerating $o(1)$ faults, constant-degree networks tolerating a constant fraction of faults via inefficient protocols (exponential work complexity), or poly-logarithmic degree networks tolerating a constant fraction of faults. We show a construction of constant-degree networks with efficient protocols (i.e., with polylogarithmic work complexity) that can tolerate a constant fraction of adversarial faults, thus solving the main open problem of Dwork et al.. Our main contribution is a composition technique for communication networks, based on graph products. Our technique combines two networks tolerant to adversarial edge-faults to construct a network with a smaller degree while maintaining efficiency and fault-tolerance. We apply this composition result multiple times, using the polylogarithmic-degree edge-fault tolerant networks constructed in a recent work of [Bafna, Minzer, Vyas'24] (that are based on high-dimensional expanders) with itself, and then with the constant-degree networks (albeit with inefficient protocols) of [Upfal'92].

Mitali Bafna, Dor Minzer1/3/2025

arXiv:2501.00438v1 Announce Type: new Abstract: As Advanced Persistent Threat (APT) complexity increases, provenance data is increasingly used for detection. Anomaly-based systems are gaining attention due to their attack-knowledge-agnostic nature and ability to counter zero-day vulnerabilities. However, traditional detection paradigms, which train on offline, limited-size data, often overlook concept drift - unpredictable changes in streaming data distribution over time. This leads to high false positive rates. We propose incremental learning as a new paradigm to mitigate this issue. However, we identify FOUR CHALLENGES while integrating incremental learning as a new paradigm. First, the long-running incremental system must combat catastrophic forgetting (C1) and avoid learning malicious behaviors (C2). Then, the system needs to achieve precise alerts (C3) and reconstruct attack scenarios (C4). We present METANOIA, the first lifelong detection system that mitigates the high false positives due to concept drift. It connects pseudo edges to combat catastrophic forgetting, transfers suspicious states to avoid learning malicious behaviors, filters nodes at the path-level to achieve precise alerts, and constructs mini-graphs to reconstruct attack scenarios. Using state-of-the-art benchmarks, we demonstrate that METANOIA improves precision performance at the window-level, graph-level, and node-level by 30%, 54%, and 29%, respectively, compared to previous approaches.

Jie Ying, Tiantian Zhu, Aohan Zheng, Tieming Chen, Mingqi Lv, Yan Chen1/3/2025

arXiv:2501.00066v1 Announce Type: new Abstract: We investigate the adversarial robustness of LLMs in transfer learning scenarios. Through comprehensive experiments on multiple datasets (MBIB Hate Speech, MBIB Political Bias, MBIB Gender Bias) and various model architectures (BERT, RoBERTa, GPT-2, Gemma, Phi), we reveal that transfer learning, while improving standard performance metrics, often leads to increased vulnerability to adversarial attacks. Our findings demonstrate that larger models exhibit greater resilience to this phenomenon, suggesting a complex interplay between model size, architecture, and adaptation methods. Our work highlights the crucial need for considering adversarial robustness in transfer learning scenarios and provides insights into maintaining model security without compromising performance. These findings have significant implications for the development and deployment of LLMs in real-world applications where both performance and robustness are paramount.

Bohdan Turbal, Anastasiia Mazur, Jiaxu Zhao, Mykola Pechenizkiy1/3/2025

arXiv:2501.00186v1 Announce Type: new Abstract: Rapid progress in the development of information technology has led to a significant increase in the number and complexity of cyber threats. Traditional methods of cybersecurity training based on theoretical knowledge do not provide a sufficient level of practical skills to effectively counter real threats. The article explores the possibilities of integrating simulation environments into the cybersecurity training process as an effective approach to improving the quality of training. The article presents the architecture of a simulation environment based on a cluster of KVM hypervisors, which allows creating scalable and flexible platforms at minimal cost. The article describes the implementation of various scenarios using open source software tools such as pfSense, OPNsense, Security Onion, Kali Linux, Parrot Security OS, Ubuntu Linux, Oracle Linux, FreeBSD, and others, which create realistic conditions for practical training.

Dmytro Tymoshchuk, Vasyl Yatskiv, Vitaliy Tymoshchuk, Nataliya Yatskiv1/3/2025

arXiv:2501.00214v1 Announce Type: new Abstract: In this work, we propose an error-free, information-theoretically secure, asynchronous multi-valued validated Byzantine agreement (MVBA) protocol, called OciorMVBA. This protocol achieves MVBA consensus on a message $\boldsymbol{w}$ with expected $O(n |\boldsymbol{w}|\log n + n^2 \log q)$ communication bits, expected $O(n^2)$ messages, expected $O(\log n)$ rounds, and expected $O(\log n)$ common coins, under optimal resilience $n \geq 3t + 1$ in an $n$-node network, where up to $t$ nodes may be dishonest. Here, $q$ denotes the alphabet size of the error correction code used in the protocol. When error correction codes with a constant alphabet size (e.g., Expander Codes) are used, $q$ becomes a constant. An MVBA protocol that guarantees all required properties without relying on any cryptographic assumptions, such as signatures or hashing, except for the common coin assumption, is said to be information-theoretically secure (IT secure). Under the common coin assumption, an MVBA protocol that guarantees all required properties in all executions is said to be error-free. We also propose another error-free, IT-secure, asynchronous MVBA protocol, called OciorMVBArr. This protocol achieves MVBA consensus with expected $O(n |\boldsymbol{w}| + n^2 \log n)$ communication bits, expected $O(1)$ rounds, and expected $O(1)$ common coins, under a relaxed resilience (RR) of $n \geq 5t + 1$. Additionally, we propose a hash-based asynchronous MVBA protocol, called OciorMVBAh. This protocol achieves MVBA consensus with expected $O(n |\boldsymbol{w}| + n^3)$ bits, expected $O(1)$ rounds, and expected $O(1)$ common coins, under optimal resilience $n \geq 3t + 1$.

Jinyuan Chen1/3/2025

arXiv:2501.00230v1 Announce Type: new Abstract: This paper introduces FDSC, a private-protected subspace clustering (SC) approach with federated learning (FC) schema. In each client, there is a deep subspace clustering network accounting for grouping the isolated data, composed of a encode network, a self-expressive layer, and a decode network. FDSC is achieved by uploading the encode network to communicate with other clients in the server. Besides, FDSC is also enhanced by preserving the local neighborhood relationship in each client. With the effects of federated learning and locality preservation, the learned data features from the encoder are boosted so as to enhance the self-expressiveness learning and result in better clustering performance. Experiments test FDSC on public datasets and compare with other clustering methods, demonstrating the effectiveness of FDSC.

Yupei Zhang, Ruojia Feng, Yifei Wang, Xuequn Shang1/3/2025

arXiv:2501.00264v1 Announce Type: new Abstract: Wireless Sensor Networks (WSNs) continue to experience rapid developments and integration into modern-day applications. Overall, WSNs collect and process relevant data through sensors or nodes and communicate with different networks for superior information management. Nevertheless, a primary concern relative to WSNs is security. Considering the high constraints on throughput, battery, processing power, and memory, typical security procedures present limitations for application in WSNs. This research focuses on the integration of WSNs with the cloud platform, specifically to address these security risks. The cloud platform also adopts a security-driven approach and has attracted many applications across various sectors globally. This research specifically explores how cloud computing could be exploited to impede Denial of Service attacks from endangering WSNs. WSNs are now deployed in various low-powered applications, including disaster management, homeland security, battlefield surveillance, agriculture, and the healthcare industry. WSNs are distinguished from traditional networks by the numerous wireless connected sensors being deployed to conduct an assigned task. In testing scenarios, the size of WSNs ranges from a few to several thousand. The overarching requirements of WSNs include rapid processing of collected data, low-cost installation and maintenance, and low latency in network operations. Given that a substantial amount of WSN applications are used in high-risk and volatile environments, they must effectively address security concerns. This includes the secure movement, storage, and communication of data through networks, an environment in which WSNs are notably vulnerable. The limitations of WSNs have meant that they are predominantly used in unsecured applications despite positive advancements. This study explores methods for integrating the WSN with the cloud.

Syed Atif Ali, Salwa Din1/3/2025

arXiv:2501.00281v1 Announce Type: new Abstract: Noisy channels are valuable resources for cryptography, enabling information-theoretically secure protocols for cryptographic primitives like bit commitment and oblivious transfer. While existing work has primarily considered memoryless channels, we consider more flexible channel resources that a dishonest player can configure arbitrarily within some constraints on their min-entropy. We present a protocol for string commitment over such channels that is complete, hiding, and binding, and derive its achievable commitment rate, demonstrating the possibility of string commitment in noisy channels with a stronger adversarial model. The asymptotic commitment rate coincides with previous results when the adversarial channels are the same binary symmetric channel as in the honest case.

Jiawei Wu, Masahito Hayashi, Marco Tomamichel1/3/2025

arXiv:2501.00356v1 Announce Type: new Abstract: Malicious URL (Uniform Resource Locator) classification is a pivotal aspect of Cybersecurity, offering defense against web-based threats. Despite deep learning's promise in this area, its advancement is hindered by two main challenges: the scarcity of comprehensive, open-source datasets and the limitations of existing models, which either lack real-time capabilities or exhibit suboptimal performance. In order to address these gaps, we introduce a novel, multi-class dataset for malicious URL classification, distinguishing between benign, phishing and malicious URLs, named DeepURLBench. The data has been rigorously cleansed and structured, providing a superior alternative to existing datasets. Notably, the multi-class approach enhances the performance of deep learning models, as compared to a standard binary classification approach. Additionally, we propose improvements to string-based URL classifiers, applying these enhancements to URLNet. Key among these is the integration of DNS-derived features, which enrich the model's capabilities and lead to notable performance gains while preserving real-time runtime efficiency-achieving an effective balance for cybersecurity applications.

Ilan Schvartzman, Roei Sarussi, Maor Ashkenazi, Ido kringel, Yaniv Tocker, Tal Furman Shohet1/3/2025

arXiv:2501.00363v1 Announce Type: new Abstract: Privacy computing receives increasing attention but writing privacy computing code remains challenging for developers due to limited library functions that necessitate extensive function implementation from scratch as well as the data-oblivious requirement which contradicts intuitive thinking and usual practices of programmers. Large language models (LLMs) have demonstrated surprising capabilities in coding tasks and achieved state-of-the-art performance across many benchmarks. However, even with extensive prompting, existing LLMs struggle with code translation task for privacy computing, such as translating Python to MP-SPDZ, due to the scarcity of MP-SPDZ data required for effective pre-training or fine-tuning. To address the limitation, this paper proposes SPDZCoder, a rule-based framework to teach LLMs to synthesize privacy computing code without asking experts to write tons of code and by leveraging the instruction-following and in-context learning ability of LLMs. Specifically, SPDZCoder decouples the translation task into the refactoring stage and the generation stage, which can mitigate the semantic-expressing differences at different levels. In addition, SPDZCoder can further improve its performance by a feedback stage. SPDZCoder does not require fine-tuning since it adopts an in-context learning paradigm of LLMs. To evaluate SPDZCoder, we manually created a benchmark dataset, named SPDZEval, containing six classes of difficult tasks to implement in MP-SPDZ. We conduct experiments on SPDZEval and the experimental results shows that SPDZCoder achieves the state-of-the-art performance in pass@1 and pass@2 across six data splits. Specifically, SPDZCoder achieves an overall correctness of 85.94% and 92.01% in pass@1 and pass@2, respectively, significantly surpassing baselines (at most 30.35% and 49.84% in pass@1 and pass@2, respectively) by a large margin.

Xiaoning Dong, Peilin Xin, Wei Xu1/3/2025

arXiv:2501.00050v1 Announce Type: new Abstract: Network intrusion detection systems face significant challenges in identifying emerging attack patterns, especially when limited data samples are available. To address this, we propose a novel Multi-Space Prototypical Learning (MSPL) framework tailored for few-shot attack detection. The framework operates across multiple metric spaces-Euclidean, Cosine, Chebyshev, and Wasserstein distances-integrated through a constrained weighting scheme to enhance embedding robustness and improve pattern recognition. By leveraging Polyak-averaged prototype generation, the framework stabilizes the learning process and effectively adapts to rare and zero-day attacks. Additionally, an episodic training paradigm ensures balanced representation across diverse attack classes, enabling robust generalization. Experimental results on benchmark datasets demonstrate that MSPL outperforms traditional approaches in detecting low-profile and novel attack types, establishing it as a robust solution for zero-day attack detection.

Fernando Martinez-Lopez, Lesther Santana, Mohamed Rahouti1/3/2025

arXiv:2501.00055v1 Announce Type: new Abstract: While safety-aligned large language models (LLMs) are increasingly used as the cornerstone for powerful systems such as multi-agent frameworks to solve complex real-world problems, they still suffer from potential adversarial queries, such as jailbreak attacks, which attempt to induce harmful content. Researching attack methods allows us to better understand the limitations of LLM and make trade-offs between helpfulness and safety. However, existing jailbreak attacks are primarily based on opaque optimization techniques (e.g. token-level gradient descent) and heuristic search methods like LLM refinement, which fall short in terms of transparency, transferability, and computational cost. In light of these limitations, we draw inspiration from the evolution and infection processes of biological viruses and propose LLM-Virus, a jailbreak attack method based on evolutionary algorithm, termed evolutionary jailbreak. LLM-Virus treats jailbreak attacks as both an evolutionary and transfer learning problem, utilizing LLMs as heuristic evolutionary operators to ensure high attack efficiency, transferability, and low time cost. Our experimental results on multiple safety benchmarks show that LLM-Virus achieves competitive or even superior performance compared to existing attack methods.

Miao Yu, Junfeng Fang, Yingjie Zhou, Xing Fan, Kun Wang, Shirui Pan, Qingsong Wen1/3/2025

arXiv:2501.00085v1 Announce Type: new Abstract: Security-Enhanced Linux (SELinux) is a robust security mechanism that enforces mandatory access controls (MAC), but its policy language's complexity creates challenges for policy analysis and management. This research investigates the automation of SELinux policy analysis using graph-based techniques combined with machine learning approaches to detect policy anomalies. The study addresses two key questions: Can SELinux policy analysis be automated through graph analysis, and how do different anomaly detection models compare in analyzing SELinux policies? We will be comparing different machine learning models by evaluating their effectiveness in detecting policy violations and anomalies. Our approach utilizes Neo4j for graph representation of policies, with Node2vec transforming these graph structures into meaningful vector embeddings that can be processed by our machine learning models. In our results, the MLP Neural Network consistently demonstrated superior performance across different dataset sizes, achieving 95% accuracy with balanced precision and recall metrics, while both Random Forest and SVM models showed competitive but slightly lower performance in detecting policy violations. This combination of graph-based modeling and machine learning provides a more sophisticated and automated approach to understanding and analyzing complex SELinux policies compared to traditional manual analysis methods.

Krish Jain, Joann Sum, Pranav Kapoor, Amir Eaman1/3/2025

arXiv:2501.00175v1 Announce Type: new Abstract: Timestamps play a pivotal role in digital forensic event reconstruction, but due to their non-essential nature, tampering or manipulation of timestamps is possible by users in multiple ways, even on running systems. This has a significant effect on the reliability of the results from applying a timeline analysis as part of an investigation. In this paper, we investigate the problem of users tampering with timestamps on a running (``live'') system. While prior work has shown that digital evidence tampering is hard, we focus on the question of \emph{why} this is so. By performing a qualitative user study with advanced university students, we observe, for example, a commonly applied multi-step approach in order to deal with second-order traces (traces of traces). We also derive factors that influence the reliability of successful tampering, such as the individual knowledge about temporal traces, and technical restrictions to change them. These insights help to assess the reliability of timestamps from individual artifacts that are relied on for event reconstruction and subsequently reduce the risk of incorrect event reconstruction during investigations.

C\'eline Vanini, Jan Gruber, Christopher Hargreaves, Zinaida Benenson, Felix Freiling, Frank Breitinger1/3/2025

arXiv:2501.00200v1 Announce Type: new Abstract: Recently, cutting-plane methods such as GCP-CROWN have been explored to enhance neural network verifiers and made significant advances. However, GCP-CROWN currently relies on generic cutting planes (cuts) generated from external mixed integer programming (MIP) solvers. Due to the poor scalability of MIP solvers, large neural networks cannot benefit from these cutting planes. In this paper, we exploit the structure of the neural network verification problem to generate efficient and scalable cutting planes specific for this problem setting. We propose a novel approach, Branch-and-bound Inferred Cuts with COnstraint Strengthening (BICCOS), which leverages the logical relationships of neurons within verified subproblems in the branch-and-bound search tree, and we introduce cuts that preclude these relationships in other subproblems. We develop a mechanism that assigns influence scores to neurons in each path to allow the strengthening of these cuts. Furthermore, we design a multi-tree search technique to identify more cuts, effectively narrowing the search space and accelerating the BaB algorithm. Our results demonstrate that BICCOS can generate hundreds of useful cuts during the branch-and-bound process and consistently increase the number of verifiable instances compared to other state-of-the-art neural network verifiers on a wide range of benchmarks, including large networks that previous cutting plane methods could not scale to. BICCOS is part of the $\alpha,\beta$-CROWN verifier, the VNN-COMP 2024 winner. The code is available at http://github.com/Lemutisme/BICCOS .

Duo Zhou, Christopher Brix, Grani A Hanasusanto, Huan Zhang1/3/2025

arXiv:2501.00463v1 Announce Type: new Abstract: The proliferation of AI-generated images necessitates effective watermarking to protect intellectual property and identify fake content. While existing training-based watermarking methods show promise, they often struggle with generalization across diverse prompts and tend to produce noticeable artifacts. To this end, we introduce a provably generalizable image watermarking method for Latent Diffusion Models with Self-Augmented Training (SAT-LDM), which aligns the training and testing phases by a free generation distribution to bolster the watermarking module's generalization capabilities. We theoretically consolidate our method by proving that the free generation distribution contributes to its tight generalization bound without the need to collect new data. Extensive experimental results show that SAT-LDM achieves robust watermarking while significantly improving the quality of watermarked images across diverse prompts. Furthermore, we conduct experimental analyses to demonstrate the strong generalization abilities of SAT-LDM. We hope our method offers a practical and convenient solution for securing high-fidelity AI-generated content.

Lu Zhang, Liang Zeng1/3/2025