cs.CR

arXiv:2503.22413v1 Announce Type: new Abstract: The growing trend of legal disputes over the unauthorized use of data in machine learning (ML) systems highlights the urgent need for reliable data-use auditing mechanisms to ensure accountability and transparency in ML. In this paper, we present the first proactive instance-level data-use auditing method designed to enable data owners to audit the use of their individual data instances in ML models, providing more fine-grained auditing results. Our approach integrates any black-box membership inference technique with a sequential hypothesis test, providing a quantifiable and tunable false-detection rate. We evaluate our method on three types of visual ML models: image classifiers, visual encoders, and Contrastive Language-Image Pretraining (CLIP) models. In addition, we apply our method to evaluate the performance of two state-of-the-art approximate unlearning methods. Our findings reveal that neither method successfully removes the influence of the unlearned data instances from image classifiers and CLIP models, even when sacrificing model utility by $10.33\%$.

Zonghao Huang, Neil Zhenqiang Gong, Michael K. Reiter (3/31/2025)
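
For illustration only (not the paper's actual algorithm): the auditing procedure above combines any black-box membership inference signal with a sequential hypothesis test. A minimal sketch of such a test is Wald's sequential probability ratio test over per-instance membership flags, assuming hypothetical flag rates p0/p1 under the "not used"/"used" hypotheses; the thresholds encode the tunable false-detection rate alpha.

```python
import math

def sprt_data_use_audit(member_flags, p0=0.5, p1=0.8, alpha=0.01, beta=0.05):
    """Wald's sequential probability ratio test over per-instance
    membership-inference outcomes (1 = flagged as member, 0 = not).
    H0: the data was not used (flag rate p0); H1: it was used (flag rate p1).
    alpha bounds the false-detection rate, beta the missed-detection rate."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for flag in member_flags:
        llr += math.log(p1 / p0) if flag else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "used"
        if llr <= lower:
            return "not used"
    return "undecided"

# Hypothetical flags from any black-box membership inference attack on the
# data owner's marked instances; short sequences may remain undecided.
print(sprt_data_use_audit([1, 1, 0, 1, 1, 1, 1, 0, 1, 1]))
```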

arXiv:2307.10187v3 Announce Type: replace Abstract: For scalable machine learning on large data sets, subsampling a representative subset is a common approach for efficient model training. This is often achieved through importance sampling, whereby informative data points are sampled more frequently. In this paper, we examine the privacy properties of importance sampling, focusing on an individualized privacy analysis. We find that, in importance sampling, privacy is well aligned with utility but at odds with sample size. Based on this insight, we propose two approaches for constructing sampling distributions: one that optimizes the privacy-efficiency trade-off; and one based on a utility guarantee in the form of coresets. We evaluate both approaches empirically in terms of privacy, efficiency, and accuracy on the differentially private $k$-means problem. We observe that both approaches yield similar outcomes and consistently outperform uniform sampling across a wide range of data sets. Our code is available on GitHub: https://github.com/smair/personalized-privacy-amplification-via-importance-sampling

Dominik Fay, Sebastian Mair, Jens Sjölund (3/31/2025)
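
As background for the abstract above (not the paper's privacy-calibrated sampling distributions): a minimal sketch of importance subsampling with inverse-probability weights, so that weighted estimates such as a weighted k-means objective remain unbiased. The informativeness score used here is a hypothetical placeholder.

```python
import numpy as np

def importance_subsample(X, scores, m, seed=0):
    """Sample m points with probability proportional to `scores` and return
    them with inverse-probability weights 1/(m*p_i), keeping weighted
    estimators (e.g. a weighted k-means objective) unbiased."""
    rng = np.random.default_rng(seed)
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    return X[idx], 1.0 / (m * p[idx])

# Hypothetical usage: distance to the mean as a crude informativeness score.
X = np.random.default_rng(1).normal(size=(1000, 2))
scores = np.linalg.norm(X - X.mean(axis=0), axis=1) + 1e-8
sample, weights = importance_subsample(X, scores, m=100)
```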

arXiv:2503.22232v1 Announce Type: new Abstract: Traditional Neighbor Discovery (ND) and Secure Neighbor Discovery (SND) are key elements for network functionality. SND is a hard problem: it must satisfy not only typical security properties (authentication, integrity) but also verify direct communication, which involves distance estimation based on time measurements and device coordinates. Defeating relay attacks, also known as "wormholes", which create stealthy Byzantine links, significantly degrade communication, and grant adversarial control, is key in many wireless networked systems. However, SND is not concerned with privacy; it necessitates revealing the identity and location of the device(s) participating in the protocol execution. This can be a deterrent for deployment, especially when user-held devices are involved in emerging Internet of Things (IoT) enabled smart environments. To address this challenge, we present a novel Privacy-Preserving Secure Neighbor Discovery (PP-SND) protocol, enabling devices to perform SND without revealing their actual identities and locations, effectively decoupling discovery from the exposure of sensitive information. We use Homomorphic Encryption (HE) to compute device distances without revealing their actual coordinates, and employ pseudonymous device authentication to hide identities while preserving communication integrity. PP-SND provides SND [1] along with pseudonymity, confidentiality, and unlinkability. Our presentation here is not specific to one wireless technology; we assess the performance of the protocol (cryptographic overhead) on a Raspberry Pi 4 and provide a security and privacy analysis.

Ahmed Mohamed Hussain, Panos Papadimitratos (3/31/2025)
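
To make the homomorphic-distance step above concrete, here is a minimal sketch assuming an additively homomorphic scheme (Paillier, via the python-paillier `phe` package); the paper's actual HE choice, protocol messages, time measurements, and pseudonymous authentication are not reproduced.

```python
# Sketch: privately computing a squared Euclidean distance with additively
# homomorphic (Paillier) encryption. Only illustrates the distance step of a
# PP-SND-style exchange; the full protocol is not shown.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=2048)

# Device A encrypts its coordinates and their squares.
ax, ay = 12.5, 7.0
enc_a = {"x": pub.encrypt(ax), "y": pub.encrypt(ay),
         "x2": pub.encrypt(ax * ax), "y2": pub.encrypt(ay * ay)}

# Device B combines them with its own plaintext coordinates (bx, by):
# ||a - b||^2 = (a_x^2 - 2 a_x b_x + b_x^2) + (a_y^2 - 2 a_y b_y + b_y^2)
bx, by = 10.0, 4.0
enc_d2 = (enc_a["x2"] + enc_a["x"] * (-2 * bx) + bx * bx
          + enc_a["y2"] + enc_a["y"] * (-2 * by) + by * by)

# Only device A (the key holder) learns the squared distance.
print(priv.decrypt(enc_d2))   # 15.25
```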

arXiv:2503.22406v1 Announce Type: new Abstract: Typosquatting is a long-standing cyber threat that exploits human error in typing URLs to deceive users, distribute malware, and conduct phishing attacks. With the proliferation of domain names and new Top-Level Domains (TLDs), typosquatting techniques have grown more sophisticated, posing significant risks to individuals, businesses, and national cybersecurity infrastructure. Traditional detection methods primarily focus on well-known impersonation patterns, leaving gaps in identifying more complex attacks. This study introduces a novel approach leveraging large language models (LLMs) to enhance typosquatting detection. By training an LLM on character-level transformations and pattern-based heuristics rather than domain-specific data, we develop a more adaptable and resilient detection mechanism. Experimental results indicate that the Phi-4 14B model, when properly fine-tuned, outperformed other tested models, achieving a 98% accuracy rate with only a few thousand training samples. This research highlights the potential of LLMs in cybersecurity applications, specifically in mitigating domain-based deception tactics, and provides insights into optimizing machine learning strategies for threat detection.

Jackson Welch (3/31/2025)
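
A small, hypothetical sketch of the kind of character-level transformations mentioned above, usable to synthesize typosquat/original training pairs; the adjacency map and transformation set are illustrative, not the paper's heuristics.

```python
# Illustrative (partial) adjacent-key map; the paper's heuristics may differ.
ADJACENT = {"a": "qwsz", "e": "wrds", "i": "ujko", "o": "iklp", "m": "njk"}

def typo_variants(domain):
    """Generate candidate typosquats of a domain via character-level
    transformations on the second-level label: omission, repetition,
    transposition, and adjacent-key substitution."""
    name, dot, tld = domain.partition(".")
    variants = set()
    for i, ch in enumerate(name):
        variants.add(name[:i] + name[i + 1:])                  # omission
        variants.add(name[:i] + ch + name[i:])                 # repetition
        if i + 1 < len(name):                                  # transposition
            variants.add(name[:i] + name[i + 1] + ch + name[i + 2:])
        for sub in ADJACENT.get(ch, ""):                       # adjacent key
            variants.add(name[:i] + sub + name[i + 1:])
    variants.discard(name)
    return sorted(v + dot + tld for v in variants)

# Hypothetical training pairs (each variant labeled as a typosquat of the original).
print(typo_variants("example.com")[:5])
```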

arXiv:2503.22612v1 Announce Type: new Abstract: This study evaluates the adoption of DevSecOps among small and medium-sized enterprises (SMEs), identifying key challenges, best practices, and future trends. Through a mixed-methods approach backed by the Technology Acceptance Model (TAM) and Diffusion of Innovations (DOI) theory, we analyzed survey data from 405 SME professionals, revealing that while 68% have implemented DevSecOps, adoption is hindered by technical complexity (41%), resource constraints (35%), and cultural resistance (38%). Despite strong leadership prioritization of security (73%), automation gaps persist, with only 12% of organizations conducting security scans per commit. Our findings highlight a growing integration of security tools, particularly API security (63%) and software composition analysis (62%), although container security adoption remains low (34%). Looking ahead, SMEs anticipate that artificial intelligence and machine learning will significantly influence DevSecOps, underscoring the need for proactive adoption of AI-driven security enhancements. Based on our findings, this research proposes strategic best practices to enhance CI/CD pipeline security, including automation, leadership-driven security culture, and cross-team collaboration.

Jayaprakashreddy Cheenepalli, John D. Hastings, Khandaker Mamun Ahmed, Chad Fenner (3/31/2025)

arXiv:2211.09810v2 Announce Type: replace Abstract: The robustness of neural network classifiers is important in safety-critical domains and can be quantified by robustness verification. At present, efficient and scalable verification techniques are always sound but incomplete; thus, the improvement of verified robustness results is the key criterion for evaluating the performance of incomplete verification approaches. The multi-variate function MaxPool is widely adopted yet challenging to verify. In this paper, we present Ti-Lin, a robustness verifier for MaxPool-based CNNs with Tight Linear Approximation. Following the line of work on minimizing the over-approximation zone of the non-linear functions in CNNs, we are the first to propose provably neuron-wise tightest linear bounds for the MaxPool function. With these linear bounds, we can certify stronger robustness results for CNNs. We evaluate the effectiveness of Ti-Lin on different verification frameworks with open-sourced benchmarks, including LeNet, PointNet, and networks trained on the MNIST, CIFAR-10, Tiny ImageNet, and ModelNet40 datasets. Experimental results show that Ti-Lin significantly outperforms the state-of-the-art methods across all networks, with up to a 78.6% improvement in certified accuracy and almost the same time consumption as the fastest tool. Our code is available at https://github.com/xiaoyuanpigo/Ti-Lin-Hybrid-Lin.

Yuan Xiao, Yuchen Chen, Shiqing Ma, Chunrong Fang, Tongtong Bai, Mingzheng Gu, Yuxin Cheng, Yanwei Chen, Zhenyu Chen (3/31/2025)
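
For readers unfamiliar with MaxPool relaxations, the block below states the generic neuron-wise linear-bound constraints that such verifiers use, under interval bounds propagated from earlier layers; Ti-Lin's contribution is the provably tightest choice of coefficients, which is not reproduced here.

```latex
% Generic linear relaxation of a MaxPool neuron y = max(x_1,...,x_n),
% assuming interval bounds l_i <= x_i <= u_i from earlier layers.
% Ti-Lin's result is a provably neuron-wise tightest choice of the
% coefficients (a_i, b) and (a'_i, b'); the constraints themselves are standard.
\[
\sum_i a_i x_i + b \;\le\; \max_i x_i \;\le\; \sum_i a'_i x_i + b'
\qquad \forall\, x \in [l_1,u_1]\times\cdots\times[l_n,u_n],
\]
\[
\text{trivial examples:}\quad a_k = 1,\ b = 0 \text{ for } k=\arg\max_i l_i
\ \ \text{(lower bound)}, \qquad
a'_i = 0,\ b' = \max_i u_i \ \ \text{(upper bound)}.
\]
```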

arXiv:2503.22010v1 Announce Type: new Abstract: Self-Sovereign Identity (SSI) is an emerging paradigm for authentication and credential presentation that aims to give users control over their data and prevent any kind of tracking by (even trusted) third parties. In the European Union, the EUDI Digital Identity wallet is about to become a concrete implementation of this paradigm. However, a debate is still ongoing, partially reflecting aspects that are not yet consolidated in the scientific state of the art. Among these, an effective, efficient, and privacy-preserving implementation of verifiable credential revocation remains a subject of discussion. In this work-in-progress paper, we propose the basis of a novel method that customizes the use of anonymous hierarchical identity-based encryption to restrict the Verifier's access to the temporal authorizations granted by the Holder. This way, the Issuer cannot track the Holder's credential presentations, and the Verifier cannot check revocation information beyond what is permitted by the Holder.

Francesco Buccafurri, Carmen Licciardi (3/31/2025)

arXiv:2503.22161v1 Announce Type: new Abstract: Traffic analysis using machine learning and deep learning models has made significant progress over the past decades. These models address various tasks in network security and privacy, including detection of anomalies and attacks, countering censorship, etc. They also reveal privacy risks to users, as demonstrated by research on LLM token inference as well as fingerprinting (and counter-fingerprinting) of user-visited websites, IoT devices, and different applications. However, challenges remain in securing our networks from threats and attacks. After briefly reviewing the tasks and recent ML models in network security and privacy, we discuss the challenges that lie ahead.

Dinil Mon Divakaran (3/31/2025)

arXiv:2503.22330v1 Announce Type: new Abstract: Invisible watermarking is critical for content provenance and accountability in Generative AI. Although commercial companies have increasingly committed to using watermarks, the robustness of existing watermarking schemes against forgery attacks is understudied. This paper proposes DiffForge, the first watermark forgery framework capable of forging imperceptible watermarks under a no-box setting. We estimate the watermark distribution using an unconditional diffusion model and introduce shallow inversion to inject the watermark into a non-watermarked image seamlessly. This approach facilitates watermark injection while preserving image quality by adaptively selecting the depth of inversion steps, leveraging our key insight that watermarks degrade with added noise during the early diffusion phases. Comprehensive evaluations show that DiffForge deceives open-source watermark detectors with a 96.38% success rate and misleads a commercial watermark system with over 97% success rate, achieving high confidence. This work reveals fundamental security limitations in current watermarking paradigms.

Ziping Dong, Chao Shuai, Zhongjie Ba, Peng Cheng, Zhan Qin, Qinglong Wang, Kui Ren (3/31/2025)
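
The abstract's key insight is that watermarks degrade during the early (shallow) diffusion noising steps. As a hedged illustration of the noising half of that idea only, the sketch below applies the standard DDPM closed-form forward diffusion q(x_t | x_0) up to a shallow timestep; the actual forgery (denoising back with a diffusion model fit to watermarked images, and the adaptive depth selection) is not reproduced.

```python
import numpy as np

def shallow_noise(x0, t, betas, seed=0):
    """Forward-diffuse an image x0 to a shallow timestep t using the closed
    form q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).
    A DiffForge-style forgery would then run the *reverse* process of a
    diffusion model estimated on watermarked images from x_t back to x_0,
    so the denoised output carries the estimated watermark distribution."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]
    noise = np.random.default_rng(seed).standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

# Hypothetical schedule and depth; a small t keeps most image content intact.
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.zeros((3, 64, 64))          # stand-in for a non-watermarked image
x_t = shallow_noise(x0, t=50, betas=betas)
```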

arXiv:2503.22379v1 Announce Type: new Abstract: The task of $\textit{Differentially Private Text Rewriting}$ is a class of text privatization techniques in which (sensitive) input textual documents are $\textit{rewritten}$ under Differential Privacy (DP) guarantees. The motivation behind such methods is to hide both explicit and implicit identifiers that could be contained in text, while still retaining the semantic meaning of the original text, thus preserving utility. Recent years have seen an uptick in research output in this field, offering a diverse array of word-, sentence-, and document-level DP rewriting methods. Common to these methods is the selection of a privacy budget (i.e., the $\varepsilon$ parameter), which governs the degree to which a text is privatized. One major limitation of previous works, stemming directly from the unique structure of language itself, is the lack of consideration of $\textit{where}$ the privacy budget should be allocated, as not all aspects of language, and therefore text, are equally sensitive or personal. In this work, we are the first to address this shortcoming, asking the question of how a given privacy budget can be intelligently and sensibly distributed amongst a target document. We construct and evaluate a toolkit of linguistics- and NLP-based methods used to allocate a privacy budget to constituent tokens in a text document. In a series of privacy and utility experiments, we empirically demonstrate that given the same privacy budget, intelligent distribution leads to higher privacy levels and more positive trade-offs than a naive distribution of $\varepsilon$. Our work highlights the intricacies of text privatization with DP, and furthermore, it calls for further work on finding more efficient ways to maximize the privatization benefits offered by DP in text rewriting.

Stephen Meisenbacher, Chaeeun Joy Lee, Florian Matthes (3/31/2025)
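
A minimal sketch of the budget-allocation idea described above: split a document-level epsilon across tokens under sequential composition, here inversely proportional to a hypothetical per-token sensitivity score so that more sensitive tokens are privatized more strongly (recall that a smaller epsilon means more noise). The scoring and policy are illustrative, not the paper's toolkit.

```python
def allocate_budget(tokens, sensitivity, total_eps):
    """Split a document-level privacy budget over tokens so the per-token
    epsilons sum to total_eps (sequential composition). Here the share is
    inversely proportional to a sensitivity score: sensitive tokens get a
    smaller epsilon and are therefore rewritten with more noise."""
    inv = [1.0 / (s + 1e-8) for s in sensitivity]
    z = sum(inv)
    return [(tok, total_eps * w / z) for tok, w in zip(tokens, inv)]

# Hypothetical sensitivity scores (e.g. from NER or other linguistic signals).
tokens = ["John", "lives", "in", "Berlin"]
sensitivity = [0.9, 0.1, 0.05, 0.8]
for tok, eps in allocate_budget(tokens, sensitivity, total_eps=10.0):
    print(f"{tok}: eps = {eps:.2f}")
```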

arXiv:2503.22503v1 Announce Type: new Abstract: This paper evaluates the Audio Spectrogram Transformer (AST) architecture for synthesized speech detection, with a focus on generalization across modern voice generation technologies. Using differentiated augmentation strategies, the model achieves 0.91% EER overall when tested against ElevenLabs, NotebookLM, and Minimax AI voice generators. Notably, after training with only 102 samples from a single technology, the model demonstrates strong cross-technology generalization, achieving 3.3% EER on completely unseen voice generators. This work establishes benchmarks for rapid adaptation to emerging synthesis technologies and provides evidence that transformer-based architectures can identify common artifacts across different neural voice synthesis methods, contributing to more robust speech verification systems.

Andrew Ustinov, Matey Yordanov, Andrei Kuchma, Mikhail Bychkov (3/31/2025)
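
Since the abstract reports results as EER, here is a minimal sketch of how an equal error rate can be computed from detector scores with scikit-learn; the labels and scores below are hypothetical, not the paper's data.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """Equal Error Rate: the operating point where the false-acceptance rate
    equals the false-rejection rate. labels: 1 = synthesized, 0 = bona fide;
    scores: detector confidence that the audio is synthesized."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    i = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[i] + fnr[i]) / 2.0

# Hypothetical scores from an AST-based detector.
labels = [0, 0, 0, 1, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.2, 0.9, 0.7, 0.8, 0.4, 0.6]
print(f"EER = {equal_error_rate(labels, scores):.2%}")
```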

arXiv:2503.22573v1 Announce Type: new Abstract: The increasing integration of Artificial Intelligence across multiple industry sectors necessitates robust mechanisms for ensuring transparency, trust, and auditability of its development and deployment. This topic is particularly important in light of recent calls in various jurisdictions to introduce regulation and legislation on AI safety. In this paper, we propose a framework for complete verifiable AI pipelines, identifying key components and analyzing existing cryptographic approaches that contribute to verifiability across different stages of the AI lifecycle, from data sourcing to training, inference, and unlearning. This framework could be used to combat misinformation by providing cryptographic proofs alongside AI-generated assets to allow downstream verification of their provenance and correctness. Our findings underscore the importance of ongoing research to develop cryptographic tools that are not only efficient for isolated AI processes, but efficiently 'linkable' across different processes within the AI pipeline, to support the development of end-to-end verifiable AI technologies.

Kar Balan, Robert Learney, Tim Wood (3/31/2025)

arXiv:2503.21535v1 Announce Type: cross Abstract: The Deligne-Ogus-Shioda theorem guarantees the existence of isomorphisms between products of supersingular elliptic curves over finite fields. In this paper, we present methods for explicitly computing these isomorphisms in polynomial time, given the endomorphism rings of the curves involved. Our approach leverages the Deuring correspondence, enabling us to reformulate computational isogeny problems into algebraic problems in quaternions. Specifically, we reduce the computation of isomorphisms to solving systems of quadratic and linear equations over the integers derived from norm equations. We develop $\ell$-adic techniques for solving these equations when we have access to a low discriminant subring. Combining these results leads to the description of an efficient probabilistic Las Vegas algorithm for computing the desired isomorphisms. Under GRH, it is proved to run in expected polynomial time.

Pierrick Gaudry, Julien Soumier, Pierre-Jean Spaenlehauer (3/31/2025)
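
As a pointer to what a "norm equation" looks like in this setting, the block below recalls the standard reduced norm of a quaternion; the specific systems of quadratic and linear equations the paper derives, and their ℓ-adic treatment, are not reproduced here.

```latex
% Reduced norm in a quaternion algebra B = (a, b \mid \mathbb{Q}) with
% i^2 = a,\; j^2 = b,\; ij = -ji. A norm equation asks for elements of a
% given order of prescribed reduced norm N; this quadratic form is the
% source of the quadratic equations over the integers mentioned above.
\[
\operatorname{Nrd}(t + x\,i + y\,j + z\,ij) \;=\; t^2 - a\,x^2 - b\,y^2 + ab\,z^2,
\qquad
\operatorname{Nrd}(\alpha) = N,\quad \alpha \in \mathcal{O}.
\]
```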

arXiv:2503.22016v1 Announce Type: cross Abstract: We show how to construct simulation secure one-time memories, and thus one-time programs, without computational assumptions in the presence of constraints on quantum hardware. Specifically, we build one-time memories from random linear codes and quantum random access codes (QRACs) when constrained to non-adaptive, constant-depth, $D$-dimensional geometrically-local quantum circuits for some constant $D$. We place no restrictions on the adversary's classical computational power, the number of qubits it can use, or the coherence time of its qubits. Notably, our construction can still be secure even in the presence of fault-tolerant quantum computation, as long as the input qubits are encoded in a non-fault-tolerant manner (e.g., encoded as high-energy states in non-ideal hardware). Unfortunately, though, our construction requires decoding random linear codes and thus does not run in polynomial time. We leave open the question of whether one can construct a polynomial-time, information-theoretically secure one-time memory from geometrically local quantum circuits. Of potentially independent interest, we develop a progress bound for information leakage via collision entropy (Rényi entropy of order $2$), along with a few key technical lemmas for a "mutual information" analogue for collision entropies. We also develop new bounds on how much information a specific $2 \mapsto 1$ QRAC can leak about its input, which may be of independent interest as well.

Lev Stambler (3/31/2025)
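
For reference, the block below recalls the standard definition of collision entropy (Rényi entropy of order 2) used in the abstract's leakage progress bound; the bound itself and the paper's "mutual information" analogue are not reproduced.

```latex
% Collision entropy of a random variable X, with X' an independent copy of X.
\[
H_2(X) \;=\; -\log \sum_{x} \Pr[X = x]^2 \;=\; -\log \Pr[X = X'],
\qquad
H_\infty(X) \;\le\; H_2(X) \;\le\; H(X).
\]
```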

arXiv:2503.21824v1 Announce Type: new Abstract: Recently, video-based large language models (video-based LLMs) have achieved impressive performance across various video comprehension tasks. However, this rapid advancement raises significant privacy and security concerns, particularly regarding the unauthorized use of personal video data in automated annotation by video-based LLMs. These unauthorized annotated video-text pairs can then be used to improve the performance of downstream tasks, such as text-to-video generation. To safeguard personal videos from unauthorized use, we propose two series of protective video watermarks with imperceptible adversarial perturbations, named Ramblings and Mutes. Concretely, Ramblings aim to mislead video-based LLMs into generating inaccurate captions for the videos, thereby degrading the quality of video annotations through inconsistencies between video content and captions. Mutes, on the other hand, are designed to prompt video-based LLMs to produce exceptionally brief captions, lacking descriptive detail. Extensive experiments demonstrate that our video watermarking methods effectively protect video data by significantly reducing video annotation performance across various video-based LLMs, showcasing both stealthiness and robustness in protecting personal video content. Our code is available at https://github.com/ttthhl/Protecting_Your_Video_Content.

Haitong Liu, Kuofeng Gao, Yang Bai, Jinmin Li, Jinxiao Shan, Tao Dai, Shu-Tao Xia (3/31/2025)
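
A generic sketch (not the paper's method; their code is linked in the abstract) of how an imperceptible protective perturbation could be optimized with projected gradient descent in PyTorch, given some hypothetical differentiable surrogate `caption_loss`: e.g. similarity to the clean caption (for a "Ramblings"-style effect) or expected caption length (for a "Mutes"-style effect).

```python
import torch

def pgd_video_watermark(frames, caption_loss, eps=8 / 255, alpha=1 / 255, steps=50):
    """Generic L_inf-bounded PGD loop for an imperceptible protective
    perturbation. `caption_loss(frames)` is a hypothetical differentiable
    surrogate of caption quality; minimizing it degrades the video-based
    LLM's annotations. The paper's actual objectives differ."""
    delta = torch.zeros_like(frames, requires_grad=True)
    for _ in range(steps):
        loss = caption_loss(frames + delta)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()                       # descend on the surrogate
            delta.clamp_(-eps, eps)                                  # imperceptibility budget
            delta.copy_(torch.clamp(frames + delta, 0, 1) - frames)  # keep pixels valid
        delta.grad.zero_()
    return (frames + delta).clamp(0, 1).detach()
```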

arXiv:2503.21891v1 Announce Type: new Abstract: Timekeeping is a fundamental component of modern computing; however, the security of system time remains an overlooked attack surface, leaving critical systems vulnerable to manipulation.

Muhammad Abdullah Soomro, Adeel Nasrullah, Fatima Muhammad Anwar (3/31/2025)

arXiv:2503.22065v1 Announce Type: new Abstract: Recent Intrusion Detection System (IDS) research has increasingly moved towards the adoption of machine learning methods. However, most of these systems rely on supervised learning approaches, necessitating a fully labeled training set. In the realm of network intrusion detection, the requirement for extensive labeling can become impractically burdensome. Moreover, while IDS training could benefit from inter-company knowledge sharing, the sensitive nature of cybersecurity data often precludes such cooperation. To address these challenges, we propose an IDS architecture that utilizes unsupervised learning to reduce the need for labeling. We further facilitate collaborative learning through the implementation of a federated learning framework. To enhance privacy beyond what current federated clustering models offer, we introduce an innovative federated K-means++ initialization technique. Our findings indicate that transitioning from a centralized to a federated setup does not significantly diminish performance.

Maxime Gourceyraud, Rim Ben Salem, Christopher Neal, Frédéric Cuppens, Nora Boulahia Cuppens (3/31/2025)
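
For context on the initialization the abstract builds on: a minimal sketch of standard, centralized k-means++ seeding (D² sampling). The paper's federated, privacy-enhanced variant of this step is not reproduced here.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """Standard (centralized) k-means++ seeding: pick the first center
    uniformly at random, then pick each subsequent center with probability
    proportional to the squared distance to the nearest center chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.stack(centers)

# Hypothetical usage on network-flow feature vectors.
X = np.random.default_rng(1).normal(size=(500, 8))
init_centers = kmeans_pp_init(X, k=5)
```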

arXiv:2503.22156v1 Announce Type: new Abstract: Cryptocurrency is a novel form of currency that realizes a decentralized electronic payment scheme based on blockchain technology and cryptographic theory. While cryptocurrency has the security characteristics of being distributed and tamper-proof, increasing market demand has led to a rise in malicious transactions and attacks, thereby exposing cryptocurrency to vulnerabilities, privacy issues, and security threats. Particularly concerning are the emerging types of attacks and threats, which have made securing cryptocurrency increasingly urgent. Therefore, this paper classifies existing cryptocurrency security threats and attacks into five fundamental categories based on the blockchain infrastructure and analyzes in detail the vulnerability principles exploited by each type of threat and attack. Additionally, the paper examines attackers' logic and methods and successfully reproduces the vulnerabilities. Furthermore, the paper summarizes and evaluates existing detection and defense solutions, all of which provide important references for ensuring the security of cryptocurrency. Finally, the paper discusses the future development trends of cryptocurrency, as well as the public challenges it may face.

Zekai Liu, Xiaoqi Li (3/31/2025)

arXiv:2503.22227v1 Announce Type: new Abstract: We introduce CAT, an open-source GPU-accelerated fully homomorphic encryption (FHE) framework that surpasses existing solutions in functionality and efficiency. CAT features a three-layer architecture: a foundation of core math, a bridge of pre-computed elements and combined operations, and an API-accessible layer of FHE operators. It utilizes techniques such as parallel execution of operations, well-defined layout patterns for cipher data, kernel fusion/segmentation, and dual GPU pools to enhance overall execution efficiency. In addition, a memory management mechanism ensures server-side suitability and prevents data leakage. Based on our framework, we implement three widely used FHE schemes: CKKS, BFV, and BGV. The results show that our implementation on an Nvidia 4090 can achieve up to a 2173$\times$ speedup over a CPU implementation and 1.25$\times$ over state-of-the-art GPU acceleration work for specific operations. Moreover, we offer a scenario validation with CKKS-based Privacy Database Queries, achieving a 33$\times$ speedup over its CPU counterpart. All query tasks can handle datasets of up to $10^3$ rows on a single GPU within 1 second, using 2-5 GB of storage. Our implementation has undergone extensive stability testing and can be easily deployed on commercial GPUs. We hope that our work will significantly advance the integration of state-of-the-art FHE algorithms into diverse real-world systems by providing a robust, industry-ready, open-source tool.

Qirui Li, Rui Zong (3/31/2025)

arXiv:2406.08198v3 Announce Type: replace Abstract: Many software products are composed of components integrated from other teams or external parties. Each additional link in a software product's supply chain increases the risk of the injection of malicious behavior. To improve supply chain provenance, many cybersecurity frameworks, standards, and regulations recommend the use of software signing. However, recent surveys and measurement studies have found that the adoption rate and quality of software signatures are low. We lack in-depth industry perspectives on the challenges and practices of software signing. To understand software signing in practice, we interviewed 18 experienced security practitioners across 13 organizations. We study the challenges that affect the effective implementation of software signing in practice. We also describe the possible impacts of experienced software supply chain failures, security standards, and regulations on software signing adoption. To summarize our findings: (1) we present a refined model of the software supply chain factory model, highlighting practitioners' signing practices; (2) we highlight the different challenges (technical, organizational, and human) that hamper software signing implementation; (3) we report that experts disagree on the importance of signing; and (4) we describe how internal and external events affect the adoption of software signing. Our work describes the considerations for adopting software signing as one aspect of the broader goal of improved software supply chain security.

Kelechi G. Kalu, Tanya Singla, Chinenye Okafor, Santiago Torres-Arias, James C. Davis (3/31/2025)