cs.SI

152 posts

arXiv:2408.16629v2 Announce Type: replace Abstract: Generating social networks is essential for many applications, such as epidemic modeling and social simulations. The emergence of generative AI, especially large language models (LLMs), offers new possibilities for social network generation: LLMs can generate networks without additional training or the need to define network parameters, and users can flexibly define individuals in the network using natural language. However, this potential raises two critical questions: 1) are the social networks generated by LLMs realistic, and 2) what are the risks of bias, given the importance of demographics in forming social ties? To answer these questions, we develop three prompting methods for network generation and compare the generated networks to a suite of real social networks. We find that more realistic networks are generated with "local" methods, where the LLM constructs relations for one persona at a time, compared to "global" methods that construct the entire network at once. We also find that the generated networks match real networks on many characteristics, including density, clustering, connectivity, and degree distribution. However, we find that LLMs emphasize political homophily over all other types of homophily and significantly overestimate political homophily compared to real social networks.

Serina Chang, Alicja Chaszczewicz, Emma Wang, Maya Josifovska, Emma Pierson, Jure Leskovec · 3/31/2025

arXiv:2503.19316v3 Announce Type: replace Abstract: Understanding the evolution of public opinion is crucial for informed decision-making in various domains, particularly public affairs. The rapid growth of social networks, such as Twitter (now rebranded as X), provides an unprecedented opportunity to analyze public opinion at scale without relying on traditional surveys. With the rise of deep learning, Graph Neural Networks (GNNs) have shown great promise in modeling online opinion dynamics. Notably, classical opinion dynamics models, such as DeGroot, can be reformulated within a GNN framework. We introduce Latent Social Dynamical System (LSDS), a novel framework for modeling the latent dynamics of social media users' opinions based on textual content. Since expressed opinions may not fully reflect underlying beliefs, LSDS first encodes post content into latent representations. It then leverages a GraphODE framework, using a GNN-based ODE function to predict future opinions. A decoder subsequently utilizes these predicted latent opinions to perform downstream tasks, such as interaction prediction, which serve as benchmarks for model evaluation. Our framework is highly flexible, supporting various opinion dynamic models as ODE functions, provided they can be adapted into a GNN-based form. It also accommodates different encoder architectures and is compatible with diverse downstream tasks. To validate our approach, we constructed dynamic datasets from Twitter data. Experimental results demonstrate the effectiveness of LSDS, highlighting its potential for future applications. We plan to publicly release our dataset and code upon the publication of this paper.

Zhiping Xiao, Xinyu Wang, Yifang Qin, Zijie Huang, Mason A. Porter, Yizhou Sun · 3/31/2025
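The LSDS abstract above refers to the classical DeGroot opinion dynamics model. As a rough illustration only (this is not the LSDS framework or its GNN reformulation), DeGroot averaging replaces each agent's opinion with a weighted mean of all agents' opinions, x(t+1) = W x(t), where W is row-stochastic. The weight matrix `W`, the starting opinions `x0`, and the function names below are hypothetical values chosen for the sketch.

```python
def degroot_step(weights, opinions):
    """One DeGroot update: each agent takes a weighted average of all opinions."""
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def degroot_run(weights, opinions, steps):
    """Apply the DeGroot update `steps` times and return the final opinions."""
    for _ in range(steps):
        opinions = degroot_step(weights, opinions)
    return opinions


# Hypothetical 3-agent network; every row of W sums to 1 (row-stochastic).
W = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
x0 = [0.0, 0.5, 1.0]  # made-up initial opinions on a 0..1 scale

final = degroot_run(W, x0, steps=50)
print(final)
```

For this particular connected, aperiodic weight matrix, repeated averaging drives all three agents toward a common value, which is the consensus behavior the DeGroot model is known for.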

arXiv:2503.22264v1 Announce Type: new Abstract: Typically, for analysing and modelling social phenomena, networks are a convenient framework that allows for the representation of the interconnectivity of individuals. These networks are often considered transmission structures for processes that happen in society, e.g. diffusion of information, epidemics, and spread of influence. However, constructing a network can be challenging, as one needs to choose its type and parameters accurately. As a result, the outcomes of analysing dynamic processes often heavily depend on whether this step was done correctly. In this work, we advocate that it might be more beneficial to step down from the tedious process of building a network and base it on the level of the interactions instead. By taking this perspective, we can be closer to reality, and from the cognitive perspective, human beings are directly exposed to events, not networks. However, we can also draw a parallel to stream data mining, which brings a valuable apparatus for stream processing. Apart from taking the interaction stream perspective as a typical way in which we should study social phenomena, this work advocates that it is possible to map the concepts embodied in human nature and cognitive processes to the ones that occur in interaction streams. Exploiting this mapping can help reduce the diversity of problems that one can find in data stream processing for machine learning problems. Finally, we demonstrate one of the use cases in which the interaction stream perspective can be applied, namely, the social learning process.

Damian Serwata, Mateusz Nurek, Radoslaw Michalski · 3/31/2025

arXiv:2503.21983v1 Announce Type: new Abstract: As artificial intelligence (AI) assistants become more widely adopted in safety-critical domains, it becomes important to develop safeguards against potential failures or adversarial attacks. A key prerequisite to developing these safeguards is understanding the ability of these AI assistants to mislead human teammates. We investigate this attack problem within the context of an intellective strategy game where a team of three humans and one AI assistant collaborate to answer a series of trivia questions. Unbeknownst to the humans, the AI assistant is adversarial. Leveraging techniques from Model-Based Reinforcement Learning (MBRL), the AI assistant learns a model of the humans' trust evolution and uses that model to manipulate the group decision-making process to harm the team. We evaluate two models -- one inspired by literature and the other data-driven -- and find that both can effectively harm the human team. Moreover, we find that in this setting our data-driven model is capable of accurately predicting how human agents appraise their teammates given limited information on prior interactions. Finally, we compare the performance of state-of-the-art LLM models to human agents on our influence allocation task to evaluate whether the LLMs allocate influence similarly to humans or if they are more robust to our attack. These results enhance our understanding of decision-making dynamics in small human-AI teams and lay the foundation for defense strategies.

Abed Kareem Musaffar, Anand Gokhale, Sirui Zeng, Rasta Tadayon, Xifeng Yan, Ambuj Singh, Francesco Bullo · 3/31/2025

arXiv:2503.22584v1 Announce Type: new Abstract: Researchers are no longer limited to producing knowledge; in today's complex world, they also address societal challenges by engaging in policymaking. Although involvement in policymaking has expanded, direct empirical evidence of its career benefits remains underexplored. Prior survey-based studies suggest potential advantages -- such as broader professional networks and enhanced opportunities -- yet raise concerns about insufficient institutional support. Here, we examine the 2021 WHO global air quality guideline -- a science-based regulatory guideline -- as a case study. To evaluate the impact of guideline development on research outcomes, we match guideline researchers with a control group of peers sharing similar research topics and prior performance. Our analysis reveals that guideline researchers attain higher future citation counts in both academic and policy domains. New collaborations formed during development yield publications with higher citation impact and the disruptive index. Moreover, about half the guideline's references are derived from guideline researchers' papers, highlighting their central role in shaping the evidence base. These results provide empirical support for the career benefits of policy engagement. Our findings indicate that engaging in international guideline development offers tangible career incentives for researchers, and that institutions can enhance research impact and promote innovative scientific progress by actively supporting their researchers' participation in such initiatives.

Yuta Tomokiyo, Keita Nishimoto, Kimitaka Asatani, Ichiro Sakata · 3/31/2025

arXiv:2503.00599v2 Announce Type: replace Abstract: Social media users and inauthentic accounts, such as bots, may coordinate in promoting their topics. Such topics may give the impression that they are organically popular among the public, even though they are astroturfing campaigns that are centrally managed. It is challenging to predict if a topic is organic or a coordinated campaign due to the lack of reliable ground truth. In this paper, we create such ground truth by detecting the campaigns promoted by ephemeral astroturfing attacks. These attacks push any topic to Twitter's (X) trends list by employing bots that tweet in a coordinated manner in a short period and then immediately delete their tweets. We manually curate a dataset of organic Twitter trends. We then create engagement networks out of these datasets, which can serve as a challenging testbed for the graph classification task of distinguishing between campaigns and organic trends. Engagement networks consist of users as nodes and engagements as edges (retweets, replies, and quotes) between users. We release the engagement networks for 179 campaigns and 135 non-campaigns, and also provide finer-grained labels to characterize the type of the campaigns and non-campaigns. Our dataset, LEN (Large Engagement Networks), is available at the URL below. In comparison to traditional graph classification datasets, which are small with tens of nodes and hundreds of edges at most, graphs in LEN are larger. The average graph in LEN has ~11K nodes and ~23K edges. We show that state-of-the-art GNN methods give only mediocre results for campaign vs. non-campaign and campaign type classification on LEN. LEN offers a unique and challenging playfield for the graph classification problem. We believe that LEN will help advance the frontiers of graph classification techniques on large networks and also provide an interesting use case in terms of distinguishing coordinated campaigns and organic trends.

Atul Anand Gopalakrishnan, Jakir Hossain, Tugrulcan Elmas, Ahmet Erdem Sariyuce · 3/31/2025

arXiv:2503.22049v1 Announce Type: new Abstract: Next Point-of-Interest (POI) recommendation aims to predict users' next locations by leveraging historical check-in sequences. Although existing methods have shown promising results, they often struggle to capture complex high-order relationships and effectively adapt to diverse user behaviors, particularly when addressing the cold-start issue. To address these challenges, we propose Hypergraph-enhanced Meta-learning Adaptive Network (HyperMAN), a novel framework that integrates heterogeneous hypergraph modeling with a difficulty-aware meta-learning mechanism for next POI recommendation. Specifically, three types of heterogeneous hyperedges are designed to capture high-order relationships: user visit behaviors at specific times (temporal behavioral hyperedge), spatial correlations among POIs (spatial functional hyperedge), and user long-term preferences (user preference hyperedge). Furthermore, a diversity-aware meta-learning mechanism is introduced to dynamically adjust learning strategies, considering users' behavioral diversity. Extensive experiments on real-world datasets demonstrate that HyperMAN achieves superior performance, effectively addressing cold-start challenges and significantly enhancing recommendation accuracy.

Jinze Wang, Tiehua Zhang, Lu Zhang, Yang Bai, Xin Li, Jiong Jin · 3/31/2025

arXiv:2503.21953v1 Announce Type: new Abstract: The paper examines social media content to measure and model risk behavior in natural emergencies from an appraisal theory perspective. We calculate individual risk behavior quotients and relate them to individual and peer emotional and actionable cognitive responses for 774 individual Twitter users affected by Hurricane Sandy's landfall. We employ vector analysis to compute risk behavior quotients. By utilizing geographic information associated with the tweets, both implicitly and explicitly, we track each user's path and determine the average vector of their movement. The risk quotient is obtained by comparing risk exposure at the origin and destination of the average vector. We assess risk exposure for each zone in the study area by combining pre-hurricane evacuation plans with post-event flooding data, as reported by the National Weather Service. By using the emotional and actionable content of the tweets as predictors for risk, we found that sharing actionable information relates to slightly higher risk exposure. At the same time, overall, the subjects tended to move away from the riskiest areas of the storm. Finally, individuals surrounded by more peers are less likely to be affected, while those surrounded by more tweeting activity are more likely to be risk-prone.

Sorin Adam Matei, Rajesh Kalyanam · 3/31/2025

arXiv:2503.22191v1 Announce Type: new Abstract: The outbreak of a pandemic, such as COVID-19, causes major health crises worldwide. Typical measures to contain the rapid spread usually include effective vaccination and strict interventions (Nature Human Behaviour, 2021). Motivated by such circumstances, we study the problem of limiting the spread of a disease over a social network system. In their seminal work (KDD 2003), Kempe, Kleinberg, and Tardos introduced two fundamental diffusion models, the linear threshold and independent cascade, for the influence maximization problem. In this work, we adopt these models in the context of disease spreading and study effective vaccination mechanisms. Our broad goal is to limit the spread of a disease in human networks using only a limited number of vaccines. However, unlike the influence maximization problem, which typically does not require spatial awareness, disease spreading occurs in spatially structured population networks. Thus, standard Erdos-Renyi graphs do not adequately capture such networks. To address this, we study networks modeled as generalized random geometric graphs, introduced in the seminal work of Waxman (IEEE J. Sel. Areas Commun. 1988). We show that for disease spreading, the optimization function is neither submodular nor supermodular, in contrast to influence maximization, where the function is submodular. Despite this intractability, we develop novel algorithms leveraging local search and greedy techniques, which perform exceptionally well in practice. We compare them against an exact ILP-based approach to further demonstrate their robustness. Moreover, we introduce an iterative rounding mechanism for the relaxed LP formulation. Overall, our methods establish tight trade-offs between efficiency and approximation loss.

Gargi Bakshi, Sujoy Bhore, Suraj Shetiya · 3/31/2025
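The abstract above adapts the independent cascade model to disease spreading under a vaccination budget. The following is a minimal sketch of that diffusion model only, assuming vaccinated nodes simply cannot be infected; it is not the authors' local-search, greedy, or ILP algorithm. The function name, the adjacency-dict graph format, the uniform transmission probability `p`, and the toy path graph are all assumptions made for illustration.

```python
import random


def independent_cascade(graph, seeds, p, vaccinated=frozenset(), rng=None):
    """Simulate one independent cascade run and return the set of infected nodes.

    graph      -- dict mapping each node to an iterable of its neighbours
    seeds      -- initially infected nodes
    p          -- probability that an infected node infects a given neighbour
    vaccinated -- nodes assumed to be immune (illustrative simplification)
    """
    rng = rng or random.Random(0)
    infected = set(seeds) - set(vaccinated)
    frontier = list(infected)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly infected node gets exactly one chance per neighbour.
            for v in graph.get(u, ()):
                if v in infected or v in vaccinated:
                    continue
                if rng.random() < p:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected


# Hypothetical path graph 0 - 1 - 2 - 3 with the outbreak starting at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

unprotected = independent_cascade(path, seeds=[0], p=1.0)
protected = independent_cascade(path, seeds=[0], p=1.0, vaccinated={2})
print(len(unprotected), len(protected))
```

With certain transmission (`p=1.0`) the outcome is deterministic, so the effect of vaccinating a single cut node on the path is easy to see; for `p` strictly between 0 and 1 one would average over many runs.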

arXiv:2503.22066v1 Announce Type: new Abstract: Open-source software communities thrive on global collaboration and contributions from diverse participants. This study explores the Rust programming language ecosystem to understand its contributors' demographic composition and interaction patterns. Our objective is to investigate the phenomenon of participation inequality in key Rust projects and the presence of diversity among them. We studied GitHub pull request data from the year leading up to the release of the latest completed Rust community annual survey in 2023. Specifically, we extracted information from three leading repositories: Rust, Rust Analyzer, and Cargo, and used social network graphs to visualize the interactions and identify central contributors and sub-communities. Social network analysis has shown concerning disparities in gender and geographic representation among contributors who play pivotal roles in collaboration networks and the presence of varying diversity levels in the sub-communities formed. These results suggest that while the Rust community is globally active, the contributor base does not fully reflect the diversity of the wider user community. We conclude that there is a need for more inclusive practices to encourage broader participation and ensure that the contributor base aligns more closely with the diverse global community that utilizes Rust.

Rohit Dandamudi, Ifeoma Adaji, Gema Rodríguez-Pérez · 3/31/2025

arXiv:2503.09811v1 Announce Type: new Abstract: Understanding the mechanisms driving the distribution of scientific citations is a key challenge in assessing the scientific impact of authors. We investigate the influence of the preferential attachment rule (PAR) in this process by analyzing individual citation events from the DBLP dataset, enabling us to estimate the probability of citations being assigned preferentially. Our findings reveal that, for the aggregated dataset, PAR dominates the citation distribution process, with approximately 70% of citations adhering to this mechanism. However, analysis at the individual level shows significant variability, with some authors experiencing a greater prevalence of preferential citations, particularly in the context of external citations. In contrast, self-citations exhibit notably different behaviour, with only 20% following PAR. We also demonstrate that the prominence of PAR increases with an author's citability (average citations per paper), suggesting that more citable authors are preferentially cited, while less-cited authors experience more random citation patterns. Furthermore, we show that self-citations may influence bibliometric indexes. Our results emphasise the distinct dynamics of self-citations compared to external citations, raising questions about the mechanisms driving self-citation patterns. These findings provide new insights into citation behaviours and highlight the limitations of existing approaches in capturing the nuances of scientific impact.

Maciej J. Mrowinski, Aleksandra Buczek, Agata Fronczak · 3/14/2025

arXiv:2503.09626v1 Announce Type: new Abstract: Social bot detection is crucial for mitigating misinformation, online manipulation, and coordinated inauthentic behavior. While existing neural network-based detectors perform well on benchmarks, they struggle with generalization due to distribution shifts across datasets and frequently produce overconfident predictions for out-of-distribution accounts beyond the training data. To address this, we introduce a novel Uncertainty Estimation for Social Bot Detection (UESBD) framework, which quantifies the predictive uncertainty of detectors beyond mere classification. For this task, we propose Robust Multi-modal Neural Processes (RMNP), which aims to enhance the robustness of multi-modal neural processes to modality inconsistencies caused by social bot camouflage. RMNP first learns unimodal representations through modality-specific encoders. Then, unimodal attentive neural processes are employed to encode the Gaussian distribution of unimodal latent variables. Furthermore, to counter social bots that steal human features to camouflage themselves, which causes certain modalities to provide conflicting information, we introduce an evidential gating network to explicitly model the reliability of modalities. The joint latent distribution is learned through the generalized product of experts, which takes the reliability of each modality into consideration during fusion. The final prediction is obtained through Monte Carlo sampling of the joint latent distribution followed by a decoder. Experiments on three real-world benchmarks show the effectiveness of RMNP in classification and uncertainty estimation, as well as its robustness to modality conflicts.

Qi Wu, Yingguang Yang, Hao Liu, Hao Peng, Buyun He, Yutong Xia, Yong Liao · 3/14/2025

arXiv:2503.10458v1 Announce Type: new Abstract: Social media platforms have been accused of causing a range of harms, resulting in dozens of lawsuits across jurisdictions. These lawsuits are situated within the context of a long history of American product safety litigation, suggesting opportunities for remediation outside of financial compensation. Anticipating that at least some of these cases may be successful and/or lead to settlements, this article outlines an implementable mechanism for an abatement and/or settlement plan capable of mitigating abuse. The paper describes the requirements of such a mechanism, implications for privacy and oversight, and tradeoffs that such a procedure would entail. The mechanism is framed to operate at the intersection of legal procedure, standards for transparent public health assessment, and the practical requirements of modern technology products.

Nathaniel Lubin, Yuning Liu, Amanda Yarnell, S. Bryn Austin, Zachary J. Ward, Ravi Iyer, Jonathan Stray, Matthew Lawrence, Alissa Cooper, Peter Chapman · 3/14/2025

arXiv:2503.09725v1 Announce Type: new Abstract: Avian Influenza Virus (AIV) poses significant threats to the poultry industry, humans, domestic animals, and wildlife health worldwide. Monitoring this infectious disease is important for rapid and effective response to potential outbreaks. Conventional avian influenza surveillance systems have exhibited limitations in providing timely alerts for potential outbreaks. This study aimed to examine the idea of using online activity on social media and Google searches to improve the identification of AIV in the early stage of an outbreak in a region. To evaluate the feasibility of this approach, we collected historical data on online user activities from X (formerly known as Twitter) and Google Trends and assessed the statistical correlation of activities in a region with the officially reported AIV outbreak case numbers. To mitigate the effect of noisy content on the outbreak identification process, large language models were utilized to select the relevant online activity on X that could be indicative of an outbreak. Additionally, we conducted trend analysis on the selected internet-based data sources in terms of their timeliness and statistical significance in identifying AIV outbreaks. Moreover, we performed an ablation study using autoregressive forecasting models to identify the contribution of X and Google Trends in predicting AIV outbreaks. The experimental findings illustrate that online activity on social media and search engine trends can detect avian influenza outbreaks, providing alerts earlier than official reports. This study suggests that real-time analysis of social media outlets and Google search trends can be used in avian influenza outbreak early warning systems, supporting epidemiologists and animal health professionals in informed decision-making.

Marzieh Soltani, Rozita Dara, Zvonimir Poljak, Caroline Dubé, Neil Bruce, Shayan Sharif · 3/14/2025

arXiv:2306.02766v5 Announce Type: replace Abstract: We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic run of the empirical system. We prove that our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases. We provide the order of the difference in these bounds in terms of network structure and number of communication rounds, and also contribute a policy-update stability guarantee. We discuss how the sample guarantees of the three theoretical algorithms do not actually result in practical convergence. We therefore show that in practical settings where the theoretical parameters are not observed (leading to poor estimation of the Q-function), our communication scheme considerably accelerates learning over the independent case, often performing similarly to a centralised learner while removing the restrictive assumption of the latter. We contribute further practical enhancements to all three theoretical algorithms, allowing us to present their first empirical demonstrations. Our experiments confirm that we can remove several of the theoretical assumptions of the algorithms, and display the empirical convergence benefits brought by our new networked communication. We additionally show that our networked approach has significant advantages over both alternatives in terms of robustness to update failures and to changes in population size.

Patrick Benjamin, Alessandro Abate · 3/14/2025

arXiv:2503.10560v1 Announce Type: new Abstract: Community-based fact-checking is a promising approach to address misinformation on social media at scale. However, an understanding of what makes community-created fact-checks helpful to users is still in its infancy. In this paper, we analyze the determinants of the helpfulness of community-created fact-checks. For this purpose, we draw upon a unique dataset of real-world community-created fact-checks and helpfulness ratings from X's (formerly Twitter) Community Notes platform. Our empirical analysis implies that the key determinant of helpfulness in community-based fact-checking is whether users provide links to external sources to underpin their assertions. On average, the odds for community-created fact-checks to be perceived as helpful are 2.70 times higher if they provide links to external sources. Furthermore, we demonstrate that the helpfulness of community-created fact-checks varies depending on their level of political bias. Here, we find that community-created fact-checks linking to high-bias sources (of either political side) are perceived as significantly less helpful. This suggests that the rating mechanism on the Community Notes platform successfully penalizes one-sidedness and politically motivated reasoning. These findings have important implications for social media platforms, which can utilize our results to optimize their community-based fact-checking systems.

Kirill Solovev, Nicolas Pröllochs · 3/14/2025
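The abstract above reports odds that are 2.70 times higher for fact-checks that link to external sources. Because odds ratios are easy to misread as probability ratios, here is a small worked conversion. Only the 2.70 figure comes from the abstract; the 10% baseline probability and the helper names are hypothetical and chosen purely for illustration.

```python
def probability_to_odds(p):
    """Convert a probability in [0, 1) to odds p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert non-negative odds back to a probability."""
    return odds / (1.0 + odds)


ODDS_RATIO = 2.70   # reported in the abstract
baseline_p = 0.10   # hypothetical share of notes without links rated helpful

baseline_odds = probability_to_odds(baseline_p)
boosted_p = odds_to_probability(ODDS_RATIO * baseline_odds)
print(round(boosted_p, 3))
```

Under this assumed baseline, the implied probability for notes with links rises to roughly 23%, not 27%, which shows why the multiplier applies to odds rather than to probabilities.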

arXiv:2503.09788v1 Announce Type: new Abstract: This study examines the communication mechanisms that shape the formation of digitally-enabled mobilization networks. Informed by the logic of connective action, we postulate that the emergence of networks enabled by organizations and individuals is differentiated by network and framing mechanisms. From a case comparison within two mobilization networks -- one crowd-enabled and one organizationally-enabled -- of the 2011 Chilean student movement, we analyze their network structures and users' communication roles. We found that organizationally-enabled networks are likely to form from hierarchical cascades and crowd-enabled networks are likely to form from triadic closure mechanisms. Moreover, we found that organizations are essential for both kinds of networks: compared to individuals, organizations spread more messages among unconnected users, and organizations' messages are more likely to be spread. We discuss our findings in light of the network mechanisms and participation of organizations and influential users.

Diego Gomez-Zara, Carolina Perez-Arredondo, Denis Parra · 3/14/2025

arXiv:2410.07388v2 Announce Type: replace Abstract: The Densest $k$-Subgraph (D$k$S) problem aims to find a subgraph comprising $k$ vertices with the maximum number of edges between them. A continuous reformulation of the binary quadratic D$k$S problem is considered, which incorporates a diagonal loading term. It is shown that this non-convex, continuous relaxation is tight for a range of diagonal loading parameters, and the impact of the diagonal loading parameter on the optimization landscape is studied. On the algorithmic side, two projection-free algorithms are proposed to tackle the relaxed problem, based on Frank-Wolfe and explicit constraint parametrization, respectively. Experiments suggest that both algorithms have merits relative to the state of the art, while the Frank-Wolfe-based algorithm stands out in terms of subgraph density, computational complexity, and ability to scale up to very large datasets.

Qiheng Lu, Nicholas D. Sidiropoulos, Aritra Konar · 3/14/2025
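For readers unfamiliar with the problem in the abstract above, one common way to write the binary quadratic D$k$S problem and a diagonally loaded continuous relaxation is sketched below. The notation is an assumption made for illustration: $A$ is the adjacency matrix of an $n$-vertex graph, $x$ is the vertex-selection vector, and $\lambda \ge 0$ is the diagonal loading parameter. The exact formulation and the admissible range of $\lambda$ in the paper may differ.

```latex
% Binary quadratic form: x_i = 1 if vertex i is selected, 0 otherwise
\max_{x \in \{0,1\}^n} \; x^\top A x
\quad \text{s.t.} \quad \mathbf{1}^\top x = k

% Continuous relaxation with diagonal loading parameter \lambda \ge 0
\max_{x \in [0,1]^n} \; x^\top (A + \lambda I)\, x
\quad \text{s.t.} \quad \mathbf{1}^\top x = k
```

For a symmetric $A$ with zero diagonal and binary $x$, the objective $x^\top A x$ equals twice the number of edges in the selected subgraph, so maximizing it maximizes the edge count.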

arXiv:2503.09660v1 Announce Type: cross Abstract: Point signatures based on the Laplacian operators on graphs, point clouds, and manifolds have become popular tools in machine learning for graphs, clustering, and shape analysis. In this work, we propose a novel point signature, the power spectrum signature, a measure on $\mathbb{R}$ defined as the squared graph Fourier transform of a graph signal. Unlike eigenvectors of the Laplacian from which it is derived, the power spectrum signature is invariant under graph automorphisms. We show that the power spectrum signature is stable under perturbations of the input graph with respect to the Wasserstein metric. We focus on the signature applied to classes of indicator functions, and its applications to generating descriptive features for vertices of graphs. To demonstrate the practical value of our signature, we showcase several applications in characterizing geometry and symmetries in point cloud data, and graph regression problems.

Karamatou Yacoubou Djima, Ka Man Yim · 3/14/2025

arXiv:2405.12797v2 Announce Type: replace Abstract: This paper introduces a refined graph encoder embedding method, enhancing the original graph encoder embedding through linear transformation, self-training, and hidden community recovery within observed communities. We provide the theoretical rationale for the refinement procedure, demonstrating how and why our proposed method can effectively identify useful hidden communities under stochastic block models. Furthermore, we show how the refinement method leads to improved vertex embedding and better decision boundaries for subsequent vertex classification. The efficacy of our approach is validated through numerical experiments, which exhibit clear advantages in identifying meaningful latent communities and improved vertex classification across a collection of simulated and real-world graph data.

Cencheng Shen, Jonathan Larson, Ha Trinh, Carey E. Priebe · 3/10/2025