The rapid spread of misinformation on digital platforms threatens public discourse, emotional stability, and decision-making. While prior work has explored various adversarial attacks in misinformation detection, the specific transformations examined in this paper have not been systematically studied. In particular, we investigate language-switching across English, French, Spanish, Arabic, Hindi, and Chinese, followed by translation. We also study query length inflation preceding summarization and structural reformatting into multiple-choice questions. In this paper, we present a multilingual, multi-agent large language model framework with retrieval-augmented generation that can be deployed as a web plugin into online platforms. Our work underscores the importance of AI-driven misinformation detection in safeguarding online factual integrity against diverse attacks, while showcasing the feasibility of plugin-based deployment for real-world web applications.
- Paper ID: 2510.08605
- Title: Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks
- Authors: Nouar Aldahoul, Yasir Zaki (New York University Abu Dhabi)
- Categories: cs.CL (Computation and Language), cs.AI, cs.CR, cs.LG
- Publication Date: October 7, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.08605
The rapid dissemination of misinformation on digital platforms threatens public discourse, emotional stability, and decision-making. While prior work has explored various adversarial attacks in misinformation detection, the specific transformations investigated in this paper have not been systematically studied. Specifically, this paper investigates language switching across English, French, Spanish, Arabic, Hindi, and Chinese, along with subsequent translation. It also examines query length expansion prior to summarization and structural reformatting into multiple-choice questions. The paper proposes a multilingual, multi-agent large language model framework incorporating retrieval-augmented generation techniques, deployable as a web plugin on online platforms. This work emphasizes the importance of AI-driven misinformation detection in protecting online factual integrity while demonstrating the feasibility of plugin-based deployment in real-world web applications.
The core problem addressed in this research is the lack of effective misinformation detection capabilities in large language models (LLMs) when facing adversarial attacks, which can inadvertently amplify the spread of false information.
- Social Impact: Rapid dissemination of misinformation severely threatens public discourse, emotional stability, and decision-making
- Technical Challenges: Existing LLMs perform near random guessing in misinformation detection
- Security Requirements: Need for robust detection systems against diverse attacks
- Embedded Knowledge Constraints: LLMs rely solely on knowledge embedded during training, lacking real-time fact-checking capabilities
- Language Bias: Significantly degraded performance on non-English languages
- Adversarial Attack Vulnerability: Lack of resistance to format conversion, translation, and summarization attacks
- Lack of Systematic Study: Existing work has not systematically evaluated multilingual, multi-structural adversarial attacks
The authors aim to develop a multilingual misinformation detection system that resists multiple adversarial attacks, and to deploy it as a practical web plugin.
- Proposed Multi-Agent RAG Framework: Multi-agent architecture combining Llama 3.1-8B and retrieval-augmented generation techniques
- Constructed Novel Adversarial Attack Dataset: Dataset containing three attack forms: multiple-choice questions (MCQ), translation, and summarization
- Implemented Multilingual Detection Capability: Support for six languages: English, French, Spanish, Arabic, Hindi, and Chinese
- Verified Practical Deployment Feasibility: Designed as a deployable web plugin
- Provided Comprehensive Experimental Evaluation: Achieved over 95% accuracy in detecting false headlines across all evaluated attack types, with factual-headline accuracy between 85.25% and 95.15%
Input: Text content from the web (news articles, user comments, social media posts, etc.), potentially containing adversarial transformations
Output: Binary classification result (True/False) determining whether the input text contains misinformation
Constraints: System must operate in a black-box setting, making judgments based solely on binary feedback
- Embedding Models: Comparison of three multilingual embedding models
- OpenAI's text-embedding-3-large (proprietary)
- jina-embeddings-v3 (proprietary)
- multilingual-e5-large (open-source)
- Retrieval Mechanism: Cosine similarity-based retrieval system
- Store false headline embeddings in CSV files
- Retrieve false headlines most relevant to the query
- Use Llama for contextual analysis to make final judgment
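
The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the CSV layout (headline followed by embedding values), the function names, and the three-dimensional toy vectors are all assumptions made for illustration. In the described system the vectors would come from one of the listed embedding models, and the retrieved headlines would then be passed to Llama as context for the final judgment.

```python
import csv
import io
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def load_headline_embeddings(csv_file):
    """Read rows of the assumed form: headline, v1, v2, ..., vn."""
    store = []
    for record in csv.reader(csv_file):
        if not record:
            continue  # skip blank lines
        store.append((record[0], [float(v) for v in record[1:]]))
    return store


def retrieve_top_k(query_vec, store, k=3):
    """Return the k (score, headline) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), headline) for headline, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy stand-in for the CSV of false-headline embeddings (hypothetical data).
TOY_CSV = """Headline A,1.0,0.0,0.0
Headline B,0.0,1.0,0.0
Headline C,0.7,0.7,0.0
"""

store = load_headline_embeddings(io.StringIO(TOY_CSV))
top = retrieve_top_k([0.9, 0.1, 0.0], store, k=2)
for score, headline in top:
    print(f"{score:.3f}  {headline}")
```

A linear scan like this is simple but grows with the size of the database, which is one reason a topic filter that narrows the candidate set before scoring can speed up retrieval.
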
The system comprises the following collaborative agents, of which the topic agent is optional:
- Web Crawler Agent
- Extract structured content from dynamic websites
- Segment text into manageable chunks
- Pass to manager agent for processing
- Manager Agent
- Interact with web crawler to receive text
- Route to topic and misinformation detection agents
- Send notifications to users
- Misinformation Detection Agent
- Leverage RAG-Llama for detection
- Retrieve from database containing 5,000 verified false headlines
- Use open-source Llama model for final judgment
- Topic Agent (Optional)
- Classify queries into 10 predefined categories
- Accelerate RAG search process
- Use GPT-4o-mini for topic classification
- Adjudicator Agent
- Ensure all text chunks are processed
- Verify consistency across system components
- Serve as additional validation layer to enhance robustness
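
The division of labor among these agents can be sketched roughly as follows. This is a simplified, hypothetical outline rather than the paper's code: `segment_text`, `Manager`, `adjudicate`, and the keyword-based `keyword_detector` are illustrative names, and the detector is only a placeholder for the RAG-Llama call. The crawler and the optional topic agent are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def segment_text(text: str, max_words: int = 50) -> List[str]:
    """Crawler-side step: split page text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


@dataclass
class Manager:
    """Routes each chunk to a detection function and records its verdict."""
    detect: Callable[[str], bool]
    results: Dict[int, bool] = field(default_factory=dict)

    def process(self, chunks: List[str]) -> List[int]:
        flagged = []
        for index, chunk in enumerate(chunks):
            is_misinfo = self.detect(chunk)
            self.results[index] = is_misinfo
            if is_misinfo:
                flagged.append(index)  # these would trigger a user notification
        return flagged


def adjudicate(chunks: List[str], results: Dict[int, bool]) -> List[int]:
    """Adjudicator-style check: return indices of chunks that have no verdict."""
    return [i for i in range(len(chunks)) if i not in results]


def keyword_detector(chunk: str) -> bool:
    """Placeholder for the RAG-Llama detection agent (assumption, not the real model)."""
    return "miracle cure" in chunk.lower()


page_text = "Local council approves new park. " * 3 + "Doctors hate this miracle cure for everything."
chunks = segment_text(page_text, max_words=8)
manager = Manager(detect=keyword_detector)
flagged = manager.process(chunks)
missing = adjudicate(chunks, manager.results)
print("flagged chunks:", flagged, "| unprocessed chunks:", missing)
```

Keeping the completeness check separate from the manager means a routing bug that silently drops a chunk is still caught, which matches the adjudicator's role as an additional validation layer.
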
- Multi-Form Adversarial Attack Handling: First systematic treatment of MCQ, translation, and summarization attacks
- Multilingual Retrieval Capability: Leverage multilingual embedding models for cross-lingual detection
- Negative Sample Matching Strategy: Use only false information database for negative matching detection
- Modular Plugin Design: Directly deployable as web browser plugin
- False Headlines: 20,950 false headlines collected from Snopes and Politifact
- Factual Headlines: 4,000 true headlines collected
- Experimental Data: Selected 5,000 false headlines and 2,000 factual headlines
- MCQ Dataset: Convert headlines to "why"-prefixed multiple-choice questions
- Translation Dataset: Translate expanded text into six languages
- Summarization Dataset: Generate 500-word long texts for summarization tasks
- Factual Accuracy: Percentage of correctly classified factual information
- False Accuracy: Percentage of correctly classified false information
- Attack Success Rate (ASR): Ratio of adversarial inputs causing system failure (lower is better)
- Baseline Model: Original Llama 3.1-8B-Instruct
- RAG-Llama Variants with Different Embedding Models
- System Variants With/Without Topic Classification
- Model: Llama 3.1-8B-Instruct
- Hardware: GPU A100 80GB
- Hyperparameters: temperature=0.1, top-p=1
- Embedding Storage: CSV file format
- Direct Question ASR: 46.74%
- MCQ Attack ASR: 97.72%
- Translation Attack ASR: 100%
- Summarization Attack ASR: 100%
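
For readers who want to reproduce figures of this kind, the three metrics defined earlier can be computed as in the sketch below. It rests on an assumption that the summary does not confirm: ASR is taken here as the share of false (adversarial) inputs that the system fails to flag, which makes it the complement of false accuracy. The function name `evaluate` and the toy labels are illustrative only.

```python
def evaluate(labels, predictions):
    """Compute factual accuracy, false accuracy, and ASR as fractions.

    labels / predictions: True means "misinformation", False means "factual".
    ASR is ASSUMED to be the fraction of false inputs that were not flagged.
    """
    pairs = list(zip(labels, predictions))
    false_total = sum(1 for y, _ in pairs if y)
    factual_total = len(pairs) - false_total
    false_correct = sum(1 for y, p in pairs if y and p)
    factual_correct = sum(1 for y, p in pairs if not y and not p)

    false_acc = false_correct / false_total if false_total else float("nan")
    factual_acc = factual_correct / factual_total if factual_total else float("nan")
    return {
        "false_accuracy": false_acc,
        "factual_accuracy": factual_acc,
        "attack_success_rate": 1.0 - false_acc,
    }


# Toy example: four false headlines (one missed) and two factual ones (one misflagged).
labels = [True, True, True, True, False, False]
predictions = [True, True, True, False, False, True]
metrics = evaluate(labels, predictions)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting false and factual accuracy separately matters because a detector that flags everything would score perfectly on false headlines while misclassifying every factual one.
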
| Attack Type | False Detection Accuracy | Factual Detection Accuracy |
|---|---|---|
| Direct Question | 99.76% | 85.25% |
| MCQ | 97.38% | 89.85% |
| Summarization | 99.3% | 95.15% |
| French Translation | 97.72% | 87.25% |
| Arabic Translation | 97.26% | 88.65% |
| Hindi Translation | 95.2% | 87.4% |
| Chinese Translation | 96.44% | 93.5% |
| Spanish Translation | 97.9% | 90.9% |
| Embedding Model | MCQ Avg Accuracy | Summarization Avg Accuracy | Translation Avg Accuracy |
|---|---|---|---|
| text-embedding-3-large | 93.62% | 97.23% | 93.22% |
| jina-embeddings-v3 | 95.29% | 89.08% | 93.35% |
| multilingual-e5-large | 95.26% | 89.02% | 93.92% |
- Speed Improvement with Topic Classification: Median speedup of 2x or more, average speedup of 3x or more
- Accuracy: Ranges from 78.27% to 91.18%
- Lower MCQ Task Accuracy: Due to classification difficulty caused by multiple-topic answers in multiple-choice questions
- RAG Significantly Outperforms Baseline: Substantial improvements across all attack types
- Multilingual Capability: Maintained over 95% false detection accuracy across six languages
- Embedding Model Impact: multilingual-e5-large performed best in balancing performance and accessibility
- Topic Classification Acceleration: Effectively improved retrieval speed, but accuracy decreased on complex queries
- BERT-based approaches (FakeBERT, etc.)
- T5 instruction fine-tuning
- Llama-2 PEFT/LoRA fine-tuning
- Reinforcement learning methods
- Mixtral-8x7B combined with RAG
- Real-time web data integration
- Adaptive Topic RAG (AT-RAG)
- LLM-Consensus for visual misinformation detection
- TruEDebate (TED) structured debate system
- Complete misinformation lifecycle processing framework
- Gradient-based token-level substitution
- Reinforcement learning-driven claim perturbation
- Black-box attack strategies
- Significant LLM Vulnerabilities: Vanilla LLMs are highly susceptible to spreading misinformation under adversarial attacks
- RAG Effectively Enhances Robustness: RAG-Llama significantly outperforms baseline across various attacks
- Multilingual Detection Feasibility: System effectively handles misinformation in six major languages
- Practical Deployment Potential: Multi-agent architecture suitable for deployment as web plugin
- Topic Classification Accuracy: Topic misclassification affects retrieval precision
- Database Dependency: System performance heavily depends on quality and completeness of misinformation database
- Dynamic Update Requirements: Requires continuous database updates to address emerging misinformation
- Security Vulnerabilities: RAG systems may face database poisoning and embedding attacks
- Improve Topic Classification: Enhance classification accuracy for complex queries
- Explore Other LLMs: Evaluate performance of different language models in RAG
- Enhance Security: Develop protective mechanisms against embedding attacks and database poisoning
- Expand Attack Types: Investigate more varieties of adversarial transformations
- Problem Importance: Addresses critical security issues in LLM-based misinformation detection
- Methodological Innovation: First systematic study of multilingual, multi-structural adversarial attacks
- Experimental Comprehensiveness: Comprehensive evaluation covering six languages and three attack types
- Practical Value: Provides deployable plugin solution
- Technical Advancement: Incorporates latest RAG and multi-agent technologies
- Dataset Scale Limitation: Uses only 7,000 headlines, relatively small scale
- Limited Attack Types: Considers only three specific attack forms
- Single Evaluation Metric: Primarily focuses on accuracy, lacking efficiency and cost metrics
- Insufficient Theoretical Analysis: Lacks theoretical explanation for method effectiveness
- Unverified Long-Term Stability: Has not evaluated performance degradation during extended use
- Academic Contribution: Provides new research direction for multilingual misinformation detection
- Practical Value: Directly applicable to social media and news platforms
- Reproducibility: Uses open-source models, facilitating reproduction and improvement
- Industry Impact: Provides technical foundation for content moderation and fact-checking
- Social Media Platforms: Real-time detection of false information posted by users
- News Aggregation Websites: Verify authenticity of news articles
- Educational Platforms: Help users identify misinformation
- Enterprise Content Moderation: Automated review of large-scale content
- Government Regulation: Assist relevant departments in monitoring online misinformation
This paper cites 50 relevant references covering important works in multiple domains including LLMs, RAG, multi-agent systems, and adversarial attacks, providing a solid theoretical foundation for the research.
Overall Assessment: This is an important contribution to the misinformation detection field, proposing an innovative multi-agent RAG framework and achieving excellent experimental results under multilingual, multi-attack-type settings. Despite some limitations, its practical value and technical innovation make it a significant advance in the field.