Detecting Conspiracy Theory Against COVID-19 Vaccines

Amin, Madanu, Lavu et al.
Since the beginning of the vaccination trials, social media has been flooded with anti-vaccination comments and conspiracy beliefs. As the number of COVID-19 cases increased day by day, online platforms and a few news portals kept sharing different conspiracy theories. The most popular conspiracy beliefs were that the 5G network spreads COVID-19 and that the Chinese government spread the virus as a bioweapon, which initially created racial hatred. Although some of these beliefs have little impact on society, others cause massive destruction. For example, the 5G conspiracy led to the burning of 5G towers, and belief in the Chinese bioweapon story promoted attacks on Asian Americans. Another popular conspiracy belief was that Bill Gates spread the coronavirus disease (COVID-19) by launching a mass vaccination program to track everyone. Such conspiracy beliefs create distrust among laypeople and lead to vaccine hesitancy. This study aims to discover conspiracy theories against the vaccine on social platforms. We performed a sentiment analysis on 598 unique sample comments related to COVID-19 vaccines. We used two different models, BERT and Perspective API, to find the sentiment and toxicity of each sentence toward the COVID-19 vaccine.

Basic Information

  • Paper ID: 2211.13003
  • Title: Detecting Conspiracy Theory Against COVID-19 Vaccines
  • Authors: Md Hasibul Amin, Harika Madanu, Sahithi Lavu, Hadi Mansourifar, Dana Alsagheer, Weidong Shi (University of Houston)
  • Classification: cs.CY (Computers and Society), cs.AI, cs.CL, cs.LG, cs.SI
  • Publication Date: November 20, 2022 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2211.13003

Abstract

Since the start of vaccine trials, social media has been inundated with anti-vaccine rhetoric and conspiracy beliefs. As COVID-19 case counts rose, conspiracy theories spread through online platforms and some news portals. The most prevalent were the claims that 5G networks spread COVID-19 and that the Chinese government released the virus as a biological weapon, which initially fueled racial hatred. While some of these beliefs have had limited societal impact, others have caused substantial damage: the 5G conspiracy theory led to the burning of 5G base stations, and belief in the Chinese bioweapon narrative promoted attacks against Asian Americans. Another popular conspiracy theory claims that Bill Gates spread COVID-19 through a mass vaccination program designed to track everyone. Such beliefs have created distrust among the general public and led to vaccine hesitancy. The study aims to discover conspiracy theories targeting vaccines on social platforms. The authors conducted sentiment analysis on 598 unique sample comments related to COVID-19 vaccines, using two models, BERT and the Google Perspective API, to identify the sentiment and toxicity of each sentence toward COVID-19 vaccines.

Research Background and Motivation

Problem Definition

The core problem this research addresses is how to automatically detect and identify conspiracy theory discourse targeting COVID-19 vaccines on social media. Specifically, it includes:

  1. Identifying anti-vaccine sentiment and conspiracy theory viewpoints
  2. Assessing the toxicity and aggressiveness level of comments
  3. Understanding the distribution of public attitudes toward vaccines

Problem Significance

This problem carries important social implications:

  1. Public Health Threat: According to WHO data, as of September 2022, 613 million people worldwide had been infected with COVID-19, and more than 6.5 million had died
  2. Social Disruption: Conspiracy theories have led to actual violent incidents, such as the burning of 5G base stations and attacks against Asian Americans
  3. Vaccine Hesitancy: Misinformation creates public distrust in vaccines, hindering large-scale vaccination programs
  4. Information Dissemination Speed: The paper cites research claiming that false news spreads a million times faster than factual news

Limitations of Existing Methods

  1. Detection Complexity: Social media users express opinions using emojis, unique terminology, and symbols, increasing text classification complexity
  2. Language Structure Diversity: Sentence structures and sentiment expression methods vary significantly across different languages
  3. Annotation Difficulty: In certain cases, it is difficult to distinguish which comments are valid and which are false

Core Contributions

  1. Constructed a COVID-19 vaccine conspiracy theory detection dataset: Collected and annotated 598 English comments from social media in North America
  2. Proposed a dual-model detection framework: Combined BERT model and Google Perspective API for sentiment analysis and toxicity detection
  3. Conducted comprehensive comparative experiments: Evaluated model performance using three different classifiers (logistic regression, XGBoost, Gaussian Naive Bayes)
  4. Provided benchmark results for conspiracy theory detection: Established baseline performance for subsequent research

Methodology Details

Task Definition

  • Input: Text comments about COVID-19 vaccines from social media
  • Output: Binary classification labels (0: neutral or pro-vaccine, 1: anti-vaccine/conspiracy theory)
  • Additional Output: Multi-dimensional assessment metrics including toxicity scores and aggressiveness scores

Data Collection and Preprocessing

  1. Data Collection:
    • Initial collection of 950 user comments
    • Sources: Various online news portals and their Facebook pages
    • Manual collection approach
  2. Data Cleaning:
    • Removal of duplicate and near-duplicate comments
    • Filtering of non-English comments
    • Final retention of 598 sample comments
  3. Data Annotation:
    • Manual reading and annotation of all comments
    • Binary labels: 0 (neutral/supportive) and 1 (opposed/conspiracy theory)
    • Ensuring balanced label distribution
  4. Preprocessing Steps:
    • Removal of noise and stopwords
    • Conversion to lowercase
    • Correction of common abbreviations (e.g., vac→vaccine, CVD→Covid)

Model Architecture

BERT Model

  • Model Selection: BERT-Base, Uncased
  • Architecture Parameters:
    • 12 transformer layers
    • 768 hidden units
    • 12 attention heads
    • 110 million parameters
  • Characteristics:
    • Bidirectional encoder representations
    • WordPiece embeddings with vocabulary size of 30,000
    • Sentence-level vector training to extract more contextual information
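
As a sanity check on the architecture figures above, the parameter count can be reproduced with simple arithmetic. The sketch below assumes the standard BERT-Base configuration for values the summary does not list (512 positions, 2 segment types, a 3072-wide feed-forward layer, a pooler layer, and the released uncased vocabulary of 30,522 WordPiece tokens, which the summary rounds to 30,000). The number of attention heads does not change the count, because the 12 heads split the same 768-dimensional projections.

```python
def bert_param_count(layers: int = 12, hidden: int = 768, vocab: int = 30522,
                     max_positions: int = 512, type_vocab: int = 2,
                     ffn: int = 3072) -> int:
    """Count trainable parameters of a BERT-style encoder (weights plus biases)."""
    layer_norm = 2 * hidden  # scale and shift vectors
    # Token, position, and segment embedding tables plus one LayerNorm.
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections plus one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> ffn -> hidden) plus one LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm
    # Dense pooling layer applied to the first token's representation.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    total = bert_param_count()
    print(f"{total:,} parameters (about {total / 1e6:.0f} million)")
```

Under these assumptions the total is roughly 109.5 million, consistent with the "110 million parameters" quoted above.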

Google Perspective API

  • Functionality: Uses machine learning technology to identify abusive comments
  • Detection Dimensions:
    • Toxicity
    • Severe toxicity
    • Identity attack
    • Insult
    • Profanity
    • Threat
    • Sexually explicit
    • Flirtation
  • Output: 0-1 score for each dimension
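
To illustrate how the eight scores could become classifier features, the sketch below builds a Perspective API request body and flattens a response into a fixed-order vector. It makes no network call; the helper names `build_request` and `scores_to_features` are hypothetical, and the summary does not say how the authors actually queried the API or assembled their features.

```python
import json

# Perspective API attribute names for the eight dimensions listed above.
ATTRIBUTES = [
    "TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK", "INSULT",
    "PROFANITY", "THREAT", "SEXUALLY_EXPLICIT", "FLIRTATION",
]


def build_request(text: str) -> str:
    """Return the JSON body for a comments:analyze call (sent via POST with an API key)."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {name: {} for name in ATTRIBUTES},
    }
    return json.dumps(body)


def scores_to_features(response: dict) -> list[float]:
    """Flatten a parsed response into one 0-1 score per attribute, in ATTRIBUTES order."""
    scores = response["attributeScores"]
    return [scores[name]["summaryScore"]["value"] for name in ATTRIBUTES]


if __name__ == "__main__":
    # Made-up response values, used only to show the shape of the data.
    fake_response = {
        "attributeScores": {
            name: {"summaryScore": {"value": 0.1 * i, "type": "PROBABILITY"}}
            for i, name in enumerate(ATTRIBUTES)
        }
    }
    print(build_request("example comment"))
    print(scores_to_features(fake_response))
```

Each comment then maps to an 8-dimensional vector that can be passed to any of the classifiers described in the next section.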

Classifier Configuration

Three different classifiers were used for comparative analysis:

  1. Logistic Regression (LR)
  2. XGBoost
  3. Gaussian Naive Bayes (NB)
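
For readers unfamiliar with the third classifier, the following is a minimal from-scratch sketch of Gaussian Naive Bayes on toy two-dimensional score vectors. It is an illustration of the technique only: the class name `TinyGaussianNB` and the toy data are invented here, and the authors presumably used a library implementation rather than anything like this.

```python
import math
from collections import defaultdict


class TinyGaussianNB:
    """Gaussian Naive Bayes: one independent normal distribution per feature and class."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            # Small constant keeps the variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in col) / n + 1e-9
                for col, m in zip(columns, means)
            ]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def _log_posterior(self, label, row):
        log_prior, means, variances = self.stats[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    def predict(self, X):
        # Pick the class with the highest (unnormalized) log posterior.
        return [max(self.stats, key=lambda c: self._log_posterior(c, row)) for row in X]


if __name__ == "__main__":
    # Toy features standing in for two toxicity scores; labels follow the 0/1 scheme above.
    X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
    y = [0, 0, 1, 1]
    model = TinyGaussianNB().fit(X, y)
    print(model.predict([[0.15, 0.15], [0.85, 0.8]]))
```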

Experimental Setup

Dataset Characteristics

  • Total Samples: 598 comments
  • Label Distribution: Balanced distribution (approximately 50% supportive, 50% opposed)
  • Geographic Range: Primarily from North America
  • Language: English comments only
  • Privacy Protection: No personal information included (names, locations, gender, etc.)

Evaluation Metrics

  • Accuracy
  • F1-Score
  • Precision
  • Recall
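
The four metrics can be computed from a confusion matrix as sketched below. The summary does not state how the paper averaged precision, recall, and F1 over the two classes, so this example simply scores the positive class (label 1, anti-vaccine/conspiracy); that choice is an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```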

Validation Method

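  • 10-Fold Cross-Validation: Each classifier is evaluated on ten held-out folds, giving a more reliable estimate of generalization than a single split
  • Train-Validation Set Split: Models are fit on the training portion and scored on the held-out validation portion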
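
A 10-fold split over the 598 comments can be sketched as follows. The shuffling seed and the function name `k_fold_indices` are assumptions; the summary does not say whether the authors shuffled or stratified the folds.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, validation_indices) for each of k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


if __name__ == "__main__":
    for fold, (train, validation) in enumerate(k_fold_indices(598), start=1):
        print(f"fold {fold}: {len(train)} train, {len(validation)} validation")
```

With 598 samples, eight folds hold 60 validation comments and two hold 59, so every comment is used for validation exactly once.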

Experimental Results

Main Results Comparison

BERT Model Performance

Classifier          | Accuracy | F1-Score | Precision | Recall
Logistic Regression | 69%      | 68%      | 67%       | 68%
XGBoost             | 66%      | 66%      | 67%       | 65%
Naive Bayes         | 51%      | 51%      | 52%       | 51%

Perspective API Performance

Classifier          | Accuracy | F1-Score | Precision | Recall
Logistic Regression | 55%      | 53%      | 55%       | 55%
XGBoost             | 65%      | 63%      | 65%       | 65%
Naive Bayes         | 75%      | 70%      | 75%       | 75%

Key Findings

  1. Best Performance: Google Perspective API combined with Gaussian Naive Bayes achieved 75% accuracy
  2. BERT Performance: BERT combined with logistic regression achieved 69% accuracy
  3. Data Volume Impact: Increasing data volume from 400 to 598 samples improved performance of both models by 8-9%
  4. Toxicity Detection Capability: Perspective API effectively identifies the abusive degree and toxicity level of comments

Perspective API Toxicity Score Examples

The paper provides specific toxicity score cases, demonstrating multi-dimensional scoring of different comment types and providing intuitive insights into model behavior.

Related Work

Current State of Conspiracy Theory Research

  1. Prevalence: Approximately 1/4 to 1/3 of the North American population expresses views related to conspiracy theories
  2. COVID-19 Related: A 2020 U.S. survey showed approximately 5% of people believed COVID-19 was pre-planned, with 20% considering it possibly true
  3. Dissemination Mechanism: Social media influences people's views more readily than traditional communication methods

Technical Methods

  1. Text Mining: A popular method for detecting conspiracy theories
  2. Deep Learning: Performs well in semantic content identification
  3. Sentiment Analysis Tools: Applications of BERT and Perspective API in sentiment and toxicity detection

Social Impact Research

  1. Political Factors: Political agendas play an important role in vaccine hesitancy
  2. Media Influence: Mainstream television news and political agendas significantly impact conspiracy theory beliefs
  3. Psychological Mechanisms: Research on the psychological foundations of conspiracy theory dissemination

Conclusions and Discussion

Main Conclusions

  1. Detection Feasibility: Machine learning methods can detect conspiracy theories related to COVID-19 vaccines with moderate accuracy (up to 75% in these experiments)
  2. Model Selection Importance: Performance varies widely across model and classifier combinations, from 51% to 75% accuracy
  3. Data Volume Impact: Increasing the dataset from 400 to 598 samples improved the performance of both models by 8-9%
  4. Social Attitude Insights: Pro-vaccine comments were fewer than anti-vaccine comments

Limitations

  1. Geographic Limitations: Sample data primarily from North America cannot accurately reflect attitudes in other regions
  2. Data Scale: Manually collected sample data is insufficient to represent global conspiracy theories
  3. Missing User Information: User information was not collected, preventing demographic analysis such as age
  4. Annotation Subjectivity: In certain cases, it is difficult to determine the authenticity of comments

Future Directions

  1. Expand Data Scale: Collect larger and more diverse datasets
  2. Multilingual Support: Extend to other languages and cultural backgrounds
  3. User Profile Analysis: Conduct deeper analysis incorporating user demographic information
  4. Real-Time Monitoring System: Develop real-time conspiracy theory detection and early warning systems

In-Depth Evaluation

Strengths

  1. Problem Importance: Addresses the important social issue of COVID-19 vaccine conspiracy theories
  2. Sufficient Method Comparison: Employs two different technical approaches for comparative verification
  3. Reasonable Experimental Design: Utilizes 10-fold cross-validation with multiple evaluation metrics
  4. Result Transparency: Provides specific performance values and case analysis
  5. Social Value: Research results have reference value for public health policy formulation

Weaknesses

  1. Dataset Scale Limitation: 598 samples are relatively small, potentially affecting model generalization
  2. Geographic and Cultural Bias: Limited to English comments from North America, lacking global representativeness
  3. Annotation Quality: Manual annotation may contain subjectivity, lacking inter-annotator agreement assessment
  4. Limited Technical Innovation: Primarily applies existing models, lacking methodological innovation
  5. Insufficient Deep Analysis: Lacks deeper analysis of conspiracy theory types and dissemination mechanisms

Impact

  1. Academic Contribution: Provides foundational data and methods for COVID-19 related computational social science research
  2. Practical Value: Can provide technical support for content moderation on social media platforms
  3. Policy Reference: Provides data support for public health departments in formulating anti-conspiracy strategies
  4. Reproducibility: Authors commit to providing data and code on GitHub, enhancing research reproducibility

Applicable Scenarios

  1. Social Media Monitoring: Real-time detection and flagging of vaccine-related conspiracy content
  2. Public Health Communication: Evaluating the effectiveness of vaccine promotion campaigns and public response
  3. Policy Development Support: Providing quantitative analysis of public attitudes for government departments
  4. Research Foundation: Providing benchmark datasets for subsequent conspiracy theory detection and analysis research

References

The paper cites 46 relevant references covering multiple disciplines including conspiracy theory psychology, social media analysis, natural language processing, and public health, reflecting the interdisciplinary nature of the research and the robustness of its theoretical foundation.


Overall Assessment: This is an application-oriented study addressing an important social problem. Although its technical novelty is limited, it has clear social value and practical utility. The methodology is reasonable, the experimental design is fairly comprehensive, and the results offer a useful baseline. Future work should improve data scale, geographic coverage, and technical innovation.