Detecting Conspiracy Theory Against COVID-19 Vaccines

Amin, Madanu, Lavu et al.

Since the beginning of vaccination trials, social media has been flooded with anti-vaccination comments and conspiracy beliefs. As the number of COVID-19 cases rose day by day, online platforms and a few news portals kept circulating different conspiracy theories. The most popular conspiracy beliefs were that the 5G network spreads COVID-19 and that the Chinese government spread the virus as a bioweapon, which initially fueled racial hatred. Although some of these false beliefs have little impact on society, others cause massive destruction. For example, the 5G conspiracy led to the burning of 5G towers, and belief in the Chinese bioweapon story prompted attacks on Asian Americans. Another popular conspiracy belief was that Bill Gates spread the coronavirus disease (COVID-19) by launching a mass vaccination program to track everyone. Such conspiracy beliefs create distrust among laypeople and lead to vaccine hesitancy. This study aims to detect conspiracy theories against the vaccine on social platforms. We performed a sentiment analysis on 598 unique sample comments related to COVID-19 vaccines. We used two different models, BERT and the Perspective API, to determine the sentiment and toxicity of each sentence toward the COVID-19 vaccine.

Basic Information

Paper ID: 2211.13003
Title: Detecting Conspiracy Theory Against COVID-19 Vaccines
Authors: Md Hasibul Amin, Harika Madanu, Sahithi Lavu, Hadi Mansourifar, Dana Alsagheer, Weidong Shi (University of Houston)
Categories: cs.CY (Computers and Society), cs.AI, cs.CL, cs.LG, cs.SI
Published: November 20, 2022 (arXiv preprint)
Paper link: https://arxiv.org/abs/2211.13003

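Abstract

Since vaccine trials began, social media has been flooded with anti-vaccine comments and conspiracy beliefs. As the number of COVID-19 cases grew, online platforms and some news portals spread a variety of conspiracy theories. The most popular ones held that the 5G network spreads COVID-19 and that the Chinese government spread the virus as a bioweapon, which initially fueled racial hatred. Some of these false beliefs have little effect on society, but others have caused great damage. For example, the 5G conspiracy theory led to 5G towers being burned, and belief in the Chinese bioweapon story encouraged attacks on Asian Americans. Another popular conspiracy theory was that Bill Gates spread COVID-19 by launching a mass vaccination program to track everyone. Such beliefs create distrust among the general public and lead to vaccine hesitancy. The study aims to detect conspiracy theories about the vaccine on social platforms. The authors performed sentiment analysis on 598 unique sample comments related to COVID-19 vaccines, using two different models, BERT and the Perspective API, to identify the sentiment and toxicity of each sentence toward the COVID-19 vaccine.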

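Research Background and Motivation

Problem Definition

The core problem the study addresses is how to automatically detect and identify conspiracy-theory content about COVID-19 vaccines on social media. Specifically, this includes:

Identifying anti-vaccine sentiment and conspiracy views
Assessing how toxic and offensive the comments are
Understanding the distribution of public attitudes toward vaccines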

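Why the Problem Matters

The problem has significant social relevance:

Public health threat: According to WHO data, as of September 2022, 613 million people worldwide had been infected with COVID-19 and more than 6.5 million had died
Social harm: Conspiracy theories have led to real-world violence, such as the burning of 5G towers and attacks on Asian Americans
Vaccine hesitancy: Misinformation creates public distrust of vaccines and hinders mass vaccination programs
Speed of spread: According to research cited in the paper, fake news spreads a million times faster than real news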

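Limitations of Existing Methods

Detection complexity: Social media users express opinions with emoji, idiosyncratic terms, and symbols, which makes text classification harder
Diversity of language structure: Sentence structure and the ways sentiment is expressed differ widely across languages
Annotation difficulty: In some cases it is hard to tell which comments are valid and which are false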

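Core Contributions

Built a COVID-19 vaccine conspiracy detection dataset: collected and annotated 598 English comments from social media in North America
Proposed a two-model detection framework: combines the BERT model and the Google Perspective API for sentiment analysis and toxicity detection
Ran comparative experiments: evaluated model performance with three different classifiers (logistic regression, XGBoost, and Gaussian naive Bayes)
Provided benchmark results for conspiracy detection: gives follow-up research a baseline to compare against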

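Method Details

Task Definition

Input: Text comments about COVID-19 vaccines from social media
Output: A binary label (0: neutral or supportive of the vaccine; 1: against the vaccine or conspiracy-related)
Additional output: Multi-dimensional assessment scores such as toxicity and offensiveness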

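Data Collection and Preprocessing

Data collection:
- 950 user comments collected initially
- Source: various online news portals and their Facebook pages
- Collected manually
Data cleaning:
- Removed duplicate and near-duplicate comments
- Filtered out non-English comments
- 598 sample comments retained in the end
Data annotation:
- All comments read and labeled by hand
- Binary labels: 0 (neutral/supportive) and 1 (against/conspiracy)
- Label distribution kept balanced
Preprocessing steps:
- Remove noise and stop words
- Convert to lowercase
- Expand common abbreviations (e.g. vac→vaccine, CVD→Covid)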
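
The cleaning and preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the stop-word list, the definition of "noise" (URLs, punctuation, emoji), and the function names `preprocess` and `deduplicate` are assumptions, and only the two abbreviations named above come from the source. Near-duplicates are approximated by comparing normalized text, and language filtering is left out.

```python
import re

# Only these two mappings are given in the summary; extend as needed.
ABBREVIATIONS = {"vac": "vaccine", "cvd": "covid"}

# Tiny illustrative stop-word list (an assumption; the paper does not name one).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "this"}


def preprocess(comment: str) -> str:
    """Lowercase, strip noise, expand abbreviations, and drop stop words."""
    text = comment.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop punctuation, emoji, symbols
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    return " ".join(tokens)


def deduplicate(comments):
    """Keep the first comment for each normalized form (a crude near-duplicate filter)."""
    seen = set()
    unique = []
    for comment in comments:
        key = preprocess(comment)
        if key and key not in seen:
            seen.add(key)
            unique.append(comment)
    return unique


if __name__ == "__main__":
    raw = ["Vac is bad!", "vac is BAD", "Get the vaccine"]
    for comment in deduplicate(raw):
        print(comment, "->", preprocess(comment))
```

Normalizing before comparing means comments that differ only in case or punctuation collapse into one entry; a real near-duplicate filter would likely need a similarity threshold instead.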

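Model Architecture

BERT Model

Model choice: BERT-Base, Uncased
Architecture parameters:
- 12 transformer layers
- 768 hidden units
- 12 attention heads
- 110 million parameters
Characteristics:
- Bidirectional encoder representations
- Uses WordPiece embeddings with a vocabulary of 30,000 tokens
- Trained on sentence-level vectors, which lets it extract more information from context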
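
As a sanity check on the figures above, the parameter count can be reproduced from the layer sizes. The sketch below assumes values from the publicly released BERT-Base uncased configuration that the summary does not state: a vocabulary of 30,522 entries (the summary rounds this to 30,000), 512 position embeddings, 2 segment types, a feed-forward width of 3,072, and a pooler layer. The function name is hypothetical.

```python
def bert_parameter_count(layers=12, hidden=768, vocab_size=30522,
                         max_positions=512, type_vocab=2, ffn=3072):
    """Count weights and biases of a BERT encoder with a pooler head.

    The number of attention heads does not appear: the heads split the
    hidden dimension, so they do not change the total.
    """
    layer_norm = 2 * hidden  # scale and shift vectors
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden) + layer_norm  # Q, K, V, output
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    total = bert_parameter_count()
    print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the total comes to just under 110 million, which matches the rounded figure quoted for BERT-Base.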

Google Perspective API

Function: Uses machine learning techniques to identify abusive comments
Detection dimensions:
- Toxicity
- Severe Toxicity
- Identity Attack
- Insult
- Profanity
- Threat
- Sexually Explicit
- Flirtation
Output: A score between 0 and 1 for each dimension
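
To make the scoring interface concrete, the sketch below builds a request body for the Perspective API's `comments:analyze` method and reads the per-attribute summary scores out of a response. It makes no network call, the sample response is hand-written with made-up values, and the helper names `build_request` and `extract_scores` are hypothetical. A real call would POST the body as JSON to the URL below with an API key appended as a `key` query parameter.

```python
import json

# Endpoint of the Perspective API AnalyzeComment method (shown for reference only).
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

# API attribute names for the eight dimensions listed above.
ATTRIBUTES = [
    "TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK", "INSULT",
    "PROFANITY", "THREAT", "SEXUALLY_EXPLICIT", "FLIRTATION",
]


def build_request(text, attributes=ATTRIBUTES):
    """Return the JSON-serializable body for one comment."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_scores(response):
    """Map each returned attribute to its summary score in [0, 1]."""
    return {
        name: body["summaryScore"]["value"]
        for name, body in response.get("attributeScores", {}).items()
    }


if __name__ == "__main__":
    print(json.dumps(build_request("example comment"), indent=2))
    # Hand-written response with illustrative values, not real API output.
    sample = {"attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.5, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.25, "type": "PROBABILITY"}},
    }}
    print(extract_scores(sample))
```

The extracted dictionary gives one number per attribute, so it can be used directly as a feature vector or thresholded per dimension. How the paper combines these scores with the BERT output is not described in this summary.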