2025-11-12T11:16:10.224319

DeHate: A Stable Diffusion-based Multimodal Approach to Mitigate Hate Speech in Images

Dalal, Vashishtha, Rani et al.

The rise in harmful online content not only distorts public discourse but also poses significant challenges to maintaining a healthy digital environment. In response, we introduce a multimodal dataset crafted for identifying hate in digital content. Central to our methodology is the application of watermarked, stability-enhanced Stable Diffusion techniques combined with the Digital Attention Analysis Module (DAAM). This combination pinpoints the hateful elements within images and generates detailed hate attention maps, which are then used to blur those regions and so remove the hateful sections of the image. We release this dataset as part of the DeHate shared task, and this paper also describes the details of that task. Furthermore, we present DeHater, a vision-language model designed for multimodal dehatification tasks. Our approach sets a new standard in AI-driven image hate detection given textual prompts, contributing to the development of more ethical AI applications in social media.


Basic Information

Paper ID: 2509.21787
Title: DeHate: A Stable Diffusion-based Multimodal Approach to Mitigate Hate Speech in Images
Authors: Dwip Dalal, Gautam Vashishtha, Anku Rani, Aishwarya Reganti, Parth Patwa, Mohd Sarique, Chandan Gupta, Keshav Nath, Viswanatha Reddy, Vinija Jain, Aman Chadha, Amitava Das, Amit Sheth, Asif Ekbal
Categories: cs.CV cs.CL
Venue: Defactify 3: Third Workshop on Multimodal Fact Checking and Hate Speech Detection, co-located with AAAI 2024
Paper link: https://arxiv.org/abs/2509.21787

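Digital environment health: The surge of hateful content online seriously degrades the quality of public discourse
AI ethics: Hateful content in training data directly affects the trustworthiness and ethical integrity of AI systems
Social responsibility: Responsible AI systems need to be developed to address hate speech on social media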

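Limitations of Existing Methods

High-quality multimodal datasets for hate speech detection are lacking
Existing methods focus mainly on a single modality, text or image, and lack effective multimodal fusion
Targeted techniques for localizing and removing hateful content are lacking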

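Research Motivation

Motivated by the need for high-quality datasets and by the technical challenges of multimodal hate speech detection, the paper aims to build a new dataset and methodological framework that advances responsible AI.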

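Core Contributions

Dataset construction method: Proposes a method for generating a multimodal hate speech dataset based on Stable Diffusion and DAAM
Multimodal dehatification model: Designs the DeHater model, which performs unsupervised masking of hateful image content under the guidance of a text prompt
Shared task organization: Releases the DeHate dataset of 2411 instances and organizes the associated shared task
Technical approach: Combines a CLIP encoder, a U-Net-style architecture, and FiLM modulation in a single architecture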

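Hatenorm dataset: Uses a parallel corpus of manually annotated hateful texts and their normalized versions
Stable Diffusion generation: Uses the stable-diffusion-2-base model to convert hateful text into visual representations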

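Core Technical Pipeline

Image generation: Keywords are extracted from the hateful text to build a prompt, and Stable Diffusion generates the corresponding image
Attention map generation: DAAM produces heatmaps that highlight how strongly specific pixels relate to components of the prompt
Selective blurring:
- Compute the global heatmap values and set a threshold to produce a binary mask
- Set pixels with high heatmap values to black (0,0,0)
- For each flagged pixel, compute the average color of its local neighborhood and apply it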
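
The selective blurring steps above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not give the threshold rule, the neighborhood size, or whether the average is taken before or after blacking out. The sketch assumes a threshold relative to the global heatmap maximum, a square neighborhood of radius 1, and averaging over the blacked-out image. The names `binary_mask`, `selective_blur`, `rel_threshold`, and `radius` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Heatmap = List[List[float]]


def binary_mask(heatmap: Heatmap, rel_threshold: float = 0.5) -> List[List[bool]]:
    """Flag pixels whose heat exceeds rel_threshold times the global maximum.

    The relative-to-maximum rule is an assumption made for this sketch.
    """
    peak = max(max(row) for row in heatmap)
    cutoff = rel_threshold * peak
    return [[value > cutoff for value in row] for row in heatmap]


def selective_blur(image: Image, heatmap: Heatmap,
                   rel_threshold: float = 0.5, radius: int = 1) -> Image:
    """Black out flagged pixels, then fill each with its local average color."""
    mask = binary_mask(heatmap, rel_threshold)
    height, width = len(image), len(image[0])

    # Step 1: set every flagged pixel to black (0, 0, 0).
    blacked = [[(0, 0, 0) if mask[y][x] else image[y][x] for x in range(width)]
               for y in range(height)]

    # Step 2: replace each flagged pixel with the average color of its
    # neighborhood in the blacked-out image (assumed ordering of the steps).
    result = [row[:] for row in blacked]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            neighbours = [blacked[ny][nx]
                          for ny in range(max(0, y - radius), min(height, y + radius + 1))
                          for nx in range(max(0, x - radius), min(width, x + radius + 1))]
            result[y][x] = tuple(sum(p[c] for p in neighbours) // len(neighbours)
                                 for c in range(3))
    return result
```

In practice the heatmap would come from DAAM and the image from Stable Diffusion; here both are plain nested lists so the masking logic can be read in isolation.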

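CLIP encoder:
- Uses a frozen CLIP model as the encoder
- Leverages its pretraining on diverse image-text pairs
- Extracts rich multimodal feature representations
U-Net-inspired connections:
- Adopts the skip-connection design of the U-Net architecture
- Passes local information from the CLIP encoder to the decoder
- Keeps the decoder compact while preserving key details
Feature integration mechanism:
- Integrates encoder activations (including the CLS token) into every transformer block of the decoder
- Enriches the decoder's understanding of context
FiLM modulation:
- Uses Feature-wise Linear Modulation
- Modulates the decoder's input activations with a conditioning vector
- Improves the decoder's ability to focus on and accurately segment hateful content
Learnable projection network:
- Combines multiple hate-segment embeddings into a single projection
- Achieves a fine-grained, effective compression of diverse hateful elements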
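
To make the FiLM step listed above concrete, the following minimal sketch shows the general form of Feature-wise Linear Modulation: a conditioning vector is mapped to a per-feature scale (gamma) and shift (beta), which are then applied to every token's activations. It is an illustration, not DeHater's actual layer. The weight shapes, the use of plain linear maps to produce gamma and beta, and the names `linear` and `film` are all assumptions made for the example.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Sequence[float]) -> Vector:
    """Compute weights @ x + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(activations: Matrix, cond: Sequence[float],
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Apply FiLM: out = gamma(cond) * activation + beta(cond), per feature.

    activations has one row per token and one column per feature; the same
    gamma and beta are shared by all tokens.
    """
    gamma = linear(w_gamma, b_gamma, cond)  # per-feature scale
    beta = linear(w_beta, b_beta, cond)     # per-feature shift
    return [[g * a + b for a, g, b in zip(token, gamma, beta)]
            for token in activations]
```

Because gamma and beta depend only on the conditioning vector, the same decoder weights can be steered toward different prompts without retraining the frozen encoder.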