
Detection and Prevention of Smishing Attacks

Goel

Phishing is an online identity theft technique in which attackers steal users' personal information, leading to financial losses for individuals and organizations. With the increasing adoption of smartphones, which provide functionality similar to desktop computers, attackers are increasingly targeting mobile users. Smishing, a phishing attack carried out through the Short Messaging Service (SMS), has become prevalent because SMS-based services are so widely used. It relies on deceptive messages designed to extract sensitive information. Despite the growing number of smishing attacks, little research focuses on detecting them. This work presents a smishing detection model based on content analysis. To address the challenge posed by slang, abbreviations, and short forms in text communication, the model normalizes these into standard forms. A machine learning classifier then labels each message as smishing or ham. Experimental results demonstrate the model's effectiveness: it achieves classification accuracies of 97.14% for smishing and 96.12% for ham messages, with an overall accuracy of 96.20%.


Basic Information

Paper ID: 2501.00260
Title: Detection and Prevention of Smishing Attacks
Author: Diksha Goel (Roll No.: 31603217)
Supervisor: Mr. Ankit Kumar Jain (Assistant Professor)
Categories: cs.CR cs.SI
Published: June 2018 (Master of Technology Dissertation)
Institution: Department of Computer Engineering, National Institute of Technology Kurukshetra-136119, Haryana (India)
Paper link: https://arxiv.org/abs/2501.00260

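Main problem: With the rapid growth in smartphone users (projected to reach 2.87 billion by 2020), SMS has become a primary channel for phishing attacks. Smishing attacks exploit the high level of trust users place in SMS (35% of users consider SMS the most trusted messaging platform) to defraud them.
Why the problem matters:
- 33% of mobile users have received smishing messages
- 42% of mobile users click on malicious links
- Smartphone users are three times more likely than desktop users to fall victim to phishing
- In 2017, 45% of users received smishing messages, an increase of 2% over 2016
Limitations of existing approaches:
- Many techniques exist for SMS spam detection, but little research targets smishing specifically
- Slang, abbreviations, and short forms in text reduce classifier efficiency
- Effective text normalization mechanisms are lacking
Research motivation:
- Hardware limitations of mobile devices (small screens, lack of security indicators) increase the attack success rate
- Smishing attacks need to be detected effectively while protecting user privacy
- The accuracy of existing solutions leaves room for improvement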

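Core Contributions

A complete smishing security model: a two-phase detection framework based on content analysis
A novel text normalization method: uses the NoSlang dictionary to handle slang, abbreviations, and short forms, significantly improving classification accuracy
A comprehensive taxonomy of mobile phishing attacks: systematically organizes seven major categories of mobile phishing attacks
Strong detection performance: 96.20% overall accuracy on a public dataset
An in-depth literature review: a comprehensive analysis of mobile phishing attacks and defense mechanisms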

Algorithm 1: Preprocessing and Normalization Algorithm
Input: msg (message), dict (NoSlang dictionary), stop (stop words)
Output: n_msg (preprocessed and normalized message)

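Steps:

Tokenization: split the text into tokens
Lowercasing: convert all tokens to lowercase
Normalization: replace slang and abbreviations using the NoSlang dictionary
Stop word removal: remove the 153 NLTK English stop words
Stemming: reduce words to their root form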
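
The five preprocessing steps can be sketched in a few lines of Python. This is only an illustration, not the dissertation's code: the `SLANG` and `STOP_WORDS` tables are tiny hypothetical stand-ins for the NoSlang dictionary and the NLTK stop word list, and `naive_stem` is a crude placeholder for whatever stemmer the author used, which this summary does not name.

```python
import re

# Hypothetical stand-ins: the real model uses the NoSlang dictionary and
# the NLTK English stop word list, neither of which is reproduced here.
SLANG = {"u": "you", "ur": "your", "pls": "please", "msg": "message", "acc": "account"}
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "of", "and", "for"}


def naive_stem(token):
    """Crude suffix stripper, used only as a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(msg, slang=SLANG, stop=STOP_WORDS):
    """Return the list of normalized tokens for one SMS message."""
    tokens = re.findall(r"[A-Za-z0-9']+", msg)           # 1. tokenization
    tokens = [t.lower() for t in tokens]                  # 2. lowercasing
    # 3. normalization: an expansion may contain several words, so split it
    tokens = [w for t in tokens for w in slang.get(t, t).split()]
    tokens = [t for t in tokens if t not in stop]         # 4. stop word removal
    return [naive_stem(t) for t in tokens]                # 5. stemming


print(preprocess("Pls verify UR acc to claim the prize"))
# ['please', 'verify', 'account', 'claim', 'prize']
```

Normalizing before stop word removal matters in this sketch: "UR" only becomes recognizable as the stop word "your" after the slang lookup has expanded it.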

Phase 2: Classification

Algorithm 2: Classification Algorithm
Input: D (dataset), n_msg (preprocessed and normalized message)
Output: ham or smishing message

Bayesian classifier: messages are classified with a Naive Bayes classifier, which is based on Bayes' theorem:

$p(C_k|x) = \frac{p(x|C_k)p(C_k)}{p(x)}$

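where:

$p(C_k|x)$ is the posterior probability of class $C_k$ given the features x
$p(x|C_k)$ is the likelihood of the features x given class $C_k$
$p(C_k)$ is the prior probability of class $C_k$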
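
To make the formula concrete, here is a minimal Naive Bayes sketch in plain Python. Several things in it are assumptions rather than details from the dissertation: the multinomial word-count model, the add-one (Laplace) smoothing, the choice to ignore words never seen in training, and the four toy training messages. The "naive" part is that $p(x|C_k)$ is taken as the product of per-word likelihoods, and $p(x)$ is obtained by summing $p(x|C_k)p(C_k)$ over the classes.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (tokens, label) pairs. Returns the count tables."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def posteriors(tokens, model):
    """Return p(C_k | x) for every class C_k."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_joint = {}
    for label, n in class_counts.items():
        lp = math.log(n / total)                          # log p(C_k)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:                                # skip unseen words (assumption)
                # add-one smoothed estimate of p(word | C_k)
                lp += math.log((word_counts[label][t] + 1) / denom)
        log_joint[label] = lp                             # log p(x | C_k) p(C_k)
    # Divide by p(x) = sum_k p(x | C_k) p(C_k), working in log space for stability.
    m = max(log_joint.values())
    evidence = sum(math.exp(v - m) for v in log_joint.values())
    return {k: math.exp(v - m) / evidence for k, v in log_joint.items()}


def classify(tokens, model):
    post = posteriors(tokens, model)
    return max(post, key=post.get)


# Toy, made-up training data (already preprocessed into tokens).
model = train([
    (["verify", "account", "claim", "prize"], "smishing"),
    (["win", "prize", "click", "link"], "smishing"),
    (["meet", "lunch", "tomorrow"], "ham"),
    (["call", "me", "later"], "ham"),
])

print(classify(["claim", "prize", "link"], model))  # smishing
print(classify(["lunch", "later"], model))          # ham
```

Because $p(x)$ is the same for every class, the predicted label depends only on the numerator $p(x|C_k)p(C_k)$; the normalization is needed only when the actual posterior values are wanted.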

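Technical Innovations

Text normalization:
- First application of the NoSlang dictionary to smishing detection
- Systematic handling of informal language in SMS
- Significantly improves the classifier's ability to recognize obfuscated text
Two-phase processing framework:
- The preprocessing phase ensures text consistency
- The classification phase makes its decision on the normalized text
Privacy-preserving design:
- Processing is local and involves no third-party services
- Relies only on text content features and collects no personal user information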

Experimental Setup

Dataset

Data source: SMS Spam Dataset v.1 (public dataset)
Original size: 5574 messages (4827 ham, 747 spam)
Size after processing: 5169 messages (4807 ham, 362 smishing)
Constituent sources:
- Grumbletext website: 425 spam
- Caroline Tag's PhD thesis: 450 ham
- NUS SMS Corpus: 3375 ham
- SMS Spam Corpus v.0.1: 1002 ham, 322 spam
- Collected from Pinterest: 71 smishing