2025-11-24T02:19:18.891948

Leveraging Twitter Data for Sentiment Analysis of Transit User Feedback: An NLP Framework

Das, Prajapati, Zhang et al.

Traditional methods of collecting user feedback through transit surveys are often time-consuming, resource intensive, and costly. In this paper, we propose a novel NLP-based framework that harnesses the vast, abundant, and inexpensive data available on social media platforms like Twitter to understand users' perceptions of various service issues. Twitter, being a microblogging platform, hosts a wealth of real-time user-generated content that often includes valuable feedback and opinions on various products, services, and experiences. The proposed framework streamlines the process of gathering and analyzing user feedback without the need for costly and time-consuming user feedback surveys using two techniques. First, it utilizes few-shot learning for tweet classification within predefined categories, allowing effective identification of the issues described in tweets. It then employs a lexicon-based sentiment analysis model to assess the intensity and polarity of the tweet sentiments, distinguishing between positive, negative, and neutral tweets. The effectiveness of the framework was validated on a subset of manually labeled Twitter data and was applied to the NYC subway system as a case study. The framework accurately classifies tweets into predefined categories related to safety, reliability, and maintenance of the subway system and effectively measured sentiment intensities within each category. The general findings were corroborated through a comparison with an agency-run customer survey conducted in the same year. The findings highlight the effectiveness of the proposed framework in gauging user feedback through inexpensive social media data to understand the pain points of the transit system and plan for targeted improvements.

academic

Leveraging Twitter Data for Sentiment Analysis of Transit User Feedback: An NLP Framework

基本信息

论文ID: 2310.07086
标题: Urban Echoes: Decoding Transit Riders' Sentiments on Social Media for Smarter Mobility
作者: Adway Das, Abhishek Kumar Prajapati, Pengxiang Zhang, Mukund Srinath, Andisheh Ranjbari
所属机构: The Pennsylvania State University, Optym Inc.
分类: cs.AI cs.SI
发表时间: 2023年10月 (arXiv v2: 2025年10月)
论文链接: https://arxiv.org/abs/2310.07086v2

摘要

传统的公交调查耗费大量资源且耗时，限制了其有效解决特定地点问题的能力。本研究提出了一个基于NLP的框架，利用Twitter（现为X）的实时数据作为预筛选工具来优化和定向公交机构调查。该框架采用两步方法：Few-Shot学习将推文分类为安全、可靠性和维护等类别，而基于词典的情感分析模型评估情感极性（正面、负面、中性）和强度。此外，空间分析将情感趋势映射到特定地理区域，使公交机构能够精确定位和优先处理问题区域。

研究背景与动机

核心问题

传统调查的局限性：公交用户反馈调查成本高昂、耗时且地理覆盖有限。研究显示，公交机构进行调查的人均成本约为36美元，中等规模调查的平均总成本约为35万美元。
社交媒体数据的潜力：Twitter拥有超过3.3亿活跃用户，每天产生约5亿条推文，为大规模实时洞察用户情感和体验提供了独特机会。
地理精确性需求：社交媒体数据可以揭示特定位置的问题和情感，使公交机构能够识别不同社区的独特需求和挑战。

研究重要性

资源优化：通过社交媒体数据预筛选，可以大幅降低调查成本并提高效率
实时监控：能够持续监控公众意见并用于决策制定
空间精确性：识别高关注区域进行定向干预
交通公平：确保所有社区都能获得安全可靠的交通选择

核心贡献

提出了创新的NLP框架：结合Few-Shot学习和VADER情感分析的多面方法
实现了精确的推文分类：将推文分类为维护、安全、调度等服务相关类别
提供了空间-时间分析：识别特定地理位置的反复投诉或关注点
验证了框架有效性：通过NYC地铁系统案例研究和MTA官方调查对比验证
构建了可扩展的解决方案：适用于不同地区、时间和多种服务提供商

方法详解

任务定义

输入：Twitter推文文本、时间戳、地理标签输出：推文类别分类、情感极性和强度评分、空间分布分析 约束条件：推文必须与公交系统相关，需要处理非正式语言和社交媒体特有表达

模型架构

1. 数据收集与预处理

数据来源：通过Twitter API和snscrape工具收集
搜索策略：使用10个独特搜索词（"MTA"、"NYC SUBWAY"等）和12个相关位置
过滤处理：去除重复推文和嵌入链接
数据规模：从102,530条推文中随机抽样36,000条进行分析

2. Few-Shot学习分类模块

模型选择：OpenAI GPT-3.5 Turbo 分类类别：

清洁与维护：讨论地铁系统清洁和维护问题
调度与运营：涉及地铁时刻表、延误、准时性等
安全与保障：突出用户安全和保障相关关注
其他：与公交系统用户体验无关的推文

Few-Shot设置：每个类别使用5个样本进行训练，在性能和资源效率间取得平衡

3. VADER情感分析模块

核心原理：基于预构建的情感词典，将词汇特征映射到情感强度评分 评分范围：词级评分-4到4，句级复合评分-1到+1 标准化公式： $CSC_i = \frac{x_i}{\sqrt{x_i^2 + \alpha}}$ 其中 $x_i$ 是推文i中构成词汇的情感评分总和， $\alpha=15$ 为标准化参数