ProtoTopic: Prototypical Network for Few-Shot Medical Topic Modeling

Licht, Ketabi, Khalvati

Topic modeling is a useful tool for analyzing large corpora of written documents, particularly academic papers. Despite a wide variety of proposed topic modeling techniques, these techniques do not perform well when applied to medical texts. This can be due to the low number of documents available for some topics in the healthcare domain. In this paper, we propose ProtoTopic, a prototypical network-based topic model used for topic generation for a set of medical paper abstracts. Prototypical networks are efficient, explainable models that make predictions by computing distances between input datapoints and a set of prototype representations, making them particularly effective in low-data or few-shot learning scenarios. With ProtoTopic, we demonstrate improved topic coherence and diversity compared to two topic modeling baselines used in the literature, demonstrating the ability of our model to generate medically relevant topics even with limited data.

ProtoTopic: Prototypical Network for Few-Shot Medical Topic Modeling

Basic Information

Paper ID: 2510.13542
Title: ProtoTopic: Prototypical Network for Few-Shot Medical Topic Modeling
Authors: Martin Licht, Sara Ketabi, Farzad Khalvati
Category: cs.LG (Machine Learning)
Published: October 15, 2025
Paper link: https://arxiv.org/abs/2510.13542v1

Abstract

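Topic modeling is a useful tool for analyzing large document corpora, particularly academic papers. Many topic modeling techniques exist, but their performance degrades on medical text, possibly because only a limited number of documents are available for some topics in the medical domain. This paper proposes ProtoTopic, a prototypical network-based topic model for generating topics from medical paper abstracts. Prototypical networks are efficient, interpretable models that make predictions by computing distances between input data points and a set of prototype representations, which makes them particularly effective in low-data or few-shot learning scenarios. With ProtoTopic, the authors report improved topic coherence and diversity compared with two topic modeling baselines from the literature, demonstrating that the model can generate medically relevant topics even with limited data.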
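The distance-based prediction step described in the abstract can be illustrated with a minimal sketch. This is not the ProtoTopic implementation: it only shows the general prototypical-network idea of averaging the embeddings of a few labeled examples into one prototype per class and assigning a query to the nearest prototype. The 2-D vectors, the topic labels, the function names, and the choice of squared Euclidean distance are all assumptions made for illustration; in practice the embeddings would come from a learned encoder.

```python
from collections import defaultdict
from math import exp


def compute_prototypes(embeddings, labels):
    """Return {label: mean embedding} over the support examples of each label."""
    grouped = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(query, prototypes):
    """Return (nearest label, softmax over negative squared distances)."""
    distances = {
        label: squared_distance(query, proto)
        for label, proto in prototypes.items()
    }
    smallest = min(distances.values())
    # Subtract the smallest distance before exponentiating for numerical stability.
    weights = {label: exp(-(d - smallest)) for label, d in distances.items()}
    total = sum(weights.values())
    probabilities = {label: w / total for label, w in weights.items()}
    return min(distances, key=distances.get), probabilities


# Hypothetical toy "abstract embeddings" with two made-up topic labels.
support = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
support_labels = ["cardiology", "cardiology", "oncology", "oncology"]

prototypes = compute_prototypes(support, support_labels)
label, probabilities = predict([0.2, 0.4], prototypes)
print(label, probabilities)
```

Because each class is summarized by a single averaged vector, a prototype can be formed from only a handful of examples, which is why this family of models is commonly used in few-shot settings.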

Research Background and Motivation

Problem Definition

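Core problem: Existing topic modeling techniques perform poorly on medical text, especially when data is scarce.
Importance: With the medical literature growing rapidly, researchers and clinicians need effective topic modeling tools that help them search for and find relevant information quickly.
Limitations of existing methods:
- Insufficient training data: High-quality training data is scarce in clinical settings.
- Lack of interpretability: Most state-of-the-art models are black boxes.
- Specialized medical terminology: Medical text uses domain-specific terminology and differs in format.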