ProtoTopic: Prototypical Network for Few-Shot Medical Topic Modeling

Licht, Ketabi, Khalvati

Topic modeling is a useful tool for analyzing large corpora of written documents, particularly academic papers. Despite the wide variety of topic modeling techniques that have been proposed, they do not perform well when applied to medical texts. This may be due to the small number of documents available for some topics in the healthcare domain. In this paper, we propose ProtoTopic, a prototypical network-based topic model that generates topics for a set of medical paper abstracts. Prototypical networks are efficient, explainable models that make predictions by computing distances between input datapoints and a set of prototype representations, which makes them particularly effective in low-data or few-shot learning scenarios. With ProtoTopic, we demonstrate improved topic coherence and diversity compared to two topic modeling baselines used in the literature, showing that our model can generate medically relevant topics even with limited data.
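
The abstract says a prototypical network predicts by measuring distances between an input and a set of prototype representations. The NumPy sketch below illustrates that general mechanism. It assumes each prototype is the mean embedding of a labeled support set, uses squared Euclidean distance, and scores queries with a softmax over negative distances, which is the common formulation. It is not ProtoTopic's implementation: this section does not describe how the paper embeds abstracts or turns prototypes into topics. The function names, the 2-D "embeddings" and the topic labels are all hypothetical.

```python
import numpy as np


def class_prototypes(embeddings: np.ndarray, labels: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (classes, prototypes); each prototype is the mean embedding of one class."""
    classes = np.unique(labels)
    prototypes = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes


def predict(queries: np.ndarray, classes: np.ndarray, prototypes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Label each query with its nearest prototype and return softmax(-distance) scores."""
    # Squared Euclidean distance from every query to every prototype: (n_queries, n_classes)
    d2 = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return classes[d2.argmin(axis=1)], probs


# Hypothetical toy data: 2-D "embeddings" of four labeled support abstracts.
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 4.8]])
support_labels = np.array(["cardiology", "cardiology", "oncology", "oncology"])
queries = np.array([[0.1, 0.1], [4.9, 5.1]])

classes, prototypes = class_prototypes(support, support_labels)
pred, probs = predict(queries, classes, prototypes)
print(pred)  # nearest-prototype label for each query
print(probs.round(3))
```

Each prototype is only an average, so a new class can be added from a handful of labeled examples without retraining a classifier head. This is the property that makes the approach attractive in the few-shot setting the abstract describes.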

Basic Information

Paper ID: 2510.13542
Title: ProtoTopic: Prototypical Network for Few-Shot Medical Topic Modeling
Authors: Martin Licht, Sara Ketabi, Farzad Khalvati
Category: cs.LG (Machine Learning)
Published: October 15, 2025
Paper link: https://arxiv.org/abs/2510.13542v1

Summary

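Topic modeling is a useful tool for analyzing large document corpora, particularly academic papers. Although many topic modeling techniques exist, they perform poorly when applied to medical texts, possibly because only a limited number of documents is available for certain topics in the healthcare domain. The paper proposes ProtoTopic, a topic model based on prototypical networks that generates topics from medical paper abstracts. Prototypical networks are efficient, interpretable models that make predictions by computing distances between input datapoints and a set of prototype representations, which makes them especially effective when data are limited or in few-shot learning scenarios. With ProtoTopic, the authors demonstrate improved topic coherence and diversity compared with two topic modeling baselines from the literature, showing that the model can generate medically relevant topics even with limited data.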
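
The summary reports gains in topic coherence and diversity, but this section does not say how either quantity is computed. As a sketch under that caveat, the Python below implements two common definitions. Topic diversity is the share of unique words among every topic's top-k words. Coherence is the mean normalized pointwise mutual information (NPMI) of a topic's word pairs, estimated from document-level co-occurrence. Other coherence measures (for example C_V or UMass) are also widely used, and the paper may use a different one. The function names, the `top_k` default and the toy corpus are hypothetical.

```python
import math
from itertools import combinations


def topic_diversity(topics: list[list[str]], top_k: int = 10) -> float:
    """Share of unique words among the top_k words of all topics (1.0 means no overlap)."""
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topic: list[str], documents: list[set[str]]) -> float:
    """Mean NPMI over all word pairs in one topic (needs >= 2 words), from document co-occurrence."""
    n_docs = len(documents)

    def prob(*words: str) -> float:
        # Fraction of documents that contain every word in `words`
        return sum(all(w in doc for w in words) for doc in documents) / n_docs

    scores = []
    for wi, wj in combinations(topic, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # pair never co-occurs: NPMI lower bound
        elif p_ij == 1.0:
            scores.append(1.0)  # pair co-occurs in every document: treated as the upper bound
        else:
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


# Hypothetical toy corpus: each abstract reduced to its set of words.
docs = [
    {"tumor", "chemotherapy", "oncology"},
    {"tumor", "chemotherapy"},
    {"heart", "arrhythmia"},
    {"heart", "arrhythmia", "ecg"},
]
topics = [["tumor", "chemotherapy"], ["heart", "arrhythmia"]]

print(topic_diversity(topics))
print([npmi_coherence(t, docs) for t in topics])
print(npmi_coherence(["tumor", "heart"], docs))
```

The two scores pull in different directions. Repeating the same frequent words across topics can keep coherence high while lowering diversity, so a model is usually expected to do well on both.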

Research Background and Motivation

Problem Definition

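Core problem: Existing topic modeling techniques perform poorly on medical texts, especially when data are scarce.
Importance: The rapid growth of the medical literature calls for effective topic modeling tools that help researchers and clinicians quickly filter and retrieve relevant information.
Limitations of existing methods:
- Insufficient training data: high-quality training data are scarce in clinical settings.
- Lack of interpretability: most state-of-the-art models are black boxes.
- Specialized medical language: medical texts use domain-specific terminology and differ in format from general text.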