
FactAppeal: Identifying Epistemic Factual Appeals in News Media

Mor-Lan, Sheafer, Shenhav
How is a factual claim made credible? We propose the novel task of Epistemic Appeal Identification, which identifies whether and how factual statements have been anchored by external sources or evidence. To advance research on this task, we present FactAppeal, a manually annotated dataset of 3,226 English-language news sentences. Unlike prior resources that focus solely on claim detection and verification, FactAppeal identifies the nuanced epistemic structures and evidentiary basis underlying these claims and used to support them. FactAppeal contains span-level annotations which identify factual statements and mentions of sources on which they rely. Moreover, the annotations include fine-grained characteristics of factual appeals such as the type of source (e.g. Active Participant, Witness, Expert, Direct Evidence), whether it is mentioned by name, mentions of the source's role and epistemic credentials, attribution to the source via direct or indirect quotation, and other features. We model the task with a range of encoder models and generative decoder models in the 2B-9B parameter range. Our best performing model, based on Gemma 2 9B, achieves a macro-F1 score of 0.73.


Basic Information

  • Paper ID: 2510.10627
  • Title: FactAppeal: Identifying Epistemic Factual Appeals in News Media
  • Authors: Guy Mor-Lan, Tamir Sheafer, Shaul R. Shenhav (Hebrew University of Jerusalem)
  • Classification: cs.CL (Computational Linguistics)
  • Publication Date: October 12, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.10627

Abstract

This paper proposes a novel task, Epistemic Appeal Identification, which aims to identify whether and how factual statements are supported by external sources or evidence. To advance research on this task, the authors construct the FactAppeal dataset of 3,226 manually annotated English-language news sentences. Unlike previous resources that focus solely on claim detection and verification, FactAppeal captures the fine-grained epistemic structures and evidentiary basis that support these claims. The dataset includes span-level annotations identifying factual statements and the source mentions they rely on. The annotations also cover fine-grained features of factual appeals, such as source type (e.g., Active Participant, Witness, Expert, Direct Evidence), whether the source is named, mentions of the source's role and epistemic credentials, and attribution to the source through direct or indirect quotation. The authors model the task with a range of encoder models and with generative decoder models in the 2B-9B parameter range; the best-performing model, based on Gemma 2 9B, achieves a macro-averaged F1 score of 0.73.

Research Background and Motivation

Problem Definition

In an era of misinformation proliferation and widespread skepticism toward media reporting, understanding how factual claims are presented has become increasingly important. The credibility of factual statements depends not only on their content but also on how they appeal to external knowledge sources—whether through expert testimony, official statements, or direct experiential evidence.

Limitations of Existing Approaches

Despite substantial progress in claim detection and verification research, existing methods primarily focus on isolated statement content, neglecting the epistemic structures that confer credibility and persuasiveness to these claims. Traditional factuality detection frameworks lack deep understanding of how claims are constructed and supported in news media.

Research Motivation

  1. Need for Epistemic Structure Analysis: Understanding how factual statements gain support through external authoritative sources
  2. Media Credibility Research: Analyzing knowledge flow and verification mechanisms in news media
  3. Improved Automated Fact-Checking: Providing foundations for more context-aware fact-checking
  4. Social Science Applications: Providing tools for research in political philosophy, social epistemology, and communication studies

Core Contributions

  1. Novel Task Formulation: First definition of the Epistemic Appeal Identification task, which goes beyond traditional factuality detection by adding layers of epistemic reasoning
  2. Annotated Dataset Construction: Creation of the FactAppeal dataset with fine-grained span-level annotations for 3,226 news sentences
  3. Classification Framework Development: Development of a structured epistemic appeal taxonomy based on source-event proximity (internal vs. external) and source type (human vs. non-human)
  4. Baseline Model Implementation: Establishment of task baselines using encoder and generative decoder models, with the best model achieving 0.73 macro-averaged F1 score
  5. Interdisciplinary Value: Provision of important tools for computational linguistics, social sciences, and media research

Methodology Details

Task Definition

The epistemic appeal identification task requires:

  1. Determining whether a sentence presents a factual statement
  2. If so, identifying how it invokes external sources or evidence to support the statement
  3. Identifying the sources of epistemic authority
  4. Classifying the type and manner of appeal

Annotation Scheme

Main Label Types

  1. Fact Without Appeal: Factual statements without epistemic appeal
  2. Fact With Appeal: Factual statements with epistemic appeal
    • Modifiers: Direct quote / Indirect quote
  3. Source: Epistemic source to which the statement is attributed
    • Named status: Named / Unnamed
    • Source type: 7-type classification
  4. Source Attribute: Relevant epistemic attributes of the source
  5. Recipient: Object receiving the information
  6. Appeal Time: Time when the appeal occurs
  7. Appeal Location: Location where the appeal occurs
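
The label scheme above can be pictured as a small record type. The following is a minimal sketch, not the authors' data format: the `Span` class, its field names, the `mark` helper, and the example sentence are all hypothetical, and treating "said X" without quotation marks as an indirect quote is an assumption of this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Label names follow the summary above; the record layout is an assumption.
SPAN_LABELS = (
    "Fact Without Appeal", "Fact With Appeal", "Source",
    "Source Attribute", "Recipient", "Appeal Time", "Appeal Location",
)


@dataclass(frozen=True)
class Span:
    start: int                         # character offset, inclusive
    end: int                           # character offset, exclusive
    label: str
    quote: Optional[str] = None        # "direct" or "indirect" (Fact With Appeal only)
    named: Optional[bool] = None       # Source only
    source_type: Optional[str] = None  # Source only

    def __post_init__(self) -> None:
        if self.label not in SPAN_LABELS:
            raise ValueError(f"unknown label: {self.label}")
        if not 0 <= self.start < self.end:
            raise ValueError("span must be non-empty")
        if self.quote is not None and self.label != "Fact With Appeal":
            raise ValueError("quote modifier applies to Fact With Appeal only")
        has_source_feature = self.named is not None or self.source_type is not None
        if has_source_feature and self.label != "Source":
            raise ValueError("named/source_type apply to Source only")


def mark(sentence: str, phrase: str, label: str, **features) -> Span:
    """Hypothetical helper: build a Span from the first occurrence of a phrase."""
    start = sentence.index(phrase)
    return Span(start, start + len(phrase), label, **features)


# Invented example sentence, not taken from the dataset.
sentence = "The river rose two meters overnight, said Dana Levi, a hydrologist."
spans = [
    mark(sentence, "The river rose two meters overnight", "Fact With Appeal",
         quote="indirect"),
    mark(sentence, "Dana Levi", "Source", named=True, source_type="Expert"),
    mark(sentence, "a hydrologist", "Source Attribute"),
]
for s in spans:
    print(s.label, "->", sentence[s.start:s.end])
```

Keeping the modifiers as optional fields on one record, with validation in `__post_init__`, is one design choice among several; a separate class per label type would work equally well.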

Source Type Classification System

Classification constructed based on two dimensions:

  • Proximity to Event: Internal (direct contact) vs. External (general professional knowledge)
  • Source Nature: Human vs. Non-human

Internal Sources (based on direct contact):

  • Active Participant: Active participant in the event
  • Witness: Observer providing first-hand testimony
  • Official: Participant with legal, political, or bureaucratic authority
  • Direct Evidence: Direct evidence found at the scene

External Sources (based on professional knowledge):

  • Expert: Scientist or specialist with professional expertise
  • Expert Document: Research documents, scientific and institutional reports
  • News Report: References to previous news reports
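
The two-dimensional taxonomy can be written down as a lookup table. This is a sketch under stated assumptions: the internal/external grouping comes from the lists above, while the human/non-human assignment of each type is inferred from the type names rather than quoted from the paper. The names `Axes`, `SOURCE_TYPES`, and `types_where` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Axes(NamedTuple):
    proximity: str  # "internal" or "external"
    nature: str     # "human" or "non-human" (inferred, see lead-in)


SOURCE_TYPES = {
    "Active Participant": Axes("internal", "human"),
    "Witness":            Axes("internal", "human"),
    "Official":           Axes("internal", "human"),
    "Direct Evidence":    Axes("internal", "non-human"),
    "Expert":             Axes("external", "human"),
    "Expert Document":    Axes("external", "non-human"),
    "News Report":        Axes("external", "non-human"),
}


def types_where(proximity: Optional[str] = None,
                nature: Optional[str] = None) -> List[str]:
    """Return source types matching the given axis values (None = any)."""
    return [
        name for name, axes in SOURCE_TYPES.items()
        if (proximity is None or axes.proximity == proximity)
        and (nature is None or axes.nature == nature)
    ]


print(types_where(proximity="external", nature="human"))
```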

Technical Innovations

  1. Span-level Annotation: Allows distinction between facts with appeals, appeal-free facts, and non-factual components within a single text
  2. Nested Label Support: Different label types can be nested, supporting complex epistemic structures
  3. Fine-grained Features: Capturing multi-dimensional information including source type, named status, quotation manner, etc.
  4. Epistemic Authority Classification: Systematized source classification framework based on cognitive theory
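
To illustrate how nested or overlapping spans can feed a token-level multi-label classifier, the sketch below converts spans into one binary vector per token. It assumes token-index spans and a four-label subset chosen for illustration; how the paper lays out its label categories is not specified in this summary, and the function name `spans_to_token_labels` is hypothetical.

```python
from typing import List, Sequence, Tuple

TokenSpan = Tuple[int, int, str]  # (start token, end token exclusive, label)


def spans_to_token_labels(n_tokens: int,
                          spans: Sequence[TokenSpan],
                          label_names: Sequence[str]) -> List[List[int]]:
    """Build an n_tokens x n_labels 0/1 matrix; overlapping spans simply
    set several columns on the same token."""
    column = {name: i for i, name in enumerate(label_names)}
    matrix = [[0] * len(label_names) for _ in range(n_tokens)]
    for start, end, label in spans:
        for t in range(start, end):
            matrix[t][column[label]] = 1
    return matrix


# Invented example; whitespace tokenization is an assumption.
tokens = "Dana Levi , a hydrologist , said the river rose".split()
labels = ["Fact With Appeal", "Source", "Source Attribute", "Expert"]
spans = [
    (0, 2, "Source"),
    (0, 2, "Expert"),            # source type stacked on the same tokens
    (3, 5, "Source Attribute"),
    (7, 10, "Fact With Appeal"),
]
matrix = spans_to_token_labels(len(tokens), spans, labels)
for token, row in zip(tokens, matrix):
    print(f"{token:12s} {row}")
```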

Experimental Setup

Dataset

  • Scale: 3,226 sentences from English news articles (2020-2022)
  • Annotators: Two annotators (one author and one research assistant)
  • Data Split: Training 70%, development 15%, test 15%
  • Annotation Agreement: Overall IoU of 0.74, Cohen's Kappa of 0.82
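
For readers unfamiliar with the two agreement measures, here is a minimal sketch of each. It assumes IoU is computed over sets of token indices and kappa over per-item categorical labels; the summary does not state the exact units or aggregation the authors used, and the toy inputs are invented.

```python
from collections import Counter
from typing import Hashable, Sequence, Set


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two annotators' token-index sets."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty annotations agree
    return len(a & b) / len(union)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_o - p_e) / (1.0 - p_e)


# Toy inputs, not taken from the dataset.
print(iou({2, 3, 4, 5}, {3, 4, 5, 6}))
rater_a = ["fact", "fact", "none", "fact", "none", "none"]
rater_b = ["fact", "fact", "none", "none", "none", "none"]
print(cohens_kappa(rater_a, rater_b))
```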

Evaluation Metrics

  • Token-level macro-averaged precision, recall, and F1 scores
  • Multi-label binary classification evaluation across 18 label categories
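
Token-level macro averaging over binary labels can be sketched as follows. This is an illustration of the general metric, not the authors' evaluation script; in particular, scoring a label as 0.0 when its denominator is zero is an assumption made here.

```python
from typing import List, Sequence, Tuple


def macro_prf(y_true: Sequence[Sequence[int]],
              y_pred: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 over label columns.

    Rows are tokens, columns are binary labels. Each label is scored
    separately and the per-label scores are averaged with equal weight.
    """
    n_labels = len(y_true[0])
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return (sum(precisions) / n_labels,
            sum(recalls) / n_labels,
            sum(f1s) / n_labels)


# Toy example: 4 tokens, 2 labels.
y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 0], [0, 1], [0, 1]]
print(macro_prf(y_true, y_pred))
```

Because F1 is computed per label before averaging, the macro F1 need not equal the harmonic mean of the macro precision and macro recall: in the toy example both are 0.75 while macro F1 is about 0.667.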

Baseline Methods

Encoder Models (token-level multi-label classification):

  • RoBERTa (base, 125M)
  • DeBERTa v3 (base, 184M)
  • ModernBERT (base, 150M)

Generative Decoder Models (sequence-to-sequence):

  • Gemma 2 (2B, 9B)
  • Llama 3.1 (8B)
  • Mistral v0.3 (7B)

Implementation Details

  • Encoder Models: Trained with focal loss for up to 12 epochs
  • Decoder Models: Fine-tuned with QLoRA using 4-bit quantization, trained for 3 epochs
  • Hardware: Single A100 GPU (40GB VRAM)
  • Learning Rate: 1e-5
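
Focal loss (Lin et al., 2017) down-weights well-classified examples, which helps when some labels are rare. The sketch below shows the binary form for a single token-label decision. The focusing parameter `gamma=2.0` is a common default and an assumption here; the summary does not report the value the authors used, nor whether a class-balancing weight was applied.

```python
import math


def binary_focal_loss(prob: float, target: int,
                      gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Focal loss for one binary decision: -(1 - p_t)^gamma * log(p_t).

    prob is the predicted probability of the positive class; p_t is the
    probability assigned to the true class. gamma=0 gives cross-entropy.
    """
    p_t = prob if target == 1 else 1.0 - prob
    p_t = min(max(p_t, eps), 1.0)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# A confident correct prediction contributes far less than an uncertain one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.6, 1)
print(easy, hard)
```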

Experimental Results

Main Results

Model                Precision   Recall   F1
Gemma 2 9B           0.76        0.73     0.73
RoBERTa (base)       0.75        0.67     0.70
Mistral v0.3 7B      0.73        0.68     0.70
DeBERTa v3 (base)    0.73        0.67     0.69
Llama 3.1 8B         0.75        0.65     0.68

Key Findings

  1. Generative Model Advantages: The largest decoder model Gemma 2 9B achieves best performance
  2. Encoder Model Limitations: Encoder models show greater performance variance across categories
  3. Label Frequency Impact: Encoder model performance correlates more strongly with label counts than decoder model performance does (Spearman's ρ = 0.72 vs. 0.66)
  4. Source Type Detection: Source type annotation performance shows lower correlation with label popularity
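
The rank correlation used in the label-frequency finding can be computed as the Pearson correlation of ranks. The following is a self-contained sketch with invented numbers; the per-label counts and scores below are not from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy data: label counts vs. per-label F1.
label_counts = [10, 20, 30, 40, 50]
label_f1 = [0.2, 0.1, 0.4, 0.3, 0.5]
print(spearman(label_counts, label_f1))
```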

Per-Category Performance Analysis

  • Factuality Detection: Fact w/o Appeal (0.89), Fact with Appeal (0.85)
  • Source Detection: Source (0.84), Source Attribute (0.79)
  • Quotation Type: Indirect Quote (0.83), Direct Quote (0.80)
  • Source Type: Greater performance variance across types, e.g., Active Participant (0.54) and News Report (0.68)

Dataset Statistics

  • Factual Sentence Proportion: Over 80% of sentences annotated as factual
  • Appeal Type Distribution: Appeal-free facts approximately twice as frequent as facts with appeals
  • Quotation Method: 66% use paraphrasing, 34% use direct quotation
  • Named Status: 64% of sources are named mentions

Related Work

Claim Verification Research

  • Early Work: Focus on determining verifiable events (Saurí and Pustejovsky, 2009)
  • Large-scale Benchmarks: FEVER, SciFact, FactRel and other datasets
  • Limitations: Primarily focus on claim detection and inter-claim relationships, lacking complete epistemic pattern descriptions

Epistemic Modality and Argument Mining

  • Epistemic Modality: Capturing linguistic markers of certainty and belief
  • Argument Mining: Exploring how claims are constructed and supported in discourse
  • Epistemic Stance Detection: Modeling source commitment to claims

Source Attribution and Citation Analysis

  • Citation Detection: Detecting citations and attributing them to entities
  • Limitations: Typically do not classify sources by type or capture whether appeals invoke direct speech or paraphrasing

Conclusions and Discussion

Main Conclusions

  1. Task Feasibility: The epistemic appeal identification task is feasible but remains challenging
  2. Generative Model Advantages: Generative models perform better at handling complex epistemic structures
  3. Fine-grained Analysis Value: Span-level annotation reveals complex epistemic structures in news media

Limitations

  1. Sentence-level Constraint: Use of only sentence-level annotations limits contextual information capture
  2. Source-Claim Linking: Current annotations do not explicitly link each source to its corresponding claim
  3. Language and Temporal Scope: Limited to English news articles from 2020-2022
  4. Annotation Scale: Relatively small dataset size may impact model generalization

Future Directions

  1. Extension to Paragraph/Article Level: Modeling complex discourse structures in larger text units
  2. Multilingual Extension: Application to other languages and cultural contexts
  3. Source-Claim Relationship Modeling: Explicit modeling of correspondences between sources and claims
  4. Social Media Application: Extension to other discourse types such as social media
  5. Temporal Dynamics Analysis: Investigating temporal changes in epistemic appeal patterns

In-Depth Evaluation

Strengths

  1. Task Innovation: First systematic definition and study of the epistemic appeal identification task, filling an important research gap
  2. Solid Theoretical Foundation: Classification framework based on cognitive and linguistic theory with strong theoretical grounding
  3. High Annotation Quality: Fine-grained span-level annotations with good inter-annotator agreement (Kappa=0.82)
  4. Interdisciplinary Value: Provision of valuable resources for computational linguistics, political science, communication studies, and other fields
  5. Comprehensive Experiments: Comparison of multiple model architectures with detailed performance analysis

Weaknesses

  1. Dataset Scale Limitation: 3,226 sentences is relatively small, potentially limiting model performance and generalization
  2. Annotation Complexity: Some label categories have sparse samples, affecting model learning
  3. Single Evaluation Metric: Primarily uses F1 score, lacking task-specific evaluation metrics
  4. Insufficient Error Analysis: Lack of in-depth analysis of model error types
  5. Unvalidated Real-world Application: Effectiveness not verified on actual fact-checking or media analysis tasks

Impact

  1. Academic Contribution: Opens new research directions in natural language processing
  2. Practical Value: Applicable to automated fact-checking, media bias detection, knowledge graph construction, and other tasks
  3. Social Significance: Helps understand and analyze information dissemination and verification mechanisms in media
  4. Reproducibility: Public release of data and code facilitates subsequent research

Applicable Scenarios

  1. News Media Analysis: Analyzing evidence usage patterns in news reporting
  2. Fact-Checking Assistance: Providing richer contextual information for automated fact-checking systems
  3. Media Literacy Education: Helping identify and analyze epistemic appeal strategies in media
  4. Political Discourse Analysis: Studying authority appeal patterns in political communication
  5. Knowledge Graph Construction: Providing foundations for constructing knowledge graphs with evidence relationships

References

  • Thorne et al. (2018): FEVER dataset for large-scale fact extraction and verification
  • Saurí and Pustejovsky (2009): Early factuality detection work
  • Da San Martino et al. (2019): Fine-grained analysis of propaganda techniques
  • Collins and Evans (2002): Third wave of research on expertise and experience
  • Anderson (2021): Epistemic bubbles and authoritarian politics

This paper makes pioneering contributions to the emerging task of epistemic appeal identification, providing not only a high-quality annotated dataset but also establishing a systematic theoretical framework and experimental baselines. While there remains room for improvement in dataset scale and model performance, its interdisciplinary research value and practical application potential make it an important work in the field.