When or What? Understanding Consumer Engagement on Digital Platforms

Wu, Liang
Understanding what drives popularity is critical in today's digital service economy, where content creators compete for consumer attention. Prior studies have primarily emphasized the role of content features, yet creators often misjudge what audiences actually value. This study applies Latent Dirichlet Allocation (LDA) modeling to a large corpus of TED Talks, treating the platform as a case of digital service provision in which creators (speakers) and consumers (audiences) interact. By comparing the thematic supply of creators with the demand expressed in audience engagement, we identify persistent mismatches between producer offerings and consumer preferences. Our longitudinal analysis further reveals that temporal dynamics exert a stronger influence on consumer engagement than thematic content, suggesting that when content is delivered may matter more than what is delivered. These findings challenge the dominant assumption that content features are the primary drivers of popularity and highlight the importance of timing and contextual factors in shaping consumer responses. The results provide new insights into consumer attention dynamics on digital platforms and carry practical implications for marketers, platform managers, and content creators seeking to optimize audience engagement strategies.

Basic Information

  • Paper ID: 2510.10474
  • Title: When or What? Understanding Consumer Engagement on Digital Platforms
  • Authors: Jingyi Wu (Zhejiang University), Junying Liang (Zhejiang University)
  • Classification: cs.CL (Computational Linguistics), cs.CY (Computers and Society)
  • Publication Date: October 12, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.10474

Abstract

This study investigates the drivers of consumer engagement on digital platforms. While prior research has emphasized the role of content characteristics, content creators often misjudge audience needs. The paper employs Latent Dirichlet Allocation (LDA) modeling to analyze a large-scale TED Talks corpus, treating the platform as a case of creator-consumer interaction in digital services. By comparing creator topic supply with audience engagement-expressed demand, the research identifies persistent mismatches between producer supply and consumer preferences. Longitudinal analysis further reveals that temporal dynamics have a stronger impact on consumer engagement than topic content, suggesting that "when" content is delivered may be more important than "what" content is delivered.

Research Background and Motivation

Core Research Question

The central question addressed by this study is: On digital platforms, are content characteristics ("what") or temporal factors ("when") more effective in driving consumer engagement?

Importance of the Problem

  1. Economic Value: Videos exceeding one million views on YouTube typically generate over $2,000 in advertising revenue, with top creators earning up to $54 million annually
  2. Intense Competition: YouTube hosts over 51 million channels, yet only a tiny fraction achieve the million-subscriber milestone
  3. Practical Necessity: Content creators, platform managers, and marketers urgently need to understand how to optimize audience engagement strategies

Limitations of Existing Approaches

  1. Overemphasis on Content Characteristics: Existing research primarily focuses on content quality and topic selection as intrinsic factors
  2. Neglect of Supply-Demand Mismatch: Lack of quantitative analysis of discrepancies between creator supply and audience demand
  3. Underestimation of Temporal Factors: Insufficient understanding of the impact of content publication timing and temporal dynamics

Research Motivation

Based on selective exposure theory and attention economics, this study hypothesizes that systematic preference differences exist between creators and audiences, and that temporal factors may be more important than content itself.

Core Contributions

  1. Proposed the "Difference Index" methodology: Quantifies preference differences between creators and audiences
  2. Challenged the traditional content-centric paradigm: Demonstrates that temporal dynamics have greater impact on audience engagement than topic content
  3. Constructed a large-scale TED Talks dataset: Comprising 4,475 talks from 2006-2022, totaling 8,065,104 words
  4. Provided practical strategic guidance: Data-driven optimization recommendations for content creators and platform managers

Methodology Details

Task Definition

  • Input: TED talk transcripts, view counts, publication year
  • Output: Topic distribution, quantified preference differences, relative impact of temporal and topic factors on engagement
  • Constraints: Analysis limited to English-language TED talks from 2006-2022

Model Architecture

1. LDA Topic Modeling

Documents → Preprocessing → LDA Model → 14 Topics
  • Preprocessing: Retained verbs, nouns, adjectives, adverbs; removed stopwords; tokenization
  • Topic Count: 14 topics selected based on perplexity
  • Topic Annotation: Manual semantic labeling based on high-frequency terms
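
The summary states that the topic count was chosen by perplexity but does not give the formula. For reference, the definition commonly used in the topic modeling literature (following Blei et al., 2003) is sketched below. The symbols M, N_d, and w_d are notation introduced here, and whether the authors evaluated perplexity on held-out talks or on the full corpus is not stated in the source.

```latex
\mathrm{perplexity}(D) \;=\; \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```

Here M is the number of documents in the evaluation set D, \mathbf{w}_d is the word sequence of document d, and N_d is its length. Lower perplexity indicates better fit, so candidate topic counts are usually compared by fitting one model per count and inspecting where the curve bottoms out or flattens.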

2. Preference Quantification Method

  • Creator Preference: Proportion of videos on a topic relative to total videos in a given year
  • Audience Preference: Log-transformed average view count for a topic
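
The two preference measures can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: the `talks` records are made-up values, the function names are hypothetical, audience preference is computed per (topic, year) pair, and "log-transformed average" is read as the natural log of the mean (the source does not say whether the log is taken before or after averaging).

```python
import math
from collections import defaultdict

# Hypothetical records: (topic, year, views). The numbers are invented.
talks = [
    ("Emotions", 2019, 1_200_000),
    ("Emotions", 2019, 800_000),
    ("Brain", 2019, 3_000_000),
    ("Minorities", 2019, 4_000_000),
]

def creator_preference(talks):
    """Share of each year's talks that fall under each topic."""
    per_topic = defaultdict(int)
    per_year = defaultdict(int)
    for topic, year, _ in talks:
        per_topic[(topic, year)] += 1
        per_year[year] += 1
    return {key: n / per_year[key[1]] for key, n in per_topic.items()}

def audience_preference(talks):
    """Natural log of the mean view count per (topic, year).

    Assumption: the log is applied to the mean, not averaged over talks.
    """
    views = defaultdict(list)
    for topic, year, v in talks:
        views[(topic, year)].append(v)
    return {key: math.log(sum(vs) / len(vs)) for key, vs in views.items()}

supply = creator_preference(talks)
demand = audience_preference(talks)
```

In this toy data, "Emotions" accounts for half of the 2019 talks (`supply[("Emotions", 2019)]` is 0.5) while its mean view count is the lowest of the three topics, which is the kind of supply-demand gap the paper sets out to measure.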

3. Difference Index Calculation

Difference Index_{topic,year} = |Average View Count_{topic,year}/Total View Counts_{year} - Video Counts_{topic,year}/Total Video Counts_{year}|

Difference Index_{year} = ∑_{topics} Difference Index_{topic,year}
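
A minimal sketch of the two formulas above, using only the standard library. One point is an assumption and is marked as such in the code: the source does not define "Total View Counts_{year}" precisely, so the sketch takes it to be the sum of the per-topic average view counts in that year, which makes both terms shares that sum to 1 across topics. The function name and the sample data are hypothetical.

```python
from collections import defaultdict

def difference_index(talks):
    """Return ({(topic, year): index}, {year: index summed over topics})."""
    views = defaultdict(list)
    for topic, year, v in talks:
        views[(topic, year)].append(v)

    avg_views = {k: sum(vs) / len(vs) for k, vs in views.items()}
    counts = {k: len(vs) for k, vs in views.items()}

    # ASSUMPTION: the audience-side denominator is the sum of per-topic
    # averages in the year. Swap in the raw yearly view total if the
    # paper intends that reading instead.
    total_avg = defaultdict(float)
    total_count = defaultdict(int)
    for (topic, year), a in avg_views.items():
        total_avg[year] += a
        total_count[year] += counts[(topic, year)]

    per_topic = {
        (t, y): abs(avg_views[(t, y)] / total_avg[y]
                    - counts[(t, y)] / total_count[y])
        for (t, y) in avg_views
    }
    per_year = defaultdict(float)
    for (_, y), d in per_topic.items():
        per_year[y] += d
    return per_topic, dict(per_year)

# Invented data: an oversupplied topic and an undersupplied one.
talks = [
    ("Emotions", 2019, 1_000_000),
    ("Emotions", 2019, 1_000_000),
    ("Emotions", 2019, 1_000_000),
    ("Minorities", 2019, 3_000_000),
]
per_topic, per_year = difference_index(talks)
```

With this data, "Emotions" holds 75% of the supply but 25% of the audience share, and "Minorities" the reverse, so each topic's index is 0.5 and the yearly index is 1.0.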

Technical Innovations

  1. Multi-dimensional Analysis Framework: Simultaneously considers dual influences of topic content and temporal dynamics
  2. Supply-Demand Mismatch Quantification: First systematic quantification of discrepancies between creator supply and audience demand
  3. Longitudinal Comparative Analysis: Dynamic trend analysis spanning 17 years
  4. Statistical Model Validation: Beta regression modeling to verify relative importance of topic and temporal factors
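
The summary does not reproduce the regression specification. As a hedged reference point, a standard main-effects beta regression with a logit link would take the form below; the link function, the dummy coding of topic and year, and all symbols are assumptions introduced here rather than details confirmed by the source.

```latex
y_{t,j} \sim \mathrm{Beta}\big(\mu_{t,j}\,\phi,\; (1-\mu_{t,j})\,\phi\big),
\qquad
\operatorname{logit}(\mu_{t,j}) = \beta_0 + \beta_t^{\text{topic}} + \gamma_j^{\text{year}}
```

Here y_{t,j} is the preference measure for topic t in year j, \mu_{t,j} is its mean, and \phi is a precision parameter. Beta regression requires a response strictly between 0 and 1, so a proportion such as the creator preference fits directly, while a log view count would first need rescaling; the source does not say how this was handled. Comparing the magnitudes of the topic coefficients \beta_t with the year coefficients \gamma_j is what supports the "what versus when" comparison reported in the results.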

Experimental Setup

Dataset

  • Data Source: Official TED website, strictly adhering to usage terms
  • Scale: 4,475 talks, 8,065,104 words
  • Time Span: 2006-2022
  • Variables: Talk transcripts, view counts, publication year

Data Preprocessing

  1. Text Cleaning: Removed words with fewer than 3 characters
  2. Stopword Processing: Based on NLTK stopword list, additionally removed 'kind', 'little', 'sort', etc.
  3. Data Normalization: Natural logarithm transformation applied to view counts to address skewed distribution
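
The three steps above can be approximated with the standard library alone. This is a simplified sketch: the stopword set is a small illustrative stand-in for the NLTK list, the function names are hypothetical, and the part-of-speech filtering described under Model Architecture is omitted because it needs a tagger.

```python
import math
import re

# Illustrative stand-in for the NLTK stopword list, plus the extra
# words named in the text ('kind', 'little', 'sort').
STOPWORDS = {"the", "and", "this", "that", "was", "with",
             "kind", "little", "sort"}

def clean_tokens(text):
    """Lowercase, split on letter runs, drop short words and stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Words with fewer than 3 characters are removed.
    return [t for t in tokens if len(t) >= 3 and t not in STOPWORDS]

def log_views(views):
    """Natural-log transform used to reduce skew in view counts."""
    return math.log(views)

print(clean_tokens("This is a kind of little talk on the brain and AI."))
# prints ['talk', 'brain']
```

The log transform compresses the long right tail of view counts: a talk with 10 million views and one with 100,000 views differ by a factor of 100 in raw counts but by only about 4.6 on the log scale.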

Evaluation Metrics

  • Topic Coherence: Semantic consistency based on high-frequency terms
  • Model Fit: Perplexity
  • Statistical Significance: Chi-square test, Kruskal-Wallis H test
  • Model Explanatory Power: Pseudo R² of Beta regression

Statistical Analysis Methods

  • Independence Testing: Chi-square test to assess association between topics and years
  • Non-parametric Testing: Kruskal-Wallis H test to compare view differences across topics
  • Regression Analysis: Beta regression to assess relative impact of topic and temporal factors
  • Correlation Analysis: Spearman correlation test to examine creator-audience preference association
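
To make the correlation step concrete, the sketch below computes Spearman's rank correlation from scratch: values are converted to ranks (ties share the average rank) and a Pearson correlation is taken over the ranks. The function names and the sample vectors are hypothetical; in practice a statistics library would also supply the p-value, which this sketch does not compute.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented example: creator share vs. audience log-views for four topics.
creator_share = [0.20, 0.14, 0.06, 0.01]
audience_log_views = [13.8, 14.5, 14.1, 14.9]
rho = spearman(creator_share, audience_log_views)
```

A coefficient near zero, like the weak value reported in the results, means that knowing how much creators produce on a topic says little about how much audiences watch it.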

Experimental Results

Main Findings

1. Topic Distribution Discoveries

Identified 14 topics with highly uneven distribution:

  • Popular Topics: Emotions (20.02%), Social Interaction (14.03%)
  • Scientific Topics: Universe (5.92%), Technology (5.90%), Brain (5.34%)
  • Niche Topics: Minorities (1.09%)

2. Creator Preference Analysis

  • Topic Factor More Important: Beta regression pseudo R²=0.361, topic coefficients generally exceed year coefficients
  • Preference Stability: Emotions (β=2.695) and Social Interaction (β=2.231) show highest coefficients
  • Temporal Sensitivity: Climate/Energy and Political topics significantly affected by time

3. Audience Preference Analysis

  • Temporal Factor More Important: Beta regression pseudo R²=0.249, year coefficients generally exceed topic coefficients
  • Popular Topics: Brain, Social Interaction, Minorities show highest average view counts
  • Supply-Demand Mismatch: Minority topics have lowest supply but highest demand

4. Preference Difference Quantification

  • Overall Weak Correlation: Spearman correlation coefficient r=0.143 (p=0.028)
  • Large Fluctuation in Differences: Annual difference indices show no clear trend with significant volatility
  • Topic Differences: Emotions, Minorities, and Brain topics show largest difference indices

Ablation Study Results

Residual Analysis Findings

  • Stable Topics: Arts, Healthcare unaffected by temporal factors
  • Sensitive Topics: Climate/Energy significantly increased in 2009, 2021, 2022
  • Event-Driven: Political topics peaked in 2020 (pandemic impact)

Beta Regression Model Comparison

Factor Type                          | Creator Preference          | Audience Preference
Topic Impact                         | Strong (large coefficients) | Moderate
Temporal Impact                      | Weak (small coefficients)   | Strong
Model Explanatory Power (pseudo R²)  | 36.1%                       | 24.9%

Case Studies

Successful Matching Cases

  • Political Topics: Relatively stable creator and audience preference curves with lower difference indices
  • Healthcare: As a universally relevant topic, supply-demand matching is good

Typical Mismatch Cases

  • Minority Topics: Severely undersupplied (1.09%) but high viewing demand
  • Emotions: Creator oversupply (20.02%) but moderate audience interest
  • Brain Science: Significant supply-demand disparity from 2016-2019

Related Work

Major Research Directions

  1. Social Network Effects: Mechanisms of real social networks' influence on online popularity
  2. Content Characteristic Analysis: Popularity prediction based on tags and topics
  3. Selective Exposure Theory: Relationship between user preferences and content selection
  4. Recommendation Algorithm Impact: Algorithm's role in shaping content visibility

Innovations of This Paper

  1. Bidirectional Analysis: First systematic comparison of creator supply and audience demand
  2. Temporal Dimension: Emphasizes importance of temporal dynamics, challenging content-centric paradigm
  3. Quantification Methods: Proposes operational measurement tools such as difference index
  4. Practice-Oriented: Provides concrete strategic recommendations rather than purely theoretical analysis

Conclusions and Discussion

Main Conclusions

  1. Temporal Factors Trump Content: For audiences, "when" has greater impact on engagement than "what"
  2. Systematic Supply-Demand Mismatch: Persistent differences exist between creator preferences and audience needs
  3. Significant Topic Variations: Supply-demand matching levels vary dramatically across topics
  4. Traditional Paradigm Requires Revision: Content quality is not the sole or primary driver of popularity

Limitations

  1. Platform Limitations: Based solely on TED platform; generalizability requires verification
  2. Incomplete Variables: Does not account for likes, shares, and other engagement metrics
  3. Interaction Effects: Model convergence issues limit analysis of topic-time interaction terms
  4. Causal Relationships: Correlation analysis cannot establish causality

Future Directions

  1. Multi-Platform Validation: Extension to YouTube, podcasts, and other platforms
  2. Interaction Effect Modeling: Improved statistical models for handling complex interactions
  3. Real-Time Prediction Systems: Development of popularity prediction tools based on temporal dynamics
  4. Content Optimization Strategies: Research on narrative structure and expression optimization

In-Depth Evaluation

Strengths

  1. Strong Methodological Innovation: Novel difference index concept provides quantitative tools for supply-demand analysis
  2. Large Data Scale: 17-year span with 4,475 samples provides sufficient statistical power
  3. Counter-Intuitive Findings: Challenges content-centric paradigm with temporal priority hypothesis
  4. High Practical Value: Provides specific actionable recommendations for content creators
  5. Comprehensive Analysis: Combines qualitative and quantitative methods with multi-angle verification

Weaknesses

  1. Weak Theoretical Foundation: Lacks deep mechanistic explanation for why temporal factors are more important
  2. Method Limitations: Subjective nature of LDA topic count selection may affect result stability
  3. External Validity Issues: TED platform's unique characteristics may limit generalizability
  4. Variable Omission: Overlooks important factors such as speaker reputation and video quality
  5. Insufficient Causal Inference: Primarily based on correlation analysis lacking causal identification strategies

Impact

  1. Academic Contribution: Provides new analytical framework for digital platform research
  2. Practical Value: Direct guidance for content marketing and platform operations
  3. Interdisciplinary Significance: Connects communication studies, computational linguistics, and consumer behavior
  4. Policy Implications: Provides data support for platform governance and content regulation

Applicable Scenarios

  1. Content Platforms: Content strategy development for YouTube, Bilibili, and similar platforms
  2. Marketing Domain: Brand content marketing timing and topic planning
  3. Academic Research: Empirical research in digital communication and consumer behavior
  4. Platform Governance: Optimization of content recommendation algorithms and bias identification

References

This paper cites 89 relevant references, including:

  • Classic social network analysis literature (Kwak et al., 2010)
  • Topic modeling methodology papers (Blei et al., 2003)
  • Selective exposure theory literature (Stroud, 2010)
  • Digital communication empirical research (Cinelli et al., 2021)

Overall Assessment: This is an innovative and practically valuable research paper that challenges traditional content-driven paradigms through large-scale data analysis and proposes a new temporal-priority perspective. While there remains room for improvement in theoretical depth and methodological refinement, its core findings have significant implications for both academic and practical communities.