From Rational Answers to Emotional Resonance: The Role of Controllable Emotion Generation in Language Models

Dong, Jin, Yang et al.
Purpose: Emotion is a fundamental component of human communication, shaping understanding, trust, and engagement across domains such as education, healthcare, and mental health. While large language models (LLMs) exhibit strong reasoning and knowledge generation capabilities, they still struggle to express emotions in a consistent, controllable, and contextually appropriate manner. This limitation restricts their potential for authentic human-AI interaction. Methods: We propose a controllable emotion generation framework based on Emotion Vectors (EVs) - latent representations derived from internal activation shifts between neutral and emotion-conditioned responses. By injecting these vectors into the hidden states of pretrained LLMs during inference, our method enables fine-grained, continuous modulation of emotional tone without any additional training or architectural modification. We further provide theoretical analysis proving that EV steering enhances emotional expressivity while maintaining semantic fidelity and linguistic fluency. Results: Extensive experiments across multiple LLM families show that the proposed approach achieves consistent emotional alignment, stable topic adherence, and controllable affect intensity. Compared with existing prompt-based and fine-tuning-based baselines, our method demonstrates superior flexibility and generalizability. Conclusion: Emotion Vector (EV) steering provides an efficient and interpretable means of bridging rational reasoning and affective understanding in large language models, offering a promising direction for building emotionally resonant AI systems capable of more natural human-machine interaction.

Basic Information

  • Paper ID: 2502.04075
  • Title: From Rational Answers to Emotional Resonance: The Role of Controllable Emotion Generation in Language Models
  • Authors: Yurui Dong, Luozhijie Jin, Yao Yang, Bingjie Lu, Jiaxi Yang, Zhi Liu
  • Classification: cs.CL (Computation and Language)
  • Publication Date: February 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2502.04075

Abstract

This paper addresses the limitations of large language models (LLMs) in emotional expression by proposing a controllable emotion generation framework based on Emotion Vectors (EVs). The method constructs latent representations by extracting internal activation differences between neutral and emotion-conditioned responses, and injects these vectors into the hidden states of pretrained LLMs during inference to achieve fine-grained continuous modulation of emotional tone without requiring additional training or architectural modifications. Theoretical analysis demonstrates that EV guidance enhances emotional expressiveness while maintaining semantic fidelity and linguistic fluency.

Research Background and Motivation

Problem Definition

Although current large language models excel at reasoning and knowledge generation, they exhibit significant limitations in emotional expression:

  1. Inconsistent Emotional Expression: Model-generated content is often emotionally neutral, tonally inconsistent, or difficult to control
  2. Lack of Emotional Intelligence: In domains such as education, healthcare, and mental health, purely factual yet emotionally cold responses often fail to meet user expectations
  3. Limited Application Scenarios: The deficiency in emotional expression capability restricts the application of AI systems in human-computer interaction scenarios requiring emotional resonance

Research Significance

Emotion is a fundamental component of human communication, playing crucial roles in multiple critical domains:

  • Education: Teacher encouragement and patience significantly influence student motivation and persistence
  • Healthcare: Physician emotional engagement and empathetic communication improve patient compliance, satisfaction, and even clinical recovery trajectories
  • Mental Health: Emotional resonance capability is a prerequisite for providing meaningful support

Limitations of Existing Approaches

  1. Instruction Tuning Methods: Often lack flexibility and struggle to adapt to diverse applications and model architectures
  2. Prompting Strategies: Depend on carefully designed templates and external evaluation modules
  3. Inference-Time Vector Editing: Existing methods focus primarily on the last token position, so their effect is not global, and they are difficult to apply to tasks that demand high generalizability, such as emotion control

Core Contributions

  1. Proposed a controllable emotion generation framework based on Emotion Vectors (EVs): Extracts reusable, efficient emotion vectors by comparing model responses to emotion-inducing and neutral prompts
  2. Achieved unsupervised, highly robust emotion control: Requires no training or architectural changes and keeps the emotional tone globally consistent
  3. Provided rigorous theoretical analysis: Demonstrates that EV guidance enhances emotional expression while maintaining semantic fidelity
  4. Constructed specialized evaluation datasets: EmotionQuery and EmotionQuery+ datasets for emotion generation assessment
  5. Enabled continuous fine-grained control: Adjusts emotional intensity through scalar scaling, supporting broad applicability across model families

Methodology Details

Task Definition

Given a pretrained language model M and a target emotional state e ∈ {joy, anger, disgust, fear, sadness}, the task objective is to control the emotional tone of generated text by modifying the model's internal representations during inference, while maintaining semantic content and linguistic fluency.

Model Architecture

Emotion Vector Construction

  1. Dataset Construction: Creates the EmotionQuery dataset containing 500 queries, with 100 queries per emotional state
  2. Internal Output Capture: For each query, the model produces internal representations under neutral and emotion settings; the output O_l of layer l is averaged over its T token positions
    $$\bar{O}_l = \frac{1}{T} \sum_{t=1}^{T} O_l[t]$$

  3. Emotion Offset Measurement: Computes the difference between the averaged outputs under the emotion setting for emotion e_k and the neutral setting
    $$\Delta O^{(e_k)}_l = \bar{O}^{(\mathrm{emotion}(e_k))}_l - \bar{O}^{(\mathrm{neutral})}_l$$

  4. Emotion Vector Construction: Averages the emotion offsets over the N queries in the dataset
    $$\mathrm{EV}^{(e_k)}_l = \frac{1}{N} \sum_{i=1}^{N} \Delta O^{(i, e_k)}_l$$
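
The construction steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and random toy arrays stand in for the per-layer outputs that would be captured from a real model under neutral and emotion-conditioned prompts.

```python
import numpy as np

def mean_over_tokens(layer_outputs: np.ndarray) -> np.ndarray:
    """Average a (T, d) array of per-token layer outputs into one (d,) vector."""
    return layer_outputs.mean(axis=0)

def emotion_vector(emotion_runs, neutral_runs) -> np.ndarray:
    """Build one layer's emotion vector from paired runs (hypothetical helper).

    emotion_runs / neutral_runs: lists of (T_i, d) arrays, one pair per query.
    The two runs of a pair may have different token counts, because each run
    is reduced to a single mean vector before the difference is taken.
    """
    offsets = [
        mean_over_tokens(emo) - mean_over_tokens(neu)
        for emo, neu in zip(emotion_runs, neutral_runs)
    ]
    # Average the per-query offsets across the dataset.
    return np.mean(offsets, axis=0)

# Toy stand-in data: "emotional" runs are neutral-like noise plus a fixed shift.
rng = np.random.default_rng(0)
d = 4
shift = np.array([1.0, 0.0, -1.0, 0.0])
neutral_runs = [rng.normal(size=(T, d)) for T in (5, 7, 6)]
emotion_runs = [rng.normal(size=(T, d)) + shift for T in (6, 5, 8)]

ev = emotion_vector(emotion_runs, neutral_runs)
print(ev.shape)
```

In this toy setup the recovered vector approximates `shift` up to sampling noise; with a real model the same computation would be repeated once per layer.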
    

Emotion Vector Guidance

During inference, emotion vectors are applied by modifying the hidden states of each layer:

$$\hat{H}_l = H_l + \alpha\,\mathrm{EV}^{(e_k)}_l$$

where α is a scaling factor controlling emotional intensity.
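
The update rule is a per-layer vector addition, which the following sketch makes concrete. It is an illustration under stated assumptions: `steer_hidden_states` is a hypothetical helper, hidden states are plain NumPy arrays of shape (T, d), and the same emotion vector is added at every token position of a layer. In a real model the addition would have to be applied inside the forward pass (for example through a framework's hook mechanism), which is not shown here.

```python
import numpy as np

def steer_hidden_states(hidden_states, emotion_vectors, alpha):
    """Return steered copies H_l + alpha * EV_l for every layer l.

    hidden_states: list of (T, d) arrays, one per layer.
    emotion_vectors: list of (d,) arrays, one per layer.
    alpha: scalar intensity; negative values push away from the emotion.
    """
    # Broadcasting adds the (d,) vector to each of the T token rows.
    return [h + alpha * ev for h, ev in zip(hidden_states, emotion_vectors)]

num_layers, T, d = 2, 3, 4
hidden = [np.zeros((T, d)) for _ in range(num_layers)]
evs = [np.full(d, 0.5) for _ in range(num_layers)]

amplified = steer_hidden_states(hidden, evs, alpha=2.0)    # like the 2×EV setting
suppressed = steer_hidden_states(hidden, evs, alpha=-1.0)  # like the -1×EV setting
print(amplified[0][0], suppressed[0][0])
```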

Technical Innovations

  1. Global Consistency: Unlike previous methods primarily focusing on sentence-level control, this approach achieves global emotion control
  2. Training-Free: Operates entirely at inference time without modifying model parameters
  3. Continuous Control: Enables continuous adjustment of emotional intensity through scalar α
  4. Additivity: Multiple emotions can be linearly combined: $\sum_k \alpha_k\,\mathrm{EV}^{(e_k)}_l$
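
The additivity property amounts to a weighted sum of per-emotion vectors before injection. A minimal sketch for a single layer, assuming a hypothetical `combine_emotion_vectors` helper and toy three-dimensional vectors (real emotion vectors have the model's hidden dimension):

```python
import numpy as np

def combine_emotion_vectors(vectors, weights):
    """Weighted sum over emotions, sum_k alpha_k * EV^(e_k), for one layer.

    vectors: dict mapping emotion name -> (d,) array.
    weights: dict mapping emotion name -> scalar alpha_k.
    """
    shape = next(iter(vectors.values())).shape
    total = np.zeros(shape)
    for name, alpha in weights.items():
        total += alpha * vectors[name]
    return total

# Toy vectors; the values are illustrative only.
vectors = {
    "joy": np.array([1.0, 0.0, 0.0]),
    "sadness": np.array([0.0, 1.0, 0.0]),
}
mixed = combine_emotion_vectors(vectors, {"joy": 1.0, "sadness": 0.5})
print(mixed)
```

The combined vector can then be injected exactly like a single emotion vector.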

Experimental Setup

Datasets

  1. EmotionQuery: 500 queries covering 5 basic emotions, 100 per emotion
  2. EmotionQuery+ (EQ+): Extended version with 400 queries, including 250 emotion queries and 150 neutral queries

Evaluation Metrics

  1. Sentence Fluency: Computed using Llama 3.1 perplexity
  2. Topic Consistency: Assessed using GPT-4o-mini to evaluate alignment between generated responses and user queries
  3. Emotion Probability Score (EPS): Measures emotion expression probability using the bart-large-mnli classifier
  4. Emotion Absolute Score (EAS): GPT-4o-mini provides 0-100 ratings for five basic emotions
  5. Target Emotion Confidence (TEC): Measures classifier confidence in target emotion
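
For the fluency metric, the summary does not spell out how perplexity is computed with Llama 3.1. The sketch below shows only the standard definition, perplexity as the exponential of the negative mean token log-probability, and assumes the per-token natural-log probabilities have already been obtained from a scoring model. The `perplexity` function name is illustrative.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp(-mean(log p))."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A sequence whose tokens each have probability 0.25 has perplexity close to 4.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

Lower values indicate text that the scoring model finds more predictable, which is why the metric is used here as a proxy for fluency.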

Comparison Methods

  • Original model (without EV)
  • EV application at different intensities (-1×EV, 1×EV, 2×EV, 4×EV)
  • Baseline methods based on prompting and fine-tuning

Implementation Details

  • Tested 11 representative large language models, including Llama series, Qwen series, Baichuan2, etc.
  • Used a base emotion vector EV_base (the average of all emotion vectors) for general emotion regulation

Experimental Results

Main Results

Fluency and Topic Consistency

  • Perplexity Results: EV application has negligible impact on sentence fluency, and in some cases even shows improvement
  • Topic Consistency: Most models maintain high topic consistency comparable to original responses after EV application

Emotional Expression Capability

  • Emotion Probability Score: After applying 2×EV, most models show a significant improvement in EPS; for example, Llama3.1, Qwen2, and MiniCPM reach 1.000, 0.9825, and 0.9950, respectively
  • Emotion Absolute Score: After applying 1×EV, most models' EAS increases by at least 400%, while -1×EV reduces EAS by nearly 90%

Ablation Studies

Effects of Different EV Intensities

Model
Llama2-7B (anger)     21.40%    45.93%    98.07%    90.71%
Qwen2.5-7B (anger)    14.01%    33.36%    94.89%    95.68%

Results show that 1× and 2× EV significantly enhance emotion alignment, with 4× intensity exhibiting diminishing returns and even slight degradation.

Case Analysis

The paper provides rich case studies demonstrating output variations under different emotional conditions:

  • Anger Condition: Model transitions from neutral response to "I'm so angry and frustrated! I've been busting my butt..."
  • Joy Condition: Generates "I was absolutely over the moon! My heart was bursting with love!"

Experimental Findings

  1. Linear Controllability: Emotional intensity exhibits an approximately linear relationship with the scaling factor α
  2. Cross-Model Generalization: The method proves effective across models of different architectures and scales
  3. Emotion Specificity: Different emotion vectors reliably guide models to produce the corresponding emotional expressions

Theoretical Analysis

Mathematical Foundation

The paper provides rigorous theoretical proofs based on first-order Taylor expansion:

  1. Monotonic Emotion Gain: If the Fisher discriminant direction aligns with EV on average, small positive α monotonically increases target emotion scores
  2. Semantic Preservation: Since EV is constructed from semantically identical but emotionally different prompt pairs, its projection onto semantic gradients approximates zero
  3. Linear Controllability: To first order, emotional intensity depends linearly on α, and multiple emotion vectors compose additively
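
The three claims above can be read off a first-order expansion. In the sketch below, s_e is a differentiable target-emotion score and g a differentiable semantic score, both evaluated on the hidden state; these two symbols are notation introduced here for illustration and are not taken from the paper.

```latex
\begin{aligned}
s_e\!\left(H_l + \alpha\,\mathrm{EV}^{(e_k)}_l\right)
  &\approx s_e(H_l) + \alpha\,\nabla s_e(H_l)^{\top}\mathrm{EV}^{(e_k)}_l, \\
g\!\left(H_l + \alpha\,\mathrm{EV}^{(e_k)}_l\right)
  &\approx g(H_l) + \alpha\,\nabla g(H_l)^{\top}\mathrm{EV}^{(e_k)}_l
  \approx g(H_l)
  \quad \text{when } \nabla g(H_l)^{\top}\mathrm{EV}^{(e_k)}_l \approx 0, \\
s_e\!\Big(H_l + \sum_k \alpha_k\,\mathrm{EV}^{(e_k)}_l\Big)
  &\approx s_e(H_l) + \sum_k \alpha_k\,\nabla s_e(H_l)^{\top}\mathrm{EV}^{(e_k)}_l .
\end{aligned}
```

The first line gives a monotonic gain for small positive α whenever the gradient of the emotion score has a positive inner product with the emotion vector. The second line gives semantic preservation when the emotion vector is nearly orthogonal to the semantic gradient. The third line gives linear, additive composition. All three hold only while α is small enough for the first-order approximation to be accurate.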

Near-Optimal Approximation

In the sense of Fisher Linear Discriminant Analysis, the EV construction approaches statistical optimality: under a whitening approximation, the optimal Fisher direction is parallel to the mean difference vector.
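
This claim can be checked numerically on synthetic data. The sketch below assumes two Gaussian classes with identity covariance (the "whitened" case) and uses the textbook two-class Fisher direction, the inverse within-class scatter applied to the mean difference. It is a toy check with hypothetical function names, not the paper's analysis.

```python
import numpy as np

def fisher_direction(class_a, class_b):
    """Two-class Fisher LDA direction S_w^{-1} (mu_b - mu_a) for (n, d) samples."""
    mu_a, mu_b = class_a.mean(axis=0), class_b.mean(axis=0)
    centered_a, centered_b = class_a - mu_a, class_b - mu_b
    # Within-class scatter matrix, summed over both classes.
    s_w = centered_a.T @ centered_a + centered_b.T @ centered_b
    return np.linalg.solve(s_w, mu_b - mu_a)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
d, n = 8, 20000
shift = rng.normal(size=d)
neutral = rng.normal(size=(n, d))            # identity covariance
emotional = rng.normal(size=(n, d)) + shift  # same covariance, shifted mean

mean_diff = emotional.mean(axis=0) - neutral.mean(axis=0)
alignment = cosine(fisher_direction(neutral, emotional), mean_diff)
print(round(alignment, 3))
```

With identity covariance the scatter matrix is close to a multiple of the identity, so the two directions are almost parallel. With strongly correlated features they can diverge, which is why the whitening assumption matters.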

Related Work

Emotion Representation and Dialogue Systems

  • Classification approaches (discrete emotions such as joy, sadness, anger)
  • Dimensional approaches (valence-arousal scales)
  • Existing methods are overly complex or require further training

Instruction Tuning and Prompt-Based Emotion Control

  • Fine-tuning methods often lack flexibility and struggle to adapt to diverse applications
  • Prompting strategies depend on carefully designed templates

Inference-Time Vector Editing

  • Existing methods primarily focus on the last token position, lacking global significance
  • Most control vector-related work performs sentence-level control, requiring training

Conclusions and Discussion

Main Conclusions

  1. EV Guidance Provides an Efficient and Interpretable Method: Bridges rational reasoning and emotional understanding in large language models
  2. Achieves Fine-Grained Emotion Control: Enables continuous, controllable emotional adjustment without additional training
  3. Maintains Semantic Fidelity: Both theory and experiments demonstrate that the method enhances emotional expression while maintaining semantic consistency

Limitations

  1. Saturation Effects at High EV Intensities: 4× intensity may lead to repetitive outputs and performance degradation
  2. Model-Dependent EV Magnitude: Certain models (such as Llama-3.1) extract larger EV magnitudes, potentially affecting subsequent decoding
  3. Basic Emotion Limitations: Currently focuses on five basic emotions; handling of complex emotions requires further exploration

Future Directions

  1. Extension to More Complex Emotional States
  2. Optimization of EV Extraction and Application Strategies
  3. Exploration of Multimodal Emotion Control
  4. Investigation of Emotion-Personalization Integration

In-Depth Evaluation

Strengths

  1. Strong Method Innovation: The first to propose a globally consistent emotion vector guidance method that enables fine-grained emotion control without training
  2. Solid Theoretical Foundation: Provides rigorous mathematical proofs and explains the near-optimality of the EV construction from the perspective of Fisher discriminant analysis
  3. Comprehensive Experiments: Extensive experiments across 11 different models with diverse and reasonable evaluation metrics
  4. High Practical Value: Simple and easy to implement with good cross-model generalization capability

Weaknesses

  1. Limited Emotion Categories: Only considers five basic emotions; handling capability for complex emotional states remains unknown
  2. Cultural Adaptability: Does not account for differences in emotional expression across cultural backgrounds
  3. Long-Text Consistency: Effectiveness in maintaining emotional consistency for long dialogues or document-level contexts requires further verification
  4. Computational Overhead Analysis: Lacks detailed analysis of method computational complexity and inference speed impact

Impact

  1. Academic Contribution: Provides a new research paradigm for the fields of affective computing and controllable text generation
  2. Practical Value: Broad application prospects in education, healthcare, and mental health domains
  3. Reproducibility: Authors commit to open-sourcing code and datasets, facilitating subsequent research

Applicable Scenarios

  1. Educational AI Assistants: Providing personalized, emotionally appropriate learning support
  2. Medical Dialogue Systems: Enhancing emotional resonance in patient-physician communication
  3. Mental Health Support: Building more empathetic AI counselors
  4. Customer Service Robots: Improving user experience and satisfaction

References

The paper cites abundant related research, primarily including:

  • Emotion Theory Foundations: Ekman's basic emotion model
  • Large Language Models: Mainstream models such as Llama and Qwen series
  • Affective Computing: MNLI-based model for emotion classification
  • Vector Editing: Related inference-time intervention methods

Overall Assessment: This is a high-quality research paper proposing an innovative emotion vector guidance method with solid theoretical foundations and comprehensive experimental validation. This work provides an effective technical pathway for constructing AI systems with greater emotional intelligence, possessing significant academic value and practical significance.