2025-11-19T03:22:13.853095

Asking Clarifying Questions for Preference Elicitation With Large Language Models

Montazeralghaem, Tennenholtz, Boutilier et al.
Large Language Models (LLMs) have made it possible for recommendation systems to interact with users in open-ended conversational interfaces. In order to personalize LLM responses, it is crucial to elicit user preferences, especially when there is limited user history. One way to get more information is to present clarifying questions to the user. However, generating effective sequential clarifying questions across various domains remains a challenge. To address this, we introduce a novel approach for training LLMs to ask sequential questions that reveal user preferences. Our method follows a two-stage process inspired by diffusion models. Starting from a user profile, the forward process generates clarifying questions to obtain answers and then removes those answers step by step, serving as a way to add "noise" to the user profile. The reverse process involves training a model to "denoise" the user profile by learning to ask effective clarifying questions. Our results show that our method significantly improves the LLM's proficiency in asking funnel questions and eliciting user preferences effectively.

Basic Information

  • Paper ID: 2510.12015
  • Title: Asking Clarifying Questions for Preference Elicitation With Large Language Models
  • Authors: Ali Montazeralghaem, Guy Tennenholtz, Craig Boutilier, Ofer Meshi (Google)
  • Classification: cs.AI
  • Conference: GENNEXT@SIGIR'25
  • Paper Link: https://arxiv.org/abs/2510.12015

Abstract

Large Language Models (LLMs) enable recommendation systems to interact with users through open-ended conversational interfaces. To personalize LLM responses, especially when user history is limited, effective preference elicitation is crucial. This paper proposes a novel approach to train LLMs to ask sequential clarifying questions that reveal user preferences. The method employs a two-stage process inspired by diffusion models: starting from a user profile, the forward process generates clarifying questions and their answers, then removes the answers step by step, which acts as adding "noise" to the profile; the reverse process trains the model to "denoise" the user profile by learning to ask effective clarifying questions. Experimental results demonstrate that this approach significantly enhances the LLM's ability to ask funnel-style questions and effectively elicit user preferences.

Research Background and Motivation

Problem Definition

Recommendation systems typically rely on user interaction history to learn preferences, but face challenges in the following scenarios:

  1. Cold-start Problem: Insufficient interaction history
  2. Privacy Constraints: Restrictions on using historical interaction data
  3. Contextual Uncertainty: Current preferences influenced by mood, social environment, and other factors

Research Significance

With the rapid development of LLMs, conversational recommendation systems (CRS) have become feasible. A CRS can ask users preference elicitation questions directly, clarifying their needs in order to provide high-quality personalized recommendations.

Limitations of Existing Methods

Simple prompting techniques can guide LLMs to ask elicitation questions at appropriate times, but generating effective sequences of clarifying questions across domains remains a challenge.

Research Motivation

This paper aims to optimize LLMs' ability to ask high-quality elicitation questions, in particular to learn "funnel-style" questioning: starting from general concepts and becoming progressively more specific as the conversation advances.

Core Contributions

  1. Innovative Framework: Proposes a two-stage preference elicitation framework inspired by discrete diffusion models
  2. Sequential Question Generation: Develops a training method for generating effective sequences of clarifying questions
  3. Funnel-style Dialogue Strategy: Implements a question-asking strategy progressing from general to specific
  4. User Simulator: Constructs a user simulator model for evaluation
  5. Significant Performance Improvement: Validates the method's effectiveness on the MovieLens dataset

Methodology Details

Task Definition

Given a target user profile P, the objective is to reconstruct it, starting from an empty profile P₀ = ∅, through a sequence of questions Q₀, Q₁, ..., Qₙ₋₁ and corresponding answers A₀, A₁, ..., Aₙ₋₁, so that the final profile Pₙ matches P.

Model Architecture

1. Sequentialized Question-Answering Process (SQN)

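Applying the chain rule with a conditional independence (Markov) assumption, in which each profile Pᵢ depends only on the previous profile Pᵢ₋₁, the profile probability factorizes as: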

p_θ,φ(Pₙ) = ∏ᵢ₌₁ⁿ p(Pᵢ|Pᵢ₋₁; θ, φ)

where each transition decomposes into three components, for the question Qᵢ₋₁ asked and the answer Aᵢ₋₁ received at that step:

p(Pᵢ|Pᵢ₋₁; θ, φ) = p_θ(Qᵢ₋₁|Pᵢ₋₁) × p_φ(Aᵢ₋₁|Qᵢ₋₁, Pᵢ₋₁) × p(Pᵢ|Pᵢ₋₁, Qᵢ₋₁, Aᵢ₋₁)
  • p_θ(Qᵢ₋₁|Pᵢ₋₁): Question generator probability
  • p_φ(Aᵢ₋₁|Qᵢ₋₁, Pᵢ₋₁): User simulator probability
  • p(Pᵢ|Pᵢ₋₁, Qᵢ₋₁, Aᵢ₋₁): Deterministic update function
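
The factorization can be made concrete with a minimal Python sketch. It assumes hypothetical callables `q_prob` and `a_prob` standing in for the question generator p_θ and the user simulator p_φ, and represents a profile as a set of (question, answer) pairs, which is an illustrative choice rather than the paper's format. Summing the per-step log-probabilities gives the log-probability of one question/answer path to Pₙ; the deterministic update contributes log 1 = 0.

```python
import math
from typing import Callable, FrozenSet, List, Tuple

Profile = FrozenSet[Tuple[str, str]]  # illustrative: a profile as a set of (question, answer) pairs

# Hypothetical model interfaces (stand-ins for the fine-tuned LLMs).
QuestionProb = Callable[[Profile, str], float]      # p_theta(Q | P)
AnswerProb = Callable[[Profile, str, str], float]   # p_phi(A | Q, P)


def update_profile(profile: Profile, question: str, answer: str) -> Profile:
    """Deterministic update: p(P_i | P_{i-1}, Q, A) = 1, so it adds log(1) = 0."""
    return profile | {(question, answer)}


def episode_log_prob(
    turns: List[Tuple[str, str]], q_prob: QuestionProb, a_prob: AnswerProb
) -> Tuple[float, Profile]:
    """Return the log-probability of one question/answer path, plus the final profile P_n."""
    profile: Profile = frozenset()  # P_0 is the empty profile
    total = 0.0
    for question, answer in turns:
        total += math.log(q_prob(profile, question))          # question generator term
        total += math.log(a_prob(profile, question, answer))  # user simulator term
        profile = update_profile(profile, question, answer)
    return total, profile


# Usage with constant toy probabilities: 2 * (log 0.5 + log 0.8) = log 0.16
turns = [("Which genres do you enjoy?", "Sci-fi and thrillers"),
         ("Any favorite directors?", "Christopher Nolan")]
logp, final_profile = episode_log_prob(turns, lambda p, q: 0.5, lambda p, q, a: 0.8)
print(round(logp, 4))  # -1.8326
```

Working in log space turns the product over steps into a sum, which is how such sequence likelihoods are usually computed in practice to avoid numerical underflow.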

2. Forward Process: Profile Corruption

  1. Structured Transformation: Converts textual user profiles to JSON format
  2. Label Ordering: Orders profile labels from most general to most specific
  3. Funnel-style Question Generation: Generates question sequences from general to specific
  4. Progressive Information Removal: Removes the information targeted by each question one step at a time, starting from the last (most specific) question

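Partial user profile definition:

JP_u^t = JP_u \ ⋃_{i=t}^{n-1} Tᵢ

where JP_u is the JSON profile of user u and Tᵢ is the profile information targeted by question Qᵢ. Hence JP_u^n = JP_u is the complete profile, and JP_u^0 is empty when the n questions cover the whole profile.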
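
The forward (corruption) process can be sketched in Python, assuming the JSON profile is a flat dict and each Tᵢ is the set of keys targeted by question Qᵢ. The labels, values, and generality ranks below are hypothetical; the paper's own label-ordering procedure is not reproduced here.

```python
from typing import Dict, List, Set


def order_by_generality(labels: Set[str], rank: Dict[str, int]) -> List[str]:
    """Sort labels from most general (lowest rank) to most specific."""
    return sorted(labels, key=lambda label: rank[label])


def partial_profile(profile: Dict[str, str], targets: List[Set[str]], t: int) -> Dict[str, str]:
    """JP_u^t: the full profile minus the union of T_t, ..., T_{n-1}."""
    removed: Set[str] = set().union(*targets[t:])
    return {key: value for key, value in profile.items() if key not in removed}


def forward_corruption(profile: Dict[str, str], targets: List[Set[str]]) -> List[Dict[str, str]]:
    """Return JP_u^n, JP_u^{n-1}, ..., JP_u^0: the most specific information is removed first."""
    n = len(targets)
    return [partial_profile(profile, targets, t) for t in range(n, -1, -1)]


# Hypothetical movie profile and generality ranks.
profile = {"genre": "sci-fi", "era": "1990s", "director": "Christopher Nolan", "effects": "practical"}
rank = {"genre": 0, "era": 1, "director": 2, "effects": 3}
targets = [{label} for label in order_by_generality(set(profile), rank)]  # one label per question

for step in forward_corruption(profile, targets):
    print(sorted(step))
```

Each printed line drops one more label, from `effects` (most specific) down to an empty profile, mirroring the "noise" added step by step.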

3. Reverse Process: Question Learning

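Training data construction: each pair couples the question Qₜ with the partial profile JP_u^t that precedes it, so the question generator learns to ask Qₜ given what is already known: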

D_u = {(Qₙ₋₁, JP_u^{n-1}), (Qₙ₋₂, JP_u^{n-2}), ..., (Q₀, JP_u^0)}
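
A minimal sketch of building D_u and turning each pair into a supervised example, assuming a flat dict profile and per-question key sets Tᵢ. The `to_example` prompt template is a hypothetical format, not the one used in the paper.

```python
import json
from typing import Dict, List, Set, Tuple


def partial_profile(profile: Dict[str, str], targets: List[Set[str]], t: int) -> Dict[str, str]:
    """JP_u^t: drop every key targeted by questions Q_t, ..., Q_{n-1}."""
    removed: Set[str] = set().union(*targets[t:])
    return {k: v for k, v in profile.items() if k not in removed}


def build_training_pairs(
    questions: List[str], profile: Dict[str, str], targets: List[Set[str]]
) -> List[Tuple[str, Dict[str, str]]]:
    """D_u = [(Q_{n-1}, JP_u^{n-1}), ..., (Q_0, JP_u^0)]."""
    if len(questions) != len(targets):
        raise ValueError("need exactly one target set per question")
    n = len(questions)
    return [(questions[t], partial_profile(profile, targets, t)) for t in range(n - 1, -1, -1)]


def to_example(question: str, partial: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical prompt format: input is the partial profile, target is the next question."""
    prompt = f"Known preferences: {json.dumps(partial, sort_keys=True)}\nNext question:"
    return {"input": prompt, "target": question}


profile = {"genre": "sci-fi", "era": "1990s", "director": "Christopher Nolan"}
targets = [{"genre"}, {"era"}, {"director"}]
questions = ["What genres do you like?", "Which film era do you prefer?", "Any favorite directors?"]

for q, partial in build_training_pairs(questions, profile, targets):
    print(to_example(q, partial))
```

The last pair maps an empty profile to the most general question, so at inference time the model can start from P₀ = ∅.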

Technical Innovations

  1. Diffusion Model Inspiration: Treats preference elicitation as denoising in a discrete diffusion process, where the user profile is progressively corrupted and then reconstructed
  2. Funnel-style Strategy: Ensures natural progression from general to specific questions through label ordering
  3. Joint Training: Simultaneously optimizes question generator and user simulator
  4. Question History Mechanism: Includes questions and answers in profile updates to avoid repetition

Experimental Setup

Dataset

  • MovieLens Dataset: Widely used in recommendation system research
  • User Profiles: Uses LLM-generated user profiles from Jeong et al. and Tennenholtz et al., which were created from complete rating histories and validated for their ability to predict user ratings

Evaluation Metrics

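  • ROUGE Score: Measures n-gram overlap (recall-oriented) between the reconstructed and ground-truth profiles
  • BLEU Score: Measures n-gram precision of the reconstructed profile against the ground-truth profile
  • Unanswered Question Percentage: Share of questions the user simulator cannot answer from the profile, a proxy for question relevance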
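
For intuition about the metrics, a simplified sketch: unigram ROUGE-1 F1 over whitespace tokens, plus the unanswered-question percentage. The summary does not state which ROUGE variant or tokenizer the paper uses, and real evaluations typically rely on a standard implementation; the `UNANSWERABLE` sentinel is a hypothetical convention for a simulator reply that the profile cannot support. BLEU is not shown.

```python
from collections import Counter
from typing import List


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a reconstructed profile and the ground-truth profile."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def unanswered_percentage(answers: List[str], sentinel: str = "UNANSWERABLE") -> float:
    """Percentage of simulator replies equal to the (hypothetical) 'cannot answer' sentinel."""
    if not answers:
        return 0.0
    unanswered = sum(1 for answer in answers if answer.strip() == sentinel)
    return 100.0 * unanswered / len(answers)


print(round(rouge1_f1("likes sci-fi thrillers", "likes sci-fi and thrillers"), 3))  # 0.857
print(unanswered_percentage(["Sci-fi", "UNANSWERABLE", "1990s", "UNANSWERABLE"]))    # 50.0
```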

Baseline Methods

  • Non-fine-tuned Gemma model vs. fine-tuned Gemma model
  • Non-fine-tuned Gemini user simulator vs. fine-tuned Gemma user simulator

Implementation Details

  • Base Model: Gemma 7B (28 layers) as question generator and user simulator
  • Data Generation: Gemini 2.0 for high-quality data generation in forward process
  • Fine-tuning Method: Parameter-Efficient Fine-Tuning (PEFT) with LoRA
  • Training Parameters: Batch size 64, learning rate 0.001
  • Question Limit: At most 10 questions, or fewer if the reconstructed profile matches the target profile
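
The stopping rule can be illustrated with a toy rollout loop. It assumes a question generator `ask` and a user simulator `answer` (both hypothetical stand-ins for the fine-tuned Gemma models) and a dict profile. The loop stops after 10 questions or once the elicited profile matches the target, records the question history so the generator can avoid repeats, and applies a deterministic profile update.

```python
from typing import Callable, Dict, List, Optional, Tuple

History = List[Tuple[str, Optional[str]]]
# Hypothetical interfaces: ask returns (label, question), or None when nothing is left to ask.
AskFn = Callable[[Dict[str, str], History], Optional[Tuple[str, str]]]
AnswerFn = Callable[[str, str], Optional[str]]


def elicit(ask: AskFn, answer: AnswerFn, target: Dict[str, str],
           max_questions: int = 10) -> Tuple[Dict[str, str], History]:
    """Grow the profile from empty until it matches the target or the budget runs out."""
    profile: Dict[str, str] = {}
    history: History = []
    while len(history) < max_questions and profile != target:
        step = ask(profile, history)
        if step is None:
            break
        label, question = step
        reply = answer(label, question)
        history.append((question, reply))   # question history, used to avoid repeats
        if reply is not None:
            profile[label] = reply           # deterministic profile update
    return profile, history


# Toy stand-ins: ask labels in a fixed general-to-specific order; answer from the hidden target.
ORDER = ["genre", "era", "director", "effects"]
target = {"genre": "sci-fi", "era": "1990s", "director": "Christopher Nolan", "effects": "practical"}


def toy_ask(profile: Dict[str, str], history: History) -> Optional[Tuple[str, str]]:
    asked = {question for question, _ in history}
    for label in ORDER:
        question = f"What {label} do you prefer?"
        if question not in asked:
            return label, question
    return None


def toy_answer(label: str, question: str) -> Optional[str]:
    return target.get(label)


profile, history = elicit(toy_ask, toy_answer, target)
print(len(history), profile == target)  # 4 True
short_profile, _ = elicit(toy_ask, toy_answer, target, max_questions=2)
print(sorted(short_profile))  # ['era', 'genre']
```

With a smaller budget the loop stops early and returns only the broad labels, which is why question order matters under a fixed question limit.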

Experimental Results

Main Results

Fine-tuning significantly improved model performance:

  • ROUGE Score: Improved from 0.4 to 0.68
  • BLEU Score: Improved from 0.28 to 0.49
  • User Simulator: Fine-tuned Gemma simulator outperformed non-fine-tuned Gemini simulator

Ablation Studies

1. Fine-tuning Effect Analysis

  • Fine-tuned question generator asks more effective sequentialized questions
  • Fine-tuned user simulator answers questions more accurately
  • Percentage of unanswered questions significantly reduced

2. Question Number Effect

  • Best model collects broad information in first 5 rounds
  • Transitions to more specific and detailed questions in rounds 6-7
  • Consistent with an effective funnel-style dialogue strategy

3. Question History Effect

  • Adding question history improves performance in fine-tuned models
  • Question history reduces performance in non-fine-tuned models
  • Question history helps avoid repetitive questioning

4. Fine-tuning Steps Impact

  • Performance improves progressively across 4,000, 28,000, and 40,000 fine-tuning steps
  • The longest run (40,000 steps) performs best

Case Analysis

Funnel-style Question Analysis

Weighted Ranking (WR) analysis reveals:

  • Early Questions: Broad concepts like Genre, Film Era, Decade
  • Mid-stage Questions: Specific concepts like Directors, Visual Style, Tone
  • Late Questions: Detailed concepts like Special Effects, Humor, Atmosphere

This supports the conclusion that the model learned to progress from broad to specific questions.
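
The summary does not define how Weighted Ranking is computed. As a simpler, hypothetical proxy for the same idea, the sketch below averages the turn index at which each concept is asked across dialogues: concepts with low averages are asked early (broad), and those with high averages are asked late (specific). The dialogues are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mean_turn_by_concept(dialogues: List[List[str]]) -> List[Tuple[str, float]]:
    """Average 1-based turn index per concept, sorted from earliest to latest."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for concepts in dialogues:
        for turn, concept in enumerate(concepts, start=1):
            positions[concept].append(turn)
    averages = [(concept, sum(turns) / len(turns)) for concept, turns in positions.items()]
    return sorted(averages, key=lambda item: item[1])


dialogues = [
    ["Genre", "Decade", "Director", "Special Effects"],
    ["Genre", "Film Era", "Tone", "Humor"],
    ["Genre", "Decade", "Visual Style", "Atmosphere"],
]
for concept, avg in mean_turn_by_concept(dialogues):
    print(f"{concept}: {avg:.1f}")
```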

Experimental Findings

  1. Synergistic Effect: Joint optimization of question generator and user simulator produces synergistic effects
  2. Sequential Strategy: Funnel-style questioning is more effective than random questioning
  3. Context Utilization: Including question history helps avoid repetition and improves dialogue quality

Main Research Directions

  1. Conversational Recommendation Systems: Preference elicitation techniques in CRS
  2. Clarifying Question Generation: Teaching language models to ask clarifying questions
  3. Bayesian Optimization Methods: Natural language preference elicitation frameworks like PEBOL
  4. Active Preference Learning: Algorithms using LLMs and probabilistic reasoning

Advantages of This Work

  • Novel application of diffusion model ideas to preference elicitation
  • Proposes systematic funnel-style question generation strategy
  • Simultaneously optimizes both question generation and user simulation components

Conclusions and Discussion

Main Conclusions

  1. The diffusion model-inspired two-stage framework effectively trains LLMs to ask high-quality clarifying questions
  2. Funnel-style questioning strategy significantly outperforms random questioning
  3. Joint optimization of question generator and user simulator produces synergistic effects

Limitations

  1. Data Dependency: Relies on high-quality user profile data
  2. Domain Specificity: Primarily validated in movie recommendation domain
  3. Simulated Environment: Evaluation mainly based on user simulator rather than real users
  4. Computational Cost: Requires substantial computational resources for fine-tuning

Future Directions

  1. Extension to more recommendation domains
  2. Validation with real user interactions
  3. Exploration of more efficient training strategies
  4. Integration of multimodal information

In-depth Evaluation

Strengths

  1. Methodological Innovation: Applies diffusion model ideas to dialogue-based preference elicitation in a novel and well-motivated way
  2. Technical Completeness: Provides comprehensive training framework including data generation, model training, and evaluation
  3. Experimental Thoroughness: Ablation studies examine the contribution of each component (fine-tuning, question count, question history, training steps)
  4. Practical Value: Addresses real problems in recommendation systems with strong application potential

Weaknesses

  1. Evaluation Limitations: Primarily relies on simulated environment, lacking real user interaction validation
  2. Domain Limitations: Validated only in movie recommendation domain, generalization capability needs verification
  3. Baseline Comparison: Lacks direct comparison with other advanced preference elicitation methods
  4. Theoretical Analysis: Lacks in-depth analysis of theoretical properties of the method

Impact

  1. Academic Contribution: Provides new research directions for conversational recommendation systems
  2. Practical Value: Could be applied to real recommendation systems, pending validation with real users
  3. Reproducibility: Reports implementation details (base model, fine-tuning method, batch size, learning rate) that facilitate reproduction

Applicable Scenarios

  1. Cold-start Recommendation: Particularly suitable for new user preference elicitation
  2. Conversational Systems: Can be integrated into various conversational recommendation systems
  3. Personalized Services: Suitable for scenarios requiring rapid user preference understanding
  4. Multi-turn Interaction: Appropriate for applications requiring progressive information collection

References

The paper cites 31 related works spanning conversational recommendation systems, large language models, diffusion models, and preference elicitation.


Overall Assessment: This is a high-quality research paper that innovatively applies diffusion model ideas to preference elicitation problems, proposes a comprehensive solution, and validates its effectiveness through experiments. Despite some limitations, its technical contributions and practical value make it an important advance in the field of conversational recommendation systems.