Asking Clarifying Questions for Preference Elicitation With Large Language Models

Montazeralghaem, Tennenholtz, Boutilier et al.

Large Language Models (LLMs) have made it possible for recommendation systems to interact with users in open-ended conversational interfaces. In order to personalize LLM responses, it is crucial to elicit user preferences, especially when there is limited user history. One way to get more information is to present clarifying questions to the user. However, generating effective sequential clarifying questions across various domains remains a challenge. To address this, we introduce a novel approach for training LLMs to ask sequential questions that reveal user preferences. Our method follows a two-stage process inspired by diffusion models. Starting from a user profile, the forward process generates clarifying questions to obtain answers and then removes those answers step by step, serving as a way to add "noise" to the user profile. The reverse process involves training a model to "denoise" the user profile by learning to ask effective clarifying questions. Our results show that our method significantly improves the LLM's proficiency in asking funnel questions and eliciting user preferences effectively.

Basic Information

Paper ID: 2510.12015
Title: Asking Clarifying Questions for Preference Elicitation With Large Language Models
Authors: Ali Montazeralghaem, Guy Tennenholtz, Craig Boutilier, Ofer Meshi (Google)
Classification: cs.AI
Conference: GENNEXT@SIGIR'25
Paper Link: https://arxiv.org/abs/2510.12015

Abstract

Large Language Models (LLMs) enable recommendation systems to interact with users through open-ended conversational interfaces. To personalize LLM responses, especially when user history is limited, effective preference elicitation is crucial. This paper proposes a novel approach to train LLMs to ask sequential clarifying questions that reveal user preferences. The method employs a two-stage process inspired by diffusion models: the forward process starts from a user profile, generates clarifying questions, and progressively removes the corresponding answers, which acts as adding "noise" to the profile; the reverse process trains the model to "denoise" the user profile by learning to ask effective clarifying questions. Experimental results demonstrate that this approach significantly enhances the LLM's ability to ask funnel-style questions and effectively elicit user preferences.

Research Background and Motivation

Problem Definition

Recommendation systems typically rely on user interaction history to learn preferences, but face challenges in the following scenarios:

Cold-start Problem: Insufficient interaction history
Privacy Constraints: Restrictions on using historical interaction data
Contextual Uncertainty: Current preferences influenced by mood, social environment, and other factors

Research Significance

With the rapid development of LLMs, conversational recommendation systems (CRS) have become feasible. A CRS can ask preference elicitation questions directly, which lets it clarify user needs and provide high-quality personalized recommendations.

Limitations of Existing Methods

Simple prompting techniques can guide LLMs to ask elicitation questions at appropriate times, but generating effective sequential clarifying questions across domains remains a challenge.

Research Motivation

This paper aims to improve LLMs' ability to ask high-quality elicitation questions, in particular "funnel-style" questions that start from general concepts and become progressively more specific as the conversation advances.

Core Contributions

Innovative Framework: Proposes a two-stage preference elicitation framework inspired by discrete diffusion models
Sequential Question Generation: Develops a training method for generating effective sequential clarifying questions
Funnel-style Dialogue Strategy: Implements a question-asking strategy progressing from general to specific
User Simulator: Constructs a user simulator model for evaluation
Significant Performance Improvement: Validates the method's effectiveness on the MovieLens dataset

Methodology Details

Task Definition

Given a target user profile P, the objective is to reconstruct it step by step: starting from an empty profile P₀ = ∅, the system asks a sequence of questions Q₀, Q₁, ..., Qₙ₋₁, receives the corresponding answers A₀, A₁, ..., Aₙ₋₁, and arrives at the complete profile Pₙ.

Model Architecture

1. Sequential Question-Answering Process (SQN)

Applying the chain rule under the assumption that each profile state depends only on the previous one, the model factorizes over steps:

p_θ,φ(Pₙ) = ∏ᵢ₌₁ⁿ p(Pᵢ|Pᵢ₋₁; θ, φ)

Where each transition probability decomposes into three components:

p(Pᵢ|Pᵢ₋₁; θ, φ) = p_θ(Qᵢ₋₁|Pᵢ₋₁) × p_φ(Aᵢ₋₁|Qᵢ₋₁, Pᵢ₋₁) × p(Pᵢ|Pᵢ₋₁, Qᵢ₋₁, Aᵢ₋₁)

p_θ(Qᵢ₋₁|Pᵢ₋₁): Question generator probability
p_φ(Aᵢ₋₁|Qᵢ₋₁, Pᵢ₋₁): User simulator probability
p(Pᵢ|Pᵢ₋₁, Qᵢ₋₁, Aᵢ₋₁): Deterministic update function
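The three components above can be read as one loop: generate a question, obtain an answer, update the profile. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `rollout`, `update_profile`, `toy_ask`, and `toy_answer` are hypothetical, the toy functions are rule-based stand-ins for the fine-tuned question generator and user simulator, and the profile values are made up. For brevity a question is represented by the label it asks about. The stopping rule (at most 10 questions, or an exact profile match) follows the question limit reported under Implementation Details.

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, str]


def update_profile(profile: Profile, question: str, answer: str) -> Profile:
    """Deterministic update: store the answer under the label the question targets."""
    updated = dict(profile)
    updated[question] = answer
    return updated


def rollout(
    ask: Callable[[Profile], str],
    answer: Callable[[str, Profile], str],
    target: Profile,
    max_questions: int = 10,
) -> Tuple[Profile, List[Tuple[str, str]]]:
    """Run question -> answer -> update steps until the profile matches or the budget is spent."""
    profile: Profile = {}  # P_0 is the empty profile
    history: List[Tuple[str, str]] = []
    for _ in range(max_questions):
        if profile == target:
            break
        q = ask(profile)        # stands in for sampling from p_theta(Q | P)
        a = answer(q, profile)  # stands in for sampling from p_phi(A | Q, P)
        profile = update_profile(profile, q, a)
        history.append((q, a))
    return profile, history


# Toy stand-ins (hypothetical): a fixed general-to-specific order and an oracle user.
TRUE_PROFILE: Profile = {"genre": "sci-fi", "era": "1980s", "director": "Ridley Scott"}
LABEL_ORDER = ["genre", "era", "director"]


def toy_ask(profile: Profile) -> str:
    # Ask about the most general label that is not yet in the profile.
    return next((label for label in LABEL_ORDER if label not in profile), LABEL_ORDER[-1])


def toy_answer(question: str, profile: Profile) -> str:
    # The oracle user simply looks up its true preference.
    return TRUE_PROFILE[question]


final_profile, qa_history = rollout(toy_ask, toy_answer, TRUE_PROFILE)
print(final_profile == TRUE_PROFILE, [q for q, _ in qa_history])
```

In this toy run the loop stops after three questions because the reconstructed profile equals the target, well before the 10-question budget is reached.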

2. Forward Process: Profile Corruption

Structured Transformation: Converts textual user profiles to JSON format
Label Ordering: Orders labels by degree of generality
Funnel-style Question Generation: Generates question sequences from general to specific
Progressive Information Removal: Progressively removes corresponding information according to question order

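Partial user profile at step t, where JP_u is the full JSON profile of user u and Tᵢ denotes the profile content associated with question Qᵢ (so that JP_u^n = JP_u):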

JP_u^t = JP_u \ ⋃ᵢ₌ₜⁿ⁻¹ Tᵢ
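The set difference in this definition can be sketched in a few lines. This is an illustrative sketch only: it assumes each Tᵢ is a single top-level key of the JSON profile and that the general-to-specific label order is already given. The function name `partial_profile` and the example profile are hypothetical.

```python
from typing import Dict, List


def partial_profile(full: Dict[str, str], label_order: List[str], t: int) -> Dict[str, str]:
    """Return JP_u^t: the full profile minus the labels targeted by questions t..n-1."""
    removed = set(label_order[t:])  # union of T_i for i = t, ..., n-1
    return {label: value for label, value in full.items() if label not in removed}


# Made-up example profile, ordered from general to specific.
full_profile = {"genre": "sci-fi", "era": "1980s", "director": "Ridley Scott"}
order = ["genre", "era", "director"]

for step in range(len(order) + 1):
    print(step, partial_profile(full_profile, order, step))
```

With t = 0 every label is removed and the profile is empty; with t = n nothing is removed and the full profile is returned.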

3. Reverse Process: Question Learning

Training data construction for user u, pairing each question Qₜ with the partial profile JP_u^t from which it should be asked:

D_u = {(Qₙ₋₁, JP_u^{n-1}), (Qₙ₋₂, JP_u^{n-2}), ..., (Q₀, JP_u^0)}
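One way to picture D_u is as a list of (target question, input partial profile) pairs, one per step. The sketch below makes the same simplifying assumption that each question targets a single profile key. `build_training_pairs`, the question strings, and the profile values are hypothetical and only illustrate the pairing, not the authors' data pipeline.

```python
from typing import Dict, List, Tuple


def build_training_pairs(
    full: Dict[str, str],
    label_order: List[str],
    questions: List[str],
) -> List[Tuple[str, Dict[str, str]]]:
    """Build D_u = [(Q_{n-1}, JP^{n-1}), ..., (Q_0, JP^0)] for one user."""
    pairs = []
    for t in reversed(range(len(questions))):
        removed = set(label_order[t:])  # labels not yet revealed at step t
        partial = {k: v for k, v in full.items() if k not in removed}
        pairs.append((questions[t], partial))
    return pairs


# Made-up profile and funnel-ordered questions.
profile = {"genre": "sci-fi", "era": "1980s", "director": "Ridley Scott"}
labels = ["genre", "era", "director"]
funnel_questions = [
    "Which genres do you enjoy?",
    "Do you prefer films from a particular era?",
    "Are there directors whose work you follow?",
]

training_pairs = build_training_pairs(profile, labels, funnel_questions)
for question, partial in training_pairs:
    print(partial, "->", question)
```

Each pair is a supervised example for the question generator: given the partial profile as input, the target output is the next question in the funnel.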

Technical Innovations

Diffusion Model Inspiration: Frames reconstruction of a user preference profile as a denoising task, by analogy with discrete diffusion processes
Funnel-style Strategy: Ensures natural progression from general to specific questions through label ordering
Joint Training: Simultaneously optimizes question generator and user simulator
Question History Mechanism: Includes questions and answers in profile updates to avoid repetition
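The question history mechanism can be illustrated by how the question generator's input might be assembled. This summary does not give the paper's actual prompt or serialization format, so the JSON layout, the field names `profile` and `asked`, and the function `build_generator_input` are all assumptions made for illustration.

```python
import json
from typing import Dict, List, Tuple


def build_generator_input(profile: Dict[str, str], history: List[Tuple[str, str]]) -> str:
    """Serialize the current profile together with the questions already asked."""
    state = {
        "profile": profile,
        "asked": [{"question": q, "answer": a} for q, a in history],
    }
    return json.dumps(state, sort_keys=True)


# Made-up state after one question.
current_profile = {"genre": "sci-fi"}
asked_so_far = [("Which genres do you enjoy?", "Mostly sci-fi.")]

generator_input = build_generator_input(current_profile, asked_so_far)
print(generator_input)
```

Because the earlier questions appear in the input, a model conditioned on this state has the information it needs to avoid asking them again.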

Experimental Setup

Dataset

MovieLens Dataset: Widely used in recommendation system research
User Profiles: Uses user profiles generated by Jeong et al. and Tennenholtz et al., created by LLMs based on complete rating histories and validated for predictive power over user ratings

Evaluation Metrics

ROUGE Score: Measures overlap between generated and ground-truth profiles
BLEU Score: Evaluates text generation quality
Unanswered Question Percentage: Assesses question relevance
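To make the overlap idea concrete, here is a simplified sketch of a unigram-overlap score in the spirit of ROUGE-1 F1, plus a percentage of unanswered questions. It is not the official ROUGE implementation, and this summary does not state which ROUGE or BLEU variants the paper uses. Whitespace tokenization, the use of `None` to mark an unanswered question, and the example strings are assumptions.

```python
from collections import Counter
from typing import List, Optional


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated profile and the ground-truth profile."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def unanswered_percentage(answers: List[Optional[str]]) -> float:
    """Share of questions (in percent) that the simulated user could not answer."""
    if not answers:
        return 0.0
    return 100.0 * sum(a is None for a in answers) / len(answers)


score = rouge1_f1("likes sci-fi films", "likes dark sci-fi films from the 1980s")
missed = unanswered_percentage(["sci-fi", None, "1980s", None])
print(round(score, 2), missed)
```

In the example, all three candidate tokens appear in the seven-token reference, so precision is 1, recall is 3/7, and the F1 score is 0.6.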

Baseline Methods

Non-fine-tuned Gemma model vs. fine-tuned Gemma model
Non-fine-tuned Gemini user simulator vs. fine-tuned Gemma user simulator

Implementation Details

Base Model: Gemma 7B (28 layers) as question generator and user simulator
Data Generation: Gemini 2.0 for high-quality data generation in forward process
Fine-tuning Method: Parameter-Efficient Fine-Tuning (PEFT) with LoRA
Training Parameters: Batch size 64, learning rate 0.001
Question Limit: At most 10 questions per conversation, stopping earlier if the reconstructed profile matches the target

Experimental Results

Main Results

Fine-tuning significantly improved model performance:

ROUGE Score: Improved from 0.4 to 0.68
BLEU Score: Improved from 0.28 to 0.49
User Simulator: Fine-tuned Gemma simulator outperformed non-fine-tuned Gemini simulator
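For scale, the reported scores can be turned into relative gains. The short calculation below uses only the four numbers listed above; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


rouge_gain = relative_gain(0.40, 0.68)
bleu_gain = relative_gain(0.28, 0.49)
print(f"ROUGE: {rouge_gain:.0%}, BLEU: {bleu_gain:.0%}")
```

That is a relative improvement of roughly 70% in ROUGE and 75% in BLEU over the non-fine-tuned model.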

Ablation Studies

1. Fine-tuning Effect Analysis

Fine-tuned question generator asks more effective sequential questions
Fine-tuned user simulator answers questions more accurately
Percentage of unanswered questions significantly reduced

2. Question Number Effect

Best model collects broad information in first 5 rounds
Transitions to more specific and detailed questions in rounds 6-7
Demonstrates good funnel-style dialogue strategy

3. Question History Effect

Adding question history improves performance in fine-tuned models
Question history reduces performance in non-fine-tuned models
Question history helps avoid repetitive questioning

4. Fine-tuning Steps Impact

More fine-tuning steps (40,000) yield better performance
Progressive improvement across 4,000, 28,000, and 40,000 steps

Case Analysis

Funnel-style Question Analysis

Weighted Ranking (WR) analysis reveals:

Early Questions: Broad concepts like Genre, Film Era, Decade
Mid-stage Questions: Specific concepts like Directors, Visual Style, Tone
Late Questions: Detailed concepts like Special Effects, Humor, Atmosphere

This indicates that the model learned to progress from broad questions to specific ones.

Experimental Findings

Synergistic Effect: Joint optimization of question generator and user simulator produces synergistic effects
Sequential Strategy: Funnel-style questioning is more effective than random questioning
Context Utilization: Including question history helps avoid repetition and improves dialogue quality

Main Research Directions

Conversational Recommendation Systems: Preference elicitation techniques in CRS
Clarifying Question Generation: Teaching language models to ask clarifying questions
Bayesian Optimization Methods: Natural language preference elicitation frameworks like PEBOL
Active Preference Learning: Algorithms using LLMs and probabilistic reasoning

Advantages of This Work

First application of diffusion model ideas to preference elicitation
Proposes systematic funnel-style question generation strategy
Simultaneously optimizes both question generation and user simulation components

Conclusions and Discussion

Main Conclusions

The diffusion model-inspired two-stage framework effectively trains LLMs to ask high-quality clarifying questions
Funnel-style questioning strategy significantly outperforms random questioning
Joint optimization of question generator and user simulator produces synergistic effects

Limitations

Data Dependency: Relies on high-quality user profile data
Domain Specificity: Primarily validated in movie recommendation domain
Simulated Environment: Evaluation mainly based on user simulator rather than real users
Computational Cost: Requires substantial computational resources for fine-tuning

Future Directions

Extension to more recommendation domains
Validation with real user interactions
Exploration of more efficient training strategies
Integration of multimodal information

In-depth Evaluation

Strengths

Methodological Innovation: Cleverly applies diffusion model ideas to dialogue systems with novel and well-motivated concepts
Technical Completeness: Provides comprehensive training framework including data generation, model training, and evaluation
Experimental Sufficiency: Comprehensive ablation studies validate effectiveness of each component
Practical Value: Addresses real problems in recommendation systems with strong application potential

Weaknesses

Evaluation Limitations: Primarily relies on simulated environment, lacking real user interaction validation
Domain Limitations: Validated only in movie recommendation domain, generalization capability needs verification
Baseline Comparison: Lacks direct comparison with other advanced preference elicitation methods
Theoretical Analysis: Lacks in-depth analysis of theoretical properties of the method

Impact

Academic Contribution: Provides new research directions for conversational recommendation systems
Practical Value: Can be directly applied to real recommendation systems
Reproducibility: Provides detailed implementation details facilitating reproduction

Applicable Scenarios

Cold-start Recommendation: Particularly suitable for new user preference elicitation
Conversational Systems: Can be integrated into various conversational recommendation systems
Personalized Services: Suitable for scenarios requiring rapid user preference understanding
Multi-turn Interaction: Appropriate for applications requiring progressive information collection

References

The paper cites 31 related works spanning conversational recommendation systems, large language models, diffusion models, and preference elicitation, which provide a solid foundation for this research.

Overall Assessment: This is a high-quality research paper that innovatively applies diffusion model ideas to preference elicitation problems, proposes a comprehensive solution, and validates its effectiveness through experiments. Despite some limitations, its technical contributions and practical value make it an important advance in the field of conversational recommendation systems.