Constrained by the cost and ethical concerns of involving real seekers in AI-driven mental health, researchers develop LLM-based conversational agents (CAs) with tailored configurations, such as profiles, symptoms, and scenarios, to simulate seekers. While these efforts advance AI in mental health, achieving more realistic seeker simulation remains hindered by two key challenges: dynamic evolution and multi-session memory. Seekers' mental states often fluctuate during counseling, which typically spans multiple sessions. To address this, we propose AnnaAgent, an emotional and cognitive dynamic agent system equipped with tertiary memory. AnnaAgent incorporates an emotion modulator and a complaint elicitor trained on real counseling dialogues, enabling dynamic control of the simulator's configurations. Additionally, its tertiary memory mechanism effectively integrates short-term and long-term memory across sessions. Evaluation results, both automated and manual, demonstrate that AnnaAgent achieves more realistic seeker simulation in psychological counseling compared to existing baselines. The ethically reviewed and screened code can be found on https://github.com/sci-m-wang/AnnaAgent.
- Paper ID: 2506.00551
- Title: AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation
- Authors: Ming Wang, Peidong Wang, Lin Wu, Xiaocui Yang, Daling Wang, Shi Feng, Yuxin Chen, Bixuan Wang, Yifei Zhang
- Classification: cs.CL cs.AI
- Publication Date: June 10, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2506.00551
Because involving real help-seekers in AI-driven mental health research is costly and raises ethical concerns, researchers have developed Large Language Model (LLM)-based conversational agents (CAs) that simulate help-seekers using customized configurations such as personal profiles, symptoms, and scenarios. Although these efforts advance AI applications in mental health, more realistic help-seeker simulation faces two critical challenges: dynamic evolution and multi-session memory. Help-seekers' psychological states frequently fluctuate throughout the counseling process, which typically spans multiple sessions. To address this issue, the paper proposes AnnaAgent, an emotion and cognition dynamic agent system equipped with three-level memory. AnnaAgent integrates an emotion regulator and a chief complaint guide trained on real counseling dialogues, enabling dynamic control of the simulator's configuration. Furthermore, its three-level memory mechanism integrates short-term and long-term memory across sessions. Evaluation results demonstrate that AnnaAgent achieves more realistic help-seeker simulation in psychological counseling than existing baselines.
The core problem this research addresses is how to more realistically simulate help-seeker behavior in AI-driven mental health research. Specifically:
- Cost and Ethical Constraints: Involving large numbers of real help-seekers in research is not only costly but may also raise ethical concerns
- Limitations of Existing Simulation Methods: Current LLM-based conversational agents exhibit problems such as flat emotional expression and excessive acceptance of suggestions when simulating help-seekers
- Lack of Dynamism: Existing methods cannot simulate emotional fluctuations and cognitive changes that help-seekers experience during counseling
- Absence of Multi-Session Memory: Psychological counseling is typically a long-term, multi-session process, yet existing methods lack cross-session memory mechanisms
Mental health issues represent a critical challenge facing contemporary society, while the number of trained therapists is limited. AI technology holds tremendous potential for mental health support but requires more realistic help-seeker simulation to:
- Construct datasets and evaluate effectiveness
- Train psychological counselors
- Conduct psychological research and experiments
Through literature review, the authors identified the following problems with existing help-seeker simulation methods:
- Static Configuration: Emotions and symptom cognition remain unchanged throughout the counseling process
- Lack of Memory Mechanisms: Inability to handle dialogues involving content from previous sessions
- Unrealistic Behavior: Tendency to agree with suggestions, excessive compliance, and flat emotional expression
- Identified Two Key Challenges: First to propose dynamic evolution and multi-session memory as the key challenges of help-seeker simulation, formalizing dynamic evolution as changes in emotion and chief complaint and categorizing multi-session memory into different stages
- Designed the AnnaAgent System: An emotion and cognition dynamic agent system with three-level memory that simulates dynamic evolution in counseling by controlling changes in emotion and symptom cognition during dialogue
- Validated System Effectiveness: Through experimental evaluation, demonstrated that AnnaAgent can more realistically simulate help-seeker behavior in psychological counseling
The help-seeker simulation task requires assigning role configurations to the LLM, including:
- Profile: Basic personal information (age, gender, occupation, etc.)
- Complaint: Help-seeker's cognition of symptoms and primary concerns
- Situation: Living environment and experienced events
- Status: Physical and psychological-related states
- Emotion: Expected emotional response style
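The five configuration fields listed above map naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the class name `SeekerConfig`, the function `build_system_prompt`, the prompt wording, and the example values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SeekerConfig:
    """Role configuration for a simulated help-seeker (hypothetical type)."""
    profile: Dict[str, str]  # basic personal information (age, gender, occupation, ...)
    complaint: str           # the seeker's own view of symptoms and primary concerns
    situation: str           # living environment and experienced events
    status: str              # physical and psychological state
    emotion: str             # expected emotional response style


def build_system_prompt(cfg: SeekerConfig) -> str:
    """Render the configuration as a role-play prompt (wording is an assumption)."""
    profile_text = ", ".join(f"{key}: {value}" for key, value in cfg.profile.items())
    lines = [
        "You are role-playing a help-seeker in a counseling session.",
        f"Profile: {profile_text}",
        f"Complaint: {cfg.complaint}",
        f"Situation: {cfg.situation}",
        f"Status: {cfg.status}",
        f"Emotion: {cfg.emotion}",
    ]
    return "\n".join(lines)


# Made-up example values, not taken from the paper or its datasets.
example = SeekerConfig(
    profile={"age": "28", "gender": "female", "occupation": "nurse"},
    complaint="trouble sleeping and constant worry about work",
    situation="recently moved to night shifts",
    status="tired, low appetite",
    emotion="anxious",
)
prompt = build_system_prompt(example)
```

In a static simulator these fields stay fixed for the whole conversation; a dynamic system would update `complaint` and `emotion` between turns.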
AnnaAgent employs a multi-agent system architecture containing two main agent groups:
Emotion Regulation:
- Emotion Reasoner: Built on Qwen2.5-7B-Instruct and trained on the D4 dataset to learn emotion evolution patterns from real counseling
- Emotion Perturbator: Introduces random perturbations to avoid fixed emotion change patterns, assigning probability weights based on emotion distance:
$$P(\mathrm{emo}_T) = \frac{w\big(d(G_T, G_B)\big) \times |G_T|}{\sum_{G_j} w\big(d(G_B, G_j)\big) \times |G_j|}$$

where $G_B$ and $G_T$ represent the base and target emotion groups respectively, and $d(\cdot)$ denotes the distance between emotion groups.
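The perturbation weighting can be made concrete with a short sketch. Everything specific here is an assumption for illustration: the three emotion groups, the index-based distance, and the weight function `1 / (1 + d)` are hypothetical stand-ins, since the summary does not specify the actual grouping, distance, or weight function. Only the overall shape (weight of the distance times group size, normalized over all groups) follows the formula.

```python
import random
from typing import Callable, Dict, List

# Hypothetical emotion groups; the real grouping is not given in this summary.
GROUPS: Dict[str, List[str]] = {
    "positive": ["joy", "relief"],
    "neutral": ["neutral"],
    "negative": ["sadness", "fear", "anger"],
}
ORDER = ["positive", "neutral", "negative"]


def distance(a: str, b: str) -> int:
    """Assumed distance: how far apart two groups sit on a valence axis."""
    return abs(ORDER.index(a) - ORDER.index(b))


def weight(d: int) -> float:
    """Assumed weight function: closer groups receive larger weight."""
    return 1.0 / (1.0 + d)


def group_probabilities(
    base: str,
    groups: Dict[str, List[str]],
    dist: Callable[[str, str], int],
    w: Callable[[int], float],
) -> Dict[str, float]:
    """Normalize w(d(target, base)) * |target| over all groups."""
    scores = {name: w(dist(name, base)) * len(members) for name, members in groups.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def sample_emotion(base: str, rng: random.Random) -> str:
    """Draw a target group by probability, then an emotion uniformly within it."""
    probs = group_probabilities(base, GROUPS, distance, weight)
    names = list(probs)
    target = rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
    return rng.choice(GROUPS[target])


probs = group_probabilities("negative", GROUPS, distance, weight)
```

With these assumed settings, a seeker whose base group is `negative` most often stays in that group but occasionally drifts to a nearer or farther group, which is the kind of non-fixed pattern the perturbator is meant to introduce.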
Chief Complaint Guidance:
- Chief Complaint Chain Generation: Generates chief complaint evolution chains based on help-seeker configuration and recent events
- Chief Complaint Switching Control: Determines through algorithms whether to switch to the next stage's chief complaint in the chain
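A chief complaint chain with switching control can be sketched as an ordered list of stages plus a pointer that advances only when a decision rule says so. This is an illustrative sketch under stated assumptions: `ComplaintChain`, `step`, and the placeholder rule `enough_turns` are hypothetical names, and the turn-count rule merely stands in for the paper's switching algorithm, which this summary does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ComplaintChain:
    """Ordered stages of a seeker's chief complaint (hypothetical structure)."""
    stages: List[str]
    index: int = 0

    @property
    def current(self) -> str:
        return self.stages[self.index]

    def step(
        self,
        recent_turns: List[str],
        should_switch: Callable[[str, List[str]], bool],
    ) -> str:
        """Advance to the next stage if one exists and the rule approves."""
        has_next = self.index < len(self.stages) - 1
        if has_next and should_switch(self.current, recent_turns):
            self.index += 1
        return self.current


def enough_turns(current: str, recent_turns: List[str]) -> bool:
    """Placeholder rule (assumption): switch after four or more recent turns."""
    return len(recent_turns) >= 4


# Made-up chain for illustration only.
chain = ComplaintChain(["insomnia", "work stress", "family conflict"])
first_stage = chain.current
```

Passing the decision rule in as a function keeps the chain itself independent of how the switch is decided, so a learned or LLM-based controller could replace the placeholder.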
Three-Level Memory:
- Real-Time Memory: Dialogue content from the current session
- Short-Term Memory: Recent events and state changes, captured through self-report scales
- Long-Term Memory: Dialogue and scale records from previous sessions, scheduled through Agentic RAG
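The three memory levels can be pictured as three containers with different lifetimes. The sketch below is a simplified illustration, not the system's actual design: the class and method names are hypothetical, and the long-term lookup uses naive word overlap as a plainly named stand-in for Agentic RAG.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionRecord:
    """Archived dialogue and scale scores from one finished session."""
    dialogue: List[str]
    scales: Dict[str, int]


@dataclass
class ThreeLevelMemory:
    real_time: List[str] = field(default_factory=list)            # current session turns
    short_term: Dict[str, int] = field(default_factory=dict)      # recent self-report scale scores
    long_term: List[SessionRecord] = field(default_factory=list)  # previous sessions

    def add_turn(self, utterance: str) -> None:
        self.real_time.append(utterance)

    def end_session(self) -> None:
        """Archive the current session into long-term memory and reset real-time memory."""
        self.long_term.append(SessionRecord(list(self.real_time), dict(self.short_term)))
        self.real_time.clear()

    def recall(self, query: str, top_k: int = 1) -> List[str]:
        """Return past utterances sharing the most words with the query.

        Word overlap is a stand-in for retrieval; it is not Agentic RAG.
        """
        query_words = set(query.lower().split())
        scored = []
        for record in self.long_term:
            for utterance in record.dialogue:
                overlap = len(query_words & set(utterance.lower().split()))
                if overlap:
                    scored.append((overlap, utterance))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [utterance for _, utterance in scored[:top_k]]


# Made-up example content for illustration only.
memory = ThreeLevelMemory()
memory.add_turn("I could not sleep before my exam")
memory.short_term["BDI"] = 21
memory.end_session()
recalled = memory.recall("how was your exam")
```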
- Dynamic Evolution Modeling: First to formalize help-seeker dynamic changes as evolution across two dimensions: emotion and chief complaint
- Three-Level Memory Mechanism: A temporally-stratified memory system designed based on memory theory
- Data-Driven Evolution Learning: Trains emotion and chief complaint change models based on real counseling data
- Multi-Agent Coordination: Achieves complex dynamic control and memory scheduling through inter-agent collaboration
- D4 Dataset: Chinese depression diagnosis-oriented dialogue dataset
- DAIC-WOZ Dataset: English mental health dialogue dataset
- Data annotation performed using GPT-4o, with chief complaint chain data reviewed by 3 psychology experts
- Anthropomorphism: Measures how closely simulator utterances match those of real help-seekers, using BERTScore
- Personality Fidelity: Designs interview questions and evaluates configuration matching using G-Eval scoring
- Previous Session Cognition Accuracy: Assesses the effectiveness of long-term memory
Three baseline methods were selected:
- Chen et al. (2023a)
- Duro et al. (2024)
- Qiu and Lan (2024)
- Backbone Model: Qwen2.5-7B-Instruct
- Counselor Models: PsycoLLM, EmoLLM, SoulChat
- Emotion Classification: Based on GoEmotions emotion categories
- Assessment Tools: SCL-90, BDI, SAAS and other self-report scales
Anthropomorphism Comparison:
On the D4 and DAIC-WOZ datasets, AnnaAgent achieved the best or second-best anthropomorphism score when interacting with different counselor models:
| Dataset | Counselor | Chen et al. | Duro et al. | Qiu & Lan | AnnaAgent |
|---|---|---|---|---|---|
| D4 | PsycoLLM | 0.6293 | 0.6455 | 0.6866 | 0.6691 |
| D4 | EmoLLM | 0.6529 | 0.6469 | 0.6449 | 0.6649 |
| DAIC | PsycoLLM | 0.3458 | 0.4864 | 0.3426 | 0.4910 |
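
The "best or second-best" reading of the table can be checked mechanically. The short sketch below copies the scores from the rows above and ranks AnnaAgent within each row; the helper name `rank_of` is introduced here for illustration.

```python
from typing import Dict, Tuple

# Scores copied from the table rows above, keyed by (dataset, counselor).
rows: Dict[Tuple[str, str], Dict[str, float]] = {
    ("D4", "PsycoLLM"): {"Chen et al.": 0.6293, "Duro et al.": 0.6455, "Qiu & Lan": 0.6866, "AnnaAgent": 0.6691},
    ("D4", "EmoLLM"): {"Chen et al.": 0.6529, "Duro et al.": 0.6469, "Qiu & Lan": 0.6449, "AnnaAgent": 0.6649},
    ("DAIC", "PsycoLLM"): {"Chen et al.": 0.3458, "Duro et al.": 0.4864, "Qiu & Lan": 0.3426, "AnnaAgent": 0.4910},
}


def rank_of(method: str, scores: Dict[str, float]) -> int:
    """1-based rank of a method when scores are sorted from highest to lowest."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(method) + 1


ranks = {setting: rank_of("AnnaAgent", scores) for setting, scores in rows.items()}
```

For the three rows shown, AnnaAgent ranks second on D4 with PsycoLLM (behind Qiu & Lan) and first in the other two settings.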
Personality Fidelity: AnnaAgent overall outperformed baseline methods in G-Eval scoring.
- Dynamic Evolution Ablation: Removing the dynamic evolution component reduced the anthropomorphism F1 score from 0.6691 to 0.6144 (D4 dataset)
- Long-Term Memory Ablation: Removing long-term memory significantly decreased the virtual help-seeker's cognition accuracy regarding previous sessions
Experiments on GPT-4o-mini and Llama-3.1-8B-Instruct demonstrated that AnnaAgent exhibits good cross-model stability, with relative standard deviations below 10%.
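
Relative standard deviation (RSD) is the standard deviation expressed as a percentage of the mean, so a value below 10% means the scores vary little across backbones compared with their average level. A minimal sketch of the calculation follows; the three scores are made-up placeholders, not results from the paper, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def relative_std_percent(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    return stdev(values) / mean(values) * 100.0


# Placeholder scores for three backbone models (hypothetical, not from the paper).
placeholder_scores = [0.66, 0.64, 0.69]
rsd = relative_std_percent(placeholder_scores)
is_stable = rsd < 10.0
```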
- Dialogue Systems: ChatCounselor, Serena and others provide mental health counseling support
- Diagnosis and Treatment: Improve diagnostic accuracy, treatment effectiveness, and service accessibility
- Standardized Patients: Real people role-playing, higher cost but greater realism
- Virtual Help-Seekers: Lower cost but insufficient realism
- Role Knowledge Construction: Through fine-grained role information and emotion annotation
- Personalized Training: Conditional instruction tuning combined with personality trait information
- AnnaAgent successfully addresses the challenges of dynamic evolution and multi-session memory in help-seeker simulation
- Emotion and chief complaint evolution models trained on real data effectively enhance simulation realism
- The three-level memory mechanism demonstrates excellent performance in handling cross-session information
- Simplified Formalization: The dynamic evolution process is formalized in a simplified form to keep the implementation tractable
- Preliminary Memory Coordination: The coordination mechanism of the three-level memory system remains relatively preliminary
- Data Dependency: Highly dependent on the quality and quantity of real counseling data
- More fine-grained dynamic evolution modeling
- More sophisticated multi-session memory coordination mechanisms
- Extension to more mental health scenarios and languages
- Accurate Problem Identification: First to explicitly propose two core challenges—dynamic evolution and multi-session memory
- Reasonable Method Design: Clear multi-agent system architecture with well-defined module functions
- Comprehensive Experiments: Includes main results, ablation studies, and generalization verification
- High Practical Value: Provides important tools for mental health AI research
- Limited Theoretical Depth: Lacks deep psychological theoretical analysis of dynamic evolution mechanisms
- Narrow Evaluation: Relies primarily on automated metrics, with limited human evaluation by professional psychologists
- Insufficient Ethical Consideration: While ethical review is mentioned, discussion of potential misuse risks is insufficient
- Academic Contribution: Provides new research directions and benchmarks for the AI mental health field
- Practical Value: Applicable to counselor training, psychological research, and multiple other scenarios
- Reproducibility: Provides open-source code facilitating research reproduction and extension
- Psychological counselor training and evaluation
- Mental health dialogue system development
- Psychological research and experimentation
- Mental health data augmentation
The paper cites abundant related work, including:
- Survey works on AI applications in mental health
- Research on LLM role-playing and multi-agent systems
- Research on psychological counseling and standardized patients
- Literature on memory theory and RAG technology
Overall Assessment: This is an important paper in the AI mental health field that systematically addresses key technical challenges in help-seeker simulation. While there is room for improvement in theoretical depth and evaluation methods, its innovative approach and practical value make it a significant advance in the field.