Text analytics has traditionally required specialized knowledge in Natural Language Processing (NLP) or text analysis, which presents a barrier for entry-level analysts. Recent advances in large language models (LLMs) have changed the landscape of NLP by enabling more accessible and automated text analysis (e.g., topic detection, summarization, and information extraction). We introduce VIDEE, a system that helps entry-level data analysts conduct advanced text analytics with intelligent agents. VIDEE instantiates a human-agent collaboration workflow consisting of three stages: (1) Decomposition, which incorporates a human-in-the-loop Monte-Carlo Tree Search algorithm to support generative reasoning with human feedback, (2) Execution, which generates an executable text analytics pipeline, and (3) Evaluation, which integrates LLM-based evaluation and visualizations to support user validation of execution results. We conduct two quantitative experiments to evaluate VIDEE's effectiveness and analyze common agent errors. A user study involving participants with varying levels of NLP and text analytics experience -- from none to expert -- demonstrates the system's usability and reveals distinct user behavior patterns. The findings identify design implications for human-agent collaboration, validate the practical utility of VIDEE for non-expert users, and inform future improvements to intelligent text analytics systems.
VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents
- Paper ID: 2506.21582
- Title: VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents
- Authors: Sam Yu-Te Lee, Chenyang Ji, Shicheng Wen, Lifu Huang, Dongyu Liu, Kwan-Liu Ma
- Classification: cs.CL cs.AI cs.HC
- Publication Date: October 13, 2025 (arXiv v4)
- Paper Link: https://arxiv.org/abs/2506.21582
Text analytics traditionally requires expertise in natural language processing (NLP) or text analysis, which creates a technical barrier for entry-level analysts. Recent advances in large language models (LLMs) have transformed the NLP landscape by making text analytics tasks such as topic detection, summarization, and information extraction more accessible and automated. This paper introduces VIDEE, a system that enables entry-level data analysts to collaborate with intelligent agents on advanced text analytics. VIDEE instantiates a three-stage human-in-the-loop workflow: (1) a Decomposition stage, which uses a human-in-the-loop Monte Carlo Tree Search (MCTS) algorithm to support generative reasoning with human feedback; (2) an Execution stage, which generates executable text analytics pipelines; and (3) an Evaluation stage, which integrates LLM-based evaluation and visualization so that users can verify execution results.
Traditional text analytics faces four major challenges:
- Large Decomposition Space Problem: The flexibility of prompting allows multiple decomposition approaches through different subtask combinations to achieve objectives. Analysts must balance subtask difficulty against overall pipeline robustness.
- Technical Knowledge Barrier: Analysts possess varying levels of technical knowledge, particularly regarding LLMs. The LLM-related field is rapidly evolving, and analysts may struggle to keep pace with the latest technologies.
- Implementation and Experimentation Difficulty: Constructing and implementing text analytics pipelines requires substantial engineering effort, including handling input/output formats, intermediate data transformations, and parameter analysis.
- Evaluation Challenges: Evaluating LLM-based text analytics pipelines requires unique evaluation methodologies that are not yet widely established.
These challenges motivate the need for an agent system to support text analysts. Given user objectives and datasets, an agent with sufficient technical knowledge can automatically decompose objectives, search the large decomposition space, and generate text analytics plans, then implement and execute pipelines, and finally evaluate results.
- Proposed Three-Stage Human-in-the-Loop Workflow: Designed a complete workflow encompassing Decomposition, Execution, and Evaluation to achieve complex text analytics objectives.
- Developed VIDEE System: Implemented an agent system with a visual interface that enables data analysts to perform text analytics in a code-free environment.
- Technical Innovations:
  - Human-in-the-loop decomposition algorithm based on Monte Carlo Tree Search (MCTS)
  - Conceptual framework of analysis units to handle data structure variations
  - Evaluation mechanism integrating LLM judges with visualization
- Empirical Research Findings: Through systematic evaluation and user studies, provided new insights into agent systems and human-AI collaboration.
Input: User objectives (natural language description) and text datasets
Output: Complete text analytics pipeline and its execution results
Constraints: Support a code-free environment and accommodate users with varying levels of technical expertise
- Objective: Decompose user objectives into sequences of semantic tasks
- Core Algorithm: Enhanced Monte Carlo Tree Search (MCTS)
- Human-AI Collaboration: Humans monitor the search process while agents explore possible pipeline options
MCTS Algorithm Enhancements:
- Utilize LLM judges as reward functions
- Define three evaluation criteria: complexity, coherence, and importance
- Support human feedback to adjust search direction
- Replace random rollout with comprehensive reward calculation
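The enhancements listed above can be sketched in a few lines of Python. This is a minimal illustration, not VIDEE's implementation: the `Node` class, the `propose` and `judge` callables, the `pruned` flag used to model human feedback, the UCT constant, and the averaging of the three criteria are all assumptions made for the example. In the real system, `propose` and `judge` would be LLM calls; here they are deterministic stand-ins so the sketch runs offline.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    task: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0
    pruned: bool = False  # hypothetical hook: a human can exclude this branch

    def path(self) -> List[str]:
        """Tasks from the root (exclusive) down to this node."""
        node, tasks = self, []
        while node is not None and node.parent is not None:
            tasks.append(node.task)
            node = node.parent
        return tasks[::-1]


def uct(node: Node, c: float = 1.4) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if node.visits == 0:
        return math.inf
    bonus = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return node.value / node.visits + bonus


def select(root: Node) -> Node:
    node = root
    while node.children:
        candidates = [ch for ch in node.children if not ch.pruned]
        if not candidates:
            break
        node = max(candidates, key=uct)
    return node


def backpropagate(node: Optional[Node], reward: float) -> None:
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent


def search(root: Node, propose: Callable, judge: Callable, iterations: int) -> Node:
    for _ in range(iterations):
        leaf = select(root)
        if leaf.children:  # every child was pruned by the user; nothing to expand
            continue
        for task in propose(leaf.path()):
            leaf.children.append(Node(task=task, parent=leaf))
        # No random rollout: each new node is scored directly by the judge.
        for child in leaf.children or [leaf]:
            backpropagate(child, judge(child.path()))
    return root


def best_path(root: Node) -> List[str]:
    node = root
    while node.children:
        visited = [ch for ch in node.children if ch.visits > 0 and not ch.pruned]
        if not visited:
            break
        node = max(visited, key=lambda ch: ch.value / ch.visits)
    return node.path()


# Toy stand-ins for the LLM proposer and the LLM judge (assumptions).
TOY_OPTIONS = {
    (): ["summarize", "embed"],
    ("summarize",): ["cluster", "label"],
    ("embed",): ["cluster"],
}


def toy_propose(path: List[str]) -> List[str]:
    return TOY_OPTIONS.get(tuple(path), [])


def toy_judge(path: List[str]) -> float:
    scores = {
        "complexity": 1.0 if len(path) <= 2 else 0.0,
        "coherence": 1.0 if path[:1] == ["summarize"] else 0.5,
        "importance": 1.0 if "cluster" in path else 0.0,
    }
    return sum(scores.values()) / len(scores)  # averaging is an assumption


root = search(Node(task="<objective>"), toy_propose, toy_judge, iterations=4)
print(best_path(root))
```

Setting `pruned = True` on a node between calls to `search` is one simple way to model a user steering the search away from a branch; the paper's actual feedback mechanism may differ.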
- Transformation Process: Semantic tasks → Primitive tasks → Executable pipelines
- Compilation Process: Generate input/output patterns, algorithm selection, and hyperparameters
- Technical Support: Execution graph construction based on LangGraph
Analysis Unit Conceptual Framework:
- Define input units for each primitive task
- Adopt MapReduce paradigm to handle data structure variations
- Automatically create new analysis units
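One way to read the analysis unit idea is that a map-style task keeps the current unit (for example, one record per document), while a reduce-style task groups records and emits a new unit (for example, one record per topic). The sketch below illustrates that reading only; `map_task`, `reduce_task`, the record layout, and the keyword-based labeler are hypothetical and are not taken from VIDEE.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def map_task(records: List[Record], input_key: str, output_key: str,
             fn: Callable[[Any], Any]) -> List[Record]:
    """Apply fn to each record independently; the analysis unit is unchanged."""
    return [{**r, output_key: fn(r[input_key])} for r in records]


def reduce_task(records: List[Record], group_key: str, input_key: str,
                output_key: str, fn: Callable[[List[Any]], Any]) -> List[Record]:
    """Group records by group_key and emit one record per group (a new unit)."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[input_key])
    return [{group_key: key, output_key: fn(values)} for key, values in groups.items()]


# Unit: document. A keyword rule stands in for an LLM labeling task.
docs = [
    {"id": 1, "text": "MCTS guides the search"},
    {"id": 2, "text": "Bar charts show scores"},
    {"id": 3, "text": "Tree search with feedback"},
]
labeled = map_task(
    docs, "text", "topic",
    lambda t: "search" if "search" in t.lower() else "visualization",
)

# Unit changes from document to topic.
topics = reduce_task(labeled, "topic", "id", "doc_ids", list)
print(topics)
```

Making the unit explicit at each step is what lets a downstream task know whether it receives one record per document or one per topic.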
- Evaluation Method: LLM judge-based evaluation without ground truth labels
- Visualization: Bar charts and extended topic radial graphs
- Automatic Recommendation: System recommends three evaluation criteria for each task
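Label-free evaluation with a judge can be sketched as follows: each output is scored against each criterion, and the labels are tallied into counts that a bar chart could display. The function names, the three criteria, and the pass/fail labels are assumptions for illustration; `rule_based_judge` is a deterministic stand-in for what would be an LLM call in practice.

```python
from collections import Counter
from typing import Callable, Dict, List


def evaluate(outputs: List[str], criteria: List[str],
             judge: Callable[[str, str], str]) -> Dict[str, Counter]:
    """For each criterion, count the labels the judge assigns across outputs."""
    return {c: Counter(judge(o, c) for o in outputs) for c in criteria}


def rule_based_judge(output: str, criterion: str) -> str:
    """Hypothetical judge: simple rules replace the LLM so the sketch runs offline."""
    checks = {
        "concise": len(output.split()) <= 8,
        "non_empty": bool(output.strip()),
        "ends_with_period": output.strip().endswith("."),
    }
    return "pass" if checks[criterion] else "fail"


summaries = [
    "Users prefer explicit pipelines.",
    "",
    "This summary rambles on and on without ever reaching any particular point",
]
report = evaluate(summaries, ["concise", "non_empty", "ends_with_period"], rule_based_judge)

# Each Counter maps directly onto one group of bars.
for criterion, counts in report.items():
    print(criterion, dict(counts))
```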
- Combining Generative Reasoning with MCTS: Compared with beam search, which greedily commits to the top-scoring candidates at each step, MCTS backpropagates rewards from later steps to earlier decisions, making it better suited to text analytics pipeline planning.
- Analysis Unit Framework: Automatically handles data structure variations through MapReduce paradigm, supporting diverse combinations of primitive tasks.
- Human-AI Collaboration Dynamics: Users serve as managers, LLM judges as advisors, reducing the necessity for LLM alignment.
- Decomposer Evaluation:
  - LLooM scenario: HCI paper abstracts dataset
  - TnT-LLM scenario: Microsoft Bing Copilot user conversation dataset
- Executor Evaluation:
  - Wikipedia dataset (n=210) with ground truth labels as topics
- User Study:
  - HCI paper abstracts dataset (100 papers)
  - Concept induction task
- Decomposer Evaluation: Arena method using o3-mini model to compare generated pipelines with human-crafted pipelines
- Executor Evaluation: Concept coverage
- User Study: Task completion, user behavior patterns, usability feedback
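The summary does not define concept coverage. A common reading, assumed here, is the share of ground-truth topics that are matched by at least one generated concept. The sketch below follows that reading; the `matches` predicate is hypothetical (in practice the match would more likely be judged by a person or an LLM than by substring comparison), and the example topics are made up.

```python
from typing import Callable, List


def concept_coverage(ground_truth: List[str], generated: List[str],
                     matches: Callable[[str, str], bool]) -> float:
    """Fraction of ground-truth topics matched by at least one generated concept."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    covered = [gt for gt in ground_truth if any(matches(gt, g) for g in generated)]
    return len(covered) / len(ground_truth)


def substring_match(topic: str, concept: str) -> bool:
    """Toy matcher (assumption): case-insensitive substring test."""
    return topic.lower() in concept.lower()


truth = ["sports", "politics", "music", "biology"]
concepts = ["Team sports", "Pop music", "Electoral politics", "Cooking"]
coverage = concept_coverage(truth, concepts, substring_match)
print(f"{coverage:.0%}")
```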
- Decomposer: Human-crafted pipelines (LLooM and TnT-LLM)
- Executor: BERTopic and GPT-4o baseline methods
- Models: GPT-4o, Claude-3.5-Sonnet, Gemini-2.0
- Framework: AutoGen + LangGraph
- Cost: Average $0.005 per expansion, approximately 7 minutes for complete tree
- Performance: In 10 comparisons, 6 generated pipelines were rated as better (2 for LLooM, 4 for TnT-LLM)
- Advantages: Generated pipelines are more direct and concise
- Limitations: Failed to consider context window constraints for long data processing
- Concept Coverage: VIDEE reaches 83%, versus 52.6% for BERTopic and 53% for GPT-4o
- Performance Improvement: Roughly 30 percentage points over both baseline methods
- Reliability: Achieves comparable results to LLooM human-crafted pipelines
Positive Feedback:
- Clear and Intuitive Workflow: All participants completed tasks within reasonable timeframes
- Importance of Automation: Even expert-level participants found it more efficient than coding
- Trust in Programmatic Generation: Users trust explicit processes more than black-box systems like ChatGPT
User Behavior Patterns:
- Search Strategy Preference: "Exploit-first then explore" rather than balanced strategies
- Alignment vs. Recommendations: Users view LLM judges as advisors rather than ground truth
- Understanding Role of Analysis Units: Explicit analysis units aid pipeline understanding and error debugging
- Execution Errors: Incorrect analysis unit selection may occur during compilation
- Learning Curve: Requires 30 minutes of training for proficient use
- Technical Dependency: Heavily relies on parallelized cloud-based LLM queries
- Individual Analytics: LLMs excel at text classification, information extraction, and other tasks
- End-to-End Pipelines: TnT-LLM, LLooM, topic analysis frameworks, etc.
- Data cleaning and transformation tools (Data Wrangler)
- Visual data exploration systems (LightVA, InterChat)
- Text analytics presents unique challenges compared to traditional data analysis
- Prompt engineering challenges and solutions
- User control and evaluation requirements in agent systems
- Multi-level abstraction and interactive system design
- Feasibility Validation: The three-stage workflow effectively reduces technical barriers to text analytics
- User Acceptance: Users with varying technical levels can successfully use the system
- Technical Effectiveness: Generated pipeline quality is comparable to expert-crafted pipelines
- User Study Scale: Only 6 participants with sample bias toward graduate students
- Technical Constraints: Dependent on cloud-based LLMs, lacking self-correction mechanisms
- Functional Limitations: Does not support time series analysis, network analysis, or external knowledge bases
- Conversational Agents: Integrate natural language command conversion
- Feedback Loops: Feed execution and evaluation results back to the decomposition stage
- Evaluation Method Extension: Support evaluation for non-text tasks such as clustering analysis
- Open-Source Ecosystem Integration: Integration with tools like LangSmith
- Systematic Innovation: First to propose a complete human-AI collaboration workflow for text analytics
- Technical Depth: MCTS algorithm enhancements and analysis unit framework provide theoretical contributions
- Practical Value: Genuinely reduces technical barriers to text analytics
- Comprehensive Evaluation: Combines quantitative experiments with qualitative user studies
- Scalability: Heavily dependent on cloud APIs with cost and latency concerns
- Error Handling: Lacks robust error detection and recovery mechanisms
- Applicable Scope: Primarily suitable for standard text analytics tasks with limited support for specialized domains
- Academic Contribution: Provides new paradigms for human-AI collaboration and agent system design
- Practical Value: Likely to advance democratization of text analytics
- Reproducibility: Built on open-source frameworks, facilitating reproduction and extension
- Target Users: Entry-level data analysts, social science researchers, journalists
- Application Domains: Customer feedback analysis, academic literature mining, social media analysis
- Usage Conditions: Requires basic data analysis knowledge and 30 minutes of training time
This paper cites 63 related references, primarily including:
- LLM text analytics applications (TnT-LLM, LLooM, etc.)
- Human-AI collaboration interface design (AutoGen, LangGraph, etc.)
- Visualization and interactive system design
- Monte Carlo Tree Search algorithms
Overall Assessment: This is a high-quality systems paper that makes important contributions to the field of human-AI collaborative text analytics. The technical innovations are solid, experimental evaluation is comprehensive, and it has significant implications for advancing the democratization of text analytics tools. Despite some technical limitations, it provides clear directions for future research.