Vector databases have rapidly grown in popularity, enabling efficient similarity search over data such as text, images, and video. They now play a central role in modern AI workflows, aiding large language models by grounding model outputs in external literature through retrieval-augmented generation. Despite their importance, little is known about the performance characteristics of vector databases in high-performance computing (HPC) systems that drive large-scale science. This work presents an empirical study of distributed vector database performance on the Polaris supercomputer in the Argonne Leadership Computing Facility. We construct a realistic biological-text workload from BV-BRC and generate embeddings from the peS2o corpus using Qwen3-Embedding-4B. We select Qdrant to evaluate insertion, index construction, and query latency with up to 32 workers. Informed by practical lessons from our experience, this work takes a first step toward characterizing vector database performance on HPC platforms to guide future research and optimization.
- Paper ID: 2509.12384
- Title: Exploring Distributed Vector Databases Performance on HPC Platforms: A Study with Qdrant
- Authors: Seth Ockerman, Amal Gueroudji, Song Young Oh, Robert Underwood, Nicholas Chia, Kyle Chard, Robert Ross, Shivaram Venkataraman
- Classification: cs.DC cs.DB
- Publication Venue/Conference: SC'25 Workshop Frontiers in Generative AI for HPC Science and Engineering: Foundations, Challenges, and Opportunities
- Paper Link: https://arxiv.org/abs/2509.12384
Vector databases play a central role in modern AI workflows, particularly in Retrieval-Augmented Generation (RAG) systems, which improve large language model output by grounding it in external literature. Despite this growing importance, their performance characteristics on High-Performance Computing (HPC) systems remain poorly understood. This study presents an empirical evaluation of the distributed vector database Qdrant on the Polaris supercomputer at the Argonne Leadership Computing Facility. It constructs a realistic biological-text workload from BV-BRC terms, generates embeddings from the peS2o corpus with the Qwen3-Embedding-4B model, and evaluates insertion, index construction, and query latency with up to 32 workers.
- Core Issue: Vector database performance characteristics in HPC environments lack in-depth investigation, with existing research primarily focused on single-GPU or small-scale environments
- Significance: Large-scale scientific computing increasingly runs on HPC systems, so vector databases must adapt to their distinctive characteristics (dedicated interconnects, parallel file systems, deep memory hierarchies, heterogeneous hardware architectures)
- Existing Limitations:
  - Absence of performance evaluation of vector databases tailored for HPC environments
  - Existing research primarily focuses on functional feature comparisons, lacking empirical performance assessment
  - Significant differences between scientific workloads and commercial applications
With the widespread application of AI systems in scientific research, particularly the proliferation of RAG technology, understanding vector database performance on HPC architectures provides important guidance for system design, performance optimization, and future research.
- First HPC Environment Evaluation: Evaluated Qdrant's distributed performance on the Polaris supercomputer, testing insertion, index construction, and query performance across up to 32 worker nodes (spanning 8 compute nodes)
- Realistic Scientific Workloads: Constructed authentic workloads based on BV-BRC biological data and peS2o scientific text corpus
- Performance Characterization Analysis: Provided the first systematic analysis of vector database performance characteristics on HPC platforms
- Open Dataset Release: Published scientific embedding datasets and query workloads for future research
- Practical Guidance: Provided actionable recommendations and future research directions based on deployment experience
This study constructed an end-to-end biological RAG workflow, including:
- Input: 22,723 genome-related terms from BV-BRC
- Processing: Searched the peS2o dataset (about 8.3 million full-text papers) for content relevant to each term
- Output: Retrieval results providing contextual information for RAG systems
The paper compared two primary distributed architectures:
- Stateful Architecture (adopted by Qdrant):
  - Each worker node stores state (indices or data) and performs computation
  - Each worker node "owns" and is responsible for a portion of the dataset
  - Queries are broadcast to all worker nodes; each runs an ANN search on its portion, and the partial results are then aggregated
- Stateless Architecture (compute-storage separation):
  - Worker nodes perform computation but do not persist data
  - Data is stored in an independent persistent storage layer
  - Data is loaded into a cache layer as needed
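The broadcast-and-aggregate query path of the stateful design can be illustrated with a minimal sketch. It is not Qdrant's implementation: each shard here is a plain dictionary, the per-worker search is exact cosine similarity standing in for an ANN index such as HNSW, and the names `local_search` and `broadcast_query` are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def local_search(shard, query, k):
    """Exact top-k on one worker's shard (stand-in for a per-worker ANN search)."""
    scored = ((cosine(vec, query), doc_id) for doc_id, vec in shard.items())
    return heapq.nlargest(k, scored)

def broadcast_query(shards, query, k):
    """Send the query to every shard, then merge the partial top-k lists."""
    partials = [local_search(shard, query, k) for shard in shards]
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]

# Two workers, each owning a disjoint portion of a toy dataset.
shards = [
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
    {"c": [0.9, 0.1], "d": [-1.0, 0.0]},
]
top2 = broadcast_query(shards, [1.0, 0.0], k=2)
print(top2)
```

Every shard returns its own top-k, so the merge step handles `k × (number of shards)` candidates. This gives some intuition for why adding workers brings coordination cost along with parallelism.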
- Hardware: Polaris Supercomputer
  - Per compute node: 2.8 GHz AMD EPYC Milan 7543P 32-core CPU
  - Memory: 512 GB DDR4 RAM
  - GPU: 4 NVIDIA A100 GPUs
  - Interconnect: HPE Slingshot 11, Dragonfly topology
- Software: Qdrant vector database, using HNSW indexing
- Adaptive Embedding Generation Pipeline:
  - Batching strategy based on user parameters
  - Multi-process parallel processing for full GPU utilization
  - Automatic fallback on out-of-memory (OOM) errors
- Performance Tuning Methods:
  - Systematic tuning of batch sizes and the number of concurrent requests
  - Asynchronous client implementation to optimize data insertion
  - Multi-process allocation strategy to optimize client-server communication
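The two tuning knobs named above, batch size and number of concurrent requests, can be sketched with a bounded-concurrency asyncio client. This is an illustration under stated assumptions, not the paper's client: `fake_upload` is a hypothetical stand-in for a real database upload call, and the defaults of 32 and 2 simply mirror the best values reported in the insertion results.

```python
import asyncio

def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

async def insert_all(items, upload, batch_size=32, max_in_flight=2):
    """Upload all batches while keeping at most max_in_flight requests pending."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def send(batch):
        async with semaphore:
            await upload(batch)

    await asyncio.gather(*(send(b) for b in make_batches(items, batch_size)))

# Hypothetical upload coroutine that records what it received.
uploaded = []
stats = {"in_flight": 0, "peak": 0}

async def fake_upload(batch):
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    stats["in_flight"] -= 1
    uploaded.extend(batch)

asyncio.run(insert_all(list(range(100)), fake_upload))
print(len(uploaded), stats["peak"])
```

A semaphore only overlaps time spent waiting on I/O. Any CPU-bound work done while preparing a batch still runs on a single thread, which is consistent with the study's observation that multiple processes suit parallel insertion better than asyncio alone.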
- BV-BRC Biological Data: 22,723 genome-related terms
- peS2o Scientific Text Corpus: 8,293,485 full-text academic papers
- Embedding Model: Qwen3-Embedding-4B (fits on a single 40 GB GPU)
- Embedding Generation Time: Model loading, I/O, and inference time
- Data Insertion Time: Insertion performance under different batch sizes and concurrency levels
- Index Construction Time: HNSW index construction scalability
- Query Latency: Query performance under different dataset sizes and worker node counts
- Worker Node Count: 1, 4, 8, 16, 32 nodes
- Data Distribution: Each worker node is responsible for approximately 80 GB divided by the number of workers
- Client Configuration: One client allocated per Qdrant worker node, all clients running on a single compute node
- Deployment Strategy: 4 Qdrant worker nodes per machine
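The configuration above implies a simple layout calculation, sketched below. The constants come from the list (80 GB total, 4 workers per machine, worker counts 1 to 32). The function name `layout` is hypothetical, and the count covers only machines hosting Qdrant workers, since the summary does not say whether the client node is separate.

```python
import math

TOTAL_DATA_GB = 80          # approximate dataset size from the setup above
WORKERS_PER_MACHINE = 4     # deployment strategy from the setup above
WORKER_COUNTS = [1, 4, 8, 16, 32]

def layout(workers):
    """Machines needed to host the workers and the data share of each worker."""
    return {
        "workers": workers,
        "compute_nodes": math.ceil(workers / WORKERS_PER_MACHINE),
        "gb_per_worker": TOTAL_DATA_GB / workers,
    }

plans = [layout(w) for w in WORKER_COUNTS]
for plan in plans:
    print(plan)
```

With these numbers, the 1-worker and 4-worker runs share a single machine, while the 32-worker run spans 8 machines with about 2.5 GB per worker.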
| Stage | Average Time (seconds) | Proportion |
|---|---|---|
| Model Loading | 28.17 | 1.2% |
| I/O | 7.49 | 0.3% |
| Inference | 2381.97 | 98.5% |
Key Findings: Model inference dominates overall runtime. The batching heuristic successfully prevented memory errors, with fewer than 0.10% of papers requiring sequential processing.
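The fallback to sequential processing mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `toy_embed` is a hypothetical stand-in for model inference, and Python's built-in `MemoryError` stands in for whatever out-of-memory exception a real GPU framework raises.

```python
def embed_with_fallback(texts, embed_batch, batch_size):
    """Embed texts in batches; if a batch runs out of memory, retry it one text at a time."""
    vectors, sequential = [], 0
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        try:
            vectors.extend(embed_batch(batch))
        except MemoryError:
            # Assumed fallback: process the failing batch sequentially.
            for text in batch:
                vectors.extend(embed_batch([text]))
                sequential += 1
    return vectors, sequential

def toy_embed(batch, budget=10):
    """Hypothetical model: fails when a multi-text batch holds too many characters."""
    if len(batch) > 1 and sum(len(t) for t in batch) > budget:
        raise MemoryError("batch too large")
    return [[float(len(t))] for t in batch]

texts = ["ab", "cd", "a long paper", "ef", "gh", "ij"]
vectors, sequential = embed_with_fallback(texts, toy_embed, batch_size=3)
print(len(vectors), sequential)
```

Output order matches input order because a failing batch is retried in place before the loop moves on. The `sequential` counter tracks how often the fallback fired, the kind of statistic behind the "fewer than 0.10%" figure.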
- Optimal Batch Size: 32 (runtime reduced from 468 s to 381 s)
- Optimal Concurrent Requests: 2 (runtime further reduced to 367 s)
- Scalability Performance:
| Worker Nodes | 1 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|
| Insertion Time | 8.22 h | 2.11 h | 1.14 h | 35.92 min | 21.67 min |
Key Findings:
- CPU-bound batch conversion limits how much asyncio concurrency helps
- A multi-process approach is better suited than asyncio to parallel data insertion from a single client
- Data insertion rate may become a bottleneck for large-scale HPC workloads
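The insertion times in the table above can be converted into speedup and parallel efficiency with a few lines of arithmetic. The figures computed here are derived from the table values and are not numbers reported in the paper. Efficiency is taken as speedup divided by worker count.

```python
# Insertion times from the table above, converted to minutes.
insertion_minutes = {
    1: 8.22 * 60,
    4: 2.11 * 60,
    8: 1.14 * 60,
    16: 35.92,
    32: 21.67,
}

baseline = insertion_minutes[1]

# Map worker count -> (speedup over one worker, parallel efficiency).
scaling = {
    workers: (baseline / minutes, baseline / minutes / workers)
    for workers, minutes in insertion_minutes.items()
}

for workers, (speedup, efficiency) in scaling.items():
    print(f"{workers:>2} workers: speedup {speedup:5.2f}x, efficiency {efficiency:4.0%}")
```

Under this reading, insertion scales close to linearly up to 8 workers and reaches roughly 22.8× at 32 workers, or about 71% efficiency.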
- Maximum Speedup: 21.32× on 32 worker nodes relative to single node
- Scalability Limitations: Only 1.27× speedup from 1 to 4 worker nodes
- Resource Utilization: Single worker node already uses 90-97% of CPU capacity
Key Findings: Deploying multiple Qdrant worker nodes per machine is unnecessary for CPU-saturated index construction. GPU acceleration may be more effective.
- Optimal Query Batch Size: 16 (runtime reduced from 139 s to 73 s)
- Optimal Concurrent Batch Requests: 2
- Dataset Size Threshold: Increasing worker node count only shows benefits when dataset reaches at least 30GB
- Maximum Speedup: 3.57× (on sufficiently large datasets)
- Communication Overhead: Beyond 4 worker nodes, further cluster expansion yields only marginal improvements
Key Findings: Communication overhead in the query execution model exceeds parallelization benefits on small datasets. Clusters should adaptively scale based on data size.
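One way to read the recommendation to scale adaptively is as a simple sizing rule. The sketch below is a hypothetical policy, not something the paper proposes: the 30 GB threshold and the 4-worker ceiling are lifted from the query results above, and they may not transfer to other datasets, systems, or to insertion and indexing.

```python
MIN_DATASET_GB_FOR_SCALE_OUT = 30  # below this, extra workers did not help queries
MAX_USEFUL_WORKERS = 4             # beyond this, query gains were only marginal

def choose_worker_count(dataset_gb, available_workers):
    """Pick a worker count for query serving from the dataset size (hypothetical rule)."""
    if available_workers < 1:
        raise ValueError("need at least one worker")
    if dataset_gb < MIN_DATASET_GB_FOR_SCALE_OUT:
        return 1
    return min(available_workers, MAX_USEFUL_WORKERS)

for size_gb in (10, 30, 80):
    print(size_gb, choose_worker_count(size_gb, available_workers=32))
```

A production policy would probably also weigh insertion throughput, where more workers kept paying off in this study, so the rule is best read as a starting point for query-dominated workloads.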
| System | Parallel Read/Write | Compute-Storage Separation | Load Balancing | Auto-scaling | GPU Indexing | GPU ANN |
|---|---|---|---|---|---|---|
| Vespa | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Vald | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Weaviate | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Qdrant | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ |
| Milvus | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
- Existing surveys primarily focus on functional feature comparisons, lacking empirical performance evaluation
- Shen et al. evaluated multiple index types in single-GPU RAG but did not address distributed systems or HPC environments
- Absence of vector database performance research targeting HPC environments
- Embedding Generation Optimization Focus: For datasets fitting within HPC compute node memory, prioritize improving model inference efficiency over I/O or model loading
- Data Insertion Bottleneck: Qdrant's asynchronous client is limited by CPU-bound work during data upload; a multi-process client may be better suited to single-client parallelism
- Index Construction Resource Utilization: A single worker node is enough to saturate the CPU; GPU acceleration may increase the benefit of running multiple worker nodes
- Query Performance Threshold: Increasing worker node count only effectively reduces query runtime on sufficiently large datasets
- Single System Evaluation: Only evaluated Qdrant, lacking cross-system comparison
- CPU-Focused Evaluation: Primarily addressed CPU index construction, without an in-depth assessment of GPU implementations
- Insufficient Variability Analysis: Did not focus on runtime variability and reproducibility
- Workload Limitations: Primarily based on biological workloads, potentially unrepresentative of other scientific domains
- Multi-System Comparative Study: Conduct comprehensive multi-system evaluation on different HPC platforms
- GPU Acceleration Optimization: Deeply investigate GPU-accelerated index construction and query performance
- Adaptive Scaling: Develop systems that adaptively scale based on data size and workload characteristics
- Domain-Specific Optimization: Optimize vector databases for specific requirements of different scientific domains
- Pioneering Research: First systematic evaluation of vector database performance in HPC environments, filling an important research gap
- Realistic Workloads: Uses authentic biological data and scientific literature, providing practical relevance
- Comprehensive Performance Analysis: Covers complete workflow performance from embedding generation to querying
- Practical Value: Provides specific configuration recommendations and performance tuning strategies
- Open Data: Dataset release promotes field advancement
- Limited System Coverage: Only evaluated Qdrant, lacking horizontal comparison
- Insufficient Theoretical Analysis: Primarily based on experimental observations, lacking deep theoretical analysis
- Scalability Limitations: Maximum tested scale of 32 worker nodes may be insufficient for large-scale HPC systems
- Underutilized GPU: Primarily focused on CPU performance, insufficiently exploring GPU acceleration potential
- Academic Contribution: Establishes foundation for vector database research in HPC environments
- Practical Guidance: Provides important deployment reference for HPC centers and scientific computing users
- Benchmark Establishment: Establishes benchmark methodology for vector database performance evaluation in HPC environments
- Future Research Directions: Identifies multiple directions worthy of in-depth investigation
- Large-Scale Scientific Computing: Applicable to scientific research projects requiring vector database deployment in HPC environments
- Bioinformatics: Particularly suitable for genomics and biomedical research involving literature retrieval and knowledge discovery
- RAG System Deployment: Provides performance reference for deploying large-scale RAG systems in HPC environments
- System Optimization: Guides vector database vendors in optimizing HPC environment performance
This study cites 52 relevant references, primarily covering:
- Vector database systems and algorithms
- High-performance computing platforms and architectures
- Embedding models and RAG technology
- Related performance evaluation research
Overall Assessment: This is a pioneering research paper that systematically evaluates distributed vector database performance characteristics in HPC environments for the first time. The research methodology is scientifically rigorous, experimental design is sound, and results possess significant practical value. Despite certain limitations, it establishes an important foundation for this emerging research field and holds significant implications for advancing vector database applications in scientific computing.