Exploring Distributed Vector Databases Performance on HPC Platforms: A Study with Qdrant

Ockerman, Gueroudji, Oh et al.

Vector databases have rapidly grown in popularity, enabling efficient similarity search over data such as text, images, and video. They now play a central role in modern AI workflows, aiding large language models by grounding model outputs in external literature through retrieval-augmented generation. Despite their importance, little is known about the performance characteristics of vector databases in high-performance computing (HPC) systems that drive large-scale science. This work presents an empirical study of distributed vector database performance on the Polaris supercomputer in the Argonne Leadership Computing Facility. We construct a realistic biological-text workload from BV-BRC and generate embeddings from the peS2o corpus using Qwen3-Embedding-4B. We select Qdrant to evaluate insertion, index construction, and query latency with up to 32 workers. Informed by practical lessons from our experience, this work takes a first step toward characterizing vector database performance on HPC platforms to guide future research and optimization.

Basic Information

Paper ID: 2509.12384
Title: Exploring Distributed Vector Databases Performance on HPC Platforms: A Study with Qdrant
Authors: Seth Ockerman, Amal Gueroudji, Song Young Oh, Robert Underwood, Nicholas Chia, Kyle Chard, Robert Ross, Shivaram Venkataraman
Classification: cs.DC cs.DB
Publication Venue/Conference: SC'25 Workshop Frontiers in Generative AI for HPC Science and Engineering: Foundations, Challenges, and Opportunities
Paper Link: https://arxiv.org/abs/2509.12384

Abstract

Vector databases play a central role in modern AI workflows, particularly in Retrieval-Augmented Generation (RAG) systems, which improve large language model output by grounding it in external literature. Despite this growing importance, the performance characteristics of vector databases on High-Performance Computing (HPC) systems remain poorly understood. This study presents an empirical evaluation of the distributed vector database Qdrant on the Polaris supercomputer at Argonne National Laboratory. It constructs a realistic biological-text workload from BV-BRC, generates embeddings from the peS2o corpus using the Qwen3-Embedding-4B model, and evaluates insertion, index construction, and query latency with up to 32 workers.

Research Background and Motivation

Problem Definition

Core Issue: Vector database performance characteristics in HPC environments lack in-depth investigation, with existing research primarily focused on single-GPU or small-scale environments
Significance: Large-scale scientific computing increasingly executes on HPC systems, requiring vector databases to adapt to HPC's unique characteristics (dedicated interconnects, parallel file systems, deep memory hierarchies, heterogeneous hardware architectures)
Existing Limitations:
- Absence of performance evaluation of vector databases tailored for HPC environments
- Existing research primarily focuses on functional feature comparisons, lacking empirical performance assessment
- Significant differences between scientific workloads and commercial applications

Research Motivation

With the widespread application of AI systems in scientific research, particularly the proliferation of RAG technology, understanding vector database performance on HPC architectures provides important guidance for system design, performance optimization, and future research.

Core Contributions

First HPC Environment Evaluation: Evaluated Qdrant's distributed performance on the Polaris supercomputer, testing insertion, index construction, and query performance with up to 32 Qdrant workers (spanning 8 compute nodes)
Realistic Scientific Workloads: Constructed authentic workloads based on BV-BRC biological data and peS2o scientific text corpus
Performance Characterization Analysis: Provided the first systematic analysis of vector database performance characteristics on HPC platforms
Open Dataset Release: Published scientific embedding datasets and query workloads for future research
Practical Guidance: Provided actionable recommendations and future research directions based on deployment experience

Methodology Details

Task Definition

This study constructed an end-to-end biological RAG workflow, including:

Input: 22,723 genome-related terms from BV-BRC
Processing: Each term is used as a query to search the peS2o dataset (about 8.3 million full-text papers) for relevant data
Output: Retrieval results providing contextual information for RAG systems

System Architecture

Distributed Vector Database Architecture

The paper compared two primary distributed architectures:

Stateful Architecture (adopted by Qdrant):
- Each worker node stores state (indices or data) and performs computation
- Each worker node "owns" a portion of the dataset and is responsible for serving it
- Queries are broadcast to all worker nodes; each runs an approximate nearest neighbor (ANN) search on its portion, and the partial results are then aggregated
Stateless Architecture (compute-storage separation):
- Worker nodes perform computation but do not persist data
- Data is stored in an independent persistent storage layer
- Data is loaded into a cache layer as needed
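
The stateful query path described above (broadcast, local search, aggregation) can be sketched in a few lines. This is a minimal illustration, not Qdrant's implementation: the shard layout, the function names, and the use of brute-force cosine distance in place of a real ANN index are all assumptions made for the example.

```python
import heapq
import math

def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def search_shard(shard, query, k):
    """Local search on one worker's shard. Brute force stands in for ANN here."""
    scored = ((cosine_distance(vec, query), doc_id) for doc_id, vec in shard.items())
    return heapq.nsmallest(k, scored)

def broadcast_query(shards, query, k):
    """Send the query to every shard, then merge the partial top-k lists."""
    partials = [search_shard(shard, query, k) for shard in shards]
    return heapq.nsmallest(k, (hit for part in partials for hit in part))

# Two hypothetical workers, each owning part of a toy dataset.
shards = [
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
    {"c": [0.9, 0.1], "d": [-1.0, 0.0]},
]
top2 = broadcast_query(shards, [1.0, 0.0], k=2)
print([doc_id for _, doc_id in top2])
```

Every worker returns its own top-k list, so the merge step handles k results per worker regardless of shard size. In a real deployment those partial lists also travel over the network, which is one reason adding workers can cost more than it saves on small datasets.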

Experimental Platform Configuration

Hardware: Polaris Supercomputer
- Per compute node: 2.8 GHz AMD EPYC Milan 7543P 32-core CPU
- Memory: 512 GB DDR4 RAM
- GPU: 4 NVIDIA A100 GPUs
- Interconnect: HPE Slingshot 11, Dragonfly topology
Software: Qdrant vector database, using HNSW indexing

Technical Innovations

Adaptive Embedding Generation Pipeline:
- Batching strategy based on user parameters
- Multi-process parallel processing for full GPU utilization
- Automatic fallback to sequential processing on out-of-memory (OOM) errors
Performance Tuning Methods:
- Systematic tuning of batch sizes and concurrent request numbers
- Asynchronous client implementation optimizing data insertion
- Multi-process allocation strategy optimizing client-server communication
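
A batching pipeline with an OOM fallback, as listed above, might look roughly like the following sketch. The paper's actual heuristic is not reproduced here: `embed_with_fallback` and `toy_embed_batch` are hypothetical names, and Python's built-in `MemoryError` stands in for whatever out-of-memory exception the GPU framework raises.

```python
def embed_with_fallback(texts, embed_batch, batch_size):
    """Embed texts in batches; if a batch runs out of memory, redo it one text at a time."""
    vectors, sequential = [], 0
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        try:
            vectors.extend(embed_batch(batch))
        except MemoryError:
            # Fallback path: process the failed batch sequentially.
            for text in batch:
                vectors.extend(embed_batch([text]))
            sequential += len(batch)
    return vectors, sequential

def toy_embed_batch(batch):
    """Hypothetical stand-in model: a multi-text batch over 50 characters 'runs out of memory'."""
    if len(batch) > 1 and sum(len(t) for t in batch) > 50:
        raise MemoryError("simulated OOM")
    return [[float(len(t))] for t in batch]

texts = ["short", "tiny", "x" * 60, "mid", "ok", "fine"]
vectors, sequential = embed_with_fallback(texts, toy_embed_batch, batch_size=2)
print(len(vectors), sequential)
```

The `sequential` counter tracks how many inputs took the slow path, which is the kind of figure needed to report what fraction of papers required sequential processing.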

Experimental Setup

Datasets

BV-BRC Biological Data: 22,723 genome-related terms
peS2o Scientific Text Corpus: 8,293,485 full-text academic papers
Embedding Model: Qwen3-Embedding-4B (fits on a single 40 GB GPU)

Evaluation Metrics

Embedding Generation Time: Model loading, I/O, and inference time
Data Insertion Time: Insertion performance under different batch sizes and concurrency levels
Index Construction Time: HNSW index construction scalability
Query Latency: Query performance under different dataset sizes and worker node counts

Experimental Configuration

Worker Count: 1, 4, 8, 16, and 32 Qdrant workers
Data Distribution: Each worker is responsible for approximately 80 GB / (number of workers) of data
Client Configuration: One client allocated per Qdrant worker, with all clients running on a single compute node
Deployment Strategy: 4 Qdrant workers per compute node
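
The configuration above implies a simple relationship between worker count, compute nodes, and data per worker. The short calculation below makes it explicit; the constant and function names are illustrative, and the 80 GB total is the approximate figure implied by the data-distribution line.

```python
import math

WORKERS_PER_COMPUTE_NODE = 4   # deployment strategy stated above
TOTAL_DATA_GB = 80             # approximate total implied by the data distribution

def layout(workers):
    """Return (compute nodes needed, approximate GB per worker) for a worker count."""
    compute_nodes = math.ceil(workers / WORKERS_PER_COMPUTE_NODE)
    return compute_nodes, TOTAL_DATA_GB / workers

for workers in (1, 4, 8, 16, 32):
    nodes, gb = layout(workers)
    print(f"{workers:>2} workers -> {nodes} compute node(s), ~{gb:.1f} GB per worker")
```

At 32 workers this gives 8 compute nodes and about 2.5 GB per worker.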

Experimental Results

Embedding Generation Performance

Stage	Average Time (seconds)	Proportion
Model Loading	28.17	1.2%
I/O	7.49	0.3%
Inference	2381.97	98.5%

Key Findings: Model inference dominates overall runtime. The batching heuristic successfully prevented memory errors, with fewer than 0.10% of papers requiring sequential processing.
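
The proportions in the table follow directly from the stage times. As a quick check (the variable names are illustrative):

```python
# Average stage times in seconds, copied from the table above.
stage_seconds = {"Model Loading": 28.17, "I/O": 7.49, "Inference": 2381.97}

total = sum(stage_seconds.values())
shares = {stage: 100 * secs / total for stage, secs in stage_seconds.items()}

for stage, pct in shares.items():
    print(f"{stage}: {pct:.1f}%")
print(f"Total: {total:.2f} s")
```

The total is about 2,418 seconds (roughly 40 minutes), so even eliminating model loading and I/O entirely would cut the runtime by only about 1.5%.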

Data Insertion Performance

Parameter Tuning Results

Optimal Batch Size: 32 (insertion time reduced from 468 s to 381 s)
Optimal Concurrent Requests: 2 (insertion time further reduced to 367 s)
Scalability Performance:

Worker Nodes	1	4	8	16	32
Insertion Time	8.22 h	2.11 h	1.14 h	35.92 min	21.67 min
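
Because the table mixes hours and minutes, converting to one unit makes the scaling easier to read. The sketch below derives speedup and parallel efficiency from the reported times; the helper names are illustrative.

```python
# Insertion times from the table above, converted to minutes.
insertion_minutes = {1: 8.22 * 60, 4: 2.11 * 60, 8: 1.14 * 60, 16: 35.92, 32: 21.67}
baseline = insertion_minutes[1]

def speedup(workers):
    """Speedup relative to the single-worker insertion time."""
    return baseline / insertion_minutes[workers]

def efficiency(workers):
    """Parallel efficiency: speedup divided by the number of workers."""
    return speedup(workers) / workers

for workers in insertion_minutes:
    print(f"{workers:>2} workers: {speedup(workers):5.2f}x speedup, "
          f"{efficiency(workers):.0%} efficiency")
```

The speedups come out to roughly 3.9×, 7.2×, 13.7×, and 22.8× for 4, 8, 16, and 32 workers, so insertion scales close to linearly at small worker counts and falls to about 71% efficiency at 32.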

Key Findings:

CPU-bound batch transformation limits asyncio concurrency effectiveness
Multi-process approach more suitable than asyncio for single-client parallel data insertion
Data insertion rate may become a bottleneck for large-scale HPC workloads
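
The tuned insertion settings (batch size 32, two concurrent requests) and the first finding above can be illustrated with a small asyncio sketch. It does not use the Qdrant client: `send_batch` is a hypothetical stand-in for an upload call, and `prepare` stands in for the CPU-bound batch transformation.

```python
import asyncio

async def send_batch(batch):
    """Hypothetical upload call; a real client would issue a network request here."""
    await asyncio.sleep(0.001)  # simulated network wait
    return len(batch)

def prepare(batch):
    """CPU-bound transformation. It runs on the event loop thread and blocks other tasks."""
    return [(i, [float(i)]) for i in batch]

async def insert_all(items, batch_size=32, max_in_flight=2):
    """Upload items in batches with at most `max_in_flight` concurrent requests."""
    gate = asyncio.Semaphore(max_in_flight)

    async def upload(batch):
        async with gate:
            return await send_batch(prepare(batch))

    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    return sum(await asyncio.gather(*(upload(b) for b in batches)))

inserted = asyncio.run(insert_all(list(range(100))))
print(inserted)
```

Only the awaited network call overlaps between tasks. `prepare` is ordinary synchronous code, so while one batch is being transformed no other coroutine makes progress, and raising `max_in_flight` stops helping once that step dominates. Separate processes avoid this limit because each one prepares batches on its own core.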

Index Construction Performance

Maximum Speedup: 21.32× on 32 worker nodes relative to single node
Scalability Limitations: Only 1.27× speedup from 1 to 4 worker nodes
Resource Utilization: Single worker node already uses 90-97% of CPU capacity

Key Findings: Deploying multiple Qdrant worker nodes per machine is unnecessary for CPU-saturated index construction. GPU acceleration may be more effective.

Query Performance

Parameter Tuning

Optimal Query Batch Size: 16 (query runtime reduced from 139 s to 73 s)
Optimal Concurrent Batch Requests: 2

Scalability Analysis

Dataset Size Threshold: Adding worker nodes reduces query time only once the dataset reaches at least 30 GB
Maximum Speedup: 3.57× (on sufficiently large datasets)
Communication Overhead: Beyond 4 worker nodes, further cluster expansion yields only marginal improvements

Key Findings: Communication overhead in the query execution model exceeds parallelization benefits on small datasets. Clusters should adaptively scale based on data size.
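
One way to read the recommendation above is as a sizing rule. The sketch below is a hypothetical policy, not something the paper proposes in this form: it hard-codes the two reported observations (no benefit below 30 GB, marginal gains beyond 4 workers) as cutoffs, and real thresholds would need to be measured per system and workload.

```python
MIN_DATASET_GB_FOR_SCALING = 30   # threshold reported above
MAX_USEFUL_WORKERS = 4            # gains beyond this were reported as marginal

def choose_workers(dataset_gb, available_workers):
    """Hypothetical policy: one worker for small data, otherwise cap at 4 workers."""
    if available_workers < 1:
        raise ValueError("need at least one worker")
    if dataset_gb < MIN_DATASET_GB_FOR_SCALING:
        return 1
    return min(available_workers, MAX_USEFUL_WORKERS)

for size_gb in (5, 30, 80):
    print(f"{size_gb} GB -> {choose_workers(size_gb, available_workers=32)} worker(s)")
```

This rule considers query latency only. Insertion and index construction scaled further in the experiments above, so a deployment that also ingests data may justify more workers than the rule suggests.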

Vector Database System Comparison

System	Parallel Read/Write	Compute-Storage Separation	Load Balancing	Auto-scaling	GPU Indexing	GPU ANN
Vespa	✓	✓	✓	✓	✗	✗
Vald	✓	✗	✓	✓	✓	✓
Weaviate	✓	✗	✓	✓	✓	✓
Qdrant	✓	✗	✓	✓	✓	✗
Milvus	✓	✓	✓	✓	✓	✓

Related Work

Existing surveys primarily focus on functional feature comparisons, lacking empirical performance evaluation
Shen et al. evaluated multiple index types in single-GPU RAG but did not address distributed systems or HPC environments
Absence of vector database performance research targeting HPC environments

Conclusions and Discussion

Main Conclusions

Embedding Generation Optimization Focus: For datasets fitting within HPC compute node memory, prioritize improving model inference efficiency over I/O or model loading
Data Insertion Bottleneck: Qdrant's asynchronous approach is limited by CPU-bound tasks in data upload; multi-process may be more suitable for single-client parallelism
Index Construction Resource Utilization: A single worker node suffices to saturate CPU; GPU acceleration may improve multi-worker node benefits
Query Performance Threshold: Increasing worker node count only effectively reduces query runtime on sufficiently large datasets

Limitations

Single System Evaluation: Only evaluated Qdrant, lacking cross-system comparison
CPU-Focused Evaluation: Primarily addressed CPU index construction without deep GPU implementation assessment
Insufficient Variability Analysis: Did not focus on runtime variability and reproducibility
Workload Limitations: Primarily based on biological workloads, potentially unrepresentative of other scientific domains

Future Directions

Multi-System Comparative Study: Conduct comprehensive multi-system evaluation on different HPC platforms
GPU Acceleration Optimization: Deeply investigate GPU-accelerated index construction and query performance
Adaptive Scaling: Develop systems that adaptively scale based on data size and workload characteristics
Domain-Specific Optimization: Optimize vector databases for specific requirements of different scientific domains

In-Depth Evaluation

Strengths

Pioneering Research: First systematic evaluation of vector database performance in HPC environments, filling an important research gap
Realistic Workloads: Uses authentic biological data and scientific literature, providing practical relevance
Comprehensive Performance Analysis: Covers complete workflow performance from embedding generation to querying
Practical Value: Provides specific configuration recommendations and performance tuning strategies
Open Data: Dataset release promotes field advancement

Weaknesses

Limited System Coverage: Only evaluated Qdrant, lacking horizontal comparison
Insufficient Theoretical Analysis: Primarily based on experimental observations, lacking deep theoretical analysis
Scalability Limitations: Maximum tested scale of 32 worker nodes may be insufficient for large-scale HPC systems
Underutilized GPU: Primarily focused on CPU performance, insufficiently exploring GPU acceleration potential

Impact

Academic Contribution: Establishes foundation for vector database research in HPC environments
Practical Guidance: Provides important deployment reference for HPC centers and scientific computing users
Benchmark Establishment: Establishes benchmark methodology for vector database performance evaluation in HPC environments
Future Research Directions: Identifies multiple directions worthy of in-depth investigation

Applicable Scenarios

Large-Scale Scientific Computing: Applicable to scientific research projects requiring vector database deployment in HPC environments
Bioinformatics: Particularly suitable for genomics and biomedical research involving literature retrieval and knowledge discovery
RAG System Deployment: Provides performance reference for deploying large-scale RAG systems in HPC environments
System Optimization: Guides vector database vendors in optimizing HPC environment performance

References

This study cites 52 relevant references, primarily covering:

Vector database systems and algorithms
High-performance computing platforms and architectures
Embedding models and RAG technology
Related performance evaluation research

Overall Assessment: This paper offers a first systematic evaluation of distributed vector database performance in HPC environments. Its experimental design is sound and its results are practically useful. Despite the limitations noted above, it lays a foundation for this emerging research area and for applying vector databases in scientific computing.