Review of Three Algorithms That Build k-d Trees

Brown

The original description of the k-d tree recognized that rebalancing techniques, such as those used to build an AVL tree or a red-black tree, are not applicable to a k-d tree. Hence, in order to build a balanced k-d tree, it is necessary to find the median of a set of data for each recursive subdivision of that set. The sort or selection used to find the median, and the technique used to partition the set about that median, strongly influence the computational complexity of building a k-d tree. This article describes and contrasts three k-d tree-building algorithms that differ in the technique used to partition the set, and compares the performance of the algorithms. In addition, dual-threaded execution is proposed for one of the three algorithms.

Basic Information

Paper ID: 2506.20687
Title: Review of Three Algorithms That Build k-d Trees
Author: Russell A. Brown
Category: cs.DS (Data Structures and Algorithms)
Published: October 13, 2025 (arXiv v10)
Paper link: https://arxiv.org/abs/2506.20687

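Abstract

The original description of the k-d tree recognized that the rebalancing techniques used to build an AVL tree or a red-black tree do not apply to a k-d tree. To build a balanced k-d tree, the median of the data set must therefore be found for each recursive subdivision. The sort or selection algorithm used to find the median, and the technique used to partition the set about that median, strongly influence the computational complexity of building a k-d tree. The paper describes and contrasts three k-d tree-building algorithms that differ in their partitioning technique, and compares their performance. It also proposes dual-threaded execution for one of the three algorithms.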

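Research Background and Motivation

Problem Definition

Core problem: A k-d tree cannot be kept balanced with conventional self-balancing binary tree techniques (such as the rotations of an AVL tree or a red-black tree), so a balanced k-d tree has to be built by finding the median and recursively partitioning the data set.
Importance: The k-d tree is an important data structure for multidimensional spatial data, widely used for nearest-neighbor search, range queries, and similar tasks. The efficiency of the build algorithm directly affects its practicality.
Limitations of existing methods:
- Different median-finding and partitioning techniques lead to markedly different algorithmic complexity
- Systematic comparison and performance analysis of the different algorithms is lacking
- The potential of multithreaded optimization has not been fully explored
Research motivation: To guide the choice of algorithm in practice by systematically comparing three different k-d tree-building algorithms, and to explore the possibilities for parallelization.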
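
The median-based recursive subdivision described under the core problem can be sketched as follows. This is a minimal illustration, not one of the three algorithms the paper reviews: it assumes points are equal-length tuples, cycles through the coordinates by depth, and finds each median by re-sorting the subset, which is the simplest choice but costs O(n log^2 n) overall. The names `Node`, `build_kd_tree`, and `height` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                     # the median point stored at this node
    axis: int                        # coordinate used to split at this node
    left: Optional["Node"] = None    # points before the median in sort order
    right: Optional["Node"] = None   # points after the median in sort order


def build_kd_tree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced k-d tree by splitting at the median on every level."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k  # cycle through the k coordinates (an assumption of this sketch)
    # Re-sorting the subset is the simplest way to locate the median.
    # It costs O(n log n) per level, hence O(n log^2 n) for the whole build.
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kd_tree(ordered[:mid], depth + 1),
        right=build_kd_tree(ordered[mid + 1:], depth + 1),
    )


def height(node: Optional[Node]) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(points)
print(tree.point, height(tree))
```

Because each split puts half of the remaining points on either side, the tree is balanced by construction and never needs rotations. Replacing the per-level sort with a cheaper way of finding the median and partitioning is where build algorithms of this kind differ.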

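Core Contributions

Systematic comparison: Describes and contrasts in detail three k-d tree-building algorithms whose time complexities are O(n log n), O(kn log n), and O(kn log n) + O(n log n), respectively
Performance benchmarks: Runs comprehensive performance tests on modern hardware, covering a range of data sizes (2^16 to 2^24 nodes) and dimensions (2 to 6)
Parallelization scheme: Proposes dual-threaded execution for the O(kn log n) + O(n log n) algorithm and analyzes its performance characteristics
Memory and cache analysis: Analyzes in depth the memory requirements and cache performance of each algorithm, explaining the root causes of the performance differences
Practical guidance: Gives algorithm selection recommendations for different application scenarios, based on the experimental results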
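
The complexity bounds listed under the systematic comparison can be related to the standard divide-and-conquer recurrence. The derivation below is a generic textbook argument under stated assumptions, not a reproduction of the paper's own analysis: T(n) is the build time for n points, and the per-level cost is assumed to be linear in n (or in kn when k presorted index arrays have to be partitioned on every level).

```latex
% Assumption: the median is found and the set partitioned in O(n) per subdivision.
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + O(n)
\;\Longrightarrow\;
T(n) = O(n \log n)

% Assumption: k initial sorts, then O(kn) partitioning work on each of the \log n levels.
T(n) = \underbrace{O(kn \log n)}_{\text{presort}}
     + \underbrace{O(kn) \cdot O(\log n)}_{\text{partitioning}}
     = O(kn \log n)
```

Under these assumptions, the factor of k separates an approach that selects a median per subdivision from one that maintains order in all k dimensions; which algorithm is faster in practice also depends on the constant factors and the memory behavior that the paper measures.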