Perfect hash functions give unique "names" to arbitrary keys while requiring only a few bits per key. They are an essential building block in applications such as static hash tables, databases, and bioinformatics. This paper introduces the PHast approach, which combines the fastest available queries, very fast construction, and good space consumption (below 2 bits per key). PHast improves bucket placement, a technique that first hashes each key k to a bucket and then searches for a bucket seed s such that a placement function maps the pairs (s,k) to values in a collision-free way. PHast can use small-range hash functions with linear mapping, fixed-width encoding of seeds, and parallel construction. This is achieved using small overlapping slices of allowed values and bumping to handle unsuccessful seed assignment. A variant called PHast+ uses additive placement, which enables bit-parallel seed searching and speeds up construction by an order of magnitude.
A Perfect Hash Function (PHF) is an injective function that maps a set of keys {k₁, ..., kₙ} to {0, ..., m-1}, where m ≥ n. When m = n, it is called a Minimal Perfect Hash Function (MPHF). PHFs are an important building block for applications in databases, text indexing, and bioinformatics.
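The two definitions can be checked mechanically. The following is a minimal sketch, not part of the paper; the helper names is_perfect and is_minimal_perfect are hypothetical, and the keys are assumed to be distinct.

```python
def is_perfect(f, keys, m):
    """True if f maps the (distinct) keys injectively into {0, ..., m-1}."""
    values = [f(k) for k in keys]
    in_range = all(0 <= v < m for v in values)
    injective = len(set(values)) == len(values)
    return in_range and injective


def is_minimal_perfect(f, keys):
    """True if f is perfect with m = n, i.e. a bijection onto {0, ..., n-1}."""
    return is_perfect(f, keys, len(keys))


keys = ["apple", "banana", "cherry"]
minimal = {"apple": 2, "banana": 0, "cherry": 1}  # m = n = 3
sparse = {"apple": 5, "banana": 0, "cherry": 3}   # perfect only for m >= 6
```

Here minimal.__getitem__ is an MPHF for the three keys, while sparse.__getitem__ is a PHF for m = 6 but not a minimal one.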
Linear Bucket Mapping: Achieves cache-friendly multi-threaded construction through linear mapping to small overlapping slices, avoiding non-linear bucket allocation
Bumping Mechanism: Enables use of small-range hash functions and fixed-width seed encoding, avoiding complex local search
Heuristic Seed Assignment: Reduces space consumption by preferring seeds that map a bucket's keys to the smallest available function values
PHast+ Variant: Uses an additive placement function to enable bit-parallel seed searching, achieving an order-of-magnitude speedup in construction
Comprehensive Experimental Evaluation: Detailed performance comparison with existing methods
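The bit-parallel seed search named in the PHast+ item can be illustrated with a small sketch. It is an illustration under stated assumptions, not the paper's implementation: the placement is assumed to be slice_start + offset(k) + s, the occupancy of a slice is held in a Python integer used as a bitmask, and the function name find_seed_additive is hypothetical. Bumping and any reserved seed values are not modeled here.

```python
def find_seed_additive(occupied, offsets, num_seeds):
    """Return the smallest seed s in [0, num_seeds) such that every position
    offset + s is free, or None if no such seed exists.

    occupied : int bitmask, bit p set means position p (relative to the
               slice start) is already taken
    offsets  : per-key offsets within the slice (assumed to come from a hash)
    """
    if len(set(offsets)) != len(offsets):
        return None  # two keys with equal offsets collide for every seed

    mask = (1 << num_seeds) - 1
    candidates = mask  # bit s set means seed s is still possible
    for off in offsets:
        # Shifting the occupancy right by `off` aligns position off + s with
        # bit s, so one AND rules out all conflicting seeds at once.
        candidates &= ~(occupied >> off)
    if candidates == 0:
        return None
    # Index of the lowest set bit is the smallest feasible seed.
    return (candidates & -candidates).bit_length() - 1


# Positions 1, 2 and 4 are taken; the bucket holds two keys with offsets 0 and 2.
print(find_seed_additive(0b10110, [0, 2], 8))  # prints 3
```

With additive placement, a single shift and AND per key tests all candidate seeds together, whereas a general placement function would need one hash evaluation per key and per seed.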
PHast avoids variable-length encoding and complex local search through the bumping mechanism while maintaining the simplicity of linear bucket allocation.
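How bucket placement, fixed-width seeds, linear slice mapping, and bumping fit together can be sketched as a toy construction. Everything concrete below is an assumption made for illustration and is not taken from the paper: the constants, the BLAKE2b-based hash h64, processing larger buckets first, reserving seed value 0 to mark a bumped bucket, and using a plain dictionary as a stand-in for the fallback structure that handles bumped keys.

```python
import hashlib

SEED_BITS = 8     # assumed fixed seed width; value 0 marks a bumped bucket
SLICE_LEN = 64    # assumed length of each bucket's slice of allowed values
BUCKET_SIZE = 4   # assumed average number of keys per bucket


def h64(key, seed=0):
    """64-bit hash of (seed, key); a stand-in for a fast hash function."""
    data = seed.to_bytes(2, "little") + key.encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def bucket_of(key, num_buckets):
    # Linear mapping of the 64-bit hash onto bucket indices.
    return (h64(key) * num_buckets) >> 64


def slice_start(b, n, slice_len, num_buckets):
    # Slice starts grow linearly with the bucket index; neighbours overlap.
    return b * (n - slice_len + 1) // num_buckets


def build(keys):
    """Toy construction for a non-empty list of distinct string keys."""
    n = len(keys)
    num_buckets = max(1, n // BUCKET_SIZE)
    slice_len = min(SLICE_LEN, n)
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[bucket_of(k, num_buckets)].append(k)

    seeds = [0] * num_buckets
    taken = [False] * n
    bumped = []
    # Larger buckets first: a common heuristic, assumed here.
    for b in sorted(range(num_buckets), key=lambda i: -len(buckets[i])):
        start = slice_start(b, n, slice_len, num_buckets)
        for seed in range(1, 1 << SEED_BITS):
            pos = [start + h64(k, seed) % slice_len for k in buckets[b]]
            if len(set(pos)) == len(pos) and not any(taken[p] for p in pos):
                for p in pos:
                    taken[p] = True
                seeds[b] = seed
                break
        else:
            bumped.extend(buckets[b])  # no seed fits: bump the whole bucket

    # Stand-in fallback: give bumped keys the values that are still free.
    free = [i for i, t in enumerate(taken) if not t]
    fallback = dict(zip(bumped, free))
    return seeds, fallback, n, slice_len


def query(phf, key):
    seeds, fallback, n, slice_len = phf
    num_buckets = len(seeds)
    b = bucket_of(key, num_buckets)
    if seeds[b] == 0:
        return fallback[key]
    start = slice_start(b, n, slice_len, num_buckets)
    return start + h64(key, seeds[b]) % slice_len
```

Because a bucket that finds no seed is simply bumped, the seed search never needs backtracking or local search, and every seed fits in SEED_BITS bits. A real implementation would replace the dictionary with a compact structure; the sketch only shows the control flow.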
Fredman & Komlós: Theoretical lower bounds for perfect hash functions
Belazzougui et al.: Foundational work on bucket placement methods
The PHast paper demonstrates that, in algorithm engineering, simple methods that are carefully optimized with a clear understanding of the underlying problem and of modern hardware can match or even surpass the performance of more complex methods. This offers an important insight for data structure design: sometimes the key to solving a problem is not to add complexity, but to find the right way to simplify.