2025-11-17T12:40:13.303016

Over-Threshold Multiparty Private Set Intersection for Collaborative Network Intrusion Detection

Arpaci, Boutaba, Kerschbaum
An important function of collaborative network intrusion detection is to analyze the network logs of the collaborators for joint IP addresses. However, sharing IP addresses in plain is sensitive and may be even subject to privacy legislation as it is personally identifiable information. In this paper, we present the privacy-preserving collection of IP addresses. We propose a single collector, over-threshold private set intersection protocol. In this protocol $N$ participants identify the IP addresses that appear in at least $t$ participant's sets without revealing any information about other IP addresses. Using a novel hashing scheme, we reduce the computational complexity of the previous state-of-the-art solution from $O(M(N \log{M}/t)^{2t})$ to $O(t^2M\binom{N}{t})$, where $M$ denotes the dataset size. This reduction makes it practically feasible to apply our protocol to real network logs. We test our protocol using joint networks logs of multiple institutions. Additionally, we present two deployment options: a collusion-safe deployment, which provides stronger security guarantees at the cost of increased communication overhead, and a non-interactive deployment, which assumes a non-colluding collector but offers significantly lower communication costs and applicable to many use cases of collaborative network intrusion detection similar to ours.
academic

Over-Threshold Multiparty Private Set Intersection for Collaborative Network Intrusion Detection

Basic Information

  • Paper ID: 2510.12045
  • Title: Over-Threshold Multiparty Private Set Intersection for Collaborative Network Intrusion Detection
  • Authors: Onur Eren Arpaci, Raouf Boutaba, Florian Kerschbaum (University of Waterloo)
  • Classification: cs.CR (Cryptography and Security), cs.NI (Networking and Internet Architecture)
  • Publication Date: October 14, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.12045

Abstract

A critical function in collaborative network intrusion detection is analyzing network logs from collaborators to identify common IP addresses. However, sharing IP addresses in plaintext is sensitive and may be constrained by privacy legislation, as it constitutes personally identifiable information. This paper proposes a privacy-preserving collection method for IP addresses through a single-aggregator, over-threshold private set intersection protocol. In this protocol, N participants identify IP addresses appearing in at least t participants' sets without revealing any information about other IP addresses. Through a novel hashing scheme, the computational complexity of the previous state-of-the-art solution is reduced from O(M(NlogM/t)2t)O(M(N \log{M}/t)^{2t}) to O(t2M(Nt))O(t^2M\binom{N}{t}), where M denotes the dataset size. This reduction makes applying the protocol to real network logs practically feasible.

Research Background and Motivation

Problem Definition

The core challenge in collaborative network intrusion detection is identifying multi-institutional attacks while preserving privacy. Research demonstrates that 75% of institutional attacks spread to a second institution within one day, with over 40% spreading within one hour. Attackers typically exploit a small number of external IP addresses to simultaneously attack multiple institutions. If an external IP connects to at least t institutions within a specific time window, it can be classified as malicious with 95% recall.

Privacy Challenges

Traditional approaches require institutions to share network logs in plaintext, introducing severe privacy risks:

  1. Legal Compliance: IP addresses are classified as personally identifiable information under GDPR, PIPEDA, CCPA, and other regulations
  2. Data Sensitivity: Raw network data is more sensitive than security alerts, containing extensive irrelevant sensitive information
  3. Data Scale: Raw data is several orders of magnitude larger than security alerts, rendering existing solutions computationally infeasible

Limitations of Existing Methods

  • Kissner and Song 26: Requires O(N) communication rounds with computational complexity O(N³M³)
  • Mahdavi et al. 34: While improving communication complexity, computational cost remains prohibitive at O(M(N logM/t)²ᵗ)

Core Contributions

  1. Novel Hashing Scheme: Proposes an innovative hashing algorithm reducing computational complexity from O(M(N logM/t)²ᵗ) to O(t²M(N choose t)), achieving linear complexity in M
  2. Practical Enhancement: Enables the protocol to handle real-scale network logs, completing detection within 170 seconds for 33 participating institutions with up to 144,045 IPs
  3. Dual Deployment Options:
    • Collusion-resistant deployment: Provides stronger security guarantees but higher communication overhead
    • Non-interactive deployment: Assumes non-colluding aggregator, significantly reducing communication costs
  4. Security Proof: Proves protocol security under the semi-honest multiparty computation model
  5. Practical Validation: Evaluation using real network logs from the CANARIE IDS project

Methodology Details

Task Definition

Over-Threshold Multiparty Private Set Intersection (OT-MP-PSI):

  • Input: N participants, each holding at most M elements in set Si
  • Output: Identifies elements appearing in at least t sets without revealing information about below-threshold elements
  • Constraints: Semi-honest security model where participants follow the protocol but may attempt to learn additional information

Core Technical Components

1. Shamir Secret Sharing

Uses (t,n) threshold scheme where any t parties can reconstruct secret V, while fewer than t parties gain no information:

P(x) = c_{t-1}x^{t-1} + c_{t-2}x^{t-2} + ... + c_1x + V

2. Oblivious Pseudorandom Function (OPRF)

Participants learn H_K(x) without knowing key K, while the key holder remains unaware of input x or output values.

3. Oblivious Pseudorandom Secret Sharing (OPR-SS)

Combines secret sharing and OPRF security properties, allowing participants to obtain unique secret shares from the key holder.

Protocol Architecture

Non-Interactive Deployment

  1. Share Creation: Each participant creates secret shares for every element in their set
  2. Hash Mapping: Uses the novel hashing scheme to map shares into size-1 buckets
  3. Reconstruction: Aggregator attempts Lagrange interpolation for all combinations of t participants
  4. Result Notification: Aggregator notifies participants of successfully reconstructed indices

Collusion-Resistant Deployment

Replaces shared keys with OPR-SS protocol, computing hash functions through multi-key OPRF protocol for enhanced collusion resistance.

Technical Innovation

Novel Hashing Scheme

Core Concept: Uses size-1 buckets to avoid hash collisions rather than accommodate them:

  1. Collision Handling: When multiple elements map to the same bucket, uses pseudorandom ordering to select the smallest
  2. Multi-Table Strategy: Each participant creates multiple tables, ensuring lost values appear in other tables
  3. Failure Probability Control: Controls failure probability below 2⁻⁴⁰ using 20 tables

Key Advantages:

  • Aggregator only attempts participant combinations, not share combinations
  • Complexity reduces from exponential to linear (in M)
  • Avoids large constant factors of traditional bucketing methods

Optimization Techniques

  1. Sorting Inversion: Consecutive two tables use the same sorting function with even tables reversing the order
  2. Empty Bucket Utilization: Second insertion leverages empty buckets from first insertion

Experimental Setup

Dataset

  • Real Data: Network connection logs from 54 institutions in the CANARIE IDS project
  • Time Range: November 1-8, 2023, approximately 8 billion log entries daily
  • Data Scale: Approximately 700GB per day (gzip compressed)
  • Processing: Batch processing by hour, extracting external-to-internal IP connections

Evaluation Metrics

  • Reconstruction Time: Time for aggregator to complete reconstruction
  • Share Generation Time: Time for individual participant to create shares
  • Communication Complexity: Total communication overhead of the protocol
  • Correctness: Failure probability verification

Comparison Methods

  • Mahdavi et al. 34: Current state-of-the-art OT-MP-PSI solution
  • Theoretical Bounds: Comparison with computed failure probability upper bounds

Implementation Details

  • Programming Language: Julia, 430 lines of code
  • Cryptographic Libraries: SHA.jl, Nettle.jl
  • Finite Field: 61-bit Mersenne prime
  • Hardware Environment: 8×Intel Xeon E7-8870 processors, 80 physical cores, 1TB memory

Experimental Results

Main Results

Performance Comparison

Compared to Mahdavi et al. 34:

  • Speed Improvement: At least two orders of magnitude, up to 23,066× faster
  • Scalability: At M=10⁵, this method requires hundreds of seconds versus days for the comparison method

Real Data Performance

Performance on CANARIE data:

  • Average Reconstruction Time: 170 seconds
  • Maximum Reconstruction Time: 438 seconds (40 institutions, 220,011 IPs)
  • Average Participating Institutions: 33
  • Average Maximum Set Size: 144,045 IPs

Complexity Analysis

Computational Complexity

  • Aggregator: O(t²M(N choose t))
  • Participants (Non-interactive): O(tM)
  • Special Cases: O(M) when N=t=2, O(N²M) when N=t

Communication Complexity

  • Non-interactive Deployment: O(tMN)
  • Collusion-resistant Deployment: O(tkMN), 5 rounds of communication

Correctness Verification

  • Theoretical Failure Probability: 2⁻⁴⁰
  • Experimental Verification: In 10⁷ trials, actual failures far below theoretical upper bound
  • Table Count Optimization: Optimized from theoretical 28 tables to practical 20 tables

OT-MP-PSI Solutions

  1. Kissner and Song 26: First solution using polynomial rings, O(N³M³) complexity
  2. Mahdavi et al. 34: Constant rounds, O(M(N logM/t)²ᵗ) complexity
  3. Ma et al. 33: Suitable for small input sets and small domains, O(N|S|) complexity
  • Threshold Private Set Intersection (TPSI): Learning intersection if and only if intersection size exceeds threshold
  • Quorum-PSI: Special case of OT-MP-PSI with output restricted to specific participants

Hashing Techniques

  • Cuckoo Hashing: Used to avoid hash collisions, widely applied in PSI protocols
  • 2D Cuckoo Hashing: Designed for two-party PSI, achieving O(M) complexity

Conclusions and Discussion

Main Conclusions

  1. Practical Breakthrough: First to make OT-MP-PSI practical for real network log scales
  2. Efficiency Gains: Achieves orders of magnitude performance improvement through novel hashing scheme
  3. Flexible Deployment: Provides two deployment options accommodating different security requirements
  4. Practical Validation: Verifies protocol practicality in real multi-institutional network environments

Limitations

  1. Semi-Honest Model: While extensible to malicious model, remains vulnerable to input substitution attacks
  2. Set Size Leakage: Core protocol does not treat participant set sizes as private information
  3. Aggregator Trust: Non-interactive deployment requires trusting the aggregator not to collude with participants
  4. Threshold Constraints: Complexity advantages may diminish when t approaches N/2 with large N

Future Directions

  1. Malicious Security: Extend protocol to resist active attackers
  2. Dynamic Thresholds: Support multi-threshold computation without additional client cost
  3. Large-Scale Optimization: Further optimize handling of participant combinations
  4. Privacy Enhancement: Explore differential privacy techniques to protect set size information

In-Depth Evaluation

Strengths

  1. Significant Theoretical Contribution: Novel hashing scheme represents important breakthrough, reducing exponential to linear complexity
  2. High Practical Value: Addresses critical privacy challenges in real-world collaborative intrusion detection
  3. Comprehensive Experiments: Combines theoretical analysis with real data validation, well-designed experimental setup
  4. Complete Engineering Implementation: Provides open-source implementation, enhancing reproducibility
  5. Rigorous Security: Offers formal security proofs and dual deployment options

Weaknesses

  1. Security Model Limitations: Semi-honest model may be insufficient for certain practical scenarios
  2. Parameter Selection: Choice of 20 tables based on empirical optimization lacks sufficient theoretical guidance
  3. Scalability Boundaries: Applicability to extreme scales (e.g., global internet-level) insufficiently explored
  4. Limited Comparison Baselines: Primarily compares against one baseline method, lacking more comprehensive performance comparison

Impact

  1. Academic Value: Provides new technical pathways for multiparty secure computation research
  2. Practical Significance: Directly addresses real-world network security requirements
  3. Technology Dissemination: Hashing scheme potentially applicable to other multiparty computation problems
  4. Standardization Potential: Likely to become standard protocol for collaborative intrusion detection

Applicable Scenarios

  1. Multi-Institutional Network Security: Collaborative protection among similar institutions like universities and hospitals
  2. Cloud Security Services: Privacy-preserving log analysis in security operations centers
  3. Threat Intelligence Sharing: Threat indicator exchange under privacy constraints
  4. Compliance Requirements: Data collaboration satisfying GDPR and other privacy regulations

References

This paper cites 53 relevant references spanning cryptography, network security, and multiparty computation, providing solid theoretical foundation and comprehensive technical background.


Overall Assessment: This is a high-quality applied cryptography paper achieving excellent balance between theoretical innovation and practical application. The proposed hashing scheme represents significant theoretical breakthrough while demonstrating substantial practical value. The paper provides comprehensive experimental validation and rigorous security analysis, making important technical contributions to collaborative network security.