Holographic video communication is considered a paradigm shift in visual communications, becoming increasingly popular for its ability to offer immersive experiences. This article provides an overview of holographic video communication and outlines the requirements of a holographic video communication system. Particularly, following a brief review of semantic communication, an architecture for a semantic-enabled holographic video communication system is presented. Key technologies, including semantic sampling, joint semantic-channel coding, and semantic-aware transmission, are designed based on the proposed architecture. Two related use cases are presented to demonstrate the performance gain of the proposed methods. Finally, potential research topics are discussed to pave the way for the realization of semantic-enabled holographic video communications.
Semantic Communication Enabled Holographic Video Processing and Transmission
- Paper ID: 2510.13408
- Title: Semantic Communication Enabled Holographic Video Processing and Transmission
- Authors: Jingkai Ying, Zhiyuan Qi, Yulong Feng, Zhijin Qin, Zhu Han, Rahim Tafazolli, Yonina C. Eldar
- Categories: eess.IV cs.AI cs.IT cs.MM eess.SP math.IT
- Publication Date: October 15, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.13408
Holographic video communication (HVC) is increasingly recognized as a paradigm shift in visual communication due to its ability to provide immersive experiences. This paper provides an overview of HVC and sets out the requirements of HVC systems. Specifically, following a brief review of semantic communication, a semantic-enabled HVC system architecture is proposed. Based on the proposed architecture, key technologies are designed, including semantic sampling, joint semantic-channel coding, and semantic-aware transmission. The performance gains of the proposed methods are demonstrated through two relevant use cases. Finally, potential research directions are discussed to pave the way for realizing semantic-enabled HVC.
Holographic video communication (HVC), regarded as a leading paradigm for future visual communication, faces substantial technical challenges:
- Explosive Data Growth: Holographic video requires a transmission bandwidth of 0.1-1 Tbps, with peaks reaching 10 Tbps
- Stringent Latency Requirements: Air interface transmission latency must be below 1 ms, and end-to-end network latency must be below 20 ms
- High Reliability Demands: Packet error rate must be on the order of 10^-7
- Limitations of Existing Systems: Even 6G networks cannot fully guarantee support for high-quality HVC services
Holographic video communication is a key technology for realizing the metaverse and numerous applications (such as holographic conferencing, education, and entertainment), and has been identified as a typical immersive communication use case for 6G wireless networks.
Existing research on holographic video transmission suffers from the following issues:
- Reliance on the traditional bit-level transmission paradigm, which consumes enormous resources
- Lack of optimization designs tailored to the characteristics of holographic content
- Insufficient exploitation of the powerful nonlinear representation capabilities of deep learning
Semantic communication transmits the meaning of information rather than raw bits. It can extract and compress the meaningful information in holographic content, which significantly reduces bandwidth requirements, and it allows the whole transmission chain to be optimized jointly through end-to-end training.
- Proposed a novel semantic-aware holographic video communication architecture: Integrating semantic sampling, joint semantic-channel coding, and semantic-aware transmission modules
- Designed an attention mechanism-based semantic-aware sampling method: Capable of capturing key regions of point clouds
- Developed an efficient and robust joint semantic-channel coding modulation scheme: Enabling adaptive transmission of point clouds based on semantic features and channel conditions
- Provided two use case validations: Demonstrating performance gains of semantic sampling and joint coding-modulation
This paper investigates how to apply semantic communication techniques to holographic video transmission, with particular focus on efficient transmission of point cloud data. The input is raw holographic data (primarily point clouds), the output is high-quality holographic content reconstructed at the receiver, and constraints include bandwidth limitations, latency requirements, and channel noise.
The proposed semantic-aware HVC system employs a server as an intermediate processing node, forming uplink and downlink transmission chains:
Uplink:
- Sensor → Semantic Sampling → Joint Semantic-Channel Coding → Semantic-Aware Transmission → Server Decoding and Reconstruction
Downlink:
- Server → Joint Semantic-Channel Coding → Semantic-Aware Transmission → User Decoding and Display
- Semantic Sampling Module
  - Uses multi-layer perceptron (MLP) to embed points into latent space
  - Partitions point cloud into patches, each containing a center point and its k nearest neighbors
  - Local attention layer processes patch embeddings to generate intermediate features and semantic maps
  - Computes scores for each point based on normalized standard deviation and selects top M points
- Joint Semantic-Channel Coding (JSCC)
  - Encoder: Employs PointNet++ for initial processing, uses Point Transformer for semantic feature refinement
  - Dual-branch design: Main branch captures fine-grained structural features, auxiliary branch extracts coarse-grained semantic features
  - Decoder: Uses Point Transformer to refine noisy features, reconstructs input point cloud through upsampling
- Semantic-Aware Transmission
  - Differentiable modulation model: Uses JSCC output semantic features as probabilities for modulation constellation point positions
  - Adaptive transmission: Generates segmentation points based on JSCC output; constellation points after segmentation points are not transmitted
  - Channel adaptation: Concatenates channel information with JSCC output to learn more robust features
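The semantic sampling steps listed above (embedding, k-nearest-neighbor patches, local attention, scoring, top-M selection) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the single random projection standing in for the trained MLP, the choice to treat every point as a patch center, the dot-product attention, and the min-max normalization of the per-point standard deviation are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def knn_patches(points, k):
    """Return, for every point, the indices of its k nearest neighbors.

    Assumption: every point acts as a patch center (the point itself is
    included in its own patch at distance 0).
    """
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def local_attention(emb, patches):
    """Scaled dot-product attention of each center over its own patch."""
    neigh = emb[patches]                                   # (N, k, C)
    logits = np.einsum("nc,nkc->nk", emb, neigh) / np.sqrt(emb.shape[1])
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkc->nc", weights, neigh)         # (N, C)


def select_top_m(points, m, k=8, dim=16, seed=0):
    """Score each point and keep the indices of the M highest-scoring points."""
    rng = np.random.default_rng(seed)
    # Untrained random projection + ReLU as a stand-in for the learned MLP.
    w = rng.normal(size=(points.shape[1], dim))
    emb = np.maximum(points @ w, 0.0)
    feat = local_attention(emb, knn_patches(points, k))
    # Assumed score: per-point standard deviation across feature channels,
    # min-max normalized to [0, 1].
    std = feat.std(axis=1)
    score = (std - std.min()) / (std.max() - std.min() + 1e-12)
    idx = np.argsort(-score)[:m]
    return idx, score
```

Calling `select_top_m(cloud, m=256)` on an `(N, 3)` array returns the indices of the retained points together with all scores. In a trained system the projection and attention weights would be learned against the downstream task rather than drawn at random.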
- Server-Mediated Architecture: Addresses the inability of edge devices to handle the enormous storage and computational demands of HVC
- Semantically-Driven Point Cloud Sampling: More effectively preserves geometric structure and task-specific representational capacity compared to traditional mathematical statistical methods
- Differentiable Modulation with Probabilistic Sampling: Avoids non-differentiability issues when directly quantizing JSCC output to constellation points
- Dual-Branch Semantic Feature Extraction: Simultaneously captures semantic information at different granularities
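To make the differentiable modulation idea more concrete, the sketch below treats encoder outputs as logits over a constellation and draws symbols through a Gumbel-softmax relaxation. The Gumbel-softmax is a technique swapped in here for illustration; the summary does not state which relaxation the paper uses. The 16-QAM constellation, the temperature `tau`, and all function names are likewise assumptions.

```python
import numpy as np


def qam16():
    """16-QAM constellation normalized to unit average power."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    pts = np.array([complex(i, q) for i in levels for q in levels])
    return pts / np.sqrt(np.mean(np.abs(pts) ** 2))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gumbel_softmax_modulate(logits, constellation, tau, rng):
    """Map per-symbol logits (n, K) to constellation symbols.

    Returns the relaxed (soft) symbols, the hard symbols, and the relaxed
    one-hot weights. In an autodiff framework the soft path would carry
    the gradient while the hard symbols are what is actually transmitted.
    """
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))                  # Gumbel(0, 1) noise
    y = softmax((logits + gumbel) / tau)          # relaxed one-hot weights
    soft = y @ constellation                      # convex mix of points
    hard = constellation[np.argmax(y, axis=-1)]   # nearest "real" symbol
    return soft, hard, y
```

Lower temperatures push the relaxed weights toward one-hot vectors, so the soft symbols approach valid constellation points, at the cost of noisier gradients during training.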
- Point Cloud Classification: Uses a point cloud dataset in which each point cloud contains 2048 points for classification task evaluation
- Point Cloud Reconstruction: Uses standard point cloud datasets to evaluate reconstruction quality
- Classification Accuracy: Evaluates semantic sampling performance
- D1 PSNR/D2 PSNR: Evaluate point cloud reconstruction quality
  - D1: Peak signal-to-noise ratio computed from the point-to-point mean squared error
  - D2: Peak signal-to-noise ratio computed from the point-to-plane (projected) mean squared error, which accounts for the perceptual characteristics of the human visual system
- Chamfer Distance: Measures geometric differences between reconstructed and original point clouds
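For readers unfamiliar with these geometry metrics, here is a small NumPy sketch of the Chamfer distance and a D1-style PSNR. Conventions differ between papers and tools, so several choices are assumptions: the Chamfer distance is taken as the sum of the two mean squared nearest-neighbor distances, the D1 error is the worse of the two directions, and the peak value `peak` is supplied by the caller. D2 is omitted because it additionally requires surface normals.

```python
import numpy as np


def nn_sq_dist(a, b):
    """Squared distance from each point in a to its nearest neighbor in b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (sum of both directional means)."""
    return nn_sq_dist(a, b).mean() + nn_sq_dist(b, a).mean()


def d1_psnr(ref, rec, peak):
    """Point-to-point PSNR in dB, using the worse of the two directions."""
    mse = max(nn_sq_dist(ref, rec).mean(), nn_sq_dist(rec, ref).mean())
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

The brute-force pairwise distance matrix keeps the example short. For clouds with many thousands of points, a spatial index such as a k-d tree would be the usual choice.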
Semantic Sampling Comparisons:
- Farthest Point Sampling (FPS)
- S-Net
- SampleNet
Joint Coding-Modulation Comparisons:
- Separated scheme of G-PCC + LDPC
- SEPT (deep learning-based JSCC scheme)
- Employs a two-stage training strategy: the first stage trains the downstream networks on complete point clouds; the second stage freezes those networks and trains the sampling model
- Loss function combines reconstruction metrics (Chamfer distance) and task loss (cross-entropy)
- Channel model uses Rayleigh fading channel
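The Rayleigh fading channel mentioned above can be simulated in a few lines. The sketch assumes fast fading with an independent coefficient per symbol, unit average channel gain, and noise power set from the measured signal power. The summary does not specify these details, so they are illustrative choices, and `rayleigh_channel` is a hypothetical helper name.

```python
import numpy as np


def rayleigh_channel(x, snr_db, rng):
    """Pass complex symbols x through y = h * x + n.

    h ~ CN(0, 1) is drawn independently per symbol (assumed fast fading);
    n is complex Gaussian noise scaled to the requested SNR in dB.
    Returns the received symbols and the channel coefficients.
    """
    shape = x.shape
    h = (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)
    signal_power = np.mean(np.abs(x) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    n = np.sqrt(noise_power / 2) * (
        rng.normal(size=shape) + 1j * rng.normal(size=shape)
    )
    return h * x + n, h
```

With perfect channel knowledge at the receiver, `y / h` gives a simple zero-forcing estimate of the transmitted symbols. In end-to-end training, the same operation would be written in an autodiff framework so that gradients flow through the channel to the encoder.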
- Significant performance improvements over traditional methods at low sampling ratios
- At a sampling ratio of 0.125, classification accuracy improves by approximately 15% compared to FPS
- Shows clear advantages over deep learning methods such as S-Net and SampleNet
- At SNR = 15 dB and with the same number of transmitted constellation points, D2 PSNR improves by more than 3 dB over the baseline methods
- Even at SNR = 0 dB, the proposed scheme outperforms the baseline methods operating at SNR = 15 dB
- The traditional separation-based scheme fails to decode at SNR = 0 dB because of the cliff effect
Visualization results show that the semantic sampling method preserves the structural features of objects such as airplanes at different sampling ratios, suggesting that models optimized for classification accuracy can also deliver good reconstruction performance.
- Effectiveness of Attention Mechanisms: Attention-based semantic sampling more effectively captures point cloud semantic information
- Advantages of Joint Optimization: End-to-end trained JSCC exhibits stronger noise robustness compared to separated schemes
- Robustness at Low SNR: Semantic communication methods maintain good performance even under adverse channel conditions
- MPEG-standardized point cloud compression (V-PCC and G-PCC)
- Deep learning-based point cloud compression methods
- Existing HVC architectures primarily based on traditional transmission and network technologies
- Deep learning-driven semantic extraction and compression
- Joint semantic-channel coding frameworks
- Semantic communication systems for modalities such as images and videos
Compared to existing work, this paper is the first to systematically apply semantic communication to holographic video transmission, proposing a complete system architecture and key technology implementations.
- Semantic communication provides an effective pathway for addressing bandwidth and latency challenges in holographic video transmission
- The proposed semantic-aware architecture can significantly improve transmission efficiency and noise robustness
- Point clouds, as the most suitable 3D data representation at the current stage, provide a feasible path for HVC implementation
- High Computational Complexity: Deep learning-based semantic communication methods incur substantial computational overhead
- Data Representation Limitations: Primarily focuses on point clouds, with insufficient research on representations closer to ideal holograms such as light fields
- Insufficient Exploitation of Temporal Correlation: Existing methods focus mainly on intra-frame compression, lacking sufficient utilization of temporal redundancy
The paper proposes three important research directions:
- Temporal Correlation Exploitation: Exploring semantic information in holographic video across the temporal dimension
- Computational Complexity Optimization: Designing lightweight attention mechanisms that balance performance and complexity
- Light Field Transmission Research: Effectively converting light fields into more mature processing representations such as point clouds or multi-view images
- Strong Systematicity: Proposes a complete semantic-aware HVC system architecture covering the entire process from sampling to transmission
- Technical Innovation: Server-mediated architecture, semantically-driven sampling, and differentiable modulation designs demonstrate innovation
- Comprehensive Experiments: Validates the effectiveness of key technologies through two use cases
- Forward-Looking Perspective: Provides important technical pathways for immersive communication in the 6G era
- Limited Experimental Scale: Use cases primarily based on small-scale point clouds, lacking experimental validation on large-scale holographic video
- Insufficient Theoretical Analysis: Lacks theoretical analysis of semantic information preservation and transmission efficiency
- Practical Considerations: Insufficient discussion of hardware constraints and energy consumption issues in actual deployment
- Academic Value: Opens new directions for cross-disciplinary research between semantic communication and holographic video transmission
- Practical Value: Provides technical reference for 6G networks to support immersive communication
- Reproducibility: The paper provides enough technical detail to support reproduction
- Holographic conferencing systems in 6G network environments
- 3D content transmission in metaverse applications
- Real-time 3D data stream transmission for AR/VR devices
- Immersive media services in edge computing environments
The paper cites 15 important references covering core works in holographic communication, semantic communication, and point cloud processing, providing readers with a solid knowledge foundation.
Overall Assessment: This is a forward-looking, high-quality paper that systematically applies semantic communication technology to the holographic video transmission domain, proposing innovative system architecture and key technical solutions. While there is room for improvement in large-scale experimental validation and theoretical analysis, it provides important technical foundations and development directions for immersive communication research in the 6G era.