2025-11-13T11:28:10.724842

Semantic Communication Enabled Holographic Video Processing and Transmission

Ying, Qi, Feng et al.

Holographic video communication is considered a paradigm shift in visual communications, becoming increasingly popular for its ability to offer immersive experiences. This article provides an overview of holographic video communication and outlines the requirements of a holographic video communication system. Particularly, following a brief review of semantic communication, an architecture for a semantic-enabled holographic video communication system is presented. Key technologies, including semantic sampling, joint semantic-channel coding, and semantic-aware transmission, are designed based on the proposed architecture. Two related use cases are presented to demonstrate the performance gain of the proposed methods. Finally, potential research topics are discussed to pave the way for the realization of semantic-enabled holographic video communications.


Semantic Communication Enabled Holographic Video Processing and Transmission

Basic Information

Paper ID: 2510.13408
Title: Semantic Communication Enabled Holographic Video Processing and Transmission
Authors: Jingkai Ying, Zhiyuan Qi, Yulong Feng, Zhijin Qin, Zhu Han, Rahim Tafazolli, Yonina C. Eldar
Categories: eess.IV cs.AI cs.IT cs.MM eess.SP math.IT
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.13408

Abstract

Holographic video communication (HVC) is increasingly recognized as a paradigm shift in visual communication due to its ability to provide immersive experiences. This paper provides an overview of holographic video communication and elucidates the requirements of HVC systems. Specifically, following a brief review of semantic communication, a semantic-enhanced holographic video communication system architecture is proposed. Based on the proposed architecture, key technologies are designed, including semantic sampling, joint semantic-channel coding, and semantic-aware transmission. The performance gains of the proposed methods are demonstrated through two relevant use cases. Finally, potential research directions are discussed to pave the way for realizing semantic-enhanced holographic video communication.

Research Background and Motivation

Problem Definition

Holographic video communication (HVC), widely viewed as a paradigm shift for future visual communication, faces substantial technical challenges:

Explosive Data Growth: Holographic video requires a transmission bandwidth of 0.1-1 Tbps, with peaks reaching 10 Tbps
Stringent Latency Requirements: Air interface transmission latency must be less than 1 ms, and end-to-end network latency must be less than 20 ms
High Reliability Demands: Packet error rate must be on the order of 10^-7
Limitations of Existing Systems: Even 6G networks cannot fully guarantee support for high-quality HVC services

Research Significance

Holographic video communication is a key technology for realizing the metaverse and numerous applications (such as holographic conferencing, education, and entertainment), and has been identified by 6G wireless networks as a typical use case for immersive communication.

Limitations of Existing Approaches

Existing research on holographic video transmission suffers from the following issues:

Reliance on the traditional bit-level transmission paradigm, which consumes enormous resources
Lack of optimization designs tailored to the characteristics of holographic content
Insufficient use of the nonlinear representation capabilities of deep learning

Research Motivation

Semantic communication transmits the meaning of information rather than raw bits. Applied to holographic content, it can extract and compress the meaningful information, which significantly reduces bandwidth requirements. Because the transmitter and receiver are trained jointly end to end, the system is optimized as a whole rather than module by module.

Core Contributions

Proposed a novel semantic-aware holographic video communication architecture: Integrating semantic sampling, joint semantic-channel coding, and semantic-aware transmission modules
Designed an attention mechanism-based semantic-aware sampling method: Capable of capturing key regions of point clouds
Developed an efficient and robust joint semantic-channel coding and modulation scheme: Enabling adaptive transmission of point clouds based on semantic features and channel conditions
Validated the approach with two use cases: Demonstrating the performance gains of semantic sampling and of joint coding-modulation

Methodology Details

Task Definition

This paper investigates how to apply semantic communication techniques to holographic video transmission, with particular focus on efficient transmission of point cloud data. The input is raw holographic data (primarily point clouds), the output is high-quality holographic content reconstructed at the receiver, and constraints include bandwidth limitations, latency requirements, and channel noise.

Model Architecture

Overall System Architecture

The proposed semantic-aware HVC system employs a server as an intermediate processing node, forming uplink and downlink transmission chains:

Uplink:

Sensor → Semantic Sampling → Joint Semantic-Channel Coding → Semantic-Aware Transmission → Server Decoding and Reconstruction

Downlink:

Server → Joint Semantic-Channel Coding → Semantic-Aware Transmission → User Decoding and Display

Key Module Design

Semantic Sampling Module
- Uses multi-layer perceptron (MLP) to embed points into latent space
- Partitions point cloud into patches, each containing a center point and its k nearest neighbors
- Local attention layer processes patch embeddings to generate intermediate features and semantic maps
- Computes scores for each point based on normalized standard deviation and selects top M points
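The scoring and selection steps above can be illustrated with a minimal sketch. It is not the paper's implementation: the learned MLP embedding and local attention layer are omitted, and the layout of the semantic map (one list of feature values per point), the min-max normalization, and the helper names `knn_patch`, `normalized_std_scores`, and `select_top_m` are all assumptions made for illustration.

```python
import math

def knn_patch(points, center_idx, k):
    """Return the index of the center point followed by its k nearest neighbours."""
    center = points[center_idx]
    others = [i for i in range(len(points)) if i != center_idx]
    others.sort(key=lambda i: math.dist(points[i], center))
    return [center_idx] + others[:k]

def normalized_std_scores(semantic_map):
    """Score each point by the standard deviation of its feature values,
    rescaled to [0, 1] across the cloud (assumed normalization)."""
    stds = []
    for row in semantic_map:
        mean = sum(row) / len(row)
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in row) / len(row)))
    lo, hi = min(stds), max(stds)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return [(s - lo) / span for s in stds]

def select_top_m(scores, m):
    """Return the indices of the m highest-scoring points, in ascending index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:m])

# Toy example: 5 points, a hand-written stand-in for the learned semantic map.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5), (5, 5, 6)]
semantic_map = [
    [0.1, 0.1, 0.1],
    [0.0, 0.5, 1.0],
    [0.2, 0.3, 0.4],
    [0.0, 1.0, 0.0],
    [0.3, 0.3, 0.3],
]
print(knn_patch(points, 0, 2))            # [0, 1, 2]
scores = normalized_std_scores(semantic_map)
print(select_top_m(scores, 2))            # [1, 3]
```

In the described system the semantic map comes from the attention layer and is trained against the downstream task, so the scores reflect task relevance rather than a fixed geometric rule such as farthest point sampling.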
Joint Semantic-Channel Coding (JSCC)
- Encoder: Employs PointNet++ for initial processing, uses Point Transformer for semantic feature refinement
- Dual-branch design: Main branch captures fine-grained structural features, auxiliary branch extracts coarse-grained semantic features
- Decoder: Uses Point Transformer to refine noisy features, reconstructs input point cloud through upsampling
Semantic-Aware Transmission
- Differentiable modulation model: Interprets the semantic features output by the JSCC encoder as probabilities over the points of the modulation constellation
- Adaptive transmission: Derives a segmentation point from the JSCC output; constellation symbols after the segmentation point are not transmitted
- Channel adaptation: Concatenates channel information with JSCC output to learn more robust features
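A rough forward-pass sketch of the probabilistic modulation and truncation ideas follows. It makes several assumptions that the summary does not specify: a QPSK constellation, a softmax to turn encoder outputs into probabilities, and a given integer `cut` standing in for the learned segmentation point. How gradients are propagated through the sampling step during training is not reproduced here.

```python
import math
import random

# Assumed constellation: QPSK scaled to unit average power.
QPSK = [complex(1, 1), complex(-1, 1), complex(-1, -1), complex(1, -1)]
QPSK = [s / math.sqrt(2) for s in QPSK]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def modulate(logit_groups, cut, rng):
    """Sample one constellation symbol per group of encoder outputs.
    Only the first `cut` groups are modulated; the rest are not transmitted."""
    symbols = []
    for logits in logit_groups[:cut]:
        probs = softmax(logits)
        symbols.append(rng.choices(QPSK, weights=probs, k=1)[0])
    return symbols

def expected_symbol(logits):
    """Probability-weighted mean symbol, a smooth function of the logits."""
    probs = softmax(logits)
    return sum(p * s for p, s in zip(probs, QPSK))

rng = random.Random(0)
# Hypothetical encoder output: three groups of four values, one per QPSK point.
groups = [[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 10.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
tx = modulate(groups, cut=2, rng=rng)
print(len(tx))                       # 2: the third group is dropped
print(abs(expected_symbol(groups[2])))
```

Treating the encoder output as a distribution over symbols, instead of rounding it to the nearest symbol, is what keeps the mapping trainable: the probabilities vary smoothly with the encoder output, whereas hard quantization has zero gradient almost everywhere.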

Technical Innovations

Server-Mediated Architecture: Addresses the inability of edge devices to handle the enormous storage and computational demands of HVC
Semantically Driven Point Cloud Sampling: Preserves geometric structure and task-specific representational capacity more effectively than traditional mathematical and statistical methods
Differentiable Modulation with Probabilistic Sampling: Avoids non-differentiability issues when directly quantizing JSCC output to constellation points
Dual-Branch Semantic Feature Extraction: Simultaneously captures semantic information at different granularities

Experimental Setup

Datasets

Point Cloud Classification: Uses point clouds of 2048 points each for classification task evaluation
Point Cloud Reconstruction: Uses standard point cloud datasets to evaluate reconstruction quality

Evaluation Metrics

Classification Accuracy: Evaluates semantic sampling performance
D1 PSNR/D2 PSNR: Evaluates point cloud reconstruction quality
- D1: Peak signal-to-noise ratio computed from the point-to-point mean squared error
- D2: Peak signal-to-noise ratio computed from the point-to-plane (projected) mean squared error, which better reflects the perception characteristics of the human visual system
Chamfer Distance: Measures geometric differences between reconstructed and original point clouds
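For concreteness, the point-to-point metrics can be sketched as below. Conventions vary between papers and tools, so the choices here are assumptions: the Chamfer distance is the sum of the two mean squared nearest-neighbour distances, D1 PSNR uses the larger of the two directional errors with a plain peak^2 / MSE ratio (some tools include an extra constant factor in the numerator), and `peak` must be supplied by the caller. D2 is omitted because it needs surface normals.

```python
import math

def _mean_nn_sq(src, dst):
    """Mean squared distance from each point in src to its nearest point in dst."""
    total = 0.0
    for p in src:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
    return total / len(src)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance (sum of both directional mean squared errors)."""
    return _mean_nn_sq(a, b) + _mean_nn_sq(b, a)

def d1_psnr(reference, reconstructed, peak):
    """Point-to-point PSNR in dB, using the worse of the two directions."""
    mse = max(_mean_nn_sq(reference, reconstructed),
              _mean_nn_sq(reconstructed, reference))
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

ref = [(0, 0, 0), (1, 0, 0)]
rec = [(0, 0, 0), (1, 0, 1)]
print(chamfer_distance(ref, rec))          # 1.0
print(round(d1_psnr(ref, rec, peak=1), 2)) # 3.01
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is acceptable for clouds of a few thousand points but would normally be replaced by a spatial index for larger data.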

Comparison Methods

Semantic Sampling Comparisons:

Farthest Point Sampling (FPS)
S-Net
SampleNet

Joint Coding-Modulation Comparisons:

Separated scheme of G-PCC + LDPC
SEPT (deep learning-based JSCC scheme)

Implementation Details

Employs a two-stage training strategy: The first stage trains the downstream networks on complete point clouds; the second stage freezes the downstream networks and trains the sampling model
The loss function combines a reconstruction metric (Chamfer distance) with a task loss (cross-entropy)
Channel model uses Rayleigh fading channel
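A minimal simulation of such a channel is sketched below, as an illustration only. The summary does not give the exact channel parameters, so the following are assumptions: flat (per-symbol) fading with h ~ CN(0, 1), unit average transmit power so that the noise variance equals 10^(-SNR/10), and perfect channel knowledge at the receiver for the simple equalizer.

```python
import math
import random

def rayleigh_channel(symbols, snr_db, rng):
    """Apply y = h * x + n to each complex symbol.
    Returns a list of (y, h) pairs so the receiver can equalize."""
    noise_var = 10 ** (-snr_db / 10)   # assumes unit average signal power
    sigma_n = math.sqrt(noise_var / 2) # per real/imaginary component
    sigma_h = math.sqrt(0.5)           # gives E[|h|^2] = 1
    received = []
    for x in symbols:
        h = complex(rng.gauss(0, sigma_h), rng.gauss(0, sigma_h))
        n = complex(rng.gauss(0, sigma_n), rng.gauss(0, sigma_n))
        received.append((h * x + n, h))
    return received

def equalize(received):
    """Zero-forcing equalization with perfect channel knowledge (assumed)."""
    return [y / h for y, h in received]

rng = random.Random(1)
tx = [complex(1, 0)] * 2000
rx = rayleigh_channel(tx, snr_db=15, rng=rng)
avg_gain = sum(abs(h) ** 2 for _, h in rx) / len(rx)
print(round(avg_gain, 2))  # close to 1 on average
```

Deep fades (small |h|) amplify noise after equalization, which is one reason a learned decoder that sees the channel information can be more robust than a fixed separate decoder.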

Experimental Results

Main Results

Semantic Sampling Performance

Significant performance improvements over traditional methods at low sampling ratios
At a sampling ratio of 0.125, classification accuracy improves by approximately 15% over FPS
Shows clear advantages over deep learning-based samplers such as S-Net and SampleNet

Joint Semantic-Channel Coding-Modulation Performance

At SNR = 15 dB with the same number of transmitted constellation points, D2 PSNR improves by more than 3 dB over the baseline methods
Even at SNR = 0 dB, performance surpasses that of the baseline methods at SNR = 15 dB
Traditional separation schemes fail to decode at SNR = 0 dB because of the cliff effect

Case Analysis

Visualization results show that the semantic sampling method preserves the structural features of objects such as airplanes at different sampling ratios, indicating that a model optimized for classification accuracy can also deliver good reconstruction performance.

Experimental Findings

Effectiveness of Attention Mechanisms: Attention-based semantic sampling more effectively captures point cloud semantic information
Advantages of Joint Optimization: End-to-end trained JSCC exhibits stronger noise robustness compared to separated schemes
Robustness at Low SNR: Semantic communication methods maintain good performance even under adverse channel conditions

Related Work

Holographic Video Communication Research

MPEG-standardized point cloud compression (V-PCC and G-PCC)
Deep learning-based point cloud compression methods
Existing HVC architectures primarily based on traditional transmission and network technologies

Semantic Communication Research

Deep learning-driven semantic extraction and compression
Joint semantic-channel coding frameworks
Semantic communication systems for modalities such as images and videos

Advantages of This Work

Compared to existing work, this paper is the first to systematically apply semantic communication to holographic video transmission, proposing a complete system architecture and key technology implementations.

Conclusions and Discussion

Main Conclusions

Semantic communication provides an effective pathway for addressing bandwidth and latency challenges in holographic video transmission
The proposed semantic-aware architecture can significantly improve transmission efficiency and noise robustness
Point clouds, as the most suitable 3D data representation at the current stage, provide a feasible path for HVC implementation

Limitations

High Computational Complexity: Deep learning-based semantic communication methods incur substantial computational overhead
Data Representation Limitations: Primarily focuses on point clouds, with insufficient research on representations closer to ideal holograms such as light fields
Insufficient Exploitation of Temporal Correlation: Existing methods focus mainly on intra-frame compression, lacking sufficient utilization of temporal redundancy

Future Directions

The paper proposes three important research directions:

Temporal Correlation Exploitation: Exploring semantic information in holographic video across the temporal dimension
Computational Complexity Optimization: Designing lightweight attention mechanisms that balance performance and complexity
Light Field Transmission Research: Effectively converting light fields into representations with more mature processing techniques, such as point clouds or multi-view images

In-Depth Evaluation

Strengths

Strong Systematicity: Proposes a complete semantic-aware HVC system architecture covering the entire process from sampling to transmission
Technical Innovation: The server-mediated architecture, semantically driven sampling, and differentiable modulation are novel design choices
Comprehensive Experiments: Validates the effectiveness of key technologies through two use cases
Forward-Looking Perspective: Provides important technical pathways for immersive communication in the 6G era

Weaknesses

Limited Experimental Scale: Use cases primarily based on small-scale point clouds, lacking experimental validation on large-scale holographic video
Insufficient Theoretical Analysis: Lacks theoretical analysis of semantic information preservation and transmission efficiency
Practical Considerations: Insufficient discussion of hardware constraints and energy consumption issues in actual deployment

Impact

Academic Value: Opens new directions for cross-disciplinary research between semantic communication and holographic video transmission
Practical Value: Provides technical reference for 6G networks to support immersive communication
Reproducibility: Paper provides sufficient technical details with good reproducibility

Applicable Scenarios

Holographic conferencing systems in 6G network environments
3D content transmission in metaverse applications
Real-time 3D data stream transmission for AR/VR devices
Immersive media services in edge computing environments

References

The paper cites 15 important references covering core works in holographic communication, semantic communication, and point cloud processing, providing readers with a solid knowledge foundation.

Overall Assessment: This is a forward-looking, high-quality paper that systematically applies semantic communication technology to the holographic video transmission domain, proposing innovative system architecture and key technical solutions. While there is room for improvement in large-scale experimental validation and theoretical analysis, it provides important technical foundations and development directions for immersive communication research in the 6G era.