Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications

Kondo, Asano, Ochiai
We present Instant Skinned Gaussian Avatars, a real-time and cross-platform 3D avatar system. Many approaches have been proposed to animate Gaussian Splatting, but they often require camera arrays, long preprocessing times, or high-end GPUs. Some methods attempt to convert Gaussian Splatting into mesh-based representations, achieving lightweight performance but sacrificing visual fidelity. In contrast, our system efficiently animates Gaussian Splatting by leveraging parallel splat-wise processing to dynamically follow the underlying skinned mesh in real time while preserving high visual fidelity. From smartphone-based 3D scanning to on-device preprocessing, the entire process takes just around five minutes, with the avatar generation step itself completed in only about 30 seconds. Our system enables users to instantly transform their real-world appearance into a 3D avatar, making it ideal for seamless integration with social media and metaverse applications. Website: https://sites.google.com/view/gaussian-vrm

Basic Information

  • Paper ID: 2510.13978
  • Title: Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications
  • Authors: Naruya Kondo, Yuto Asano, Yoichi Ochiai (University of Tsukuba)
  • Classification: cs.CG (Computer Graphics)
  • Publication Date/Venue: SUI '25 (ACM Symposium on Spatial User Interaction), November 10–11, 2025, Montreal, QC, Canada
  • Paper Link: https://arxiv.org/abs/2510.13978

Abstract

This paper presents Instant Skinned Gaussian Avatars, a real-time cross-platform 3D avatar system. Existing Gaussian Splatting animation methods typically require camera arrays, lengthy preprocessing, or high-end GPUs. Some approaches attempt to convert Gaussian Splatting into mesh-based representations, achieving lightweight performance at the cost of visual fidelity. In contrast, this system efficiently animates Gaussian Splatting through parallel splat processing, enabling real-time animation that follows the dynamic deformations of the underlying skinned mesh while maintaining high visual fidelity. The entire process from smartphone-based 3D scanning to on-device preprocessing requires approximately 5 minutes, with the avatar generation step itself taking only about 30 seconds. This system enables users to instantly convert real-world appearance into 3D avatars, making it ideal for seamless integration with social media and metaverse applications.

Research Background and Motivation

Problem Definition

Traditional 3D character avatar creation relies on manual modeling or photogrammetry pipelines, which are either time-consuming and labor-intensive or require professional equipment. While Gaussian Splatting technology has demonstrated excellence in high-fidelity scene reconstruction and real-time rendering, existing Gaussian Splatting animation methods suffer from the following limitations:

  1. High Hardware Requirements: Necessitate expensive equipment such as camera arrays and high-end GPUs
  2. Long Preprocessing Time: Methods like ExAvatar require 2-3 hours of preprocessing
  3. Loss of Visual Fidelity: Conversion to mesh representations reduces expressiveness
  4. Poor Accessibility: Difficult for ordinary users to utilize

Research Significance

This research aims to address the accessibility challenges in 3D avatar creation, enabling ordinary users to quickly and conveniently create high-quality 3D avatars. This is significant for:

  • Popularization of social media applications
  • User experience in metaverse platforms
  • Virtual conferences and digital twin applications
  • AR/VR experiences on mobile devices

Core Contributions

  1. Rapid Avatar Generation System: Proposes a complete pipeline from scanning to avatar creation that takes approximately 5 minutes, with the core generation step taking about 30 seconds
  2. Efficient Animation Method: Achieves real-time animation of Gaussian Splatting through parallel splat processing while maintaining high visual fidelity
  3. Cross-Platform Compatibility: WebXR-based implementation supports mobile devices, VR headsets, and web platforms
  4. Mobile Device Optimization: Specifically optimized for mobile device performance, achieving 40-50 fps on iPhone 13 Pro

Methodology

Task Definition

Input: Short video captured with a single camera (via the Scaniverse application)

Output: Real-time animatable high-fidelity 3D avatar

Constraints:

  • Mobile device compatibility
  • Real-time rendering performance
  • Preservation of visual fidelity

System Architecture

Core Concept

The system's core concept is to have each Gaussian splat follow the motion of a vertex on a background 3D mesh. During preprocessing, each splat is assigned to a mesh vertex and its transformation relative to that vertex is stored. At runtime, the background mesh is animated and the Gaussian splat positions are updated in parallel, which yields real-time animation.
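One way to write this relationship down is sketched below. The notation is introduced here for illustration and is not taken from the paper: p_s^0 and Q_s^0 are the rest position and orientation of splat s, x_v^0 is the rest position of its assigned vertex v, and x_v(t), R_v(t) are the vertex's skinned position and its rotation relative to the rest pose at time t.

```latex
% Preprocessing: store the splat's offset from its assigned vertex (rest pose)
\delta_s = p_s^{0} - x_v^{0}

% Runtime: move the splat with the vertex and rotate it with the vertex's skinning rotation
p_s(t) = x_v(t) + R_v(t)\,\delta_s, \qquad Q_s(t) = R_v(t)\,Q_s^{0}
```

At the rest pose, R_v is the identity and x_v(t) = x_v^0, so the splat returns to its scanned position p_s^0.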

Preprocessing Pipeline

Step 1: 3D Scanning

  • Capture subject using Scaniverse application in Gaussian Splatting format
  • Requires subject to be in A-pose to simplify subsequent processing

Step 2: Point Cloud Filtering

  • Remove points not belonging to the subject
  • Rule-based horizontal and vertical filtering
  • Normalize splat positions and scales
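The filtering rules are not spelled out in this summary, so the following is only a minimal sketch of what rule-based horizontal and vertical filtering followed by normalization could look like. The function name, the thresholds `max_radius` and `floor_margin`, and the y-up convention are all assumptions made for illustration.

```python
from statistics import median

def filter_and_normalize(points, max_radius=0.6, floor_margin=0.02):
    """points: list of (x, y, z) splat centers, y pointing up (assumed).

    Returns (normalized_points, scale). Thresholds are hypothetical.
    """
    # Horizontal rule: keep points close to the subject's vertical axis,
    # estimated here as the median x/z of all points.
    cx = median(p[0] for p in points)
    cz = median(p[2] for p in points)
    near = [p for p in points
            if ((p[0] - cx) ** 2 + (p[2] - cz) ** 2) ** 0.5 <= max_radius]

    # Vertical rule: drop a thin slab just above the lowest point (the floor).
    floor_y = min(p[1] for p in near)
    kept = [p for p in near if p[1] >= floor_y + floor_margin]

    # Normalize: feet at y = 0, centered on the axis, total height 1.
    lo = min(p[1] for p in kept)
    hi = max(p[1] for p in kept)
    scale = 1.0 / (hi - lo)
    normalized = [((x - cx) * scale, (y - lo) * scale, (z - cz) * scale)
                  for x, y, z in kept]
    return normalized, scale
```

The returned `scale` would also be applied to each splat's own scale so that splat sizes stay consistent with the normalized positions.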

Step 3: Pose Estimation and Mesh Registration

  • Infer subject's frontal direction and limb angles
  • Place background 3D mesh at identical position, pose, and scale

Step 4: Splat-Vertex Binding

  • Select nearest mesh vertex for each splat via nearest neighbor search
  • Compute relative transformation relationships
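The binding step can be sketched as follows. This is a brute-force illustration under simplifying assumptions, not the paper's implementation: the function name is hypothetical, the relative transformation is reduced to a translation offset in rest-pose axes, and a production system would likely use a spatial index instead of scanning every vertex.

```python
def bind_splats(splat_positions, vertex_positions):
    """Return, for each splat, (nearest_vertex_index, offset_from_vertex).

    Brute-force nearest neighbor search: O(splats * vertices).
    """
    bindings = []
    for p in splat_positions:
        # Index of the vertex with the smallest squared distance to the splat.
        best = min(
            range(len(vertex_positions)),
            key=lambda i: sum((a - b) ** 2
                              for a, b in zip(p, vertex_positions[i])),
        )
        v = vertex_positions[best]
        # Offset of the splat from its vertex in the rest pose.
        offset = tuple(a - b for a, b in zip(p, v))
        bindings.append((best, offset))
    return bindings
```

For example, with vertices at (0, 0, 0), (1, 0, 0) and (0, 2, 0), a splat at (0.9, 0.1, 0) binds to vertex 1 with an offset of roughly (-0.1, 0.1, 0).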

Step 5: Data Export

  • Output subject pose, scale, nearest vertex indices, and relative transformations
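The export format is not described here, so the sketch below only illustrates how the four listed outputs might be bundled. JSON as the container and the field names `pose`, `scale`, `nearestVertex` and `relativeOffset` are hypothetical choices, not the system's actual schema.

```python
import json

def export_avatar_data(pose, scale, bindings):
    """Serialize preprocessing results; all field names are hypothetical.

    bindings: list of (nearest_vertex_index, offset) pairs, one per splat.
    """
    record = {
        "pose": pose,                                  # estimated subject pose
        "scale": scale,                                # normalization scale
        "nearestVertex": [idx for idx, _ in bindings],
        "relativeOffset": [list(off) for _, off in bindings],
    }
    return json.dumps(record)
```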

Animation System

Three steps per frame:

  1. Mesh Animation: Animate the background skinned mesh
  2. Splat Update: Update Gaussian splat positions and orientations in parallel
  3. Depth Sorting: Sort splats according to observer viewpoint
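The depth sorting step can be illustrated with a plain per-splat sort. This is a generic sketch of back-to-front ordering as commonly used for alpha-blended splats, with assumed names and a CPU-side sort. It is not the paper's implementation.

```python
def depth_sort(positions, camera_pos, camera_forward):
    """Return splat indices ordered back to front along the view direction.

    Depth is the projection of (splat - camera) onto the camera's forward
    vector, so larger values are farther from the viewer.
    """
    def depth(i):
        return sum((p - c) * f
                   for p, c, f in zip(positions[i], camera_pos, camera_forward))
    return sorted(range(len(positions)), key=depth, reverse=True)
```

With a camera at the origin looking down the negative z axis, splats at z = -1, -5 and -3 are ordered as indices 1, 2, 0, so the farthest splat is drawn first.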

Technical Innovations

1. Parallel Splat Processing

Traditional dynamic Gaussian Splatting requires updating every splat's position data each frame, which causes severe performance degradation. This paper addresses the problem through parallel splat processing: each splat is updated independently from the mesh vertex it is bound to.
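The reason this work parallelizes well is that each splat depends only on its own binding and one vertex. The sketch below shows that per-splat update as a pure function. The names, the quaternion convention (w, x, y, z), and the per-vertex rotation input are assumptions for illustration. The list comprehension stands in for what would be parallel execution on a GPU, and the orientation update is omitted for brevity.

```python
def rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q.xyz, v)
    tx = 2 * (y * vz - z * vy)
    ty = 2 * (z * vx - x * vz)
    tz = 2 * (x * vy - y * vx)
    # v' = v + w * t + cross(q.xyz, t)
    return (vx + w * tx + (y * tz - z * ty),
            vy + w * ty + (z * tx - x * tz),
            vz + w * tz + (x * ty - y * tx))

def update_splat(binding, vertex_positions, vertex_rotations):
    """New splat position: posed vertex position plus the rotated rest offset."""
    idx, offset = binding
    rotated = rotate(vertex_rotations[idx], offset)
    return tuple(a + b for a, b in zip(vertex_positions[idx], rotated))

def update_all(bindings, vertex_positions, vertex_rotations):
    # Each call is independent of the others, so the loop is trivially parallel.
    return [update_splat(b, vertex_positions, vertex_rotations) for b in bindings]
```

For a vertex at (2, 0, 0) rotated 90 degrees about the z axis, a splat with rest offset (1, 0, 0) ends up at approximately (2, 1, 0).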

2. Grouped Sorting Optimization

To reduce sorting computational cost, a grouped sorting strategy is employed:

  • Group splats at the skeletal level
  • Perform sorting at group level rather than individual splat level
  • Balance between number of groups and hardware capabilities
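The summary gives no implementation details for grouped sorting, so the following is a guess at one plausible form. Here splats are grouped by a bone label, groups are ordered back to front by the depth of their centroid, and the order inside each group is left fixed. The function name, the centroid criterion, and the bone labels are all assumptions.

```python
def grouped_sort(positions, group_of, camera_pos, camera_forward):
    """Order splats by sorting whole groups (e.g. bones) instead of splats.

    group_of[i] is the group label of splat i. Only len(groups) keys are
    sorted per frame, rather than one key per splat.
    """
    groups = {}
    for i, g in enumerate(group_of):
        groups.setdefault(g, []).append(i)

    def group_depth(members):
        n = len(members)
        centroid = [sum(positions[i][k] for i in members) / n for k in range(3)]
        return sum((c - e) * f
                   for c, e, f in zip(centroid, camera_pos, camera_forward))

    ordered = sorted(groups.values(), key=group_depth, reverse=True)
    return [i for members in ordered for i in members]
```

In this sketch the order inside a group is never revisited, so two splats of the same group can be blended in the wrong order. That is the kind of precision loss a group-level sort accepts in exchange for a much smaller sort.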

3. Mobile Device Optimization

  • Use a 32k-polygon mesh in VRM format
  • Browser implementation based on JavaScript and Three.js
  • Performance optimization for mobile GPUs

Experimental Setup

Implementation Platform

  • Development Environment: JavaScript + Three.js (browser application)
  • 3D Scanning: Scaniverse application
  • Background Mesh: VRM format, 32k polygons, neutral body type
  • Test Devices: iPhone 13 Pro, laptop with NVIDIA GeForce RTX 3060

Performance Metrics

  • Total Processing Time: Approximately 5 minutes (including scanning)
  • Avatar Generation Time: Approximately 30 seconds
  • 3D Reconstruction Time: Approximately 1 minute (Scaniverse)
  • Rendering Frame Rate: 40-50 fps on mobile devices, 240 fps on laptop

Experimental Results

Performance Evaluation

Time Efficiency:

  • Complete pipeline: ~5 minutes
  • Avatar generation: ~30 seconds
  • 3D scanning: ~1 minute (iPhone 13 Pro)

Rendering Performance:

  • iPhone 13 Pro: 40-50 fps
  • RTX 3060 laptop: 240 fps (limited by display refresh rate)
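To put these frame rates in terms of time available per frame, the conversion is simply 1000 / fps milliseconds. The helper below is a trivial illustration of that arithmetic, not part of the system.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

# 40-50 fps on the iPhone 13 Pro leaves 20-25 ms per frame for mesh
# animation, splat update, sorting, and rendering combined; 240 fps on the
# laptop corresponds to roughly 4.2 ms per frame.
budgets = {fps: frame_budget_ms(fps) for fps in (40, 50, 240)}
```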

System Characteristics

  1. High Automation: Preprocessing steps are fully automated
  2. Cross-Platform Compatibility: Supports mobile devices, VR headsets, and web platforms
  3. Standard Format Support: Uses VRM format for easy integration with existing applications
  4. Real-Time Performance: Maintains real-time rendering while preserving high visual quality

Gaussian Splatting Avatar Research

The paper references multiple related works:

  • GaussianAvatar [1]: Generates photorealistic character avatars from a single video
  • GauHuman [2]: Articulated Gaussian Splatting for real-time 3D human rendering
  • HUGS [4]: Human Gaussian Splats
  • ExAvatar [6]: Expressive full-body 3D Gaussian avatars

Advantages Over Existing Methods

Compared to existing approaches, this paper's main advantages are:

  1. Processing Speed: Avatar generation takes about 30 seconds, compared with the 2-3 hours of preprocessing required by ExAvatar
  2. Device Requirements: No need for high-end GPUs or camera arrays
  3. Accessibility: Entirely based on mobile devices and browsers
  4. Fidelity: Maintains high visual quality of Gaussian Splatting
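The size of the speed gap follows directly from the figures quoted in this summary. The worked arithmetic below is illustrative only: it compares the 30-second generation step, and separately the full 5-minute pipeline, against the 2-3 hour preprocessing figure. These are not measurements under identical conditions.

```python
def speedup(baseline_hours, ours_seconds):
    """Ratio of a baseline duration (hours) to this system's duration (seconds)."""
    return baseline_hours * 3600 / ours_seconds

# Generation step only (about 30 s) versus 2-3 hours: 240x to 360x.
generation_only = (speedup(2, 30), speedup(3, 30))

# Full pipeline including scanning (about 5 min = 300 s): 24x to 36x.
full_pipeline = (speedup(2, 300), speedup(3, 300))
```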

Conclusions and Discussion

Main Conclusions

  1. Successfully implements a rapid, high-quality 3D avatar generation system
  2. Effectively addresses dynamic Gaussian Splatting performance issues through parallel processing and grouped sorting
  3. WebXR-based implementation ensures cross-platform compatibility
  4. Mobile device optimization enables convenient usage by ordinary users

Limitations

  1. Third-Party Application Dependency: Requires Scaniverse for 3D scanning
  2. Pose Constraints: Preprocessing requires A-pose, limiting use cases
  3. Mesh Precision: Background mesh quality may affect final results
  4. Grouped Sorting Trade-off: Sorting at the group level rather than per splat sacrifices some rendering precision in exchange for mobile compatibility

Future Directions

  1. Integrate additional 3D scanning solutions to reduce dependency on specific applications
  2. Support more diverse initial poses
  3. Optimize grouped sorting algorithm to improve rendering quality
  4. Extend to more complex animation scenarios

In-Depth Evaluation

Strengths

1. Strong Practicality

  • Addresses genuine user needs
  • Complete end-to-end solution
  • Well-designed user experience

2. Technical Innovation

  • Effective parallel processing approach
  • Clever grouped sorting optimization
  • Mobile device performance optimization

3. Accessibility

  • Based on widely available mobile devices
  • Browser implementation requiring no installation
  • Rapid processing time

4. Standards Compliance

  • Uses VRM standard format
  • Facilitates integration with existing ecosystem

Weaknesses

1. Method Simplicity

  • Relatively simple core methodology with limited technical depth
  • Primarily engineering optimization rather than algorithmic innovation

2. Insufficient Evaluation

  • Lacks quantitative comparison with other methods
  • Absence of user studies or quality assessment
  • Limited testing across different scenarios

3. Dependency Issues

  • Dependent on third-party Scaniverse application
  • Requires specific initial pose

4. Technical Details

  • Insufficient detail on grouped sorting implementation
  • Lacks failure case analysis

Impact

1. Academic Contribution

  • Provides reference for Gaussian Splatting applications on mobile devices
  • Demonstrates practical system design approach

2. Practical Value

  • High practical value suitable for actual deployment
  • Significant implications for metaverse and social media applications

3. Reproducibility

  • Based on standard technology stack, easy to reproduce
  • Strong potential for open-source release

Applicable Scenarios

  1. Social Media Applications: Rapid personal avatar generation
  2. Metaverse Platforms: User identity representation
  3. Virtual Conferences: Enhanced presence
  4. Gaming Applications: Character customization
  5. AR/VR Experiences: Personalized virtual avatars

References

The paper cites 12 related references, primarily covering:

  • Gaussian Splatting foundational techniques [3]
  • Human avatar generation methods [1, 2, 4, 5, 6, 8, 9, 11, 12]
  • 3D reconstruction techniques [10]
  • Commercial scanning applications [7]

These references adequately cover the related research domain and provide sufficient background support for this work.


Overall Assessment: This is a highly practical systems paper that, while relatively limited in algorithmic innovation, makes important contributions in solving real-world problems and improving accessibility. The system's speed and mobile compatibility provide significant practical value, making it suitable for deployment in real-world applications.