We present Instant Skinned Gaussian Avatars, a real-time and cross-platform 3D avatar system. Many approaches have been proposed to animate Gaussian Splatting, but they often require camera arrays, long preprocessing times, or high-end GPUs. Some methods attempt to convert Gaussian Splatting into mesh-based representations, achieving lightweight performance but sacrificing visual fidelity. In contrast, our system efficiently animates Gaussian Splatting by leveraging parallel splat-wise processing to dynamically follow the underlying skinned mesh in real time while preserving high visual fidelity. From smartphone-based 3D scanning to on-device preprocessing, the entire process takes just around five minutes, with the avatar generation step itself completed in only about 30 seconds. Our system enables users to instantly transform their real-world appearance into a 3D avatar, making it ideal for seamless integration with social media and metaverse applications. Website: https://sites.google.com/view/gaussian-vrm
Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications
- Paper ID: 2510.13978
- Title: Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications
- Authors: Naruya Kondo, Yuto Asano, Yoichi Ochiai (University of Tsukuba)
- Classification: cs.CG (Computer Graphics)
- Publication Date/Venue: SUI '25 (ACM Symposium on Spatial User Interaction), November 10–11, 2025, Montreal, QC, Canada
- Paper Link: https://arxiv.org/abs/2510.13978
This paper presents Instant Skinned Gaussian Avatars, a real-time cross-platform 3D avatar system. Existing Gaussian Splatting animation methods typically require camera arrays, lengthy preprocessing, or high-end GPUs. Some approaches attempt to convert Gaussian Splatting into mesh-based representations, achieving lightweight performance at the cost of visual fidelity. In contrast, this system efficiently animates Gaussian Splatting through parallel splat processing, enabling real-time animation that follows the dynamic deformations of the underlying skinned mesh while maintaining high visual fidelity. The entire process from smartphone-based 3D scanning to on-device preprocessing requires approximately 5 minutes, with the avatar generation step itself taking only about 30 seconds. This system enables users to instantly convert real-world appearance into 3D avatars, making it ideal for seamless integration with social media and metaverse applications.
Traditional 3D character avatar creation relies on manual modeling or photogrammetry pipelines, which are either time-consuming and labor-intensive or require professional equipment. While Gaussian Splatting technology has demonstrated excellence in high-fidelity scene reconstruction and real-time rendering, existing Gaussian Splatting animation methods suffer from the following limitations:
- High Hardware Requirements: Necessitate expensive equipment such as camera arrays and high-end GPUs
- Long Preprocessing Time: Methods like ExAvatar require 2-3 hours of preprocessing
- Loss of Visual Fidelity: Conversion to mesh representations reduces expressiveness
- Poor Accessibility: Difficult for ordinary users to utilize
This research aims to address the accessibility challenges in 3D avatar creation, enabling ordinary users to quickly and conveniently create high-quality 3D avatars. This is significant for:
- Popularization of social media applications
- User experience in metaverse platforms
- Virtual conferences and digital twin applications
- AR/VR experiences on mobile devices
- Rapid Avatar Generation System: Proposes a complete pipeline from scanning to avatar creation in approximately 5 minutes, with the core generation step taking about 30 seconds
- Efficient Animation Method: Achieves real-time animation of Gaussian Splatting through parallel splat processing while maintaining high visual fidelity
- Cross-Platform Compatibility: WebXR-based implementation supports mobile devices, VR headsets, and web platforms
- Mobile Device Optimization: Specifically optimized for mobile device performance, achieving 40-50 fps on iPhone 13 Pro
Input: Short video captured with a single camera (via Scaniverse application)
Output: Real-time animatable high-fidelity 3D avatar
Constraints:
- Mobile device compatibility
- Real-time rendering performance
- Preservation of visual fidelity
The system's core concept is to make each Gaussian splat follow the motion of the vertices of a background 3D mesh. During preprocessing, every splat is assigned to a mesh vertex and its transformation relative to that vertex is stored. At runtime, the background mesh is animated and the splat positions are updated in parallel, which yields real-time animation.
Step 1: 3D Scanning
- Capture subject using Scaniverse application in Gaussian Splatting format
- Requires subject to be in A-pose to simplify subsequent processing
Step 2: Point Cloud Filtering
- Remove points not belonging to the subject
- Rule-based horizontal and vertical filtering
- Normalize splat positions and scales
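The summary only says that the filtering is rule-based and split into a horizontal and a vertical pass, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. The function name `filter_and_normalize`, the thresholds `max_radius` and `max_height`, and the y-up convention are all hypothetical. Python is used for brevity; the system itself is described as a JavaScript and Three.js application.

```python
from statistics import median

def filter_and_normalize(splats, max_radius=1.0, max_height=2.5, target_height=1.0):
    """Filter splats to the subject and normalize them.

    splats: list of dicts with 'pos' (x, y, z) and 'scale' (sx, sy, sz); y is up.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Horizontal rule: keep splats within max_radius of the median x/z position.
    cx = median(s["pos"][0] for s in splats)
    cz = median(s["pos"][2] for s in splats)
    near = [s for s in splats
            if (s["pos"][0] - cx) ** 2 + (s["pos"][2] - cz) ** 2 <= max_radius ** 2]

    # Vertical rule: keep splats no higher than max_height above the lowest one.
    floor_y = min(s["pos"][1] for s in near)
    kept = [s for s in near if s["pos"][1] - floor_y <= max_height]

    # Normalization: feet at y = 0, centered in x/z, uniform scale to target_height.
    top_y = max(s["pos"][1] for s in kept)
    k = target_height / (top_y - floor_y)
    return [
        {
            "pos": ((s["pos"][0] - cx) * k,
                    (s["pos"][1] - floor_y) * k,
                    (s["pos"][2] - cz) * k),
            # Splat scales get the same uniform factor as the positions.
            "scale": tuple(v * k for v in s["scale"]),
        }
        for s in kept
    ]
```

Using the median rather than the mean for the horizontal center keeps a few distant background splats from shifting the estimate of where the subject stands.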
Step 3: Pose Estimation and Mesh Registration
- Infer subject's frontal direction and limb angles
- Place background 3D mesh at identical position, pose, and scale
Step 4: Splat-Vertex Binding
- Select nearest mesh vertex for each splat via nearest neighbor search
- Compute each splat's transformation relative to its bound vertex
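The binding step can be illustrated with a brute-force nearest neighbor search. This is a sketch under assumptions: the name `bind_splats` is hypothetical, and the relative transformation is reduced to a rest-pose translation offset, whereas the paper's stored transformation may also include rotation.

```python
def bind_splats(splat_positions, vertex_positions):
    """Bind each splat to its nearest mesh vertex.

    Returns one (vertex_index, offset) pair per splat, where offset is the
    splat position minus the vertex position in the rest (A-pose) mesh.
    """
    bindings = []
    for p in splat_positions:
        # Brute-force nearest neighbor over all vertices (squared distance).
        idx = min(
            range(len(vertex_positions)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, vertex_positions[i])),
        )
        v = vertex_positions[idx]
        offset = tuple(a - b for a, b in zip(p, v))
        bindings.append((idx, offset))
    return bindings
```

The brute-force search costs one distance evaluation per splat and vertex pair. For a mesh of the size mentioned in this summary, a spatial index such as a k-d tree would be the usual way to speed it up.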
Step 5: Data Export
- Output subject pose, scale, nearest vertex indices, and relative transformations
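The export format is not specified in this summary, so the record layout below is purely illustrative. The field names `pose`, `scale`, `nearestVertex`, and `relativeTransform`, and the choice of JSON, are assumptions made for this sketch.

```python
import json

def export_avatar_data(pose, scale, bindings):
    """Serialize preprocessing results to a JSON string.

    pose:     dict mapping joint name to angle in degrees (hypothetical layout)
    scale:    uniform scale applied to the subject
    bindings: list of (vertex_index, (ox, oy, oz)) pairs, one per splat
    """
    payload = {
        "pose": pose,
        "scale": scale,
        # Parallel arrays: entry i of both lists belongs to splat i.
        "nearestVertex": [vertex for vertex, _ in bindings],
        "relativeTransform": [list(offset) for _, offset in bindings],
    }
    return json.dumps(payload)
```

Storing the indices and offsets as parallel arrays keeps the per-splat data compact and easy to upload to a GPU buffer in one piece.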
Three steps per frame:
- Mesh Animation: Animate the background skinned mesh
- Splat Update: Update Gaussian splat positions and orientations in parallel
- Depth Sorting: Sort splats according to observer viewpoint
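The three per-frame steps can be sketched as follows. It assumes that the skinned mesh animation has already produced a position and a 3x3 rotation matrix for every vertex, and that each splat stores a (vertex index, rest-pose offset) pair. The function names are hypothetical, and the loop runs sequentially here only because this is plain Python.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (rows as tuples) by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def update_splats(bindings, vertex_positions, vertex_rotations):
    """Move every splat with its bound vertex.

    Each splat depends only on its own vertex, so the iterations are
    independent of one another and could run in parallel.
    """
    positions = []
    for vertex, offset in bindings:
        rotated = mat_vec(vertex_rotations[vertex], offset)
        base = vertex_positions[vertex]
        positions.append(tuple(base[k] + rotated[k] for k in range(3)))
    return positions

def sort_back_to_front(positions, camera):
    """Return splat indices ordered from farthest to nearest to the camera."""
    def dist2(i):
        return sum((positions[i][k] - camera[k]) ** 2 for k in range(3))
    return sorted(range(len(positions)), key=dist2, reverse=True)
```

Splat orientations would be updated analogously by composing the vertex rotation with each splat's stored relative rotation. That part is omitted to keep the sketch short.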
Conventional dynamic Gaussian Splatting must update splat position data every frame, which severely degrades performance. The paper mitigates this cost by processing splats in parallel.
To reduce sorting computational cost, a grouped sorting strategy is employed:
- Group splats at the skeletal level
- Perform sorting at group level rather than individual splat level
- Balance between number of groups and hardware capabilities
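A minimal sketch of group-level sorting is given below. The summary notes that the paper gives little detail on this step, so the choices here are assumptions: groups are keyed by a per-splat bone identifier, ranked by the distance from their centroid to the camera, and drawn in a fixed internal order. The name `grouped_draw_order` is hypothetical.

```python
from collections import defaultdict

def grouped_draw_order(positions, bone_ids, camera):
    """Return splat indices in draw order, sorting whole groups back to front.

    positions: list of (x, y, z) splat positions for the current frame
    bone_ids:  bone identifier for each splat (same length as positions)
    camera:    (x, y, z) position of the viewer
    """
    groups = defaultdict(list)
    for index, bone in enumerate(bone_ids):
        groups[bone].append(index)

    def centroid_dist2(indices):
        n = len(indices)
        centroid = [sum(positions[i][k] for i in indices) / n for k in range(3)]
        return sum((centroid[k] - camera[k]) ** 2 for k in range(3))

    # Only the groups are sorted; splats keep their stored order inside a group.
    ordered = sorted(groups.values(), key=centroid_dist2, reverse=True)
    return [i for group in ordered for i in group]
```

With G groups the per-frame sort handles G keys instead of one key per splat, which is where the saving comes from. The price is that splats inside a group, or in overlapping groups, may be blended in the wrong order.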
- Use 32k polygon VRM format mesh
- Browser implementation based on JavaScript and Three.js
- Performance optimization for mobile GPUs
- Development Environment: JavaScript + Three.js (browser application)
- 3D Scanning: Scaniverse application
- Background Mesh: VRM format, 32k polygons, neutral body type
- Test Devices: iPhone 13 Pro, laptop with NVIDIA GeForce RTX 3060
- Total Processing Time: Approximately 5 minutes (including scanning)
- Avatar Generation Time: Approximately 30 seconds
- 3D Reconstruction Time: Approximately 1 minute (Scaniverse)
- Rendering Frame Rate: 40-50 fps on mobile devices, 240 fps on laptop
Time Efficiency:
- Complete pipeline: ~5 minutes
- Avatar generation: ~30 seconds
- 3D scanning: ~1 minute (iPhone 13 Pro)
Rendering Performance:
- iPhone 13 Pro: 40-50 fps
- RTX 3060 laptop: 240 fps (limited by display refresh rate)
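To put the frame rates in perspective, they can be converted into per-frame time budgets. The helper below is plain arithmetic; only the function name `frame_budget_ms` is introduced for illustration.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

for label, fps in [("iPhone 13 Pro (low)", 40),
                   ("iPhone 13 Pro (high)", 50),
                   ("RTX 3060 laptop", 240)]:
    print(f"{label}: {frame_budget_ms(fps):.2f} ms per frame")
```

At 40-50 fps, mesh animation, splat update, and depth sorting together must fit in roughly 20 to 25 ms per frame on the phone. At 240 fps the laptop completes them in about 4.2 ms or less, since that figure is capped by the display refresh rate.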
- High Automation: Preprocessing steps are fully automated
- Cross-Platform Compatibility: Supports mobile devices, VR headsets, and web platforms
- Standard Format Support: Uses VRM format for easy integration with existing applications
- Real-Time Performance: Maintains real-time rendering while preserving high visual quality
The paper references multiple related works:
- GaussianAvatar [1]: Generates photorealistic character avatars from a single video
- GauHuman [2]: Articulated Gaussian Splatting for real-time 3D human rendering
- HUGS [4]: Human Gaussian Splats
- ExAvatar [6]: Expressive full-body 3D Gaussian avatars
Compared to existing approaches, this paper's main advantages are:
- Processing Speed: Avatar generation takes about 30 seconds, compared with the 2-3 hours of preprocessing required by ExAvatar
- Device Requirements: No need for high-end GPUs or camera arrays
- Accessibility: Entirely based on mobile devices and browsers
- Fidelity: Maintains high visual quality of Gaussian Splatting
- Successfully implements a rapid, high-quality 3D avatar generation system
- Effectively addresses dynamic Gaussian Splatting performance issues through parallel processing and grouped sorting
- WebXR-based implementation ensures cross-platform compatibility
- Mobile device optimization enables convenient usage by ordinary users
- Third-Party Application Dependency: Requires Scaniverse for 3D scanning
- Pose Constraints: Preprocessing requires A-pose, limiting use cases
- Mesh Precision: Background mesh quality may affect final results
- Grouped Sorting Trade-off: Sacrifices some rendering precision for mobile compatibility
- Integrate additional 3D scanning solutions to reduce dependency on specific applications
- Support more diverse initial poses
- Optimize grouped sorting algorithm to improve rendering quality
- Extend to more complex animation scenarios
- Addresses genuine user needs
- Complete end-to-end solution
- Well-designed user experience
- Effective parallel processing approach
- Clever grouped sorting optimization
- Mobile device performance optimization
- Based on widely available mobile devices
- Browser implementation requiring no installation
- Rapid processing time
- Uses VRM standard format
- Facilitates integration with existing ecosystem
- Relatively simple core methodology with limited technical depth
- Primarily engineering optimization rather than algorithmic innovation
- Lacks quantitative comparison with other methods
- Absence of user studies or quality assessment
- Limited testing across different scenarios
- Dependent on third-party Scaniverse application
- Requires specific initial pose
- Insufficient detail on grouped sorting implementation
- Lacks failure case analysis
- Provides reference for Gaussian Splatting applications on mobile devices
- Demonstrates practical system design approach
- High practical value suitable for actual deployment
- Significant implications for metaverse and social media applications
- Based on standard technology stack, easy to reproduce
- Strong potential for open-source release
- Social Media Applications: Rapid personal avatar generation
- Metaverse Platforms: User identity representation
- Virtual Conferences: Enhanced presence
- Gaming Applications: Character customization
- AR/VR Experiences: Personalized virtual avatars
The paper cites 12 related references, primarily covering:
- Gaussian Splatting foundational techniques [3]
- Human avatar generation methods [1, 2, 4, 5, 6, 8, 9, 11, 12]
- 3D reconstruction techniques [10]
- Commercial scanning applications [7]
These references adequately cover the related research domain and provide sufficient background support for this work.
Overall Assessment: This is a highly practical systems paper that, while relatively limited in algorithmic innovation, makes important contributions in solving real-world problems and improving accessibility. The system's speed and mobile compatibility provide significant practical value, making it suitable for deployment in real-world applications.