Temporal Gaussian Hierarchy

Reconstructing long volumetric videos from multi-view RGB videos remains a challenge in computer graphics. Recent methods based on 4D representations have shown promise for dynamic view synthesis, but they are typically limited to short clips and require substantial memory for longer videos. Temporal Gaussian Hierarchy, proposed in a paper presented at SIGGRAPH Asia 2024, addresses this problem.

The key idea behind Temporal Gaussian Hierarchy is to model long volumetric videos efficiently by exploiting the different levels of temporal redundancy in dynamic scenes, where some regions change slowly and others quickly. The method divides the video into multiple temporal segments, each storing a set of 4D Gaussians, so that motion can be captured at varying granularities. This hierarchical structure yields a compact representation of the video dynamics while maintaining high rendering quality.
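The segment structure described above can be illustrated with a minimal sketch. The names `Gaussian4D` and `TemporalHierarchy` are hypothetical, and two details are assumptions made for illustration rather than taken from the paper: segment lengths halve at each level, and each Gaussian is stored in the finest segment that fully contains its temporal extent.

```python
from dataclasses import dataclass, field


@dataclass
class Gaussian4D:
    # Hypothetical stand-in for a 4D Gaussian: only its temporal extent is modeled.
    name: str
    t_center: float  # temporal mean, in seconds
    t_radius: float  # half-width of temporal influence, in seconds


@dataclass
class TemporalHierarchy:
    duration: float   # total video length, in seconds
    num_levels: int   # level l splits the video into 2**l segments (assumption)
    levels: list = field(init=False)

    def __post_init__(self):
        # levels[l][i] holds the Gaussians stored in segment i of level l.
        self.levels = [[[] for _ in range(2 ** l)] for l in range(self.num_levels)]

    def segment_index(self, level, t):
        # Map a timestamp to its segment at the given level, clamped to valid range.
        seg_len = self.duration / (2 ** level)
        return min(max(int(t / seg_len), 0), 2 ** level - 1)

    def insert(self, g):
        # Store the Gaussian in the finest segment that contains its whole extent.
        start = max(g.t_center - g.t_radius, 0.0)
        end = min(g.t_center + g.t_radius, self.duration)
        for level in range(self.num_levels - 1, 0, -1):
            i = self.segment_index(level, start)
            if i == self.segment_index(level, end):
                self.levels[level][i].append(g)
                return level, i
        # Level 0 has a single segment covering the whole video.
        self.levels[0][0].append(g)
        return 0, 0

    def query(self, t):
        # Only one segment per level is touched, regardless of video length.
        active = []
        for level in range(self.num_levels):
            active.extend(self.levels[level][self.segment_index(level, t)])
        return active


if __name__ == "__main__":
    h = TemporalHierarchy(duration=8.0, num_levels=3)
    h.insert(Gaussian4D("static_wall", 4.0, 4.0))  # long-lived, lands in the coarsest level
    h.insert(Gaussian4D("hand_wave", 1.0, 0.5))    # short-lived, lands in a fine segment
    print([g.name for g in h.query(1.0)])
```

In this sketch, a query at time `t` reads one segment per level, so the number of Gaussians that must be resident at any moment depends on the number of levels rather than on the total duration. Whether the paper uses exactly this assignment rule is not stated in the text above.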

The appearance model uses gradient thresholding to obtain sparse Spherical Harmonics coefficients, which reduces storage while preserving view-dependent effects. The reported experiments show that Temporal Gaussian Hierarchy outperforms alternative methods in training cost, rendering speed, and storage usage. According to the authors, it is the first approach to efficiently handle minutes of volumetric video data while achieving state-of-the-art rendering quality.
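A rough sketch of what gradient thresholding for sparse Spherical Harmonics could look like is given below. The function names, the per-Gaussian gradient statistic, and the threshold value are all assumptions for illustration; the text above does not specify how the gradients are accumulated or which coefficients are dropped. Here, a Gaussian whose higher-order coefficients received little gradient during training keeps only its first (view-independent) coefficient.

```python
def sparsify_sh(sh_coeffs, grad_norms, threshold):
    """Keep full SH coefficients only where the gradient signal is strong.

    sh_coeffs:  one coefficient list per Gaussian; index 0 is the constant
                (view-independent) term, the rest are higher-order terms.
    grad_norms: one accumulated gradient magnitude per Gaussian, measured on
                the higher-order terms (hypothetical statistic).
    threshold:  Gaussians below this value keep only the constant term.
    """
    sparse = []
    for coeffs, grad in zip(sh_coeffs, grad_norms):
        if grad >= threshold:
            sparse.append(list(coeffs))      # view-dependent: keep everything
        else:
            sparse.append(list(coeffs[:1]))  # nearly diffuse: keep constant term only
    return sparse


def stored_values(sh_coeffs):
    # Total number of coefficients that would need to be stored.
    return sum(len(coeffs) for coeffs in sh_coeffs)


if __name__ == "__main__":
    # Three Gaussians with degree-1 SH (4 coefficients each, single channel).
    coeffs = [
        [0.8, 0.10, -0.20, 0.05],
        [0.5, 0.00, 0.01, 0.00],
        [0.3, 0.30, 0.10, -0.40],
    ]
    grads = [0.9, 0.01, 0.2]
    sparse = sparsify_sh(coeffs, grads, threshold=0.1)
    print(stored_values(coeffs), "->", stored_values(sparse))
```

The design intent is that surfaces with little view-dependent appearance do not pay the storage cost of higher-order coefficients, while specular or otherwise view-dependent regions retain them.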

Real-time demos on several datasets show the method producing compact volumetric videos with low training cost and memory usage. Because it renders in real time, Temporal Gaussian Hierarchy offers a practical way to handle long video sequences with complex dynamics.

Methodology

The proposed method generates a compact volumetric video from a long multi-view video sequence while keeping training cost and memory usage low. Its hierarchical structure divides the video into temporal segments, each containing 4D Gaussians that parameterize the scene. The appearance model uses gradient thresholding to obtain sparse Spherical Harmonics coefficients for efficient storage.

Real-Time Demos

DNA-Rendering Dataset
Sports Dataset
MobileStage Dataset
CMU-Panoptic Dataset
Neural3DV Dataset
ENeRF-Outdoor Dataset

Baseline Comparisons

Comparisons with other methods such as 4K4D, K-Planes, and ENeRF highlight the advantages of the Temporal Gaussian Hierarchy in terms of efficiency and rendering quality.
