ISTD PhD Oral Defense presented by Gong Jia – Towards Data Efficient, Reliable and Flexible 3D Digital Human Modeling
Abstract
3D digital humans have been widely used in fields such as virtual reality, fashion, and film and game production. Traditionally, creating and animating digital humans has required skilled engineers and expensive equipment, typically accessible only to large companies. There is therefore an urgent need for deep learning tools that democratize the creation and animation of digital humans.
However, existing deep learning tools face challenges that hinder their practical application:
1. Huge Data Requirements: Predicting human poses from monocular RGB images is a popular route to low-cost digital human animation, but training deep learning models for this task requires extensive annotated data, which is costly to collect across varied environments.
2. Performance Stability: Predicting 3D human poses from monocular data remains unreliable because of depth ambiguity and occlusion, leading to inconsistent performance.
3. Customization Limitations: Generating digital humans with deep learning-based tools is another popular way to reduce cost. However, current methods often treat a digital human as a single, uniform entity and cannot separate garments from the body, which prevents users from conveniently changing the outfits of their digital humans.
In this thesis, we aim to address these critical challenges to improve deep learning-based techniques for 3D digital human modeling. To reduce the human effort of collecting annotated data, we propose a Meta Agent Teaming Active Learning (MATAL) framework that actively selects informative images to be labeled for effective pose estimator training. MATAL formulates the image selection process as a Markov Decision Process and learns an optimal sampling policy that maximizes the pose estimator's performance. Experimental results on human hand and body pose estimation benchmarks demonstrate that our method reduces labeling effort by around 40% compared to state-of-the-art active learning frameworks.
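To make the Markov Decision Process framing more concrete, here is a toy sketch in Python. It is not MATAL and makes several loud simplifications: the "pose estimator" is a piecewise-linear interpolator of a 1D function, human labeling is simulated by a hypothetical `target` function, and the learned sampling policy is collapsed to an epsilon-greedy bandit over two hypothetical selection heuristics (`pick_random`, `pick_farthest`). What it keeps from the description above is the reward signal: each selection is rewarded by the drop in validation error after the newly labeled sample is added.

```python
import math
import random


def target(x):
    # Hypothetical "annotator": returns the ground-truth label for sample x.
    return math.sin(2 * math.pi * x)


def predict(labeled, x):
    # Toy stand-in for a pose estimator: piecewise-linear interpolation
    # between the labeled points (labeled is a dict {x: y}).
    pts = sorted(labeled.items())
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)


def val_error(labeled, val_xs):
    # Mean squared error on a fixed validation set.
    return sum((predict(labeled, x) - target(x)) ** 2 for x in val_xs) / len(val_xs)


def pick_random(pool, labeled, rng):
    # Hypothetical heuristic 1: pick any unlabeled sample.
    return rng.choice(sorted(pool))


def pick_farthest(pool, labeled, rng):
    # Hypothetical heuristic 2: pick the sample farthest from all labeled ones.
    return max(sorted(pool), key=lambda x: min(abs(x - l) for l in labeled))


def run_active_learning(budget=10, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    pool = {i / 100 for i in range(101)}            # unlabeled pool
    labeled = {x: target(x) for x in (0.0, 1.0)}    # small seed set
    pool -= set(labeled)
    val_xs = [i / 37 for i in range(38)]
    actions = {"random": pick_random, "farthest": pick_farthest}
    value = {a: 0.0 for a in actions}               # running mean reward per action
    counts = {a: 0 for a in actions}
    err = val_error(labeled, val_xs)
    history = [err]
    for _ in range(budget):
        # Epsilon-greedy policy over heuristics; try each one once first.
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            a = untried[0]
        elif rng.random() < epsilon:
            a = rng.choice(sorted(actions))
        else:
            a = max(actions, key=lambda k: value[k])
        x = actions[a](pool, labeled, rng)
        pool.remove(x)
        labeled[x] = target(x)                      # "annotate" the chosen sample
        new_err = val_error(labeled, val_xs)
        reward = err - new_err                      # reward = drop in validation error
        counts[a] += 1
        value[a] += (reward - value[a]) / counts[a]
        err = new_err
        history.append(err)
    return history, value, counts
```

Calling `run_active_learning()` returns the validation error after each annotation along with the estimated value of each heuristic. A full MDP formulation would condition the policy on a state (for example, features of the current model and pool), which this stateless sketch deliberately omits.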
Next, to address performance instability, we explore a diffusion-based 3D pose estimation framework, DiffPose. DiffPose formulates 3D pose estimation as a reverse diffusion process: it treats estimation as the progressive transformation of an uncertain 3D pose into a determinate one and solves it with diffusion models. Our results show that DiffPose significantly outperforms existing methods on widely used pose estimation benchmarks. We further extend the DiffPose framework to monocular human mesh reconstruction, where it also achieves strong performance.
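The reverse diffusion idea can be illustrated with standard DDPM ancestral sampling on a flattened 3D pose vector. This is a generic sketch, not DiffPose: the linear noise schedule, the 50-step horizon, and all helper names are assumptions. In particular, `oracle_eps` is a hypothetical stand-in that computes the noise exactly from a known target pose; a trained model would instead predict that noise from the noisy pose, the timestep, and image evidence, without access to the answer. The sketch therefore only shows the mechanics of iteratively turning a noisy, uncertain pose into a single determinate one.

```python
import math
import random


def make_schedule(T=50, beta_start=1e-3, beta_end=0.2):
    # Assumed linear beta schedule; returns betas, alphas and cumulative alpha_bars.
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def oracle_eps(x_t, t, x0, alpha_bars):
    # Hypothetical stand-in for a learned noise predictor eps_theta(x_t, t, condition):
    # inverts x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps using the known clean pose.
    ab = alpha_bars[t]
    return [(xt - math.sqrt(ab) * c) / math.sqrt(1.0 - ab) for xt, c in zip(x_t, x0)]


def reverse_diffusion(target_pose, T=50, seed=0):
    betas, alphas, alpha_bars = make_schedule(T)
    rng = random.Random(seed)
    # Start from a maximally uncertain pose: standard Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in target_pose]
    trajectory = [x]
    for t in reversed(range(T)):
        eps = oracle_eps(x, t, target_pose, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # DDPM posterior mean: (x_t - beta_t / sqrt(1 - ab_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # sigma_t^2 = beta_t variance choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
        trajectory.append(x)
    return x, trajectory
```

For example, `reverse_diffusion([0.0, 1.7, 0.1, 0.2, 1.2, 0.0, -0.2, 1.2, 0.0])` treats the list as three joints with (x, y, z) coordinates. With the oracle, the final step lands exactly on the target pose; with a learned predictor, the result would be an estimate whose quality depends on the model.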
Finally, to enable convenient customization of digital humans, we present the LAyered Gaussian Avatar (LAGA), a framework for creating high-fidelity, decomposable digital humans with diverse garments. LAGA decouples garments from the digital human during the generation process, which allows users to conveniently edit their digital humans' clothes. Extensive experiments demonstrate that LAGA outperforms existing methods for generating 3D clothed humans.
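A layered representation can be sketched as a data structure in which the body and each garment own separate sets of 3D Gaussians, so that editing clothing replaces one layer without touching the others. The class and method names below are hypothetical, and a real Gaussian avatar would also store rotations or full covariances, view-dependent colors, and articulation for animation; none of that is modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    # Minimal 3D Gaussian primitive: center, per-axis scale, RGB color, opacity.
    mean: Vec3
    scale: Vec3
    color: Vec3
    opacity: float


@dataclass
class LayeredAvatar:
    # Hypothetical layered avatar: one body layer plus named garment layers.
    body: List[Gaussian]
    garments: Dict[str, List[Gaussian]] = field(default_factory=dict)

    def put_on(self, name: str, gaussians: List[Gaussian]) -> None:
        # Adding or replacing a garment only touches that garment's layer.
        self.garments[name] = list(gaussians)

    def take_off(self, name: str) -> None:
        # Removing a garment leaves the body and other garments unchanged.
        self.garments.pop(name, None)

    def composite(self) -> List[Gaussian]:
        # Flatten all layers into a single list of Gaussians for a renderer.
        out = list(self.body)
        for name in sorted(self.garments):
            out.extend(self.garments[name])
        return out
```

Because Gaussian splatting renderers typically depth-sort primitives before blending, the concatenation order in `composite` is not meant to encode occlusion; the point of the layering is that each garment remains a separately editable unit.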
Speaker’s Profile
Gong Jia received his B.Eng degree in Optoelectronic Engineering from Chongqing University, China. He is currently pursuing his Ph.D. in the Information Systems Technology and Design pillar at the Singapore University of Technology and Design. His research interests include human pose estimation, human mesh reconstruction, avatars, and active learning.