Categorical Deformable Neural Representation
Reconstruction of foreground objects, both rigid and non-rigid.
This is the project that I’m primarily working on.
The goal of this project is to reconstruct all kinds of foreground objects in various dynamic scenes with relatively low complexity.
In this project, only foreground objects are taken into account; other factors, such as background street scenes and the sky model, are ignored. The foreground objects include rigid vehicles and non-rigid people and animals. I design categorical neural representations so that a separate model does not have to be assigned to every object.
It is very difficult to collect enough multi-view images of each individual object in real-world scenes. By learning categorical representations, a large number of images of different instances of the same category can instead be used together to complete the shape representation of that category.
Synthetic Dataset
I briefly verify my method on a synthetic dataset. Below is a toy demo on ShapeNetV2.
Realistic Dataset
When I tried to extend my experiments to street scenes, I found that existing datasets such as nuScenes and WaymoDataset lack the crucial tracking and segmentation for each object. I therefore built a new dataset based on these two datasets and SOTA MOTS algorithms. An illustration of the contents of the new dataset is shown below.
The pedestrian in the bounding box is the target I want to reconstruct. Tracking and segmentation information is needed together with camera poses. All images are shown below in time order.
I try to collect as many viewpoints as possible; the camera path is shown here.
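To make the per-object annotations described above concrete, here is a minimal sketch of how one frame of one tracked object could be stored. The class name, the field names, and the two helper functions are hypothetical and chosen only for illustration; the actual layout of the dataset is not described on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple

Matrix4 = Tuple[Tuple[float, ...], ...]  # 4x4 matrix stored row by row


@dataclass
class ObjectObservation:
    """One frame of one tracked object (all field names are hypothetical)."""
    track_id: int                      # identity assigned by the tracker
    timestamp: float                   # capture time in seconds
    bbox: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels
    mask: List[List[int]]              # binary mask of the full image, mask[row][col]
    cam_to_world: Matrix4              # camera pose for this frame


def sort_track(observations, track_id):
    """Return all observations of one track, ordered by time."""
    selected = [o for o in observations if o.track_id == track_id]
    return sorted(selected, key=lambda o: o.timestamp)


def mask_coverage(obs):
    """Fraction of the bounding-box pixels that the segmentation mask covers."""
    x0, y0, x1, y1 = obs.bbox
    area = max(0, x1 - x0) * max(0, y1 - y0)
    if area == 0:
        return 0.0
    inside = sum(obs.mask[r][c] for r in range(y0, y1) for c in range(x0, x1))
    return inside / area
```

A check such as mask_coverage could be used to drop frames in which the target is mostly occluded, although whether such filtering is applied here is an assumption.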
Results on New Dataset
For rigid vehicles, the time-varying module can be excluded. The result of vehicle reconstruction on the new dataset is displayed below.
It is much more difficult to represent deformable shapes, such as pedestrians. I introduce the time variable into my network. The categorical deformation and the motion warping control the final shape in a cascaded manner.
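The cascade can be pictured with the toy sketch below. It is not the network used in this project: the learned template is replaced by a unit sphere, the categorical deformation by a per-axis scale, and the motion warping by a simple sway, and the order of the two stages (motion first, then categorical deformation) is an assumption. All function and parameter names are hypothetical.

```python
import math


def template_field(p, radius=1.0):
    """Implicit value of a unit sphere, standing in for a learned category template.
    The zero level set is the surface; negative values are inside."""
    return math.sqrt(sum(c * c for c in p)) - radius


def category_deformation(p, instance_scale):
    """Time-independent stage: map a point of one instance into template space.
    A per-axis scale plays the role of the learned, instance-conditioned deformation."""
    return tuple(c / s for c, s in zip(p, instance_scale))


def motion_warp(p, t, amplitude):
    """Time-dependent stage: undo a toy motion, a sway along x that grows with height z."""
    x, y, z = p
    return (x - amplitude * math.sin(t) * z, y, z)


def query_shape(p, t, instance_scale, amplitude, use_motion=True):
    """Evaluate the shape at point p and time t by chaining the two stages."""
    if use_motion:  # rigid objects such as vehicles can skip this stage
        p = motion_warp(p, t, amplitude)
    p = category_deformation(p, instance_scale)
    return template_field(p)
```

Calling query_shape with use_motion=False gives a value that does not depend on t, which mirrors the rigid-vehicle case where the time-varying module is left out.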
To further enhance the efficiency of the reconstruction process, I also investigate the voxelized representations of dynamic objects.
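As a rough illustration of why a voxelized representation can be cheaper to query, the sketch below reads a value from a dense grid by trilinear interpolation instead of evaluating a network. The dense nested-list layout and the function name are assumptions made for this example; the page does not specify how the voxel grid is stored or how time is handled.

```python
def trilinear_sample(grid, p):
    """Interpolate a dense grid at a continuous point.

    grid[i][j][k] holds one value per lattice corner and needs at least two
    entries along each axis. p = (x, y, z) is given in voxel coordinates and
    is clamped to the grid bounds.
    """
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))

    def split(v, n):
        # Clamp to [0, n - 1], then separate the base index from the fraction.
        v = min(max(float(v), 0.0), n - 1.0)
        i = min(int(v), n - 2)  # keep i + 1 inside the grid
        return i, v - i

    (i, fx), (j, fy), (k, fz) = (split(v, n) for v, n in zip(p, sizes))

    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # Weight of each of the eight surrounding corners.
                w = ((fx if di else 1.0 - fx)
                     * (fy if dj else 1.0 - fy)
                     * (fz if dk else 1.0 - fz))
                value += w * grid[i + di][j + dj][k + dk]
    return value
```

Each query touches only eight stored values, so its cost does not depend on network depth; the trade-off is memory that grows with the cube of the grid resolution.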
With the neural representations, we can reproduce the 3D scenes, or even edit and recompose the representations to create new scenes.
(A new update is in progress)
P.S. In real-world datasets, pedestrians often wear backpacks and carry umbrellas, which poses obstacles for methods that use SMPL. I will focus on solving this problem in the future.