Abstract

Neural Radiance Fields (NeRF) have become the mainstream approach to novel view synthesis thanks to their remarkable rendering quality and simple architecture. Although NeRF has been developed in various directions that continuously improve its performance, the need for a dense set of multi-view images remains a stumbling block to practical application. In this work, we propose FlipNeRF, a novel regularization method for few-shot novel view synthesis that utilizes our proposed flipped reflection rays. The flipped reflection rays are explicitly derived from the input ray directions and the estimated normal vectors, and they serve as effective additional training rays while enabling the model to estimate more accurate surface normals and learn the 3D geometry effectively. Since the surface normal and the scene depth are both derived from the estimated densities along a ray, more accurate surface normals lead to more exact depth estimation, a key factor for few-shot novel view synthesis. Furthermore, with our proposed Uncertainty-aware Emptiness Loss and Bottleneck Feature Consistency Loss, FlipNeRF estimates more reliable outputs while effectively reducing floating artifacts across different scene structures, and it enhances the feature-level consistency between pairs of rays cast toward photo-consistent pixels without any additional feature extractor. Our FlipNeRF achieves state-of-the-art performance on multiple benchmarks across all scenarios.

Video

Flipped Reflection Ray

We exploit a batch of flipped reflection rays r′ ∈ R′, derived from the original input ray directions and the estimated surface normals, as additional training resources.


First, we derive the flipped reflection direction from the original ray direction and the estimated surface normal:

$$\hat{\mathbf{d}}' = 2(\hat{\mathbf{d}} \cdot \hat{\mathbf{n}})\,\hat{\mathbf{n}} - \hat{\mathbf{d}}$$
To generate additional training rays based on the flipped reflection direction, we need a set of imaginary ray origins placed appropriately with respect to the hit point and the original input ray origins. Vanilla NeRF models trained with a dense set of images tend to have a blending weight distribution that peaks near the object surface at pₛ = o + tₛd, i.e., at the s-th sample, whose blending weight is the highest along a ray. We therefore place o′ so that the s-th sample of r′ is pₛ:

$$\mathbf{o}' = \mathbf{p}_s - t_s\,\hat{\mathbf{d}}'$$

resulting in our proposed flipped reflection ray, r′(t) = o′ + td̂′.
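For illustration, here is a minimal PyTorch sketch of this ray construction. The function name, tensor shapes, and batching are our own assumptions for exposition, not the official FlipNeRF implementation:

```python
import torch

def flipped_reflection_rays(o, d, n, t_s):
    """Build flipped reflection rays from input rays and estimated normals.

    o:   [N, 3] original ray origins
    d:   [N, 3] original ray directions (assumed unit-norm)
    n:   [N, 3] estimated surface normals (assumed unit-norm)
    t_s: [N]    distance of the s-th sample, i.e., the sample with the
                highest blending weight along each ray
    """
    # Flipped reflection direction: d' = 2(d . n)n - d
    d_dot_n = (d * n).sum(dim=-1, keepdim=True)   # [N, 1]
    d_flip = 2.0 * d_dot_n * n - d                # [N, 3]

    # Surface hit point: p_s = o + t_s * d
    p_s = o + t_s.unsqueeze(-1) * d               # [N, 3]

    # Imaginary origin chosen so that the s-th sample of r' lands on p_s:
    # o' = p_s - t_s * d'
    o_flip = p_s - t_s.unsqueeze(-1) * d_flip     # [N, 3]
    return o_flip, d_flip
```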

However, since the surface normals used to derive the flipped reflection directions are estimates rather than ground truth, there is a risk that invalid r′, which do not satisfy photo-consistency, are used for training. To address this problem, we mask the ineffective r′ based on the angle θ between the estimated surface normal and the reflection direction:

$$\mathcal{M}(r') = \mathbb{1}\left[\arccos\left(-\hat{\mathbf{d}} \cdot \hat{\mathbf{n}}\right) < \tau\right]$$

where −(d̂ · n̂) amounts to cos θ between the original input ray direction and the surface normal, and τ is the threshold for filtering invalid rays, which we set to 90° unless otherwise specified. Through this masking process, only the r′ cast toward photo-consistent points remain, as intended.
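The masking step amounts to a boolean filter over the ray batch. A sketch under the same illustrative assumptions:

```python
import math

def valid_ray_mask(d, n, tau_deg=90.0):
    """Keep flipped rays whose angle between -d and the normal is below tau.

    d, n: [N, 3] ray directions and estimated normals (assumed unit-norm)
    """
    cos_theta = -(d * n).sum(dim=-1)                     # [N], cos(theta)
    return cos_theta > math.cos(math.radians(tau_deg))   # [N] boolean mask
```

With the default τ = 90°, cos τ = 0, so the mask simply discards rays whose estimated normal faces away from the incoming ray, i.e., d̂ · n̂ ≥ 0.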

Uncertainty-aware Regularization

Naively applying existing regularization techniques with limited training views is not consistently helpful across different scenes, since scene structures vary, and it can degrade overall performance. To address this problem, we propose the Uncertainty-aware Emptiness Loss (UE Loss), which reduces floating artifacts consistently over different scenes by taking the output uncertainty into account:

$$\mathcal{L}_{\mathrm{UE}} = \frac{1}{|\mathcal{R}|} \sum_{r \in \mathcal{R}} \rho(r)\,\mathcal{L}_{\mathrm{emptiness}}(r)$$

where the emptiness term penalizes the blending weights along the ray r.
Here ρ denotes the average of the summed scale parameters of the RGB color distributions estimated for all samples along a ray, which we use as the uncertainty of that ray. The UE Loss regularizes the blending weights adaptively: the more uncertain a ray is, the more its blending weights are penalized. This reduces floating artifacts consistently across scenes with different structures and enables the model to synthesize more reliable outputs by accounting for uncertainty.
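As a rough sketch of how such an uncertainty-weighted emptiness penalty could be implemented, the snippet below scales a log-style penalty on the blending weights by the per-ray uncertainty ρ. The concrete log1p(w/δ) emptiness form, the δ hyperparameter, and the detaching of ρ are our own assumptions, not necessarily the exact loss of the paper:

```python
import torch

def uncertainty_aware_emptiness_loss(weights, scales, delta=0.01):
    """Uncertainty-weighted penalty on blending weights (illustrative).

    weights: [R, N] blending weights w_i for N samples along R rays
    scales:  [R, N] estimated scale parameters of the RGB color
             distributions for each sample (averaged over channels)
    delta:   scale hyperparameter of the emptiness penalty (assumed)
    """
    # Per-ray uncertainty rho; detached so the loss shapes the weights
    # rather than shrinking the predicted uncertainty itself (a design
    # assumption on our part).
    rho = scales.sum(dim=-1).detach()                    # [R]
    penalty = torch.log1p(weights / delta).mean(dim=-1)  # [R]
    return (rho * penalty).mean()                        # scalar loss
```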

With our proposed UE Loss, FlipNeRF improves both the rendering quality and the reliability of the model outputs by a large margin compared to MixNeRF.
(From left to right: RGB, RGB_Std, Depth, Depth_Std)

MixNeRF

FlipNeRF (Ours)

Bottleneck Feature Consistency

We encourage consistency between the bottleneck feature distributions of r and r′. The bottleneck features are the intermediate feature vectors of NeRF, i.e., the outputs of its spatial MLP. We enforce this consistency with the Jensen-Shannon Divergence (JSD):

$$\mathcal{L}_{\mathrm{BFC}} = \mathrm{JSD}\left(\psi(\mathbf{b}) \,\Vert\, \psi(\mathbf{b}')\right)$$
where ψ(·) denotes the softmax function, and b and b′ denote the bottleneck features of r and r′, respectively. This regularizes the paired features effectively by enhancing the consistency between bottleneck features without depending on any additional feature extractor.
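A minimal PyTorch sketch of this consistency term, treating the softmax-normalized bottleneck features as discrete distributions (the function name and shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def bottleneck_consistency_loss(b, b_flip, eps=1e-12):
    """Jensen-Shannon divergence between softmax-normalized bottleneck features.

    b, b_flip: [R, C] bottleneck features of the original rays r and the
               flipped reflection rays r' (outputs of the spatial MLP)
    """
    p = F.softmax(b, dim=-1)        # psi(b)
    q = F.softmax(b_flip, dim=-1)   # psi(b')
    m = 0.5 * (p + q)               # mixture distribution
    # JSD(p || q) = 0.5 * KL(p || m) + 0.5 * KL(q || m)
    log_m = m.clamp_min(eps).log()
    kl_pm = (p * (p.clamp_min(eps).log() - log_m)).sum(dim=-1)
    kl_qm = (q * (q.clamp_min(eps).log() - log_m)).sum(dim=-1)
    return (0.5 * (kl_pm + kl_qm)).mean()
```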

Comparison with Baselines

Our FlipNeRF estimates more accurate surface normals than the other baselines, leading to a performance gain and better-reconstructed fine details from limited input views.
(From left to right: mip-NeRF, Ref-NeRF, MixNeRF, FlipNeRF (Ours))

Citation

Acknowledgements

This work was supported by the NRF (2021R1A2C3006659) and the IITP (2021-0-01343), both funded by the Korean government. It was also supported by Samsung Electronics (IO201223-08260-01).