Paper Title
DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal Fusion
Authors
Abstract
We propose an online multi-view depth prediction approach on posed video streams, where the scene geometry information computed in previous time steps is propagated to the current time step in an efficient and geometrically plausible way. The backbone of our approach is a real-time capable, lightweight encoder-decoder that relies on cost volumes computed from pairs of images. We extend it by placing a ConvLSTM cell at the bottleneck layer, which compresses an arbitrary amount of past information in its states. The novelty lies in propagating the hidden state of the cell by accounting for the viewpoint changes between time steps. At a given time step, we warp the previous hidden state into the current camera plane using the previous depth prediction. Our extension adds only a small overhead in computation time and memory consumption, while improving the depth predictions significantly. As a result, we outperform the existing state-of-the-art multi-view stereo methods on most of the evaluated metrics in hundreds of indoor scenes while maintaining real-time performance. Code is available at https://github.com/ardaduz/deep-video-mvs
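The warping step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is one simple way to move a feature map between views, assuming a pinhole camera with intrinsics `K` at the hidden-state resolution, a 4x4 rigid transform `T_cur_from_prev` from the previous to the current camera frame, and nearest-neighbor forward splatting with a nearest-depth tie-break. The function name `warp_hidden_state` and all parameter names are hypothetical.

```python
import numpy as np

def warp_hidden_state(hidden_prev, depth_prev, K, T_cur_from_prev):
    """Forward-warp a (C, H, W) hidden state from the previous view into the current view.

    Assumptions (illustrative, not taken from the paper's code):
    depth_prev is an (H, W) depth map for the previous view, K is a 3x3
    pinhole intrinsics matrix, and T_cur_from_prev is a 4x4 rigid transform.
    Pixels that receive no source value are left at zero.
    """
    C, H, W = hidden_prev.shape

    # Homogeneous pixel coordinates (u, v, 1) of the previous view, shape (3, H*W).
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)

    # Unproject to 3D points in the previous camera frame using the previous depth.
    depth = depth_prev.reshape(-1).astype(np.float64)
    pts_prev = (np.linalg.inv(K) @ pix) * depth[None, :]

    # Move the points into the current camera frame and project them.
    pts_cur = T_cur_from_prev[:3, :3] @ pts_prev + T_cur_from_prev[:3, 3:4]
    z = pts_cur[2]
    in_front = (depth > 0) & (z > 1e-6)
    safe_z = np.where(in_front, z, 1.0)  # avoid division by zero for invalid points
    proj = K @ pts_cur
    u_cur = np.rint(proj[0] / safe_z).astype(np.int64)
    v_cur = np.rint(proj[1] / safe_z).astype(np.int64)

    valid = in_front & (u_cur >= 0) & (u_cur < W) & (v_cur >= 0) & (v_cur < H)
    idx = np.flatnonzero(valid)
    target = v_cur[idx] * W + u_cur[idx]

    # If several source pixels land on the same target pixel, keep the nearest one.
    order = np.lexsort((z[idx], target))  # primary key: target, secondary key: depth
    idx, target = idx[order], target[order]
    _, first = np.unique(target, return_index=True)
    src, dst = idx[first], target[first]

    warped = np.zeros((C, H * W), dtype=hidden_prev.dtype)
    warped[:, dst] = hidden_prev.reshape(C, -1)[:, src]
    return warped.reshape(C, H, W)
```

With an identity pose the function returns the hidden state unchanged, and a small sideways translation at constant depth shifts the features by the expected number of pixels. A learned pipeline would more likely use differentiable bilinear sampling than nearest-neighbor splatting; the choice here only keeps the sketch short.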
