Improving Robustness of 3D Reconstruction for Sparse Captures and Challenging Environments
Applications of 3D modeling are ubiquitous. For example, the construction industry models ongoing projects to monitor progress, and the housing industry uses 360-degree images to develop 3D floor plans that help customers visualize home interiors. These applications rely on real-world data that poses challenges for 3D modeling systems. In this dissertation, I discuss the specific challenges that arise at each stage of the modeling process and how we address them. First, the scene of interest must be captured in images. This step can be onerous because a planned capture path may cause reconstruction to fail or yield an incomplete model. Our feature track simulator uses a camera trajectory and scene geometry to evaluate planned paths before the collection process. Next, a structure from motion (SfM) system reconstructs the scene and outputs camera parameters and image poses. Images that contain repeated or duplicate structures present ambiguities and can cause catastrophic reconstruction failures. Our approach discounts matches on repeated structures and estimates correct poses using a set of reliable images in the resectioning process. Then, a multi-view stereo (MVS) system uses the camera parameters and image poses to generate a dense model. MVS systems require sufficient overlap between images for accurate depth estimation, and capturing that overlap is often burdensome and costly. Our solution detects and completes planar surfaces seen in only one or two views, circumventing the overlap requirement.
Addressing Low-Shot MVS by Detecting and Completing Planar Surfaces
Rajbir Kataria, Zhizhong Li, Joseph DeGol, Derek Hoiem
Multi-view stereo (MVS) systems typically require at least three views to reconstruct each scene point. This requirement increases the burden of image capture and leads to incomplete reconstructions. Our main idea for addressing this low-shot MVS problem is to detect planar surfaces in depth maps generated by any MVS system and to complete these surfaces by reformulating the MVS depth prediction task as a simpler planar surface assignment problem. We use single-view and, when available, multi-view cues, and employ the DeepLabv3 architecture to infer the extent of planar regions and accurately complete missing surfaces. We show that our approach reconstructs portions of surfaces viewed by only one image, yielding denser models than existing MVS systems.
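To make the planar surface assignment idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a pinhole camera and assumes that each pixel already carries a plane label (in the paper, the extent of planar regions is inferred by a network; here the labels are simply given). The names `plane_depth`, `complete_depth`, `labels`, and `planes` are hypothetical. Once a pixel is assigned to a plane with normal n and offset d (n · X = d in camera coordinates), its depth follows from intersecting the pixel's viewing ray with that plane.

```python
def plane_depth(u, v, normal, offset, focal, cx, cy):
    """Depth at pixel (u, v) for the plane normal . X = offset (camera frame).

    Assumes a pinhole camera with focal length `focal` and principal
    point (cx, cy). Returns None if the ray misses the plane.
    """
    # Viewing ray through the pixel, scaled so its z-component is 1.
    ray = ((u - cx) / focal, (v - cy) / focal, 1.0)
    denom = normal[0] * ray[0] + normal[1] * ray[1] + normal[2] * ray[2]
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    z = offset / denom
    return z if z > 0 else None  # discard intersections behind the camera


def complete_depth(depth, labels, planes, focal, cx, cy):
    """Fill missing depths (None) from each pixel's assigned plane.

    depth  : 2D list of floats or None (None marks a missing MVS depth)
    labels : 2D list of plane indices, -1 for non-planar pixels
    planes : list of (normal, offset) pairs
    """
    completed = [row[:] for row in depth]
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            label = labels[v][u]
            if z is None and label >= 0:
                normal, offset = planes[label]
                completed[v][u] = plane_depth(u, v, normal, offset,
                                              focal, cx, cy)
    return completed
```

Existing MVS depths are left untouched in this sketch; only pixels without a depth estimate are filled, which mirrors the goal of completing surfaces that too few views observe.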
Improving Structure from Motion with Reliable Resectioning
Rajbir Kataria, Joseph DeGol, Derek Hoiem
A common cause of failure in structure-from-motion (SfM) is misregistration of images due to visual patterns that occur in more than one scene location. Most work to solve this problem ignores image matches that are inconsistent according to the statistics of the tracks graph, but these methods often need to be tuned for each dataset and can lead to reduced completeness of normally good reconstructions when valid matches are removed. Our key idea is to address ambiguity directly in the reconstruction process by using only a subset of reliable matches to determine resectioning order and the initial pose. We also introduce a new measure of similarity that adjusts the influence of feature matches based on their track length. We show this improves reconstruction robustness for two state-of-the-art SfM algorithms on many diverse datasets.
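As an illustration of a track-length-aware similarity, here is a minimal sketch under stated assumptions. The abstract does not give the formula, so the inverse weighting below (a match on a track of length n contributes 1/n) is an assumption chosen for illustration, based on the intuition that repeated structures tend to produce long tracks. The function names and the `candidates` structure are hypothetical.

```python
def track_weighted_similarity(shared_track_lengths, weight=lambda n: 1.0 / n):
    """Similarity between a candidate image and the current reconstruction.

    shared_track_lengths: lengths of the feature tracks that the candidate
    shares with already-registered images. Each match contributes
    weight(track_length) instead of a flat count of 1.
    ASSUMPTION: the default 1/n weighting is illustrative only.
    """
    return sum(weight(n) for n in shared_track_lengths)


def next_image_to_resection(candidates):
    """Pick the candidate with the highest weighted similarity.

    candidates: dict mapping image name -> list of shared track lengths.
    """
    return max(candidates,
               key=lambda name: track_weighted_similarity(candidates[name]))
```

Under this assumed weighting, an image with a few matches on short tracks can outrank an image with many matches on very long tracks, which a raw match count would rank first.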
FEATS: Synthetic Feature Tracks for Structure from Motion Evaluation
Joseph DeGol, Jae Yong Lee, Rajbir Kataria, Timothy Bretl, Derek Hoiem
We present FEATS (Feature Extraction and Tracking Simulator), which synthesizes feature tracks using a camera trajectory and scene geometry (e.g., CAD, laser, or multi-view stereo). We introduce 2D feature and matching noise models that can be controlled through a simple set of parameters. We also provide a new dataset of images with ground truth camera poses. We process this data (and a synthetic version) with several current SfM algorithms and show that the synthetic tracks are representative of the real tracks. We then demonstrate several practical uses of FEATS: (1) we generate hundreds of trajectories with varying noise and show that COLMAP is more robust to noise than OpenSfM and VisualSfM; and (2) we calculate 3D point error and show that accurate camera pose estimates do not guarantee accurate 3D maps.
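The idea of synthesizing feature tracks from a trajectory and scene geometry can be sketched as follows. This is a simplified illustration, not the FEATS implementation: it assumes the scene is a list of 3D points, that every camera is an axis-aligned pinhole camera looking down +z (so a pose is just a position), and that noise is modeled by Gaussian pixel jitter plus a random drop probability. The parameters `pixel_sigma` and `drop_prob` and both function names are hypothetical stand-ins for the paper's noise parameters.

```python
import random


def project(point, cam_pos, focal=500.0, cx=320.0, cy=240.0):
    """Project a 3D point into a pinhole camera at cam_pos looking down +z.

    Returns pixel coordinates (u, v), or None if the point is not in
    front of the camera.
    """
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)


def synthesize_tracks(points, trajectory, pixel_sigma=0.5, drop_prob=0.1,
                      width=640, height=480, seed=0):
    """Build synthetic feature tracks: point id -> [(camera id, (u, v)), ...].

    pixel_sigma : std. dev. of Gaussian 2D feature noise, in pixels
    drop_prob   : chance that an observation is missed or fails to match
    """
    rng = random.Random(seed)
    tracks = {}
    for pid, point in enumerate(points):
        for cid, cam_pos in enumerate(trajectory):
            uv = project(point, cam_pos)
            if uv is None:
                continue
            if not (0 <= uv[0] < width and 0 <= uv[1] < height):
                continue  # outside the image bounds
            if rng.random() < drop_prob:
                continue  # simulated missed detection or failed match
            noisy = (uv[0] + rng.gauss(0.0, pixel_sigma),
                     uv[1] + rng.gauss(0.0, pixel_sigma))
            tracks.setdefault(pid, []).append((cid, noisy))
    # A track needs at least two observations to be useful for SfM.
    return {pid: obs for pid, obs in tracks.items() if len(obs) >= 2}
```

Sweeping `pixel_sigma` and `drop_prob` over a planned trajectory gives a rough way to compare capture paths before any images are collected, which is the kind of use the abstract describes.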