Structure-aware Indoor Scene Reconstruction via Two Levels of Abstraction
  • 2021
  • ISPRS
  • 3D Reconstruction

Fig. 1. Goal of our approach. Our framework starts from a raw mesh as input data (a). The indoor scene is reconstructed as a watertight and compact structure mesh (b) and a detailed scene mesh (c), preserving two different levels of abstraction. Note that a texture map can be attached to the scene mesh for visualization using the method of Waechter et al. (2014) (d).


Fig. 2. Overview of our approach. Our algorithm starts from a dense triangular raw mesh generated from the point cloud of an indoor scene (a). The whole scene is then abstracted by 225 planar primitives representing all parts of the input mesh (b). Among them, 109 planar primitives that best approximate the structure objects of the indoor scene are selected (c), and 27 isolated objects are extracted from the non-structure parts (e). After that, the 109 structure planar primitives are assembled to form a structure mesh of all structure objects (d). Finally, the scene mesh is the union of the structure mesh and all 27 non-structure objects, which are repaired and simplified (f). Note that the back faces of the meshes in (a), (d) and (f) are not shown, and ceiling planes are removed in (c) to better visualize the interior. Planar primitives in (b) and (c) are approximated by the alpha-shape of the corresponding triangular facets; each primitive is shown in a random color.
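The planar-primitive abstraction above relies on fitting a plane to groups of mesh facets. As a minimal, generic sketch (not the paper's actual implementation), a least-squares plane can be fit to a primitive's vertices via PCA:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit via PCA: returns (unit normal, centroid).

    `points` is an (N, 3) array of vertex positions. This is a generic
    building block for planar-primitive abstraction; the paper's exact
    fitting procedure may differ.
    """
    centroid = points.mean(axis=0)
    # The singular vector with the smallest singular value of the
    # centered points is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return normal, centroid
```

The same fit also yields a residual (the smallest singular value) that can serve as a planarity score when deciding which facets belong to a primitive.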


Fig. 3. Scene decomposition. First, the input mesh (a) is over-segmented into a large number of planar primitives (b). After that, pairs of adjacent quasi-coplanar primitives are iteratively merged into larger ones until a meaningful plane configuration is attained (c). Next, ceiling and floor planes (d), wall planes (e), as well as small structure planes such as the yellow ones in (f), are detected in a hierarchical manner and compose the structure planes. Finally, isolated non-structure objects are extracted by detecting connected triangular facets in the original mesh.
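The iterative merging of quasi-coplanar primitives can be sketched as follows. This is an illustrative assumption of how such a test and merge might look, with made-up thresholds; the paper's actual criteria and parameters may differ:

```python
import numpy as np

def are_quasi_coplanar(p1, p2, angle_thresh_deg=10.0, dist_thresh=0.05):
    """Test whether two planar primitives are quasi-coplanar.

    Each primitive is a dict with a unit 'normal', a 'centroid', and an
    'area'. Thresholds are illustrative, not the paper's values.
    """
    n1, n2 = p1["normal"], p2["normal"]
    # Angle between normals (orientation-independent).
    if abs(np.dot(n1, n2)) < np.cos(np.radians(angle_thresh_deg)):
        return False
    # Distance from each centroid to the other primitive's plane.
    d12 = abs(np.dot(p2["centroid"] - p1["centroid"], n1))
    d21 = abs(np.dot(p1["centroid"] - p2["centroid"], n2))
    return max(d12, d21) < dist_thresh

def merge_primitives(p1, p2):
    """Merge two primitives via area-weighted averaging."""
    w1, w2 = p1["area"], p2["area"]
    n = w1 * p1["normal"] + w2 * p2["normal"]
    n /= np.linalg.norm(n)
    c = (w1 * p1["centroid"] + w2 * p2["centroid"]) / (w1 + w2)
    return {"normal": n, "centroid": c, "area": w1 + w2}
```

Repeating the test over adjacent primitive pairs and merging until no pair passes yields the kind of fixed-point plane configuration the caption describes.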

Fig. 9. Qualitative comparisons with shape approximation methods on RGBD (left) and LIDAR (right) scenes. With a similar number of facets (about 1200), the simplified meshes returned by QEM, VSA and Structure preserve most of the large planar structures of the indoor environment. However, these simplified models shrink at small structures due to noise retained in the input raw meshes (see the cropped region). In contrast, our method produces more compact and structure-aware models in which most of these small but important structures are successfully reconstructed.

Fig. 10. Quantitative comparisons with shape approximation methods on complete (left) and partial (right) scenes. For the complete scene, Structure produces the model closest to the input raw mesh (see the colored points). In the case of large missing data, however, our method is robust enough to output a watertight mesh with the best geometric accuracy, whereas all three shape approximation methods fail to repair the holes. Besides, our method takes dozens of seconds to process a whole scene, one order of magnitude faster than Structure.
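The geometric accuracy referred to above is typically measured as a distance from input points to the reconstructed mesh. As a minimal sketch (not the paper's evaluation code), one simple approximation is to sample the mesh surface densely and take nearest-neighbor distances:

```python
import numpy as np

def sample_mesh(vertices, faces, n_samples=10000, seed=0):
    """Uniformly sample points on a triangle mesh (area-weighted)."""
    rng = np.random.default_rng(seed)
    tris = vertices[faces]                                   # (F, 3, 3)
    cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    idx = rng.choice(len(faces), size=n_samples, p=areas / areas.sum())
    # Random barycentric coordinates, folded into the triangle.
    u, v = rng.random(n_samples), rng.random(n_samples)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    w = 1.0 - u - v
    t = tris[idx]
    return u[:, None] * t[:, 0] + v[:, None] * t[:, 1] + w[:, None] * t[:, 2]

def rms_distance(points, mesh_samples):
    """RMS nearest-neighbor distance from points to the sampled mesh."""
    d2 = ((points[:, None, :] - mesh_samples[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min(axis=1).mean()))
```

For large scenes the brute-force nearest-neighbor search would be replaced by a spatial index (e.g. a k-d tree), or by exact point-to-triangle distances.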

Fig. 12. Qualitative comparisons with the floorplan generation method FloorSP on LIDAR (rows 1–2) and RGBD (rows 3–5) data. On noisy and strongly non-Manhattan scenes, FloorSP generates non-manifold (row 3) and self-intersecting (rows 4 and 5) models. Besides, some walls are mis-detected (row 1) or incorrectly aligned (rows 2, 3 and 5). In contrast, our method is robust enough to recover most of the wall structures, even for rooms with curved walls (rows 2 and 4).

Fig. 13. Quantitative comparisons against FloorSP on RGBD (top) and LIDAR (middle and bottom) data. Our method produces 3D models that are closer to the input wall points (see the colored points) than the 2.5D models that FloorSP assembles from floorplan walls with a virtual thickness (10 cm). In particular, our method exhibits a lower error by recovering small structural details contained in the original mesh, such as two closely spaced walls.

Fig. 16. Ablation study. When the scene decomposition step is turned off (top row), all detected planes are considered structure planes and are employed for structure-aware reconstruction. This choice makes the structure mesh contain both structure and non-structure parts. When the local primitive slicing strategy is turned off (middle row), all structure primitives are sliced everywhere inside the bounding box. This increases the computational time and the number of polyhedral cells exponentially, and leads to a non-compact model with many protrusions. In contrast, with both ingredients enabled (bottom row), a compact and structure-aware model is reconstructed within an acceptable time. In addition, our scene mesh achieves the best geometric accuracy thanks to the separation of non-structure objects from structure parts.

Fig. 17. Performance on a large-scale scene. Given the input raw mesh (top left), our pipeline generates two models with different levels of abstraction: a compact structure mesh ℳs (top right) and a detailed scene mesh ℳt (bottom left). The textured scene mesh is also shown (bottom right).


Acknowledgements

This work was supported in part by NSFC (U2001206), GD Science and Technology Program (2020A0505100064, 2015A030312015), GD Talent Program (2019JC05X328), DEGP Key Project (2018KZDXM058), Shenzhen Science and Technology Program (RCJC20200714114435012) and the Beike fund. The authors would like to thank Beike for providing various types of indoor scenes, Jiacheng Chen for their code and datasets, Liangliang Nan for the comparison tools, as well as Jing Zhao and Mofang Cheng for technical advice.


Bibtex

@article{P2M21,
  title={Structure-aware Indoor Scene Reconstruction via Two Levels of Abstraction},
  author={Hao Fang and Cihui Pan and Hui Huang},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={178},
  pages={155--170},
  year={2021},
}