Learning Where To Classify In Multi-View Semantic Segmentation

There is an increasing interest in semantically annotated 3D models, e.g. of cities. The typical approaches start with the semantic labelling of all the images used for the 3D model. Such labelling tends to be very time-consuming, though. The inherent redundancy among the overlapping images calls for more efficient solutions. This paper proposes an alternative approach that exploits the geometry of a 3D mesh model obtained from multi-view reconstruction. Instead of clustering similar views, we predict the best view before the actual labelling. For this we find the single image part that best supports the correct semantic labelling of each face of the underlying 3D mesh. Moreover, this single-image approach may come as a surprise, as it tends to increase the accuracy of the model labelling compared to approaches that fuse the labels from multiple images. We even go a step further and only explicitly label a subset of the faces (e.g. 10%), subsequently filling in the labels of the remaining faces. This leads to a further reduction of computation time, again combined with a gain in accuracy. Compared to a process that starts from the semantic labelling of the images, our method for semantically labelling 3D models yields accelerations of about two orders of magnitude. We tested our multi-view semantic labelling on a variety of street scenes.
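To make the view-selection idea concrete, below is a minimal Python sketch that picks a single observing view per mesh face. It uses projected face area as a stand-in ranking score; the paper instead learns this ranking from richer features, and all names and shapes here are illustrative assumptions, not the actual implementation:

    import numpy as np

    def best_view_per_face(face_areas):
        """Pick one observing view per mesh face.

        face_areas: (num_faces, num_views) array, where entry (f, v) is the
        projected area of face f in view v, or 0 if the face is not visible.
        Projected area is only a simple geometric stand-in for the learned
        ranking used in the paper.
        """
        best = np.argmax(face_areas, axis=1)      # highest-scoring view per face
        visible = face_areas.max(axis=1) > 0      # faces seen by at least one view
        return best, visible

    # Toy example: 4 faces observed by 3 overlapping views.
    areas = np.array([[0.0, 2.1, 0.4],
                      [1.5, 0.0, 0.2],
                      [0.0, 0.0, 0.0],            # face never observed
                      [0.3, 0.9, 1.2]])
    best, visible = best_view_per_face(areas)
    print(best[visible])   # classify each visible face in its single best view only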

Overview


Learning Where To Classify In Multi-View Semantic Segmentation. ECCV 2014, Zurich, Switzerland. (PDF, poster, project)
H. Riemenschneider, A. Bodis-Szomoru, J. Weissenberg, L. Van Gool


Downloads


ETHZ CVL RueMonge 2014 dataset
This dataset provides a 3D reconstruction together with semantic segmentation annotations.



If you are interested, fill in this dataset request form and I will contact you.



This dataset comes with the following data:

  1. 2D images for training and testing, labelled with 8 classes
  2. 3D mesh (faces, vertices) as a 3D representation
  3. Index files mapping mesh faces to pixels in each image (see the sketch after this list)
  4. Training / testing splits as txt files
  5. Sample files for classification results
  6. Sample source code for loading and evaluation (see below)
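As a rough illustration of how the face-to-pixel index files (item 3) can be used, here is a hedged Python sketch that pulls per-image data onto mesh faces. The dict layout and names are assumptions for illustration only; the dataset's actual file format is documented in its sample code:

    import numpy as np

    # Hypothetical in-memory layout of a face-to-pixel index for one image:
    # face id -> (rows, cols) of the pixels that face projects to.
    index = {
        0: (np.array([10, 10, 11]), np.array([40, 41, 40])),
        1: (np.array([52, 53]),     np.array([ 7,  7])),
    }

    def face_colors(image, index):
        """Average the image colour over each face's projected pixels."""
        colors = {}
        for face, (rows, cols) in index.items():
            colors[face] = image[rows, cols].mean(axis=0)
        return colors

    image = np.random.rand(100, 100, 3)   # stand-in RGB image
    print(face_colors(image, index))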
The sample source code provides the following functionality:
  1. Evaluation of 2D/3D labelling results by classwise or PASCAL IoU accuracy (see the sketch after this list).
  2. Examples for loading 2D image data onto the 3D mesh (colour, labels, probabilities).
  3. Fusion of multi-view data by the SUMALL principle (see paper, and the sketch after this list).
  4. Mesh labelling optimization via a graph-cut approach.
  5. Various helper tools.
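For orientation, below is a compact Python sketch of two of the functions above: per-class PASCAL IoU evaluation (item 1) and SUMALL multi-view fusion (item 3), i.e. summing per-view class probabilities per face and taking the argmax. Array shapes and names are assumptions, not the dataset's actual API:

    import numpy as np

    def pascal_iou(pred, gt, num_classes, ignore=-1):
        """Per-class PASCAL IoU = TP / (TP + FP + FN) over labelled elements."""
        valid = gt != ignore
        pred, gt = pred[valid], gt[valid]
        ious = []
        for c in range(num_classes):
            inter = np.sum((pred == c) & (gt == c))
            union = np.sum((pred == c) | (gt == c))
            ious.append(inter / union if union > 0 else np.nan)
        return np.array(ious)

    def sumall_fusion(probs):
        """SUMALL fusion: sum class probabilities over all observing views.

        probs: (num_views, num_faces, num_classes); rows of zeros mark views
        in which a face is not visible.
        """
        return probs.sum(axis=0).argmax(axis=1)   # fused label per face

    # Toy usage: 2 views, 3 faces, 8 classes.
    probs = np.random.rand(2, 3, 8)
    labels = sumall_fusion(probs)
    gt = np.random.randint(0, 8, size=3)
    print(pascal_iou(labels, gt, num_classes=8))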
This dataset allows the evaluation of semantic classification methods on the tasks listed in the results table below. The training / testing protocol is defined by the split files included in the dataset. Results on ETHZ CVL RueMonge 2014 for the different tasks are:

Source         | TASK1      | TASK2      | TASK3      | TASK4     | TASK5
               | [2D IOU %] | [3D IOU %] | [3D IOU %] | [Speedup] | [Speedup]
[1] MAP        | 38.72      | 35.77      | -          | -         | -
[1] GCO        | 40.92      | 37.33      | -          | 11.9x     | 7.1x
[1] GCO+recode | 41.34      | 41.92      | 42.32      | -         | -

[1] H. Riemenschneider, A. Bodis-Szomoru, J. Weissenberg, L. Van Gool. Learning Where To Classify In Multi-View Semantic Segmentation. ECCV 2014.
