Learning Where To Classify In Multi-View Semantic Segmentation

There is an increasing interest in semantically annotated 3D models, e.g. of cities. The typical approaches start with the semantic labelling of all the images used for the 3D model. Such labelling tends to be very time-consuming, though. The inherent redundancy among the overlapping images calls for more efficient solutions. This paper proposes an alternative approach that exploits the geometry of a 3D mesh model obtained from multi-view reconstruction. Instead of clustering similar views, we predict the best view before the actual labelling. For this we find the single image part that best supports the correct semantic labelling of each face of the underlying 3D mesh. Moreover, this single-image approach may come as a surprise, as it tends to increase the accuracy of the model labelling compared to approaches that fuse the labels from multiple images. We even go a step further and only explicitly label a subset of the faces (e.g. 10%), subsequently filling in the labels of the remaining faces. This leads to a further reduction of computation time, again combined with a gain in accuracy. Compared to a process that starts from the semantic labelling of the images, our method for semantically labelling 3D models yields accelerations of about two orders of magnitude. We tested our multi-view semantic labelling on a variety of street scenes.
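To make the view-selection idea concrete, below is a minimal Python sketch that picks a single observing view per mesh face. It uses projected face area as a stand-in ranking score; the paper instead learns this ranking from richer features, and all names and shapes here are illustrative assumptions, not the actual implementation:

    import numpy as np

    def best_view_per_face(face_areas):
        """Pick one observing view per mesh face.

        face_areas: (num_faces, num_views) array, where entry (f, v) is the
        projected area of face f in view v, or 0 if the face is not visible.
        Projected area is only a simple geometric stand-in for the learned
        ranking used in the paper.
        """
        best = np.argmax(face_areas, axis=1)      # highest-scoring view per face
        visible = face_areas.max(axis=1) > 0      # faces seen by at least one view
        return best, visible

    # Toy example: 4 faces observed by 3 overlapping views.
    areas = np.array([[0.0, 2.1, 0.4],
                      [1.5, 0.0, 0.2],
                      [0.0, 0.0, 0.0],            # face never observed
                      [0.3, 0.9, 1.2]])
    best, visible = best_view_per_face(areas)
    print(best[visible])   # classify each visible face in its single best view only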

Overview


Learning Where To Classify In Multi-View Semantic Segmentation. ECCV 2014, Zurich, Switzerland. (PDF, poster, project)
H. Riemenschneider, A. Bodis-Szomoru, J. Weissenberg, L. Van Gool


Downloads


ETHZ CVL RueMonge 2014 dataset
This dataset provides a 3D reconstruction together with semantic segmentation annotations.



If you are interested, fill in this dataset request form and I will contact you.



This dataset comes with the following data:

  1. 2D images for training and testing, labelled with 8 classes
  2. 3D mesh (faces, vertices) as a 3D representation
  3. Index files mapping mesh faces to pixels in each image (see the sketch after this list)
  4. Training / testing splits as txt files
  5. Sample files for classification results
  6. Sample source code for loading and evaluation (see below)
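As a rough illustration of how the face-to-pixel index files (item 3) can be used, here is a hedged Python sketch that pulls per-image data onto mesh faces. The dict layout and names are assumptions for illustration only; the dataset's actual file format is documented in its sample code:

    import numpy as np

    # Hypothetical in-memory layout of a face-to-pixel index for one image:
    # face id -> (rows, cols) of the pixels that face projects to.
    index = {
        0: (np.array([10, 10, 11]), np.array([40, 41, 40])),
        1: (np.array([52, 53]),     np.array([ 7,  7])),
    }

    def face_colors(image, index):
        """Average the image colour over each face's projected pixels."""
        colors = {}
        for face, (rows, cols) in index.items():
            colors[face] = image[rows, cols].mean(axis=0)
        return colors

    image = np.random.rand(100, 100, 3)   # stand-in RGB image
    print(face_colors(image, index))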
The sample source code provides the following functionality:
  1. Evaluation of 2D/3D labelling results by classwise or PASCAL IoU accuracy (see the sketch after this list).
  2. Examples for loading 2D image data onto the 3D mesh (colour, labels, probabilities).
  3. Fusion of multi-view data by the SUMALL principle (see paper, and the sketch after this list).
  4. Mesh labelling optimization via a graph-cut approach.
  5. Various helper tools.
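For orientation, below is a compact Python sketch of two of the functions above: per-class PASCAL IoU evaluation (item 1) and SUMALL multi-view fusion (item 3), i.e. summing per-view class probabilities per face and taking the argmax. Array shapes and names are assumptions, not the dataset's actual API:

    import numpy as np

    def pascal_iou(pred, gt, num_classes, ignore=-1):
        """Per-class PASCAL IoU = TP / (TP + FP + FN) over labelled elements."""
        valid = gt != ignore
        pred, gt = pred[valid], gt[valid]
        ious = []
        for c in range(num_classes):
            inter = np.sum((pred == c) & (gt == c))
            union = np.sum((pred == c) | (gt == c))
            ious.append(inter / union if union > 0 else np.nan)
        return np.array(ious)

    def sumall_fusion(probs):
        """SUMALL fusion: sum class probabilities over all observing views.

        probs: (num_views, num_faces, num_classes); rows of zeros mark views
        in which a face is not visible.
        """
        return probs.sum(axis=0).argmax(axis=1)   # fused label per face

    # Toy usage: 2 views, 3 faces, 8 classes.
    probs = np.random.rand(2, 3, 8)
    labels = sumall_fusion(probs)
    gt = np.random.randint(0, 8, size=3)
    print(pascal_iou(labels, gt, num_classes=8))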
This dataset allows the evaluation of semantic classification methods on the tasks listed in the results table below. The training / testing protocol is defined by the split files included in the dataset. Results on ETHZ CVL RueMonge 2014 for the different tasks are:

Source         | TASK1      | TASK2      | TASK3      | TASK4     | TASK5
               | [2D IOU %] | [3D IOU %] | [3D IOU %] | [Speedup] | [Speedup]
[1] MAP        | 38.72      | 35.77      | -          | -         | -
[1] GCO        | 40.92      | 37.33      | -          | 11.9x     | 7.1x
[1] GCO+recode | 41.34      | 41.92      | 42.32      | -         | -

[1] H. Riemenschneider, A. Bodis-Szomoru, J. Weissenberg, L. Van Gool. Learning Where To Classify In Multi-View Semantic Segmentation. ECCV 2014.
