Sparse Reconstruction for Weakly Supervised Semantic Segmentation
Ke Zhang, Wei Zhang, Yingbin Zheng, Xiangyang Xue
For better understanding and management of massive image collections, semantic segmentation identifies the semantic label of each pixel, i.e., assigns every pixel in an image to one of a set of pre-defined semantic categories.
Most state-of-the-art methods rely on training sets in which every pixel is annotated, which is time-consuming and limited by annotation resources.
- Weakly supervised: requires merely image-level labels for training, and can thus exploit the abundance of weakly labeled images from sources such as Google, Baidu, and Flickr, and from large datasets such as ImageNet, LabelMe, and NUS-WIDE. This suits the needs of modern applications.
The main difficulty faced by the weakly supervised semantic segmentation task:
- With only image-level labels in the training set, how can we train a pixel- or region-level classifier?
- Multi-Image Model [Vezhnevets et al., ICCV 2011, CVPR 2012]
- CRF models with potentials between superpixels and their neighbors within and across images, plus consistency terms between the latent superpixel labels predicted from features and the image-level labels
- Positive samples for training are not AVAILABLE !
We use "evaluation" of classifiers instead of training a classifier.
- For a given classifier, its effectiveness is specified by the parameter 𝜃. (Here we use a linear SVM, specified by 𝜃 = ⟨𝑤, 𝑏⟩.)
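As a minimal illustration (not the paper's implementation), a linear classifier with parameters θ = (w, b) labels a sample x by the sign of w·x + b:

```python
import numpy as np

def classify(X, w, b):
    """Label rows of X with a linear classifier theta = (w, b):
    +1 if w.x + b > 0, -1 otherwise."""
    return np.sign(X @ w + b)

# Example: w picks out the first feature dimension.
w = np.array([1.0, 0.0])
labels = classify(np.array([[2.0, 0.0], [-3.0, 1.0]]), w, 0.0)
```
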
- When described by powerful features, samples of the same class should lie in the same subspace of the high-dimensional feature space.
Substantiation is given by the costs of reconstructing samples from the subspace of each semantic class:
- Use the samples of a class to obtain a basis of the class subspace.
- Reconstruct both positive and negative samples of the class from that basis; the costs are the reconstruction errors.
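The reconstruction cost can be sketched as follows. This minimal version uses least-squares projection onto the class basis as a stand-in for the paper's sparse reconstruction; the function name is illustrative:

```python
import numpy as np

def reconstruction_cost(x, D):
    """Cost of reconstructing sample x from basis D, whose columns
    are samples of the class. Least-squares projection here; the
    paper solves a sparse coding problem instead."""
    alpha, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ alpha)

# A sample inside the subspace reconstructs with near-zero cost;
# a sample orthogonal to it keeps its full norm as the cost.
D = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # spans the x-y plane
in_cost = reconstruction_cost(np.array([2.0, 3.0, 0.0]), D)
out_cost = reconstruction_cost(np.array([0.0, 0.0, 5.0]), D)
```
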
New criterion for evaluating classifiers:
- Although we have no positive samples for each class, we do have its negative samples.
- For a given classifier of a semantic class, a good classifier means:
- the samples it classifies as positive are indeed positive samples of the class;
- the negative samples have large error when reconstructed from the subspace spanned by those classified positive samples.
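This criterion can be sketched in a few lines: build the class subspace from the samples a classifier predicts as positive, then score the classifier by how poorly the known negative samples reconstruct from it (higher is better). A minimal numpy sketch with least-squares reconstruction standing in for sparse coding; the function names are illustrative:

```python
import numpy as np

def recon_cost(x, D):
    # least-squares reconstruction of x from basis D
    # (stand-in for the paper's sparse reconstruction)
    alpha, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ alpha)

def classifier_score(X, y_pred, X_neg):
    """Score a classifier by the criterion above: the subspace of its
    predicted positives should reconstruct known negatives badly."""
    D = X[y_pred > 0].T          # basis columns: predicted positives
    if D.shape[1] == 0:
        return -np.inf           # no predicted positives: invalid
    return np.mean([recon_cost(x, D) for x in X_neg])

# Positives lie on one axis, negatives on another; a classifier that
# picks the true positives scores higher than one that picks negatives.
X = np.array([[1.0, 0, 0], [2, 0, 0], [0, 0, 1], [0, 0, 2]])
X_neg = np.array([[0.0, 0, 1], [0, 0, 3]])
good = classifier_score(X, np.array([1, 1, -1, -1]), X_neg)
bad = classifier_score(X, np.array([-1, -1, 1, 1]), X_neg)
```
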
Iterative Merge Update (IMU):
1. Evaluate the score of each sampled (w, b).
2. Use a GMM to fit the conditional distribution of the parameters given the scores.
3. Evaluate the score of the mean of each Gaussian.
4. Merge the two most similar Gaussians and evaluate the new mean.
5. Loop step 4 until one Gaussian remains. Choose the (w, b) with the best score as the parameters of the class classifier.
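The merge-and-rescore loop can be sketched as below. This is a simplified stand-in for the paper's procedure: it skips the GMM fitting step and treats each sampled candidate as a point-mass Gaussian, repeatedly merging the closest pair of means and keeping the best-scoring parameter seen:

```python
import numpy as np

def imu_search(candidates, score_fn):
    """Simplified Iterative-Merge-Update-style search: merge the two
    closest candidate means until one remains, tracking the best
    score seen (point-mass approximation, no GMM fit)."""
    means = [np.asarray(c, dtype=float) for c in candidates]
    best = max(means, key=score_fn)
    while len(means) > 1:
        # find the most similar (closest) pair of means
        i, j = min(
            ((a, b) for a in range(len(means)) for b in range(a + 1, len(means))),
            key=lambda p: np.linalg.norm(means[p[0]] - means[p[1]]),
        )
        merged = (means[i] + means[j]) / 2.0
        means = [m for k, m in enumerate(means) if k not in (i, j)] + [merged]
        if score_fn(merged) > score_fn(best):
            best = merged
    return best

# Toy usage: score is closeness to a hidden target parameter.
rng = np.random.default_rng(0)
target = np.array([1.0, 2.0])
cands = [target + rng.normal(size=2) for _ in range(6)]
best = imu_search(cands, lambda p: -np.linalg.norm(p - target))
```

Because merged means are also scored, the result is never worse than the best raw candidate.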
- Evaluating classification models instead of training them when no positive samples are available;
- We first learn the class subspace by sparse reconstruction on the classified positive samples, and then evaluate classifiers by sparse reconstruction on the negative samples;
- An Iterative Merge Update (IMU) algorithm to effectively and efficiently search for the best parameters of the classification models.
[Kohli et al., 2009] P. Kohli, L. Ladický, and P.H.S. Torr. Robust higher order potentials for enforcing label consistency. IJCV, 82(3):302–324, 2009.
[Ladicky et al., 2009] L. Ladický, C. Russell, P. Kohli, and P.H.S. Torr. Associative hierarchical CRFs for object class image segmentation. In ICCV, 2009.
[Ladicky et al., 2010] L. Ladický, C. Russell, P. Kohli, and P.H.S. Torr. Graph cut based inference with co-occurrence statistics. In ECCV, 2010.
[Munoz et al., 2010] D. Munoz, J.A. Bagnell, and M. Hebert. Stacked hierarchical labeling. In ECCV, 2010.
[Shotton et al., 2008] J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for image categorization and segmentation. In CVPR, 2008.
[Vezhnevets et al., 2011] A. Vezhnevets, V. Ferrari, and J.M. Buhmann. Weakly supervised semantic segmentation with a multi-image model. In ICCV, 2011.
[Vezhnevets et al., 2012] A. Vezhnevets, V. Ferrari, and J.M. Buhmann. Weakly supervised structured output learning for semantic segmentation. In CVPR, 2012.