# Multi-View Embedding Learning for Incompletely Labeled Data

## Wei Zhang, Ke Zhang, Pan Gu, Xiangyang Xue


### Introduction

With the development of Web 2.0, many content-sharing platforms (e.g., Flickr and PhotoStuff) provide annotation functions that enable users to tag images at any time, from anywhere. As a result, there already exist large numbers of images tagged with words describing the image contents as perceived by human users. These tagged image databases contain rich information about the semantic meanings of the images.

The main characteristics of current multimedia data are as follows.

• Multiple features & high dimensionality
• Multiple concepts & incomplete labels

Figure 1. Example images. First, they can usually be described from different views by multiple features, most of which are high-dimensional; systematically searching through such a high-dimensional space is intractable. Second, most of these images are incompletely labeled: as the examples show, only partial labels are annotated for each image.

### Our Approach

To handle high-dimensional heterogeneous features efficiently, we learn a compact low-dimensional embedding that captures:

#### 1. Feature correlations

We identify the similarity between samples represented by multiple features. Each datum is represented by $J$ heterogeneous features, indexed by $i, j = 1, \ldots, J$.

##### Intra feature
$w_{ii}(u,v)=\exp\{-\gamma\|x_{u}^{(i)}-x_{v}^{(i)}\|^2\}$

##### Inter feature
$w_{ij}(u,v)=\exp\{-\sigma\|{\Phi^i}^\top x_{u}^{(i)}-{\Phi^j}^\top x_{v}^{(j)}\|^2\}$
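As a minimal NumPy sketch (function names and parameter values are illustrative, and the projections $\Phi^i$, $\Phi^j$ are assumed to be precomputed), the two Gaussian kernels above can be written as:

```python
import numpy as np

def intra_similarity(X, gamma=1.0):
    """w_ii(u, v) = exp{-gamma * ||x_u - x_v||^2} within one
    feature view X of shape (n_samples, d)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def inter_similarity(Xi, Xj, Phi_i, Phi_j, sigma=1.0):
    """w_ij(u, v) = exp{-sigma * ||Phi_i^T x_u - Phi_j^T x_v||^2},
    with Phi_i, Phi_j projecting both views into a shared space."""
    Zi, Zj = Xi @ Phi_i, Xj @ Phi_j
    d2 = (np.sum(Zi ** 2, axis=1)[:, None]
          + np.sum(Zj ** 2, axis=1)[None, :]
          - 2.0 * Zi @ Zj.T)
    return np.exp(-sigma * np.maximum(d2, 0.0))
```

Note that when $\Phi^i = \Phi^j = I$, the inter-feature kernel reduces to the intra-feature one, as expected.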

We use canonical correlation analysis (CCA) to obtain the matrices $\Phi^i$ and $\Phi^j$, which map the heterogeneous features into a uniform space. The similarity between two images across heterogeneous features is calculated as follows:

$p(u\to v|u)p(u)=\sum_{i} \frac{w_{ii}(u, v)}{vol_{ii} V}p(G_{ii})+\sum_{i\neq j}\frac{w_{ij}(u, v)}{vol_{ij} V}p(G_{ij})$

where $vol_{ij} V = \sum_{u\in V,\, v\in V} w_{ij}(u, v)$ is the volume of graph $G_{ij}$, and $p(G_{ij})$ denotes the prior probability of the random walker choosing the graph $G_{ij}$.
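The mixture above can be sketched as follows, assuming the per-graph weight matrices $w_{ij}$ and the priors $p(G_{ij})$ are given (all names are illustrative):

```python
import numpy as np

def fused_similarity(graphs, priors):
    """Combine per-graph kernels into the random-walk joint probability
    p(u -> v | u) p(u): each weight matrix W_ij is normalized by its
    volume vol_ij V = sum_{u,v} w_ij(u, v), then weighted by p(G_ij).

    graphs: dict mapping (i, j) -> (n, n) weight matrix
    priors: dict mapping (i, j) -> prior p(G_ij), summing to 1
    """
    P = None
    for key, W in graphs.items():
        term = priors[key] * W / W.sum()  # p(G_ij) * w_ij / vol_ij V
        P = term if P is None else P + term
    return P
```

Because each normalized matrix sums to one and the priors sum to one, the fused matrix is itself a valid joint distribution over pairs $(u, v)$.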

#### 2. Label correlations

In the real world, semantic concepts usually do not appear independently but occur correlatively. The correlation between concepts $\rho(s,t)$ can be initialized as the harmonic mean of the empirical conditional probabilities:

$\rho(s,t) =\frac{2p(t\lvert s)p(s\lvert t)}{p(t\lvert s)+p(s\lvert t)}$

where the empirical conditional probability $p(t\lvert s)=\frac{\sum_{u=1}^{l}(y^{s}_{u})(y^{t}_{u})}{2\sum_{u=1}^{l}(y^{s}_{u})}$ is derived from the labeled samples and measures the co-occurrence of the concept pair on the given data.
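A minimal sketch of the harmonic-mean initialization, assuming binary label indicators $y_u^s \in \{0,1\}$ over the $l$ labeled samples (for simplicity it uses the plain conditional probability $p(t\mid s) = \sum_u y_u^s y_u^t / \sum_u y_u^s$, without the extra factor of 2 in the formula above):

```python
import numpy as np

def label_correlation(Y):
    """rho(s, t): harmonic mean of the empirical conditional
    probabilities p(t|s) and p(s|t), estimated from a binary label
    matrix Y of shape (n_labeled, c) with entries in {0, 1}."""
    co = Y.T @ Y                 # co-occurrence counts co[s, t]
    counts = np.diag(co)         # occurrences of each concept
    # guard against concepts that never occur in the labeled set
    p_ts = co / np.maximum(counts[:, None], 1)   # p(t|s)
    p_st = co / np.maximum(counts[None, :], 1)   # p(s|t)
    denom = p_ts + p_st
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, 2 * p_ts * p_st / denom, 0.0)
```

By construction the result is symmetric, with $\rho(s,s)=1$ for any concept that appears at least once.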

#### 3. Feature-label associations

By mapping data from multiple feature spaces to the embedding space and on to the concept space, we learn an embedding that preserves the neighborhood context of the original spaces and completes the labels at the same time. There is a semantic gap between the input multi-view feature space and the semantic concept space; the compact embedding space can be regarded as a bridge between them.

#### 4. Objective

where $\textbf{col}(\hat{Y},u)$ denotes the $u$-th column of the matrix $\hat{Y}$, which is the estimated multi-label vector for the sample $x_u$; $\textbf{row}(\hat{Y},s)$ denotes the $s$-th row of the matrix $\hat{Y}$, which indicates, for the $s$-th concept, which samples are positive and which are negative. Solving the objective directly is difficult, so we relax the domain of $\hat{Y}$ from $\{-1,1\}^{c\times n}$ to $[-1,1]^{c\times n}$. $\rho(s,t)$ captures the correlation between concepts $s$ and $t$; $\theta_1$ and $\theta_2$ are tradeoff parameters.
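Once the relaxed estimate $\hat{Y}\in[-1,1]^{c\times n}$ is obtained, hard multi-labels can be recovered by simple sign thresholding; a hypothetical sketch (the threshold value is an illustrative choice, not specified here):

```python
import numpy as np

def round_labels(Y_hat, threshold=0.0):
    """Map the relaxed estimate Y_hat in [-1, 1]^{c x n} back to hard
    multi-labels in {-1, 1}: positive above the threshold, else negative."""
    return np.where(Y_hat > threshold, 1, -1)
```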

### Experiments

We experimentally evaluate the performance of the proposed method, denoted 'ours', and compare it with the state-of-the-art methods WELL [Sun et al., 2010] and PU WLR [Lee and Liu, 2003]. PU WLR [Lee and Liu, 2003] learns with Positive and Unlabeled data using Weighted Logistic Regression; WELL (WEak Label Learning) [Sun et al., 2010] is designed for incompletely labeled datasets. Furthermore, we also evaluate a degenerated version of our method, denoted 'ours−', where the multiple heterogeneous features are simply concatenated into a high-dimensional vector without capturing inter-feature correlations. We conduct experimental evaluations on three image datasets: MSRC, LabelMe [Russell et al., 2008] and NUS-WIDE [Chua et al., 2009].

Figure 2. Performance of the proposed method in comparison with the state of the art on the MSRC (left) and LabelMe (right) image datasets. The dimension of the learned embedding is set to r = 64.

Figure 3. Left: Performance of the proposed method in comparison with the state of the art on the NUS-WIDE image dataset; the dimension of the learned embedding is set to r = 64.
Right: Performance of the proposed method 'ours' and the degenerated version 'ours−' vs. the dimension of the embedding space on the MSRC image dataset.

### Conclusion

In this paper we propose a novel method to learn a compact embedding that simultaneously captures inter-feature correlations, inter-label correlations, and feature-label associations from multi-view, incompletely labeled data. By mapping data from multiple feature spaces to the embedding space and on to the concept space, we learn an embedding that preserves the neighborhood context of the original spaces and completes the labels at the same time. There is a semantic gap between the input multi-view feature spaces and the semantic concept space; the compact embedding space can be regarded as a bridge between them.

### References

[Lee and Liu, 2003] Wee Sun Lee and Bing Liu. Learning with positive and unlabeled examples using weighted logistic regression. In ICML, 2003.

[Sun et al., 2010] Yu-Yin Sun, Yin Zhang, and Zhi-Hua Zhou. Multi-label learning with weak label. In AAAI, 2010.
