Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Computer Sciences | Databases and Information Systems | Data Science
In real world applications, data sets are often comprised of multiple views, which provide consensus and complementary information to each other. Embedding learning is an effective strategy for nearest neighbour search and dimensionality reduction in large data sets. This paper attempts to learn a unified probability distribution of the points across different views and generates a unified embedding in a low-dimensional space to optimally preserve neighbourhood identity. Probability distributions generated for each point for each view are combined by conflation method to create a single unified distribution. The goal is to approximate this unified distribution as much as possible when a similar operation is performed on the embedded space. As a cost function, the sum of Kullback-Leibler divergence over the samples is used, which leads to a simple gradient adjusting the position of the samples in the embedded space. The proposed methodology can generate embedding from both complete and incomplete multi-view data sets. Finally, a multi-objective clustering technique (AMOSA) is applied to group the samples in the embedded space. The proposed methodology, Multi-view Neighbourhood Embedding (MvNE), shows an improvement of approximately 2−3% over state-of-the-art models when evaluated on 10 omics data sets.
Mitra, S., Saha, S. & Hasanuzzaman, M. Multi-view clustering for multi-omics data using unified embedding. Sci Rep 10, 13654 (2020). https://doi.org/10.1038/s41598-020-70229-1