Abstract
Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multitask clustering, and information retrieval. However, diversity among the data sources may outweigh the advantages of joint modeling and thus degrade performance. To this end, we propose a regularized shared subspace learning framework, which can exploit the mutual strengths of related data sources while being immune to the effects of the variabilities of each source. This is achieved by imposing a mutual orthogonality constraint on the constituent subspaces, which segregates the common patterns from the source-specific patterns and thus avoids performance degradation. Our approach is rooted in nonnegative matrix factorization and extends it to enable joint analysis of related data sources. Experiments performed using three real-world data sets for both retrieval and clustering applications demonstrate the benefits of regularization and validate the effectiveness of the model. Our proposed solution provides a formal framework for jointly analyzing related data sources and is therefore applicable to a wider context in data mining.
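The idea of a shared subspace kept apart from source-specific subspaces can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the abstract does not give the objective or the update rules, so the sketch below assumes two sources, a squared Frobenius reconstruction loss, a soft penalty on the cross-products of the basis matrices in place of a hard orthogonality constraint, and standard multiplicative updates. The function name `shared_subspace_nmf` and all parameter names (`k_shared`, `k1`, `k2`, `alpha`) are hypothetical.

```python
import numpy as np


def shared_subspace_nmf(X1, X2, k_shared, k1, k2, alpha=1.0, n_iter=200, seed=0):
    """Illustrative joint NMF: X1 ~ [W, V1] @ H1 and X2 ~ [W, V2] @ H2.

    W is a basis shared by both sources; V1 and V2 are source-specific bases.
    A penalty alpha * (||W^T V1||^2 + ||W^T V2||^2 + ||V1^T V2||^2) discourages
    overlap between the three bases (an assumed soft form of orthogonality).
    """
    rng = np.random.default_rng(seed)
    eps = 1e-9  # avoids division by zero in the multiplicative updates
    m = X1.shape[0]
    W = rng.random((m, k_shared))
    V1 = rng.random((m, k1))
    V2 = rng.random((m, k2))
    H1 = rng.random((k_shared + k1, X1.shape[1]))
    H2 = rng.random((k_shared + k2, X2.shape[1]))

    def objective(W, V1, V2, H1, H2):
        fit = (np.linalg.norm(X1 - np.hstack([W, V1]) @ H1) ** 2
               + np.linalg.norm(X2 - np.hstack([W, V2]) @ H2) ** 2)
        penalty = (np.linalg.norm(W.T @ V1) ** 2
                   + np.linalg.norm(W.T @ V2) ** 2
                   + np.linalg.norm(V1.T @ V2) ** 2)
        return fit + alpha * penalty

    history = [objective(W, V1, V2, H1, H2)]
    for _ in range(n_iter):
        # Split each encoding into rows for the shared and the specific basis.
        Hs1, Hv1 = H1[:k_shared], H1[k_shared:]
        Hs2, Hv2 = H2[:k_shared], H2[k_shared:]

        # Shared basis: fitted to both sources, pushed away from V1 and V2.
        R1 = W @ Hs1 + V1 @ Hv1
        R2 = W @ Hs2 + V2 @ Hv2
        W *= (X1 @ Hs1.T + X2 @ Hs2.T) / (
            R1 @ Hs1.T + R2 @ Hs2.T
            + alpha * (V1 @ (V1.T @ W) + V2 @ (V2.T @ W)) + eps)

        # Source-specific bases: each fitted to its own source only.
        R1 = W @ Hs1 + V1 @ Hv1
        V1 *= (X1 @ Hv1.T) / (
            R1 @ Hv1.T + alpha * (W @ (W.T @ V1) + V2 @ (V2.T @ V1)) + eps)
        R2 = W @ Hs2 + V2 @ Hv2
        V2 *= (X2 @ Hv2.T) / (
            R2 @ Hv2.T + alpha * (W @ (W.T @ V2) + V1 @ (V1.T @ V2)) + eps)

        # Encodings: standard NMF multiplicative update with the bases fixed.
        B1 = np.hstack([W, V1])
        B2 = np.hstack([W, V2])
        H1 *= (B1.T @ X1) / (B1.T @ B1 @ H1 + eps)
        H2 *= (B2.T @ X2) / (B2.T @ B2 @ H2 + eps)

        history.append(objective(W, V1, V2, H1, H2))
    return W, V1, V2, H1, H2, history


# Usage on synthetic nonnegative data (features x samples for each source).
rng = np.random.default_rng(1)
X1 = rng.random((30, 4)) @ rng.random((4, 20))
X2 = rng.random((30, 4)) @ rng.random((4, 25))
W, V1, V2, H1, H2, history = shared_subspace_nmf(
    X1, X2, k_shared=2, k1=2, k2=2, alpha=0.5, n_iter=50)
```

In this sketch, setting `alpha=0` reduces the model to an unregularized shared-subspace factorization, which gives a simple baseline for seeing what the penalty changes. Because the penalty can also be lowered by rescaling the bases against the encodings, a practical version would likely normalize the basis columns; that step is omitted here for brevity.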
Original language | English |
---|---|
Pages (from-to) | 57-97 |
Number of pages | 41 |
Journal | Data Mining and Knowledge Discovery |
Volume | 26 |
Issue number | 1 |
Publication status | Published - Jan 2013 |
Externally published | Yes |
Keywords
- Auxiliary sources
- Multi-task clustering
- Nonnegative shared subspace learning
- Transfer learning