Learning preference models from human-generated data is an important task in modern information processing systems. A popular setting consists of simple ratings: numerical values assigned to objects to indicate their relevance to a specific query. Since ratings are often drawn from a small range, several objects may receive the same rating, creating ties among objects for a given query. Dealing with this phenomenon poses the general problem of modelling query-specific preferences in the presence of ties. To this end, we present a novel approach that constructs probabilistic models directly on the collection of objects, exploiting the combinatorial structure induced by the ties among them. The proposed probabilistic setting allows exploration of a super-exponential combinatorial state-space with an unknown number of partitions and an unknown order among them. Learning and inference in such a large state-space are challenging, yet we present efficient algorithms for both tasks. Our approach exploits discrete choice theory, imposing a generative process in which the finite set of objects is partitioned into subsets in a stagewise procedure, thereby reducing the state-space significantly at each stage. Efficient Markov chain Monte Carlo algorithms are then presented for the proposed models. We demonstrate that the models can potentially be trained in a large-scale setting of hundreds of thousands of objects using an ordinary computer. In fact, in some special cases with appropriate model specification, our models can be learned in linear time. We evaluate the models on two application areas: (i) document ranking with data from the Yahoo! challenge and (ii) collaborative filtering with movie data. We demonstrate that the models are competitive against state-of-the-art methods.
- Collaborative filtering
- Preference learning
- Probabilistic ordered partition model
- Probabilistic reasoning
- Set-based ranking
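
To make the notion of ties and the size of the state-space concrete, the following is a minimal illustrative sketch and not the model proposed in this paper. It groups the objects of one query into an ordered partition by rating, and counts how many ordered partitions exist for n objects (the ordered Bell, or Fubini, numbers). The function names and the toy ratings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import comb


def ordered_partition(ratings):
    """Group object ids by rating, highest rating first.

    Objects sharing a rating are tied and land in the same subset.
    `ratings` maps object id -> numerical rating (toy input format).
    """
    groups = defaultdict(list)
    for obj, rating in ratings.items():
        groups[rating].append(obj)
    return [sorted(groups[r]) for r in sorted(groups, reverse=True)]


def count_ordered_partitions(n):
    """Number of ordered partitions of n labelled objects.

    Uses the recurrence a(m) = sum_{k=1..m} C(m, k) * a(m - k):
    choose the k objects of the first subset, then order the rest.
    """
    a = [1] + [0] * n
    for m in range(1, n + 1):
        a[m] = sum(comb(m, k) * a[m - k] for k in range(1, m + 1))
    return a[n]


# Toy query with four documents rated on a three-level scale.
toy_ratings = {"d1": 2, "d2": 0, "d3": 2, "d4": 1}
partition = ordered_partition(toy_ratings)
```

Here `d1` and `d3` are tied in the top subset. The count grows faster than exponentially in n, which is why enumerating the state-space directly is infeasible and a stagewise procedure that removes one subset at a time is attractive.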