Abstract
Tremendous amounts of expensive annotated data are a vital ingredient for state-of-the-art 3D hand pose estimation. Synthetic data, for which annotations come for free, has therefore become popular. However, models trained only on synthetic samples do not generalize to real data, mainly because of the gap between the synthetic and real data distributions. In this paper, we propose a novel method that predicts the 3D position of the hand using both synthetic and partially labeled real data. To this end, we form a shared latent space between three modalities: synthetic depth images, real depth images, and poses. We demonstrate that by carefully learning this shared latent space, we can obtain a regression model that generalizes to real data. Our method thus produces accurate predictions in both semi-supervised and unsupervised settings. In addition, the proposed model can generate novel, meaningful, and consistent samples in all three domains. We evaluate our method qualitatively and quantitatively on two highly competitive benchmarks (NYU and ICVL) and demonstrate its superiority over state-of-the-art methods. The source code will be made available at https://github.com/masabdi/LSPS.
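The core idea in the abstract, namely three modality-specific encoders mapping into one shared latent space with a single pose regressor reading from it, can be illustrated with a minimal NumPy sketch. All dimensions, the linear-encoder choice, and the function names are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT = 64      # shared latent dimension (illustrative)
IMG = 128 * 128  # flattened depth-image size (illustrative)
POSE = 14 * 3    # 14 joints x 3D coordinates (illustrative)

# One encoder per modality, all projecting into the SAME latent space.
W_synth = rng.standard_normal((IMG, LATENT)) * 0.01   # synthetic depth -> latent
W_real = rng.standard_normal((IMG, LATENT)) * 0.01    # real depth -> latent
W_pose = rng.standard_normal((POSE, LATENT)) * 0.01   # pose -> latent

# A single regressor maps the shared latent code to a 3D pose, so it
# applies unchanged no matter which modality produced the code.
W_reg = rng.standard_normal((LATENT, POSE)) * 0.01

def encode(x, W):
    """Modality-specific encoding into the shared latent space."""
    return np.tanh(x @ W)

def predict_pose(depth_image, W_enc):
    z = encode(depth_image, W_enc)  # image -> shared latent
    return z @ W_reg                # shared latent -> pose

synthetic = rng.standard_normal((4, IMG))  # a batch of synthetic depth maps
real = rng.standard_normal((4, IMG))       # a batch of real depth maps

pose_from_synth = predict_pose(synthetic, W_synth)
pose_from_real = predict_pose(real, W_real)
print(pose_from_synth.shape, pose_from_real.shape)  # (4, 42) (4, 42)
```

Training such a model would additionally need losses that align the three encoders' latent distributions (the paper's "carefully learning the shared latent space"); the sketch only shows the shape of the inference path.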
Original language | English |
---|---|
Title of host publication | 29th British Machine Vision Conference, BMVC 2018 |
Place of Publication | London, UK |
Publisher | British Machine Vision Association |
Publication status | Published - 2019 |
Externally published | Yes |
Event | British Machine Vision Conference 2018 - Newcastle, United Kingdom |
Duration | 3 Sept 2018 → 6 Sept 2018 |
Conference number | 29th |
Internet address | http://bmvc2018.org/, https://dblp.org/db/conf/bmvc/bmvc2018.html |
Conference
Conference | British Machine Vision Conference 2018 |
---|---|
Abbreviated title | BMVC 2018 |
Country/Territory | United Kingdom |
City | Newcastle |
Period | 3/09/18 → 6/09/18 |