Deep Autoencoders for Cross-Modal Retrieval
Abstract
The increased accuracy and affordability of depth sensors such as the Kinect have created a rich source of depth data for 3D processing. In particular, 3D model retrieval is attracting attention in computer vision and pattern recognition due to its numerous applications. Cross-domain retrieval approaches such as depth-image-based 3D model retrieval face the challenges of occlusion, noise, and view variability present in both query and training data. In this research, we propose a new supervised deep autoencoder approach, followed by semantic modeling, to retrieve 3D shapes based on depth images. The key novelty is a two-fold feature abstraction that copes with the incompleteness and ambiguity of depth images. First, we develop a supervised autoencoder to extract robust features from both real depth images and synthetic ones rendered from 3D models; the network is trained to balance the reconstruction and classification capabilities of mixed-domain data. We investigate the relation between encoder and decoder layers in a deep autoencoder and argue that an asymmetric structure in a supervised deep autoencoder extracts more robust features than a symmetric one. The asymmetric deep autoencoder features are less sensitive to small sample changes in mixed-domain data. In addition, semantic modeling of the supervised autoencoder features offers a further level of abstraction against the incompleteness and ambiguity of the depth data. Interestingly, unlike pairwise model structures, cross-domain retrieval remains possible using a single deep network trained on both real and synthetic data. Experimental results on the NYUD2 and ModelNet10 datasets demonstrate that the proposed supervised method outperforms recent approaches for cross-modal 3D model retrieval.
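The abstract's core idea, a supervised autoencoder whose loss balances reconstruction against classification, and whose encoder and decoder depths differ (the "asymmetric" structure), can be sketched as follows. This is a minimal, hypothetical NumPy illustration of the concept only: the layer sizes, the weighting factor `lam`, and all function names are assumptions, not the dissertation's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical layer sizes: a deeper encoder (three layers) paired with a
# shallower decoder (one layer) -- an "asymmetric" structure in the sense
# described in the abstract.
enc_sizes = [784, 512, 256, 64]   # input -> latent code
dec_sizes = [64, 784]             # latent code -> reconstruction
n_classes = 10

enc_W = [rng.standard_normal((a, b)) * 0.01
         for a, b in zip(enc_sizes[:-1], enc_sizes[1:])]
dec_W = [rng.standard_normal((a, b)) * 0.01
         for a, b in zip(dec_sizes[:-1], dec_sizes[1:])]
cls_W = rng.standard_normal((enc_sizes[-1], n_classes)) * 0.01

def forward(x):
    """Encode, then decode and classify from the shared latent code."""
    h = x
    for W in enc_W:
        h = relu(h @ W)          # encoder layers
    recon = h
    for W in dec_W:
        recon = recon @ W        # decoder layers (linear here for brevity)
    probs = softmax(h @ cls_W)   # classification head on the latent code
    return h, recon, probs

def supervised_ae_loss(x, y_onehot, lam=0.5):
    """Joint objective: reconstruction MSE + lam * cross-entropy (lam assumed)."""
    _, recon, probs = forward(x)
    rec = np.mean((recon - x) ** 2)
    ce = -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
    return rec + lam * ce

# Toy mixed-domain batch: stand-ins for real and rendered depth images.
x = rng.standard_normal((4, 784))
y = np.eye(n_classes)[[0, 1, 2, 3]]
loss = supervised_ae_loss(x, y)
print(float(loss))
```

The single joint loss is what lets one network serve both domains: real and rendered depth images pass through the same encoder, so their latent codes land in a shared space that a retrieval step can compare directly.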
Collections
- OSU Dissertations