Referring expression image segmentation aims to segment the object described by a natural language query. Because visual content and language descriptions are highly diverse, it is very challenging to accurately model the correspondence between vision and language, which inevitably leads to some segmentations that do not match the queries. In this paper, we propose a query reconstruction network (QRN) to build more consistent correspondences between language queries and object segmentation results. QRN not only generates segmentations from the queries and images but also reconstructs the queries from the segmentations and the images. Through query reconstruction, QRN can check the vision-language consistency between the segmentations and the queries. At inference, we propose an iterative segmentation correction (ISC) method to correct segmentations that are inconsistent with their queries. ISC uses the difference between the reconstructed and input queries as a loss to optimize QRN, which then generates new segmentations and reconstructed queries. Through iterative optimization, the segmentations are gradually corrected. Extensive experiments on four referring expression image segmentation datasets demonstrate the effectiveness of the proposed method.
- Image segmentation
- Referring expression image segmentation
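
The iterative correction idea summarized in the abstract (use the gap between the reconstructed and input queries as a loss, update the model, then regenerate the segmentation) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and is not the paper's architecture: the two linear maps `seg_weights` and `recon_weights` stand in for the segmentation and reconstruction branches, the query is a small vector, the image input is omitted, and only the segmentation weights are updated.

```python
# Toy sketch of inference-time correction driven by a query-reconstruction loss.
# The linear "networks", names, shapes, and hyperparameters are hypothetical
# stand-ins, not the QRN/ISC implementation.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def reconstruction_loss(recon, query):
    """Squared difference between the reconstructed and the input query."""
    return sum((r - q) ** 2 for r, q in zip(recon, query))

def iterative_correction(seg_weights, recon_weights, query,
                         lr=0.05, steps=50, tol=1e-3):
    """Refine the segmentation weights at inference time by gradient descent."""
    w = [row[:] for row in seg_weights]  # copy so the caller's weights stay intact
    losses = []
    for _ in range(steps):
        seg = matvec(w, query)              # query -> segmentation scores
        recon = matvec(recon_weights, seg)  # segmentation -> reconstructed query
        loss = reconstruction_loss(recon, query)
        losses.append(loss)
        if loss < tol:                      # already consistent: stop correcting
            break
        # Backpropagate the squared error through the two linear maps by hand.
        grad_recon = [2.0 * (r - q) for r, q in zip(recon, query)]
        grad_seg = [
            sum(recon_weights[k][i] * grad_recon[k] for k in range(len(grad_recon)))
            for i in range(len(seg))
        ]
        for i in range(len(w)):
            for j in range(len(query)):
                w[i][j] -= lr * grad_seg[i] * query[j]
    return matvec(w, query), losses

# Hypothetical toy inputs: a 3-dimensional query and 2 segmentation scores.
QUERY = [1.0, 0.0, 1.0]
SEG_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
RECON_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

corrected_seg, loss_history = iterative_correction(SEG_WEIGHTS, RECON_WEIGHTS, QUERY)
```

In this toy setting the reconstruction loss shrinks from one iteration to the next, which mirrors the gradual correction described above. A real system would replace the hand-written gradients with automatic differentiation and would condition both branches on the image.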