Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension
Published in ICCV, 2025
Recommended citation: Chen, J., Shen, W., Wei, Z., Sun, L., & Zhang, H. Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension. In ICCV 2025. https://openaccess.thecvf.com/content/ICCV2025/papers/Chen_Leveraging_Debiased_Cross-modal_Attention_Maps_and_Code-based_Reasoning_for_Zero-shot_ICCV_2025_paper.pdf
Abstract. This paper proposes a zero-shot method for referring expression comprehension that combines debiased cross-modal attention maps with code-based relation reasoning to improve region-text alignment and spatial understanding.
Authors: Juntao Chen, Wen Shen†, Zhihua Wei†, Lijun Sun†, Hongyun Zhang.
