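Imagination Helps Visual Reasoning, But Not Yet in Latent Space

Date: 2026-02-27
Fetched: 2026-02-28T01:46:48.584684+00:00

Authors: You Li, Chi Chen, Yanghao Li, Fanhu Zeng, Kaiyu Huang, Jinan Xu, Maosong Sun

Abstract: The study shows that latent visual reasoning in multimodal models suffers from two disconnects, one between the input and the latent space and one between the latent space and the answer. It then proposes CapImagine, a text-based method that outperforms complex latent-space methods.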