
Overview

  • Date: 2026-02-27
  • Total Papers: 28
  • Total Upvotes: 619
  • Papers with GitHub: 16

Key Takeaways

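  1. Research on the trinity of consistency for General World Models drew the most community attention.
  2. Diagnostic-Driven Iterative Training has emerged as a key paradigm for improving the performance of large multimodal models (LMMs).
  3. Multimodal agent benchmarks built around real-world scenarios (MobilityBench, OmniGAIA) are helping to mature the evaluation of embodied intelligence.
  4. Agent system optimization is following diverse paths, covering memory augmentation, pruning of multi-agent information flow, and more efficient long-horizon search.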

Notable Papers

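  • [2602.23152] The Trinity of Consistency as a Defining Principle for General World Models (👍168): Proposes three consistency principles for world models (modal, spatial, and temporal) and establishes an evaluation benchmark.
  • [2602.22859] From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models (👍142): Uses a diagnostic-driven progressive evolution mechanism to iterate continually on large models and repair their blind spots.
  • [2602.22638] MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios (👍87): Builds a scalable benchmark for real-world mobility scenarios to evaluate LLM-based route-planning agents.
  • [2602.22897] OmniGAIA: Towards Native Omni-Modal AI Agents (👍46): Establishes a benchmark for complex reasoning across video, audio, and images, and improves the tool-use capability of the OmniAtlas agent.
  • [2602.22766] Imagination Helps Visual Reasoning, But Not Yet in Latent Space (👍32): Reveals a disconnect between the input and the latent space in the latent visual reasoning of multimodal models, and proposes CapImagi as an improvement.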

Date: 2026-02-27 | Source: moonshotai/kimi-k2.5

2026-02-27

The Trinity of Consistency as a Defining Principle for General World Models

Authors: Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang, Hexuan Jin, Caijun Jia, Honghao He, Xinglong Xu, Xi bai, Chang Yu, Yumou Liu, Junnan Zhu, Xuanhe Zhou, Jintao Chen, Xiaobin Hu, Shancheng Pang, Bihui Yu, Ran He, Zhen Lei, Stan Z. Li, Conghui He

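Argues that world models need three consistency principles (modal, spatial, and temporal) to achieve general artificial intelligence, and proposes a benchmark for evaluating multimodal learning systems.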

OmniGAIA: Towards Native Omni-Modal AI Agents

Authors: Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Shijian Wang, Guanting Dong, Jiajie Jin, Hao Wang, Yinuo Wang, Ji-Rong Wen, Yuan Lu, Zhicheng Dou

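The OmniGAIA benchmark evaluates multimodal agents on complex reasoning tasks across video, audio, and image modalities, while the OmniAtlas agent improves its tool use through hindsight-guided tree exploration and OmniDPO fine-tuning.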

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Authors: Qianben Chen, Tianrui Qin, King Zhu, Qiexiang Wang, Chengjun Yu, Shu Xu, Jiaqi Wu, Jiayu Zhang, Xinpeng Liu, Xin Gui, Jingyi Cao, Piaohong Wang, Dingfeng Shi, He Zhu, Tiannan Wang, Yuqing Wang, Maojia Song, Tianyu Zheng, Ge Zhang, Jian Yang, Jiaheng Liu, Minghao Liu

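A deep learning framework called SMTL makes long-horizon agentic search more efficient by replacing sequential reasoning with parallel evidence acquisition, achieving state-of-the-art performance on multiple research benchmarks while reducing reasoning steps by 70.7%.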

MediX-R1: Open Ended Medical Reinforcement Learning

Authors: Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Omair Mohamed, Mohamed Zidan, Fahad Khan, Salman Khan, Rao Anwer, Hisham Cholakkal

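MediX-R1 presents an open-ended reinforcement learning framework for medical multimodal large language models. It uses diverse reward signals and LLM-based evaluation to improve clinical reasoning beyond multiple-choice formats.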

VGG-T^3: Offline Feed-Forward 3D Reconstruction at Scale

Authors: Sven Elflein, Ruilong Li, Sérgio Agostinho, Zan Gojcic, Laura Leal-Taixé, Qunjie Zhou, Aljosa Osep

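VGG-T³ addresses the scalability problem in 3D reconstruction by using test-time training to convert variable-length key-value representations into a fixed-size MLP. It scales linearly with the number of input views and runs substantially faster than conventional softmax attention methods.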

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Authors: Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L. Griffiths, Katherine M. Collins, José Hernández-Orallo, Phillip Isola, Samuel J. Gershman, Joshua B. Tenenbaum

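AI systems are evaluated on a wide range of human-designed games to measure general intelligence. The results show a significant performance gap relative to human players, especially on complex cognitive tasks.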

veScale-FSDP: Flexible and High-Performance FSDP at Scale

Authors: Zezhou Wang, Youjie Li, Zhiqi Lin, Jiacheng Yang, Cong Xie, Guanyu Feng, Zheng Zhong, Ziyue Huang, Hongyu Zhu, Zhi Zhang, Yanghua Peng, Xin Liu

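veScale-FSDP presents a redesigned fully sharded data parallel (FSDP) system that uses flexible sharding and structure-aware planning to improve the scalability and efficiency of large-scale model training.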

General Agent Evaluation

Authors: Elron Bandel, Asaf Yehudai, Lilach Eden, Yehoshua Sagron, Yotam Perlitz, Elad Venezian, Natalia Razinkov, Natan Ergas, Shlomit Shachor Ifergan, Segev Shlomov, Michal Jacovi, Leshem Choshen, Liat Ein-Dor, Yoav Katz, Michal Shmueli-Scheuer

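Despite some promising implementations, general-purpose agents remain underdeveloped and need systematic evaluation frameworks and benchmarks to assess their true generality across diverse environments.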

GeoWorld: Geometric World Models

Authors: Zeyu Zhang, Danning Li, Ian Reid, Richard Hartley

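GeoWorld addresses the limitations of energy-based predictive world models by using hyperbolic geometry to preserve latent state structure and improve long-horizon prediction.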

No One Size Fits All: QueryBandits for Hallucination Mitigation

Authors: Nicole Cho, William Watson, Alec Koppel, Sumitra Ganesh, Manuela Veloso

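Introduces QueryBandits, a contextual bandit framework that adaptively selects the best query-rewriting strategy to reduce hallucinations in large language models. It outperforms static strategies and can be deployed alongside closed-source models.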
