Education
Research Experience
- May 2024 ~ Sep. 2025: RA at Shanghai Jiao Tong University, advised by Weinan Zhang. Focus on Multi-Agent Reinforcement Fine-Tuning, Deep Reinforcement Learning, Large Decision/Action Models, and Agent Technology.
- Sep. 2023 ~ Mar. 2024: RA at Tsinghua University, advised by Ju Ren. Focus on Deep Reinforcement Learning and RLHF/RLAIF.
Publications
Is monolithic scaling the only path to AGI? This paper challenges the dogma that scaling a single model alone suffices to achieve universal super-intelligence. Instead, we identify Agentic AI as the necessary evolution for handling the complex, real-world task distributions required to achieve AGI. Through concrete theoretical derivations, we contrast the optimization constraints of monolithic learners with the efficiency of Agentic systems, evolving from simple routing mechanisms to general Directed Acyclic Graphs (DAGs) of Agents. We demonstrate that Agentic AI offers superior generalization and efficiency. Finally, we reinterpret the instability of current multi-agent frameworks and call for further research on Agentic AI.
Robust Function-Calling for On-Device Language Model via Function Masking
Qiqiang Lin*, Muning Wen*, Qiuying Peng*, Guanyu Nie, Junwei Liao, Jun Wang, Xiaoyun Mo, Jiamu Zhou, Cheng Cheng, Yin Zhao, Jun Wang, Weinan Zhang
The Thirteenth International Conference on Learning Representations (ICLR), 2025 (Spotlight)
OpenReview / arXiv / code / html / dataset and models
In this paper, we introduce Hammer, a novel family of foundation models specifically engineered for on-device function calling. Hammer employs an augmented dataset that enhances models' sensitivity to irrelevant functions and incorporates function masking techniques to minimize the influence of misleading function names.
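The function-masking idea can be illustrated with a minimal sketch: replace tool names with neutral placeholders before prompting, so the model must rely on descriptions rather than possibly misleading names. The schema format, placeholder scheme, and helper names below are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of function masking for tool-calling prompts (assumed schema format).

def mask_functions(tools):
    """Replace function names with neutral placeholders so the model
    relies on descriptions, not (possibly misleading) names."""
    masked, name_map = [], {}
    for i, tool in enumerate(tools):
        placeholder = f"func_{i}"
        name_map[placeholder] = tool["name"]
        masked.append({"name": placeholder, "description": tool["description"]})
    return masked, name_map

def unmask_call(call_name, name_map):
    """Map the model's placeholder choice back to the real function."""
    return name_map.get(call_name, call_name)

tools = [
    {"name": "get_weather", "description": "Return the forecast for a city."},
    {"name": "send_email", "description": "Send an email to a recipient."},
]
masked, name_map = mask_functions(tools)
assert unmask_call("func_1", name_map) == "send_email"
```

The model only ever sees `func_0`, `func_1`, ...; the mapping is applied after decoding to recover the real call.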
Preprints
MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs
Junwei Liao, Haoting Shi, Ruiwen Zhou, Jiaqian Wang, Shengtao Zhang, Wei Zhang, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Bo Tang, Muning Wen
arXiv, 2026
arXiv / code
We introduce MemQ, the first provenance-based credit assignment method for episodic memory valuation. MemQ applies TD(λ) eligibility traces to memory Q-values, propagating credit backward through a provenance DAG that records memory retrieval dependencies. We formalize the setting as an Exogenous-Context MDP and achieve the highest success rate across all six benchmarks spanning OS interaction, function calling, code generation, multimodal reasoning, embodied reasoning, and expert-level QA.
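The backward credit propagation described above can be sketched as an eligibility-trace-style walk over the provenance DAG. The graph layout, the λ and α values, and the exact update rule below are illustrative assumptions rather than MemQ's actual algorithm.

```python
# Sketch: propagating reward credit backward through a provenance DAG
# of memory entries, decaying eligibility by `lam` per edge.
from collections import deque

def propagate_credit(parents, q_values, terminal, reward, lam=0.8, alpha=0.1):
    """Walk ancestors of `terminal` (the retrieved memory that produced
    `reward`) and nudge each Q-value toward the reward, scaled by trace."""
    trace = {terminal: 1.0}
    order = deque([terminal])
    while order:
        node = order.popleft()
        e = trace[node]
        # TD-style update toward the observed reward, weighted by eligibility.
        q_values[node] += alpha * e * (reward - q_values[node])
        for p in parents.get(node, []):
            if p not in trace:  # DAG: visit each ancestor once in this sketch
                trace[p] = e * lam
                order.append(p)
    return q_values

# m2 was derived from m1; m1 from m0.
parents = {"m2": ["m1"], "m1": ["m0"]}
q = propagate_credit(parents, {"m0": 0.0, "m1": 0.0, "m2": 0.0}, "m2", 1.0)
assert q["m2"] > q["m1"] > q["m0"] > 0.0
```

The memory that directly produced the outcome receives the most credit, with exponentially less flowing to the memories it was derived from.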
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Yutao Qi, Bo Tang, Muning Wen
arXiv / code
While Large Language Models possess strong reasoning capabilities, they struggle to self-evolve from experience: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a framework that enables agents to self-evolve via non-parametric reinforcement learning on episodic memory. MemRL explicitly separates the stable reasoning of a frozen LLM from the plastic, evolving memory. Unlike traditional methods, MemRL employs a Two-Phase Retrieval mechanism that first filters candidates by semantic relevance and then selects among them based on learned Q-values (utility). These utilities are continuously refined via environmental feedback in a trial-and-error manner, allowing the agent to distinguish high-value strategies from semantically similar noise.
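The two-phase mechanism can be sketched as a semantic filter followed by utility-based selection. The cosine similarity, the memory record format, and the cutoff parameters below are illustrative assumptions, not MemRL's actual retrieval pipeline.

```python
# Sketch of two-phase retrieval: semantic filtering, then Q-value selection.

def two_phase_retrieve(query_vec, memories, k_semantic=3, k_final=1):
    """Phase 1: keep the k most semantically similar memories.
    Phase 2: among those, pick the ones with the highest learned Q-value."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    # Phase 1: semantic relevance filter.
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]),
                    reverse=True)
    candidates = ranked[:k_semantic]
    # Phase 2: utility-based selection via learned Q-values.
    return sorted(candidates, key=lambda m: m["q"], reverse=True)[:k_final]

memories = [
    {"id": "a", "vec": [1.0, 0.0], "q": 0.2},
    {"id": "b", "vec": [0.9, 0.1], "q": 0.9},  # relevant AND high utility
    {"id": "c", "vec": [0.0, 1.0], "q": 1.0},  # high utility but irrelevant
]
best = two_phase_retrieve([1.0, 0.0], memories, k_semantic=2, k_final=1)
assert best[0]["id"] == "b"
```

Note how memory "c" is excluded despite its top Q-value: the semantic filter runs first, so utility only ranks among relevant candidates.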
A Survey of AI Agent Protocols
Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li, Junwei Liao, Haoyi Hu, Jianghao Lin, Gaowei Chang, Weiwen Liu, Ying Wen, Yong Yu, Weinan Zhang
arXiv / code
In this paper, we provide a systematic overview of existing communication protocols for LLM agents.
We classify them into four main categories and analyze their trade-offs to help users and developers select
the most suitable protocols for specific applications. Additionally, we conduct a comparative
performance analysis of these protocols across key dimensions such as security, scalability, and
latency. Finally, we explore future challenges, such as how protocols can adapt and survive in fast-evolving
environments, and what qualities future protocols might need to support the next generation of LLM agent
ecosystems. We expect this work to serve as a practical reference for both researchers and engineers seeking
to design, evaluate, or integrate robust communication infrastructures for intelligent agents.
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao, Muning Wen, Jun Wang, Weinan Zhang
arXiv / code
In this article, we present a comprehensive study of LLM-based MARL and propose a novel
paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). Central to this work is a robust,
scalable, and universal algorithmic framework tailored specifically for LLM-based
Multi-Agent Systems (LaMAS).
We detail the core algorithm and provide a complete, open-source implementation to facilitate
adoption and further research. By bridging theoretical underpinnings with practical methodologies,
this work aims to serve as a roadmap for researchers seeking to advance MARFT toward resilient,
adaptive, and human-aligned solutions in agentic systems.