Education
Research Experience
- May 2024 ~ Sep. 2025: RA at Shanghai Jiao Tong University, advised by Weinan Zhang. Focus on Multi-Agent Reinforcement Fine-Tuning, Deep Reinforcement Learning, Large Decision/Action Models, and Agent Technology.
- Sep. 2023 ~ Mar. 2024: RA at Tsinghua University, advised by Ju Ren. Focus on Deep Reinforcement Learning and RLHF/RLAIF.
Publications
Robust Function-Calling for On-Device Language Model via Function Masking
Qiqiang Lin*, Muning Wen*, Qiuying Peng*, Guanyu Nie, Junwei Liao, Jun Wang, Xiaoyun Mo, Jiamu Zhou, Cheng Cheng, Yin Zhao, Jun Wang, Weinan Zhang
The Thirteenth International Conference on Learning Representations (ICLR), 2025 (Spotlight)
OpenReview / arXiv / code / html / dataset and models
In this paper, we introduce Hammer, a novel family of foundation models specifically engineered for on-device function calling. Hammer employs an augmented dataset that enhances models' sensitivity to irrelevant functions and incorporates function masking techniques to minimize the influence of misleading function names.
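As a rough illustration of the masking idea, the sketch below (hypothetical helper names and data layout, not the paper's implementation) anonymizes function names before prompting, so the model must rely on function descriptions rather than potentially misleading name cues, and maps the model's choice back afterwards:

```python
# Minimal sketch of function masking (hypothetical helpers; not the
# paper's actual implementation). Real names are replaced with neutral
# placeholders so the model cannot be misled by naming alone.
def mask_functions(functions):
    """Anonymize function names before they are shown to the model."""
    name_map, masked = {}, []
    for i, fn in enumerate(functions):
        token = f"func_{i}"                  # neutral placeholder name
        name_map[token] = fn["name"]         # remember mapping for un-masking
        masked.append({**fn, "name": token})
    return masked, name_map

def unmask_call(masked_name, name_map):
    """Translate the model's masked call back to the real function name."""
    return name_map.get(masked_name, masked_name)

functions = [
    {"name": "get_weather", "description": "Return the weather for a city."},
    {"name": "send_email", "description": "Send an email to a recipient."},
]
masked, name_map = mask_functions(functions)
assert masked[0]["name"] == "func_0"
assert unmask_call("func_0", name_map) == "get_weather"
```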
Preprints
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Yutao Qi, Bo Tang, Muning Wen
arXiv
While Large Language Models possess strong reasoning capabilities, they struggle to self-evolve: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a framework that enables agents to self-evolve via non-parametric reinforcement learning on episodic memory. MemRL explicitly separates the stable reasoning of a frozen LLM from the plastic, evolving memory. Unlike traditional methods, MemRL employs a Two-Phase Retrieval mechanism that first filters candidates by semantic relevance and then selects among them based on learned Q-values (utility). These utilities are continuously refined via environmental feedback in a trial-and-error manner, allowing the agent to distinguish high-value strategies from semantically similar noise.
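To make the Two-Phase Retrieval concrete, here is a toy sketch (data structures and update rule are our assumptions; MemRL's actual design may differ): phase one filters candidates by semantic similarity, phase two ranks the survivors by learned Q-values, and environment rewards refine those utilities over time.

```python
import numpy as np

class EpisodicMemory:
    """Toy two-phase retrieval over episodic memory (illustrative only)."""

    def __init__(self, alpha=0.1):
        self.entries = []   # each: {"embedding": np.ndarray, "text": str, "q": float}
        self.alpha = alpha  # step size for the utility update

    def add(self, embedding, text):
        self.entries.append({"embedding": embedding, "text": text, "q": 0.0})

    def retrieve(self, query_emb, k_semantic=20, k_final=3):
        # Phase 1: keep the k_semantic entries most similar to the query.
        sims = [float(query_emb @ e["embedding"]) for e in self.entries]
        candidates = sorted(range(len(self.entries)),
                            key=lambda i: -sims[i])[:k_semantic]
        # Phase 2: among those, select by learned utility (Q-value).
        return sorted(candidates, key=lambda i: -self.entries[i]["q"])[:k_final]

    def update(self, indices, reward):
        # Refine utilities from environment feedback, trial-and-error style.
        for i in indices:
            q = self.entries[i]["q"]
            self.entries[i]["q"] = q + self.alpha * (reward - q)
```

After each episode, calling update with the retrieved indices and the episode's reward lets entries that repeatedly help accumulate higher Q-values than semantically similar but unhelpful ones.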
A Survey of AI Agent Protocols
Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li, Junwei Liao, Haoyi Hu, Jianghao Lin, Gaowei Chang, Weiwen Liu, Ying Wen, Yong Yu, Weinan Zhang
arXiv
In this paper, we provide a systematic overview of existing communication protocols for LLM agents. We classify them into four main categories and analyze them to help users and developers select the most suitable protocols for specific applications. Additionally, we conduct a comparative performance analysis of these protocols across key dimensions such as security, scalability, and latency. Finally, we explore future challenges, such as how protocols can adapt and survive in fast-evolving environments, and what qualities future protocols might need to support the next generation of LLM agent ecosystems. We expect this work to serve as a practical reference for both researchers and engineers seeking to design, evaluate, or integrate robust communication infrastructures for intelligent agents.
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao, Muning Wen, Jun Wang, Weinan Zhang
arXiv / code
In this article, we present a comprehensive study of LLM-based multi-agent reinforcement learning (MARL) and propose a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a universal algorithmic framework tailored specifically for LLM-based Multi-Agent Systems (LaMAS). Central to this work is a robust and scalable MARFT framework: we detail the core algorithm and provide a complete, open-source implementation to facilitate adoption and further research. By bridging theoretical underpinnings with practical methodologies, this work aims to serve as a roadmap for researchers seeking to advance MARFT toward resilient, adaptive, and human-aligned solutions in agentic systems.