[Academic Talk] Coordination as Inference in Multi-Agent Reinforcement Learning

2024-06-25

Title: Coordination as Inference in Multi-Agent Reinforcement Learning

Speaker: Prof. Kaile Su (苏开乐), Griffith University, Australia

Time: July 3, 14:00-15:00

Venue: Room 311, Institute of Artificial Intelligence (3rd floor, north wing of the old administration building)

Abstract: The Centralized Training and Decentralized Execution (CTDE) paradigm, in which a centralized critic is allowed to access global information during the training phase while the learned policies are executed in a decentralized way using only local information, has achieved great progress in recent years. Despite this progress, CTDE may suffer from the issue of Centralized-Decentralized Mismatch (CDM): the suboptimality of one agent's policy can degrade the policy learning of other agents through the centralized joint critic. In contrast to centralized learning, the cooperative model that most closely resembles the way humans cooperate in nature is fully decentralized, i.e., Independent Learning (IL). However, two issues must be addressed before agents can coordinate through IL: (1) how agents become aware of the presence of other agents, and (2) how agents coordinate with one another to improve the joint policy under IL. In this paper, we propose an inference-based coordinated MARL method: Deep Motor System (DMS). DMS first presents the idea of individual intention inference, in which agents are allowed to disentangle other agents from their environment. Second, causal inference is introduced to enhance coordination by reasoning about each agent's effect on others' behavior. The proposed model was extensively evaluated on a series of Multi-Agent MuJoCo and StarCraft II tasks. Results show that the proposed method outperforms independent learning algorithms, and that coordination behavior among agents can be learned even without the CTDE paradigm, compared to state-of-the-art baselines including IPPO and HAPPO.
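The CTDE-versus-IL distinction in the abstract can be illustrated with a toy sketch (this is a hypothetical illustration of the two paradigms' information flow, not the speaker's DMS implementation; the threshold policy and critic formulas are made up for demonstration):

```python
class Agent:
    """A decentralized agent whose policy sees only its own observation."""

    def __init__(self, agent_id):
        self.agent_id = agent_id

    def act(self, local_obs):
        # Decentralized execution: the action depends only on local information.
        return 0 if local_obs < 0.5 else 1


def centralized_critic(joint_obs, joint_actions):
    # CTDE training: the critic scores the *joint* behavior of all agents,
    # so one agent's suboptimal action perturbs the value estimate that
    # every other agent learns from (the CDM issue the abstract describes).
    return sum(joint_obs) - 0.1 * sum(joint_actions)


def independent_critic(local_obs, local_action):
    # Independent Learning: each agent evaluates only its own observation
    # and action, treating the other agents as part of the environment.
    return local_obs - 0.1 * local_action


agents = [Agent(i) for i in range(3)]
joint_obs = [0.2, 0.7, 0.9]
joint_actions = [a.act(o) for a, o in zip(agents, joint_obs)]
```

DMS, as summarized above, aims to recover coordinated behavior while keeping the fully decentralized critic structure, by letting each agent infer the intentions and causal influence of the others.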

Bio: Kaile Su (苏开乐), Ph.D., is a professor at Griffith University, Australia, an adjunct professor at the Tsinghua University Center for Logic Research, and honorary dean of the School of Artificial Intelligence at Nanjing University of Information Science and Technology. His main research areas are logic and algorithms for artificial intelligence. He previously served as a professor and doctoral supervisor at Sun Yat-sen University (1999-2007) and Peking University (2007-2014). He received the National Science Fund for Distinguished Young Scholars in 2007 and an Australian Research Council (ARC) Future Fellowship in 2009. He won the Best Paper Award at AiML 2002 (Toulouse, France) and the Best Sequential Solver (Random Track) award at SAT Challenge 2012. In recent years he has published more than 30 papers in CCF-recommended Class A venues, including the top AI conferences AAAI (15 papers) and IJCAI (10 papers), and top journals such as Artificial Intelligence, Information and Computation, IEEE Transactions on Computers, and IEEE Transactions on Software Engineering.

