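Markov decision process
Reinforcement learning
Computer science
Artificial intelligence
Frame (networking)
Machine learning
Markov process
Partially observable Markov decision process
Process (computing)
Imitation
Robot
Markov chain
Markov model
Mathematics
Operating system
Statistics
Psychology
Social psychology
Telecommunications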
Authors
Qi Pang, Yuanyuan Yuan, Shuai Wang
Identifier
DOI: 10.1145/3533767.3534388
Abstract
The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are critical to security and safety, such as autonomous driving and robot control. Rapid progress in artificial intelligence research has produced efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular MDP-solving models have been neither thoroughly tested nor shown to be rigorously reliable.
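To make "solving an MDP" concrete, the following is a minimal sketch (not taken from the paper) of value iteration, a classic dynamic-programming method for small MDPs whose transition model is fully known. The two-state model, its action names (`stay`, `move`), rewards, and discount factor are hypothetical and chosen only for illustration. The DNN, RL, and IL models discussed in the abstract approximate this kind of solution when the model is too large or unknown.

```python
from typing import Dict, List, Tuple

# Hypothetical transition model: mdp[state][action] -> list of (prob, next_state, reward)
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(mdp: MDP, v: Dict[str, float], s: str, a: str, gamma: float) -> float:
    """Expected return of taking action a in state s, then following values v."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in mdp[s][a])


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-10):
    """Apply the Bellman optimality update until no value changes by tol or more."""
    v = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            best = max(q_value(mdp, v, s, a, gamma) for a in mdp[s])
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(mdp[s], key=lambda a: q_value(mdp, v, s, a, gamma)) for s in mdp}
    return v, policy


# Hypothetical two-state example: staying in s1 pays 2 per step.
toy = {
    "s0": {"stay": [(1.0, "s0", 0.0)], "move": [(1.0, "s1", 1.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)], "move": [(1.0, "s0", 0.0)]},
}
values, policy = value_iteration(toy, gamma=0.5)
print(policy)  # {'s0': 'move', 's1': 'stay'}
```

With a discount of 0.5, staying in `s1` forever is worth 2 / (1 - 0.5) = 4, so moving from `s0` is worth 1 + 0.5 × 4 = 3, which is what the iteration converges to.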