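Partially observable Markov decision process
Computer science
Markov decision process
Curse of dimensionality
Graph
Mathematical optimization
Population
Markov process
Convergence (economics)
Markov chain
Observable
Operator (biology)
Algorithm
Theoretical computer science
Markov model
Artificial intelligence
Mathematics
Machine learning
Economic growth
Sociology
Quantum mechanics
Statistics
Physics
Demography
Economics
Repressor
Chemistry
Biochemistry
Transcription factor
Gene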
Authors
Liu Yin, Yingping Zhou, Shuai Chen
Identifier
DOI:10.1109/ihmsc.2014.38
Abstract
Partially observable Markov decision processes (POMDPs) are now widely used in many fields. Solving a POMDP suffers from prohibitive computational complexity because of the curse of dimensionality, and Monte Carlo value iteration (MCVI) is regarded as a promising approach to breaking that curse. Although MCVI is a major step toward solving this problem, it still has defects, such as a slow convergence rate and continuous growth in the number of policy-graph nodes. This paper therefore proposes a fast MCVI based on an improved NSGA2. Unlike the general NSGA2, the improved NSGA2 initializes the population from experiential knowledge and uses a self-adjusting value as the crossover and mutation probability. Before executing MCVI, the algorithm sets a series of thresholds. When it obtains a temporary policy graph that reaches one of the thresholds, it uses a discount operator to update the threshold and uses the improved NSGA2 to update the policy graph. The algorithm then executes MCVI again and repeats this process until termination. Numerical experiments on the classic corridor problem show that the fast MCVI achieves about an 8% increase in convergence rate over the original MCVI and about a 60% decrease in the number of policy-graph nodes.
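The control loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say what quantity the thresholds measure or how the discount operator acts, so the sketch assumes a single threshold on the node count that is multiplied by a discount factor each time it is reached. The names `fast_mcvi_sketch`, `grow`, and `keep_half` are hypothetical; `grow` stands in for one MCVI backup and `keep_half` stands in for the improved-NSGA2 update of the policy graph.

```python
from typing import Callable, List, Tuple

Graph = List[str]  # toy stand-in: a policy graph as a list of node labels


def fast_mcvi_sketch(
    mcvi_step: Callable[[Graph], Graph],
    prune: Callable[[Graph], Graph],
    threshold: float,
    discount: float,
    iterations: int,
) -> Tuple[Graph, int]:
    """Alternate growth steps with a graph update whenever the threshold is reached."""
    graph: Graph = []
    prune_calls = 0
    for _ in range(iterations):
        graph = mcvi_step(graph)  # stand-in for one MCVI backup
        if len(graph) >= threshold:  # ASSUMPTION: the threshold is on node count
            # ASSUMPTION: the discount operator scales the threshold
            threshold = max(1.0, threshold * discount)
            graph = prune(graph)  # stand-in for the improved-NSGA2 update
            prune_calls += 1
    return graph, prune_calls


def grow(graph: Graph) -> Graph:
    # Hypothetical placeholder: each step adds one node to the graph.
    return graph + [f"n{len(graph)}"]


def keep_half(graph: Graph) -> Graph:
    # Hypothetical placeholder: keep the first half of the nodes.
    return graph[: max(1, len(graph) // 2)]


final_graph, calls = fast_mcvi_sketch(
    grow, keep_half, threshold=8, discount=0.9, iterations=30
)
```

In this toy setting the node count stays bounded by the initial threshold because every time the graph reaches the current threshold it is cut back before growth resumes. A real implementation would replace both placeholders and would likely judge candidate graphs by policy value as well as size.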