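Taylor series
Value (mathematics)
Mathematical economics
Taylor's theorem
Mathematics
Computer science
Applied mathematics
Statistics
Mathematical analysis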
Authors
Anton Braverman, Itai Gurvich, Junfei Huang
Source
Journal: Operations Research
[Institute for Operations Research and the Management Sciences]
Date: 2020-03-04
Citations: 9
Identifiers
DOI: 10.1287/opre.2019.1903
Abstract
We introduce a framework for approximate dynamic programming that we apply to discrete-time chains on $\mathbb{Z}_+^d$ with countable action sets. Our approach is grounded in the approximation of the (controlled) chain's generator by that of another Markov process. In simple terms, our approach stipulates applying a second-order Taylor expansion to the value function to replace the Bellman equation with one in continuous space and time where the transition matrix is reduced to its first and second moments. In some cases, the resulting equation (which we label TCP) can be interpreted as corresponding to a Brownian control problem. When tractable, the TCP serves as a useful modeling tool. More generally, the TCP is a starting point for approximation algorithms. We develop bounds on the optimality gap, that is, the sub-optimality introduced by using the control produced by the Taylored equation. These bounds can be viewed as a conceptual underpinning, analytical rather than relying on weak convergence arguments, for the good performance of controls derived from Brownian control problems. We prove that, under suitable conditions and for suitably large initial states, (i) the optimality gap is smaller than a $1-\alpha$ fraction of the optimal value, where $\alpha\in (0,1)$ is the discount factor, and (ii) the gap can be further expressed as the infinite-horizon discounted value with a lower-order per-period reward. Computationally, our framework leads to an aggregation approach with performance guarantees. While the guarantees are grounded in PDE theory, the practical use of this approach requires no knowledge of that theory.
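The Taylor step described in the abstract can be sketched as follows. The notation here is an assumption made for illustration and is not taken from the record: $r(x,u)$ is a per-period reward, $p^u_{xy}$ the controlled transition probabilities, and $\mu_u(x)$, $\sigma^2_u(x)$ the first and second moments of the one-step increment. The paper's exact formulation may differ. Starting from a discounted Bellman equation

$$V(x) = \max_{u}\Big\{ r(x,u) + \alpha \sum_{y} p^u_{xy}\, V(y) \Big\},$$

a second-order expansion of $V$ around $x$ reads

$$V(y) \approx V(x) + \nabla V(x)^{\top}(y-x) + \tfrac{1}{2}(y-x)^{\top}\nabla^2 V(x)\,(y-x).$$

Averaging over $y$ with $\mu_u(x) = \sum_y p^u_{xy}(y-x)$ and $\sigma^2_u(x) = \sum_y p^u_{xy}(y-x)(y-x)^{\top}$ gives

$$\sum_{y} p^u_{xy}\, V(y) \approx V(x) + \mu_u(x)^{\top}\nabla V(x) + \tfrac{1}{2}\,\mathrm{tr}\big(\sigma^2_u(x)\,\nabla^2 V(x)\big),$$

and substituting back and moving $V(x)$ to one side yields an equation in continuous space that depends on the transition matrix only through these two moments:

$$0 = \max_{u}\Big\{ r(x,u) + \alpha\,\mu_u(x)^{\top}\nabla V(x) + \tfrac{\alpha}{2}\,\mathrm{tr}\big(\sigma^2_u(x)\,\nabla^2 V(x)\big) - (1-\alpha)\,V(x) \Big\}.$$

The $(1-\alpha)V(x)$ term comes from $V(x) - \alpha V(x)$, which is consistent with the $1-\alpha$ scaling of the optimality gap stated in the abstract.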