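Principal (computer security)
A priori and a posteriori
Class (philosophy)
Value (mathematics)
Function (biology)
Mathematical economics
Economics
Computer science
Unit (ring theory)
Term (time)
Microeconomics
Mathematics
Artificial intelligence
Physics
Philosophy
Machine learning
Operating system
Mathematics education
Epistemology
Biology
Evolutionary biology
Quantum mechanics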
Authors
Shivam Gupta, Wěi Chén, Milind Dawande, Ganesh Janakiraman
Source
Journal: Management Science
[Institute for Operations Research and the Management Sciences]
Date: 2022-07-18
Volume (issue): 69 (5): 2852-2869
Citations: 9
Identifier
DOI:10.1287/mnsc.2022.4482
Abstract
We consider a principal who periodically offers a fixed and costly nonmonetary reward to agents to incentivize them to invest effort over the long run. An agent’s output, as a function of his effort, is a priori uncertain and is worth a fixed per-unit value to the principal. The principal’s goal is to design an attractive reward policy that specifies how the rewards are to be given to an agent over time based on that agent’s past performance. This problem, which we denote by [Formula: see text], is motivated by practical examples from both academia (e.g., a reduced teaching load) and industry (e.g., “Supplier of the Year” awards). The following “limited-term” (LT) reward policy structure has been quite popular in practice. The principal evaluates each agent periodically; if an agent’s performance over a certain (limited) number of periods in the immediate past exceeds a predefined threshold, then the principal rewards him for a certain (limited) number of periods in the immediate future. When agents’ outputs are deterministic in their efforts, we show that there always exists an optimal policy that is an LT policy and also, obtain such a policy. When agents’ outputs are stochastic, we show that the class of LT policies may not contain any optimal policy of problem [Formula: see text] but is guaranteed to contain policies that are arbitrarily near optimal. Given any [Formula: see text], we show how to obtain an LT policy whose performance is within ϵ of that of an optimal policy. This guarantee depends crucially on the use of sufficiently long histories of the agents’ outputs. We also analyze LT policies with short histories and derive structural insights on the role played by (i) the length of the available history and (ii) the variability in the random variable governing an agent’s output. We show that the average performance of these policies is within 5% of the optimum, justifying their popularity in practice. 
We then introduce and analyze the class of “score-based” reward policies; we show that this class is guaranteed to contain an optimal policy and also, obtain such a policy. Finally, we analyze a generalization in which the principal has a limited number of rewards in any given period and show that the class of score-based policies, with modifications to accommodate the limited availability of the rewards, continues to contain an optimal solution for the principal. This paper was accepted by Jeannette Song, operations management. Supplemental Material: The online appendix is available at https://doi.org/10.1287/mnsc.2022.4482.