The Expected Value of Control: An Integrative Theory of Anterior Cingulate Cortex Function

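Keywords: anterior cingulate, cingulate cortex, neuroscience, psychology, value (mathematics), function (biology), cortex (anatomy), functional connectivity, error-related negativity, control (management), cognitive psychology, cognition, biology, computer science, mathematics, central nervous system, artificial intelligence, statistics, evolutionary biology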

Authors

Amitai Shenhav, Matthew Botvinick, Jonathan D. Cohen

Source

Journal: Neuron [Cell Press]
Date: 2013-07-01; Volume 79 (2): 217-240; Cited by: 1717


Identifiers

DOI: 10.1016/j.neuron.2013.07.007

Abstract

The dorsal anterior cingulate cortex (dACC) has a near-ubiquitous presence in the neuroscience of cognitive control. It has been implicated in a diversity of functions, from reward processing and performance monitoring to the execution of control and action selection. Here, we propose that this diversity can be understood in terms of a single underlying function: allocation of control based on an evaluation of the expected value of control (EVC). We present a normative model of EVC that integrates three critical factors: the expected payoff from a controlled process, the amount of control that must be invested to achieve that payoff, and the cost in terms of cognitive effort. We propose that dACC integrates this information, using it to determine whether, where and how much control to allocate. We then consider how the EVC model can explain the diverse array of findings concerning dACC function.
The dorsal anterior cingulate cortex (dACC), spanning the cingulate gyrus and sulcus from the plane of the anterior commissure to the genu of the corpus callosum (Figure 1), is one of the most heavily studied regions of the brain and yet remains one of the least clearly understood. Although there has recently been an explosion of research on the role of dACC in cognition and behavior, this has led to a proliferation of diverging theories concerning its function. The dACC has been proposed to play a key role in pain processing, performance monitoring, value encoding, decision making, emotion, learning, and motivation. A precise and coherent account of dACC function seems as elusive now as it did in the earliest days of theory development.

Two opposing tendencies appear to have slowed progress toward an integrated understanding of dACC function. One has been to base theoretical analyses on too narrow a subset of empirical findings; the other has been to embrace a wide range of empirical findings but to reduce them to a single basic computation, at the cost of oversimplifying dACC function. Here, we propose an integrative account of dACC function that strives to avoid these pitfalls.

We build on one observation that appears to be widely and consistently agreed upon: that dACC is engaged by tasks that demand cognitive control. Broadly, cognitive control can be defined as the set of mechanisms required to pursue a goal, especially when distraction and/or strong (e.g., habitual) competing responses must be overcome. Numerous meta-analyses of the neuroimaging literature have confirmed the dACC’s involvement in control-demanding tasks (Nee et al., 2007; Niendam et al., 2012; Ridderinkhof et al., 2004; Shackman et al., 2011), and these have been supplemented by evidence of a causal relationship between dACC and cognitive control. For instance, using diffusion tensor imaging (DTI), Metzler-Baddeley et al. (2012) showed that older adults with lower white matter integrity in the anterior cingulum bundle (the white matter bundle projecting to/from dACC) performed more poorly on control-demanding tasks.

Despite the strong consensus that dACC is involved in cognitive control, there is little agreement about the specific function(s) it subserves. Here, we synthesize a number of existing proposals concerning the role of dACC into a single theoretical account and show how this account can be reconciled with empirical findings concerning dACC function. Specifically, we propose that the dACC integrates information about the rewards and costs that can be expected from a control-demanding task in order to estimate a quantity we refer to as the expected value of control (EVC). Put simply, EVC represents the net value associated with allocating control to a given task.
We propose that dACC estimates this quantity in order to determine whether it is worth investing control in a task, how much should be invested and, when several potential tasks are in contention, which is the most worthwhile. We assume that this information is used to select among competing tasks and to allocate the appropriate amount of control to performance of the one selected. This proposal ascribes to dACC a specific decision-making function regarding the allocation of control, distinct from other control-related functions such as the valuative ones that provide input to the decision and the regulative ones responsible for executing it; these are presumed to be subserved by other neural mechanisms.

We begin by establishing some foundational points concerning cognitive control and its constituent functions that are necessary for framing the EVC theory and our consideration of dACC. We then introduce the basic elements of the EVC theory. Finally, we review key findings and existing theoretical proposals from the dACC literature, relating these to the EVC theory.

Processes that demand control are often distinguished from automatic processes, which involve associations that are sufficiently strong as to be resistant to distraction or interference (Botvinick and Cohen, 2013; Cohen et al., 1990; Norman and Shallice, 1986; Posner and Snyder, 1975; Shiffrin and Schneider, 1977). A classic illustration of the distinction between controlled and automatic processing is provided by the Stroop task. Participants are shown a color word and asked to name the color of the font in which it is displayed. When the two dimensions disagree (e.g., “GREEN” written in red text), participants find it harder to name the color than when the two agree (e.g., “RED” written in red text). However, this interference effect does not occur when the task is, instead, simply to read the word. This difference between task conditions is explained by assuming that word reading is automatic (allowing the word to be processed even when the task is color naming), whereas color naming is controlled (preventing the color from being processed unless the task is to do so). This explanation is reinforced by the observation that, when presented with a conflict stimulus in the absence of a specific task instruction, people invariably read the word, illustrating the automatic, or “default,” nature of verbal responses to words. Verbally responding to the color requires an instruction and/or intention to do so, at least in the presence of conflicting word information.

A computational model of the mechanisms underlying the Stroop task is shown in Figure 2A (Cohen et al., 1990).
The model takes the form of a neural network, with units encoding stimulus features projecting forward to intermediate (associative) units, and then to output units representing verbal responses. The automaticity of the response to words is captured by strong connection weights along the pathway from word identity to verbal response. These also make it the default response (i.e., the response generated in the absence of any instruction). However, without any additional apparatus, the model would not be able to respond to the color of a conflict stimulus. To address this, the model also includes a set of control units that represent the current task. When the unit representing the color naming task is active, it provides top-down support for units in the pathway from color to verbal response, priming these units and thereby permitting a response to the color even when conflicting information is arriving along the word pathway. Thus, in this context, color naming can be considered a controlled process to the extent that a correct response to the color depends on activation of the color naming task unit.

The model shown in Figure 2A also includes a unit that serves a “conflict monitoring” function, responding to coactivation of the network’s response units (see Botvinick et al., 2001). Such conflict is an indicator of inadequate control. For example, if the color naming task unit is insufficiently activated, then activation of the response to the color will be weaker and will compete less effectively with activation of the response to a conflicting word, allowing the latter to become more active. This coactivation has two potentially adverse consequences for behavior. At best it will slow responding, since the correct response unit must overcome inhibitory competition from the incorrect one.
At worst it will produce an error. These dangers can be ameliorated by increasing the activity of the color naming task unit. Thus, conflict serves as an indicator of the need for additional allocation of control.

This simple model of the Stroop task and conflict monitoring is, of course, not intended as a comprehensive model of cognitive control. However, its architecture illustrates three core component functions of cognitive control (Figure 2A).

Regulation. The sine qua non of control is its capacity to govern or influence lower-level information-processing mechanisms, a function we refer to as regulation. In the language of engineering, activity of a task unit represents a control signal, which determines the parameters for more basic processes (in this case, the sensitivity of the associative units in the corresponding pathway). Note that this signal has two defining characteristics: its identity and its intensity (the strength of the signal, both in literal terms, e.g., the level of activation of the task unit, and in terms of its impact on information processing). Control signals can determine a wide range of processing parameters, including thresholds and/or biases for responding (governing speed-accuracy tradeoffs; Bogacz et al., 2006; Wiecki and Frank, 2013), templates for attention or memory search (Desimone and Duncan, 1995; Olivers et al., 2011; Polyn et al., 2009), and modulators of emotion (Johns et al., 2008; McClure et al., 2006). In each case, a distinction can be made between signal identity (the parameter targeted) and signal intensity (the degree to which the parameter is displaced from its default value).

Specification. For regulation to occur, an appropriate control signal must first be chosen: control requires a decision about which, if any, controlled task(s) should be undertaken, and about how intensively it (or they) should be pursued. We refer to this decision-making function as control signal specification, which must determine the identity and intensity of the desired control signal(s). In principle, it is possible to specify more than one identity-intensity pairing, and thereby more than one task (see Figure 2). However, in practice there are strict capacity constraints on control, and thus in this Review we focus on the simplest and most common circumstance, involving specification of a single identity-intensity pairing (i.e., a single control-demanding task).
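The regulation and specification functions can be illustrated with a toy version of the Stroop network in Figure 2A. This is a minimal sketch, not the Cohen et al. (1990) implementation: the weights, the resting bias, the logistic activation, and the names `ControlSignal`, `pathway_activation`, `response_inputs`, and `respond` are all hypothetical. A control signal is modeled as an (identity, intensity) pair, and regulation as raising the baseline of the associative unit in the pathway named by identity, by an amount equal to intensity.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters chosen for illustration only.
WORD_WEIGHT = 2.5     # strong word pathway: word reading is automatic
COLOR_WEIGHT = 1.5    # weaker color pathway: color naming needs control
RESTING_BIAS = -4.0   # associative units are weakly responsive without top-down support


@dataclass(frozen=True)
class ControlSignal:
    identity: str     # pathway to support: "color" or "word"
    intensity: float  # how strongly to support it (0.0 = no control)


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pathway_activation(weight: float, pathway: str, signal: ControlSignal) -> float:
    """Activation of a pathway's associative unit under the current control signal."""
    # Regulation: a control signal targeting this pathway shifts its baseline upward.
    bias = RESTING_BIAS + (signal.intensity if signal.identity == pathway else 0.0)
    return logistic(weight + bias)


def response_inputs(word: str, color: str, signal: ControlSignal) -> dict:
    """Net input to each verbal response unit from the two pathways."""
    inputs = {"red": 0.0, "green": 0.0}
    inputs[color] += COLOR_WEIGHT * pathway_activation(COLOR_WEIGHT, "color", signal)
    inputs[word] += WORD_WEIGHT * pathway_activation(WORD_WEIGHT, "word", signal)
    return inputs


def respond(word: str, color: str, signal: ControlSignal) -> str:
    """The verbal response with the largest net input wins."""
    inputs = response_inputs(word, color, signal)
    return max(inputs, key=inputs.get)


# Conflict stimulus: the word "GREEN" printed in red ink, with color naming as the task.
for intensity in (0.0, 1.0, 2.0, 4.0):
    print(intensity, respond("green", "red", ControlSignal("color", intensity)))
# 0.0 green
# 1.0 green
# 2.0 red
# 4.0 red
```

In this sketch the word response wins by default, and the color response wins only once the color control signal is intense enough, which is the sense in which color naming is "controlled" here.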
Importantly, control signal specification should be distinguished from regulation, which consists of implementing the specified control signal so as to actually effect the changes in information processing required for the task. This distinction between specification (the decision process) and regulation (the process that mediates its effects) is central to the EVC theory. While both are essential components of the control system, the EVC theory ascribes to dACC a role in specification but not regulation, as we discuss below.

Monitoring. To specify the appropriate control signal and deploy regulative functions adaptively, the system must have access to information about current circumstances and how well it is serving task demands. Detecting and evaluating these requires a monitoring mechanism. The conflict-detection component in the Stroop model provides one example of such a monitoring function and of how it can guide specification: the occurrence of response conflict indicates that insufficient control is being allocated to the current task (see Botvinick, 2007; Botvinick et al., 2001; Botvinick et al., 2004). In this instance, conflict indicates the need to re-specify control signal intensity. However, conflict is just one among many signals that can indicate the need to adjust intensity. Others include response delays, errors, negative feedback, and the sensation of pain.
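The conflict-monitoring loop described above, in which coactivation of response units signals that intensity should be re-specified, can be sketched as follows. Conflict is computed as the product of the two response-unit activations scaled by an inhibitory weight, in the spirit of the energy measure used by Botvinick et al. (2001). The threshold, gain, example activations, and function names are hypothetical, and the rule that raises intensity in proportion to suprathreshold conflict is an illustrative assumption, not the published model's update.

```python
# Hypothetical parameters for illustration.
INHIBITORY_WEIGHT = 1.0   # strength of mutual inhibition between response units
CONFLICT_THRESHOLD = 0.1  # conflict below this level triggers no adjustment
ADJUSTMENT_GAIN = 2.0     # how strongly conflict drives re-specification


def response_conflict(activations: dict) -> float:
    """Coactivation of the two response units: high only when both are active."""
    a_red, a_green = activations["red"], activations["green"]
    return INHIBITORY_WEIGHT * a_red * a_green


def respecify_intensity(intensity: float, conflict: float) -> float:
    """Monitoring feeds specification: raise intensity in proportion to excess conflict."""
    return intensity + ADJUSTMENT_GAIN * max(0.0, conflict - CONFLICT_THRESHOLD)


coactive = {"red": 0.6, "green": 0.5}     # weak control: both responses active
dominated = {"red": 0.9, "green": 0.05}   # strong control: the color response dominates

print(response_conflict(coactive), response_conflict(dominated))
print(respecify_intensity(1.0, response_conflict(coactive)))   # intensity increases
print(respecify_intensity(1.0, response_conflict(dominated)))  # unchanged: 1.0
```

Keeping `response_conflict` separate from `respecify_intensity` mirrors the distinction drawn in the text between detecting a need for control and deciding how much control to allocate.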
These signals all carry information about performance within a task and about how to specify control signal intensity. Monitoring must also consider information relevant to the specification of control signal identity, that is, to task choice. Such information can come from external sources (e.g., explicit instructions, cues indicating new opportunities for reward, or the sudden appearance of a threat) or from internal ones (e.g., diminishing payoffs from the current task indicating it is no longer worth performing, or recollection of another task that needs to be performed). In all of these cases, monitoring must be responsive to, but should be distinguished from, the sensory and valuative processes that represent the actual information relevant to specification. Thus, just as we distinguish between specification and regulation on the efferent side of control, we distinguish between monitoring and valuation on the afferent side. In each case, the EVC theory ascribes to dACC a role in the former, but not the latter.

Early research on control focused on regulative and monitoring mechanisms, but growing attention is being paid to the problem of control-signal specification. Work in this area has increasingly been driven by ideas from research on reward-based decision making and reinforcement learning. One emerging trend has involved reframing control-signal specification as an optimization problem, shaped by learning or planning mechanisms that serve to maximize long-term expected reward (Bogacz et al., 2006; Dayan, 2012; Hazy et al., 2007; O’Reilly and Frank, 2006; Todd et al., 2008; Yu et al., 2009). Under this view, cognitive control can be defined as the set of mechanisms responsible for configuring behavior in order to maximize the attainment of reward. This definition accords well with the definition of control in other fields, most notably control theory in engineering. From this perspective, cognitive control can be viewed not only as adaptive, but also as motivated.

An emphasis on motivation also aligns with the ubiquitous observation that the exertion of cognitive control carries an inherent subjective cost. From the earliest definitions, controlled processing was described as effortful, and, like physical effort, mental effort is assumed to carry intrinsic disutility; that is, people spontaneously seek to minimize it. Recent empirical work bears out this assumption, linking effort specifically to the exertion of cognitive control (Kool and Botvinick, 2012; Kool et al., 2010). Human decision makers show a bias against tasks demanding top-down control, and within certain bounds they will delay task goals or even forgo reward in order to avoid such tasks (Dixon and Christoff, 2012; Kool et al., 2010; Westbrook et al., 2013). These effects imply an intrinsic “cost of control,” which scales with the intensity of the control required to perform the task (Dixon and Christoff, 2012; Kool et al., 2010).
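The demand-avoidance findings can be expressed as a simple comparison of net values. In this minimal sketch, each candidate task is summarized by a reward and the control intensity it requires, and the cost of control is assumed, purely for illustration, to grow quadratically with intensity; the text states only that the cost scales with intensity. The task names, numbers, `COST_COEFFICIENT`, and function names are hypothetical.

```python
from dataclasses import dataclass

COST_COEFFICIENT = 0.5  # hypothetical scaling of the intrinsic cost of control


@dataclass(frozen=True)
class Task:
    name: str
    reward: float              # expected payoff for performing the task
    required_intensity: float  # control intensity needed to perform it


def control_cost(intensity: float) -> float:
    # Assumed convex cost: more intense control is disproportionately costly.
    return COST_COEFFICIENT * intensity ** 2


def net_value(task: Task) -> float:
    return task.reward - control_cost(task.required_intensity)


def choose_task(tasks: list) -> str:
    """Pick the task whose reward best outweighs its cost of control."""
    return max(tasks, key=net_value).name


easy = Task("easy", reward=1.0, required_intensity=0.5)
hard = Task("hard", reward=1.2, required_intensity=1.5)
print(choose_task([easy, hard]))  # easy: the extra 0.2 reward is forgone

well_paid_hard = Task("hard", reward=3.0, required_intensity=1.5)
print(choose_task([easy, well_paid_hard]))  # hard: now worth the effort
```

Under these assumptions the demanding task is avoided even though it pays more, until its reward grows enough to outweigh the cost of the extra control.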
These ideas, combined with the idea that control signals are specified based on the reward potential of the task they support, suggest that the allocation of control is driven by a cost-benefit analysis, weighing potential payoffs against attendant costs, including those inherently associated with the exertion of control itself.

Previous work has established links between components of the Stroop model and specific neural structures involved in cognitive control. In particular, lateral prefrontal cortex (lPFC), together with associated structures (e.g., basal ganglia and brainstem dopaminergic nuclei), has been proposed to implement the regulative component of the model (Braver and Cohen, 2000; Cohen and Servan-Schreiber, 1992; Frank et al., 2001; Miller and Cohen, 2001), while dACC has been proposed to implement the monitoring component (Botvinick, 2007; Botvinick et al., 2001; Botvinick et al., 2004). According to this mapping, the key step of control-signal specification arises in the communication from dACC to lPFC (Botvinick et al., 2001; Kerns et al., 2004). That is, the model assigns to the dACC responsibility for monitoring and specification, evaluating current demands for control and using the relevant information to decide how to allocate control. The specified control signals are then implemented by lPFC and associated structures, which are assumed to be responsible for the regulative function of control, that is, actually effecting the changes in processing required to perform the task. The EVC model elaborates this proposal, structuring it in a normative description of how both the identity and the intensity of control signals are determined, and placing new emphasis on optimization (i.e., reward maximization) in understanding the relationship, within dACC, between monitoring and specification.

The operation of cognitive control, as we have characterized it, involves deciding what control signal should be selected (i.e., its identity) and how vigorously this control signal should be engaged (i.e., its intensity) (Figure 2B). We propose that the brain makes this two-part decision in a rational or normative manner to maximize expected future reward.
To make this idea precise, we will express the choice of what and how much to control in formal terms, borrowing approaches from reinforcement learning and optimal control theory to analogous problems of motor action selection. We begin by defining a control signal to be an array variable with two components: identity (e.g., “respond to color” or “respond to word”) and intensity. Determining the expected value of each control signal requires integration over two sources of value-related information. First, it must consider the overall payoff that can be expected from engaging a given control signal, taking into account both positive and negative outcomes that could result from performing the corresponding task. Second, as discussed above, it must take into account the fact that there is an intrinsic cost to engaging control itself, which scales with the intensity of the signal required. Taken together, these two components determine what we will refer to as the expected value of control (EVC), which can be formalized as follows (see also Figures 2B, 4A, and 4B):

$$\mathrm{EVC}(\mathit{signal}, \mathit{state}) = \left[\sum_i \Pr(\mathit{outcome}_i \mid \mathit{signal}, \mathit{state}) \cdot \mathrm{Value}(\mathit{outcome}_i)\right] - \mathrm{Cost}(\mathit{signal}) \qquad \text{(Equation 1)}$$

As indicated by the arguments on the left-hand side, the EVC is a function of two variables, signal and state. Signal refers to a specific control signal (e.g., designating a particular task representation and its intensity). State refers to the current situation, spanning both environmental conditions and internal factors (e.g., motivational state, task difficulty, etc.). On the right-hand side, outcomes refer to subsequent states that result from the application of a particular control signal in the context of the current state, each with a particular probability (Pr); for example, the occurrence of a correct response or of an error.
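Equation 1 can be evaluated directly once outcome probabilities, outcome values, and a cost function are specified. The sketch below assumes, purely for illustration, two outcomes (correct and error), a logistic dependence of accuracy on control intensity and task difficulty, a fixed low accuracy when the signal targets the wrong task, and a quadratic cost; none of these functional forms comes from the text, and the names `State`, `ControlSignal`, `p_correct`, `evc`, and `best_signal` are hypothetical. Specification is then modeled as a grid search for the identity-intensity pair that maximizes EVC.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSignal:
    identity: str     # task the signal supports, e.g. "color" or "word"
    intensity: float


@dataclass(frozen=True)
class State:
    task: str          # the task that is currently rewarded
    difficulty: float  # hypothetical: higher means more control is needed
    reward: float      # Value(correct)
    penalty: float     # Value(error) = -penalty


COST_COEFFICIENT = 0.25  # hypothetical scaling of Cost(signal)


def p_correct(signal: ControlSignal, state: State) -> float:
    """Assumed Pr(correct | signal, state)."""
    if signal.identity != state.task:
        return 0.05  # controlling the wrong task rarely produces a correct response
    return 1.0 / (1.0 + math.exp(-(2.0 * signal.intensity - state.difficulty)))


def cost(signal: ControlSignal) -> float:
    return COST_COEFFICIENT * signal.intensity ** 2


def evc(signal: ControlSignal, state: State) -> float:
    """Equation 1 with two outcomes: sum_i Pr(outcome_i) * Value(outcome_i) - Cost."""
    p = p_correct(signal, state)
    expected_payoff = p * state.reward + (1.0 - p) * (-state.penalty)
    return expected_payoff - cost(signal)


def best_signal(state: State, identities=("color", "word")) -> ControlSignal:
    """Specification: search identity-intensity pairs (intensity 0.0 to 3.0) for max EVC."""
    candidates = [
        ControlSignal(identity, step / 10) for identity in identities for step in range(31)
    ]
    return max(candidates, key=lambda s: evc(s, state))


low_stakes = State(task="color", difficulty=2.0, reward=1.0, penalty=0.0)
high_stakes = State(task="color", difficulty=2.0, reward=2.0, penalty=0.0)
print(best_signal(low_stakes))   # ControlSignal(identity='color', intensity=1.0)
print(best_signal(high_stakes))  # ControlSignal(identity='color', intensity=1.5)
```

With these assumptions, doubling the reward shifts the EVC-maximizing intensity upward, illustrating how payoff and cost jointly determine how much control is worth investing.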
Since outcomes are themselves states, the terms “state” and “outcome” in Equation 1 can also be thought of as “current state” and “future state.” The Value of an outcome is defined recursively as follows:

$$\mathrm{Value}(\mathit{outcome}) = \mathrm{ImmediateReward}(\mathit{outcome}) + \gamma \max_i \left[\mathrm{EVC}(\mathit{signal}_i, \mathit{outcome})\right] \qquad \text{(Equation 2)}$$

where ImmediateReward can be either positive or negative (for example, in the case of an error, monetary loss or pain; the term “reward” is borrowed from reinforcement learning models but can be understood more colloquially as “worth”). Note that the maximization of EVC in the final term is over all feasible control signals (indexed by i), with outcome serving in place of the current state. The estimation of outcome value thus folds in the EVC of control signals implemented in future states. The parameter γ is a discount factor, between zero and one, controlling how much the current decision weighs future rewards relative to more immediate ones. The significance of this final term is that it links outcome value (and thus the EVC) not only to imm

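Because Equation 2 defines Value in terms of EVC, and Equation 1 defines EVC in terms of Value, the two can be solved jointly by applying them repeatedly until the values stop changing, as in value iteration from reinforcement learning. The sketch below is a hedged illustration of that recursion, not a procedure proposed in the text. It assumes two outcome states (correct and error) with hypothetical immediate rewards, a single task identity so that signals differ only in intensity, a logistic accuracy function, a quadratic cost, and hypothetical function names.

```python
import math

GAMMA = 0.9                                         # discount factor, 0 <= GAMMA < 1
IMMEDIATE_REWARD = {"correct": 1.0, "error": -0.5}  # hypothetical rewards
INTENSITIES = [0.0, 0.5, 1.0, 1.5, 2.0]             # feasible signals (one task identity)


def outcome_probabilities(intensity: float, state: str) -> dict:
    """Assumed Pr(outcome | signal, state); in this toy, accuracy ignores the state."""
    p = 1.0 / (1.0 + math.exp(-(2.0 * intensity - 2.0)))
    return {"correct": p, "error": 1.0 - p}


def cost(intensity: float) -> float:
    return 0.25 * intensity ** 2  # assumed cost of control


def evc(intensity: float, state: str, values: dict) -> float:
    """Equation 1, using the current estimate of Value(outcome)."""
    probs = outcome_probabilities(intensity, state)
    return sum(probs[o] * values[o] for o in probs) - cost(intensity)


def solve_values(gamma: float = GAMMA, tol: float = 1e-10) -> dict:
    """Apply Equation 2 repeatedly until Value reaches its fixed point."""
    values = {s: 0.0 for s in IMMEDIATE_REWARD}
    while True:
        new_values = {
            s: IMMEDIATE_REWARD[s] + gamma * max(evc(i, s, values) for i in INTENSITIES)
            for s in IMMEDIATE_REWARD
        }
        if max(abs(new_values[s] - values[s]) for s in values) < tol:
            return new_values
        values = new_values


values = solve_values()
print(values)
print(solve_values(gamma=0.0))  # {'correct': 1.0, 'error': -0.5}: no look-ahead
```

Setting γ to zero collapses Value to ImmediateReward, while γ close to one lets the EVC of future control signals dominate; the iteration converges for any γ below one because each update shrinks differences by a factor of γ.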