Abstract
This article reviews a free-energy formulation that advances Helmholtz's agenda to find principles of brain function based on conservation laws and neuronal energy. It rests on advances in statistical physics, theoretical biology and machine learning to explain a remarkable range of facts about brain structure and function. We could have just scratched the surface of what this formulation offers; for example, it is becoming clear that the Bayesian brain is just one facet of the free-energy principle and that perception is an inevitable consequence of active exchange with the environment. Furthermore, one can see easily how constructs like memory, attention, value, reinforcement and salience might disclose their simple relationships within this framework.

Glossary

Kullback-Leibler divergence (information divergence, information gain, cross or relative entropy): a non-commutative measure of the difference between two probability distributions.
Bayesian surprise: a measure of salience based on the divergence between the recognition and prior densities. It measures the information in the data that can be recognised.
Conditional density (or posterior density): the probability distribution of causes or model parameters, given some data; i.e., a probabilistic mapping from observed data to causes.
Empirical priors: priors that are induced by hierarchical models; they provide constraints on the recognition density in the usual way, but depend on the data.
Entropy: the average surprise of outcomes sampled from a probability distribution or density. A density with low entropy means that, on average, the outcome is relatively predictable.
Ergodic: a process is ergodic if its long-term time-average converges to its ensemble average. Ergodic processes that evolve for a long time forget their initial states.
Free energy: an information-theory measure that bounds (is greater than) the surprise on sampling some data, given a generative model (several of these quantities are illustrated numerically in the sketch after this glossary).
Generalised coordinates of motion: cover the value of a variable, its motion, acceleration, jerk and higher orders of motion. A point in generalised coordinates corresponds to a path or trajectory over time.
Generative model (or forward model): a probabilistic mapping from causes to observed consequences (data). It is usually specified in terms of the likelihood of getting some data, given their causes (parameters of a model), and priors on the parameters.
Gradient descent: an optimisation scheme that finds a minimum of a function by changing its arguments in proportion to the negative of the gradient of the function at the current value.
Helmholtz machine: a device or scheme that uses a generative model to furnish a recognition density. Helmholtz machines learn hidden structure in data by optimising the parameters of generative models.
Prior: the probability distribution or density on the causes of data that encodes beliefs about those causes before observing the data.
Recognition density (or approximating conditional density): an approximate probability distribution of the causes of data. It is the product of inference, or of inverting a generative model.
Stochastic: the successive states of stochastic processes are governed by random effects.
Sufficient statistics: quantities that are sufficient to parameterise a probability density (e.g., the mean and covariance of a Gaussian density).
Surprise (or self-information): the negative log-probability of an outcome. An improbable outcome is therefore surprising.
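As a rough numerical illustration of how several of these definitions fit together (surprise, entropy, Kullback-Leibler divergence and the free-energy bound on surprise), the following minimal sketch computes them for a discrete toy model. The particular numbers and variable names (prior, likelihood_y, recognition) are illustrative assumptions, not taken from the article.

```python
import numpy as np

# Toy setup: two hidden causes and one observed datum y.
prior = np.array([0.7, 0.3])         # p(cause): beliefs before observing the data
likelihood_y = np.array([0.2, 0.9])  # p(y | cause) evaluated at the observed datum y

# Surprise (self-information): negative log-probability of the datum.
evidence = np.sum(likelihood_y * prior)        # p(y) = sum_c p(y | c) p(c)
surprise = -np.log(evidence)

# Conditional (posterior) density: p(cause | y), a mapping from data to causes.
posterior = likelihood_y * prior / evidence

# A recognition density q(cause), here deliberately different from the posterior.
recognition = np.array([0.5, 0.5])

# Kullback-Leibler divergence between the recognition and posterior densities
# (non-commutative: KL(q || p) differs from KL(p || q) in general).
kl = np.sum(recognition * np.log(recognition / posterior))

# Free energy bounds (is greater than) surprise: F = surprise + KL >= surprise,
# with equality only when the recognition density equals the posterior.
free_energy = surprise + kl

# Entropy: the average surprise of outcomes sampled from a density.
entropy_q = -np.sum(recognition * np.log(recognition))

print(f"surprise      = {surprise:.3f}")
print(f"free energy   = {free_energy:.3f}  (>= surprise)")
print(f"KL divergence = {kl:.3f}")
print(f"entropy of q  = {entropy_q:.3f}")
```

Minimising the free energy with respect to the recognition density (for example, by gradient descent on its sufficient statistics) drives it towards the posterior, at which point the bound becomes tight and free energy equals surprise.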