Robust Estimation of a Location Parameter

数学 统计 估计理论 估计 位置参数 应用数学 计量经济学 估计员 经济 管理
作者
Peter J. Huber
出处
期刊:Annals of Mathematical Statistics [Institute of Mathematical Statistics]
卷期号:35 (1): 73-101 被引量:6520
标识
DOI:10.1214/aoms/1177703732
摘要

This paper contains a new approach toward a theory of robust estimation; it treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators--intermediaries between sample mean and sample median--that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators. For the general background, see Tukey (1960) (p. 448 ff.) Let $x_1, \cdots, x_n$ be independent random variables with common distribution function $F(t - \xi)$. The problem is to estimate the location parameter $\xi$, but with the complication that the prototype distribution $F(t)$ is only approximately known. I shall primarily be concerned with the model of indeterminacy $F = (1 - \epsilon)\Phi + \epsilon H$, where $0 \leqq \epsilon < 1$ is a known number, $\Phi(t) = (2\pi)^{-\frac{1}{2}} \int^t_{-\infty} \exp(-\frac{1}{2}s^2) ds$ is the standard normal cumulative and $H$ is an unknown contaminating distribution. This model arises for instance if the observations are assumed to be normal with variance 1, but a fraction $\epsilon$ of them is affected by gross errors. Later on, I shall also consider other models of indeterminacy, e.g., $\sup_t |F(t) - \Phi(t)| \leqq \epsilon$. Some inconvenience is caused by the fact that location and scale parameters are not uniquely determined: in general, for fixed $\epsilon$, there will be several values of $\xi$ and $\sigma$ such that $\sup_t|F(t) - \Phi((t - \xi)/\sigma)| \leqq \epsilon$, and similarly for the contaminated case. Although this inherent and unavoidable indeterminacy is small if $\epsilon$ is small and is rather irrelevant for practical purposes, it poses awkward problems for the theory, especially for optimality questions. To remove this difficulty, one may either (i) restrict attention to symmetric distributions, and estimate the location of the center of symmetry (this works for $\xi$ but not for $\sigma$); or (ii) one may define the parameter to be estimated in terms of the estimator itself, namely by its asymptotic value for sample size $n \rightarrow \infty$; or (iii) one may define the parameters by arbitrarily chosen functionals of the distribution (e.g., by the expectation, or the median of $F$). All three possibilities have unsatisfactory aspects, and I shall usually choose the variant which is mathematically most convenient. It is interesting to look back to the very origin of the theory of estimation, namely to Gauss and his theory of least squares. Gauss was fully aware that his main reason for assuming an underlying normal distribution and a quadratic loss function was mathematical, i.e., computational, convenience. In later times, this was often forgotten, partly because of the central limit theorem. However, if one wants to be honest, the central limit theorem can at most explain why many distributions occurring in practice are approximately normal. The stress is on the word "approximately." This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): What happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance: seemingly quite mild deviations may already explode its variance. Tukey and others proposed several more robust substitutes--trimmed means, Winsorized means, etc.--and explored their performance for a few typical violations of normality. A general theory of robust estimation is still lacking; it is hoped that the present paper will furnish the first few steps toward such a theory. At the core of the method of least squares lies the idea to minimize the sum of the squared "errors," that is, to adjust the unknown parameters such that the sum of the squares of the differences between observed and computed values is minimized. In the simplest case, with which we are concerned here, namely the estimation of a location parameter, one has to minimize the expression $\sum_i (x_i - T)^2$; this is of course achieved by the sample mean $T = \sum_i x_i/n$. I should like to emphasize that no loss function is involved here; I am only describing how the least squares estimator is defined, and neither the underlying family of distributions nor the true value of the parameter to be estimated enters so far. It is quite natural to ask whether one can obtain more robustness by minimizing another function of the errors than the sum of their squares. We shall therefore concentrate our attention to estimators that can be defined by a minimum principle of the form (for a location parameter): $T = T_n(x_1, \cdots, x_n) minimizes \sum_i \rho(x_i - T),$ \begin{equation*} \tag{M} where \rho is a non-constant function. \end{equation*} Of course, this definition generalizes at once to more general least squares type problems, where several parameters have to be determined. This class of estimators contains in particular (i) the sample mean $(\rho(t) = t^2)$, (ii) the sample median $(\rho(t) = |t|)$, and more generally, (iii) all maximum likelihood estimators $(\rho(t) = -\log f(t)$, where $f$ is the assumed density of the untranslated distribution). These ($M$)-estimators, as I shall call them for short, have rather pleasant asymptotic properties; sufficient conditions for asymptotic normality and an explicit expression for their asymptotic variance will be given. How should one judge the robustness of an estimator $T_n(x) = T_n(x_1, \cdots, x_n)$? Since ill effects from contamination are mainly felt for large sample sizes, it seems that one should primarily optimize large sample robustness properties. Therefore, a convenient measure of robustness for asymptotically normal estimators seems to be the supremum of the asymptotic variance $(n \rightarrow \infty)$ when $F$ ranges over some suitable set of underlying distributions, in particular over the set of all $F = (1 - \epsilon)\Phi + \epsilon H$ for fixed $\epsilon$ and symmetric $H$. On second thought, it turns out that the asymptotic variance is not only easier to handle, but that even for moderate values of $n$ it is a better measure of performance than the actual variance, because (i) the actual variance of an estimator depends very much on the behavior of the tails of $H$, and the supremum of the actual variance is infinite for any estimator whose value is always contained in the convex hull of the observations. (ii) If an estimator is asymptotically normal, then the important central part of its distribution and confidence intervals for moderate confidence levels can better be approximated in terms of the asymptotic variance than in terms of the actual variance. If we adopt this measure of robustness, and if we restrict attention to ($M$)-estimators, then it will be shown that the most robust estimator is uniquely determined and corresponds to the following $\rho:\rho(t) = \frac{1}{2}t^2$ for $|t| < k, \rho(t) = k|t| - \frac{1}{2}k^2$ for $|t| \geqq k$, with $k$ depending on $\epsilon$. This estimator is most robust even among all translation invariant estimators. Sample mean $(k = \infty)$ and sample median $(k = 0)$ are limiting cases corresponding to $\epsilon = 0$ and $\epsilon = 1$, respectively, and the estimator is closely related and asymptotically equivalent to Winsorizing. I recall the definition of Winsorizing: assume that the observations have been ordered, $x_1 \leqq x_2 \leqq \cdots \leqq x_n$, then the statistic $T = n^{-1}(gx_{g + 1} + x_{g + 1} + x_{g + 2} + \cdots + x_{n - h} + hx_{n - h})$ is called the Winsorized mean, obtained by Winsorizing the $g$ leftmost and the $h$ rightmost observations. The above most robust ($M$)-estimators can be described by the same formula, except that in the first and in the last summand, the factors $x_{g + 1}$ and $x_{n - h}$ have to be replaced by some numbers $u, v$ satisfying $x_g \leqq u \leqq x_{g + 1}$ and $x_{n - h} \leqq v \leqq x_{n - h + 1}$, respectively; $g, h, u$ and $v$ depend on the sample. In fact, this ($M$)-estimator is the maximum likelihood estimator corresponding to a unique least favorable distribution $F_0$ with density $f_0(t) = (1 - \epsilon)(2\pi)^{-\frac{1}{2}}e^{-\rho(t)}$. This $f_0$ behaves like a normal density for small $t$, like an exponential density for large $t$. At least for me, this was rather surprising--I would have expected an $f_0$ with much heavier tails. This result is a particular case of a more general one that can be stated roughly as follows: Assume that $F$ belongs to some convex set $C$ of distribution functions. Then the most robust ($M$)-estimator for the set $C$ coincides with the maximum likelihood estimator for the unique $F_0 \varepsilon C$ which has the smallest Fisher information number $I(F) = \int (f'/f)^2f dt$ among all $F \varepsilon C$. Miscellaneous related problems will also be treated: the case of non-symmetric contaminating distributions; the most robust estimator for the model of indeterminacy $\sup_t|F(t) - \Phi(t)| \leqq \epsilon$; robust estimation of a scale parameter; how to estimate location, if scale and $\epsilon$ are unknown; numerical computation of the estimators; more general estimators, e.g., minimizing $\sum_{i < j} \rho(x_i - T, x_j - T)$, where $\rho$ is a function of two arguments. Questions of small sample size theory will not be touched in this paper.
最长约 10秒,即可获得该文献文件

科研通智能强力驱动
Strongly Powered by AbleSci AI
科研通是完全免费的文献互助平台,具备全网最快的应助速度,最高的求助完成率。 对每一个文献求助,科研通都将尽心尽力,给求助人一个满意的交代。
实时播报
大胆飞荷完成签到,获得积分10
刚刚
落后的听双完成签到,获得积分10
1秒前
1秒前
death123517完成签到,获得积分10
2秒前
淦淦关注了科研通微信公众号
2秒前
熙可檬发布了新的文献求助10
3秒前
3秒前
我师傅不是好人完成签到,获得积分10
3秒前
3秒前
zjw1997发布了新的文献求助30
4秒前
大个应助superming采纳,获得10
4秒前
完美羿完成签到,获得积分10
4秒前
4秒前
duwang完成签到,获得积分10
5秒前
5秒前
鄙视注册完成签到,获得积分0
5秒前
风华发布了新的文献求助10
5秒前
5秒前
6秒前
Robin发布了新的文献求助10
6秒前
现代半山完成签到 ,获得积分10
6秒前
思源应助888采纳,获得10
7秒前
7秒前
科研通AI6应助康康采纳,获得10
7秒前
Tina完成签到,获得积分10
7秒前
zwx发布了新的文献求助10
7秒前
香菜发布了新的文献求助10
7秒前
12138完成签到,获得积分10
8秒前
听话的炳完成签到,获得积分20
8秒前
8秒前
9秒前
耍酷的婴发布了新的文献求助10
9秒前
科研通AI6应助mochen采纳,获得10
9秒前
9秒前
zhou发布了新的文献求助10
10秒前
搞怪的萃发布了新的文献求助10
11秒前
kopp发布了新的文献求助10
11秒前
Jared应助wuran采纳,获得10
11秒前
12秒前
zeta发布了新的文献求助10
12秒前
高分求助中
(应助此贴封号)【重要!!请各用户(尤其是新用户)详细阅读】【科研通的精品贴汇总】 10000
Encyclopedia of Reproduction Third Edition 3000
《药学类医疗服务价格项目立项指南(征求意见稿)》 1000
花の香りの秘密―遺伝子情報から機能性まで 800
1st Edition Sports Rehabilitation and Training Multidisciplinary Perspectives By Richard Moss, Adam Gledhill 600
Chemistry and Biochemistry: Research Progress Vol. 7 430
Biotechnology Engineering 400
热门求助领域 (近24小时)
化学 材料科学 生物 医学 工程类 计算机科学 有机化学 物理 生物化学 纳米技术 复合材料 内科学 化学工程 人工智能 催化作用 遗传学 数学 基因 量子力学 物理化学
热门帖子
关注 科研通微信公众号,转发送积分 5629957
求助须知:如何正确求助?哪些是违规求助? 4721200
关于积分的说明 14971845
捐赠科研通 4787915
什么是DOI,文献DOI怎么找? 2556638
邀请新用户注册赠送积分活动 1517713
关于科研通互助平台的介绍 1478320