亲爱的研友该休息了!由于当前在线用户较少,发布求助请尽量完整的填写文献信息,科研通机器人24小时在线,伴您度过漫漫科研夜!身体可是革命的本钱,早点休息,好梦!

Robust Estimation of a Location Parameter

数学 统计 估计理论 估计 位置参数 应用数学 计量经济学 估计员 经济 管理
作者
Peter J. Huber
出处
期刊:Annals of Mathematical Statistics [Institute of Mathematical Statistics]
卷期号:35 (1): 73-101 被引量:5502
标识
DOI:10.1214/aoms/1177703732
摘要

This paper contains a new approach toward a theory of robust estimation; it treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators--intermediaries between sample mean and sample median--that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators. For the general background, see Tukey (1960) (p. 448 ff.) Let $x_1, \cdots, x_n$ be independent random variables with common distribution function $F(t - \xi)$. The problem is to estimate the location parameter $\xi$, but with the complication that the prototype distribution $F(t)$ is only approximately known. I shall primarily be concerned with the model of indeterminacy $F = (1 - \epsilon)\Phi + \epsilon H$, where $0 \leqq \epsilon < 1$ is a known number, $\Phi(t) = (2\pi)^{-\frac{1}{2}} \int^t_{-\infty} \exp(-\frac{1}{2}s^2) ds$ is the standard normal cumulative and $H$ is an unknown contaminating distribution. This model arises for instance if the observations are assumed to be normal with variance 1, but a fraction $\epsilon$ of them is affected by gross errors. Later on, I shall also consider other models of indeterminacy, e.g., $\sup_t |F(t) - \Phi(t)| \leqq \epsilon$. Some inconvenience is caused by the fact that location and scale parameters are not uniquely determined: in general, for fixed $\epsilon$, there will be several values of $\xi$ and $\sigma$ such that $\sup_t|F(t) - \Phi((t - \xi)/\sigma)| \leqq \epsilon$, and similarly for the contaminated case. Although this inherent and unavoidable indeterminacy is small if $\epsilon$ is small and is rather irrelevant for practical purposes, it poses awkward problems for the theory, especially for optimality questions. To remove this difficulty, one may either (i) restrict attention to symmetric distributions, and estimate the location of the center of symmetry (this works for $\xi$ but not for $\sigma$); or (ii) one may define the parameter to be estimated in terms of the estimator itself, namely by its asymptotic value for sample size $n \rightarrow \infty$; or (iii) one may define the parameters by arbitrarily chosen functionals of the distribution (e.g., by the expectation, or the median of $F$). All three possibilities have unsatisfactory aspects, and I shall usually choose the variant which is mathematically most convenient. It is interesting to look back to the very origin of the theory of estimation, namely to Gauss and his theory of least squares. Gauss was fully aware that his main reason for assuming an underlying normal distribution and a quadratic loss function was mathematical, i.e., computational, convenience. In later times, this was often forgotten, partly because of the central limit theorem. However, if one wants to be honest, the central limit theorem can at most explain why many distributions occurring in practice are approximately normal. The stress is on the word "approximately." This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): What happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance: seemingly quite mild deviations may already explode its variance. Tukey and others proposed several more robust substitutes--trimmed means, Winsorized means, etc.--and explored their performance for a few typical violations of normality. A general theory of robust estimation is still lacking; it is hoped that the present paper will furnish the first few steps toward such a theory. At the core of the method of least squares lies the idea to minimize the sum of the squared "errors," that is, to adjust the unknown parameters such that the sum of the squares of the differences between observed and computed values is minimized. In the simplest case, with which we are concerned here, namely the estimation of a location parameter, one has to minimize the expression $\sum_i (x_i - T)^2$; this is of course achieved by the sample mean $T = \sum_i x_i/n$. I should like to emphasize that no loss function is involved here; I am only describing how the least squares estimator is defined, and neither the underlying family of distributions nor the true value of the parameter to be estimated enters so far. It is quite natural to ask whether one can obtain more robustness by minimizing another function of the errors than the sum of their squares. We shall therefore concentrate our attention to estimators that can be defined by a minimum principle of the form (for a location parameter): $T = T_n(x_1, \cdots, x_n) minimizes \sum_i \rho(x_i - T),$ \begin{equation*} \tag{M} where \rho is a non-constant function. \end{equation*} Of course, this definition generalizes at once to more general least squares type problems, where several parameters have to be determined. This class of estimators contains in particular (i) the sample mean $(\rho(t) = t^2)$, (ii) the sample median $(\rho(t) = |t|)$, and more generally, (iii) all maximum likelihood estimators $(\rho(t) = -\log f(t)$, where $f$ is the assumed density of the untranslated distribution). These ($M$)-estimators, as I shall call them for short, have rather pleasant asymptotic properties; sufficient conditions for asymptotic normality and an explicit expression for their asymptotic variance will be given. How should one judge the robustness of an estimator $T_n(x) = T_n(x_1, \cdots, x_n)$? Since ill effects from contamination are mainly felt for large sample sizes, it seems that one should primarily optimize large sample robustness properties. Therefore, a convenient measure of robustness for asymptotically normal estimators seems to be the supremum of the asymptotic variance $(n \rightarrow \infty)$ when $F$ ranges over some suitable set of underlying distributions, in particular over the set of all $F = (1 - \epsilon)\Phi + \epsilon H$ for fixed $\epsilon$ and symmetric $H$. On second thought, it turns out that the asymptotic variance is not only easier to handle, but that even for moderate values of $n$ it is a better measure of performance than the actual variance, because (i) the actual variance of an estimator depends very much on the behavior of the tails of $H$, and the supremum of the actual variance is infinite for any estimator whose value is always contained in the convex hull of the observations. (ii) If an estimator is asymptotically normal, then the important central part of its distribution and confidence intervals for moderate confidence levels can better be approximated in terms of the asymptotic variance than in terms of the actual variance. If we adopt this measure of robustness, and if we restrict attention to ($M$)-estimators, then it will be shown that the most robust estimator is uniquely determined and corresponds to the following $\rho:\rho(t) = \frac{1}{2}t^2$ for $|t| < k, \rho(t) = k|t| - \frac{1}{2}k^2$ for $|t| \geqq k$, with $k$ depending on $\epsilon$. This estimator is most robust even among all translation invariant estimators. Sample mean $(k = \infty)$ and sample median $(k = 0)$ are limiting cases corresponding to $\epsilon = 0$ and $\epsilon = 1$, respectively, and the estimator is closely related and asymptotically equivalent to Winsorizing. I recall the definition of Winsorizing: assume that the observations have been ordered, $x_1 \leqq x_2 \leqq \cdots \leqq x_n$, then the statistic $T = n^{-1}(gx_{g + 1} + x_{g + 1} + x_{g + 2} + \cdots + x_{n - h} + hx_{n - h})$ is called the Winsorized mean, obtained by Winsorizing the $g$ leftmost and the $h$ rightmost observations. The above most robust ($M$)-estimators can be described by the same formula, except that in the first and in the last summand, the factors $x_{g + 1}$ and $x_{n - h}$ have to be replaced by some numbers $u, v$ satisfying $x_g \leqq u \leqq x_{g + 1}$ and $x_{n - h} \leqq v \leqq x_{n - h + 1}$, respectively; $g, h, u$ and $v$ depend on the sample. In fact, this ($M$)-estimator is the maximum likelihood estimator corresponding to a unique least favorable distribution $F_0$ with density $f_0(t) = (1 - \epsilon)(2\pi)^{-\frac{1}{2}}e^{-\rho(t)}$. This $f_0$ behaves like a normal density for small $t$, like an exponential density for large $t$. At least for me, this was rather surprising--I would have expected an $f_0$ with much heavier tails. This result is a particular case of a more general one that can be stated roughly as follows: Assume that $F$ belongs to some convex set $C$ of distribution functions. Then the most robust ($M$)-estimator for the set $C$ coincides with the maximum likelihood estimator for the unique $F_0 \varepsilon C$ which has the smallest Fisher information number $I(F) = \int (f'/f)^2f dt$ among all $F \varepsilon C$. Miscellaneous related problems will also be treated: the case of non-symmetric contaminating distributions; the most robust estimator for the model of indeterminacy $\sup_t|F(t) - \Phi(t)| \leqq \epsilon$; robust estimation of a scale parameter; how to estimate location, if scale and $\epsilon$ are unknown; numerical computation of the estimators; more general estimators, e.g., minimizing $\sum_{i < j} \rho(x_i - T, x_j - T)$, where $\rho$ is a function of two arguments. Questions of small sample size theory will not be touched in this paper.
最长约 10秒,即可获得该文献文件

科研通智能强力驱动
Strongly Powered by AbleSci AI
更新
大幅提高文件上传限制,最高150M (2024-4-1)

科研通是完全免费的文献互助平台,具备全网最快的应助速度,最高的求助完成率。 对每一个文献求助,科研通都将尽心尽力,给求助人一个满意的交代。
实时播报
满地枫叶完成签到,获得积分20
30秒前
joanna完成签到,获得积分10
31秒前
满地枫叶发布了新的文献求助10
41秒前
46秒前
M先生完成签到,获得积分10
55秒前
1分钟前
1分钟前
tlx发布了新的文献求助10
1分钟前
1分钟前
1分钟前
2分钟前
2分钟前
2分钟前
小圆圈发布了新的文献求助30
2分钟前
兴奋的宛亦完成签到,获得积分20
2分钟前
zhanglongfei发布了新的文献求助10
2分钟前
3分钟前
小圆圈发布了新的文献求助10
3分钟前
3分钟前
小圆圈发布了新的文献求助10
3分钟前
李健的小迷弟应助小圆圈采纳,获得10
3分钟前
3分钟前
冬瓜排骨养生汤完成签到,获得积分10
3分钟前
4分钟前
小圆圈发布了新的文献求助10
4分钟前
vantie完成签到 ,获得积分10
4分钟前
5分钟前
zhanglongfei完成签到,获得积分10
5分钟前
Luis发布了新的文献求助10
5分钟前
7分钟前
7分钟前
北陆玄枵发布了新的文献求助10
7分钟前
8分钟前
8分钟前
8分钟前
8分钟前
Dan完成签到,获得积分10
9分钟前
9分钟前
lcs完成签到,获得积分10
9分钟前
9分钟前
高分求助中
Evolution 10000
Sustainability in Tides Chemistry 2800
юрские динозавры восточного забайкалья 800
English Wealden Fossils 700
An Introduction to Geographical and Urban Economics: A Spiky World Book by Charles van Marrewijk, Harry Garretsen, and Steven Brakman 500
Diagnostic immunohistochemistry : theranostic and genomic applications 6th Edition 500
Chen Hansheng: China’s Last Romantic Revolutionary 500
热门求助领域 (近24小时)
化学 医学 生物 材料科学 工程类 有机化学 生物化学 物理 内科学 纳米技术 计算机科学 化学工程 复合材料 基因 遗传学 催化作用 物理化学 免疫学 量子力学 细胞生物学
热门帖子
关注 科研通微信公众号,转发送积分 3150609
求助须知:如何正确求助?哪些是违规求助? 2802008
关于积分的说明 7846050
捐赠科研通 2459372
什么是DOI,文献DOI怎么找? 1309219
科研通“疑难数据库(出版商)”最低求助积分说明 628696
版权声明 601757