Robust Estimation of a Location Parameter

Mathematics · Statistics · Estimation theory · Estimation · Location parameter · Applied mathematics · Econometrics · Estimator · Economics · Management
Author
Peter J. Huber
Source
Journal: Annals of Mathematical Statistics [Institute of Mathematical Statistics]
Volume/Issue: 35 (1): 73-101    Citations: 5502
Identifier
DOI: 10.1214/aoms/1177703732
Abstract

This paper contains a new approach toward a theory of robust estimation; it treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators--intermediaries between sample mean and sample median--that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators. For the general background, see Tukey (1960) (p. 448 ff.) Let $x_1, \cdots, x_n$ be independent random variables with common distribution function $F(t - \xi)$. The problem is to estimate the location parameter $\xi$, but with the complication that the prototype distribution $F(t)$ is only approximately known. I shall primarily be concerned with the model of indeterminacy $F = (1 - \epsilon)\Phi + \epsilon H$, where $0 \leqq \epsilon < 1$ is a known number, $\Phi(t) = (2\pi)^{-\frac{1}{2}} \int^t_{-\infty} \exp(-\frac{1}{2}s^2) ds$ is the standard normal cumulative and $H$ is an unknown contaminating distribution. This model arises for instance if the observations are assumed to be normal with variance 1, but a fraction $\epsilon$ of them is affected by gross errors. Later on, I shall also consider other models of indeterminacy, e.g., $\sup_t |F(t) - \Phi(t)| \leqq \epsilon$. Some inconvenience is caused by the fact that location and scale parameters are not uniquely determined: in general, for fixed $\epsilon$, there will be several values of $\xi$ and $\sigma$ such that $\sup_t|F(t) - \Phi((t - \xi)/\sigma)| \leqq \epsilon$, and similarly for the contaminated case. Although this inherent and unavoidable indeterminacy is small if $\epsilon$ is small and is rather irrelevant for practical purposes, it poses awkward problems for the theory, especially for optimality questions. To remove this difficulty, one may either (i) restrict attention to symmetric distributions, and estimate the location of the center of symmetry (this works for $\xi$ but not for $\sigma$); or (ii) one may define the parameter to be estimated in terms of the estimator itself, namely by its asymptotic value for sample size $n \rightarrow \infty$; or (iii) one may define the parameters by arbitrarily chosen functionals of the distribution (e.g., by the expectation, or the median of $F$). All three possibilities have unsatisfactory aspects, and I shall usually choose the variant which is mathematically most convenient. It is interesting to look back to the very origin of the theory of estimation, namely to Gauss and his theory of least squares. Gauss was fully aware that his main reason for assuming an underlying normal distribution and a quadratic loss function was mathematical, i.e., computational, convenience. In later times, this was often forgotten, partly because of the central limit theorem. However, if one wants to be honest, the central limit theorem can at most explain why many distributions occurring in practice are approximately normal. The stress is on the word "approximately." This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): What happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance: seemingly quite mild deviations may already explode its variance. Tukey and others proposed several more robust substitutes--trimmed means, Winsorized means, etc.--and explored their performance for a few typical violations of normality. 
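As a concrete illustration of the contamination model $F = (1 - \epsilon)\Phi + \epsilon H$ (not part of the paper), the following minimal sketch draws samples with $H$ taken, purely for illustration, as a wide normal, and compares the sample mean and sample median. The values of $\epsilon$, the choice of $H$, the sample size, and the number of replications are all illustrative assumptions.

```python
# Minimal simulation sketch (not from the paper): sample from the
# contamination model F = (1 - eps)*Phi + eps*H, with H chosen here as a
# wide normal N(0, 10^2) purely for illustration, and compare how the
# sample mean and sample median behave under contamination.
import numpy as np

rng = np.random.default_rng(0)
eps, n, reps = 0.05, 100, 2000

def contaminated_sample(n):
    # Each observation is a gross error with probability eps.
    mask = rng.random(n) < eps
    x = rng.normal(0.0, 1.0, n)
    x[mask] = rng.normal(0.0, 10.0, mask.sum())
    return x

means = np.array([contaminated_sample(n).mean() for _ in range(reps)])
medians = np.array([np.median(contaminated_sample(n)) for _ in range(reps)])

# n * (empirical variance) approximates the asymptotic variance of each estimator.
print("n*Var(mean)  ", n * means.var())
print("n*Var(median)", n * medians.var())
```

With these particular choices the scaled variance of the mean is inflated by the gross errors, while that of the median stays close to its uncontaminated value.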
A general theory of robust estimation is still lacking; it is hoped that the present paper will furnish the first few steps toward such a theory. At the core of the method of least squares lies the idea to minimize the sum of the squared "errors," that is, to adjust the unknown parameters such that the sum of the squares of the differences between observed and computed values is minimized. In the simplest case, with which we are concerned here, namely the estimation of a location parameter, one has to minimize the expression $\sum_i (x_i - T)^2$; this is of course achieved by the sample mean $T = \sum_i x_i/n$. I should like to emphasize that no loss function is involved here; I am only describing how the least squares estimator is defined, and neither the underlying family of distributions nor the true value of the parameter to be estimated enters so far. It is quite natural to ask whether one can obtain more robustness by minimizing another function of the errors than the sum of their squares. We shall therefore concentrate our attention on estimators that can be defined by a minimum principle of the form (for a location parameter): \begin{equation*} T = T_n(x_1, \cdots, x_n) \text{ minimizes } \sum_i \rho(x_i - T), \tag{M} \end{equation*} where $\rho$ is a non-constant function. Of course, this definition generalizes at once to more general least squares type problems, where several parameters have to be determined. This class of estimators contains in particular (i) the sample mean ($\rho(t) = t^2$), (ii) the sample median ($\rho(t) = |t|$), and more generally, (iii) all maximum likelihood estimators ($\rho(t) = -\log f(t)$, where $f$ is the assumed density of the untranslated distribution). These ($M$)-estimators, as I shall call them for short, have rather pleasant asymptotic properties; sufficient conditions for asymptotic normality and an explicit expression for their asymptotic variance will be given. How should one judge the robustness of an estimator $T_n(x) = T_n(x_1, \cdots, x_n)$? Since ill effects from contamination are mainly felt for large sample sizes, it seems that one should primarily optimize large sample robustness properties. Therefore, a convenient measure of robustness for asymptotically normal estimators seems to be the supremum of the asymptotic variance $(n \rightarrow \infty)$ when $F$ ranges over some suitable set of underlying distributions, in particular over the set of all $F = (1 - \epsilon)\Phi + \epsilon H$ for fixed $\epsilon$ and symmetric $H$. On second thought, it turns out that the asymptotic variance is not only easier to handle, but that even for moderate values of $n$ it is a better measure of performance than the actual variance, because (i) the actual variance of an estimator depends very much on the behavior of the tails of $H$, and the supremum of the actual variance is infinite for any estimator whose value is always contained in the convex hull of the observations; and (ii) if an estimator is asymptotically normal, then the important central part of its distribution and confidence intervals for moderate confidence levels can better be approximated in terms of the asymptotic variance than in terms of the actual variance. If we adopt this measure of robustness, and if we restrict attention to ($M$)-estimators, then it will be shown that the most robust estimator is uniquely determined and corresponds to the following $\rho$: $\rho(t) = \frac{1}{2}t^2$ for $|t| < k$, $\rho(t) = k|t| - \frac{1}{2}k^2$ for $|t| \geqq k$, with $k$ depending on $\epsilon$.
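The minimum principle (M) with the $\rho$ just displayed is easy to sketch in code. Since this $\rho$ is convex with derivative $\psi(t) = \max(-k, \min(k, t))$, the minimizer $T$ solves $\sum_i \psi(x_i - T) = 0$, which the sketch below finds by root-finding. The value $k = 1.5$ and the toy data are illustrative assumptions only; in the paper $k$ is determined by $\epsilon$.

```python
# Sketch of the (M)-estimator with rho(t) = t^2/2 for |t| < k and
# k|t| - k^2/2 otherwise. Its derivative psi(t) = clip(t, -k, k) is
# monotone, so sum_i psi(x_i - T) = 0 has a unique root, found here by
# Brent's method. k = 1.5 is an arbitrary illustrative choice.
import numpy as np
from scipy.optimize import brentq

def psi(t, k):
    return np.clip(t, -k, k)

def huber_m_estimate(x, k=1.5):
    # The root lies between the sample min and max, where the sum of
    # psi-values changes sign.
    f = lambda T: psi(x - T, k).sum()
    return brentq(f, x.min(), x.max())

x = np.array([0.1, -0.4, 0.3, 0.2, 8.0])   # one gross error at 8.0
print(huber_m_estimate(x))   # about 0.43, far less affected by the gross
                             # error than the sample mean (1.64)
```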
This estimator is most robust even among all translation invariant estimators. Sample mean $(k = \infty)$ and sample median $(k = 0)$ are limiting cases corresponding to $\epsilon = 0$ and $\epsilon = 1$, respectively, and the estimator is closely related and asymptotically equivalent to Winsorizing. I recall the definition of Winsorizing: assume that the observations have been ordered, $x_1 \leqq x_2 \leqq \cdots \leqq x_n$, then the statistic $T = n^{-1}(gx_{g + 1} + x_{g + 1} + x_{g + 2} + \cdots + x_{n - h} + hx_{n - h})$ is called the Winsorized mean, obtained by Winsorizing the $g$ leftmost and the $h$ rightmost observations. The above most robust ($M$)-estimators can be described by the same formula, except that in the first and in the last summand, the factors $x_{g + 1}$ and $x_{n - h}$ have to be replaced by some numbers $u, v$ satisfying $x_g \leqq u \leqq x_{g + 1}$ and $x_{n - h} \leqq v \leqq x_{n - h + 1}$, respectively; $g, h, u$ and $v$ depend on the sample. In fact, this ($M$)-estimator is the maximum likelihood estimator corresponding to a unique least favorable distribution $F_0$ with density $f_0(t) = (1 - \epsilon)(2\pi)^{-\frac{1}{2}}e^{-\rho(t)}$. This $f_0$ behaves like a normal density for small $t$, like an exponential density for large $t$. At least for me, this was rather surprising--I would have expected an $f_0$ with much heavier tails. This result is a particular case of a more general one that can be stated roughly as follows: Assume that $F$ belongs to some convex set $C$ of distribution functions. Then the most robust ($M$)-estimator for the set $C$ coincides with the maximum likelihood estimator for the unique $F_0 \in C$ which has the smallest Fisher information number $I(F) = \int (f'/f)^2 f\, dt$ among all $F \in C$. Miscellaneous related problems will also be treated: the case of non-symmetric contaminating distributions; the most robust estimator for the model of indeterminacy $\sup_t|F(t) - \Phi(t)| \leqq \epsilon$; robust estimation of a scale parameter; how to estimate location, if scale and $\epsilon$ are unknown; numerical computation of the estimators; more general estimators, e.g., minimizing $\sum_{i < j} \rho(x_i - T, x_j - T)$, where $\rho$ is a function of two arguments. Questions of small sample size theory will not be touched in this paper.
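A small numerical check of the least favorable density quoted above, offered as a sketch rather than a result from the paper: requiring $\int f_0(t)\, dt = 1$ ties $k$ to $\epsilon$. Integrating $(1 - \epsilon)(2\pi)^{-\frac{1}{2}}e^{-\rho(t)}$ over the normal middle and the two exponential tails gives the condition $2\varphi(k)/k - 2\Phi(-k) = \epsilon/(1 - \epsilon)$, used below as a derived assumption, not a statement quoted from the abstract. The code solves this relation for $k$ given $\epsilon$ and verifies the normalization numerically.

```python
# Sketch: the least favorable density f0(t) = (1-eps)*(2*pi)^(-1/2)*exp(-rho(t))
# integrates to one only when k and eps are linked; requiring integral(f0) = 1
# yields 2*phi(k)/k - 2*Phi(-k) = eps/(1-eps)  (derived assumption, see lead-in).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq
from scipy.integrate import quad

def k_from_eps(eps):
    # Solve the normalization condition for k on a wide bracket.
    g = lambda k: 2 * norm.pdf(k) / k - 2 * norm.cdf(-k) - eps / (1 - eps)
    return brentq(g, 1e-6, 10.0)

def f0(t, k, eps):
    # Huber's rho: quadratic in the middle, linear in the tails.
    rho = np.where(abs(t) < k, 0.5 * t**2, k * abs(t) - 0.5 * k**2)
    return (1 - eps) * np.exp(-rho) / np.sqrt(2 * np.pi)

eps = 0.05
k = k_from_eps(eps)
total, _ = quad(lambda t: f0(t, k, eps), -np.inf, np.inf)
print(k, total)   # total should be approximately 1.0
```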