Marginal singularity and the benefits of labels in covariate-shift

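Covariate, Minimax, Mathematics, Nonparametric statistics, Marginal distribution, Transfer (computing), Sample size determination, Distribution (mathematics), Transfer of learning, Classifier (UML), Joint probability distribution, Probability distribution, Statistics, Sample (material), Econometrics, Artificial intelligence, Computer science, Mathematical optimization, Random variable, Mathematical analysis, Chromatography, Parallel computing, Chemistry
Authors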
Samory Kpotufe,Guillaume Martinet
Source
Journal: Annals of Statistics [Institute of Mathematical Statistics]
Volume (issue): 49 (6) Citations: 17
Identifier
DOI: 10.1214/21-aos2084
Abstract

Transfer Learning addresses common situations in Machine Learning where little or no labeled data is available for a target prediction problem, corresponding to a distribution Q, but much labeled data is available from some related but different data distribution P. This work is concerned with the fundamental limits of transfer, that is, the limits in target performance in terms of (1) sample sizes from P and Q, and (2) differences in data distributions P, Q. In particular, we aim to address practical questions such as how much target data from Q is sufficient given a certain amount of related data from P, and how to optimally sample such target data for labeling. We present new minimax results for transfer in nonparametric classification (i.e., for situations where little is known about the target classifier), under the common assumption that the marginal distributions of covariates differ between P and Q (often termed covariate-shift). Our results are the first to concisely capture the relative benefits of source and target labeled data in these settings through information-theoretic limits. Namely, we show that the benefits of target labels are tightly controlled by a transfer-exponent γ that encodes how singular Q is locally with respect to P and that, interestingly, paints a more favorable picture of transfer than insights from previous work might suggest. In fact, while previous work relies largely on refinements of traditional metrics and divergences between distributions, and often yields only a coarse view of when transfer is possible or not, our analysis, in terms of γ, reveals a continuum of new regimes ranging from easy to hard transfer.
We then address the practical question of how to efficiently sample target data to label, by showing that a recently proposed semi-supervised procedure, based on k-NN classification, can be refined to adapt to unknown γ and, therefore, requests target labels only when beneficial, while achieving nearly minimax-optimal transfer rates without knowledge of distributional parameters. Of independent interest, we obtain new minimax-optimality results for vanilla k-NN classification in regimes with nonuniform marginals.