How Much Data Is Sufficient to Learn High-Performing Algorithms?

计算机科学 算法
作者
Maria-Florina Balcan,Dan DeBlasio,Travis Dick,Carl Kingsford,Tüomas Sandholm,Ellen Vitercik
出处
期刊:Journal of the ACM [Association for Computing Machinery]
卷期号:71 (5): 1-58 被引量:13
标识
DOI:10.1145/3676278
摘要

Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that a data-driven approach to parameter tuning can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide techniques for deriving generalization guarantees that bound the difference between the algorithm’s average performance over the training set and its expected performance on the unknown distribution. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a large change in behavior. Prior research [e.g., 12 , 16 , 20 , 62 ] has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We streamline these analyses with a general theorem that applies whenever an algorithm’s performance is a piecewise-constant, piecewise-linear, or—more generally— piecewise-structured function of its parameters. Our results, which are tight up to logarithmic factors in the worst case, also imply novel bounds for configuring dynamic programming algorithms from computational biology.
最长约 10秒,即可获得该文献文件

科研通智能强力驱动
Strongly Powered by AbleSci AI
更新
大幅提高文件上传限制,最高150M (2024-4-1)

科研通是完全免费的文献互助平台,具备全网最快的应助速度,最高的求助完成率。 对每一个文献求助,科研通都将尽心尽力,给求助人一个满意的交代。
实时播报
刚刚
不懈奋进发布了新的文献求助10
刚刚
1秒前
桐桐应助YYL采纳,获得10
1秒前
小刘一定能读C9博完成签到 ,获得积分10
1秒前
2秒前
3秒前
4秒前
生动路人发布了新的文献求助10
4秒前
4秒前
4秒前
执着完成签到,获得积分10
5秒前
5秒前
善学以致用应助草木采纳,获得10
5秒前
沉默大白完成签到 ,获得积分10
5秒前
Zhang发布了新的文献求助10
6秒前
大太阳发布了新的文献求助10
6秒前
ikress完成签到,获得积分10
6秒前
6秒前
小福宝完成签到,获得积分10
7秒前
狗头发布了新的文献求助50
8秒前
Kenzonvay发布了新的文献求助10
9秒前
9秒前
大太阳完成签到,获得积分10
12秒前
12秒前
庸人自扰完成签到,获得积分10
12秒前
hjyylab完成签到,获得积分10
12秒前
生动路人完成签到,获得积分10
13秒前
Sev发布了新的文献求助10
13秒前
lzq完成签到,获得积分10
13秒前
14秒前
14秒前
搜集达人应助第七班采纳,获得10
17秒前
吴昕奕完成签到 ,获得积分10
17秒前
18秒前
无私的问芙完成签到,获得积分10
19秒前
科研通AI2S应助那个人采纳,获得10
20秒前
20秒前
21秒前
和平港湾完成签到,获得积分10
22秒前
高分求助中
Licensing Deals in Pharmaceuticals 2019-2024 3000
Cognitive Paradigms in Knowledge Organisation 2000
Practical Pulmonary Pathology 1000
Mantiden: Faszinierende Lauerjäger Faszinierende Lauerjäger Heßler, Claudia, Rud 1000
PraxisRatgeber: Mantiden: Faszinierende Lauerjäger 1000
Natural History of Mantodea 螳螂的自然史 1000
A Photographic Guide to Mantis of China 常见螳螂野外识别手册 800
热门求助领域 (近24小时)
化学 医学 材料科学 生物 工程类 有机化学 生物化学 物理 内科学 纳米技术 计算机科学 化学工程 复合材料 遗传学 基因 物理化学 催化作用 量子力学 冶金 电极
热门帖子
关注 科研通微信公众号,转发送积分 3318150
求助须知:如何正确求助?哪些是违规求助? 2949464
关于积分的说明 8546274
捐赠科研通 2625891
什么是DOI,文献DOI怎么找? 1437001
科研通“疑难数据库(出版商)”最低求助积分说明 666040
邀请新用户注册赠送积分活动 652067