How Much Data Is Sufficient to Learn High-Performing Algorithms?

Computer Science; Algorithms
Authors
Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, Ellen Vitercik
Source
Journal: Journal of the ACM (Association for Computing Machinery)
Volume/Issue: 71 (5): 1-58  Citations: 13
Identifier
DOI: 10.1145/3676278
Abstract

Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that a data-driven approach to parameter tuning can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide techniques for deriving generalization guarantees that bound the difference between the algorithm's average performance over the training set and its expected performance on the unknown distribution. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a large change in behavior. Prior research [e.g., 12, 16, 20, 62] has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We streamline these analyses with a general theorem that applies whenever an algorithm's performance is a piecewise-constant, piecewise-linear, or, more generally, piecewise-structured function of its parameters. Our results, which are tight up to logarithmic factors in the worst case, also imply novel bounds for configuring dynamic programming algorithms from computational biology.
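To make the guarantee concrete, the following is a minimal sketch of the form such a bound takes, with notation assumed here rather than taken from the paper: write u_ρ(x) ∈ [0, H] for the performance of the algorithm with parameter setting ρ on problem instance x, and let x_1, ..., x_N be a training set drawn i.i.d. from the unknown distribution D. A uniform-convergence generalization guarantee then states that, with probability at least 1 − δ over the draw of the training set,

\[
\sup_{\rho}\;\left|\,\frac{1}{N}\sum_{i=1}^{N} u_{\rho}(x_i)\;-\;\mathbb{E}_{x \sim \mathcal{D}}\bigl[u_{\rho}(x)\bigr]\,\right| \;\le\; \epsilon(N, \delta),
\]

where the error term ε(N, δ) shrinks as the number of training instances N grows. A bound of this form holds no matter how ρ is selected from the training set; the paper's general theorem controls ε(N, δ) whenever u_ρ(x), viewed as a function of ρ, is piecewise constant, piecewise linear, or otherwise piecewise structured.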