Generalizability theory
Artificial intelligence
Computer science
Machine learning
Data science
Psychology
Developmental psychology
Authors
Yasha Ektefaie,Andrew Shen,Daria Bykova,Maximillian G. Marin,Marinka Žitnik,Maha Farhat
Identifier
DOI:10.1101/2024.02.25.581982
Abstract
Deep learning has made rapid advances in modeling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata-based (MB) or sequence-similarity-based (SB) train and test splits of input data before assessing model performance. Here, we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap,