Keywords
Transformer, computer science, language model, convolutional neural network, sequence labeling, artificial intelligence, sequence (biology), machine learning, task (project management), voltage, engineering, biology, systems engineering, electrical engineering, genetics
Authors
Kevin Yang, Nicolò Fusi, Alex X. Lu
Identifier
DOI: 10.1101/2022.05.19.492714
Abstract
Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the Transformer architecture, whose run-time and memory scale quadratically with sequence length, so state-of-the-art models impose limits on sequence length. To address this limitation, we investigated whether convolutional neural network (CNN) architectures, which scale linearly with sequence length, could be as effective as Transformers in protein language models. With masked language model pretraining, CNNs are competitive with, and occasionally superior to, Transformers across downstream applications while maintaining strong performance on sequences longer than those allowed by current state-of-the-art Transformer models. Our work suggests that computational efficiency can be improved without sacrificing performance simply by using a CNN architecture instead of a Transformer, and it emphasizes the importance of disentangling the pretraining task from the model architecture.
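To make the abstract's two technical ingredients concrete, the sketch below shows masked-language-model pretraining over protein sequences with a dilated 1D CNN, whose cost grows linearly with sequence length rather than quadratically as in self-attention. This is a minimal illustrative sketch in PyTorch, not the authors' implementation; the vocabulary size, special-token indices, layer counts, kernel size, and masking rate are assumptions chosen for brevity.

# Minimal sketch (assumed hyperparameters, not the paper's model): a dilated 1D CNN
# trained with a BERT-style masked-language-model objective on protein sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 22          # 20 amino acids + pad + mask (toy vocabulary, assumed)
PAD, MASK = 20, 21  # assumed special-token indices

class ProteinCNN(nn.Module):
    def __init__(self, dim=128, layers=4, kernel=9):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim, padding_idx=PAD)
        # Dilated convolutions widen the receptive field while keeping the
        # per-layer cost linear in sequence length (unlike self-attention).
        self.convs = nn.ModuleList(
            [nn.Conv1d(dim, dim, kernel, padding="same", dilation=2 ** i)
             for i in range(layers)]
        )
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens):                  # tokens: (batch, length)
        x = self.embed(tokens).transpose(1, 2)  # -> (batch, dim, length)
        for conv in self.convs:
            x = x + F.gelu(conv(x))             # residual conv blocks
        return self.head(x.transpose(1, 2))     # per-position logits

def mlm_loss(model, tokens, mask_rate=0.15):
    # Mask a random fraction of positions and predict the original residues
    # only at those positions (standard masked-language-model objective).
    maskable = tokens != PAD
    masked = maskable & (torch.rand_like(tokens, dtype=torch.float) < mask_rate)
    inputs = tokens.masked_fill(masked, MASK)
    logits = model(inputs)
    return F.cross_entropy(logits[masked], tokens[masked])

if __name__ == "__main__":
    model = ProteinCNN()
    toy = torch.randint(0, 20, (2, 300))        # two random toy "protein" sequences
    loss = mlm_loss(model, toy)
    loss.backward()
    print(float(loss))

Because nothing in the forward pass depends on pairwise position interactions, the same model can be applied to much longer sequences than a fixed-context Transformer, with memory growing only linearly in length.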