Keywords: Commit, Computer science, Benchmark (surveying), Task (project management), Software, Source code, Artificial intelligence, Machine learning, Software engineering, Software development, Encoder, Programming language, Database, Operating system, Geodesy, Geography, Management, Economics
Authors
Shangqing Liu, Yanzhou Li, Yang Liu
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Cited by: 1
Identifier
DOI:10.48550/arxiv.2208.08100
Abstract
GitHub commits, which record code changes together with natural language messages describing them, play a critical role in helping software developers comprehend software evolution. To promote the development of the open-source software community, we collect a commit benchmark including over 7.99 million commits across 7 programming languages. Based on this benchmark, we present CommitBART, a large pre-trained encoder-decoder Transformer model for GitHub commits. The model is pre-trained with three categories of objectives (i.e., denoising, cross-modal generation and contrastive learning) covering six pre-training tasks to learn commit fragment representations. Furthermore, we unify a "commit intelligence" framework with one understanding task and three generation tasks for commits. Comprehensive experiments on these tasks demonstrate that CommitBART significantly outperforms previous pre-trained models for code. Further analysis also reveals that each pre-training task enhances model performance.
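The abstract frames commit-related generation (e.g., producing a commit message from a code change) as a task for a BART-style encoder-decoder model. The snippet below is a minimal sketch of that general setup, assuming the HuggingFace transformers library; the checkpoint name "facebook/bart-base" and the toy diff serialization are illustrative stand-ins, since the actual CommitBART weights, tokenizer, and input format are not given on this page.

```python
# Minimal sketch (not the authors' released code): feed a serialized code
# change into a BART-style encoder-decoder and generate a short message.
from transformers import BartForConditionalGeneration, BartTokenizer

# "facebook/bart-base" is a generic stand-in checkpoint, not CommitBART.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Toy commit fragment used as encoder input; a real pipeline would follow
# the paper's own serialization of the diff and surrounding context.
diff = (
    "- def add(a, b): return a - b\n"
    "+ def add(a, b): return a + b"
)

inputs = tokenizer(diff, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, num_beams=4, max_length=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The same encoder-decoder interface covers the understanding task as well (by attaching a classification head to the encoder output), which is why a single pre-trained model can serve the unified "commit intelligence" framework described above.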