
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model

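Computer science, Programming language, Coding (set theory), Natural language processing, Language model, Code generation, Artificial intelligence, Operating system, Set (abstract data type), Key (lock)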
Authors
Peng Di,Jianguo Li,Hang Yu,Wei Kang Jiang,Wenju Cai,Yang Cao,Chao‐Yu Chen,Dajun Chen,Hongwei Chen,L. Chen,Gang Fan,Jie Kai Gong,Zi Gong,Wen Hu,Tianshi Guo,Zhichao Lei,Ting Li,Zheng Li,Ming Liang,Cong Liao,Bingchang Liu,Jiachen Liu,Zhiwei Liu,Shun Lu,Min Shen,Guangpei Wang,H.X. Wang,Z. Wang,Zhaogui Xu,Jiawei Yang,Qing Ye,Gehao Zhang,Yu Zhang,Zelin Zhao,Xiaolan Zheng,Hailian Zhou,L.D. Zhu,X Zhu
Identifier
DOI:10.1145/3639477.3639719
Abstract

Code Large Language Models (Code LLMs) have gained significant attention in industry because of their wide applications across the full lifecycle of software engineering. However, how well existing models understand non-English inputs for multi-lingual code-related tasks is still far from well studied. This paper introduces CodeFuse-13B, an open-source pre-trained code LLM. It is specifically designed for code-related tasks with both English and Chinese prompts and supports over 40 programming languages. CodeFuse achieves its effectiveness by using a high-quality pre-training dataset that is carefully filtered by program analyzers and optimized during the training process. Extensive experiments are conducted using real-world usage scenarios, the industry-standard benchmark HumanEval-x, and the specially designed CodefuseEval for Chinese prompts. To assess the effectiveness of CodeFuse, the authors collected human feedback from Ant Group's software development process, where CodeFuse has been deployed. The results show that CodeFuse-13B achieves a HumanEval pass@1 score of 37.10%, positioning it as one of the top multi-lingual code LLMs with similar parameter sizes. In practical scenarios such as code generation, code translation, code commenting, and test case generation, CodeFuse performs better than other models when given Chinese prompts.
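
The abstract reports a pass@1 score without saying how it is computed. As a minimal sketch, assuming the evaluation follows the unbiased pass@k estimator commonly used with HumanEval, 1 - C(n-c, k) / C(n, k) for n sampled completions of which c pass the unit tests, the metric can be computed as below. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the CodeFuse paper or its code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of completions sampled for the problem
    c: number of those completions that pass all unit tests
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset has a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n per problem, so a benchmark-level pass@1 is the mean fraction of passing samples across problems. Whether the reported 37.10% was obtained with this exact estimator and which sampling settings were used is not stated in the abstract.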
