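Python (programming language)
Documentation
Computer science
Scripting language
Programming language
Source code
Code generation
Machine translation
Natural language processing
Coding (set theory)
Software documentation
Artificial intelligence
Software
Software development
Operating system
Key (lock)
Software construction
Set (abstract data type)
Authors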
Antonio Valerio Miceli Barone, Rico Sennrich
Abstract
Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of parallel corpora of code and natural language descriptions, which tend to be small and constrained to specific domains. In this work we introduce a large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings (“docstrings”), generated by scraping open source repositories on GitHub. We describe baseline results for the code documentation and code generation tasks, obtained with neural machine translation. We also experiment with data augmentation techniques to further increase the amount of training data. We release our datasets and processing scripts to stimulate research in these areas.
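As an illustration of what a function/docstring pair in such a parallel corpus might look like, the following is a minimal sketch that pulls documented functions out of a Python source string using the standard-library `ast` module. It is not the authors' released processing script: the helper name `extract_pairs`, the tuple layout it returns, and the sample input are assumptions made for this example only.

```python
import ast


def extract_pairs(source):
    """Return (name, docstring, function_source) for each documented function.

    Hypothetical helper for illustration; not the paper's actual pipeline.
    """
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc is None:
                # Functions without a docstring have no description side,
                # so they cannot form a parallel pair.
                continue
            # get_source_segment requires Python 3.8 or later.
            code = ast.get_source_segment(source, node)
            pairs.append((node.name, doc, code))
    return pairs


# Made-up sample input: one documented and one undocumented function.
sample = (
    "def add(a, b):\n"
    '    """Return the sum of a and b."""\n'
    "    return a + b\n"
    "\n"
    "def undocumented(x):\n"
    "    return x\n"
)

for name, doc, code in extract_pairs(sample):
    print(name, "|", doc)
```

In this sketch the function source still contains the docstring; a corpus intended for translation between code and descriptions would presumably separate the two, which is left out here for brevity.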