计算机科学
文字2vec
Java
程序设计语言
编码(集合论)
嵌入
文字嵌入
班级(哲学)
应用程序编程接口
人工智能
集合(抽象数据类型)
作者
Trong Duc Nguyen,Anh Tuan Nguyen,Hung Phan,Tien N. Nguyen
摘要
Word2Vec is a class of neural network models that as being trainedfrom a large corpus of texts, they can produce for each unique word acorresponding vector in a continuous space in which linguisticcontexts of words can be observed. In this work, we study thecharacteristics of Word2Vec vectors, called API2VEC or API embeddings, for the API elements within the API sequences in source code. Ourempirical study shows that the close proximity of the API2VEC vectorsfor API elements reflects the similar usage contexts containing thesurrounding APIs of those API elements. Moreover, API2VEC can captureseveral similar semantic relations between API elements in API usagesvia vector offsets. We demonstrate the usefulness of API2VEC vectorsfor API elements in three applications. First, we build a tool thatmines the pairs of API elements that share the same usage relationsamong them. The other applications are in the code migrationdomain. We develop API2API, a tool to automatically learn the APImappings between Java and C# using a characteristic of the API2VECvectors for API elements in the two languages: semantic relationsamong API elements in their usages are observed in the two vectorspaces for the two languages as similar geometric arrangements amongtheir API2VEC vectors. Our empirical evaluation shows that API2APIrelatively improves 22.6% and 40.1% top-1 and top-5 accuracy over astate-of-the-art mining approach for API mappings. Finally, as anotherapplication in code migration, we are able to migrate equivalent APIusages from Java to C# with up to 90.6% recall and 87.2% precision.
科研通智能强力驱动
Strongly Powered by AbleSci AI