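Machine learning
Artificial intelligence
Labeled data
Vulnerability (computing)
Notice
Code (set theory)
Active learning (machine learning)
Deep learning
Semi-supervised learning
Computer science
Benchmark (surveying)
Computer security
Set (abstract data type)
Geodesy
Political science
Law
Programming language
Geography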
Authors
Xiaobing Sun, Liangqiong Tu, Jiale Zhang, Jie Cai, Bin Li, Yu Wang
Identifier
DOI:10.1016/j.jisa.2023.103423
Abstract
With the growing popularity of blockchain, the number of smart contracts has increased rapidly, and their security has attracted wider attention. Recently, machine learning has been widely applied to smart contract vulnerability detection. However, effective detection still faces a major challenge: labeled data in this field is insufficient. Active learning can label data more efficiently. Nevertheless, classical active learning trains the model only on a limited amount of labeled data, which is at odds with the large amount of data that deep learning requires for training. To address this, we propose a new framework, called ASSBert, that leverages an active and semi-supervised bidirectional encoder representations from transformers (BERT) network and is dedicated to smart contract vulnerability classification with a small amount of labeled code data and a large amount of unlabeled code data. In our framework, active learning is responsible for selecting highly uncertain code data from unlabeled sol files and adding them to the training set after manual labeling. In addition, semi-supervised learning continuously picks a certain number of high-confidence unlabeled code data from unlabeled sol files and adds them to the training set after pseudo-labeling. Intuitively, by combining active learning and semi-supervised learning, we are able to obtain more valuable data to improve the performance of our detection model. In our experiments, we collect a benchmark dataset that includes 6 vulnerabilities in about 20829 smart contracts. The experimental results demonstrate that our framework is superior to the baseline methods when only a small amount of labeled code data and a large amount of unlabeled code data are available.
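The selection step described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say how uncertainty or confidence is measured, so the use of Shannon entropy for uncertainty, a fixed probability threshold for confidence, and the names `entropy`, `split_unlabeled`, `n_query`, and `confidence_threshold` are all assumptions made here.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_unlabeled(pred_probs, n_query, confidence_threshold):
    """Partition unlabeled samples for one training round (hypothetical helper).

    Returns (query_indices, pseudo_labels):
      query_indices: the n_query most uncertain samples, to be labeled manually
      pseudo_labels: {index: predicted_class} for the remaining confident samples
    """
    # Active learning side: rank samples from most to least uncertain.
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]), reverse=True)
    query_indices = ranked[:n_query]
    queried = set(query_indices)

    # Semi-supervised side: pseudo-label samples the model is confident about.
    pseudo_labels = {}
    for i, probs in enumerate(pred_probs):
        if i in queried:
            continue
        best = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best] >= confidence_threshold:
            pseudo_labels[i] = best
    return query_indices, pseudo_labels


# Toy class probabilities for four unlabeled contracts (made-up numbers).
pred_probs = [
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.04, 0.92, 0.04],
    [0.50, 0.45, 0.05],
]
query_indices, pseudo_labels = split_unlabeled(
    pred_probs, n_query=1, confidence_threshold=0.9)
```

In this toy round, the near-uniform prediction is sent for manual labeling, the two confident predictions receive pseudo-labels, and the remaining sample stays in the unlabeled pool for a later round.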