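Computer science
Artificial intelligence
Machine learning
Labelled data
Training set
Process (computing)
Semi-supervised learning
Supervised learning
Deep learning
Set (abstract data type)
Data modelling
Data set
Training (meteorology)
Artificial neural network
Database
Meteorology
Physics
Programming language
Operating system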
Authors
Awanish Kumar, Soumyadeep Ghosh, Janu Verma
Identifier
DOI:10.1145/3533271.3561783
Abstract
Semi-supervised learning has attracted the attention of AI researchers in the recent past, especially after the advent of deep learning methods and their success in several real-world applications. Most deep learning models require large amounts of labelled data, which is expensive to obtain. Fraud detection is a very important problem for several industries, and a large amount of data is often available. However, obtaining labelled data is cumbersome, and hence semi-supervised learning is well positioned to aid us in building robust and accurate supervised models. In this work, we consider different kinds of fraud detection paradigms and show that a self-training based semi-supervised learning approach can produce significant improvements over a model that has been trained on a limited set of labelled data. We propose a novel self-training approach based on a guided sharpening technique that uses a pair of autoencoders, which provide useful cues for incorporating unlabelled data in the training process. We conduct thorough experiments and analysis on three different real-world databases to showcase the effectiveness of the approach. On the Elliptic Bitcoin fraud dataset, we show that utilizing unlabelled data improves the F1 score of the model trained on limited labelled data by around 10%.
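
As a rough illustration of the self-training idea mentioned in the abstract, the sketch below shows generic temperature sharpening followed by confidence-based pseudo-label selection. It is not the authors' method: the abstract does not describe how the pair of autoencoders guides the sharpening, so that part is omitted. The function names, the temperature of 0.5, the threshold of 0.9, and the example probabilities are all assumptions made for illustration.

```python
from typing import List, Tuple


def sharpen(probs: List[float], temperature: float = 0.5) -> List[float]:
    """Temperature sharpening: raise each probability to 1/T and renormalise.

    Assumes probs is a valid distribution (non-negative, not all zero).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def select_pseudo_labels(
    unlabelled_probs: List[List[float]],
    temperature: float = 0.5,
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs whose sharpened confidence
    reaches the threshold; the remaining samples stay unlabelled."""
    selected = []
    for i, probs in enumerate(unlabelled_probs):
        sharp = sharpen(probs, temperature)
        label = max(range(len(sharp)), key=sharp.__getitem__)
        if sharp[label] >= threshold:
            selected.append((i, label))
    return selected


# Hypothetical model outputs [P(legitimate), P(fraud)] for three
# unlabelled transactions (made-up values, not from the paper).
model_probs = [[0.80, 0.20], [0.55, 0.45], [0.10, 0.90]]
pseudo = select_pseudo_labels(model_probs, temperature=0.5, threshold=0.9)
print(pseudo)  # [(0, 0), (2, 1)]
```

In a full self-training loop, the selected pairs would be added to the labelled set and the model retrained, repeating until no further samples pass the threshold. The near-ambiguous second transaction is left unlabelled, which is the point of thresholding after sharpening.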