A machine learning-based approach for vital node identification in complex networks

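Computer science; Node (physics); Identification (biology); Adaptability; Machine learning; Support vector machine; Artificial intelligence; Graph kernel; Complex network; Viral marketing; Data mining; Kernel method; Polynomial kernel; Social media; World Wide Web; Engineering; Biology; Structural engineering; Botany; Ecology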

Authors

Ahmad Asgharian Rezaei, Justin Munoz, Mahdi Jalili, Hamid Khayyam

Source

Journal: Expert Systems With Applications [Elsevier BV]
Date: 2022-10-20 Volume/issue: 214: 119086-119086 Citations: 72

Identifier

DOI: 10.1016/j.eswa.2022.119086

Abstract

Vital node identification is the problem of finding nodes of highest importance in complex networks. This problem has crucial applications in various contexts such as viral marketing or controlling the propagation of viruses or rumours in real-world networks. Existing approaches for vital node identification mainly focus on capturing the importance of a node through a mathematical expression which directly relates structural properties of the node to its vitality. Although these heuristic approaches have achieved good performance in practice, they have weak adaptability, and their performance is limited to specific settings and certain dynamics. Inspired by the power of machine learning models for efficiently capturing different types of patterns and relations, we propose a machine learning-based, data-driven approach for vital node identification. The main idea is to train the model with a small portion of the graph, say 0.5% of the nodes, and do the prediction on the rest of the nodes. The ground-truth vitality for the training data is computed by simulating the SIR diffusion method starting from the training nodes. We use collective feature engineering where each node in the network is represented by incorporating elements of its connectivity, degree and extended coreness. Several machine learning models are trained on the node representations, but the best results are achieved by a Support Vector Regression machine with RBF kernel. The empirical results confirm that the proposed model outperforms state-of-the-art models on a selection of datasets, while it also shows more adaptability to changes in the dynamics parameters. With respect to correlation of ranking of the nodes with the ground-truth ranking, the proposed model outperforms other models with a margin as high as 4.63%, while it maintains the lowest variation in performance, with a performance difference as low as 5% across different influence probabilities. The proposed model also obtains the highest uniqueness of ranking, achieving almost unique ranking with a monotonicity relation score of more than 0.9997 on four datasets.
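
The abstract names two ingredients that are easy to illustrate in isolation: labelling a small set of training nodes by SIR simulation, and scoring how unique a ranking is. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a discrete-time SIR process in which every infected node tries to infect each susceptible neighbour once with probability `beta` and then recovers, takes a node's vitality to be the mean final outbreak size, and uses the commonly cited monotonicity definition M = (1 - sum_r n_r(n_r - 1) / (n(n - 1)))^2, where n_r is the number of nodes tied at rank r. The function names, the toy star graph, and the values of `beta` and `runs` are all hypothetical.

```python
import random
from collections import Counter


def sir_spread(adj, seed, beta, rng):
    """One SIR run from a single seed; returns how many nodes were ever infected."""
    infected = {seed}
    recovered = set()
    while infected:
        newly = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered and v not in newly
                if susceptible and rng.random() < beta:
                    newly.add(v)
        # Assumption: every infected node recovers after one step.
        recovered |= infected
        infected = newly
    return len(recovered)


def sir_vitality(adj, nodes, beta=0.1, runs=200, seed=0):
    """Mean outbreak size per node, used here as the ground-truth vitality label."""
    rng = random.Random(seed)
    return {
        u: sum(sir_spread(adj, u, beta, rng) for _ in range(runs)) / runs
        for u in nodes
    }


def monotonicity(scores):
    """Monotonicity of a ranking: 1.0 when all scores differ, 0.0 when all are tied."""
    n = len(scores)
    if n < 2:
        return 1.0
    tied_pairs = sum(c * (c - 1) for c in Counter(scores).values())
    return (1 - tied_pairs / (n * (n - 1))) ** 2


# Toy star graph: node 0 is the hub, nodes 1..5 are leaves (hypothetical data).
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}

# Label only a sample of nodes; the abstract mentions about 0.5% on real networks.
labels = sir_vitality(adj, nodes=[0, 1], beta=0.3, runs=500)
```

In a full pipeline these labels would serve as regression targets for the sampled nodes, with node features built from connectivity, degree and extended coreness as the abstract describes; the fitted model would then score the remaining nodes, and `monotonicity` could be applied to the predicted scores.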
