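Image stitching
Computer science
Node (physics)
Pairwise comparison
Matching (statistics)
Personalization
Similarity (geometry)
Information retrieval
Theoretical computer science
Data mining
Artificial intelligence
World Wide Web
Mathematics
Structural engineering
Statistics
Image (mathematics)
Engineering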
Authors
Di Jin,Mark Heimann,Ryan A. Rossi,Danai Koutra
Identifier
DOI:10.1007/978-3-030-46150-8_29
Abstract
Identity stitching, the task of identifying and matching various online references (e.g., sessions over different devices and timespans) to the same user in real-world web services, is crucial for personalization and recommendations. However, traditional user stitching approaches, such as grouping or blocking, require pairwise comparisons between a massive number of user activities, thus posing both computational and storage challenges. Recent works, which are often application-specific, heuristically seek to reduce the number of comparisons, but they suffer from low precision and recall. To solve the problem in an application-independent way, we take a heterogeneous network-based approach in which users (nodes) interact with content (e.g., sessions, websites), and may have attributes (e.g., location). We propose node2bits, an efficient framework that represents multi-dimensional features of node contexts with binary hashcodes. node2bits leverages feature-based temporal walks to encapsulate short- and long-term interactions between nodes in heterogeneous web networks, and adopts SimHash to obtain compact, binary representations and avoid the quadratic complexity for similarity search. Extensive experiments on large-scale real networks show that node2bits outperforms traditional techniques and existing works that generate real-valued embeddings by up to 5.16% in F1 score on user stitching, while taking only up to 1.56% as much storage.
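
The SimHash step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic random-hyperplane SimHash, not the node2bits implementation: the helper names (`make_hyperplanes`, `simhash`, `hamming`, `bucket_by_code`), the 16-bit code length, and the toy feature vectors are all assumptions made for illustration. Each bit records which side of a random hyperplane a feature vector falls on, so vectors pointing in similar directions tend to receive similar codes, and candidates can be grouped by code instead of being compared pairwise.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vec, hyperplanes):
    """Return an integer whose bits are the signs of vec projected on each hyperplane."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


def bucket_by_code(vectors, hyperplanes):
    """Group item ids by exact hash code (a simple stand-in for similarity search)."""
    buckets = defaultdict(list)
    for item_id, vec in vectors.items():
        buckets[simhash(vec, hyperplanes)].append(item_id)
    return dict(buckets)


# Toy feature vectors (hypothetical): v is a scaled copy of u, w points the opposite way.
planes = make_hyperplanes(dim=4, n_bits=16)
u = [3.0, 0.0, 1.0, 2.0]
v = [6.0, 0.0, 2.0, 4.0]
w = [-3.0, 0.0, -1.0, -2.0]

code_u = simhash(u, planes)
code_v = simhash(v, planes)
code_w = simhash(w, planes)

print(hamming(code_u, code_v))  # same direction, so the codes agree
print(hamming(code_u, code_w))  # opposite direction, so the codes differ in every bit
print(bucket_by_code({"a": u, "b": v, "c": w}, planes))
```

Storing one 16-bit integer per item instead of a real-valued vector is what makes such codes compact, and grouping by code touches each item once rather than comparing every pair. A practical system would typically also search nearby codes (small Hamming distance), which this sketch omits.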