Computer science
Transformer
Encoder
Global Positioning System (GPS)
Modal
Artificial intelligence
Data mining
Sensor fusion
Scheme (mathematics)
Computer vision
Telecommunications
Mathematical analysis
Chemistry
Physics
Mathematics
Quantum mechanics
Voltage
Polymer chemistry
Operating system
Authors
Zheng Chen, Junhua Fang, Pingfu Chao, Pengpeng Zhao, Jiajie Xu, Lei Zhao
Identifier
DOI: 10.1016/j.knosys.2023.110890
Abstract
Map quality is of great importance to location-based services (LBS) such as navigation and route planning. Typically, a map can be extracted either from vehicle GPS trajectories or from aerial images. Unfortunately, the quality of the extracted maps is usually unsatisfactory due to the inherent quality issues of these two data sources. Compared with extracting maps from a single data source, cross-modal map extraction methods consider both data sources and often achieve better results. However, almost all existing cross-modal methods are based on CNNs, which fail to sufficiently model global information. To overcome this problem, we propose MoviNet, a novel cross-modal map extraction method that combines a vision transformer (ViT) with a CNN. Specifically, instead of only partially integrating global information in the fusion scheme as in previous works, MoviNet introduces the lightweight ViT model MobileViT as the encoder to enhance the model’s ability to capture global information. Meanwhile, we introduce a new lightweight yet effective fusion scheme that generates modal-unified fusion features from the features of the two modalities, enhancing the information representation ability of each modality. Extensive experiments conducted on the Beijing and Porto datasets show the superior performance of our proposed method over all baselines. https://github.com/Chan6688/MoviNet
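The abstract does not give the exact form of the fusion scheme, but the idea of deriving a modal-unified feature from two modality-specific feature maps and feeding it back to each branch can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, not the paper's actual architecture: the two feature maps are concatenated along channels, projected back to the original channel count by a 1x1 convolution (a per-pixel linear map), and the resulting unified feature is added to each modality.

```python
import numpy as np

def unified_fusion(f_gps, f_img, w):
    # f_gps, f_img: (C, H, W) features from the trajectory and image branches
    # w: (C, 2C) weights of a 1x1 convolution that projects the concatenated
    #    channels back to C (a hypothetical stand-in for the fusion module)
    c, h, wd = f_gps.shape
    stacked = np.concatenate([f_gps, f_img], axis=0)   # (2C, H, W)
    flat = stacked.reshape(2 * c, h * wd)              # channels per pixel
    fused = (w @ flat).reshape(c, h, wd)               # (C, H, W) unified feature
    # each modality is enhanced with the shared cross-modal representation
    return f_gps + fused, f_img + fused

rng = np.random.default_rng(0)
c, h, wd = 4, 8, 8
out_gps, out_img = unified_fusion(rng.standard_normal((c, h, wd)),
                                  rng.standard_normal((c, h, wd)),
                                  rng.standard_normal((c, 2 * c)) / np.sqrt(2 * c))
print(out_gps.shape, out_img.shape)  # (4, 8, 8) (4, 8, 8)
```

In a real model the projection would be a learned convolution and the fusion could include normalization or attention; the sketch only shows the data flow of producing one unified feature from two modalities.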