Topics
Multi-task learning
Computer science
Artificial intelligence
Transformer
Convolutional neural network
Machine learning
Benchmark (surveying)
Feature learning
Deep learning
Task (project management)
Quantum mechanics
Geodesy
Physics
Economics
Voltage
Management
Geography
Authors
Yingjie Tian, Kunlong Bai
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-09
Volume/Issue: 35(7): 9579-9590
Citations: 3
Identifier
DOI: 10.1109/TNNLS.2023.3234166
Abstract
Multitask learning (MTL) is a challenging problem, particularly in the realm of computer vision (CV). Setting up vanilla deep MTL requires either hard or soft parameter-sharing schemes that employ greedy search to find optimal network designs. Despite its widespread application, the performance of MTL models is vulnerable to under-constrained parameters. In this article, we draw on the recent success of the vision transformer (ViT) to propose a multitask representation learning method called multitask ViT (MTViT), which uses a multiple-branch transformer to sequentially process the image patches (i.e., tokens in the transformer) associated with the various tasks. Through the proposed cross-task attention (CA) module, a task token from each task branch is used as a query to exchange information with the other task branches. In contrast to prior models, our proposed method extracts intrinsic features using the built-in self-attention mechanism of the ViT and requires only linear, rather than quadratic, memory and computational complexity. Comprehensive experiments on two benchmark datasets, NYU-Depth V2 (NYUDv2) and CityScapes, show that our proposed MTViT outperforms or is on par with existing convolutional neural network (CNN)-based MTL methods. In addition, we apply our method to a synthetic dataset in which task relatedness is controlled. Surprisingly, the experimental results reveal that MTViT performs especially well when the tasks are less related.
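To make the cross-task attention (CA) idea concrete, below is a minimal PyTorch sketch of a single CA step, based only on the abstract's description: the task token of one branch serves as the query attending over another branch's patch tokens, so the attention cost stays linear in the number of tokens. The class name `CrossTaskAttention`, the dimensions, and the residual update are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class CrossTaskAttention(nn.Module):
    """Illustrative cross-task attention: a single task token queries the
    patch tokens of another task branch (a sketch, not the paper's code)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, task_token, other_tokens):
        # task_token: (B, 1, D) query from the current task branch.
        # other_tokens: (B, N, D) patch tokens from another task branch,
        # used as keys and values. A single query over N tokens keeps the
        # attention cost linear in N, matching the linear-complexity claim.
        fused, _ = self.attn(task_token, other_tokens, other_tokens)
        return task_token + fused  # residual update of the task token

# Toy usage with two hypothetical task branches (e.g., depth and segmentation):
B, N, D = 2, 196, 256
depth_token = torch.randn(B, 1, D)
seg_tokens = torch.randn(B, N, D)
ca = CrossTaskAttention(D)
print(ca(depth_token, seg_tokens).shape)  # torch.Size([2, 1, 256])
```

In this reading, each branch would run such a step against every other branch so that task tokens exchange information while patch tokens are processed independently per task.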