计算机科学
编码器
移动设备
软件部署
架空(工程)
计算
推论
延迟(音频)
计算机工程
市场细分
图像(数学)
分布式计算
嵌入式系统
人工智能
算法
操作系统
电信
营销
业务
作者
Ao Wang,Hui Chen,Zijia Lin,Jungong Han,Guiguang Ding
出处
期刊:Cornell University - arXiv
日期:2023-01-01
标识
DOI:10.48550/arxiv.2312.05760
摘要
Segment Anything Model (SAM) has shown impressive zero-shot transfer performance for various computer vision tasks recently. However, its heavy computation costs remain daunting for practical applications. MobileSAM proposes to replace the heavyweight image encoder in SAM with TinyViT by employing distillation, which results in a significant reduction in computational requirements. However, its deployment on resource-constrained mobile devices still encounters challenges due to the substantial memory and computational overhead caused by self-attention mechanisms. Recently, RepViT achieves the state-of-the-art performance and latency trade-off on mobile devices by incorporating efficient architectural designs of ViTs into CNNs. Here, to achieve real-time segmenting anything on mobile devices, following MobileSAM, we replace the heavyweight image encoder in SAM with RepViT model, ending up with the RepViT-SAM model. Extensive experiments show that RepViT-SAM can enjoy significantly better zero-shot transfer capability than MobileSAM, along with nearly $10\times$ faster inference speed. The code and models are available at \url{https://github.com/THU-MIG/RepViT}.
科研通智能强力驱动
Strongly Powered by AbleSci AI