Keywords
Computer science
Sketch
Artificial intelligence
Generator (circuit theory)
Image (mathematics)
Feature (linguistics)
Computer vision
Feature extraction
Segmentation
Image editing
Leapfrog surveillance
Image segmentation
Pattern recognition (psychology)
Power (physics)
Algorithm
Linguistics
Physics
Philosophy
Quantum mechanics
Authors
Tianyu Zhang, Haoran Xie
Identifier
DOI: 10.1109/cgip62525.2024.00035
Abstract
Recent text-to-image generation models can produce high-quality images from textual prompts. However, it is difficult for such models to correctly interpret instructions that specify complex scenes with multiple objects using text alone. To address this issue, we propose sketch-guided spatial control for text-to-image diffusion models. In the feature extraction stage of the proposed framework, sketch inputs are segmented into individual objects using an image segmentation approach. The resulting bounding boxes and labels are then fed as spatial guidance into the attention layers of the diffusion model. In the image generation stage, the proposed model uses a pretrained text-to-image diffusion model as the image generator. We assess the proposed method through both quantitative and qualitative evaluations, demonstrating its versatility in spatial control based on user sketches.
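The abstract describes injecting bounding boxes and labels from the segmented sketch into the attention layers of a diffusion model as spatial guidance. Below is a minimal, hypothetical sketch of one way such box-conditioned cross-attention biasing could work; it assumes normalized (x0, y0, x1, y1) boxes, a square latent feature map, and an additive logit bias, all of which are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of box-conditioned cross-attention biasing (assumed
# scheme, not the paper's actual method): image queries attending to an
# object's text token are boosted inside that object's bounding box and
# suppressed outside it.
import torch

def box_to_mask(box, h, w):
    """Rasterize a normalized (x0, y0, x1, y1) box into a flat h*w mask."""
    mask = torch.zeros(h, w)
    x0, y0, x1, y1 = box
    mask[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = 1.0
    return mask.flatten()  # shape: (h * w,)

def biased_cross_attention(q, k, v, token_boxes, h, w, bias=5.0):
    """Scaled dot-product cross-attention with an additive logit bias.

    q: (n_pixels, d) image queries, where n_pixels == h * w
    k, v: (n_tokens, d) text keys / values
    token_boxes: {token_index: (x0, y0, x1, y1)} with coords in [0, 1]
    bias: strength of the spatial guidance (hypothetical hyperparameter)
    """
    scale = q.shape[-1] ** -0.5
    logits = (q @ k.T) * scale  # (n_pixels, n_tokens)
    for t, box in token_boxes.items():
        mask = box_to_mask(box, h, w)            # 1 inside the box, 0 outside
        logits[:, t] += bias * (mask - 0.5) * 2  # +bias inside, -bias outside
    attn = logits.softmax(dim=-1)
    return attn @ v  # (n_pixels, d)

# Toy usage: a 16x16 latent map, 8 text tokens, token 3 bound to a box.
h = w = 16
q = torch.randn(h * w, 64)
k = torch.randn(8, 64)
v = torch.randn(8, 64)
out = biased_cross_attention(q, k, v, {3: (0.1, 0.1, 0.5, 0.5)}, h, w)
print(out.shape)  # torch.Size([256, 64])
```

Because the bias is applied only to the attention logits of a pretrained model, this style of guidance requires no retraining, which is consistent with the abstract's use of a frozen pretrained diffusion model as the generator.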