Abstract
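With the rapid development of computer vision and artificial intelligence, object detection has attracted increasingly wide attention. Small objects occupy few pixels, carry little semantic information, are easily disturbed by complex scenes, and tend to cluster and occlude one another, so small object detection has long been a major difficulty in the field of object detection. Vision-based small object detection is now increasingly important in many areas of daily life. To further promote the development of small object detection, improve its accuracy and speed, and optimize its algorithm models, this paper reviews the state of research and the results achieved at home and abroad with respect to the problems in small object detection. First, the difficulties of small object detection are analyzed from the perspectives of the visual features of small objects, object distribution, and the detection environment. Small object detection methods are then systematically summarized in terms of data augmentation, super-resolution, multiscale feature fusion, contextual semantic information, anchor mechanisms, attention mechanisms, and specific detection scenarios, and the relatively mature one-stage small object detection methods are organized by framework structure, loss function, and prediction and matching mechanisms. Second, the evaluation metrics for small object detection and the datasets available for it are introduced in detail, and the detection results and visualized detections of several classic small object detection methods on datasets such as MSCOCO (Microsoft common objects in context), VisDrone2021 (vision meets drones 2021), and Tsinghua-Tencent100K are compared and analyzed. Finally, the challenges facing small object detection in the future, including the difficulty of localizing small objects, the effect of network downsampling on small objects, and the unsuitability of intersection-over-union threshold settings for small objects, are analyzed together with the corresponding research directions.

In recent years, object detection has attracted increasing attention because of the rapid development of computer vision and artificial intelligence technology. Early traditional object detection methods, such as histogram of oriented gradients (HOG) and the deformable parts model (DPM), usually adopt three steps: region selection, manual feature extraction, and classification and regression. However, manual feature extraction has great limitations for small object detection. Object detection algorithms based on convolutional neural networks can be divided into two-stage and one-stage detection algorithms. Two-stage detection algorithms, such as faster region-based convolutional neural network (Faster RCNN) and cascade region-based convolutional neural network (Cascade RCNN), select candidate regions through a region proposal network. Then, they classify and regress these regions to obtain the detection results. However, their accuracy on small objects remains low. One-stage detection algorithms, such as single shot MultiBox detector (SSD) and you only look once (YOLO), directly locate the object and output its category, thereby improving detection speed to a certain extent. However, small object detection has always been a huge challenge in the field of object detection because of the small proportion of small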
object pixels, the limited semantic information, and the susceptibility of small objects to interference from complex scenes. In particular, the challenges in small object detection are as follows. First, small objects have few features. Because small objects are small in scale and cover a small area of the image, extracting useful semantic feature information during network training is difficult. Second, small object detection is susceptible to interference. Most small objects have low resolution, appear blurred, and carry little visual information, so their features are hard to extract and easily disturbed. As a result, the detection model cannot easily locate and identify small objects accurately, and many false detections and missed detections occur. Third, small object datasets are scarce. At present, most mainstream object detection datasets, such as PASCAL VOC and MS-COCO, are aimed at normal-scale objects; the proportion of small-scale objects in them is insufficient, and their distribution is uneven. Meanwhile, the datasets mentioned in this study that can be used for small object detection are all aimed at specific scenes or tasks. These datasets, which include the DOTA remote sensing object detection dataset and the face detection dataset and benchmark, are not universal for small object detection. Fourth, small objects tend to cluster and occlude one another. Serious occlusion occurs when small objects cluster, and after many downsampling and pooling operations, a large amount of feature information is lost, which makes detection difficult. At present, visual small object detection is increasingly important in all fields of life. Aiming at the problems in small object detection, this study reviews the research status and achievements of small object detection at home and abroad to further promote its development, improve its speed and accuracy, and optimize its algorithm models. The methods of small object detection are
analyzed and summarized from the aspects of data augmentation, super-resolution, multiscale feature fusion, contextual semantic information, anchor box mechanism, attention mechanism, and specific detection scenarios. Data augmentation addresses three problems: the scarcity of general-purpose small object datasets, the small number of small objects in public datasets, and the uneven distribution of small objects in images. The earliest data augmentation strategies increase the number of training instances and improve detection performance by deforming, rotating, scaling, cropping, and translating object instances. Other effective data augmentation methods then emerged, including oversampling images that contain small objects, scaling and rotating the small objects, and copying them to new positions in the image. Data augmentation helps improve the robustness of a model to a certain extent, mitigates the problems of indistinct visual features and limited object information, and achieves good results in the final detection performance. However, an improperly designed data augmentation strategy may introduce new noise in practical applications and impair feature extraction, which also poses challenges for algorithm design. Because small-scale objects carry little feature information, small object detection methods based on multiscale fusion need to make full use of the detailed information in the image. In existing convolutional neural network (CNN) models for general object detection, multiscale detection can help the model obtain accurate localization information and discriminative feature information by using low-level feature layers, which is conducive to detecting and recognizing small-scale objects. First, the feature pyramid network (FPN), which has strong semantic features at all scales, was introduced. Then, an FPN-based path
aggregation network (PANet) was proposed, which not only achieved good results in instance segmentation but also improved the detection of small objects. In feature fusion, the residual feature augmentation method extracts ratio-invariant context information to reduce the information loss of the highest-level pyramid feature map. At present, many methods are based on multiscale feature fusion, which combines the high-resolution low-level features with the semantically strong high-level features of the network to improve the detection accuracy of small objects. In small object detection, the feature representation of the target is weak. Thus, the network structure must be deepened to learn more feature information. Introducing an attention mechanism can often make the network model pay more attention to the channels and regions related to the task. In object detection networks, the shallow feature maps lack the contextual semantic information of small objects. Incorporating attention mechanisms into the SSD model suppresses irrelevant information in feature fusion and thereby improves the detection accuracy of small objects. In general, the attention mechanism can allocate resources reasonably, quickly find the region of interest, and ignore distracting information. However, an improperly designed attention mechanism increases the computational cost of the network and affects the model's extraction of object features. Finally, future research directions for small object detection are discussed. Visual small object detection is becoming increasingly important in all fields of life, and it will continue to develop in further directions in the future.
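
To illustrate why a fixed intersection-over-union (IoU) threshold can be unreasonable for small objects, the following minimal Python sketch compares the IoU of a square ground-truth box with a prediction shifted by the same number of pixels, for a small and a large object. The function names `iou` and `shifted_iou`, the box sizes (16 and 128 pixels), the 4-pixel shift, and the 0.5 threshold are illustrative assumptions and are not taken from the surveyed methods.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(side, shift):
    """IoU between a side x side ground-truth box and the same box
    translated diagonally by `shift` pixels (a hypothetical prediction)."""
    gt = (0, 0, side, side)
    pred = (shift, shift, side + shift, side + shift)
    return iou(gt, pred)


THRESHOLD = 0.5  # assumed matching threshold, chosen for illustration
SHIFT = 4        # assumed localization error in pixels

for side in (16, 128):
    value = shifted_iou(side, SHIFT)
    verdict = "match" if value >= THRESHOLD else "miss"
    print(f"side={side:>3}px shift={SHIFT}px IoU={value:.2f} -> {verdict}")
```

Under these assumptions, the 4-pixel shift gives an IoU of about 0.39 for the 16-pixel box but about 0.88 for the 128-pixel box, so the same localization error is rejected for the small object and accepted for the large one at a 0.5 threshold.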