Quantifying fish feeding behavior from images is crucial for achieving smart feeding in industrial aquaculture, because fish images provide a wealth of spatial information about behavior that can be used to determine feeding intensity. However, most studies use only a single spatial feature to quantify fish feeding behavior. A computational approach for extracting multiple spatial feature indicators is lacking because of the image challenges caused by occlusion, overlapping, and clustering during the feeding stage. In this paper, a novel BlendMask-VoVNetV2 method is developed to segment two classes of fish and distinguish individual instances so that multiple spatial features can be extracted. A series of indicators is proposed for analyzing spatial feature variations in time-series video, such as the number of fish, the number of pixels, and the distance between individual fish. Additionally, we present the first fish dataset with fish occlusion and aggregation for feeding image segmentation in industrial aquaculture. It contains 1038 images comprising 67,519 individual instances with pixel-level annotations for two semantic categories: fish1 (non-occlusion and non-aggregation) and fish2 (occlusion or aggregation). Extensive experiments demonstrate that BlendMask-VoVNetV2 achieves competitive segmentation performance, with an accuracy of 83.7% on the feeding dataset, outperforming other instance segmentation algorithms such as SOLOv2, SOTR, CondInst, and Mask R-CNN. A distinctive advantage of the proposed approach is that it mitigates the inaccurate segmentation caused by severely occluded and overlapping fish. Finally, the BlendMask-VoVNetV2 method is verified on four videos covering non-feeding, strong feeding, medium feeding, and weak feeding.
The results show that the proposed method is effective and can accurately and objectively depict each moment of the entire feeding process using multiple spatial feature indicators.
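As an illustration of how per-frame spatial indicators such as the number of fish, the number of pixels, and the distance between individual fish might be derived from instance masks, the following is a minimal sketch and not the implementation used in this work. It assumes each segmented fish is available as a binary mask (a 2D grid of 0/1 values), and it assumes that the inter-fish distance is measured between mask centroids and averaged over all pairs; the abstract does not specify either choice. The function names `instance_stats` and `frame_indicators` are hypothetical.

```python
import math
from itertools import combinations


def instance_stats(mask):
    """Return (pixel_count, centroid) for one binary instance mask.

    `mask` is any 2D iterable of truthy/falsy values (assumed format).
    The centroid is (mean_row, mean_col), or None for an empty mask.
    """
    count = 0
    row_sum = 0
    col_sum = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return 0, None
    return count, (row_sum / count, col_sum / count)


def frame_indicators(masks):
    """Compute illustrative spatial indicators for one video frame.

    Assumption: distance between fish is the Euclidean distance between
    mask centroids, averaged over all pairs of instances.
    """
    stats = [instance_stats(m) for m in masks]
    # Ignore empty masks so they do not count as fish.
    stats = [(n, c) for n, c in stats if n > 0]
    centroids = [c for _, c in stats]
    distances = [math.dist(a, b) for a, b in combinations(centroids, 2)]
    return {
        "num_fish": len(stats),
        "total_pixels": sum(n for n, _ in stats),
        "mean_distance": sum(distances) / len(distances) if distances else 0.0,
    }


def make_mask(pixels, height=6, width=6):
    """Build a toy binary mask with the given (row, col) pixels set to 1."""
    grid = [[0] * width for _ in range(height)]
    for r, c in pixels:
        grid[r][c] = 1
    return grid


if __name__ == "__main__":
    # Two toy "fish" in a 6x6 frame.
    frame = [make_mask([(0, 0), (0, 1)]), make_mask([(3, 4), (3, 5)])]
    print(frame_indicators(frame))
```

Applying `frame_indicators` to the masks of every frame in a video yields one time series per indicator, which is the kind of signal that could then be compared across non-feeding, weak, medium, and strong feeding periods.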