Keywords
Computer science
Artificial intelligence
Overfitting
Landmark
Generative adversarial network
Pattern recognition
Transformer
Machine learning
Contextual image classification
Deep learning
Computer vision
Artificial neural network
Authors
Mahdi Darvish, Mahsa Pouramini, Hamid Bahador
Identifier
DOI: 10.1109/mvip53647.2022.9738759
Abstract
Fine-grained classification remains a challenging task because distinguishing between categories requires learning complex, localized differences. Diversity in the pose, scale, and position of objects in an image makes the problem even more difficult. Although recent Vision Transformer models achieve high performance, they require an extensive volume of input data. To address this problem, we made extensive use of GAN-based data augmentation to generate additional dataset instances. Oxford-IIIT Pet was our dataset of choice for this experiment. It consists of 37 breeds of cats and dogs with variations in scale, pose, and lighting, which intensifies the difficulty of the classification task. Furthermore, we enhanced the performance of a recent Generative Adversarial Network (GAN), the StyleGAN2-ADA model, to generate more realistic images while preventing overfitting to the training set. We did this by training a customized version of MobileNetV2 to predict animal facial landmarks and then cropping images accordingly. Lastly, we combined the synthetic images with the original dataset and compared our proposed method against standard GAN augmentation and no augmentation on different subsets of the training data. We validated our work by evaluating the accuracy of fine-grained image classification with the recent Vision Transformer (ViT) model. Code is available at: https://github.com/mahdi-darvish/GAN-augmented-pet-classifler
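The abstract describes cropping each image around facial landmarks predicted by a customized MobileNetV2 before feeding it to StyleGAN2-ADA. A minimal sketch of that cropping step is shown below; the function name, the margin parameter, and the landmark format (an array of (x, y) points) are assumptions for illustration, not the authors' exact implementation — in the paper the landmarks come from the trained MobileNetV2 model.

```python
import numpy as np

def crop_around_landmarks(image: np.ndarray,
                          landmarks: np.ndarray,
                          margin: float = 0.25) -> np.ndarray:
    """Crop an image to the bounding box of predicted facial landmarks,
    expanded by a relative margin on each side.

    image     -- H x W x C array
    landmarks -- N x 2 array of (x, y) points (hypothetical format;
                 in the paper these are predicted by MobileNetV2)
    """
    h, w = image.shape[:2]
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    # Expand the landmark bounding box by `margin` of its size per side,
    # clamped to the image borders.
    dx = (x_max - x_min) * margin
    dy = (y_max - y_min) * margin
    x0 = max(int(x_min - dx), 0)
    y0 = max(int(y_min - dy), 0)
    x1 = min(int(np.ceil(x_max + dx)), w)
    y1 = min(int(np.ceil(y_max + dy)), h)
    return image[y0:y1, x0:x1]
```

Cropping to the animal's face focuses the generator on the discriminative region, which is one way the authors keep the synthetic images realistic for a fine-grained task.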