Instance Switching-based Contrastive Learning for Fine-grained Airplane Detection

Lanxin Zeng, Haowen Guo, Wen Yang, Huai Yu, Lei Yu, Peng Zhang, Tongyuan Zou
EIS SPL, Wuhan University, Wuhan, China
503 Institute, China Academy of Space Technology, Beijing, China

[Paper] [Data and code]

Abstract


Detecting airplanes in high-resolution remote sensing images has a variety of applications. The clear details and rich spatial and texture information of objects in such images make it possible to distinguish different types of airplanes from the background. However, airplanes usually exhibit slight interclass discrepancy and unbalanced class distribution, which pose significant challenges to fine-grained airplane detection. In this article, we propose ISCL, an instance switching-based contrastive learning method for fine-grained airplane detection. Specifically, we introduce a contrastive learning-based module (CLM) to widen the interclass distance while narrowing the intraclass distance by optimizing the feature space distribution with the InfoNCE+ loss, which is built on a serial head in a cascaded way. Then, we design a refined instance switching (ReIS) module to alleviate the class imbalance problem. To take full advantage of the CLM and ReIS, we further introduce an optimization strategy that combines the two modules to widen the distances between airplane categories that are easily confused. In addition, we contribute a fine-grained attribute-assisted dataset, dubbed the GF-RarePlanes dataset (GRD), to help detectors better learn the subtle differences between airplanes. Extensive experiments on two datasets (i.e., GF and FAIR1M) demonstrate that our proposed method can significantly improve the accuracy of fine-grained airplane detection under both horizontal bounding box (HBB) and oriented bounding box (OBB) scenarios. The dataset and code will be available at https://lanxin1011.github.io/ISCL/.

Introduction


Object detection has attracted attention from a wide range of communities in recent years. As one of the branches of object detection, airplane detection is of great importance to airport flight monitoring with high-resolution remote sensing images. The rich spatial information and clear texture details of these images allow object detectors to distinguish airplanes from backgrounds in traditional detection tasks. However, as shown in Fig. 1(a), CNN-based detectors still find it hard to accurately classify airplanes in fine-grained airplane detection due to the following issues:

Unbalanced Class Distribution. Airplanes of different classes occur at different frequencies in the real world, and thus airplane datasets (e.g., GF and FAIR1M) usually exhibit an unbalanced inter-class distribution, as shown in Fig. 1(b). A study found that ConvNets significantly overfit minority classes that have insufficient training instances, resulting in poor detection accuracy. Hence, designing an appropriate augmentation method that improves the instance diversity of minority classes and constructs a balanced dataset is non-trivial.

Indistinct Inter-class Discrepancy. Similar to the challenge in fine-grained classification, airplanes of different classes exhibit only slight inter-class discrepancy, as shown in Fig. 1(c). Meanwhile, due to the different coatings applied to the same type of airplane, the geometric distortion of remote sensing images, and other factors, airplanes of the same class also show large intra-class differences. Thus, CNN-based detectors cannot learn features distinctive enough to classify airplanes correctly, which leads to performance degradation on fine-grained detection tasks.




Figure 1. (a) Detection results of Faster R-CNN. (b) Sample distribution of the training sets in the GF and FAIR1M datasets. (c) The 10 types of airplanes in the GF dataset.

The above two problems interrelate and influence each other. Slight inter-class discrepancy poses significant challenges to the feature learning process of detectors. The unbalanced class distribution makes it harder for the training process to provide sufficient data for feature learning of novel classes, thus severely degrading the detection performance. Hence, the two problems cannot be treated independently but should be seen as a whole.
To address the aforementioned problems, we propose a novel Instance Switching-based Contrastive Learning (ISCL) method, including a Serial Head (SH), a Contrastive Learning-based Module (CLM), refined Instance Switching (ReIS), and the fine-grained attribute-assisted GF-RarePlanes Dataset (GRD), to progressively and selectively switch instances that are easily misclassified and optimize the feature space distribution accordingly. Our contributions are fourfold:

  • We propose the CLM to mitigate the problem of slight inter-class discrepancy by optimizing the feature space distribution with the InfoNCE+ loss, which is built on a serial head in a cascaded form.
  • We introduce the ReIS module to alleviate the class imbalance problem by augmenting novel instances. In addition, a cross-shaped Gaussian kernel that fits the shape of airplanes is designed to alleviate background inconsistency during switching.
  • We design an optimization strategy based on the CLM and ReIS, dubbed ISCL, that combines the abilities of expanding inter-class distance and augmenting instances, and that achieves a considerable improvement in fine-grained airplane detection performance.
  • We contribute a fine-grained attribute-assisted dataset, GRD, to help the detector learn subtle differences among classes. With pretraining on GRD, the detection accuracy of classes that differ only slightly from others is greatly improved.
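
To make the idea behind the CLM more concrete, the following is a minimal sketch of a standard supervised InfoNCE-style loss, which pulls same-class features together and pushes different-class features apart. It is not the InfoNCE+ loss of the paper, whose exact form is not given on this page. The function names, the temperature value of 0.1, and the use of plain Python lists as stand-ins for RoI feature vectors are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_info_nce(features, labels, temperature=0.1):
    """Hypothetical helper: mean InfoNCE-style loss over all anchors that
    have at least one other sample of the same class in the batch."""
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n) if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # no same-class partner: anchor contributes nothing
        # Numerically stable log-sum-exp over all other samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


# Toy features: two tight clusters in 2-D.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_info_nce(feats, [0, 0, 1, 1])  # labels match clusters
loss_mixed = supervised_info_nce(feats, [0, 1, 0, 1])    # labels cut across clusters
```

In this toy setup the loss is small when the class labels agree with the feature clusters and large when they cut across them, which is the behavior a contrastive objective relies on to separate easily confused categories.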

Experimental Results


A Comparison of Different Methods on GF and FAIR1M datasets


Figure 2. Visualization of the HBB detection results on the GF dataset (first row) and the FAIR1M dataset (second row). The columns, from left to right, show the detection results of Faster R-CNN, Cascade R-CNN, DetectoRS, and Faster R-CNN with our ISCL method added. The green, yellow, and red boxes indicate true positive (TP), false positive (FP), and false negative (FN) predictions, respectively.
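
As a rough illustration of how TP, FP, and FN labels such as those in the figure can be assigned for horizontal boxes, the sketch below matches predictions to ground truth greedily by descending score. The IoU threshold of 0.5, the greedy matching rule, and the function names are assumptions; this page does not state the exact matching criteria used for the visualizations.

```python
def iou(a, b):
    """IoU of two horizontal boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_thr=0.5):
    """Hypothetical helper. preds: list of (box, label, score);
    gts: list of (box, label). Returns (tp, fp, fn) counts."""
    matched = set()
    tp = fp = 0
    # Higher-scoring predictions claim ground-truth boxes first.
    for box, label, _ in sorted(preds, key=lambda p: -p[2]):
        best_j, best_iou = None, iou_thr
        for j, (gt_box, gt_label) in enumerate(gts):
            # A prediction can only match an unclaimed box of the same class.
            if j in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1
        else:
            matched.add(best_j)
            tp += 1
    fn = len(gts) - len(matched)  # ground-truth boxes nobody claimed
    return tp, fp, fn


gts = [((0, 0, 10, 10), "A"), ((20, 20, 30, 30), "B")]
preds = [((0, 0, 10, 10), "A", 0.9), ((20, 20, 30, 30), "A", 0.8)]
counts = match_detections(preds, gts)
```

Here the second prediction is perfectly localized but carries the wrong class, so it is counted as a false positive and its ground-truth box as a false negative. This is the kind of error that fine-grained detection is most prone to.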


Figure 3. Visualization of the OBB detection results on the GF dataset, using baseline detectors (first row) and the baseline detectors with our proposed ISCL added (second row). The green, yellow, and red boxes indicate true positive (TP), false positive (FP), and false negative (FN) predictions, respectively. The superscript * indicates that only the ReIS module is added to S2A-Net, because ISCL does not transfer to one-stage detectors.


Table 1. Detection results on the GF dataset.

Acknowledgements


We would like to thank the anonymous reviewers for their valuable comments and contributions. The numerical calculations in this article were performed on the supercomputing system at the Supercomputing Center of Wuhan University.
