Object Detection for Unmanned Aerial Vehicle Camera via Convolutional Neural Networks

Object tracking and image segmentation have recently gained particular significance in satellite and aerial imagery. The latest achievements in this field are closely tied to deep-learning algorithms, particularly convolutional neural networks (CNNs). Given a sufficient amount of training data, CNNs outperform classical methods based on Viola-Jones or support vector machines. However, applying CNNs to object detection in aerial images faces several general issues that cause classification errors. The first is related to the limited camera shooting angle and spatial resolution. The second arises from the restricted datasets available for specific classes of objects that rarely appear in the captured data. This article presents a comparative study of the effectiveness of different deep neural networks for detecting objects with similar patterns in images, given a limited number of pretrained datasets. The study finds that the YOLO ver. 3 network achieves better accuracy and faster analysis than the region-based convolutional neural network (R-CNN), Fast R-CNN, Faster R-CNN, and SSD architectures. This is demonstrated on the “Stanford dataset,” “DOTA v-1.5,” and “xView 2018 Detection” datasets. The following accuracy metrics were obtained for the YOLO ver. 3 network: 89.12 mAP (Stanford dataset), 80.20 mAP (DOTA v-1.5), and 78.29 mAP (xView 2018) on testing; and 85.51 mAP (Stanford dataset), 79.28 mAP (DOTA v-1.5), and 79.92 mAP (xView 2018) on validation, with an analysis speed of 26.82 frames/s.
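For readers unfamiliar with the mAP (mean average precision) metric quoted above, the following is a minimal Python sketch of one common way to compute it. It is not the evaluation code used in the article. The IoU threshold of 0.5, the all-point interpolation of the precision-recall curve, the greedy matching rule, and the function names `iou`, `average_precision`, and `mean_average_precision` are all assumptions made for illustration; the abstract does not state which evaluation protocol was applied.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class on one image (assumed simplification).

    detections: list of (confidence, box); ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth):
            if matched[idx]:
                continue  # each ground-truth box may be matched only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx >= 0 and best_iou >= iou_threshold
        if is_tp:
            matched[best_idx] = True
        tp_flags.append(is_tp)

    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # All-point interpolation: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to (detections, ground_truth)."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

This sketch returns values in the range 0 to 1, whereas the figures in the abstract appear to be on a 0 to 100 scale. A full evaluation would also pool detections across all images of a dataset before ranking them, which is omitted here for brevity.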

For more about this article, see the link below.

https://ieeexplore.ieee.org/document/9273062