Using Deep Learning and UAVs to Identify Citrus Trees

January 16, 2018

Trimble eCognition Team

It is not always easy to find academic papers that grab an audience’s attention right from the start, but this new article from Csillik et al. did just that. Their article, entitled “Identification of Citrus Trees from Unmanned Aerial Vehicle Imagery Using Convolutional Neural Networks”, was just published on November 20, 2018 in MDPI’s open access journal Drones.

The authors start their abstract with “remote sensing is important to precision agriculture and the spatial resolution provided by Unmanned Aerial Vehicles (UAVs) is revolutionizing precision agriculture workflows…” This, combined with the ultimate remote sensing buzzword in the article title (convolutional neural networks), made us hungry to learn more. For a developer of remote sensing software, it could not get much better: UAVs, CNNs and precision agriculture!

The authors chose a study area located at the Lindcove Research and Extension Center (LREC) in Tulare County, CA, U.S.A. The senseFly eBee UAV was equipped with a Parrot Sequoia multi-spectral camera to acquire 4-band imagery (green, red, red edge and near-infrared) at a resolution of 12.8 cm.

Figure 1: Study area

The UAV data product was then used as input to the Trimble eCognition Developer software, which the authors describe as “one of the most popular software for object-based image analysis and the application of the Convolutional Neural Network (CNN) using this platform gave the opportunity of integrating the CNN approach with object-based post-processing of the results, thus performing the entire analysis in one software”. The authors outline three simple steps in their application of the CNN, which together took about 20 minutes:

    Generation of 4,000 training samples (5 minutes)
    Training the CNN model (13 minutes)
    Applying the CNN model to the validation area (2 minutes)
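Step 3 amounts to running a trained patch classifier over the whole validation area and recording a score for each location. The sketch below illustrates that idea in Python with NumPy. It assumes a hypothetical scoring function, `toy_tree_score`, as a stand-in for the trained CNN (it only rescales mean NDVI and is not a neural network), and an assumed band order of green, red, red edge, NIR. It is not how eCognition implements the step internally.

```python
import numpy as np

PATCH = 40  # patch size in pixels, matching the 40x40 training patches


def toy_tree_score(patch: np.ndarray) -> float:
    """Hypothetical stand-in for a trained CNN, returning a score in [0, 1].

    Assumes band order (green, red, red edge, NIR). It rescales the patch's
    mean NDVI from [-1, 1] to [0, 1]; a real model would instead return the
    predicted probability of the 'tree' class.
    """
    red, nir = patch[..., 1], patch[..., 3]
    ndvi = (nir - red) / (nir + red + 1e-9)
    return float((ndvi.mean() + 1.0) / 2.0)


def sliding_heatmap(image: np.ndarray, score_fn, patch: int = PATCH, stride: int = 4) -> np.ndarray:
    """Score a patch centred on every `stride`-th pixel and paint the score into a heatmap."""
    h, w, _ = image.shape
    half = patch // 2
    # Edge padding keeps patches near the image border at full size
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="edge")
    heat = np.zeros((h, w), dtype=float)
    for r in range(0, h, stride):
        for c in range(0, w, stride):
            window = padded[r:r + patch, c:c + patch]  # centred on pixel (r, c)
            heat[r:r + stride, c:c + stride] = score_fn(window)
    return heat


# Usage on a random 4-band image with one synthetic "canopy" (high NIR, low red)
rng = np.random.default_rng(0)
img = rng.uniform(0.05, 0.5, size=(80, 80, 4))
img[20:40, 20:40, 1] = 0.05
img[20:40, 20:40, 3] = 0.9
heat = sliding_heatmap(img, toy_tree_score)
```

The `stride` trades speed for resolution: each score is painted into a `stride`×`stride` block rather than computed per pixel.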

The authors split the study area into training and validation areas. A CNN was then trained with three classes (4,000 samples per class) representing trees, bare soil and weeds. The samples were derived from a previously established data set generated manually from NAIP data (see the paper for details). Randomly generated samples were used for areas without trees.
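Turning sample points into training data means cutting a fixed-size, 4-band patch around each point. The following NumPy sketch assumes 40×40 patches (the patch size mentioned later in the post), integer pixel coordinates, and a hypothetical `min_dist` rule for drawing random non-tree points away from known trees. The authors' exact sampling rules are described in the paper, not here.

```python
import numpy as np

PATCH = 40
CLASSES = {"tree": 0, "bare_soil": 1, "weeds": 2}


def extract_patches(image: np.ndarray, points, patch: int = PATCH) -> np.ndarray:
    """Cut a (patch, patch, bands) window centred on each (row, col) point."""
    half = patch // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="edge")
    return np.stack([padded[r:r + patch, c:c + patch] for r, c in points])


def random_background_points(shape, tree_points, n, min_dist, rng):
    """Draw n random pixels at least `min_dist` pixels from every known tree.

    `min_dist` is an illustrative assumption; if it is too large for the
    image, this loop will never finish.
    """
    h, w = shape
    trees = np.asarray(tree_points, dtype=float).reshape(-1, 2)
    points = []
    while len(points) < n:
        r, c = int(rng.integers(0, h)), int(rng.integers(0, w))
        if len(trees) == 0 or np.hypot(trees[:, 0] - r, trees[:, 1] - c).min() >= min_dist:
            points.append((r, c))
    return points


# Usage: 3 known trees plus 5 random non-tree points on a synthetic image
rng = np.random.default_rng(42)
img = rng.random((200, 200, 4))
trees = [(50, 50), (120, 80), (160, 170)]
background = random_background_points(img.shape[:2], trees, n=5, min_dist=30, rng=rng)
X = extract_patches(img, trees + background)
# Illustrative labelling only: all background points are treated as bare soil here
y = np.array([CLASSES["tree"]] * len(trees) + [CLASSES["bare_soil"]] * len(background))
```

In a real workflow the non-tree points would be split between the bare soil and weed classes rather than all labelled as bare soil.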

All 4 bands of the UAV data were used to train the CNN model. For additional details on the CNN parameters used in the model, please refer to the paper.

Figure 2: Example of training sets used, for: (a–e) trees, (f–j) bare ground and (k–o) weeds.

The CNN model produced a heatmap layer with values between 0 and 1; the closer a value is to 1, the higher the likelihood that a tree is present. A Gaussian filter was then applied to smooth the results, and individual trees were detected using a local maxima approach. The initial classification contained some confusion between trees and weeds, particularly at the edges of parcels. To account for this, a simple NDVI threshold was applied to remove these false positives.
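The post-processing chain described above (smooth the heatmap, keep local maxima, drop peaks whose NDVI is too low) can be sketched with NumPy alone. The Gaussian `sigma`, the neighbourhood `radius`, the `min_prob` cut-off and the `ndvi_min` threshold are illustrative assumptions, not the values used by the authors. If SciPy is available, `scipy.ndimage.gaussian_filter` and `scipy.ndimage.maximum_filter` are the usual shortcuts for the first two steps.

```python
import numpy as np


def gaussian_smooth(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with edge padding (rows first, then columns)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # 'valid' convolution trims exactly the padding back off each axis
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def local_maxima(img: np.ndarray, radius: int, min_value: float) -> np.ndarray:
    """(row, col) of pixels that equal their neighbourhood maximum and exceed min_value."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="constant", constant_values=-np.inf)
    neigh_max = np.full(img.shape, -np.inf)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            shifted = padded[radius + dr:radius + dr + h, radius + dc:radius + dc + w]
            neigh_max = np.maximum(neigh_max, shifted)
    return np.argwhere((img >= neigh_max) & (img > min_value))


def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    return (nir - red) / (nir + red + 1e-9)


def detect_trees(heatmap, red, nir, sigma=2.0, radius=5, min_prob=0.5, ndvi_min=0.4):
    """Smooth the CNN heatmap, find peaks, and keep only peaks with enough NDVI."""
    smooth = gaussian_smooth(heatmap, sigma)
    peaks = local_maxima(smooth, radius, min_prob)
    veg = ndvi(red, nir)
    return [(int(r), int(c)) for r, c in peaks if veg[r, c] >= ndvi_min]


# Usage: three synthetic crown responses, one of them on low-NDVI ground
yy, xx = np.mgrid[0:80, 0:80]


def blob(r, c, s=4.0):
    return np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * s ** 2))


heat = np.maximum.reduce([blob(20, 20), blob(20, 60), blob(60, 40)])
red = np.full((80, 80), 0.1)
nir = np.full((80, 80), 0.6)
red[50:70, 30:50] = 0.5  # NDVI ~0.09 around the third response
print(detect_trees(heat, red, nir))  # [(20, 20), (20, 60)]
```

The `radius` of the local-maxima search sets the minimum spacing between detected trees, so in practice it would be tied to the expected crown size.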

Another problem the authors faced was distinguishing between small and large trees. To tackle this issue, they turned to a more traditional object-based approach; the advantage of doing the analysis within eCognition is that it allows users to combine CNN-based feature extraction with OBIA tools in a single automated environment. Because of the way the CNN works, trees with larger canopies were detected several times. In the authors’ words, “to aggregate the targets representing the same tree, we segmented the heatmap produced by CNN (the probability of tree detection) and the NDVI layer into superpixels”. The authors chose Simple Linear Iterative Clustering (SLIC) for the superpixel segmentation, as SLIC involves only one parameter, k, which governs the number of equally sized superpixels to be created. An iterative segmentation approach was taken, using superpixel sizes larger than the 40×40 pixel sample patch used to train the CNN. Only objects with a circular shape and low asymmetry values were selected, and the centroid of each selected object was used as the tree location.
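The aggregation idea (one tree per round superpixel, however many CNN peaks fall inside it) can be sketched as follows. The code takes an already segmented label image, for example one produced by a SLIC implementation such as scikit-image’s `skimage.segmentation.slic`, whose `n_segments` argument plays the role of k. The `asymmetry` measure (1 minus the ratio of the minor to the major axis of an ellipse fitted to the pixel coordinates), the `roundness` measure and both thresholds are illustrative assumptions, loosely modelled on eCognition’s shape features rather than copied from them.

```python
import numpy as np


def shape_stats(coords: np.ndarray):
    """Centroid, asymmetry and roundness of one object from its (row, col) pixel coordinates.

    asymmetry = 1 - minor/major axis of the fitted ellipse (0 for a circle);
    roundness = area / area of the circle through the farthest pixel (about 1 for a disk).
    Both are illustrative definitions.
    """
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    if len(coords) < 3:
        return centroid, 1.0, 0.0  # too small to judge: rejected by the thresholds
    eig = np.linalg.eigvalsh(np.cov(coords, rowvar=False))  # ascending order
    asymmetry = 1.0 - np.sqrt(max(eig[0], 0.0) / eig[1]) if eig[1] > 0 else 1.0
    max_r = np.hypot(*(coords - centroid).T).max()
    roundness = len(coords) / (np.pi * max_r ** 2) if max_r > 0 else 0.0
    return centroid, float(asymmetry), float(roundness)


def trees_from_segments(labels, peaks, max_asymmetry=0.3, min_roundness=0.6):
    """Return one (row, col) centroid per round, compact segment containing a CNN peak."""
    hit = sorted({int(labels[r, c]) for r, c in peaks})
    trees = []
    for lab in hit:
        centroid, asym, rnd = shape_stats(np.argwhere(labels == lab))
        if asym <= max_asymmetry and rnd >= min_roundness:
            trees.append((round(float(centroid[0]), 1), round(float(centroid[1]), 1)))
    return trees


# Usage: a large disk hit by two peaks, a thin strip, and a small disk
yy, xx = np.mgrid[0:60, 0:60]
labels = np.zeros((60, 60), dtype=int)
labels[(yy - 15) ** 2 + (xx - 15) ** 2 <= 64] = 1  # large crown, radius 8
labels[40:43, 5:55] = 2                             # elongated, non-tree object
labels[(yy - 30) ** 2 + (xx - 45) ** 2 <= 36] = 3  # smaller crown, radius 6
peaks = [(13, 15), (17, 16), (41, 30), (30, 44)]
print(trees_from_segments(labels, peaks))  # [(15.0, 15.0), (30.0, 45.0)]
```

The two peaks inside the large crown collapse to a single centroid, and the elongated strip is rejected by the asymmetry threshold even though a peak falls inside it.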

To validate their results, the authors used the manually created data set based on NAIP imagery. In total, the automated CNN-OBIA approach detected 3015 trees. Of these, the authors determined that 2852 were correctly detected trees and 163 were falsely added trees; a further 60 trees were missed. Thus, combining object-based post-processing with the CNN “significantly” improved accuracy. As the authors reported, “without refinement, the CNN approach achieved an F-score of 78%, Precision was 65% and Recall 98%. After reducing the effect of multiple crown detection, the overall accuracy (F-score) for the final classification was 96.24%, with a Precision of 94.59% and Recall of 97.94%”.
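The reported scores follow directly from the detection counts, using the standard definitions Precision = TP/(TP+FP), Recall = TP/(TP+FN) and the F-score as their harmonic mean. A quick Python check with the counts quoted above:

```python
def detection_scores(tp: int, fp: int, fn: int):
    """Precision, recall and F-score (harmonic mean) for object detection counts."""
    precision = tp / (tp + fp)  # share of detections that are real trees
    recall = tp / (tp + fn)     # share of real trees that were detected
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# 2852 correct detections, 163 false detections, 60 missed trees
p, r, f = detection_scores(tp=2852, fp=163, fn=60)
print(f"Precision {p:.2%}, Recall {r:.2%}, F-score {f:.2%}")
# Precision 94.59%, Recall 97.94%, F-score 96.24%
```

The 3015 detections are the correct plus false detections (2852 + 163); the 60 missed trees only enter the recall.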

Figure 3: Final tree detection from the test area with white crosses indicating the location of trees: (a) the southern portion of LREC; (b) medium size trees correctly classified; (c) large canopy trees with reduced effect of multiple crown detection, after the object-based refinement; (d) similar sized trees were correctly classified; and (e) heterogeneous tree canopy sizes were correctly classified.

Mapping and monitoring individual trees in agricultural environments is important for improved crop management at a number of stages throughout the plant’s life cycle. To be effective, such monitoring has to be repeated regularly, and an automated, standardized approach will play a key role in the quality of the results and allow for timely decision making. As the use of UAVs in agriculture grows, such methods will become increasingly important.

I hope you enjoyed this article as much as I did. Finally, I would like to thank Csillik et al. for their contributions.