Domain Shift Evaluation and Alleviation Methods in OpenDR powered by Deep Learning

13 January 2023

Deep Learning has brought significant improvements in both effectiveness and efficiency across various robotic vision tasks, including object detection and tracking. Object detection combines the tasks of classification and localization, aiming to determine what objects appear in an image as well as where in the image they are located, which makes it an extremely useful task in robotics applications. The most widely used object detection benchmarks cover popular object classes, such as humans, cars and animals, with PASCAL VOC and COCO spanning 20 and 80 categories, respectively.

Despite improvements of these detectors on well-known object detection benchmarks, deploying them in new applications exposes the domain shift problem. Domain adaptation refers to the process of alleviating the domain shift between a source and a target domain, i.e., of effectively deploying a detector trained on the source domain onto the target domain. To this end, it is important to first evaluate the domain shift effect when deploying a detector on a new dataset, and then to use some form of knowledge transfer to effectively train the detector on the new dataset.

Recently, AUTH, in collaboration with AGI, has introduced the OpenDR Humans in Fields dataset, collected in the context of agricultural use cases using the Robotti robotic platform and designed for the detection of humans in fields. Samples from the dataset can be seen in Figure 1, and the severity of the domain shift effect can be seen qualitatively in Figure 2, where a pretrained detector was deployed on the collected images. Specifically, after thorough evaluation of multiple pretrained detectors, an SSD model pretrained on the COCO dataset was chosen as the baseline, as it offered the best speed vs. accuracy trade-off.

(a) Positive sample depicting two humans, collected from the front camera of the Robotti.

(b) Positive samples depicting one human, collected from the back camera of the Robotti.

Figure 1: Samples from the collected dataset.

(a) Detections on image depicting two humans, collected from the front camera of the Robotti.

(b) Detections on image depicting one human, collected from the back camera of the Robotti.

Figure 2: Examples of detections on the collected dataset using an SSD model pretrained on the COCO dataset.
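For readers who want to reproduce this kind of qualitative check, the following is a minimal sketch of running a COCO-pretrained SSD detector on a single field image. It uses torchvision's SSD300-VGG16 weights rather than the exact OpenDR pipeline, and the image path and score threshold are assumptions for illustration only.

```python
# Illustrative sketch (not the exact OpenDR pipeline): run a COCO-pretrained
# SSD on a field image to inspect the domain shift qualitatively.
import torch
from PIL import Image
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights
from torchvision.transforms.functional import to_tensor

weights = SSD300_VGG16_Weights.COCO_V1
model = ssd300_vgg16(weights=weights).eval()

# Hypothetical path to one of the collected Robotti images.
image = to_tensor(Image.open("robotti_front_camera.jpg").convert("RGB"))

with torch.no_grad():
    predictions = model([image])[0]

# Keep only confident "person" detections (COCO class id 1); 0.5 is an
# assumed score threshold for visualization purposes.
keep = (predictions["labels"] == 1) & (predictions["scores"] > 0.5)
print(predictions["boxes"][keep], predictions["scores"][keep])
```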

To investigate and fully understand the domain shift phenomenon, AUTH proposed splitting the evaluation into two subsets: first, evaluation on positive samples only, i.e., those that depict humans, and then evaluation on the entire dataset, i.e., both positive and negative (no humans) samples. The drop in precision from the first subset to the second is caused exclusively by false positive detections, giving an insightful measure of the extent of the domain shift problem.
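To make the reasoning behind this split concrete, the sketch below shows how precision changes once negative images are included in the evaluation. The detection counts are made-up numbers used purely for illustration, not results from the actual evaluation.

```python
# Two-subset evaluation idea: precision on positive images only versus on the
# full set (positives + negatives). All counts below are illustrative.
def precision(true_positives, false_positives):
    return true_positives / (true_positives + false_positives)

# Hypothetical counts on the positive subset (images that contain humans).
tp_pos, fp_pos = 180, 20
# Additional false positives produced on negative images (no humans present).
fp_neg = 60

precision_positive_only = precision(tp_pos, fp_pos)           # 0.90
precision_full_dataset = precision(tp_pos, fp_pos + fp_neg)   # ~0.69

# The drop between the two numbers is attributable solely to false positives
# on negative frames, i.e., the symptom of the domain shift.
print(precision_positive_only - precision_full_dataset)
```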

The goal is then to develop a training curriculum such that the drop in precision between the two sets is minimized. Keeping in mind that bounding box annotation is costly and that the developed methods should generalize well to other applications, two types of strategies were evaluated on three subsets of the target dataset: 1) finetuning with the target dataset only, and 2) finetuning using both the source and target datasets. To properly evaluate the effect of using the source domain during finetuning, the recall measure is also reported. The best strategy improved precision and recall at 0.5 IoU by 23.8% and 14.7%, respectively, whereas the least effective method, i.e., finetuning with the source dataset and the target negative subset only, improved precision by 9.3% while sacrificing 1.9% in recall. Figure 3 shows detections made with the best proposed model on the same samples as Figure 2, where the false positive detections are completely eliminated.
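A minimal sketch of how the two finetuning regimes can be set up with PyTorch data utilities is shown below. The datasets are empty placeholders standing in for the field images (target domain) and a COCO person subset (source domain); this is not the actual OpenDR training code.

```python
# Sketch of the two finetuning regimes: target-only vs. mixed source + target.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholder datasets; in practice these would yield (image, targets) pairs
# for the collected field images and for a COCO "person" subset.
target_field_data = TensorDataset(torch.zeros(100, 3, 300, 300))
coco_person_data = TensorDataset(torch.zeros(100, 3, 300, 300))

# Strategy 1: finetune on the target dataset only.
loader_target_only = DataLoader(target_field_data, batch_size=8, shuffle=True)

# Strategy 2: finetune on a mixture of source and target data, which in the
# reported results helped retain recall while suppressing false positives.
loader_mixed = DataLoader(ConcatDataset([coco_person_data, target_field_data]),
                          batch_size=8, shuffle=True)
```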

The dataset is publicly available for download.

More information is available here: https://github.com/opendr-eu/datasets/tree/main/examples/agi_humans

(a) Detections on image depicting two humans, collected from the front camera of the Robotti.

(b) Detections on image depicting one human, collected from the back camera of the Robotti.

Figure 3: Examples of detections on the collected dataset using an SSD model finetuned on the COCO dataset and the entirety of the collected dataset.

Authored by: Paraskevi Nousi and the AUTH team,

Aristotle University of Thessaloniki, Greece