Datasets

Engine Assembly Dataset

The Engine Assembly Dataset contains 8 classes of objects and targets. It was generated by capturing 195 real images and annotating them with the correct labels (bounding boxes, keypoints and polygons). Further augmentation of the images expands the dataset to around 280,000 images. All tools and scripts needed to generate and replicate the dataset are provided, as well as the trained model and visualization scripts. For more information, see: https://zenodo.org/record/7669593#.ZBwLW3ZBzIU
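
To illustrate how a few hundred annotated photographs can grow into hundreds of thousands of samples, the following is a minimal augmentation sketch. It uses the albumentations library purely as an illustrative stand-in for the dataset's own generation scripts; the file name, box coordinates and class label below are placeholders.

```python
import albumentations as A
import cv2

# Hypothetical input: one of the 195 annotated images with a PASCAL VOC-style box.
image = cv2.imread("engine_assembly_0001.jpg")   # placeholder file name
bboxes = [[120, 80, 340, 260]]                   # [x_min, y_min, x_max, y_max], placeholder values
labels = ["tool"]                                # placeholder class label

# A small bounding-box-preserving augmentation pipeline.
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.Rotate(limit=15, p=0.5),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

# Each call yields one new augmented sample; repeating this many times per image
# is how a small set of photographs can be expanded to hundreds of thousands of samples.
augmented = transform(image=image, bboxes=bboxes, labels=labels)
aug_image, aug_bboxes = augmented["image"], augmented["bboxes"]
```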

AUTH-AGI OpenDR Humans in Fields Dataset

The OpenDR Humans in Fields dataset is a 2D object detection dataset specifically designed for person detection in agricultural fields. AGI deployed a Robotti robot to collect images with a front and a back camera in a realistic scenario, mimicking the images the robot might encounter in the agricultural use case. The cameras are equipped with wide-angle lenses, contributing to the domain-shift problem when applying pretrained person detectors to the task. The collected images are saved in JPG format at a resolution of 2048×1536. The dataset is split into train and test sets, and each of these is split into two subsets: a) images depicting humans, and b) images with no humans. The dataset was annotated with bounding boxes by AUTH using the labelImg tool, and the annotations are provided in PASCAL VOC .xml format. For more information, see: https://zenodo.org/record/7534233#.ZBwKCXZBzIU
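
Since the annotations follow the standard PASCAL VOC .xml layout produced by labelImg, they can be read with the Python standard library alone. The following is a minimal sketch; the file path is a placeholder, and only the generic VOC structure is assumed.

```python
import xml.etree.ElementTree as ET

def load_voc_boxes(xml_path):
    """Return a list of (class_name, x_min, y_min, x_max, y_max) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(float(bb.find("xmin").text)),
                      int(float(bb.find("ymin").text)),
                      int(float(bb.find("xmax").text)),
                      int(float(bb.find("ymax").text))))
    return boxes

# Placeholder path: one annotation from the "images depicting humans" subset.
print(load_voc_boxes("humans_in_fields/train/humans/frame_000001.xml"))
```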

AUTH-OpenDR ALFW Dataset

Multi-view Facial Image Dataset Based on LFW: Using software based on the code that accompanies this paper, a set of synthetically generated multi-view facial images was created within the OpenDR H2020 research project by Aristotle University of Thessaloniki, based on the LFW image dataset, which consists of 13,233 in-the-wild facial images of 5,749 person identities collected from the Web. The resulting set, named AUTH-OpenDR Augmented LFW (AUTH-OpenDR ALFW), covers the same 5,749 person identities. From each of the 13,233 images, 13 synthetic images were generated by yaw-axis camera rotation in the interval [0°, +60°] with step +5°. Moreover, 10 synthetic images were generated by pitch-axis camera rotation in the interval [0°, +45°] with step +5° for each facial image of the aforementioned dataset. The dataset structure is as follows: two folders exist for every person identity, named after the person; one contains the aligned and the other the original facial images (this is also indicated in the folder name). The file names of the images in these folders follow the pattern {name of the person} {current ID in case of multiplicity} {yaw/pitch direction} {angle in rad}.jpg. The ID distinguishes between the various images of the same person, if more than one is available. Examples: "Alicia Witt 0001 pitch 0.26.jpg" or "Alicia Witt 0002 yaw 0.52.jpg". The ALFW dataset is covered by a Creative Commons Attribution-NonCommercial 4.0 International license and can be downloaded from this FTP site.
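
The naming convention above can be decoded programmatically. Below is a minimal sketch assuming the whitespace-separated fields shown in the examples; adjust the splitting if the actual separators differ.

```python
import math
from pathlib import Path

def parse_alfw_name(filename):
    """Return (person, sample_id, direction, angle_deg) for an ALFW image name."""
    stem = Path(filename).stem                    # strip the .jpg extension
    *name_parts, sample_id, direction, angle = stem.split()
    return " ".join(name_parts), sample_id, direction, math.degrees(float(angle))

print(parse_alfw_name("Alicia Witt 0002 yaw 0.52.jpg"))
# -> ('Alicia Witt', '0002', 'yaw', ~29.8 degrees, i.e. the 30-degree yaw view)
```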

Table 1: Summary of the ALFW Multi-view Facial Image Dataset

Samples from the ALFW dataset

AUTH-OpenDR ACelebA Dataset

Multi-view Facial Image Dataset Based on CelebA: For performance evaluation or training of face recognition methods, a dataset of facial images from several viewing angles was created by Aristotle University of Thessaloniki based on the CelebA image dataset, using software developed in the OpenDR H2020 research project based on this paper and the respective code provided by the authors. CelebA is a large-scale facial dataset consisting of 202,599 facial images of 10,177 celebrities captured in the wild. The new dataset is named AUTH-OpenDR Augmented CelebA (AUTH-OpenDR ACelebA). The set was generated from 140,000 facial images corresponding to 9,161 persons, i.e. a subset of CelebA was used. For each CelebA image used, 13 synthetic images were generated by yaw-axis camera rotation in the interval [0°, +60°] with step +5°. Moreover, 10 synthetic images were generated by pitch-axis camera rotation in the interval [0°, +45°] with step +5° for each facial image. Since the CelebA license does not allow distribution of derivative work, we do not make ACelebA directly available but instead provide instructions and scripts for recreating it. To reproduce the Augmented CelebA (ACelebA) dataset, follow these steps:

  • Download the publicly available code of the GitHub repo Rotate-and-Render to a local folder
  • Download the CelebA facial image dataset to a local folder
  • Download the python script Do_main_Person_Identity.py and identity_CelebA.csv from this FTP site
  • Create the folder 3ddfa/Pre-processing
  • Add the script and the csv file to this folder
  • Set the appropriate input paths in the script (rootdir: the folder of the CelebA dataset)
  • Execute the command python3 Do_main_Person_Identity_ACelebA.py
  • Execute bash experiments/v100_test.sh with appropriate parameters (e.g. gpu_ids), after modifying its last two lines as follows:

       --yaw_poses 0 5 10 15 20 25 30 35 40 45 50 55 60 \

       --pitch_poses 5 10 15 20 25 30 35 40 45 \

Table 2: Summary of the ACelebA Multi-view Facial Image Dataset

AUTH-OpenDR SMPL+D Human Bodies Dataset

SMPL is a parametric statistical body shape model. SMPL+D is an extension of SMPL which can encode shape deformations from clothes and hair as vertex displacements. The dataset, generated by Aristotle University of Thessaloniki (AUTH), contains 2,928 human models in various shapes and textures. At its core, the dataset consists of 183 unique SMPL+D bodies, which were generated through non-rigid shape registration of manually created MakeHuman models. The rest were generated by applying shape and texture alterations to those models. In addition, code is provided for converting those human models to the FBX format. This format is supported by a wide range of simulators, including Webots. However, pose-dependent deformations are not applied to the human models. To convert the models to the FBX format, the original SMPL body model must be downloaded from the authors' website. Finally, instructions for setting up a demo project in the Webots simulator are provided, in which one of the SMPL+D bodies in FBX format can perform an animation from AMASS.

SMPL+D bodies generated through non-rigid shape registration of manually created MakeHuman models

The dataset is available through the official GitHub repository of the OpenDR toolkit here. Detailed instructions on how to download the dataset are provided in the repository.

The dataset is licensed under the Creative Commons Attribution 4.0 International License. It should be noted that the SMPL-Model is needed for generating the SMPL+D bodies in the FBX format. The SMPL-Model is distributed only by its authors and is copyrighted under a separate license. Thus, we do not provide the SMPL-Model in our dataset in any form; instead, our instructions encourage the users to download it from the webpage of the authors [here]. However, the SMPL-Body, which is directly related to our dataset, is licensed by its authors under the Creative Commons Attribution 4.0 International License [here].

AUTH-OpenDR Mixed Image Annotated Dataset for Human-centric Perception Tasks

The dataset was generated through a mixed (real and synthetic) image data generation method which utilizes real background images and DL-generated human models. The method was developed by Aristotle University of Thessaloniki (AUTH) within the H2020 OpenDR project. The dataset is suitable for training/evaluating (a) pose estimation, (b) person detection, and (c) identity recognition methods. The 3D human models required by the method were generated using the Pixel-aligned Implicit Function (PIFu) and full-body images of people from the Clothing Co-Parsing (CCP) dataset. As background images, a subset of the Cityscapes dataset was used. The Cityscapes license prohibits the distribution of modified versions of the dataset. Thus, code is provided, through the official GitHub repository of the OpenDR toolkit, that can re-generate the exact same dataset, provided that the Cityscapes dataset has been downloaded from the website of its authors. However, the set of 3D human models is directly available through OpenDR's GitHub repository.

Image from the mixed image dataset. Ground truth keypoints for human pose estimation and bounding boxes for person detection are drawn.

The following annotations are provided for the mixed image dataset:

  • 2D bounding boxes of humans

                – A .csv file is provided for each image, specifying the 2D bounding box of each depicted human.

  • Human identity labels

                – The same per-image .csv file specifies the identity label of each depicted human.

  • 2D keypoints of human poses

                – The same per-image .csv file also specifies the image coordinates of the keypoints (COCO format) of each depicted human.

                – The standard COCO JSON annotation format is also provided (see the sketch below).
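
Since the COCO JSON annotations follow the standard schema, they can be loaded with the pycocotools API. A minimal sketch follows; the annotation file path is a placeholder.

```python
from pycocotools.coco import COCO

coco = COCO("mixed_dataset/annotations.json")          # placeholder path
first_image_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=first_image_id)):
    x, y, w, h = ann["bbox"]                           # 2D bounding box of one person
    keypoints = ann["keypoints"]                       # flat [x1, y1, v1, x2, y2, v2, ...]
    print(ann["category_id"], (x, y, w, h), len(keypoints) // 3, "keypoints")
```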

The following annotations are provided for the 3D human models used to create the image dataset:

  • 3D bounding boxes and 3D keypoints

                 – Each human model is accompanied by a pickle file (.pkl) that specifies the locations of the 8 vertices of its 3D bounding box.

                 – Each human model is accompanied by a pickle file (.pkl) that specifies the 3D locations of its keypoints (COCO format), as sketched below.
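
A minimal sketch for inspecting these pickle files is shown below; the file names are hypothetical, and only the content described above (8 box vertices and COCO-ordered keypoints) is assumed.

```python
import pickle

# Hypothetical file names; the repository instructions give the actual layout.
with open("human_model_0001/bbox_3d.pkl", "rb") as f:
    bbox_vertices = pickle.load(f)        # expected: the 8 vertices of the 3D bounding box

with open("human_model_0001/keypoints_3d.pkl", "rb") as f:
    keypoints_3d = pickle.load(f)         # expected: 3D keypoints in COCO order (17 joints)

print(len(bbox_vertices), "box vertices,", len(keypoints_3d), "keypoints")
```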

3D human models in various views generated by PIFu. Full-body images from the CCP dataset were used as input.

Code and instructions for re-generating the dataset are provided here. Instructions for downloading the 3D human models are also provided. The code, the annotations and the 3D human models are licensed under the Apache 2.0 License. The final dataset is subject to the original license of the authors of the Cityscapes dataset [here].

AU-Multimodal Agricultural Aerial and Ground Robotics Simulation Dataset

This dataset was generated using an aerial robot and a ground robot in the Webots simulator with the OpenDR agricultural dataset generator tool. It consists of 13,980 RGB images and their semantic segmentation counterparts, taken at different lighting conditions and robot positions in an agricultural field. It also includes annotation data comprising the class of the object, the x and y coordinates of the top-left pixel of the object bounding box, and the width and height of the bounding box. Furthermore, it includes GPS and inertial unit sensor data for the UAV, and GPS, inertial and LiDAR sensor data for the UGV. The dataset is available for download here: AU-Multimodal Agricultural Aerial and Ground Robotics Simulation Dataset
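
As a hedged illustration of the annotation fields listed above (class, top-left x and y, width, height), the sketch below overlays the boxes on an RGB frame. The file paths and the assumption that each annotation is one CSV row are placeholders, not the released layout.

```python
import csv
import cv2

image = cv2.imread("au_agri_sim/rgb/frame_000001.png")          # placeholder path
with open("au_agri_sim/annotations/frame_000001.csv") as f:     # placeholder path
    for cls, x, y, w, h in csv.reader(f):                       # assumed column order
        x, y, w, h = (int(float(v)) for v in (x, y, w, h))
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(image, cls, (x, max(y - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite("frame_000001_annotated.png", image)
```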

ActiveFace Dataset

This dataset was generated with a realistic synthetic facial image generation pipeline, built on a modified version of Unity's Perception package installed in a URP project and designed to support active face recognition. The pipeline enables generating images under a wide range of view angles and distances, as well as under different illumination conditions and backgrounds.

The dataset is available here.

ActiveHuman Dataset

ActiveHuman was generated using Unity's Perception package. It consists of 175,428 RGB images and their semantic segmentation counterparts, taken in different environments and lighting conditions and at different camera distances and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1 m-4 m) and 36 camera angles (0°-360° at 10-degree intervals). Alongside each image, 2D bounding box, 3D bounding box and keypoint ground-truth annotations are also generated via the use of Labelers and are stored as a JSON-based dataset. These Labelers are scripts responsible for capturing ground-truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the Perception package.

ActiveHuman is available here.

KITTI Panoptic Segmentation Dataset

The KITTI panoptic segmentation dataset for urban scene understanding provides panoptic annotations for a subset of images from the KITTI Vision Benchmark Suite. The annotations for the images that we provide do not intersect with the official KITTI semantic/instance segmentation test set; therefore, in addition to panoptic segmentation, they can also be used as supplementary training data for benchmarking semantic or instance segmentation tasks individually. The dataset consists of a total of 1055 images, of which 855 are used for the training set and 200 for the validation set. The images have a resolution of 1280×384 pixels. We provide annotations for 11 ‘stuff’ classes and 8 ‘thing’ classes, adhering to the Cityscapes ‘stuff’ and ‘thing’ class distribution. The dataset is available here.

NuScenes LiDAR Panoptic Segmentation Dataset

The NuScenes LiDAR panoptic segmentation dataset for urban 3D scene understanding provides panoptic annotations for LiDAR point clouds from the NuScenes dataset. Our dataset consists of a total of 850 scans, out of which 700 are used for the training set and 150 are used for the validation set. We provide annotations for 6 ‘stuff’ classes and 10 ‘thing’ classes. The dataset is available here.