
Training dataset for object detection - Penguins from UAV

Australian Ocean Data Network
Belyaev, Oleg
Publisher: Australian Antarctic Data Centre

Licence & Rights:

This data set conforms to the CC BY Attribution License.

Please follow the instructions listed in the citation reference when using these data.

Attribution 4.0 International (CC BY 4.0)

This metadata record is publicly available.

These data are publicly available for download from the provided URL.




Brief description

On February 8, 2021, the Chinstrap penguin colonies on Deception Island were photographed during the PiMetAn Project XXXIV Spanish Antarctic campaign using unmanned aerial vehicles (UAV) flying at a height of 30 m. A training dataset for penguin detection from an aerial perspective was generated from the resulting imagery.

The penguin species is the Chinstrap penguin (Pygoscelis antarcticus).

The dataset consists of three folders: "train", containing 531 images for model training; "valid", containing 50 images for model validation; and "test", containing 25 images for model testing. Each folder also contains a .csv file with the labels (x,y positions and class names for every penguin in the images), annotated in TensorFlow Object Detection format.
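
As a rough illustration of how the label files might be read, the sketch below counts annotated penguins per image. The record does not list the CSV columns, so the header used here (filename, width, height, class, xmin, ymin, xmax, ymax) is an assumption based on a layout commonly used for TensorFlow Object Detection CSV exports, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def count_boxes_per_image(csv_file, class_name="Penguin"):
    """Count annotated boxes of one class per image filename.

    Assumes (not confirmed by the record) a header row with at least
    the columns 'filename' and 'class'.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["class"] == class_name:
            counts[row["filename"]] += 1
    return counts


# In-memory stand-in for one label file; these rows are invented
# for illustration and are not taken from the dataset.
sample = io.StringIO(
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "a.jpg,224,224,Penguin,10,12,30,40\n"
    "a.jpg,224,224,Penguin,50,60,70,90\n"
    "b.jpg,224,224,Penguin,5,5,25,30\n"
)
counts = count_boxes_per_image(sample)
total = sum(counts.values())
print(dict(counts), total)
```

With a real label file, the same function could be called on an opened file object, for example `open(path, newline="")`, where `path` points to the .csv file inside one of the three folders.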

There is only one annotation class: Penguin.

All 606 images are 224x224 px at 96 dpi.

The following augmentation was applied to create 3 versions of each source image:
* Random shear between -18° and +18° horizontally and between -11° and +11° vertically
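
The shear ranges above can be sketched in code as follows. This is a minimal illustration, not the tool's actual implementation: it assumes the shear is expressed as an angle, applied about the image centre, and that each bounding box is re-fitted around its sheared corners. The function names are hypothetical.

```python
import math
import random


def random_shear_angles(rng, max_h=18.0, max_v=11.0):
    """Draw horizontal and vertical shear angles (degrees) from the stated ranges."""
    return rng.uniform(-max_h, max_h), rng.uniform(-max_v, max_v)


def shear_point(x, y, h_deg, v_deg, size=224):
    """Shear one point about the image centre (assumed convention)."""
    c = size / 2.0
    dx, dy = x - c, y - c
    sx = dx + math.tan(math.radians(h_deg)) * dy  # horizontal shear
    sy = dy + math.tan(math.radians(v_deg)) * dx  # vertical shear
    return sx + c, sy + c


def shear_box(box, h_deg, v_deg, size=224):
    """Shear the four corners of (xmin, ymin, xmax, ymax) and re-fit an
    axis-aligned box, clipped to the image bounds."""
    xmin, ymin, xmax, ymax = box
    corners = [(xmin, ymin), (xmax, ymin), (xmin, ymax), (xmax, ymax)]
    pts = [shear_point(x, y, h_deg, v_deg, size) for x, y in corners]
    xs = [min(max(p[0], 0.0), float(size)) for p in pts]
    ys = [min(max(p[1], 0.0), float(size)) for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
h_deg, v_deg = random_shear_angles(rng)
new_box = shear_box((10, 12, 30, 40), h_deg, v_deg)
print(h_deg, v_deg, new_box)
```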

This dataset was annotated and exported via

A Faster R-CNN model with a ResNet-101 backbone was used to perform the object detection tasks. Training and evaluation were carried out on Google's TensorFlow 2.0 machine learning platform.


Progress Code: completed
Statement: The overall quality of the dataset is good. However, when detecting penguins from a UAV, it is sometimes difficult to distinguish grey-feathered chicks from rocks of similar colour. As a result, the model can misclassify rocky coasts as penguin-abundant zones.


This dataset was created to detect Chinstrap penguins at the Vapour Col colony, Deception Island, enabling efficient quantification of the population of this species within the colony. It was subsequently used in the article "The contribution of penguin guano to the Southern Ocean iron pool".

Data time period: 2021-02-08 to 2021-02-08

-60.60815,-62.99615 -60.60815,-63.02409 -60.66992,-63.02409 -60.66992,-62.99615 -60.60815,-62.99615


text: westlimit=-60.66992; southlimit=-63.02409; eastlimit=-60.60815; northlimit=-62.99615
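
The coverage is given twice above, once as a closed ring of longitude,latitude pairs and once as named limits. The sketch below parses both forms and checks that they describe the same rectangle. The function names are hypothetical, and the longitude,latitude ordering of the pairs is inferred from the limit values.

```python
def parse_limits(text):
    """Parse 'westlimit=...; southlimit=...; ...' into a dict of floats."""
    pairs = dict(item.strip().split("=") for item in text.split(";"))
    return {key: float(value) for key, value in pairs.items()}


def parse_ring(text):
    """Parse space-separated 'lon,lat' pairs into a list of float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def limits_to_ring(b):
    """Build a closed ring from the limits, starting at the north-east
    corner, in the same corner order as the ring in the record."""
    w, s, e, n = b["westlimit"], b["southlimit"], b["eastlimit"], b["northlimit"]
    return [(e, n), (e, s), (w, s), (w, n), (e, n)]


limits = parse_limits(
    "westlimit=-60.66992; southlimit=-63.02409; "
    "eastlimit=-60.60815; northlimit=-62.99615"
)
ring = parse_ring(
    "-60.60815,-62.99615 -60.60815,-63.02409 -60.66992,-63.02409 "
    "-60.66992,-62.99615 -60.60815,-62.99615"
)
same_extent = limits_to_ring(limits) == ring
print(same_extent)
```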

Other Information
Download the dataset. (GET DATA > DIRECT DOWNLOAD)

uri :