Data

Australasian Seabird Occurences (including migratory species) - Aggregated data product (1939 - ongoing) (NESP MaC 5.9, IMOS)

Australian Ocean Data Network
Integrated Marine Observing System (IMOS); CSIRO Marine National Facility (MNF)

Licence & Rights:

Other
Unknown

Creative Commons Attribution-Noncommercial 4.0 International License
http://creativecommons.org/licenses/by-nc/4.0/

The citation in a list of references is: "IMOS [year-of-data-download], Australasian Seabird Occurences (including migratory species) - Aggregated data product (1939 - ongoing) (NESP MaC 5.9, IMOS), [data-access-URL], accessed [date-of-access]."

If CSIRO Marine National Facility (MNF) data is utilised (identified in 'Source' file - "Data Organisation Owner"), the citation in a list of references is: "IMOS, CSIRO Marine National Facility [year-of-data-download], Australasian Seabird Occurences (including migratory species) - Aggregated data product (1939 - ongoing) (NESP MaC 5.9, IMOS), [data-access-URL], accessed [date-of-access]."

Any users of IMOS data are required to clearly acknowledge the source of the material derived from IMOS in the format: "Data was sourced from Australia's Integrated Marine Observing System (IMOS) - IMOS is enabled by the National Collaborative Research Infrastructure strategy (NCRIS)."

Any users of CSIRO Marine National Facility (MNF) data (identified in 'Source' file - "Data Organisation Owner") are required to clearly acknowledge the source of the material in the format "We acknowledge the use of the CSIRO Marine National Facility (https://ror.org/01mae9353) in undertaking this research."
Acknowledgements and citations should appear in all forms of publication, including presentations (where the acknowledgement should be on the closing slide), journals, books, reports and related research outputs.

Note: the source data contributing to this aggregated data product carry a mix of CC-BY (25%) and CC-BY-NC (75%) licences. We have applied a CC-BY-NC licence to this product, but please consult the source table for individual dataset details (the 'datasetName' field identifies the dataset each record comes from).

Data, products and services from IMOS are provided "as is" without any warranty as to fitness for a particular purpose.

Access:

Other

Full description

This aggregated product collates seabird sightings from multiple CSIRO‑hosted Ocean Biodiversity Information System (OBIS) datasets, spanning historical to contemporary programs, and unifies platform‑based visual surveys, voyage‑based watch observations (WOV), archival ship logs and animal‑borne tracking into one analysis‑ready resource. Ship‑based visual surveys (including standardised watch observations from research vessels) record species identity, counts and behaviour along vessel tracks, with observation effort and environmental context captured as measurement‑or‑fact records where available (e.g., voyages of the research vessels Investigator, Falkor, Franklin, Southern Surveyor and Aurora Australis). Historical atlas and ship‑log compilations extend temporal coverage back to the 1970s and earlier through curated digitisation of paper records and atlases.

Complementary animal‑tracking datasets add individual‑level movement data from light‑level geolocators (GLS) and satellite/GPS tags, capturing migration routes, staging areas and foraging ranges (e.g., of short‑tailed shearwaters and Arctic terns). Tracking programs delivered under IMOS Animal Tracking and other initiatives contribute deployment metadata, breeding phases (e.g., pre‑laying exodus, incubation) and repeated seasons, enabling cross‑study synthesis of space use across the Southern Ocean and beyond.

Across sources, records are standardised to Darwin Core, with taxonomy aligned to the World Register of Marine Species (WoRMS, https://www.marinespecies.org/about.php#what_is_worms), harmonising vessel‑based sightings, historical ship logs and telemetry fixes for robust ecological, biogeographic and conservation analyses. In addition, the product has been normalised and an H3 spatial index (a hierarchical hexagonal grid system) applied, and Australian Marine Region tags have been added that allow the data to be filtered by known marine regions (for example, Commonwealth marine regions).
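A minimal sketch of how region tags and coordinates might be used to subset records, assuming the Parquet file has already been read into a list of dicts. The tag column name `marine_region` and the region values are hypothetical placeholders (the real column names should be checked against the technical description); `decimalLatitude` and `decimalLongitude` are the standard Darwin Core coordinate terms. Because the product spans the full -180 to 180 longitude range, the bounding-box test also handles boxes that cross the antimeridian.

```python
def in_bbox(lat, lon, west, south, east, north):
    """Return True if (lat, lon) lies inside the box; west > east means it crosses 180°."""
    if not south <= lat <= north:
        return False
    if west <= east:
        return west <= lon <= east
    # Box wraps across the antimeridian, e.g. west=170, east=-170.
    return lon >= west or lon <= east


def filter_records(records, region=None, bbox=None, tag_column="marine_region"):
    """Keep records tagged with `region` and/or lying inside `bbox`.

    `tag_column` is a hypothetical column name; each value may be a single tag
    string, a list of tags, or None. `bbox` is (west, south, east, north) in degrees.
    """
    kept = []
    for rec in records:
        if region is not None:
            tags = rec.get(tag_column) or []
            if isinstance(tags, str):
                tags = [tags]
            if region not in tags:
                continue
        if bbox is not None:
            lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
            # Records without coordinates cannot be placed, so they are skipped.
            if lat is None or lon is None or not in_bbox(lat, lon, *bbox):
                continue
        kept.append(rec)
    return kept
```

For example, `filter_records(rows, bbox=(170, -50, -170, -30))` keeps points on both sides of the 180° meridian, which a naive `west <= lon <= east` test would reject.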

Lineage

Maintenance and Update Frequency: monthly
Statement: Data Ingestion Workflow
The Seabird Aggregated Data Product is generated by a reproducible Extract–Transform–Load (ETL) workflow that unifies the many heterogeneous seabird datasets published through the OBIS Australia Node into a single harmonised dataset suitable for scientific analysis.

Data Extraction
Source datasets are obtained from the OBIS Australia Node as Darwin Core Archives (DwC‑A). Each archive includes metadata (EML), structural definitions (meta.xml) and one or more data tables, typically occurrence, event and extendedMeasurementOrFact (EMOF). Taxonomic and vocabulary references are retrieved from the World Register of Marine Species (WoRMS) and the GBIF Life Stage vocabulary to ensure globally consistent terminology.

Schema Validation and Standardisation
Darwin Core tables are validated against predefined schemas before further processing. Mandatory fields such as occurrenceID, scientificNameID, scientificName and eventDate are checked for completeness, and records missing essential spatial data (latitude and longitude) are removed. EMOF rows must reference a valid occurrence or event record; those that do not are excluded. Dataset metadata (e.g., title, abstract, contacts, temporal/spatial extent) is extracted directly from eml.xml.

Transformation and Harmonisation
The transformation step integrates the Darwin Core content with WoRMS taxonomy and GBIF life‑stage vocabularies. Key steps include:
• Taxonomic validation: Scientific names, scientificNameIDs and verbatim identifications are cross‑checked against WoRMS so that each observation is linked to an accepted marine taxon with the correct hierarchy and status.
• Timestamp standardisation: eventDate values are checked against ISO 8601 patterns and normalised for consistent temporal interpretation.
• Life‑stage validation: Reported life‑stage values are validated against the GBIF vocabulary to ensure controlled terminology.
• EMOF pivoting: The long‑format EMOF table (which may contain thousands of measurement types) is pivoted into wide format so that environmental and contextual measurements are merged with the corresponding occurrence record. Measurement types with substantial data coverage (e.g., sea state, sea‑surface temperature, depth) are retained; less common attributes fall below the inclusion threshold.
• Measurement harmonisation: Where distinct measurement types represent the same underlying variable (e.g., “Wind Speed (knt)” vs “Wind Force (Beaufort)”), they are merged only where units and interpretation allow consistent integration.
After merging, the occurrence, EMOF and dataset‑level metadata tables are joined into a single unified record for each seabird observation.

Aggregation and Quality Control
All validated and transformed datasets are stacked row-wise into a single Parquet dataset. Additional filtering ensures that:
• rows with missing spatial coordinates, missing organism quantity or invalid EMOF relationships are removed;
• empty or unused columns are dropped;
• each record is enriched with the WoRMS taxonomic hierarchy and dataset‑level metadata.
An ETL timestamp is added so that update cycles can be tracked and results reproduced.

Loading and Publication
The final harmonised dataset is written to cloud storage in Parquet format. The columnar format and compression substantially reduce file size relative to the source archives and make the data easier to use in analytical workflows and scheduled refreshes.

Update Schedule
Because the underlying seabird datasets may be updated at OBIS Australia, the ETL pipeline is run once per month to capture any changes to the source data.

Advantages of the Workflow
• Reproducible: Provides a transparent, standardised method for joining complex Darwin Core content into a single table.
• Harmonised: Ensures consistent species names, life‑stage terms, measurements and temporal formats across disparate datasets.
• Efficient: The original multi‑file Darwin Core archives are reduced to a compact Parquet file of about 5 MB, optimising storage and compute.
• Cloud‑native: The workflow supports automated updates and fast access for researchers.
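The validation, pivoting and aggregation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the AODN pipeline: it assumes each Darwin Core table has already been read into a list of dicts keyed by Darwin Core term names, that EMOF rows carry an `occurrenceID`, and that a coverage fraction (`min_coverage`, a hypothetical parameter; the real threshold is not stated) decides which measurement types become columns. Unlike the real workflow, which normalises eventDate values, the sketch simply drops rows whose eventDate does not match a simplified ISO 8601 pattern (so date intervals are also dropped), and it omits the WoRMS, GBIF and measurement-harmonisation steps. All function names and the `etl_timestamp` column name are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Darwin Core terms checked before a row is kept: the mandatory fields named in
# the lineage plus coordinates and organism quantity (exact set is an assumption).
REQUIRED = (
    "occurrenceID", "scientificNameID", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "organismQuantity",
)

# Simplified ISO 8601 pattern: a date, optionally followed by a time and UTC offset.
ISO_8601 = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?$"
)


def is_blank(value):
    """Treat None and empty strings as missing (but not 0, a valid coordinate)."""
    return value is None or value == ""


def validate_occurrences(rows):
    """Keep rows with every required field and an ISO 8601-shaped eventDate."""
    return [
        r for r in rows
        if not any(is_blank(r.get(f)) for f in REQUIRED)
        and ISO_8601.match(str(r["eventDate"]))
    ]


def pivot_emof(emof_rows, valid_ids, min_coverage):
    """Pivot long EMOF rows into {occurrenceID: {measurementType: measurementValue}}.

    Rows pointing at an unknown occurrence are excluded, and measurement types
    present on fewer than `min_coverage` (a fraction) of valid occurrences are dropped.
    """
    wide = {}
    for m in emof_rows:
        occ = m.get("occurrenceID")
        if occ not in valid_ids:
            continue  # invalid EMOF relationship
        wide.setdefault(occ, {})[m["measurementType"]] = m["measurementValue"]
    counts = Counter(t for fields in wide.values() for t in fields)
    total = max(len(valid_ids), 1)
    keep = {t for t, c in counts.items() if c / total >= min_coverage}
    return {o: {t: v for t, v in f.items() if t in keep} for o, f in wide.items()}


def build_records(occurrences, emof_rows, dataset_meta, min_coverage=0.5, etl_timestamp=None):
    """Validate one dataset, pivot its EMOF table and join everything per occurrence."""
    valid = validate_occurrences(occurrences)
    wide = pivot_emof(emof_rows, {r["occurrenceID"] for r in valid}, min_coverage)
    stamp = etl_timestamp or datetime.now(timezone.utc).isoformat()
    return [
        {**r, **wide.get(r["occurrenceID"], {}), **dataset_meta, "etl_timestamp": stamp}
        for r in valid
    ]


def aggregate(per_dataset_records):
    """Stack all datasets row-wise and drop columns that are empty everywhere."""
    rows = [rec for records in per_dataset_records for rec in records]
    columns = list(dict.fromkeys(k for rec in rows for k in rec))
    columns = [c for c in columns if any(not is_blank(rec.get(c)) for rec in rows)]
    return [{c: rec.get(c) for c in columns} for rec in rows]
```

Passing `etl_timestamp` explicitly keeps runs reproducible in tests; when it is omitted, the current UTC time is used. Dropping empty columns only after stacking mirrors the order described above, so a column that is empty in one dataset but populated in another is kept.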
Statement: Datasets ingested from OBIS Australia
Please see the seabirds spreadsheet (link in distribution) for the following information about each individual dataset:
- Dataset ID
- Dataset Title
- Dataset Contact (Name, Organisation, Role/Title, Email address)
- Licence
- OBIS data source link
- Data Organisation Owner
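Because the citation form and licence terms that apply depend on which source datasets a subset draws on, the per-dataset table can be joined to the records. A minimal sketch, assuming the spreadsheet is exported as CSV with columns named exactly 'Dataset Title', 'Licence' and 'Data Organisation Owner' (names taken from the list above; the real headers may differ), that each record's `datasetName` matches a 'Dataset Title', and that MNF-owned datasets can be recognised by the text 'Marine National Facility' in the owner column. The function names are hypothetical; the citation templates follow the Licence & Rights section.

```python
import csv
import io

# Citation template from the Licence & Rights section (title reproduced exactly as
# published, including its spelling); {prefix} switches between the IMOS-only
# and IMOS + CSIRO MNF forms.
CITATION = (
    "{prefix} {year}, Australasian Seabird Occurences (including migratory species) - "
    "Aggregated data product (1939 - ongoing) (NESP MaC 5.9, IMOS), {url}, accessed {accessed}."
)


def load_sources(csv_text):
    """Index source-metadata rows by the (assumed) 'Dataset Title' column."""
    return {row["Dataset Title"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def citation_for(records, sources, year, url, accessed):
    """Return (citation, licences, unmatched) for the datasets behind `records`.

    Dataset names with no match in `sources` are returned so they can be checked by hand.
    """
    names = sorted({r["datasetName"] for r in records})
    matched = [sources[n] for n in names if n in sources]
    unmatched = [n for n in names if n not in sources]
    uses_mnf = any("Marine National Facility" in row["Data Organisation Owner"] for row in matched)
    prefix = "IMOS, CSIRO Marine National Facility" if uses_mnf else "IMOS"
    licences = sorted({row["Licence"] for row in matched})
    citation = CITATION.format(prefix=prefix, year=year, url=url, accessed=accessed)
    return citation, licences, unmatched
```

When the MNF form is returned, the separate MNF acknowledgement sentence is also required.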

Notes

Credit
Source data - obtained from the OBIS Australia Node (managed by the CSIRO National Collections and Marine Infrastructure Information and Data Centre)
Credit
Source data - all contributors of seabird data to the OBIS Australia Node (data providers, who are authoritative scientists and science organisations approved by OBIS, retain ownership of the data they provide)
Credit
Source data - collated for AODN aggregation purposes by Dave Watts and Sachit Rajbhandari
Credit
Australia's Integrated Marine Observing System (IMOS) is enabled by the National Collaborative Research Infrastructure Strategy (NCRIS). It is operated by a consortium of institutions as an unincorporated joint venture, with the University of Tasmania as Lead Agent.
Credit
The data collection described in this record was funded by the Australian Government Department of Climate Change, Energy, the Environment and Water (DCCEEW) through the NESP Marine and Coastal Hub. In addition to NESP (DCCEEW) funding, this project was supported by an equivalent amount of in-kind support and co-investment from project partners and collaborators.
Credit
CSIRO National Collections and Marine Infrastructure Information and Data Centre

Created: 12 02 2026

Data time period: 1939-09-19 onwards (ongoing)

This dataset is part of a larger collection


Spatial coverage (bounding box): westlimit=-180.00; southlimit=-71.078; eastlimit=180.00; northlimit=63.041
Centre point (longitude, latitude): 0, -4.0185


Other Information
(OBIS Australia webpage)

url : https://www.obis.org.au/

(View and download data through the AODN Portal)

url : https://portal.aodn.org.au/search?uuid=ec2c0ef9-3645-4ded-b617-c8297f6eb250

(Access to the AWS Open Data Program registry for the Cloud Optimised version of this dataset (link to be added))

url : https://registry.opendata.aws/

(Data files via Amazon Web Services S3 storage - download link (full dataset))

url : https://data-uplift-public.s3.ap-southeast-2.amazonaws.com/stored/datauplift/seabird/seabird.parquet

(Data files accessible via Amazon S3 (public access, S3 URI))

local : s3://data-uplift-public/stored/datauplift/seabird/seabird.parquet

(Access to Jupyter notebook to query Cloud Optimised converted dataset)

url : https://github.com/aodn/imos-user-code-library/blob/master/NESP/seabird.ipynb

(Access to R Markdown notebook to query Cloud Optimised converted dataset)

url : https://github.com/aodn/imos-user-code-library/blob/master/NESP/seabird.Rmd

(Video tutorials demonstrating the use of the associated Python Jupyter notebook)

url : https://youtube.com/playlist?list=PLHCEbETnUz5w2KXiON-8iSYpO3x1qJgxj&si=IxNFi0_VXG86TvW0

(Video tutorials demonstrating the use of the associated R notebook)

url : https://youtube.com/playlist?list=PLHCEbETnUz5w0JYQ3BsDaMam-UZ42AA7w&si=fgD7QgY4jA5jktio

(Technical description of product)

url : https://content.aodn.org.au/Documents/IMOS/Data_product/Seabird_v1.0.pdf

(Source metadata)

url : https://content.aodn.org.au/Documents/IMOS/Data_product/Seabird_source_metadata.csv
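The HTTPS download link and the S3 URI listed above point at the same object: the HTTPS form is the bucket's virtual-hosted-style address in the ap-southeast-2 region. A small sketch that derives one from the other and, assuming pandas with a Parquet engine such as pyarrow is installed and network access is available, reads the file. The function names are hypothetical; the linked Jupyter and R notebooks remain the documented way to query the data.

```python
from urllib.parse import urlparse

SEABIRD_S3_URI = "s3://data-uplift-public/stored/datauplift/seabird/seabird.parquet"


def s3_to_https(uri, region="ap-southeast-2"):
    """Convert s3://bucket/key into a virtual-hosted-style HTTPS URL."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return f"https://{parsed.netloc}.s3.{region}.amazonaws.com{parsed.path}"


def load_seabird(uri=SEABIRD_S3_URI):
    """Download the Parquet file over HTTPS into a pandas DataFrame."""
    import pandas as pd  # imported here so s3_to_https works without pandas installed

    return pd.read_parquet(s3_to_https(uri))
```

With `s3fs` installed, pandas can also read the S3 URI directly without credentials, e.g. `pd.read_parquet(SEABIRD_S3_URI, storage_options={"anon": True})`.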

global : aeb0afce-7fc7-4d48-91fc-f7b8e730073c

local : 010x3gp67

local : 01mae9353

local : 03qn8fb07

NESP MaC Project 5.9 – Making marine environmental data more assessment ready, 2025 (UTAS, IMOS)

doi : 10.82210/a44c9c1d