Biological ocean data collected from ships are reused in aggregations of historical data. These data are heavily relied upon to document long-term change and to validate satellite algorithms for ocean biology, and they are useful for assessing the performance of autonomous platforms and biogeochemical models. There is a need to combine subsurface biological and physical data into a single aggregate data product to support reproducible research. Existing aggregate products differ in their source data, have largely been restricted to the surface ocean, and mostly omit physical data, so they cannot easily be used to explore subsurface bio-physical relationships. We present the first version of a biological ocean data reformatting effort (BIO-MATE, https://gitlab.com/KBaldry/BIO-MATE). BIO-MATE uses R software to reformat openly published datasets from oceanographic voyages. The reformatted biological and physical data, from underway sensors, profiling sensors and pigment analyses, are stored in an interoperable and reproducible BIO-MATE data product for easy access and use.
Maintenance and Update Frequency: none-planned
Statement: The BIO-MATE aggregate data product brings together ship-based data that were collected by a Principal Investigator (PI) and openly published as publicly accessible data. The first version of BIO-MATE includes published datasets associated with four types of measurements: 1. sensors in the vessel's underway seawater intake (underway sensor data stream), 2. profiling sensors mounted on sampling rosettes (profiling sensor data stream), 3. particulate organic carbon (POC) measured in the laboratory (POC data stream), and 4. pigments measured in the laboratory (pigment data stream).
A semi-automated workflow and the BIO-MATE R software (https://github.com/KimBaldry/BIOMATE-Rpackage) were used to reformat published datasets and produce the BIO-MATE data product. Reformatted data files follow the WHP-Exchange (WHPE) format (https://exchange-format.readthedocs.io/en/latest/index.html), except that pigment headers follow MAREDAT standards. The software arranges the reformatted WHPE files into the four data streams in local directories, with separate WHPE files for each EXPOCODE and for the underway sensor data, profiling sensor casts and pigment measurements. An algorithm links the biological data in the pigment and POC data streams to the physical data in the profiling sensor and underway sensor data streams. Biological data records that are matched to physical data in BIO-MATE are given a profiling sensor identification tag (CTD_ID). The BIO-MATE data product also includes information for citing the published datasets, the laboratory analysis methodologies (for the pigment (PIG) data stream) and the data repositories through which the published data records were accessed. Each citation is recorded as a BibTeX entry, compatible with EndNote, R and LaTeX, and each BibTeX entry has a tag that is referenced in the processing metadata.
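The matching criteria used by the linking algorithm are not specified here. The sketch below assumes that a biological sample is linked to the profiling cast from the same EXPOCODE that is nearest in time, provided the cast also falls within hypothetical tolerances of 3 hours and 5 km. The class names, function names, tolerances and EXPOCODE values are illustrative only and are not taken from the BIO-MATE software.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional


@dataclass
class Cast:
    # A profiling sensor cast (hypothetical fields).
    ctd_id: str
    expocode: str
    time: datetime
    lat: float
    lon: float


@dataclass
class Sample:
    # A pigment or POC sample awaiting a CTD_ID (hypothetical fields).
    expocode: str
    time: datetime
    lat: float
    lon: float
    ctd_id: Optional[str] = None


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def link_samples(samples: List[Sample], casts: List[Cast],
                 max_hours: float = 3.0, max_km: float = 5.0) -> List[Sample]:
    """Tag each sample with the CTD_ID of the closest-in-time cast on the
    same voyage that lies within both tolerances; otherwise leave None."""
    for s in samples:
        best, best_dt = None, None
        for c in casts:
            if c.expocode != s.expocode:
                continue  # only link within the same voyage
            dt = abs(c.time - s.time)
            if dt > timedelta(hours=max_hours):
                continue
            if haversine_km(s.lat, s.lon, c.lat, c.lon) > max_km:
                continue
            if best is None or dt < best_dt:
                best, best_dt = c, dt
        s.ctd_id = best.ctd_id if best else None
    return samples


# Usage with hypothetical values.
casts = [
    Cast("CTD001", "EXPO_A", datetime(2006, 1, 5, 12, 0), -60.0, 140.0),
    Cast("CTD002", "EXPO_A", datetime(2006, 1, 6, 12, 0), -62.0, 140.0),
]
samples = [
    Sample("EXPO_A", datetime(2006, 1, 5, 13, 0), -60.01, 140.02),  # near CTD001
    Sample("EXPO_A", datetime(2006, 1, 7, 0, 0), -65.0, 145.0),     # no cast nearby
]
link_samples(samples, casts)
print([s.ctd_id for s in samples])  # ['CTD001', None]
```

Choosing the nearest cast in time, rather than the nearest in space, is one possible tie-breaking rule; a real implementation might weight both or add a depth criterion.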
Limited quality assurance has been performed on the BIO-MATE data product, and its extent varies across published datasets. Responsibility for the initial integrity of these data records rests with the Principal Investigators of each published data record. As a result, reformatted data have varying levels of quality control and post-processing.
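Because quality control varies between records, users may want to screen values by any quality flags carried over from the source data. The sketch below parses a minimal WHP-Exchange table (a file-type stamp line, `#` comment lines, a row of column names, a row of units, data rows and `END_DATA`) and keeps only rows whose WOCE-style `<PARAM>_FLAG_W` column equals 2 ("acceptable"). It assumes the file carries such a flag column, which depends on the source dataset. It does not handle the `NUMBER_HEADERS` block found in CTD exchange files, and the function names and example file are hypothetical.

```python
import csv
from typing import Dict, List, Sequence, Tuple


def read_whpe(text: str) -> Tuple[str, Dict[str, str], List[Dict[str, str]]]:
    """Return (stamp line, {column: unit}, list of row dicts)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    stamp = lines[0]
    # Drop blank lines and '#' comment lines after the stamp.
    body = [ln for ln in lines[1:] if ln and not ln.startswith("#")]
    names = [c.strip() for c in next(csv.reader([body[0]]))]
    units = [u.strip() for u in next(csv.reader([body[1]]))]
    rows = []
    for ln in body[2:]:
        if ln == "END_DATA":
            break
        values = [v.strip() for v in next(csv.reader([ln]))]
        rows.append(dict(zip(names, values)))
    return stamp, dict(zip(names, units)), rows


def keep_acceptable(rows: List[Dict[str, str]], param: str,
                    good_flags: Sequence[str] = ("2",)) -> List[Dict[str, str]]:
    """Keep rows whose <param>_FLAG_W value is in good_flags.
    If the flag column is absent, rows are returned unfiltered."""
    flag_col = f"{param}_FLAG_W"
    if rows and flag_col not in rows[0]:
        return list(rows)  # no flag information to filter on
    return [r for r in rows if r[flag_col] in good_flags]


# Hypothetical example file.
example = """BOTTLE,EXAMPLE
# hypothetical example file
EXPOCODE,STNNBR,CTDPRS,POC,POC_FLAG_W
,,DBAR,UMOL/KG,
EXPO_A,1,10.0,5.1,2
EXPO_A,1,50.0,3.9,3
EXPO_A,1,100.0,2.2,2
END_DATA
"""
stamp, units, rows = read_whpe(example)
good = keep_acceptable(rows, "POC")
print([r["CTDPRS"] for r in good])  # ['10.0', '100.0']
```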
Data included in the data product were made available by the following data repositories: PANGAEA, AODN/IMOS, SeaBASS, CCHDO, AADC, GLODAP, PAL-LTER, CSIRO, MDGS and BCO-DMO. Data access dates, source addresses and digital object identifiers are recorded as metadata, alongside appropriate data citations. We acknowledge the enormous community effort undertaken in the collection, analysis and publication of these data and thank the principal investigators for publishing their data in open access repositories.
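The layout of BIO-MATE's own citation metadata is not described here. As a minimal sketch, assuming a hypothetical record that holds a citation tag, authors, title, year, repository, DOI, source address and access date, such a record could be rendered as a BibTeX `@misc` entry whose key doubles as the tag referenced in processing metadata. All field values below are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class SourceRecord:
    key: str            # citation tag referenced from processing metadata (hypothetical)
    authors: List[str]
    title: str
    year: int
    repository: str     # e.g. one of the repositories listed above
    doi: str
    url: str
    accessed: date      # data access date


def to_bibtex(rec: SourceRecord) -> str:
    """Render the record as a BibTeX @misc entry."""
    fields = {
        "author": " and ".join(rec.authors),
        "title": "{" + rec.title + "}",  # extra braces preserve capitalisation
        "year": str(rec.year),
        "publisher": rec.repository,
        "doi": rec.doi,
        "url": rec.url,
        "note": f"Accessed {rec.accessed.isoformat()}",
    }
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
    return f"@misc{{{rec.key},\n{body}\n}}"


# Usage with placeholder values (not a real dataset).
rec = SourceRecord(
    key="ExampleKey2020",
    authors=["Doe, J.", "Roe, R."],
    title="Hypothetical pigment dataset",
    year=2020,
    repository="PANGAEA",
    doi="10.0000/example",
    url="https://example.org/dataset",
    accessed=date(2020, 5, 1),
)
entry = to_bibtex(rec)
print(entry)
```

Recording the access date in `note` keeps the entry usable with plain BibTeX styles; biblatex users might prefer a dedicated `urldate` field instead.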
Australian Research Council’s Special Research Initiative for Antarctic Gateway Partnership (Project ID SR140300001)
2019 Fellowship from the Scientific Committee on Antarctic Research