Data

Australian Coastline 50K 2024 (NESP MaC 3.17, AIMS)

Australian Ocean Data Network
Hammerton, Marc ; Lawrey, Eric

Licence & Rights:

Open Licence
CC-BY

Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/

Cite as reference: Hammerton, M., & Lawrey, E. (2024). Australian Coastline 50K 2024 (NESP MaC 3.17, AIMS) (Version 1-1) [Data set]. eAtlas. https://doi.org/10.26274/qfy8-hj59

Access:

Other

Full description

This dataset corresponds to land area polygons of Australian coastline and surrounding islands. It was generated from 10 m Sentinel 2 imagery from 2022 - 2024 using the Normalized Difference Water Index (NDWI) to distinguish land from water. It was estimated from composite imagery made up from images where the tide is above the mean sea level. The coastline approximately corresponds to the mean high water level.

This dataset was created as part of the NESP MaC 3.17 northern Australian Reef mapping project. It was developed to allow the inshore edge of digitised fringing reef features to be neatly clipped to the land areas without requiring manual digitisation of the neighbouring coastline. This required a coastline polygon with an edge positional error of below 50 m so as to not distort the shape of small fringing reefs.

We found that existing coastline datasets such as the Geodata Coast 100K 2004 and the Australian Hydrographic Office (AHO) Australian land and coastline dataset did not meet our needs. The scale of the Geodata Coast 100K 2004 was too coarse to represent small islands, and the positional error of the AHO Australian land and coastline dataset was too high (typically 80 m) for our application, as it would have significantly distorted the shape of small fringing reefs. The Digital Earth Australia Coastline (GA) dataset was sufficiently accurate and detailed; however, the format of the data was unsuitable for our application because the coast was expressed as disconnected line features between rivers, rather than as a closed polygon of the land areas.

We did, however, base our approach on the process developed for the DEA coastline described in Bishop-Taylor et al., 2021 (https://doi.org/10.1016/j.rse.2021.112734), adapting it to our existing Sentinel 2 Google Earth Engine processing pipeline. The difference between the two approaches is that the DEA coastline performed the tidal calculations and filtering at the pixel level, whereas in this dataset we estimated only a single tidal level for each whole Sentinel image scene. This was done for computational simplicity and to align with our existing Google Earth Engine image processing code. The images in the stack were sorted by this tidal estimate and those with a tide level greater than the mean sea level were combined into the composite.
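The per-scene tide filtering described above can be illustrated with a minimal sketch. This is not the project's Google Earth Engine code: the `Scene` structure, the `select_amsl_scenes` helper, the example tide values and the choice to sort highest tide first are all assumptions made for illustration. The cap of 200 images follows the limit listed in the Methods steps.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Scene:
    """One Sentinel 2 scene with a single predicted tide (hypothetical structure)."""
    scene_id: str
    timestamp: datetime
    tide_m: float  # predicted tide relative to mean sea level, in metres


def select_amsl_scenes(scenes, max_images=200):
    """Keep scenes whose single tide estimate is above mean sea level (0 m).

    Sorting highest tide first is an assumption; the source only states that
    the stack was sorted by the tidal estimate.
    """
    above = [s for s in scenes if s.tide_m > 0.0]
    above.sort(key=lambda s: s.tide_m, reverse=True)
    return above[:max_images]


# Made-up example values, one tide estimate per whole scene.
scenes = [
    Scene("a", datetime(2022, 3, 1, 1, 30), -0.4),
    Scene("b", datetime(2022, 6, 9, 1, 30), 0.9),
    Scene("c", datetime(2023, 1, 17, 1, 30), 0.2),
]
selected = select_amsl_scenes(scenes)
print([s.scene_id for s in selected])  # ['b', 'c']
```

Because each scene carries only one tide value, the filter is a simple comparison per image rather than a per-pixel calculation, which is the simplification relative to the DEA approach.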

The Sentinel 2 satellite follows a sun synchronous orbit and so does not observe the full range of tidal levels. This observed tidal range varies spatially due to the relative timing of peak tides with satellite image timing. We made no accommodation for variation in the tidal levels of the images used to calculate the coastline, other than selecting images that were above the mean tide level. This means the tidal height that the dataset coastline corresponds to will vary spatially. While this approach is less precise than that used in the DEA Coastline, the resulting errors were sufficiently low to meet the project goals.

This simplified approach was chosen because it integrated well with our existing Sentinel 2 processing pipeline for generating composite imagery.

To verify the accuracy of this dataset we manually checked the generated coastline against high resolution imagery (ArcGIS World Imagery). We found that 90% of the coastline polygons in this dataset have a horizontal position error of less than 20 m when compared to high-resolution imagery, except for isolated failure cases.

During our manual checks we identified specific scenarios, or 'failure modes', where our algorithm struggled to distinguish between land and water, either falsely identifying land or failing to identify it. These are shown in the image "Potential failure modes":
a) The coastline is pushed out due to breaking waves (example: western coast, S2 tile ID 49KPG).
b) False land polygons are created because of very turbid water due to suspended sediment (example: Joseph Bonaparte Gulf, S2 tile ID 52LEJ). In clear water areas the near infrared channel is almost black, starkly different to the bright land areas. In highly turbid waters the suspended sediment appears in the near infrared channel, raising its brightness to a level where it starts to overlap with the brightness of the dimmest land features. This results in turbid rivers not being correctly mapped. In version 1-1 of the dataset the rivers across northern Australia were manually corrected for these failures.
c) Very shallow, gently sloping areas are not recognised as water and the coastline is pushed out (example: Mornington Island, S2 tile ID 54KUG). Update: A second review of this area indicated that the mapped coastline is likely to be very close to the true coastline.
d) The coastline is lower than the mean high water level (example: Great Keppel (Wop-pa) Island, S2 tile ID 55KHQ).

Some of these potential failure modes could probably be addressed in the future by using a higher resolution tide calculation and by adjusting NDWI thresholds per region to accommodate regional differences. Some of these failure modes are likely due to the near infrared channel (B8) being able to penetrate approximately 0.5 m into the water, leading to errors in very shallow areas.

Some additional failures include:
- Interpreting jetties as land
- Interpreting oil rigs as land
- Bridges being interpreted as land, cutting off rivers


Methods:

The coastline polygons were created in four separate steps:
1. Create above mean sea level (AMSL) composite images.
2. Calculate the Normalized Difference Water Index (NDWI) and visualise as a grey scale image.
3. Generate vector polygons from the grey scale image using a NDWI threshold.
4. Clean up and merge polygons.

To create the AMSL composite images, multiple Sentinel 2 images were combined using the Google Earth Engine. The core algorithm was:
1. For each Sentinel 2 tile filter the "COPERNICUS/S2_HARMONIZED" image collection by
- tile ID
- maximum cloud cover 20%
- date between '2022-01-01' and '2024-06-30'
- asset_size > 100000000 (remove small fragments of tiles)
2. Remove high sun-glint images (see "High sun-glint image detection" for more information).
3. Split images by "SENSING_ORBIT_NUMBER" (see "Using SENSING_ORBIT_NUMBER for a more balanced composite" for more information).
4. Iterate over all images in the split collections to predict the tide elevation for each image from the image timestamp (see "Tide prediction" for more information).
5. Remove images where tide elevation is below mean sea level.
6. Select maximum of 200 images with AMSL tide elevation.
7. Combine SENSING_ORBIT_NUMBER collections into one image collection.
8. Remove sun-glint and apply atmospheric correction on each image (see "Sun-glint removal and atmospheric correction" for more information).
9. Duplicate image collection to first create a composite image without cloud masking and using the 15th percentile of the images in the collection (i.e. for each pixel the 15th percentile value of all images is used).
10. Apply cloud masking to all images in the original image collection (see "Cloud Masking" for more information) and create a composite by using the 15th percentile of the images in the collection (i.e. for each pixel the 15th percentile value of all images is used).
11. Combine the two composite images (no cloud mask composite and cloud mask composite). This solves the problem of some coral cays and islands being misinterpreted as clouds and therefore creating holes in the composite image. These holes are "plugged" with the underlying composite without cloud masking. (Lawrey et al. 2022)
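Steps 9 to 11 can be illustrated with a minimal pure-Python sketch. It is not the project's Google Earth Engine code: the function names, the flat-list image representation and the linear-interpolation percentile are assumptions made for illustration only. For each pixel it uses the 15th percentile of the cloud-free observations and, where every observation was flagged as cloud (as can happen over bright coral cays), falls back to the percentile of the unmasked stack.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0-100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def plugged_composite(stack, cloud_masks, q=15):
    """Build a percentile composite with holes plugged.

    stack       -- list of images, each a flat list of pixel values
    cloud_masks -- same shape; True where a pixel was flagged as cloud/shadow
    """
    composite = []
    for p in range(len(stack[0])):
        all_values = [img[p] for img in stack]
        clear_values = [img[p] for img, mask in zip(stack, cloud_masks)
                        if not mask[p]]
        # Prefer the cloud-masked composite; where the mask removed every
        # observation, plug the hole with the unmasked composite value.
        composite.append(percentile(clear_values or all_values, q))
    return composite


# Pixel 0 has clear observations; pixel 1 is always flagged as cloud.
stack = [[10, 100], [20, 110], [30, 120]]
masks = [[False, True], [False, True], [True, True]]
result = plugged_composite(stack, masks)
```

In this toy example the second pixel would be a hole in the cloud-masked composite, so its value comes from the unmasked stack instead.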

Next, for each image the NDWI was calculated:
1. Calculate the normalised difference using the B3 (green) and B8 (near infrared).
2. Shift the value range from between -1 and +1 to values between 1 and 255 (0 reserved as no-data value).
3. Export image as 8 bit unsigned Integer grey scale image.
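The NDWI calculation and 8-bit rescaling can be sketched as follows. The normalised difference of green and near infrared is the standard NDWI formula; the exact linear mapping onto 1 to 255 is an assumption, since the record only states the target range and that 0 is reserved for no-data. The function names are hypothetical.

```python
def ndwi(green, nir):
    """Normalised difference of B3 (green) and B8 (near infrared)."""
    total = green + nir
    if total == 0:
        return None  # undefined; treated as no-data below
    return (green - nir) / total


def to_uint8(value):
    """Map NDWI in [-1, 1] linearly onto 1..255, keeping 0 for no-data.

    Assumed mapping: -1 -> 1, 0 -> 128, +1 -> 255.
    """
    if value is None:
        return 0
    return 1 + round((value + 1.0) / 2.0 * 254)


water_pixel = to_uint8(ndwi(0.10, 0.02))   # green much brighter than NIR
land_pixel = to_uint8(ndwi(0.08, 0.30))    # NIR much brighter than green
```

Water pixels end up in the upper half of the 8-bit range and land pixels in the lower half, which is what makes a single threshold on the grey scale image workable.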

During the next step, we generated vector polygons from the grey scale image using a NDWI threshold:
1. Upscale image to 5 m resolution using bilinear interpolation. This was to help smooth the coastline and reduce the error introduced by the jagged pixel edges.
2. Apply a threshold to create a binary image (see "NDWI Threshold" for more information) with the value 1 for land and 2 for water (0: no data).
3. Create polygons for land values (1) in the binary image.
4. Export as shapefile.

Finally, we created a single layer from the vectorised images:
1. Merge and dissolve all vector layers in QGIS.
2. Perform smoothing (QGIS toolbox, Iterations 1, Offset 0.25, Maximum node angle to smooth 180).
3. Perform simplification (QGIS toolbox, tolerance 0.00003).
4. Remove the vertices of the interior rings (holes) of the mainland polygon to fill in continental Australia.
5. Perform manual QA/QC. In this step we removed false polygons created due to sun glint and breaking waves. We also removed very small features (1 – 1.5 pixel sized features, e.g. single mangrove trees) by calculating the area of each feature (in m2) and removing features smaller than 200 m2.
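The small-feature filter in step 5 can be sketched in plain Python. This is an illustration rather than the QGIS workflow that was used: it assumes each feature is a single ring of (x, y) coordinates already projected to metres, and the function names are hypothetical.

```python
def ring_area_m2(ring):
    """Shoelace area of a ring of (x, y) vertices given in metres."""
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def drop_small_features(rings, min_area_m2=200.0):
    """Keep only features whose area is at least min_area_m2."""
    return [ring for ring in rings if ring_area_m2(ring) >= min_area_m2]


one_pixel = [(0, 0), (10, 0), (10, 10), (0, 10)]    # 100 m2, one S2 pixel
small_islet = [(0, 0), (20, 0), (20, 20), (0, 20)]  # 400 m2
kept = drop_small_features([one_pixel, small_islet])
```

A single 10 m Sentinel 2 pixel covers 100 m2, so a 200 m2 cut-off removes features of up to about two pixels in area.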

15th percentile composite:

The composite image was created using the 15th percentile of the pixel values in the image stack. The 15th percentile was chosen, in preference to the median, to select darker pixels in the stack as these tend to correspond to images with clearer water conditions and higher tides.
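The effect of choosing the 15th percentile over the median can be seen on a single pixel's time series. The values below are invented for illustration; the percentile uses the inclusive (linear interpolation) method from Python's standard library, which may differ slightly from the reducer used in Google Earth Engine.

```python
import statistics

# Hypothetical reflectance of one water pixel across seven images.
# The last two are brightened by residual glint or turbidity.
pixel_series = [0.02, 0.03, 0.03, 0.04, 0.05, 0.20, 0.35]

median_value = statistics.median(pixel_series)

# quantiles(..., n=100) returns the 99 percentile cut points;
# index 14 is the 15th percentile.
p15_value = statistics.quantiles(pixel_series, n=100, method="inclusive")[14]
```

Here the 15th percentile sits close to the darkest observations, while the median is pulled further towards the brighter, lower quality images.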

High sun-glint image detection:

Images with high sun-glint can lead to lower quality composite images. To determine high sun-glint images, a land mask was first applied to the image to only retain water pixels. This land mask was estimated using NDWI. The proportion of the water pixels in the near-infrared and short-wave infrared bands above a sun-glint threshold was calculated. Images with a high proportion were then filtered out of the image collection.
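A minimal sketch of this filter is given below. The threshold values are placeholders, not the values used in the project, and the sketch assumes a pixel counts as glint-affected only when both the near-infrared and short-wave infrared values exceed the threshold; the text does not specify whether both or either band must exceed it.

```python
def glint_fraction(nir, swir, water_mask, glint_threshold):
    """Fraction of water pixels that are bright in both NIR and SWIR."""
    water = [(n, s) for n, s, is_water in zip(nir, swir, water_mask)
             if is_water]
    if not water:
        return 0.0
    bright = sum(1 for n, s in water
                 if n > glint_threshold and s > glint_threshold)
    return bright / len(water)


def is_high_glint(nir, swir, water_mask,
                  glint_threshold=0.05, max_fraction=0.2):
    """Flag an image for removal (both defaults are hypothetical)."""
    fraction = glint_fraction(nir, swir, water_mask, glint_threshold)
    return fraction > max_fraction


nir = [0.01, 0.08, 0.09, 0.30]
swir = [0.01, 0.07, 0.02, 0.25]
water_mask = [True, True, True, False]   # last pixel is land, so ignored
fraction = glint_fraction(nir, swir, water_mask, 0.05)
```

Applying the land mask first matters because land is bright in both bands and would otherwise be counted as glint.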

Sun-glint removal and atmospheric correction:

The Top of Atmosphere L1 Sentinel 2 imagery was used in this dataset. These images are affected by atmospheric scattering (haze) lowering the contrast of the imagery. Additionally sun-glint on the water areas can lead to these areas appearing brighter than they should. To correct for these effects we used a simple constant black point correction (subtracting a constant value from all pixels) for atmospheric scattering and the near infrared channel (B8) to correct for sun-glint. The amount of black point level correction was chosen so that dark areas (hill shadows, parts of mangroves) on land would appear dark after the correction. The same level of black point correction was applied to all images. A level was chosen that worked well across a wide range of scenes across Australia over multiple seasons. Sun-glint correction was achieved by subtracting a scaled version (0.9x, a constant tested to work well across a wide range of scenes) of the B8 channel, up to a maximum that matched the black point correction level. Limiting the sun-glint correction at the same level as the black point correction results in a relatively clear transition between the sun-glint correction on the water and the constant atmospheric correction on the land (which is just the clipped B8 channel). If the sun-glint correction is not capped then the land areas end up black due to the B8 channel being very bright on land.
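One reading of the correction described above is a single subtraction per pixel, where the amount subtracted is 0.9 times B8 but never more than the black point level. The sketch below follows that reading; the black point value, the clamping of negative results to zero and the function name are assumptions, as the record does not state them.

```python
def correct_pixel(band_value, b8_value, black_point, glint_scale=0.9):
    """Apply capped sun-glint and black point correction to one band value.

    On water B8 is dark, so the subtraction tracks the glint (0.9 x B8).
    On land B8 is bright, so the subtraction saturates at black_point and
    acts as a constant haze correction.
    """
    correction = min(glint_scale * b8_value, black_point)
    return max(band_value - correction, 0.0)  # clamp: an assumption


BLACK_POINT = 0.05  # hypothetical level, not the value used in the dataset

water = correct_pixel(0.10, 0.02, BLACK_POINT)  # subtracts 0.018
land = correct_pixel(0.20, 0.30, BLACK_POINT)   # subtracts the capped 0.05
```

Without the cap, the land pixel in this example would have 0.27 subtracted and be clamped to zero, which is the "land areas end up black" effect described in the text.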

This algorithm is an adjustment of the algorithm already used in Lawrey et al. 2022.

No research was undertaken into how important the black point correction and sun-glint correction were to the final NDWI imagery. These corrections are very important for generating true colour imagery to view marine features, but may be unnecessary for NDWI calculations.

Tide prediction:

To determine the tide elevation in a specific satellite image, we used a tide prediction model to predict the tide elevation for the image timestamp. After investigating and comparing a number of models, we settled on the empirical ocean tide model EOT20 (Hart-Davis et al., 2021). The model data can be freely accessed at https://doi.org/10.17882/79489 and works with the Python library pyTMD (https://github.com/tsutterley/pyTMD). In our comparison we found this model was able to accurately predict the tide elevation across multiple points along the study coastline when compared to historic Bureau of Meteorology and AusTide data. To determine the tide elevation of the satellite images we manually created a point dataset where we placed a central point on the water for each Sentinel tile in the study area. We used these points as centroids in the ocean models and calculated the tide elevation from the image timestamp.


Using "SENSING_ORBIT_NUMBER" for a more balanced composite:

Some of the Sentinel 2 tiles are made up of different sections depending on the "SENSING_ORBIT_NUMBER". For example, a tile could have a small triangle on the left side and a bigger section on the right side. If we filter an image collection and use a subset to create a composite, we could end up with a high number of images for one section (e.g. the left side triangle) and only few images for the other section(s). To avoid this issue, the initial unfiltered image collection is divided into multiple image collections by using the image property "SENSING_ORBIT_NUMBER". The filtering and limiting (max number of images in collection) is then performed on each "SENSING_ORBIT_NUMBER" image collection and finally, they are combined back into one image collection to generate the final composite.
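The split, filter, limit and recombine sequence can be sketched as follows. Images are represented as plain dictionaries and the tide model is passed in as a callable; both are stand-ins for the Earth Engine collection and the EOT20 prediction. The record does not say how the maximum of 200 images is chosen, so this sketch simply keeps the first ones in input order.

```python
from collections import defaultdict


def select_images(images, tide_at, max_per_orbit=200):
    """Pick above-mean-sea-level images, balanced across sensing orbits.

    images  -- list of dicts with 'orbit' and 'time' keys (hypothetical)
    tide_at -- callable returning tide elevation (m, relative to MSL)
    """
    by_orbit = defaultdict(list)
    for image in images:
        by_orbit[image["orbit"]].append(image)

    selected = []
    for orbit in sorted(by_orbit):
        above_msl = [image for image in by_orbit[orbit]
                     if tide_at(image["time"]) > 0.0]
        # Limit each orbit separately so no section of the tile dominates.
        selected.extend(above_msl[:max_per_orbit])
    return selected


images = [
    {"orbit": 16, "time": 1}, {"orbit": 16, "time": 2},
    {"orbit": 16, "time": 3}, {"orbit": 59, "time": 4},
    {"orbit": 59, "time": 5},
]
tides = {1: 0.5, 2: -0.2, 3: 0.8, 4: 1.1, 5: -0.4}
selected = select_images(images, tides.get, max_per_orbit=1)
```

Because the limit is applied per orbit, the smaller section of a tile still contributes images even when the larger section has many more candidates.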


Cloud Masking:

Each image was processed to mask out clouds and their shadows before creating the composite image.
The cloud masking uses the COPERNICUS/S2_CLOUD_PROBABILITY dataset developed by SentinelHub (Google, n.d.; Zupanc, 2017). The mask includes the cloud areas, plus a mask to remove cloud shadows. The cloud shadows were estimated by projecting the cloud mask in the direction opposite the angle to the sun. The shadow distance was estimated in two parts.

A low cloud mask was created based on the assumption that small clouds have a small shadow distance. These were detected using a 35% cloud probability threshold. These were projected over 400 m, followed by a 150 m buffer to expand the final mask.

A high cloud mask was created to cover longer shadows created by taller, larger clouds. These clouds were detected based on an 80% cloud probability threshold, followed by an erosion and dilation of 300 m to remove small clouds. These were then projected over a 1.5 km distance followed by a 300 m buffer.

The parameters for the cloud masking (probability threshold, projection distance and buffer radius) were determined through trial and error on a small number of scenes. As such there are probably significant potential improvements that could be made to this algorithm.
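The shadow projection can be expressed as a simple displacement of the cloud mask. The sketch below collects the parameters quoted above and computes the offset for a given sun azimuth, assuming the azimuth is measured clockwise from north towards the sun. The dictionary layout and function name are hypothetical.

```python
import math

# Parameters as quoted in the text (distances in metres).
LOW_CLOUD = {"probability": 35, "projection_m": 400, "buffer_m": 150}
HIGH_CLOUD = {"probability": 80, "erode_dilate_m": 300,
              "projection_m": 1500, "buffer_m": 300}


def shadow_offset(sun_azimuth_deg, distance_m):
    """Return the (east, north) shift in metres of a cloud's shadow.

    The shadow falls on the side of the cloud facing away from the sun,
    so the shift is opposite to the sun's azimuth direction.
    """
    azimuth = math.radians(sun_azimuth_deg)
    return (-distance_m * math.sin(azimuth), -distance_m * math.cos(azimuth))


# Sun due north (typical around midday in Australia): shadow falls south.
east, north = shadow_offset(0.0, LOW_CLOUD["projection_m"])
```

The buffer applied after projection then compensates for clouds whose actual height differs from the height implied by the fixed projection distance.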

Erosion, dilation and buffer operations were performed at one quarter of the resolution of the satellite imagery to improve the computational speed. Even with these lower resolution calculations, these operations still used over 90% of the total processing (Lawrey et al. 2022).


NDWI Threshold:

Generally, NDWI values between 0.2 and 1 indicate "Water surface" and values between 0 and 0.2 indicate "Flooding, humidity" areas (https://eos.com/make-an-analysis/ndwi/ , accessed 28/08/2024). After experimenting with different values to adjust for the recalibration and rounding inaccuracies we settled on a threshold value of 0.15 which gave us the best results.
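Because the threshold is applied to the 8 bit grey scale image, the NDWI value of 0.15 has to be expressed on that scale. The sketch below assumes a linear mapping of -1..+1 onto 1..255 and that pixels strictly above the threshold are water; neither detail is stated explicitly in this record. The output codes follow the description above (1 land, 2 water, 0 no-data).

```python
LAND, WATER, NO_DATA = 1, 2, 0
NDWI_THRESHOLD = 0.15


def scaled_threshold(ndwi_threshold=NDWI_THRESHOLD):
    """NDWI threshold on the assumed 1..255 grey scale."""
    return 1 + (ndwi_threshold + 1.0) / 2.0 * 254


def classify(pixel_value, threshold=None):
    """Classify one 8 bit grey scale pixel as land, water or no-data."""
    if pixel_value == 0:
        return NO_DATA
    if threshold is None:
        threshold = scaled_threshold()
    return WATER if pixel_value > threshold else LAND


grey_threshold = scaled_threshold()   # about 147 on the 8 bit scale
classes = [classify(v) for v in (0, 100, 147, 148, 200)]
```

Rounding to whole grey levels means the effective NDWI threshold is only accurate to about 0.008, which is one reason the final value was tuned by experiment.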


Format:

ESRI Shapefile with multipolygons.

The dataset is available in three versions.
- Full: Highest resolution version of the dataset (40MB). This version can be slow to render because of the high number of vertices that make up the mainland polygon.
- Split: This is the Full version but split into a 2 degree grid. This limits the number of vertices per polygon, speeding up the render time by about five times. The downside is that when rendered with a polygon stroke the grid is visible across the land. This version also provides a line version of the dataset, also cut into line segments by the 2 degree grid. A clean map can be drawn by rendering the split polygons with no border stroke, then rendering the split line version as the coastline.
- Simp: This is a simplified version of the dataset. A Douglas-Peucker distance simplification with a 0.00007 degree tolerance was applied to approximately halve the number of vertices in the polygon. This adds approximately 5 m error to the coastline accuracy. The accuracy is still typically better than 20 m. This version is faster to render.
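For readers unfamiliar with the Douglas-Peucker simplification used for the Simp version, a compact recursive sketch is given below. It is a textbook form of the algorithm operating on an open line of (x, y) points, not the GIS tool that was used to produce the dataset, and a production implementation would avoid deep recursion on very long coastlines.

```python
import math


def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Drop vertices that deviate from the simplified line by <= tolerance."""
    if len(points) < 3:
        return list(points)
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = _distance_to_segment(points[i], points[0], points[-1])
        if dist > max_dist:
            max_dist, index = dist, i
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the shared vertex


line = [(0, 0), (1, 0.00001), (2, 0), (3, 0.001), (4, 0)]
simplified = douglas_peucker(line, tolerance=0.00007)
```

In this example the vertex that deviates by only 0.00001 degrees is removed, while the 0.001 degree excursion is kept because it exceeds the tolerance.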


Change log:

Changes to the dataset will be noted in this change log.
2024-09-02 - 1st Edition - Initial release (Git tag: "coastline_v1")
2024-10-02 - Added split and simplified versions of the dataset.
2024-11-19 - 2nd Edition - Manual correction of rivers and remote islands (Git tag: "coastline_v1-1"):
The automated coastline tends to fail in the following situations:
- Very high turbidity environments found in large rivers across northern Australia
- Where the water is highly green due to organic matter in the water, found at the mouth of some large rivers
- In highly dense seagrass in shallow clear water.
In these cases the coastline was manually corrected by digitising directly from the true colour imagery, using the ArcGIS World Imagery as a secondary reference.

Note: The mainland polygon of this dataset is extremely slow to edit in QGIS due to its size. Trimming the mainland polygon typically took 20 - 30 min to process each edit. As a result only a limited number of edits were practical in this update.

In version 1-1 the following corrections were made:

Corrected:
- Kennedy Inlet, Cape York, QLD; Lloyd Bay, Cape York, QLD: Extended the river mapping further inland, also mapping the mangrove islands. These areas were affected by the highly green water. These rivers were previously capped at the river mouth.
- West Arm and Ord River, Cambridge Gulf, WA; Victoria River, Joseph Bonaparte Gulf, NT: Extended the river mapping further inland. These areas were poorly mapped due to the high turbidity.
- Ross River, Townsville, QLD: Extended the river, which was blocked by the Southern Port Rd bridge.

Added:
- Lord Howe Island, North Islet (near Lord Howe Island), Christmas Island, Norfolk Island and Phillip Island (just south of Norfolk Island). These were not part of the automated mapping due to the insufficient number of images.

Removed:
- False islands in Spencer Gulf and St Vincent Gulf. These were caused by dense seagrass meadows.


Errata (known errors in the dataset):

Version 1 - The following are areas in the coastline_v1 that have errors:
- Christmas Island, Lord Howe Island and Norfolk Island are not included.
- The inlet in Lloyd Bay (QLD) includes areas of highly green water and is not cut in close to the coast. The maximum error is 3.9 km.
- Lucinda jetty, Hay Point jetty, and jetties in Gladstone harbour appear as part of the coastline.
- Ross River is cut short by a bridge at its river mouth.

Version 1-1:
- The automated coastline used a hole filling algorithm to fill in any salt flat areas to get the outer coastline. Unfortunately this algorithm also fills in rivers and ocean connected bays where there is a bridge crossing the inlet. As a result there are quite a few ocean connected water bodies that are not represented by the dataset. Some of these include:
- Bribie Island (QLD) is incorrectly connected to the mainland because of the bridge at Sandstone Point.
- Swan River, Perth, WA
- Collins Pool, WA
- Brisbane Water, NSW
- Mooney Mooney Creek, NSW
- Parramatta River, NSW (west of Sydney Harbour Bridge)
- Brisbane River, QLD (west of Gateway Bridge)

References:

Bishop-Taylor, R., Nanson, R., Sagar, S., Lymburner, L. (2021). Digital Earth Australia Coastlines. Geoscience Australia, Canberra. https://doi.org/10.26186/116268 [Accessed 28 August 2024]

Bishop-Taylor, R., Nanson, R., Sagar, S., Lymburner, L. (2021). Mapping Australia's dynamic coastline at mean sea level using three decades of Landsat imagery. Remote Sensing of Environment, 267, 112734. https://doi.org/10.1016/j.rse.2021.112734

Geoscience Australia (2004) GEODATA COAST 100K 2004. Geoscience Australia, Canberra. https://pid.geoscience.gov.au/dataset/ga/61395

Google (n.d.) Sentinel-2: Cloud Probability. Earth Engine Data Catalog. Accessed 10 April 2021 from https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_CLOUD_PROBABILITY

Hart-Davis, M., Piccioni, G., Dettmering, D., Schwatke, C., Passaro, M., Seitz, F. (2021). EOT20 – A global Empirical Ocean Tide model from multi-mission satellite altimetry [Data set]. SEANOE. https://doi.org/10.17882/79489

Lawrey, E., & Hammerton, M. (2022). Coral Sea features satellite imagery and raw depth contours (Sentinel 2 and Landsat 8) 2015 – 2021 (AIMS) [Data set]. eAtlas. https://doi.org/10.26274/NH77-ZW79

Smith, R., (2017) Australian land and coastline (including Lord Howe Island) at lowest astronomical tide (LAT) datum [for NESP D3], AODN, https://catalogue.aodn.org.au/geonetwork/srv/eng/catalog.search#/metadata/358afb92-4977-4f9f-9c74-e66ad7a6c65a

Zupanc, A., (2017) Improving Cloud Detection with Machine Learning. Medium. Accessed 10 April 2021 from https://medium.com/sentinel-hub/improving-cloud-detection-with-machine-learning-c09dc5d7cf13


Data Location:

This dataset is filed in the eAtlas enduring data repository at: data\custodian\2023-2026-NESP-MaC-3\3.17_Northern-Aus-reef-mapping
The source code is available on GitHub: https://github.com/eatlas/AU_NESP-MaC-3-17_AIMS_S2-comp

Notes

Credit
Team members: Marc Hammerton (AIMS), Eric Lawrey (AIMS)
National Environmental Science Program (NESP) Marine and Coastal Hub
Department of Climate Change, Energy, the Environment and Water (DCCEEW), Australian Government
In addition to NESP (DCCEEW) funding, this project is matched by an equivalent amount of in-kind support and co-investment from project partners and collaborators.
Purpose
This dataset is intended to be used for clipping marine features around the Australian coast.

Data time period: 2022-01-01 to 2024-06-30




Other Information
(Source code - Python Google Earth Engine (GitHub))

uri : https://github.com/eatlas/AU_NESP-MaC-3-17_AIMS_S2-comp

(Browse and Download shapefiles (Full 40 MB, Split 80 MB, and Simplified 18 MB) including old versions)

uri : https://nextcloud.eatlas.org.au/apps/sharealias/a/AU_NESP-MaC-3-17_AIMS_Australian-Coastline-50K-2024

(Direct download Simplified version V1-1 [Zip 22 MB])

uri : https://nextcloud.eatlas.org.au/s/DcGmpS3F5KZjgAG/download?path=%2FV1-1%2F&files=Simp

(Direct download Full version V1-1 [Zip 40MB])

uri : https://nextcloud.eatlas.org.au/s/DcGmpS3F5KZjgAG/download?path=%2FV1-1%2F&files=Full

(Direct download Split version V1-1 [Zip 80MB])

uri : https://nextcloud.eatlas.org.au/s/DcGmpS3F5KZjgAG/download?path=%2FV1-1%2F&files=Split

global : 58f3a091-2463-4963-a908-2a5505e2baf9

ror : 03x57gn41


Identifiers