Data

R code for analysis of Irukandji data of the GBR (NESP TWQ 2.2.3, CSIRO)

eAtlas
Richardson, Anthony J, Prof
Identifier: https://eatlas.org.au/data/uuid/c5ce8cb6-04e5-4153-836d-df91878a3131
Published: 2019
Rights: Attribution 3.0 Australia (http://creativecommons.org/licenses/by/3.0/au/)
Type: dataset
Language: English

Licence & Rights:

Open Licence

Access:

Open

Brief description

This dataset presents the code written for the analysis and modelling for the Jellyfish Forecasting System for NESP TWQ Project 2.2.3. The Jellyfish Forecasting System (JFS) searches for robust statistical relationships between historical sting events (and observations) and local environmental conditions. These relationships are tested using data to quantify the underlying uncertainties. They then form the basis for forecasting risk levels associated with current environmental conditions.

The development of the JFS modelling and analysis is supported by the Venomous Jellyfish Database (sting events and specimen samples – November 2018) (NESP 2.2.3, CSIRO) with corresponding analysis of wind fields and tidal heights along the Queensland coastline. The code has been calibrated and tested for the study focus regions including Cairns (Beach, Island, Reef), Townsville (Beach, Island+Reef) and Whitsundays (Beach, Island+Reef).

The JFS uses wind fields from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim daily product (https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era-interim). This daily product has global coverage at a spatial resolution of approximately 80 km. However, only 11 locations off the Queensland coast were extracted, covering the period 1-Jan-1985 to 31-Dec-2016. For the modelling, the data have been transformed into CSV files containing date, eastward wind (m/s) and northward wind (m/s) for each of the 11 geographical locations.
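The eastward and northward wind components in these CSV files can be turned into a wind speed and direction. The following is a minimal Python sketch, not the R code from this dataset; the function name is hypothetical, and the meteorological convention (direction the wind blows from, in degrees clockwise from north) is an assumption, since the record does not state which convention was used.

```python
import math


def wind_speed_direction(eastward, northward):
    """Convert eastward/northward wind components (m/s) to speed and direction.

    Assumption: direction follows the meteorological convention, i.e. the
    bearing the wind blows FROM, in degrees clockwise from north.
    """
    speed = math.hypot(eastward, northward)
    # atan2(-u, -v) gives the bearing of the source of the wind.
    direction = math.degrees(math.atan2(-eastward, -northward)) % 360.0
    return speed, direction


# A wind blowing towards the north-east comes from the south-west.
speed, direction = wind_speed_direction(3.0, 4.0)
print(round(speed, 2), round(direction, 2))
```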

Hourly tidal height was calculated from tidal harmonics supplied by the Bureau of Meteorology (http://www.bom.gov.au/oceanography/projects/ntc/ntc.shtml) using the XTide software (http://www.flaterco.com/xtide/). Hourly tidal heights have been calculated for 7 sites along the Queensland coast (Albany Island, Cairns, Cardwell, Cooktown, Fife, Grenville, Townsville) for the period 1-Jan-1985 to 31-Dec-2017. The data have been transformed into CSV files, one for each of the 7 sites. The columns are the number of days since 1-Jan-1990 and the tidal height (m).
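Because the tidal files store time as a number of days since 1-Jan-1990, each value has to be converted back to a calendar time before it can be matched to a sting record. A minimal Python sketch follows (not the dataset's R code); it assumes that day 0 is midnight on 1-Jan-1990, that fractional days encode the hour, and that times before 1990 are stored as negative values. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Reference date stated in the record; treating it as midnight is an assumption.
EPOCH = datetime(1990, 1, 1)


def tide_time(days_since_epoch):
    """Convert 'days since 1-Jan-1990' to a calendar datetime."""
    return EPOCH + timedelta(days=days_since_epoch)


print(tide_time(0.5))    # midday on the reference date
print(tide_time(-1826))  # a time before the reference date
```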

Irukandji stings were then modelled using a generalised linear model (GLM). A GLM generalises ordinary linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value (McCullagh & Nelder 1989). For each region, we used a GLM with the number of Irukandji stings per day as the response variable. The GLM had a Poisson error structure and a log link function (Crawley 2005). For the Poisson GLMs, we inferred absences (zero counts) for days on which no stings were recorded in the data. We consider that sampling effort in the database has been reasonably consistent since 1985, but that it was very patchy before then. It should be noted that Irukandji are very patchy in time; for example, there was a single sting record in 2017 despite considerable effort trying to find stings in that year. Although the database might miss small and localised Irukandji sting events, we believe it captures larger infestation events.
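The response variable described above can be sketched as a daily count series in which every day without a record becomes a zero. This is an illustrative Python sketch rather than the R code in this dataset; the function name and the example dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def daily_sting_counts(sting_dates, start, end):
    """Count stings per day from start to end inclusive.

    Days with no sting record are kept with a count of 0 (inferred absence).
    """
    counts = Counter(sting_dates)
    n_days = (end - start).days + 1
    days = (start + timedelta(days=k) for k in range(n_days))
    return {day: counts.get(day, 0) for day in days}


# Hypothetical records: two stings on one day, one sting two days later.
records = [date(2000, 1, 2), date(2000, 1, 2), date(2000, 1, 4)]
series = daily_sting_counts(records, date(2000, 1, 1), date(2000, 1, 5))
print(list(series.values()))
```

A series of this shape could then be passed to a Poisson GLM with a log link, which is the model the record describes.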

We included six predictors in the models: Month, two wind variables, and three tidal variables. Month was a factor and arranged so that the summer was in the middle of the year (i.e., from June to May). The two wind variables were Speed and Direction. For each day within each region (Cairns, Townsville or Whitsundays), hourly wind speed and direction were used. We derived cumulative wind Speed and Direction, working backwards from each day, with the current day being Day 1. We calculated cumulative winds from the current day (Day 1) to 14 days previously for every day in every Region and Area. To give greater weight to winds on more recent days, we used an inverse weighting, with a weight of 1/i for each day i. Thus, the Cumulative Speed for n days is given by:

\text{Cumulative Speed}_n = \frac{\sum_{i=1}^{n} \text{Speed}_i / i}{\sum_{i=1}^{n} 1/i}

For example, calculations for the 3-day cumulative wind speed are:

\frac{\frac{1}{1}\,\text{Speed}_1 + \frac{1}{2}\,\text{Speed}_2 + \frac{1}{3}\,\text{Speed}_3}{\frac{1}{1} + \frac{1}{2} + \frac{1}{3}}
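The inverse-day weighting can be checked numerically. The sketch below is in Python rather than the R used in this dataset, and the function name and example speeds are hypothetical; it implements the weighted mean exactly as written in the formula, with the first element of the list being Day 1 (the current day).

```python
def cumulative_speed(speeds):
    """Inverse-day-weighted mean wind speed.

    speeds[0] is Day 1 (the current day), speeds[1] is Day 2, and so on,
    so day i receives weight 1/i.
    """
    weights = [1.0 / i for i in range(1, len(speeds) + 1)]
    weighted_sum = sum(w * s for w, s in zip(weights, speeds))
    return weighted_sum / sum(weights)


# Three-day example: (6/1 + 3/2 + 3/3) / (1/1 + 1/2 + 1/3) = 8.5 / (11/6)
print(round(cumulative_speed([6.0, 3.0, 3.0]), 4))
```

With these hypothetical speeds, the current day's stronger wind pulls the result well above the unweighted mean of 4.0.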

Similarly, we calculated the cumulative weighted wind Direction using the formula:

\text{Cumulative Direction}_n = \frac{\sum_{i=1}^{n} \text{Direction}_i / i}{\sum_{i=1}^{n} 1/i}

We used circular statistics in the R package circular to calculate the weighted cumulative mean, because a direction of 0° is the same as 360°. We initially used a smoother for this term in the model, but because of its non-linearity and the lack of winds from all directions, we found that it was better to use wind Direction as a factor with four levels (NW, NE, SE and SW). In some Regions and Areas, not all wind Directions were present.
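One common way to take a weighted mean of directions without the 0°/360° discontinuity is to average the sine and cosine components and convert back to an angle. The Python sketch below illustrates that idea with the same 1/i weights; it is not the circular package's implementation, the function names are hypothetical, and the quadrant boundaries (0 to 90 as NE, 90 to 180 as SE, and so on) are an assumption, since the record does not give them.

```python
import math


def weighted_circular_mean(directions_deg):
    """Inverse-day-weighted circular mean of directions in degrees.

    directions_deg[0] is Day 1 (weight 1/1), directions_deg[1] is Day 2
    (weight 1/2), and so on.
    """
    sin_sum = 0.0
    cos_sum = 0.0
    for i, d in enumerate(directions_deg, start=1):
        sin_sum += math.sin(math.radians(d)) / i
        cos_sum += math.cos(math.radians(d)) / i
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


def direction_quadrant(direction_deg):
    """Map a direction to one of four factor levels (assumed boundaries)."""
    d = direction_deg % 360.0
    if d < 90.0:
        return "NE"
    if d < 180.0:
        return "SE"
    if d < 270.0:
        return "SW"
    return "NW"


# Directions either side of north: an arithmetic mean would point south-west,
# whereas the circular mean stays close to north.
mean_dir = weighted_circular_mean([350.0, 10.0])
print(round(mean_dir, 1), direction_quadrant(mean_dir))
```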

To assign each event to the tidal cycle, we used tidal data from the closest of the seven stations to calculate three tidal variables: (i) the tidal range each day (m); (ii) the tidal height (m); and (iii) whether the tide was incoming or outgoing. To estimate the three tidal variables, the time of day of the event was required. However, the Time of Day was available for only 780 observations. For the 291 observations without it, we assigned a random Time of Day, which should not influence the relationship but keeps these rows in the analysis. Tidal range was not significant in any model and is not considered further.
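The three tidal variables can be sketched from one day of hourly heights. This Python sketch is illustrative only and is not the dataset's R code; the function name, the 25-value input layout (hours 0 to 23 plus the first hour of the next day) and the rule that a rising height means an incoming tide are all assumptions. When no hour is supplied, a random hour is drawn, mirroring the treatment of records with no Time of Day.

```python
import random


def tidal_variables(hourly_heights, hour=None, rng=random):
    """Derive tidal range, height and state for one event.

    hourly_heights: 25 heights (m), hours 0..23 plus hour 0 of the next day
    (assumed layout). If hour is None, a random hour 0..23 is used.
    """
    if hour is None:
        hour = rng.randrange(24)
    day = hourly_heights[:24]
    tidal_range = max(day) - min(day)
    height = hourly_heights[hour]
    # Compare with the following hour to decide the direction of the tide.
    rising = hourly_heights[hour + 1] > height
    return {
        "hour": hour,
        "range": tidal_range,
        "height": height,
        "state": "incoming" if rising else "outgoing",
    }


# Hypothetical tide: falls from 2.0 m at midnight to 0 m at noon, then rises.
heights = [abs(12 - h) / 6.0 for h in range(25)]
print(tidal_variables(heights, hour=15))
print(tidal_variables(heights, rng=random.Random(1))["hour"])
```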

To focus on times when Irukandji were present, months in which stings never occurred in an area/region were excluded from the analysis; these were generally the winter months. For model selection, we used the Akaike Information Criterion (AIC), an estimate of the relative quality of models given the data, to choose the most parsimonious model. We thus do not talk about significant predictors, but important ones, consistent with information-theoretic approaches.
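AIC is computed as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood; the model with the lower AIC is preferred. The Python sketch below compares two simple Poisson models on hypothetical counts. It relies on the fact that, for a Poisson model with only an intercept or a single factor, the fitted means are the overall mean or the per-level means, so no iterative fitting is needed. It is an illustration of the criterion, not the model selection code in this dataset, and all names and data are hypothetical.

```python
import math


def poisson_log_likelihood(counts, means):
    """Log-likelihood of counts under Poisson means (one mean per count)."""
    total = 0.0
    for y, mu in zip(counts, means):
        log_term = y * math.log(mu) if y > 0 else 0.0
        total += log_term - mu - math.lgamma(y + 1)
    return total


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_lik


def fit_factor_means(counts, levels):
    """Fitted means for a Poisson model with one factor: per-level means."""
    sums, sizes = {}, {}
    for y, lev in zip(counts, levels):
        sums[lev] = sums.get(lev, 0) + y
        sizes[lev] = sizes.get(lev, 0) + 1
    return [sums[lev] / sizes[lev] for lev in levels]


# Hypothetical daily sting counts under two wind-direction levels.
counts = [0, 0, 1, 0, 3, 4, 2, 5]
levels = ["SE"] * 4 + ["NW"] * 4

grand_mean = sum(counts) / len(counts)
aic_null = aic(poisson_log_likelihood(counts, [grand_mean] * len(counts)), 1)
aic_factor = aic(
    poisson_log_likelihood(counts, fit_factor_means(counts, levels)), 2
)
print(aic_factor < aic_null)
```

In this made-up example the Direction factor lowers the AIC despite the extra parameter, which is the sense in which a predictor would be called important rather than significant.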


Limitations:
It is important to note that while the presence of Irukandji is more likely on high-risk days, the forecasting system should not be interpreted as predicting the presence of Irukandji or that stings will occur.


Format:

The code is a text file with a .r extension, the default code format in R. It runs on the CSV data file “VJD_records_EXTRACT_20180802_QLD.csv”, which contains latitude, longitude, date, and time of day for each Irukandji sting on the GBR. A subset of these data has been made publicly available through eAtlas, but not all data could be released because of permission issues. For more information about data permissions, please contact Dr Lisa Gershwin (lisa.gershwin@stingeradvisor.com).


Data Location:

This dataset is filed in the eAtlas enduring data repository at: data\custodian\2016-18-NESP-TWQ-2\2.2.3_Jellyfish-early-warning\data\ and https://github.com/eatlas/NESP_2.2.3_Jellyfish-early-warning

Issued: 2019-01-24

Data time period: 1985 to 31-Dec-2016

This dataset is part of a larger collection

Subjects
biota

Identifiers