Name: Data for bioregionalisation and ancestral range estimation in the daisy family
Published: 2019-01-25

Brief description

This data package contains the data underlying the manuscript by McDonald-Spicer et al., "Big data for a large clade: bioregionalisation and ancestral range estimation in the daisy family (Asteraceae)", which presents research on the global biogeography and biogeographic history of the Asteraceae. It includes (1) a supermatrix of DNA sequence data obtained from GenBank and BOLD, with partitioning information; (2) distribution data at TDWG level 3, originally based on a database extract of the Global Compositae Checklist but subsequently cleaned and supplemented with additional data for some geographic areas; and (3) input files for ancestral area estimation using the R package BioGeoBEARS.
Lineage

Spatial data

The spatial dataset was based on data extracted from the Global Compositae Checklist (GCC; compositae.landcareresearch.co.nz, accessed 15 Aug 2014), a database of distribution information for the family Asteraceae. We used OpenRefine (Huynh & Mazzocchi, 2014) to clean the dataset by correcting the spelling of taxon names, collapsing varieties and subspecies to species level, and removing hybrids and taxa with distribution listed as ‘null’. Additional distribution information was added for New Zealand, the Cordoba Province in Argentina, Mongolia, South Africa, and Mexico. We removed species from regions where they are non-native. The final spatial dataset used in this study included 27,019 species representing 1,636 genera. All analyses were conducted at TDWG level 3 spatial resolution.
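
The cleaning steps described above were carried out in OpenRefine; the Python sketch below only illustrates the same logic on a toy table and is not the authors' workflow. The record layout (name, area code, native flag), the placeholder taxon names, the placeholder area codes, and the rank markers recognised by the regular expression are all assumptions made for illustration.

```python
import re

# Assumed rank markers for infraspecific names (illustrative, not exhaustive).
INFRASPECIFIC = re.compile(r"\s+(subsp\.|ssp\.|var\.|f\.)\s+.*$")


def to_species(name):
    """Collapse a variety or subspecies name to its species binomial."""
    return INFRASPECIFIC.sub("", name.strip())


def is_hybrid(name):
    """Treat names containing a multiplication sign or a standalone 'x' as hybrids."""
    return "×" in name or re.search(r"\sx\s", name) is not None


def clean(records):
    """Return sorted, unique (species, area) pairs for native, non-hybrid records."""
    kept = set()
    for name, area, native in records:
        if area is None or area == "null":
            continue  # distribution listed as 'null'
        if is_hybrid(name) or not native:
            continue  # hybrids and non-native occurrences are dropped
        kept.add((to_species(name), area))
    return sorted(kept)


# Hypothetical records: placeholder names and placeholder area codes.
records = [
    ("Exemplum album", "AAA", True),
    ("Exemplum album var. minus", "AAA", True),
    ("Exemplum rubrum subsp. majus", "BBB", True),
    ("Exemplum × hybridum", "AAA", True),
    ("Exemplum album", "CCC", False),
    ("Exemplum nigrum", "null", True),
]

cleaned = clean(records)
for species, area in cleaned:
    print(species, area)
```

Collapsing names before de-duplicating means a species and its varieties in the same area count as a single occurrence, which matches the species-level resolution described above.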

Phylogeny

Sequences from the nuclear ribosomal Internal Transcribed Spacer region (ITS) and three chloroplast regions (matK, rbcL, and trnL-trnF) were obtained from GenBank and BOLD (Ratnasingham & Hebert, 2007). We used genera as Operational Taxonomic Units and selected a representative sequence for each genus and locus.

Gene regions were individually aligned using MAFFT 7 (Katoh & Standley, 2013) and manually edited in BioEdit 7.0.5 (Hall, 1999). The four sequence regions were combined into a supermatrix of 1,273 genera and 9,030 characters. The phylogeny was inferred using RAxML (Stamatakis, 2014) under the GTRCAT model, partitioned by sequence region. The tree was rooted on the Barnadesieae, which are sister to the rest of the family (Funk et al., 2005).
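
As a rough illustration of how per-region alignments can be combined into a supermatrix with one partition per region, the sketch below concatenates toy alignments and writes partition lines in the `DNA, name = start-end` style used by RAxML partition files. The function name `build_supermatrix`, the toy sequences, the genus labels, and the choice of `?` to pad genera that lack a region are assumptions for illustration, not details taken from the data package.

```python
def build_supermatrix(alignments, order):
    """Concatenate aligned regions into one matrix.

    alignments: {region: {genus: aligned sequence}}
    order: list of region names giving the concatenation order
    Returns (matrix, partitions), where matrix maps genus -> sequence and
    partitions is a list of 'DNA, region = start-end' lines (1-based, inclusive).
    """
    genera = sorted({g for aln in alignments.values() for g in aln})
    matrix = {g: "" for g in genera}
    partitions = []
    start = 1
    for region in order:
        aln = alignments[region]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {region} differ in length")
        length = lengths.pop()
        for g in genera:
            # Genera without this region are padded with missing-data symbols.
            matrix[g] += aln.get(g, "?" * length)
        partitions.append(f"DNA, {region} = {start}-{start + length - 1}")
        start += length
    return matrix, partitions


# Toy alignments with hypothetical genus labels.
alignments = {
    "ITS": {"GenusA": "ACGT", "GenusB": "AC-T"},
    "matK": {"GenusA": "GGA", "GenusC": "GTA"},
}

matrix, partitions = build_supermatrix(alignments, ["ITS", "matK"])
for line in partitions:
    print(line)
for genus, seq in matrix.items():
    print(genus, seq)
```

Padding rather than dropping genera with missing regions keeps every genus in the matrix, at the cost of a sparser alignment.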

Time calibration

We time-calibrated our phylogeny using Penalized Likelihood as implemented in the chronos function of the R package APE (Sanderson, 2002; Paradis et al., 2004; R Core Team, 2016). We set nine calibration points (Appendix S3 of the manuscript). We tested all three implemented clock models (relaxed, correlated, and discrete) with lambda values of 1 and 10. The favoured clock model was discrete with lambda = 10.

Available: 2019-01-25

Data time period: 2014-08-15 to 2018-10-04

Identifiers

DOI : 10.25919/5BF781632E8AB
Handle : 102.100.100/75623
URL : data.csiro.au/collection/csiro:35797
