Brief description
This data package contains data underlying the manuscript McDonald-Spicer et al., "Big data for a large clade: bioregionalisation and ancestral range estimation in the daisy family (Asteraceae)" presenting the results of research on the global biogeography and biogeographic history of the Asteraceae. It includes (1) a supermatrix of DNA sequence data obtained from GenBank and BOLD with partitioning information, (2) distribution data at the TDWG level 3, originally based on a database extract of the Global Compositae Checklist but subsequently cleaned and supplemented with additional data for some geographic areas, and (3) input files for ancestral area estimation using the R package BioGeoBEARS.
Lineage
Spatial data
The spatial data set was based on data extracted from the GCC (compositae.landcareresearch.co.nz, accessed 15 Aug 2014), a database of distribution information for the Asteraceae family. We used OpenRefine (Huynh & Mazzocchi, 2014) to clean the dataset, correcting spelling of taxon names, collapsing varieties and subspecies to species level, and removing hybrids and taxa with distribution listed as ‘null’. Additional distribution information was added for New Zealand, the Cordoba Province in Argentina, Mongolia, South Africa, and Mexico. We removed species from regions where they are non-native. The final spatial dataset used in this study included 27,019 species representing 1,636 genera. All analyses were conducted using the TDWG level 3 of spatial resolution.
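The cleaning steps described above can be illustrated with a minimal Python sketch. It is not the OpenRefine workflow used for the dataset. The record layout (taxon name, TDWG level 3 area code, native flag), the function names, and the example rows are all assumptions made for illustration.

```python
def to_species(name):
    """Collapse a taxon name to 'Genus epithet'; return None for hybrids.

    Assumption: hybrids are marked with a multiplication sign or a lone 'x'.
    """
    tokens = name.strip().split()
    if any(t in {"×", "x"} or t.startswith("×") for t in tokens):
        return None
    if len(tokens) < 2:
        return None
    # Everything after the species epithet (var., subsp., authors) is dropped.
    return f"{tokens[0]} {tokens[1]}"


def clean_records(records):
    """records: iterable of (taxon_name, area_code, is_native) tuples.

    Returns a sorted list of unique (species, area_code) pairs.
    """
    cleaned = set()
    for name, area, is_native in records:
        # Drop taxa whose distribution is missing or listed as 'null'.
        if area is None or str(area).strip().lower() in {"", "null"}:
            continue
        # Drop occurrences in regions where the species is non-native.
        if not is_native:
            continue
        species = to_species(name)
        if species is None:
            continue
        cleaned.add((species, area))
    return sorted(cleaned)


# Invented example rows, not taken from the dataset.
rows = [
    ("Olearia arborescens var. angustifolia", "NZN", True),
    ("Olearia arborescens", "NZN", True),
    ("Senecio × albescens", "GRB", True),
    ("Bellis perennis", "null", True),
    ("Bellis perennis", "NZS", False),
    ("Bellis perennis", "GRB", True),
]
print(clean_records(rows))
```

In this sketch the variety and the species collapse to a single record, and the hybrid, the 'null' distribution, and the non-native occurrence are dropped.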
Phylogeny
Sequences from the nuclear ribosomal Internal Transcribed Spacer region (ITS) and three chloroplast regions (matK, rbcL, and trnL-trnF) were obtained from GenBank and BOLD (Ratnasingham & Hebert, 2007). We used genera as Operational Taxonomic Units and selected a representative sequence for each genus and locus.
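The description does not state how the representative sequence was chosen. As a rough sketch of one plausible rule, the Python snippet below keeps, for each genus and locus, the sequence with the most unambiguous bases. The selection criterion, the function name, and the accession labels are assumptions for illustration only.

```python
def pick_representatives(sequences):
    """sequences: iterable of (genus, locus, accession, sequence) tuples.

    Returns {(genus, locus): (accession, sequence)}.
    Assumed rule: keep the sequence with the most A/C/G/T bases;
    the first one seen wins a tie.
    """
    best = {}
    for genus, locus, accession, seq in sequences:
        score = sum(base in "ACGT" for base in seq.upper())
        key = (genus, locus)
        if key not in best or score > best[key][0]:
            best[key] = (score, accession, seq)
    return {key: (acc, seq) for key, (_, acc, seq) in best.items()}


# Hypothetical accessions and toy sequences.
candidates = [
    ("Bellis", "ITS", "ACC001", "ACGTNNNN"),
    ("Bellis", "ITS", "ACC002", "ACGTAC"),
    ("Bellis", "matK", "ACC003", "GGCA"),
]
representatives = pick_representatives(candidates)
print(representatives)
```

Here ACC002 is kept for ITS because it has six unambiguous bases against four in ACC001, even though ACC001 is longer.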
Gene regions were individually aligned using MAFFT 7 (Katoh & Standley, 2013) and manually edited in BioEdit 7.0.5 (Hall, 1999). The four sequence regions were combined into a supermatrix of 1,273 genera and 9,030 characters. The phylogeny was inferred using RAxML (Stamatakis, 2014) under the GTRCAT model and partitioning by sequence region. The tree was rooted on the Barnadesieae, which are sister to the rest of the family (Funk et al., 2005).
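To show how per-region alignments can be combined into a supermatrix with partitioning information, here is a minimal Python sketch. It is not the script used to build the supermatrix in this package. It assumes each alignment is held as a dictionary of genus to aligned sequence, pads genera missing from a region with gaps, and writes partition lines in the style RAxML reads from a partition file. The function name and toy data are hypothetical.

```python
def build_supermatrix(alignments):
    """alignments: dict of locus -> {genus: aligned sequence}.

    Returns (matrix, partitions): matrix maps each genus to its
    concatenated sequence, and partitions lists one line per locus
    in the form 'DNA, locus = start-end'.
    """
    taxa = sorted({genus for aln in alignments.values() for genus in aln})
    pieces = {genus: [] for genus in taxa}
    partitions = []
    start = 1
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to equal length")
        width = lengths.pop()
        for genus in taxa:
            # Genera without a sequence for this locus are filled with gaps.
            pieces[genus].append(aln.get(genus, "-" * width))
        partitions.append(f"DNA, {locus} = {start}-{start + width - 1}")
        start += width
    matrix = {genus: "".join(parts) for genus, parts in pieces.items()}
    return matrix, partitions


# Toy alignments, not taken from the dataset.
toy = {
    "ITS": {"Bellis": "ACGT-A", "Senecio": "ACGTTA"},
    "matK": {"Bellis": "GGC", "Olearia": "GGA"},
}
matrix, partitions = build_supermatrix(toy)
for genus, seq in matrix.items():
    print(genus, seq)
print("\n".join(partitions))
```

Padding with gaps keeps every row the same length, so each partition range applies to all genera regardless of which regions were sampled for them.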
Time calibration
We time-calibrated our phylogeny using Penalized Likelihood as implemented in the chronos function of the R package APE (Sanderson, 2002; Paradis et al., 2004; R Core Team, 2016). We set nine calibration points (Appendix S3 of the manuscript). We tested all three implemented clock models (relaxed, correlated, and discrete) and lambda values of 1 and 10. The favoured clock model was discrete with lambda = 10.
Available: 2019-01-25
Data time period: 2014-08-15 to 2018-10-04
Subjects
Asteraceae
Biological Sciences
Biogeography and Phylogeography
Compositae
Evolutionary Biology
Plant and Fungus Systematics and Taxonomy
biogeography
phylogeny
spatial data
Identifiers
- DOI : 10.25919/5BF781632E8AB
- Handle : 102.100.100/75623
- URL : data.csiro.au/collection/csiro:35797