Data for bioregionalisation and ancestral range estimation in the daisy family

Commonwealth Scientific and Industrial Research Organisation
Schmidt-Lebuhn, Alexander ; Knerr, Nunzio ; Encinas-Viso, Francisco ; McDonald-Spicer, Christiana
DOI: https://doi.org/10.25919/5bf781632e8ab
Version: v2
Date: 2019
Type: dataset
Language: English
Subjects: Asteraceae; Compositae; biogeography; spatial data; phylogeny; Biogeography and phylogeography; Evolutionary biology; Biological sciences; Plant and fungus systematics and taxonomy

Licence & Rights:

Other

CSIRO Data Licence
https://research.csiro.au/dap/licences/csiro-data-licence/

Data is accessible online and may be reused in accordance with licence conditions

All Rights (including copyright) CSIRO 2018.

Access:

Open

Accessible for free

Brief description

This data package contains data underlying the manuscript McDonald-Spicer et al., "Big data for a large clade: bioregionalisation and ancestral range estimation in the daisy family (Asteraceae)" presenting the results of research on the global biogeography and biogeographic history of the Asteraceae. It includes (1) a supermatrix of DNA sequence data obtained from GenBank and BOLD with partitioning information, (2) distribution data at the TDWG level 3, originally based on a database extract of the Global Compositae Checklist but subsequently cleaned and supplemented with additional data for some geographic areas, and (3) input files for ancestral area estimation using the R package BioGeoBEARS.
Lineage

Spatial data

The spatial data set was based on data extracted from the GCC (compositae.landcareresearch.co.nz, accessed 15 Aug 2014), a database of distribution information for the Asteraceae family. We used OpenRefine (Huynh & Mazzocchi, 2014) to clean the dataset, correcting spelling of taxon names, collapsing varieties and subspecies to species level, and removing hybrids and taxa with distribution listed as ‘null’. Additional distribution information was added for New Zealand, the Cordoba Province in Argentina, Mongolia, South Africa, and Mexico. We removed species from regions where they are non-native. The final spatial dataset used in this study included 27,019 species representing 1,636 genera. All analyses were conducted using the TDWG level 3 of spatial resolution.
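The cleaning steps described above were carried out in OpenRefine; the following Python sketch only illustrates the same logic on toy records. The record layout, the rank markers handled, the hybrid test, and the example rows are all assumptions made for illustration and are not taken from the data package.

```python
import re

# Assumed rank markers for infraspecific names (illustrative, not exhaustive).
RANK_MARKERS = re.compile(r"\s+(subsp\.|ssp\.|var\.|f\.)\s+.*$")


def to_species(name):
    """Collapse a variety or subspecies name to its species binomial."""
    name = RANK_MARKERS.sub("", name.strip())
    return " ".join(name.split()[:2])


def is_hybrid(name):
    """Assume hybrids are marked with a multiplication sign or a standalone 'x'."""
    return "×" in name or re.search(r"(^|\s)x(\s|$)", name) is not None


def clean(records):
    """Return sorted, de-duplicated (species, area) pairs.

    records: iterable of (taxon name, TDWG level 3 code or None/'null').
    """
    cleaned = set()
    for name, area in records:
        # Drop taxa whose distribution is missing or listed as 'null'.
        if area is None or area.strip().lower() in ("", "null"):
            continue
        # Drop hybrids.
        if is_hybrid(name):
            continue
        cleaned.add((to_species(name), area.strip()))
    return sorted(cleaned)


# Toy input, for illustration only.
example = [
    ("Senecio vulgaris subsp. denticulatus", "GRB"),
    ("Senecio vulgaris", "GRB"),
    ("Senecio × albescens", "IRE"),
    ("Olearia argophylla", "null"),
    ("Olearia argophylla", "TAS"),
]
print(clean(example))
```

Spelling correction and the removal of non-native occurrences are not covered here, because both depend on reference lists that this sketch does not assume.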

Phylogeny

Sequences from the nuclear ribosomal Internal Transcribed Spacer region (ITS) and three chloroplast regions (matK, rbcL, and trnL-trnF) were obtained from GenBank and BOLD (Ratnasingham & Hebert, 2007). We used genera as Operational Taxonomic Units and selected a representative sequence for each genus and locus.
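Choosing one sequence per genus and locus can be sketched as below. The record does not state the criterion used to pick the representative, so the rule here (keep the longest sequence after removing gap characters) is purely an assumption, as are the field names and the placeholder accession labels.

```python
def ungapped_length(seq):
    """Length of a sequence ignoring gap and missing-data characters."""
    return sum(1 for base in seq if base not in "-?")


def pick_representatives(records):
    """Keep one record per (genus, locus) pair.

    records: iterable of dicts with the hypothetical keys
    'genus', 'locus', 'accession' and 'seq'.
    Assumed rule: the longest ungapped sequence wins; ties keep the first seen.
    """
    best = {}
    for rec in records:
        key = (rec["genus"], rec["locus"])
        current = best.get(key)
        if current is None or ungapped_length(rec["seq"]) > ungapped_length(current["seq"]):
            best[key] = rec
    return best


# Toy input with placeholder accession labels, for illustration only.
example = [
    {"genus": "Senecio", "locus": "ITS", "accession": "ACC1", "seq": "ACGTAC"},
    {"genus": "Senecio", "locus": "ITS", "accession": "ACC2", "seq": "ACGTACGT"},
    {"genus": "Senecio", "locus": "matK", "accession": "ACC3", "seq": "GGA"},
    {"genus": "Olearia", "locus": "ITS", "accession": "ACC4", "seq": "AC--GT"},
]
chosen = pick_representatives(example)
for (genus, locus), rec in sorted(chosen.items()):
    print(genus, locus, rec["accession"])
```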

Gene regions were individually aligned using MAFFT 7 (Katoh & Standley, 2013) and manually edited in BioEdit 7.0.5 (Hall, 1999). The four sequence regions were combined into a supermatrix of 1,273 genera and 9,030 characters. The phylogeny was inferred using RAxML (Stamatakis, 2014) under the GTRCAT model, with the data partitioned by sequence region. The tree was rooted on the Barnadesieae, which are sister to the rest of the family (Funk et al., 2005).
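As a rough illustration of how per-region alignments combine into a supermatrix with partitioning information, the sketch below concatenates aligned loci and writes partition ranges in the `DNA, name = start-end` style used by RAxML partition files. The function name, the input layout, and the choice to fill missing loci with gap characters are assumptions; the files in the data package may have been assembled differently.

```python
def build_supermatrix(alignments):
    """Concatenate per-locus alignments into one matrix.

    alignments: dict mapping locus name -> dict of taxon -> aligned sequence.
    Sequences within a locus must share one length. Taxa lacking a locus
    are padded with '-' (an assumption; '?' is another common choice).
    Returns (matrix, partitions), where partitions are RAxML-style lines.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to one length")
        width = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * width))
        partitions.append(f"DNA, {locus} = {start}-{start + width - 1}")
        start += width
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


# Toy alignments, for illustration only.
example = {
    "ITS": {"GenusA": "ACGT", "GenusB": "AC-T"},
    "matK": {"GenusA": "GG"},
}
matrix, partitions = build_supermatrix(example)
for line in partitions:
    print(line)
```

Keeping the partition ranges alongside the concatenated matrix is what lets the tree inference treat each sequence region as its own partition.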

Time calibration

We time-calibrated our phylogeny using Penalized Likelihood as implemented in the chronos function of the R package APE (Sanderson, 2002; Paradis et al., 2004; R Core Team, 2016). We set nine calibration points (Appendix S3 of the manuscript). We tested all three implemented clock models (relaxed, correlated, and discrete) and lambda values of 1 and 10. The favoured clock model was discrete with lambda = 10.

Available: 2019-01-25

Data time period: 2014-08-15 to 2018-10-04

This dataset is part of a larger collection
