Data

Data from: Phylogenomic resolution of the cetacean tree of life using target sequence capture

The University of Western Australia
McGowen, Michael ; Tsagkogeorga, Georgia ; Álvarez-Carretero, Sandra ; Dos Reis, Mario ; Struebig, Monika ; Deaville, Rob ; Jepson, Paul ; Jarman, Simon ; Polanowski, Andrea ; Morin, Phillip ; Rossiter, Stephen
DOI: 10.5061/dryad.jq40b0f
Publisher: DRYAD
Type: dataset
Language: English
Related: http://research-repository.uwa.edu.au/en/publications/bb697a8b-2b2e-465e-998a-18a83b08d1da
Subjects: Cetartiodactyla; Delphinidae; Hippopotamidae; Divergence Dating; Balaenopteridae; Platanistidae; Balaenidae; Balaenopteroidea; Eschrichtiidae; Phocoenidae; phylogenomic; Ziphiidae; Odontoceti; Physeteridae; Pontoporiidae; Iniidae; Cetacea; cetaceans; whales; Neobalaenidae; Kogiidae; Mysticeti; Whippomorpha; target sequence capture; Monodontidae; dolphins; Physeteroidea

Access the data

Access:

Open

Full description

The evolution of the cetaceans, from their early transition to an aquatic lifestyle to their subsequent diversification, has been the subject of numerous studies. However, while the higher-level relationships among cetacean families have been largely settled, several aspects of the systematics within these groups remain unresolved. Problematic clades include the oceanic dolphins (37 spp.), which have experienced a recent rapid radiation, and the beaked whales (22 spp.), which have not been investigated in detail using nuclear loci. The combined application of high-throughput sequencing with techniques that target specific genomic sequences provides a powerful means of rapidly generating large volumes of orthologous sequence data for use in phylogenomic studies. To elucidate the phylogenetic relationships within the Cetacea, we combined sequence capture with Illumina sequencing to generate data for ~3200 protein-coding genes for 68 cetacean species and their close relatives, including the pygmy hippopotamus. By combining data from >38,000 exons with existing sequences from 11 cetaceans and seven outgroup taxa, we produced the first comprehensive comparative genomic dataset for cetaceans, spanning 6,527,596 aligned base pairs and 89 taxa. Phylogenetic reconstruction with maximum likelihood and Bayesian inference of concatenated loci, as well as with coalescence analyses of individual gene trees, produced mostly concordant and well-supported trees. Our results completely resolve the relationships among beaked whales as well as the contentious relationships among oceanic dolphins, especially the problematic subfamily Delphininae, which includes the common and bottlenose dolphins. We performed Bayesian estimation of species divergence times using MCMCtree, integrating recently described fossils as calibration points (e.g., Mystacodon selenensis) that have not been used before. Integration of new fossil dates in the context of autocorrelated rates indicates that the diversification of Crown Cetacea began before the Late Eocene and the divergence of Crown Delphinidae as early as the Middle Miocene.

Files:

Figure_S1: Maximum likelihood phylogram of Dataset B with the maximum number of partitions. Bootstrap values are 100 for all 3 analyses except at 6 nodes labelled with a red circle; bootstrap values for these are shown in the upper left.
Figure_S2: ASTRAL species tree. All support values are 1.0 unless otherwise noted over the branch.
FigureS3: Time tree of Cetacea using the independent rates (IR) model. Numbers over each node correspond to raw values in Table 3.
Table_S1: Description of values for sequencing (i.e., number of reads), Trinity (i.e., number of contigs), and reciprocal BLAST searches for each sample for which we performed target sequence capture.
Table_S2: List of GenBank accession numbers for sequences included in our analysis for Platanista gangetica and Balaenoptera omurai.
DATASET_A.phylip: Dataset A, concatenated alignment.
DATASET_B: Dataset B (without Platanista gangetica and Balaenoptera omurai).
Cetacea_gene_partition: RAxML partitions for each gene (3,191).
PartitionFinder Partitions: RAxML partitions generated by PartitionFinder. File: partitionfindersets
DATASET_A_RAxML_unpartitioned_best_tree: Best tree for unpartitioned analysis of RAxML using DATASET A. File: RAxML_unpartitioned_best_tree.tree
DATASET_A_RAxML_unpartitioned_bootstrap. File: RAxML_unpartitioned_bootstrap.result
DATASET_A_RAxMLpartitionfinder_best_tree: Best tree of RAxML analysis of Dataset A using the partition scheme generated by PartitionFinder. File: RAxMLpartitionfinder_best_tree.tre
DATASET_A_RAxML_partitionfinder_bootstrap.result: DATASET_A_RAxML_partitionfinder_bootstrap trees. File: RAxML_partitionfinder_bootstrap.result.txt
DATASET_A_RAxML_partition_by_gene_best_tree: Best tree of RAxML analysis partitioned by gene and using DATASET A. File: RAxML_partition_by_gene_best_tree.tre
DATASET_A_RAxML_bootstrap_partition_by_gene. File: RAxML_bootstrap_partition_by_gene.result
DATASET_B_RAxML_unpartitioned_bestTree. File: RAxML_DATASET_B_unpartitioned_bestTree.result
DATASET_B_RAxML_bootstrap_unpartitioned.result: DATASET_B_RAxML_bootstrap_unpartitioned trees. File: RAxML_bootstrap_unpartitioned.result.txt
DATASET_B_RAxML_partitionfinder_bestTree. File: RAxML_DATASET_B_partitionfinder_bestTree.result
DATASET_B_RAxML_bootstrap_partitionfinder.result. File: RAxML_bootstrap_partitionfinder.result.txt
DATASET_B_RAxML_partition_by_gene_bestTree. File: RAxML_DATASET_B_all_genes_bestTree.result
DATASET_B_RAXML_boostrap_partition_by_gene_result. File: RAXML_boostrap_by_gene_result.txt
Exabayes_tree: Tree resulting from the ExaBayes analysis. File: Bayes_tree.nex
ASTRAL input of RAxML gene trees for each of the 3,191 genes. File: RAXML_gene_trees_ASTRAL_input
ASTRAL_species_tree_results: Results of the ASTRAL species tree analysis. File: ASTRAL_species_tree.txt
MCMCTree input: Dataset including the top 1/3 of genes in terms of divergence between odontocetes and mysticetes. This was the input for all MCMCTree analyses. File: GENE_LIST3.phylip
MCMCTREE.tre: Tree input with calibration points for all MCMCTree analyses.
Hessian matrix file for input in MCMCTree analyses. File: in.BV
Output for MCMCTree Strict clock analysis; Run 1. File: out_clock_1_1.txt
Output for MCMCTree Strict clock analysis; Run 2. File: out_clock_1_2.txt
Output for MCMCTree IR analysis; Run 1. File: out_clock2_1.txt
Output for MCMCTree IR analysis; Run 2. File: out_clock_2_2.txt
Output for MCMCTree AR analysis; Run 1. File: out_clock3_1.txt
Output for MCMCTree AR analysis; Run 2. File: out_clock3_2.txt
Figure_S2: Figure S2. Tracer file showing convergence of -lnL values for both runs of the Bayesian analysis using ExaBayes.
Figure_S3: Figure S3. Species tree of Dataset B generated by ASTRAL. All nodes have posterior probabilities of 1.0, except for those with values listed above the node.
Figure_S4_Tracer_3_AR: Figure S4. Tracer file showing convergence of -lnL values for both runs of the 3-partition analysis with autocorrelated rates using MCMCTree.
Figure_S5_Tracer_3_IR: Figure S5. Tracer file showing convergence of -lnL values for both runs of the 3-partition analysis with independent rates using MCMCTree.
Figure_S6_Tracer_6_AR: Figure S6. Tracer file showing convergence of -lnL values for both runs of the 6-partition analysis with autocorrelated rates using MCMCTree.
Figure_S7_Tracer_6_IR
Figure_S8: Figure S8. Cetartiodactyl tree with the topology from Figure 3 with nodes labelled corresponding to the list of mean ages and 95% confidence intervals (CIs) for both the AR and IR models of the 6-partition scheme in Table S3.
Figure_S9: Figure S9. Timetree of Cetacea analyzed in the MCMCTree package of PAML 4.9h using 3 partitions and approximate likelihood (Yang, 2007). A time scale in Ma (millions of years) is shown above the tree, with geologic periods labelled below the tree for reference (Q=Quaternary). Above each node the posterior distributions of the AR model (purple) and IR model (white) are shown. Red circles at each node represent calibration
Supplemental_Figure_Captions
Table_S3
Cetacea_ExaBayes Input File: Input file for ExaBayes analyses. File: Cetacea_ExaBayes.phy
Configuration file used in ExaBayes analyses. File: config.nex
Topologies for ExaBayes Run 1. File: ExaBayes_topologies.run-0.Cetacea_1
Parameters for ExaBayes Run 1. File: ExaBayes_parameters.run-0.Cetacea_1
Topologies for ExaBayes Run 2. File: ExaBayes_topologies.run-0.Cetacea_2
Parameters for ExaBayes Run 2. File: ExaBayes_parameters.run-0.Cetacea_2
Cetacea_partition_mcmctree_3: Alignment file for the 3-partition analyses in MCMCTree.
Cetacea_partition_mcmctree_6: Alignment file for the 6-partition analyses in MCMCTree.
Hessian matrix file for input in 3-partition MCMCTree analyses. File: in.BV1-3
Hessian matrix file for input in 6-partition MCMCTree analyses. File: in.BV1-6
Tree file for MCMCTree analyses. File: MCMCTREE.tre
Result file for 3-partition mcmctree AR Run 1. File: parts_3_mcmctree_AR_mcmc.txt
FigTree result for 3-partition mcmctree AR Run 1. File: FigTree_parts_3_mcmctree_AR_1.tre
Control file for 3-partition AR analyses MCMCTree. File: mcmctree_3p_AR.ctl
Result file for 3-partition mcmctree AR Run 2. File: parts_3_mcmctree_AR_2_mcmc.txt
FigTree result for 3-partition mcmctree AR Run 2. File: FigTreeparts_3_mcmctree_AR_2.tre
Result file for 3-partition mcmctree IR Run 1. File: parts_3_mcmctree_IR_mcmc.txt
FigTree result for 3-partition mcmctree IR Run 1. File: FigTree_parts_3_mcmctree_IR.tre
Control file for 3-partition IR analyses MCMCTree. File: mcmctree_3p_IR.ctl
Result file for 3-partition mcmctree IR Run 2. File: parts_3_mcmctree_IR_2_mcmc.txt
FigTree result for 3-partition mcmctree IR Run 2. File: FigTree_parts_3_mcmctree_IR_2.tre
Result file for 6-partition mcmctree AR Run 1. File: parts_6_mcmctree_AR_mcmc.txt
FigTree result for 6-partition mcmctree AR Run 1. File: FigTree_parts_6_mcmctree_AR.tre
Control file for 6-partition AR analyses MCMCTree. File: mcmctree_6p_AR.ctl
Result file for 6-partition mcmctree AR Run 2. File: parts_6_mcmctree_AR_mcmc.txt
FigTree result for 6-partition mcmctree AR Run 2. File: FigTree_parts_6_mcmctree_AR.tre
Result file for 6-partition mcmctree IR Run 1. File: parts_6_mcmctree_IR_mcmc.txt
FigTree result for 6-partition mcmctree IR Run 1. File: FigTree_parts_6_mcmctree_IR.tre
Control file for 6-partition IR analyses MCMCTree. File: mcmctree_6p_IR.ctl
Result file for 6-partition mcmctree IR Run 2. File: parts_6_mcmctree_IR_2_mcmc.txt
FigTree result for 6-partition mcmctree IR Run 2. File: FigTree_parts_6_mcmctree_IR_2.tre
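
The description quotes an alignment of 6,527,596 aligned base pairs and 89 taxa. As a minimal sketch of how a downloaded alignment could be checked against those figures, the Python below reads the taxon and site counts from the first non-blank line of a PHYLIP file. It assumes the deposited alignments use the standard PHYLIP header ("number of taxa, number of sites"), which this record does not confirm; the function name `read_phylip_dimensions` and the three-taxon example alignment are hypothetical and not part of the dataset.

```python
import io


def read_phylip_dimensions(handle):
    """Return (n_taxa, n_sites) from the first non-blank line of a PHYLIP file.

    Assumes the standard PHYLIP header: two whitespace-separated integers.
    """
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip leading blank lines
        if len(fields) < 2:
            raise ValueError("PHYLIP header needs two integers: taxa and sites")
        return int(fields[0]), int(fields[1])
    raise ValueError("no PHYLIP header found (empty input)")


# Made-up three-taxon alignment standing in for a real file (illustration only).
example = io.StringIO(
    " 3 8\n"
    "taxon_a   ACGTACGT\n"
    "taxon_b   ACGTACGA\n"
    "taxon_c   ACGTACGG\n"
)
dims = read_phylip_dimensions(example)
print(dims)
```

For a real file, the same function can be called on an open text handle, for example `with open(path) as fh: read_phylip_dimensions(fh)`, where `path` points to a local copy of the alignment. The record does not say which of Dataset A or Dataset B corresponds to the 89-taxon figure, so the header of each would need to be checked separately.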

Notes

External Organisations
Smithsonian Institution; Queen Mary University of London; Zoological Society of London Institute of Zoology; Australian Antarctic Division
Associated Persons
Michael McGowen (Creator); Georgia Tsagkogeorga (Creator); Sandra Álvarez-Carretero (Creator); Mario Dos Reis (Creator); Monika Struebig (Creator); Rob Deaville (Creator); Paul Jepson (Creator); Andrea Polanowski (Creator); Phillip Morin (Creator); Stephen Rossiter (Creator)

Issued: 2019-10-16

This dataset is part of a larger collection

Identifiers