Data from: Dissecting molecular evolution in the highly diverse plant clade Caryophyllales using transcriptome sequencing

The University of Western Australia
Yang, Ya; Moore, Michael J.; Brockington, Samuel F.; Soltis, Douglas E.; Wong, Gane Ka-Shu; Carpenter, Eric J.; Zhang, Yong; Chen, Li; Yan, Zhixiang; Xie, Yinlong; Sage, Rowan F.; Covshoff, Sarah; Hibberd, Julian M.; Nelson, Matthew; Smith, Stephen A.
DOI: 10.5061/dryad.33m48
Publisher: DRYAD
Relation: http://research-repository.uwa.edu.au/en/publications/9fc722da-03a6-438c-8181-24ef4695dd71
Type: dataset
Language: English

Access:

Open

Full description

Many phylogenomic studies based on transcriptomes have been limited to “single-copy” genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our study demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups.

File: cary_dryad.tar

Notes

External Organisations
University of Michigan; University of Alberta; Beijing Genomics Institute-Wuhan; Oberlin College; University of Cambridge; University of Florida; BGI-Shenzhen; University of Toronto
Associated Persons
Ya Yang (Creator); Michael J. Moore (Creator); Samuel F. Brockington (Creator); Douglas E. Soltis (Creator); Gane Ka-Shu Wong (Creator); Eric J. Carpenter (Creator); Yong Zhang (Creator); Li Chen (Creator); Zhixiang Yan (Creator); Yinlong Xie (Creator); Rowan F. Sage (Creator); Sarah Covshoff (Creator); Julian M. Hibberd (Creator); Stephen A. Smith (Creator)

Created: 2017-11-06

Issued: 2016-04-01

This dataset is part of a larger collection

Identifiers