Data from: Incomplete sterol biosynthesis pathways and highly duplicated haem peroxidases revealed in Rhodophyte algae using multi-omics resource

Published: 2023

University of the Sunshine Coast

McKinnie, Lachlan ; Cummins, Scott ; Zhao, Min

DOI: 10.25907/00782
Subjects: Rhodophyta; Multi-omics; Metabolism; Bioinformatics and computational biology; Biological sciences

Access the data

Access data via landing page
http://doi.org/10.2590...

Licence & Rights:

CC0 1.0 Universal The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information below.

Access:

Open

Full description

This dataset contains the raw data associated with the study 'Incomplete sterol biosynthesis pathways and highly duplicated haem peroxidases revealed in Rhodophyte algae using multi-omics resource', submitted to Marine Drugs for review. In this study, Rhodophyte genome and transcriptome assemblies were functionally annotated and their metabolic pathways reconstructed, while phylogeny inferred with OrthoFinder was used to correlate the abundance of specific functional annotations with gene duplication analysis.

Description of the data and file structure

This dataset contains four main directories, labelled D1, D2, D3, and D4. Major subdirectories are labelled with letters (for example, D1a and D1b).

Assemblies referred to in this dataset can be divided into two broad categories: those with prior protein annotations available, and those without. Those which had prior protein annotations available for download are classed as 'pre-annotated', while those without will be referred to as 'unannotated'.

D1: BUSCO data

Directory D1 contains the BUSCO results for this study. There are two subdirectories: D1a and D1b. D1a contains the BUSCO data for all the assemblies used in the main part of this study, including all the red algae and a small outgroup of green algae and a Glaucophyte. D1b contains the BUSCO results for 41 pre-annotated green algal protein assemblies, which were used as a comparison for the BUSCO results and other general statistics and were not otherwise used in the study.
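BUSCO reports its scores as a one-line string of the form C:..%[S:..%,D:..%],F:..%,M:..%,n:.. in its short summary output. The following is a minimal sketch of how such a line could be turned into numbers for comparison across assemblies. The function name parse_busco_summary and the example values are illustrative assumptions and are not taken from this dataset; the exact file names inside D1a and D1b are not described in the record.

```python
import re

# Matches the one-line BUSCO score string, for example:
# C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023
BUSCO_LINE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<total>\d+)"
)


def parse_busco_summary(text):
    """Return the BUSCO percentages and marker count found in `text`.

    Hypothetical helper: `text` is the content of a BUSCO short summary
    file (or just the score line itself).
    """
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    scores = {
        key: float(value)
        for key, value in match.groupdict().items()
        if key != "total"
    }
    scores["total"] = int(match.group("total"))
    return scores


# Illustrative values only, not results from this study.
example_line = "\tC:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023\t"
print(parse_busco_summary(example_line))
```

Parsing the score line rather than the full table keeps the sketch independent of the lineage dataset used, since the line has the same shape for every BUSCO run.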

D2: Protein sequence data

D2 contains protein assemblies that were annotated as part of this dataset. Protein annotations were predicted using AUGUSTUS trained on BUSCO training data, which was generated by running BUSCO in genome mode with AUGUSTUS as the prediction algorithm. Sequences are in FASTA format, with an organism prefix before the gene number.
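Because each sequence identifier carries an organism prefix, sequences can be tallied per organism directly from the FASTA headers. The sketch below assumes the prefix is separated from the gene number by an underscore; the record does not state the actual delimiter, so it is exposed as a parameter. The function name count_by_prefix and the prefixes ATAX and CCRI are hypothetical.

```python
from collections import Counter
import io


def count_by_prefix(handle, delimiter="_"):
    """Count FASTA records per organism prefix.

    Assumption: the prefix is everything before the first `delimiter`
    in the sequence identifier. Adjust `delimiter` to the real files.
    """
    counts = Counter()
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # header with no identifier
        seq_id = fields[0]
        prefix = seq_id.split(delimiter, 1)[0]
        counts[prefix] += 1
    return counts


# Made-up records for illustration only.
example = io.StringIO(">ATAX_g1\nMKV\n>ATAX_g2\nMLL\n>CCRI_g1\nMAA\n")
print(count_by_prefix(example))
```

For real files, pass an open file handle in place of the in-memory example.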

D3: Repeat data

Repeat identification and masking results are included in directory D3. Only genomes had repeats identified. Both pre-annotated and unannotated genome assemblies were masked, but only the unannotated assemblies had proteins predicted using these masked assemblies; the pre-annotated assemblies had repeat identification done only as a comparison. The results include both summary tables (D3a) and full results tables (D3b). The summaries include short tables detailing the total percentage of the assembly corresponding to each type of repeat element. The result tables detail each individual repeat element for each assembly. Both sets of results are in .txt format.
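The percentages in the summary tables relate the masked length of each repeat type to the total assembly length. As a worked illustration of that arithmetic only, the sketch below computes the percentage per repeat class from base-pair counts. It does not parse the D3 files, whose exact layout is not described here; the function name repeat_percentages and the numbers used are assumptions.

```python
def repeat_percentages(assembly_length, masked_bp_by_class):
    """Return the percentage of the assembly covered by each repeat class.

    `assembly_length` is the total assembly size in base pairs and
    `masked_bp_by_class` maps a repeat class name to its masked length.
    """
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    return {
        repeat_class: 100.0 * masked_bp / assembly_length
        for repeat_class, masked_bp in masked_bp_by_class.items()
    }


# Made-up numbers for illustration only.
print(repeat_percentages(1000, {"LTR": 100, "LINE": 25}))
```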

D4: OrthoFinder data

Gene orthologue and phylogenetic data inferred by OrthoFinder are contained in directory D4. The OrthoFinder directory has two subdirectories: D4a and D4b. D4a contains the results from an OrthoFinder run on 64 functionally annotated Rhodophyte genome and transcriptome assemblies with an outgroup of Chlorophyte and Glaucophyte genomes. This run used default settings, with DendroBLAST for phylogenetic inference. D4b contains the results of a smaller OrthoFinder run on 32 genomes, using the multiple sequence alignment option with default parameters. Both subdirectories contain the results as output by default by OrthoFinder. The WorkingDirectory data was not included in this dataset.
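A default OrthoFinder run writes a tab-separated gene count table (Orthogroups.GeneCount.tsv) with one row per orthogroup, one column per species, and a Total column. Assuming that file is present in D4a or D4b, the sketch below shows one way to list orthogroups with many copies in a chosen species, which is a simple starting point for looking at duplicated gene families. The function name expanded_orthogroups, the species labels SpA and SpB, and the threshold are hypothetical.

```python
import csv
import io


def expanded_orthogroups(handle, species, min_copies=5):
    """List (orthogroup, copies) pairs where `species` has >= min_copies genes.

    `handle` is an open Orthogroups.GeneCount.tsv-style table and
    `species` is one of its column names.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        copies = int(row[species])
        if copies >= min_copies:
            hits.append((row["Orthogroup"], copies))
    # Largest families first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up table for illustration only.
example = (
    "Orthogroup\tSpA\tSpB\tTotal\n"
    "OG0000000\t12\t1\t13\n"
    "OG0000001\t2\t3\t5\n"
    "OG0000002\t7\t0\t7\n"
)
print(expanded_orthogroups(io.StringIO(example), "SpA"))
```

The copy-number threshold is arbitrary here; a raw count does not by itself distinguish true duplication from assembly or annotation artefacts.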

Sharing/Access information

Data was derived from the following sources:

NCBI Assembly Database

Zhao, M.; Campbell, A.; Patwary, Z.; Wang, T.; Lang, T.; Webb, J.; Zuccarello, G.; Wegner, A.; Heyne, D.; McKinnie, L.; et al. The red seaweed Asparagopsis taxiformis genome and integrative -omics analysis. Research Square 2022, 9 November 2022, doi:10.21203/rs.3.rs-2232367/v1.

Rossoni, A.W.; Price, D.C.; Seger, M.; Lyska, D.; Lammers, P.; Bhattacharya, D.; Weber, A.P.M. Data from: The genomes of polyextremophilic Cyanidiales contain 1% horizontally transferred genes with diverse adaptive functions. 2019, doi:10.5061/dryad.m06n200.

Leebens-Mack, J.H.; Barker, M.S.; Carpenter, E.J.; Deyholos, M.K.; Gitzendanner, M.A.; Graham, S.W.; Grosse, I.; Li, Z.; Melkonian, M.; Mirarab, S.; et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 2019, 574, 679-685, doi:10.1038/s41586-019-1693-2.

Van Vlierberghe, M.; Di Franco, A.; Philippe, H.; Baurain, D. Decontamination, pooling and dereplication of the 678 samples of the Marine Microbial Eukaryote Transcriptome Sequencing Project. BMC Res. Notes 2021, 14, 306, doi:10.1186/s13104-021-05717-2.

Code/Software

This dataset was created using the following software packages:

AUGUSTUS v3.4.0

BUSCO v5.2.2

CD-HIT v4.8.1

OmicsBox v2

OrthoFinder v2.5.4

RAxML-NG

RepeatMasker v4.1.2

RepeatModeler v2.0.2a

Issued: 2023

Created: 2021-05-13 to 2022-03-31

Identifiers

usc : 11267943120002621
