Data from: Incomplete sterol biosynthesis pathways and highly duplicated haem peroxidases revealed in Rhodophyte algae using multi-omics resource

University of the Sunshine Coast
McKinnie, Lachlan ; Cummins, Scott ; Zhao, Min
DOI: 10.25907/00782

Licence & Rights:

CC0 1.0 Universal The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information below.

Access:

Open

Full description

This dataset contains the raw data associated with the study 'Incomplete sterol biosynthesis pathways and highly duplicated haem peroxidases revealed in Rhodophyte algae using multi-omics resource', submitted to Marine Drugs for review. In this study, Rhodophyte genome and transcriptome assemblies were functionally annotated and their metabolic pathways reconstructed, while phylogeny inferred with OrthoFinder was used to correlate the abundance of specific functional annotations with gene duplication analysis.

Description of the data and file structure

This dataset contains four main directories, labelled D1, D2, D3, and D4. Major subdirectories are labelled with letters (for example, D1a, D1b, etc.).
Assemblies referred to in this dataset fall into two broad categories: those with prior protein annotations available for download, referred to as 'pre-annotated', and those without, referred to as 'unannotated'.

D1: BUSCO data

Directory D1 contains the BUSCO results for this study. There are two subdirectories: D1a and D1b. D1a contains the BUSCO data for all the assemblies used in the main part of this study, including all the red algae and a small outgroup of green algae and a Glaucophyte. D1b contains the BUSCO results for 41 pre-annotated green algal protein assemblies, which were used only as a comparison for the BUSCO results and other general statistics and were not otherwise used in the study.
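As an illustration of how these results might be tabulated, the sketch below parses the one-line score string that BUSCO v5 writes in its short summary files. It assumes the files in D1 follow that standard layout, which this record does not state; the function name `parse_busco_summary` and the example values are hypothetical.

```python
import re

# One-line BUSCO v5 score string, e.g. "C:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:255"
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Return BUSCO percentages and the number of BUSCOs searched, or None."""
    match = BUSCO_LINE.search(text)
    if match is None:
        return None
    # C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing
    scores = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))
    return scores


# Made-up values for illustration only; not taken from this dataset.
example = "\tC:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:255\t"
scores = parse_busco_summary(example)
print(scores)
```

Applying the same function to every summary file in D1a and D1b would give one row per assembly for comparison.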

D2: Protein sequence data

D2 contains protein assemblies that were annotated as part of this dataset. Protein annotations were predicted using AUGUSTUS, trained on BUSCO training data generated by running BUSCO in genome mode with AUGUSTUS as the prediction algorithm. Sequences are in FASTA format, with an organism prefix before the gene number.
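The organism prefix makes it possible to count sequences per organism when several files are combined. The sketch below assumes the prefix is separated from the gene number by an underscore, which this record does not specify; the separator, the function name `count_by_prefix`, and the example identifiers are all hypothetical.

```python
from collections import Counter


def count_by_prefix(fasta_lines, sep="_"):
    """Count FASTA records per organism prefix.

    Assumes headers look like ">PREFIX<sep>GENE"; `sep` is a guess and
    should be adjusted to match the actual files.
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        if not tokens:
            continue  # skip empty headers
        seq_id = tokens[0]  # the ID is the first token after '>'
        prefix = seq_id.split(sep, 1)[0]
        counts[prefix] += 1
    return counts


# Made-up records for illustration only.
records = [">Atax_g1", "MKVLA", ">Atax_g2", "MSTP", ">Ccri_g1", "MAAL"]
counts = count_by_prefix(records)
print(counts)
```

For a real file, the same function can be called on an open file handle, for example `count_by_prefix(open(path))`.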

D3: Repeat data

Repeat identification and masking results are included in directory D3. Repeats were identified only in genome assemblies. Both pre-annotated and unannotated genome assemblies were masked, but proteins were predicted from the masked assemblies only for the unannotated genomes; for the pre-annotated assemblies, repeat identification was performed only for comparison. The results include both summary tables (D3a) and full results tables (D3b). The summary tables give the total percentage of each assembly corresponding to each type of repeat element. The full results tables list each individual repeat element for each assembly. Both sets of results are in .txt format.

D4: OrthoFinder data

Gene orthologue and phylogenetic data inferred by OrthoFinder are contained in directory D4. This directory has two subdirectories: D4a and D4b. D4a contains the results from an OrthoFinder run on 64 functionally annotated Rhodophyte genome and transcriptome assemblies with an outgroup of Chlorophyte and Glaucophyte genomes. This run used default settings, with DendroBLAST for phylogenetic inference. D4b contains the results of a smaller OrthoFinder run on 32 genomes, using the multiple sequence alignment option with default parameters. Both subdirectories contain the results that OrthoFinder outputs by default. The WorkingDirectory data are not included in this dataset.
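One of OrthoFinder's default outputs is a tab-separated gene count table (`Orthogroups/Orthogroups.GeneCount.tsv`) with one row per orthogroup, one column per species, and a final `Total` column. As a rough sketch of how such a table could be screened for orthogroups with many copies in one species, the code below flags any species column at or above a copy-number threshold. The threshold, the function name `expanded_orthogroups`, and the example rows are hypothetical, and this is not the duplication analysis used in the study.

```python
import csv
import io


def expanded_orthogroups(handle, min_copies=5):
    """Yield (orthogroup, species, count) for species with >= min_copies genes.

    `min_copies` is an arbitrary illustrative threshold.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for column, value in row.items():
            if column in ("Orthogroup", "Total"):
                continue  # not a per-species count column
            count = int(value)
            if count >= min_copies:
                yield row["Orthogroup"], column, count


# Made-up table for illustration only.
example = io.StringIO(
    "Orthogroup\tSpeciesA\tSpeciesB\tTotal\n"
    "OG0000000\t12\t1\t13\n"
    "OG0000001\t2\t3\t5\n"
)
hits = list(expanded_orthogroups(example))
print(hits)
```

With the real data, `handle` would be the opened gene count file from D4a or D4b.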

Sharing/Access information

Data was derived from the following sources:
  • NCBI Assembly Database
  • Zhao, M.; Campbell, A.; Patwary, Z.; Wang, T.; Lang, T.; Webb, J.; Zuccarello, G.; Wegner, A.; Heyne, D.; McKinnie, L.; et al. The red seaweed Asparagopsis taxiformis genome and integrative -omics analysis. Research Square 2022, 9 November 2022, doi:10.21203/rs.3.rs-2232367/v1.
  • Rossoni, A.W.; Price, D.C.; Seger, M.; Lyska, D.; Lammers, P.; Bhattacharya, D.; Weber, A.P.M. Data from: The genomes of polyextremophilic Cyanidiales contain 1% horizontally transferred genes with diverse adaptive functions. 2019, doi:10.5061/dryad.m06n200.
  • Leebens-Mack, J.H.; Barker, M.S.; Carpenter, E.J.; Deyholos, M.K.; Gitzendanner, M.A.; Graham, S.W.; Grosse, I.; Li, Z.; Melkonian, M.; Mirarab, S.; et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 2019, 574, 679-685, doi:10.1038/s41586-019-1693-2.
  • Van Vlierberghe, M.; Di Franco, A.; Philippe, H.; Baurain, D. Decontamination, pooling and dereplication of the 678 samples of the Marine Microbial Eukaryote Transcriptome Sequencing Project. BMC Res. Notes 2021, 14, 306, doi:10.1186/s13104-021-05717-2.

Code/Software

This dataset was created using the following software packages:
  • AUGUSTUS v3.4.0
  • BUSCO v5.2.2
  • CD-HIT v4.8.1
  • OmicsBox v2
  • OrthoFinder v2.5.4
  • RAxML-NG
  • RepeatMasker v4.1.2
  • RepeatModeler v2.0.2a

Issued: 2023

Created: 2021-05-13 to 2022-03-31

This dataset is part of a larger collection.

Subjects
  • Rhodophyta
  • Multi-omics
  • Metabolism
  • Bioinformatics and computational biology
  • Biological sciences

Identifiers
  • usc : 11267943120002621