Data
Title: Drosophila serrata genome scaffolding and annotation
DOI: 10.48610/ce96855
RDM ID: 02979de6-e671-4d6a-895c-cc9d01e73822
Publisher: The University of Queensland
Creators: Dr Scott Allen, Professor Steve Chenoweth
Date: 2024
Rights: http://guides.library.uq.edu.au/deposit_your_data/terms_and_conditions
Subjects: Genomics and transcriptomics; Bioinformatics and computational biology; BIOLOGICAL SCIENCES; Statistical and quantitative genetics; Biological adaptation; Evolutionary biology; Molecular evolution; Genetics; Deep learning; Machine learning; INFORMATION AND COMPUTING SCIENCES; Neural networks
Type: dataset
Language: English

Contact Information

[email protected]
School of the Environment

Full description

Supplementary files required for https://github.com/scottlallen/DserSweeps

The reference genome of D. serrata was created using long-read sequencing technology and has a length of 198 Mbp and a contig N50 of 0.94 Mbp (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002093755.2/). We subsequently used Dovetail HiRise and Hi-C methods to scaffold those contigs and achieved a scaffold N50 of 30.3 Mb. The six largest scaffolds span 80% of the genome and reach near chromosome-arm level length except for 2L, which is spanned by two large scaffolds of 21 Mb and 8.7 Mb. The genome was annotated by NCBI and lifted over to the Hi-C genome.

File Descriptions:

FASTA Files:
- drosophila_06Jul2018_A8VGg.fasta : Original Hi-C genome sequence.
- drosophila_06Jul2018_A8VGg_noSpecialChar.fasta : Hi-C genome sequence with special characters removed from scaffold names.
- drosophila_06Jul2018_A8VGg_noSpecialChar_MASKED.fasta : Masked version of the Hi-C genome sequence.
- drosophila_06Jul2018_A8VGg_noSpecialChar_MASKED_shortName.fasta : Masked version of the Hi-C genome sequence with short scaffold names.
- top6.anc.fa : Hi-C genome sequence of the 6 longest scaffolds specifying the ancestral sequence.

GFF Files:
- GCF_002093755.1_Dser1.0_genomic_OGcontigs_NOregion_HiC_liftOver_sorted.gff : NCBI annotation file converted to Hi-C scaffolds.
- GCF_002093755.1_Dser1.0_genomic_OGcontigs_NOregion_HiC_liftOver_sorted_noSpecialChar.gff : Annotation file converted to Hi-C scaffolds with special characters removed from scaffold names.

FAA File:
- GCF_002093755.1_Dser1.0_protein.faa : Protein sequences.

FNA File:
- GCF_002093755.1_Dser1.0_rna_from_genomic.fna : Coding sequences.
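
The assembly statistics quoted in the description (scaffold N50 and the share of the genome covered by the six largest scaffolds) can in principle be recomputed from any of the FASTA files. The following is a minimal sketch, not the authors' pipeline: the function names `read_fasta_lengths`, `n50` and `top_fraction` are hypothetical helpers introduced here for illustration, and the code assumes an uncompressed, plain-text FASTA file.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence name, length) for each record in FASTA-formatted lines."""
    name = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            fields = line[1:].split()
            # Keep only the first whitespace-delimited token of the header.
            name = fields[0] if fields else ""
            length = 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def n50(lengths: List[int]) -> int:
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def top_fraction(lengths: List[int], k: int) -> float:
    """Return the fraction of the total length held by the k longest sequences."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(sorted(lengths, reverse=True)[:k]) / total
```

To apply it, open one of the FASTA files listed above (for example `drosophila_06Jul2018_A8VGg.fasta`), pass the file handle to `read_fasta_lengths`, collect the lengths into a list, and call `n50(lengths)` and `top_fraction(lengths, 6)`. Streaming line by line keeps memory use low because only the per-scaffold lengths are retained, not the sequences themselves.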

Issued: 23 July 2024

This dataset is part of a larger collection

Other Information
The impacts of positive selection on genomic variation in Drosophila serrata: Insights from a deep learning approach

local : UQ:5306efe

Wang, Yiguan, Allen, Scott L., Reddiex, Adam J. and Chenoweth, Stephen F. (2024). The impacts of positive selection on genomic variation in Drosophila serrata: Insights from a deep learning approach. Molecular Ecology, 33 (18). doi: 10.1111/mec.17499

Research Data Collections

local : UQ:289097

Identifiers