
Genome Annotation of the Biting Midges Culicoides sonorensis and Culicoides stellifer Generated Using the EGAPx-alpha Pipeline

Commonwealth Scientific and Industrial Research Organisation
Ahmed, Asif; Klein, Melissa; Court, Leon; Rane, Rahul; Walsh, Tom; Paradkar, Prasad; Lynch, Stacey; Eagles, Debbie; Pandey, Gunjan
DOI: https://doi.org/10.25919/1tf9-bn03
Version: v1 (2025)
Keywords: Culicoides sonorensis; Culicoides stellifer; EGAPx-alpha; genome annotation; Ceratopogonidae; Diptera; vector genomics; Bluetongue virus; Orbivirus; vector competence; arthropod genomics
Fields of research: Veterinary sciences not elsewhere classified; Bioinformatics and computational biology not elsewhere classified; Genomics; Genetics

Licence & Rights:

Open Licence
CC-BY

Creative Commons Attribution 4.0 International Licence
https://creativecommons.org/licenses/by/4.0/

Data is accessible online and may be reused in accordance with licence conditions

All Rights (including copyright) CSIRO 2025.

Access:

Open

Accessible for free

Brief description

This collection contains gene annotation datasets for Culicoides sonorensis and Culicoides stellifer (Diptera: Ceratopogonidae), two biting midge species of veterinary significance. Culicoides sonorensis is a confirmed vector of Bluetongue and Epizootic Hemorrhagic Disease viruses in North America, while C. stellifer has been implicated in Orbivirus transmission in the southeastern United States. The reference genome assemblies for these species are publicly available at NCBI under accession numbers GCA_047716325.1 (C. sonorensis) and GCA_040583785.1 (C. stellifer).

Gene models for both assemblies were predicted with the EGAPx-alpha pipeline (egapx:0.3.2-alpha). Because no C. stellifer RNA-Seq data are currently available (last checked 17 November 2025), the same C. sonorensis RNA-Seq dataset (ERR2171964 – ERR2171978) was used as transcriptomic evidence for model training in both species. The dataset includes gene model coordinates (GFF), transcript nucleotide sequences (FNA), and predicted protein sequences (FAA).
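As a quick sanity check after download, the GFF and FASTA files can be summarised with a few lines of standard-library Python. The sketch below is illustrative only: it assumes the GFF file follows GFF3 conventions (nine tab-separated columns, `key=value` attributes separated by `;`) and that transcripts are typed `mRNA` and linked to their genes through the `Parent` attribute. The function names are hypothetical and are not part of the dataset or of EGAPx.

```python
"""Minimal sketch: summarise a GFF3 annotation and a FASTA file.

Assumes GFF3 conventions (9 tab-separated columns, 'key=value' attributes
separated by ';'). Function names are hypothetical, for illustration only.
"""
from collections import Counter


def _feature_rows(lines):
    """Yield the 9 columns of each feature row, skipping comments, blanks and malformed rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            yield fields


def count_gff_features(lines):
    """Count features by type (GFF column 3), e.g. gene, mRNA, exon, CDS."""
    return Counter(fields[2] for fields in _feature_rows(lines))


def transcripts_per_gene(lines, transcript_type="mRNA"):
    """Count child transcripts per gene ID using the Parent attribute."""
    per_gene = Counter()
    for fields in _feature_rows(lines):
        if fields[2] != transcript_type:
            continue
        # Parse 'key=value' pairs; empty items (e.g. a trailing ';') are ignored.
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        # Parent may list several comma-separated IDs.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                per_gene[parent] += 1
    return per_gene


def count_fasta_records(lines):
    """Count FASTA records, i.e. header lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))
```

Typical use is to pass the lines of an opened file, for example `with open(path) as fh: print(count_gff_features(fh))`, where `path` points to the downloaded GFF; `count_fasta_records` works the same way on the FNA or FAA file.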

These curated annotation datasets were generated to support the forthcoming manuscript "Chromosome-scale genome of Culicoides brevitarsis highlights genetic basis of vector competency". They provide consistent annotation resources for cross-species comparative analyses of Culicoides midges.

Lineage:

Culicoides sonorensis:
Genome assembly: GCA_047716325.1 (idCulSono.KS.ABADRU.1.0.female)
Isolate: Kansas colony
WGS project: JBLLJK01
Submitter: Ag100Pest Initiative (USDA-ARS ABADRU)
Release date: 12 Feb 2025
Annotation pipeline: EGAPx-alpha (egapx:0.3.2-alpha)
Transcript evidence: C. sonorensis RNA-Seq ERR2171964 – ERR2171978

Culicoides stellifer:
Genome assembly: GCA_040583785.1 (c_stellifer_primary030_purged)
Submitter: University of Guelph
Release date: 10 Jul 2024
Annotation pipeline: EGAPx-alpha (egapx:0.3.2-alpha)
Transcript evidence: Shared C. sonorensis RNA-Seq dataset (ERR2171964 – ERR2171978) used for training and gene structure validation
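The transcript evidence above is given as an accession range. A minimal sketch for expanding it into individual run accessions follows; it assumes the range is inclusive and that every consecutive accession between the two endpoints belongs to the C. sonorensis RNA-Seq dataset, which is worth confirming against the ENA/SRA records before use. `expand_run_range` is a hypothetical helper, not part of any pipeline.

```python
"""Minimal sketch: expand an inclusive run-accession range into a list.

Assumes "ERR2171964 – ERR2171978" denotes every consecutive run accession
between the two endpoints, inclusive (an assumption to verify in ENA/SRA).
"""
import re


def expand_run_range(first, last):
    """Return all accessions from first to last inclusive, keeping prefix and zero-padding."""
    m1 = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("endpoints must share the same alphabetic prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


runs = expand_run_range("ERR2171964", "ERR2171978")
print(len(runs), runs[0], runs[-1])  # 15 ERR2171964 ERR2171978
```

The resulting list could then be handed, one accession at a time, to a downloader such as the SRA Toolkit (`prefetch`, `fasterq-dump`).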

Available: 2025-11-17

Data time period: 2022-09-10 to 2025-10-10

This dataset is part of a larger collection
