
Allele Frequencies between worldwide domestic sheep and Asiatic Mouflon

Commonwealth Scientific and Industrial Research Organisation
Naval Sanchez, Marina; Kijas, James
DOI: 10.4225/08/5a78ff454afe2
Published: 2018, edition v1
Subjects: Allele Frequency, sheep, mouflon; Genomics; Biological Sciences; Genetics

Licence & Rights:

Open Licence
CC-BY

Creative Commons Attribution
https://creativecommons.org/licenses/by/4.0/

All Rights (including copyright) CSIRO 2018.

Access:

Open

Data is accessible online and may be reused in accordance with licence conditions

Brief description

Supplementary Data 9: Allele frequencies for 14 million SNPs with MAF > 0.05
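A MAF threshold like the one in this description can be sketched in Python as below. This is an illustration only: it assumes frequencies are stored as alternate-allele frequencies and that the cut-off is applied to the folded minor-allele frequency. The record does not say whether MAF was computed across all animals or within each group, and the function names are hypothetical.

```python
def minor_allele_frequency(alt_af: float) -> float:
    """Fold an alternate-allele frequency into a minor-allele frequency (0 to 0.5)."""
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError(f"allele frequency out of range: {alt_af}")
    return min(alt_af, 1.0 - alt_af)


def passes_maf(alt_af: float, threshold: float = 0.05) -> bool:
    """Keep a SNP only if its MAF is strictly greater than the threshold."""
    return minor_allele_frequency(alt_af) > threshold


if __name__ == "__main__":
    # Hypothetical alternate-allele frequencies for four SNPs
    for af in (0.02, 0.05, 0.30, 0.97):
        print(af, round(minor_allele_frequency(af), 3), passes_maf(af))
```

Because the threshold is strict, a SNP sitting exactly at MAF = 0.05 is dropped, and a SNP where the alternate allele is nearly fixed (for example 0.97) is treated the same as a rare alternate allele.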

Lineage

Samples.
A total of 70 animals from 43 domestic breeds were sampled and subjected to whole-genome sequencing. These comprise 46 animals selected from an earlier SNP-array-based global survey of breed diversity [45] and another six animals used for SNP discovery, construction of the SNP50 BeadChip and CNV detection; the final group of 18 individuals had not been examined before. The 70 animals were drawn from Asia (12), Africa (6), the Middle East (13), the Americas (8), the United Kingdom (8) and continental Europe (23). Whole-genome sequence data for 19 Asian mouflon (Ovis orientalis) were collected and made available by the NEXTGEN project (http://nextgen.epfl.ch/). FASTQ files were downloaded from the ENA public repository (http://www.ebi.ac.uk:/ena/data/view/ERP001583) and processed as described below for the domestic sheep genomes.
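The sample counts can be cross-checked with a few lines of Python. The sketch only re-adds numbers quoted in this record; the dictionary labels are shorthand introduced for illustration. The regional counts sum to 70, the number of animals rather than breeds, and the totals reproduce the 89 genotyped samples and the 67 domestic plus 17 mouflon genomes retained after the low-coverage exclusions described in the next section.

```python
def tally_samples() -> dict:
    """Cross-check the sample counts quoted in the Lineage text."""
    # Domestic animals by provenance (labels are illustrative shorthand)
    domestic_sources = {"earlier array survey": 46, "SNP50 discovery/CNV": 6, "new": 18}
    # Domestic animals by region of origin
    domestic_regions = {"Asia": 12, "Africa": 6, "Middle East": 13,
                        "Americas": 8, "United Kingdom": 8, "continental Europe": 23}
    mouflon = 19
    excluded = {"domestic": 3, "mouflon": 2}  # low-coverage animals

    domestic = sum(domestic_sources.values())
    return {
        "domestic": domestic,
        "domestic_by_region": sum(domestic_regions.values()),
        "sequenced_total": domestic + mouflon,
        "domestic_retained": domestic - excluded["domestic"],
        "mouflon_retained": mouflon - excluded["mouflon"],
    }


if __name__ == "__main__":
    print(tally_samples())
```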

Genome sequencing, variant detection and annotation.
Paired-end short-insert libraries were constructed from 5 µg of genomic DNA and sequenced on the Illumina HiSeq 2000 platform. Reads were mapped against the sheep reference assembly OARv3.1 using the BWA aligner v0.7.12 (bwa aln + bwa sampe, default parameters). Animals were sequenced to an average median depth of 11.8x (range 8.4-17.2x) (Supplementary Table Data 1). Duplicate reads were removed using Picard tools (http://broadinstitute.github.io/picard/), and local realignment around indels was performed using GATK v3.2. Variant detection and SNP diversity analyses were performed using SAMtools 1.2.1 mpileup, and variants were annotated using VCFtools v0.1.14. After genotypes had been called for all 89 samples, the following filters were applied using a combination of VCFtools and in-house scripts:
i) SNPs were retained only at positions with read depth between 5x and twice the average depth per sample;
ii) a minimum mapping quality of 30 and a minimum base quality of 20 were applied;
iii) SNPs within 5 bp of an indel were removed;
iv) for SNP pairs separated by less than 4 bp, the lower-quality variant was excluded;
v) tri-allelic variants were removed;
vi) SNPs called in less than 90% of animals were excluded; and
vii) SNPs displaying an excess of heterozygosity were excluded (--hwe 0.001).

This defined a set of 28,100,631 SNPs across 67 domestic and 17 mouflon genomes; five low-coverage animals (3 domestic and 2 mouflon) were excluded, which accounts for the difference from the 89 samples genotyped. PLINK v1.9 (https://www.cog-genomics.org/plink2) was used for genetic diversity estimates and PCA. The Ensembl Variant Effect Predictor (release 78) was used to identify 24 separate SNP classes, including coding, missense and non-synonymous substitutions, intronic and intergenic, relative to the gene models annotated on reference assembly OARv3.1.
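The site-level filters listed above can be sketched in Python as follows. This is a simplified, hypothetical illustration, not the authors' in-house scripts: the `Site` record, the greedy handling of clustered SNPs in filter (iv), and the reading of filter (i) as masking individual genotypes (which then feeds the call-rate filter vi) are all assumptions. Filter (ii) acts on reads during pileup (in samtools mpileup the `-q` and `-Q` options set minimum mapping and base quality), and filter (vii) is a Hardy-Weinberg test (`--hwe`) left to VCFtools, so neither is reimplemented here.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Site:
    """Hypothetical minimal record for one candidate SNP."""
    chrom: str
    pos: int                        # 1-based position
    alts: List[str]                 # alternate alleles at this site
    qual: float                     # variant quality (QUAL)
    depths: List[int]               # per-sample read depth
    genotypes: List[Optional[str]]  # e.g. "0/1"; None = missing call


def mask_by_depth(site: Site, mean_depth: Sequence[float], min_dp: int = 5) -> None:
    """Filter (i), assumed per genotype: set a call to missing unless
    min_dp <= DP <= 2 * that sample's mean depth. Mutates the site."""
    for i, dp in enumerate(site.depths):
        if not (min_dp <= dp <= 2 * mean_depth[i]):
            site.genotypes[i] = None


def near_indel(site: Site, indel_pos: Dict[str, List[int]], window: int = 5) -> bool:
    """Filter (iii): True if any indel on the same chromosome lies within `window` bp."""
    positions = indel_pos.get(site.chrom, [])  # must be sorted ascending
    j = bisect_left(positions, site.pos - window)
    return j < len(positions) and positions[j] <= site.pos + window


def call_rate(site: Site) -> float:
    """Fraction of animals with a non-missing genotype."""
    return sum(g is not None for g in site.genotypes) / len(site.genotypes)


def drop_close_pairs(sites: List[Site], min_gap: int = 4) -> List[Site]:
    """Filter (iv): of two SNPs less than `min_gap` bp apart, keep the
    higher-quality one (a greedy left-to-right pass)."""
    kept: List[Site] = []
    for s in sorted(sites, key=lambda x: (x.chrom, x.pos)):
        prev = kept[-1] if kept else None
        if prev is not None and prev.chrom == s.chrom and s.pos - prev.pos < min_gap:
            if s.qual > prev.qual:
                kept[-1] = s
        else:
            kept.append(s)
    return kept


def filter_sites(sites: List[Site], mean_depth: Sequence[float],
                 indel_pos: Dict[str, List[int]],
                 min_call_rate: float = 0.9) -> List[Site]:
    passing = []
    for s in sites:
        if len(s.alts) != 1:              # filter (v): biallelic SNPs only
            continue
        if near_indel(s, indel_pos):      # filter (iii)
            continue
        mask_by_depth(s, mean_depth)      # filter (i)
        if call_rate(s) < min_call_rate:  # filter (vi)
            continue
        passing.append(s)
    return drop_close_pairs(passing)      # filter (iv)
```

Applying the depth mask before the call-rate check means a site with many out-of-range depths is discarded by filter (vi); this is one plausible ordering, since the original order of operations is not stated.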

Allele frequency (AF) was estimated for each SNP separately for domestic and wild sheep genomes using PLINK v1.9 (--freq --within).
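The per-group estimate can be illustrated with a short Python sketch of the idea behind PLINK's `--within` stratification: allele counts for each SNP are tallied separately for every cluster (here, domestic versus mouflon). The sketch counts alternate alleles from unphased VCF-style genotype strings and skips missing calls. PLINK itself reports the frequency of its A1 allele, so the choice of counted allele here is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Number of alternate alleles carried by each unphased diploid genotype
ALT_DOSAGE = {"0/0": 0, "0/1": 1, "1/0": 1, "1/1": 2}


def allele_freq_within(genotypes: List[Optional[str]], groups: List[str]) -> Dict[str, float]:
    """Alternate-allele frequency of one SNP, computed separately for each group.

    Missing calls (None) are skipped, so each group's denominator is
    2 x (number of called animals in that group). A group with no
    called animals is omitted from the result.
    """
    alt: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for gt, grp in zip(genotypes, groups):
        if gt is None:
            continue
        alt[grp] += ALT_DOSAGE[gt]
        total[grp] += 2
    return {grp: alt[grp] / total[grp] for grp in total}
```

For example, with genotypes `["0/1", "1/1", "0/0", None, "0/0", "0/1"]` and groups `["domestic"] * 3 + ["mouflon"] * 3`, the domestic frequency is 3/6 = 0.5 and the mouflon frequency is 1/4 = 0.25, because the missing mouflon call is excluded from the denominator.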

Data time period: 2012-01-01 to 2014-01-01

This dataset is part of a larger collection


Identifiers