
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes: supplemental material

The University of Queensland
Dr Donovan Parks (Aggregated by), Dr Michael Imelfort (Aggregated by), Honorary Professor Gene Tyson (Aggregated by), Mr Connor Skennerton (Aggregated by)
DOI: 10.14264/uql.2016.841
Creators: Dr Donovan Parks, Dr Michael Imelfort, Honorary Professor Gene Tyson, Mr Connor Skennerton, Professor Phil Hugenholtz
Rights: 2015, The University of Queensland. https://guides.library.uq.edu.au/deposit-your-data/license-reuse-data-agreement
Subjects: Isolates; Single cells; Metagenomic data; Genome quality; CheckM; Phylogeny and Comparative Analysis; BIOLOGICAL SCIENCES; EVOLUTIONARY BIOLOGY; Microbial Ecology; MICROBIOLOGY
Type: dataset
Language: English

Licence & Rights:

Other

Access:

Open

Contact Information

g.tyson@uq.edu.au

Full description

Supplementary Results
Refinement for Gene Loss and Duplication Estimates under Opal Stop Codon Recodings

Supplementary Methods
Identification of Trusted Reference Genomes
Refining Marker Sets for Lineage-specific Gene Loss and Duplication
Determination of Coding Table
Systematic Bias of Completeness and Contamination Estimates

Supplemental Figure S1. Distribution of the 104 bacterial and 281 gammaproteobacterial marker genes around the E. coli K12 genome.
Supplemental Figure S2. Error in completeness and contamination estimates on simulated genomes with varying levels of completeness and contamination generated under the random contig model.
Supplemental Figure S3. Error in completeness and contamination estimates on simulated genomes with varying levels of completeness and contamination generated under the inverse length model.
Supplemental Figure S4. Maximum-likelihood genome tree inferred from 5656 reference genomes.
Supplemental Figure S5. Error in completeness and contamination estimates on simulated genomes with varying levels of completeness and contamination generated under the random fragment model using a window size of 20 kbp.
Supplemental Figure S6. Error in completeness and contamination estimates on simulated genomes with varying levels of completeness and contamination generated under the inverse length model.
Supplemental Figure S7. Error in completeness and contamination estimates on simulated genomes from different phyla.
Supplemental Figure S8. Bias in completeness and contamination estimates when modelled as a binomial distribution.
Supplemental Figure S9. GC-distribution plots of the HMP Capnocytophaga sp. oral taxon 329 genome.
Supplemental Figure S10. Phylogenetic placement of the two genomes (Cluster 0 and Cluster 1) identified within the HMP Capnocytophaga sp. oral taxon 329 genome.
Supplemental Figure S11. Completeness estimates for 90 putative population genomes recovered from an acetate-amended aquifer.
Supplemental Figure S12. Contamination estimates for 90 putative population genomes recovered from an acetate-amended aquifer.
Supplemental Figure S13. Identification of the 213 marker genes within the Meyerdierks et al. (2010) ANME-1 genome.
Supplemental Figure S14. Refining a marker set for lineage-specific gene loss and duplication.

Supplemental Tables
Supplemental Table S1. Mean absolute error of completeness (comp.) and contamination (cont.) estimates determined using different universal- and domain-specific marker gene sets.
Supplemental Table S2. Number of marker genes and marker sets for taxonomic groups with ≥ 20 reference genomes.
Supplemental Table S3. Mean absolute error of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker genes treated individually (IM) or organized into collocated marker sets (MS).
Supplemental Table S4. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker genes treated individually (IM) or organized into collocated marker sets (MS).
Supplemental Table S5. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker genes treated individually (IM) or organized into collocated marker sets (MS).
Supplemental Table S6. Phylogenetically informative marker genes used to infer the reference genome tree along with matching PhyloSift genes.
Supplemental Table S7. Phylogenetically informative genes used in PhyloSift without a matching CheckM gene.
Supplemental Table S8. Mean absolute error of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms), the lineage-specific marker set selected by CheckM (sms), and the best performing lineage-specific marker set (bms).
Supplemental Table S9. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms), the lineage-specific marker set selected by CheckM (sms), and the best performing lineage-specific marker set (bms).
Supplemental Table S10. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms), the lineage-specific marker set selected by CheckM (sms), and the best performing lineage-specific marker set (bms).
Supplemental Table S11. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms) and the lineage-specific marker set selected by CheckM (sms).
Supplemental Table S12. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms) and the lineage-specific marker sets selected by CheckM (sms).
Supplemental Table S13. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms) and the lineage-specific marker sets selected by CheckM (sms).
Supplemental Table S14. Taxonomic rank of the selected lineage-specific marker set used for evaluating the quality of genomes at different degrees of taxonomic novelty.
Supplemental Table S15. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates for simulated genomes at different degrees of taxonomic novelty.
Supplemental Table S16. Lineage-specific completeness and contamination estimates for isolate genomes from large-scale sequencing initiatives. (see Excel file)
Supplemental Table S17. Completeness and contamination estimates of the Lactobacillus gasseri MV-22 genome for increasingly basal lineage-specific marker sets.
Supplemental Table S18. Bacterial marker genes identified within the HMP Lactobacillus gasseri genomes. Markers missing from a genome or present in multiple copies are highlighted with a grey background.
Supplemental Table S19. Lineage-specific completeness and contamination estimates for genomes annotated as finished at IMG, along with predicted translation tables and calculated coding density. (see Excel file)
Supplemental Table S20. Lineage-specific completeness and contamination estimates for single-cell genomes from the GEBA-MDM initiative along with traditional assembly statistics. (see Excel file)
Supplemental Table S21. Lineage-specific completeness and contamination estimates for population genomes, plasmids, and phage recovered from metagenomic datasets along with traditional assembly statistics. (see Excel file)
Supplemental Table S22. Completeness and contamination estimates for population genomes recovered from an acetate-amended aquifer determined using domain-level and lineage-specific marker sets. (see Excel file)

Issued: 2015

This dataset is part of a larger collection


Coverage: 153.050537,-27.352253


Other Information
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

local : UQ:365357

Parks, Donovan H., Imelfort, Michael, Skennerton, Connor T., Hugenholtz, Philip and Tyson, Gene W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25 (7), 1043-1055. doi: 10.1101/gr.186072.114

Research Data Collections

local : UQ:289097

School of Chemistry and Molecular Biosciences

local : UQ:3825

Australian Centre for Ecogenomics

local : UQ:253506

Identifiers