GradeBins: a comprehensive framework to augment metagenomic bin quality control

This paper is a preprint and has not been certified by peer review.

Authors

Bushnell, B.; Bowers, R. M.; Villada, J. C.

Abstract

Metagenomic binning and single-cell assembly produce draft genomes whose completeness and contamination vary with experimental and computational choices. Comparing whole bin sets remains difficult because most quality assessment tools report per-bin metrics and rely on either ground truth labels or inferred estimates. GradeBins evaluates complete bin sets under two execution modes and produces matched per-bin and bin-set summaries. For real metagenomes, inference mode integrates bin statistics, mapping depth, taxonomy, and external quality estimates from tools such as CheckM2 and EukCC to standardize per-bin and bin-set quality reporting across Bacteria, Archaea, and Eukaryotes. For synthetic or otherwise labeled datasets, ground truth mode computes base-resolved completeness, contamination, and misbinning from labeled contigs or CAMI mappings. This enables objective benchmarking of binners, parameter choices, and experimental conditions, as well as calibration of inference-based estimates. Across synthetic metagenomes of 10, 50, 100, 500, and 1,000 Bacteria and Archaea, and a mixed metagenome that also contains Eukaryotes, GradeBins separated binner and parameter effects using Total Score and a quality-weighted bin count, together with quality tier distributions, recovery fractions, and label-aware diagnostics. Inference-mode completeness generally tracked ground truth, whereas contamination and clean-bin rates showed mode-dependent shifts that were most pronounced in the mixed community. GradeBins added little overhead in these benchmarks, with peak memory below 8 GB and runtimes typically below 30 seconds. GradeBins enables reproducible protocol comparison, regression testing, and consistent quality reporting for genome-resolved metagenomics in both benchmarking and real-data settings. The full software package is open-source and available for download at https://bbmap.org/tools/gradebins.
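In ground truth mode, base-resolved grading comes down to counting, for each bin, how many bases originate from each labeled source genome. The Python sketch below shows one common way to do this. The input layout, the names `grade_bins`, `assign_tier` and `tier_counts`, the contamination definition, and the MIMAG-style tier cutoffs are all illustrative assumptions. They are not GradeBins' actual implementation, file format, or tier definitions.

```python
from collections import Counter, defaultdict

# Hypothetical input layout (not GradeBins' actual file format):
# each contig is (contig_id, length_bp, bin_id, genome_label).


def grade_bins(contigs, genome_sizes):
    """Return {bin_id: (dominant_genome, completeness, contamination)}.

    Assumed definitions (may differ from GradeBins):
      completeness  = bases of the dominant genome in the bin / that genome's size
      contamination = bases from other genomes / total bases in the bin
    Contigs are assumed not to overlap, so their lengths are simply summed.
    """
    bases = defaultdict(lambda: defaultdict(int))  # bin -> genome -> bases
    for _contig_id, length, bin_id, genome in contigs:
        bases[bin_id][genome] += length

    results = {}
    for bin_id, per_genome in bases.items():
        # The genome contributing the most bases defines what the bin "is".
        dominant, dominant_bp = max(per_genome.items(), key=lambda kv: kv[1])
        bin_bp = sum(per_genome.values())
        # Cap at 1.0 in case overlapping contigs over-count the genome.
        completeness = min(1.0, dominant_bp / genome_sizes[dominant])
        contamination = (bin_bp - dominant_bp) / bin_bp
        results[bin_id] = (dominant, completeness, contamination)
    return results


def assign_tier(completeness, contamination):
    """Tier cutoffs loosely modeled on MIMAG completeness/contamination
    thresholds; rRNA/tRNA requirements are ignored and everything that
    misses both cutoffs is lumped into "low"."""
    if completeness >= 0.90 and contamination < 0.05:
        return "high"
    if completeness >= 0.50 and contamination < 0.10:
        return "medium"
    return "low"


def tier_counts(results):
    """Count how many bins fall into each quality tier."""
    return Counter(assign_tier(comp, cont) for _, comp, cont in results.values())


if __name__ == "__main__":
    contigs = [
        ("c1", 4000, "binA", "g1"),
        ("c2", 5500, "binA", "g1"),
        ("c3", 400, "binA", "g2"),
        ("c4", 3000, "binB", "g2"),
        ("c5", 1000, "binB", "g3"),
    ]
    genome_sizes = {"g1": 10000, "g2": 8000, "g3": 5000}
    results = grade_bins(contigs, genome_sizes)
    for bin_id, (genome, comp, cont) in sorted(results.items()):
        print(bin_id, genome, f"{comp:.3f}", f"{cont:.3f}", assign_tier(comp, cont))
    print(dict(tier_counts(results)))
```

Normalizing foreign bases by bin size keeps contamination between 0 and 1. Normalizing by the dominant genome's size is an equally plausible choice, so consult the GradeBins documentation for the exact definitions it uses.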
