Beyond single markers: bacterial synergies identified by Multidimensional Feature Selection reveal conserved microbiome disease signatures
Zielinska, K.; Rudnicki, W.; Labaj, P. P.
Abstract
The gut microbiome encodes disease-relevant information not only in the abundance of individual taxa and functions, but also in the way they co-occur and interact. Yet metagenomic analyses have largely relied on univariate approaches that evaluate features in isolation, systematically overlooking the combinatorial signals that arise from microbial co-occurrence. Here, we introduce a framework based on the Multidimensional Feature Selection (MDFS) algorithm to identify synergistic feature pairs: combinations of taxa and functions whose joint predictive relevance substantially exceeds that of either constituent alone, including features that carry no individual signal and would be discarded by any conventional analysis. We first validated the approach on a meta-analysis of colorectal cancer (CRC) cohorts, one of the most competitive microbiome classification benchmarks available, using leave-one-cohort-out cross-validation. Our framework matched state-of-the-art classification performance (AUC = 0.85) while also revealing microbial interactions that are structurally inaccessible to univariate methods. A subset of high-stability synergistic pairs showed consistently elevated model selection frequencies and robust discriminatory power across independent cohorts, which was confirmed by stringent per-cohort effect size testing. Extending the framework to 20 disease cohorts spanning inflammatory bowel disease, type 2 diabetes, liver cirrhosis, and atherosclerotic cardiovascular disease, we identified thousands of high-impact synergistic interactions and 21 conserved cross-cohort markers. Across all contexts examined, synergistic pairs substantially outperformed their individual constituents, establishing microbial co-occurrence as a reproducible and biologically informative axis of disease-associated variation that univariate approaches are structurally unable to detect. The framework is freely available at https://github.com/Kizielins/MDFS_synergies.
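To make the notion of a synergistic pair concrete, including a pair whose members carry no individual signal, the following is a minimal sketch using plug-in mutual information on discretised (presence/absence) features. It is not the MDFS implementation: MDFS estimates information gain over discretised multidimensional feature subsets and assesses significance statistically, whereas this sketch simply scores a pair by the information it carries about disease status beyond its stronger constituent. The function names (`entropy`, `mutual_information`, `synergy_gain`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of hashable outcomes, plug-in estimate."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def synergy_gain(x1, x2, y):
    """Hypothetical synergy score: information the pair (x1, x2) carries about y
    beyond the more informative of its two constituents."""
    i_pair = mutual_information(list(zip(x1, x2)), y)
    i_1 = mutual_information(x1, y)
    i_2 = mutual_information(x2, y)
    return i_pair - max(i_1, i_2), i_pair, i_1, i_2


if __name__ == "__main__":
    # Toy presence/absence data for two taxa, A and B (hypothetical).
    taxon_a = [0, 0, 1, 1] * 25
    taxon_b = [0, 1, 0, 1] * 25
    # Disease status depends only on the combination (exactly one taxon present),
    # so neither taxon is informative on its own.
    disease = [a ^ b for a, b in zip(taxon_a, taxon_b)]
    gain, i_pair, i_a, i_b = synergy_gain(taxon_a, taxon_b, disease)
    print(f"gain={gain:.2f} bits, I(pair)={i_pair:.2f}, I(A)={i_a:.2f}, I(B)={i_b:.2f}")
```

In this XOR-style toy case each taxon alone carries zero bits about disease status, while the pair carries one full bit, so a univariate screen would discard both features. When instead the label simply copies taxon A, the pair adds nothing beyond A and the gain is zero.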
Importance: Most microbiome studies search for individual gut bacterial species associated with disease. However, bacteria do not act in isolation, and their combined presence or relative balance may be far more informative than any single microbe considered alone. This study presents a computational framework that identifies pairs of gut microbial features whose co-occurrence or relative abundance carries substantially greater predictive signal than either constituent feature alone. Applying it to stool metagenomic data from patients with colorectal cancer and from more than a dozen additional disease cohorts, we demonstrate that these synergistic interactions are widespread, reproducible across independent patient cohorts, and reveal disease-relevant microbial relationships that standard analyses miss entirely. Our framework offers a more complete view of how the gut microbiome is altered in disease and provides a principled basis for identifying robust, interaction-based biomarkers.