MVCBench: A Multimodal Benchmark for Drug-induced Virtual Cell Phenotypes

This paper is a preprint and has not been certified by peer review.

Authors

Li, B.; Wang, Q.; Wang, S.; Zhang, B.; Peng, Y.; Zeng, P.; Liu, C.; Li, M.; Tang, Z.; Yao, X.; Deng, C.; Song, Q.

Abstract

Drugs induce coordinated phenotypic changes across multiple modalities, including transcriptional reprogramming and cellular morphological remodeling. Predicting these drug-induced changes in each modality is central to drug discovery, mechanism-of-action studies, and precision therapeutics; however, prediction performance depends critically on how both drug compounds and cellular states are represented. Despite rapid advances in drug molecular and gene representation methods, a systematic evaluation of these methods remains lacking. Herein, we introduce MVCBench, a comprehensive benchmarking framework for evaluating drug molecular and gene representation methods in predicting drug-induced multimodal virtual cell (MVC) phenotypes. MVCBench leverages large-scale transcriptomic and high-content imaging data and systematically evaluates 24 representation methods (12 drug molecular and 12 gene representation methods) across nearly 1.1 million drug-induced profiles, under both in-distribution and out-of-distribution settings spanning unseen compounds, cell lines, assay plates, and datasets. Our benchmarking reveals a pronounced modality-dependent asymmetry: advanced drug molecular representations substantially improve the prediction of drug-induced morphological phenotypes but provide only limited gains for gene expression prediction relative to classical fingerprints, whereas task-specific gene representations outperform general-purpose foundation models in predicting drug-induced transcriptomic responses. Predictive performance also deteriorates sharply under distribution shift, highlighting persistent challenges in cross-dataset and cross-platform generalization. We further show that integrating transcriptomic and morphological modalities consistently improves prediction accuracy, and we derive practical design principles for MVC architectures, including modality-aware loss calibration and fusion strategies. Together, these results establish MVCBench as a systematic foundation for evaluating representation methods and offer guidance for developing robust MVC models of drug-induced cellular responses.
