Predicting Antibody Self-Association with Sequence Structure Fusion Models: The Central Role of CSI-BLI in Early Developability Screening

This paper is a preprint and has not been certified by peer review.

Authors

Ahmed, S.; Devalle, F.; Leisen, L.; Pham, T.; Amofah, B.; Lee, A.; Hutchinson, M.; Chakiath, C.; DiChiara, J.; Farzandh, S.; Kreitz, M.; Hinton, A.; Mody, N.; Dippel, A.; Kaplan, G.; Pouryahya, M.

Abstract

Antibody-based biologics are expanding rapidly, yet development challenges arising from self-association, high viscosity, aggregation, and unfavorable clearance underscore the need for accurate in silico screening. Clone self-interaction biolayer interferometry (CSI-BLI) is a plate-based, low-material assay of weak, reversible self-association that serves as an early proxy for high-concentration viscosity and a complementary predictor of in vivo clearance. In a 246-mAb panel, CSI-BLI correlates moderately with viscosity; further, in hFcRn Tg32 mice (41 antibodies), CSI-BLI associates strongly with clearance. Here, we present an end-to-end framework that distinguishes high- from low-self-interacting clones (CSI-BLI class) by coupling a fine-tuned protein language model (ESM-2) with residue-aligned 3D context from AlphaFold-predicted structures encoded as residue graphs. Disentangled multi-stream attention fuses sequence content, chain-aware positional information, and structural signals to capture interactions between residues that are spatially proximate but distant in sequence. Edit-distance-controlled splits across 1499 IgGs and 988 VHHs assess generalization. The structure-aware model achieves the highest hold-out performance (VHH-Fc F1 = 0.76; IgG F1 = 0.57), while a sequence-only disentangled variant outperforms a standard PLM baseline without structural inputs. Complementary biophysical feature-based models, built from AlphaFold structures and sequence/structure-derived physicochemical descriptors with cluster-aware selection, deliver robust, interpretable performance (VHH F1 = 0.72; IgG F1 = 0.57), with SHAP analyses highlighting charge/dipole, hydrophobicity, and aggregation-propensity drivers across CDRs and frameworks.
This interaction-aware sequence-structure framework, supported by interpretable feature models, is extensible to other developability endpoints and to broader protein classification tasks where joint modeling of language-derived representations and residue-level geometry is advantageous.
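
The abstract does not describe how the edit-distance-controlled splits are built. As a rough illustration of the general idea only, the sketch below links sequences that lie within a chosen Levenshtein distance, groups them into single-linkage clusters, and assigns whole clusters to either the training or the hold-out set, so that no hold-out sequence is within the threshold of any training sequence. The function names, the threshold `max_dist`, the `test_fraction` value, and the toy sequences are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_by_distance(seqs: List[str], max_dist: int) -> List[List[int]]:
    """Single-linkage clusters: sequences within max_dist edits are linked."""
    parent = list(range(len(seqs)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if edit_distance(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def split_clusters(clusters: List[List[int]],
                   test_fraction: float) -> Tuple[List[int], List[int]]:
    """Assign whole clusters to the hold-out set, smallest first,
    until adding another cluster would exceed the target size."""
    total = sum(len(c) for c in clusters)
    target = test_fraction * total
    train: List[int] = []
    test: List[int] = []
    for c in sorted(clusters, key=len):
        if len(test) + len(c) <= target:
            test.extend(c)
        else:
            train.extend(c)
    return sorted(train), sorted(test)


# Hypothetical toy sequences, not real antibody data.
seqs = ["AAAA", "AAAT", "CCCC", "CCCG", "GGGGGG"]
clusters = cluster_by_distance(seqs, max_dist=1)
train_idx, test_idx = split_clusters(clusters, test_fraction=0.2)
```

Because clusters are never divided, every pair of one training and one hold-out sequence differs by more than `max_dist` edits. The pairwise comparison is quadratic in the number of sequences, which is workable for panels of roughly a thousand antibodies but would call for a faster clustering tool on larger sets.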
