Beyond Structure and Affinity: Context-Dependent Signals for de novo Binder Success
Bozkurt, C.
Abstract
De novo protein binder design has advanced rapidly, yet most designs fail experimentally and current structure- and affinity-centred evaluation does not reliably predict which candidates will succeed. Here we show that biology-informed sequence features, derived from models trained on natural proteins, identify transferable and context-dependent associations with binder expression and binding that are not captured by structural scoring alone. We re-analysed two public benchmarks, the Bits to Binders CAR-T CD20 competition (11,984 designs; expression, proliferation, and T cell function gates) and the Adaptyv EGFR competition (603 designs; expression and binding affinity), using five biology-informed ML models predicting disorder, amyloidogenicity, topology, PTM sites, and protein classification. Every feature was tested at each gate with FDR-corrected statistics. We identify three layers of signal. Transferable: lower aggregation propensity is the most robust cross-benchmark signal; PTM-site density recurs univariately but is partly length-confounded in EGFR. Architecture-dependent: topology, disorder, and disulfide-related descriptors are significant in both datasets but flip direction, consistent with the different requirements of CAR extracellular domains versus standalone binders. Context-specific: phosphorylation-related associations with CAR-T depletion and low-disorder dominance in EGFR binding are tied to individual assay or format contexts. In the CAR-T benchmark, stacking biology-informed filters raises the enrichment hit rate from 13.8% to 38.6% (2.8x lift) after controlling for known sequence-level predictors. These results suggest that pre-synthesis screening of de novo binders may benefit from being multi-gate and context-aware, using biology-informed sequence descriptors not only to rank candidates but also to help flag likely failure modes earlier and reduce wasted synthesis and testing.
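To make the two quantitative ideas in the abstract concrete, the following is a minimal sketch of (a) FDR correction across many feature-by-gate tests and (b) the lift computed from two hit rates. The abstract does not name the FDR procedure, so the Benjamini-Hochberg step-up method is assumed here; the function names `benjamini_hochberg` and `lift` are hypothetical and are not taken from the paper's code.

```python
from typing import List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the paper's "FDR-corrected statistics" are BH-style;
    the source does not specify the procedure.
    """
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def lift(filtered_rate: float, baseline_rate: float) -> float:
    """Ratio of the hit rate after filtering to the baseline hit rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return filtered_rate / baseline_rate
```

With the hit rates reported above, `lift(38.6, 13.8)` is about 2.8, matching the stated 2.8x. In a per-gate analysis, `benjamini_hochberg` would be applied to the p-values of all features tested at that gate, and a feature would be called significant when its adjusted value falls below the chosen FDR threshold.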