Active Learning for Budget-Constrained TCR--pMHC Wet-Lab Validation
Mazur, K.; Piotrowska, M.; Kowalski, J.
Abstract

Wet-lab validation of TCR--pMHC binding hypotheses is the rate-limiting step in T-cell therapy discovery: a single binding assay round can cost thousands of dollars and weeks of turnaround time, yet computational models generate thousands of candidate pairs per run. We frame this as a \emph{pool-based active learning} problem: given a fixed annotation budget $B$, which unlabeled pairs should be sent to the assay to maximally improve a predictive model that will guide the next screening round? We introduce \emph{UDAL} (Uncertainty--Diversity Active Learning), a batch acquisition strategy that combines BALD-based uncertainty estimation via MC Dropout with greedy core-set diversity selection in the encoder feature space. Evaluated on a curated VDJdb--IEDB benchmark under epitope-held-out and distance-aware protocols, UDAL achieves AUPRC 0.487 with only 5{,}000 queried labels---matching the performance of a model trained on 3$\times$ more randomly sampled labels. At a budget of 2{,}000 labels, UDAL improves AUPRC by 16.7\% over random acquisition, translating directly to fewer wasted assay slots. These results demonstrate that principled active query strategies can substantially reduce the wet-lab cost of building reliable TCR specificity models.
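The abstract's acquisition strategy (BALD uncertainty from MC Dropout combined with greedy core-set selection) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the function names `bald_scores` and `udal_select`, the pre-filtering by a `candidate_factor`, and the Euclidean k-center greedy step are all assumptions about how the two components might be composed.

```python
import numpy as np

def bald_scores(mc_probs, eps=1e-12):
    """BALD for a binary binding classifier.

    mc_probs: array of shape (T, N) -- predicted binding probabilities for
    N unlabeled TCR--pMHC pairs across T MC Dropout forward passes.
    Returns H(mean prediction) - mean(per-pass entropy), which is
    non-negative and highest where the dropout passes disagree.
    """
    def entropy(p):
        return -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))
    return entropy(mc_probs.mean(axis=0)) - entropy(mc_probs).mean(axis=0)

def udal_select(features, mc_probs, batch_size, candidate_factor=4):
    """Select an acquisition batch: uncertainty filter, then diversity.

    features: (N, d) encoder embeddings of the unlabeled pool.
    mc_probs: (T, N) MC Dropout probabilities for the same pool.
    Keeps the batch_size * candidate_factor most uncertain pairs (BALD),
    then runs k-center greedy in feature space so the batch does not
    collapse onto near-duplicate TCRs. candidate_factor is a hypothetical
    knob, not a parameter stated in the paper.
    """
    scores = bald_scores(mc_probs)
    cand = np.argsort(-scores)[: batch_size * candidate_factor]
    feats = features[cand]
    # k-center greedy: seed with the most uncertain candidate, then
    # repeatedly add the candidate farthest from everything chosen so far.
    chosen = [0]
    min_dist = np.linalg.norm(feats - feats[0], axis=1)
    for _ in range(batch_size - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(feats - feats[nxt], axis=1))
    return cand[np.array(chosen)]
```

In a screening loop, `udal_select` would be called once per assay round: run T stochastic forward passes over the unlabeled pool, select `batch_size` pairs equal to the round's assay capacity, send them to the wet lab, and retrain on the augmented label set.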