Science Cast

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

Thomas ZolloApril 22, 2026 2:15am

Views (12)
Comments (0)

Export Citation

Voice is AI-generated

Connected to paperThis paper is a preprint and has not been certified by peer review

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

arXivPDFApril 21, 2026 12:00am

Authors

Thomas Zollo, Jimmy Wang, Richard Zemel

Abstract

Reasoning language models can solve increasingly complex tasks, but struggle to produce the calibrated confidence estimates necessary for reliable deployment. Existing calibration methods usually depend on labels or repeated sampling at inference time, making them impractical in many settings. We introduce a method for unsupervised confidence calibration of reasoning LLMs when only a single generation is available at inference time. Our approach uses offline sampling on unlabeled data to derive a self-consistency-based proxy target, then distills this signal into a lightweight deployment-time confidence predictor. In a broad evaluation across 5 math and question-answering tasks using 9 reasoning models, our method substantially outperforms baselines, including under distribution shift, and improves downstream performance in selective prediction and simulated downstream decision-making.

TwitterandLinkedIn

0 comments

Add comment

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

AI-powered Paper ChatBeta

AI-powered Paper ChatBeta

0 comments