Artificial Intelligence

OpenForgeRL: Train Harness-native Agents in Any Environment

Artificial Intelligence

librarian

1 view

Beyond Sycophancy: Structured Resistance and Compliance in LLM Moral Reasoning

Artificial Intelligence

librarian

1 view

Detecting LLM-Generated Tokens in Human-LLM Coauthored Text

Artificial Intelligence

librarian

1 view

PATS: Policy-Aware Training Scaffolding for Agentic Reinforcement Learning

Artificial Intelligence

Yipeng Shi

3 views

AREX: Towards a Recursively Self-Improving Agent for Deep Research

Artificial Intelligence

librarian

2 views

Agentic Context Management: Solving Agent Memory and Cost by Treating Them as Lifecycle and Architecture Problems

Artificial Intelligence

librarian

3 views

SoftReason: A Fully Differentiable Neuro-Soft-Symbolic Deductive Reasoning Architecture over High-Dimensional Perceptual Data

Artificial Intelligence

Wael AbdAlmageed

6 views

PRO-LONG: Programmatic Memory Enables Long-Horizon Reasoning

Artificial Intelligence

Alexis Fox

6 views

PoTRE: Test-Time Reasoning inspired by Cognitive Heterogeneity

Artificial Intelligence

librarian

4 views

Train the Model, Not the Reader: Decodability Supervision for Verifiable Activation Explanations

Artificial Intelligence

Hiskias Dingeto

4 views

ResearchArena: Evaluating Sabotage and Monitoring in Automated AI R&D

Artificial Intelligence

librarian

12 views

Agents in the Wild: Where Research Meets Deployment

Artificial Intelligence

Grace Hui Yang

11 views

CodeRescue: Budget-Calibrated Recovery Routing for Coding Agents

Artificial Intelligence

librarian

11 views

WorldCupArena: Fine-Grained Evaluation of Language Models and Deep-Research Agents on Football Forecasting

Artificial Intelligence

librarian

13 views

Rethinking Heterogeneous LLM Merging: A Weighted Model Averaging Perspective

Artificial Intelligence

librarian

10 views

Can We Break LLMs Out of Self-Loops? Fine-Grained Reasoning Control with Activation Steering

Artificial Intelligence

Sheldon Yu

8 views

Logical Judgments Under Pressure: Diagnosing Syllogistic Stability with Learned Soft Prefixes

Artificial Intelligence

librarian

8 views

AutoSynthesis: An agentic system for automated meta-analysis

Artificial Intelligence

librarian

36 views

When Words Are Safe But Actions Kill: Probing Physical Danger Beyond Text Safety in Hidden-State Risk Space

Artificial Intelligence

librarian

28 views

Concept-Guided Spatial Regularization for World Models in Atari Pong

Artificial Intelligence

librarian

27 views

Long-Context Fine-Tuning with Limited VRAM

Artificial Intelligence

librarian

27 views

MedFailBench: A Clinician-Built Open-Source Benchmark for Medical AI Safety Boundary Inspection

Artificial Intelligence

Goktug Ozkan

25 views

Can We Trust Item Response Theory for AI Evaluation?

Artificial Intelligence

Han Jiang

20 views

Benchmarking Multimodal Large Language Models for Scientific Visualization Literacy

Artificial Intelligence

Patrick Do

25 views

SearchOS-V1: Towards Robust Open-Domain Information-Seeking Agent Collaboration

Artificial Intelligence

librarian

23 views

The Industrialization of Research: On AI-Driven Science and Its Consequences

Artificial Intelligence

Emmanuel Jeannot

23 views

Pretraining Data Can Be Poisoned through Computational Propaganda

Artificial Intelligence

Victoria Graf

19 views

Experience Memory Graph: One-Shot Error Correction for Agents

Artificial Intelligence

Wenjun Wang

26 views

Reproducing human biases in route choice using large language models: Toward scalable behavioral modeling

Artificial Intelligence

Shuxian Xu

33 views

Interaction Scaling: Grounding the Third Axis of Test-Time Compute

Artificial Intelligence

Bojie Li

30 views

Think Through a Bottleneck: Hourglass Reasoning for Rigorous Induction

Artificial Intelligence

librarian

32 views

Filtering Harmful Actions Isn't Enough: Phantom Transfer in Agentic SDF

Artificial Intelligence

librarian

27 views