Machine Learning

Tight Sample Complexity of Transformers
librarian
10 views
Muon Learns More Robust and Transferable Features than Adam
Fengzhuo Zhang
10 views
Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret
Seoungbin Bae
4 views
End-to-End Subgraph Detection with GraphDETR
librarian
21 views
Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss
Thomas Zhanga
28 views
Pretraining Recurrent Networks without Recurrence
librarian
38 views
Deep Embedded Multiplicative DMD for Algebra-Preserving Koopman Learning
Kelan Gray
27 views
Neuron Populations Exhibit Divergent Selectivity with Scale
Amil Dravid
28 views
Dynamic Short Convolutions Improve Transformers
librarian
30 views
q0: Primitives for Hyper-Epoch Pretraining
librarian
30 views
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
Ali Behrouz
27 views
When Model Merging Breaks Routing: Training-Free Calibration for MoE
Xiaojun Quan
31 views