DeepSeek-AI researchers develop a Self-Principled Critique Tuning (SPCT) framework that enables reward models to scale performance with increased inference compute, achieving results competitive with GPT-4o and Nemotron-4-340B while demonstrating improved scalability through parallel sampling and meta-reward modeling techniques.
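The inference-time scaling idea can be pictured as follows: draw several independent principle-and-critique samples from a generative reward model, optionally keep only the samples a meta reward model rates as reliable, and aggregate the per-response scores by voting. The sketch below is a minimal illustration under those assumptions; `sample_critique` and `meta_rm_score` are hypothetical stand-ins, not the paper's API.

```python
import random
from collections import defaultdict

def sample_critique(prompt: str, responses: list[str], seed: int) -> dict[str, int]:
    """Hypothetical stand-in for one sampled principle+critique generation
    from a generative reward model; returns an integer score per response."""
    rng = random.Random(seed)
    return {r: rng.randint(1, 10) for r in responses}

def meta_rm_score(scores: dict[str, int]) -> float:
    """Hypothetical meta reward model rating how trustworthy one sampled
    critique is (here just a random placeholder)."""
    return random.random()

def scaled_reward(prompt: str, responses: list[str], k: int = 8, keep: int = 4) -> str:
    """Parallel sampling + meta-RM filtering: draw k critiques, keep the
    `keep` most trusted ones, and sum their scores per response (voting)."""
    samples = [sample_critique(prompt, responses, seed=i) for i in range(k)]
    samples.sort(key=meta_rm_score, reverse=True)   # meta-RM guided filtering
    totals = defaultdict(int)
    for s in samples[:keep]:
        for resp, score in s.items():
            totals[resp] += score                   # vote by summing scores
    return max(totals, key=totals.get)              # best response under voting

if __name__ == "__main__":
    print(scaled_reward("Which answer is better?", ["answer A", "answer B"]))
```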
Researchers from NVIDIA and leading universities introduce Test-Time Training (TTT) layers within pre-trained Diffusion Transformers to generate one-minute-long videos from text storyboards, enabling coherent multi-scene narratives while addressing the computational limitations of traditional self-attention mechanisms for long sequences.
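In its simplest (linear) form, a test-time-training layer keeps a small "fast weight" matrix as its hidden state and updates it by gradient descent on a self-supervised loss as tokens stream in, which is what lets cost stay linear in sequence length. The snippet below is a minimal sketch of that update rule, assuming illustrative random projections and learning rate rather than the paper's configuration.

```python
import numpy as np

def ttt_linear(tokens: np.ndarray, d: int, lr: float = 0.1) -> np.ndarray:
    """Minimal TTT-linear sketch: the layer's hidden state is itself a weight
    matrix W, updated online by gradient descent on a reconstruction loss."""
    rng = np.random.default_rng(0)
    # Illustrative fixed projections for keys, values, and queries.
    Wk, Wv, Wq = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    W = np.zeros((d, d))            # fast weights, re-initialised per sequence
    outputs = []
    for x in tokens:                # one inner-loop gradient step per token
        k, v, q = Wk @ x, Wv @ x, Wq @ x
        err = W @ k - v             # self-supervised target: reconstruct v from k
        W -= lr * np.outer(err, k)  # gradient of 0.5 * ||W k - v||^2 w.r.t. W
        outputs.append(W @ q)       # read out with the updated fast weights
    return np.stack(outputs)

if __name__ == "__main__":
    seq = np.random.default_rng(1).standard_normal((16, 32))  # 16 tokens, dim 32
    print(ttt_linear(seq, d=32).shape)                         # -> (16, 32)
```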
Stanford and Google DeepMind researchers introduce SWiRL (Step-Wise Reinforcement Learning), a framework that combines synthetic data generation with offline reinforcement learning to enable large language models to perform multi-step reasoning and tool use, achieving 11-21% accuracy improvements across five different reasoning benchmarks while demonstrating strong cross-task generalization.
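The step-wise part of the recipe can be illustrated simply: each multi-step trajectory (reasoning, tool calls, final answer) is split into one training example per step, with the context being everything generated so far, and each example is filtered by a judge before entering the offline RL dataset. The trajectory format and the `judge_step` filter below are illustrative assumptions, not the released pipeline.

```python
def split_trajectory(question: str, steps: list[str]) -> list[dict]:
    """Turn one multi-step trajectory into per-step training examples:
    the context is everything generated so far, the target is the next step."""
    examples, context = [], question
    for step in steps:
        examples.append({"context": context, "target": step})
        context = context + "\n" + step
    return examples

def judge_step(example: dict) -> bool:
    """Hypothetical process filter standing in for a judge model;
    here it only rejects empty targets."""
    return bool(example["target"].strip())

trajectory = [
    "Search(population of France 2023)",
    "Observation: about 68 million",
    "Final answer: roughly 68 million people",
]
dataset = [ex for ex in split_trajectory("How many people live in France?", trajectory)
           if judge_step(ex)]
print(len(dataset))  # 3 step-wise examples from one trajectory
```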
University of Oxford and Google DeepMind researchers demonstrate that attention sinks (the tendency of LLMs to heavily attend to the first token) serve as a defense mechanism against over-mixing and representational collapse in transformer architectures, with mathematical analysis showing stronger sink formation in larger models and longer contexts.
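A quick way to see what an attention sink looks like numerically is to compute causal softmax attention and measure how much probability mass each query places on the first token; in trained LLMs that mass is what grows with model size and context length. The toy sequence and random projections below are purely illustrative.

```python
import numpy as np

def attention_weights(x: np.ndarray) -> np.ndarray:
    """Single-head causal self-attention weights for a toy sequence x of shape (T, d)."""
    T, d = x.shape
    rng = np.random.default_rng(0)
    Wq, Wk = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(2))
    scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)  # causal mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

x = np.random.default_rng(1).standard_normal((8, 16))
w = attention_weights(x)
sink_mass = w[:, 0].mean()   # average attention each query places on token 0
print(f"mean attention on first token: {sink_mass:.3f}")
```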
Independent research identifies and addresses performance degradation in long-context language models through a retrieval-reflection mechanism, demonstrating that the accuracy decline correlates with shorter thinking processes rather than with where the relevant information is placed in the context, while achieving improved performance on both MNIAH-R and mathematical reasoning tasks.
Researchers from NVIDIA and leading universities analyze scaling laws in end-to-end autonomous driving using an 8,000-hour industry-scale dataset, revealing that concurrent scaling of data and model capacity is essential while demonstrating that performance improvements vary significantly across different driving tasks and sensor configurations.
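Scaling-law analyses of this kind typically fit a power law, e.g. error ≈ a · N^(-b) in data volume or model size, by linear regression in log-log space. The snippet below shows only the fitting procedure on synthetic numbers; the data points are not from the paper.

```python
import numpy as np

# Synthetic (hours of driving data, validation error) pairs, for illustration only.
hours = np.array([250, 500, 1000, 2000, 4000, 8000], dtype=float)
error = 2.0 * hours ** -0.25 * np.exp(np.random.default_rng(0).normal(0, 0.02, hours.size))

# Fit error = a * hours^(-b) via least squares in log-log space.
slope, intercept = np.polyfit(np.log(hours), np.log(error), 1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: error ~ {a:.2f} * hours^(-{b:.2f})")
print(f"predicted error at 16,000 hours: {a * 16000 ** -b:.3f}")
```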
ETH Zurich researchers introduce Knowledge Graph of Thoughts (KGoT), an AI assistant architecture that dynamically constructs and utilizes knowledge graphs for complex task solving, achieving 29% higher success rates on the GAIA benchmark while reducing operational costs by 36x compared to GPT-4-based approaches.
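The central data structure is easy to picture: a task-specific knowledge graph of (subject, relation, object) triples that the agent extends with tool outputs and queries when deciding its next step. The class below is a minimal stand-in under that reading, not the KGoT implementation.

```python
class TaskGraph:
    """Minimal triple store standing in for a dynamically built knowledge graph."""
    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        """Return triples matching the given (possibly partial) pattern."""
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (relation is None or t[1] == relation)
                and (obj is None or t[2] == obj)]

# An agent loop would alternate: call a tool, insert what it learned, re-query.
kg = TaskGraph()
kg.add("report.pdf", "mentions", "quarterly revenue")
kg.add("quarterly revenue", "has_value", "unknown")
print(kg.query(subject="report.pdf"))
```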
China Merchants Bank researchers introduce LagKV, a KV cache compression method that uses lag-relative information to identify important tokens during LLM inference, achieving 8x compression with only 10% performance degradation while maintaining compatibility with FlashAttention optimizations.
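One way to picture a lag-based importance score: after preserving the attention-sink tokens, split the KV cache into fixed-size chunks and score each token by how far it deviates from the value range of the next (lagging) chunk, keeping only the top-scoring tokens. Because no attention weights are needed, such a scheme stays compatible with FlashAttention. The code below is a simplified reading with hypothetical chunk size and keep ratio, not the released method.

```python
import numpy as np

def lag_scores(kv: np.ndarray, chunk: int = 16) -> np.ndarray:
    """Score tokens in each chunk against min/max statistics of the next chunk
    (the 'lag' reference); higher deviation = more informative token."""
    T, _ = kv.shape
    scores = np.zeros(T)
    for start in range(0, T - chunk, chunk):
        cur = kv[start:start + chunk]
        ref = kv[start + chunk:start + 2 * chunk]
        lo, hi = ref.min(axis=0), ref.max(axis=0)
        normed = (cur - lo) / (hi - lo + 1e-6)            # normalise by the lag window's range
        scores[start:start + chunk] = normed.std(axis=1)  # per-token spread as importance
    scores[-chunk:] = np.inf                              # always keep the most recent chunk
    return scores

def compress(kv: np.ndarray, keep_ratio: float = 0.125, sink: int = 4) -> np.ndarray:
    """Keep the first `sink` tokens, then the top-scoring remaining tokens."""
    tail = kv[sink:]
    scores = lag_scores(tail)
    k = max(1, int(keep_ratio * tail.shape[0]))
    keep = np.sort(np.argsort(scores)[-k:])
    return np.concatenate([kv[:sink], tail[keep]])

kv = np.random.default_rng(0).standard_normal((256, 64))  # toy per-head key (or value) cache
print(compress(kv).shape)                                 # roughly 8x fewer tokens retained
```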
Researchers at Tsinghua University and Huawei Noah's Ark Lab conduct a systematic evaluation of quantization's impact on reasoning language models, revealing that 8-bit weight-activation quantization preserves accuracy while 4-bit weight-only quantization achieves near-lossless results across multiple reasoning benchmarks and model architectures.
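To make the two regimes concrete: weight-only quantization stores the weights at low precision (e.g. 4-bit integers per output channel) and dequantizes them before the matmul, while weight-activation quantization also quantizes the activations. The snippet below shows generic symmetric per-channel round-to-nearest weight quantization; it illustrates the technique, not the paper's evaluation setup.

```python
import numpy as np

def quantize_weights(W: np.ndarray, bits: int = 4):
    """Symmetric per-output-channel round-to-nearest quantization (W shape: out x in)."""
    qmax = 2 ** (bits - 1) - 1                                 # e.g. 7 for 4-bit signed
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 128)).astype(np.float32)
q, scale = quantize_weights(W, bits=4)
err = np.abs(W - dequantize(q, scale)).mean()
print(f"mean absolute quantization error (4-bit): {err:.4f}")
```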
Essential AI researchers reveal that reflective reasoning capabilities emerge naturally during language model pre-training, demonstrating consistent improvement in both situational and self-reflection abilities across model scales without requiring specialized post-training techniques, while introducing a framework to measure reflection through adversarial datasets.