Discover, Discuss, and Read arXiv papers

Discover new, recommended papers

03 Apr 2025
reinforcement-learning · inference-optimization · online-learning

DeepSeek-AI researchers develop Self-Principled Critique Tuning (SPCT), a framework that enables reward models to scale performance with additional inference compute, achieving results competitive with GPT-4o and Nemotron-4-340B and demonstrating improved scalability through parallel sampling and meta-reward modeling.
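
A minimal sketch of the inference-time scaling recipe, assuming the general setup described: sample several principle-and-critique judgments in parallel, filter them with a meta reward model, and vote over the survivors. `sample_critique` and `meta_rm_score` are hypothetical stand-ins for real model calls.

```python
import random

def sample_critique(prompt: str, response: str) -> tuple[str, int]:
    """Stub: one sampled (critique text, score in 1..10) from the reward model."""
    return f"critique of '{response[:20]}'", random.randint(1, 10)

def meta_rm_score(critique: str) -> float:
    """Stub: meta-RM estimate of critique quality, in [0, 1]."""
    return random.random()

def scaled_reward(prompt: str, response: str,
                  n_samples: int = 8, keep_top: int = 4) -> float:
    # Parallel sampling (written sequentially here for clarity).
    samples = [sample_critique(prompt, response) for _ in range(n_samples)]
    # Meta-RM guided filtering: keep only the highest-quality critiques.
    ranked = sorted(samples, key=lambda s: meta_rm_score(s[0]), reverse=True)
    # Aggregate the surviving scores by (average) voting.
    return sum(score for _, score in ranked[:keep_top]) / keep_top

print(scaled_reward("Which answer is better?", "Answer A, because ..."))
```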

07 Apr 2025
generative-models · transformers · video-understanding

Researchers from NVIDIA and leading universities introduce Test-Time Training (TTT) layers within pre-trained Diffusion Transformers to generate one-minute-long videos from text storyboards, enabling coherent multi-scene narratives while addressing the computational limitations of traditional self-attention mechanisms for long sequences.
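
A toy illustration of the TTT idea, not the paper's exact layer: the layer's state is the weight matrix of a small inner model, updated by one gradient step per token on a self-supervised reconstruction loss, so cost grows linearly with sequence length instead of quadratically as in self-attention. Shapes and the inner objective here are assumptions.

```python
import numpy as np

def ttt_layer(tokens: np.ndarray, lr: float = 0.1) -> np.ndarray:
    d = tokens.shape[1]
    W = np.eye(d)                      # inner model weights = the layer's state
    outputs = []
    for x in tokens:                   # process the sequence token by token
        outputs.append(W @ x)          # output = inner model applied to input
        # Inner-loop update: one gradient step on 0.5 * ||W x - x||^2, so the
        # state compresses what it has seen so far.
        W -= lr * np.outer(W @ x - x, x)
    return np.stack(outputs)

seq = np.random.randn(16, 8)           # 16 tokens, 8-dim embeddings
print(ttt_layer(seq).shape)            # (16, 8)
```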

07 Apr 2025
reinforcement-learning · synthetic-data · reasoning

Stanford and Google DeepMind researchers introduce SWiRL (Step-Wise Reinforcement Learning), a framework that combines synthetic data generation with offline reinforcement learning to enable large language models to perform multi-step reasoning and tool use, achieving 11-21% accuracy improvements across five different reasoning benchmarks while demonstrating strong cross-task generalization.
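
A hedged sketch of the step-wise decomposition, assuming the general recipe: each synthetic trajectory becomes one training example per step (context so far → next action), with a per-step judge filtering out bad steps before offline RL or fine-tuning. `judge_step` and the trajectory format are hypothetical.

```python
def judge_step(context: list[str], action: str) -> float:
    """Stub: a reward model's estimate that `action` is a reasonable next step."""
    return 0.9 if "search(" in action or "answer:" in action else 0.4

def decompose(trajectory: list[str], threshold: float = 0.5) -> list[dict]:
    examples = []
    for t, action in enumerate(trajectory):
        context = trajectory[:t]             # everything produced before step t
        if judge_step(context, action) >= threshold:
            examples.append({"context": context, "target": action})
    return examples                          # feed these to offline RL / SFT

traj = ["think: need the population of X",
        "search('population of X')",
        "think: result says 1.2M",
        "answer: 1.2M"]
for ex in decompose(traj):
    print(len(ex["context"]), "->", ex["target"])
```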

03 Apr 2025
attention-mechanisms · transformers · mechanistic-interpretability

University of Oxford and Google DeepMind researchers demonstrate that attention sinks (the tendency of LLMs to heavily attend to the first token) serve as a defense mechanism against over-mixing and representational collapse in transformer architectures, with mathematical analysis showing stronger sink formation in larger models and longer contexts.
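
The sink effect itself is easy to quantify: for any head, compute causal softmax attention and measure the average mass landing on the first token. The sketch below does this with random Q/K standing in for real model activations.

```python
import numpy as np

def sink_mass(Q: np.ndarray, K: np.ndarray) -> float:
    d = Q.shape[1]
    logits = Q @ K.T / np.sqrt(d)
    # Causal mask: token i may attend only to tokens <= i.
    mask = np.triu(np.ones(logits.shape, dtype=bool), k=1)
    logits[mask] = -np.inf
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return float(attn[:, 0].mean())    # avg mass on token 0 = sink strength

Q = np.random.randn(32, 16)
K = np.random.randn(32, 16)
print(f"attention mass on first token: {sink_mass(Q, K):.3f}")
```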

05 Apr 2025
reasoning · chain-of-thought · transformers

Independent research identifies and addresses performance degradation in long-context language models through a retrieval-reflection mechanism, demonstrating that accuracy decline correlates with shorter thinking processes rather than context placement while achieving improved performance on both MNIAH-R and mathematical reasoning tasks.
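
A speculative sketch of a retrieval-reflection loop of the kind described, with hypothetical prompt wording and an `llm` placeholder for a real completion call: quote candidate evidence first, reflect on whether it suffices, and only then answer.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real chat-completion call here")

def retrieval_reflection(context: str, question: str, max_rounds: int = 2) -> str:
    # Step 1: retrieve — quote candidate evidence from the long context.
    evidence = llm(f"{context}\n\nQuote the passages relevant to: {question}")
    for _ in range(max_rounds):
        # Step 2: reflect — is the gathered evidence sufficient?
        verdict = llm(f"Evidence:\n{evidence}\n\nIs this enough to answer "
                      f"'{question}'? Reply ENOUGH, or MISSING: <what to find>")
        if verdict.startswith("ENOUGH"):
            break
        # Reflection found a gap: retrieve again with the refined need.
        need = verdict.split(":", 1)[-1].strip()
        evidence += "\n" + llm(f"{context}\n\nQuote passages about: {need}")
    # Step 3: answer from the accumulated evidence only.
    return llm(f"Evidence:\n{evidence}\n\nAnswer: {question}")
```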

06 Apr 2025
autonomous-vehicles · imitation-learning · robotics-perception

Researchers from NVIDIA and leading universities analyze scaling laws in end-to-end autonomous driving using an 8,000-hour industry-scale dataset, revealing that concurrent scaling of data and model capacity is essential while demonstrating that performance improvements vary significantly across different driving tasks and sensor configurations.
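
Scaling-law analyses of this kind reduce to fitting a power law between resources and error; a minimal illustration with made-up numbers (the paper's data are not reproduced here):

```python
import numpy as np

hours  = np.array([250, 500, 1000, 2000, 4000, 8000], dtype=float)
errors = np.array([0.42, 0.35, 0.30, 0.26, 0.23, 0.21])  # illustrative only

# error ≈ a * hours^(-b)  =>  log(error) = log(a) - b * log(hours)
slope, intercept = np.polyfit(np.log(hours), np.log(errors), deg=1)
print(f"scaling exponent b ≈ {-slope:.3f}, prefactor a ≈ {np.exp(intercept):.3f}")
```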

03 Apr 2025
agentschain-of-thoughtknowledge-distillation

ETH Zurich researchers introduce Knowledge Graph of Thoughts (KGoT), an AI assistant architecture that dynamically constructs and queries knowledge graphs to solve complex tasks, achieving a 29% higher success rate on the GAIA benchmark while cutting operational costs by a factor of 36 relative to GPT-4-based approaches.
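
A hedged sketch of the control flow such an architecture implies: alternate between enhancing a triple store and attempting to solve from it. The `llm_*` functions are stubs for real model/tool calls; only the loop structure is the point here.

```python
def llm_extract_triples(task: str, graph: set) -> set:
    """Stub: (subject, relation, object) triples a model/tool run would add."""
    return {("fact", "discovered-at-step", str(len(graph)))}

def llm_try_solve(task: str, graph: set) -> str | None:
    """Stub: return an answer once the graph holds enough structure."""
    return "final answer" if len(graph) >= 3 else None

def kgot_solve(task: str, max_steps: int = 10) -> str:
    graph: set = set()                           # the evolving knowledge graph
    for _ in range(max_steps):
        answer = llm_try_solve(task, graph)      # solve step
        if answer is not None:
            return answer
        graph |= llm_extract_triples(task, graph)  # enhance step
    return "unresolved"

print(kgot_solve("a GAIA-style multi-hop question"))
```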

07 Apr 2025
efficient-transformers · inference-optimization · knowledge-distillation

China Merchants Bank researchers introduce LagKV, a KV cache compression method that uses lag-relative information to identify important tokens during LLM inference, achieving 8x compression with only 10% performance degradation while maintaining compatibility with FlashAttention optimizations.
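
A heavily simplified sketch of the lag-relative idea as summarized above: score each chunk's tokens against the value range of the next (lagged) chunk, so importance comes from the cache itself rather than from attention weights, which is what keeps the method compatible with FlashAttention. Chunk size, normalization, and keep ratio below are assumptions.

```python
import numpy as np

def lagkv_keep_mask(kv: np.ndarray, chunk: int = 8, keep_ratio: float = 0.25):
    T, _ = kv.shape
    keep = np.ones(T, dtype=bool)                  # recent chunk is always kept
    for start in range(0, T - chunk, chunk):
        cur = kv[start:start + chunk]
        ref = kv[start + chunk:start + 2 * chunk]  # the lagged reference chunk
        lo, hi = ref.min(axis=0), ref.max(axis=0)
        normed = (cur - lo) / (hi - lo + 1e-6)     # lag-relative scaling
        score = normed.std(axis=1)                 # per-token importance
        k = max(1, int(keep_ratio * chunk))
        drop = np.argsort(score)[:-k] + start      # evict low-score tokens
        keep[drop] = False
    return keep

kv = np.random.randn(64, 16)                       # toy per-head KV states
mask = lagkv_keep_mask(kv)
print(f"kept {mask.sum()}/{len(mask)} tokens")
```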

07 Apr 2025
model-compression · chain-of-thought · reasoning

Researchers at Tsinghua University and Huawei Noah's Ark Lab conduct a systematic evaluation of quantization's impact on reasoning language models, revealing that 8-bit weight-activation quantization preserves accuracy while 4-bit weight-only quantization achieves near-lossless results across multiple reasoning benchmarks and model architectures.
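
The schemes under comparison reduce, at their core, to round-to-nearest quantization at different bit widths; a minimal sketch using per-tensor symmetric scaling, with reconstruction error as a crude proxy for the accuracy effects the paper actually measures:

```python
import numpy as np

def quantize_weights(W: np.ndarray, bits: int) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for int8, 7 for int4
    scale = np.abs(W).max() / qmax             # per-tensor symmetric scale
    q = np.clip(np.round(W / scale), -qmax, qmax)
    return q * scale                           # dequantized weights

W = np.random.randn(512, 512).astype(np.float32)
for bits in (8, 4):
    err = np.abs(W - quantize_weights(W, bits)).mean()
    print(f"W{bits}: mean abs reconstruction error {err:.5f}")
```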

05 Apr 2025
reasoning-verification · transformers · chain-of-thought

Essential AI researchers reveal that reflective reasoning capabilities emerge naturally during language model pre-training: both situational reflection and self-reflection improve consistently across model scales without requiring specialized post-training techniques. They also introduce a framework for measuring reflection through adversarial datasets.
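
A speculative sketch of the measurement idea, assuming only the gist of the setup: splice a deliberately flawed chain of thought into the prompt and test whether the model's continuation notices and corrects it. The cue list and `llm` stub are illustrative, not the paper's protocol.

```python
REFLECTION_CUES = ("wait", "however", "actually", "that's wrong")

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real completion call here")

def measure_reflection(question: str, flawed_cot: str, gold: str) -> dict:
    # Adversarial construction: the prompt ends with a wrong reasoning chain.
    out = llm(f"{question}\n{flawed_cot}\n").lower()
    return {
        "reflects": any(cue in out for cue in REFLECTION_CUES),
        "recovers": gold.lower() in out,   # did it correct to the right answer?
    }
```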