# Tags
AGI
DFT
Evolutionary Algorithms
GNNs
- December Papers: FP8 Training & Simpler Transformers
- March Papers: De-Norming, Skill-Scaling, Over-Training and Drug-Generating
- March Papers: Low-Rank Galore & 1.58-Bit Weights
- October Papers: Fast and Smart Language Models
- September Papers: Proper Conditioning
LLMs
- A transformer walk-through, with Gemma
- April Papers: Motion Prompting, Mamba Reasoning and Modeling Rewards
- April Papers: TriForce, QuaRot & Mixture-of-Depths
- August Papers: Hallucinations, Quantisations and Test-Time Computations
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- December Papers: FP8 Training & Simpler Transformers
- December Papers: MoE, Fact-storing and Byteifying Language Models
- December Papers: Spend Your FLOPs Wisely
- February Papers: Learning to Scale
- February Papers: Longer RoPEs & Better Quantisation
- January Papers: Great Teachers & Beyond Chinchilla
- January Papers: More Like "Reas-anuary Papers"
- July Papers: All About Scaling
- July Papers: Subliminal Learning, Mixture of Recursions and Dataset Curation
- June Papers: Gradient Norms, LLM Reasoning and Video Generation
- June Papers: Mamba-2 & Matmul-free Models
- Llama 3.2 Vision — A Deep Dive
- March Papers: De-Norming, Skill-Scaling, Over-Training and Drug-Generating
- March Papers: Low-Rank Galore & 1.58-Bit Weights
- May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
- May Papers: xLSTM, Schedule-Free Optimizers, and Multi-token prediction
- November Papers: An LLM Feast
- October Papers: Fast and Smart Language Models
- October Papers: Improving image generation & making LLMs think
- September Papers: Proper Conditioning
- September Papers: The L in ML Stands for LLMs
- Speeding up LLM inference using SparQ Attention & llama.cpp
RAG
- May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
- September Papers: The L in ML Stands for LLMs
- UltRAG: a Universal Simple Scalable Recipe for Knowledge Graph RAG
RNNs
VLMs
activation-functions
active-learning
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- January Papers: Great Teachers & Beyond Chinchilla
audio-visual generation
automated-theorem-proving
batch-size
byte-level
- December Papers: MoE, Fact-storing and Byteifying Language Models
- December Papers: Spend Your FLOPs Wisely
chip-design
computer-vision
- January Papers: Great Teachers & Beyond Chinchilla
- October Papers: Improving image generation & making LLMs think
dataset
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- July Papers: Subliminal Learning, Mixture of Recursions and Dataset Curation
diffusion
- April Papers: Motion Prompting, Mamba Reasoning and Modeling Rewards
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- December Papers: Spend Your FLOPs Wisely
- January Papers: Conditional Memories for LMs, Audio-Visual FMs, and Batch Size Schedulers
- January Papers: Great Teachers & Beyond Chinchilla
- June Papers: Gradient Norms, LLM Reasoning and Video Generation
- October Papers: Fast and Smart Language Models
- October Papers: Improving image generation & making LLMs think
- September Papers: Proper Conditioning
diffusion transformer
distillation
distributed-training
drug-design
efficiency
efficient-inference
- April Papers: TriForce, QuaRot & Mixture-of-Depths
- August Papers: Hallucinations, Quantisations and Test-Time Computations
- February Papers: Learning to Scale
- February Papers: Longer RoPEs & Better Quantisation
- January Papers: Conditional Memories for LMs, Audio-Visual FMs, and Batch Size Schedulers
- January Papers: More Like "Reas-anuary Papers"
- July Papers: All About Scaling
- July Papers: Subliminal Learning, Mixture of Recursions and Dataset Curation
- June Papers: Gradient Norms, LLM Reasoning and Video Generation
- June Papers: Mamba-2 & Matmul-free Models
- May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
- May Papers: xLSTM, Schedule-Free Optimizers, and Multi-token prediction
- November Papers: An LLM Feast
- October Papers: Fast and Smart Language Models
- October Papers: Improving image generation & making LLMs think
- Optimal Formats and the Cube Root of the PDF
- September Papers: Proper Conditioning
- September Papers: The L in ML Stands for LLMs
- Speeding up LLM inference using SparQ Attention & llama.cpp
efficient-training
- August Papers: Hallucinations, Quantisations and Test-Time Computations
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- December Papers: MoE, Fact-storing and Byteifying Language Models
- December Papers: Spend Your FLOPs Wisely
- January Papers: Great Teachers & Beyond Chinchilla
- January Papers: More Like "Reas-anuary Papers"
- July Papers: All About Scaling
- June Papers: Mamba-2 & Matmul-free Models
- March Papers: Low-Rank Galore & 1.58-Bit Weights
- May Papers: xLSTM, Schedule-Free Optimizers, and Multi-token prediction
- November Papers: An LLM Feast
- November Papers: Perspectives on efficiency
embedding-models
fine-tuning
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- February Papers: Longer RoPEs & Better Quantisation
- January Papers: More Like "Reas-anuary Papers"
- June Papers: Gradient Norms, LLM Reasoning and Video Generation
- March Papers: De-Norming, Skill-Scaling, Over-Training and Drug-Generating
- March Papers: Low-Rank Galore & 1.58-Bit Weights
- May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
- September Papers: Proper Conditioning
- September Papers: The L in ML Stands for LLMs
flow-matching
fp8
generative-models
- April Papers: Motion Prompting, Mamba Reasoning and Modeling Rewards
- December Papers: Spend Your FLOPs Wisely
graph foundational models
graph-learning
- October Papers: Fast and Smart Language Models
- Why Graph Topology Matters: Insights from Applications in Drug Discovery
hallucinations
hiring
image-generation
- June Papers: Gradient Norms, LLM Reasoning and Video Generation
- October Papers: Improving image generation & making LLMs think
inference
- February Papers: Longer RoPEs & Better Quantisation
- January Papers: Great Teachers & Beyond Chinchilla
- January Papers: More Like "Reas-anuary Papers"
- November Papers: Perspectives on efficiency
inference-time-compute
knowledge-graphs
- UltRAG: a Universal Simple Scalable Recipe for Knowledge Graph RAG
- Why Graph Topology Matters: Insights from Applications in Drug Discovery
language-models
learning-rate-schedules
life-sciences
ligand
llm
- January Papers: Conditional Memories for LMs, Audio-Visual FMs, and Batch Size Schedulers
- November Papers: Perspectives on efficiency
local-updates
long-context
- February Papers: Longer RoPEs & Better Quantisation
- January Papers: More Like "Reas-anuary Papers"
- July Papers: All About Scaling
- November Papers: An LLM Feast
mamba
materials
memory
- January Papers: Conditional Memories for LMs, Audio-Visual FMs, and Batch Size Schedulers
- January Papers: More Like "Reas-anuary Papers"
mixture-of-experts
- April Papers: TriForce, QuaRot & Mixture-of-Depths
- December Papers: MoE, Fact-storing and Byteifying Language Models
- July Papers: All About Scaling
- July Papers: Subliminal Learning, Mixture of Recursions and Dataset Curation
- March Papers: Low-Rank Galore & 1.58-Bit Weights
molecule-generation
multi-modality
mup
- July Papers: All About Scaling
- June Papers: Mamba-2 & Matmul-free Models
- November Papers: Perspectives on efficiency
- Scale-preserving nonlinearities for u-μP
normalisation
not-transformers
- June Papers: Mamba-2 & Matmul-free Models
- May Papers: xLSTM, Schedule-Free Optimizers, and Multi-token prediction
number-formats
- December Papers: FP8 Training & Simpler Transformers
- November Papers: An LLM Feast
- Optimal Formats and the Cube Root of the PDF
optimisation
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- May Papers: xLSTM, Schedule-Free Optimizers, and Multi-token prediction
- November Papers: Perspectives on efficiency
- September Papers: Proper Conditioning
position-embeddings
power
pretraining
quantisation
- April Papers: TriForce, QuaRot & Mixture-of-Depths
- August Papers: Hallucinations, Quantisations and Test-Time Computations
- February Papers: Learning to Scale
- February Papers: Longer RoPEs & Better Quantisation
- January Papers: More Like "Reas-anuary Papers"
- June Papers: Mamba-2 & Matmul-free Models
- March Papers: Low-Rank Galore & 1.58-Bit Weights
- November Papers: An LLM Feast
- October Papers: Fast and Smart Language Models
- Optimal Formats and the Cube Root of the PDF
- September Papers: Proper Conditioning
reasoning
- April Papers: Motion Prompting, Mamba Reasoning and Modeling Rewards
- December Papers: Spend Your FLOPs Wisely
- February Papers: Learning to Scale
- January Papers: More Like "Reas-anuary Papers"
- June Papers: Gradient Norms, LLM Reasoning and Video Generation
- May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
- October Papers: Fast and Smart Language Models
- October Papers: Improving image generation & making LLMs think
- September Papers: The L in ML Stands for LLMs
reinforcement-learning
- April Papers: Motion Prompting, Mamba Reasoning and Modeling Rewards
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- June Papers: Gradient Norms, LLM Reasoning and Video Generation
- May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
- October Papers: Fast and Smart Language Models
- September Papers: Proper Conditioning
- September Papers: The L in ML Stands for LLMs
retrieval-augmented-generation
- August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
- July Papers: All About Scaling
- March Papers: Low-Rank Galore & 1.58-Bit Weights
reward-modeling
scaling-laws
- February Papers: Learning to Scale
- January Papers: Great Teachers & Beyond Chinchilla
- July Papers: All About Scaling
- March Papers: De-Norming, Skill-Scaling, Over-Training and Drug-Generating
- May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
- November Papers: An LLM Feast
- November Papers: Perspectives on efficiency
self-correction
- October Papers: Improving image generation & making LLMs think
- September Papers: Proper Conditioning
self-improvement
sparse-attention
- April Papers: TriForce, QuaRot & Mixture-of-Depths
- Speeding up LLM inference using SparQ Attention & llama.cpp
sparsity
- December Papers: MoE, Fact-storing and Byteifying Language Models
- December Papers: Spend Your FLOPs Wisely
- January Papers: Conditional Memories for LMs, Audio-Visual FMs, and Batch Size Schedulers
- July Papers: All About Scaling
- June Papers: Mamba-2 & Matmul-free Models
- Speeding up LLM inference using SparQ Attention & llama.cpp
speculative-decoding
- April Papers: TriForce, QuaRot & Mixture-of-Depths
- February Papers: Learning to Scale
- February Papers: Longer RoPEs & Better Quantisation
state-space-models
synthetic-data
test-time-compute
- April Papers: Motion Prompting, Mamba Reasoning and Modeling Rewards
- May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
training
training-dynamics
- Almost-scaled dot-product attention
- December Papers: FP8 Training & Simpler Transformers
- January Papers: Great Teachers & Beyond Chinchilla
- July Papers: All About Scaling
- June Papers: Gradient Norms, LLM Reasoning and Video Generation
- June Papers: Mamba-2 & Matmul-free Models
- March Papers: Low-Rank Galore & 1.58-Bit Weights
- May Papers: xLSTM, Schedule-Free Optimizers, and Multi-token prediction
- November Papers: An LLM Feast
- Scale-preserving nonlinearities for u-μP
transformers
- A transformer walk-through, with Gemma
- December Papers: FP8 Training & Simpler Transformers
- December Papers: MoE, Fact-storing and Byteifying Language Models
- December Papers: Spend Your FLOPs Wisely
- February Papers: Learning to Scale
- June Papers: Mamba-2 & Matmul-free Models
- Llama 3.2 Vision — A Deep Dive
- March Papers: De-Norming, Skill-Scaling, Over-Training and Drug-Generating
- March Papers: Low-Rank Galore & 1.58-Bit Weights
- October Papers: Improving image generation & making LLMs think
- Speeding up LLM inference using SparQ Attention & llama.cpp