Blog

March 11, 2026
in Articles
8 min read

1-bit Wonderful Weights for LLMs

Would you rather use 1 million \(\times\) 16-bit weights, 4 million \(\times\) 4-bit weights, or even 16 million \(\times\) 1-bit weights?

In joint work between Aleph Alpha Research and Graphcore, we asked this question of LLMs — the answer encouraged us to embrace the wonder ✨ of 1-bit weights, which can outperform 4-bit and 16-bit weights on a fixed weight memory budget.

1-bit weights rule!

March 3, 2026
in Papers of the Month
12 min read

February Papers: Thinking Depth, Latent Actions, Quantization and Riemannian Flows

The stream of papers never ends, even so, in February our team found 4 we'd like to share:

Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens investigates how many layers are actually needed for each token during autoregressive LM rollouts.
Factored Latent Action World Models takes videos that contain multiple objects, and instead of encoding them into one latent state for the whole scene, employs one latent state per object.
LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs generalises Hadamard transforms to better handle outliers when block-quantizing LLMs.
Riemannian Mean Flow extends MeanFlow for generating proteins within the corresponding structured spaces, e.g. the space of all residue positions and orientations.

February 20, 2026
in Articles
8 min read

UltRAG: a Universal Simple Scalable Recipe for Knowledge Graph RAG

Knowledge graphs are an efficient and easily verifiable repository of factual information and using knowledge graph queries as a tool for LLMs to improve the factuality of their output is a promising direction. But have you ever wondered how to make query execution work for knowledge graph RAG? "No!"/"Boring!" Let us guess — queries were flawed, knowledge graphs incomplete, results were simply suboptimal. What if we tell you that we have discovered a secret... recipe.

February 10, 2026
in Papers of the Month
9 min read

January Papers: Conditional Memories for LMs, Audio-Visual FMs, and Batch Size Schedulers

Welcome to the first edition of our Paper of the Month newsletter for 2026!

This month, our team went through 21 different papers to find the most insightful new pieces of literature that we think have the potential to leave a mark. From this selection, three papers stood out in particular:

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models. Cheng et al. introduce a simple, scalable memory-augmentation for large language models to offload the cost of simple knowledge-based retrieval to embedding lookups.
LTX-2: Efficient Joint Audio-Visual Foundation Model. HaCohen et al. propose a joint text-conditioned audio-visual generation framework built using modality-specific VAEs, a refined text-conditioning module, and an asymmetric dual-stream diffusion transformer.
How to Set the Batch Size for Large-Scale Pre-training? Zhou et al. discuss how to identify the optimal batch size for large-scale pretraining, and find that dyamically increasing the batch size through time can improve performance.

January 13, 2026
in Papers of the Month
11 min read

December Papers: MoE, Fact-storing and Byteifying Language Models

Despite the holiday season and the busy NeurIPS period, December closed the year with set of insightful papers. Our team reviewed the following three papers:

First up, SonicMoE tackles issues of fine-grained and sparse MoEs using hardware-aware optimizations to restore efficiency.

Next, Constructing Efficient Fact-Storing MLPs for Transformers shows how MLP layers can be explicitly constructed as key–value stores to achieve high facts-per-parameter efficiency.

Finally, Bolmo presents a method for "byteifying" existing subword-level language models that improves character-level understanding while achieving comparable performance to subword-level models.

December 8, 2025
in Papers of the Month
11 min read

November Papers: Perspectives on efficiency

November is back to a favourite topic of ours: efficiency. We reviewed three of our favorite papers looking on LLM efficiency from different angles:

First up, How to Scale Second-Order Optimization is looking at optimal tuning of second order optimizers such as Muon.
Intelligence per Watt discusses our favorite metric on large language models: energy efficiency. And how to take advantage of edge AI inference.
Finally, Int vs FP is contributing to an old-timer topic in quantization: integer vs floating (block) point formats.

November 5, 2025
in Articles
5 min read

Why Graph Topology Matters: Insights from Applications in Drug Discovery

Knowledge Graphs in Drug Discovery

Repurposing existing drugs to treat diseases beyond what they were originally designed for can be a way to identify new disease treatment opportunities. But how do we identify which drugs might affect a given disease? This and similar questions in drug discovery, which require identifying new links between known entities, can be addressed with the help of Knowledge Graphs (KGs), graph-structured repositories of information that represent facts as (head, relation, tail) triples, connecting entities head and tail with an edge that categorizes their relationship. In the biomedical domain, entities can represent drugs and diseases, but also genes, pathways, side effects, etc. KG edges represent interactions like (disease A, associates, gene B), (gene X, upregulates, gene Y) and many more.

November 4, 2025
in Papers of the Month
12 min read

October Papers: Fast and Smart Language Models

October was packed with insights into making language models faster and smarter. We reviewed four of our favorite papers for you in detail:

First up, Grouped Lattice Vector Quantisation introduces a novel technique for a fine-grained post-training quantisation of LLMs, retaining good performance even at low bit widths.
Planned Diffusion combines autoregressive planning with text diffusion, achieving low-latency text generation.
Rethinking Thinking addresses the problem of long reasoning chains by distilling intermediate results into a bounded workspace for faster answers.
Finally, When Structure Doesn’t Help compares techniques for encoding graphs for consumption by LLMs with surprising results.

October 7, 2025
in Papers of the Month
18 min read

September Papers: The L in ML Stands for LLMs

For September, the research team reviewed a whopping 22 papers! Needless to say, competition was fierce, and only four made the final cut for this month’s edition, which is LLM-themed:

FlowRL uses GFlowNets to train LLMs on full reward distributions, promoting diverse reasoning paths instead of just reward maximization.
Soft Tokens, Hard Truths proposes using continuous “soft” tokens with injected noise to enable reinforcement learning fine-tuning of LLM reasoning.
Set Block Decoding accelerates LLM inference by generating multiple tokens in parallel using non-causal attention and iterative entropy-based sampling.
Metacognitive Reuse enables LLMs to extract and reuse concise reasoning “behaviors” to improve efficiency and reduce repeated computation.

September 8, 2025
in Papers of the Month
15 min read

August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG

August, even with its heat waves and holidays, left no shortage of exciting research. Our top papers for this month are the following: - ADMIRE-BayesOpt that investigates how to weight different data sources when they are mixed to make a single training dataset where, using multi-Fidelity Bayesian Optimization, the search for the optimal mixture can be automated; - Stable Molecule Generation that uses a force-field based reward function to fine-tune pre-trained 3D molecule generation diffusion models with the goal of sampling physically stable and valid molecules; and - Graph-R1 that takes an agentic RAG approach with a knowledge hypergraph to effectively represent and retrieve information from a corpus of documents.