Tom Cashman

Research Team Lead

Posts

February Papers: Thinking Depth, Latent Actions, Quantization and Riemannian Flows

The stream of papers never ends; even so, in February our team found four we'd like to share:

  • Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens investigates how many layers are actually needed for each token during autoregressive LM rollouts (we sketch one way to probe this after the list).

  • Factored Latent Action World Models takes videos that contain multiple objects and, instead of encoding them into one latent state for the whole scene, learns one latent state per object.

  • LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs generalises Hadamard transforms to better handle outliers when block-quantizing LLMs (a sketch of the rotate-then-quantize idea also follows the list).

  • Riemannian Mean Flow extends MeanFlow to generate proteins within the corresponding structured spaces, e.g. the space of all residue positions and orientations.
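
To make "layers needed per token" concrete, here is a toy logit-lens-style probe (our own construction, not the paper's protocol): project every intermediate layer's hidden state through the LM head and record the earliest layer whose prediction already agrees with the model's final output. The module names below assume a GPT-2-style model from Hugging Face transformers.

```python
# Toy "effective depth" probe, in the spirit of the paper but not its method:
# for each position, find the earliest layer whose logit-lens prediction
# already agrees with the model's final prediction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM with accessible hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tok("The capital of France is", return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (n_layers + 1) tensors, each (1, seq_len, d_model)
hidden = out.hidden_states
ln_f, lm_head = model.transformer.ln_f, model.lm_head  # GPT-2 module names
final_pred = out.logits.argmax(-1)                      # (1, seq_len)

effective_depth = torch.full_like(final_pred, len(hidden) - 1)
for layer, h in enumerate(hidden[1:], start=1):         # skip the embedding layer
    pred = lm_head(ln_f(h)).argmax(-1)
    first_agreement = (pred == final_pred) & (effective_depth == len(hidden) - 1)
    effective_depth[first_agreement] = layer

for token, depth in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]),
                        effective_depth[0].tolist()):
    print(f"{token!r}: first agreeing layer = {depth}")
```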

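The outlier problem that LATMiX targets is easy to demonstrate. The sketch below (our toy, not the paper's method) block-quantizes a weight vector with a shared absmax scale, with and without a fixed Hadamard rotation per block; the paper's contribution is to replace the fixed rotation with a learnable affine transform, which we don't reproduce here.

```python
# Rotate-then-block-quantize: the idea LATMiX builds on (fixed Hadamard shown
# here; the paper learns an affine transform instead).
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Normalised Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def block_quantize(x: np.ndarray, block: int = 32, bits: int = 4) -> np.ndarray:
    """Absmax block quantization: one shared scale per `block` contiguous values."""
    qmax = 2 ** (bits - 1) - 1
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return (q * scale).reshape(-1)   # dequantized values, for error measurement

rng = np.random.default_rng(0)
block = 32
w = rng.normal(size=4096)
w[rng.integers(0, w.size, 16)] *= 50.0       # inject a few outliers

H = hadamard(block)
w_rot = (w.reshape(-1, block) @ H).reshape(-1)            # rotate each block
deq_plain = block_quantize(w, block)
deq_rot = (block_quantize(w_rot, block).reshape(-1, block) @ H.T).reshape(-1)

print("MSE without rotation:", np.mean((w - deq_plain) ** 2))
print("MSE with Hadamard rotation:", np.mean((w - deq_rot) ** 2))
```

With the rotation, an outlier's energy is spread across its block before the shared scale is chosen, so the other values in the block are no longer crushed to a coarse grid.
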
November Papers: Perspectives on efficiency

November is back to a favourite topic of ours: efficiency. We reviewed three of our favourite papers, each looking at LLM efficiency from a different angle:

  • First up, How to Scale Second-Order Optimization looks at how to optimally tune second-order optimizers such as Muon (a sketch of Muon's core update follows this list).
  • Intelligence per Watt discusses our favourite metric for large language models, energy efficiency, and how to take advantage of edge AI inference.
  • Finally, Int vs FP weighs in on a long-running question in quantization: integer versus floating-point (block) formats (see the toy comparison after the list).
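
For context, Muon's core step swaps a weight matrix's momentum for an approximately orthogonalized version. Below is a minimal sketch of that step using the plain cubic Newton-Schulz iteration; the real optimizer uses a tuned quintic iteration, and the paper's actual subject, how to tune and scale such optimizers, is not captured here.

```python
# Minimal sketch of Muon's core idea: update each 2-D weight with an
# (approximately) orthogonalized momentum matrix.
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate the orthogonal polar factor U V^T of a matrix g."""
    x = g / (g.norm() + 1e-7)   # singular values now <= 1, so the iteration converges
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x   # cubic Newton-Schulz step
    return x

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style update on a 2-D parameter (in place); returns new momentum."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # roughly shape-independent update magnitude, as in Muon's reference code
    weight -= lr * update * max(weight.shape[0] / weight.shape[1], 1.0) ** 0.5
    return momentum

# usage: one step on a toy linear layer
w = torch.randn(256, 128)
g = torch.randn_like(w)
m = muon_step(w, g, torch.zeros_like(w))
```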

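To make the integer-versus-floating-point question concrete, here is a toy comparison of our own (not the paper's setup): the same Gaussian weights are block-quantized with a shared absmax scale onto either a symmetric INT4 grid or the FP4 E2M1 value grid.

```python
# Toy comparison of INT4 vs FP4 (E2M1) element formats under the same
# shared-absmax block scaling; a stand-in for the paper's much more careful study.
import numpy as np

INT4_GRID = np.arange(-7, 8, dtype=np.float64)                   # symmetric int4
FP4_GRID = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6], dtype=np.float64)
FP4_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])           # add negatives

def block_quantize(x, grid, block=32):
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / np.abs(grid).max()
    scale[scale == 0] = 1.0
    # round each scaled value to the nearest grid point
    idx = np.abs(x[..., None] / scale[..., None] - grid).argmin(axis=-1)
    return (grid[idx] * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=1 << 16)

for name, grid in [("int4", INT4_GRID), ("fp4 (E2M1)", FP4_GRID)]:
    err = np.mean((w - block_quantize(w, grid)) ** 2)
    print(f"{name}: MSE = {err:.5f}")
```
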
July Papers: Subliminal Learning, Mixture of Recursions and Dataset Curation

As July brought tennis at Wimbledon, so too did the ML world serve up a volley of research. This month, we took an eagle-eyed (or perhaps Hawk-Eye) approach to three papers.

In our first paper, Subliminal Learning addresses the question, "Can we control or filter the distillation training data so that a student learns desirable properties but avoids picking up undesirable traits?" The authors conclude that the student learns all the teacher's traits, whether they're desirable or not!

Next, Mixture of Recursions brings a twist to token-level computation: instead of fixed-depth processing, the model learns to recurse adaptively, allocating compute per token dynamically and efficiently—like a rally whose length depends on the importance of the point.
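
Here is a rough sketch of the recursion idea, with the routing heavily simplified relative to the paper: one shared block is applied repeatedly, and a small per-token router gates how much each token takes from every extra pass. (The actual model skips computation for routed-out tokens; the soft gating below does not.)

```python
# Simplified Mixture-of-Recursions-style layer: one shared block reused up to
# `max_recursions` times, with a per-token router gating each extra pass.
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4, max_recursions: int = 3):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.router = nn.Linear(d_model, 1)   # per-token "recurse again?" score
        self.max_recursions = max_recursions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        for _ in range(self.max_recursions):
            gate = torch.sigmoid(self.router(x))   # (batch, seq_len, 1)
            refined = self.block(x)
            # soft routing: high-gate tokens take another pass through the
            # shared block, low-gate tokens keep their current state
            x = gate * refined + (1 - gate) * x
        return x

layer = RecursiveBlock(d_model=64)
print(layer(torch.randn(2, 10, 64)).shape)   # torch.Size([2, 10, 64])
```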

Last up is DataRater, which tackles the problem of dataset quality: a 'rater' is meta-learned to curate training data without manual filtering, an ace for data-centric AI.
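
To give a flavour of how a learned rater slots into training (a big simplification: the real DataRater is meta-learned through the inner training loop), here is a sketch in which a small network scores each example and the scores reweight the per-example loss.

```python
# Sketch of a DataRater-style reweighting step: a small "rater" scores each
# training example and the scores reweight the per-example loss. The paper
# meta-learns the rater through the inner optimization, which is omitted here.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_in, n_classes = 32, 10
model = nn.Linear(d_in, n_classes)                 # the learner being trained
rater = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(128, d_in)                         # a toy batch
y = torch.randint(0, n_classes, (128,))

# per-example losses, reweighted by the rater's softmax-normalized scores
losses = F.cross_entropy(model(x), y, reduction="none")        # (128,)
weights = torch.softmax(rater(x).squeeze(-1), dim=0) * len(x)  # mean weight ~ 1
loss = (weights.detach() * losses).mean()   # detach: the rater is trained separately

opt.zero_grad()
loss.backward()
opt.step()
```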