Callum McLean

Research Scientist

Posts

January Papers: Conditional Memories for LMs, Audio-Visual FMs, and Batch Size Schedulers

Welcome to the first edition of our Paper of the Month newsletter for 2026!

This month, our team went through 21 papers to find the most insightful new work that we think has the potential to leave a mark. From this selection, three papers stood out in particular:

  • Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models. Cheng et al. introduce a simple, scalable memory augmentation for large language models that offloads the cost of simple knowledge-based retrieval to embedding lookups.

  • LTX-2: Efficient Joint Audio-Visual Foundation Model. HaCohen et al. propose a joint text-conditioned audio-visual generation framework built using modality-specific VAEs, a refined text-conditioning module, and an asymmetric dual-stream diffusion transformer.

  • How to Set the Batch Size for Large-Scale Pre-training? Zhou et al. discuss how to identify the optimal batch size for large-scale pre-training, and find that dynamically increasing the batch size over the course of training can improve performance.

June Papers: Gradient Norms, LLM Reasoning and Video Generation

This June not only brought us very hot and sunny days (at least here in the UK), but also an excellent selection of new and exciting ML research! Out of the many good candidates, this month we selected three papers, covering quite a lot of different ground.

In the first paper, Why Gradients Rapidly Increase Near the End of Training, a researcher from FAIR explores the puzzling phenomenon of increasing gradient magnitudes during training, offering an elegant mathematical explanation and a simple remedy.

Next, in ProRL, NVIDIA researchers dive into the evolving topic of large language model reasoning, showing how prolonged reinforcement learning can indeed introduce novel reasoning abilities.

Finally, we look at AAPT, a fresh approach from the ByteDance Seed team that turns pre-trained offline diffusion models into real-time video generators via adversarial post-training.