3.1.15. unit_scaling.MLP

class unit_scaling.MLP(hidden_size: int, expansion_factor: int = 4)

A unit-scaled implementation of an MLP layer using SwiGLU.

Parameters:
  • hidden_size (int) – the size of the input's hidden dimension.

  • expansion_factor (int) – the ratio of the MLP’s intermediate size to hidden_size, so the intermediate size is hidden_size * expansion_factor. Defaults to 4.
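
To make the roles of the two parameters concrete, the following is a minimal pure-Python sketch of a generic SwiGLU MLP. It is not the unit_scaling implementation: the class name SwiGLUSketch, the weight names, the seed argument, and the 1/sqrt(fan_in) initialisation are all assumptions made for illustration, and the scaling factors that make the library's layer "unit-scaled" are deliberately left out. It only shows how hidden_size and expansion_factor would determine the layer's shapes.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weight, vec):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def random_matrix(rows, cols, rng):
    """Gaussian weights with std 1/sqrt(cols); an assumption of this sketch."""
    std = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class SwiGLUSketch:
    """Hypothetical stand-in for a SwiGLU MLP; NOT unit_scaling.MLP itself."""

    def __init__(self, hidden_size: int, expansion_factor: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # The intermediate width is hidden_size scaled by expansion_factor.
        self.intermediate_size = hidden_size * expansion_factor
        # Two projections up to the intermediate width, one back down.
        self.w_gate = random_matrix(self.intermediate_size, hidden_size, rng)
        self.w_up = random_matrix(self.intermediate_size, hidden_size, rng)
        self.w_down = random_matrix(hidden_size, self.intermediate_size, rng)

    def __call__(self, x):
        gate = matvec(self.w_gate, x)
        up = matvec(self.w_up, x)
        # SwiGLU: the SiLU-activated gate multiplies the "up" branch elementwise.
        gated = [silu(g) * u for g, u in zip(gate, up)]
        return matvec(self.w_down, gated)


mlp = SwiGLUSketch(hidden_size=8)   # expansion_factor defaults to 4
y = mlp([0.5] * 8)
print(mlp.intermediate_size)        # 32
print(len(y))                       # 8
```

With the default expansion_factor of 4, a hidden_size of 8 gives an intermediate width of 32, and the output has the same size as the input.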