3.1.23. unit_scaling.optim
Optimizer wrappers that apply scaling rules for u-muP.
Provides Adam, AdamW, and SGD as out-of-the-box optimizers.
Alternatively, scaled_parameters() provides finer control by transforming a parameter group for any downstream optimizer, given a function that defines the LR scaling rules.
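The idea of turning parameters into lr-scaled parameter groups can be illustrated with a minimal, library-independent sketch. It does not reproduce the implementation or signature of scaled_parameters(). The names `FakeParam`, `example_lr_rule`, and `make_scaled_groups` are hypothetical, and the 1/sqrt(fan_in) rule is a placeholder chosen only for illustration, not the u-muP rule. The only convention assumed is the per-parameter group format accepted by PyTorch optimizers, a list of dicts with `"params"` and `"lr"` keys.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FakeParam:
    """Hypothetical stand-in for a tensor parameter; only its shape is used."""

    name: str
    shape: Tuple[int, ...]


def example_lr_rule(param: FakeParam) -> float:
    """Placeholder rule (NOT the u-muP rule): 1/sqrt(fan_in) for matrices, 1 otherwise."""
    if len(param.shape) < 2:
        return 1.0
    fan_in = param.shape[-1]
    return 1.0 / math.sqrt(fan_in)


def make_scaled_groups(
    params: List[FakeParam],
    lr_rule: Callable[[FakeParam], float],
    lr: float,
) -> List[Dict[str, object]]:
    """Build one group per parameter, with its own lr = base lr * rule(param)."""
    return [{"params": [p], "lr": lr * lr_rule(p)} for p in params]


params = [
    FakeParam("linear.weight", (64, 16)),  # fan_in = 16
    FakeParam("linear.bias", (64,)),
]
groups = make_scaled_groups(params, example_lr_rule, lr=0.1)
for group in groups:
    print(group["params"][0].name, group["lr"])
```

With real tensors in place of `FakeParam`, a list of groups in this shape can be passed to any optimizer that accepts per-parameter options, which is what makes the approach independent of the downstream optimizer.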
Functions
|
Calculate the LR scaling factor for depth only. |
|
Calculate the LR scaling factor for |
|
Calculate the LR scaling factor for |
|
Create optimizer-appropriate lr-scaled parameter groups. |
Classes
|
An lr-scaled version of |
|
An lr-scaled version of |
|
An lr-scaled version of |