Unit Scaling
Welcome to the unit-scaling library. This library is designed to facilitate the use of the unit scaling and u-μP methods, as outlined in the papers Unit Scaling: Out-of-the-Box Low-Precision Training (ICML, 2023) and u-μP: The Unit-Scaled Maximal Update Parametrization.
For a demonstration of the library, see u-μP using the unit_scaling library, a notebook that defines and trains a u-μP language model and compares it against Standard Parametrization (SP).
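The central idea of unit scaling can be illustrated without the library. The sketch below is a minimal plain-Python illustration, not the library's implementation, and it assumes inputs and weights drawn from a unit-variance Gaussian. It shows that a plain matrix multiply produces outputs whose standard deviation grows like sqrt(fan_in), while dividing by sqrt(fan_in) keeps it close to 1. The helpers `matmul`, `unit_scaled_matmul` and `std` are hypothetical names chosen for illustration. The sketch covers only the forward pass of a single matmul. The methods described in the papers also scale gradients in the backward pass.

```python
import math
import random
import statistics


def matmul(x, w, scale=1.0):
    """Multiply x (batch x fan_in) by w (fan_in x fan_out), times a scalar."""
    fan_in, fan_out = len(w), len(w[0])
    return [
        [scale * sum(row[k] * w[k][j] for k in range(fan_in)) for j in range(fan_out)]
        for row in x
    ]


def unit_scaled_matmul(x, w):
    """Hypothetical helper (not the unit_scaling API): divide by sqrt(fan_in)
    so that unit-variance inputs and weights give roughly unit-variance outputs."""
    return matmul(x, w, scale=1.0 / math.sqrt(len(w)))


def std(matrix):
    """Population standard deviation over all entries of a 2-D list."""
    return statistics.pstdev([v for row in matrix for v in row])


random.seed(0)
batch, fan_in, fan_out = 32, 128, 32
# Assumption: inputs and weights are drawn from N(0, 1), i.e. unit variance.
x = [[random.gauss(0.0, 1.0) for _ in range(fan_in)] for _ in range(batch)]
w = [[random.gauss(0.0, 1.0) for _ in range(fan_out)] for _ in range(fan_in)]

plain = matmul(x, w)               # expected std is about sqrt(fan_in), ~11.3 here
scaled = unit_scaled_matmul(x, w)  # expected std is about 1

print(f"unscaled std: {std(plain):.2f}")
print(f"unit-scaled std: {std(scaled):.2f}")
```

The design choice shown here is to keep the weights at unit variance and move the 1/sqrt(fan_in) factor into the operation itself, rather than into the weight initialization, which is the basic pattern the unit scaling paper builds on.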
Installation
To install unit-scaling, run:
pip install unit-scaling
Getting Started
We recommend that new users get started with Section 1. User guide.
Our API reference can be found in Section 3. API reference.
The following video gives a broad overview of how unit scaling works.
Note
The library is currently in its beta release. Some features have yet to be implemented and occasional bugs may be present. We’re keen to help users with any problems they encounter.
The following slides also give an overview of u-μP.
Development
To develop on the unit-scaling codebase, clone or fork our GitHub repo and follow the instructions in our developer guide.
- 1. User guide
- 2. Limitations
- 3. API reference
- 3.1. unit_scaling
- 3.1.1. unit_scaling.Parameter
- 3.1.2. unit_scaling.transformer_residual_scaling_rule
- 3.1.3. unit_scaling.visualiser
- 3.1.4. unit_scaling.Conv1d
- 3.1.5. unit_scaling.CrossEntropyLoss
- 3.1.6. unit_scaling.DepthModuleList
- 3.1.7. unit_scaling.DepthSequential
- 3.1.8. unit_scaling.Dropout
- 3.1.9. unit_scaling.Embedding
- 3.1.10. unit_scaling.GELU
- 3.1.11. unit_scaling.LayerNorm
- 3.1.12. unit_scaling.Linear
- 3.1.13. unit_scaling.LinearReadout
- 3.1.14. unit_scaling.MHSA
- 3.1.15. unit_scaling.MLP
- 3.1.16. unit_scaling.RMSNorm
- 3.1.17. unit_scaling.SiLU
- 3.1.18. unit_scaling.Softmax
- 3.1.19. unit_scaling.TransformerDecoder
- 3.1.20. unit_scaling.TransformerLayer
- 3.1.21. unit_scaling.core
- 3.1.21.1. unit_scaling.core.functional
- 3.1.22. unit_scaling.functional
- 3.1.22.1. unit_scaling.functional.add
- 3.1.22.2. unit_scaling.functional.conv1d
- 3.1.22.3. unit_scaling.functional.cross_entropy
- 3.1.22.4. unit_scaling.functional.dropout
- 3.1.22.5. unit_scaling.functional.embedding
- 3.1.22.6. unit_scaling.functional.gelu
- 3.1.22.7. unit_scaling.functional.layer_norm
- 3.1.22.8. unit_scaling.functional.linear
- 3.1.22.9. unit_scaling.functional.linear_readout
- 3.1.22.10. unit_scaling.functional.matmul
- 3.1.22.11. unit_scaling.functional.mse_loss
- 3.1.22.12. unit_scaling.functional.residual_add
- 3.1.22.13. unit_scaling.functional.residual_apply
- 3.1.22.14. unit_scaling.functional.residual_split
- 3.1.22.15. unit_scaling.functional.rms_norm
- 3.1.22.16. unit_scaling.functional.scaled_dot_product_attention
- 3.1.22.17. unit_scaling.functional.silu
- 3.1.22.18. unit_scaling.functional.silu_glu
- 3.1.22.19. unit_scaling.functional.softmax
- 3.1.23. unit_scaling.optim
- 3.1.24. unit_scaling.parameter
- 3.2. unit_scaling.analysis
- 3.3. unit_scaling.constraints
- 3.3.1. unit_scaling.constraints.amean
- 3.3.2. unit_scaling.constraints.apply_constraint
- 3.3.3. unit_scaling.constraints.gmean
- 3.3.4. unit_scaling.constraints.hmean
- 3.3.5. unit_scaling.constraints.to_grad_input_scale
- 3.3.6. unit_scaling.constraints.to_left_grad_scale
- 3.3.7. unit_scaling.constraints.to_output_scale
- 3.3.8. unit_scaling.constraints.to_right_grad_scale
- 3.4. unit_scaling.formats
- 3.5. unit_scaling.scale
- 3.6. unit_scaling.transforms
- 3.6.1. unit_scaling.transforms.compile
- 3.6.2. unit_scaling.transforms.prune_non_float_tensors
- 3.6.3. unit_scaling.transforms.prune_same_scale_tensors
- 3.6.4. unit_scaling.transforms.prune_selected_nodes
- 3.6.5. unit_scaling.transforms.simulate_format
- 3.6.6. unit_scaling.transforms.simulate_fp8
- 3.6.7. unit_scaling.transforms.track_scales
- 3.6.8. unit_scaling.transforms.unit_scale
- 3.6.9. unit_scaling.transforms.Metrics
- 3.7. unit_scaling.transforms.utils
- 3.8. unit_scaling.utils