Unit Scaling
Welcome to the unit-scaling library. This library is designed to facilitate the use of the unit scaling method, as outlined in the paper Unit Scaling: Out-of-the-Box Low-Precision Training (ICML, 2023).
For a demonstration of the library, see Out-of-the-Box FP8 Training — a notebook showing how to unit-scale the nanoGPT model.
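As a rough illustration of the idea behind the method (not of this library's API), the sketch below uses only the Python standard library. It assumes unit-variance inputs and weights and compares the spread of a dot product's output with and without a fixed 1/sqrt(fan_in) factor. The function name and parameters are hypothetical, and the scaling factors the library actually applies may differ, for example when constraints are in use.

```python
import math
import random
import statistics


def dot_product_output_std(fan_in, scaled, n_samples=2000, seed=0):
    """Empirical std of y = sum_i x_i * w_i for unit-variance x and w.

    Illustrative helper only; it is not part of the unit-scaling library.
    """
    rng = random.Random(seed)
    # Assumption for this sketch: a fixed 1/sqrt(fan_in) factor is applied
    # to the output so that its variance stays close to one.
    scale = 1.0 / math.sqrt(fan_in) if scaled else 1.0
    outputs = []
    for _ in range(n_samples):
        y = sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(fan_in))
        outputs.append(scale * y)
    return statistics.pstdev(outputs)


unscaled_std = dot_product_output_std(fan_in=64, scaled=False)
unit_scaled_std = dot_product_output_std(fan_in=64, scaled=True)
print(f"unscaled output std:    {unscaled_std:.2f}")
print(f"unit-scaled output std: {unit_scaled_std:.2f}")
```

Without the factor, the output spread grows with sqrt(fan_in); with it, the output stays near unit scale, which is what makes narrow number formats such as FP8 easier to use.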
Installation
To install unit-scaling, run:
pip install git+https://github.com/graphcore-research/unit-scaling.git
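One quick way to confirm the install succeeded is to check that the package can be found on the Python path. The snippet below is a minimal sketch that assumes the import name is unit_scaling, as the API reference entries suggest; the helper name is hypothetical.

```python
import importlib.util


def is_installed(module_name: str = "unit_scaling") -> bool:
    """Return True if the named top-level module can be located."""
    return importlib.util.find_spec(module_name) is not None


print("unit_scaling importable:", is_installed())
```

This only checks that the module can be located; it does not import it or verify its version.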
Getting Started
We recommend that new users get started with Section 1. User guide.
A reference outlining our API can be found at Section 4. API reference.
The following video gives a broad overview of the workings of unit scaling.
Note
The library is currently in its beta release. Some features have yet to be implemented and occasional bugs may be present. We’re keen to help users with any problems they encounter.
Development
For those who wish to develop on the unit-scaling codebase, clone or fork our GitHub repo and follow the instructions in our developer guide.
- 1. User guide
- 2. Developer guide
- 3. Limitations
- 4. API reference
- 4.1. unit_scaling
- 4.1.1. unit_scaling.Parameter
- 4.1.2. unit_scaling.transformer_residual_scaling_rule
- 4.1.3. unit_scaling.visualiser
- 4.1.4. unit_scaling.Conv1d
- 4.1.5. unit_scaling.CrossEntropyLoss
- 4.1.6. unit_scaling.DepthModuleList
- 4.1.7. unit_scaling.DepthSequential
- 4.1.8. unit_scaling.Dropout
- 4.1.9. unit_scaling.Embedding
- 4.1.10. unit_scaling.GELU
- 4.1.11. unit_scaling.LayerNorm
- 4.1.12. unit_scaling.Linear
- 4.1.13. unit_scaling.LinearReadout
- 4.1.14. unit_scaling.MHSA
- 4.1.15. unit_scaling.MLP
- 4.1.16. unit_scaling.RMSNorm
- 4.1.17. unit_scaling.SiLU
- 4.1.18. unit_scaling.Softmax
- 4.1.19. unit_scaling.TransformerDecoder
- 4.1.20. unit_scaling.TransformerLayer
- 4.1.21. unit_scaling.core
- 4.1.22. unit_scaling.functional
- 4.1.23. unit_scaling.optim
- 4.1.24. unit_scaling.parameter
- 4.2. unit_scaling.analysis
- 4.3. unit_scaling.constraints
- 4.3.1. unit_scaling.constraints.amean
- 4.3.2. unit_scaling.constraints.apply_constraint
- 4.3.3. unit_scaling.constraints.gmean
- 4.3.4. unit_scaling.constraints.hmean
- 4.3.5. unit_scaling.constraints.to_grad_input_scale
- 4.3.6. unit_scaling.constraints.to_left_grad_scale
- 4.3.7. unit_scaling.constraints.to_output_scale
- 4.3.8. unit_scaling.constraints.to_right_grad_scale
- 4.4. unit_scaling.formats
- 4.1.22. unit_scaling.functional
- 4.1.22.1. unit_scaling.functional.add
- 4.1.22.2. unit_scaling.functional.conv1d
- 4.1.22.3. unit_scaling.functional.cross_entropy
- 4.1.22.4. unit_scaling.functional.dropout
- 4.1.22.5. unit_scaling.functional.embedding
- 4.1.22.6. unit_scaling.functional.gelu
- 4.1.22.7. unit_scaling.functional.layer_norm
- 4.1.22.8. unit_scaling.functional.linear
- 4.1.22.9. unit_scaling.functional.linear_readout
- 4.1.22.10. unit_scaling.functional.matmul
- 4.1.22.11. unit_scaling.functional.mse_loss
- 4.1.22.12. unit_scaling.functional.residual_add
- 4.1.22.13. unit_scaling.functional.residual_apply
- 4.1.22.14. unit_scaling.functional.residual_split
- 4.1.22.15. unit_scaling.functional.rms_norm
- 4.1.22.16. unit_scaling.functional.scaled_dot_product_attention
- 4.1.22.17. unit_scaling.functional.silu
- 4.1.22.18. unit_scaling.functional.silu_glu
- 4.1.22.19. unit_scaling.functional.softmax
- 4.5. unit_scaling.scale
- 4.6. unit_scaling.transforms
- 4.6.1. unit_scaling.transforms.compile
- 4.6.2. unit_scaling.transforms.prune_non_float_tensors
- 4.6.3. unit_scaling.transforms.prune_same_scale_tensors
- 4.6.4. unit_scaling.transforms.prune_selected_nodes
- 4.6.5. unit_scaling.transforms.simulate_format
- 4.6.6. unit_scaling.transforms.simulate_fp8
- 4.6.7. unit_scaling.transforms.track_scales
- 4.6.8. unit_scaling.transforms.unit_scale
- 4.6.9. unit_scaling.transforms.Metrics
- 4.7. unit_scaling.transforms.utils
- 4.8. unit_scaling.utils
- 4.1.21.1. unit_scaling.core.functional