unit-scaling
3.1.21.1.2. unit_scaling.core.functional.rms
unit_scaling.core.functional.rms(x: Tensor, dims: Tuple[int, ...] | None = None, keepdim: bool = False, eps: float = 0.0) → Tensor
Compute the RMS \(\sqrt{\mathrm{mean}(x^2) + \epsilon}\) of a tensor.
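The formula can be illustrated with a minimal PyTorch sketch. This is not the library's source: `rms_sketch` is a hypothetical name, and the handling of `dims=None` (reduce over every dimension) and the mapping of `eps` to \(\epsilon\) are assumptions inferred from the signature and formula.

```python
from typing import Optional, Tuple

import torch
from torch import Tensor


def rms_sketch(
    x: Tensor,
    dims: Optional[Tuple[int, ...]] = None,
    keepdim: bool = False,
    eps: float = 0.0,
) -> Tensor:
    """Hypothetical re-implementation of sqrt(mean(x^2) + eps)."""
    if dims is None:
        # Assumption: dims=None means "reduce over all dimensions".
        dims = tuple(range(x.ndim))
    # Mean of the squared elements over the chosen dimensions.
    mean_square = (x ** 2).mean(dim=dims, keepdim=keepdim)
    # eps is added inside the square root, as in the formula.
    return torch.sqrt(mean_square + eps)


x = torch.tensor([[3.0, 4.0], [0.0, 0.0]])
total = rms_sketch(x)                              # sqrt((9 + 16) / 4) = 2.5
per_row = rms_sketch(x, dims=(1,))                 # one value per row
per_row_kept = rms_sketch(x, dims=(1,), keepdim=True)  # shape (2, 1)
```

Because `eps` sits inside the square root, an all-zero input with `eps > 0` yields `sqrt(eps)` rather than zero, which can help avoid division by zero when the result is used as a normaliser.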