3.1.13. unit_scaling.LinearReadout
- class unit_scaling.LinearReadout(in_features: int, out_features: int, bias: bool = False, device: Any = None, dtype: Any = None, constraint: str | None = None, weight_mup_type: Literal['weight', 'bias', 'norm', 'output'] = 'output')
Applies a unit-scaled linear transformation to the incoming data, scaled appropriately for the final network output. Note that this layer sets bias=False by default.
This module supports TensorFloat32.
On certain ROCm devices, when using float16 inputs this module will use different precision for backward.
- Parameters:
in_features – size of each input sample
out_features – size of each output sample
bias – If set to False, the layer will not learn an additive bias. Default: False
constraint (Optional[str]) – The name of the constraint function to be applied to the outputs & input gradient. The constraint name must be one of: [None, ‘gmean’, ‘hmean’, ‘amean’, ‘to_output_scale’, ‘to_grad_input_scale’] (see unit_scaling.constraints for details on these constraint functions). Default: None
- weight
the learnable weights of the module of shape \((\text{out\_features}, \text{in\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in\_features}}\)
- bias
the learnable bias of the module of shape \((\text{out\_features})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{in\_features}}\)
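As a rough illustration of the initialization bound quoted above, the sketch below computes \(\sqrt{k}\) for in_features = 20 and compares it with freshly initialized parameters. It uses torch.nn.Linear as a stand-in, since that class documents the same \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) formula; it is an assumption that this carries over, and the sketch does not verify how unit_scaling.LinearReadout itself initializes its parameters.

```python
import math

import torch
from torch import nn

in_features, out_features = 20, 30
k = 1.0 / in_features
bound = math.sqrt(k)  # sqrt(1/20), roughly 0.2236

# Stand-in layer (assumption): torch.nn.Linear documents the same
# U(-sqrt(k), sqrt(k)) initialization for both weight and bias.
layer = nn.Linear(in_features, out_features, bias=True)

# Largest absolute value actually drawn for each parameter.
weight_max = layer.weight.abs().max().item()
bias_max = layer.bias.abs().max().item()

print(f"bound={bound:.4f} weight_max={weight_max:.4f} bias_max={bias_max:.4f}")
```

Both maxima should fall at or below the bound, and the parameter shapes match the (out_features, in_features) and (out_features) shapes described above.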
- Shape:
Input: \((*, H_{in})\) where \(*\) means any number of dimensions including none and \(H_{in} = \text{in\_features}\).
Output: \((*, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{out\_features}\).
Examples
>>> import torch
>>> from torch import nn
>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])
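The shape rule given under Shape (any number of leading dimensions, including none, with only the last dimension changing) can be sketched as follows. This again uses torch.nn.Linear as a stand-in, on the assumption that LinearReadout follows the same shape contract documented here; the input shapes are arbitrary illustrative choices.

```python
import torch
from torch import nn

# Stand-in layer (assumption): same (*, H_in) -> (*, H_out) shape contract.
# bias=False mirrors the default documented for LinearReadout.
layer = nn.Linear(in_features=20, out_features=30, bias=False)

# No leading dimension, one leading dimension, and two leading dimensions.
input_shapes = [(20,), (128, 20), (4, 7, 20)]
output_shapes = [tuple(layer(torch.randn(*s)).shape) for s in input_shapes]

for s_in, s_out in zip(input_shapes, output_shapes):
    print(s_in, "->", s_out)
```

In each case the leading dimensions pass through unchanged and the last dimension goes from in_features to out_features.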