3.1.11. unit_scaling.Linear

class unit_scaling.Linear(in_features: int, out_features: int, bias: bool = False, device: Any = None, dtype: Any = None, constraint: str | None = 'to_output_scale', weight_mup_type: Literal['weight', 'bias', 'norm', 'output'] = 'weight')[source]

Applies a unit-scaled linear transformation to the incoming data. Note that this layer sets bias=False by default.

This module supports TensorFloat32.

On certain ROCm devices, when using float16 inputs this module will use different precision for backward.

Parameters:

in_features – size of each input sample
out_features – size of each output sample
bias – If set to False, the layer will not learn an additive bias. Default: True
constraint (Optional[str]?) – The name of the constraint function to be applied to the outputs & input gradient. In this case, the constraint name must be one of: [None, ‘gmean’, ‘hmean’, ‘amean’, ‘to_output_scale’, ‘to_grad_input_scale’] (see unit_scaling.constraints for details on these constraint functions). Defaults to gmean.

weight

the learnable weights of the module of shape \((\text{out\_features}, \text{in\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in\_features}}\)

Type:: torch.Tensor

bias: the learnable bias of the module of shape \((\text{out\_features})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{in\_features}}\)

Shape:

Input: \((*, H_{in})\) where \(*\) means any number of dimensions including none and \(H_{in} = \text{in\_features}\).
Output: \((*, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{out\_features}\).

Examples

>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])