3.1.21.8. unit_scaling.functional.linear

unit_scaling.functional.linear(input: Tensor, weight: Tensor, bias: Tensor | None, constraint: str | None = 'to_output_scale', scale_power: Tuple[float, float, float] = (0.5, 0.5, 0.5)) → Tensor

Applies a unit-scaled linear transformation.

Applies a linear transformation to the incoming data: \(y = xA^T + b\).
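As a reading aid for the formula above, here is a minimal plain-Python sketch of the unscaled transformation \(y = xA^T + b\). It uses nested lists instead of tensors and leaves out unit scaling entirely; the name `linear_reference` is hypothetical and is not part of unit_scaling.

```python
def linear_reference(x, weight, bias=None):
    """Plain-Python y = x A^T + b for a batch of row vectors.

    x:      list of rows, each of length in_features
    weight: list of out_features rows, each of length in_features
    bias:   optional list of length out_features
    (Hypothetical helper for illustration; no unit scaling is applied.)
    """
    out = []
    for row in x:
        # Each output feature is the dot product of the input row
        # with one row of the weight matrix (i.e. one column of A^T).
        y = [sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in weight]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        out.append(y)
    return out


x = [[1.0, 2.0, 3.0]]                       # shape (1, in_features=3)
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # shape (out_features=2, 3)
bias = [0.5, -0.5]                           # shape (out_features=2,)
y = linear_reference(x, weight, bias)        # shape (1, out_features=2)
```

Because the weight is stored as (out_features, in_features), each weight row produces one output feature, which is why the formula uses \(A^T\).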

This operation supports a 2-D weight with a sparse layout.

Warning

Sparse support is a beta feature and some layout(s)/dtype/device combinations may not be supported, or may not have autograd support. If you notice missing functionality please open a feature request.

This operator supports TensorFloat32.

Parameters:

constraint (Optional[str]) – The name of the constraint function to be applied to the output and input-gradient scales. The constraint name must be one of: [None, 'gmean', 'hmean', 'amean', 'to_output_scale', 'to_grad_input_scale'] (see unit_scaling.constraints for details on these constraint functions). Defaults to 'to_output_scale'.
scale_power ((float, float, float)) – The scaling power for each of (output, grad(input), grad(weight|bias)). Defaults to (0.5, 0.5, 0.5).
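The following sketch shows one way the two parameters above could interact. It rests on assumptions that this page does not state: that the unconstrained scales are `fan_in ** -p` for the output, `fan_out ** -p` for grad(input), and `batch_size ** -p` for grad(weight|bias), and that each constraint maps the (output, grad(input)) pair to a single shared scale in the way its name suggests. The names `linear_scales` and `CONSTRAINTS` are hypothetical; unit_scaling.constraints remains the authoritative definition.

```python
import math


# Assumed behaviour of the named constraints, inferred from their names.
def gmean(*scales):
    return math.prod(scales) ** (1 / len(scales))  # geometric mean


def hmean(*scales):
    return len(scales) / sum(1 / s for s in scales)  # harmonic mean


def amean(*scales):
    return sum(scales) / len(scales)  # arithmetic mean


def to_output_scale(output_scale, grad_input_scale):
    return output_scale  # keep the output scale for both


def to_grad_input_scale(output_scale, grad_input_scale):
    return grad_input_scale  # keep the grad(input) scale for both


CONSTRAINTS = {
    "gmean": gmean,
    "hmean": hmean,
    "amean": amean,
    "to_output_scale": to_output_scale,
    "to_grad_input_scale": to_grad_input_scale,
}


def linear_scales(fan_in, fan_out, batch_size,
                  constraint="to_output_scale",
                  scale_power=(0.5, 0.5, 0.5)):
    """Return (output, grad_input, grad_weight) scales under the stated assumptions."""
    out_p, grad_in_p, grad_w_p = scale_power
    output_scale = fan_in ** -out_p          # assumption: scales with fan-in
    grad_input_scale = fan_out ** -grad_in_p  # assumption: scales with fan-out
    grad_weight_scale = batch_size ** -grad_w_p  # assumption: scales with batch size
    if constraint is not None:
        # A constraint replaces both scales with one shared value.
        shared = CONSTRAINTS[constraint](output_scale, grad_input_scale)
        output_scale = grad_input_scale = shared
    return output_scale, grad_input_scale, grad_weight_scale


default_scales = linear_scales(fan_in=16, fan_out=64, batch_size=4)
unconstrained = linear_scales(16, 64, 4, constraint=None)
```

Under these assumptions, the default 'to_output_scale' constraint preserves the forward-pass scale exactly and reuses it for grad(input), while `constraint=None` lets the two scales differ.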

Shape:

Input: \((*, in\_features)\) where * means any number of additional dimensions, including none
Weight: \((out\_features, in\_features)\) or \((in\_features)\)
Bias: \((out\_features)\) or \(()\)
Output: \((*, out\_features)\) or \((*)\), based on the shape of the weight
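The shape rules listed above can be checked with a small helper. This is a sketch that only encodes the documented rules on plain tuples; `linear_output_shape` is a hypothetical name and not a function of the library.

```python
def linear_output_shape(input_shape, weight_shape):
    """Infer the output shape from the documented Input and Weight shapes.

    input_shape:  (*, in_features)
    weight_shape: (out_features, in_features) or (in_features,)
    (Hypothetical helper for illustration only.)
    """
    in_features = input_shape[-1]
    if weight_shape[-1] != in_features:
        raise ValueError(
            f"in_features mismatch: input has {in_features}, "
            f"weight has {weight_shape[-1]}"
        )
    leading = tuple(input_shape[:-1])  # the '*' dimensions, possibly empty
    if len(weight_shape) == 2:
        return leading + (weight_shape[0],)  # (*, out_features)
    if len(weight_shape) == 1:
        return leading  # (*): the feature dimension is reduced away
    raise ValueError("weight must be 1-D or 2-D")


batched = linear_output_shape((8, 16, 32), (64, 32))
single = linear_output_shape((32,), (64, 32))
vector_weight = linear_output_shape((8, 32), (32,))
```

For example, an input of shape (8, 16, 32) with a weight of shape (64, 32) gives an output of shape (8, 16, 64), and a 1-D weight of shape (32,) drops the feature dimension altogether.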