3.1.18. unit_scaling.Softmax

class unit_scaling.Softmax(dim: int, mult: float = 1.0, constraint: str | None = 'to_output_scale')[source]

Applies a unit-scaled Softmax function to an n-dimensional input Tensor.

The standard softmax rescales values so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1. Unit scaling multiplies by n, meaning the output Tensor lies in the range [0,n].
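A minimal sketch of this rescaling in plain Python (illustrative only, not the library's implementation):

```python
import math

def softmax(xs):
    # standard softmax: exponentiate, then normalise by the sum
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_softmax(xs):
    # multiply the standard softmax output by n, so values lie in [0, n]
    n = len(xs)
    return [n * p for p in softmax(xs)]

out = scaled_softmax([0.5, -1.2, 2.0, 0.0])
print(sum(out))  # the outputs now sum to n (= 4) rather than 1
```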

The documentation below is taken from the standard nn.Softmax implementation; values given there (for example, the [0, 1] output range) should be adjusted accordingly for the unit-scaled version.


Rescales the input so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1.

Softmax is defined as:

\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]
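The formula above can be evaluated directly for a small vector (plain Python sketch, no library assumed):

```python
import math

def softmax(xs):
    # Softmax(x_i) = exp(x_i) / sum_j exp(x_j)
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])  # each element lies in [0, 1]
print(round(sum(probs), 4))          # the elements sum to 1.0
```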

When the input Tensor is a sparse tensor then the unspecified values are treated as -inf.

Parameters:
  • dim (int) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1).

  • mult (float, optional) – a multiplier applied to change the shape of the nonlinear function. Typically, high multipliers (> 1) correspond to a ‘sharper’ (low-temperature) function, while low multipliers (< 1) correspond to a ‘flatter’ (high-temperature) function. Defaults to 1.0.

  • constraint (Optional[str], optional) – the name of the constraint function to be applied to the outputs & input gradient. The constraint name must be one of: [None, ‘gmean’, ‘hmean’, ‘amean’, ‘to_output_scale’, ‘to_grad_input_scale’] (see unit_scaling.constraints for details on these constraint functions). Defaults to ‘to_output_scale’.
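One common way a softmax multiplier acts is as an inverse temperature, scaling the logits before the exponent. The sketch below illustrates the sharpening/flattening effect described for mult above; treat the mechanism (scaling the input) as an assumption for illustration, not the library's exact implementation:

```python
import math

def softmax(xs, mult=1.0):
    # mult > 1 sharpens (low temperature); mult < 1 flattens (high temperature)
    exps = [math.exp(mult * x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 3.0]
sharp = softmax(logits, mult=4.0)   # probability mass concentrates on the max
flat = softmax(logits, mult=0.25)   # distribution moves closer to uniform
print(max(sharp), max(flat))        # max(sharp) is much larger than max(flat)
```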

Returns:

a Tensor of the same dimension and shape as the input, with values in the range [0, 1]

Shape:
  • Input: \((*)\), where \(*\) means any number of additional dimensions

  • Output: \((*)\), same shape as the input

Examples

>>> import torch
>>> import unit_scaling
>>> m = unit_scaling.Softmax(dim=1)
>>> input = torch.randn(2, 3)
>>> output = m(input)