3.1.22.19. unit_scaling.functional.softmax
- unit_scaling.functional.softmax(input: Tensor, dim: int, dtype: dtype | None = None, constraint: str | None = 'to_output_scale', mult: float = 1.0) → Tensor [source]
Applies a unit-scaled softmax function.
Softmax is defined as:
\(\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\)
It is applied to all slices along dim, and will re-scale them so that the elements lie in the range [0, 1] and sum to 1.
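As a quick check of these properties, here is a minimal sketch using the standard `torch.softmax` (the unit-scaled variant shares the same forward definition):

```python
import torch

# A batch of two slices; softmax is applied independently along dim=-1.
x = torch.tensor([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0]])
probs = torch.softmax(x, dim=-1)

# Every element lies in the range [0, 1] ...
print(bool(((probs >= 0) & (probs <= 1)).all()))  # → True
# ... and each slice along dim sums to 1.
print(bool(torch.allclose(probs.sum(dim=-1), torch.ones(2))))  # → True
```

Note that the all-zeros slice maps to a uniform distribution, since every logit contributes equally to the normalising sum.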
See Softmax for more details.
- Parameters:
input (Tensor) – the input tensor.
dim (int) – A dimension along which softmax will be computed.
dtype (torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for preventing data type overflows. Default: None.
mult (float, optional) – a multiplier applied to change the shape of the nonlinear function. Typically, high multipliers (> 1) correspond to a ‘sharper’ (low-temperature) function, while low multipliers (< 1) correspond to a ‘flatter’ (high-temperature) function.
constraint (Optional[str]) – the name of the constraint function to be applied to the outputs & input gradient. The constraint name must be one of: [None, ‘gmean’, ‘hmean’, ‘amean’, ‘to_output_scale’, ‘to_grad_input_scale’] (see unit_scaling.constraints for details on these constraint functions). Defaults to ‘to_output_scale’.
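To illustrate the effect of mult, the sketch below uses plain `torch.softmax` with the input pre-scaled by the multiplier, which reproduces the temperature behaviour described above (the actual unit-scaled op additionally rescales outputs and gradients, which this sketch omits):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])

# mult > 1: a 'sharper' (low-temperature) distribution, closer to one-hot.
sharp = torch.softmax(2.0 * x, dim=-1)
# mult < 1: a 'flatter' (high-temperature) distribution, closer to uniform.
flat = torch.softmax(0.5 * x, dim=-1)

# The sharper distribution concentrates more mass on the largest logit.
print(bool(sharp[-1] > flat[-1]))  # → True
```

With the documented signature, the equivalent unit-scaled call would be `unit_scaling.functional.softmax(x, dim=-1, mult=2.0)`.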