3.1.17. unit_scaling.SiLU

class unit_scaling.SiLU(mult: float = 1.0, constraint: str | None = 'to_output_scale', inplace: bool = False)[source]

Applies a unit-scaled Sigmoid Linear Unit function:

The SiLU function is also known as the swish function.

\[\text{silu}(x) = x * \sigma(x), \text{where } \sigma(x) \text{ is the logistic sigmoid.}\]

Note

See Gaussian Error Linear Units (GELUs) where the SiLU (Sigmoid Linear Unit) was originally coined, and see Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning and Swish: a Self-Gated Activation Function where the SiLU was experimented with later.

Parameters:
  • mult (float?) – a multiplier to be applied to change the shape of a nonlinear function. Typically, high multipliers (> 1) correspond to a ‘sharper’ (low temperature) function, while low multipliers (< 1) correspond to a ‘flatter’ (high temperature) function.

  • constraint (Optional[str]?) – The name of the constraint function to be applied to the outputs & input gradient. In this case, the constraint name must be one of: [None, ‘gmean’, ‘hmean’, ‘amean’, ‘to_output_scale’, ‘to_grad_input_scale’] (see unit_scaling.constraints for details on these constraint functions). Defaults to gmean.

Shape:
  • Input: \((*)\), where \(*\) means any number of dimensions.

  • Output: \((*)\), same shape as the input.

Examples

>>> m = nn.SiLU()
>>> input = torch.randn(2)
>>> output = m(input)