3.1.10. unit_scaling.GELU
- class unit_scaling.GELU(mult: float = 1.0, constraint: str | None = 'to_output_scale', approximate: str = 'none')
Applies a unit-scaled Gaussian Error Linear Units function:
\[\text{GELU}(x) = x * \Phi(x)\]
where \(\Phi(x)\) is the cumulative distribution function of the standard Gaussian distribution.
When the approximate argument is ‘tanh’, GELU is estimated with:
\[\text{GELU}(x) = 0.5 * x * (1 + \text{Tanh}(\sqrt{2 / \pi} * (x + 0.044715 * x^3)))\]
- Parameters:
approximate (str?) – the GELU approximation algorithm to use: 'none' | 'tanh'. Default: 'none'
mult (float?) – a multiplier to be applied to change the shape of a nonlinear function. Typically, high multipliers (> 1) correspond to a ‘sharper’ (low temperature) function, while low multipliers (< 1) correspond to a ‘flatter’ (high temperature) function.
constraint (Optional[str]?) – The name of the constraint function to be applied to the outputs & input gradient. In this case, the constraint name must be one of: [None, ‘gmean’, ‘hmean’, ‘amean’, ‘to_output_scale’, ‘to_grad_input_scale’] (see unit_scaling.constraints for details on these constraint functions). Defaults to ‘to_output_scale’.
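The effect of mult on the shape of the nonlinearity can be illustrated with a small pure-Python sketch. It assumes the multiplier scales the input and that the output is divided by the same factor so the curves are comparable; the helper name shaped_gelu and this exact form are hypothetical, chosen for illustration, and may differ from what the library does internally (no unit scaling or constraint is applied here).

```python
import math


def gelu(x: float) -> float:
    """Plain (unscaled) GELU: x * Phi(x), Phi being the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def shaped_gelu(x: float, mult: float = 1.0) -> float:
    """Hypothetical illustration of a shape multiplier.

    ASSUMPTION: mult scales the input and the result is divided by mult.
    This is not taken from the unit_scaling source code.
    """
    return gelu(mult * x) / mult


# Compare a low, neutral, and high multiplier at a few points.
for mult in (0.1, 1.0, 10.0):
    values = [round(shaped_gelu(x, mult), 4) for x in (-1.0, -0.1, 0.1, 1.0)]
    print(f"mult={mult:>4}: {values}")
```

Under this assumption a high multiplier pushes the curve towards a hard ReLU-like kink (the ‘sharper’ case), while a low multiplier flattens it towards the line x / 2.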
- Shape:
Input: \((*)\), where \(*\) means any number of dimensions.
Output: \((*)\), same shape as the input.
Examples
>>> import torch
>>> import unit_scaling
>>> m = unit_scaling.GELU()
>>> input = torch.randn(2)
>>> output = m(input)
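To make the two formulas above concrete, the following standard-library sketch evaluates the exact definition and the tanh approximation side by side. It covers only the plain, unscaled GELU; the function names gelu_exact and gelu_tanh are illustrative and not part of unit_scaling, and the scaling factors applied by the unit-scaled module are not modelled.

```python
import math


def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF (via erf)."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


def gelu_tanh(x: float) -> float:
    """Tanh-based estimate, matching the formula for approximate='tanh'."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Print both variants and their absolute difference at a few inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    exact, approx = gelu_exact(x), gelu_tanh(x)
    print(f"x={x:+.1f}  exact={exact:+.6f}  tanh={approx:+.6f}  "
          f"diff={abs(exact - approx):.2e}")
```

The two variants agree closely over this range, so the choice between 'none' and 'tanh' mainly comes down to speed versus exactness.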