3.1.22.5. unit_scaling.functional.embedding
- unit_scaling.functional.embedding(input: Tensor, weight: Tensor, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False) → Tensor
A unit-scaled lookup table that looks up embeddings in a fixed dictionary and size.
This function is often used to retrieve word embeddings using indices. The inputs are a list of indices and the embedding matrix; the output is the corresponding word embeddings.
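As a quick illustration, here is a minimal sketch of a plain lookup; the import alias U and all index and weight values below are illustrative only.

>>> import torch
>>> import unit_scaling.functional as U
>>> weight = torch.randn(10, 3)  # 10-word vocabulary, embedding size 3
>>> input = torch.tensor([[1, 2, 4, 5]])  # a batch of 1 sample of 4 indices
>>> U.embedding(input, weight).shape  # one 3-dim embedding per index
torch.Size([1, 4, 3])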
See torch.nn.Embedding for more details.

Note

Note that the analytical gradients of this function with respect to entries in weight at the row specified by padding_idx are expected to differ from the numerical ones.
Note

Note that torch.nn.Embedding differs from this function in that it initializes the row of weight specified by padding_idx to all zeros on construction; a sketch illustrating this difference follows.
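The difference is easy to see with torch.nn.Embedding itself, which zeroes the padding row when the module is built (values here are illustrative):

>>> import torch
>>> m = torch.nn.Embedding(10, 3, padding_idx=0)
>>> bool((m.weight[0] == 0).all())  # the padding row is zero-initialized
True
>>> weight = torch.randn(10, 3)  # this function instead uses weight as given
>>> bool((weight[0] == 0).all())
False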
- Parameters:
  - input (LongTensor) – Tensor containing indices into the embedding matrix.
  - weight (Tensor) – The embedding matrix, with number of rows equal to the maximum possible index + 1 and number of columns equal to the embedding size.
  - padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; the embedding vector at padding_idx is therefore not updated during training, i.e. it remains a fixed "pad" (see the gradient sketch after this list).
  - max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. Note: this modifies weight in-place.
  - norm_type (float, optional) – The p of the p-norm to compute for the max_norm option. Default: 2.
  - scale_grad_by_freq (bool, optional) – [not supported by unit-scaling] If given, this scales gradients by the inverse of the frequency of the words in the mini-batch. Default: False.
  - sparse (bool, optional) – [not supported by unit-scaling] If True, the gradient w.r.t. weight will be a sparse tensor. See the Notes under torch.nn.Embedding for more details regarding sparse gradients.
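The padding_idx behaviour described above can be checked directly. This is a minimal sketch, assuming unit_scaling is installed; per the parameter description, the padding row should receive a zero gradient (values are illustrative):

>>> import torch
>>> import unit_scaling.functional as U
>>> weight = torch.randn(10, 3, requires_grad=True)
>>> input = torch.tensor([[0, 2, 0, 5]])
>>> U.embedding(input, weight, padding_idx=0).sum().backward()
>>> bool((weight.grad[0] == 0).all())  # the padding row gets no gradient
True
>>> bool((weight.grad[2] != 0).any())  # other looked-up rows do
True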
- Shape:
  - Input: LongTensor of arbitrary shape containing the indices to extract
  - Weight: embedding matrix of floating-point type with shape (V, embedding_dim), where V = maximum index + 1 and embedding_dim = the embedding size
  - Output: (*, embedding_dim), where * is the input shape
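For example, higher-rank index tensors simply gain a trailing embedding dimension (shapes chosen arbitrarily):

>>> import torch
>>> import unit_scaling.functional as U
>>> weight = torch.randn(10, 3)
>>> input = torch.randint(0, 10, (2, 4, 5))  # arbitrary 3-D index shape
>>> U.embedding(input, weight).shape
torch.Size([2, 4, 5, 3])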
Examples
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])
>>> # an embedding matrix containing 10 tensors of size 3
>>> embedding_matrix = torch.rand(10, 3)
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> F.embedding(input, embedding_matrix)
tensor([[[ 0.8490,  0.9625,  0.6753],
         [ 0.9666,  0.7761,  0.6108],
         [ 0.6246,  0.9751,  0.3618],
         [ 0.4161,  0.2419,  0.7383]],

        [[ 0.6246,  0.9751,  0.3618],
         [ 0.0237,  0.7794,  0.0528],
         [ 0.9666,  0.7761,  0.6108],
         [ 0.3385,  0.8612,  0.1867]]])

>>> # example with padding_idx
>>> weights = torch.rand(10, 3)
>>> weights[0, :].zero_()
>>> embedding_matrix = weights
>>> input = torch.tensor([[0, 2, 0, 5]])
>>> F.embedding(input, embedding_matrix, padding_idx=0)
tensor([[[ 0.0000,  0.0000,  0.0000],
         [ 0.5609,  0.5384,  0.8720],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.6262,  0.2438,  0.7471]]])
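Since this is the unit-scaled variant, looking up rows of a unit-normal weight matrix should produce approximately unit-scaled activations. A minimal sketch of such a check, assuming unit_scaling is installed (sizes are illustrative and the tolerance is arbitrary):

>>> import torch
>>> import unit_scaling.functional as U
>>> weight = torch.randn(2**12, 256)  # unit-normal embedding matrix
>>> input = torch.randint(0, 2**12, (32, 128))
>>> output = U.embedding(input, weight)
>>> bool(0.9 < output.std() < 1.1)  # approximately unit standard deviation
True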