SelfAttentionSeqBlock(*args: Any, **kwargs: Any)¶
Implementation of a self-attention block as described in "Attention Is All You Need": https://arxiv.org/abs/1706.03762
Within this block, torch nn.MultiheadAttention is used to model the self-attention. The block can be used for 1d data as well as sequential data, where the embedding dimensionality has to be equal to the last dimension of the input.
in_keys – Keys identifying the input tensors. First key is the input tensor, second optional key is the attention mask.
out_keys – Keys identifying the output tensors. First key is self-attention output, second optional key is attention map.
in_shapes – List of input shapes.
num_heads – Parallel attention heads.
dropout – Dropout probability applied to attn_output_weights.
add_input_to_output – Specifies whether the computed self-attention output is added to the input (residual connection) before being returned.
bias – If True, add bias as a module parameter.
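The behavior described above can be sketched as a minimal standalone module built directly on torch nn.MultiheadAttention. The class name, constructor signature, and residual handling here are illustrative assumptions, not the library's actual implementation:

```python
import torch
from torch import nn


class SelfAttentionSketch(nn.Module):
    """Hypothetical sketch of a self-attention block wrapping nn.MultiheadAttention."""

    def __init__(self, embed_dim: int, num_heads: int, dropout: float = 0.0,
                 add_input_to_output: bool = True, bias: bool = True):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout,
                                          bias=bias, batch_first=True)
        self.add_input_to_output = add_input_to_output

    def forward(self, x: torch.Tensor,
                attn_mask: torch.Tensor = None) -> torch.Tensor:
        # Self-attention: query, key, and value are all the input itself.
        out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        if self.add_input_to_output:
            # Optional residual connection back to the input.
            out = out + x
        return out


# Usage: embedding dimensionality (16) must equal the input's last dimension.
block = SelfAttentionSketch(embed_dim=16, num_heads=4)
y = block(torch.randn(2, 5, 16))  # (batch, seq_len, embed_dim)
print(y.shape)
```

Note that the embedding dimension must be divisible by num_heads, a constraint imposed by nn.MultiheadAttention itself.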