class maze.perception.blocks.feed_forward.graph_attention.GraphAttentionBlock(*args: Any, **kwargs: Any)

A block containing multiple subsequent graph (multi-head) attention stacks.

One attention stack consists of one graph multi-head attention layer followed by an activation layer. The block expects the input tensors to have the form:

  • Feature matrix: first in_key: (batch-dim, num-of-nodes, feature-dim)

  • Adjacency matrix: second in_key: (batch-dim, num-of-nodes, num-of-nodes) (expected to be symmetric)

The block returns a tensor of the form (batch-dim, num-of-nodes, feature-out-dim).
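As an illustration of the input layout described above, the following sketch builds a feature matrix and a symmetric adjacency matrix with plain PyTorch. The dimension values and the dictionary keys `node_features` and `adjacency` are hypothetical and chosen only for this example; the actual keys are whatever is passed as in_keys.

```python
import torch

batch_dim, num_nodes, feature_dim = 4, 5, 16

# Feature matrix: (batch-dim, num-of-nodes, feature-dim)
features = torch.randn(batch_dim, num_nodes, feature_dim)

# Random 0/1 upper triangle, mirrored so that an edge i-j implies an edge j-i.
upper = torch.triu((torch.rand(batch_dim, num_nodes, num_nodes) > 0.5).float(), diagonal=1)
adjacency = upper + upper.transpose(1, 2)

# Hypothetical key names; they stand in for the two in_keys of the block.
block_input = {"node_features": features, "adjacency": adjacency}

print(features.shape)   # torch.Size([4, 5, 16])
print(adjacency.shape)  # torch.Size([4, 5, 5])
```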

  • in_keys – Two keys identifying the feature matrix and adjacency matrix respectively.

  • out_keys – One key identifying the output tensors.

  • in_shapes – List of input shapes.

  • hidden_features – List containing the number of hidden features for hidden layers.

  • non_lins – The non-linearity/ies to apply after each layer (the same in all layers, or a list corresponding to each layer).

  • n_heads – The number of heads each stack should have. (default suggestion 8)

  • attention_alpha – Specify the negative slope of the LeakyReLU in each of the attention layers. (default suggestion 0.2)

  • avg_last_head_attentions – Specify whether to average the outputs from the attention heads in the last layer of the attention stack. (default suggestion True or n_heads=0 in the last layer)

  • attention_dropout – Specify the dropout applied to the computed attention within each layer.
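To make the roles of n_heads, attention_alpha, attention_dropout and avg_last_head_attentions more concrete, here is a minimal sketch of a single multi-head graph attention layer. It assumes the commonly used GAT formulation (LeakyReLU-scored, adjacency-masked softmax attention) and is not taken from the block's source. The function `gat_layer`, its arguments, and the added self-loops are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def gat_layer(x, adj, weight, attn, alpha=0.2, dropout=0.0,
              average_heads=False, training=False):
    """Hypothetical single graph attention layer (standard GAT form, assumed).

    x:      (batch, nodes, in_features)
    adj:    (batch, nodes, nodes)
    weight: (heads, in_features, out_features)
    attn:   (heads, 2 * out_features)
    """
    n_heads, _, out_features = weight.shape
    # Project node features once per head: (batch, heads, nodes, out_features)
    h = torch.einsum("bnf,hfo->bhno", x, weight)
    # Split the attention vector into a source part and a target part.
    a_src, a_dst = attn[:, :out_features], attn[:, out_features:]
    src = torch.einsum("bhno,ho->bhn", h, a_src)
    dst = torch.einsum("bhno,ho->bhn", h, a_dst)
    # e[b, h, i, j] = LeakyReLU(a_src . h_i + a_dst . h_j), slope = alpha
    e = F.leaky_relu(src.unsqueeze(-1) + dst.unsqueeze(-2), negative_slope=alpha)
    # Mask non-edges; self-loops are added here (an assumption) so that
    # every softmax row has at least one finite entry.
    eye = torch.eye(adj.shape[-1], device=adj.device)
    mask = ((adj + eye) > 0).unsqueeze(1)
    e = e.masked_fill(~mask, float("-inf"))
    attention = torch.softmax(e, dim=-1)
    # Dropout is applied to the computed attention coefficients.
    attention = F.dropout(attention, p=dropout, training=training)
    out = attention @ h  # (batch, heads, nodes, out_features)
    if average_heads:
        return out.mean(dim=1)  # (batch, nodes, out_features)
    # Otherwise concatenate the heads along the feature dimension.
    return out.permute(0, 2, 1, 3).reshape(x.shape[0], x.shape[1], n_heads * out_features)


torch.manual_seed(0)
x = torch.randn(2, 5, 16)
upper = torch.triu((torch.rand(2, 5, 5) > 0.5).float(), diagonal=1)
adj = upper + upper.transpose(1, 2)
weight = torch.randn(8, 16, 32)  # 8 heads, 16 -> 32 features
attn = torch.randn(8, 64)

concat = gat_layer(x, adj, weight, attn)
averaged = gat_layer(x, adj, weight, attn, average_heads=True)
print(concat.shape)    # torch.Size([2, 5, 256])
print(averaged.shape)  # torch.Size([2, 5, 32])
```

In this sketch, averaging keeps the output feature dimension equal to the per-head size, while concatenation multiplies it by the number of heads. This is one reason averaging is often preferred in a final layer.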


Compiles a block-specific dictionary of network layers.

This can be overridden by derived classes (e.g. to get a ‘BatchNormalizedConvolutionBlock’).


Ordered dictionary of torch modules [str, nn.Module].
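The general pattern of collecting named layers in an ordered dictionary can be sketched as follows. This is only an illustration: the function `sketch_layer_dict` and the layer names are hypothetical, and `nn.Linear` stands in for a graph attention layer, which would additionally need the adjacency matrix.

```python
from collections import OrderedDict

import torch
from torch import nn


def sketch_layer_dict(in_features, hidden_features, non_lin=nn.ReLU):
    """Hypothetical helper: one layer plus one non-linearity per hidden size."""
    layer_dict = OrderedDict()
    for idx, out_features in enumerate(hidden_features):
        # nn.Linear is a stand-in for the attention layer in this sketch.
        layer_dict[f"layer_{idx}"] = nn.Linear(in_features, out_features)
        layer_dict[f"non_lin_{idx}"] = non_lin()
        in_features = out_features
    return layer_dict


layers = sketch_layer_dict(16, [32, 8])
net = nn.Sequential(layers)  # nn.Sequential accepts an OrderedDict of modules
out = net(torch.randn(4, 5, 16))
print(list(layers.keys()))  # ['layer_0', 'non_lin_0', 'layer_1', 'non_lin_1']
print(out.shape)            # torch.Size([4, 5, 8])
```

A derived class could follow the same pattern and insert extra entries, such as a normalization module, between a layer and its non-linearity.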

normalized_forward(block_input: Dict[str, torch.Tensor]) → Dict[str, torch.Tensor]

(overrides ShapeNormalizationBlock)

Implementation of the ShapeNormalizationBlock interface.