Graph neural networks
There are several kinds of graph neural network (GNN) in the literature. We adopt the unifying framework described by Battaglia et al. [1], since it subsumes many of the different approaches. In keeping with this framework, we will sometimes refer to a GNN as a Graph Network (GN), although this term is not in common use.
Graph structure
A GN graph is made up of a set of \(N^v\) nodes (vertices) \(V\), a set of \(N^e\) directed edges \(E\) connecting pairs of nodes (including self-connections), and a global attribute \(\myvec{u}\). Each node has an associated attribute vector \(\myvec{v}_i\), so we write the set of nodes as \(V=\{(\myvec{v}_i,i)\}_{i=1:N^v}\). Each directed edge has an associated attribute \(\myvec{e}_k\), so we write the set of edges as \(E=\{(\myvec{e}_k,r_k,s_k)\}_{k=1:N^e}\), where \(r_k\) is the index of the receiver node (i.e. the endpoint of the directed edge) and \(s_k\) is the index of the sender node (i.e. its starting point). This terminology comes from the idea of sending messages along edges. The attributes \(\myvec{u}\), \(\myvec{v}_i\) and \(\myvec{e}_k\) could be scalars, vectors, tensors (e.g. a colour image), or something else altogether; for simplicity, we will denote them as vectors.
Note
The set notation in the previous paragraph is used for consistency with Battaglia et al. [1]. However, we differ from Battaglia et al. [1] by including the node index in the specification of each node, just as the node indices are included in the specification of each edge. It is interesting to observe that this notation scales to a hypergraph by simply listing the indices of the nodes involved in each hyperedge, including the special case of a hyperedge involving a single node, which is synonymous with the node itself.
In standard terms, a GN graph is a directed, attributed multigraph with a global attribute. A multigraph is a graph in which there may be more than one edge between any pair of nodes. In what follows, we will refer to it as a 'graph' for brevity.
In the following diagram of a graph with four nodes and six edges, we show only the attribute values at each node and on each edge.
The definition of this graph is as follows:
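Since the attribute values in the diagram are not reproduced here, the following Python sketch shows one way such a graph could be stored. The dictionary layout, the random attribute values and most of the connectivity are illustrative assumptions, constrained only to be consistent with the running example below (node 2 receives edges 2, 3 and 4); the self-connection on node 4 is likewise an assumption, included only to illustrate that self-connections are allowed. The text uses 1-based indices while the code uses 0-based ones.

```python
import numpy as np

# A sketch of the four-node, six-edge example graph.
# Connectivity and attribute values are illustrative assumptions,
# chosen to be consistent with the running example (edges 2, 3 and 4
# are received by node 2). Indices are 0-based here, so node i in the
# text corresponds to row i-1.
rng = np.random.default_rng(0)
graph = {
    "nodes":     rng.normal(size=(4, 3)),       # v_i: a 3-d attribute per node
    "edges":     rng.normal(size=(6, 3)),       # e_k: a 3-d attribute per edge
    "senders":   np.array([1, 0, 2, 3, 1, 3]),  # s_k (assumed)
    "receivers": np.array([0, 1, 1, 1, 2, 3]),  # r_k (assumed); last edge is a self-connection
    "globals":   rng.normal(size=3),            # u: the global attribute
}
```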
GN Block
The active part of a GNN is a function that updates the edge, node and global attribute values of a graph. The function is applied iteratively to produce a final output graph. We are interested in functions that are implemented using neural networks and refer to the function as a GN block. Thus, a GN block takes a graph as input and produces a graph with updated attribute values as output. The output graph has the same structure of nodes and edges as the input graph. The transformation of attribute values involves six steps, all of which depend on the structure of the graph. Most of the computations are local to a node or edge in the graph. For example, node attribute values are updated based on the values of attributes at that node, and at the edges for which the node is a receiver:
The six steps are as follows:
(1) Update edge attribute values
The attribute value on each edge is updated based on the attribute values at the nodes on either end and on the global attribute. The attribute value on edge \(k\) is therefore updated as follows:
\[\myvec{e}'_k = \phi^e\left(\myvec{e}_k, \myvec{v}_{r_k}, \myvec{v}_{s_k}, \myvec{u}\right)\]
The following diagram shows the attributes involved in updating the attribute value on edge 6:
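In code, this step might look as follows, assuming the graph dictionary sketched earlier; `phi_e` stands for an arbitrary edge-update function of the four attribute vectors and is purely illustrative.

```python
def update_edges(graph, phi_e):
    # Step 1: e'_k = phi_e(e_k, v_{r_k}, v_{s_k}, u) for every edge k.
    v, u = graph["nodes"], graph["globals"]
    new_edges = np.stack([
        phi_e(e_k, v[r_k], v[s_k], u)
        for e_k, r_k, s_k in zip(graph["edges"],
                                 graph["receivers"],
                                 graph["senders"])
    ])
    return {**graph, "edges": new_edges}
```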
(2) Aggregate edge attribute values at receiving nodes
At each node \(i\), we aggregate the (updated) attribute values on the set of edges \(E'_i\) for which this node is a receiver (i.e. all incoming edges). Because we don't want the result to depend on an ordering of the incoming edges, the aggregation function \(\rho^{e \to v}\) must be invariant to permutations of the edges. Typically we use the sum, mean or maximum, applied component-wise to the inputs:
\[\myvec{\bar{e}}'_i = \rho^{e \to v}(E'_i), \quad \text{where } E'_i=\{(\myvec{e}'_k,r_k,s_k)\}_{k=1:N^e,\;r_k=i}\]
In our running example, using the mean as the aggregation function, the aggregate of edge attribute values at node 2 is given by \(\myvec{\bar{e}}'_2=(\myvec{e}'_2+\myvec{e}'_3+\myvec{e}'_4)/3\):
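A sketch of this step, continuing the representation above and using the mean as in the running example; giving nodes with no incoming edges a zero aggregate is one common convention, not something fixed by the framework.

```python
def aggregate_edges_per_node(graph):
    # Step 2: bar_e'_i = mean of e'_k over the edges k with r_k = i.
    num_nodes = len(graph["nodes"])
    agg = np.zeros((num_nodes, graph["edges"].shape[1]))
    for i in range(num_nodes):
        incoming = graph["edges"][graph["receivers"] == i]
        if len(incoming):   # nodes with no incoming edges keep a zero aggregate
            agg[i] = incoming.mean(axis=0)
    return agg
```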
(3) Update node attribute values
The attribute value at each node is updated based on its current value, the aggregate edge attribute value for the node computed in step 2, and the global attribute value:
\[\myvec{v}'_i = \phi^v\left(\myvec{\bar{e}}'_i, \myvec{v}_i, \myvec{u}\right)\]
Here we show the update to the attribute value at node 2:
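Continuing the sketch, with `phi_v` again an arbitrary illustrative update function:

```python
def update_nodes(graph, agg_edges, phi_v):
    # Step 3: v'_i = phi_v(bar_e'_i, v_i, u) for every node i.
    u = graph["globals"]
    new_nodes = np.stack([
        phi_v(e_bar_i, v_i, u)
        for e_bar_i, v_i in zip(agg_edges, graph["nodes"])
    ])
    return {**graph, "nodes": new_nodes}
```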
(4) Aggregate edge attribute values globally
We aggregate the updated edge attribute values across the set of all edges \(E'\). Again, the aggregation function needs to be invariant to permutations of the edges, so we normally choose one of sum, mean or max:
\[\myvec{\bar{e}}' = \rho^{e \to u}(E')\]
Using max as the aggregation function, the aggregate of edge attribute values is \(\myvec{\bar{e}}'=\max(\myvec{e}'_1,\ldots,\myvec{e}'_6)\), with the maximum taken component-wise:
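In the sketch, once steps 1-3 have been applied, this global aggregate is a single component-wise reduction:

```python
# Step 4: bar_e' = component-wise max over the updated edge attributes.
e_bar = graph["edges"].max(axis=0)
```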
(5) Aggregate node attribute values globally
We aggregate the updated node attribute values across the set of all nodes \(V'\):
\[\myvec{\bar{v}}' = \rho^{v \to u}(V')\]
Using sum as the aggregation function, the aggregate of updated node attribute values is \(\myvec{\bar{v}}'=\myvec{v}'_1+\myvec{v}'_2+\myvec{v}'_3+\myvec{v}'_4\):
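Likewise for the global node aggregate in the sketch:

```python
# Step 5: bar_v' = component-wise sum over the updated node attributes.
v_bar = graph["nodes"].sum(axis=0)
```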
(6) Update global attribute value
Finally, we update the global attribute value as a function of the global aggregate values across edges (step 4) and nodes (step 5), and the current global attribute value:
\[\myvec{u}' = \phi^u\left(\myvec{\bar{e}}', \myvec{\bar{v}}', \myvec{u}\right)\]
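And the corresponding sketch of the global update, with `phi_u` an arbitrary illustrative function:

```python
def update_globals(graph, e_bar, v_bar, phi_u):
    # Step 6: u' = phi_u(bar_e', bar_v', u).
    return {**graph, "globals": phi_u(e_bar, v_bar, graph["globals"])}
```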
Steps 1-3 can be carried out in parallel at each node or edge as appropriate.
Composing GN blocks
The three functions at the heart of the GN block, \(\phi^e, \phi^v, \phi^u\), can take many forms. Typically each is a single linear layer, but it could also be a multi-layer (deep) network, or a CNN when the attributes are images.
Within a GN block, steps 1-3 operate locally, while steps 4-6 operate globally. Just as for a CNN, we increase the scale of the local updates by composing (layering) GN blocks, as illustrated below, where \(M\) GN blocks are composed, taking the graph \(G_0\) as input and producing the graph \(G_M\) as output:
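Tying the sketches together, a complete GN block and a composition of \(M\) blocks might look like the following. The single-linear-layer \(\phi\) functions and the choice of max and sum for the global aggregations simply mirror the running example; the random weights are placeholders.

```python
def gn_block(graph, phi_e, phi_v, phi_u):
    # One GN block: the six steps applied in order, using the helpers above.
    graph = update_edges(graph, phi_e)                  # step 1
    agg = aggregate_edges_per_node(graph)               # step 2
    graph = update_nodes(graph, agg, phi_v)             # step 3
    e_bar = graph["edges"].max(axis=0)                  # step 4 (max, as in the example)
    v_bar = graph["nodes"].sum(axis=0)                  # step 5 (sum, as in the example)
    return update_globals(graph, e_bar, v_bar, phi_u)   # step 6

# Illustrative phi functions: single linear layers with placeholder weights.
rng = np.random.default_rng(1)
W_e = rng.normal(size=(3, 12))   # input: e (3) + v_r (3) + v_s (3) + u (3)
W_v = rng.normal(size=(3, 9))    # input: bar_e' (3) + v (3) + u (3)
W_u = rng.normal(size=(3, 9))    # input: bar_e' (3) + bar_v' (3) + u (3)
phi_e = lambda e, vr, vs, u: W_e @ np.concatenate([e, vr, vs, u])
phi_v = lambda e_bar, v, u: W_v @ np.concatenate([e_bar, v, u])
phi_u = lambda e_bar, v_bar, u: W_u @ np.concatenate([e_bar, v_bar, u])

# Compose M = 3 blocks: G_0 -> G_1 -> G_2 -> G_3.
G = graph
for _ in range(3):
    G = gn_block(G, phi_e, phi_v, phi_u)
```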
Physics-based example
Battaglia et al. [1] give a physics-based example to motivate the six steps of the GN block.
Imagine we have a set of rubber balls connected by springs and subject to gravity. We can represent this as a graph, as shown below:
A simulation of the balls in motion could be achieved via the six steps of a GN block as follows:
| GN block step | Physical process |
|---|---|
| Step 1: Update edge attribute values | Forces between balls |
| Step 2: Aggregate edge attribute values at receiving nodes | Sum of forces (net force) on each ball |
| Step 3: Update node attribute values | Position, velocity and kinetic energy of each ball |
| Step 4: Aggregate edge attribute values globally | Sum of per-node forces (which should be zero) and potential energy stored in the springs |
| Step 5: Aggregate node attribute values globally | Total kinetic energy of the system |
| Step 6: Update global attribute value | Net forces and total energy of the system |
References
[1] Battaglia, P. W., et al. Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261, 2018.