Graph Neural Networks (GNNs) Concept

Graph Neural Networks (GNNs) extend deep learning to non-Euclidean graph-structured data. By mapping nodes, edges, and their connections through graph convolutions, GNNs capture relationships in complex networks like social graphs and molecules.

Introduction to Graph Structured Data

Graphs represent non-Euclidean relations using nodes and edges, requiring different processing architectures than standard grid-based CNNs.

Graphs vs. Grids

Standard deep learning architectures (like CNNs and RNNs) are designed for Euclidean data. Images are 2D grids of pixels, and text is a 1D sequence of words. In these structures, the spatial relations are fixed: a pixel always has a top, bottom, left, and right neighbor. However, many real-world datasets are non-Euclidean, structured as graphs with arbitrary connectivity. Examples include social networks, chemical molecules, and citation networks.

In a graph, the number of neighbors for a node can vary from zero to millions, and there is no fixed spatial orientation. Applying standard CNNs to graphs is impossible because we cannot define a fixed-size sliding filter window. Graph Neural Networks address this by processing nodes and edges directly, adapting computations to the graph's local structure.

Adjacency and Feature Matrices

A graph is defined as \\( G = (V, E) \\), where \\( V \\) is a set of vertices (nodes) and \\( E \\) is a set of edges connecting those nodes. Mathematically, the graph structure is represented by an Adjacency Matrix \\( \\mathbf{A} \\in \\mathbb{R}^{|V| \\times |V|} \\), where \\( A_{i,j} = 1 \\) if there is an edge between node \\( i \\ and node \\( j \\), and 0 otherwise. If the graph is weighted, \\( A_{i,j} \\) contains the weight value.

Each node \\( i \\) is associated with a feature vector \\( \\mathbf{x}_i \\), grouped into a Node Feature Matrix \\( \\mathbf{X} \\in \\mathbb{R}^{|V| \\times D} \\), where \\( D \\) is the number of input features per node. A GNN uses both the feature matrix \\( \\mathbf{X} \\) and the structural adjacency matrix \\( \\mathbf{A} \\) to update node representations, integrating features and connections in its forward pass.

Graph Convolution Networks (GCNs)

Graph Convolutional Networks define spatial convolutions that aggregate feature representations from immediate node neighborhoods.

Spectral vs. Spatial Graph Convolutions

GNNs are divided into two main categories: spectral and spatial methods. Spectral methods define graph convolutions in the Fourier domain using the graph Laplacian. While mathematically elegant, spectral convolutions are computationally expensive because they require calculating the eigenvectors of the Laplacian matrix, which has \\( O(|V|^3) \\) complexity. This prevents them from scaling to large graphs.

Spatial methods, pioneered by Kipf and Welling (2016) in their Graph Convolutional Network (GCN) formulation, define convolutions directly in the spatial domain. They aggregate features from a node's immediate neighbors. This approach is highly efficient, scales to large graphs, and allows the model to process local subgraphs independently.

PyTorch Graph Layer

Kipf & Welling's spatial Graph Convolution layer formulation is defined as: \\( \\mathbf{H}^{(l+1)} = \\sigma(\\mathbf{\\tilde{D}}^{-1/2} \\mathbf{\\tilde{A}} \\mathbf{\\tilde{D}}^{-1/2} \\mathbf{H}^{(l)} \\mathbf{W}^{(l)}) \\), where \\( \\mathbf{\\tilde{A}} = \\mathbf{A} + \\mathbf{I}_N \\) is the adjacency matrix with added self-loops, and \\( \\mathbf{\\tilde{D}} \\) is its diagonal degree matrix. This PyTorch module implements this operation:

<pre><code class="language-python">import torch import torch.nn as nn class GCNLayer(nn.Module): def __init__(self, in_features, out_features): super().__init__() # Learnable projection weights self.weight = nn.Parameter(torch.FloatTensor(in_features, out_features)) nn.init.xavier_uniform_(self.weight) def forward(self, x, adj): """ Args: x: Node features [num_nodes, in_features] adj: Normalized adjacency matrix [num_nodes, num_nodes] """ # Step 1: Linear transformation of node features # h shape: [num_nodes, out_features] h = torch.matmul(x, self.weight) # Step 2: Aggregate neighbor features using the adjacency matrix # out shape: [num_nodes, out_features] out = torch.matmul(adj, h) return out # Example execution layer = GCNLayer(in_features=16, out_features=32) nodes_feat = torch.randn(5, 16) # 5 nodes, 16 features each # Symmetric normalized adjacency matrix (with self-loops) adj_normalized = torch.eye(5) + 0.1 * torch.ones(5, 5) out_feats = layer(nodes_feat, adj_normalized) print("Output node representations shape:", out_feats.shape) # [5, 32]</pre>

Graph Machine Learning Tasks

GNNs are applied to node classification, link prediction, and graph classification, utilizing invariance constraints.

Node, Edge, and Graph-Level Tasks

Graph machine learning tasks fall into three main categories: node classification, link prediction (edge classification), and graph classification. Node classification predicts labels for individual nodes (such as identifying fraud accounts in a transaction network). The GNN outputs a feature representation for each node, which is passed to a classification head.

Link prediction determines whether an edge exists between two nodes (such as recommender systems predicting user-item interactions). Graph classification assigns a label to the entire graph (such as classifying chemical molecules as toxic or non-toxic). For graph-level tasks, we aggregate all node vectors using a global pooling layer (such as global mean pooling) before classification.

Permutation Equivariance and Invariance

A critical constraint for graph algorithms is permutation equivariance and invariance. In a graph, there is no natural ordering of the nodes. If we reorder the rows and columns of the adjacency matrix, the underlying graph remains identical. For node classification, the GNN must be permutation equivariant: if we permute the input nodes, the output representations must permute in the exact same order.

For graph classification, the model must be permutation invariant: the output prediction must remain identical regardless of the node ordering. Standard MLPs are not invariant to input permutation, which is why GNN architectures must rely on symmetric aggregators (like mean or sum) to process neighbor features.