ONNX (Open Neural Network Exchange) Format

The Open Neural Network Exchange (ONNX) format defines a standardized representation for machine learning models, enabling interoperability between different frameworks. Exporting a model to ONNX lets developers move from training in a framework such as PyTorch to deployment on a high-performance inference engine.


The Role of ONNX

ONNX acts as a universal intermediate representation, mapping machine learning models to standardized execution graphs.

Interoperability

Deep learning development is often split between training and deployment. While researchers prefer frameworks like PyTorch for their dynamic graphs and debugging flexibility, production environments often require optimized deployment on diverse hardware (such as mobile devices or edge GPUs). Historically, moving a model between these environments often meant rewriting the model code for the target platform, which was slow and prone to conversion errors.

The Open Neural Network Exchange (ONNX) format addresses this by establishing an open-source standard for model representation that serves as a common intermediate representation (IR). Models trained in PyTorch, TensorFlow, or MXNet can be exported to a single .onnx file, which can then be loaded and executed by optimized runtimes on whatever hardware those runtimes support.

Computation Graph Representation

An ONNX model represents a neural network as a Directed Acyclic Graph (DAG). Each node in the graph represents a mathematical operator (such as a convolution, matrix multiplication, or activation), and the edges represent the tensors flowing between them. ONNX defines a standardized library of operators (called ONNX Operators) with strict input, output, and type definitions.

This graph representation is independent of the source framework. The model is serialized with Protocol Buffers (protobuf), a compact binary format that can be read and written from many programming languages. Because framework-specific API calls are translated into one unified operator graph, downstream compilers and runtimes can analyze the whole graph and apply optimizations such as constant folding and operator fusion.

Exporting PyTorch Models to ONNX

Exporting a model from PyTorch to ONNX involves tracing its execution with a dummy input and, where input sizes vary at inference time, declaring those dimensions as dynamic.

PyTorch Export API

This PyTorch script demonstrates how to export a multi-layer perceptron to an ONNX model file using the torch.onnx.export API:

<pre><code class="language-python">import torch
import torch.nn as nn

class TargetMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.net(x)

model = TargetMLP().eval()  # Ensure model is in evaluation mode

# Create a dummy input matching the expected shape: [batch_size, input_dim]
dummy_input = torch.randn(1, 784)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "C:/Users/yuvra/Documents/projects/Pexcera/scratch/mlp.onnx",
    export_params=True,        # Store trained parameter weights inside the file
    opset_version=12,          # Target ONNX operator set version
    input_names=['input'],     # Define graph input node names
    output_names=['output']    # Define graph output node names
)

print("ONNX model exported successfully.")</code></pre>

Dynamic Axes

By default, PyTorch's ONNX exporter traces the computational graph using the exact dimensions of the dummy input. If the dummy input has a batch size of 1, the exported ONNX model will be locked to a batch size of 1. Attempting to run inference on a batch of 32 samples will trigger a runtime error. To prevent this, we must declare dynamic axes.

The dynamic_axes argument takes a dictionary that maps each input or output name to the axes that may vary during inference (such as batch size or sequence length), for example {'input': {0: 'batch_size'}}. The exporter then records these dimensions as named symbolic dimensions (e.g., "batch_size") in the ONNX graph instead of fixed sizes, so the runtime engine accepts any size along them and allocates memory accordingly.

Verification and Runtime Execution

ONNX models can be validated structurally and executed using the optimized ONNX Runtime engine.

Model Validation

Before deploying an exported ONNX model, it is important to validate its graph structure. We can import the onnx library in Python and run onnx.checker.check_model("model.onnx"), which raises an exception if the model is invalid. The checker verifies that the model graph is structurally valid, confirming that all node inputs and outputs match operator schemas, and that there are no cycles or missing links in the DAG.

For visual verification, developers can use a tool such as Netron. Netron is a visualizer that renders the ONNX node graph, showing the input and output tensor shapes, operator attributes (e.g., convolution stride and padding), and the connections between layers. Visualizing the graph helps verify that dynamic axes and operators were exported correctly.

ONNX Runtime (ORT)

To execute the exported model, we use the ONNX Runtime (ORT) engine. ORT is a high-performance inference engine that runs ONNX models on diverse hardware backends through pluggable execution providers (such as the CUDA, TensorRT, or default CPU execution providers). This script demonstrates how to execute the model using ONNX Runtime:

<pre><code class="language-python">import onnxruntime as ort
import numpy as np

# Initialize the ONNX Runtime session
# This loads the model, optimizes the graph, and prepares execution kernels
ort_session = ort.InferenceSession(
    "C:/Users/yuvra/Documents/projects/Pexcera/scratch/mlp.onnx",
    providers=['CPUExecutionProvider']
)

# Generate input numpy array (must match name and shape of exported model)
# A batch size of 5 only works if the batch axis was exported as dynamic (see dynamic_axes)
x_in = np.random.randn(5, 784).astype(np.float32)

# Run inference
inputs = {ort_session.get_inputs()[0].name: x_in}
outputs = ort_session.run(None, inputs)

print("ONNX Runtime output shape:", outputs[0].shape)  # (5, 10)
print("Inference execution completed successfully.")</code></pre>