ONNX, ONNX Runtime, and TensortRT
- Data, AI & Analytics
ONNX, ONNX Runtime, and TensortRT
What is ONNX?
ONNX(Open Neural Network Exchange) defines a common set of operators – the building blocks of machine learning and deep learning models – and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.

ONNX Design Principles
- Support DNN but also allows for traditional ML
- Flexible enough to keep up with rapid advances
- Compact and cross-platform representation
- Standardized list of well-defined operators informed by real-world usage
Export to ONNX
- Tensorflow to ONNX:
| 1 2 | <span style="font-size: 1rem;"></span>!pip install git+https://github.com/onnx/tensorflow-onnx !python -m tf2onnx.convert --saved-model /content/model.tf --output tfmodel.onnx | 
- Pytorch to ONNX:
| 1 2 3 | from torch.autograd import Variable dummy_input = Variable(torch.randn(1,3,28,28)) torch.onnx.export(model_name, dummy_input, "model_pt.onxx") | 
Load ONNX model
| 1 2 | import onnx onnx_model = onnx.load('tfmodel.onnx') | 
What is ONNX Runtime?
It is the High-Performance Inference engine for ONNX models founded and open-sourced by Microsoft under MIT License. It is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms.
While designing ONNX Runtime, they mainly focus on performance and scalability in order to support heavy workloads in high-scale production scenarios. So, it is supported on different Operating Systems and hardware platforms. The Execution Provider enables easy integration with Hardware accelerators.

Installation
| 1 | !pip install onnxruntime | 
Create Inference session to run onnx model in onnxruntime
| 1 2 | import onnxruntime session_tf = onnxruntime.InferenceSession('model_path.onnx') | 
Run the session
| 1 2 3 4 5 | input_name = session_tf.get_inputs()[0].name output_name = session_tf.get_outputs()[0].name results_ort = session_tf.run(                      [output_name], {input_name: X.astype(np.float32)}                     ) | 
ONNX Tool (Netron)
Netron is an open-source multi-platform visualizer of saved models. It supports many extensions for deep learning, machine learning, and neural network models.
NVIDIA TensorRT
It is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications.
TensorRT based applications are 40 times faster than CPU-only based platforms during inference. With this, we can optimize performance Neural Network models trained in all major frameworks.
Features

Precision Calibration
- 
- Maximizes throughput with FP16 or INT8 by quantizing models while preserving accuracy
- Quantization is an optimization method in which model parameters and activations are converted from a floating-point to a lower-precision representation i.e., from FP32 to FP16 or INT8.
 
Layer & Tensor Fusion
- 
- It Combines the several kernels so it executes at ones, so it is also called kernel fusion
- Kernel Fusion further classified into two types: Vertical Fusion and Horizontal Fusion
- In Vertical Fusion, layers with unused output are eliminated to avoid unnecessary computation
- In Horizontal layer fusion, layers that take the same source tensor and apply the same operations with similar parameters, result in a single larger layer for higher computational efficiency.
 
Kernel Auto-Tuning
- 
- Selects best data layers and algorithms based on the target GPU platform
 
Multi-Stream Execution
- 
- Process multiple inputs streams in parallel
 
Dynamic Tensor Memory
- 
- Memory is allocated for each tensor and only for the duration of its usage.
 
TensorRT is also integrated with application-specific SDKs such as NVIDIA DeepStream, Riva, Merlin™, Maxine™, and Broadcast Engine to provide developers a unified path to deploy intelligent video analytics, conversational AI, recommender systems, video conference, and streaming apps in production.

Related content
Auriga: Leveling Up for Enterprise Growth!
Auriga’s journey began in 2010 crafting products for India’s
