Triton Inference Server
Triton Inference Server is an open-source inference serving software that optimizes AI inferencing for cloud and edge environments. It supports multiple deep learning and machine learning frameworks.
About
Triton Inference Server is open-source inference serving software designed to streamline AI inferencing across cloud, data center, edge, and embedded environments. It supports a wide range of deep learning and machine learning frameworks, including TensorRT, PyTorch, ONNX, OpenVINO, and Python. Triton optimizes performance for several query types: real-time, batched, ensemble, and audio/video streaming. Key features include concurrent model execution, dynamic batching, sequence batching for stateful models, and a Backend API for custom operations. It also provides HTTP/REST and gRPC inference protocols, C and Java APIs for in-process use cases, and metrics covering GPU utilization and server latency. Triton is part of NVIDIA AI Enterprise, which offers enterprise support.
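To make the HTTP/REST protocol mentioned above more concrete, here is a minimal sketch of how a client might assemble an inference request using only the Python standard library. It assumes the KServe v2 style endpoint layout (/v2/models/<model>/infer) on the default HTTP port 8000. The model name "my_model", the input tensor name "INPUT0", and the helper function names are hypothetical placeholders, not taken from this listing. A real model's input names, shapes, and datatypes come from its own configuration.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, input_name, data, datatype="FP32"):
    """Return (url, body_bytes) for a KServe v2 style inference call.

    Assumption: the model takes one input tensor of shape [1, len(data)].
    """
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                "name": input_name,        # hypothetical tensor name
                "shape": [1, len(data)],   # batch of one row
                "datatype": datatype,
                "data": list(data),        # flattened tensor contents
            }
        ]
    }
    return url, json.dumps(payload).encode("utf-8")


def send_infer_request(url, body, timeout=5.0):
    """POST the request and return the decoded JSON response.

    Needs a running server; it is not called in the demo below.
    """
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical server address, model name, and input name.
    url, body = build_infer_request(
        "http://localhost:8000", "my_model", "INPUT0", [1.0, 2.0, 3.0]
    )
    print(url)
    print(body.decode("utf-8"))
```

Building the request separately from sending it lets the payload be inspected without a running server. With a server available, passing the same url and body to send_infer_request would return the JSON response, whose output tensors depend on the deployed model.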
Pricing & Plans
Open Source
Free