TensorRT-LLM
TensorRT-LLM is an open-source NVIDIA library for optimizing and serving Large Language Models (LLMs) efficiently on NVIDIA GPUs. It provides a Python API for defining LLMs and building optimized inference engines. It also supports performance optimizations such as in-flight batching, paged KV caching, and quantization.