ExLlamaV2
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs. It is built for fast inference and supports several quantization formats, including its own EXL2 format and GPTQ. It also provides dynamic batching and smart prompt caching for efficient generation.
At a glance