Nunchaku
Nunchaku is a high-performance inference engine optimized for 4-bit neural networks, based on the SVDQuant paper. It significantly reduces memory usage and accelerates AI inference for diffusion models.
About
Nunchaku is a high-performance inference engine designed for 4-bit neural networks. It implements SVDQuant, a post-training quantization technique described in the accompanying ICLR 2025 Spotlight paper, which quantizes both weights and activations to 4 bits while preserving visual fidelity. SVDQuant does this by shifting activation outliers into the weights and absorbing them with a low-rank branch kept in 16-bit precision (obtained via singular value decomposition), so that only the residual has to be quantized to 4 bits. For the 12B-parameter FLUX.1-dev model, the engine reduces memory usage by up to 3.6×. On a 16GB laptop 4090 GPU it delivers an 8.7× speedup over the 16-bit model, largely because the 4-bit model fits in GPU memory and no longer needs CPU offloading. Nunchaku also supports LoRA, ControlNet, asynchronous offloading, and ComfyUI integration, making it a versatile tool for accelerating diffusion models.
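To make the SVDQuant idea more concrete, here is a minimal NumPy sketch of the decomposition for a single linear layer. It simulates quantization in floating point ("fake" quantization) rather than using real 4-bit kernels, and it rests on several simplifying assumptions: a SmoothQuant-style smoothing factor with a hypothetical `alpha`, one quantization scale per row instead of the paper's per-group scales, and illustrative function names (`quantize_int4`, `svdquant_linear`) that are not part of Nunchaku's API.

```python
import numpy as np


def quantize_int4(a, axis):
    """Symmetric 4-bit fake quantization: round to integers in [-8, 7] using one
    scale per slice along `axis`, then dequantize back to floating point."""
    max_abs = np.max(np.abs(a), axis=axis, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    q = np.clip(np.round(a / scale), -8, 7)
    return q * scale


def svdquant_linear(x, w, rank=16, alpha=0.5):
    """Approximate y = x @ w.T in the spirit of SVDQuant (simplified sketch).

    x: activations, shape (tokens, in_features)
    w: weights, shape (out_features, in_features)
    rank, alpha: hypothetical defaults chosen only for illustration.
    """
    # 1. Smoothing: divide each input channel of x by lam and multiply the matching
    #    weight column by lam. x @ w.T is mathematically unchanged, but activation
    #    outliers are migrated into the weights.
    act_max = np.maximum(np.max(np.abs(x), axis=0), 1e-8)
    w_max = np.maximum(np.max(np.abs(w), axis=0), 1e-8)
    lam = act_max**alpha / w_max ** (1 - alpha)
    x_s = x / lam
    w_s = w * lam

    # 2. Low-rank branch: a rank-r SVD approximation absorbs the dominant
    #    (outlier-heavy) part of the smoothed weights. It is not quantized here
    #    (the paper keeps it in 16-bit precision).
    u, s, vt = np.linalg.svd(w_s, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    residual = w_s - low_rank

    # 3. 4-bit branch: quantize the residual weights (one scale per output channel)
    #    and the smoothed activations (one scale per token), then add both branches.
    y_low_rank = x_s @ low_rank.T
    y_4bit = quantize_int4(x_s, axis=1) @ quantize_int4(residual, axis=1).T
    return y_low_rank + y_4bit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 256))
    x[:, :4] *= 100.0  # a few input channels with large activation outliers
    w = rng.standard_normal((128, 256))
    y_ref = x @ w.T

    # Baseline: plain per-row 4-bit quantization of both weights and activations.
    naive = quantize_int4(x, axis=1) @ quantize_int4(w, axis=1).T
    approx = svdquant_linear(x, w, rank=16)
    for name, y in [("naive W4A4", naive), ("SVDQuant-style", approx)]:
        rel_err = np.linalg.norm(y - y_ref) / np.linalg.norm(y_ref)
        print(f"{name}: relative error {rel_err:.4f}")
```

The key design choice is that the rank-r branch stays in high precision and carries the outlier-heavy part of the weights, so the 4-bit branch only has to cover a residual with a much smaller dynamic range. Nunchaku's real speedups come from GPU kernels that fuse the low-rank branch into the low-bit branch to avoid redundant memory access, which this sketch does not model.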
Pricing & Plans
Open Source
Free