Pyllama
Pyllama is an open-source coding & development tool for running LLaMA models on a single consumer-grade GPU. It supports quantization for efficient inference on GPUs with as little as 4 GB of memory.
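The listing does not show Pyllama's own quantization code, which may use a more elaborate method (such as GPTQ) than the one below. As a minimal sketch of the general idea only, this Python applies plain round-to-nearest k-bit quantization to one group of weights. The functions `quantize` and `dequantize` and the sample weights are hypothetical, and this is not Pyllama's implementation.

```python
from typing import List, Tuple


def quantize(weights: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Asymmetric round-to-nearest quantization of one weight group (hypothetical helper).

    Returns integer codes in [0, 2**bits - 1] plus the scale and the group
    minimum (used as the zero point) needed to dequantize.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    levels = 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant group, which would otherwise give a zero scale.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    # Round each weight to the nearest level and clamp to the valid code range.
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [w_min + c * scale for c in codes]


if __name__ == "__main__":
    # Hypothetical weights; real models quantize millions of values per layer.
    weights = [-0.42, -0.10, 0.03, 0.27, 0.55, -0.31, 0.12, 0.49]
    for bits in (8, 4, 3, 2):
        codes, scale, zero = quantize(weights, bits)
        approx = dequantize(codes, scale, zero)
        max_err = max(abs(a - w) for a, w in zip(approx, weights))
        print(f"{bits}-bit: max abs error = {max_err:.4f}")
```

Because each weight is rounded to the nearest of `2**bits` levels, the reconstruction error per weight is at most half the scale. Fewer bits means a coarser scale and larger errors, which is the accuracy trade-off behind the memory savings.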
About
Pyllama is an open-source project that provides a hacked version of Facebook's LLaMA language model implementation, modified to run on a single consumer-grade GPU. It supports 2-bit, 3-bit, 4-bit, and 8-bit model quantization, which substantially reduces memory requirements and allows the 7B LLaMA model to run on GPUs with as little as 4 GB of memory. The tool supports both the official and community-based methods for downloading the LLaMA model files. It includes scripts for single- and multi-GPU inference, as well as integrations with Gradio (for a web UI) and Flask (for a web server). It also facilitates model fine-tuning with datasets such as Stanford Alpaca.
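A rough calculation shows why the bit width matters for the 4 GB figure. The sketch below estimates the storage needed for the weights alone. It assumes a nominal 7e9 parameters for "7B" (the exact count differs slightly), and it ignores activations, the KV cache, CUDA context, and quantization metadata such as scales. The helper `weight_memory_gib` is hypothetical.

```python
GIB = 2 ** 30  # bytes per GiB


def weight_memory_gib(num_params: float, bits: int) -> float:
    """Approximate weight storage in GiB (hypothetical helper).

    Counts only the weights themselves, so real GPU usage will be higher.
    """
    return num_params * bits / 8 / GIB


NUM_PARAMS = 7e9  # assumption: nominal size of a "7B" model

if __name__ == "__main__":
    for bits in (16, 8, 4, 3, 2):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions, 16-bit weights need about 13 GiB and 8-bit weights about 6.5 GiB, so neither fits on a 4 GB card. At 4 bits the weights need about 3.3 GiB, which is consistent with the claim that a quantized 7B model can run in 4 GB, though with little headroom for everything else.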
Pricing & Plans
Open Source: Free