mllm
mllm is a fast and lightweight multimodal LLM inference engine for mobile and edge devices. It enables developers to run large language models efficiently on resource-constrained hardware.
About
mllm is a fast, lightweight multimodal LLM inference engine designed for mobile and edge devices. It lets developers deploy and run large language models (LLMs) on resource-limited hardware and handles both text and image inputs. Its main features are a Pythonic eager-execution API for rapid model development; unified hardware support across Arm CPUs, OpenCL GPUs, and Qualcomm QNN NPUs; and optimizations such as quantization, pruning, and speculative execution. It works with checkpoints from popular community frameworks, converting PyTorch and SafeTensors models into its own optimized format. mllm also ships a deployment toolkit, including an SDK and a CLI inference tool, covering the path from model conversion to on-device inference on mobile platforms.
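Quantization is listed among mllm's optimizations. As a generic illustration only (this is not mllm's API or its actual weight format, neither of which the listing describes), the sketch below shows symmetric per-group int8 quantization. Each group of weights shares one float scale, and each weight is rounded to an integer in [-127, 127]. The names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric per-group int8 quantization (illustrative sketch, not mllm's scheme)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 127; an all-zero group gets scale 1.0.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Round to the nearest integer and clamp to the symmetric int8 range.
            codes.append(max(-127, min(127, round(w / scale))))
    return codes, scales


def dequantize_int8(codes: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Recover approximate float weights from int8 codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0, 2.0, -0.5, 1.0, 0.1]
    codes, scales = quantize_int8(weights, group_size=4)
    restored = dequantize_int8(codes, scales, group_size=4)
    print(codes)
    print([round(r, 3) for r in restored])
```

Storing one int8 code per weight plus one float scale per group takes roughly a quarter of the space of float32 weights, at the cost of a rounding error of at most half a scale step per weight. Smaller groups track local weight ranges more closely but store more scales.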
Pricing & Plans
Open Source
Free