tiny-llm
tiny-llm is a Research & Education tool that offers a course on LLM inference serving on Apple Silicon. It guides system engineers through building a tiny vLLM using MLX and Qwen.
About
tiny-llm is a course for system engineers who want to learn LLM inference serving, tailored specifically to Apple Silicon. The curriculum guides participants through building a tiny vLLM using MLX and Qwen. The codebase relies mainly on MLX array/matrix APIs, so participants build the model-serving infrastructure from scratch and see how each optimization works. The course covers core components such as attention, RoPE, the KV cache, and continuous batching. Its roadmap extends to advanced topics such as Paged Attention and Speculative Decoding. It is aimed at readers who want to understand the techniques behind serving large language models efficiently.
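The KV cache is one of the components the course covers. The following is a rough illustration only. It is a minimal sketch in plain Python, not MLX and not taken from the tiny-llm codebase, of single-head scaled dot-product attention with an append-only KV cache. At each decode step the new token's key and value are stored, and the query attends over every cached position, so earlier keys and values are reused instead of recomputed. The names `KVCache`, `attend`, and `decode_step` are hypothetical. The sketch assumes one layer, one attention head, and one sequence.

```python
import math
from typing import List

# Hypothetical illustration; not taken from the tiny-llm codebase.
Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention for a single query vector.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the attention-weighted sum of the value vectors.
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


class KVCache:
    """Append-only cache of key/value vectors for one layer, one head, one sequence."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def append(self, k: Vector, v: Vector) -> None:
        self.keys.append(k)
        self.values.append(v)

    def __len__(self) -> int:
        return len(self.keys)


def decode_step(cache: KVCache, q: Vector, k: Vector, v: Vector) -> Vector:
    # Store the new token's key/value, then attend over all cached positions.
    # Keys/values of earlier tokens are reused, not recomputed.
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)


# Usage: two decode steps against the same cache.
cache = KVCache()
out1 = decode_step(cache, [1.0, 0.0], [1.0, 0.0], [2.0, 4.0])
# One cached position, so its attention weight is 1.0 and out1 == [2.0, 4.0].
out2 = decode_step(cache, [0.0, 0.0], [0.0, 1.0], [4.0, 8.0])
# A zero query gives equal scores, so out2 is the mean of the values: [3.0, 6.0].
```

In a serving engine these per-element loops would become batched array operations (the role MLX plays in the course), and the cache would hold tensors for every layer and head.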
Pricing & Plans
Open Source
Free