Large Scale Distributed LLM Inference with LLM D and Kubernetes by Abdel Sghiouar

Presenters: Abdel Sghiouar
Source: Devoxx Belgium 2025

Scaling LLMs: A Kubernetes Deep Dive 🚀💡👨‍💻

Deploying Large Language Models (LLMs) in production is no longer a simple matter of scaling web applications. It demands a fundamentally new approach, and Kubernetes is emerging as the central orchestration platform. Let’s dive into the latest techniques and innovations for mastering LLM deployments!

1. Deploying Small LLMs on Kubernetes: A Balancing Act 🛠️

Getting started with smaller LLMs (around 1 billion parameters) on Kubernetes can be surprisingly tricky. The presentation highlighted a practical approach using the vLLM framework, demonstrating how to leverage Kubernetes for container orchestration and kubectl for management, as sketched below.

...
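As a minimal sketch of that pattern, the snippet below uses the official Kubernetes Python client to create a Deployment running the vLLM OpenAI-compatible server for a small model. The image tag, model name (Qwen/Qwen2.5-1.5B-Instruct), resource sizing, and namespace are illustrative assumptions, not the exact configuration shown in the talk.

```python
from kubernetes import client, config

# Load local kubeconfig (use config.load_incluster_config() inside a cluster).
config.load_kube_config()

# vLLM's OpenAI-compatible server listens on port 8000 by default.
# Model and resource values here are placeholders for a ~1B-parameter setup.
container = client.V1Container(
    name="vllm",
    image="vllm/vllm-openai:latest",
    args=["--model", "Qwen/Qwen2.5-1.5B-Instruct", "--max-model-len", "4096"],
    ports=[client.V1ContainerPort(container_port=8000)],
    resources=client.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "8Gi"},
        limits={"nvidia.com/gpu": "1"},  # assumes a GPU node pool is available
    ),
)

template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(labels={"app": "vllm-small"}),
    spec=client.V1PodSpec(containers=[container]),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="vllm-small"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "vllm-small"}),
        template=template,
    ),
)

# Create the Deployment; kubectl can then be used to inspect and manage it.
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

Once the pod is ready, something like `kubectl port-forward deploy/vllm-small 8000:8000` exposes the OpenAI-compatible API locally for a quick smoke test.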

October 6, 2025 · 4 min