From Manual to Managed: Prometheus Agent Deployment at Scale - Mihail Mihaylov

Presenters

Mihail Mihaylov

Source

PromCon EU 2025

🚀 Scaling Observability: How MariaDB Conquered the Prometheus Deployment Challenge 🤖

Let’s be honest: setting up monitoring, especially something as powerful as Prometheus, can feel like wrestling an octopus. 🐙 It’s complex, it’s demanding, and it’s definitely not something you want to do by hand at scale. That’s exactly the problem the team at MariaDB, the database company, faced when rolling out Prometheus agents to customer environments, and the solution they developed is a useful case study for anyone tackling similar challenges. 💡

The Initial Struggle: A Manual Mess 🤯

MariaDB’s journey started with a frustrating reality: getting their core monitoring system, built on the Thanos and Grafana stack, into the hands of their customers was proving difficult. Early deployments were plagued by network quirks, restrictions on where agents could run, and limited control for the customer. The process was manual, slow, and frankly overwhelming. They realized they needed a fundamentally different approach, one that prioritized ease of use alongside robust functionality.

The MariaDB Solution: Four Paths to Prometheus 🗺️

Instead of a one-size-fits-all solution, MariaDB smartly developed four distinct deployment methods, each designed to address a specific customer need and environment. Let’s break down these approaches:

Cloudflare-Based (The Workhorse): This was their primary strategy, and for good reason. By leveraging Cloudflare’s incredible infrastructure – geo-load balancing, customer-specific tokens, and static context package delivery – they dramatically reduced latency and cost. Think of it as a super-efficient delivery service for your metrics. 📦
Helm Charts & GitOps (Speed & Consistency): For customers already using Kubernetes, Helm charts, combined with the GitOps workflow managed by Argo CD, provided a rapid and consistent way to deploy Prometheus agents. A single “global values file” ensured everyone got the same configuration, simplifying management. Deployments were described as “super easy” and taking just seconds! ⏱️
Ansible (On-Premise Power): Recognizing that not all customers had access to Kubernetes, MariaDB created a custom Ansible package. Ansible is an agentless automation tool that runs tasks on remote hosts, typically over SSH, which let them deploy Prometheus agents, node exporters, and other components directly to on-premise servers. 🦾
Cluster API & Kamaji (The Advanced Route): For customers who run Kubernetes but want to minimize MariaDB’s control plane footprint, they used Cluster API and Kamaji. This approach allowed them to deploy Prometheus agents as Kubernetes pods, managed by a centralized control plane. 🌐
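
To make the “global values file” idea from the Helm route a bit more concrete, here is a minimal Python sketch of how shared defaults and a small per-customer override could be layered. It mirrors the general way Helm combines values files (later files win, nested maps merge key by key, other values are replaced), but it is not MariaDB’s actual tooling. Every key, URL, and customer name below is a made-up placeholder, and Helm-specific details such as deleting a key with null are not modeled.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: nested maps merge key by key, anything else is replaced."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical shared defaults, standing in for the "global values file".
GLOBAL_VALUES = {
    "agent": {"scrapeInterval": "30s", "replicas": 1},
    "remoteWrite": {"url": "https://metrics.example.invalid/api/v1/receive"},
}

# Hypothetical per-customer overrides; only the differences are listed.
CUSTOMER_VALUES = {
    "agent": {"replicas": 2},
    "remoteWrite": {"tenant": "customer-a"},
}

effective = deep_merge(GLOBAL_VALUES, CUSTOMER_VALUES)
print(effective)
```

The point of the design is that each customer file stays tiny: anything not overridden is inherited from the shared defaults, so a change to the global file reaches every deployment on the next sync.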

Challenges & Tradeoffs: It’s Not Always Smooth Sailing 🚧

Of course, scaling any complex system comes with challenges. MariaDB faced hurdles like:

Network Connectivity: Getting data from agents outside the core cluster was a persistent concern.
Customer Restrictions: Limited access to Kubernetes and restrictions on system modifications meant adapting their approach.
Management Overhead (Ansible): The Ansible method, while necessary, introduced significant management complexity due to its task-driven nature.

They acknowledged these tradeoffs openly, prioritizing customer satisfaction and a manageable solution. As the speaker put it, “The good design always wins,” a reminder to focus on a well-thought-out deployment process.

Key Takeaways & Metrics 🎯

One Year Traction: Within just one year, MariaDB’s monitoring system gained significant traction, with more customers adopting and integrating it.
Automation is Key: The focus shifted from simply maintaining the core system to automating agent deployment and management.

Conclusion: A Pragmatic Approach to Observability ✨

MariaDB’s journey isn’t about chasing the latest shiny technology. It’s about a pragmatic approach – understanding customer constraints, tailoring solutions, and prioritizing ease of use. By combining strategic infrastructure choices (like Cloudflare), leveraging automation tools (Helm, Argo CD), and embracing sophisticated orchestration (Cluster API/Kamaji), MariaDB has built a scalable and customer-centric observability platform. It’s a fantastic example of how to turn a complex challenge into a resounding success. 🏆
