Eight definitions of “observability” vs. “monitoring” – and why none of them work - Björn Rabenstein

Presenters

Björn Rabenstein

Source

PromCon EU 2025

Decoding Observability: It’s Not Just a Buzzword 🚀

Let’s be honest: the word “observability” has been thrown around a lot lately. It’s plastered on dashboards, featured in marketing campaigns, and debated in countless tech circles. But is it truly a revolutionary concept, or just a fancy rebranding of something we’ve been doing for decades? In this PromCon EU 2025 talk, Björn Rabenstein, a seasoned veteran of distributed systems and Prometheus, cuts straight to the heart of the matter and offers a refreshingly grounded perspective. Let’s dive in and unpack what observability really means. 💡

From Monitoring to Understanding: A Historical Shift 🕰️

Our journey begins long before “observability” became a buzzword. The speaker traces the roots of this concept back to 2006 with Google’s early distributed tracing efforts – a project known as Dapper. Interestingly, the term itself didn’t gain traction until the 2016 SRE book, which, despite using “observability” only once, arguably planted the seed for the entire conversation. It’s a fascinating reminder that innovation often builds upon existing ideas, rather than springing from a vacuum. The speaker cleverly highlights that the SRE movement, and the book that fueled it, shifted the focus from simply detecting problems – monitoring – to understanding why they occur.

The Core Difference: What vs. Why 🧐

This is where things get really interesting. Monitoring is about knowing what is broken. It’s like a smoke detector – it alerts you when there’s a fire. Observability, on the other hand, is about understanding why the fire started and how to prevent it in the future. It’s about proactively identifying potential issues before they escalate. Think of it like a skilled mechanic diagnosing a car problem versus simply noticing the engine isn’t running. The key difference lies in the ability to explore and investigate – to delve deeper than just surface-level alerts.

Control Theory: The Unexpected Root 🤯

Here’s a surprising revelation: the term “observability” originally comes from control theory, a field that has been around for over 150 years. Control theory is all about regulating complex systems, and there a system counts as observable if its internal state can be inferred from its external outputs. That idea is what connects it to “unknown unknowns”, those unexpected events that can throw everything off balance. The speaker references Bryan Cantrill’s 2006 ACM Queue article, which uses the word “observability” 19 times, to underline this connection. It’s a powerful reminder that the concept isn’t new, but rather an application of principles that have been used to manage complex systems for decades.
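For readers who want to see the control-theory meaning in action, here is a minimal sketch of the textbook rank test for a linear system x' = Ax, y = Cx: the system is observable when the stacked matrix [C; CA; ...; CA^(n-1)] has rank n. The talk summary does not include this example. The function names and both sample systems are made up for illustration.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(rows):
    """Rank via Gaussian elimination, using exact fractions to avoid rounding."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate this column from every row below the pivot row.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_observable(A, C):
    """True if the state of x' = Ax can be reconstructed from the outputs y = Cx."""
    n = len(A)
    stacked, block = [], C
    for _ in range(n):  # stack C, CA, ..., CA^(n-1)
        stacked.extend(block)
        block = mat_mul(block, A)
    return rank(stacked) == n


# Hypothetical system 1: position and velocity are coupled, only position is measured.
coupled = is_observable([[0, 1], [-2, -3]], [[1, 0]])

# Hypothetical system 2: two independent states, only the first is measured.
decoupled = is_observable([[1, 0], [0, 2]], [[1, 0]])
```

In the first system the hidden velocity leaves a trace in the measured position, so the full state can be inferred. In the second, the unmeasured state never influences the output, so no amount of watching the output reveals it.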

Beyond Metrics, Logs, and Traces: A Holistic Approach 🛠️

Let’s be clear: distributed tracing, metrics, and logs are essential components of observability. However, the speaker rightly cautions against reducing observability to simply collecting these data points. It’s a more holistic approach – a way of thinking about your system as a whole, understanding its interactions, and anticipating potential problems. It’s about building a system that can be understood, not just one that produces data.

The “Mage” Hypothesis: A Playful Take 👾

To make a complex idea more memorable, the speaker playfully suggests the term “mage” as a better alternative to “observability.” It’s a clever observation – the term’s popularity likely stems from its novelty and the desire for a catchy, technical descriptor. It’s a reminder that sometimes, the best way to understand a concept is to approach it with a bit of humor.

Rating the Observability Landscape: A Practical Framework 🚦

To illustrate his point, the speaker introduces his own “Cooper Rating” system – a subjective assessment of observability based on Correctness, Concrete applicability, Usefulness, and Popularity. It’s a red, yellow, green traffic light system that provides a clear and concise way to evaluate the maturity of an observability strategy. It’s a valuable tool for anyone looking to assess their current approach and identify areas for improvement.
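As a rough illustration of how such a traffic-light rating could be written down, here is a small sketch. Only the four criteria and the three colors come from the talk summary. The class names, the `weakest` helper, and the sample values are assumptions made for this example, and the values are not the speaker's actual ratings.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Light(Enum):
    RED = 0
    YELLOW = 1
    GREEN = 2


@dataclass(frozen=True)
class Rating:
    """One traffic light per criterion named in the talk summary."""
    correctness: Light
    concrete_applicability: Light
    usefulness: Light
    popularity: Light

    def weakest(self) -> Light:
        # Assumption: the worst light is a handy one-word summary.
        # The summary does not say how (or whether) the speaker combines the four.
        return min((getattr(self, f.name) for f in fields(self)), key=lambda l: l.value)

    def describe(self) -> str:
        return ", ".join(f"{f.name}={getattr(self, f.name).name}" for f in fields(self))


# Invented sample values, purely to show the shape of the data.
example = Rating(
    correctness=Light.YELLOW,
    concrete_applicability=Light.RED,
    usefulness=Light.GREEN,
    popularity=Light.GREEN,
)
```

Calling `example.describe()` lists each criterion with its color, and `example.weakest()` returns the lowest light among the four.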

Tools of the Trade: Prometheus and Grafana 📡

The presentation highlights key technologies like Prometheus (a representative of monitoring systems), Grafana (for visualization and dashboards), and distributed tracing. These tools are undoubtedly valuable, but they’re just pieces of the puzzle. The real power of observability lies in how these tools are used – and how they’re integrated into a broader understanding of the system.

Conclusion: A Call for Nuance ✨

Ultimately, the speaker advocates for a nuanced understanding of observability, emphasizing its roots in control theory and its role as an extension of traditional monitoring practices. He cautions against treating it as a purely marketing-driven concept and encourages a focus on its core purpose: deeply understanding system behavior and proactively addressing potential issues. Let’s move beyond the buzzwords and embrace a more strategic and insightful approach to observability.

Are you ready to level up your understanding of observability? 🎯
