Level Up Your Monitoring: A Deep Dive into Prometheus & Resource Attributes 🚀
Let’s be honest, monitoring can feel like navigating a complex maze. You’re drowning in metrics, trying to pinpoint the root cause of an issue, and often wrestling with convoluted queries. But what if there was a simpler, more intuitive way to understand where your systems are struggling? That’s the core of a fascinating project underway within the Prometheus and OpenTelemetry communities – and it’s a game-changer for anyone serious about observability. 💡
The Problem: A Messy Landscape 🧩
For a while now, handling resource attributes (the metadata that describes where telemetry comes from, such as service name, host, or Kubernetes cluster) in Prometheus has been…well, a bit of a headache. The current system relies heavily on manual label conversion and complex instrumentation. This approach, while technically sound, creates a significant barrier to entry for many users. As Andrej, the presenter, put it, “The current approach is problematic for many users.” It demands a lot of upfront work and often leads to frustrating inconsistencies.
Research Reveals the Pain Points 🤕
To understand the scope of the issue, the Linux Foundation team – including Andrej, Victoria Nduka, Arthur, Amy, and a dedicated user researcher – embarked on a deep dive. They spoke with 70 users and surveyed 63 more, uncovering four distinct workflows for managing resource attributes. Let’s break them down:
- Complete Mapping: Converting everything to labels. Think of it like meticulously labeling every single item in a warehouse – incredibly thorough, but also incredibly time-consuming and prone to errors.
- Attribute Promotion: Selecting a few key attributes to promote to labels. This is better, but still requires careful planning and can be resource-intensive.
- Target Info: Storing attributes in a single target_info metric and joining it with the actual metrics at query time. This keeps storage efficient, but the join syntax quickly becomes overwhelming: approximately 80% of users found writing these queries challenging.
- Manual Attribute Injection: Adding attributes directly to the scrape target. This is the least preferred method, leading to inconsistencies and a lack of historical data.
The biggest takeaway? Complex joins were the most significant pain point, highlighting a critical disconnect between user expectations and the current implementation.
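To make the difference between attribute promotion and the target info join more tangible, here is a toy model in plain Python. It is only a sketch: the metric name, label values, and helper functions are made up for illustration, and the dot-to-underscore renaming is a simplification of how attribute keys become label names.

```python
# Toy model of two workflows: attribute promotion vs. a target_info join.
# All names and values below are illustrative assumptions, not real data.

def to_label_name(attr: str) -> str:
    """Simplified renaming of an attribute key into a label name."""
    return attr.replace(".", "_")

def promote(sample_labels: dict, resource: dict, promoted: list) -> dict:
    """Attribute promotion: copy the selected resource attributes onto the series."""
    out = dict(sample_labels)
    for attr in promoted:
        if attr in resource:
            out[to_label_name(attr)] = resource[attr]
    return out

def join_target_info(samples: list, target_info: list, wanted: list) -> list:
    """Query-time join: attach `wanted` labels from target_info by (job, instance).

    Roughly what this PromQL shape does:
      <expr> * on (job, instance) group_left (k8s_cluster_name) target_info
    """
    index = {(t["job"], t["instance"]): t for t in target_info}
    joined = []
    for s in samples:
        info = index.get((s["job"], s["instance"]))
        if info is None:
            continue  # no matching target_info series, so the sample drops out
        extra = {k: info[k] for k in wanted if k in info}
        joined.append({**s, **extra})
    return joined

resource = {"k8s.cluster.name": "prod-eu", "service.version": "1.4.2"}
series = {"__name__": "http_requests_total", "job": "shop/cart", "instance": "cart-0"}

# Workflow 2: only the chosen attribute ends up as a label on the series.
promoted_series = promote(series, resource, ["k8s.cluster.name"])

# Workflow 3: attributes live on target_info and are attached at query time.
target_info = [{"job": "shop/cart", "instance": "cart-0",
                "k8s_cluster_name": "prod-eu", "service_version": "1.4.2"}]
joined_series = join_target_info([series], target_info, ["k8s_cluster_name"])
```

Promotion pays the cost once at ingestion time, while the join pays it in every query, which is where the syntax burden described above comes from.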
User Voices: What They Really Want 🎯
The research didn’t stop at identifying problems; it also uncovered what users actually want. They crave:
- Native Query Support: Something like the existing info() PromQL function, which simplifies joins and makes exploring resource attributes a breeze.
- Simplified Syntax: Goodbye, complex joins! Hello, intuitive queries.
- Flexible Attribute Selection: The ability to choose which attributes to track, minimizing the need for constant manual adjustments.
- Easy Filtering & Aggregation: Quickly drill down into specific resources and analyze their performance.
Interestingly, 25% of users reported struggling with the existing documentation, emphasizing the need for clearer, more accessible guidance.
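The wish for native query support and simpler syntax is easiest to see side by side. The sketch below uses two hypothetical Python helpers (join_query and info_query are not part of any library) that render the explicit target_info join and the equivalent info() call as query strings. The metric and label names are assumptions, and info() is documented as experimental at the time of writing, so check the Prometheus docs for how to enable it in your version.

```python
# Hypothetical helpers that render two PromQL query shapes as strings.
# They only build text; nothing here talks to a Prometheus server.

def join_query(expr: str, attrs: list, match_on=("job", "instance")) -> str:
    """Explicit join against target_info, pulling in the listed labels."""
    return (f"{expr} * on ({', '.join(match_on)}) "
            f"group_left ({', '.join(attrs)}) target_info")

def info_query(expr: str, attrs=()) -> str:
    """The same intent expressed with the info() function."""
    if not attrs:
        return f"info({expr})"
    # A non-empty regex match on each label selects which info labels to add.
    selector = ", ".join(f'{a}=~".+"' for a in attrs)
    return f"info({expr}, {{{selector}}})"

expr = "rate(http_server_request_duration_seconds_count[2m])"

print(join_query(expr, ["k8s_cluster_name"]))
print(info_query(expr, ["k8s_cluster_name"]))
```

With info(), the matching labels and the join direction no longer have to be spelled out by hand, which is the part most users in the study struggled with.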
A Philosophical Shift? 🤔
The project also highlighted a fascinating difference in philosophy between Prometheus and OpenTelemetry. Prometheus is focused on metrics, while OpenTelemetry is designed for broader observability, and the two need to find a way to work together. The speaker described it as “a philosophical difference between Prometheus and OpenTelemetry, leading to integration challenges.” Bridging this gap is crucial for unlocking the full potential of resource attribute data.
Tools & Technologies in the Mix 🛠️
Let’s take a quick look at the key players:
- Prometheus: The core monitoring system.
- OpenTelemetry: The observability framework being integrated.
- Grafana: Used for visualization and dashboarding.
- PromQL: Prometheus’s powerful query language.
- info(): A PromQL function that simplifies joins with target_info.
- Target Info: A method for storing resource attributes in a single target_info metric.
Tradeoffs & Challenges 🚧
It’s not all smooth sailing. There’s a natural tradeoff between collecting more data and increasing query complexity. More attributes mean more labels, which can quickly lead to convoluted queries. The project also acknowledges the challenge of aligning the Prometheus and OpenTelemetry communities, recognizing that a collaborative approach is essential.
Moving Forward: A Call to Action 📡
This isn’t just about fixing a technical glitch; it’s about empowering users to truly understand their systems. The Linux Foundation team is actively seeking contributions, including UI wireframe projects and volunteer tech writers. If you’re passionate about observability and want to help shape the future of monitoring, this is your chance to get involved! 👾
Let’s work together to build a more intuitive and powerful monitoring experience. 💪 #Prometheus #OpenTelemetry #Observability #Monitoring #TechConference