Decoding Telemetry: Building Schemas for a Data-Driven World 🚀

Let’s be honest, the world of observability and telemetry can feel… overwhelming. A deluge of metrics, logs, and spans – where do you even start when trying to understand what’s happening in your systems? This presentation tackled a surprisingly elegant solution: engineering telemetry like you engineer a pipeline, using schemas to unlock clarity and drive adoption. 💡

The Problem: Existing Telemetry & The Schema Struggle 🧩

The speakers started with a relatable challenge: most organizations already have telemetry flowing – through Prometheus, OTLP pipelines, or even legacy systems. But manually writing schemas to document this data is a massive time sink. As one attendee pointed out, “writing schemas by hand is very tiresome.” This wasn’t a new problem; it echoed the challenges faced when OpenAPI became popular – documenting a wild west of APIs. The core issue? How do you efficiently surface schema information to the developers and users who need it? 🎯

Telecat: A Runtime Schema Extractor 🤖

Enter Telecat, a tool built by Nicholas (who, admittedly, was on a break and hard to find!) that addresses this head-on. The key idea is to treat telemetry pipelines as data sources and extract schemas from them at runtime. Think of it like this: you’re not creating the schema; you’re discovering it.

Here’s how it works:

  • OTEL Demo: They showcased a demo environment with microservices sending OTLP data to Telecat.
  • Scope-Based Organization: Telecat groups metrics into logical scopes (e.g., Java metrics, Go metrics, HTTP metrics).
  • API Access: You can query the API (e.g., /v1/scopes) to retrieve the schema for a specific scope. For example, accessing the Java metrics schema reveals all the associated metrics, their types, and label values; a sketch of such a query follows this list. 💾
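
To make that concrete, here is a minimal sketch of such a query in Go. The endpoint path, port, and JSON response shape are assumptions for illustration – Telecat’s actual API may differ.

```go
// Minimal sketch: fetch the schema for one scope from a Telecat-like API.
// The URL and the MetricSchema shape below are hypothetical.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// MetricSchema mirrors the fields mentioned in the talk: name, type, labels.
type MetricSchema struct {
	Name   string   `json:"name"`
	Type   string   `json:"type"`   // e.g. "counter", "gauge", "histogram"
	Labels []string `json:"labels"` // observed label keys
}

func main() {
	// Ask for the schema of a single scope, e.g. the Java metrics scope.
	resp, err := http.Get("http://localhost:8080/v1/scopes/java")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var schema []MetricSchema
	if err := json.NewDecoder(resp.Body).Decode(&schema); err != nil {
		log.Fatal(err)
	}
	for _, m := range schema {
		fmt.Printf("%s (%s) labels=%v\n", m.Name, m.Type, m.Labels)
	}
}
```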

The Weaver Connection: A Future of Telemetry Governance ✨

The team is actively exploring integrating this schema extraction process with the Weaver project. Weaver – currently the subject of an ongoing donation discussion – has ambitions to become a telemetry catalog management tool: essentially a central repository for all your telemetry data and its associated schemas. The integration is driven by a desire to incentivize engineers to onboard telemetry without rebuilding everything from scratch.

  • Rust Challenge: A slight hurdle – Weaver is written in Rust, and the speaker admits to having no experience with the language! 😅
  • Green Light: Thankfully, they’ve secured buy-in from the Weaver maintainers, giving them the green light to proceed.

A Potential Path Forward: Scraping & Schema Inference 🛠️

To overcome the Weaver challenge and accelerate the process, they’re considering a two-pronged approach:

  1. Integrate Prometheus Scraping: Pull the scrape manager code from Prometheus directly into Telecat.
  2. Telecat as a Scraper: Telecat would then scrape metrics from the usual /metrics endpoint and generate schemas, eliminating the need for an initial OTLP conversion step (see the sketch after this list). 📡
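
As a rough sketch of what step 2 could look like, the snippet below fetches a standard Prometheus /metrics endpoint and derives schema-like information – metric names, types, help text, and label keys – from the text exposition format. It uses the real github.com/prometheus/common/expfmt parser, but the target URL is a placeholder, and this illustrates the idea rather than Telecat’s actual implementation.

```go
// Sketch of Telecat-as-scraper: parse a Prometheus /metrics endpoint and
// infer a schema (name, type, label keys) from whatever is exposed.
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

func main() {
	resp, err := http.Get("http://localhost:9090/metrics") // placeholder target
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Each metric family becomes one schema entry: its name, type, and the
	// union of label keys observed across its samples.
	for name, mf := range families {
		labelKeys := map[string]struct{}{}
		for _, m := range mf.GetMetric() {
			for _, lp := range m.GetLabel() {
				labelKeys[lp.GetName()] = struct{}{}
			}
		}
		keys := make([]string, 0, len(labelKeys))
		for k := range labelKeys {
			keys = append(keys, k)
		}
		fmt.Printf("%s type=%s labels=%v help=%q\n",
			name, mf.GetType().String(), keys, mf.GetHelp())
	}
}
```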

Schema Inference: From SDK to Documentation 👾

Currently, much of the telemetry data originates from SDKs instrumented within code, lacking initial schemas. Telecat’s solution involves inferring schemas from runtime data in formats like OpenTelemetry or Prometheus. It extracts key information – metric names, data types, and label values – to build the schema. This schema is then used to generate SDKs, creating a feedback loop for consistent documentation.
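
A minimal sketch of that inference step, with hypothetical Observation and SchemaEntry types: fold a stream of runtime observations (metric name, type, label values) into schema entries that record the values seen for each label key. This illustrates the extraction the talk described, not Telecat’s real data model.

```go
// Sketch: accumulate runtime observations into an inferred schema.
package main

import (
	"fmt"
	"sort"
)

// Observation is one runtime sample's metadata (hypothetical type).
type Observation struct {
	Name   string
	Type   string            // e.g. "counter", "gauge"
	Labels map[string]string // label key -> observed value
}

// SchemaEntry is the inferred schema for one metric (hypothetical type).
type SchemaEntry struct {
	Name        string
	Type        string
	LabelValues map[string]map[string]struct{} // key -> set of seen values
}

// Infer folds one observation into the schema, keyed by metric name.
func Infer(schema map[string]*SchemaEntry, obs Observation) {
	e, ok := schema[obs.Name]
	if !ok {
		e = &SchemaEntry{Name: obs.Name, Type: obs.Type,
			LabelValues: map[string]map[string]struct{}{}}
		schema[obs.Name] = e
	}
	for k, v := range obs.Labels {
		if e.LabelValues[k] == nil {
			e.LabelValues[k] = map[string]struct{}{}
		}
		e.LabelValues[k][v] = struct{}{}
	}
}

func main() {
	schema := map[string]*SchemaEntry{}
	Infer(schema, Observation{"http_requests_total", "counter",
		map[string]string{"method": "GET", "code": "200"}})
	Infer(schema, Observation{"http_requests_total", "counter",
		map[string]string{"method": "POST", "code": "500"}})

	for _, e := range schema {
		fmt.Printf("%s (%s)\n", e.Name, e.Type)
		for key, vals := range e.LabelValues {
			vs := make([]string, 0, len(vals))
			for v := range vals {
				vs = append(vs, v)
			}
			sort.Strings(vs)
			fmt.Printf("  %s: %v\n", key, vs)
		}
	}
}
```

Recording label values as sets keeps the inferred schema stable as new samples arrive, while still documenting the value space that users actually see.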

OpenTelemetry & the Future 🌐

The team is primarily focused on OpenTelemetry, recognizing its growing importance in the observability landscape. However, they’re open to contributions and exploring ways to leverage Telecat to streamline the onboarding process for new telemetry systems.

Key Takeaway: Don’t let the complexity of telemetry data paralyze you. By treating it like a pipeline and leveraging schema extraction tools, you can unlock valuable insights, improve developer adoption, and build a truly data-driven organization. 🚀
