Unlocking Hidden Connections: A Deep Dive into Time Series Correlation 🚀

Hey tech enthusiasts! 👋 Ever felt like you’re drowning in a sea of time series data, desperately searching for meaningful insights? You’re not alone. This presentation tackled a surprisingly complex challenge – efficiently identifying correlations within massive time series datasets – and the results are genuinely fascinating. Let’s break down the key takeaways and explore how this research could revolutionize your monitoring and observability strategies. 💡

The Problem: N x N Correlation is a Beast 🤖

Traditionally, finding correlations between time series requires comparing every time series to every other time series – an “n by n” operation. As the number of time series grows (we’re talking tens of thousands!), this quickly becomes computationally infeasible: as the speaker pointed out, the cost grows roughly as O(n²) in the number of time series, making it a truly daunting task. Published research has typically tackled datasets of around 5,000 time series, and even that is a significant hurdle for many organizations.
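To see why this hurts, here’s a minimal sketch of the brute-force baseline (not the speaker’s code – the `threshold` cutoff is purely illustrative):

```python
import numpy as np

def naive_pairwise_correlation(series: np.ndarray, threshold: float = 0.9):
    """Brute-force scan: compare every series against every other one.

    `series` is an (n, t) array of n time series with t aligned samples.
    The nested loop is what makes this O(n^2) and infeasible at scale.
    """
    n = series.shape[0]
    correlated_pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            r = np.corrcoef(series[i], series[j])[0, 1]
            if abs(r) >= threshold:
                correlated_pairs.append((i, j, r))
    return correlated_pairs
```

With 80,000 series, that inner comparison runs over three billion times – exactly the wall the speaker described.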

A Clever Algorithm Emerges 💻

Fortunately, a researcher had already developed an algorithm for scaling this computation. The algorithm, described in a published paper, handled approximately 80,000 time series on a modest Kubernetes setup – a single VM with two CPUs and 4 GB of RAM. This is a huge win! The speaker demonstrated the algorithm in action, finding correlations within just a few minutes.
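The write-up doesn’t name the paper or the exact algorithm, so treat the following as a generic illustration of one standard trick rather than the speaker’s method: z-normalise every series so that Pearson correlation collapses into a dot product, and let a single matrix multiply replace the Python-level double loop.

```python
import numpy as np

def correlation_matrix(series: np.ndarray) -> np.ndarray:
    """All pairwise Pearson correlations via one matrix multiply.

    The output is still n x n, but the heavy lifting happens in a
    single optimised BLAS call rather than millions of comparisons.
    """
    t = series.shape[1]
    mean = series.mean(axis=1, keepdims=True)
    std = series.std(axis=1, keepdims=True)
    std[std == 0] = 1.0          # constant series would otherwise divide by zero
    z = (series - mean) / std    # z-normalise each series
    return (z @ z.T) / t         # entry [i, j] is corr(series i, series j)
```

A real implementation at 80,000 series would still need to chunk or prune this so the full n x n result fits into 4 GB of RAM.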

The Data Dilemma: Beyond the Basics 👾

But here’s where things got interesting. Simply finding correlations wasn’t enough. The speaker realized that the initial results were, frankly, a lot of “boring” correlations. A huge percentage of the time series were constant, or had a constant slope – essentially, they were trivially correlated with each other. This highlighted a critical point: many default monitoring setups generate a lot of data that isn’t actually insightful. “A lot of the others, they’re also kind of boring because they got sort of counters with a constant slope,” the speaker noted.
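A simple pre-filter along those lines (again, a sketch rather than the speaker’s code; the tolerance is a placeholder) could look like this:

```python
import numpy as np

def is_boring(values: np.ndarray, tol: float = 1e-9) -> bool:
    """Flag series that are constant or follow a perfectly straight line,
    e.g. counters with a constant slope, before the correlation search."""
    if np.ptp(values) <= tol:                  # constant series
        return True
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, 1)
    residuals = values - (slope * x + intercept)
    return np.max(np.abs(residuals)) <= tol    # constant-slope series
```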

The 20-30 Minute Delay Challenge 💾

The next challenge was storing and querying the interesting correlations. The speaker found a 20-30 minute delay between processing a time window and having the results available for querying, which presented a significant bottleneck. They were evaluating two storage options:

  • VictoriaMetrics: Currently hitting a limit of around 15 million time series ingested at once. Tweaking the data model and optimizing ingestion were under consideration (see the ingestion sketch after this list).
  • PocketFund: A promising alternative, but requiring significant custom querying logic to be built.
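For the VictoriaMetrics route, one way to push correlation results is its Prometheus text import endpoint; the metric name and label scheme below are invented for illustration and are not the speaker’s data model:

```python
import time
import requests

# Single-node VictoriaMetrics accepts Prometheus text format on this endpoint.
VM_IMPORT_URL = "http://localhost:8428/api/v1/import/prometheus"

def push_correlated_pairs(pairs, vm_url: str = VM_IMPORT_URL) -> None:
    """Write (series_a, series_b, correlation) tuples as one sample each."""
    ts_ms = int(time.time() * 1000)
    lines = [
        f'ts_correlation{{a="{a}",b="{b}"}} {r} {ts_ms}'
        for a, b, r in pairs
    ]
    resp = requests.post(vm_url, data="\n".join(lines))
    resp.raise_for_status()
```

Encoding every pair as its own labelled series is also one reason a data-model rethink comes up: millions of pairs per window quickly approach the ingestion limits mentioned above.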

Filtering for the Gems ✨

The speaker’s approach involved a clever filtering strategy. They realized that the algorithm could be used to quickly identify and discard the “boring” correlations – constant time series and counters with constant slopes. This dramatically reduced the number of correlations needing further investigation. They found that, within 80,000 time series, there were between 6 and 12 million correlated pairs, but a large portion were irrelevant.
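Some quick arithmetic shows how much room there is to prune: 80,000 series produce roughly 3.2 billion candidate pairs, so even 12 million correlated pairs is well under one percent of the search space.

```python
n = 80_000
candidate_pairs = n * (n - 1) // 2       # 3,199,960,000 possible pairs
reported_pairs = 12_000_000              # upper end of the 6-12 million range
print(f"{candidate_pairs:,} candidate pairs")
print(f"{reported_pairs / candidate_pairs:.2%} of pairs flagged")  # ~0.38%
```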

Bottlenecks and Future Steps 🛠️

The speaker identified the primary bottleneck as the time taken to process a time window and write the results to VictoriaMetrics. They noted a strong relationship between the number of correlated pairs found and both the processing time and the delay before results become queryable in Grafana.

The next steps involve:

  • Streaming Aggregations: Exploring more sophisticated streaming aggregations to potentially uncover even more valuable insights (a generic sketch follows this list).
  • Scaling Data Storage: Investigating ways to scale up the data ingestion and writing process, potentially through discussions with the VictoriaMetrics team about optimizing the data schema.
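“Streaming aggregations” here presumably means aggregating as data arrives instead of re-reading whole time windows. As one generic illustration of that direction (not the speaker’s design), a pairwise correlation can be maintained incrementally from running sums:

```python
from dataclasses import dataclass
import math

@dataclass
class StreamingCorrelation:
    """Incrementally maintained Pearson correlation between two series.

    Each new (x, y) sample updates running sums, so no historical
    window has to be re-read when the correlation is queried.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    def update(self, x: float, y: float) -> None:
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def correlation(self) -> float:
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = max(self.n * self.sxx - self.sx ** 2, 0.0)
        var_y = max(self.n * self.syy - self.sy ** 2, 0.0)
        denom = math.sqrt(var_x * var_y)
        return cov / denom if denom > 0 else 0.0
```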

Key Takeaways & Future Directions 🎯

This research demonstrates that even within seemingly overwhelming datasets, there’s potential for uncovering hidden connections. By combining a clever algorithm with strategic filtering, it’s possible to identify meaningful correlations and gain a deeper understanding of your systems. The speaker’s work highlights the importance of not just collecting data, but also actively seeking out useful data. 📡

This is a fantastic example of how a focused approach, combined with a bit of clever engineering, can transform a seemingly intractable problem into a valuable opportunity. Keep exploring, keep experimenting, and keep uncovering those hidden connections! 💫
