Unlocking Hidden Connections: A Deep Dive into Time Series Correlation 🚀
Hey tech enthusiasts! 👋 Ever felt like you’re drowning in a sea of time series data, desperately searching for meaningful insights? You’re not alone. This presentation tackled a surprisingly complex challenge – efficiently identifying correlations within massive time series datasets – and the results are genuinely fascinating. Let’s break down the key takeaways and explore how this research could revolutionize your monitoring and observability strategies. 💡
The Problem: N x N Correlation is a Beast 🤖
Traditionally, finding correlations between time series requires comparing every time series to every other time series – an “n by n” operation. As the number of time series grows (we’re talking tens of thousands!), this quickly becomes computationally infeasible. The speaker highlighted this issue, describing the cost as roughly O(n²) in the number of time series – a truly daunting scaling curve. Researchers have tackled this with datasets of around 5,000 time series, but that’s still a significant hurdle for many organizations.
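To make that scaling wall concrete, here’s a minimal sketch (not from the talk) of the brute-force approach: Pearson correlation over every pair of series. The array shape, threshold, and function name are illustrative assumptions.

```python
import numpy as np

def naive_pairwise_correlation(series: np.ndarray, threshold: float = 0.9):
    """Brute force: `series` is an (n, t) array of n time series with t
    samples each. Every pair is compared, so the work grows as O(n^2)."""
    n = series.shape[0]
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            r = np.corrcoef(series[i], series[j])[0, 1]
            if abs(r) >= threshold:
                hits.append((i, j, r))
    return hits
```

With tens of thousands of series, that inner loop runs billions of comparisons – exactly the wall described in the talk.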
A Clever Algorithm Emerges 💻
Fortunately, a researcher had already developed an algorithm, described in a published paper, that makes this tractable at scale. It handled approximately 80,000 time series on a modest setup – a Kubernetes VM with two CPUs and 4 GB of RAM. This is a huge win! The speaker demonstrated the algorithm’s effectiveness, running it and finding correlations in just a few minutes.
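The talk doesn’t spell out the paper’s algorithm, so the sketch below is emphatically not the speaker’s method – just a generic illustration of how far you can get by z-normalizing every series once and turning correlation into chunked matrix products, which cuts the constant factor dramatically even though the number of candidate pairs stays quadratic.

```python
import numpy as np

def znormalize(series: np.ndarray) -> np.ndarray:
    """Subtract each row's mean and divide by its standard deviation."""
    centered = series - series.mean(axis=1, keepdims=True)
    std = centered.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # keep constant series from dividing by zero
    return centered / std

def chunked_correlations(series: np.ndarray, threshold: float = 0.9, chunk: int = 1024):
    """Pearson correlations via matrix products over z-normalized rows,
    one block of rows at a time to keep memory bounded."""
    z = znormalize(series)
    n, t = series.shape
    for start in range(0, n, chunk):
        block = z[start:start + chunk]
        corr = block @ z.T / t  # (chunk, n) block of the correlation matrix
        rows, cols = np.nonzero(np.abs(corr) >= threshold)
        for r, c in zip(rows, cols):
            if start + r < c:  # report each pair once, skip self-correlations
                yield (start + r, c, float(corr[r, c]))
```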
The Data Dilemma: Beyond the Basics 👾
But here’s where things got interesting. Simply finding correlations wasn’t enough. The speaker realized that the initial results were, frankly, a lot of “boring” correlations. A huge percentage of the time series were constant, or had a constant slope – essentially, they were trivially correlated with each other. This highlighted a critical point: many default monitoring setups generate a lot of data that isn’t actually insightful. “A lot of the others, they’re also kind of boring because they got sort of counters with a constant slope,” the speaker noted.
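The exact checks used in the talk aren’t given, but as a rough illustration, the “boring” series can be screened out cheaply before any pairwise work: drop series whose values never change, and counters whose per-step increase is essentially constant. The tolerance below is an arbitrary assumption.

```python
import numpy as np

def is_boring(values: np.ndarray, tol: float = 1e-9) -> bool:
    """Heuristic pre-filter: flat series and constant-slope counters
    correlate trivially with each other and add little insight."""
    if np.ptp(values) <= tol:            # constant series
        return True
    if np.ptp(np.diff(values)) <= tol:   # constant slope, e.g. a steady counter
        return True
    return False
```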
The 20-30 Minute Delay Challenge 💾
The next challenge was storing and querying these interesting correlations. The speaker found a 20-30 minute delay between processing a time window and having the results available. This presented a significant bottleneck. They were evaluating two options:
- Victoria Metrics: Currently hitting a limit of around 15 million time series ingested at once. Tweaking the data model and optimizing ingestion were both under consideration (see the write-path sketch after this list).
- PocketFund: A promising alternative, but one that would require building significant custom querying logic.
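For the write path, one plausible shape (not shown in the talk) is to encode each correlated pair as a metric sample and push it over HTTP – VictoriaMetrics documents an /api/v1/import/prometheus endpoint that accepts Prometheus exposition-format lines. The host, metric name, and labels below are invented for illustration.

```python
import requests

# Assumed local VictoriaMetrics instance on its default port.
VM_IMPORT_URL = "http://localhost:8428/api/v1/import/prometheus"

def write_correlations(pairs):
    """pairs: iterable of (series_a, series_b, r). Each pair becomes one
    sample of a hypothetical ts_correlation metric, labeled by the pair."""
    lines = [f'ts_correlation{{a="{a}",b="{b}"}} {r}' for a, b, r in pairs]
    resp = requests.post(VM_IMPORT_URL, data="\n".join(lines))
    resp.raise_for_status()

write_correlations([("node_cpu_seconds_total", "node_load1", 0.97)])
```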
Filtering for the Gems ✨
The speaker’s approach involved a clever filtering strategy. They realized that the algorithm could be used to quickly identify and discard the “boring” correlations – constant time series and counters with constant slopes. This dramatically reduced the number of correlations needing further investigation. They found that, within 80,000 time series, there were between 6 and 12 million correlated pairs, but a large portion were irrelevant.
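For a sense of scale: 80,000 series yield 80,000 × 79,999 / 2 ≈ 3.2 billion candidate pairs, so even 6–12 million correlated pairs is well under one percent of the search space – and filtering out the trivial ones shrinks the interesting set further.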
Bottlenecks and Future Steps 🛠️
The speaker identified the primary bottleneck as the time taken to process a time window and write the results to Victoria Metrics. They noted a strong correlation between the number of correlated pairs found and the processing time, as well as the lag before results become visible in Grafana.
The next steps involve:
- Streaming Aggregations: Exploring more sophisticated streaming aggregations to potentially uncover even more valuable insights (see the sketch after this list).
- Scaling Data Storage: Investigating ways to scale up the data ingestion and writing process, potentially through discussions with Victoria Metrics about optimizing their data schema.
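“Streaming aggregations” can mean several things here (VictoriaMetrics has a stream aggregation feature of its own, for example); purely to illustrate the general idea, here is a sketch of maintaining a Pearson correlation between two streams incrementally, so each new sample is O(1) work instead of re-reading the whole window.

```python
class StreamingCorrelation:
    """Online Pearson correlation between two streams, kept as running sums."""

    def __init__(self):
        self.n = 0
        self.sum_x = self.sum_y = 0.0
        self.sum_xx = self.sum_yy = self.sum_xy = 0.0

    def update(self, x: float, y: float) -> None:
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_yy += y * y
        self.sum_xy += x * y

    def correlation(self) -> float:
        cov = self.n * self.sum_xy - self.sum_x * self.sum_y
        var_x = self.n * self.sum_xx - self.sum_x ** 2
        var_y = self.n * self.sum_yy - self.sum_y ** 2
        if var_x <= 0 or var_y <= 0:
            return 0.0  # a constant stream has no defined correlation
        return cov / (var_x * var_y) ** 0.5
```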
Key Takeaways & Future Directions 🎯
This research demonstrates that even within seemingly overwhelming datasets, there’s potential for uncovering hidden connections. By combining a clever algorithm with strategic filtering, it’s possible to identify meaningful correlations and gain a deeper understanding of your systems. The speaker’s work highlights the importance of not just collecting data, but also actively seeking out useful data. 📡
This is a fantastic example of how a focused approach, combined with a bit of clever engineering, can transform a seemingly intractable problem into a valuable opportunity. Keep exploring, keep experimenting, and keep uncovering those hidden connections! 💫