When auto scaling isn't fast enough: Handling high volume live event traffic by Ivan Schwarz

Presenters

Ivan Schwarz

Source

Devoxx Belgium 2025

Scaling for the Big Game: Mastering Advertising at Live Events 🚀💡

Ever wondered how companies deliver ads during massive events like the Super Bowl or a Premier League match? It’s not as simple as just throwing more servers at the problem! This presentation delved deep into the challenges of advertising in live, high-volume events, revealing the complexities of scaling infrastructure and applications to handle those intense, unpredictable spikes. Let’s break down the key takeaways and why “more servers” often isn’t the answer.

The Problem: When Standard Scaling Fails 💥

Imagine millions of people tuning in simultaneously to watch a live event. That’s a lot of traffic! Traditional cloud scaling solutions – the kind that work great for typical website traffic – often fall short when faced with these concentrated bursts. Here’s why:

High-Volume, Short-Duration Spikes: Live events generate massive traffic bursts that are short, intense, and unpredictable.
Standard Solutions Fall Short: Auto-scaling reacts only after load has already risen, and seasonal scaling plans for long-lived peaks. Neither is usually enough for the extreme, short-duration nature of these events.
The Consequences: Outages, slow ad delivery, a frustrating user experience, and lost revenue.
The Anycast Misconception: Anycast (routing each request to the nearest location) sounds ideal, but a sudden, intense spike concentrated in one region can overwhelm the locations serving it, especially if they are under-resourced.
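
To make the timing problem concrete, here is a minimal Python sketch of a reactive auto-scaler that orders servers when it sees load but only receives them a few ticks later. The traffic shape, per-server capacity, and provisioning delay are hypothetical numbers chosen for illustration, not figures from the talk, and the model never scales down.

```python
import math


def simulate_reactive_scaling(traffic, capacity_per_server, start_servers, provision_delay):
    """Return the number of requests dropped at each tick.

    Assumption: the scaler sees the load of the current tick, orders the
    missing servers, and they become usable `provision_delay` ticks later.
    """
    servers = start_servers
    pending = []  # (tick at which servers become ready, number of servers)
    dropped = []
    for tick, load in enumerate(traffic):
        # Bring online any servers whose provisioning has finished.
        servers += sum(count for ready, count in pending if ready <= tick)
        pending = [(ready, count) for ready, count in pending if ready > tick]

        # Anything above current capacity is lost for this tick.
        dropped.append(max(0, load - servers * capacity_per_server))

        # React: order only what is missing and not already on its way.
        needed = math.ceil(load / capacity_per_server)
        in_flight = sum(count for _, count in pending)
        shortfall = needed - servers - in_flight
        if shortfall > 0:
            pending.append((tick + provision_delay, shortfall))
    return dropped


# A hypothetical three-tick spike, fifty times the normal load.
traffic = [100, 100, 5000, 5000, 5000, 100, 100]

reactive = simulate_reactive_scaling(traffic, 100, start_servers=2, provision_delay=3)
prescaled = simulate_reactive_scaling(traffic, 100, start_servers=50, provision_delay=3)

print(reactive)   # [0, 0, 4800, 4800, 4800, 0, 0]
print(prescaled)  # [0, 0, 0, 0, 0, 0, 0]
```

In this toy model the extra servers arrive exactly when the spike is over, so the reactive scaler drops traffic for the whole burst, while capacity that was in place beforehand drops nothing.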

Key Concepts: Unpacking the Tech 🛠️

To truly understand the challenges, let’s look at some core concepts:

Unicast vs. Anycast:
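- Unicast: Each address identifies one specific server, so traffic is directed to a known destination. Simple, but less flexible.
- Anycast: The same address is announced from multiple locations, and the network routes each request to the “best” one, typically the nearest in routing terms. More resilient, but more complex to manage.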
Big vs. Small Data Centers: Finding the balance between low latency (smaller, local data centers) and the ability to handle massive volume (larger, more powerful data centers).
Adaptive Scaling: An API that scales capacity based on real-time data from the live event itself, such as viewership numbers and viewer locations. The talk presented this as a significant step up from traditional methods.
Online vs. Offline Processing: Minimize the processing done in real time (“online”) and defer non-critical tasks to background processing (“offline”). This reduces the load on live systems and lets them focus on what matters most in the moment.
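
The online/offline split can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the speaker's implementation: `AdRequest`, `select_ad`, and the in-process queue are hypothetical stand-ins, and a production system would more likely hand deferred work to a durable message broker.

```python
import queue
from dataclasses import dataclass


@dataclass
class AdRequest:
    viewer_id: str
    slot: str


# Work that can wait is parked here and drained by a background job.
# Assumption: an in-memory queue stands in for a durable broker.
offline_queue = queue.Queue()


def select_ad(request):
    """Online step: the only work the viewer actually waits for (hypothetical rule)."""
    return f"ad-for-{request.slot}"


def handle_ad_request(request):
    """Serve the ad immediately and defer everything non-critical."""
    ad_id = select_ad(request)
    # Deferred: record the event now, process it when the system has room.
    offline_queue.put({"viewer_id": request.viewer_id, "ad_id": ad_id})
    return ad_id


def drain_offline_queue():
    """Offline step: process deferred events, for example after the spike."""
    processed = []
    while True:
        try:
            event = offline_queue.get_nowait()
        except queue.Empty:
            break
        processed.append(event)  # stand-in for whatever non-critical work was deferred
    return processed
```

Calling `handle_ad_request(AdRequest("v1", "midroll"))` returns the ad identifier right away, and a later call to `drain_offline_queue()` picks up the deferred event.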

Strategic Approaches: Navigating the Trade-offs 🌐

So, how do you tackle this challenge? Here’s a breakdown of the strategies and the compromises involved:

Understand Your Limits: Don’t assume standard scaling will suffice. Rigorous testing to identify bottlenecks is essential.
Collaboration is Key: Work closely with customers and stakeholders. Their concerns about stability and performance are directly aligned with yours.
Reduced Service Mode: In high-stress situations, temporarily disable non-essential features to prioritize core functionality and prevent outages. Think of it as a conscious decision to shed complexity and focus on what absolutely matters.
Calendar-Based Scaling: Use event schedules to proactively scale resources. However, be prepared for rescheduling and potential errors – flexibility is still crucial.
Focus on the Critical Path: Prioritize scaling efforts on the most crucial parts of the advertising delivery chain. Don’t waste resources on areas that have minimal impact.
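
Several of the strategies above can be combined in one capacity decision. The Python sketch below pre-scales from an event calendar, lets live viewership override the forecast when an event is rescheduled or underestimated, and signals reduced service mode when demand exceeds what can be provisioned. All names and constants (`Event`, `VIEWERS_PER_REPLICA`, `MAX_REPLICAS`, and so on) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    name: str
    start: datetime
    end: datetime
    expected_viewers: int


# Hypothetical tuning values; real ones would come from load testing.
VIEWERS_PER_REPLICA = 10_000
BASELINE_REPLICAS = 4
MAX_REPLICAS = 200
PRE_SCALE_LEAD = timedelta(minutes=30)


def replicas_for(viewers):
    """Replicas needed for a viewer count, never below the baseline."""
    needed = -(-viewers // VIEWERS_PER_REPLICA)  # integer ceiling division
    return max(BASELINE_REPLICAS, needed)


def plan_capacity(now, schedule, live_viewers=0):
    """Return (replicas, reduced_service) for the given moment."""
    # Calendar-based: scale ahead of any event that is about to start or running.
    forecast = max(
        (e.expected_viewers for e in schedule
         if e.start - PRE_SCALE_LEAD <= now <= e.end),
        default=0,
    )
    # Live data wins when it is higher, which covers schedule errors.
    wanted = replicas_for(max(forecast, live_viewers))
    # If demand exceeds what can be provisioned, shed non-essential features.
    reduced_service = wanted > MAX_REPLICAS
    return min(wanted, MAX_REPLICAS), reduced_service


schedule = [Event("final", datetime(2025, 10, 10, 20, 0),
                  datetime(2025, 10, 10, 22, 0), expected_viewers=1_000_000)]

print(plan_capacity(datetime(2025, 10, 10, 19, 0), schedule))   # (4, False)
print(plan_capacity(datetime(2025, 10, 10, 19, 45), schedule))  # (100, False)
print(plan_capacity(datetime(2025, 10, 10, 20, 30), schedule, live_viewers=3_000_000))  # (200, True)
```

Which features to switch off when `reduced_service` is true is a product decision the sketch deliberately leaves open.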

Recap: The Essential Takeaways ✨

Let’s quickly recap the key takeaways:

Right View: Understand the problem deeply.
Trade-offs: Consider alternatives and the implications of each choice.
Standard Solutions Limitations: Be aware that standard solutions might not be sufficient.
Collaboration: Build relationships with all stakeholders.
Reduced Service: Prioritize core functionality during high-stress events.
Critical Path Focus: Concentrate scaling efforts on the most important parts of the system.

Looking Ahead: Potential Questions & Deeper Dives 📡

This is a complex topic, and here are some questions that came up during the presentation:

Easy/Clarifying: What’s the difference between online and offline processing? What features might be disabled in reduced service mode?
Medium Difficulty: How do you balance the cost of adaptive scaling with the potential for revenue loss from outages? What are the architectural implications of reduced service mode?
Difficult/Deep Dive: How do you handle geographically diverse user bases with varying network conditions when implementing adaptive scaling? What emerging technologies could further improve scalability and resilience?

Scaling for live events is a constant evolution. By understanding these challenges and embracing innovative solutions, we can ensure a smooth and engaging experience for viewers worldwide. 👨‍💻🦾
