Postgres’ Path to Multi-Threading: A Complex Journey 🚀💡👨‍💻

PostgreSQL is a powerful database, but its process-based architecture poses challenges for scaling performance. This segment of the presentation covers the ambitious goal of turning PostgreSQL into a multi-threaded database: the challenges, the proposed solutions, and current progress. The speaker stresses that this is not a simple task; it requires significant architectural changes and community effort.

The Core Problem & Proposed Solution: From Processes to Threads 🌐🛠️

Currently, PostgreSQL does not serve client sessions with threads at all. It uses a process-per-connection model, meaning each incoming connection is handled by its own backend process. With many connections, this leads to significant context-switching overhead as the operating system jumps between these processes. The speaker proposes a shift to a more efficient model where one thread runs per hardware core, maximizing CPU utilization. This is a common approach, used by systems like SQL Server and Sybase and in many HPC environments.
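The thread-per-core idea can be sketched in a few lines of C. This is only an illustration under stated assumptions, not PostgreSQL code: the names `Worker`, `worker_loop`, and `run_one_thread_per_core` are hypothetical, the loop body is a stand-in for a real scheduler, and `sysconf(_SC_NPROCESSORS_ONLN)` is a widely available extension (Linux, macOS, the BSDs) rather than strict POSIX. The sketch does not pin threads to cores, which would need platform-specific calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-worker bookkeeping for this sketch. */
typedef struct Worker
{
    pthread_t thread;
    int       id;
    long      handled;   /* units of work this worker completed */
} Worker;

/* Stand-in for a scheduler loop that would run many sessions' work. */
static void *
worker_loop(void *arg)
{
    Worker *w = arg;

    for (int i = 0; i < 100; i++)
        w->handled++;
    return NULL;
}

/*
 * Start one worker thread per online core, wait for all of them, and
 * return how many workers ran. The combined work count goes to *total_out.
 * Returns -1 if memory could not be allocated.
 */
long
run_one_thread_per_core(long *total_out)
{
    long    ncores = sysconf(_SC_NPROCESSORS_ONLN);
    long    started = 0;
    long    total = 0;
    Worker *workers;

    assert(total_out != NULL);
    if (ncores < 1)
        ncores = 1;          /* fall back to one worker if unknown */

    workers = calloc((size_t) ncores, sizeof(Worker));
    if (workers == NULL)
        return -1;

    for (long i = 0; i < ncores; i++)
    {
        workers[i].id = (int) i;
        if (pthread_create(&workers[i].thread, NULL,
                           worker_loop, &workers[i]) != 0)
            break;           /* keep the workers started so far */
        started++;
    }

    for (long i = 0; i < started; i++)
    {
        pthread_join(workers[i].thread, NULL);
        total += workers[i].handled;
    }

    free(workers);
    *total_out = total;
    return started;
}
```

The point of the design is that the number of runnable threads tracks the number of cores rather than the number of client connections, so the operating system has little reason to switch between them.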

Challenges and Tradeoffs: A Mountain to Climb ⛰️

The path to multi-threading isn’t straightforward. Several significant hurdles stand in the way:

  • Global Variable Dependence: PostgreSQL’s current architecture relies heavily on global variables that hold transaction and session state. This is a major obstacle: in a single process shared by many threads, these variables must become thread-local to avoid data corruption and race conditions. An approach being pioneered by Heikki (whose public branch is crucial to this effort) is to classify these variables, allowing for more informed decisions about concurrency and future optimizations.
  • Complexity & Architectural Overhaul: The ultimate goal, a system where the executor is broken into small pieces runnable on a custom scheduler, is extremely complex and cannot be the starting point. It requires a complete architectural rethink.
  • Signal Usage: PostgreSQL’s reliance on signals for backend communication is problematic in a multi-threaded environment. The speaker advocates for replacing signals with a latch-based (and eventually interrupt-based) system for wakeups.
  • Cache Invalidation: Each backend keeps its own caches, so the same cached data and the machinery that maintains it are duplicated across backends, which wastes memory. Sharing this data introduces complexity and is currently deferred.
  • Memory Contexts & DSM/DSA: Simplifying memory contexts, dynamic shared memory (DSM), and dynamic shared memory areas (DSA) is desirable but complex, and is likely to be delayed until the multi-process mode is phased out.
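To make the global-variable point concrete, here is a minimal sketch of converting session state to thread-local storage with C11 `_Thread_local`. Everything in it is an assumption for illustration: the markers `SESSION_LOCAL` and `PROCESS_GLOBAL`, the `SessionState` struct, and the function names are hypothetical and are not taken from Heikki’s branch or from PostgreSQL itself. The markers only hint at what a classification scheme might look like.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical classification markers; names invented for this sketch. */
#define SESSION_LOCAL  _Thread_local   /* one copy per thread (session) */
#define PROCESS_GLOBAL                 /* one copy shared by all threads */

/* Hypothetical per-session state that used to be a plain global. */
typedef struct SessionState
{
    int session_id;
    int statements_run;
} SessionState;

/* Each thread gets its own private instance of this variable. */
static SESSION_LOCAL SessionState current_session;

/* Read-only after startup, so sharing it between threads is safe. */
static PROCESS_GLOBAL const int statements_per_unit = 1000;

/* Simulates a session: runs session_id * 1000 "statements". */
static void *
session_main(void *arg)
{
    int *slot = arg;

    current_session.session_id = *slot;
    current_session.statements_run = 0;

    for (int i = 0; i < current_session.session_id * statements_per_unit; i++)
        current_session.statements_run++;

    *slot = current_session.statements_run;   /* report back to the caller */
    return NULL;
}

/* Runs two sessions concurrently; returns 0 on success, -1 on failure. */
int
run_two_sessions(int *count_a, int *count_b)
{
    pthread_t a, b;
    int       arg_a = 1;
    int       arg_b = 2;

    assert(count_a != NULL && count_b != NULL);

    if (pthread_create(&a, NULL, session_main, &arg_a) != 0)
        return -1;
    if (pthread_create(&b, NULL, session_main, &arg_b) != 0)
    {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    *count_a = arg_a;
    *count_b = arg_b;
    return 0;
}
```

Because `current_session` is thread-local, the two sessions never see each other’s counters. If the `SESSION_LOCAL` marker were dropped, both threads would increment the same struct without synchronization, which is exactly the kind of race the conversion is meant to prevent.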

Technical Details & Technologies: The Building Blocks 💾📡

  • Green Threads: Mentioned as an alternative, but deemed impractical for C due to portability issues.
  • POSIX & C11: The speaker proposes a new header file, pg_threads.h, to standardize thread-related functions, acknowledging potential resistance and the possibility of using POSIX names instead.
  • Latches & Interrupts: Transitioning from signals to latches and, ultimately, interrupts for backend communication is key.
  • Heikki’s Branch: A critical development effort, introducing a classification scheme for global variables to facilitate thread-local conversion.
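The latch idea above can be sketched with a mutex and a condition variable. This is a simplified illustration only: `SimpleLatch` and the `simple_latch_*` functions are hypothetical names for this example, loosely modeled on the set/wait/reset pattern, and they are not PostgreSQL’s actual latch implementation. The `Job` struct and `run_latch_demo` exist purely to exercise the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical latch: a sticky flag plus a way to sleep until it is set. */
typedef struct SimpleLatch
{
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            is_set;
} SimpleLatch;

static void
simple_latch_init(SimpleLatch *latch)
{
    pthread_mutex_init(&latch->lock, NULL);
    pthread_cond_init(&latch->cond, NULL);
    latch->is_set = false;
}

/* Wake the waiter. Safe to call before the waiter starts waiting. */
static void
simple_latch_set(SimpleLatch *latch)
{
    pthread_mutex_lock(&latch->lock);
    latch->is_set = true;
    pthread_cond_signal(&latch->cond);
    pthread_mutex_unlock(&latch->lock);
}

/* Sleep until the latch is set; the loop handles spurious wakeups. */
static void
simple_latch_wait(SimpleLatch *latch)
{
    pthread_mutex_lock(&latch->lock);
    while (!latch->is_set)
        pthread_cond_wait(&latch->cond, &latch->lock);
    pthread_mutex_unlock(&latch->lock);
}

/* Clear the flag so the latch can be waited on again. */
static void
simple_latch_reset(SimpleLatch *latch)
{
    pthread_mutex_lock(&latch->lock);
    latch->is_set = false;
    pthread_mutex_unlock(&latch->lock);
}

/* Hypothetical unit of work handed from one thread to another. */
typedef struct Job
{
    SimpleLatch latch;
    int         input;
    int         output;
} Job;

static void *
worker_main(void *arg)
{
    Job *job = arg;

    simple_latch_wait(&job->latch);    /* no signal handler involved */
    simple_latch_reset(&job->latch);
    job->output = job->input * 2;
    return NULL;
}

/* Hands `input` to a worker thread and returns its result, or -1 on error. */
int
run_latch_demo(int input)
{
    Job       job;
    pthread_t worker;

    job.input = 0;
    job.output = -1;
    simple_latch_init(&job.latch);

    if (pthread_create(&worker, NULL, worker_main, &job) != 0)
        return -1;

    job.input = input;                 /* publish the work first ... */
    simple_latch_set(&job.latch);      /* ... then wake the worker */
    pthread_join(worker, NULL);

    assert(job.output == input * 2);

    pthread_cond_destroy(&job.latch.cond);
    pthread_mutex_destroy(&job.latch.lock);
    return job.output;
}
```

Unlike a signal, the wakeup here is delivered to one specific thread through ordinary shared memory, and the flag stays set until it is explicitly reset, so a wakeup sent before the worker starts waiting is not lost.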

Current Progress & Non-Goals: A Phased Approach 🎯

  • Ongoing Code Cleanup: Numerous efforts are underway to remove non-thread-safe code, including eliminating global locale usage.
  • Prototype Goals: The initial prototype aims for near-identical behavior to the existing multi-process mode, allowing for a gradual transition and minimizing disruption to the ecosystem.
  • Non-Goals: Sharing file descriptors between backends, complex cache invalidation features, and intricate memory management are explicitly excluded from the initial prototype to keep the scope manageable.

Looking Ahead: A Community Effort 👾

The speaker emphasizes that this is a multi-year effort requiring broad community involvement, and argues that the transition to a truly multi-threaded PostgreSQL is essential for future scalability and performance. Contributions and exploration of the ongoing work are encouraged, particularly in identifying and removing remaining non-thread-safe code. This isn’t just a technical challenge; it’s a community-driven evolution of a vital piece of infrastructure.

This journey is a testament to the dedication of the PostgreSQL community and a glimpse into the future of database performance.
