Scaling Postgres to the next level at OpenAI

Presenters

Bohan Zhang

Source

PGConf.dev 2025

Pushing PostgreSQL to the Limit: Insights & Feature Requests from a High-Load Environment 🚀

Running a mature database like PostgreSQL at scale is a testament to its power and flexibility. However, as workloads grow and demands increase, the limitations of even the most robust systems become apparent. This presentation segment, gleaned from a deep dive into real-world PostgreSQL performance challenges, highlighted a wealth of observations and feature requests from a high-load environment. Let’s break down the key takeaways and explore how these insights can benefit the wider PostgreSQL community.

1. The Observability Gap: Beyond Average Latency 🎯

One of the most pressing challenges identified was the lack of granular query latency data. While pg_stat_statement provides average latency, it falls short when you need to understand tail latency (P95, P99) and identify performance bottlenecks. Imagine trying to diagnose a slow website without knowing which specific requests are taking the longest – frustrating, right? The speaker advocated for a more robust system, ideally built into PostgreSQL, to track query latency with histograms or percentile data. This would empower developers to pinpoint slow queries and optimize their code more effectively.

2. Schema Changes: A Silent Audit Trail 💾

Maintaining accurate records of schema changes (column additions, index creations, etc.) is crucial for auditing, dependency tracking, and troubleshooting. Currently, this often relies on external monitoring tools, which can be inaccurate or incomplete. The speaker proposed a valuable feature: PostgreSQL natively storing events related to DDL operations. This would streamline schema change tracking and improve the overall reliability of database management.

3. Index Management: The Need for a “Pause” Button 🛠️

Dropping and recreating indexes is a disruptive and time-consuming process, especially for large indexes. The speaker passionately requested the ability to temporarily disable indexes without resorting to this drastic measure. This would enable faster maintenance, more efficient testing, and reduced impact on production workloads. It’s a small change with the potential for significant performance gains.

4. The Mystery of Stale Active Queries 📡

A truly peculiar observation was the discovery of queries remaining in an “active” state for extended periods (up to 2+ hours), preventing termination by idle transaction timeouts. This potentially links to TCP keeper lives and indicates a possible bug or unexpected interaction within PostgreSQL’s query state logic. This observation underscores the importance of continuous monitoring and the potential for unexpected behavior even in mature systems.

5. Tuning for Performance: Beyond the Defaults 🌐

PostgreSQL’s default configuration knobs are often a compromise and may not be optimal for all workloads. The speaker emphasized the need for improved default values and the potential for adaptive, workload-based tuning. It’s a reminder that even with a powerful database, proactive tuning is essential for maximizing performance.

Key Themes & Recommendations:

Automation & Observability are Paramount: The speaker’s requests consistently revolved around automating tasks and gaining deeper insights into PostgreSQL’s behavior.
Valuable Feedback for the PostgreSQL Community: These observations and feature requests represent invaluable feedback for the PostgreSQL development team, and sharing them through appropriate channels can shape the future of the database.
Right Amplification: A critical point was the need to optimize applications to reduce unnecessary database writes. This includes strategies like smoothing out writes over time (“daisy writes”) and avoiding redundant updates.
Scaling PostgreSQL: PostgreSQL can scale remarkably well for read-heavy workloads, often without resorting to sharding. Start with PostgreSQL and scale up instance size as needed.

Looking Ahead:

This presentation segment wasn’t just about highlighting problems; it was about identifying opportunities to improve PostgreSQL’s capabilities and empower developers to build even more powerful and efficient applications. By sharing these insights and engaging with the PostgreSQL community, we can collectively push the boundaries of what’s possible with this remarkable database.

Want to contribute?

Bug Reports: If you encounter unexpected behavior, file a bug report with the PostgreSQL project.
Mailing Lists: Engage in discussions on the PostgreSQL mailing lists.
Conferences: Share your experiences and insights at PostgreSQL conferences.

What are your thoughts on these observations? Have you encountered similar challenges in your PostgreSQL deployments? Share your experiences in the comments below!

Pushing PostgreSQL to the Limit: Insights & Feature Requests from a High-Load Environment 🚀#

Appendix#

Pushing PostgreSQL to the Limit: Insights & Feature Requests from a High-Load Environment 🚀

Appendix