Presenters
Source
🚀 Supercharge Your Analytics: Fujitsu’s Calamina and the Vertical Clustered Index (VCI) 💡
Are you struggling to keep up with the ever-increasing demands of real-time analytics? Do you need blazing-fast query performance without sacrificing the speed of your operational updates? Then you’re going to want to learn about Fujitsu’s exciting work with PostgreSQL, the Calamina data store, and a groundbreaking community proposal called the Vertical Clustered Index (VCI)!
Aya Iwata from Fujitsu recently presented a deep dive into this innovative approach, and we’re breaking down the key takeaways for you. Let’s dive in!
💾 The Challenge: Analytics vs. Operations 🎯
Traditional databases, while reliable, often fall short when it comes to analytical workloads. Think about e-commerce, finance, manufacturing, or transportation – these industries need to analyze data constantly, and they need it fast. The problem? Row-oriented databases scan entire rows, even when you only need a few columns. This is a major performance bottleneck.
✨ Introducing Calamina: Column-Oriented Storage to the Rescue 🦾
Fujitsu’s Calamina data store tackles this issue head-on with column-oriented storage. What does that mean? Instead of scanning entire rows, Calamina scans only the columns you need for a query. This dramatically reduces I/O costs and accelerates searches.
Here’s how Calamina shines:
- Column-Oriented Storage: Imagine querying product names and purchase dates – Calamina avoids scanning irrelevant data like product IDs and prices.
- Efficient Data Compression: Frequent values (like product names) are stored efficiently using techniques like dictionary compression. This is far more effective than row-wise storage because data is often repetitive.
🚧 The Catch: Updates Slow Down Analytics 👨💻
While Calamina is a powerhouse for analytics, there’s a tradeoff. Its column-oriented nature makes updates (inserts, updates, deletes) slower because each column must be individually read and modified. This contrasts sharply with the speed of updates in traditional row-oriented databases.
🚀 VCI: Bridging the Gap with a Community-Driven Solution 🌐
This is where the Vertical Clustered Index (VCI) comes in. VCI is a community-driven initiative designed to enhance PostgreSQL’s capabilities and enable real-time analytics on frequently updated data without sacrificing operational performance.
So, how does VCI work its magic?
VCI utilizes a clever two-tiered approach:
- Asynchronous Conversion: Data is converted to the Calamina format asynchronously, minimizing impact on your transaction processing (OLTP) speed.
- Combined Scanning: Queries scan both existing Calamina data (Read Optimized Storage - ROS) and recently updated data (Write Optimized Storage - WOS) to ensure you’re always working with the latest information.
- Two Storage Structures: ROS is optimized for querying, while WOS manages updates.
- Parallel Scanning: While the prototype removes it for standardization, VCI originally supported parallel scanning for even faster query execution.
- Data in Memory: Future plans include holding Calamina data in memory for lightning-fast retrieval.
Here’s a quick rundown of the VCI workflow:
- Index Creation: You create a VCI using
CREATE INDEX ... USING VCI
, establishing ROS for the columns you want to analyze. - Inserts & Updates: New data is inserted into the original row-oriented table, and metadata about the update is stored in WOS.
- Scanning: Queries scan ROS for fast retrieval. If data isn’t in ROS, WOS is consulted to convert the data on the fly.
📊 Performance That Speaks Volumes 🗣️
Fujitsu’s internal testing has shown some seriously impressive results:
- Scanning Performance: VCI parallel scans were 4.4 times faster than standard OSS parallel scans.
- Sequential Scans (TPC Measurements): VCI scans were a staggering 46 times faster for one query and 11 times faster for another!
- Update Performance: Crucially, VCI’s update performance was nearly identical to standard row-oriented storage, ensuring your operational updates remain fast and efficient.
🚧 Future Work and How You Can Get Involved 🛠️
While VCI shows immense promise, there’s still work to be done:
- Expanded Data Type Support: Currently, support for data types like text and JSON is limited.
- Simplified Index Creation: Creating indexes on large tables can be time-consuming.
- Improved PG Upgrade Support: The current implementation requires manual steps during PostgreSQL upgrades.
- Custom Hooks: Several hooks were added to manage the conversion between ROS and WOS and need further refinement.
Fujitsu is actively encouraging community participation in the development of VCI. If you’re passionate about PostgreSQL and want to contribute to the future of real-time analytics, now’s your chance! Check out their community resources and join the conversation!
Tools/Technologies Mentioned:
- PostgreSQL: The core database platform
- Calamina Data Store: The foundation for column-oriented storage
- Vertical Clustered Index (VCI): The community-developed extension for PostgreSQL
- Read Optimized Storage (ROS): Storage for fast querying
- Write Optimized Storage (WOS): Storage for managing updates