🚀 Simplifying AI Application Development: Beyond RAG Complexity 💡
Hey everyone! 👋 Peng Wang here, and it was a pleasure sharing insights at API Days London. We’re seeing a massive shift in how we build applications, driven by the rise of AI. Retrieval Augmented Generation (RAG) has been a key ingredient, but let’s be honest, it’s often a messy, complex undertaking. Today, we’ll explore how to simplify that process and build AI-powered applications faster and more reliably.
The AI Data Challenge: Why Things Are Getting Complicated 🌐
The demand for AI is exploding, and with it, the need for data. Here’s the reality:
- Unstructured Data Reigns Supreme: International Data Corporation (IDC) estimates that over 92% of global data is unstructured. That’s the data that fuels AI workloads like RAG and semantic search.
- Executives See the Value: 41% of executives say RAG is essential, and over 80% believe using their own data with GenAI models will be a competitive edge.
- Enterprises Are Moving Fast: Two-thirds are adopting AI infrastructure platforms, and 60% are embedding agent capabilities directly into their core applications.
This isn’t experimental anymore – AI workloads are becoming mainstream. But this rapid adoption brings new challenges for database and platform teams.
The Fragmented Tech Stack: A Common Pain Point 🛠️
The problem? Different AI workloads need different capabilities:
- Intelligent Q&A: Low latency, semantic search, context awareness.
- Log Analytics: High ingestion rates.
- Core Business Systems: Strong consistency and ACID properties.
Historically, this has led to a patchwork of specialized databases – transactional, vector, data warehouse, graph, and more. This creates a fragmented technology stack, increasing complexity, risk, and overhead.
Many engineers are craving:
- A single platform for all data workloads.
- Strong consistency and high availability.
- Seamless integration with AI pipelines.
- Elastic scaling on Kubernetes.
- Unified access and multi-tenancy with resource isolation.
Instead, they’re facing:
- Trade-offs between performance and consistency.
- Complex data movement between services.
- Difficult scaling across different workloads.
- Inconsistent query models and APIs.
A Real-World Example: The Coffee Shop Recommendation ☕
Let’s illustrate this with a common scenario: an intelligent Q&A system recommending coffee shops.
Imagine a user asking: “Please recommend a coffee shop within 500 meters with an average consumption of less than $8 per person.”
This seemingly simple request requires three distinct queries:
- Spatial Query: Finding shops within 500 meters.
- Scalar Query: Filtering by average consumption.
- Semantic Query: Identifying which establishments are primarily coffee shops.
In a conventional architecture, each query would hit a different database (e.g., PostgreSQL for the spatial and scalar queries, a vector database for the semantic query). The results would then need to be merged and sorted, creating a complex and potentially slow process. This architecture can quickly become a bottleneck.
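To make that glue code concrete, here is a minimal sketch of the merge step the application ends up owning after querying three separate systems. All IDs and scores are made up for illustration:

```python
# Results fetched separately from three specialized stores (illustrative IDs).
spatial_hits = {101, 102, 103, 105}   # shops within 500 m (spatial database)
scalar_hits = {102, 103, 104, 105}    # shops with avg. spend under $8 (relational filter)
semantic_scores = {                   # similarity to "coffee shop" (vector database)
    102: 0.91, 103: 0.88, 105: 0.42, 106: 0.95,
}

# The application must intersect the ID sets, then re-sort by semantic score.
candidates = spatial_hits & scalar_hits & semantic_scores.keys()
ranked = sorted(candidates, key=lambda shop_id: semantic_scores[shop_id], reverse=True)

print(ranked)  # → [102, 103, 105]: shops passing all filters, best semantic match first
```

Every additional filter means another round trip and another merge, and each store can return a snapshot that is slightly inconsistent with the other two.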
The Solution: Embracing the Multimodal Database 💾
The key is to unify these capabilities within a multimodal database. What does that mean?
- Support for Diverse Data Types: Structured (integers, strings), semi-structured (JSON, XML), key-value pairs, spatial data, vector data, and even unstructured data derived from documents, images, and videos.
- Unified Query Access: SQL, NoSQL, and GraphQL.
- Unified Data Storage: Eliminating data silos.
With a multimodal database, that coffee shop query can be expressed as a single, hybrid query – combining spatial, scalar, and vector search capabilities within a single statement. This dramatically simplifies the architecture and improves performance.
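As an illustration, a PostGIS/pgvector-flavored version of that hybrid query might look like the following. The table and column names are hypothetical, and the exact syntax varies by database:

```sql
SELECT name, avg_spend
FROM shops
WHERE ST_DWithin(location,                              -- spatial: within 500 meters
                 ST_MakePoint(:user_lon, :user_lat)::geography, 500)
  AND avg_spend < 8                                     -- scalar: under $8 per person
ORDER BY embedding <=> :coffee_shop_vector              -- semantic: closest to "coffee shop"
LIMIT 5;
```

One statement, one engine, one consistent snapshot — no cross-system merge step to write or maintain.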
HTAP Capabilities & Real-Time Consistency 📡
Another crucial aspect is the move toward HTAP (Hybrid Transactional/Analytical Processing) capabilities. Traditionally, synchronizing data between OLTP (transactional) and OLAP (analytical) databases required ETL pipelines, introducing latency and potential inconsistencies. With HTAP, that separate ETL step disappears: analytical queries run against fresh transactional data, removing both the delay and the risk of stale results.
Supercharging Vector Search: HSW + RapidQ ✨
We’re also seeing exciting advancements in vector search technology. Specifically, the combination of the HSW (Hierarchical Navigable Small World) algorithm and RapidQ is a game-changer.
- HSW: Efficiently searches large datasets.
- RapidQ: Compresses vectors, reducing memory footprint.
Combining these two technologies resulted in a 95% reduction in memory usage during our experiments – a massive cost saving for customers!
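The arithmetic behind that 95% figure is easy to see. The sketch below uses naive sign-bit quantization — far cruder than the actual quantizer, which adds randomized rotations and error-bounded distance estimators — purely to show where the saving comes from: storing 1 bit per dimension instead of a 32-bit float.

```python
import random

def sign_bit_quantize(vec):
    """Collapse each float component to a single sign bit (naive illustrative sketch)."""
    return [1 if x >= 0.0 else 0 for x in vec]

dim = 768                            # a common embedding size
vec = [random.uniform(-1.0, 1.0) for _ in range(dim)]
bits = sign_bit_quantize(vec)

full_bytes = dim * 4                 # float32: 4 bytes per dimension
quantized_bytes = dim / 8            # 1 bit per dimension
reduction = 1.0 - quantized_bytes / full_bytes

print(f"{full_bytes} B -> {quantized_bytes:.0f} B ({reduction:.1%} smaller)")
# → 3072 B -> 96 B (96.9% smaller)
```

In practice such schemes search over the compact codes and then re-rank a small candidate set with full-precision vectors, which is how accuracy is retained despite the compression.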
Key Takeaways: Simplify, Accelerate, and Build with Confidence 🎯
Here’s the bottom line:
- Simplify: Unify OLTP and OLAP capabilities in a single database, eliminating glue code.
- Accelerate: Build RAG pipelines directly with SQL, avoiding juggling multiple systems.
- Build Reliably: Lean on the consistency and scalability of a distributed database engine.
- Developer-Friendly: Easily expose database capabilities through APIs for services like Q&A and recommendations.
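As a sketch of that last point, a thin service layer can expose the hybrid query as a single endpoint-ready function. The table, columns, and parameter names here are hypothetical, and the query text assumes PostGIS/pgvector-style syntax:

```python
def build_recommendation_query(lat, lon, radius_m, max_spend, query_vector, limit=5):
    """Return a parameterized hybrid query plus its bind parameters (':name' style)."""
    sql = """
        SELECT name, avg_spend
        FROM shops
        WHERE ST_DWithin(location, ST_MakePoint(:lon, :lat)::geography, :radius_m)
          AND avg_spend < :max_spend
        ORDER BY embedding <=> :query_vector
        LIMIT :limit
    """
    params = {"lat": lat, "lon": lon, "radius_m": radius_m,
              "max_spend": max_spend, "query_vector": query_vector, "limit": limit}
    return sql, params

sql, params = build_recommendation_query(51.5072, -0.1276, 500, 8, [0.1, 0.2, 0.3])
```

A web handler then only binds user input to these parameters and returns rows — there is no cross-store merging logic left to maintain.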
You don’t always need a multimodal database for every AI application, but for complex workloads, it’s a powerful solution. Explore options like CockroachDB, TiDB, or other databases embracing this direction.
Let’s build the future of AI together! Connect with me on LinkedIn or via email – I’m eager to hear your thoughts and experiences.