Published 2 months ago

Oracle 26ai MCP Server: In‑Database Hybrid RAG Pipeline for Policy Q&A

Enterprise retrieval pipelines are often sprawling constructs — retrieval APIs, orchestration middleware, embedding services, and LLM gateways stitched together across multiple layers. Oracle Autonomous AI Database 26ai takes a different position: move the orchestration logic closer to the data, and expose the result as a governed, discoverable tool through a managed MCP endpoint.

This post breaks down how that architecture works in practice, using a hybrid RAG pipeline for policy Q&A as the concrete implementation.

Key Highlights

Hybrid RAG is orchestrated entirely inside Oracle 26ai, from retrieval to the LLM generation call
DBMS_HYBRID_VECTOR_SEARCH blends vector and keyword search for precise policy Q&A
HRAG pipeline is exposed as a governed MCP tool returning structured JSON

The Problem with Traditional RAG Architectures

Most RAG setups treat the database as a passive store. Retrieval happens in a separate service, ranking logic lives in application code, and the LLM call is made from yet another layer. Each handoff adds latency, operational complexity, and a new failure point.

For enterprise use cases — HR policy documents, compliance manuals, benefit eligibility rules — this complexity compounds quickly. Exact terminology matters. Governance matters. Predictable output structure matters. A loosely coupled pipeline makes all three harder to guarantee.

What Oracle 26ai Changes

Oracle Autonomous AI Database 26ai introduces a managed, per-database MCP endpoint that exposes Select AI Agent tools directly from the database tier. There is no separate MCP bridge to stand up and maintain.

The implication is significant: retrieval logic, context ranking, LLM orchestration, and tool registration all live inside the database. The database is no longer waiting for an external service to fetch rows — it participates actively in the full retrieval-and-generation lifecycle.

The Implementation: A Policy Q&A Prototype

The team built an end-to-end Hybrid RAG pipeline, referred to throughout as HRAG, and validated it against an HR policy-style document. The pipeline covers every step from raw question to structured answer, all coordinated from within Oracle Autonomous AI Database 26ai.

What the Pipeline Does

The flow is simple to describe, even though each step has to be implemented precisely:

Accept the user’s natural language question.
Retrieve relevant document chunks using hybrid search.
Rank and prepare the retrieved context.
Call the LLM with a structured prompt.
Return a clean JSON response with the answer and supporting chunk text.

Each step is coordinated by a PL/SQL function called HRAG_PIPELINE. This makes the entire flow a callable database function — not a one-off query, not a script, but a reusable, testable unit of logic.
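The five steps above can be sketched outside the database to make the control flow concrete. This is a minimal Python illustration, not the PL/SQL body of HRAG_PIPELINE: the retrieve and generate callables stand in for the hybrid search and the LLM call, and the output field names "answer" and "chunks" are assumptions chosen for the example.

```python
import json
from typing import Callable, Dict, List

Chunk = Dict[str, object]  # assumed shape: {"text": str, "score": float}


def hrag_pipeline(
    question: str,
    retrieve: Callable[[str], List[Chunk]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Run the five pipeline steps and return a JSON string."""
    # 1. Accept the user's natural language question.
    q = question.strip()
    if not q:
        raise ValueError("question must not be empty")

    # 2. Retrieve candidate chunks (hybrid search in the real pipeline).
    chunks = retrieve(q)

    # 3. Rank by score and keep the best top_k chunks as context.
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)[:top_k]
    context = "\n\n".join(str(c["text"]) for c in ranked)

    # 4. Call the LLM with a structured prompt.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {q}"
    )
    answer = generate(prompt)

    # 5. Return the answer together with the supporting chunk text.
    return json.dumps({"answer": answer, "chunks": [c["text"] for c in ranked]})
```

Passing the retrieval and generation steps in as callables keeps the sketch testable with stubs, which mirrors the point of wrapping the flow in a single callable function.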

Why Hybrid Retrieval

Pure vector search handles paraphrased questions well. It struggles when the user’s question contains exact policy terms, acronyms, or eligibility language that must match precisely.

Keyword search handles exact terms well. It struggles when the user phrases a question in their own words, away from the document’s literal vocabulary.

Hybrid retrieval, which combines semantic vector search with keyword search through DBMS_HYBRID_VECTOR_SEARCH, addresses both failure modes at once. For HR and policy-heavy content the gain is practical rather than theoretical: a single question often mixes an exact term with loosely paraphrased intent, so the keyword side anchors the term while the vector side catches the paraphrase.
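One common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below illustrates the general idea only; it is not a description of how DBMS_HYBRID_VECTOR_SEARCH scores results in this pipeline, and the chunk identifiers and the constant k = 60 are assumptions for the example.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk ids into one ranked list.

    Each chunk earns 1 / (k + rank) from every list it appears in,
    so chunks found by both searches rise to the top.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


vector_hits = ["c2", "c1", "c3"]   # hypothetical semantic ranking
keyword_hits = ["c1", "c4"]        # hypothetical keyword ranking
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['c1', 'c2', 'c4', 'c3']
```

Chunk c1 wins because it appears in both lists, even though neither search ranked it far ahead on its own.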

The Embedding and Ranking Layer

An ONNX embedding model generates the vector embeddings inside the database. Retrieved chunks are ranked before being passed to the LLM, so the prompt contains the most relevant context rather than whichever rows happened to be retrieved first.
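To illustrate the semantic side of the ranking step, the following sketch orders chunks by cosine similarity between a query vector and each chunk vector. The toy two-dimensional vectors and the function names are assumptions for the example; in the pipeline the vectors come from the ONNX model and the comparison happens inside the database.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(
    query_vec: Sequence[float],
    chunk_vecs: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> List[str]:
    """Return the ids of the top_k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunk_vecs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:top_k]]


chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(rank_chunks([1.0, 0.0], chunks))  # ['a', 'c']
```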

The LLM call itself is made via DBMS_CLOUD_AI.GENERATE, which routes to OCI Generative AI through a configured AI profile and credential. The entire call chain, from hybrid retrieval to the LLM request, is driven from the database tier, with the model itself served by OCI Generative AI.

Registering the Pipeline as an MCP Tool

Once the HRAG_PIPELINE function is validated, it is registered as a Select AI Agent tool using DBMS_CLOUD_AI_AGENT.CREATE_TOOL. The registered tool is named HRAG_PIPELINE_TOOL.

From that point, the Oracle 26ai managed MCP endpoint exposes the tool to any compatible client. The client does not need to know anything about the internal hybrid retrieval logic. It sees a tool with a defined interface: pass a question, receive a structured answer.

This is the practical value of the MCP pattern in this context. The consumer — whether a developer tool, an orchestrator agent, or an automated workflow — interacts with a clean contract, not with database internals.
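The "clean contract" can be pictured as a tool descriptor plus a check that a caller's arguments match it. In the sketch below the name, description, and inputSchema fields follow the shape MCP uses when listing tools; the description text and the single string argument "question" are assumptions inferred from the call shown later in this post, not the registered definition.

```python
from typing import Any, Dict

# Hypothetical descriptor in the shape of an MCP tools/list entry.
HRAG_TOOL: Dict[str, Any] = {
    "name": "HRAG_PIPELINE_TOOL",
    "description": "Answer a policy question using hybrid retrieval.",
    "inputSchema": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}


def validate_arguments(tool: Dict[str, Any], arguments: Dict[str, Any]) -> None:
    """Raise ValueError if arguments do not satisfy the tool's input schema.

    Only required keys, unknown keys, and string types are checked,
    which is enough for this one-argument contract.
    """
    schema = tool["inputSchema"]
    for name in schema.get("required", []):
        if name not in arguments:
            raise ValueError(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            raise ValueError(f"unexpected argument: {name}")
        if spec["type"] == "string" and not isinstance(value, str):
            raise ValueError(f"argument {name} must be a string")


validate_arguments(HRAG_TOOL, {"question": "What is the overtime policy?"})
```

The client only ever sees this descriptor; nothing about the hybrid index or the prompt leaks through it.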

Cline as the Validation Client

The team validated the integration using Cline, configured with type: "streamableHttp", the per-database MCP URL, and a short-lived Bearer token. Once connected, Cline can discover both built-in Select AI tools and the custom HRAG_PIPELINE_TOOL, then invoke it directly.

This path is useful for rapid validation: a developer can confirm the tool is discoverable, inspect its definition, pass a test question, and verify the response structure — all without writing application code.
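A Cline server entry along these lines can be generated programmatically. Only type: "streamableHttp", the MCP URL, and the Bearer token come from the setup described above; the "mcpServers", "url", and "headers" key names and the server label "oracle-26ai-hrag" are assumptions about the settings layout and should be checked against the Cline version in use.

```python
import json
from typing import Any, Dict


def cline_server_entry(mcp_url: str, token: str) -> Dict[str, Any]:
    """Build one remote MCP server entry (key names are assumed)."""
    return {
        "type": "streamableHttp",
        "url": mcp_url,
        "headers": {"Authorization": f"Bearer {token}"},
    }


settings = {
    "mcpServers": {
        # Hypothetical label; placeholders are left unfilled on purpose.
        "oracle-26ai-hrag": cline_server_entry(
            "https://dataaccess.adb.<region>.oraclecloudapps.com/adb/mcp/v1/databases/<db-ocid>",
            "<access_token>",
        )
    }
}
print(json.dumps(settings, indent=2))
```

Because the token is short-lived, regenerating the entry from a script is one way to avoid editing the settings file by hand each time it expires.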

Direct API Access

The same capability is available through a direct HTTP call, without any client tooling:

curl -X POST "https://dataaccess.adb.<region>.oraclecloudapps.com/adb/mcp/v1/databases/<db-ocid>" \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"HRAG_PIPELINE_TOOL","arguments":{"question":"What is the overtime policy?"}}}'

The response returns a structured JSON result with the generated answer and the supporting chunk text. This makes the output predictable and directly consumable by downstream systems.
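For callers that prefer a script over curl, the same request can be built and its response unpacked in Python. This is a sketch under stated assumptions: the message is wrapped in the JSON-RPC 2.0 envelope that MCP defines, the server is assumed to reply with a plain JSON body rather than an event stream, and the tool's JSON is assumed to arrive as a text item in the result's content list.

```python
import json
import urllib.request
from typing import Any, Dict

TOOL_NAME = "HRAG_PIPELINE_TOOL"


def build_tool_call(question: str, request_id: int = 1) -> Dict[str, Any]:
    """JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAME, "arguments": {"question": question}},
    }


def build_request(url: str, token: str, question: str) -> urllib.request.Request:
    """Prepare the HTTP POST without sending it."""
    body = json.dumps(build_tool_call(question)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


def extract_payload(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the tool's JSON payload out of a tools/call response."""
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error']}")
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    text = next(
        (item["text"] for item in result["content"] if item.get("type") == "text"),
        None,
    )
    if text is None:
        raise ValueError("no text content in tool result")
    return json.loads(text)
```

Sending the prepared request is a matter of passing it to urllib.request.urlopen and feeding the decoded JSON body to extract_payload.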

Orchestrator-Agent Integration

Beyond Cline, the team tested a broader path where a main orchestrator agent — itself MCP-capable — calls HRAG_PIPELINE_TOOL as part of a larger enterprise workflow. This confirms that the tool is not limited to developer validation. It can participate in multi-step agent pipelines without any modification to the database-side implementation.
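The orchestrator path can be sketched as a registry that dispatches tool calls by name. Everything here is illustrative: the Orchestrator class, the run_policy_step workflow, and the "answer" and "chunks" result fields are hypothetical, and a real orchestrator would invoke the tool over MCP rather than through a local callable.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[[Dict[str, Any]], Dict[str, Any]]


class Orchestrator:
    """Minimal tool registry with name-based dispatch."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def call(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](arguments)

    def run_policy_step(self, question: str) -> Dict[str, Any]:
        """One step of a larger workflow: ask the policy tool, keep sources."""
        result = self.call("HRAG_PIPELINE_TOOL", {"question": question})
        return {
            "question": question,
            "policy_answer": result["answer"],
            "sources": result["chunks"],
        }


# A stub stands in for the remote MCP tool.
orchestrator = Orchestrator()
orchestrator.register(
    "HRAG_PIPELINE_TOOL",
    lambda args: {"answer": "stub answer", "chunks": ["stub chunk"]},
)
step = orchestrator.run_policy_step("What is the overtime policy?")
```

The orchestrator depends only on the tool name and its argument shape, which is why the database-side implementation can stay unchanged.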

Architecture at a Glance

The architecture has five distinct layers, each with a clear responsibility:

Oracle Autonomous AI Database 26ai holds the source document, the hybrid vector index (HRAG_INDEX_MCP), the HRAG_PIPELINE function, and the tool registration. This is where retrieval and orchestration logic lives.

DBMS_HYBRID_VECTOR_SEARCH performs the combined semantic and keyword retrieval over the indexed document chunks.

DBMS_CLOUD_AI.GENERATE makes the LLM call from the database tier, routing to OCI Generative AI through the configured AI profile.

DBMS_CLOUD_AI_AGENT.CREATE_TOOL registers the pipeline as a discoverable MCP tool, exposed through the managed per-database endpoint.

MCP-compatible clients and orchestrators — Cline, direct API callers, or orchestrator agents — consume the tool through a defined interface without knowledge of the internal retrieval logic.

Governance and Simplification

Keeping the pipeline inside the database has concrete governance benefits. Identity and access control are handled at the database tier, not bolted on externally. The structured JSON output is consistent and predictable across all callers. There are fewer external moving parts to monitor, secure, and maintain.

For enterprise environments where auditability and access control are non-negotiable, this architecture reduces the surface area that needs to be governed.

Honest Scope and What Comes Next

The current implementation is validated end-to-end with one remote PDF and one ONNX embedding model. That scope is intentional — it demonstrates the full pipeline cleanly without introducing variables that obscure the core architecture.

Several natural extensions are on the roadmap:

Multi-document ingestion with metadata filters for scoped retrieval across document collections.
Retrieval quality evaluation to measure chunk relevance and detect hallucination patterns.
Observability metrics covering latency, token usage, and chunk quality per query.
Enterprise MCP gateway for throttling, multi-tenant routing, and centralized tool governance.

Each of these builds on the same foundation without requiring architectural changes to the core pipeline.
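As one example of what the retrieval quality evaluation could measure, recall@k reports the share of known-relevant chunks that appear in the top k results. This is a generic metric sketched with hypothetical chunk ids; the post does not state which metrics the team plans to use.

```python
from typing import Iterable, List


def recall_at_k(retrieved: List[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of relevant chunk ids found in the first k retrieved ids."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


# One of the two relevant chunks appears in the top 2 results.
print(recall_at_k(["c1", "c2", "c3"], {"c1", "c4"}, k=2))  # 0.5
```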

The Takeaway

The architectural insight here is not that Oracle added MCP support. It is that the retrieval-and-generation lifecycle can be owned by the database itself — not delegated to a chain of external services.

By packaging hybrid retrieval and in-database LLM generation as a single, registered MCP tool, the implementation reduces integration complexity while improving governance, maintainability, and output predictability. The client sees a clean interface. The database handles the rest.

For teams building enterprise Q&A systems over policy documents, compliance content, or any structured knowledge base, this pattern is worth examining carefully. The complexity did not disappear — it moved to where it is easier to govern.