4 hours ago

How to Build Agentic AI on AWS: A Governed Serverless Data Mesh with Amazon Bedrock and S3 Tables

Agentic AI is not RAG with extra steps. When an AI agent autonomously discovers schemas, constructs SQL, queries multiple data sources, and synthesizes a response, every single step is a potential governance gap.

A single metadata filter at retrieval time — the approach that worked fine for RAG — cannot govern a five-step autonomous chain. You need authorization enforced natively at each layer: tool discovery, schema inspection, query execution, vector retrieval, and response synthesis.

This guide shows you exactly how to build that architecture on AWS using Amazon Bedrock AgentCore, S3 Tables with Apache Iceberg, Lake Formation, Amazon Athena, and S3 Vectors. The result is a governed, serverless data mesh that production agentic AI actually requires.

Key Highlights

Enforce authorization at every agent step, not just vector retrieval
Combine S3 Tables, Lake Formation, and Athena for a governed data mesh
Use AgentCore Gateway and interceptors to control tool access with JWT scopes

Why Agentic AI Breaks the RAG Governance Model

RAG enforces governance at one checkpoint: metadata-filtered vector retrieval. That works when your system queries a single pre-built index and returns chunks.

Agentic AI does five things instead of one. It discovers which tables exist, reads schemas, constructs SQL dynamically, queries vector stores, and synthesizes results across all of it. Each step is an independent authorization decision.

Three specific limitations make the RAG model insufficient here:

Stale revocations. Vector databases synchronize permissions periodically. A revoked user can still access data until the next sync cycle.
Identity complexity. Role hierarchies, attribute-based access, and row-level filters cannot be expressed as simple metadata key-value pairs on vector chunks.
Multi-hop exposure. A single compromised tool in a multi-step chain can leak data that no single checkpoint would have allowed.

The solution is a governed data mesh where authorization is enforced natively at each data access layer — not bolted on at the edge.

Architecture Overview: Four Layers That Each Enforce Their Own Controls

The architecture runs from customer request through governed data access and back. No single point of failure can expose unauthorized data because each layer owns its own authorization logic.

Layer 1: Agent Layer

AgentCore Runtime hosts the LangGraph agent in isolated microVM environments with session isolation. The agent integrates with MCP tools through the MCPClient class. This is your execution environment — serverless, isolated, and stateless per session.

Layer 2: Gateway Layer

The Gateway is where deterministic access control lives. It includes:

A request interceptor that validates JWT tokens and enforces scope before any tool executes.
A response interceptor that filters tool lists, redacts sensitive data, and writes audit logs.
AgentCore Policy with Bedrock Guardrails that evaluates every tool invocation for prompt injection, harmful content, and sensitive information exposure in real time.

Layer 3: Tools Layer

Four Lambda-backed MCP tools provide governed data access: get_user_tables, get_schema, run_query, and kb_search. Each tool is scoped, audited, and constrained by the layers above and below it.

Layer 4: Governed Data Mesh

This is the data foundation. It consists of S3 Tables (Apache Iceberg format) registered in the AWS Glue Data Catalog, Amazon Athena with workgroup cost controls, Lake Formation enforcing row/column/cell-level security, and S3 Vectors powering Amazon Bedrock Knowledge Bases.

Prerequisites Before You Build

Make sure you have the following in place before starting:

An AWS account with administrator access.
IAM permissions to create roles, policies, Lambda functions, S3 Tables table buckets, Athena workgroups, and Lake Formation configurations.
Familiarity with Lake Formation concepts: data lake administrator, LF-Tags, and data filters.
Amazon Bedrock enabled with model access configured.
Amazon Bedrock AgentCore access configured in your account.
AWS CLI v2 installed and configured.

Step 1: Build the Governed Serverless Data Mesh

A data mesh decentralizes data ownership to domain teams while centralizing governance and discoverability. On AWS, this means domain teams own their data products end-to-end, the AWS Glue Data Catalog provides centralized metadata discovery, and Lake Formation enforces permissions across databases, tables, columns, rows, and cells.

Each producer domain lives in its own AWS account. Producers register data products in a central governance account — a dedicated AWS account that hosts the authoritative Glue Data Catalog and Lake Formation permission policies for the entire organization.

Data is shared through Lake Formation cross-account sharing. No data is copied. Only metadata is linked through resource links in consumer catalogs. At query time, Lake Formation verifies permissions and issues temporary credentials to the query engine.

Tag-based access control (LF-TBAC) scales this dynamically. Assign LF-Tags like classification=PII or department=customer_service to resources, then grant permissions based on those tags rather than managing individual resource grants.
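
As a rough illustration of how a tag-based grant differs from a per-resource grant, the sketch below assembles the keyword arguments for boto3's Lake Formation grant_permissions call with an LFTagPolicy resource. The helper name build_lf_tag_grant and the role ARN are hypothetical, and the tag is one of the examples from the paragraph above. It only builds the request; nothing is sent to AWS.

```python
from typing import Any


def build_lf_tag_grant(principal_arn: str, tags: dict[str, list[str]],
                       permissions: list[str]) -> dict[str, Any]:
    """Build kwargs for lakeformation.grant_permissions using an LF-Tag expression.

    Hypothetical helper: it assembles the request and does not call AWS.
    """
    expression = [
        {"TagKey": key, "TagValues": values} for key, values in sorted(tags.items())
    ]
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"LFTagPolicy": {"ResourceType": "TABLE", "Expression": expression}},
        "Permissions": permissions,
    }


grant = build_lf_tag_grant(
    "arn:aws:iam::ACCOUNT_ID:role/customer-service-agent",  # placeholder ARN
    {"department": ["customer_service"]},
    ["SELECT"],
)
# With credentials configured, this could be passed to boto3 as:
#   boto3.client("lakeformation").grant_permissions(**grant)
```

Because the grant names a tag expression rather than a table, any table later tagged department=customer_service is covered without a new grant.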

Setting Up S3 Tables with Apache Iceberg

For structured transactional data, use Amazon S3 Tables. It is the first cloud object store with built-in Apache Iceberg support, delivering up to 10x higher transactions per second compared to self-managed Iceberg tables on general-purpose S3 buckets. Compaction, snapshot management, and unreferenced file removal are handled automatically.

S3 Tables integrates with Amazon SageMaker Lakehouse, which populates the Glue Data Catalog and federates access through Lake Formation.

For the customer service agent, the Order Management domain publishes three data products:

customer_orders
customer_profiles
interaction_history

All three are queryable from Athena, governed by Lake Formation permissions, and automatically compacted by S3 Tables.

Enforcing Row-Level and Column-Level Security with Lake Formation

This is where the governance gets precise. Lake Formation data filters enforce row-level security so the agent can only access records belonging to the authenticated customer.

A data filter on customer_orders with the row filter expression customer_id = :customer_id restricts every query to the current customer’s records — regardless of how the agent constructs its SQL. The run_query Lambda function injects the authenticated customer’s identity as a session parameter before submitting queries to Athena.

Column-level security hides sensitive fields like payment_method and billing_address from query results entirely. The agent never sees those columns, even if it tries to select them.
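
The combined effect of the two controls can be illustrated locally. The sketch below is not Lake Formation code. It is a plain-Python model, using made-up sample rows, of what a result set looks like once a row filter on customer_id and a column exclusion list have been applied.

```python
from typing import Any

# Columns the text describes as hidden from the agent.
EXCLUDED_COLUMNS = {"payment_method", "billing_address"}


def apply_filters(rows: list[dict[str, Any]], customer_id: str) -> list[dict[str, Any]]:
    """Keep only the caller's rows, then drop excluded columns from each row."""
    return [
        {col: val for col, val in row.items() if col not in EXCLUDED_COLUMNS}
        for row in rows
        if row["customer_id"] == customer_id
    ]


# Made-up sample data for illustration only.
orders = [
    {"order_id": "12345", "customer_id": "C-1", "status": "shipped", "payment_method": "card"},
    {"order_id": "67890", "customer_id": "C-2", "status": "pending", "payment_method": "card"},
]

visible = apply_filters(orders, "C-1")
print(visible)
```

In the real architecture this filtering happens inside the governed query path, so the agent's SQL never gets a chance to bypass it.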

Building the Knowledge Base with Amazon S3 Vectors

Structured data answers transactional questions. Unstructured knowledge — return policies, product manuals, FAQs, troubleshooting guides — requires semantic search.

Amazon S3 Vectors provides native vector storage and querying as a fully serverless service. It supports up to 2 billion vectors per index with strong write consistency, meaning newly added vectors are immediately queryable.

Cost advantage: S3 Vectors can reduce vector storage and query costs by up to 90% compared to specialized vector database solutions in moderate query-frequency workloads. For high-QPS workloads requiring single-digit millisecond latency, Amazon OpenSearch Serverless remains the better fit. AWS provides a single-step export path from S3 Vectors to OpenSearch Serverless for workloads that outgrow the S3 Vectors performance profile.

S3 Vectors supports filterable metadata with string, number, boolean, and list types using operators like $eq, $ne, $gt, $in, $and, and $or. Store documents with filterable metadata keys like product_category and document_type to enable targeted semantic search.

A metadata filter that retrieves only electronics return policies looks like this:

{"$and": [{"product_category": {"$eq": "electronics"}}, {"document_type": {"$eq": "return"}}]}
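
To show how such a filter selects documents, here is a small local evaluator for the operators named above. It is a sketch of the filter semantics as described in this section, not the S3 Vectors implementation, and the sample metadata values are invented.

```python
from typing import Any

# Comparison operators supported by this sketch.
OPERATORS = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$gt": lambda value, operand: value is not None and value > operand,
    "$in": lambda value, operand: value in operand,
}


def matches(filter_expr: dict[str, Any], metadata: dict[str, Any]) -> bool:
    """Return True when the metadata satisfies every clause of the filter."""
    for key, condition in filter_expr.items():
        if key == "$and":
            ok = all(matches(sub, metadata) for sub in condition)
        elif key == "$or":
            ok = any(matches(sub, metadata) for sub in condition)
        else:
            value = metadata.get(key)
            # An unsupported operator raises KeyError instead of passing silently.
            ok = all(OPERATORS[op](value, operand) for op, operand in condition.items())
        if not ok:
            return False
    return True


electronics_returns = {
    "$and": [
        {"product_category": {"$eq": "electronics"}},
        {"document_type": {"$eq": "return"}},
    ]
}

print(matches(electronics_returns, {"product_category": "electronics", "document_type": "return"}))
print(matches(electronics_returns, {"product_category": "apparel", "document_type": "return"}))
```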

Step 2: Expose the Data Mesh Through AgentCore Gateway

AgentCore Gateway consolidates authentication, observability, and policy enforcement into a single endpoint. It converts Lambda functions, APIs, and existing MCP servers into MCP-compatible tools with protocol translation, inbound OAuth authorization, and outbound credential management.

Agents connect through streamable HTTP transport with an OAuth Bearer token.
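
The shape of one such call can be sketched with the standard library. The example below builds, but does not send, a JSON-RPC tools/list request carrying a Bearer token. The endpoint URL and token value are placeholders, and a real MCP client would also perform the protocol's initialize handshake before listing tools.

```python
import json
import urllib.request

# Hypothetical endpoint and token; substitute the values for your own Gateway.
GATEWAY_URL = "https://example-gateway.invalid/mcp"
ACCESS_TOKEN = "REPLACE_WITH_OAUTH_ACCESS_TOKEN"


def build_tools_list_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC tools/list request with a Bearer token."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and event streams.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_tools_list_request(GATEWAY_URL, ACCESS_TOKEN)
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```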

The Four Core MCP Tools

Each tool is scoped to a specific data access pattern:

get_user_tables — Queries the Glue Data Catalog filtered by Lake Formation permissions. Returns only the tables the authenticated user is authorized to see.
get_schema — Retrieves column names, types, and descriptions for a specified table. Lake Formation column-level security automatically excludes restricted fields.
run_query — Validates SQL against a read-only allowlist, injects customer identity for row-level filtering, and executes through Athena with byte-scan cost limits enforced.
kb_search — Performs metadata-filtered semantic search against Amazon Bedrock Knowledge Bases backed by S3 Vectors.

The tool schema registration for run_query looks like this:

{
  "name": "run_query",
  "description": "Executes a read-only SQL query against governed Iceberg tables via Athena.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "sql": {"type": "string", "description": "A read-only SQL SELECT statement."},
      "database": {"type": "string", "description": "The Glue Data Catalog database name."}
    },
    "required": ["sql", "database"]
  }
}
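
The read-only check that run_query performs on the sql argument might look something like the sketch below. The keyword list and the function name is_read_only are assumptions for illustration. The check is deliberately conservative and heuristic, so it should sit in front of the IAM and Lake Formation controls, not replace them.

```python
import re

# Keywords that indicate a write or DDL statement; an illustrative list, not exhaustive.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|create|alter|drop|truncate|grant|revoke|msck|vacuum|optimize)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT or WITH."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input or more than one statement
    if not re.match(r"(?i)(select|with)\b", statement):
        return False
    # Reject anything that mentions a mutating keyword, even inside a literal.
    return FORBIDDEN.search(statement) is None


print(is_read_only("SELECT order_id FROM customer_orders WHERE order_id = '12345'"))
print(is_read_only("DROP TABLE customer_orders"))
```

A keyword scan like this can reject a harmless query whose string literal contains a word such as "delete". For a tool that fails closed, that trade-off is usually acceptable.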

Note on native Knowledge Base targets: Amazon Bedrock Managed Knowledge Base is now available as a native pre-built target type in AgentCore Gateway. For production workloads where custom interceptor logic is not required, the native Managed KB target type eliminates the Lambda function entirely while retaining MCP compatibility and AgentCore Policy enforcement. The custom kb_search Lambda in this architecture demonstrates how Gateway interceptors enforce fine-grained authorization at the tool invocation boundary.

Step 3: Deploy the MCP Tools and Lambda Functions

Follow these steps to deploy each tool:

1. Clone the repository.

Clone the AgentCore Gateway interceptor samples repository to your local environment.

2. Create each Lambda function using the AWS CLI.

Repeat this for get_user_tables, get_schema, run_query, kb_search, the request interceptor, and the response interceptor:

aws lambda create-function --function-name get_user_tables \
  --runtime python3.12 --handler lambda_function.lambda_handler \
  --role arn:aws:iam::ACCOUNT_ID:role/mcp-tool-role \
  --zip-file fileb://function.zip

3. Attach IAM policies.

Attach the IAM policies from the repository’s policies/ directory to each function’s execution role. These policies enforce read-only access and restrict Glue Data Catalog mutations.

4. Register Lambda functions as MCP tool targets in AgentCore Gateway.

Follow the Registering tool targets documentation in the AgentCore Gateway console or CLI.

5. Attach the request and response interceptors to the Gateway.

Follow the AgentCore Gateway interceptor samples documentation for attachment instructions.
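
Step 2 repeats the same create-function call six times, which is easy to script. The sketch below only generates the command lines so they can be reviewed before running. The two interceptor function names are hypothetical, the role ARN is the placeholder from step 2, and it assumes each function's code is packaged at the zip path you pass in.

```python
import shlex

# The four tool names come from the text; the two interceptor names are hypothetical.
FUNCTION_NAMES = [
    "get_user_tables", "get_schema", "run_query", "kb_search",
    "request_interceptor", "response_interceptor",
]


def create_function_command(name: str, role_arn: str, zip_path: str = "function.zip") -> str:
    """Return the aws lambda create-function command line for one function."""
    argv = [
        "aws", "lambda", "create-function",
        "--function-name", name,
        "--runtime", "python3.12",
        "--handler", "lambda_function.lambda_handler",
        "--role", role_arn,
        "--zip-file", f"fileb://{zip_path}",
    ]
    return shlex.join(argv)


commands = [
    create_function_command(name, "arn:aws:iam::ACCOUNT_ID:role/mcp-tool-role")
    for name in FUNCTION_NAMES
]
for command in commands:
    print(command)
```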

Step 4: Implement Gateway Interceptors for Deterministic Access Control

Gateway interceptors are custom Lambda functions that enforce authorization at two stages: before the Gateway calls the target Lambda (request interceptor) and after the target responds but before results reach the caller (response interceptor).

Three Interceptor Patterns That Matter

JWT scope-based tool invocation control. The request interceptor decodes the JWT scope claim and blocks unauthorized tool invocations before they execute.

Dynamic tool filtering. The response interceptor removes unauthorized tools from the tools/list response based on per-user scopes. The agent never discovers tools it cannot use.

Act-on-behalf identity propagation. Each hop receives a separate, scoped-down token. The Order tool gets only order:read. The KB tool gets only kb:search. An unauthorized downstream tool cannot reuse an overly privileged token.

The authorization check in the request interceptor:

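def check_tool_authorization(scopes, tool, target):
    # A scope equal to the target name grants every tool on that target.
    if target in scopes:
        return True
    # Otherwise require an exact "target:tool" scope.
    return f"{target}:{tool}" in scopes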
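
To make the two match rules concrete, the sketch below repeats the function and calls it with hypothetical scope strings. It assumes a scope is either a bare target name (every tool on that target) or target:tool (one tool). The target names orders and kb are made up for illustration.

```python
def check_tool_authorization(scopes, tool, target):
    if target in scopes:
        return True
    return f"{target}:{tool}" in scopes


# One tool-level scope on "orders" and one target-level scope on "kb".
scopes = "orders:run_query kb".split()

print(check_tool_authorization(scopes, "run_query", "orders"))   # True: exact target:tool scope
print(check_tool_authorization(scopes, "get_schema", "orders"))  # False: no matching scope
print(check_tool_authorization(scopes, "kb_search", "kb"))       # True: target-level scope
```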

The response interceptor for dynamic tool filtering:

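def lambda_handler(event, context):
    # Pull the Gateway's tools/list response and the caller's bearer token.
    gateway_response = event['mcp']['gatewayResponse']
    auth_header = gateway_response['headers'].get('Authorization', '')
    token = auth_header.replace('Bearer ', '')
    # decode_jwt_payload is a helper that returns the JWT claims as a dict.
    claims = decode_jwt_payload(token)
    scopes = claims.get('scope', '').split()
    tools = gateway_response['body']['result'].get('tools', [])
    # Tool names are expected in the form "<target>.<tool>". Names without a
    # dot are dropped (fail closed) instead of raising IndexError.
    filtered_tools = []
    for t in tools:
        target, sep, tool = t['name'].partition('.')
        if sep and check_tool_authorization(scopes, tool, target):
            filtered_tools.append(t)
    # Return the transformed response containing only the authorized tools.
    return {
        "interceptorOutputVersion": "1.0",
        "mcp": {
            "transformedGatewayResponse": {
                "statusCode": 200,
                "headers": {"Authorization": auth_header},
                "body": {"result": {"tools": filtered_tools}}
            }
        }
    }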
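
The handler above calls decode_jwt_payload without defining it. A minimal version might look like the sketch below. It decodes the claims without verifying the signature, so it assumes the Gateway's inbound authorization has already validated the token. Do not rely on it for authorization decisions if that assumption does not hold in your deployment.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (middle) segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWTs strip base64 padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: dict) -> str:
    """Encode a dict as unpadded base64url JSON, the way JWT segments are encoded."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Build an unsigned demo token (made-up claims) to exercise the decoder.
demo_token = ".".join([_b64url({"alg": "none"}), _b64url({"scope": "orders kb:kb_search"}), "sig"])
claims = decode_jwt_payload(demo_token)
print(claims["scope"].split())
```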

Gateway interceptors enforce authorization deterministically at the tool invocation boundary before the model sees or executes tools. Athena byte-scan limits, read-only IAM policies, and Lake Formation row filters serve as compensating controls that bound the scope of any malformed queries the model might produce.

Step 5: Trace the Full Agent Request Flow

Here is what the complete flow looks like for a real customer service query.

User query: “Where is my order #12345, and can I still return the headphones I bought last week?”

The agent must query governed Iceberg tables for order status, retrieve return policies from the vector knowledge base, and synthesize a complete response — all while respecting cost guardrails and regulatory constraints.

Step-by-Step Execution

Step 1 — Tool Discovery. The agent calls tools/list. The response interceptor filters the tool list based on JWT scopes and returns four authorized tools.

Step 2 — Table Discovery. get_user_tables is invoked. The request interceptor validates the JWT and confirms the order:read scope. The Lambda returns three tables: customer_orders, customer_profiles, and interaction_history.

Step 3 — Schema Discovery. get_schema on customer_orders reveals visible columns: order_id, status, ship_date, estimated_delivery, product_name. Lake Formation column-level security has already excluded payment_method and billing_address — they do not appear in the response.

Step 4 — Query Execution. The agent constructs:

SELECT order_id, status, ship_date, estimated_delivery
FROM customer_orders
WHERE order_id = '12345'

run_query injects the authenticated customer’s identity to resolve the Lake Formation row filter. The Athena workgroup enforces a BytesScannedCutoffPerQuery limit, so an overly broad scan is cancelled before it runs up cost. The query returns only records belonging to the authenticated customer.

Step 5 — Knowledge Base Retrieval. kb_search runs with query "return policy for electronics" and metadata filter {"product_category": {"$eq": "electronics"}}. The knowledge base returns: “Electronics may be returned within 30 days of purchase in original packaging for a full refund.”

Step 6 — Response Synthesis. The agent combines both results: “Your order #12345 shipped on March 20 and is estimated to arrive by March 25. Regarding the headphones, our electronics return policy allows returns within 30 days of purchase in original packaging. I can initiate a return for you. Would you like to proceed?”

Authorization was enforced at a different layer at each step — Gateway interceptors for steps 1–2, Lake Formation for steps 3–4, Athena workgroup limits for step 4, and S3 Vectors metadata filtering for step 5.

Step 6: Configure the Five Security Guardrail Layers

Five overlapping layers of protection constrain what the agent can query, how much data it can scan, and what information reaches the model.

Layer 1 — Athena workgroup cost controls. Set BytesScannedCutoffPerQuery limits and enable EnforceWorkGroupConfiguration so agents cannot override workgroup settings.

Layer 2 — DDL prevention. Read-only IAM policies explicitly deny all mutating Glue Data Catalog actions. The agent cannot create, alter, or drop tables.

Layer 3 — Lake Formation fine-grained access. Database, table, column, row, and cell-level permissions enforced natively across all integrated services.

Layer 4 — Gateway interceptors. JWT scope-based authorization enforced before tool execution. Response filtering and redaction applied before results reach the model.

Layer 5 — Amazon Bedrock Guardrails via AgentCore Policy. Every agent-to-tool interaction is evaluated in real time for prompt injection, harmful content, and sensitive information exposure.
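
Layers 1 and 2 are plain configuration, so they can be sketched as data. The example below builds the Configuration argument for boto3's athena.create_work_group call and an IAM policy document that denies a subset of mutating Glue actions. The 1 GB cutoff, the results bucket name, and the exact action list are assumptions to adapt. Nothing here calls AWS.

```python
from typing import Any

# Athena's minimum per-query cutoff is 10 MB; check the current API reference.
MIN_CUTOFF_BYTES = 10_000_000


def build_workgroup_config(cutoff_bytes: int, output_location: str) -> dict[str, Any]:
    """Build the Configuration argument for athena.create_work_group."""
    if cutoff_bytes < MIN_CUTOFF_BYTES:
        raise ValueError("cutoff is below the 10 MB minimum")
    return {
        "EnforceWorkGroupConfiguration": True,  # clients cannot override these settings
        "BytesScannedCutoffPerQuery": cutoff_bytes,
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Layer 2: an explicit Deny on mutating Glue Data Catalog actions (illustrative subset).
DENY_CATALOG_MUTATIONS = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": [
            "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable",
            "glue:CreateDatabase", "glue:UpdateDatabase", "glue:DeleteDatabase",
        ],
        "Resource": "*",
    }],
}

# Placeholder bucket name; 1 GB cutoff chosen only as an example.
config = build_workgroup_config(1_000_000_000, "s3://EXAMPLE-RESULTS-BUCKET/agent/")
```

An explicit Deny takes precedence over any Allow in IAM policy evaluation, which is why the catalog restriction is written as a Deny and not as the absence of an Allow.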

Why Gateway-Level Guardrails Beat Model-Only Guardrails

Applying guardrails solely at the model inference boundary is insufficient for agentic workloads. Agents invoke multiple tools and synthesize results across several hops. A guardrail at the model boundary only sees the final output — it misses what happened in between.

Gateway-level guardrails evaluate every agent-to-tool interaction at the point of action, not after the fact.

Step 7: Verify Your Implementation

Run these checks to confirm your governance controls are working correctly.

Scope enforcement:

Call tools/list with a JWT that carries every scope the agent needs, such as order:read and kb:search. Verify all four tools appear.
Call tools/list with a JWT missing order:read. Verify get_user_tables is absent from the response.

Lake Formation table visibility:

Invoke get_user_tables. Verify only your authorized tables appear. Tables from other domains must not be visible.

Row-level security:

Run run_query against customer_orders for the authenticated customer. Verify results contain only that customer’s records.
Run the same query attempting to access another customer’s records. Verify the result set is empty.

Column-level security:

Run get_schema on customer_orders. Verify payment_method and billing_address are not listed.

Cost controls:

Submit a query designed to scan more than the BytesScannedCutoffPerQuery limit. Verify Athena cancels the query and returns an error.

If all five checks pass, your governance controls are enforced end-to-end.

The Governance Model That Scales With Agentic AI

The architecture you have built here does something the single-checkpoint RAG model cannot: it enforces authorization at every step of the autonomous chain, not just at retrieval time.

S3 Tables handles transactional data with native Iceberg support and automatic compaction. Lake Formation enforces row, column, and cell-level security natively across Athena and the Glue Data Catalog. S3 Vectors delivers cost-optimized semantic search with up to 90% cost reduction over specialized vector databases. AgentCore Gateway interceptors enforce JWT scope-based authorization deterministically before any tool executes. And five overlapping security layers ensure that even a malformed model-generated query cannot expose unauthorized data.

As your agent’s capabilities grow — more tools, more domains, more data products — this architecture scales with you. Domain teams add new data products to the mesh. LF-Tags propagate governance policies automatically. New tools register in the Gateway with their own scoped tokens.

The goal was never to restrict what your agent can do. It was to ensure that everything it does is authorized, audited, and governed. That is the foundation production agentic AI requires.
