Why Agentic AI Demands a Different Safety Approach

Traditional LLM safety work focused on outputs — was the response harmful, biased, or factually wrong? Agentic systems raise the stakes considerably. When an AI can invoke tools, retrieve external data, and chain actions together autonomously, the attack surface expands in every direction.
The most expensive failures rarely start with a clever adversary. They start with a design decision nobody questioned early enough — a product team granting an agent access to a sensitive tool, or defining a user flow without fully working through what could go wrong. By the time a red team surfaces the issue, the system is largely built and the cost of correction is high.
Microsoft’s framing is direct: AI safety must become a continuous engineering discipline, not a checkpoint. RAMPART and Clarity are the practical instruments for that shift.
What It Is
RAMPART stands for Risk Assessment & Measurement Platform for Agentic Red Teaming. It is an open-source testing framework built on top of PyRIT, Microsoft’s existing automation framework for red teaming generative AI systems.
The distinction between the two is important. PyRIT is optimized for black-box discovery by security researchers after a system is built. RAMPART is designed for engineers as the system is being built — the same people writing features are expected to write safety tests alongside them.
How It Works

The developer experience will feel familiar to anyone who has written integration tests. Teams write standard pytest tests that describe scenarios drawn from their threat model. Each test connects to the agent through a thin adapter, orchestrates an interaction, and evaluates observable outcomes. Tests return a clear pass or fail signal and can be gated in CI just like any other integration test.
When a new tool or data source is added to an agent, the corresponding safety test can be added in the same pull request. Safety coverage grows with the system rather than lagging behind it.
Three Defining Characteristics

1. Built for prompt injection attacks
RAMPART’s most mature coverage focuses on cross-prompt injection — scenarios where an agent retrieves or processes poisoned content from documents, emails, tickets, or other external data sources that manipulate its behavior indirectly. This is one of the most prevalent and underappreciated attack vectors in agentic systems today. New threat categories can be added incrementally as attack patterns evolve, with extension points defined as Python protocols to keep integration lightweight.
2. Built for probabilistic behavior
LLM behavior is not deterministic. A single-shot test that passes once tells you very little. RAMPART supports statistical trials, allowing the same test to run multiple times with configurable policies — for example, “this action must be safe in at least 80 percent of runs.” This reflects how agents actually behave in production far more accurately than binary validation ever could.
3. Built to reproduce red team findings and production incidents
When something goes wrong in a live system, two things need to happen quickly: replicate the incident precisely, and verify that the fix holds up against variants of the original attack. RAMPART is purpose-built for exactly this workflow. Findings from a red team engagement can be encoded as RAMPART tests, permanently covering the issue and ensuring it never silently regresses across future changes.
The Ownership Model
The framework deliberately flips the traditional ownership model. Engineers write the tests, engineers run them, and engineers treat failures like any other bug. RAMPART supplies the attack strategies, adversarial payload generation, and evaluation logic. The test author focuses on expressing expectations about what their agent should and should not do.
Evaluators in RAMPART are composable. Teams can combine them with boolean logic to express nuanced safety conditions — inspecting which tools the agent invokes, what side effects occur, and whether actions stay within expected boundaries — rather than relying on a single binary signal.
The Problem It Solves
Most AI tools are designed to help teams execute faster. Clarity was designed to help teams determine whether they are executing on the right thing in the first place.
In the current era of rapid AI-assisted development, execution is easy. The harder question is the “why.” Clarity asks the kinds of questions that experienced architects, product managers, and safety engineers would ask — the ones that are easy to skip when a team is excited about shipping something new.
A Concrete Example

Consider a team adding real-time collaboration to a document editor. Instead of jumping straight to implementation options, Clarity asks what happens when two people edit the same paragraph simultaneously — and whether the team actually needs true real-time collaboration with cursors and presence indicators, or whether “nobody loses their work” is the real requirement. Those two answers lead to very different architectures with very different failure modes. Getting that distinction right early can save months of rework.
How It Works

Clarity runs as a desktop app, a web UI, or embedded directly in a coding agent. It guides teams through structured conversations covering four areas: problem clarification, solution exploration, failure analysis, and decision tracking.
As the conversation progresses, results are written to a .clarity-protocol/ directory in the repository as plain, human-readable Markdown files. These files get committed, reviewed in pull requests, and diffed just like source code. They capture the problem statement, solution rationale, failure analysis, and the key decisions made along the way — including which alternatives were ruled out and why.
The Failure Analysis Layer

The failure analysis component deserves particular attention. Multiple AI “thinkers” independently examine the system from different angles — security, human factors, adversarial scenarios, and operational concerns. The team then works through the results together with Clarity, grouping related failures, tracing causal chains, and building management plans.
Clarity also tracks staleness across these documents, because they form a dependency graph. When a problem statement changes, Clarity recognizes that the solution description and failure analysis may need revisiting and prompts the team accordingly. Six months later, anyone on the team can reconstruct the full reasoning behind a design decision — not just the conclusion, but the path that led there.
The .clarity-protocol/ directory becomes a shared artifact that the entire team can see and contribute to. For stakeholders who need a summary before a review, Clarity can generate a coherent review packet on demand.
How RAMPART and Clarity Work Together

These two tools address different phases of the same problem. Clarity operates at the design stage, helping teams clarify intent, surface assumptions, and document decisions before implementation begins. RAMPART operates throughout the development lifecycle, giving teams the building blocks to write concrete agent safety tests and keep them running as the system evolves.
Together, they represent a spec-driven, engineering-native approach to AI safety — one where safety is not a gate at the end of a release cycle but a set of living artifacts that developers use continuously. Clarity captures the “what should this agent do and not do” question in structured, reviewable form. RAMPART turns those expectations into executable tests that run on every change.
This approach also scales the lessons of red teaming across the industry. A cross-prompt injection attack that works against one agentic system will often work, with minor variations, against another. RAMPART gives teams a way to encode those lessons as runnable engineering assets rather than leaving them locked inside individual engagement reports.
Who Should Pay Attention

Security engineers and AI red teamers will find RAMPART immediately useful — it brings their findings into the development workflow and ensures discovered vulnerabilities are permanently covered rather than fixed once and forgotten.
Product managers and engineering leads building agentic workflows will benefit most from Clarity, particularly during the design phase when the cost of changing direction is lowest and the value of structured questioning is highest.
Enterprise teams deploying LLM agents at scale — in customer service, coding assistance, document processing, or workflow automation — face exactly the incident response and regression challenges that both tools are designed to address.
Both tools are available now as open-source projects from Microsoft. Teams interested in enterprise deployment can reach the team at aisafetytools@microsoft.com.
A Closing Observation
The release of RAMPART and Clarity signals something worth noting beyond the tools themselves. Microsoft is treating agentic AI safety as an engineering problem with engineering solutions — repeatable tests, version-controlled design documents, composable evaluators, and CI integration. That framing matters.
The alternative — periodic red team engagements, ad hoc incident response, and safety reviews that happen once before launch — is not adequate for systems that act in the world continuously and evolve with every deployment. The teams building those systems need instruments that evolve with them. RAMPART and Clarity are a credible attempt to provide exactly that.
Comments (0) No comments yet
Want to join this discussion? Login or Register.
No comments yet. Be the first to share your thoughts!