Guardrail Definition Specification¶
File naming: `guardrails/<guardrail_id>.guardrail.md`
Audience: Platform engineers, AI safety teams, product owners
Overview¶
A note on naming: The term guardrail is borrowed from the AI safety world, where it primarily connotes blocking and filtering. In AML, the term is intentionally broader: a guardrail is any runtime-invoked backend that inspects, transforms, tags, or enriches agent data. Use cases range from PII redaction and content moderation to topic classification, sentiment scoring, and metadata injection. If the name feels too narrow for your use case, the underlying concept is simply a runtime pipeline hook.
A guardrail is a special-purpose backend that the agent runtime invokes automatically against agent inputs and outputs — before passing data to the model, or before returning results to the caller. Think of a guardrail as a special tool that the runtime calls, rather than one the model decides to call.
Guardrails share the same invocation architecture as tools: they receive a structured payload, call a transport backend, and return a structured result. The key differences from tools are:
- Invoked by the runtime, not the model. The model is unaware that guardrails are running.
- Description is informational only. Unlike tool descriptions, `meta.description` is never injected into the model's context.
- Input is implicit. The agent runtime automatically passes parameters of compatible content types from the agent's interface to the guardrail — no input schema needs to be declared.
- Result determines pipeline behavior. The guardrail's output drives a decision: halt, transform the payload, annotate it, or continue.
A guardrail definition file maps a stable identifier (e.g., pii-scan) to a transport backend, invocation policy, and result type. Agent files reference guardrails by ID — they never copy guardrail configuration inline. When a guardrail backend changes, only the guardrail definition file changes; the compiler re-validates all referencing agents automatically.
How guardrails are invoked¶
When the agent runtime reaches a guardrail-instrumented pipeline position (defined in the agent's guardrails section), it passes the relevant content as the guardrail's input payload. Four positions are supported:
| Position | Content scanned | Primary threat addressed |
|---|---|---|
| `input` | Agent input fields matching `behaviour.content_types`, before the model sees them | Direct injection, unsafe user input |
| `tool_input` | Tool call arguments generated by the model, before dispatch | Injected instructions crafted by a compromised model |
| `tool_output` | Tool results returned to the runtime, before the model processes them | Indirect prompt injection via retrieved content |
| `output` | Agent output fields matching `behaviour.content_types`, before returning to the caller | Model-generated harmful content, PII leakage |
For input and output positions, the runtime collects all parameters from the agent's interface that match the guardrail's declared behaviour.content_types:
- A guardrail with `behaviour.content_types: ["text"]` receives all `string`-typed fields from the agent's input or output object.
- A guardrail with `behaviour.content_types: ["image"]` receives all binary image-typed fields.
- A guardrail with `behaviour.content_types: ["text", "image"]` receives both.
For tool_input and tool_output positions, the content is the serialised tool call arguments (as a JSON object) or the tool result (as a JSON object). The behaviour.content_types matching logic applies the same way — a guardrail with content_types: ["text"] receives string-valued fields from the tool argument or result object.
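As an illustration, consider a hypothetical tool result at the `tool_output` position (field names invented for this example):

```json
{
  "summary": "Top match: password reset guide",
  "source_url": "https://example.com/kb/reset",
  "relevance": 0.93
}
```

A guardrail with `content_types: ["text"]` would receive the string-valued `summary` and `source_url` fields; per the note under `behaviour.content_types`, the numeric `relevance` value is treated as text and passed as well.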
If no parameters at the attached position match the guardrail's behaviour.content_types, it is a hard validation error at compile time.
The guardrail receives a normalized input object of the form:
```json
{
  "content": { "<field_name>": "<field_value>", ... },
  "position": "input",
  "agent_id": "support-agent",
  "run_id": "run-abc123"
}
```
The guardrail response follows the standard output format described below.
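The collection step described above can be sketched as follows. This is an illustrative sketch, not runtime source; the helper names and the type-to-content-type mapping are assumptions for the example.

```python
# Illustrative sketch of how a runtime might build a guardrail's
# normalized input payload. The type mapping is an assumption for
# this example, not part of the spec.

def content_type_of(value):
    # Per the spec, values trivially convertible to text
    # (e.g. numbers) count as "text".
    if isinstance(value, (str, int, float)):
        return "text"
    if isinstance(value, bytes):
        return "image"  # assumption: binary fields carry images here
    return None

def collect_payload(fields, declared_content_types, position, agent_id, run_id):
    """Select fields matching behaviour.content_types and wrap them
    in the normalized input object."""
    content = {
        name: value
        for name, value in fields.items()
        if content_type_of(value) in declared_content_types
    }
    if not content:
        # The spec makes this a hard error at compile time; shown
        # here as a runtime guard for illustration only.
        raise ValueError("no fields match the guardrail's content_types")
    return {
        "content": content,
        "position": position,
        "agent_id": agent_id,
        "run_id": run_id,
    }

payload = collect_payload(
    {"question": "How do I reset my password?", "attachment": b"\x89PNG"},
    declared_content_types=["text"],
    position="input",
    agent_id="support-agent",
    run_id="run-abc123",
)
# payload["content"] keeps only the string-typed "question" field
```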
Design decisions¶
AML's guardrail spec makes two deliberate choices that diverge from how managed AI platforms typically expose guardrails. This section explains the reasoning so the constraints are understood rather than just followed.
Only rest-api and lambda are supported transport types¶
Platforms such as AWS Bedrock, Azure AI Content Safety, and GCP Natural Language API offer native guardrail or moderation endpoints with their own request/response shapes, versioning schemes, and severity scales. Supporting them as first-class transport types would require the AML runtime to understand each provider's response envelope — and update that logic every time a provider changes their API.
Instead, AML defines a standard output format and requires all guardrail backends to return that shape. Provider-specific calls — including any normalisation from a provider's native scale (e.g., Azure's 0–6) to AML's 0–10 scale — belong in a Lambda function or REST API that the team owns and versions independently.
This makes the runtime simple and stable. It also makes the normalisation logic explicit, testable, and auditable: it lives in a Lambda you control, not inside the platform.
Using AWS Bedrock, Azure or GCP?
Write a Lambda that calls the provider API and maps the response to the AML output format, then configure this guardrail to call that Lambda. The Lambda's IAM role can hold the provider credentials. The Lambda ARN is pinned in the guardrail definition — versioning and rollback are handled via Lambda aliases or versions.
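A minimal sketch of such a wrapper, with the provider call stubbed out. The scale mapping, handler shape, and threshold handling are illustrative assumptions, not a platform API.

```python
# Hedged sketch of a Lambda that wraps a provider moderation API and
# returns the AML standard output format. The provider call is stubbed;
# a real handler would invoke the provider SDK using credentials held
# by the Lambda's IAM role.

def normalize_severity(native, native_max):
    """Map a provider-native score (0..native_max) onto AML's 0-10 scale."""
    return round(native * 10 / native_max)

def to_aml_response(native_score, native_max, threshold, raw):
    severity = normalize_severity(native_score, native_max)
    return {
        "result_type": "score",
        "severity": severity,
        # In AML, triggered reflects the agent's severity_threshold;
        # here it is passed in via the event for illustration.
        "triggered": severity >= threshold,
        "category_scores": None,
        "content": None,
        "annotations": None,
        "enrichment": None,
        "raw": raw,  # provider-native response, preserved for audit
    }

def handler(event, context=None):
    # Placeholder for e.g. an Azure Content Safety request returning
    # a 0-6 severity; stubbed here for illustration.
    raw = {"severity": 4}
    return to_aml_response(
        raw["severity"], native_max=6,
        threshold=event.get("severity_threshold", 5), raw=raw,
    )
```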
Built-in guardrails omit transport — no special type needed¶
Platform-built checks (jailbreak detection, token budget enforcement, etc.) follow the same pattern as tool type: "function": they are in-process by definition and do not need an external call. Declaring their absence via a transport: { type: "platform-native" } block would be noise — it adds syntax to say "no transport".
The rule is simple: if transport is present, the guardrail makes an external call and invocation is required. If transport is absent, the runtime invokes it in-process. No special transport type, no exception-handling in the schema.
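To make the contrast concrete, here are two abbreviated definitions (IDs and names invented for illustration; required fields such as `spec_version` are omitted for brevity):

```yaml
# External guardrail: transport present, so an invocation block is required.
guardrail_id: "toxicity-scan"
behaviour:
  result_type: "score"
  content_types: ["text"]
transport:
  type: "rest-api"
  # endpoint and credentials omitted; see Transport & Credentials
invocation:
  timeout_ms: 300
---
# Built-in guardrail: transport absent, invoked in-process.
# No credentials, no timeout, no invocation block.
guardrail_id: "jailbreak-detect"
behaviour:
  result_type: "score"
  content_types: ["text"]
```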
File structure¶
```
---
[YAML front matter — all structured fields]
---
[optional prose — what this guardrail checks, when to use it, usage guidance]
```
The Markdown body is optional and intended for the guardrail registry documentation UI. It does not affect runtime behavior.
YAML front matter — complete field reference¶
Top-level required fields¶
- `spec_version` — The AML format version. Must equal a platform-approved version string.
- `guardrail_id` — Stable, immutable identifier for this guardrail entry. Used as the value of `ref` in agent definition `guardrails` sections. Lowercase kebab-case. Must match `^[a-z0-9_-]{3,64}$`. Once published, the `guardrail_id` cannot change — create a new entry for a different check.
- `version` — Semantic version of this guardrail definition (MAJOR.MINOR.PATCH). Increment whenever transport config or behavior changes.
- `status` — Lifecycle state. Enum: `active | deprecated | disabled`. A deprecated guardrail triggers a lint warning on any agent that references it. A disabled guardrail causes a hard compile error.
meta — descriptive metadata (required)¶
```yaml
meta:
  name: "PII Scan (Bedrock)"          # Required — display name in the registry UI
  description: >                      # Optional — informational only, never sent to the model
    Detects and redacts personally identifiable information in agent inputs
    and outputs using AWS Bedrock Guardrails.
  owner: "ai-safety-team"             # Optional
  tags: ["pii", "privacy", "gdpr"]    # Optional — searchable labels
  last_updated: "2026-04-12"          # Optional — ISO 8601 date
```
meta.name is the only required sub-field. All others are optional but recommended.
meta.description is informational only — it is displayed in the guardrail registry UI and used for documentation, but is never injected into the model's context. The model has no knowledge that guardrails are running.
behaviour — guardrail contract (required)¶
Groups the two fields that define what this guardrail does and what content it can process.
behaviour.result_type declares what kind of output this guardrail produces. The runtime uses this to determine how to interpret the result and which call-site parameters in the agent definition are valid.
| `result_type` | What it returns | Effect | Typical use cases |
|---|---|---|---|
| `score` | A 0–10 severity integer | The agent compares the severity against its configured `severity_threshold`; if met or exceeded, `on_fail` triggers | Content moderation, jailbreak detection, prompt injection |
| `transform` | Modified content | The original content is replaced with the guardrail's output before the pipeline continues | PII redaction, text anonymization |
| `annotate` | Metadata tags | Key-value tags attached to the payload for downstream routing or observability; the payload itself is unchanged | Language detection, topic classification |
| `enrich` | Content to append | Additional context injected alongside the existing payload | Disclaimer injection, compliance watermarking |
behaviour.content_types lists the content modalities this guardrail is able to process. Supported values: text | image | video | document.
The runtime uses this to match the guardrail against agent interface parameters. At the attached pipeline position, only parameters whose type matches an entry in `behaviour.content_types` are passed to the guardrail. Attaching a guardrail to a position where no parameters match its declared content types is a hard compile error. Note that types that can be trivially converted to text, such as numbers, are treated as text by guardrails.
transport — invocation details (required for external guardrails)¶
The transport block defines how the guardrail is called. Two transport types are supported, both identical in definition to their tool transport counterparts:
| Type | Description |
|---|---|
| `rest-api` | HTTP/REST endpoint |
| `lambda` | AWS Lambda function |
Refer to Transport & Credentials for the full field reference, credential schemes, and secret sources.
Provider-specific backends (AWS Bedrock Guardrails, Azure AI Content Safety, GCP Natural Language, etc.) are not directly supported as transport types. Wrap them in a Lambda function or REST API that calls the provider and returns a response conforming to the standard output format. This keeps the AML runtime agnostic to provider-specific response envelopes and normalisation logic.
Platform-built checks follow the same pattern as tool type: "function": omit the transport block entirely. When transport is absent, the runtime invokes the guardrail in-process. Built-in guardrails never time out, never require credentials, and do not need an invocation block.
invocation — execution settings (required when transport is present)¶
```yaml
invocation:
  timeout_ms: 300
  on_timeout:
    severity: 10        # Severity returned to the agent if the call times out
  on_provider_error:
    severity: 10        # Severity returned to the agent on provider 5xx or network failure
  retry_policy:
    max_attempts: 2
    backoff_ms: 100
```
| Field | Default | Description |
|---|---|---|
| `timeout_ms` | `500` | Maximum time in milliseconds to wait for the transport response. |
| `on_timeout.severity` | `10` | The severity value (0–10) returned to the agent when the call times out after the retry policy is exhausted. |
| `on_provider_error.severity` | `10` | The severity value (0–10) returned to the agent on a provider error (5xx, network failure) after retries. |
| `retry_policy.max_attempts` | `1` | Total call attempts including the initial one. |
| `retry_policy.backoff_ms` | `100` | Initial backoff delay in milliseconds between retries (exponential). |
Severity-based error policy
Rather than a binary fail_open / fail_closed on the guardrail, error conditions return a synthetic severity score. The agent's configured severity_threshold determines whether that synthetic score triggers a pipeline action — all decision logic stays on the agent side.
The conventional mapping is:
| Synthetic severity | Equivalent intent |
|---|---|
| `10` | Equivalent to `fail_closed` — any configured threshold will trigger |
| `0` | Equivalent to `fail_open` — no threshold will trigger |
| `5` | Permissive default — triggers on strict agents, passes on lenient ones |
For security-critical guardrails (PII, prompt injection, unsafe content) set both on_timeout.severity and on_provider_error.severity to 10. For supplementary or quality checks, consider 5 to allow lenient agents to continue while strict agents still block.
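A sketch of the decision logic, with invented function names; the point is that the guardrail definition only supplies the synthetic score, and the agent-side threshold decides.

```python
# Illustrative sketch: on transport failure the runtime substitutes the
# configured synthetic severity, and the agent's threshold decides.

def effective_severity(call_result, on_timeout_sev, on_error_sev):
    """Return the severity the agent sees, real or synthetic."""
    if call_result == "timeout":
        return on_timeout_sev
    if call_result == "provider_error":
        return on_error_sev
    return call_result  # a real 0-10 score from the guardrail

def should_trigger(severity, severity_threshold):
    return severity >= severity_threshold

# Security-critical guardrail: synthetic severity 10 behaves like
# fail_closed, so even a lenient threshold triggers on timeout.
assert should_trigger(effective_severity("timeout", 10, 10), 3)

# Supplementary check with synthetic severity 5: a lenient agent
# (threshold 6) continues, a strict agent (threshold 3) blocks.
lenient = should_trigger(effective_severity("provider_error", 5, 5), 6)
strict = should_trigger(effective_severity("provider_error", 5, 5), 3)
```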
fallback — degraded-mode behavior (recommended for external transports)¶
```yaml
fallback:
  enabled: true
  fallback_guardrail_id: "pii-scan-lite"  # guardrail_id of the fallback guardrail
  emit_warning: true                      # Log a warning event when fallback activates
```
| Field | Default | Description |
|---|---|---|
| `enabled` | `false` | Whether to activate a fallback when the primary transport is unavailable. |
| `fallback_guardrail_id` | — | The `guardrail_id` of the guardrail to invoke as a fallback. The fallback guardrail's own definition — including its transport, invocation policy, and error behavior — governs how it is called. |
| `emit_warning` | `true` | Emit a structured warning event to the audit log when the fallback activates. |
The fallback guardrail is typically a built-in (no transport) check that provides lighter-weight coverage and never fails. Its behaviour.result_type must match the primary guardrail's.
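The activation logic might be sketched as follows; `invoke`, `TransportError`, and the registry shape are assumptions for illustration, not runtime source.

```python
# Illustrative sketch of fallback activation. invoke() stands in for
# the transport call; guardrail objects are dicts built from their
# definition files.

class TransportError(Exception):
    pass

def run_with_fallback(primary, registry, payload, invoke, log):
    try:
        return invoke(primary, payload)
    except TransportError:
        fb_cfg = primary.get("fallback", {})
        if not fb_cfg.get("enabled"):
            raise
        if fb_cfg.get("emit_warning", True):
            log(f"fallback activated for {primary['guardrail_id']}")
        fallback = registry[fb_cfg["fallback_guardrail_id"]]
        # The fallback's own definition governs how it is invoked.
        return invoke(fallback, payload)

# Demo: the primary transport fails, the lightweight fallback answers.
warnings = []

def fake_invoke(guardrail, payload):
    if guardrail["guardrail_id"] == "pii-scan":
        raise TransportError("timeout")
    return {"result_type": "score", "severity": 0}

result = run_with_fallback(
    {"guardrail_id": "pii-scan",
     "fallback": {"enabled": True, "fallback_guardrail_id": "pii-scan-lite"}},
    {"pii-scan-lite": {"guardrail_id": "pii-scan-lite"}},
    {"content": {}},
    fake_invoke,
    warnings.append,
)
```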
Standard guardrail output format¶
AML defines a canonical response format for all guardrails. The runtime enforces this format for agents compiled and deployed with AML Studio. When implementing a custom guardrail transport manually (e.g., a rest-api or lambda backend), the service should return a response conforming to this shape.
```json
{
  "result_type": "score",
  "severity": 7,
  "triggered": true,
  "category_scores": {
    "violence": 7,
    "harassment": 1
  },
  "content": null,
  "annotations": null,
  "enrichment": null,
  "raw": { }
}
```
| Field | Present when | Description |
|---|---|---|
| `result_type` | Always | Mirrors the guardrail's declared `result_type`. |
| `severity` | `score` | Normalized 0–10 severity. Provider-native scales (e.g., Azure 0–6, OpenAI 0–1) must be mapped to 0–10 by the guardrail backend, not by the runtime. |
| `triggered` | `score` | `true` if `severity` meets or exceeds the agent's configured `severity_threshold`. |
| `category_scores` | `score` (optional) | Per-category severity map (0–10 each). Useful for audit and routing decisions. |
| `content` | `transform` | The modified content that replaces the original field value in the pipeline. |
| `annotations` | `annotate` | Key-value metadata tags added to the payload. Downstream agents or routing logic may read these. |
| `enrichment` | `enrich` | Content to be appended alongside the existing payload before the next pipeline stage. |
| `raw` | Always | Full provider-native response, preserved for audit trail and debugging. |
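For backend implementers, a minimal shape check for this response might look like the following; the validator is an illustrative sketch, not a platform API.

```python
# Illustrative validator for the standard guardrail output format.
# Enforces which fields must be populated for each result_type.

REQUIRED_BY_TYPE = {
    "score": ["severity", "triggered"],
    "transform": ["content"],
    "annotate": ["annotations"],
    "enrich": ["enrichment"],
}

def validate_output(resp):
    rt = resp.get("result_type")
    if rt not in REQUIRED_BY_TYPE:
        return False
    if "raw" not in resp:  # raw is always required, for the audit trail
        return False
    for field in REQUIRED_BY_TYPE[rt]:
        if resp.get(field) is None:
            return False
    if rt == "score" and not 0 <= resp["severity"] <= 10:
        return False
    return True

ok = validate_output({"result_type": "score", "severity": 7,
                      "triggered": True, "raw": {}})
bad = validate_output({"result_type": "transform", "raw": {}})  # no content
```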
Agent call-site reference¶
The guardrails section in an agent definition attaches guardrails to pipeline positions. The same per-result_type parameters apply at all four positions (input, tool_input, tool_output, output). This section documents the expected call-site configuration for each type.
Note
The agent definition specification (01-agent-definition.md) is the authoritative reference for the guardrails section structure. This section documents the parameters that are specific to each guardrail result_type.
score guardrails¶
```yaml
guardrails:
  input:
    - ref: "content-moderation"
      severity_threshold: 3       # Trigger on_fail if the returned severity >= 3
      on_fail: "block"            # block | warn | log | escalate
  tool_output:
    - ref: "indirect-injection-scan"
      severity_threshold: 6
      on_fail: "block"
  output:
    - ref: "content-moderation"
      severity_threshold: 6       # More permissive on output
      on_fail: "warn"
```
| Parameter | Required | Description |
|---|---|---|
| `severity_threshold` | yes | Integer 0–10. If the guardrail's returned severity is greater than or equal to this value, `on_fail` is triggered. |
| `on_fail` | yes | `block` — halt the pipeline and return an error to the caller. `warn` — log a warning event and continue. `log` — silently record the event and continue. `escalate` — route the run to the matching rule in `orchestration.escalation`. |
transform guardrails¶
```yaml
guardrails:
  input:
    - ref: "pii-redact"
      on_fail: "apply"            # apply | reject
  tool_output:
    - ref: "pii-redact"           # strip PII from tool results before model sees them
      on_fail: "apply"
```
| Parameter | Required | Description |
|---|---|---|
| `on_fail` | yes | `apply` — replace the original content with the guardrail's transformed output and continue. `reject` — halt the pipeline if the guardrail returns any transformation, or if the transformation fails or cannot be applied. |
annotate and enrich guardrails¶
| Parameter | Required | Description |
|---|---|---|
| `on_fail` | yes | `skip` — if the guardrail fails, continue the pipeline without the annotation or enrichment. `fail_closed` — if the guardrail fails, halt the pipeline. |
Validation rules¶
Hard validation failures¶
- Missing any required field (`spec_version`, `guardrail_id`, `version`, `status`, `behaviour`, `meta.name`).
- `behaviour.result_type` is not one of `score | transform | annotate | enrich`.
- `behaviour.content_types` contains a value other than `text | image | video | document`.
- `transport.type` is present but not one of `rest-api | lambda`.
- `transport.credentials` missing when `transport` is present.
- `invocation` missing when `transport` is present.
- `on_timeout.severity` or `on_provider_error.severity` is outside the 0–10 range.
- `fallback.fallback_guardrail_id` does not resolve to a registered guardrail.
- An agent attaches a guardrail to a position where no parameters at that position match its `behaviour.content_types`.
Recommended lint rules¶
- `on_timeout.severity` or `on_provider_error.severity` set to `0` for a `score` guardrail (effectively `fail_open` for all agents).
- `fallback.enabled: false` for guardrails with a `transport` block.
- `status: "deprecated"` without a `meta.last_updated` date.
- Agent call-site `severity_threshold` absent for a `score` guardrail.
Full example¶
---
spec_version: "1.2"
guardrail_id: "pii-scan"
version: "1.0.0"
status: "active"
behaviour:
  result_type: "transform"
  content_types: ["text"]
meta:
  name: "PII Scan (Bedrock)"
  owner: "ai-safety-team"
  last_updated: "2026-04-12"
  description: >
    Detects and redacts personally identifiable information — names, email addresses,
    phone numbers, national identifiers, and similar — using AWS Bedrock Guardrails.
    Suitable for agents operating under GDPR, HIPAA, or any data policy that restricts
    exposure of personal data.
  tags: ["pii", "privacy", "gdpr", "hipaa"]
transport:
  type: "lambda"
  function_arn: "arn:aws:lambda:eu-west-1:123456789:function:pii-scan-v3"
  invocation_type: "RequestResponse"
  payload_format: "json"
  credentials:
    scheme: "iam-role"
invocation:
  timeout_ms: 300
  on_timeout:
    severity: 10
  on_provider_error:
    severity: 10
  retry_policy:
    max_attempts: 2
    backoff_ms: 100
fallback:
  enabled: true
  fallback_guardrail_id: "pii-scan-lite"
  emit_warning: true
---
Uses AWS Bedrock Guardrails to detect and redact personally identifiable information
before the model processes input and before responses are returned to callers.
## Recommended agent configuration
```yaml
guardrails:
  input:
    - ref: "pii-scan"
      on_fail: "apply"    # Apply the redacted version before the model sees the data
  output:
    - ref: "pii-scan"
      on_fail: "reject"   # If output redaction fails, halt rather than leak PII
```
Relationship to policies.pii_redaction¶
The policies.pii_redaction: true flag in the agent definition and a ref: "pii-scan" guardrail can coexist and both run. They serve complementary roles:
| | `policies.pii_redaction: true` | `ref: "pii-scan"` guardrail |
|---|---|---|
| What runs it | Platform-built redaction pass (always available) | External transport (Bedrock, Azure, etc.) |
| Configuration | None — on or off | Full transport config, version pinning, fallback |
| Precision | Broad coverage, heuristic-based | Provider-specific, often tunable |
| Failure mode | Never errors (platform-built) | Can fail — governed by `on_provider_error` |
In a defense-in-depth configuration, set both: pii_redaction: true for baseline coverage that never fails, and a transport-backed guardrail for higher-precision detection on sensitive workflows.
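A sketch of that layered setup in an agent definition (the surrounding agent fields are omitted; the guardrail ID follows the example above):

```yaml
policies:
  pii_redaction: true        # baseline platform-built pass, never errors
guardrails:
  input:
    - ref: "pii-scan"        # transport-backed, higher-precision detection
      on_fail: "apply"
```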