
Guardrail Definition Specification

File naming: guardrails/<guardrail_id>.guardrail.md

Audience: Platform engineers, AI safety teams, product owners


Overview

A note on naming: The term guardrail is borrowed from the AI safety world, where it primarily connotes blocking and filtering. In AML, the term is intentionally broader: a guardrail is any runtime-invoked backend that inspects, transforms, tags, or enriches agent data. Use cases range from PII redaction and content moderation to topic classification, sentiment scoring, and metadata injection. If the name feels too narrow for your use case, the underlying concept is simply a runtime pipeline hook.

A guardrail is a special-purpose backend that the agent runtime invokes automatically against agent inputs and outputs — before passing data to the model, or before returning results to the caller. Think of a guardrail as a special tool that the runtime calls, rather than one the model decides to call.

Guardrails share the same invocation architecture as tools: they receive a structured payload, call a transport backend, and return a structured result. The key differences from tools are:

  • Invoked by the runtime, not the model. The model is unaware that guardrails are running.
  • Description is informational only. Unlike tool descriptions, meta.description is never injected into the model's context.
  • Input is implicit. The agent runtime automatically passes parameters of compatible content types from the agent's interface to the guardrail — no input schema needs to be declared.
  • Result determines pipeline behavior. The guardrail's output drives a decision: halt, transform the payload, annotate it, or continue.

A guardrail definition file maps a stable identifier (e.g., pii-scan) to a transport backend, invocation policy, and result type. Agent files reference guardrails by ID — they never copy guardrail configuration inline. When a guardrail backend changes, only the guardrail definition file changes; the compiler re-validates all referencing agents automatically.


How guardrails are invoked

When the agent runtime reaches a guardrail-instrumented pipeline position (defined in the agent's guardrails section), it passes the relevant content as the guardrail's input payload. Four positions are supported:

| Position | Content scanned | Primary threat addressed |
| --- | --- | --- |
| input | Agent input fields matching behaviour.content_types, before the model sees them | Direct injection, unsafe user input |
| tool_input | Tool call arguments generated by the model, before dispatch | Injected instructions crafted by a compromised model |
| tool_output | Tool results returned to the runtime, before the model processes them | Indirect prompt injection via retrieved content |
| output | Agent output fields matching behaviour.content_types, before returning to the caller | Model-generated harmful content, PII leakage |

For input and output positions, the runtime collects all parameters from the agent's interface that match the guardrail's declared behaviour.content_types:

  • A guardrail with behaviour.content_types: ["text"] receives all string-typed fields from the agent's input or output object.
  • A guardrail with behaviour.content_types: ["image"] receives all binary image-typed fields.
  • A guardrail with behaviour.content_types: ["text", "image"] receives both.

For tool_input and tool_output positions, the content is the serialised tool call arguments (as a JSON object) or the tool result (as a JSON object). The behaviour.content_types matching logic applies the same way — a guardrail with content_types: ["text"] receives string-valued fields from the tool argument or result object.

If no parameters at the attached position match the guardrail's behaviour.content_types, it is a hard validation error at compile time.
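The collection rule above amounts to a simple filter over the agent's interface fields. A minimal Python illustration, assuming text maps to string fields (with numbers coerced to text, per the note in the behaviour section) and image maps to binary fields; the field names here are hypothetical:

```python
def collect_fields(interface_fields: dict, content_types: list[str]) -> dict:
    """Collect the fields a guardrail will receive at its attached position,
    based on its declared behaviour.content_types. Types trivially
    convertible to text (e.g., numbers) are treated as text."""
    matched = {}
    for name, value in interface_fields.items():
        if "text" in content_types and isinstance(value, (str, int, float)):
            matched[name] = str(value)
        elif "image" in content_types and isinstance(value, bytes):
            matched[name] = value
    return matched

fields = {"question": "What is my balance?", "attempt": 2, "avatar": b"\x89PNG"}
collect_fields(fields, ["text"])
# {'question': 'What is my balance?', 'attempt': '2'}
```

A guardrail declaring ["text", "image"] would receive all three fields; one declaring only ["image"] would receive just the binary field, and an empty match at compile time is the hard error described above.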

The guardrail receives a normalized input object of the form:

{
  "content": { "<field_name>": "<field_value>", ... },
  "position": "input",
  "agent_id": "support-agent",
  "run_id": "run-abc123"
}

The guardrail response follows the standard output format described below.


Design decisions

AML's guardrail spec makes two deliberate choices that diverge from how managed AI platforms typically expose guardrails. This section explains the reasoning so the constraints are understood rather than just followed.

Only rest-api and lambda are supported transport types

Platforms such as AWS Bedrock, Azure AI Content Safety, and GCP Natural Language API offer native guardrail or moderation endpoints with their own request/response shapes, versioning schemes, and severity scales. Supporting them as first-class transport types would require the AML runtime to understand each provider's response envelope — and update that logic every time a provider changes their API.

Instead, AML defines a standard output format and requires all guardrail backends to return that shape. Provider-specific calls — including any normalisation from a provider's native scale (e.g., Azure's 0–6) to AML's 0–10 scale — belong in a Lambda function or REST API that the team owns and versions independently.

This makes the runtime simple and stable. It also makes the normalisation logic explicit, testable, and auditable: it lives in a Lambda you control, not inside the platform.

Using AWS Bedrock, Azure or GCP?

Write a Lambda that calls the provider API, maps the response to the AML output format, and configure this guardrail to call that Lambda. The Lambda's IAM role can hold the provider credentials. The Lambda ARN is pinned in the guardrail definition — versioning and rollback are handled via Lambda aliases or versions.
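A minimal sketch of such a wrapper, assuming an Azure-style 0–6 severity scale; call_content_safety is a hypothetical stand-in for the real provider SDK call, and the returned dict follows the standard output format defined later in this spec:

```python
def normalise_severity(provider_score: int, provider_max: int = 6) -> int:
    """Map a provider-native severity (e.g., Azure's 0-6) onto AML's 0-10 scale."""
    return round(provider_score * 10 / provider_max)

def call_content_safety(text: str) -> dict:
    """Stand-in for the real provider SDK call (hypothetical response shape)."""
    return {"severity": 4, "categories": [{"name": "violence", "severity": 4}]}

def handler(event, context):
    provider = call_content_safety(event["content"])
    return {
        "result_type": "score",
        "severity": normalise_severity(provider["severity"]),
        "triggered": None,  # derived by the runtime from the agent's threshold
        "category_scores": {c["name"]: normalise_severity(c["severity"])
                            for c in provider.get("categories", [])},
        "content": None,
        "annotations": None,
        "enrichment": None,
        "raw": provider,    # preserve the native response for the audit trail
    }
```

Because the scale mapping lives in this function, it can be unit-tested and versioned alongside the Lambda rather than hidden inside the platform.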

Built-in guardrails omit transport — no special type needed

Platform-built checks (jailbreak detection, token budget enforcement, etc.) follow the same pattern as tool type: "function": they are in-process by definition and do not need an external call. Declaring their absence via a transport: { type: "platform-native" } block would be noise — it adds syntax to say "no transport".

The rule is simple: if transport is present, the guardrail makes an external call and invocation is required. If transport is absent, the runtime invokes it in-process. No special transport type, no exception-handling in the schema.


File structure

---
[YAML front matter — all structured fields]
---

[optional prose — what this guardrail checks, when to use it, usage guidance]

The Markdown body is optional and intended for the guardrail registry documentation UI. It does not affect runtime behavior.


YAML front matter — complete field reference

Top-level required fields

spec_version: "1.2"
The AML format version. Must equal a platform-approved version string.

guardrail_id: "pii-scan"
Stable, immutable identifier for this guardrail entry. Used as the value of ref in agent definition guardrails sections. Lowercase kebab-case. Must match ^[a-z0-9_-]{3,64}$. Once published, the guardrail_id cannot change — create a new entry for a different check.

version: "1.0.0"
Semantic version of this guardrail definition (MAJOR.MINOR.PATCH). Increment whenever transport config or behavior changes.

status: "active"
Lifecycle state. Enum: active | deprecated | disabled. A deprecated guardrail triggers a lint warning on any agent that references it. A disabled guardrail causes a hard compile error.


meta — descriptive metadata (required)

meta:
  name: "PII Scan (Bedrock)"              # Required — display name in the registry UI
  description: >                          # Optional — informational only, never sent to the model
    Detects and redacts personally identifiable information in agent inputs
    and outputs using AWS Bedrock Guardrails.
  owner: "ai-safety-team"                 # Optional
  tags: ["pii", "privacy", "gdpr"]        # Optional — searchable labels
  last_updated: "2026-04-12"             # Optional — ISO 8601 date

meta.name is the only required sub-field. All others are optional but recommended.

meta.description is informational only — it is displayed in the guardrail registry UI and used for documentation, but is never injected into the model's context. The model has no knowledge that guardrails are running.


behaviour — guardrail contract (required)

behaviour:
  result_type: "score"
  content_types: ["text"]

Groups the two fields that define what this guardrail does and what content it can process.

behaviour.result_type declares what kind of output this guardrail produces. The runtime uses this to determine how to interpret the result and which call-site parameters in the agent definition are valid.

| result_type | What it returns | Effect | Typical use cases |
| --- | --- | --- | --- |
| score | A 0–10 severity integer | The agent compares the severity against its configured severity_threshold; if met or exceeded, on_fail triggers | Content moderation, jailbreak detection, prompt injection |
| transform | Modified content | The original content is replaced with the guardrail's output before the pipeline continues | PII redaction, text anonymization |
| annotate | Metadata tags | Key-value tags attached to the payload for downstream routing or observability; the payload itself is unchanged | Language detection, topic classification |
| enrich | Content to append | Additional context injected alongside the existing payload | Disclaimer injection, compliance watermarking |

behaviour.content_types lists the content modalities this guardrail is able to process. Supported values: text | image | video | document.

The runtime uses this to match the guardrail against agent interface parameters. At the attached pipeline position, only parameters whose type matches an entry in behaviour.content_types are passed to the guardrail. Attaching a guardrail to a position where no parameters match its declared content types is a hard compile error. Note that types trivially convertible to text, such as numbers, are treated as text by guardrails.


transport — invocation details (required for external guardrails)

The transport block defines how the guardrail is called. Two transport types are supported, both identical in definition to their tool transport counterparts:

| Type | Description |
| --- | --- |
| rest-api | HTTP/REST endpoint |
| lambda | AWS Lambda function |

Refer to Transport & Credentials for the full field reference, credential schemes, and secret sources.

Provider-specific backends (AWS Bedrock Guardrails, Azure AI Content Safety, GCP Natural Language, etc.) are not directly supported as transport types. Wrap them in a Lambda function or REST API that calls the provider and returns a response conforming to the standard output format. This keeps the AML runtime agnostic to provider-specific response envelopes and normalisation logic.

Platform-built checks follow the same pattern as tool type: "function": omit the transport block entirely. When transport is absent, the runtime invokes the guardrail in-process. Built-in guardrails never time out, never require credentials, and do not need an invocation block.


invocation — execution settings (required when transport is present)

invocation:
  timeout_ms: 300
  on_timeout:
    severity: 10                  # Severity returned to the agent if the call times out
  on_provider_error:
    severity: 10                  # Severity returned to the agent on provider 5xx or network failure
  retry_policy:
    max_attempts: 2
    backoff_ms: 100

| Field | Default | Description |
| --- | --- | --- |
| timeout_ms | 500 | Maximum time in milliseconds to wait for the transport response. |
| on_timeout.severity | 10 | The severity value (0–10) returned to the agent when the call times out and the retry policy is exhausted. |
| on_provider_error.severity | 10 | The severity value (0–10) returned to the agent on a provider error (5xx, network failure) after retries. |
| retry_policy.max_attempts | 1 | Total call attempts including the initial one. |
| retry_policy.backoff_ms | 100 | Initial backoff delay in milliseconds between retries (exponential). |
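With exponential backoff, the delays inserted between attempts can be computed as follows (a sketch assuming the initial backoff doubles after each failed attempt):

```python
def retry_delays(max_attempts: int, backoff_ms: int) -> list[int]:
    """Delays (ms) before each retry; the first attempt has no delay.
    Exponential: the initial backoff doubles after every failed attempt."""
    return [backoff_ms * 2 ** i for i in range(max_attempts - 1)]

retry_delays(max_attempts=2, backoff_ms=100)   # [100]
retry_delays(max_attempts=4, backoff_ms=100)   # [100, 200, 400]
retry_delays(max_attempts=1, backoff_ms=100)   # [] — no retries at all
```

Note that with the default max_attempts of 1, the retry policy never activates and a timeout immediately yields the on_timeout severity.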

Severity-based error policy

Rather than a binary fail_open / fail_closed on the guardrail, error conditions return a synthetic severity score. The agent's configured severity_threshold determines whether that synthetic score triggers a pipeline action — all decision logic stays on the agent side.

The conventional mapping is:

| Synthetic severity | Equivalent intent |
| --- | --- |
| 10 | Equivalent to fail_closed — any configured threshold will trigger |
| 0 | Equivalent to fail_open — no threshold will trigger |
| 5 | Permissive default — triggers on strict agents, passes on lenient ones |

For security-critical guardrails (PII, prompt injection, unsafe content) set both on_timeout.severity and on_provider_error.severity to 10. For supplementary or quality checks, consider 5 to allow lenient agents to continue while strict agents still block.
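The decision itself is a single comparison on the agent side. A sketch, with illustrative threshold values:

```python
def guardrail_triggers(severity: int, severity_threshold: int) -> bool:
    """True when on_fail should fire: severity meets or exceeds the threshold."""
    return severity >= severity_threshold

# Timeout on a security-critical guardrail (synthetic severity 10):
guardrail_triggers(10, severity_threshold=3)   # True — strict agent blocks
guardrail_triggers(10, severity_threshold=9)   # True — even lenient agents block

# Timeout on a quality check with synthetic severity 5:
guardrail_triggers(5, severity_threshold=3)    # True — strict agent blocks
guardrail_triggers(5, severity_threshold=6)    # False — lenient agent continues
```

This is why the guardrail definition only chooses a synthetic severity: the same error maps to different outcomes depending on each referencing agent's threshold.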


fallback — fallback guardrail (optional)

fallback:
  enabled: true
  fallback_guardrail_id: "pii-scan-lite"   # guardrail_id of the fallback guardrail
  emit_warning: true                       # Log a warning event when fallback activates

| Field | Default | Description |
| --- | --- | --- |
| enabled | false | Whether to activate a fallback when the primary transport is unavailable. |
| fallback_guardrail_id | — | The guardrail_id of the guardrail to invoke as a fallback. The fallback guardrail's own definition — including its transport, invocation policy, and error behavior — governs how it is called. |
| emit_warning | true | Emit a structured warning event to the audit log when the fallback activates. |

The fallback guardrail is typically a built-in (no transport) check that provides lighter-weight coverage and never fails. Its behaviour.result_type must match the primary guardrail's.


Standard guardrail output format

AML defines a canonical response format for all guardrails. The runtime enforces this format for agents compiled and deployed with AML Studio. When implementing a custom guardrail transport manually (e.g., a rest-api or lambda backend), the service should return a response conforming to this shape.

{
  "result_type": "score",
  "severity": 7,
  "triggered": true,
  "category_scores": {
    "violence": 7,
    "harassment": 1
  },
  "content": null,
  "annotations": null,
  "enrichment": null,
  "raw": { }
}

| Field | Present when | Description |
| --- | --- | --- |
| result_type | Always | Mirrors the guardrail's declared result_type. |
| severity | score | Normalized 0–10 severity. Provider-native scales (e.g., Azure 0–6, OpenAI 0–1) must be mapped to 0–10 by the guardrail backend before returning, per the design decisions above. |
| triggered | score | true if severity meets or exceeds the agent's configured severity_threshold. |
| category_scores | score (optional) | Per-category severity map (0–10 each). Useful for audit and routing decisions. |
| content | transform | The modified content that replaces the original field value in the pipeline. |
| annotations | annotate | Key-value metadata tags added to the payload. Downstream agents or routing logic may read these. |
| enrichment | enrich | Content to be appended alongside the existing payload before the next pipeline stage. |
| raw | Always | Full provider-native response, preserved for audit trail and debugging. |
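When implementing a custom backend, a small self-check against this shape can catch malformed responses before deployment. A sketch of such a validator (not the platform's actual validation logic):

```python
# Which payload field each result_type is required to populate.
REQUIRED_BY_TYPE = {
    "score": "severity",
    "transform": "content",
    "annotate": "annotations",
    "enrich": "enrichment",
}

def validate_response(resp: dict) -> list[str]:
    """Return a list of problems with a guardrail response; empty means valid."""
    errors = []
    rt = resp.get("result_type")
    if rt not in REQUIRED_BY_TYPE:
        errors.append(f"unknown result_type: {rt!r}")
        return errors
    key = REQUIRED_BY_TYPE[rt]
    if resp.get(key) is None:
        errors.append(f"{rt} response must set {key!r}")
    if rt == "score":
        sev = resp.get("severity")
        if not (isinstance(sev, int) and 0 <= sev <= 10):
            errors.append("severity must be an integer in 0-10")
    if "raw" not in resp:
        errors.append("raw is required for the audit trail")
    return errors

validate_response({"result_type": "score", "severity": 7, "raw": {}})  # []
```

Running this in the backend's own test suite keeps contract drift from surfacing as runtime pipeline failures.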

Agent call-site reference

The guardrails section in an agent definition attaches guardrails to pipeline positions. The same per-result_type parameters apply at all four positions (input, tool_input, tool_output, output). This section documents the expected call-site configuration for each type.

Note

The agent definition specification (01-agent-definition.md) is the authoritative reference for the guardrails section structure. This section documents the parameters that are specific to each guardrail result_type.

score guardrails

guardrails:
  input:
    - ref: "content-moderation"
      severity_threshold: 3        # Trigger on_fail if the returned severity >= 3
      on_fail: "block"             # block | warn | log | escalate
  tool_output:
    - ref: "indirect-injection-scan"
      severity_threshold: 6
      on_fail: "block"
  output:
    - ref: "content-moderation"
      severity_threshold: 6        # More permissive on output
      on_fail: "warn"

| Parameter | Required | Description |
| --- | --- | --- |
| severity_threshold | yes | Integer 0–10. If the guardrail's returned severity is greater than or equal to this value, on_fail is triggered. |
| on_fail | yes | block — halt the pipeline and return an error to the caller. warn — log a warning event and continue. log — silently record the event and continue. escalate — route the run to the matching rule in orchestration.escalation. |
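The resulting call-site behavior for score guardrails amounts to a small dispatch on the triggered result; a sketch, with illustrative action labels:

```python
def apply_score_result(triggered: bool, on_fail: str) -> str:
    """Resolve the pipeline action for a score guardrail at its call site."""
    if not triggered:
        return "continue"
    return {
        "block": "halt-with-error",
        "warn": "log-warning-and-continue",
        "log": "log-silently-and-continue",
        "escalate": "route-to-escalation-rule",
    }[on_fail]

apply_score_result(triggered=True, on_fail="block")    # "halt-with-error"
apply_score_result(triggered=False, on_fail="block")   # "continue"
```

The same guardrail can therefore block on input but merely warn on output, as in the example above, purely through call-site configuration.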

transform guardrails

guardrails:
  input:
    - ref: "pii-redact"
      on_fail: "apply"             # apply | reject
  tool_output:
    - ref: "pii-redact"            # strip PII from tool results before model sees them
      on_fail: "apply"

| Parameter | Required | Description |
| --- | --- | --- |
| on_fail | yes | apply — replace the original content with the guardrail's transformed output and continue. reject — halt the pipeline if the guardrail returns any transformation, or if the transformation fails or cannot be applied. |

annotate and enrich guardrails

guardrails:
  output:
    - ref: "language-detect"
      on_fail: "skip"              # skip | fail_closed

| Parameter | Required | Description |
| --- | --- | --- |
| on_fail | yes | skip — if the guardrail fails, continue the pipeline without the annotation or enrichment. fail_closed — if the guardrail fails, halt the pipeline. |

Validation rules

Hard validation failures

  • Missing any required field (spec_version, guardrail_id, version, status, behaviour, meta.name).
  • behaviour.result_type is not one of score | transform | annotate | enrich.
  • behaviour.content_types contains a value other than text | image | video | document.
  • transport.type is present but not one of rest-api | lambda.
  • transport.credentials missing when transport is present.
  • invocation missing when transport is present.
  • on_timeout.severity or on_provider_error.severity is outside the 0–10 range.
  • fallback.fallback_guardrail_id does not resolve to a registered guardrail.
  • Agent attaches a guardrail to a position where no parameters at that position match its behaviour.content_types.
  • Agent call-site severity_threshold absent for a score guardrail.

Lint warnings

  • on_timeout.severity or on_provider_error.severity set to 0 for a score guardrail (effectively fail_open for all agents).
  • fallback.enabled: false for guardrails with a transport block.
  • status: "deprecated" without a meta.last_updated date.

Full example

---
spec_version: "1.2"
guardrail_id: "pii-scan"
version: "1.0.0"
status: "active"

behaviour:
  result_type: "transform"
  content_types: ["text"]

meta:
  name: "PII Scan (Bedrock)"
  owner: "ai-safety-team"
  last_updated: "2026-04-12"
  description: >
    Detects and redacts personally identifiable information — names, email addresses,
    phone numbers, national identifiers, and similar — using AWS Bedrock Guardrails.
    Suitable for agents operating under GDPR, HIPAA, or any data policy that restricts
    exposure of personal data.
  tags: ["pii", "privacy", "gdpr", "hipaa"]

transport:
  type: "lambda"
  function_arn: "arn:aws:lambda:eu-west-1:123456789:function:pii-scan-v3"
  invocation_type: "RequestResponse"
  payload_format: "json"
  credentials:
    scheme: "iam-role"

invocation:
  timeout_ms: 300
  on_timeout:
    severity: 10
  on_provider_error:
    severity: 10
  retry_policy:
    max_attempts: 2
    backoff_ms: 100

fallback:
  enabled: true
  fallback_guardrail_id: "pii-scan-lite"
  emit_warning: true
---

Uses AWS Bedrock Guardrails to detect and redact personally identifiable information
before the model processes input and before responses are returned to callers.

## Recommended agent configuration

```yaml
guardrails:
  input:
    - ref: "pii-scan"
      on_fail: "apply"          # Apply the redacted version before the model sees the data
  output:
    - ref: "pii-scan"
      on_fail: "reject"         # If output redaction fails, halt rather than leak PII
```

## Relationship to policies.pii_redaction

The policies.pii_redaction: true flag in the agent definition and a ref: "pii-scan" guardrail can coexist and both run. They serve complementary roles:

| | policies.pii_redaction: true | ref: "pii-scan" guardrail |
| --- | --- | --- |
| What runs it | Platform-built redaction pass (always available) | External transport (Bedrock, Azure, etc.) |
| Configuration | None — on or off | Full transport config, version pinning, fallback |
| Precision | Broad coverage, heuristic-based | Provider-specific, often tunable |
| Failure mode | Never errors (platform-built) | Can fail — governed by on_provider_error |

In a defense-in-depth configuration, set both: pii_redaction: true for baseline coverage that never fails, and a transport-backed guardrail for higher-precision detection on sensitive workflows.