Content Moderation (Azure AI Content Safety)
Uses Azure AI Content Safety to evaluate agent inputs and outputs against four harm categories: Hate, SelfHarm, Sexual, and Violence. Azure scores each category on a 0–6 severity scale, which is normalized to AML's 0–10 scale. The guardrail blocks any request or response whose normalized severity meets or exceeds the configured severity_threshold in any category.
When to use this guardrail
Attach to the input position to refuse harmful user requests before they
reach the model. Attach to the output position to catch harmful content the model
may generate, for example in response to adversarial prompts or retrieved document fragments.
Use both positions for customer-facing agents (support bots, public chat interfaces) where any exposure to harmful output carries reputational or legal risk.
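The two positions can be pictured as checks wrapped around the model call. The following is a minimal Python sketch, not the platform's implementation: `guarded_call`, `moderate`, `call_model`, and `toy_moderate` are hypothetical names, and the keyword-based scorer exists only to make the example runnable.

```python
from typing import Callable

REFUSAL = "Request blocked by content moderation."


def guarded_call(
    user_input: str,
    call_model: Callable[[str], str],
    moderate: Callable[[str], int],
    severity_threshold: int = 5,
) -> str:
    """Run a model call with moderation at the input and output positions.

    `moderate` is a hypothetical callable that returns the highest
    normalized severity (0-10) across all categories for the given text.
    """
    # Input position: refuse before the model ever sees the request.
    if moderate(user_input) >= severity_threshold:
        return REFUSAL

    response = call_model(user_input)

    # Output position: catch harmful content the model produced anyway.
    if moderate(response) >= severity_threshold:
        return REFUSAL

    return response


def toy_moderate(text: str) -> int:
    """Toy scorer for illustration only: flags one placeholder keyword."""
    return 8 if "harmful" in text.lower() else 0
```

With only the input position attached, the second check would be skipped, so a harmless prompt that elicits a harmful reply would pass through; this is why customer-facing agents attach both.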
Recommended agent configuration
    guardrails:
      input:
        - ref: "azure-content-moderation"
          severity_threshold: 5  # Trigger on moderate+ content (AML normalized 0–10)
          on_fail: "block"
      output:
        - ref: "azure-content-moderation"
          severity_threshold: 5
          on_fail: "block"
Adjust severity_threshold (on the normalized 0–10 scale) to match your product's content policy:
- 3: strict, blocks low-severity content and above
- 5: balanced, blocks medium-severity content and above (recommended default)
- 7: permissive, blocks only high-severity content
Severity scale mapping
Azure Content Safety returns a 0–6 severity per category. The Lambda wrapper normalizes this to AML's 0–10 scale before returning the result:
| Azure severity | AML severity | Meaning |
|---|---|---|
| 0 | 0–2 | Safe |
| 2 | 3–4 | Low |
| 4 | 5–6 | Medium |
| 6 | 8–10 | High |
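The normalization and threshold check can be sketched in Python as follows. This is an illustration under stated assumptions, not the wrapper's actual code: the table gives a band of AML values per Azure severity, and the sketch assumes the lower bound of each band is used. `AZURE_TO_AML`, `normalize`, and `should_block` are hypothetical names.

```python
from typing import Mapping

# ASSUMPTION: each Azure severity maps to the LOWER bound of its AML band
# from the table above. The real wrapper may pick a different value
# inside the band.
AZURE_TO_AML = {0: 0, 2: 3, 4: 5, 6: 8}


def normalize(azure_severity: int) -> int:
    """Convert an Azure severity (0, 2, 4, or 6) to the AML 0-10 scale."""
    if azure_severity not in AZURE_TO_AML:
        # The table only lists 0, 2, 4, and 6, so anything else is rejected.
        raise ValueError(f"unsupported Azure severity: {azure_severity}")
    return AZURE_TO_AML[azure_severity]


def should_block(azure_scores: Mapping[str, int], severity_threshold: int) -> bool:
    """Block when any category's normalized severity meets the threshold."""
    return any(
        normalize(severity) >= severity_threshold
        for severity in azure_scores.values()
    )


scores = {"Hate": 0, "SelfHarm": 0, "Sexual": 0, "Violence": 4}
print(should_block(scores, severity_threshold=5))  # medium meets the default
print(should_block(scores, severity_threshold=7))  # medium is below permissive
```

Under this assumption the three suggested thresholds line up with the bands: 3 blocks Low and above, 5 blocks Medium and above, and 7 blocks only High. Using the upper bound of each band instead would give the same decisions for those three thresholds.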
Relationship to other guardrails
This guardrail checks content for harm, while bedrock-pii-scan redacts personal data. The two serve different purposes and can be stacked:
    guardrails:
      input:
        - ref: "bedrock-pii-scan"
          on_fail: "apply"  # Redact PII before content moderation runs
        - ref: "azure-content-moderation"
          severity_threshold: 5
          on_fail: "block"  # Block if moderation score >= 5
PII redaction runs first to strip personal data, then content moderation checks the sanitised input for harmful content. Both guardrails must pass before the model receives the request.
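The ordering described above can be sketched as a small pipeline in Python. This is a rough model under assumptions, not the platform's runtime: it assumes that `on_fail: "apply"` means the guardrail's transformed text replaces the input and the chain continues, and that `on_fail: "block"` stops the chain. `Result`, `run_input_guardrails`, `toy_pii_scan`, and `toy_moderation` are hypothetical, and both toy guardrails are stand-ins for the real services.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Result:
    passed: bool
    text: str  # the input text, possibly transformed by the guardrail


Guardrail = Callable[[str], Result]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_pii_scan(text: str) -> Result:
    """Toy stand-in for a PII scanner: redacts email addresses only."""
    redacted = EMAIL_RE.sub("[REDACTED]", text)
    return Result(passed=(redacted == text), text=redacted)


def toy_moderation(text: str) -> Result:
    """Toy stand-in for content moderation: flags one placeholder keyword."""
    return Result(passed="harmful" not in text.lower(), text=text)


def run_input_guardrails(
    text: str, steps: List[Tuple[Guardrail, str]]
) -> Optional[str]:
    """Run guardrails in order; return the text for the model, or None if blocked."""
    for guardrail, on_fail in steps:
        result = guardrail(text)
        if result.passed:
            continue
        if on_fail == "apply":
            text = result.text  # ASSUMPTION: carry the redacted text forward
        elif on_fail == "block":
            return None
        else:
            raise ValueError(f"unknown on_fail action: {on_fail}")
    return text


steps = [(toy_pii_scan, "apply"), (toy_moderation, "block")]
print(run_input_guardrails("Contact me at a@example.com", steps))
```

Because the list order is the execution order, placing the PII scan first means the moderation check only ever sees redacted text.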