Bifrost integrates with Azure AI Content Safety to provide multi-modal content moderation powered by Microsoft’s advanced AI models. This page covers the configuration and capabilities of the Azure Content Safety guardrail provider.

Capabilities

  • Severity-Based Filtering: 4-level severity classification (Safe, Low, Medium, High)
  • Multi-Category Detection: Hate, sexual, violence, self-harm content
  • Prompt Shield: Advanced jailbreak and injection detection
  • Indirect Attack Detection: Identify hidden malicious instructions
  • Protected Material: Detect copyrighted content (output only)
  • Custom Blocklists: Define organization-specific blocked terms

Configuration Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| endpoint | string | Yes | - | Azure Content Safety endpoint URL |
| api_key | string | Yes | - | Azure subscription key |
| analyze_enabled | boolean | No | true | Enable content analysis for Hate, Sexual, Violence, SelfHarm |
| analyze_severity_threshold | enum | No | "medium" | Severity level to trigger: low, medium, or high |
| jailbreak_shield_enabled | boolean | No | false | Enable jailbreak detection (input only) |
| indirect_attack_shield_enabled | boolean | No | false | Enable indirect prompt attack detection (input only) |
| copyright_enabled | boolean | No | false | Enable copyrighted content detection (output only) |
| text_blocklist_enabled | boolean | No | false | Enable custom blocklist filtering |
| blocklist_names | array | No | - | List of Azure blocklist names to apply |
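
As a rough illustration of how the fields and defaults in the table fit together, the sketch below merges user-supplied values onto the documented defaults and checks the two required fields. The helper name `build_config` is hypothetical and not part of Bifrost, and the empty-list default for `blocklist_names` is an assumption (the table lists no default for it).

```python
from typing import Any

# Defaults copied from the Configuration Fields table.
# The empty list for blocklist_names is an assumption; the table gives no default.
DEFAULTS: dict[str, Any] = {
    "analyze_enabled": True,
    "analyze_severity_threshold": "medium",
    "jailbreak_shield_enabled": False,
    "indirect_attack_shield_enabled": False,
    "copyright_enabled": False,
    "text_blocklist_enabled": False,
    "blocklist_names": [],
}
REQUIRED = ("endpoint", "api_key")
THRESHOLDS = ("low", "medium", "high")


def build_config(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: apply overrides to the defaults and validate the result."""
    config = {**DEFAULTS, **overrides}
    # Copy the list so callers never share the module-level default.
    config["blocklist_names"] = list(config["blocklist_names"])

    missing = [name for name in REQUIRED if not config.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    if config["analyze_severity_threshold"] not in THRESHOLDS:
        raise ValueError("analyze_severity_threshold must be low, medium, or high")
    return config


if __name__ == "__main__":
    cfg = build_config(
        endpoint="https://xxx-resource.services.ai.azure.com",
        api_key="<your-api-key>",
        jailbreak_shield_enabled=True,
    )
    print(cfg)
```

Only `endpoint` and `api_key` need to be supplied; every other field falls back to the default listed in the table.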

Collecting your API key and URL

Navigate to the Azure AI Foundry dashboard, then:
  • Copy the API key and paste it into the Azure content moderation config form.
  • Copy the project endpoint and use its base URL as the endpoint in the form, e.g. https://xxx-resource.services.ai.azure.com
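
If the copied project endpoint includes a path, the base URL can be derived by keeping only the scheme and host. The following is a minimal sketch using the Python standard library; the function name `base_url` and the example path `/api/projects/my-project` are illustrative assumptions, not values taken from the dashboard.

```python
from urllib.parse import urlsplit


def base_url(project_endpoint: str) -> str:
    """Return scheme://host for an endpoint URL, dropping any path, query, or fragment."""
    parts = urlsplit(project_endpoint.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {project_endpoint!r}")
    return f"{parts.scheme}://{parts.netloc}"


if __name__ == "__main__":
    # The path segment below is a made-up example.
    print(base_url("https://xxx-resource.services.ai.azure.com/api/projects/my-project"))
```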

Severity Threshold Levels

| Threshold | Numeric Value | Behavior |
| --- | --- | --- |
| low | 2 | Most strict - blocks severity 2 and above |
| medium | 4 | Balanced - blocks severity 4 and above |
| high | 6 | Least strict - blocks only severity 6 |
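
The threshold comparison in the table can be sketched as follows. This is an illustration of the documented behavior, not Bifrost's implementation: the function name `violations` is hypothetical, and the input is assumed to be a list of `{"category", "severity"}` entries similar to the `categoriesAnalysis` array returned by the Azure text analysis API.

```python
# Numeric values copied from the Severity Threshold Levels table.
THRESHOLD_VALUES = {"low": 2, "medium": 4, "high": 6}


def violations(categories_analysis: list[dict], threshold: str = "medium") -> list[str]:
    """Return the categories whose severity is at or above the configured threshold."""
    limit = THRESHOLD_VALUES[threshold]
    return [
        item["category"]
        for item in categories_analysis
        if item["severity"] >= limit
    ]


if __name__ == "__main__":
    # Made-up sample result for demonstration.
    sample = [
        {"category": "Hate", "severity": 2},
        {"category": "Violence", "severity": 4},
        {"category": "SelfHarm", "severity": 0},
    ]
    for name in ("low", "medium", "high"):
        print(name, violations(sample, name))
```

With this sample, `low` flags both Hate and Violence, `medium` flags only Violence, and `high` flags nothing, since no category reaches severity 6.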

Detection Categories

  • Hate and fairness
  • Sexual content
  • Violence
  • Self-harm
Input-only features: Jailbreak Shield and Indirect Attack Shield only apply to input validation.
Output-only features: Copyright detection only applies to output validation.
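
The input-only and output-only rules can be pictured as a small selection step. The sketch below is hypothetical: the function name `applicable_checks` and the check labels are made up for illustration, and it assumes that content analysis and blocklist filtering run in both directions, which this page does not state explicitly.

```python
def applicable_checks(config: dict, direction: str) -> list[str]:
    """List the enabled checks that apply to 'input' or 'output' validation."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")

    checks = []
    # Assumption: these two checks are not direction-specific.
    if config.get("analyze_enabled", True):
        checks.append("analyze")
    if config.get("text_blocklist_enabled", False):
        checks.append("text_blocklist")

    if direction == "input":
        # Input-only features.
        if config.get("jailbreak_shield_enabled", False):
            checks.append("jailbreak_shield")
        if config.get("indirect_attack_shield_enabled", False):
            checks.append("indirect_attack_shield")
    else:
        # Output-only feature.
        if config.get("copyright_enabled", False):
            checks.append("copyright")
    return checks


if __name__ == "__main__":
    cfg = {"jailbreak_shield_enabled": True, "copyright_enabled": True}
    print(applicable_checks(cfg, "input"))
    print(applicable_checks(cfg, "output"))
```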
For provider comparison and information on configuring guardrail rules and profiles, see Guardrails.