Introduction

FloTorch Guardrails are content filtering and safety mechanisms that protect your AI applications from malicious requests and inappropriate content. You can apply them to any model, whether or not the model's provider offers guardrails of its own, and without changing your application code.

FloTorch provides multiple guardrail types out of the box and supports integration with provider-specific guardrails. You can also create your own custom guardrails tailored to your specific needs.

Currently, FloTorch supports the following guardrail types:

  • Keyword Filter: Block or filter content based on specific keywords or phrases
  • Regex Filter: Use regular expressions to detect and filter patterns such as Social Security numbers (SSNs), phone numbers, and credit card numbers
  • Bedrock Guardrails: Integration with AWS Bedrock’s native guardrail capabilities
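
To make the difference between the first two types concrete, here is a minimal Python sketch of keyword and regex matching. It is an illustration only, not FloTorch's implementation: the function names `keyword_matches` and `regex_matches` are hypothetical, and case-insensitive keyword matching is an assumption.

```python
import re

def keyword_matches(text, keywords):
    """Return the keywords that occur in text (case-insensitive; an assumption)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]

def regex_matches(text, pattern):
    """Return every substring of text that matches the regular expression."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# A keyword filter looks for fixed phrases...
found_keywords = keyword_matches("Please share the Secret plan", ["secret", "password"])

# ...while a regex filter looks for a pattern, here a simple NNN-NNNN number.
found_numbers = regex_matches("call 555-1234 or 555-9876", r"\d{3}-\d{4}")
```

A keyword filter suits a fixed list of terms, while a regex filter suits data with a predictable shape but unpredictable values.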

When a guardrail detects matching content, it can take one of the following actions:

  • Block: Completely prevent the request/response from proceeding
  • Redact: Remove or mask the sensitive content while allowing the rest to proceed
  • Replace: Substitute the sensitive content with predefined replacement text (available for custom guardrails only)
  • Log: Record the incident for monitoring and auditing purposes

Note: Provider guardrails (e.g., AWS Bedrock) may support only a subset of these actions. For example, Bedrock guardrails support Block, Redact, and Log, but not Replace.
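
The four actions can be sketched in a few lines of Python. This is a hedged illustration of the behavior described above, not the FloTorch API: `apply_action`, its parameters, the asterisk masking, and the `[REMOVED]` default are all hypothetical choices.

```python
import re

def apply_action(text, pattern, action, replacement="[REMOVED]", log=None):
    """Apply one action to text. Returns the text to pass on, or None if blocked."""
    if not re.search(pattern, text):
        return text  # nothing matched, so the content passes through unchanged
    if action == "block":
        return None  # stop the request/response entirely
    if action == "redact":
        # Mask each match with asterisks of the same length (one possible masking).
        return re.sub(pattern, lambda m: "*" * len(m.group(0)), text)
    if action == "replace":
        # Substitute predefined replacement text for each match.
        return re.sub(pattern, lambda m: replacement, text)
    if action == "log":
        # Record the incident and let the content through.
        if log is not None:
            log.append(f"guardrail matched pattern {pattern!r}")
        return text
    raise ValueError(f"unknown action: {action}")
```

For example, with the pattern `\d{3}-\d{2}-\d{4}` and the text `SSN 123-45-6789 here`, `"block"` returns `None`, `"replace"` returns `SSN [REMOVED] here`, and `"log"` returns the text unchanged after appending one entry to `log`.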

Guardrails can be configured with different severity levels to help you prioritize and categorize content issues:

  • Low: Minor content issues that should be monitored
  • Medium: Moderate content concerns that require attention
  • High: Serious content violations that need immediate action
  • Critical: Severe content threats that must be blocked immediately (available for custom guardrails only)

Note: Provider guardrails (e.g., AWS Bedrock) may support only a subset of these severity levels. For example, Bedrock guardrails support Low, Medium, and High, but not Critical.
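
The support rules in the two notes above can be captured as a small lookup table. The sketch below is an assumption-laden illustration: the `Severity` enum, the `SUPPORTED` table, the `"custom"` and `"bedrock"` keys, and the `validate` helper are hypothetical names, and only the supported/unsupported combinations come from this page.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Severity levels, ordered so they can be compared and sorted."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

# Which actions and severity levels each guardrail kind supports (from the notes above).
SUPPORTED = {
    "custom": {
        "actions": {"block", "redact", "replace", "log"},
        "severities": {Severity.LOW, Severity.MEDIUM, Severity.HIGH, Severity.CRITICAL},
    },
    "bedrock": {
        "actions": {"block", "redact", "log"},
        "severities": {Severity.LOW, Severity.MEDIUM, Severity.HIGH},
    },
}

def validate(kind, action, severity):
    """Return a list of problems; an empty list means the combination is supported."""
    support = SUPPORTED[kind]
    problems = []
    if action not in support["actions"]:
        problems.append(f"{kind} guardrails do not support the {action} action")
    if severity not in support["severities"]:
        problems.append(f"{kind} guardrails do not support {severity.name.title()} severity")
    return problems
```

Here `validate("custom", "replace", Severity.CRITICAL)` returns an empty list, while the same call with `"bedrock"` reports two problems.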

Guardrails can be applied at different points in the request/response lifecycle:

  • Input: Filter incoming user requests before they reach the model
  • Output: Filter model responses before they are sent to users
  • Input-Output: Apply filtering to both incoming requests and outgoing responses
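
The three application points differ only in where the check runs relative to the model call. The following sketch shows this under stated assumptions: `guarded_call`, `model`, and `check` are hypothetical stand-ins (any callable that takes and returns a string), not FloTorch functions.

```python
def guarded_call(prompt, model, check, stage):
    """Call model(prompt), running check at the point(s) named by stage.

    stage is "input", "output", or "input-output".
    """
    if stage not in ("input", "output", "input-output"):
        raise ValueError(f"unknown stage: {stage}")
    if stage in ("input", "input-output"):
        prompt = check(prompt)      # filter the request before it reaches the model
    response = model(prompt)
    if stage in ("output", "input-output"):
        response = check(response)  # filter the response before it reaches the user
    return response
```

With `"input"` the model never sees the filtered content but its response is returned unchecked, and with `"output"` the reverse holds. `"input-output"` runs the check twice.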

Guardrails can have the following status values:

  • Active: The guardrail is active and will be applied when configured on models
  • Inactive: The guardrail is inactive and will not be applied
  • Disabled: The guardrail is disabled (reserved for system use)

To temporarily turn off a guardrail without removing it from models, set its status to Inactive.
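
In effect, status acts as a switch that is consulted before a guardrail runs. A minimal sketch of that idea, assuming a hypothetical `Guardrail` record and `applicable` helper (neither is part of FloTorch):

```python
from dataclasses import dataclass

@dataclass
class Guardrail:
    name: str
    status: str  # "active", "inactive", or "disabled"

def applicable(guardrails):
    """Keep only the guardrails that should run: those with Active status."""
    return [g for g in guardrails if g.status == "active"]

attached = [
    Guardrail("ssn-detection", "active"),
    Guardrail("phone-detection", "inactive"),  # still attached, but skipped
]
names_to_run = [g.name for g in applicable(attached)]
```

The inactive guardrail stays in the `attached` list, so switching it back to Active is enough to apply it again.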

FloTorch includes several pre-built guardrail templates to get you started quickly:

  • SSN Detection: Blocks Social Security Numbers in various formats (XXX-XX-XXXX, XXXXXXXXX, XXX XX XXXX)
  • Phone Number Detection: Filters phone numbers in U.S. and India formats
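
For reference, the three SSN formats listed above can be matched with a single regular expression. The pattern below is an illustrative assumption, not the pattern the built-in template uses, and `find_ssns` is a hypothetical helper name.

```python
import re

# One alternative per format: XXX-XX-XXXX, XXXXXXXXX, and XXX XX XXXX.
# \b keeps the pattern from matching inside a longer run of digits.
SSN_PATTERN = re.compile(
    r"\b\d{3}-\d{2}-\d{4}\b"    # 123-45-6789
    r"|\b\d{9}\b"               # 123456789
    r"|\b\d{3} \d{2} \d{4}\b"   # 123 45 6789
)

def find_ssns(text):
    """Return every substring of text that looks like an SSN in one of the three formats."""
    return SSN_PATTERN.findall(text)
```

A pattern like this matches on shape alone, so any nine-digit number is flagged whether or not it is a real SSN.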

All guardrails are available in the Guardrails section of the FloTorch Console. When creating or configuring a FloTorch Model, you can select which guardrails to apply and configure their behavior.