Introduction

FloTorch Guardrails are content filtering and safety mechanisms that protect your AI applications from malicious requests and inappropriate content. You can apply them to any model, whether or not its provider offers guardrails of its own, and without changing your application code.

FloTorch provides multiple guardrail types out of the box and supports integration with provider-specific guardrails. You can also create your own custom guardrails tailored to your specific needs.

Supported Guardrail Types

Currently, FloTorch supports the following guardrail types:

Keyword Filter: Block or filter content based on specific keywords or phrases
Regex Filter: Use regular expressions to detect and filter patterns such as Social Security Numbers (SSNs), phone numbers, and credit card numbers
Bedrock Guardrails: Integration with AWS Bedrock’s native guardrail capabilities
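
To make the difference between the first two types concrete, here is a minimal Python sketch of keyword and regex matching. It does not use the FloTorch API: the function names keyword_matches and regex_matches are hypothetical, and whole-word, case-insensitive keyword matching is an assumption, not documented FloTorch behavior.

```python
import re


def keyword_matches(text, keywords):
    """Return the keywords that occur in text.

    Assumption: keywords match case-insensitively and only as whole words.
    """
    found = []
    for kw in keywords:
        # re.escape keeps punctuation inside a keyword from acting as regex syntax.
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def regex_matches(text, pattern):
    """Return every substring of text matched by the regular expression."""
    return [m.group(0) for m in re.finditer(pattern, text)]
```

A keyword filter suits a fixed list of terms, while a regex filter suits content with a predictable shape, such as a 16-digit card number written in groups of four.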

Guardrail Actions

When a guardrail detects matching content, it can take one of the following actions:

Block: Completely prevent the request/response from proceeding
Redact: Remove or mask the sensitive content while allowing the rest to proceed
Replace: Substitute the sensitive content with predefined replacement text (available for custom guardrails only)
Log: Record the incident for monitoring and auditing purposes

Note: Provider guardrails (e.g., AWS Bedrock) may support a subset of these actions. For example, Bedrock guardrails support Block, Redact, and Log, but not Replace.
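
The four actions can be sketched as a single dispatch function. This is an illustration only, not the FloTorch implementation: the Action enum, the apply_action function, the asterisk masking used for Redact, and the default replacement text are all hypothetical choices.

```python
import re
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    REDACT = "redact"
    REPLACE = "replace"
    LOG = "log"


def apply_action(text, pattern, action, replacement="[REMOVED]", audit_log=None):
    """Return (allowed, resulting_text) after applying one action to text."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    if not matches:
        return True, text  # nothing detected, pass the text through unchanged
    if action is Action.BLOCK:
        return False, ""  # stop the request/response entirely
    if action is Action.REDACT:
        # Assumption: redaction masks each match with asterisks of equal length.
        return True, re.sub(pattern, lambda m: "*" * len(m.group(0)), text)
    if action is Action.REPLACE:
        # A function replacement avoids backslash handling in the replacement text.
        return True, re.sub(pattern, lambda m: replacement, text)
    if action is Action.LOG:
        if audit_log is not None:
            audit_log.append({"action": action.value, "matches": matches})
        return True, text  # logging records the incident but changes nothing
    raise ValueError(f"unknown action: {action!r}")
```

In this sketch only Block stops the text from proceeding; Redact and Replace alter it, and Log leaves it untouched.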

Guardrail Severity Levels

Guardrails can be configured with different severity levels to help you prioritize and categorize content issues:

Low: Minor content issues that should be monitored
Medium: Moderate content concerns that require attention
High: Serious content violations that need immediate action
Critical: Severe content threats that must be blocked immediately (available for custom guardrails only)

Note: Provider guardrails (e.g., AWS Bedrock) may support a subset of these severity levels. For example, Bedrock guardrails support Low, Medium, and High, but not Critical.
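
The provider limits described in the two notes above can be checked before a configuration is saved. The sketch below is a hypothetical validation helper, not part of FloTorch: the Severity enum, its numeric ordering, the lowercase action names, and the bedrock_config_problems function are assumptions made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumption: severities are ordered so they can be compared.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Subsets taken from the notes: Bedrock guardrails support Block, Redact,
# and Log (not Replace), and Low, Medium, and High (not Critical).
BEDROCK_ACTIONS = {"block", "redact", "log"}
BEDROCK_SEVERITIES = {Severity.LOW, Severity.MEDIUM, Severity.HIGH}


def bedrock_config_problems(action, severity):
    """Return a list of reasons the settings are invalid for a Bedrock guardrail."""
    problems = []
    if action not in BEDROCK_ACTIONS:
        problems.append(f"action '{action}' is not supported")
    if severity not in BEDROCK_SEVERITIES:
        problems.append(f"severity '{severity.name.title()}' is not supported")
    return problems
```

An empty list means the combination is valid for a Bedrock guardrail; custom guardrails would accept all four actions and all four severity levels.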

Guardrail Hooks

Guardrails can be applied at different points in the request/response lifecycle:

Input: Filter incoming user requests before they reach the model
Output: Filter model responses before they are sent to users
Input-Output: Apply filtering to both incoming requests and outgoing responses
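
The three hooks differ only in where the check runs relative to the model call. The following sketch shows that ordering with a stand-in model; the Hook enum, the guarded_call function, and the blocked-message strings are hypothetical and do not reflect the FloTorch API.

```python
from enum import Enum


class Hook(Enum):
    INPUT = "input"
    OUTPUT = "output"
    INPUT_OUTPUT = "input-output"


def guarded_call(prompt, model, check, hook):
    """Call model(prompt), running check() at the points selected by hook.

    check(text) returns True when the text is allowed to proceed.
    """
    if hook in (Hook.INPUT, Hook.INPUT_OUTPUT) and not check(prompt):
        # Rejected before the model is ever invoked.
        return "[request blocked by guardrail]"
    response = model(prompt)
    if hook in (Hook.OUTPUT, Hook.INPUT_OUTPUT) and not check(response):
        # The model ran, but its answer is withheld from the user.
        return "[response blocked by guardrail]"
    return response
```

An Input hook saves a model call when the request itself is the problem, while an Output hook catches content that only appears in the model's answer.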

Guardrail Status

Guardrails can have the following status values:

Active: The guardrail is active and will be applied when configured on models
Inactive: The guardrail is inactive and will not be applied
Disabled: The guardrail is disabled (reserved for system use)

You can set a guardrail to Inactive to turn it off temporarily without removing it from the models it is configured on. This is distinct from the Disabled status, which is reserved for system use.

Pre-built Templates

FloTorch includes several pre-built guardrail templates to get you started quickly:

SSN Detection: Blocks Social Security Numbers in various formats (XXX-XX-XXXX, XXXXXXXXX, XXX XX XXXX)
Phone Number Detection: Filters phone numbers in U.S. and Indian formats
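
For reference, the three SSN formats listed above can be matched with a single regular expression. The pattern below is a hypothetical sketch, not the expression the FloTorch template actually uses; it checks the shape of the number only and does not verify that the digits form a valid SSN.

```python
import re

# One alternative per listed format: XXX-XX-XXXX, XXX XX XXXX, XXXXXXXXX.
# The \b anchors prevent a match inside a longer run of digits.
SSN_PATTERN = re.compile(r"\b(?:\d{3}-\d{2}-\d{4}|\d{3} \d{2} \d{4}|\d{9})\b")


def find_ssns(text):
    """Return every substring of text that has one of the three SSN shapes."""
    return SSN_PATTERN.findall(text)
```

Note that the unseparated form also matches any other standalone nine-digit number, so a pattern like this can produce false positives on order numbers or similar identifiers.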

All guardrails are available in the Guardrails section of the FloTorch Console. When creating or configuring a FloTorch Model, you can select which guardrails to apply and configure their behavior.