Introduction
FloTorch Guardrails are content filtering and safety mechanisms that protect your AI applications from malicious requests and inappropriate content. Because FloTorch applies the guardrails itself, you can add content filtering to any model, whether or not its provider offers native guardrails, and without changing your application code.
FloTorch provides multiple guardrail types out of the box and supports integration with provider-specific guardrails. You can also create your own custom guardrails tailored to your specific needs.
Supported Guardrail Types
Currently, FloTorch supports the following guardrail types:
- Keyword Filter: Block or filter content based on specific keywords or phrases
- Regex Filter: Use regular expressions to detect and filter patterns such as Social Security numbers (SSNs), phone numbers, and credit card numbers
- Bedrock Guardrails: Integration with AWS Bedrock’s native guardrail capabilities
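To make the difference between the two custom filter types concrete, here is a minimal sketch of how a keyword filter and a regex filter might detect matches. The `KeywordFilter` and `RegexFilter` classes, their fields, and the sample keywords are hypothetical and are not part of FloTorch's API; they only illustrate the matching logic.

```python
import re
from dataclasses import dataclass, field


# Hypothetical classes for illustration only; they are not part of FloTorch's API.
@dataclass
class KeywordFilter:
    keywords: list[str] = field(default_factory=list)

    def find(self, text: str) -> list[str]:
        """Return the configured keywords found in the text (case-insensitive substring match)."""
        lowered = text.lower()
        return [kw for kw in self.keywords if kw.lower() in lowered]


@dataclass
class RegexFilter:
    pattern: str

    def find(self, text: str) -> list[str]:
        """Return every substring of the text that matches the pattern."""
        return [m.group(0) for m in re.finditer(self.pattern, text)]


keyword_filter = KeywordFilter(keywords=["password", "internal only"])
ssn_filter = RegexFilter(pattern=r"\b\d{3}-\d{2}-\d{4}\b")

print(keyword_filter.find("Please send me the PASSWORD"))  # ['password']
print(ssn_filter.find("My SSN is 123-45-6789."))  # ['123-45-6789']
```

Plain substring matching also flags a keyword inside a longer word (for example, "password" inside "passwords"); a stricter keyword filter could wrap each keyword in `\b` word boundaries instead.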
Guardrail Actions
When a guardrail detects matching content, it can take one of the following actions:
- Block: Completely prevent the request/response from proceeding
- Redact: Remove or mask the sensitive content while allowing the rest to proceed
- Replace: Substitute the sensitive content with predefined replacement text (available for custom guardrails only)
- Log: Record the incident for monitoring and auditing purposes
Note: Provider guardrails (e.g., AWS Bedrock) may support a subset of these actions. For example, Bedrock guardrails support Block, Redact, and Log, but not Replace.
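The four actions differ in what happens to the text after a match. The sketch below applies each action to regex matches, assuming that Redact masks every matched character with `*` and that Block raises an exception; FloTorch may redact or signal a block differently. The `Action` enum, `GuardrailBlocked` exception, and `apply_action` function are hypothetical names used only for illustration.

```python
import logging
import re
from enum import Enum

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("guardrails")


class Action(Enum):
    BLOCK = "block"
    REDACT = "redact"
    REPLACE = "replace"
    LOG = "log"


class GuardrailBlocked(Exception):
    """Raised when a Block action stops the request or response (hypothetical)."""


def apply_action(text: str, pattern: str, action: Action,
                 replacement: str = "[REMOVED]", mask_char: str = "*") -> str:
    """Apply one guardrail action to every match of `pattern` in `text`."""
    matches = list(re.finditer(pattern, text))
    if not matches:
        return text  # nothing detected, so the text passes through untouched
    if action is Action.BLOCK:
        raise GuardrailBlocked(f"guardrail matched {len(matches)} time(s)")
    if action is Action.REDACT:
        # Assumed masking style: replace each matched character with mask_char
        return re.sub(pattern, lambda m: mask_char * len(m.group(0)), text)
    if action is Action.REPLACE:
        # A callable replacement stops re.sub from interpreting backslashes in it
        return re.sub(pattern, lambda m: replacement, text)
    # Action.LOG: record the incident and let the text proceed unchanged
    logger.info("guardrail matched %d time(s)", len(matches))
    return text


PHONE = r"\b\d{3}-\d{3}-\d{4}\b"
print(apply_action("Call me at 555-123-4567", PHONE, Action.REDACT))
# Call me at ************
print(apply_action("Call me at 555-123-4567", PHONE, Action.REPLACE, replacement="[PHONE]"))
# Call me at [PHONE]
```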
Guardrail Severity Levels
Guardrails can be configured with different severity levels to help you prioritize and categorize content issues:
- Low: Minor content issues that should be monitored
- Medium: Moderate content concerns that require attention
- High: Serious content violations that need immediate action
- Critical: Severe content threats that must be blocked immediately (available for custom guardrails only)
Note: Provider guardrails (e.g., AWS Bedrock) may support a subset of these severity levels. For example, Bedrock guardrails support Low, Medium, and High, but not Critical.
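Because the severity levels form an ordered scale, it can help to model them as comparable values when triaging incidents. The sketch below assumes the numeric ranking 1 to 4 (Low to Critical) and enforces the rule that Critical is available only for custom guardrails. `Severity`, `validate_severity`, and `triage` are hypothetical names, not FloTorch's API.

```python
from enum import IntEnum


class Severity(IntEnum):
    # IntEnum makes the levels comparable, so they can be sorted and thresholded
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4  # custom guardrails only


# Levels a provider guardrail such as Bedrock can use (no CRITICAL)
PROVIDER_SEVERITIES = frozenset({Severity.LOW, Severity.MEDIUM, Severity.HIGH})


def validate_severity(severity: Severity, is_custom: bool) -> Severity:
    """Reject CRITICAL on provider guardrails; any level is valid on a custom guardrail."""
    if not is_custom and severity not in PROVIDER_SEVERITIES:
        raise ValueError(f"{severity.name} is only available for custom guardrails")
    return severity


def triage(incidents: list[tuple[str, Severity]]) -> list[tuple[str, Severity]]:
    """Order incidents from most to least severe."""
    return sorted(incidents, key=lambda item: item[1], reverse=True)


incidents = [("mild language", Severity.LOW), ("SSN in prompt", Severity.CRITICAL),
             ("phone number", Severity.MEDIUM)]
print([name for name, _ in triage(incidents)])
# ['SSN in prompt', 'phone number', 'mild language']
```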
Guardrail Hooks
Guardrails can be applied at different points in the request/response lifecycle:
- Input: Filter incoming user requests before they reach the model
- Output: Filter model responses before they are sent to users
- Input-Output: Apply filtering to both incoming requests and outgoing responses
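The hook decides whether a guardrail inspects the request, the response, or both. This minimal sketch wraps a model call and runs a check function at the points the hook selects; `Hook`, `guarded_call`, and the stand-in model are hypothetical and only illustrate where each hook fires in the lifecycle.

```python
from enum import Enum
from typing import Callable


class Hook(Enum):
    INPUT = "input"
    OUTPUT = "output"
    INPUT_OUTPUT = "input-output"


def guarded_call(prompt: str,
                 model: Callable[[str], str],
                 check: Callable[[str], str],
                 hook: Hook) -> str:
    """Run `check` on the request, the response, or both, depending on the hook."""
    if hook in (Hook.INPUT, Hook.INPUT_OUTPUT):
        prompt = check(prompt)  # filter the user request before it reaches the model
    response = model(prompt)
    if hook in (Hook.OUTPUT, Hook.INPUT_OUTPUT):
        response = check(response)  # filter the model response before it reaches the user
    return response


# Usage: a stand-in model and a check that records what it was asked to inspect
inspected: list[str] = []


def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"


def recording_check(text: str) -> str:
    inspected.append(text)
    return text


guarded_call("hi", echo_model, recording_check, Hook.INPUT_OUTPUT)
print(inspected)  # ['hi', 'echo: hi']
```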
Guardrail Status
Guardrails can have the following status values:
- Active: The guardrail is active and will be applied when configured on models
- Inactive: The guardrail is inactive and will not be applied
- Disabled: The guardrail is disabled (reserved for system use)
Setting a guardrail to Inactive lets you temporarily stop applying it without removing it from the models it is configured on.
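The effect of the status field can be sketched as a filter over the guardrails attached to a model: only Active guardrails run, while Inactive ones stay attached. This sketch also skips Disabled guardrails, which is an assumption since Disabled is reserved for system use. `Status`, `Guardrail`, `guardrails_to_apply`, and the guardrail names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    DISABLED = "disabled"  # reserved for system use; assumed never applied here


@dataclass
class Guardrail:
    name: str
    status: Status = Status.ACTIVE


def guardrails_to_apply(attached: list[Guardrail]) -> list[Guardrail]:
    """Return the attached guardrails that should run; only ACTIVE ones qualify."""
    return [g for g in attached if g.status is Status.ACTIVE]


# Hypothetical guardrails attached to one model
attached = [Guardrail("ssn-detection"), Guardrail("phone-detection")]

# Pause phone detection: it stays attached but is no longer applied
attached[1].status = Status.INACTIVE
print([g.name for g in guardrails_to_apply(attached)])  # ['ssn-detection']
print(len(attached))  # 2
```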
Pre-built Templates
FloTorch includes several pre-built guardrail templates to get you started quickly:
- SSN Detection: Blocks Social Security Numbers in various formats (XXX-XX-XXXX, XXXXXXXXX, XXX XX XXXX)
- Phone Number Detection: Filters phone numbers in U.S. and India formats
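For readers who want to build a similar custom regex guardrail, the sketch below shows patterns that cover the listed SSN formats and common U.S. and Indian phone formats. These are illustrative patterns written for this example, not the expressions the FloTorch templates actually use; the Indian pattern assumes 10-digit mobile numbers starting with 6 to 9. `find_all` is a hypothetical helper.

```python
import re

# Illustrative patterns only; they are not FloTorch's template definitions.

# SSN in XXX-XX-XXXX, XXXXXXXXX, or XXX XX XXXX form. The backreference \1
# forces both separators to match, so mixed forms like 123-45 6789 are skipped.
SSN = re.compile(r"\b\d{3}([- ]?)\d{2}\1\d{4}\b")

# U.S.: optional +1, area code with or without parentheses, then 3 + 4 digits
US_PHONE = re.compile(
    r"(?<!\d)(?:\+1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)

# India: optional +91 or leading 0, then a 10-digit mobile number starting with 6-9
IN_PHONE = re.compile(r"(?<!\d)(?:\+91[\s-]?|0)?[6-9]\d{4}[\s-]?\d{5}(?!\d)")


def find_all(pattern: re.Pattern[str], text: str) -> list[str]:
    """Return full matches (group 0), not just the captured groups findall would give."""
    return [m.group(0) for m in pattern.finditer(text)]


print(find_all(SSN, "IDs: 123-45-6789, 123456789, 123 45 6789"))
print(find_all(US_PHONE, "Call (555) 123-4567 or +1 555.123.4567"))
print(find_all(IN_PHONE, "Reach me on +91 98765 43210"))
```

The lookarounds `(?<!\d)` and `(?!\d)` keep the phone patterns from matching a slice of a longer digit run, which reduces false positives on account or order numbers.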
All guardrails are available in the Guardrails section of the FloTorch Console. When creating or configuring a FloTorch Model, you can select which guardrails to apply and configure their behavior.