Models
Creating a Model
To create a model, click Create FloTorch Model in the top right corner of the FloTorch Console. Choose the model Type (Chat or Embedding), and provide a Name and an optional Description. The model name must be unique.
- Chat – After creation you are taken to the version configuration page, and the model appears in the Model Registry. A default version 1 is created in Draft state. Configure the version (router, cache, guardrails, providers) and publish it to make it available for routing requests through the Gateway.
- Embedding – After creation the model appears in the Model Registry and is ready to use with the embeddings API and vector storage; no version configuration or publish step is required. See Embeddings for details.
Model Versions
Model versions are the individual configurations of a model. Each model can have multiple versions, and each version can have different provider models, guardrails, router, and cache configurations. Once a model version is published, it is immutable and cannot be changed.
A new version can be created by clicking the New Version button in the top right corner of the FloTorch Console, or from the model version configuration page by clicking the three-dots button beside the Publish button and selecting Create new version from the dropdown.
You can also create a copy (a revision) of any published or draft version:
- To copy a published version, click the Make a revision button at the top right of the page.
- To copy a draft version, click the three-dots button and select Make a revision from the dropdown.
Configuring a Model Version
Model versions can be configured using the instructions below.
- Identify the model you want to configure in the Model Registry table.
- Click the Actions dropdown button and select View Versions.
- A slideover table lists the model versions.
- Click a version row to open that version's configuration page. (Clicking the model's row in the Model Registry table opens the configuration page for its latest version.)
You will be presented with a model configuration canvas. You can configure the following:
- Input Guardrails – Rules or policies that control the input of the model.
- Router – Chooses which provider model handles each request. Available only when at least two provider models are added to the version. On the canvas, click the Router card to open the configuration slideover.
- Fallback – Set the order of models (Primary, then Fallback 1, 2, …). The gateway uses the next in line if the primary fails. See Fallback below for details.
- Round Robin – Requests are sent to each provider in turn. See Round Robin below for details.
- Weighted – Set a weight per model to distribute traffic (load balancing). See Weighted below for details.
- Smart Router – Incoming requests will be analyzed and routed to the best-fit model based on task complexity. When multiple models share the same complexity level, selection follows priority-ordered config filters (cost, keywords, schedule) and applies round-robin within the same priority. In the slideover, assign complexity levels to each model using the dropdown. Click the settings (gear) icon to configure optional conditions. See Smart Router below for details.
- Cache – Caches responses to reduce latency and cost. On the canvas, click the Cache card to open the configuration slideover (title Cache, “Configure the cache for the model”). Choose None, Simple, or Semantic. When you enable Simple or Semantic, set Minimum input tokens needed to generate cache (slider from 10 to 1000)—only requests with at least that many input tokens are cached. Simple caches by exact request match; Semantic caches by similarity so similar requests can hit the cache. Click Save Cache in the slideover footer to apply.
- Output Guardrails – Rules or policies that control the output of the model.
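The three cache modes can be sketched conceptually as follows. This is an illustration, not the Gateway's implementation: `count_tokens` is a crude stand-in tokenizer, and the semantic similarity check is simplified to word overlap (a real semantic cache would use embeddings).

```python
# Conceptual sketch of the cache modes: None, Simple (exact match),
# and Semantic (similarity match), gated by a minimum input-token count.

def count_tokens(text):
    return len(text.split())  # whitespace tokenizer (illustrative assumption)

class ResponseCache:
    def __init__(self, mode="none", min_input_tokens=10, threshold=0.8):
        self.mode = mode                      # "none" | "simple" | "semantic"
        self.min_input_tokens = min_input_tokens
        self.threshold = threshold
        self.entries = {}                     # request text -> cached response

    def _similar(self, a, b):
        # toy similarity: word-overlap ratio, not a real embedding distance
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / max(len(wa | wb), 1)

    def get(self, request):
        if self.mode == "none":
            return None
        if self.mode == "simple":
            return self.entries.get(request)          # exact match only
        for cached_req, resp in self.entries.items(): # semantic: similar match
            if self._similar(request, cached_req) >= self.threshold:
                return resp
        return None

    def put(self, request, response):
        # only requests meeting the minimum input-token threshold are cached
        if self.mode != "none" and count_tokens(request) >= self.min_input_tokens:
            self.entries[request] = response

cache = ResponseCache(mode="simple", min_input_tokens=3)
cache.put("what is the capital of Norway", "Oslo")
print(cache.get("what is the capital of Norway"))  # Oslo (exact hit)
print(cache.get("capital of Norway?"))             # None: simple mode needs exact match
```

Switching `mode` to `"semantic"` would let the second, similar request hit the cache once its similarity score clears the threshold.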
Once the model version is configured, you can publish it to make it available for routing requests through the Gateway.
Adding Keyword-, Schedule-, and Cost-Based Routing
Keyword, schedule, and cost rules are configured per provider model in the same modal. They optionally refine which model handles a request (e.g. by keywords, time windows, or budget).
- On the model version configuration canvas, add at least two provider models, then click the Router card. A slideover opens titled Router with the strategy options (Fallback, Round Robin, Weighted, Smart Router) and the list of models.
- Choose a strategy. For Fallback you can drag to reorder models (Primary, Fallback 1, …). For Weighted you set a weight per model. For Smart Router you assign complexity levels per model.
- Click the settings (gear) icon next to a model name in that slideover. A modal opens where you can configure routing conditions for that model.
- In the modal you can enable any of (evaluated in this order: cost first, then keyword, then schedule):
- Cost-Based Routing (first priority) – Set a Budget ($) (monthly budget) and an Alert Threshold (%) (1–100). Email notifications are sent when spending reaches the alert threshold and when the budget limit is exceeded. When the budget is exceeded, the model is excluded from routing. Use this to enforce budget limits per model.
- Keyword-Based Routing – Route requests based on metadata key-value matching. Configure conditions with:
- Key: The metadata field name to match against
- Operator: Equals, Not Equals, Contains, Starts With, Ends With, Greater Than, Less Than, Is Null, Is Not Null, or Regex
- Value: Target values to match (comma-separated for multiple values)
Requests containing matching metadata will be routed to this model. Use Add Keyword Condition to add multiple rules (all conditions must match).
- Schedule-Based Routing – Define time windows when this model should be available. Configure time slots with:
- Active From: Start time of the availability window (HH:MM format, 15-minute intervals)
- Active Until: End time of the availability window (HH:MM format, 15-minute intervals)
Use Add Time Slot to add multiple time windows. The model will only be selected for routing during the configured time slots.
- Click Save Configuration in the modal to save conditions for that model. Click Save Configuration in the Router slideover footer to save the routing strategy and order/weights.
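The per-model conditions above can be sketched as a filter evaluated in the stated priority order: cost first, then keyword metadata, then schedule. All names and data structures here are illustrative assumptions, not the Gateway's internals, and only the Equals operator is shown for keywords.

```python
from datetime import time

# Illustrative sketch of per-model routing conditions, evaluated in
# priority order: cost -> keywords -> schedule.

def within_schedule(now, slots):
    # slots: list of (active_from, active_until) windows; no slots = always on
    return not slots or any(start <= now <= end for start, end in slots)

def keyword_match(metadata, conditions):
    # conditions: {key: expected_value}; all must match (Equals operator only)
    return all(metadata.get(k) == v for k, v in conditions.items())

def model_eligible(model, metadata, now):
    if model.get("spend", 0) > model.get("budget", float("inf")):
        return False  # monthly budget exceeded: excluded from routing
    if not keyword_match(metadata, model.get("keywords", {})):
        return False
    return within_schedule(now, model.get("slots", []))

models = [
    {"name": "gpt-small", "budget": 100, "spend": 120},           # over budget
    {"name": "gpt-large", "keywords": {"tier": "premium"}},       # keyword-gated
    {"name": "gpt-base",  "slots": [(time(9, 0), time(17, 0))]},  # office hours
]

eligible = [m["name"] for m in models
            if model_eligible(m, {"tier": "premium"}, time(10, 30))]
print(eligible)  # ['gpt-large', 'gpt-base']
```

Here `gpt-small` is dropped by the cost check before its keywords or schedule are ever considered, matching the documented priority order.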
Routing Strategies
Routing Strategies Comparison
| Strategy | Description | Use Case |
|---|---|---|
| Fallback | Tries configs in order until one succeeds | High availability, simple failover |
| Round Robin | Distributes requests evenly across all configs | Load balancing |
| Weighted | Distributes based on assigned weights | Traffic shaping, A/B testing |
| Smart Router | Routes based on complexity + conditions | Cost optimization, intelligent routing |
Fallback
The Fallback strategy provides high availability by trying models in a defined order. If the primary model fails, the gateway automatically tries the next model in the fallback chain.
How it works:
- Requests are sent to the Primary Model first
- If the primary fails (error, timeout, rate limit), the gateway tries Fallback 1
- If Fallback 1 fails, it tries Fallback 2, and so on
- The request fails only if all models in the chain fail
Configuration:
- Select Fallback as the routing strategy
- Drag models to set the order (Primary, Fallback 1, Fallback 2, …)
- Click Save Configuration
Best for: Mission-critical applications requiring high availability and automatic failover.
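The fallback chain described above can be sketched as a simple try-in-order loop. This is a conceptual illustration, not the Gateway's code; `flaky_send` is a stand-in for a real provider call.

```python
# Conceptual sketch of the Fallback strategy: try each model in priority
# order, return on the first success, fail only if the whole chain fails.

def call_with_fallback(chain, send):
    """chain: model names in priority order; send: callable that may raise."""
    errors = []
    for model in chain:
        try:
            return send(model)  # first model that succeeds wins
        except Exception as exc:  # error, timeout, rate limit, ...
            errors.append((model, exc))
    raise RuntimeError(f"all models failed: {errors}")

def flaky_send(model):
    if model == "primary-model":
        raise TimeoutError("primary timed out")  # simulated failure
    return f"response from {model}"

print(call_with_fallback(["primary-model", "fallback-1"], flaky_send))
# response from fallback-1
```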
Round Robin
The Round Robin strategy distributes requests evenly across all configured models in a rotating order.
How it works:
- Each incoming request is sent to the next model in the rotation
- After the last model, it cycles back to the first
- All models receive approximately equal traffic over time
Configuration:
- Select Round Robin as the routing strategy
- Optionally drag to reorder models
- Click Save Configuration
Best for: Load balancing across multiple equivalent models or providers.
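The rotation above can be sketched with a cycling iterator. The model names are illustrative.

```python
import itertools

# Conceptual sketch of Round Robin: each request goes to the next model
# in the rotation, cycling back to the first after the last.

models = ["model-a", "model-b", "model-c"]   # illustrative names
rotation = itertools.cycle(models)

picks = [next(rotation) for _ in range(7)]
print(picks)
# ['model-a', 'model-b', 'model-c', 'model-a', 'model-b', 'model-c', 'model-a']
```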
Weighted
The Weighted strategy distributes traffic based on assigned weights, allowing fine-grained control over traffic distribution.
How it works:
- Each model is assigned a weight (e.g., Model A: 70, Model B: 30)
- Traffic is distributed proportionally to the weights
- Higher weight = more traffic
Configuration:
- Select Weighted as the routing strategy
- Set a Weight value for each model (minimum 1)
- Click Save Configuration
Example: With weights of 70 and 30, Model A receives ~70% of traffic and Model B receives ~30%.
Best for: A/B testing, gradual rollouts, or routing more traffic to preferred models.
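Proportional distribution can be sketched with weighted random selection. This is an illustration of the idea, not the Gateway's implementation; names and weights match the example above.

```python
import random

# Conceptual sketch of the Weighted strategy: traffic is split in
# proportion to each model's weight (70/30 here, as in the example).

weights = {"model-a": 70, "model-b": 30}

def pick_model(rng):
    return rng.choices(list(weights), weights=list(weights.values()))[0]

rng = random.Random(0)                       # seeded for a repeatable demo
counts = {m: 0 for m in weights}
for _ in range(10_000):
    counts[pick_model(rng)] += 1

share_a = counts["model-a"] / 10_000
print(f"model-a share: {share_a:.2f}")       # close to 0.70
```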
Smart Router
The Smart Router strategy automatically selects the optimal model based on query complexity and configurable conditions.
How it works:
- Incoming requests are analyzed by a complexity API (score 1–5)
- The score maps to a complexity level: Very Low, Low, Medium, High, Very High
- Models configured for that complexity level are selected
- You can assign multiple complexity levels to a single model to handle a range of request complexities
- Default complexity levels are suggested based on model capabilities, but you can update them as needed
- If multiple models match, conditions are evaluated in priority order: Cost → Keywords → Schedule
- If no exact match, the router falls back to the nearest complexity level
Complexity Levels:
| Level | Label | Typical Use Cases |
|---|---|---|
| 1 | Very Low | Simple greetings, basic Q&A |
| 2 | Low | Straightforward questions, simple tasks |
| 3 | Medium | Multi-step reasoning, moderate analysis |
| 4 | High | Complex analysis, detailed explanations |
| 5 | Very High | Advanced reasoning, multi-domain tasks |
Configuration:
- Select Smart Router as the routing strategy
- For each model, select one or more Complexity Levels from the dropdown
- Optionally click the settings (gear) icon to configure conditions (Cost, Keywords, Schedule)
- Click Save Configuration
Condition Priority:
| Priority | Condition | Description |
|---|---|---|
| 1 (Highest) | Cost | Budget check passes (monthly limit not exceeded) |
| 2 | Keywords | Request metadata matches keyword conditions |
| 3 | Schedule | Current time matches schedule conditions |
| 4 (Lowest) | None | No conditions configured |
Fallback behavior: When no exact complexity match exists, the router tries the nearest complexity levels in order of distance. When equidistant from two levels, it prefers the higher complexity level (better to over-provision than under-provision). When multiple models match at the same complexity and priority level, round-robin distributes requests evenly.
Best for: Cost optimization, routing simple queries to cheaper models and complex queries to advanced models.
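The complexity matching and nearest-level fallback described above can be sketched as follows. Model names and level assignments are illustrative; the real router also applies the cost/keyword/schedule conditions and round-robin among the remaining candidates.

```python
# Conceptual sketch of Smart Router complexity matching: pick models at the
# request's complexity level (1-5); if none match, fall back to the nearest
# configured level, preferring the higher level on ties.

MODELS = {                      # model -> complexity levels it handles
    "cheap-model": {1, 2},
    "mid-model":   {3},
    "big-model":   {5},
}

def candidates_for(level):
    exact = [m for m, lv in MODELS.items() if level in lv]
    if exact:
        return exact
    configured = sorted({l for lv in MODELS.values() for l in lv})
    # nearest configured level; equal distance prefers the higher level
    nearest = min(configured, key=lambda l: (abs(l - level), -l))
    return [m for m, lv in MODELS.items() if nearest in lv]

print(candidates_for(3))  # ['mid-model'] (exact match)
print(candidates_for(4))  # ['big-model'] (3 and 5 equidistant; prefer higher)
```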
Publishing a Model Version
Model versions can be published using the Publish button located in the top right corner of the model version canvas.
Clicking the Publish button opens a slideover form containing a summary of the version and a Mark as latest checkbox.
If you check Mark as latest, the published version becomes the latest version of the model.
To publish versions that are still in draft, click the History button (before Publish/Make a revision). A slideover opens with the model version list. Use Publish for unpublished versions and Mark as latest to set that version as the latest.
Once a model version is published, it is immutable and cannot be changed.
The published model can be used with the OpenAI-compatible API as flotorch/<model-name>:<version>.
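For example, a published version can be addressed from the OpenAI Python SDK pointed at the Gateway. The helper, API key placeholder, and model name below are illustrative, and the network call is shown commented out so the sketch stays self-contained:

```python
# Hypothetical helper: build the "flotorch/<model-name>[:<version>]"
# identifier for a published model; omit the version to target "latest".
def flotorch_model_id(name, version=None):
    return f"flotorch/{name}:{version}" if version is not None else f"flotorch/{name}"

# Illustrative use with the OpenAI Python SDK (requires a valid API key
# and a published model; names are assumptions for this example):
#
# from openai import OpenAI
# client = OpenAI(base_url="https://gateway.flotorch.cloud/openai/v1",
#                 api_key="YOUR_FLOTORCH_API_KEY")
# resp = client.chat.completions.create(
#     model=flotorch_model_id("my-chat-model", 2),
#     messages=[{"role": "user", "content": "What is the capital of Norway?"}],
# )

print(flotorch_model_id("my-chat-model", 2))  # flotorch/my-chat-model:2
print(flotorch_model_id("my-chat-model"))     # flotorch/my-chat-model
```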
Publishing Latest Version of a Model
You can publish the latest version of a model; the model can then be used with the OpenAI-compatible API as flotorch/<model-name>, without specifying the version. Internally it is tagged as latest.
- Identify the model you want to publish the latest version of.
- Click the Actions dropdown and select Publish Latest.
- A slideover opens with a version dropdown. Select the version to publish as latest and click Publish.
Alternatively, select View Versions from the Actions menu; the slideover opens with the model version list, where you can publish a version or mark it as latest.
Once the latest version is published, it is tagged as published and can be used in OpenAI API as flotorch/<model-name> without specifying the version.
Archiving and Unarchiving a Model
Archiving hides a model from the active list but keeps it available for dependencies. You can restore it later.
Archive a model
- In the left navigation, go to Model Registry (Models).
- In the models table, open the Actions (three dots) menu for the model and select Archive.
- In the confirmation dialog, review the message: “Are you sure you want to archive this model? You can restore it later if needed.”
- Click Archive. The model is archived and removed from the active list. Existing dependencies continue to work.
- To view archived models, open the top filters dropdown and select Archived.
Unarchive a model
- In the models table, use the top filters dropdown and select Archived.
- Open the Actions (three dots) menu for the model and select Unarchive.
Deleting a Model
Deleting a model permanently removes the model and all related dependencies. This action cannot be undone.
- In the left navigation, go to Model Registry (Models).
- In the models table, open the Actions (three dots) menu for the model and select Delete.
- Review the dependency list in the deletion modal (for example: model versions, agents with versions, workflows with versions, RAG endpoints, experiments).
- Click Continue. A confirmation modal opens and asks you to type the model name.
- Enter the model name and click Permanently Delete.
After confirmation, the model and all listed dependencies are deleted permanently and cannot be recovered.
Interacting with the Model
To interact with the model and test it, the model must be published. (Code snippet is available only for Chat models, not Embedding models.)
You can also call the Gateway APIs directly (replace <provider_name> and {id} with your values):
Chat completions
POST https://gateway.flotorch.cloud/openai/v1/chat/completions
Request body:
{
  "model": "flotorch/<model>",
  "messages": [
    { "role": "user", "content": "What is the capital of Norway?" }
  ]
}
List models
GET https://gateway.flotorch.cloud/openai/v1/models
No request body. Returns an OpenAI-compatible list of FloTorch models available in your workspace.
Response (example):
{
  "object": "list",
  "data": [
    {
      "id": "my-chat-model",
      "name": "flotorch/my-chat-model",
      "object": "model",
      "owned_by": "flotorch"
    }
  ]
}
Generate embeddings
POST https://gateway.flotorch.cloud/openai/v1/embeddings
Request body:
{
  "model": "<provider_name>/embedding-model",
  "input": "Your text to embed"
}
Vector store search
POST https://gateway.flotorch.cloud/openai/v1/vector_stores/{id}/search
Request body:
{
  "query": "search query",
  "top_k": 5
}
- Click the code icon in the models table for a published Chat model.
- A slideover opens with a code snippet; you can copy Python, TypeScript, or cURL and test in your environment.
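The endpoints above can also be called with any HTTP client. A minimal Python sketch using the `requests` library follows; the Bearer auth scheme and model name are assumptions, and the network call is commented out so the snippet runs without credentials:

```python
import json

# Sketch of a raw chat-completions call to the Gateway. The auth header
# and model name are assumptions for this example.

url = "https://gateway.flotorch.cloud/openai/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_FLOTORCH_API_KEY",  # assumed auth scheme
    "Content-Type": "application/json",
}
payload = {
    "model": "flotorch/my-chat-model",
    "messages": [{"role": "user", "content": "What is the capital of Norway?"}],
}

# import requests
# resp = requests.post(url, headers=headers, data=json.dumps(payload), timeout=30)
# print(resp.json()["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))  # inspect the request body before sending
```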