Models
Creating a Model
To create a model, click Create FloTorch Model in the top right corner of the FloTorch Console. Choose the model Type (Chat or Embedding), and provide a Name and an optional Description. The model name must be unique.
- Chat – After creation you are taken to the version configuration page, and the model appears in the Model Registry. A default version 1 is created in Draft state. Configure the version (router, cache, guardrails, providers) and publish it to make it available for routing requests through the Gateway.
- Embedding – After creation the model appears in the Model Registry and is ready to use with the embeddings API and vector storage; no version configuration or publish step is required. See Embeddings for details.
Model Versions
Model versions are the individual configurations of a model. Each model can have multiple versions, and each version can have different provider models, guardrails, router, and cache configurations. Once a model version is published, it is immutable and cannot be changed.
A new version can be created by clicking the New Version button in the top right corner of the FloTorch Console, or from the model version configuration page by clicking the three-dots button beside the Publish button and selecting Create new version from the dropdown.
You can also create a copy (a revision) of any published or draft version:
- To copy a published version, click the Make a revision button at the top right of the page.
- To copy a draft version, click the three-dots button and select Make a revision from the dropdown.
Configuring a Model Version
Model versions can be configured using the instructions below.
- Identify the model you want to configure in the Model Registry table.
- Click the Actions dropdown button and select View Versions.
- A slideover table lists the model versions.
- Click a version row to open that version's configuration page. (Clicking the model's row in the Model Registry table opens the configuration page for its latest version.)
You will be presented with a model configuration canvas. You can configure the following:
- Input Guardrails – Rules or policies that control the input of the model.
- Router – Chooses which provider model handles each request. Available only when at least two provider models are added to the version. On the canvas, click the Router card to open the configuration slideover.
- Fallback – Set the order of models (Primary, then Fallback 1, 2, …). The gateway uses the next in line if the primary fails. See Fallback below for details.
- Round Robin – Requests are sent to each provider in turn. See Round Robin below for details.
- Weighted – Set a weight per model to distribute traffic (load balancing). See Weighted below for details.
- Smart Router – Incoming requests will be analyzed and routed to the best-fit model based on task complexity. When multiple models share the same complexity level, selection follows priority-ordered config filters (cost, keywords, schedule) and applies round-robin within the same priority. In the slideover, assign complexity levels to each model using the dropdown. Click the settings (gear) icon to configure optional conditions. See Smart Router below for details.
- Cache – Caches responses to reduce latency and cost. On the canvas, click the Cache card to open the configuration slideover (title Cache, “Configure the cache for the model”). Choose None, Simple, or Semantic. When you enable Simple or Semantic, set Minimum input tokens needed to generate cache (slider from 10 to 1000)—only requests with at least that many input tokens are cached. Simple caches by exact request match; Semantic caches by similarity so similar requests can hit the cache. Click Save Cache in the slideover footer to apply.
- Output Guardrails – Rules or policies that control the output of the model.
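The three cache modes can be sketched conceptually as follows. This is an illustration, not the Gateway's implementation: `count_tokens` is a crude stand-in tokenizer, and the semantic similarity check is simplified to word overlap (a real semantic cache would use embeddings).

```python
# Conceptual sketch of the cache modes: None, Simple (exact match),
# and Semantic (similarity match), gated by a minimum input-token count.

def count_tokens(text):
    return len(text.split())  # whitespace tokenizer (illustrative assumption)

class ResponseCache:
    def __init__(self, mode="none", min_input_tokens=10, threshold=0.8):
        self.mode = mode                      # "none" | "simple" | "semantic"
        self.min_input_tokens = min_input_tokens
        self.threshold = threshold
        self.entries = {}                     # request text -> cached response

    def _similar(self, a, b):
        # toy similarity: word-overlap ratio, not a real embedding distance
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / max(len(wa | wb), 1)

    def get(self, request):
        if self.mode == "none":
            return None
        if self.mode == "simple":
            return self.entries.get(request)          # exact match only
        for cached_req, resp in self.entries.items(): # semantic: similar match
            if self._similar(request, cached_req) >= self.threshold:
                return resp
        return None

    def put(self, request, response):
        # only requests meeting the minimum input-token threshold are cached
        if self.mode != "none" and count_tokens(request) >= self.min_input_tokens:
            self.entries[request] = response

cache = ResponseCache(mode="simple", min_input_tokens=3)
cache.put("what is the capital of Norway", "Oslo")
print(cache.get("what is the capital of Norway"))  # Oslo (exact hit)
print(cache.get("capital of Norway?"))             # None: simple mode needs exact match
```

Switching `mode` to `"semantic"` would let the second, similar request hit the cache once its similarity score clears the threshold.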
Once the model version is configured, you can publish it to make it available for routing requests through the Gateway.
Adding Keyword-, Schedule-, and Cost-Based Routing
Keyword, schedule, and cost rules are configured per provider model in the same modal. They optionally refine which model handles a request (e.g. by keywords, time windows, or budget).
- On the model version configuration canvas, add at least two provider models, then click the Router card. A slideover opens titled Router with the strategy options (Fallback, Round Robin, Weighted, Smart Router) and the list of models.
- Choose a strategy. For Fallback you can drag to reorder models (Primary, Fallback 1, …). For Weighted you set a weight per model. For Smart Router you assign complexity levels per model.
- Click the settings (gear) icon next to a model name in that slideover. A modal opens where you can configure routing conditions for that model.
- In the modal you can enable any of (evaluated in this order: cost first, then keyword, then schedule):
- Cost-Based Routing (first priority) – Set a Budget ($) (monthly budget) and an Alert Threshold (%) (1–100). Email notifications are sent when spending reaches the alert threshold and when the budget limit is exceeded. When the budget is exceeded, the model is excluded from routing. Use this to enforce budget limits per model.
- Keyword-Based Routing – Route requests based on metadata key-value matching. Configure conditions with:
- Key: The metadata field name to match against
- Operator: Equals, Not Equals, Contains, Starts With, Ends With, Greater Than, Less Than, Is Null, Is Not Null, or Regex
- Value: Target values to match (comma-separated for multiple values)
Requests containing matching metadata will be routed to this model. Use Add Keyword Condition to add multiple rules (all conditions must match).
- Schedule-Based Routing – Define time windows when this model should be available. Configure time slots with:
- Active From: Start time of the availability window (HH:MM format, 15-minute intervals)
- Active Until: End time of the availability window (HH:MM format, 15-minute intervals)
Use Add Time Slot to add multiple time windows. The model will only be selected for routing during the configured time slots.
- Click Save Configuration in the modal to save conditions for that model. Click Save Configuration in the Router slideover footer to save the routing strategy and order/weights.
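The per-model conditions above can be sketched as a filter evaluated in the stated priority order: cost first, then keyword metadata, then schedule. All names and data structures here are illustrative assumptions, not the Gateway's internals, and only the Equals operator is shown for keywords.

```python
from datetime import time

# Illustrative sketch of per-model routing conditions, evaluated in
# priority order: cost -> keywords -> schedule.

def within_schedule(now, slots):
    # slots: list of (active_from, active_until) windows; no slots = always on
    return not slots or any(start <= now <= end for start, end in slots)

def keyword_match(metadata, conditions):
    # conditions: {key: expected_value}; all must match (Equals operator only)
    return all(metadata.get(k) == v for k, v in conditions.items())

def model_eligible(model, metadata, now):
    if model.get("spend", 0) > model.get("budget", float("inf")):
        return False  # monthly budget exceeded: excluded from routing
    if not keyword_match(metadata, model.get("keywords", {})):
        return False
    return within_schedule(now, model.get("slots", []))

models = [
    {"name": "gpt-small", "budget": 100, "spend": 120},           # over budget
    {"name": "gpt-large", "keywords": {"tier": "premium"}},       # keyword-gated
    {"name": "gpt-base",  "slots": [(time(9, 0), time(17, 0))]},  # office hours
]

eligible = [m["name"] for m in models
            if model_eligible(m, {"tier": "premium"}, time(10, 30))]
print(eligible)  # ['gpt-large', 'gpt-base']
```

Here `gpt-small` is dropped by the cost check before its keywords or schedule are ever considered, matching the documented priority order.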
Routing Strategies
Routing Strategies Comparison
| Strategy | Description | Use Case |
|---|---|---|
| Fallback | Tries configs in order until one succeeds | High availability, simple failover |
| Round Robin | Distributes requests evenly across all configs | Load balancing |
| Weighted | Distributes based on assigned weights | Traffic shaping, A/B testing |
| Smart Router | Routes based on complexity + conditions | Cost optimization, intelligent routing |
Fallback
The Fallback strategy provides high availability by trying models in a defined order. If the primary model fails, the gateway automatically tries the next model in the fallback chain.
How it works:
- Requests are sent to the Primary Model first
- If the primary fails (error, timeout, rate limit), the gateway tries Fallback 1
- If Fallback 1 fails, it tries Fallback 2, and so on
- The request fails only if all models in the chain fail
Configuration:
- Select Fallback as the routing strategy
- Drag models to set the order (Primary, Fallback 1, Fallback 2, …)
- Click Save Configuration
Best for: Mission-critical applications requiring high availability and automatic failover.
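The fallback chain described above can be sketched as a simple try-in-order loop. This is a conceptual illustration, not the Gateway's code; `flaky_send` is a stand-in for a real provider call.

```python
# Conceptual sketch of the Fallback strategy: try each model in priority
# order, return on the first success, fail only if the whole chain fails.

def call_with_fallback(chain, send):
    """chain: model names in priority order; send: callable that may raise."""
    errors = []
    for model in chain:
        try:
            return send(model)  # first model that succeeds wins
        except Exception as exc:  # error, timeout, rate limit, ...
            errors.append((model, exc))
    raise RuntimeError(f"all models failed: {errors}")

def flaky_send(model):
    if model == "primary-model":
        raise TimeoutError("primary timed out")  # simulated failure
    return f"response from {model}"

print(call_with_fallback(["primary-model", "fallback-1"], flaky_send))
# response from fallback-1
```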
Round Robin
The Round Robin strategy distributes requests evenly across all configured models in a rotating order.
How it works:
- Each incoming request is sent to the next model in the rotation
- After the last model, it cycles back to the first
- All models receive approximately equal traffic over time
Configuration:
- Select Round Robin as the routing strategy
- Optionally drag to reorder models
- Click Save Configuration
Best for: Load balancing across multiple equivalent models or providers.
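The rotation above can be sketched with a cycling iterator. The model names are illustrative.

```python
import itertools

# Conceptual sketch of Round Robin: each request goes to the next model
# in the rotation, cycling back to the first after the last.

models = ["model-a", "model-b", "model-c"]   # illustrative names
rotation = itertools.cycle(models)

picks = [next(rotation) for _ in range(7)]
print(picks)
# ['model-a', 'model-b', 'model-c', 'model-a', 'model-b', 'model-c', 'model-a']
```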
Weighted
The Weighted strategy distributes traffic based on assigned weights, allowing fine-grained control over traffic distribution.
How it works:
- Each model is assigned a weight (e.g., Model A: 70, Model B: 30)
- Traffic is distributed proportionally to the weights
- Higher weight = more traffic
Configuration:
- Select Weighted as the routing strategy
- Set a Weight value for each model (minimum 1)
- Click Save Configuration
Example: With weights of 70 and 30, Model A receives ~70% of traffic and Model B receives ~30%.
Best for: A/B testing, gradual rollouts, or routing more traffic to preferred models.
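Proportional distribution can be sketched with weighted random selection. This is an illustration of the idea, not the Gateway's implementation; names and weights match the example above.

```python
import random

# Conceptual sketch of the Weighted strategy: traffic is split in
# proportion to each model's weight (70/30 here, as in the example).

weights = {"model-a": 70, "model-b": 30}

def pick_model(rng):
    return rng.choices(list(weights), weights=list(weights.values()))[0]

rng = random.Random(0)                       # seeded for a repeatable demo
counts = {m: 0 for m in weights}
for _ in range(10_000):
    counts[pick_model(rng)] += 1

share_a = counts["model-a"] / 10_000
print(f"model-a share: {share_a:.2f}")       # close to 0.70
```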
Smart Router
The Smart Router strategy automatically selects the optimal model based on query complexity and configurable conditions.
How it works:
- Incoming requests are analyzed by a complexity API (score 1–5)
- The score maps to a complexity level: Very Low, Low, Medium, High, Very High
- Models configured for that complexity level are selected
- You can assign multiple complexity levels to a single model to handle a range of request complexities
- Default complexity levels are suggested based on model capabilities, but you can update them as needed
- If multiple models match, conditions are evaluated in priority order: Cost → Keywords → Schedule
- If no exact match, the router falls back to the nearest complexity level
Complexity Levels:
| Level | Label | Typical Use Cases |
|---|---|---|
| 1 | Very Low | Simple greetings, basic Q&A |
| 2 | Low | Straightforward questions, simple tasks |
| 3 | Medium | Multi-step reasoning, moderate analysis |
| 4 | High | Complex analysis, detailed explanations |
| 5 | Very High | Advanced reasoning, multi-domain tasks |
Configuration:
- Select Smart Router as the routing strategy
- For each model, select one or more Complexity Levels from the dropdown
- Optionally click the settings (gear) icon to configure conditions (Cost, Keywords, Schedule)
- Click Save Configuration
Condition Priority:
| Priority | Condition | Description |
|---|---|---|
| 1 (Highest) | Cost | Budget check passes (monthly limit not exceeded) |
| 2 | Keywords | Request metadata matches keyword conditions |
| 3 | Schedule | Current time matches schedule conditions |
| 4 (Lowest) | None | No conditions configured |
Fallback behavior: When no exact complexity match exists, the router tries the nearest complexity levels in order of distance. When equidistant from two levels, it prefers the higher complexity level (better to over-provision than under-provision). When multiple models match at the same complexity and priority level, round-robin distributes requests evenly.
Best for: Cost optimization, routing simple queries to cheaper models and complex queries to advanced models.
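The complexity matching and nearest-level fallback described above can be sketched as follows. Model names and level assignments are illustrative; the real router also applies the cost/keyword/schedule conditions and round-robin among the remaining candidates.

```python
# Conceptual sketch of Smart Router complexity matching: pick models at the
# request's complexity level (1-5); if none match, fall back to the nearest
# configured level, preferring the higher level on ties.

MODELS = {                      # model -> complexity levels it handles
    "cheap-model": {1, 2},
    "mid-model":   {3},
    "big-model":   {5},
}

def candidates_for(level):
    exact = [m for m, lv in MODELS.items() if level in lv]
    if exact:
        return exact
    configured = sorted({l for lv in MODELS.values() for l in lv})
    # nearest configured level; equal distance prefers the higher level
    nearest = min(configured, key=lambda l: (abs(l - level), -l))
    return [m for m, lv in MODELS.items() if nearest in lv]

print(candidates_for(3))  # ['mid-model'] (exact match)
print(candidates_for(4))  # ['big-model'] (3 and 5 equidistant; prefer higher)
```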
Publishing a Model Version
Model versions can be published using the Publish button located in the top right corner of the model version canvas.
Clicking the Publish button opens a slideover form containing a summary of the version and a Mark as latest checkbox.
If you check Mark as latest, the published version becomes the latest version of the model.
To publish versions that are still in draft, click the History button (before Publish/Make a revision). A slideover opens with the model version list. Use Publish for unpublished versions and Mark as latest to set that version as the latest.
Once a model version is published, it is immutable and cannot be changed.
The published model can be used with the OpenAI-compatible API as flotorch/<model-name>:<version>.
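For example, a published version can be addressed from the OpenAI Python SDK pointed at the Gateway. The helper, API key placeholder, and model name below are illustrative, and the network call is shown commented out so the sketch stays self-contained:

```python
# Hypothetical helper: build the "flotorch/<model-name>[:<version>]"
# identifier for a published model; omit the version to target "latest".
def flotorch_model_id(name, version=None):
    return f"flotorch/{name}:{version}" if version is not None else f"flotorch/{name}"

# Illustrative use with the OpenAI Python SDK (requires a valid API key
# and a published model; names are assumptions for this example):
#
# from openai import OpenAI
# client = OpenAI(base_url="https://gateway.flotorch.cloud/openai/v1",
#                 api_key="YOUR_FLOTORCH_API_KEY")
# resp = client.chat.completions.create(
#     model=flotorch_model_id("my-chat-model", 2),
#     messages=[{"role": "user", "content": "What is the capital of Norway?"}],
# )

print(flotorch_model_id("my-chat-model", 2))  # flotorch/my-chat-model:2
print(flotorch_model_id("my-chat-model"))     # flotorch/my-chat-model
```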
Publishing Latest Version of a Model
You can publish the latest version of a model; the model can then be used with the OpenAI-compatible API as flotorch/<model-name>, without specifying the version. Internally it is tagged as latest.
- Identify the model you want to publish the latest version of.
- Click the Actions dropdown and select Publish Latest.
- A slideover opens with a version dropdown. Select the version to publish as latest and click Publish.
Alternatively, select View Versions from the Actions menu; the slideover opens with the model version list, where you can publish a version or mark it as latest.
Once the latest version is published, it is tagged as published and can be used in OpenAI API as flotorch/<model-name> without specifying the version.
Archiving and Unarchiving a Model
Archiving hides a model from the active list but keeps it available for dependencies. You can restore it later.
Archive a model
- In the left navigation, go to Model Registry (Models).
- In the models table, open the Actions (three dots) menu for the model and select Archive.
- In the confirmation dialog, review the message: “Are you sure you want to archive this model? You can restore it later if needed.”
- Click Archive. The model is archived and removed from the active list. Existing dependencies continue to work.
- To view archived models, open the top filters dropdown and select Archived.
Unarchive a model
- In the models table, use the top filters dropdown and select Archived.
- Open the Actions (three dots) menu for the model and select Unarchive.
Deleting a Model
Deleting a model permanently removes the model and all related dependencies. This action cannot be undone.
- In the left navigation, go to Model Registry (Models).
- In the models table, open the Actions (three dots) menu for the model and select Delete.
- Review the dependency list in the deletion modal (for example: model versions, agents with versions, workflows with versions, RAG endpoints, experiments).
- Click Continue. A confirmation modal opens and asks you to type the model name.
- Enter the model name and click Permanently Delete.
After confirmation, the model and all listed dependencies are deleted permanently and cannot be recovered.
Interacting with the Model
To interact with the model and test it, the model must be published. (Code snippet is available only for Chat models, not Embedding models.)
You can also call the Gateway APIs directly (replace <provider_name> and {id} with your values):
Chat completions
POST https://gateway.flotorch.cloud/openai/v1/chat/completions
Request body:
{
  "model": "flotorch/<model>",
  "messages": [
    { "role": "user", "content": "What is the capital of Norway?" }
  ]
}
List models
GET https://gateway.flotorch.cloud/openai/v1/models
No request body. Returns an OpenAI-compatible list of FloTorch models available in your workspace.
Response (example):
{
  "object": "list",
  "data": [
    {
      "id": "my-chat-model",
      "name": "flotorch/my-chat-model",
      "object": "model",
      "owned_by": "flotorch"
    }
  ]
}
Generate embeddings
POST https://gateway.flotorch.cloud/openai/v1/embeddings
Request body:
{
  "model": "<provider_name>/embedding-model",
  "input": "Your text to embed"
}
Vector store search
POST https://gateway.flotorch.cloud/openai/v1/vector_stores/{id}/search
Request body:
{
  "query": "search query",
  "top_k": 5
}
- Click the code icon in the models table for a published Chat model.
- A slideover opens with a code snippet; you can copy Python, TypeScript, or cURL and test in your environment.
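The endpoints above can also be called with any HTTP client. A minimal Python sketch using the `requests` library follows; the Bearer auth scheme and model name are assumptions, and the network call is commented out so the snippet runs without credentials:

```python
import json

# Sketch of a raw chat-completions call to the Gateway. The auth header
# and model name are assumptions for this example.

url = "https://gateway.flotorch.cloud/openai/v1/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_FLOTORCH_API_KEY",  # assumed auth scheme
    "Content-Type": "application/json",
}
payload = {
    "model": "flotorch/my-chat-model",
    "messages": [{"role": "user", "content": "What is the capital of Norway?"}],
}

# import requests
# resp = requests.post(url, headers=headers, data=json.dumps(payload), timeout=30)
# print(resp.json()["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))  # inspect the request body before sending
```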