Model Registry
Model Registry is a central repository for all your AI models. It lets you manage your models, their versions, and their configurations. The FloTorch gateway lets you interact with multiple providers through a single OpenAI-compatible API.
When a request is made to the gateway, the gateway looks up the model in the Model Registry and routes the request to the appropriate provider. When guardrails are configured, the gateway also checks that the request complies with them.
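The lookup-and-route step described above can be sketched in a few lines. The registry contents, model names, and the resolve helper here are illustrative assumptions, not FloTorch's actual implementation:

```python
# Minimal sketch of the gateway's lookup step (hypothetical names and shapes):
# resolve a FloTorch model name in the registry, then dispatch the request to
# the provider recorded in that entry.

MODEL_REGISTRY = {
    "flotorch/support-bot": {"provider": "openai", "provider_model": "gpt-4o-mini"},
    "flotorch/summarizer": {"provider": "bedrock", "provider_model": "claude-3-haiku"},
}

def resolve(model_name: str) -> dict:
    """Look up a FloTorch model and return the provider entry to route to."""
    entry = MODEL_REGISTRY.get(model_name)
    if entry is None:
        raise KeyError(f"model not found in registry: {model_name}")
    return entry
```

A request for an unregistered model fails at this step, before any provider is contacted.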
A FloTorch model can be configured with:
- Input Guardrails – A set of rules or policies that control the input to the model.
- Router – Directs the processed input to the appropriate provider based on defined logic. Routing is available only when two or more provider models are configured for the version. Supported strategies:
  - Fallback – Requests go to the primary model; if it fails, the gateway tries the next model in the ordered list. You arrange models as Primary, Fallback 1, Fallback 2, and so on.
  - Round Robin – The gateway sends each request to the next provider in the list in turn.
  - Weighted – Traffic is distributed according to the weight you set per model (load balancing). Per model, you can optionally add cost-based (first priority), keyword-based, or schedule-based routing (e.g. a budget and alert threshold).
- Cache – Caches responses to reduce latency and cost. On the canvas, click the Cache card to configure it. Choose None, Simple (exact request match), or Semantic (match by meaning, using a similarity threshold). When the cache is enabled, you can set a minimum input token count (e.g. only cache requests with at least that many tokens).
- Output Guardrails – A set of rules or policies that control the output of the model.
- Provider Configuration – Settings that control the behavior of your AI resources.
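The three routing strategies above can be sketched as plain functions. The provider names and helper signatures are illustrative assumptions; FloTorch's internal routing logic is not shown here:

```python
import itertools
import random

# Ordered provider list for Fallback (hypothetical names):
# Primary first, then Fallback 1, Fallback 2.
providers = ["primary", "fallback-1", "fallback-2"]

def route_fallback(call, providers):
    """Fallback: try providers in order; return the first successful response."""
    last_err = None
    for p in providers:
        try:
            return call(p)
        except Exception as err:
            last_err = err  # remember the failure and try the next provider
    raise last_err

# Round Robin: each request goes to the next provider in the list, in turn.
_rr = itertools.cycle(providers)

def route_round_robin():
    return next(_rr)

def route_weighted(weights, rng=random):
    """Weighted: pick a provider with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

For example, if the primary model raises an error, route_fallback returns the response from the first fallback that succeeds.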
It is possible to configure a model with multiple providers and guardrails. The gateway will then route the request to the appropriate provider and apply the guardrails.
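The Semantic cache option described above matches requests by meaning rather than by exact text, using a similarity threshold over embeddings. A minimal sketch, with a hypothetical in-memory store and cosine similarity standing in for whatever FloTorch uses internally:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Hit when a stored request's embedding is similar enough to the new one."""
    def __init__(self, threshold=0.9, min_input_tokens=0):
        self.threshold = threshold
        self.min_input_tokens = min_input_tokens
        self.entries = []  # list of (embedding, cached_response)

    def get(self, emb):
        for stored, resp in self.entries:
            if cosine(stored, emb) >= self.threshold:
                return resp
        return None  # cache miss: forward the request to the provider

    def put(self, emb, resp, n_tokens):
        # Honor the minimum-input-tokens setting: short requests are not cached.
        if n_tokens >= self.min_input_tokens:
            self.entries.append((emb, resp))
```

A Simple cache would instead key on the exact request text; Semantic trades a similarity computation for a higher hit rate on paraphrased requests.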
Embeddings
The Model Registry supports Embedding models in addition to Chat models. Embedding models are used to generate vector embeddings for your text via the gateway’s OpenAI-compatible POST /openai/v1/embeddings endpoint. They are also used when configuring vector storage (e.g., for Knowledge Base or RAG) to compute embeddings for documents and queries.
- Chat models – Used for chat completions; support versioning, router, cache, and guardrails on the model version canvas.
- Embedding models – Created by choosing the type Embedding when creating a FloTorch model, then selecting an embedding provider and embedding model. They do not use the version configuration canvas; once created, they are ready to use as flotorch/<model-name> for the embeddings API or when linking to vector storage.
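Because the endpoint is OpenAI-compatible, the request body for POST /openai/v1/embeddings follows the standard OpenAI embeddings shape. A small sketch of building that body; the model name here is a hypothetical example:

```python
import json

def embeddings_payload(model: str, texts: list[str]) -> str:
    """Build the JSON body for the OpenAI-compatible embeddings endpoint."""
    return json.dumps({"model": model, "input": texts})

# "flotorch/my-embedder" is an illustrative registry name, not a real model.
body = embeddings_payload("flotorch/my-embedder", ["hello world"])
```

The same flotorch/<model-name> identifier is used when linking the embedding model to vector storage.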