Running Experiments in FloTorch
Introduction
The FloTorch Experiments module lets you create structured experiments to test and evaluate the performance of different AI models against your own ground truth datasets. Each experiment is associated with a project and supports comparison across models and metrics such as faithfulness, context precision, and maliciousness.
Creating a Project
To start an experiment, first create a project:
- Click Create Project in the top right.
- Fill in the following fields:
- Project Name – A unique name for your project.
- Knowledge Base – Optional vector store reference for RAG-style evaluations.
- Ground Truth File – Upload a ground truth file (required); see the example layout below.
- N-Shot Prompts – Choose how many examples to include for few-shot prompting.
- Shot Prompt File – Upload your few-shot examples (required).
- System Prompt – The system-level prompt to configure model behavior.
- User Prompt – The user message to be used during evaluation.
- Retrieval Models – Select one or more retrieval models to be evaluated.
- KNN Number – Number of nearest neighbors to retrieve for context.
- Evaluation Service – Currently supports Ragas.
- Evaluation Model – Choose a model to run evaluations.
Once the form is complete, click Create Project to begin running your experiment.
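The exact schema FloTorch expects for the ground truth (and shot prompt) files is not shown in this section, so treat the snippet below as a hypothetical illustration only: it writes a simple question/answer JSONL file, with field names assumed for the example rather than taken from the FloTorch documentation.

```python
# Hypothetical ground-truth layout -- the field names below are assumptions,
# not the official FloTorch schema; check your FloTorch version for the exact format.
import json

ground_truth = [
    {
        "question": "What is the refund window for online orders?",
        "ground_truth": "Customers can request a refund within 30 days of delivery.",
    },
    {
        "question": "Which regions does the service operate in?",
        "ground_truth": "The service is available in the US, the EU, and Japan.",
    },
]

# One JSON object per line (JSONL) keeps the file easy to stream, diff, and extend.
with open("ground_truth.jsonl", "w", encoding="utf-8") as fh:
    for record in ground_truth:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Keeping one record per line makes it easy to append new evaluation questions as your dataset grows.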
Viewing Projects
Once created, projects are listed under Experiments → Projects.
Each row shows:
- Id – Unique ID for the project
- Name – Project name
- Status – Completion status
- Created Date – Timestamp of project creation
You can click on a project ID to drill into its experiment results.
Viewing Experiment Results
Each project can contain one or more experiments, depending on the number of models tested.
For each experiment, you’ll see:
- Status – Completed / Running
- Inferencing Model – Name of the model used
- Faithfulness – Score for how factually consistent the answer is with the retrieved context
- Context Precision – Score for how relevant the retrieved context is to the question
- Maliciousness – Score flagging unsafe or harmful content in the answer
- Duration – Time taken to complete the experiment
These metrics let you quickly see which model performed best for your dataset and prompts; a sketch of how similar scores can be reproduced with Ragas follows below.
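FloTorch reports these scores through the Ragas evaluation service. If you want to sanity-check a result offline, a minimal Ragas run looks roughly like the sketch below. This is an assumption-laden sketch, not FloTorch's internal code: it assumes a recent Ragas release (import paths and dataset column names vary between versions), omits the maliciousness critique metric because its import location differs across releases, and expects the evaluation LLM's provider credentials to already be configured in your environment.

```python
# Minimal offline sketch of a Ragas-style evaluation (column names and import
# paths differ between Ragas versions; adjust for the release you have installed).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, context_precision

# One row per evaluated question: the model's answer, the retrieved context
# chunks, and the ground-truth reference answer.
rows = {
    "question": ["What is the refund window for online orders?"],
    "answer": ["Refunds can be requested within 30 days of delivery."],
    "contexts": [["Our policy allows refunds within 30 days of delivery."]],
    "ground_truth": ["Customers can request a refund within 30 days of delivery."],
}

dataset = Dataset.from_dict(rows)

# evaluate() calls the configured evaluation LLM behind the scenes, so the
# usual provider credentials (e.g. an OpenAI or Bedrock key) must be set up.
result = evaluate(dataset, metrics=[faithfulness, context_precision])
print(result)
```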
Customizing Columns
You can customize the columns visible in the experiment results table using the Columns dropdown at the top-right of the table. Available options include:
- Evaluation Service
- Evaluation Inferencing Model – The model used to evaluate generated answers
- KB Name – The linked Knowledge Base, if configured
- KNN – Number of neighbors retrieved for context
- N Shot Prompts – Number of few-shot examples used
These fields help provide more detailed insight into each experiment’s configuration and context.
Deployment Options
Once an experiment is completed, you can perform the following actions from the bottom right:
- Deploy – Turn a successful experiment into a live RAG Endpoint. This allows you to serve production-ready answers based on the evaluated configuration (see the example call below).
- Download Results – Export the evaluation results (e.g., metrics per question, raw responses, etc.) for external analysis.
Tip: You can deploy only completed experiments. Make sure all required components (vector store, prompts, model, etc.) are correctly configured.
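FloTorch's exact endpoint URL scheme, authentication, and request payload are not documented in this section, so the snippet below is only a hypothetical shape for calling a deployed RAG Endpoint; substitute the real URL, key, and field names from your deployment.

```python
# Hypothetical call to a deployed RAG endpoint -- the URL, header, and payload
# fields below are placeholders, not FloTorch's documented API.
import requests

ENDPOINT_URL = "https://<your-flotorch-host>/rag/<endpoint-id>/query"  # placeholder
API_KEY = "<your-api-key>"  # placeholder

payload = {"query": "What is the refund window for online orders?"}

response = requests.post(
    ENDPOINT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```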