Running Experiments in FloTorch
Introduction
The FloTorch Experiments module lets you create structured experiments to test and evaluate the performance of different AI models against your own ground truth datasets. Each experiment is associated with a project and supports comparison across models and metrics such as faithfulness, context precision, and maliciousness.
Creating a Project
To start an experiment, first create a project:
- Click Create Project in the top right.
- Fill in the following fields:
- Project Name – A unique name for your project.
- Knowledge Base – Optional vector store reference for RAG-style evaluations.
- Ground Truth File – Upload a ground truth file (required); see the example layout below.
- N-Shot Prompts – Choose how many examples to include for few-shot prompting.
- Shot Prompt File – Upload your few-shot examples (required).
- System Prompt – The system-level prompt to configure model behavior.
- User Prompt – The user message to be used during evaluation.
- Retrieval Models – Select one or more retrieval models to be evaluated.
- KNN Number – Number of nearest neighbors to retrieve for context.
- Evaluation Service – Currently supports Ragas.
- Evaluation Model – Choose a model to run evaluations.
Once the form is complete, click Create Project to begin running your experiment.
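The exact schema FloTorch expects for the ground truth (and shot prompt) files is not shown in this section, so treat the snippet below as a hypothetical illustration only: it writes a simple question/answer JSONL file, with field names assumed for the example rather than taken from the FloTorch documentation.

```python
# Hypothetical ground-truth layout -- the field names below are assumptions,
# not the official FloTorch schema; check your FloTorch version for the exact format.
import json

ground_truth = [
    {
        "question": "What is the refund window for online orders?",
        "ground_truth": "Customers can request a refund within 30 days of delivery.",
    },
    {
        "question": "Which regions does the service operate in?",
        "ground_truth": "The service is available in the US, the EU, and Japan.",
    },
]

# One JSON object per line (JSONL) keeps the file easy to stream, diff, and extend.
with open("ground_truth.jsonl", "w", encoding="utf-8") as fh:
    for record in ground_truth:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Keeping one record per line makes it easy to append new evaluation questions as your dataset grows.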
Viewing Projects
Once created, projects are listed under Experiments → Projects.
Each row shows:
- Id – Unique ID for the project
- Name – Project name
- Status – Completion status
- Created Date – Timestamp of project creation
You can click on a project ID to drill into its experiment results.
Viewing Experiment Results
Each project can contain one or more experiments, depending on the number of models tested.
For each experiment, you’ll see:
- Status – Completed / Running
- Inferencing Model – Name of the model used
- Faithfulness – Score for how factually consistent the answer is with the retrieved context
- Context Precision – Score for how relevant the retrieved context is to the question
- Maliciousness – Score flagging unsafe or harmful content in the answer
- Duration – Time taken to complete the experiment
These metrics let you quickly see which model performed best for your dataset and prompts; a sketch of how similar scores can be reproduced with Ragas follows below.
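FloTorch reports these scores through the Ragas evaluation service. If you want to sanity-check a result offline, a minimal Ragas run looks roughly like the sketch below. This is an assumption-laden sketch, not FloTorch's internal code: it assumes a recent Ragas release (import paths and dataset column names vary between versions), omits the maliciousness critique metric because its import location differs across releases, and expects the evaluation LLM's provider credentials to already be configured in your environment.

```python
# Minimal offline sketch of a Ragas-style evaluation (column names and import
# paths differ between Ragas versions; adjust for the release you have installed).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, context_precision

# One row per evaluated question: the model's answer, the retrieved context
# chunks, and the ground-truth reference answer.
rows = {
    "question": ["What is the refund window for online orders?"],
    "answer": ["Refunds can be requested within 30 days of delivery."],
    "contexts": [["Our policy allows refunds within 30 days of delivery."]],
    "ground_truth": ["Customers can request a refund within 30 days of delivery."],
}

dataset = Dataset.from_dict(rows)

# evaluate() calls the configured evaluation LLM behind the scenes, so the
# usual provider credentials (e.g. an OpenAI or Bedrock key) must be set up.
result = evaluate(dataset, metrics=[faithfulness, context_precision])
print(result)
```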
Customizing Columns
You can customize the columns visible in the experiment results table using the Columns dropdown at the top-right of the table. Available options include:
- Evaluation Service
- Evaluation Inferencing Model – The model used to evaluate generated answers
- KB Name – The linked Knowledge Base, if configured
- KNN – Number of neighbors retrieved for context
- N Shot Prompts – Number of few-shot examples used
These fields help provide more detailed insight into each experiment’s configuration and context.
Deployment Options
Once an experiment is completed, you can perform the following actions from the bottom right:
- Deploy – Turn a successful experiment into a live RAG Endpoint. This allows you to serve production-ready answers based on the evaluated configuration (see the example call below).
- Download Results – Export the evaluation results (e.g., metrics per question, raw responses, etc.) for external analysis.
Tip: You can deploy only completed experiments. Make sure all required components (vector store, prompts, model, etc.) are correctly configured.
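FloTorch's exact endpoint URL scheme, authentication, and request payload are not documented in this section, so the snippet below is only a hypothetical shape for calling a deployed RAG Endpoint; substitute the real URL, key, and field names from your deployment.

```python
# Hypothetical call to a deployed RAG endpoint -- the URL, header, and payload
# fields below are placeholders, not FloTorch's documented API.
import requests

ENDPOINT_URL = "https://<your-flotorch-host>/rag/<endpoint-id>/query"  # placeholder
API_KEY = "<your-api-key>"  # placeholder

payload = {"query": "What is the refund window for online orders?"}

response = requests.post(
    ENDPOINT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```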