
Introduction

FloTorch Datasets are structured collections of data that enable you to evaluate, test, and improve your AI models and applications. Datasets provide a centralized way to manage test data, ground truth information, and evaluation examples that can be reused across multiple evaluation projects and experiments.

FloTorch supports three types of datasets, each designed for specific evaluation scenarios:

RAG Evaluation Datasets

Purpose: Evaluate Retrieval-Augmented Generation (RAG) systems and pipelines.

Use Cases:

  • Testing RAG retrieval accuracy
  • Evaluating answer quality against ground truth
  • Benchmarking different RAG configurations
  • Comparing vector storage and embedding models

Required Files:

  • Ground Truth (required): Questions and expected answers for evaluation
  • Examples (optional): Few-shot examples to improve prompt quality

Model Evaluation Datasets

Purpose: Evaluate the performance and accuracy of FloTorch models.

Use Cases:

  • Testing model accuracy against known answers
  • Comparing different model versions
  • Benchmarking model configurations
  • Evaluating model behavior with different parameters

Required Files:

  • Ground Truth (required): Test questions and expected responses
  • Examples (optional): Few-shot examples for in-context learning

Chat Datasets

Purpose: Provide conversational data for testing chat-based applications.

Use Cases:

  • Testing chatbot behavior
  • Evaluating conversation flows
  • Training or fine-tuning chat models
  • Testing multi-turn conversations

Required Files:

  • Messages (required): Conversation data with role-based messages

All dataset files must be in JSON or JSONL (newline-delimited JSON) format, with a maximum file size of 10 MB per file. Three layouts are accepted:

  1. JSON Array Format:

    [
      { "question": "What is the capital of France?", "answer": "Paris" },
      { "question": "What is 2+2?", "answer": "4" }
    ]
  2. JSONL Format (one JSON object per line):

    {"question": "What is the capital of France?", "answer": "Paris"}
    {"question": "What is 2+2?", "answer": "4"}
  3. JSON Object Wrapper (for examples files only):

    {
      "examples": [
        { "question": "Example question?", "answer": "Example answer" }
      ]
    }
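
As a quick local check before uploading, you can confirm that a file parses in one of these layouts and stays under the 10 MB limit. The Python sketch below is illustrative only (it is not part of FloTorch, and the file name is hypothetical):

import json
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB per-file limit

def load_dataset_file(path: str) -> list:
    """Parse a dataset file in JSON array, JSONL, or examples-wrapper layout."""
    p = Path(path)
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"{path} exceeds the 10 MB limit")
    text = p.read_text(encoding="utf-8")
    try:
        data = json.loads(text)  # JSON array or object wrapper
    except json.JSONDecodeError:
        # Fall back to JSONL: one JSON object per non-empty line
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    if isinstance(data, dict):
        data = data.get("examples", [])  # unwrap the examples-wrapper layout
    if not isinstance(data, list):
        raise ValueError("Expected a list of records")
    return data

records = load_dataset_file("ground-truth.jsonl")  # hypothetical file name
print(f"Loaded {len(records)} records")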

Ground Truth Files

Used for RAG Evaluation and Model Evaluation datasets.

Required Fields:

  • question (string): The input question or prompt
  • answer (string): The expected answer or response

Example:

[
  {
    "question": "What is the capital of France?",
    "answer": "Paris"
  },
  {
    "question": "What is the largest planet in our solar system?",
    "answer": "Jupiter"
  }
]

Additional fields: You can include extra fields for metadata; they are preserved but not validated.
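
If you want to mirror this check locally, a minimal sketch (assuming the records have already been loaded into a Python list, for example with the loader sketched earlier) could look like:

def check_ground_truth(records: list) -> None:
    """Verify each record has non-empty string 'question' and 'answer' fields."""
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"Record {i} is not a JSON object")
        for field in ("question", "answer"):
            value = rec.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"Record {i} is missing a valid '{field}' field")

check_ground_truth([
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2+2?", "answer": "4"},
])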

Examples Files

Used for RAG Evaluation and Model Evaluation datasets (optional).

Required Fields:

  • question (string): Example question
  • answer (string): Example answer

Example:

{
  "examples": [
    {
      "question": "What is machine learning?",
      "answer": "Machine learning is a subset of artificial intelligence..."
    }
  ]
}

Messages Files

Used for Chat datasets.

Required Fields:

  • role (string): Must be one of: user, assistant, or system
  • content (string): The message content

Example:

[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Hello, how are you?"
  },
  {
    "role": "assistant",
    "content": "I'm doing well, thank you for asking!"
  }
]
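
A similar local check for messages data, again a rough sketch rather than FloTorch's own validation logic, could verify the role and content fields:

VALID_ROLES = {"user", "assistant", "system"}

def check_messages(messages: list) -> None:
    """Verify each message has an allowed role and string content."""
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"Message {i} is not a JSON object")
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"Message {i} has an invalid role: {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"Message {i} is missing string content")

check_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])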

Beyond the file schemas, datasets simplify day-to-day data management:

  • Store evaluation data in one place
  • Reuse datasets across multiple evaluation projects
  • Version control through file replacement
  • Easy download and sharing of dataset files

FloTorch automatically validates all uploaded files to ensure:

  • Correct JSON/JSONL format
  • Required fields are present
  • File size is within limits
  • Schema compliance for each dataset type

Datasets are also scoped to workspaces, which provides:

  • Sharing with team members
  • Unique naming within each workspace
  • Role-based access control

Dataset names must follow these rules:

  • Alphanumeric characters and dashes only (a-z, A-Z, 0-9, -)
  • Must be unique within the workspace
  • Cannot be changed after creation (to maintain referential integrity)
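
The character rule maps to a simple regular expression, so you can check a candidate name locally before creating the dataset (uniqueness within the workspace can only be confirmed by FloTorch itself). A minimal sketch:

import re

NAME_PATTERN = re.compile(r"^[A-Za-z0-9-]+$")  # alphanumeric characters and dashes only

def is_valid_dataset_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))

print(is_valid_dataset_name("customer-support-qa"))  # True
print(is_valid_dataset_name("customer support qa"))  # False (contains spaces)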

Dataset operations require specific workspace roles:

  • List Datasets: Workspace Member or higher
  • Create Dataset: Workspace Developer, Admin, or Org Admin
  • Update Dataset: Workspace Developer, Admin, or Org Admin
  • Upload Files: Workspace Developer, Admin, or Org Admin
  • Download Files: Workspace Member or higher

Once created, datasets can be used in evaluation projects for:

  1. RAG Benchmarking: Test and compare RAG pipeline configurations
  2. Model Testing: Evaluate FloTorch model performance
  3. A/B Testing: Compare different model versions or configurations
  4. Quality Assurance: Ensure consistent model behavior

Datasets are referenced when creating evaluation projects, and the data is used to run experiments and generate evaluation metrics.

Best Practices

  • Use descriptive names: Choose names that clearly indicate the dataset’s purpose (e.g., customer-support-qa, product-faq-v1)
  • Include metadata: Add description fields to explain the dataset’s purpose and contents
  • Keep files focused: Create separate datasets for different evaluation scenarios
  • Version your data: Use naming conventions like -v1, -v2 to track dataset versions
  • Validate locally first: Test your JSON/JSONL files with a validator before uploading
  • Start small: Begin with a small dataset to ensure correct format, then scale up
  • Document your schema: If using additional fields, document them for team members
  • Regular updates: Keep datasets current by replacing files with updated versions

Limitations

  • File Size: Maximum 10 MB per file
  • File Format: Only JSON and JSONL formats supported
  • Name Immutability: Dataset names cannot be changed after creation
  • No Deletion: Datasets cannot be deleted (to prevent accidental data loss)
  • File Replacement: Uploading a new file of the same type replaces the previous file