
Introduction

FloTorch Datasets are collections of question-answer pairs used to evaluate and improve your AI models. They store your test data (ground truth) and examples in one place, so you can reuse them across evaluation projects and experiments.


FloTorch supports two dataset types:

| Type | Purpose | Used For |
| --- | --- | --- |
| RAG_EVALUATION | Evaluate RAG systems and pipelines | Testing retrieval accuracy and answer quality; comparing vector stores and embeddings. Requires a Knowledge Base. |
| MODEL_EVALUATION | Evaluate FloTorch model performance | Testing model accuracy, comparing versions, benchmarking configurations. No Knowledge Base required. |

Step 1: Create a Dataset

  1. Go to your Workspace → Datasets
  2. Click Create Dataset
  3. Enter a name (lowercase letters, numbers, hyphens only)
  4. Optionally add a description
  5. Choose the type: RAG_EVALUATION or MODEL_EVALUATION
  6. Click Create
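The naming rule in step 3 can be checked before you submit the form. A minimal sketch in Python; the pattern below is an assumption based on the rule stated above (lowercase letters, numbers, hyphens only), and FloTorch may enforce additional constraints such as length limits:

```python
import re

# Assumed pattern: lowercase letters, digits, and hyphens only.
# FloTorch may apply extra rules (e.g. length limits, no leading hyphen),
# so treat this as a first-pass check, not the authoritative validator.
DATASET_NAME = re.compile(r"^[a-z0-9-]+$")

def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, digits, hyphens."""
    return bool(DATASET_NAME.fullmatch(name))

print(is_valid_dataset_name("customer-support-qa"))  # valid
print(is_valid_dataset_name("Customer QA"))          # invalid: uppercase, space
```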

Step 2: Add Question-Answer Pairs

After creating a dataset, add question-answer pairs using one of these methods:

| Method | Available For | When to Use |
| --- | --- | --- |
| Upload | Both types | You already have data in JSON/JSONL format |
| Manual Capture | Both types | You want to build Q&A pairs one by one with a model |
| Auto Capture | Both types | You want to collect real Q&A from Gateway traffic |
| Import from HuggingFace | Both types | You want to use a public dataset (e.g., MMLU, SQuAD) |
| Generate Synthetic | RAG_EVALUATION only | You have a PDF and want to generate Q&A from it |
Upload

  1. Open your dataset → Add Content → Upload
  2. Select your ground truth file (required) and optionally an examples file
  3. Ensure files are JSON or JSONL, max 10 MB each
  4. Upload and confirm
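The exact file schema is not spelled out above; judging by the column mapping used for HuggingFace imports later in this guide, a plausible ground-truth JSONL file holds one `question`/`answer` object per line. A hedged sketch that writes such a file and checks it against the 10 MB limit (the field names are an assumption — confirm the authoritative schema in the FloTorch docs):

```python
import json
import os

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB per-file upload limit

# Hypothetical records; the "question"/"answer" keys mirror the column
# mapping used for HuggingFace imports, but are an assumption here.
pairs = [
    {"question": "What is the refund window?", "answer": "30 days from delivery."},
    {"question": "Do you ship internationally?", "answer": "Yes, to 40+ countries."},
]

# JSONL: one JSON object per line.
with open("ground_truth.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

size = os.path.getsize("ground_truth.jsonl")
assert size <= MAX_UPLOAD_BYTES, f"File is {size} bytes, over the 10 MB limit"
print(f"ground_truth.jsonl: {size} bytes, OK to upload")
```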
Manual Capture

  1. Open your dataset → Add Content → Manual Capture
  2. Select a FloTorch Chat model and version
  3. Enter a question and click Get answer
  4. Edit the answer if needed, then add to the collection
  5. Repeat until you have enough pairs
  6. Click Save to upload to the dataset
Auto Capture

  1. Open your dataset → Add Content → Auto Capture
  2. Select one or more FloTorch Chat models to watch
  3. Set the target number of Q&A pairs (10–1000)
  4. Click Start — capture runs in the background
  5. When your Gateway receives traffic for those models, pairs are captured automatically
  6. You can stop capture early or wait until the target is reached
Import from HuggingFace

  1. Open your dataset → Add Content → Import from HuggingFace
  2. Enter the HuggingFace repository ID (e.g., allenai/mmlu)
  3. Specify the source file and map columns to question and answer
  4. Optionally add an examples file and mappings
  5. Click Import — the job runs in the background
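The column mapping in step 3 amounts to renaming source columns to `question` and `answer`. A local sketch of that transformation, with no network access or FloTorch API involved (the source column names are illustrative assumptions — real names depend on the repository you import):

```python
import json

# Illustrative source rows, shaped like records from a HuggingFace dataset.
# Actual column names vary by repository.
rows = [
    {"input": "What is 2 + 2?", "target": "4", "subject": "math"},
    {"input": "Capital of France?", "target": "Paris", "subject": "geo"},
]

# The mapping you would enter in the import form: source column -> dataset field.
column_map = {"input": "question", "target": "answer"}

# Keep only the mapped columns, renamed to the dataset's fields.
mapped = [{dest: row[src] for src, dest in column_map.items()} for row in rows]

print(json.dumps(mapped, indent=2))
```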
Generate Synthetic

  1. Open your dataset → Add Content → Generate Synthetic
  2. Upload a PDF file (max 50 MB)
  3. Select a FloTorch Chat model
  4. Enter how many Q&A pairs to generate (1–10,000)
  5. Click Generate — the job runs in the background

Step 3: Use Your Dataset in Evaluation Projects

  1. Go to Evaluations → Projects → Create Project
  2. Select your dataset from the list
  3. For RAG_EVALUATION: also select a Knowledge Base
  4. For MODEL_EVALUATION: no Knowledge Base needed
  5. Configure and run experiments

Your dataset provides the ground truth used to score model responses.


Method Availability

| Method | RAG_EVALUATION | MODEL_EVALUATION |
| --- | --- | --- |
| Upload files | ✓ | ✓ |
| Manual Capture | ✓ | ✓ |
| Auto Capture (Gateway) | ✓ | ✓ |
| Import from HuggingFace | ✓ | ✓ |
| Generate Synthetic (PDF) | ✓ | — |

Best Practices

  • Use clear dataset names that describe the purpose (e.g., customer-support-qa)
  • Add a description so your team understands the contents
  • Start with a small dataset to verify format, then add more
  • Replace files to update datasets — uploading a new file of the same type overwrites the previous one

Limits

  • Dataset names cannot be changed after creation
  • Datasets cannot be deleted
  • Max 10 MB per uploaded file; 50 MB for synthetic PDF source
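The size limits above can be checked before uploading. A small helper, assuming the limits listed here are current and that file type can be inferred from the extension:

```python
import os

# Documented limits: 10 MB for JSON/JSONL uploads, 50 MB for the PDF
# used as a synthetic-generation source. Extension-based lookup is an
# assumption for this sketch, not a FloTorch API.
LIMITS = {
    ".json": 10 * 1024 * 1024,
    ".jsonl": 10 * 1024 * 1024,
    ".pdf": 50 * 1024 * 1024,
}

def check_upload(path: str) -> None:
    """Raise ValueError if the file type is unknown or the file is too big."""
    ext = os.path.splitext(path)[1].lower()
    limit = LIMITS.get(ext)
    if limit is None:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, over the {limit}-byte limit")
```

Calling `check_upload("ground_truth.jsonl")` before starting an upload surfaces size problems locally instead of after a failed request.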