Dataset Management

This guide covers how to manage datasets in FloTorch, from creation to viewing and downloading.

Creating a Dataset

Step-by-Step Process

1. Go to your Workspace → Datasets
2. Click Create Dataset (or New Dataset)
3. Enter the dataset information:
   - Name: Unique identifier (lowercase letters, numbers, hyphens only)
   - Description: Optional summary of the dataset’s purpose
   - Type: Select RAG_EVALUATION or MODEL_EVALUATION
4. Click Create
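
Because a dataset name cannot be changed after creation, it can help to check a candidate name before submitting the form. The following is a minimal sketch, assuming the rule is exactly "lowercase letters, numbers, hyphens only" as stated above. The function name `is_valid_dataset_name` is hypothetical, and FloTorch may enforce further constraints (such as a length limit) that this guide does not state.

```python
import re

# Assumption: the only rule is "lowercase letters, numbers, hyphens".
# Any additional limits FloTorch applies are not covered by this check.
_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, digits, and hyphens."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["support-faq-v2", "Support FAQ", "faq_2024"]:
        print(candidate, is_valid_dataset_name(candidate))
```

Uppercase letters, spaces, and underscores all fail this check, which matches the naming rule as written.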

After creation, add content using one of the methods described in Datasets Introduction: Upload, Manual Capture, Auto Capture, Import from HuggingFace, or Generate Synthetic.

Adding Content via Upload

1. Open your dataset → Add Content → Upload
2. Drag and drop your files, or click to browse
3. Select your ground truth file (required) and, optionally, an examples file. Each file must be JSON or JSONL and no larger than 10 MB
4. Upload and confirm

The system validates format, size, and required fields.
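
The extension and size checks can be approximated locally before you upload. This is a rough sketch, not FloTorch's own validation: the helper name `precheck_upload` is hypothetical, "10 MB" is read conservatively as 10,000,000 bytes, and extensions are matched case-sensitively because the guide does not say whether `.JSON` is accepted.

```python
from pathlib import Path

# Assumption: "10 MB" is interpreted conservatively as 10,000,000 bytes.
MAX_UPLOAD_BYTES = 10_000_000
# Assumption: extensions are matched exactly (lowercase only).
ALLOWED_SUFFIXES = {".json", ".jsonl"}


def precheck_upload(path: str) -> list[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    file = Path(path)
    if file.suffix not in ALLOWED_SUFFIXES:
        problems.append(f"{file.name}: extension must be .json or .jsonl")
    if not file.is_file():
        problems.append(f"{file.name}: file not found")
    elif file.stat().st_size > MAX_UPLOAD_BYTES:
        size = file.stat().st_size
        problems.append(f"{file.name}: {size} bytes exceeds {MAX_UPLOAD_BYTES}")
    return problems
```

An empty result only means the file passed these two local checks. The server-side validation of required fields still applies.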

Viewing Datasets

Dataset List

The Datasets section lists all datasets in your workspace with:

Dataset name
Description
Type (RAG_EVALUATION or MODEL_EVALUATION)
Creation and update timestamps

Filtering and Search

Search: Find datasets by name or description
Type: Filter by RAG_EVALUATION or MODEL_EVALUATION
Pagination: Navigate through large lists

Downloading Dataset Files

To download files for backup or review:

1. Open the dataset
2. In the files section, find the file you want
3. Click Download next to the file

You can use downloads to back up data, share with others, or review offline.

Deleting Datasets

Datasets cannot be deleted. This prevents accidental data loss and protects evaluation projects that reference a dataset.

If you no longer need a dataset:

Add “DEPRECATED” or a similar note to the description
Create a new dataset with updated data instead

Common Issues and Solutions

File Upload Errors

“Invalid file type”
Ensure the file has a .json or .jsonl extension and contains valid JSON.

“File too large”
Reduce the file size to 10 MB or less. If needed, split the data across multiple datasets or remove unnecessary fields.

“Invalid file content”
Check that required fields are present and values match expected types.
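
For the “File too large” case, a JSONL file can be split on record boundaries so that each part stays within the limit. The sketch below is one possible approach, not a FloTorch tool: the function name `split_jsonl_lines` is hypothetical, and the default limit assumes "10 MB" means 10,000,000 bytes.

```python
def split_jsonl_lines(lines: list[str], max_bytes: int = 10_000_000) -> list[list[str]]:
    """Group JSONL lines into chunks whose UTF-8 size stays within max_bytes."""
    chunks: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for raw_line in lines:
        line = raw_line.rstrip("\n")
        size = len(line.encode("utf-8")) + 1  # +1 for the newline written later
        if size > max_bytes:
            raise ValueError("a single record exceeds the size limit")
        # Start a new chunk when adding this line would exceed the limit.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be written to its own file with one record per line and uploaded to a separate dataset, as suggested above.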

Validation Errors

“Question cannot be empty”
All question fields must contain non-empty strings.

“Invalid JSON on line X”
For JSONL files, ensure each line is a valid JSON object.

“File must contain a JSON array”
For JSON files, the top-level value must be an array: wrap your records in square brackets.
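
These validation errors can usually be reproduced locally before uploading. The following sketch mirrors the three messages above under stated assumptions: the function names are hypothetical, the field name `question` is inferred from the error text rather than from a documented schema, and FloTorch's actual validator may check more than this.

```python
import json


def load_records(text: str, is_jsonl: bool) -> list:
    """Parse file content, raising ValueError with a message like the ones above."""
    if is_jsonl:
        records = []
        for line_number, line in enumerate(text.splitlines(), start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"Invalid JSON on line {line_number}") from exc
            if not isinstance(record, dict):
                raise ValueError(f"Line {line_number} is not a JSON object")
            records.append(record)
        return records
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("File must contain a JSON array")
    return data


def check_questions(records: list, field: str = "question") -> None:
    """Raise ValueError if any record lacks a non-empty string in `field`."""
    # Assumption: the question lives in a top-level field named "question".
    for index, record in enumerate(records):
        value = record.get(field) if isinstance(record, dict) else None
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"Question cannot be empty (record {index})")
```

Note that this sketch treats a blank line inside a JSONL file as invalid, which is the stricter reading of "each line is a valid JSON object".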

Best Practices

File Organization

Use clear, consistent naming for datasets
Add version info to descriptions (e.g., “v2 - updated March 2024”)
Download and back up important datasets regularly

Data Quality

Validate files locally before uploading
Start with a small sample to verify format
Add descriptions so your team understands each dataset’s purpose
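
One way to follow the "small sample" advice is to cut the first few records out of a larger file and upload only those first. This is a minimal sketch: the helper name `make_sample` is hypothetical, and it assumes your records are already loaded as a list of dictionaries.

```python
import json


def make_sample(records: list[dict], size: int = 5) -> str:
    """Serialize the first `size` records as a JSON array string."""
    return json.dumps(records[:size], ensure_ascii=False, indent=2)
```

Write the returned string to a `.json` file and upload it. If the sample passes validation, the full file is likely to be in the right format as well.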

Collaboration

Write clear descriptions for shared datasets
Use consistent naming conventions across the team
Notify team members when adding or changing content

Using Datasets in Evaluation Projects

1. Go to Evaluations → Projects → Create Project
2. Select your dataset during project setup
3. For RAG_EVALUATION datasets, select a Knowledge Base. MODEL_EVALUATION datasets do not need one
4. Configure and run experiments

The evaluation system uses the questions in your ground truth file as inputs, then compares the model outputs to the expected answers to generate metrics.
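
To illustrate the idea of comparing outputs to expected answers, the sketch below computes a simple exact-match accuracy. It is purely illustrative: this guide does not describe which metrics FloTorch computes, and the function name `exact_match_accuracy` and its normalization (trimming whitespace, ignoring case) are assumptions made for the example.

```python
def exact_match_accuracy(expected: list[str], actual: list[str]) -> float:
    """Share of outputs equal to the expected answer after trimming and lowercasing."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    if not expected:
        return 0.0
    # Illustrative normalization only; real metrics may compare differently.
    matches = sum(
        e.strip().lower() == a.strip().lower() for e, a in zip(expected, actual)
    )
    return matches / len(expected)
```

For example, with expected answers `["Paris", "4"]` and model outputs `[" paris ", "5"]`, one of the two answers matches, so the function returns 0.5.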

Notes

Dataset names cannot be changed after creation
Datasets cannot be deleted
Maximum file size is 10 MB per uploaded file, or 50 MB for a PDF used as a source for synthetic generation