Dataset Management
This guide covers how to manage datasets in FloTorch, from creation to viewing and downloading.
Creating a Dataset
Step-by-Step Process
- Go to your Workspace → Datasets
- Click Create Dataset (or New Dataset)
- Enter the dataset information:
- Name: Unique identifier (lowercase letters, numbers, hyphens only)
- Description: Optional summary of the dataset’s purpose
- Type: Select RAG_EVALUATION or MODEL_EVALUATION
- Click Create
After creation, add content using one of the methods in Datasets Introduction: Upload, Manual Capture, Auto Capture, Import from HuggingFace, or Generate Synthetic.
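The naming rule from the steps above can be checked before you open the form. This is a minimal sketch: the function name `is_valid_dataset_name` and the pattern are illustrative assumptions, and the guide does not state length limits or rules about leading or trailing hyphens, so none are enforced here.

```python
import re

# Assumption: "lowercase letters, numbers, hyphens only" maps to this pattern.
# Length limits and hyphen placement rules are not stated in the guide.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, digits, and hyphens."""
    return NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_dataset_name("rag-eval-v2")` passes, while a name containing uppercase letters, underscores, or spaces does not.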
Adding Content via Upload
- Open your dataset → Add Content → Upload
- Use drag-and-drop or click to browse
- Select your ground truth file (required) and optionally an examples file
- Files must be JSON or JSONL, max 10 MB each
- Upload and confirm
The system validates format, size, and required fields.
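The extension and size limits can be checked locally before uploading. The sketch below is not FloTorch code; `check_upload_file` is a hypothetical helper, and it assumes "10 MB" means 10 * 1024 * 1024 bytes, which the service may count differently.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".json", ".jsonl"}
# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    file_path = Path(path)
    if file_path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")
    if not file_path.is_file():
        problems.append("file does not exist")
    elif file_path.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 10 MB")
    return problems
```

This covers only the extension and size checks. Required fields are validated separately by the system after upload.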
Viewing Datasets
Dataset List
The Datasets section lists all datasets in your workspace with:
- Dataset name
- Description
- Type (RAG_EVALUATION or MODEL_EVALUATION)
- Creation and update timestamps
Filtering and Search
- Search: Find datasets by name or description
- Type: Filter by RAG_EVALUATION or MODEL_EVALUATION
- Pagination: Navigate through large lists
Downloading Dataset Files
To download files for backup or review:
- Open the dataset
- In the files section, find the file you want
- Click Download next to the file
You can use downloads to back up data, share with others, or review offline.
Deleting Datasets
Dataset deletion is not supported to prevent accidental data loss and protect evaluation projects that reference datasets.
If you no longer need a dataset:
- Add “DEPRECATED” or a similar note to the description
- Create a new dataset with updated data instead
Common Issues and Solutions
File Upload Errors
“Invalid file type”
Ensure the file has a .json or .jsonl extension and contains valid JSON.
“File too large”
Reduce file size to under 10 MB. Split into multiple datasets or remove unnecessary fields if needed.
“Invalid file content”
Check that required fields are present and values match expected types.
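For the “File too large” case, one way to split a JSONL file is to group its lines into chunks that each stay under the limit. This is a sketch under assumptions: `split_jsonl_lines` is a hypothetical helper, and the byte limit you pass in should match however the service measures 10 MB.

```python
def split_jsonl_lines(lines: list[str], max_bytes: int) -> list[list[str]]:
    """Group JSONL lines into chunks whose UTF-8 size stays within max_bytes."""
    chunks: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for line in lines:
        record = line.rstrip("\n")
        if not record.strip():
            continue  # skip blank lines
        size = len(record.encode("utf-8")) + 1  # +1 for the newline written back
        if size > max_bytes:
            raise ValueError("a single record exceeds the size limit")
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be written to its own `.jsonl` file, one record per line, and uploaded to a separate dataset.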
Validation Errors
“Question cannot be empty”
All question fields must contain non-empty strings.
“Invalid JSON on line X”
For JSONL files, ensure each line is a valid JSON object.
“File must contain a JSON array”
For JSON format, wrap your data in square brackets.
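The three validation errors above can be approximated locally. The sketch below is an illustration, not the service's validator: `validate_records` is a hypothetical helper, and it assumes each record stores its question under the key `question`, which this guide does not confirm.

```python
import json

# Assumption: each record stores its question under the key "question".
QUESTION_FIELD = "question"


def validate_records(text: str, is_jsonl: bool) -> list[str]:
    """Return human-readable problems for JSON or JSONL dataset text."""
    problems: list[str] = []
    records = []
    if is_jsonl:
        # JSONL: every non-blank line must parse on its own.
        for number, line in enumerate(text.splitlines(), start=1):
            if not line.strip():
                continue
            try:
                records.append((number, json.loads(line)))
            except json.JSONDecodeError:
                problems.append(f"line {number}: invalid JSON")
    else:
        # JSON: the whole file must be one array of records.
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return ["file is not valid JSON"]
        if not isinstance(data, list):
            return ["file must contain a JSON array"]
        records = list(enumerate(data, start=1))
    for number, record in records:
        if not isinstance(record, dict):
            problems.append(f"record {number}: not a JSON object")
            continue
        question = record.get(QUESTION_FIELD)
        if not isinstance(question, str) or not question.strip():
            problems.append(f"record {number}: question cannot be empty")
    return problems
```

An empty list means none of these three problems were found. The service may enforce additional required fields that this sketch does not know about.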
Best Practices
File Organization
- Use clear, consistent naming for datasets
- Add version info to descriptions (e.g., “v2 - updated March 2024”)
- Download and back up important datasets regularly
Data Quality
- Validate files locally before uploading
- Start with a small sample to verify format
- Add descriptions so your team understands each dataset’s purpose
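To start with a small sample, you can take the first few records of a JSONL file and upload only those. A minimal sketch, where `take_sample` and the default count of 5 are illustrative choices:

```python
from itertools import islice


def take_sample(lines, count=5):
    """Return the first `count` non-blank lines, without trailing newlines."""
    non_blank = (line.rstrip("\n") for line in lines if line.strip())
    return list(islice(non_blank, count))
```

Passing an open file object as `lines` works, since the function reads only as many lines as it needs.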
Collaboration
- Write clear descriptions for shared datasets
- Use consistent naming conventions across the team
- Notify team members when adding or changing content
Using Datasets in Evaluation Projects
- Go to Evaluations → Projects → Create Project
- Select your dataset during project setup
- For RAG_EVALUATION: select a Knowledge Base
- For MODEL_EVALUATION: no Knowledge Base needed
- Configure and run experiments
The evaluation system uses questions from your ground truth as inputs and compares model outputs to expected answers to generate metrics.
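To make the comparison step concrete, here is a sketch of one simple metric, a normalized exact-match rate. This guide does not say which metrics FloTorch computes, so treat this as an illustrative stand-in; the function names are assumptions.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_rate(expected: list[str], actual: list[str]) -> float:
    """Fraction of model outputs equal to the expected answer after normalization."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    if not expected:
        return 0.0
    matches = sum(normalize(e) == normalize(a) for e, a in zip(expected, actual))
    return matches / len(expected)
```

Exact match is strict: it gives no credit for a correct answer that is phrased differently from the expected one.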
Limits to keep in mind:
- Dataset names cannot be changed after creation
- Datasets cannot be deleted
- Uploaded files are limited to 10 MB each; a PDF used as a source for synthetic generation can be up to 50 MB