Dataset Management
This guide covers how to manage datasets in FloTorch, from creation to viewing and downloading.
Creating a Dataset
Step-by-Step Process
- Go to your Workspace → Datasets
- Click Create Dataset (or New Dataset)
- In Choose Dataset Type, select one:
  - Question and Answer Pair (Q&A Pair)
  - Question and Answer Pair with Context (Q&A + Context)
- Click Configure and enter dataset information:
  - Name: Unique identifier that starts with a letter (lowercase letters, numbers, and hyphens only)
  - Description: Optional summary of the dataset’s purpose
  - Creation Method: Options depend on the selected dataset type
- Click Create Dataset
The available creation methods depend on the dataset type:
- Q&A Pair supports: Upload files, Manual creation, Auto capture, Import from HuggingFace, and Generate from PDF.
- Q&A + Context supports: Upload Q&A with context files.
Adding Content via Upload
- Open your dataset → Add Content → Upload
- Use drag-and-drop or click to browse
- Select your ground truth file (required) and optionally an examples file
- Files must be JSON or JSONL, max 10 MB each
- Upload and confirm
The system validates format, size, and required fields.
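The format and size checks described above can be approximated locally before uploading. The following is a minimal sketch, not FloTorch's own validator: the function name `precheck_upload` is hypothetical, and the limit is interpreted conservatively as 10,000,000 bytes, which is an assumption because the guide only says "10 MB".

```python
from pathlib import Path

# Assumption: "10 MB" is read conservatively as 10,000,000 bytes.
MAX_UPLOAD_BYTES = 10 * 1000 * 1000
ALLOWED_EXTENSIONS = {".json", ".jsonl"}


def precheck_upload(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    file = Path(path)

    # The guide requires a .json or .jsonl extension.
    if file.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {file.suffix or '(none)'}")

    # The guide caps each uploaded file at 10 MB.
    size = file.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_UPLOAD_BYTES}")

    return problems
```

Running this on both the ground truth file and the optional examples file catches the two cheapest failures before an upload is attempted. It does not check required fields.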
Viewing Datasets
Dataset List
The Datasets section lists all datasets in your workspace with:
- Dataset name
- Description
- Dataset type (Q&A Pair or Q&A + Context)
- Creation and update timestamps
Filtering and Search
- Search: Find datasets by name or description
- Archived / Unarchived: Filter datasets by status
- Pagination: Navigate through large lists
Downloading Dataset Files
To download files for backup or review:
- Open the dataset
- In the files section, find the file you want
- Click Download next to the file
You can use downloads to back up data, share with others, or review offline.
Deleting Datasets
Dataset deletion is not supported. This prevents accidental data loss and protects evaluation projects that reference datasets.
If you no longer need a dataset:
- Add “DEPRECATED” or a similar note to the description
- Create a new dataset with updated data instead
Common Issues and Solutions
File Upload Errors
“Invalid file type”
Ensure the file has a .json or .jsonl extension and contains valid JSON.
“File too large”
Reduce file size to under 10 MB. Split into multiple datasets or remove unnecessary fields if needed.
“Invalid file content”
Check that required fields are present and values match expected types.
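For the “File too large” case, one way to split a dataset is to group records so that each group's JSONL encoding stays within the limit. This is a rough sketch under stated assumptions: `split_records` is a hypothetical helper, and sizes are measured on the UTF-8 encoding produced by Python's `json.dumps`.

```python
import json


def split_records(records, max_bytes):
    """Group records into chunks whose JSONL encoding stays within max_bytes."""
    chunks, current, current_size = [], [], 0
    for record in records:
        # Measure the record as it would appear on one JSONL line.
        line = json.dumps(record, ensure_ascii=False) + "\n"
        line_size = len(line.encode("utf-8"))
        if line_size > max_bytes:
            raise ValueError("a single record exceeds the size limit")
        # Start a new chunk when adding this record would exceed the limit.
        if current and current_size + line_size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += line_size
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk can then be written to its own file and uploaded to a separate dataset.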
Validation Errors
“Question cannot be empty”
All question fields must contain non-empty strings.
“Invalid JSON on line X”
For JSONL files, ensure each line is a valid JSON object.
“File must contain a JSON array”
For JSON format, wrap your data in square brackets.
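The three validation errors above can be reproduced locally with a short script. This is an illustrative sketch rather than FloTorch's validation logic: the helper names are hypothetical, the field name `question` is an assumption inferred from the error message, and skipping blank lines in JSONL is also an assumption.

```python
import json


def load_records(text, fmt):
    """Parse dataset text as 'json' (one array) or 'jsonl' (one object per line)."""
    if fmt == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("File must contain a JSON array")
        return data

    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON on line {number}") from exc
    return records


def find_empty_questions(records, field="question"):
    """Return the indexes of records whose question is missing or blank."""
    return [
        i
        for i, record in enumerate(records)
        if not isinstance(record, dict)
        or not isinstance(record.get(field), str)
        or not record[field].strip()
    ]
```

For example, a JSON file containing `{"question": "a"}` without surrounding square brackets fails in `load_records` with the array message, while the same text is accepted as a one-line JSONL file.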
Best Practices
File Organization
- Use clear, consistent naming for datasets
- Add version info to descriptions (e.g., “v2 - updated March 2024”)
- Download and back up important datasets regularly
Data Quality
- Validate files locally before uploading
- Start with a small sample to verify format
- Add descriptions so your team understands each dataset’s purpose
Collaboration
- Write clear descriptions for shared datasets
- Use consistent naming conventions across the team
- Notify team members when adding or changing content
Using Datasets in Evaluation Projects
- Go to Evaluations → Projects → Create Project
- Select your dataset during project setup
- Select a Knowledge Base if your evaluation setup requires one
- Configure and run experiments
The evaluation system uses questions from your ground truth as inputs and compares model outputs to expected answers to generate metrics.
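To make the idea of comparing outputs to expected answers concrete, here is a toy exact-match score. It is only an illustration: this guide does not say which metrics FloTorch computes, and `exact_match_rate` with its case and whitespace normalization is a hypothetical choice.

```python
def exact_match_rate(expected, predicted):
    """Fraction of predictions equal to the expected answer after normalization."""

    def normalize(text):
        # Assumption: ignore case and collapse runs of whitespace.
        return " ".join(text.lower().split())

    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    matches = sum(normalize(e) == normalize(p) for e, p in zip(expected, predicted))
    return matches / len(expected)
```

A score like this depends directly on the quality of the expected answers in the ground truth file, which is why validating that file before upload matters.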
- Dataset names cannot be changed after creation
- Datasets cannot be deleted
- Dataset names must start with a letter and can contain lowercase letters (a-z), numbers (0-9), and hyphens (-)
- Max 10 MB per uploaded file; 50 MB for synthetic PDF source
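Because names cannot be changed after creation, it can help to check a candidate name against the naming rule first. The sketch below encodes that rule as a regular expression. The function name `is_valid_dataset_name` is hypothetical, and the pattern assumes the starting letter must also be lowercase, which the guide implies but does not state outright. Any length limit is not covered.

```python
import re

# Starts with a lowercase letter, then lowercase letters, digits, or hyphens.
DATASET_NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name satisfies the naming rule described above."""
    return DATASET_NAME_PATTERN.fullmatch(name) is not None
```

For example, `qa-set-1` passes, while `1set`, `QA-set`, and `my_set` are rejected.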