Dataset Management
This guide covers all aspects of managing datasets in FloTorch, from creation to file management and updates.
Creating a Dataset
Step-by-Step Process
- Navigate to the Datasets section in your workspace
- Click the New Dataset button in the top right corner
- Fill in the dataset information:
  - Name: Unique identifier (alphanumeric with dashes only)
  - Description: Optional explanation of the dataset’s purpose
  - Type: Select RAG Evaluation, Model Evaluation, or Chat
- Upload required files based on dataset type:
  - RAG Evaluation: Ground Truth (required), Examples (optional)
  - Model Evaluation: Ground Truth (required), Examples (optional)
  - Chat: Messages (required)
- Click Create Dataset to save
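The naming rule in the steps above can be checked before you open the form. The following is a minimal sketch, assuming that "alphanumeric with dashes only" means ASCII letters, digits, and hyphens. The exact rule FloTorch enforces (length limits, case, leading or trailing dashes) is not stated in this guide, and `is_valid_dataset_name` is a hypothetical helper, not part of FloTorch.

```python
import re

# Assumption: ASCII letters, digits, and dashes only; FloTorch's exact rule may differ.
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name contains only letters, digits, and dashes."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["support-chat-2024", "my dataset", "rag_eval"]:
        print(candidate, is_valid_dataset_name(candidate))
```

Because the name cannot be changed after creation, a quick check like this is cheap insurance against typos.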
File Upload Methods
FloTorch supports two methods for uploading dataset files:
Drag and Drop
- Drag your JSON or JSONL file from your file system
- Drop it onto the designated file upload area
- The file will be automatically validated
File Selection
- Click the file upload area or “Browse” button
- Select your JSON or JSONL file from the file picker
- The file will be automatically validated
File Validation
When you upload a file, FloTorch automatically validates:
- File Format: Must be valid JSON or JSONL
- File Size: Must be less than 10 MB
- Schema Compliance: Required fields must be present
- Content Structure: Each item must match the expected schema
If validation fails, you’ll see a specific error message indicating what needs to be corrected.
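You can approximate these checks locally before uploading. The sketch below is not FloTorch's validator. It assumes that "10 MB" means 10 * 1024 * 1024 bytes and that a ground truth item requires non-empty `question` and `answer` strings, which is inferred from the examples in this guide. The function names are hypothetical.

```python
import json

MAX_BYTES = 10 * 1024 * 1024  # Assumption: "10 MB" means 10 * 1024 * 1024 bytes.
REQUIRED_FIELDS = ("question", "answer")  # Assumption: inferred from this guide's examples.


def load_items(text: str, is_jsonl: bool) -> list:
    """Parse JSONL (one object per line) or a JSON array into a list of items."""
    if is_jsonl:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    return data


def check_ground_truth(text: str, is_jsonl: bool) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(text.encode("utf-8")) >= MAX_BYTES:
        problems.append("file is 10 MB or larger")
    try:
        items = load_items(text, is_jsonl)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return problems + [f"invalid format: {exc}"]
    for index, item in enumerate(items, start=1):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            value = item.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {index}: missing or empty '{field}'")
    return problems
```

For example, `check_ground_truth(open("groundtruth.jsonl", encoding="utf-8").read(), is_jsonl=True)` returns an empty list when the file passes all of these local checks.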
Viewing Datasets
Dataset List
The Datasets section shows all datasets in your workspace with:
- Dataset name
- Description
- Type (RAG_EVALUATION, MODEL_EVALUATION, or CHAT)
- Creation and update timestamps
Filtering and Search
You can filter and search datasets by:
- Search: Find datasets by name or description
- Type: Filter by dataset type
- Pagination: Navigate through large dataset lists
Updating a Dataset
What Can Be Updated
When editing a dataset, you can modify:
- Description: Update the dataset description
- Type: Change the dataset type (will require appropriate files)
- Files: Replace existing files or add new optional files
What Cannot Be Updated
The following fields are immutable:
- Name: Cannot be changed after creation (maintains referential integrity with evaluation projects)
Updating Dataset Files
To replace or add files to an existing dataset:
- Navigate to the dataset you want to update
- Click the Actions dropdown (three vertical dots)
- Select Edit
- The current files will be displayed in the “Existing Files” section
- Upload new files to replace them:
  - For required files: Uploading a new file replaces the old one
  - For optional files: You can add them if not already present
- Click Update Dataset
Important: When you upload a new file of the same type (e.g., a new ground truth file), it completely replaces the previous file. The old file is preserved in the system for history but is no longer linked to the dataset.
Downloading Dataset Files
To download dataset files for local review or backup:
- Open the dataset you want to download files from
- In the “Existing Files” section, find the file you want to download
- Click the Download button next to the file
- The file will be downloaded to your local system
You can also use the download functionality to:
- Create backups of your datasets
- Share datasets with team members outside FloTorch
- Review and edit dataset contents offline
Dataset File Structure Examples
RAG Evaluation Dataset
Ground Truth File (groundtruth.jsonl):
{"question": "What is FloTorch?", "answer": "FloTorch is an AI orchestration platform for managing and deploying AI applications."}
{"question": "What are FloTorch Guardrails?", "answer": "FloTorch Guardrails are content filtering mechanisms that protect AI applications from malicious requests."}
Examples File (examples.json) - Optional:
{
  "examples": [
    {
      "question": "What is an AI agent?",
      "answer": "An AI agent is an autonomous program that perceives its environment and takes actions to achieve specific goals."
    }
  ]
}
Model Evaluation Dataset
Ground Truth File (groundtruth.json):
[
  {
    "question": "Calculate 15% of 80",
    "answer": "12",
    "category": "math",
    "difficulty": "easy"
  },
  {
    "question": "What is the capital of Japan?",
    "answer": "Tokyo",
    "category": "geography",
    "difficulty": "easy"
  }
]
Examples File (examples.json) - Optional:
[
  {
    "question": "What is 10% of 50?",
    "answer": "5"
  }
]
Chat Dataset
Messages File (messages.jsonl):
{"role": "system", "content": "You are a helpful customer support assistant."}
{"role": "user", "content": "I need help with my order"}
{"role": "assistant", "content": "I'd be happy to help! Could you provide your order number?"}
{"role": "user", "content": "It's ORDER-12345"}
{"role": "assistant", "content": "Thank you. Let me look that up for you."}
Deleting Datasets
Dataset deletion is currently not supported to prevent accidental data loss and maintain the integrity of evaluation projects that reference these datasets.
Workarounds:
- If you no longer need a dataset, you can note this in the description (e.g., “DEPRECATED - Do not use”)
- Create a new dataset with updated data instead of deleting the old one
- Contact support if you need to remove sensitive data
Common Issues and Solutions
File Upload Errors
“Invalid file type”
Solution: Ensure your file has a .json or .jsonl extension and contains valid JSON data.
“File too large”
Solution: Reduce file size to under 10 MB by:
- Removing unnecessary fields
- Splitting into multiple datasets
- Compressing repetitive data
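Splitting a large JSONL file can be scripted. The sketch below groups lines into consecutive parts that each stay under a byte limit, so each part can be uploaded as its own dataset. The limit constant assumes "10 MB" means 10 * 1024 * 1024 bytes, and the function names and the `-partN` file naming are hypothetical conventions, not FloTorch features.

```python
from pathlib import Path

LIMIT = 10 * 1024 * 1024  # Assumption: "10 MB" means 10 * 1024 * 1024 bytes.


def split_lines_by_size(lines: list[str], max_bytes: int) -> list[list[str]]:
    """Group lines into consecutive parts whose encoded size stays under max_bytes.

    A single line that is itself max_bytes or larger still gets its own part.
    """
    parts, current, current_size = [], [], 0
    for line in lines:
        size = len(line.encode("utf-8")) + 1  # +1 for the newline written after each line
        if current and current_size + size >= max_bytes:
            parts.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        parts.append(current)
    return parts


def write_parts(source: Path, max_bytes: int = LIMIT) -> list[Path]:
    """Write each part next to the source file, e.g. groundtruth-part1.jsonl."""
    text = source.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    outputs = []
    for number, part in enumerate(split_lines_by_size(lines, max_bytes), start=1):
        target = source.with_name(f"{source.stem}-part{number}{source.suffix}")
        target.write_text("\n".join(part) + "\n", encoding="utf-8")
        outputs.append(target)
    return outputs
```

Splitting on line boundaries keeps every record intact, which is why this approach suits JSONL better than a JSON array file.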
“Invalid file content”
Solution: Check that:
- JSON syntax is correct (use a JSON validator)
- All required fields are present
- Field values match expected types (strings for question/answer/content, correct role values)
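For chat datasets, the type and role checks can be sketched as follows. The set of allowed roles is an assumption taken from the example messages in this guide (`system`, `user`, `assistant`); FloTorch may accept other values. `check_message` is a hypothetical helper.

```python
# Assumption: these are the roles shown in this guide's examples; others may be valid.
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_message(message: object) -> list[str]:
    """Return a list of problems for one parsed chat message."""
    if not isinstance(message, dict):
        return ["message is not a JSON object"]
    problems = []
    role = message.get("role")
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        problems.append(f"unexpected role: {role!r}")
    if not isinstance(message.get("content"), str):
        problems.append("content must be a string")
    return problems
```

Run it over each parsed line of a messages file and review any non-empty result before uploading.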
Validation Errors
“Question cannot be empty”
Solution: Ensure all question fields contain non-empty strings.
“Invalid JSON on line X”
Solution: For JSONL files, check that line X contains valid JSON. Each line must be a complete JSON object.
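To locate offending lines yourself, you can parse a JSONL file one line at a time. This is a minimal sketch using Python's standard `json` module. It assumes blank lines are harmless, which FloTorch may or may not agree with, and `find_invalid_lines` is a hypothetical helper.

```python
import json


def find_invalid_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, message) for each non-blank line that is not a JSON object."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # Assumption: blank lines are ignored here.
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            errors.append((number, "valid JSON but not an object"))
    return errors
```

A common cause of this error is a pretty-printed object that spans several lines; in JSONL each object must fit on a single line.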
“File must contain a JSON array”
Solution: For JSON format, wrap your data in square brackets [...].
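If your data is in JSONL form and you need a .json file, converting it is usually easier than adding brackets by hand. A minimal sketch, with `jsonl_to_json_array` as a hypothetical helper name:

```python
import json


def jsonl_to_json_array(text: str) -> str:
    """Convert JSONL text (one JSON value per line) into a single JSON array string."""
    items = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(items, indent=2, ensure_ascii=False)
```

The reverse direction also works: saving the file with a .jsonl extension, one object per line, avoids the array requirement altogether.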
Best Practices
File Organization
- Consistent Naming: Use clear, consistent naming conventions for files
- Version Control: Include version information in file names (e.g., groundtruth-v2.json)
- Backup: Download and back up important datasets regularly
Data Quality
- Validate Locally: Use JSON validators before uploading
- Test with Samples: Start with a small sample to verify format
- Review Regularly: Periodically review and update datasets
- Document Changes: Use the description field to track major updates
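One way to test with a sample is to cut the first few records out of a JSONL file and upload those first. A minimal sketch; `take_sample` is a hypothetical helper, and taking the first records (rather than a random selection) is a simplification.

```python
def take_sample(text: str, count: int = 5) -> str:
    """Return the first `count` non-blank lines of JSONL text, newline-terminated."""
    lines = [line for line in text.splitlines() if line.strip()]
    sample = lines[:count]
    return "\n".join(sample) + "\n" if sample else ""
```

If the sample passes validation, upload the full file; if it fails, you have far fewer records to inspect.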
Collaboration
- Clear Descriptions: Write detailed descriptions so team members understand the dataset’s purpose
- Naming Conventions: Establish team-wide naming conventions
- Access Control: Ensure appropriate team members have necessary permissions
- Communication: Notify team when updating shared datasets
Performance
- Optimize File Size: Remove unnecessary fields and whitespace
- Appropriate Dataset Size: Balance between comprehensive coverage and manageable file size
- Split Large Datasets: Consider splitting very large datasets into logical subsets
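Removing whitespace and unneeded fields from a JSON array file can be done in one pass. The sketch below assumes every item in the array is an object. `compact_json` and its `drop_fields` parameter are hypothetical, and only you can judge which fields your evaluations do not need.

```python
import json


def compact_json(text: str, drop_fields: tuple[str, ...] = ()) -> str:
    """Re-serialize a JSON array without extra whitespace, optionally dropping fields."""
    items = json.loads(text)
    slimmed = [
        {key: value for key, value in item.items() if key not in drop_fields}
        for item in items
    ]
    # separators=(",", ":") removes the spaces json.dumps inserts by default.
    return json.dumps(slimmed, separators=(",", ":"), ensure_ascii=False)
```

Keep the original, readable file under version control and upload the compacted copy.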
Using Datasets in Evaluation Projects
After creating a dataset, you can use it in evaluation projects:
- Navigate to the Evaluations section
- Create a new evaluation project
- Select your dataset during project configuration
- Configure your evaluation experiments
- Run evaluations using the dataset’s ground truth data
The evaluation system will:
- Use questions from the ground truth file as inputs
- Compare model outputs against expected answers
- Generate accuracy and quality metrics
- Provide detailed evaluation results
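To make "compare model outputs against expected answers" concrete, here is an illustrative exact-match accuracy calculation. It is not FloTorch's scoring method, which this guide does not describe; the normalization (trimming whitespace and ignoring case) and the function name are assumptions chosen for the example.

```python
def exact_match_accuracy(expected: list[str], predicted: list[str]) -> float:
    """Fraction of predictions equal to the expected answer, ignoring case and outer whitespace."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    matches = sum(
        e.strip().lower() == p.strip().lower() for e, p in zip(expected, predicted)
    )
    return matches / len(expected)
```

Exact match suits short factual answers such as "12" or "Tokyo"; longer free-text answers generally need softer quality metrics.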
For more information on running evaluations, see the Evaluations documentation.