
Dataset Management

This guide covers all aspects of managing datasets in FloTorch, from creation to file management and updates.

To create a dataset:

  1. Navigate to the Datasets section in your workspace
  2. Click the New Dataset button in the top right corner
  3. Fill in the dataset information:
    • Name: Unique identifier (alphanumeric with dashes only)
    • Description: Optional explanation of the dataset’s purpose
    • Type: Select RAG Evaluation, Model Evaluation, or Chat
  4. Upload required files based on dataset type:
    • RAG Evaluation: Ground Truth (required), Examples (optional)
    • Model Evaluation: Ground Truth (required), Examples (optional)
    • Chat: Messages (required)
  5. Click Create Dataset to save
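
The naming rule in step 3 (alphanumeric with dashes only) can be checked before you open the form. This is a minimal sketch under assumptions: `is_valid_dataset_name` is a hypothetical helper, and whether FloTorch also restricts length, case, or leading and trailing dashes is not stated in this guide.

```python
import re

# Assumption: "alphanumeric with dashes only" means ASCII letters, digits, and "-".
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the whole name consists of letters, digits, and dashes."""
    return NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_dataset_name("support-faq-v2")` is true, while names containing spaces or underscores are rejected.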

FloTorch supports two methods for uploading dataset files:

Drag and drop:

  1. Drag your JSON or JSONL file from your file system
  2. Drop it onto the designated file upload area
  3. The file will be automatically validated

File browser:

  1. Click the file upload area or the “Browse” button
  2. Select your JSON or JSONL file from the file picker
  3. The file will be automatically validated

When you upload a file, FloTorch automatically validates:

  1. File Format: Must be valid JSON or JSONL
  2. File Size: Must be less than 10 MB
  3. Schema Compliance: Required fields must be present
  4. Content Structure: Each item must match the expected schema

If validation fails, you’ll see a specific error message indicating what needs to be corrected.
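
The same kinds of checks can be approximated locally before uploading. The following is a minimal sketch, not FloTorch's own validator: the function names, the reading of "10 MB" as 10 MiB, and the choice of required fields are assumptions for illustration. It targets ground truth and messages files, where each item is a flat object.

```python
import json
from pathlib import Path

# Assumption: the 10 MB limit is treated as 10 MiB here.
MAX_BYTES = 10 * 1024 * 1024


def load_records(path: Path) -> list:
    """Parse a .json file (array) or a .jsonl file (one object per line)."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def validate_file(path, required_fields) -> list:
    """Return a list of problems found; an empty list means none were found."""
    path = Path(path)
    problems = []
    if path.suffix not in {".json", ".jsonl"}:
        return [f"unsupported extension: {path.suffix!r}"]
    if path.stat().st_size >= MAX_BYTES:
        problems.append("file is 10 MB or larger")
    try:
        records = load_records(path)
    except json.JSONDecodeError as exc:
        problems.append(f"invalid JSON: {exc}")
        return problems
    for i, record in enumerate(records, start=1):
        if not isinstance(record, dict):
            problems.append(f"item {i}: expected an object")
            continue
        for field in required_fields:
            value = record.get(field)
            # Assumption: required fields must be non-empty strings.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {i}: missing or empty {field!r}")
    return problems
```

Calling `validate_file("groundtruth.jsonl", ["question", "answer"])` returns an empty list when no problems are found; for a messages file, pass `["role", "content"]` instead.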

The Datasets section shows all datasets in your workspace with:

  • Dataset name
  • Description
  • Type (RAG_EVALUATION, MODEL_EVALUATION, or CHAT)
  • Creation and update timestamps

You can filter and search datasets by:

  • Search: Find datasets by name or description
  • Type: Filter by dataset type
  • Pagination: Navigate through large dataset lists

When editing a dataset, you can modify:

  • Description: Update the dataset description
  • Type: Change the dataset type (you must then provide the files required by the new type)
  • Files: Replace existing files or add new optional files

The following fields are immutable:

  • Name: Cannot be changed after creation (maintains referential integrity with evaluation projects)

To replace or add files to an existing dataset:

  1. Navigate to the dataset you want to update
  2. Click the Actions dropdown (three vertical dots)
  3. Select Edit
  4. The current files will be displayed in the “Existing Files” section
  5. Upload new files to replace them:
    • For required files: Uploading a new file replaces the old one
    • For optional files: You can add them if not already present
  6. Click Update Dataset

Important: When you upload a new file of the same type (e.g., a new ground truth file), it completely replaces the previous file. The old file is preserved in the system for history but is no longer linked to the dataset.

To download dataset files for local review or backup:

  1. Open the dataset you want to download files from
  2. In the “Existing Files” section, find the file you want to download
  3. Click the Download button next to the file
  4. The file will be downloaded to your local system

You can also use the download functionality to:

  • Create backups of your datasets
  • Share datasets with team members outside FloTorch
  • Review and edit dataset contents offline

Ground Truth File (groundtruth.jsonl):

{"question": "What is FloTorch?", "answer": "FloTorch is an AI orchestration platform for managing and deploying AI applications."}
{"question": "What are FloTorch Guardrails?", "answer": "FloTorch Guardrails are content filtering mechanisms that protect AI applications from malicious requests."}

Examples File (examples.json) - Optional:

{
  "examples": [
    {
      "question": "What is an AI agent?",
      "answer": "An AI agent is an autonomous program that perceives its environment and takes actions to achieve specific goals."
    }
  ]
}

Ground Truth File (groundtruth.json):

[
  {
    "question": "Calculate 15% of 80",
    "answer": "12",
    "category": "math",
    "difficulty": "easy"
  },
  {
    "question": "What is the capital of Japan?",
    "answer": "Tokyo",
    "category": "geography",
    "difficulty": "easy"
  }
]

Examples File (examples.json) - Optional:

[
  {
    "question": "What is 10% of 50?",
    "answer": "5"
  }
]

Messages File (messages.jsonl):

{"role": "system", "content": "You are a helpful customer support assistant."}
{"role": "user", "content": "I need help with my order"}
{"role": "assistant", "content": "I'd be happy to help! Could you provide your order number?"}
{"role": "user", "content": "It's ORDER-12345"}
{"role": "assistant", "content": "Thank you. Let me look that up for you."}
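
A JSONL file such as the messages file above holds one complete JSON object per line. As a minimal sketch, such a file can be produced with Python's standard library; `write_jsonl` is a hypothetical helper name, not part of any FloTorch tooling.

```python
import json


def write_jsonl(path, records):
    """Write each record as a single-line JSON object, one record per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps without indent never emits raw newlines, so each
            # record stays on exactly one line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, `write_jsonl("messages.jsonl", [{"role": "user", "content": "I need help with my order"}])` writes a one-line messages file.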

Dataset deletion is not currently supported. This prevents accidental data loss and preserves the integrity of evaluation projects that reference these datasets.

Workarounds:

  • If you no longer need a dataset, you can note this in the description (e.g., “DEPRECATED - Do not use”)
  • Create a new dataset with updated data instead of deleting the old one
  • Contact support if you need to remove sensitive data

Solution: Ensure your file has a .json or .jsonl extension and contains valid JSON data.

Solution: Reduce file size to under 10 MB by:

  • Removing unnecessary fields
  • Splitting into multiple datasets
  • Compressing repetitive data
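
Splitting can be scripted for JSONL files. The sketch below is one possible approach, with a hypothetical `split_jsonl` helper and an assumed `-partN` naming scheme; it groups whole lines so that each output file stays below the size limit. A single line that is itself larger than the limit is still written on its own and would need to be shortened by hand.

```python
from pathlib import Path


def split_jsonl(path, max_bytes=10 * 1024 * 1024):
    """Split a JSONL file into numbered parts, each smaller than max_bytes."""
    path = Path(path)
    parts, chunk, size = [], [], 0
    with path.open("rb") as f:
        for line in f:
            if not line.strip():
                continue  # drop blank lines
            if not line.endswith(b"\n"):
                line += b"\n"
            # Start a new part if adding this line would reach the limit.
            if chunk and size + len(line) >= max_bytes:
                parts.append(chunk)
                chunk, size = [], 0
            chunk.append(line)
            size += len(line)
    if chunk:
        parts.append(chunk)

    out_paths = []
    for n, lines in enumerate(parts, start=1):
        out = path.with_name(f"{path.stem}-part{n}{path.suffix}")
        out.write_bytes(b"".join(lines))
        out_paths.append(out)
    return out_paths
```

Each resulting part can then be uploaded as its own dataset.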

Solution: Check that:

  • JSON syntax is correct (use a JSON validator)
  • All required fields are present
  • Field values match expected types (strings for question/answer/content, correct role values)

Solution: Ensure all question fields contain non-empty strings.

Solution: For JSONL files, check that line X contains valid JSON. Each line must be a complete JSON object.
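
To locate the offending line yourself, each line can be parsed independently. This is a minimal sketch with a hypothetical `find_bad_lines` helper; how FloTorch treats blank lines is not stated in this guide, so they are simply skipped here.

```python
import json


def find_bad_lines(path):
    """Return (line_number, message) for each line that is not a JSON object."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # assumption: blank lines are ignored
            try:
                value = json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((number, exc.msg))
                continue
            if not isinstance(value, dict):
                bad.append((number, "not a JSON object"))
    return bad
```

An empty result means every non-blank line parsed as a JSON object.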

Solution: For JSON format, wrap your data in square brackets [...].
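
If your data is currently in JSONL form, it can be converted into a bracketed JSON array rather than edited by hand. A minimal sketch, using a hypothetical `jsonl_to_array` helper:

```python
import json


def jsonl_to_array(jsonl_text: str) -> str:
    """Convert JSONL text (one object per line) into a JSON array string."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(records, indent=2, ensure_ascii=False)
```

The returned string starts with `[` and ends with `]`, and can be saved with a .json extension.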

  • Consistent Naming: Use clear, consistent naming conventions for files
  • Version Control: Include version information in file names (e.g., groundtruth-v2.json)
  • Backup: Download and back up important datasets regularly
  • Validate Locally: Use JSON validators before uploading
  • Test with Samples: Start with a small sample to verify format
  • Review Regularly: Periodically review and update datasets
  • Document Changes: Use the description field to track major updates
  • Clear Descriptions: Write detailed descriptions so team members understand the dataset’s purpose
  • Naming Conventions: Establish team-wide naming conventions
  • Access Control: Ensure appropriate team members have necessary permissions
  • Communication: Notify team when updating shared datasets
  • Optimize File Size: Remove unnecessary fields and whitespace
  • Appropriate Dataset Size: Balance between comprehensive coverage and manageable file size
  • Split Large Datasets: Consider splitting very large datasets into logical subsets
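
The "Optimize File Size" advice can be sketched in a few lines: keep only the fields you need and serialize without extra whitespace. Which fields count as unnecessary depends on your evaluation setup; keeping only `question` and `answer` is an assumption made for illustration, and both helper names are hypothetical.

```python
import json


def slim_records(records, keep=("question", "answer")):
    """Return copies of the records containing only the fields listed in keep."""
    return [{key: record[key] for key in keep if key in record} for record in records]


def to_compact_json(records) -> str:
    """Serialize records as a JSON array with no whitespace between tokens."""
    return json.dumps(records, separators=(",", ":"), ensure_ascii=False)
```

Compare the size of the compact output with the original file before uploading, since the savings depend on how much metadata and indentation the original carried.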

After creating a dataset, you can use it in evaluation projects:

  1. Navigate to the Evaluations section
  2. Create a new evaluation project
  3. Select your dataset during project configuration
  4. Configure your evaluation experiments
  5. Run evaluations using the dataset’s ground truth data

The evaluation system will:

  • Use questions from the ground truth file as inputs
  • Compare model outputs against expected answers
  • Generate accuracy and quality metrics
  • Provide detailed evaluation results

For more information on running evaluations, see the Evaluations documentation.