
Introduction

FloTorch Datasets are structured collections of data that enable you to evaluate, test, and improve your AI models and applications. Datasets provide a centralized way to manage test data, ground truth information, and evaluation examples that can be reused across multiple evaluation projects and experiments.

FloTorch supports three types of datasets, each designed for specific evaluation scenarios:

RAG Evaluation Datasets

Purpose: Evaluate Retrieval-Augmented Generation (RAG) systems and pipelines.

Use Cases:

  • Testing RAG retrieval accuracy
  • Evaluating answer quality against ground truth
  • Benchmarking different RAG configurations
  • Comparing vector storage and embedding models

Required Files:

  • Ground Truth (required): Questions and expected answers for evaluation
  • Examples (optional): Few-shot examples to improve prompt quality

Model Evaluation Datasets

Purpose: Evaluate the performance and accuracy of FloTorch models.

Use Cases:

  • Testing model accuracy against known answers
  • Comparing different model versions
  • Benchmarking model configurations
  • Evaluating model behavior with different parameters

Required Files:

  • Ground Truth (required): Test questions and expected responses
  • Examples (optional): Few-shot examples for in-context learning

Chat Datasets

Purpose: Provide conversational data for testing chat-based applications.

Use Cases:

  • Testing chatbot behavior
  • Evaluating conversation flows
  • Training or fine-tuning chat models
  • Testing multi-turn conversations

Required Files:

  • Messages (required): Conversation data with role-based messages

All dataset files must be in JSON or JSONL (newline-delimited JSON) format, with a maximum file size of 10 MB per file. Three layouts are accepted:

  1. JSON Array Format:

    [
      { "question": "What is the capital of France?", "answer": "Paris" },
      { "question": "What is 2+2?", "answer": "4" }
    ]
  2. JSONL Format (one JSON object per line):

    {"question": "What is the capital of France?", "answer": "Paris"}
    {"question": "What is 2+2?", "answer": "4"}
  3. JSON Object Wrapper (for examples files only):

    {
      "examples": [
        { "question": "Example question?", "answer": "Example answer" }
      ]
    }
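
As a quick local check before uploading, you can confirm that a file parses in one of these layouts and stays under the 10 MB limit. The Python sketch below is illustrative only (it is not part of FloTorch, and the file name is hypothetical):

import json
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB per-file limit

def load_dataset_file(path: str) -> list:
    """Parse a dataset file in JSON array, JSONL, or examples-wrapper layout."""
    p = Path(path)
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"{path} exceeds the 10 MB limit")
    text = p.read_text(encoding="utf-8")
    try:
        data = json.loads(text)  # JSON array or object wrapper
    except json.JSONDecodeError:
        # Fall back to JSONL: one JSON object per non-empty line
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    if isinstance(data, dict):
        data = data.get("examples", [])  # unwrap the examples-wrapper layout
    if not isinstance(data, list):
        raise ValueError("Expected a list of records")
    return data

records = load_dataset_file("ground-truth.jsonl")  # hypothetical file name
print(f"Loaded {len(records)} records")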

Ground Truth Files

Used for RAG Evaluation and Model Evaluation datasets.

Required Fields:

  • question (string): The input question or prompt
  • answer (string): The expected answer or response

Example:

[
  {
    "question": "What is the capital of France?",
    "answer": "Paris"
  },
  {
    "question": "What is the largest planet in our solar system?",
    "answer": "Jupiter"
  }
]

Additional fields: You can include extra fields for metadata; they are preserved but not validated.
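
If you want to mirror this check locally, a minimal sketch (assuming the records have already been loaded into a Python list, for example with the loader sketched earlier) could look like:

def check_ground_truth(records: list) -> None:
    """Verify each record has non-empty string 'question' and 'answer' fields."""
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"Record {i} is not a JSON object")
        for field in ("question", "answer"):
            value = rec.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"Record {i} is missing a valid '{field}' field")

check_ground_truth([
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2+2?", "answer": "4"},
])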

Examples Files

Used for RAG Evaluation and Model Evaluation datasets (optional).

Required Fields:

  • question (string): Example question
  • answer (string): Example answer

Example:

{
  "examples": [
    {
      "question": "What is machine learning?",
      "answer": "Machine learning is a subset of artificial intelligence..."
    }
  ]
}

Messages Files

Used for Chat datasets.

Required Fields:

  • role (string): Must be one of: user, assistant, or system
  • content (string): The message content

Example:

[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Hello, how are you?"
  },
  {
    "role": "assistant",
    "content": "I'm doing well, thank you for asking!"
  }
]
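
A similar local check for messages data, again a rough sketch rather than FloTorch's own validation logic, could verify the role and content fields:

VALID_ROLES = {"user", "assistant", "system"}

def check_messages(messages: list) -> None:
    """Verify each message has an allowed role and string content."""
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"Message {i} is not a JSON object")
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"Message {i} has an invalid role: {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"Message {i} is missing string content")

check_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"},
])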

Beyond the file schemas, datasets simplify day-to-day data management:

  • Store evaluation data in one place
  • Reuse datasets across multiple evaluation projects
  • Version control through file replacement
  • Easy download and sharing of dataset files

FloTorch automatically validates all uploaded files to ensure:

  • Correct JSON/JSONL format
  • Required fields are present
  • File size is within limits
  • Schema compliance for each dataset type

Datasets are also scoped to workspaces, which provides:

  • Sharing with team members
  • Unique naming within each workspace
  • Role-based access control

Dataset names must follow these rules:

  • Alphanumeric characters and dashes only (a-z, A-Z, 0-9, -)
  • Must be unique within the workspace
  • Cannot be changed after creation (to maintain referential integrity)
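
The character rule maps to a simple regular expression, so you can check a candidate name locally before creating the dataset (uniqueness within the workspace can only be confirmed by FloTorch itself). A minimal sketch:

import re

NAME_PATTERN = re.compile(r"^[A-Za-z0-9-]+$")  # alphanumeric characters and dashes only

def is_valid_dataset_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))

print(is_valid_dataset_name("customer-support-qa"))  # True
print(is_valid_dataset_name("customer support qa"))  # False (contains spaces)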

Dataset operations require specific workspace roles:

  • List Datasets: Workspace Member or higher
  • Create Dataset: Workspace Developer, Admin, or Org Admin
  • Update Dataset: Workspace Developer, Admin, or Org Admin
  • Upload Files: Workspace Developer, Admin, or Org Admin
  • Download Files: Workspace Member or higher

Once created, datasets can be used in evaluation projects for:

  1. RAG Benchmarking: Test and compare RAG pipeline configurations
  2. Model Testing: Evaluate FloTorch model performance
  3. A/B Testing: Compare different model versions or configurations
  4. Quality Assurance: Ensure consistent model behavior

Datasets are referenced when creating evaluation projects, and the data is used to run experiments and generate evaluation metrics.

Best Practices

  • Use descriptive names: Choose names that clearly indicate the dataset’s purpose (e.g., customer-support-qa, product-faq-v1)
  • Include metadata: Add description fields to explain the dataset’s purpose and contents
  • Keep files focused: Create separate datasets for different evaluation scenarios
  • Version your data: Use naming conventions like -v1, -v2 to track dataset versions
  • Validate locally first: Test your JSON/JSONL files with a validator before uploading
  • Start small: Begin with a small dataset to ensure correct format, then scale up
  • Document your schema: If using additional fields, document them for team members
  • Regular updates: Keep datasets current by replacing files with updated versions

Limitations

  • File Size: Maximum 10 MB per file
  • File Format: Only JSON and JSONL formats supported
  • Name Immutability: Dataset names cannot be changed after creation
  • No Deletion: Datasets cannot be deleted (to prevent accidental data loss)
  • File Replacement: Uploading a new file of the same type replaces the previous file