Installation & Setup
Prerequisites
- Python 3.11 or higher — check with `python --version`
- LLM API key — from OpenAI or any OpenAI-compatible provider
- Outbound HTTPS access — to reach your model provider
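The version requirement above can also be checked from inside Python; a minimal stdlib sketch (the helper name is just for this example):

```python
import sys

def check_python(min_version: tuple[int, int] = (3, 11)) -> bool:
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version

# False means you need to upgrade before installing Floeval.
print(check_python())
```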
Install
Copy and run this in your terminal to install Floeval:

```bash
pip install floeval
```

For FloTorch-hosted agents, install the optional extra:

```bash
pip install "floeval[flotorch]"
```

From source
```bash
git clone https://github.com/FloTorch/floeval.git
cd floeval
pip install -e .
```

Verify
Run the following to confirm Floeval is installed correctly:

```bash
floeval --version
```

You should see output similar to `floeval 0.1.0b1` (or your installed version). If the command is not found, ensure your Python environment is activated and Floeval is installed in that environment.
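If the command is not found, you can confirm from Python whether the console script is visible on your PATH; a small stdlib sketch:

```python
import shutil

def cli_on_path(name: str = "floeval") -> bool:
    """Return True if the named console script is reachable on PATH."""
    return shutil.which(name) is not None

print(cli_on_path())
```

If this prints `False` even though the package installed successfully, the environment that owns the script is likely not active in your current shell.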
Configure LLM Credentials
Floeval needs LLM credentials for metrics that call the model (answer_relevancy, faithfulness, LLM-as-judge, etc.). You provide credentials through a config file (command line) or a Python object (from code).
Config file (for CLI)
Copy the following into a file named config.yaml:

```yaml
llm_config:
  base_url: "https://api.openai.com/v1"
  api_key: "your-api-key"
  chat_model: gpt-4o-mini
  embedding_model: text-embedding-3-small
  system_prompt: "You are a helpful assistant."  # optional
```

Python object
Use this in your Python code to build the config programmatically:
```python
from floeval.config.schemas.io.llm import OpenAIProviderConfig

llm_config = OpenAIProviderConfig(
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",
    chat_model="gpt-4o-mini",
    embedding_model="text-embedding-3-small",
)
```

Load credentials from environment variables or a secrets manager — Floeval only needs the final config object.
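One way to follow that advice is to pull the values from environment variables before constructing the config object; a minimal stdlib sketch (the variable names are a convention of this example, not something Floeval requires):

```python
import os

def llm_settings_from_env() -> dict:
    """Collect LLM connection settings from environment variables.

    Expects OPENAI_API_KEY; OPENAI_BASE_URL is optional and falls back
    to OpenAI's public endpoint.
    """
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before running evaluations")
    return {
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "api_key": api_key,
        "chat_model": "gpt-4o-mini",
        "embedding_model": "text-embedding-3-small",
    }

# The resulting dict can be unpacked straight into the provider config:
# llm_config = OpenAIProviderConfig(**llm_settings_from_env())
```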
Using the FloTorch Gateway
When your LLMs or agents run on the FloTorch gateway, use your workspace gateway URL and API key:
- Sign in to the FloTorch Console — see Onboarding
- Create an API key — go to Settings > API Keys — see API Keys
- Get your gateway URL — from the API Documentation page
- Set `base_url` to your gateway URL and `api_key` to your FloTorch API key

```yaml
llm_config:
  base_url: "https://gateway.flotorch.cloud/openai/v1"
  api_key: "your-flotorch-api-key"
  chat_model: flotorch/turbo
  embedding_model: text-embedding-3-small
```

Quick Validation
Confirm everything works with a quick test. You need a config file and a dataset file for the CLI, or you can use the Python example which builds the dataset inline.
Step 1: Create a config file
Create a new file named config.yaml in your project folder. Replace your-api-key with your actual API key:
```yaml
llm_config:
  base_url: "https://api.openai.com/v1"
  api_key: "your-api-key"
  chat_model: gpt-4o-mini
  embedding_model: text-embedding-3-small

evaluation_config:
  metrics:
    - ragas:answer_relevancy
```

Step 2: Create a dataset file
Create a new file named dataset.json in the same folder. Copy the following (the dataset must have a "samples" array; each sample needs user_input and llm_response):

```json
{
  "samples": [
    {
      "user_input": "What is RAG?",
      "llm_response": "RAG is Retrieval-Augmented Generation."
    }
  ]
}
```

Step 3: Run the CLI
From the same folder where you created config.yaml and dataset.json, run:

```bash
floeval evaluate -c config.yaml -d dataset.json
```

If everything is configured correctly, you should see aggregate scores printed to the terminal (e.g. `{'ragas:answer_relevancy': 0.85}`). Add `-o results.json` to save the output to a file.
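Once results are saved with `-o`, they can be post-processed from Python. The sketch below assumes the file is a flat JSON mapping of metric name to aggregate score, matching the terminal output shown above; check your actual results.json for the exact schema before relying on it:

```python
import json

def failing_metrics(path: str, threshold: float = 0.7) -> dict:
    """Return metrics whose aggregate score falls below the threshold.

    Assumes a flat {"metric_name": score} layout; adjust for the real
    schema of your results file.
    """
    with open(path) as f:
        scores = json.load(f)
    return {name: score for name, score in scores.items() if score < threshold}
```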
Alternative: Python (no files needed)
The code below builds the dataset inline, so you can validate your setup without creating any files. Copy and run this script:
```python
from floeval import Evaluation, DatasetLoader
from floeval.config.schemas.io.llm import OpenAIProviderConfig

llm_config = OpenAIProviderConfig(
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",
    chat_model="gpt-4o-mini",
    embedding_model="text-embedding-3-small",
)

dataset = DatasetLoader.from_samples(
    [{"user_input": "What is RAG?", "llm_response": "RAG is Retrieval-Augmented Generation."}],
    partial_dataset=False,
)

evaluation = Evaluation(
    dataset=dataset,
    llm_config=llm_config,
    metrics=["ragas:answer_relevancy"],
)

results = evaluation.run()
print(results.aggregate_scores)
```

Next Steps
- LLM Evaluations — evaluate raw model outputs
- RAG Evaluations — evaluate retrieval-augmented generation
- Agent Evaluations — evaluate tool-using agents
- Workflow Evaluations — evaluate agentic workflows