
Installation & Setup

Before you begin, make sure you have:

  • Python 3.11 or higher — check with python --version
  • LLM API key — from OpenAI or any OpenAI-compatible provider
  • Outbound HTTPS access — to reach your model provider

Copy and run this in your terminal to install Floeval:

pip install floeval

For FloTorch-hosted agents, install the optional extra (quoted so the brackets survive shell globbing):

pip install "floeval[flotorch]"

Alternatively, install from source to get the latest development version:

git clone https://github.com/FloTorch/floeval.git
cd floeval
pip install -e .

Run the following to confirm Floeval is installed correctly:

floeval --version

You should see output similar to floeval 0.1.0b1 (or your installed version). If the command is not found, ensure your Python environment is activated and Floeval is installed in that environment.


Configuring Credentials

Floeval needs LLM credentials for metrics that call the model (answer_relevancy, faithfulness, LLM-as-judge, etc.). You provide credentials either through a config file (for the CLI) or a Python object (from code).

Copy the following into a file named config.yaml:

llm_config:
  base_url: "https://api.openai.com/v1"
  api_key: "your-api-key"
  chat_model: gpt-4o-mini
  embedding_model: text-embedding-3-small
  system_prompt: "You are a helpful assistant." # optional

Use this in your Python code to build the config programmatically:

from floeval.config.schemas.io.llm import OpenAIProviderConfig

llm_config = OpenAIProviderConfig(
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",
    chat_model="gpt-4o-mini",
    embedding_model="text-embedding-3-small",
)

Load credentials from environment variables or a secrets manager — Floeval only needs the final config object.
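That pattern can be sketched with a small helper; load_api_key and the OPENAI_API_KEY variable name are illustrative choices, not part of Floeval:

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail fast if it is missing."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set {env_var} before running an evaluation")
    return key
```

The returned string can then be passed straight to OpenAIProviderConfig(api_key=...), keeping the key out of source control.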


Using the FloTorch Gateway

When your LLMs or agents run on the FloTorch gateway, use your workspace gateway URL and API key:

  1. Sign in to the FloTorch Console — see Onboarding
  2. Create an API key — go to Settings > API Keys — see API Keys
  3. Get your gateway URL — from the API Documentation page
  4. Set base_url to your gateway URL and api_key to your FloTorch API key

llm_config:
  base_url: "https://gateway.flotorch.cloud/openai/v1"
  api_key: "your-flotorch-api-key"
  chat_model: flotorch/turbo
  embedding_model: text-embedding-3-small
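Because the gateway speaks the OpenAI protocol, the same OpenAIProviderConfig class shown earlier should accept the gateway values; this is a sketch under that assumption, not a separate FloTorch API:

```python
from floeval.config.schemas.io.llm import OpenAIProviderConfig

# Same config class as before, pointed at the FloTorch gateway.
llm_config = OpenAIProviderConfig(
    base_url="https://gateway.flotorch.cloud/openai/v1",
    api_key="your-flotorch-api-key",
    chat_model="flotorch/turbo",
    embedding_model="text-embedding-3-small",
)
```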

Verifying Your Setup

Confirm everything works with a quick test. The CLI needs a config file and a dataset file; the Python example builds the dataset inline instead, so it needs no files.

Create a new file named config.yaml in your project folder. Replace your-api-key with your actual API key:

llm_config:
  base_url: "https://api.openai.com/v1"
  api_key: "your-api-key"
  chat_model: gpt-4o-mini
  embedding_model: text-embedding-3-small

evaluation_config:
  metrics:
    - ragas:answer_relevancy

Create a new file named dataset.json in the same folder. Copy the following (the dataset must have a "samples" array; each sample needs user_input and llm_response):

{
  "samples": [
    {
      "user_input": "What is RAG?",
      "llm_response": "RAG is Retrieval-Augmented Generation."
    }
  ]
}
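Before invoking the CLI, the file can be sanity-checked against that shape with the standard library; check_dataset is a hypothetical helper for illustration, not a Floeval API:

```python
import json

def check_dataset(text: str) -> list[dict]:
    """Verify the minimal Floeval dataset shape: a non-empty "samples"
    array whose entries each carry user_input and llm_response."""
    data = json.loads(text)
    samples = data.get("samples")
    if not isinstance(samples, list) or not samples:
        raise ValueError('dataset must contain a non-empty "samples" array')
    for i, sample in enumerate(samples):
        for field in ("user_input", "llm_response"):
            if field not in sample:
                raise ValueError(f"sample {i} is missing {field!r}")
    return samples
```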

From the same folder where you created config.yaml and dataset.json, run:

floeval evaluate -c config.yaml -d dataset.json

If everything is configured correctly, you should see aggregate scores printed to the terminal (e.g. {'ragas:answer_relevancy': 0.85}). Add -o results.json to save the output to a file.

The code below builds the dataset inline, so you can validate your setup without creating any files. Copy and run this script:

from floeval import Evaluation, DatasetLoader
from floeval.config.schemas.io.llm import OpenAIProviderConfig

llm_config = OpenAIProviderConfig(
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",
    chat_model="gpt-4o-mini",
    embedding_model="text-embedding-3-small",
)

dataset = DatasetLoader.from_samples(
    [{"user_input": "What is RAG?", "llm_response": "RAG is Retrieval-Augmented Generation."}],
    partial_dataset=False,
)

evaluation = Evaluation(
    dataset=dataset,
    llm_config=llm_config,
    metrics=["ragas:answer_relevancy"],
)

results = evaluation.run()
print(results.aggregate_scores)