SDK: LLM

The LLM client provides a typed interface to FloTorch’s chat completion API with built-in logging and response parsing.

Exports:

  • FlotorchLLM: synchronous and asynchronous inference (invoke, ainvoke)
  • LLMResponse: structured result with content and metadata

Endpoint: /api/openai/v1/chat/completions
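The path suggests an OpenAI-compatible surface. For orientation, here is a minimal sketch of the raw request the client wraps; the Bearer authorization header is an assumption here, so confirm your gateway's auth scheme.

import requests

# Raw-HTTP sketch of the call FlotorchLLM wraps; for orientation only.
resp = requests.post(
    "https://gateway.flotorch.cloud/api/openai/v1/chat/completions",
    headers={"Authorization": "Bearer <your_api_key>"},  # auth scheme assumed
    json={
        "model": "<your_flotorch_model_id>",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
print(resp.json())

The FlotorchLLM client handles the request, auth, logging, and response parsing for you: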


from flotorch.sdk.llm import FlotorchLLM
API_KEY = "<your_api_key>"
BASE_URL = "https://gateway.flotorch.cloud"
MODEL_ID = "<your_flotorch_model_id>" # e.g., a model from FloTorch Model Registry
llm = FlotorchLLM(model_id=MODEL_ID, api_key=API_KEY, base_url=BASE_URL)

FlotorchLLM(
    model_id: str,
    api_key: str,
    base_url: str,
)

Creates an LLM client bound to a model and endpoint.

invoke(messages, tools=None, response_format=None, extra_body=None, **kwargs) -> LLMResponse

Sends a chat completion request. Arguments map to OpenAI-style parameters plus FloTorch extensions:

  • messages: list of {role, content} dicts
  • tools (optional): OpenAI-style tool definitions (see the tool-calling sketch below)
  • response_format (optional): supports JSON-schema structured output, e.g. derived from a Pydantic model via convert_pydantic_to_custom_json_schema (sketch below)
  • extra_body (optional): dict of additional fields, merged into the request's extra_body
  • **kwargs: additional sampling parameters such as temperature, max_tokens, top_p

Returns an LLMResponse. Basic usage:

resp = llm.invoke(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Summarize FloTorch in one line."},
    ],
    temperature=0.3,
)
print(resp.content)
print(resp.metadata["totalTokens"])  # inputTokens and outputTokens are also in metadata
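Tool calling follows OpenAI's tool-definition shape. The sketch below is illustrative: get_weather is a hypothetical tool, and this page does not specify where tool calls surface on LLMResponse, so it falls back to the raw API response kept under metadata (see LLMResponse below).

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

resp = llm.invoke(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# content is empty when the model answers with tool_calls only (see LLMResponse)
print(resp.content or resp.metadata["raw_response"])

For structured output, response_format takes a JSON schema; a Pydantic model can be converted with convert_pydantic_to_custom_json_schema. The import path and call signature below are assumptions, so check them against your SDK version.

from pydantic import BaseModel

# Assumed import path: the helper's name comes from this page, its location does not.
from flotorch.sdk.utils import convert_pydantic_to_custom_json_schema

class Summary(BaseModel):
    one_liner: str

resp = llm.invoke(
    [{"role": "user", "content": "Summarize FloTorch."}],
    response_format=convert_pydantic_to_custom_json_schema(Summary),  # assumed signature
)
print(resp.content)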

ainvoke(messages, tools=None, response_format=None, extra_body=None, **kwargs) -> LLMResponse

Asynchronous version of invoke.

import asyncio

async def main():
    r = await llm.ainvoke([
        {"role": "user", "content": "A haiku about the sea."}
    ])
    print(r.content)

asyncio.run(main())
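Because ainvoke is a coroutine, several requests can run concurrently with standard asyncio tooling; a minimal sketch:

import asyncio

async def main():
    prompts = ["A haiku about the sea.", "A haiku about mountains."]
    # asyncio.gather issues all requests concurrently and preserves order
    results = await asyncio.gather(
        *(llm.ainvoke([{"role": "user", "content": p}]) for p in prompts)
    )
    for r in results:
        print(r.content)

asyncio.run(main())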

LLMResponse contains:

  • content: str – final text content (empty if the model returned only tool_calls)
  • metadata: Dict[str, Any] – includes inputTokens, outputTokens, totalTokens, and raw API response under raw_response
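A short sketch of reading these fields after a call (field names as listed above):

resp = llm.invoke([{"role": "user", "content": "Hi"}])
print(resp.content)
meta = resp.metadata
print(meta["inputTokens"], meta["outputTokens"], meta["totalTokens"])
raw = meta["raw_response"]  # full provider payload, for anything not surfaced above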