SDK: LLM

The LLM client provides a typed interface to FloTorch’s chat completion API with built-in logging and response parsing.

Exports:

  • FlotorchLLM: synchronous and asynchronous inference (invoke, ainvoke)
  • LLMResponse: structured result with content and metadata

Endpoint: /api/openai/v1/chat/completions
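The path suggests an OpenAI-compatible surface. For orientation, here is a minimal sketch of the raw request the client wraps; the Bearer authorization header is an assumption here, so confirm your gateway's auth scheme.

import requests

# Raw-HTTP sketch of the call FlotorchLLM wraps; for orientation only.
resp = requests.post(
    "https://gateway.flotorch.cloud/api/openai/v1/chat/completions",
    headers={"Authorization": "Bearer <your_api_key>"},  # auth scheme assumed
    json={
        "model": "<your_flotorch_model_id>",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
print(resp.json())

The FlotorchLLM client handles the request, auth, logging, and response parsing for you: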


from flotorch.sdk.llm import FlotorchLLM
API_KEY = "<your_api_key>"
BASE_URL = "https://gateway.flotorch.cloud"
MODEL_ID = "<your_flotorch_model_id>" # e.g., a model from FloTorch Model Registry
llm = FlotorchLLM(model_id=MODEL_ID, api_key=API_KEY, base_url=BASE_URL)

FlotorchLLM(
    model_id: str,
    api_key: str,
    base_url: str,
)

Creates an LLM client bound to a model and endpoint.

invoke(messages, tools=None, response_format=None, extra_body=None, **kwargs) -> LLMResponse

Sends a chat completion request. Arguments map to OpenAI-style parameters plus FloTorch extensions:

  • messages: list of {role, content} dicts
  • tools (optional): OpenAI-style tool definitions (see the tool-calling sketch below)
  • response_format (optional): supports JSON-schema structured output, e.g. derived from a Pydantic model via convert_pydantic_to_custom_json_schema (sketch below)
  • extra_body (optional): dict of additional fields, merged into the request's extra_body
  • **kwargs: additional sampling parameters such as temperature, max_tokens, top_p

Returns an LLMResponse. Basic usage:

resp = llm.invoke(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Summarize FloTorch in one line."},
    ],
    temperature=0.3,
)
print(resp.content)
print(resp.metadata["totalTokens"])  # inputTokens and outputTokens are also in metadata
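Tool calling follows OpenAI's tool-definition shape. The sketch below is illustrative: get_weather is a hypothetical tool, and this page does not specify where tool calls surface on LLMResponse, so it falls back to the raw API response kept under metadata (see LLMResponse below).

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

resp = llm.invoke(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# content is empty when the model answers with tool_calls only (see LLMResponse)
print(resp.content or resp.metadata["raw_response"])

For structured output, response_format takes a JSON schema; a Pydantic model can be converted with convert_pydantic_to_custom_json_schema. The import path and call signature below are assumptions, so check them against your SDK version.

from pydantic import BaseModel

# Assumed import path: the helper's name comes from this page, its location does not.
from flotorch.sdk.utils import convert_pydantic_to_custom_json_schema

class Summary(BaseModel):
    one_liner: str

resp = llm.invoke(
    [{"role": "user", "content": "Summarize FloTorch."}],
    response_format=convert_pydantic_to_custom_json_schema(Summary),  # assumed signature
)
print(resp.content)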

ainvoke(messages, tools=None, response_format=None, extra_body=None, **kwargs) -> LLMResponse

Asynchronous version of invoke.

import asyncio

async def main():
    r = await llm.ainvoke([
        {"role": "user", "content": "A haiku about the sea."}
    ])
    print(r.content)

asyncio.run(main())
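Because ainvoke is a coroutine, several requests can run concurrently with standard asyncio tooling; a minimal sketch:

import asyncio

async def main():
    prompts = ["A haiku about the sea.", "A haiku about mountains."]
    # asyncio.gather issues all requests concurrently and preserves order
    results = await asyncio.gather(
        *(llm.ainvoke([{"role": "user", "content": p}]) for p in prompts)
    )
    for r in results:
        print(r.content)

asyncio.run(main())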

LLMResponse contains:

  • content: str – final text content (empty if the model returned only tool_calls)
  • metadata: Dict[str, Any] – includes inputTokens, outputTokens, totalTokens, and raw API response under raw_response
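A short sketch of reading these fields after a call (field names as listed above):

resp = llm.invoke([{"role": "user", "content": "Hi"}])
print(resp.content)
meta = resp.metadata
print(meta["inputTokens"], meta["outputTokens"], meta["totalTokens"])
raw = meta["raw_response"]  # full provider payload, for anything not surfaced above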