# SDK: LLM
## Overview

The LLM client provides a typed interface to FloTorch's chat completion API with built-in logging and response parsing.
Exports:

- `FlotorchLLM`: synchronous and asynchronous inference (`invoke`, `ainvoke`)
- `LLMResponse`: structured result with `content` and `metadata`

Endpoint: `/api/openai/v1/chat/completions`
```python
from flotorch.sdk.llm import FlotorchLLM

API_KEY = "<your_api_key>"
BASE_URL = "https://gateway.flotorch.cloud"
MODEL_ID = "<your_flotorch_model_id>"  # e.g., a model from the FloTorch Model Registry

llm = FlotorchLLM(model_id=MODEL_ID, api_key=API_KEY, base_url=BASE_URL)
```
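In practice you may prefer to load credentials from the environment rather than hardcoding them. A minimal sketch; the `FLOTORCH_*` variable names below are illustrative, not part of the SDK:

```python
import os

from flotorch.sdk.llm import FlotorchLLM

# Hypothetical variable names -- use whatever convention your deployment follows.
llm = FlotorchLLM(
    model_id=os.environ["FLOTORCH_MODEL_ID"],
    api_key=os.environ["FLOTORCH_API_KEY"],
    base_url=os.environ.get("FLOTORCH_BASE_URL", "https://gateway.flotorch.cloud"),
)
```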
## FlotorchLLM

```python
FlotorchLLM(
    model_id: str,
    api_key: str,
    base_url: str,
)
```
Creates an LLM client bound to a model and endpoint.
### invoke(messages, tools=None, response_format=None, extra_body=None, **kwargs) -> LLMResponse

Sends a chat completion request. Arguments map to OpenAI-style parameters plus FloTorch extensions:
- `messages`: list of `{role, content}` dicts
- `tools` (optional): OpenAI tool definitions
- `response_format` (optional): supports JSON schema via `convert_pydantic_to_custom_json_schema`
- `extra_body` (optional): merged into the request's `extra_body` field
- `**kwargs`: additional parameters like `temperature`, `max_tokens`, `top_p`

Returns `LLMResponse`:
```python
resp = llm.invoke(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Summarize FloTorch in one line."},
    ],
    temperature=0.3,
)

print(resp.content)
print(resp.metadata["totalTokens"])  # metadata also carries inputTokens/outputTokens
```
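The `response_format` path can be driven from a Pydantic model. A minimal sketch, assuming `convert_pydantic_to_custom_json_schema` is importable from the SDK's utils and takes the model class directly (the import path and call signature are assumptions, not confirmed by this page):

```python
from pydantic import BaseModel

# Assumed import path for the helper named above -- adjust to your SDK version.
from flotorch.sdk.utils import convert_pydantic_to_custom_json_schema

class Summary(BaseModel):
    title: str
    one_liner: str

schema = convert_pydantic_to_custom_json_schema(Summary)

resp = llm.invoke(
    [{"role": "user", "content": "Summarize FloTorch as JSON."}],
    response_format=schema,
)
print(resp.content)  # JSON text conforming to the Summary schema
```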
### ainvoke(messages, tools=None, response_format=None, extra_body=None, **kwargs) -> LLMResponse

Asynchronous version of `invoke`.
```python
import asyncio

async def main():
    r = await llm.ainvoke(
        [{"role": "user", "content": "A haiku about the sea."}]
    )
    print(r.content)

asyncio.run(main())
```
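Because `ainvoke` is a coroutine, independent requests can be issued concurrently with standard asyncio tooling. A sketch (the prompts are illustrative):

```python
import asyncio

async def batch():
    prompts = ["A haiku about the sea.", "A haiku about mountains."]
    tasks = [
        llm.ainvoke([{"role": "user", "content": p}]) for p in prompts
    ]
    # Run both chat completions concurrently instead of sequentially.
    responses = await asyncio.gather(*tasks)
    for r in responses:
        print(r.content)

asyncio.run(batch())
```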
## Response shape

`LLMResponse` contains:

- `content: str` – final text content (empty if the model returned only `tool_calls`)
- `metadata: Dict[str, Any]` – includes `inputTokens`, `outputTokens`, `totalTokens`, and the raw API response under `raw_response`
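When `tools` are supplied and the model elects to call one, `content` comes back empty and the tool calls must be read from the raw payload. A sketch, assuming `raw_response` mirrors the OpenAI chat completion layout (this page does not specify its exact structure):

```python
# Hypothetical tool definition in the OpenAI style.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = llm.invoke(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

if not resp.content:
    # Assumes raw_response follows the OpenAI response shape.
    raw = resp.metadata["raw_response"]
    calls = raw["choices"][0]["message"].get("tool_calls", [])
    for call in calls:
        print(call["function"]["name"], call["function"]["arguments"])
```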