
Tracing OpenAI

OpenAI Tracing via autolog

MLflow Tracing provides automatic tracing for OpenAI. When you enable auto tracing by calling the mlflow.openai.autolog() function, MLflow captures traces for LLM invocations and logs them to the active MLflow Experiment.

import mlflow

mlflow.openai.autolog()

MLflow automatically captures the following information about OpenAI calls:

  • Prompts and completion responses
  • Latencies
  • Model name
  • Additional metadata such as temperature and max_tokens, if specified
  • Function calling, if returned in the response
  • Built-in tools such as web search, file search, computer use, etc.
  • Any exception, if raised
tip

The MLflow OpenAI integration is not only about tracing. MLflow offers a full tracking experience for OpenAI, including model tracking, prompt management, and evaluation. Please check out the MLflow OpenAI Flavor to learn more!

Supported APIs

MLflow supports automatic tracing for the following OpenAI APIs. To request support for additional APIs, please open a feature request on GitHub.

Chat Completion API

| Normal | Function Calling | Structured Outputs | Streaming     | Async        | Image | Audio |
| ------ | ---------------- | ------------------ | ------------- | ------------ | ----- | ----- |
| ✅     | ✅               | ✅ (>=2.21.0)      | ✅ (>=2.15.0) | ✅ (>=2.21.0) | -     | -     |

Responses API

| Normal | Function Calling | Structured Outputs | Web Search | File Search | Computer Use | Reasoning | Streaming | Async | Image |
| ------ | ---------------- | ------------------ | ---------- | ----------- | ------------ | --------- | --------- | ----- | ----- |
| ✅     | ✅               | ✅                 | ✅         | ✅          | ✅           | ✅        | ✅        | ✅    | -     |

The Responses API is supported since MLflow 2.22.0.
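The usage mirrors the Chat Completion API. Below is a minimal sketch of a traced Responses API call; the model name and prompt are placeholders:

import openai
import mlflow

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

client = openai.OpenAI()

# Responses API calls are captured automatically once autolog is enabled
response = client.responses.create(
    model="gpt-4o-mini",
    input="What is the capital of France?",
)
print(response.output_text)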

Agents SDK

See OpenAI Agents SDK Tracing for more details.
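For orientation, here is a minimal sketch, assuming the openai-agents package is installed and that the same mlflow.openai.autolog() call enables tracing; see the linked page for authoritative examples:

import mlflow
from agents import Agent, Runner

mlflow.openai.autolog()

# A single-agent run; the run is captured as a trace by MLflow
agent = Agent(name="Assistant", instructions="You are a helpful assistant.")
result = Runner.run_sync(agent, "What is the capital of France?")
print(result.final_output)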

Embedding API

| Normal | Async |
| ------ | ----- |
| ✅     | ✅    |
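A minimal sketch of a traced embeddings call; the embedding model name here is an assumption, so use any embedding model available to your account:

import openai
import mlflow

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

client = openai.OpenAI()

# The embeddings call is traced automatically once autolog is enabled
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is the capital of France?",
)
print(len(response.data[0].embedding))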

Basic Example

import openai
import mlflow

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("OpenAI")

openai_client = openai.OpenAI()

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.1,
    max_tokens=100,
)

Streaming

MLflow Tracing supports the streaming API of the OpenAI SDK. With the same auto-tracing setup, MLflow automatically traces the streaming response and renders the concatenated output in the span UI. The individual chunks in the response stream can be found in the Events tab as well.

import openai
import mlflow

# Enable trace logging
mlflow.openai.autolog()

client = openai.OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "How fast would a glass of water freeze on Titan?"}
    ],
    stream=True,  # Enable streaming response
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Async

MLflow Tracing has supported the asynchronous API of the OpenAI SDK since MLflow 2.21.0. The usage is the same as for the synchronous API.

import openai
import mlflow

# Enable trace logging
mlflow.openai.autolog()

client = openai.AsyncOpenAI()

response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "How fast would a glass of water freeze on Titan?"}
    ],
    # Async streaming is also supported
    # stream=True
)
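The snippet above assumes an environment that supports top-level await, such as a Jupyter notebook. In a plain Python script, wrap the call in an async function and drive it with asyncio.run, for example:

import asyncio

import openai
import mlflow

mlflow.openai.autolog()


async def main():
    client = openai.AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "How fast would a glass of water freeze on Titan?"}
        ],
    )
    print(response.choices[0].message.content)


asyncio.run(main())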

Function Calling

MLflow Tracing automatically captures function calling responses from OpenAI models. The function call instruction in the response is highlighted in the trace UI. Moreover, you can annotate the tool function with the @mlflow.trace decorator to create a span for the tool execution.

OpenAI Function Calling Trace

The following example implements a simple function calling agent using OpenAI Function Calling and MLflow Tracing for OpenAI.

import json
from openai import OpenAI
import mlflow
from mlflow.entities import SpanType

client = OpenAI()


# Define the tool function. Decorate it with `@mlflow.trace` to create a span for its execution.
@mlflow.trace(span_type=SpanType.TOOL)
def get_weather(city: str) -> str:
    if city == "Tokyo":
        return "sunny"
    elif city == "Paris":
        return "rainy"
    return "unknown"


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }
]

_tool_functions = {"get_weather": get_weather}


# Define a simple tool calling agent
@mlflow.trace(span_type=SpanType.AGENT)
def run_tool_agent(question: str):
    messages = [{"role": "user", "content": question}]

    # Invoke the model with the given question and available tools
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    ai_msg = response.choices[0].message
    messages.append(ai_msg)

    # If the model requests tool call(s), invoke the function with the specified arguments
    if tool_calls := ai_msg.tool_calls:
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            if tool_func := _tool_functions.get(function_name):
                args = json.loads(tool_call.function.arguments)
                tool_result = tool_func(**args)
            else:
                raise RuntimeError("An invalid tool is returned from the assistant!")

            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": tool_result,
                }
            )

        # Send the tool results to the model and get a new response
        response = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages
        )

    return response.choices[0].message.content


# Run the tool calling agent
question = "What's the weather like in Paris today?"
answer = run_tool_agent(question)

Token usage

MLflow >= 3.1.0 supports token usage tracking for OpenAI. The token usage for each LLM call is logged in the mlflow.chat.tokenUsage span attribute, and the total token usage across the trace is available in the token_usage field of the trace info object.

import mlflow

mlflow.openai.autolog()

# Run the tool calling agent defined in the previous section
question = "What's the weather like in Paris today?"
answer = run_tool_agent(question)

# Get the trace object just created
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Print the total token usage
total_usage = trace.info.token_usage
print("== Total token usage: ==")
print(f"  Input tokens: {total_usage['input_tokens']}")
print(f"  Output tokens: {total_usage['output_tokens']}")
print(f"  Total tokens: {total_usage['total_tokens']}")

# Print the token usage for each LLM call
print("\n== Detailed usage for each LLM call: ==")
for span in trace.data.spans:
    if usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"{span.name}:")
        print(f"  Input tokens: {usage['input_tokens']}")
        print(f"  Output tokens: {usage['output_tokens']}")
        print(f"  Total tokens: {usage['total_tokens']}")

== Total token usage: ==
Input tokens: 84
Output tokens: 22
Total tokens: 106

== Detailed usage for each LLM call: ==
Completions_1:
Input tokens: 45
Output tokens: 14
Total tokens: 59
Completions_2:
Input tokens: 39
Output tokens: 8
Total tokens: 47

Supported APIs

Token usage tracking is supported for the following OpenAI APIs:

| Mode      | Chat Completion | Responses |
| --------- | --------------- | --------- |
| Normal    | ✅              | ✅        |
| Streaming | ✅ (*1)         | ✅        |
| Async     | ✅              | ✅        |

(*1) By default, OpenAI does not return token usage information for the Chat Completion API when streaming. To track token usage, specify stream_options={"include_usage": True} in the request (see the OpenAI API Reference).
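For example, adapting the streaming snippet above; note that the final, usage-bearing chunk has an empty choices list:

import openai
import mlflow

mlflow.openai.autolog()

client = openai.OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
    # Ask OpenAI to include token usage in the final stream chunk
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:  # The usage-only final chunk has no choices
        print(chunk.choices[0].delta.content or "", end="")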

Disable auto-tracing

Auto tracing for OpenAI can be disabled globally by calling mlflow.openai.autolog(disable=True) or mlflow.autolog(disable=True).
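For example:

import mlflow

# Disable tracing for OpenAI only
mlflow.openai.autolog(disable=True)

# Or disable all MLflow autologging integrations at once
mlflow.autolog(disable=True)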