Tracing FAQ
Getting Started and Basic Usage
Q: How do I start using MLflow Tracing?
The easiest way to start is with automatic tracing for supported libraries:
import mlflow
import openai

# Enable automatic tracing for OpenAI
mlflow.openai.autolog()

# Your existing code now generates traces automatically
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello!"}]
)
For custom code, use the @mlflow.trace decorator:

@mlflow.trace
def my_function(input_data):
    # Your logic here
    return "processed result"
Q: Which libraries does MLflow Tracing support automatically?
MLflow provides automatic tracing (autolog) for 20+ popular libraries. See the complete list at Automatic Tracing Integrations.
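For example, a minimal sketch enabling several integrations at once (this assumes the corresponding libraries, such as langchain and anthropic, are installed in your environment):

import mlflow

# Each supported library exposes its own autolog() entry point
mlflow.openai.autolog()
mlflow.langchain.autolog()
mlflow.anthropic.autolog()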
Q: Can I add tracing to custom libraries or unsupported frameworks?
Yes, you can add tracing to any Python code using manual tracing APIs:
Decorators for functions:
@mlflow.trace(name="Custom Operation")
def my_custom_function():
    return "result"
Context managers for code blocks:
with mlflow.start_span(name="Custom Processing") as span:
    result = custom_processing()
    span.set_outputs({"result": result})
Function wrapping for third-party libraries:
traced_function = mlflow.trace(third_party_function)
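For example, a quick sketch wrapping requests.get (the requests library is used purely for illustration):

import mlflow
import requests

# Wrap the third-party function without modifying its source
traced_get = mlflow.trace(requests.get)

# Calls through the wrapper are recorded as spans
response = traced_get("https://example.com")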
User Interface and Jupyter Integration
Q: Can I view traces directly in Jupyter notebooks?
Yes! Jupyter integration is available in MLflow 2.20 and above. The trace UI automatically displays within notebooks when:
- Cell code generates a trace
- You call mlflow.search_traces()
- You display a trace object
import mlflow

# Set tracking URI to your MLflow server
mlflow.set_tracking_uri("http://localhost:5000")


@mlflow.trace
def my_function():
    return "Hello World"


# Trace UI will appear automatically in the notebook
my_function()
To control the display:
# Disable notebook display
mlflow.tracing.disable_notebook_display()
# Enable notebook display
mlflow.tracing.enable_notebook_display()
Q: How can I customize the request and response previews in the UI?
You can customize what appears in the Request and Response columns of the trace list using mlflow.update_current_trace():
@mlflow.trace
def predict(messages: list[dict]) -> str:
    # Customize the request preview for long message histories
    custom_preview = f'{messages[0]["content"][:10]} ... {messages[-1]["content"][:10]}'
    mlflow.update_current_trace(request_preview=custom_preview)

    # Your model logic here
    result = process_messages(messages)

    # Customize response preview
    mlflow.update_current_trace(response_preview=f"Result: {result[:50]}...")
    return result
Tags and Metadata
Q: How do I add tags to traces?
There are several ways to add tags depending on when you want to set them:
During execution (for active traces):
@mlflow.trace
def my_func(x):
    mlflow.update_current_trace(tags={"user_id": "123", "session": "abc"})
    return x + 1
After completion (for finished traces):
from mlflow.client import MlflowClient
client = MlflowClient()
client.set_trace_tag(trace_id="trace_id", key="tag_key", value="tag_value")
Via the MLflow UI: Click the pencil icon next to tags in the trace view to edit them directly.
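Tags on finished traces can also be removed; a minimal sketch, assuming your MLflow version names the parameter trace_id as in the set_trace_tag example above:

# Remove a tag from a finished trace
client.delete_trace_tag(trace_id="trace_id", key="tag_key")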
Q: What are the standard MLflow tags I should use?
MLflow provides standard tags for common use cases:
- mlflow.trace.user: Associate traces with specific users
- mlflow.trace.session: Group traces from multi-turn conversations
mlflow.update_current_trace(
    tags={
        "mlflow.trace.user": "user-123",
        "mlflow.trace.session": "session-abc-456",
        "environment": "production",
        "app_version": "1.0.0",
    }
)
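Consistent tags also make traces easy to find later. A sketch using a tag filter with mlflow.search_traces (the experiment ID is illustrative, and the exact filter grammar may vary slightly across MLflow versions):

import mlflow

# Find production traces in an experiment by tag
traces = mlflow.search_traces(
    experiment_ids=["1"],  # illustrative experiment ID
    filter_string="tags.environment = 'production'",
)
print(traces)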
Production and Performance
Q: Can I use MLflow Tracing for production applications?
Yes, MLflow Tracing is stable and designed to be used in production environments.
When running MLflow Tracing in production, we recommend using the MLflow Tracing SDK (mlflow-tracing) to instrument your code, models, and agents. It has a minimal set of dependencies and a smaller installation footprint, making it a good fit for production environments that need an efficient, lightweight tracing solution. Please refer to the Production Monitoring section for more details.
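For example, in a production image you would install the SDK package in place of the full MLflow distribution:

pip install mlflow-tracing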
Q: How do I enable asynchronous trace logging?
Asynchronous logging can significantly reduce performance overhead (about 80% for typical workloads):
import mlflow

# Enable async logging
mlflow.config.enable_async_logging()

# Traces will be logged asynchronously
with mlflow.start_span(name="foo") as span:
    span.set_inputs({"a": 1})
    span.set_outputs({"b": 2})

# Manually flush if needed
mlflow.flush_trace_async_logging()
Configuration options:
You can configure the detailed behavior of asynchronous logging using the following environment variables:
| Environment Variable | Description | Default |
| --- | --- | --- |
| MLFLOW_ASYNC_TRACE_LOGGING_MAX_WORKERS | Maximum worker threads | 10 |
| MLFLOW_ASYNC_TRACE_LOGGING_MAX_QUEUE_SIZE | Maximum queued traces | 1000 |
| MLFLOW_ASYNC_TRACE_LOGGING_RETRY_TIMEOUT | Retry timeout in seconds | 500 |
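For example, following the os.environ pattern used elsewhere in this FAQ, you could raise the worker count (set it before MLflow begins logging traces):

import os

# Allow up to 20 background workers for async trace logging
os.environ["MLFLOW_ASYNC_TRACE_LOGGING_MAX_WORKERS"] = "20"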
Troubleshooting
Q: I cannot open my trace in the MLflow UI. What should I do?
There are multiple possible reasons why a trace may not be viewable in the MLflow UI:
- The trace is not completed yet: If the trace is still being collected, MLflow cannot display spans in the UI. Ensure that all spans are properly ended with either "OK" or "ERROR" status (see the sketch after this list for one way to inspect span statuses).
- The browser cache is outdated: When you upgrade MLflow to a new version, the browser cache may contain outdated data and prevent the UI from displaying traces correctly. Clear your browser cache (Shift+F5) and refresh the page.
- MLflow server connectivity: Ensure your MLflow tracking server is running and accessible:
  mlflow ui --host 0.0.0.0 --port 5000
- Experiment permissions: Verify you have access to the experiment containing the trace.
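To check whether all spans actually closed, here is a minimal sketch that fetches the trace and prints span statuses (the trace ID is illustrative):

import mlflow

# Fetch a trace by ID and inspect its spans
trace = mlflow.get_trace("trace_id")
for span in trace.data.spans:
    print(span.name, span.status)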
Q: The model execution gets stuck and my trace is "in progress" forever.
Sometimes a model or an agent gets stuck in a long-running operation or an infinite loop, causing the trace to be stuck in the "in progress" state.
To prevent this, you can set a timeout for the trace using the MLFLOW_TRACE_TIMEOUT_SECONDS environment variable. If the trace exceeds the timeout, MLflow automatically halts it with ERROR status and exports it to the backend, so you can analyze the spans to identify the issue. By default, no timeout is set.
The timeout only applies to the MLflow trace: the main program, model, or agent continues to run even after the trace is halted.
For example, the following code sets the timeout to 5 seconds and simulates how MLflow handles a long-running operation:
import mlflow
import os
import time

# Set the timeout to 5 seconds for demonstration purposes
os.environ["MLFLOW_TRACE_TIMEOUT_SECONDS"] = "5"


# Simulate a long-running operation
@mlflow.trace
def long_running():
    for _ in range(10):
        child()


@mlflow.trace
def child():
    time.sleep(1)


long_running()
MLflow monitors the trace execution time and expiration in a background thread. By default, this check is performed every second and resource consumption is negligible. If you want to adjust the interval, you can set the MLFLOW_TRACE_TIMEOUT_CHECK_INTERVAL_SECONDS environment variable.
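For example, following the same os.environ pattern as above:

import os

# Check for expired traces every 5 seconds instead of every second
os.environ["MLFLOW_TRACE_TIMEOUT_CHECK_INTERVAL_SECONDS"] = "5"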
Q: My traces are not appearing in the MLflow UI. What could be wrong?
Several issues could prevent traces from appearing:
Tracking URI not set: Ensure your tracking URI is configured:
import mlflow
mlflow.set_tracking_uri("http://localhost:5000") # or your server URL
Experiment not set: Make sure you're logging to the correct experiment:
mlflow.set_experiment("my-tracing-experiment")
Autolog not called: For supported libraries, ensure autolog is called before usage:
mlflow.openai.autolog() # Call before using OpenAI
Multi-threading and Concurrency
Q: My trace is split into multiple traces when doing multi-threading. How can I combine them into a single trace?
Because MLflow Tracing stores the active trace context in a Python ContextVar, each thread gets its own trace context by default. You can still produce a single trace for a multi-threaded application by copying the current context into each worker thread, which propagates the active parent span to the workers.
Here's a quick example:
import contextvars
import mlflow
from concurrent.futures import ThreadPoolExecutor


@mlflow.trace
def worker_function(data):
    # Worker logic here
    return process_data(data)


@mlflow.trace
def main_function(data_list):
    with ThreadPoolExecutor() as executor:
        futures = []
        for data in data_list:
            # Copy context to worker thread
            ctx = contextvars.copy_context()
            futures.append(executor.submit(ctx.run, worker_function, data))
        results = [future.result() for future in futures]
    return results
Q: Does MLflow Tracing work with async/await code?
Yes, MLflow Tracing supports async functions. The @mlflow.trace decorator works seamlessly with async functions:
import asyncio
import mlflow


@mlflow.trace
async def async_function(query: str):
    # Async operations are traced normally
    result = await some_async_operation(query)
    return result


# Usage
asyncio.run(async_function("test query"))
Configuration and Control
Q: How do I temporarily disable tracing?
To disable tracing, call the mlflow.tracing.disable() API. It stops the collection of trace data within MLflow and stops logging any trace data to the MLflow Tracking service.
To re-enable tracing (if it had been temporarily disabled), call the mlflow.tracing.enable() API. Tracing resumes for instrumented models that are invoked.
import mlflow

# Disable tracing
mlflow.tracing.disable()


# Your traced functions won't generate traces
@mlflow.trace
def my_function():
    return "No trace generated"


my_function()

# Re-enable tracing
mlflow.tracing.enable()

# Now traces will be generated again
my_function()  # This will generate a trace
Q: Can I enable/disable tracing for my application without modifying code?
Yes, you can use environment variables and global configuration:
Environment variable: Set MLFLOW_TRACING_ENABLED=false to disable all tracing:
export MLFLOW_TRACING_ENABLED=false
python your_app.py # No traces will be generated
Conditional tracing: Use programmatic control:
import mlflow
import os

# Only trace in development
if os.getenv("ENVIRONMENT") == "development":
    mlflow.openai.autolog()
MLflow Runs Integration
Q: How do I associate traces with MLflow runs?
If a trace is generated within a run context, it will automatically be associated with that run:
import mlflow

# Create and activate an experiment
mlflow.set_experiment("Run Associated Tracing")

# Start a new MLflow Run
with mlflow.start_run() as run:
    # Traces created here are associated with the run
    with mlflow.start_span(name="Run Span") as parent_span:
        parent_span.set_inputs({"input": "a"})
        parent_span.set_outputs({"response": "b"})
You can then retrieve traces for a specific run:
# Retrieve traces associated with a specific Run
traces = mlflow.search_traces(run_id=run.info.run_id)
print(traces)
Data Management
Q: How do I delete traces?
You can delete traces using the mlflow.client.MlflowClient.delete_traces() method:
from mlflow.client import MlflowClient
import time

client = MlflowClient()

# Get the current timestamp in milliseconds
current_time = int(time.time() * 1000)

# Delete traces older than a specific timestamp
deleted_count = client.delete_traces(
    experiment_id="1", max_timestamp_millis=current_time, max_traces=10
)
Deleting a trace is irreversible. Ensure that the arguments you pass to the delete_traces API match the intended deletion range.
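Specific traces can also be targeted by ID; a sketch assuming your MLflow version's delete_traces supports the trace_ids argument (the IDs below are illustrative):

# Delete specific traces by ID
client.delete_traces(experiment_id="1", trace_ids=["trace_id_1", "trace_id_2"])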
Read more about trace deletion.
Q: Where are my traces stored?
Traces are stored in your MLflow tracking backend:
- Local filesystem: When using mlflow ui locally, traces are stored in the mlruns directory
- Remote tracking server: When using a remote MLflow server, traces are stored in the configured backend (database + artifact store)
- Database: Trace metadata is stored in the MLflow tracking database
- Artifact store: Large trace data may be stored in the artifact store (filesystem, S3, etc.)
Integration and Compatibility
Q: Is MLflow Tracing compatible with other observability tools?
Yes, MLflow Tracing is built on OpenTelemetry standards and can integrate with other observability tools:
- OpenTelemetry export: Export traces to OTLP-compatible systems
- Custom exporters: Build custom integrations for your observability stack
- Standard formats: Use industry-standard trace formats for interoperability
For production monitoring, see Production Tracing for integration patterns.
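As a sketch, MLflow can send traces to an OpenTelemetry collector via the standard OTLP environment variable (this assumes an OTLP exporter package is installed; the endpoint value is illustrative and must be set before any traces are generated):

import os
import mlflow

# Route traces to an OTLP collector instead of the MLflow backend
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"


@mlflow.trace
def handler(query: str) -> str:
    return query.upper()


handler("hello")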
Q: Can I create custom manual traces and spans?
Yes, MLflow provides comprehensive manual tracing capabilities. Please refer to the Manual Tracing guide for detailed information on creating traces and spans manually using decorators, context managers, and low-level APIs.
Getting Help
Q: Where can I find more help or report issues?
- Documentation: Start with the MLflow Tracing documentation
- GitHub Issues: Report bugs or request features at MLflow GitHub
- Community: Join discussions in the MLflow Slack community
- Stack Overflow: Search or ask questions tagged with mlflow
- Databricks Support: For managed MLflow features, contact Databricks support
For additional questions or issues not covered here, please check the MLflow documentation or reach out to the community.