MLflow Tracing for LLM Observability

MLflow Tracing is a feature that enhances LLM observability in your Generative AI (GenAI) applications by capturing detailed information about the execution of your application's services. Tracing provides a way to record the inputs, outputs, and metadata associated with each intermediate step of a request, enabling you to easily pinpoint the source of bugs and unexpected behaviors.

MLflow Tracing offers automatic, no-code-added integrations with over 20 popular GenAI libraries, providing immediate observability with just a single line of code. For any other Python-based GenAI code or custom components, MLflow's flexible instrumentation APIs can be used to capture detailed traces, regardless of the specific frameworks in use.

Ready to get started? See how to Instrument your app.

Why Choose MLflow Tracing?

🪽 Free and Open - MLflow is open source and 100% FREE. You don't need to pay additional SaaS costs to add observability to your GenAI stack. Your trace data is hosted on your own infrastructure.

🥇 Standard - MLflow Tracing is compatible with OpenTelemetry, an industry-standard observability spec. You can export your trace data to various services in your existing observability stack, such as Grafana, Prometheus, Datadog, New Relic, and more.

🤝 Framework Support - MLflow Tracing integrates with 20+ GenAI libraries, including OpenAI, LangChain, LlamaIndex, DSPy, and others. See the Automatic Tracing section for the full list of supported libraries.

🔄 End-to-End - MLflow is designed for managing the end-to-end machine learning lifecycle. With its model tracking and evaluation capabilities, MLflow empowers you to leverage your trace data fully.

👥 Community - MLflow boasts a vibrant open-source community as part of the Linux Foundation. With 19,000+ GitHub Stars and 15MM+ monthly downloads, MLflow is a trusted standard in the MLOps/LLMOps ecosystem.

Use Cases Throughout the ML Lifecycle

MLflow Tracing empowers you throughout the end-to-end lifecycle of a machine learning project. Here's how it helps you at each step of the workflow:

Complete Debugging Experience in Your IDE or Notebook

MLflow's tracing capabilities provide deep insights into what happens beneath the abstractions of GenAI libraries, helping you precisely identify where issues occur.

You can navigate traces seamlessly within your preferred IDE, notebook, or the MLflow UI, eliminating the hassle of switching between multiple tabs or searching through an overwhelming list of traces.

Monitor Performance and Optimize Costs

Understanding and optimizing the performance of your GenAI applications is crucial for maintaining efficient operations. MLflow Tracing enables you to capture and monitor key operational metrics such as latency and execution timing at each step of your application's execution.

This comprehensive monitoring capability allows you to track and identify performance bottlenecks within complex pipelines, monitor execution efficiency to ensure optimal operation, and identify areas for performance improvement in your code or model interactions. By understanding where time is spent in your application flow, you can make informed decisions about optimization strategies.
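
For example, once a trace has been captured, you can see where time goes by walking its spans. The sketch below assumes MLflow 3's mlflow.get_trace() fluent API and nanosecond span timestamps; the trace ID is a placeholder.

import mlflow

# Fetch a previously logged trace by its ID (placeholder value shown).
trace = mlflow.get_trace("tr-1234567890abcdef")

# Each span carries nanosecond start/end timestamps, so per-step latency
# is a simple difference.
for span in trace.data.spans:
    duration_ms = (span.end_time_ns - span.start_time_ns) / 1e6
    print(f"{span.name}: {duration_ms:.1f} ms")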

Evaluate and Enhance Application Quality

Systematically assessing and improving the quality of your GenAI applications is a core challenge. MLflow Tracing helps by allowing you to attach and track user feedback and the results of quality evaluations (from LLM judges or custom metrics) directly to your traces.
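
For example, a reviewer's verdict can be recorded against a specific trace after the fact. The sketch below assumes MLflow 3's feedback API; the trace ID is a placeholder for one captured from your application.

import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

# Attach a human feedback record to an existing trace (placeholder ID).
mlflow.log_feedback(
    trace_id="tr-1234567890abcdef",
    name="correctness",
    value=True,
    source=AssessmentSource(
        source_type=AssessmentSourceType.HUMAN,
        source_id="reviewer@example.com",
    ),
    rationale="The answer matched the reference documentation.",
)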

This enables comprehensive quality assessment throughout your application's lifecycle:

Evaluate traces using human reviewers or LLM judges to measure accuracy, relevance, and other quality aspects.

Track quality improvements as you iterate on prompts, models, or retrieval strategies, and identify patterns in quality issues (e.g., specific types of queries that lead to poor responses).

This data-driven approach enables you to make systematic improvements to your application.

Traces from both evaluation runs and production monitoring can be explored to identify root causes of quality issuesβ€”for instance, insufficiently retrieved documents in a RAG system or degraded performance of a specific model. Traces empower you to analyze these issues in detail and iterate quickly.

Moreover, traces are invaluable for building high-quality evaluation datasets. By capturing real user interactions and their outcomes, you can curate representative test cases based on actual usage patterns, build comprehensive evaluation sets that cover diverse scenarios, and use this data to fine-tune models or improve retrieval mechanisms.

Combined with MLflow GenAI Evaluation, tracing offers a seamless experience for assessing and improving your application's quality.

Industry-Standard Observability with OpenTelemetry

MLflow Tracing is built on OpenTelemetry, the industry-standard open source specification for observability. This foundation ensures that your tracing implementation follows widely accepted standards and provides interoperability with the broader observability ecosystem.

The OpenTelemetry compatibility means you can export your MLflow trace data to various monitoring and observability services in your existing infrastructure stack, including Grafana, Prometheus, Datadog, New Relic, and other OpenTelemetry-compatible systems. This standardization gives you the flexibility to integrate MLflow Tracing into your current monitoring workflows without vendor lock-in.

By conforming to OpenTelemetry standards, MLflow Tracing ensures that your observability investment is portable and future-proof, allowing you to leverage the growing ecosystem of OpenTelemetry-compatible tools and services. See Export Traces to Other Services for detailed integration instructions.
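
As a minimal sketch, routing traces to an OpenTelemetry Collector is typically a matter of setting the standard OTLP environment variable before starting your application; the endpoint below assumes a local Collector on the default OTLP gRPC port.

export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317"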

Automatic Tracing

MLflow Tracing is integrated with various GenAI libraries and provides a one-line automatic tracing experience for each library (and combinations of them!). Here are some of the 20+ supported frameworks:

Popular Frameworks: OpenAI, LangChain, LangGraph, LlamaIndex, DSPy, Anthropic, AutoGen, AG2, CrewAI, OpenAI Swarm

Model Providers: Bedrock, Gemini, LiteLLM, Ollama, Groq, Mistral, DeepSeek

Specialized Tools: Instructor, txtai, Smolagents, PydanticAI

For the complete list and detailed integration examples, see the Automatic Tracing documentation and Integrations page.

This broad support means you can gain observability without significant code changes, leveraging the tools you already use. For custom components or unsupported libraries, MLflow also provides powerful manual tracing APIs.
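
For example, an application that uses LangChain orchestration over OpenAI calls can enable both integrations side by side, and spans from the two libraries are stitched into a single trace. This sketch uses the documented one-line autolog calls:

import mlflow

# Enable tracing for each library the app uses; the integrations compose,
# so nested calls appear as child spans within one trace.
mlflow.openai.autolog()
mlflow.langchain.autolog()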

Manual Tracing

In addition to the one-line auto-tracing experience, MLflow offers a Python SDK for manually instrumenting your code and manipulating traces:
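
As a minimal sketch, the two core primitives are the @mlflow.trace decorator for functions and the mlflow.start_span context manager for arbitrary code blocks; the function and span names below are hypothetical.

import mlflow

# Decorating a function captures its inputs, outputs, latency, and any
# exception as a span.
@mlflow.trace
def generate_answer(question: str) -> str:
    # Wrap an inner step in its own child span and attach custom data.
    with mlflow.start_span(name="retrieve_context") as span:
        context = "...retrieved documents..."  # placeholder retrieval step
        span.set_inputs({"question": question})
        span.set_outputs({"context": context})
    return f"Answer based on: {context}"

generate_answer("What is MLflow Tracing?")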

Refer to the Manual Tracing Guide for complete details about the SDK.

Reviewing and Querying Traces

MLflow Traces can be reviewed and analyzed in several ways:

MLflow UI: The MLflow UI provides a rich interface for exploring traces. You can view traces for a specific experiment or run, and search and filter them based on various criteria. Start the UI by running mlflow ui in your terminal and navigating to http://localhost:5000.

Jupyter Notebook: The trace UI is also available within Jupyter notebooks! The trace UI will automatically be displayed when a cell generates a trace, eliminating the need to switch between the notebook and web browser.

Programmatic Access: Use the Python APIs to search traces with filters and conditions via mlflow.search_traces(), retrieve specific traces for detailed analysis, export data for custom analysis or integration with other tools, and build custom dashboards and monitoring solutions.

Trace data is useful for various downstream tasks, such as creating evaluation datasets for offline evaluation and production monitoring. MLflow provides several APIs to search and retrieve recorded traces programmatically. See Searching and Retrieving Traces for more details.
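
For instance, recent traces can be pulled into a pandas DataFrame and sorted by latency. This is a minimal sketch: the experiment ID is a placeholder, and the execution_time_ms column name reflects the documented search_traces() output, which may vary by MLflow version.

import mlflow

# Returns a pandas DataFrame with one row per trace, including request,
# response, status, and timing columns.
traces = mlflow.search_traces(
    experiment_ids=["1"],  # placeholder experiment ID
    max_results=100,
)

# Surface the slowest requests first.
print(traces.sort_values("execution_time_ms", ascending=False).head())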

Production Monitoring

MLflow Tracing is production ready and provides comprehensive monitoring capabilities for your GenAI applications in production environments. The tracing system captures detailed execution information that can be integrated with your existing observability stack through OpenTelemetry standards.

For production deployments, consider the Lightweight Tracing SDK (mlflow-tracing), which is optimized to reduce the total installation size and minimize dependencies while maintaining full tracing capabilities.

Read Production Tracing for complete guidance on using MLflow Tracing for monitoring models in production and various backend configuration options.

Getting Started

MLflow Version Recommendation

While tracing features are available in MLflow 2.14.0+, it is strongly recommended to install MLflow 3 for the latest GenAI capabilities, including expanded tracing features and robust support. For production environments, consider the lightweight mlflow-tracing package if you only need tracing functionality.

MLflow 3 provides enhanced support for production-grade tracing, including advanced feedback and labeling functionalities crucial for managing GenAI applications.

Installation

Install MLflow with the following command:

pip install --upgrade mlflow

For production environments focused solely on tracing:

pip install mlflow-tracing

Start Tracing in Minutes

Getting started with MLflow Tracing is remarkably simple. For many popular GenAI libraries, you can enable comprehensive tracing with just a single line of code:

import mlflow
import openai

# Enable automatic tracing - that's it!
mlflow.openai.autolog()

# Your existing code works unchanged and is now fully traced
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain MLflow Tracing"}],
    max_tokens=100,
)

MLflow Tracing captures inputs, outputs, latencies, and metadata automatically. Start the MLflow UI with mlflow ui to view your traces at http://localhost:5000.

Next Steps for Your Application

Start with the App Instrumentation Guide for a complete walkthrough of adding tracing to your application. Use Automatic Tracing for one-line tracing with 20+ supported libraries, or Manual Tracing for custom instrumentation of any Python code.

Integrating MLflow Tracing into your GenAI projects enables you to start logging detailed execution data, which is the foundation for debugging, performance analysis, and quality evaluation.

note

MLflow Tracing support is available with MLflow 2.14.0+, but we strongly recommend MLflow 3 for the latest features and enhanced production support.