MLflow Tracing for LLM Observability
MLflow Tracing is a feature that enhances LLM observability in your Generative AI (GenAI) applications by capturing detailed information about the execution of your application's services. Tracing provides a way to record the inputs, outputs, and metadata associated with each intermediate step of a request, enabling you to easily pinpoint the source of bugs and unexpected behaviors.
MLflow Tracing offers automatic integrations with over 20 popular GenAI libraries, providing immediate observability with a single line of code. For any other Python-based GenAI code or custom components, MLflow's flexible instrumentation APIs can be used to capture detailed traces, regardless of the specific frameworks in use.
Ready to get started? See how to Instrument your app.
Why Choose MLflow Tracing?
Free and Open - MLflow is open source and 100% FREE. You don't need to pay additional SaaS costs to add observability to your GenAI stack. Your trace data is hosted on your own infrastructure.
Standard - MLflow Tracing is compatible with OpenTelemetry, an industry-standard observability spec. You can export your trace data to various services in your existing observability stack, such as Grafana, Prometheus, Datadog, New Relic, and more.
Framework Support - MLflow Tracing integrates with 20+ GenAI libraries, including OpenAI, LangChain, LlamaIndex, DSPy, and others. See the Automatic Tracing section for the full list of supported libraries.
End-to-End - MLflow is designed for managing the end-to-end machine learning lifecycle. With its model tracking and evaluation capabilities, MLflow empowers you to leverage your trace data fully.
Community - MLflow boasts a vibrant Open Source community as a part of the Linux Foundation. With 19,000+ GitHub Stars and 15MM+ monthly downloads, MLflow is a trusted standard in the MLOps/LLMOps ecosystem.
Use Cases Throughout the ML Lifecycle
MLflow Tracing empowers you throughout the end-to-end lifecycle of a machine learning project. Here's how it helps you at each step of the workflow:
- Debugging
- Evaluation Dataset
- Inspect Quality
- Production Monitoring
Complete Debugging Experience in Your IDE or Notebook
MLflow's tracing capabilities provide deep insights into what happens beneath the abstractions of GenAI libraries, helping you precisely identify where issues occur.
You can navigate traces within your preferred IDE, notebook, or the MLflow UI, eliminating the hassle of switching between multiple tabs or searching through an overwhelming list of traces.
Building a High-Quality Evaluation Dataset
Evaluating the performance of your GenAI application is crucial, but creating a reliable evaluation dataset can be challenging. Traces serve as a rich data source, helping you build high-quality datasets with precise metrics for internal components like retrievers and tools.
When combined with MLflow LLM Evaluation, MLflow offers a seamless experience for assessing and improving your application's performance.
Root Cause Analysis for Improved Quality
After evaluating your model using MLflow LLM Evaluation, you can explore auto-generated traces during the evaluation run to identify root causes of quality issues, such as insufficiently retrieved documents.
Traces empower you to analyze issues in detail and iterate quickly to enhance the quality of your application.
Monitor Applications with Your Favorite Observability Stack
Machine learning projects don't end with the first launch. Continuous monitoring and incremental improvement are critical to long-term success.
Integrated with various observability platforms such as Databricks, Datadog, Grafana, and Prometheus, MLflow Tracing provides a comprehensive solution for monitoring your GenAI applications in production. Refer to Monitoring GenAI Application in Production for more details.
Monitor Performance and Optimize Costs
Understanding and optimizing the performance of your GenAI applications is crucial for maintaining efficient operations. MLflow Tracing enables you to capture and monitor key operational metrics such as latency and execution timing at each step of your application's execution.
This comprehensive monitoring capability allows you to track and identify performance bottlenecks within complex pipelines, monitor execution efficiency to ensure optimal operation, and identify areas for performance improvement in your code or model interactions. By understanding where time is spent in your application flow, you can make informed decisions about optimization strategies.
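To make the bottleneck analysis concrete, here is a minimal plain-Python sketch of aggregating per-step timings. The span records, step names, and the summarize_latency helper are hypothetical stand-ins for timing data you might export from your traces; they are not part of the MLflow API or its trace data model.

```python
from collections import defaultdict

# Hypothetical span records: (step name, start time in ms, end time in ms).
spans = [
    ("retrieve_documents", 0, 420),
    ("rerank", 420, 510),
    ("generate_answer", 510, 1910),
    ("retrieve_documents", 2000, 2380),
    ("generate_answer", 2380, 3580),
]


def summarize_latency(records):
    """Return {step: (call_count, total_ms, mean_ms)}, slowest total first."""
    totals = defaultdict(lambda: [0, 0])  # step -> [call count, total duration]
    for name, start_ms, end_ms in records:
        totals[name][0] += 1
        totals[name][1] += end_ms - start_ms
    summary = {
        name: (count, total, total / count)
        for name, (count, total) in totals.items()
    }
    # Sort by total time spent so the biggest bottleneck comes first.
    return dict(sorted(summary.items(), key=lambda item: item[1][1], reverse=True))


latency = summarize_latency(spans)
for step, (count, total_ms, mean_ms) in latency.items():
    print(f"{step}: calls={count} total={total_ms}ms mean={mean_ms:.0f}ms")
```

Sorting by total time rather than mean time highlights the steps that dominate end-to-end latency, even when each individual call is moderately fast.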
Evaluate and Enhance Application Quality
Systematically assessing and improving the quality of your GenAI applications is a core challenge. MLflow Tracing helps by allowing you to attach and track user feedback and the results of quality evaluations (from LLM judges or custom metrics) directly to your traces.
This enables comprehensive quality assessment throughout your application's lifecycle:
- During Development: Evaluate traces using human reviewers or LLM judges to measure accuracy, relevance, and other quality aspects. Track quality improvements as you iterate on prompts, models, or retrieval strategies while identifying patterns in quality issues (e.g., specific types of queries that lead to poor responses). This data-driven approach enables you to make systematic improvements to your application.
- In Production: Monitor and assess quality in real-time by tracking quality metrics (derived from user feedback and evaluation results) across deployments. Identify sudden quality degradation or regressions, trigger alerts for critical quality issues, and help maintain quality Service Level Agreements (SLAs).
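As an illustration of spotting sudden quality degradation, the following plain-Python sketch compares recent quality scores against earlier history. The quality_regressed helper, the window size, and the threshold are assumptions chosen for this example; MLflow does not prescribe this particular check.

```python
from statistics import mean


def quality_regressed(scores, window=5, max_drop=0.15):
    """Flag a regression when the mean of the latest `window` scores falls
    more than `max_drop` below the mean of all earlier scores."""
    if len(scores) < 2 * window:
        return False  # not enough history to compare against
    baseline = mean(scores[:-window])
    recent = mean(scores[-window:])
    return baseline - recent > max_drop


# Hypothetical per-request quality scores in [0, 1], oldest first.
scores = [0.9, 0.92, 0.88, 0.91, 0.9, 0.89, 0.7, 0.65, 0.72, 0.68, 0.7]

if quality_regressed(scores):
    print("Quality regression detected: inspect the recent traces")
```

In practice the scores would come from user feedback or evaluation results attached to traces, and the alert would feed whatever notification channel your team already uses.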
Traces from both evaluation runs and production monitoring can be explored to identify root causes of quality issues, for instance insufficiently retrieved documents in a RAG system or degraded performance of a specific model. Traces empower you to analyze these issues in detail and iterate quickly.
Moreover, traces are invaluable for building high-quality evaluation datasets. By capturing real user interactions and their outcomes, you can curate representative test cases based on actual usage patterns, build comprehensive evaluation sets that cover diverse scenarios, and use this data to fine-tune models or improve retrieval mechanisms.
When combined with MLflow GenAI Evaluation, MLflow offers a seamless experience for assessing and improving your application's quality.
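The dataset curation described above can be sketched in plain Python. The record fields (request, response, feedback) and the curate helper are hypothetical and only illustrate the idea of turning captured interactions into test cases; adapt them to the shape of your own exported trace data.

```python
# Hypothetical records exported from production traces; field names are illustrative.
records = [
    {"request": "How do I enable tracing?", "response": "Call autolog().", "feedback": "positive"},
    {"request": "how do i enable tracing? ", "response": "Use autolog.", "feedback": "positive"},
    {"request": "Can I export traces?", "response": "I don't know.", "feedback": "negative"},
    {"request": "What is a span?", "response": "One step of a trace.", "feedback": None},
]


def curate(records):
    """Keep one reviewed example per distinct request.

    Requests are compared case-insensitively with whitespace collapsed, and
    records without feedback are skipped because they carry no quality label.
    """
    seen = set()
    dataset = []
    for record in records:
        key = " ".join(record["request"].lower().split())
        if record["feedback"] is None or key in seen:
            continue
        seen.add(key)
        dataset.append(
            {
                "inputs": record["request"].strip(),
                "observed_response": record["response"],
                "label": record["feedback"],
            }
        )
    return dataset


eval_dataset = curate(records)
for example in eval_dataset:
    print(example["label"], "-", example["inputs"])
```

Keeping negatively rated examples alongside positive ones gives the evaluation set coverage of the failure cases you most want to fix.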
Industry-Standard Observability with OpenTelemetry
MLflow Tracing is built on OpenTelemetry, the industry-standard open source specification for observability. This foundation ensures that your tracing implementation follows widely accepted standards and provides interoperability with the broader observability ecosystem.
The OpenTelemetry compatibility means you can export your MLflow trace data to various monitoring and observability services in your existing infrastructure stack, including Grafana, Prometheus, Datadog, New Relic, and other OpenTelemetry-compatible systems. This standardization gives you the flexibility to integrate MLflow Tracing into your current monitoring workflows without vendor lock-in.
By conforming to OpenTelemetry standards, MLflow Tracing ensures that your observability investment is portable and future-proof, allowing you to leverage the growing ecosystem of OpenTelemetry-compatible tools and services. See Export Traces to Other Services for detailed integration instructions.
Automatic Tracing
MLflow Tracing is integrated with various GenAI libraries and provides a one-line automatic tracing experience for each library (and combinations of them!). Here are some of the 20+ supported frameworks:
Popular Frameworks: OpenAI, LangChain, LangGraph, LlamaIndex, DSPy, Anthropic, AutoGen, AG2, CrewAI, OpenAI Swarm
Model Providers: Bedrock, Gemini, LiteLLM, Ollama, Groq, Mistral, DeepSeek
Specialized Tools: Instructor, txtai, Smolagents, PydanticAI
For the complete list and detailed integration examples, see the Automatic Tracing documentation and Integrations page.
This broad support means you can gain observability without significant code changes, leveraging the tools you already use. For custom components or unsupported libraries, MLflow also provides powerful manual tracing APIs.
Manual Tracing
In addition to the one-line auto tracing experience, MLflow offers a Python SDK for manually instrumenting your code and manipulating traces:
- Instrument a function with the @mlflow.trace decorator
- Instrument any block of code using the mlflow.start_span context manager
- Group or annotate traces using tags
- Disable tracing globally
Refer to the Manual Tracing Guide for complete details about the SDK.
Reviewing and Querying Traces
MLflow Traces can be reviewed and analyzed in several ways:
MLflow UI: The MLflow UI provides a rich interface for exploring traces. You can view traces for a specific experiment or run, and search and filter traces based on various criteria. Start the UI by running mlflow ui in your terminal and navigating to http://localhost:5000.
Jupyter Notebook: The trace UI is also available within Jupyter notebooks! The trace UI will automatically be displayed when a cell generates a trace, eliminating the need to switch between the notebook and web browser.
Programmatic Access: Query and analyze traces with Python APIs to search traces with filters and conditions using mlflow.search_traces(), retrieve specific traces for detailed analysis, export data for custom analysis or integration with other tools, and build custom dashboards and monitoring solutions.
Trace data is useful for various downstream tasks, such as creating evaluation datasets for offline evaluation and production monitoring. MLflow provides several APIs to search and retrieve recorded traces programmatically. See Searching and Retrieving Traces for more details.
Production Monitoring
MLflow Tracing is production-ready and provides comprehensive monitoring capabilities for your GenAI applications in production environments. The tracing system captures detailed execution information that can be integrated with your existing observability stack through OpenTelemetry standards.
For production deployments, consider using the Lightweight Tracing SDK (mlflow-tracing), which is optimized for reducing the total installation size and minimizing dependencies while maintaining full tracing capabilities.
Read Production Tracing for complete guidance on using MLflow Tracing for monitoring models in production and various backend configuration options.
Getting Started
While tracing features are available in MLflow 2.14.0+, it is strongly recommended to install MLflow 3 for the latest GenAI capabilities, including expanded tracing features and robust support. For production environments, consider the lightweight mlflow-tracing package if you only need tracing functionality.
MLflow 3 provides enhanced support for production-grade tracing, including advanced feedback and labeling functionalities crucial for managing GenAI applications.
Installation
Install MLflow with the following command:
pip install --upgrade "mlflow"
For production environments focused solely on tracing:
pip install mlflow-tracing
Start Tracing in Minutes
Getting started with MLflow Tracing is remarkably simple. For many popular GenAI libraries, you can enable comprehensive tracing with just a single line of code:
import mlflow
import openai

# Enable automatic tracing - that's it!
mlflow.openai.autolog()

# Your existing code works unchanged and is now fully traced
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain MLflow Tracing"}],
    max_tokens=100,
)
MLflow Tracing captures inputs, outputs, latencies, and metadata automatically. Start the MLflow UI with mlflow ui to view your traces at http://localhost:5000.
Next Steps for Your Application
- Instrument Your App: Start with the App Instrumentation Guide, a complete guide to adding tracing to your application. Use Automatic Tracing for one-line tracing with 20+ supported libraries, or Manual Tracing for custom instrumentation with any Python code.
- Library Integrations: Explore specific integrations for your framework, including OpenAI Integration to trace OpenAI API calls, LangChain Integration to trace LangChain applications, LlamaIndex Integration to trace LlamaIndex workflows, and All Integrations to browse all supported libraries.
- Analyze Traces: Learn how to work with your trace data through Viewing Traces to explore traces in the MLflow UI, Querying Traces to programmatically search and retrieve traces, and the Trace Data Model to understand trace structure and components.
Visit the Instrument your app guide to learn how to integrate MLflow Tracing into your GenAI projects. This will enable you to start logging detailed execution data, which is the foundation for debugging, performance analysis, and quality evaluation.
MLflow Tracing support is available with MLflow 2.14.0+, but we strongly recommend MLflow 3 for the latest features and enhanced production support.