Monitoring GenAI Applications in Production

Machine learning projects don't conclude with their initial launch. Ongoing monitoring and incremental enhancements are critical for long-term success. MLflow Tracing offers observability for your production application, supporting the iterative process of continuous improvement.

Pro Tip: Using the Lightweight Tracing SDK to Reduce the Footprint

The MLflow Tracing SDK mlflow-tracing is a lightweight package that includes only the minimum set of dependencies needed to instrument your code, models, and agents with MLflow Tracing.

Key Benefits

  • ⚡️ Faster Deployment: Significantly smaller package size and fewer dependencies enable quicker deployments in containers and serverless environments
  • 🔧 Simple Dependency Management: Reduced dependencies mean less maintenance overhead and fewer potential conflicts
  • 📦 Enhanced Portability: Easily deploy across different platforms with minimal compatibility concerns
  • 🔒 Improved Security: Smaller attack surface with fewer dependencies reduces security risks

warning

When installing the MLflow Tracing SDK, make sure the environment does not have the full MLflow package installed. Having both in the same environment might cause conflicts and unexpected behaviors.
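
As a minimal sketch, a production image could install only the tracing SDK and instrument application code with the @mlflow.trace decorator. The tracking URI, experiment name, and function below are illustrative placeholders, not part of the official setup:

pip install mlflow-tracing

import mlflow

# Point the tracing SDK at your tracking backend (URL is a placeholder)
mlflow.set_tracking_uri("http://your-mlflow-server:5000")
mlflow.set_experiment("genai-production")  # hypothetical experiment name


@mlflow.trace
def answer_question(question: str) -> str:
    # Your LLM or agent call goes here; a placeholder response is returned
    return f"Echo: {question}"


answer_question("What is MLflow Tracing?")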

If you are looking for a managed solution that monitors your GenAI application with complete observability powered by MLflow Tracing, we recommend Lakehouse Monitoring for GenAI on Databricks.

info

Don't have a Databricks account? Sign up for free and get started in a minute!

This solution provides instant access to a fully functional monitoring system and dashboard for your GenAI application, which lets you:

  • Track operational metrics such as request volume, latency, errors, and cost.
  • Monitor quality metrics such as correctness, safety, context sufficiency, and more using managed evaluation.
  • Configure custom metrics with Python functions.
  • Perform root cause analysis by inspecting the traces recorded by MLflow Tracing.

Lakehouse Monitoring for GenAI can be used for your GenAI application regardless of whether it is hosted on Databricks. You can run the application on any cloud or on-premises and configure MLflow Tracing to send traces to Databricks to monitor it.
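
As a sketch of this setup, pointing MLflow Tracing at a Databricks workspace might look like the following; the host, token, and experiment path are placeholders, and in production the credentials would typically come from a secret manager:

import os

import mlflow

# Databricks workspace credentials (placeholders)
os.environ["DATABRICKS_HOST"] = "https://<your-workspace>.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "<your-personal-access-token>"

# Send traces to an MLflow experiment in the Databricks workspace
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/genai-monitoring")  # hypothetical experiment path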

For more details about the product and how to set it up, please refer to the Lakehouse Monitoring for GenAI documentation.

Configurations

When using Lakehouse Monitoring for GenAI, MLflow logs traces asynchronously by default, allowing your application to continue serving requests without waiting for trace logging to complete.

The following environment variables control trace logging behavior:

  • MLFLOW_ENABLE_ASYNC_TRACE_LOGGING: Whether to log traces asynchronously. When set to False, traces are logged in a blocking manner. Default: True
  • MLFLOW_ASYNC_TRACE_LOGGING_MAX_WORKERS: The maximum number of worker threads used for async trace logging per process. Increasing this allows higher trace logging throughput, but also increases CPU usage and memory consumption. Default: 10
  • MLFLOW_ASYNC_TRACE_LOGGING_MAX_QUEUE_SIZE: The maximum number of traces that can be queued before being logged to the backend by the worker threads. When the queue is full, new traces are discarded. Increasing this improves trace logging durability, but also increases memory consumption. Default: 1000
  • MLFLOW_ASYNC_TRACE_LOGGING_RETRY_TIMEOUT: The timeout, in seconds, for retrying failed trace logging. When trace logging fails, it is retried with backoff up to this timeout, after which the trace is discarded. Default: 500
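
As an illustrative sketch, these variables can be set in the application environment before MLflow is initialized; the values shown here are arbitrary and should be tuned to your workload:

import os

# Raise trace-logging throughput and queue capacity (values are illustrative)
os.environ["MLFLOW_ASYNC_TRACE_LOGGING_MAX_WORKERS"] = "20"
os.environ["MLFLOW_ASYNC_TRACE_LOGGING_MAX_QUEUE_SIZE"] = "2000"

# Or log traces synchronously instead of asynchronously
# os.environ["MLFLOW_ENABLE_ASYNC_TRACE_LOGGING"] = "False"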

OpenTelemetry Integration

Traces generated by MLflow are compatible with the OpenTelemetry trace specs. Therefore, MLflow traces can be exported to various observability platforms that support OpenTelemetry.

By default, MLflow exports traces to the MLflow Tracking Server. To enable exporting traces to an OpenTelemetry Collector, set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable (or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) to the target URL of the OpenTelemetry Collector before starting any trace.

pip install opentelemetry-exporter-otlp

import mlflow
import os

# Set the endpoint of the OpenTelemetry Collector
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"
# Optionally, set the service name to group traces
os.environ["OTEL_SERVICE_NAME"] = "<your-service-name>"

# Traces will be exported to the OTel collector at http://localhost:4317/v1/traces
with mlflow.start_span(name="foo") as span:
    span.set_inputs({"a": 1})
    span.set_outputs({"b": 2})

Refer to the vendor-specific guides to learn how to set up an OpenTelemetry Collector for your observability platform:

  • Datadog
  • New Relic
  • SigNoz
  • Splunk
  • Grafana
  • ServiceNow

Configurations

MLflow uses the standard OTLP Exporter to export traces to OpenTelemetry Collector instances, so you can use all of the configuration options supported by OpenTelemetry. The following example configures the OTLP Exporter to use the HTTP protocol instead of the default gRPC and sets custom headers:

export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317/v1/traces"
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_TRACES_HEADERS="api_key=12345"

warning

MLflow only exports traces to a single destination. When the OTEL_EXPORTER_OTLP_ENDPOINT environment variable is configured, MLflow will not export traces to the MLflow Tracking Server and you will not see traces in the MLflow UI.

Similarly, if you deploy the model to Databricks Model Serving with tracing enabled, using the OpenTelemetry Collector will result in traces not being recorded in the Inference Table.

Self-Hosted Tracking Server

You can keep using the MLflow Tracking Server to store production traces. However, the tracking server is optimized for offline experimentation and is generally not suitable for handling hyper-scale traffic. Therefore, we recommend one of the other two options for production monitoring use cases.

If you choose to keep using the tracking server in production, we strongly recommend running a SQL-based tracking server on top of a scalable database and artifact storage, as this is a key factor for write and query performance. Refer to the tracking server setup guide for more details. In addition, the tracking server retains trace data indefinitely by default, so it is recommended to set up a periodic deletion job using the SDK or REST API.
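
As a sketch of such a deletion job, assuming the MlflowClient.delete_traces API available in recent MLflow versions; the experiment ID and retention window below are placeholders, and the script would typically run on a schedule (e.g., cron or a workflow orchestrator):

import time

from mlflow import MlflowClient

client = MlflowClient()

# Delete traces older than 30 days from a given experiment
cutoff_ms = int(time.time() * 1000) - 30 * 24 * 60 * 60 * 1000
client.delete_traces(experiment_id="123456", max_timestamp_millis=cutoff_ms)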