Track Versions & Environments

Tracking environments, application versions, and custom contextual information in your GenAI application enables observability across deployment stages, releases, and business-specific dimensions. MLflow lets you attach this metadata to your traces as tags.

Why Track Environments & Context?

Attaching this metadata to your traces provides critical insights for:

Environment-specific analysis: Compare behavior across development, staging, and production environments

Version management: Track performance and regressions across different application versions (e.g., v1.0.1, v1.2.0)

Custom categorization: Add business-specific context (e.g., customer_tier: "premium", feature_flag: "new_algorithm")

Deployment validation: Ensure consistent behavior across different deployment targets

Root cause analysis: Quickly narrow down issues to specific environments, versions, or configurations

Standard & Custom Tags for Context

MLflow uses tags (key-value string pairs) to store contextual information on traces.

Automatically Populated Tags

These standard tags are automatically captured by MLflow based on your execution environment:

  • mlflow.source.name: The entry point or script that generated the trace (automatically populated with the filename for Python scripts, notebook name for Jupyter notebooks)
  • mlflow.source.git.commit: If run from a Git repository, the commit hash is automatically detected and populated
  • mlflow.source.type: NOTEBOOK when running in a Jupyter notebook, LOCAL when running a local Python script, UNKNOWN otherwise (automatically detected)

If you need more granular control, you can override these automatically populated tags using mlflow.update_current_trace() or mlflow.set_trace_tag().
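
For example, here is a minimal sketch of overriding the auto-detected source name inside a traced function (the tag value "nightly_batch_job" is a hypothetical job name used for illustration):

import mlflow


@mlflow.trace
def nightly_job():
    # Replace the auto-detected script name with a logical job name
    mlflow.update_current_trace(tags={"mlflow.source.name": "nightly_batch_job"})
    return "done"


nightly_job()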

Reserved Standard Tags

Some standard tags have special meaning but must be set manually:

  • mlflow.trace.session: Groups traces from multi-turn conversations or user sessions together
  • mlflow.trace.user: Associates traces with specific users for user-centric analysis
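
As a minimal sketch, both reserved tags can be attached with mlflow.update_current_trace() inside a traced function; the session and user IDs below are hypothetical placeholders:

import mlflow


@mlflow.trace
def handle_turn(message: str, session_id: str, user_id: str):
    # Reserved tags group this trace with others from the same session and user
    mlflow.update_current_trace(
        tags={
            "mlflow.trace.session": session_id,
            "mlflow.trace.user": user_id,
        }
    )
    return f"echo: {message}"


handle_turn("Hello!", session_id="session-abc-123", user_id="user-42")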

Custom Tags

You can define custom tags to capture any business-specific or application-specific context. Common examples include:

  • environment: e.g., "production", "staging" (from DEPLOY_ENV environment variable)
  • app_version: e.g., "1.0.0" (from APP_VERSION environment variable)
  • deployment_id: e.g., "deploy-abc-123" (from DEPLOYMENT_ID environment variable)
  • region: e.g., "us-east-1" (from REGION environment variable)
  • Feature flags and A/B test variants

Basic Implementation

Here's how to add various types of context as tags to your traces:

import mlflow
import os
import platform


@mlflow.trace
def process_data_with_context(data: dict, app_config: dict):
    """Process data and add environment, version, and custom context."""

    current_env = os.getenv("APP_ENVIRONMENT", "development")
    current_app_version = app_config.get("version", "unknown")
    current_model_version = app_config.get("model_in_use", "gpt-3.5-turbo")

    # Define custom context tags
    context_tags = {
        "environment": current_env,
        "app_version": current_app_version,
        "model_version": current_model_version,
        "python_version": platform.python_version(),
        "operating_system": platform.system(),
        "data_source": data.get("source", "batch"),
        "processing_mode": "online" if current_env == "production" else "offline",
    }

    # Add tags to the current trace
    mlflow.update_current_trace(tags=context_tags)

    # Your application logic here...
    result = f"Processed '{data['input']}' in {current_env} with app {current_app_version}"

    return result


# Example usage
config = {"version": "1.1.0", "model_in_use": "claude-3-sonnet-20240229"}
input_data = {"input": "Summarize this document...", "source": "realtime_api"}

processed_result = process_data_with_context(input_data, config)
print(processed_result)

Key points:

  • Use os.getenv() to fetch environment variables (e.g., APP_ENVIRONMENT, APP_VERSION)
  • Pass application or model configurations into your traced functions
  • Use platform module for system information
  • mlflow.update_current_trace() adds all key-value pairs to the active trace

Querying and Analyzing Context Data

Using the MLflow UI

In the MLflow UI (Traces tab), use the search functionality to filter traces by context tags:

  • tags.environment = 'production'
  • tags.app_version = '2.1.0'
  • tags.model_version = 'claude-3-sonnet-20240229' AND tags.client_variant = 'treatment'
  • tags.feature_flag_new_ui = 'true'

You can group traces by tags to compare performance or error rates across different contexts.

Programmatic Analysis

Use the MLflow SDK for more complex analysis or to integrate with other tools:

Compare error rates and performance across different application versions:

import mlflow


def compare_version_metrics(experiment_id: str, versions: list):
    """Compare error rates and performance across app versions."""

    version_metrics = {}

    for version in versions:
        traces = mlflow.search_traces(
            experiment_ids=[experiment_id],
            filter_string=f"tags.environment = 'production' AND tags.app_version = '{version}'",
        )

        if traces.empty:
            version_metrics[version] = {
                "error_rate": None,
                "avg_latency": None,
                "total_traces": 0,
            }
            continue

        # Calculate metrics
        error_count = len(traces[traces["status"] == "ERROR"])
        total_traces = len(traces)
        error_rate = (error_count / total_traces) * 100

        successful_traces = traces[traces["status"] == "OK"]
        avg_latency = (
            successful_traces["execution_time_ms"].mean()
            if not successful_traces.empty
            else 0
        )

        version_metrics[version] = {
            "error_rate": error_rate,
            "avg_latency": avg_latency,
            "total_traces": total_traces,
        }

    return version_metrics


# Usage
metrics = compare_version_metrics("1", ["1.0.0", "1.1.0", "1.2.0"])
for version, data in metrics.items():
    if data["total_traces"] == 0:
        print(f"Version {version}: no traces found")
        continue
    print(
        f"Version {version}: {data['error_rate']:.1f}% errors, "
        f"{data['avg_latency']:.1f}ms avg latency"
    )

Best Practices

Tagging Strategy

Standardize tag keys: Use a consistent naming convention (e.g., snake_case) for your custom tags

Environment variables for deployment context: Use environment variables set during your CI/CD or deployment process for version and environment information

Automate context attachment: Ensure context tags are automatically applied by your application or deployment scripts (see the sketch after this list)

Balance granularity and simplicity: Capture enough context for useful analysis, but avoid excessive tagging that makes traces hard to manage
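
One way to automate this is a small helper that reads deployment context from environment variables once and applies it to every trace. The variable names below (DEPLOY_ENV, APP_VERSION, DEPLOYMENT_ID, REGION) mirror the examples earlier on this page and are assumptions about what your CI/CD process sets:

import os

import mlflow

# Tag keys use snake_case; values come from variables assumed to be set by CI/CD
DEPLOYMENT_TAGS = {
    "environment": os.getenv("DEPLOY_ENV", "development"),
    "app_version": os.getenv("APP_VERSION", "unknown"),
    "deployment_id": os.getenv("DEPLOYMENT_ID", "unknown"),
    "region": os.getenv("REGION", "unknown"),
}


@mlflow.trace
def handle_request(payload: dict):
    # Apply the same deployment tags to every trace automatically
    mlflow.update_current_trace(tags=DEPLOYMENT_TAGS)
    # ... application logic ...
    return payload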

Performance Considerations

Minimize tag volume: Adding tags has minimal overhead, but avoid attaching an excessive number of tags in high-throughput systems

Use short tag values: Keep tag values concise to reduce storage overhead

Consistent tagging: Ensure your tagging strategy is applied consistently across all services and deployment environments

Security and Privacy

Avoid sensitive data: Do not store PII or sensitive information directly in tags

Use anonymized identifiers: When tracking users, use anonymized identifiers rather than personal information (see the sketch after this list)

Review tag content: Regularly audit your tags to ensure they don't contain sensitive information
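
One common approach, sketched below, is to hash the raw user identifier with an application-specific salt before using it as a tag value. The salt handling is an assumption about your configuration, not an MLflow feature, and USER_ID_SALT is a hypothetical environment variable:

import hashlib
import os

import mlflow

# Salt kept in configuration so hashes are stable but hard to reverse
# (USER_ID_SALT is an assumed environment variable)
SALT = os.getenv("USER_ID_SALT", "change-me")


def anonymize(user_id: str) -> str:
    """Return a short, stable, non-reversible identifier for tagging."""
    return hashlib.sha256(f"{SALT}:{user_id}".encode()).hexdigest()[:16]


@mlflow.trace
def handle_user_request(user_id: str, query: str):
    mlflow.update_current_trace(tags={"mlflow.trace.user": anonymize(user_id)})
    return f"answer for: {query}"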

Next Steps

MLflow Tracing UI: Learn to use the UI for filtering and analyzing traces by environment and version

Search Traces: Master advanced search syntax for complex context-based queries

Query Traces via SDK: Build custom analysis and monitoring workflows

Manual Tracing: Add detailed instrumentation with context-aware spans

By implementing comprehensive environment and version tracking, you can build robust observability into your GenAI applications that scales from development through production deployment.