Track Versions & Environments
Tracking environments, application versions, and custom contextual information in your GenAI application enables comprehensive observability across different deployment stages, versions, and business-specific dimensions. MLflow provides flexible mechanisms to attach rich metadata to your traces using tags.
Why Track Environments & Context?
Attaching this metadata to your traces provides critical insights for:
- Environment-specific analysis: Compare behavior across `development`, `staging`, and `production` environments
- Version management: Track performance and regressions across different application versions (e.g., `v1.0.1`, `v1.2.0`)
- Custom categorization: Add business-specific context (e.g., `customer_tier: "premium"`, `feature_flag: "new_algorithm"`)
- Deployment validation: Ensure consistent behavior across different deployment targets
- Root cause analysis: Quickly narrow down issues to specific environments, versions, or configurations
Standard & Custom Tags for Context
MLflow uses tags (key-value string pairs) to store contextual information on traces.
Automatically Populated Tags
These standard tags are automatically captured by MLflow based on your execution environment:
- `mlflow.source.name`: The entry point or script that generated the trace (automatically populated with the filename for Python scripts, or the notebook name for Jupyter notebooks)
- `mlflow.source.git.commit`: If run from a Git repository, the commit hash is automatically detected and populated
- `mlflow.source.type`: `NOTEBOOK` if running in a Jupyter notebook, `LOCAL` if running a local Python script, otherwise `UNKNOWN` (automatically detected)
If needed, you can manually override these automatically populated tags using `mlflow.update_current_trace()` or `mlflow.set_trace_tag()` for more granular control.
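For instance, here is a minimal sketch of overriding the source tags from inside a traced function; the function name and tag values are illustrative:

```python
import mlflow


@mlflow.trace
def generate_answer(question: str) -> str:
    # Override the automatically populated source tags on the active trace.
    # The values below are placeholders for illustration.
    mlflow.update_current_trace(
        tags={
            "mlflow.source.name": "answer_service.py",
            "mlflow.source.type": "LOCAL",
        }
    )
    return f"Answer to: {question}"


generate_answer("What is MLflow Tracing?")
```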
Reserved Standard Tags
Some standard tags have special meaning but must be set manually:
- `mlflow.trace.session`: Groups traces from multi-turn conversations or user sessions together
- `mlflow.trace.user`: Associates traces with specific users for user-centric analysis
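A minimal sketch of setting both reserved tags with `mlflow.update_current_trace()` inside a traced function; the session and user identifiers here are placeholders, and in a real service they would typically come from the request context, as in the web application example later on this page:

```python
import mlflow


@mlflow.trace
def answer_user_message(message: str, session_id: str, user_id: str) -> str:
    # Attach session and user context to the active trace using the reserved tag keys.
    mlflow.update_current_trace(
        tags={
            "mlflow.trace.session": session_id,
            "mlflow.trace.user": user_id,
        }
    )
    return f"Processed: {message}"


answer_user_message("Hello!", session_id="session-123", user_id="user-456")
```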
Custom Tags
You can define custom tags to capture any business-specific or application-specific context. Common examples include:
- `environment`: e.g., `"production"`, `"staging"` (from a `DEPLOY_ENV` environment variable)
- `app_version`: e.g., `"1.0.0"` (from an `APP_VERSION` environment variable)
- `deployment_id`: e.g., `"deploy-abc-123"` (from a `DEPLOYMENT_ID` environment variable)
- `region`: e.g., `"us-east-1"` (from a `REGION` environment variable)
- Feature flags and A/B test variants
Basic Implementation
Here's how to add various types of context as tags to your traces:
- Basic Example
- Using Context Managers
- Web Application Example
```python
import mlflow
import os
import platform


@mlflow.trace
def process_data_with_context(data: dict, app_config: dict):
    """Process data and add environment, version, and custom context."""
    current_env = os.getenv("APP_ENVIRONMENT", "development")
    current_app_version = app_config.get("version", "unknown")
    current_model_version = app_config.get("model_in_use", "gpt-3.5-turbo")

    # Define custom context tags
    context_tags = {
        "environment": current_env,
        "app_version": current_app_version,
        "model_version": current_model_version,
        "python_version": platform.python_version(),
        "operating_system": platform.system(),
        "data_source": data.get("source", "batch"),
        "processing_mode": "online" if current_env == "production" else "offline",
    }

    # Add tags to the current trace
    mlflow.update_current_trace(tags=context_tags)

    # Your application logic here...
    result = (
        f"Processed '{data['input']}' in {current_env} with app {current_app_version}"
    )
    return result


# Example usage
config = {"version": "1.1.0", "model_in_use": "claude-3-sonnet-20240229"}
input_data = {"input": "Summarize this document...", "source": "realtime_api"}

processed_result = process_data_with_context(input_data, config)
print(processed_result)
```
Key points:
- Use `os.getenv()` to fetch environment variables (e.g., `APP_ENVIRONMENT`, `APP_VERSION`)
- Pass application or model configurations into your traced functions
- Use the `platform` module for system information
- `mlflow.update_current_trace()` adds all key-value pairs to the active trace
For more complex scenarios, you can use context managers to ensure consistent tagging:
```python
import mlflow
import os
from contextlib import contextmanager


@contextmanager
def trace_with_environment(operation_name: str):
    """Context manager that automatically adds environment context to traces."""
    # Environment context
    env_tags = {
        "environment": os.getenv("ENVIRONMENT", "development"),
        "app_version": os.getenv("APP_VERSION", "unknown"),
        "deployment_id": os.getenv("DEPLOYMENT_ID", "local"),
        "region": os.getenv("AWS_REGION", "local"),
        "kubernetes_namespace": os.getenv("KUBERNETES_NAMESPACE"),
        "container_image": os.getenv("CONTAINER_IMAGE"),
    }
    # Filter out None values
    env_tags = {k: v for k, v in env_tags.items() if v is not None}

    with mlflow.start_span(name=operation_name, attributes=env_tags) as span:
        # Add the same context as tags at the trace level as well
        mlflow.update_current_trace(tags=env_tags)
        yield span


# Usage
def my_genai_pipeline(user_input: str):
    with trace_with_environment("genai_pipeline"):
        # Your pipeline logic here
        return f"Processed: {user_input}"


result = my_genai_pipeline("What is the weather like?")
```
In a production web application, context can be derived from environment variables, request headers, or application configuration:
```python
import mlflow
import os
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import uvicorn

app = FastAPI()


@app.post("/chat")
@mlflow.trace
async def handle_chat(request: Request):
    # Get request data
    data = await request.json()
    message = data.get("message", "")

    # Retrieve context from request headers
    client_request_id = request.headers.get("X-Request-ID")
    session_id = request.headers.get("X-Session-ID")
    user_id = request.headers.get("X-User-ID")
    user_agent = request.headers.get("User-Agent")

    # Update the current trace with all context and environment metadata
    mlflow.update_current_trace(
        client_request_id=client_request_id,
        tags={
            # Session context - groups traces from multi-turn conversations
            "mlflow.trace.session": session_id,
            # User context - associates traces with specific users
            "mlflow.trace.user": user_id,
            # Environment metadata - tracks deployment context
            "environment": os.getenv("ENVIRONMENT", "development"),
            "app_version": os.getenv("APP_VERSION", "1.0.0"),
            "deployment_id": os.getenv("DEPLOYMENT_ID", "unknown"),
            "region": os.getenv("REGION", "us-east-1"),
            # Request context
            "user_agent": user_agent,
            "request_method": request.method,
            "endpoint": request.url.path,
        },
    )

    # Your application logic for processing the chat message
    response_text = f"Processed message: '{message}'"

    return JSONResponse(content={"response": response_text})


if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=5000)
```
Example request with context headers:
```bash
curl -X POST "http://127.0.0.1:5000/chat" \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: req-abc-123-xyz-789" \
  -H "X-Session-ID: session-def-456-uvw-012" \
  -H "X-User-ID: user-jane-doe-12345" \
  -d '{"message": "What is my account balance?"}'
```
Querying and Analyzing Context Data
Using the MLflow UI
In the MLflow UI (Traces tab), use the search functionality to filter traces by context tags:
```
tags.environment = 'production'
tags.app_version = '2.1.0'
tags.model_used = 'advanced_model' AND tags.client_variant = 'treatment'
tags.feature_flag_new_ui = 'true'
```
You can group traces by tags to compare performance or error rates across different contexts.
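Outside the UI, one way to do this grouping is with the DataFrame returned by `mlflow.search_traces()`. The following sketch assumes the result exposes `tags`, `status`, and `execution_time_ms` columns, as in the SDK examples below, and uses a placeholder experiment ID:

```python
import mlflow

traces = mlflow.search_traces(experiment_ids=["1"])

if not traces.empty:
    # Pull the environment tag out of each trace's tag dictionary
    traces["environment"] = traces["tags"].apply(
        lambda t: t.get("environment", "unknown")
    )
    # Compare trace counts, error rates, and latency per environment
    summary = traces.groupby("environment").agg(
        trace_count=("status", "size"),
        error_rate=("status", lambda s: (s == "ERROR").mean() * 100),
        avg_latency_ms=("execution_time_ms", "mean"),
    )
    print(summary)
```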
Programmatic Analysis
Use the MLflow SDK for more complex analysis or to integrate with other tools:
- Version Comparison
- Environment Analysis
- Feature Flag Analysis
Compare error rates and performance across different application versions:
```python
import mlflow


def compare_version_metrics(experiment_id: str, versions: list):
    """Compare error rates and performance across app versions."""
    version_metrics = {}

    for version in versions:
        traces = mlflow.search_traces(
            experiment_ids=[experiment_id],
            filter_string=f"tags.environment = 'production' AND tags.app_version = '{version}'",
        )

        if traces.empty:
            version_metrics[version] = {
                "error_rate": None,
                "avg_latency": None,
                "total_traces": 0,
            }
            continue

        # Calculate metrics
        error_count = len(traces[traces["status"] == "ERROR"])
        total_traces = len(traces)
        error_rate = (error_count / total_traces) * 100

        successful_traces = traces[traces["status"] == "OK"]
        avg_latency = (
            successful_traces["execution_time_ms"].mean()
            if not successful_traces.empty
            else 0
        )

        version_metrics[version] = {
            "error_rate": error_rate,
            "avg_latency": avg_latency,
            "total_traces": total_traces,
        }

    return version_metrics


# Usage
metrics = compare_version_metrics("1", ["1.0.0", "1.1.0", "1.2.0"])
for version, data in metrics.items():
    if data["total_traces"] == 0:
        print(f"Version {version}: no traces found")
        continue
    print(
        f"Version {version}: {data['error_rate']:.1f}% errors, "
        f"{data['avg_latency']:.1f}ms avg latency"
    )
```
Analyze performance differences across environments:
```python
import mlflow


def analyze_environment_performance(experiment_id: str):
    """Compare performance across different environments."""
    environments = ["development", "staging", "production"]
    env_metrics = {}

    for env in environments:
        traces = mlflow.search_traces(
            experiment_ids=[experiment_id],
            filter_string=f"tags.environment = '{env}' AND status = 'OK'",
        )

        if not traces.empty:
            env_metrics[env] = {
                "count": len(traces),
                "avg_latency": traces["execution_time_ms"].mean(),
                "p95_latency": traces["execution_time_ms"].quantile(0.95),
                "p99_latency": traces["execution_time_ms"].quantile(0.99),
            }

    return env_metrics


# Usage
env_performance = analyze_environment_performance("1")
for env, metrics in env_performance.items():
    print(
        f"{env}: {metrics['count']} traces, "
        f"avg: {metrics['avg_latency']:.1f}ms, "
        f"p95: {metrics['p95_latency']:.1f}ms"
    )
```
Analyze the impact of feature flags on performance:
```python
import mlflow


def analyze_feature_flag_impact(experiment_id: str, flag_name: str):
    """Analyze performance impact of a feature flag."""
    # Get traces with the feature flag enabled
    flag_on_traces = mlflow.search_traces(
        experiment_ids=[experiment_id],
        filter_string=f"tags.feature_flag_{flag_name} = 'true' AND status = 'OK'",
    )

    # Get traces with the feature flag disabled
    flag_off_traces = mlflow.search_traces(
        experiment_ids=[experiment_id],
        filter_string=f"tags.feature_flag_{flag_name} = 'false' AND status = 'OK'",
    )

    results = {}

    if not flag_on_traces.empty:
        results["flag_on"] = {
            "count": len(flag_on_traces),
            "avg_latency": flag_on_traces["execution_time_ms"].mean(),
            "error_rate": 0,  # Only looking at successful traces
        }

    if not flag_off_traces.empty:
        results["flag_off"] = {
            "count": len(flag_off_traces),
            "avg_latency": flag_off_traces["execution_time_ms"].mean(),
            "error_rate": 0,  # Only looking at successful traces
        }

    # Calculate performance impact
    if "flag_on" in results and "flag_off" in results:
        latency_change = (
            results["flag_on"]["avg_latency"] - results["flag_off"]["avg_latency"]
        )
        latency_change_pct = (latency_change / results["flag_off"]["avg_latency"]) * 100

        results["impact"] = {
            "latency_change_ms": latency_change,
            "latency_change_percent": latency_change_pct,
        }

    return results


# Usage
flag_analysis = analyze_feature_flag_impact("1", "new_retriever")
if "impact" in flag_analysis:
    impact = flag_analysis["impact"]
    print(
        f"Feature flag impact: {impact['latency_change_ms']:.1f}ms "
        f"({impact['latency_change_percent']:.1f}% change)"
    )
```
Best Practices
Tagging Strategy
- Standardize tag keys: Use a consistent naming convention (e.g., `snake_case`) for your custom tags
- Environment variables for deployment context: Use environment variables set during your CI/CD or deployment process for version and environment information
- Automate context attachment: Ensure context tags are applied automatically by your application or deployment scripts, as in the sketch below
- Balance granularity and simplicity: Capture enough context for useful analysis, but avoid excessive tagging that makes traces hard to manage
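One way to apply these practices is to centralize tag construction in a single helper that every entry point calls, so that keys stay standardized and deployment context always comes from environment variables. A minimal sketch, where the helper name and environment variable names are illustrative:

```python
import os
import mlflow


def standard_context_tags() -> dict:
    # Standardized, snake_case tag keys built from deployment environment variables
    tags = {
        "environment": os.getenv("ENVIRONMENT", "development"),
        "app_version": os.getenv("APP_VERSION", "unknown"),
        "deployment_id": os.getenv("DEPLOYMENT_ID", "local"),
        "region": os.getenv("REGION", "local"),
    }
    return {k: v for k, v in tags.items() if v is not None}


@mlflow.trace
def handle_request(payload: dict) -> str:
    # Every traced entry point attaches the same standardized tags automatically
    mlflow.update_current_trace(tags=standard_context_tags())
    return f"Handled: {payload.get('input', '')}"
```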
Performance Considerations
- Minimize tag volume: While adding tags has minimal overhead, avoid attaching an excessively large number of tags in high-throughput systems
- Use short tag values: Keep tag values concise to reduce storage overhead
- Consistent tagging: Ensure your tagging strategy is applied consistently across all services and deployment environments
Security and Privacy
- Avoid sensitive data: Do not store PII or sensitive information directly in tags
- Use anonymized identifiers: When tracking users, use anonymized identifiers rather than personal information (see the sketch below)
- Review tag content: Regularly audit your tags to ensure they don't contain sensitive information
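For example, here is a sketch of tagging traces with an anonymized user identifier instead of a raw email address; the hashing scheme shown is just one option:

```python
import hashlib

import mlflow


def anonymize(identifier: str) -> str:
    # Derive a stable, non-reversible ID so traces can still be grouped per user
    # without storing the raw identifier in tags
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()[:16]


@mlflow.trace
def answer_question(question: str, user_email: str) -> str:
    mlflow.update_current_trace(tags={"mlflow.trace.user": anonymize(user_email)})
    return f"Answer to: {question}"
```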
Next Steps
- MLflow Tracing UI: Learn to use the UI for filtering and analyzing traces by environment and version
- Search Traces: Master advanced search syntax for complex context-based queries
- Query Traces via SDK: Build custom analysis and monitoring workflows
- Manual Tracing: Add detailed instrumentation with context-aware spans
By implementing comprehensive environment and version tracking, you can build robust observability into your GenAI applications that scales from development through production deployment.