
Trace Data Model

This document provides a detailed overview of the MLflow Trace data model. Understanding this model is key to leveraging MLflow Tracing for observability and analysis of your Generative AI applications.

MLflow Traces are designed to be compatible with OpenTelemetry specifications, a widely adopted industry standard for observability. This ensures interoperability and allows MLflow Traces to be exported and used with other OpenTelemetry-compatible systems. MLflow enhances the basic OpenTelemetry Span model by defining specific structures and attributes for Generative AI use cases, providing richer context and deeper insight into quality and performance.

Structure of Traces

At a high level, an MLflow Trace is composed of two primary objects:

TraceInfo: Metadata about the trace, such as its origin, status, and total execution time. This includes tags that provide additional context for the trace, such as the user, session, and developer-provided key:value pairs. Tags can be used for searching or filtering traces.

TraceData: The actual payload, which contains the instrumented Span objects that capture your application's step-by-step execution from input to output.

tip

Check the API documentation for helper methods on these dataclass objects for more information on how to convert or extract data from them.
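As a quick sketch, a retrieved Trace object can be converted to plain Python structures or JSON for offline inspection (this assumes a trace has already been logged and that you have its trace ID):

import mlflow

# Assumes a trace has already been logged; the ID below is a placeholder
trace = mlflow.get_trace(trace_id="<your-trace-id>")

# Convert the whole trace (TraceInfo + TraceData) for offline inspection
trace_dict = trace.to_dict()
trace_json = trace.to_json()

print(list(trace_dict.keys()))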

Trace Architecture

TraceInfo: Metadata and Context

TraceInfo is a dataclass object that provides a lightweight snapshot of critical metadata about the overall trace.

This metadata includes the trace's origin, status, and other data that aids in retrieving and filtering traces with mlflow.search_traces() and in navigating traces within the MLflow UI.

trace_id (str): The primary identifier for the trace.

trace_location (TraceLocation): The location where the trace is stored, represented as a TraceLocation object. MLflow currently supports MLflow Experiment as a trace location.

request_time (int): Start time of the trace, in milliseconds.

state (TraceState): State of the trace, represented as a TraceState enum. Can be one of [OK, ERROR, IN_PROGRESS, STATE_UNSPECIFIED].

execution_duration (Optional[int]): Duration of the trace, in milliseconds.

The data that is contained in the TraceInfo object is used to populate the trace view page within the MLflow tracking UI.

TraceInfo as it is used in the MLflow UI
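As a minimal sketch of reading these fields programmatically (assuming a trace was produced earlier in the same process):

import mlflow

# Fetch the most recently logged trace and inspect its metadata
trace = mlflow.get_trace(trace_id=mlflow.get_last_active_trace_id())

info = trace.info
print(info.trace_id)
print(info.state)               # e.g. OK, ERROR, IN_PROGRESS
print(info.request_time)        # start time in milliseconds
print(info.execution_duration)  # duration in milliseconds
print(info.tags)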

Tags

The tags property in MLflow's TraceInfo object is used to provide additional context for the trace. These tags can be used for searching, filtering, or providing additional information about the trace.

The tags are key-value pairs, and they are mutable. This means that you can add, modify, or remove tags at any time, even after the trace has been logged to an experiment.
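For example, tags can be set or removed on an already-logged trace by its ID (a hedged sketch; MlflowClient exposes equivalent set_trace_tag and delete_trace_tag methods, and the tag key here is hypothetical). Tags on the currently active trace can be set with mlflow.update_current_trace(), as shown in the examples below.

import mlflow

# After the trace has been logged, tags can still be added or removed by ID
trace_id = mlflow.get_last_active_trace_id()
mlflow.set_trace_tag(trace_id, "review.status", "approved")
mlflow.delete_trace_tag(trace_id, "review.status")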

Standard Tags

MLflow uses a set of standard tags for common contextual information about users, sessions, and the environment, which enable enhanced filtering and grouping within the MLflow UI and SDK (see the sketch after this list):

mlflow.trace.session: Standard tag for session ID

mlflow.trace.user: Standard tag for user ID

mlflow.source.name: The entry point or script that generated the trace

mlflow.source.git.commit: If run from a Git repository, the commit hash of the source code

mlflow.source.type: The type of source that generated the trace, commonly PROJECT (for MLflow Project runs) or NOTEBOOK (if run from a notebook)
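A minimal sketch of attaching the session and user standard tags from inside an instrumented function (the IDs here are placeholders):

import mlflow


@mlflow.trace
def handle_message(session_id: str, user_id: str, message: str) -> str:
    # Attach standard tags to the active trace so it can later be
    # filtered by session and user
    mlflow.update_current_trace(
        tags={
            "mlflow.trace.session": session_id,
            "mlflow.trace.user": user_id,
        }
    )
    return f"Echo: {message}"


handle_message("session-123", "user-456", "Hello!")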

Custom Tags for Advanced Use Cases

For enterprise and distributed applications, custom tags enable powerful filtering and organization:

Business Context Tags:

# Business-specific identification
mlflow.update_current_trace(
    tags={
        "business.user_id": "user123",
        "business.tenant_id": "tenant_456",
        "business.cost_center": "engineering",
        "business.department": "ai_platform",
    }
)

Deployment and Environment Tags:

# Deployment context
mlflow.update_current_trace(
    tags={
        "deployment.environment": "production",
        "deployment.region": "us-west-2",
        "deployment.service": "recommendation_engine",
        "deployment.version": "v2.3.1",
    }
)

Performance and Quality Tags:

# Performance monitoring
mlflow.update_current_trace(
    tags={
        "performance.tier": "premium",
        "quality.confidence_threshold": "0.8",
        "feature.experimental": "true",
        "monitoring.alert_level": "high",
    }
)

TraceData: Execution Details

The MLflow TraceData object, accessible via Trace.data, holds the core payload of the trace. It primarily contains the sequence of operations (spans) that occurred, along with the initial request that triggered the trace and the final response produced.

spans (List[Span]): This is a list of Span objects (conforming to OpenTelemetry specifications) that represent the individual steps, operations, or function calls within the trace. Each span details a specific unit of work.

Spans are organized hierarchically via parent_id to represent the execution flow.

See the Span Schema section below for a detailed breakdown of a Span object.

Understanding TraceData is key to programmatically analyzing the detailed execution path and the data transformations that occur throughout your GenAI application's lifecycle.
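A sketch of walking the span hierarchy in TraceData (assuming a trace has already been logged):

import mlflow

trace = mlflow.get_trace(trace_id=mlflow.get_last_active_trace_id())

# Index spans by span_id so each span can be linked back to its parent
spans_by_id = {span.span_id: span for span in trace.data.spans}

for span in trace.data.spans:
    parent = spans_by_id.get(span.parent_id) if span.parent_id else None
    parent_name = parent.name if parent else "<root>"
    print(f"{span.name} (type={span.span_type}, parent={parent_name})")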

Span Schema

The Span object within MLflow's tracing feature provides detailed information about the individual steps of the trace. It complies with the OpenTelemetry Span spec.

Each Span object contains information about the step being instrumented, including the span_id, name, start_time, parent_id, status, inputs, outputs, attributes, and events.

Span Architecture

Core Span Properties

span_id (str): A unique identifier that is generated for each span within a trace.

trace_id (str): The unique identifier that links this span to its parent trace.

parent_id (Optional[str]): The identifier that establishes the hierarchical association of a given span with its parent span.

name (str): The name of the span, either user-defined or automatically generated based on the function or method being instrumented.

When you view your traces within the MLflow UI, you're looking at a collection of spans, as shown below:

Spans within the MLflow UI

Advanced Span Attributes

For production and enterprise use cases, spans can include rich contextual attributes:

Model and Infrastructure Attributes:

span.set_attributes(
    {
        "ai.model.name": "gpt-4",
        "ai.model.version": "2024-01-01",
        "ai.model.provider": "openai",
        "ai.model.temperature": 0.7,
        "ai.model.max_tokens": 1000,
        "infrastructure.gpu.type": "A100",
        "infrastructure.memory.used_mb": 2048,
    }
)

Business Logic Attributes:

span.set_attributes(
    {
        "business.operation.type": "content_generation",
        "business.user.tier": "premium",
        "business.request.priority": "high",
        "business.cost.estimate_usd": 0.02,
        "business.compliance.region": "eu-west-1",
    }
)

Performance and Quality Attributes:

span.set_attributes(
    {
        "performance.cache.hit": True,
        "performance.retry.count": 0,
        "quality.confidence.score": 0.95,
        "quality.toxicity.score": 0.02,
        "monitoring.latency.p99": 150.5,
    }
)

Span Types

Span types are a way to categorize spans within a trace. By default, the span type is set to "UNKNOWN" when using the trace decorator. MLflow provides a set of predefined span types for common use cases, while also allowing you to set custom span types (see the example after the list of span types below).

"CHAT_MODEL": Represents a query to a chat model. This is a special case of an LLM interaction.

"CHAIN": Represents a chain of operations.

"AGENT": Represents an autonomous agent operation.

"TOOL": Represents a tool execution (typically by an agent), such as querying a search engine.

"EMBEDDING": Represents a text embedding operation.

"RETRIEVER": Represents a context retrieval operation, such as querying a vector database.

"PARSER": Represents a parsing operation, transforming text into a structured format.

"RERANKER": Represents a re-ranking operation, ordering the retrieved contexts based on relevance.

"UNKNOWN": A default span type that is used when no other span type is specified.

Specialized Span Schemas

MLflow has predefined types of spans, and certain span types have required properties that enable additional functionality within the UI and in downstream tasks such as evaluation.

Retriever Spans

The RETRIEVER span type is used for operations involving retrieving data from a data store (for example, querying documents from a vector store). The output of a RETRIEVER span is expected to be a list of documents.

Each document in the list should be a dictionary with the following structure:

page_content (str): The text content of the retrieved document chunk.

metadata (Optional[Dict[str, Any]]): A dictionary of additional metadata associated with the document. MLflow UI and evaluation metrics may specifically look for:

  • doc_uri (str): A string URI for the document source
  • chunk_id (str): A string identifier if the document is part of a larger chunked document

id (Optional[str]): An optional unique identifier for the document chunk itself.

Example Usage

import mlflow
from mlflow.entities import SpanType, Document


def search_store(query: str) -> list[tuple[str, str]]:
    # Simulate retrieving documents (e.g., from a vector database)
    return [
        (
            "MLflow Tracing helps debug GenAI applications...",
            "docs/mlflow/tracing_intro.md",
        ),
        (
            "Key components of a trace include spans...",
            "docs/mlflow/tracing_datamodel.md",
        ),
        ("MLflow provides automatic instrumentation...", "docs/mlflow/auto_trace.md"),
    ]


@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_relevant_documents(query: str):
    # Get documents from the search store
    docs = search_store(query)

    # Get the current active span (created by @mlflow.trace)
    span = mlflow.get_current_active_span()

    # Set the outputs of the span in accordance with the tracing schema
    outputs = [
        Document(page_content=doc, metadata={"doc_uri": uri}) for doc, uri in docs
    ]
    span.set_outputs(outputs)

    # Return the original format for downstream usage
    return docs


# Example usage
user_query = "MLflow Tracing benefits"
retrieved_docs = retrieve_relevant_documents(user_query)

Advanced Retriever Example with Rich Metadata

import time

import mlflow
from mlflow.entities import SpanType

# NOTE: `vector_db` is a placeholder for your vector store client
# (e.g., a client exposing a `similarity_search_with_score` method).


@mlflow.trace(span_type=SpanType.RETRIEVER)
def advanced_retrieval(query: str, top_k: int = 5):
    """Advanced document retrieval with comprehensive metadata"""

    # Simulate vector database query with scores and metadata
    results = vector_db.similarity_search_with_score(
        query=query, k=top_k, filter={"status": "active"}
    )

    span = mlflow.get_current_active_span()

    # Rich document structure with comprehensive metadata
    documents = []
    for doc, score in results:
        document = {
            "page_content": doc.page_content,
            "metadata": {
                "doc_uri": doc.metadata.get("source_url"),
                "chunk_id": doc.metadata.get("chunk_id"),
                "document_title": doc.metadata.get("title"),
                "author": doc.metadata.get("author"),
                "creation_date": doc.metadata.get("created_at"),
                "document_type": doc.metadata.get("type"),
                "similarity_score": float(score),
                "retrieval_timestamp": time.time(),
                "chunk_position": doc.metadata.get("chunk_index"),
                "total_chunks": doc.metadata.get("total_chunks"),
            },
            "id": doc.metadata.get("doc_id"),
        }
        documents.append(document)

    # Set additional span attributes for debugging and analysis
    span.set_attributes(
        {
            "retrieval.query_length": len(query),
            "retrieval.results_count": len(documents),
            "retrieval.avg_similarity": sum(
                d["metadata"]["similarity_score"] for d in documents
            )
            / max(len(documents), 1),  # guard against an empty result set
            "retrieval.database": "production_vector_db",
            "retrieval.index_name": "knowledge_base_v2",
        }
    )

    span.set_outputs(documents)
    return documents

Conforming to this structure, especially including page_content and relevant metadata like doc_uri, will ensure that RETRIEVER spans are rendered informatively in the MLflow UI and that downstream evaluation tasks can correctly process the retrieved context.

Chat Completion Spans

Spans of type CHAT_MODEL or LLM are used to represent interactions with a chat completions API (for example, OpenAI's chat completions or Anthropic's messages API). These spans can also capture information about tools (functions) made available to or used by the model.

As providers can have different schemas for their API, there are no strict restrictions on the format of the span's inputs and outputs for the raw LLM call itself. However, to enable rich UI features (like conversation display and tool call visualization) and to standardize data for evaluation, MLflow defines specific attributes for chat messages and tool definitions.

Special Attributes

mlflow.chat.messages: This attribute represents the system/user/assistant messages involved in the conversation with the chat model. The type must be List[ChatMessage].

mlflow.chat.tools: This attribute represents the tools that were available for the chat model to call. The type must be List[ChatTool].

These attributes can be conveniently set using helper functions like mlflow.tracing.set_span_chat_messages() and mlflow.tracing.set_span_chat_tools().

Example Usage

import mlflow
from mlflow.entities import SpanType
from mlflow.tracing.constant import SpanAttributeKey
from mlflow.tracing import set_span_chat_messages, set_span_chat_tools

# Example messages and tools
messages = [
    {
        "role": "system",
        "content": "please use the provided tool to answer the user's questions",
    },
    {"role": "user", "content": "what is 1 + 1?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number"},
                    "b": {"type": "number"},
                },
                "required": ["a", "b"],
            },
        },
    }
]


@mlflow.trace(span_type=SpanType.CHAT_MODEL)
def call_chat_model(messages, tools):
    # Mocking a response
    response = {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "123",
                "function": {"arguments": '{"a": 1,"b": 2}', "name": "add"},
                "type": "function",
            }
        ],
    }

    combined_messages = messages + [response]

    span = mlflow.get_current_active_span()
    set_span_chat_messages(span, combined_messages)
    set_span_chat_tools(span, tools)

    return response


call_chat_model(messages, tools)

last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)
span = trace.data.spans[0]

print("Messages: ", span.get_attribute(SpanAttributeKey.CHAT_MESSAGES))
print("Tools: ", span.get_attribute(SpanAttributeKey.CHAT_TOOLS))

Advanced Chat Model Span with Rich Context

import mlflow
from mlflow.entities import SpanType
from mlflow.tracing import set_span_chat_messages, set_span_chat_tools

# NOTE: `openai_client`, `estimate_cost`, and `calculate_actual_cost` are
# placeholders for your own OpenAI client instance and cost-estimation helpers.


@mlflow.trace(span_type=SpanType.CHAT_MODEL)
def advanced_chat_completion(messages, model_config, user_context):
    """Advanced chat completion with comprehensive tracking"""

    span = mlflow.get_current_active_span()

    # Set rich model and request attributes
    span.set_attributes(
        {
            "ai.model.name": model_config["model"],
            "ai.model.provider": "openai",
            "ai.model.temperature": model_config.get("temperature", 0.7),
            "ai.model.max_tokens": model_config.get("max_tokens", 1000),
            "ai.model.top_p": model_config.get("top_p", 1.0),
            "ai.request.user_tier": user_context.get("tier", "standard"),
            "ai.request.session_id": user_context.get("session_id"),
            "ai.request.conversation_length": len(messages),
            "ai.request.estimated_cost_usd": estimate_cost(messages, model_config),
        }
    )

    # Make the actual API call
    response = openai_client.chat.completions.create(messages=messages, **model_config)

    # Extract response details
    completion = response.choices[0].message
    usage = response.usage

    # Set response attributes
    span.set_attributes(
        {
            "ai.response.tokens.input": usage.prompt_tokens,
            "ai.response.tokens.output": usage.completion_tokens,
            "ai.response.tokens.total": usage.total_tokens,
            "ai.response.finish_reason": response.choices[0].finish_reason,
            "ai.response.model_used": response.model,
            "ai.response.has_tool_calls": bool(completion.tool_calls),
            "ai.response.actual_cost_usd": calculate_actual_cost(
                usage, model_config["model"]
            ),
        }
    )

    # Prepare messages for MLflow schema
    all_messages = messages + [completion.dict()]
    set_span_chat_messages(span, all_messages)

    # Set tools if they were provided
    if "tools" in model_config:
        set_span_chat_tools(span, model_config["tools"])

    return response

Working with Trace Data

Understanding the trace data model enables several powerful use cases:

Programmatic Analysis

Query specific spans: Search for spans by type, name, or attributes to analyze specific operations (see the sketch after this list)

Extract metrics: Calculate performance metrics like latency, success rates, and resource usage

Data flow analysis: Trace how data transforms through your application pipeline

Cost tracking: Monitor AI model usage and associated costs across different user tiers
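For example, spans of a particular type can be collected across retrieved traces for analysis (a sketch; the experiment ID is a placeholder):

import mlflow
from mlflow import MlflowClient

client = MlflowClient()

# Fetch Trace objects from an experiment (the experiment ID is a placeholder)
traces = client.search_traces(experiment_ids=["0"])

# Collect all retriever spans across the returned traces
retriever_spans = [
    span
    for trace in traces
    for span in trace.data.spans
    if span.span_type == "RETRIEVER"
]

for span in retriever_spans:
    latency_ms = (span.end_time_ns - span.start_time_ns) / 1_000_000
    print(span.name, latency_ms)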

Quality Assessment

Input/output validation: Verify that data is processed correctly at each step

Error analysis: Examine failed spans to understand failure patterns

Performance optimization: Identify bottlenecks and optimization opportunities

A/B testing: Compare trace patterns across different model versions or configurations

Integration with Evaluation

Trace-based evaluation: Use trace data as input for quality assessment

Span-level metrics: Evaluate individual operations within larger workflows

Historical comparison: Compare trace patterns across different versions or configurations

Business metrics: Track custom business KPIs through span attributes and tags

Advanced Analytics and Monitoring

Cost Analysis Example

def analyze_costs_by_user_tier(traces):
    """Analyze AI model costs by user tier"""
    cost_by_tier = {}

    for trace in traces:
        user_tier = trace.info.tags.get("business.user.tier", "unknown")

        for span in trace.data.spans:
            if span.span_type == "CHAT_MODEL":
                cost = span.attributes.get("ai.response.actual_cost_usd", 0)
                if user_tier not in cost_by_tier:
                    cost_by_tier[user_tier] = []
                cost_by_tier[user_tier].append(cost)

    return {tier: sum(costs) for tier, costs in cost_by_tier.items()}

Performance Monitoring Example

def monitor_retrieval_performance(traces):
    """Monitor retrieval system performance"""
    retrieval_metrics = []

    for trace in traces:
        for span in trace.data.spans:
            if span.span_type == "RETRIEVER":
                metrics = {
                    "query_time_ms": (span.end_time_ns - span.start_time_ns)
                    / 1_000_000,
                    "results_count": span.attributes.get("retrieval.results_count", 0),
                    "avg_similarity": span.attributes.get(
                        "retrieval.avg_similarity", 0
                    ),
                    "database": span.attributes.get("retrieval.database"),
                    "timestamp": span.start_time_ns / 1_000_000_000,
                }
                retrieval_metrics.append(metrics)

    return retrieval_metrics

Error Pattern Analysis

def analyze_error_patterns(traces):
    """Analyze common error patterns across traces"""
    error_patterns = {}

    for trace in traces:
        if trace.info.state == "ERROR":
            for span in trace.data.spans:
                if span.status.status_code == "ERROR":
                    error_type = span.attributes.get("error_type", "unknown")
                    operation = span.name

                    pattern_key = f"{operation}:{error_type}"
                    if pattern_key not in error_patterns:
                        error_patterns[pattern_key] = {
                            "count": 0,
                            "examples": [],
                            "user_tiers": set(),
                        }

                    error_patterns[pattern_key]["count"] += 1
                    error_patterns[pattern_key]["examples"].append(
                        span.attributes.get("error_message", "")[:100]
                    )
                    error_patterns[pattern_key]["user_tiers"].add(
                        trace.info.tags.get("business.user.tier", "unknown")
                    )

    return error_patterns

Next Steps

Tracing Concepts: Understand the fundamental concepts behind tracing

Querying Traces: Learn how to search and retrieve trace data programmatically

Automatic Tracing: Get started with one-line tracing for supported libraries

Manual Tracing: Create custom spans for detailed instrumentation


Understanding the trace data model is essential for leveraging MLflow Tracing effectively. The hierarchical structure of traces and spans, combined with rich metadata and specialized schemas, enables comprehensive observability and analysis of your GenAI applications.