Trace Data Model
This document provides a detailed overview of the MLflow Trace data model. Understanding this model is key to leveraging MLflow Tracing for observability and analysis of your Generative AI applications.
MLflow Traces are designed to be compatible with OpenTelemetry specifications, a widely adopted industry standard for observability. This ensures interoperability and allows MLflow Traces to be exported and used with other OpenTelemetry-compatible systems. MLflow enhances the basic OpenTelemetry Span model by defining specific structures and attributes for Generative AI use cases, providing richer context and deeper insight into quality and performance.
Structure of Traces
At a high level, an MLflow Trace is composed of two primary objects:
TraceInfo: Metadata about the trace, such as its origin, status, and total execution time. This includes tags that provide additional context for the trace, such as the user, session, and developer-provided key:value pairs. Tags can be used for searching or filtering traces.
TraceData: The actual payload, which contains the instrumented Span objects that capture your application's step-by-step execution from input to output.
Check the API documentation for more information on the helper methods these dataclass objects provide for converting or extracting their data.
TraceInfo: Metadata and Context
The TraceInfo object within MLflow's tracing feature provides a lightweight snapshot of critical data about the overall trace. TraceInfo is a dataclass that contains metadata about the trace. This metadata includes the trace's origin, status, and other information that aids in retrieving and filtering traces with mlflow.search_traces() and in navigating traces within the MLflow UI.
Core Fields
- trace_id (str): The primary identifier for the trace.
- trace_location (TraceLocation): The location where the trace is stored, represented as a TraceLocation object. MLflow currently supports an MLflow Experiment as a trace location.
- request_time (int): Start time of the trace, in milliseconds.
- state (TraceState): State of the trace, represented as a TraceState enum. Can be one of [OK, ERROR, IN_PROGRESS, STATE_UNSPECIFIED].
- execution_duration (Optional[int]): Duration of the trace, in milliseconds.

Request & Response
- request_preview (Optional[str]): Request to the model/agent, equivalent to the input of the root span, but JSON-encoded and possibly truncated.
- response_preview (Optional[str]): Response from the model/agent, equivalent to the output of the root span, but JSON-encoded and possibly truncated.

These preview fields provide a quick overview of what the trace processed without needing to examine the detailed span data.

Metadata & Organization
- client_request_id (Optional[str]): Client-supplied request ID associated with the trace. This can be used to identify the trace/request from an external system that produced it, e.g., a session ID in a web application.
- trace_metadata (dict[str, str]): Key-value pairs associated with the trace. They are designed for immutable values, such as the run ID associated with the trace.
- tags (dict[str, str]): Tags associated with the trace. They are designed for mutable values that can be updated after the trace is created, via the MLflow UI or API.
The data contained in the TraceInfo object is used to populate the trace view page within the MLflow tracking UI.
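As a minimal sketch of reading these fields programmatically (the trace ID below is a placeholder for one of your own traces):

```python
import mlflow

# Fetch a previously logged trace by its ID (placeholder value shown here)
trace = mlflow.get_trace("tr-1234567890abcdef")

info = trace.info
print(info.trace_id, info.state)                    # identity and state
print(info.request_time, info.execution_duration)   # timing, in milliseconds
print(info.request_preview)                         # truncated JSON preview of the input
print(info.tags)                                    # mutable key-value tags
```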
Tags
The tags property on MLflow's TraceInfo object provides additional context for the trace. These tags can be used for searching, filtering, or providing additional information about the trace.
The tags are key-value pairs, and they are mutable. This means that you can add, modify, or remove tags at any time, even after the trace has been logged to an experiment.
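For example, a tag can be added or removed on an already-logged trace; a minimal sketch, assuming a recent MLflow version that exposes the fluent mlflow.set_trace_tag() and mlflow.delete_trace_tag() helpers, with a placeholder trace ID:

```python
import mlflow

trace_id = "tr-1234567890abcdef"  # placeholder for a real trace ID

# Add or update a tag on an already-logged trace
mlflow.set_trace_tag(trace_id, "business.department", "ai_platform")

# Remove a tag that is no longer needed
mlflow.delete_trace_tag(trace_id, "business.department")
```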
Standard Tags
MLflow uses a set of standard tags for common contextual information about users, sessions and the environment, which enable enhanced filtering and grouping capabilities within the MLflow UI and SDK:
- mlflow.trace.session: Standard tag for the session ID
- mlflow.trace.user: Standard tag for the user ID
- mlflow.source.name: The entry point or script that generated the trace
- mlflow.source.git.commit: If run from a Git repository, the commit hash of the source code
- mlflow.source.type: The type of source that generated the trace, commonly PROJECT (for MLflow Project runs) or NOTEBOOK (if run from a notebook)
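For instance, these standard tags can be used in a search filter; a minimal sketch using mlflow.search_traces() (the user ID is a placeholder, and the backticks around the dotted tag key are an assumption about the filter syntax for keys containing dots):

```python
import mlflow

# Find traces produced by a specific user in the current experiment
traces = mlflow.search_traces(
    filter_string="tags.`mlflow.trace.user` = 'user123'",
    max_results=100,
)
print(f"Found {len(traces)} matching traces")
```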
Custom Tags for Advanced Use Cases
For enterprise and distributed applications, custom tags enable powerful filtering and organization:
Business Context Tags:
import mlflow

# Business-specific identification (call from within an active trace)
mlflow.update_current_trace(
tags={
"business.user_id": "user123",
"business.tenant_id": "tenant_456",
"business.cost_center": "engineering",
"business.department": "ai_platform",
}
)
Deployment and Environment Tags:
# Deployment context
mlflow.update_current_trace(
tags={
"deployment.environment": "production",
"deployment.region": "us-west-2",
"deployment.service": "recommendation_engine",
"deployment.version": "v2.3.1",
}
)
Performance and Quality Tags:
# Performance monitoring
mlflow.update_current_trace(
tags={
"performance.tier": "premium",
"quality.confidence_threshold": "0.8",
"feature.experimental": "true",
"monitoring.alert_level": "high",
}
)
TraceData: Execution Details
The MLflow TraceData object, accessible via Trace.data, holds the core payload of the trace. It primarily contains the sequence of operations (spans) that occurred, along with the initial request that triggered the trace and the final response produced.
Spans
- spans (List[Span]): A list of Span objects (conforming to OpenTelemetry specifications) that represent the individual steps, operations, or function calls within the trace. Each span details a specific unit of work. Spans are organized hierarchically via parent_id to represent the execution flow. See the Span Schema section below for a detailed breakdown of a Span object.

Request & Response
- request (str): A JSON-serialized string representing the input data for the root span of the trace. This is typically the end-user's request or the initial parameters that invoked the traced application or workflow. Example: '{"query": "What is MLflow Tracing?", "user_id": "user123"}'
- response (str): A JSON-serialized string representing the final output data from the root span of the traced application or workflow. Example: '{"answer": "MLflow Tracing provides observability...", "confidence": 0.95}'
The request and response properties are preserved for backward compatibility. Their values are looked up from the inputs and outputs attributes of the root span and are not set directly by the user on the TraceData object.
Understanding TraceData is key to programmatically analyzing the detailed execution path and the data transformations that occur throughout your GenAI application's lifecycle.
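A minimal sketch of walking the spans of a trace (again with a placeholder trace ID):

```python
import mlflow

trace = mlflow.get_trace("tr-1234567890abcdef")  # placeholder trace ID

# Each span records one step: its name, type, parent, and inputs/outputs
for span in trace.data.spans:
    print(span.name, span.span_type, span.parent_id)
    print("  inputs:", span.inputs)
    print("  outputs:", span.outputs)
```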
Span Schema
The Span object within MLflow's tracing feature provides detailed information about the individual steps of the trace. It complies with the OpenTelemetry Span spec.
Each Span object contains information about the step being instrumented, including the span_id, name, start_time, parent_id, status, inputs, outputs, attributes, and events.
Core Span Properties
- Identity & Hierarchy
- Timing & Status
- Data & Context
span_id
(str): A unique identifier that is generated for each span within a trace.
trace_id
(str): The unique identifier that links this span to its parent trace.
parent_id
(Optional[str]): The identifier that establishes the hierarchical association of a given span with its parent span.
name
(str): The name of the span, either user-defined or automatically generated based on the function or method being instrumented.
start_time_ns
(int): The unix timestamp (in nanoseconds) when the span was started.
end_time_ns
(int): The unix timestamp (in nanoseconds) when the span was ended.
status
(SpanStatus): The status of a span with values of OK, UNSET, or ERROR. The span status object contains an optional description if the status_code is reflecting an error that occurred.
inputs
(Optional[Any]): The inputs are stored as JSON-serialized data, representing the input data that is passed into the particular stage of your application.
outputs
(Optional[Any]): The outputs are stored as JSON-serialized data, representing the output data that is passed out of the particular stage of your application.
attributes
(Dict[str, Any]): Attributes are metadata that are associated with a given step within your application. These are key-value pairs that provide insight into behavioral modifications for function and method calls.
events
(List[SpanEvent]): Events are a system-level property that is optionally applied to a span only if there was an issue during the execution of the span. These events contain information about exceptions that were thrown in the instrumented call, as well as the stack trace.
When you view your traces within the MLflow UI, you're looking at a collection of these spans.
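When a span fails, the exception details are recorded as events; a minimal sketch of inspecting them, assuming span is a Span object taken from trace.data.spans:

```python
# Inspect the events recorded on a span (populated when an exception occurred)
for event in span.events:
    print(event.name)        # typically "exception" for errors
    print(event.attributes)  # exception type, message, and stack trace
```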
Advanced Span Attributes
For production and enterprise use cases, spans can include rich contextual attributes:
Model and Infrastructure Attributes:
span.set_attributes(
{
"ai.model.name": "gpt-4",
"ai.model.version": "2024-01-01",
"ai.model.provider": "openai",
"ai.model.temperature": 0.7,
"ai.model.max_tokens": 1000,
"infrastructure.gpu.type": "A100",
"infrastructure.memory.used_mb": 2048,
}
)
Business Logic Attributes:
span.set_attributes(
{
"business.operation.type": "content_generation",
"business.user.tier": "premium",
"business.request.priority": "high",
"business.cost.estimate_usd": 0.02,
"business.compliance.region": "eu-west-1",
}
)
Performance and Quality Attributes:
span.set_attributes(
{
"performance.cache.hit": True,
"performance.retry.count": 0,
"quality.confidence.score": 0.95,
"quality.toxicity.score": 0.02,
"monitoring.latency.p99": 150.5,
}
)
Span Types
Span types are a way to categorize spans within a trace. By default, the span type is set to "UNKNOWN"
when using the trace decorator. MLflow provides a set of predefined span types for common use cases, while also allowing you to set custom span types.
- Built-in Types
- Usage Examples
- Custom Span Types
| Span Type | Description |
|---|---|
| "CHAT_MODEL" | Represents a query to a chat model. This is a special case of an LLM interaction. |
| "CHAIN" | Represents a chain of operations. |
| "AGENT" | Represents an autonomous agent operation. |
| "TOOL" | Represents a tool execution (typically by an agent), such as querying a search engine. |
| "EMBEDDING" | Represents a text embedding operation. |
| "RETRIEVER" | Represents a context retrieval operation, such as querying a vector database. |
| "PARSER" | Represents a parsing operation, transforming text into a structured format. |
| "RERANKER" | Represents a re-ranking operation, ordering the retrieved contexts based on relevance. |
| "UNKNOWN" | A default span type that is used when no other span type is specified. |
To set a span type, pass the span_type parameter to the mlflow.trace() decorator or the mlflow.start_span() context manager. When you are using automatic tracing, the span type is set automatically by MLflow.
import mlflow
from mlflow.entities import SpanType
# Using a built-in span type
@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_documents(query: str):
...
# Setting a custom span type
x, y = 1, 2
with mlflow.start_span(name="add", span_type="MATH") as span:
    span.set_inputs({"x": x, "y": y})
    z = x + y
    span.set_outputs({"z": z})
print(span.span_type)
# Output: MATH
For specialized applications, you can define custom span types that reflect your domain:
# Financial services
@mlflow.trace(span_type="RISK_ASSESSMENT")
def assess_credit_risk(application):
...
@mlflow.trace(span_type="FRAUD_DETECTION")
def detect_fraud(transaction):
...
# Healthcare
@mlflow.trace(span_type="DIAGNOSIS")
def generate_diagnosis(symptoms):
...
@mlflow.trace(span_type="TREATMENT_RECOMMENDATION")
def recommend_treatment(diagnosis):
...
# E-commerce
@mlflow.trace(span_type="PRODUCT_RECOMMENDATION")
def recommend_products(user_profile):
...
@mlflow.trace(span_type="INVENTORY_CHECK")
def check_inventory(product_id):
...
Specialized Span Schemas
MLflow has predefined span types, and certain span types have required properties that enable additional functionality within the UI and in downstream tasks such as evaluation.
Retriever Spans
The RETRIEVER span type is used for operations involving retrieving data from a data store (for example, querying documents from a vector store). The output of a RETRIEVER span is expected to be a list of documents.
Each document in the list should be a dictionary with the following structure:
- page_content (str): The text content of the retrieved document chunk.
- metadata (Optional[Dict[str, Any]]): A dictionary of additional metadata associated with the document. The MLflow UI and evaluation metrics may specifically look for:
  - doc_uri (str): A string URI for the document source
  - chunk_id (str): A string identifier if the document is part of a larger chunked document
- id (Optional[str]): An optional unique identifier for the document chunk itself.
Example Usage
import mlflow
from mlflow.entities import SpanType, Document
def search_store(query: str) -> list[tuple[str, str]]:
# Simulate retrieving documents (e.g., from a vector database)
return [
(
"MLflow Tracing helps debug GenAI applications...",
"docs/mlflow/tracing_intro.md",
),
(
"Key components of a trace include spans...",
"docs/mlflow/tracing_datamodel.md",
),
("MLflow provides automatic instrumentation...", "docs/mlflow/auto_trace.md"),
]
@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_relevant_documents(query: str):
# Get documents from the search store
docs = search_store(query)
# Get the current active span (created by @mlflow.trace)
span = mlflow.get_current_active_span()
# Set the outputs of the span in accordance with the tracing schema
outputs = [
Document(page_content=doc, metadata={"doc_uri": uri}) for doc, uri in docs
]
span.set_outputs(outputs)
# Return the original format for downstream usage
return docs
# Example usage
user_query = "MLflow Tracing benefits"
retrieved_docs = retrieve_relevant_documents(user_query)
Advanced Retriever Example with Rich Metadata
import time

# Assumes `vector_db` is a pre-configured vector store client available in scope
@mlflow.trace(span_type=SpanType.RETRIEVER)
def advanced_retrieval(query: str, top_k: int = 5):
    """Advanced document retrieval with comprehensive metadata"""
# Simulate vector database query with scores and metadata
results = vector_db.similarity_search_with_score(
query=query, k=top_k, filter={"status": "active"}
)
span = mlflow.get_current_active_span()
# Rich document structure with comprehensive metadata
documents = []
for doc, score in results:
document = {
"page_content": doc.page_content,
"metadata": {
"doc_uri": doc.metadata.get("source_url"),
"chunk_id": doc.metadata.get("chunk_id"),
"document_title": doc.metadata.get("title"),
"author": doc.metadata.get("author"),
"creation_date": doc.metadata.get("created_at"),
"document_type": doc.metadata.get("type"),
"similarity_score": float(score),
"retrieval_timestamp": time.time(),
"chunk_position": doc.metadata.get("chunk_index"),
"total_chunks": doc.metadata.get("total_chunks"),
},
"id": doc.metadata.get("doc_id"),
}
documents.append(document)
# Set additional span attributes for debugging and analysis
span.set_attributes(
{
"retrieval.query_length": len(query),
"retrieval.results_count": len(documents),
"retrieval.avg_similarity": sum(
d["metadata"]["similarity_score"] for d in documents
)
/ len(documents),
"retrieval.database": "production_vector_db",
"retrieval.index_name": "knowledge_base_v2",
}
)
span.set_outputs(documents)
return documents
Conforming to this structure, especially including page_content and relevant metadata like doc_uri, will ensure that RETRIEVER spans are rendered informatively in the MLflow UI and that downstream evaluation tasks can correctly process the retrieved context.
Chat Completion Spans
Spans of type CHAT_MODEL or LLM are used to represent interactions with a chat completions API (for example, OpenAI's chat completions or Anthropic's messages API). These spans can also capture information about tools (functions) made available to or used by the model.
As providers can have different schemas for their API, there are no strict restrictions on the format of the span's inputs and outputs for the raw LLM call itself. However, to enable rich UI features (like conversation display and tool call visualization) and to standardize data for evaluation, MLflow defines specific attributes for chat messages and tool definitions.
Special Attributes
- mlflow.chat.messages: This attribute represents the system/user/assistant messages involved in the conversation with the chat model. The type must be List[ChatMessage].
- mlflow.chat.tools: This attribute represents the tools that were available for the chat model to call. The type must be List[ChatTool].

These attributes can be conveniently set using the helper functions mlflow.tracing.set_span_chat_messages() and mlflow.tracing.set_span_chat_tools().
Example Usage
import mlflow
from mlflow.entities import SpanType
from mlflow.tracing.constant import SpanAttributeKey
from mlflow.tracing import set_span_chat_messages, set_span_chat_tools
# Example messages and tools
messages = [
{
"role": "system",
"content": "please use the provided tool to answer the user's questions",
},
{"role": "user", "content": "what is 1 + 1?"},
]
tools = [
{
"type": "function",
"function": {
"name": "add",
"description": "Add two numbers",
"parameters": {
"type": "object",
"properties": {
"a": {"type": "number"},
"b": {"type": "number"},
},
"required": ["a", "b"],
},
},
}
]
@mlflow.trace(span_type=SpanType.CHAT_MODEL)
def call_chat_model(messages, tools):
# Mocking a response
response = {
"role": "assistant",
"tool_calls": [
{
"id": "123",
"function": {"arguments": '{"a": 1,"b": 2}', "name": "add"},
"type": "function",
}
],
}
combined_messages = messages + [response]
span = mlflow.get_current_active_span()
set_span_chat_messages(span, combined_messages)
set_span_chat_tools(span, tools)
return response
call_chat_model(messages, tools)
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)
span = trace.data.spans[0]
print("Messages: ", span.get_attribute(SpanAttributeKey.CHAT_MESSAGES))
print("Tools: ", span.get_attribute(SpanAttributeKey.CHAT_TOOLS))
Advanced Chat Model Span with Rich Context
# Assumes `openai_client`, `estimate_cost`, and `calculate_actual_cost` are
# defined elsewhere in your application.
@mlflow.trace(span_type=SpanType.CHAT_MODEL)
def advanced_chat_completion(messages, model_config, user_context):
    """Advanced chat completion with comprehensive tracking"""
span = mlflow.get_current_active_span()
# Set rich model and request attributes
span.set_attributes(
{
"ai.model.name": model_config["model"],
"ai.model.provider": "openai",
"ai.model.temperature": model_config.get("temperature", 0.7),
"ai.model.max_tokens": model_config.get("max_tokens", 1000),
"ai.model.top_p": model_config.get("top_p", 1.0),
"ai.request.user_tier": user_context.get("tier", "standard"),
"ai.request.session_id": user_context.get("session_id"),
"ai.request.conversation_length": len(messages),
"ai.request.estimated_cost_usd": estimate_cost(messages, model_config),
}
)
# Make the actual API call
response = openai_client.chat.completions.create(messages=messages, **model_config)
# Extract response details
completion = response.choices[0].message
usage = response.usage
# Set response attributes
span.set_attributes(
{
"ai.response.tokens.input": usage.prompt_tokens,
"ai.response.tokens.output": usage.completion_tokens,
"ai.response.tokens.total": usage.total_tokens,
"ai.response.finish_reason": response.choices[0].finish_reason,
"ai.response.model_used": response.model,
"ai.response.has_tool_calls": bool(completion.tool_calls),
"ai.response.actual_cost_usd": calculate_actual_cost(
usage, model_config["model"]
),
}
)
# Prepare messages for MLflow schema
all_messages = messages + [completion.dict()]
set_span_chat_messages(span, all_messages)
# Set tools if they were provided
if "tools" in model_config:
set_span_chat_tools(span, model_config["tools"])
return response
Working with Trace Data
Understanding the trace data model enables several powerful use cases:
Programmatic Analysis
Query specific spans: Search for spans by type, name, or attributes to analyze specific operations (see the sketch after this list)
Extract metrics: Calculate performance metrics like latency, success rates, and resource usage
Data flow analysis: Trace how data transforms through your application pipeline
Cost tracking: Monitor AI model usage and associated costs across different user tiers
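As a minimal sketch of querying spans by type across recent traces (assuming the client search API returns full Trace objects, and using a placeholder experiment ID):

```python
from mlflow import MlflowClient
from mlflow.entities import SpanType

client = MlflowClient()

# "0" is a placeholder experiment ID; use your own experiment
traces = client.search_traces(experiment_ids=["0"], max_results=50)

for trace in traces:
    for span in trace.data.spans:
        # Pull out the retriever steps to analyze retrieval behavior
        if span.span_type == SpanType.RETRIEVER:
            print(trace.info.trace_id, span.name, span.status.status_code)
```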
Quality Assessment
Input/output validation: Verify that data is processed correctly at each step
Error analysis: Examine failed spans to understand failure patterns
Performance optimization: Identify bottlenecks and optimization opportunities
A/B testing: Compare trace patterns across different model versions or configurations
Integration with Evaluation
Trace-based evaluation: Use trace data as input for quality assessment
Span-level metrics: Evaluate individual operations within larger workflows
Historical comparison: Compare trace patterns across different versions or configurations
Business metrics: Track custom business KPIs through span attributes and tags
Advanced Analytics and Monitoring
Cost Analysis Example
def analyze_costs_by_user_tier(traces):
"""Analyze AI model costs by user tier"""
cost_by_tier = {}
for trace in traces:
user_tier = trace.info.tags.get("business.user.tier", "unknown")
for span in trace.data.spans:
if span.span_type == "CHAT_MODEL":
cost = span.attributes.get("ai.response.actual_cost_usd", 0)
if user_tier not in cost_by_tier:
cost_by_tier[user_tier] = []
cost_by_tier[user_tier].append(cost)
return {tier: sum(costs) for tier, costs in cost_by_tier.items()}
Performance Monitoring Example
def monitor_retrieval_performance(traces):
"""Monitor retrieval system performance"""
retrieval_metrics = []
for trace in traces:
for span in trace.data.spans:
if span.span_type == "RETRIEVER":
metrics = {
"query_time_ms": (span.end_time_ns - span.start_time_ns)
/ 1_000_000,
"results_count": span.attributes.get("retrieval.results_count", 0),
"avg_similarity": span.attributes.get(
"retrieval.avg_similarity", 0
),
"database": span.attributes.get("retrieval.database"),
"timestamp": span.start_time_ns / 1_000_000_000,
}
retrieval_metrics.append(metrics)
return retrieval_metrics
Error Pattern Analysis
def analyze_error_patterns(traces):
"""Analyze common error patterns across traces"""
error_patterns = {}
for trace in traces:
if trace.info.state == "ERROR":
for span in trace.data.spans:
if span.status.status_code == "ERROR":
error_type = span.attributes.get("error_type", "unknown")
operation = span.name
pattern_key = f"{operation}:{error_type}"
if pattern_key not in error_patterns:
error_patterns[pattern_key] = {
"count": 0,
"examples": [],
"user_tiers": set(),
}
error_patterns[pattern_key]["count"] += 1
error_patterns[pattern_key]["examples"].append(
span.attributes.get("error_message", "")[:100]
)
error_patterns[pattern_key]["user_tiers"].add(
trace.info.tags.get("business.user.tier", "unknown")
)
return error_patterns
Next Steps
Tracing Concepts: Understand the fundamental concepts behind tracing
Querying Traces: Learn how to search and retrieve trace data programmatically
Automatic Tracing: Get started with one-line tracing for supported libraries
Manual Tracing: Create custom spans for detailed instrumentation
Understanding the trace data model is essential for leveraging MLflow Tracing effectively. The hierarchical structure of traces and spans, combined with rich metadata and specialized schemas, enables comprehensive observability and analysis of your GenAI applications.