Tracing Concepts

This guide introduces the core concepts of tracing and observability for GenAI applications. If you're new to tracing, this conceptual overview will help you understand the fundamental building blocks before diving into implementation.

What is Tracing?

Tracing is an observability technique that captures the complete execution flow of a request through your application. Unlike traditional logging that captures discrete events, tracing creates a detailed map of how data flows through your system, recording every operation, transformation, and decision point.

In the context of GenAI applications, tracing becomes essential because these systems involve complex, multi-step workflows that are difficult to debug and optimize without complete visibility into their execution.

Core Architecture: Trace = TraceInfo + TraceData

MLflow traces follow a simple but powerful structure: Trace = TraceInfo + TraceData, where TraceData = List[Span].

A Trace in MLflow consists of two main components:

TraceInfo: Metadata about the overall trace (timing, status, preview data)

TraceData: The core execution data containing all the individual spans

This separation allows for efficient querying and filtering of traces while maintaining detailed execution information.
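
For example, here is a minimal sketch of how this split surfaces in the Python API, assuming MLflow 2.x field names and a trace that has already been logged (the trace ID below is hypothetical):

import mlflow

# Fetch a previously logged trace by ID (copy it from the MLflow UI or
# capture it when the traced function runs).
trace = mlflow.get_trace("tr-1234567890abcdef")

# TraceInfo: lightweight metadata, cheap to query and filter on.
print(trace.info.status)             # e.g. OK or ERROR
print(trace.info.execution_time_ms)  # total wall-clock time

# TraceData: the full list of spans recorded during execution.
for span in trace.data.spans:
    print(span.name, span.span_type)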

Key Concepts

Trace

A trace represents the complete journey of a single request through your application. It's a collection of related operations (spans) that together tell the story of how your system processed a user's input to generate an output.

Example: A user asks "What's the weather in Paris?" - the trace captures everything from parsing the question to returning the final weather report.

Span

A span represents a single, discrete operation within a trace. Each span has a clear beginning and end, and captures the inputs, outputs, timing, and metadata for that specific operation.

Key span properties:

  • Name: Human-readable identifier (e.g., "Document Retrieval", "LLM Call")
  • Duration: How long the operation took (measured in nanoseconds for precision)
  • Status: Success, failure, or error with detailed information
  • Inputs: Data that went into the operation (JSON-serialized)
  • Outputs: Results produced by the operation (JSON-serialized)
  • Attributes: Additional metadata (model parameters, user ID, configuration values)
  • Events: Significant moments during execution (errors, warnings, checkpoints)
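
As a brief illustration of how these properties map onto the manual tracing API (mlflow.start_span is the real context manager; the names and values here are made up):

import mlflow

with mlflow.start_span(name="Document Retrieval") as span:
    span.set_inputs({"query": "mlflow features"})          # JSON-serialized inputs
    span.set_attributes({"top_k": 5, "index": "docs-v2"})  # additional metadata
    results = ["doc-1", "doc-2"]                           # placeholder retrieval logic
    span.set_outputs({"documents": results})               # JSON-serialized outputs
# Duration and status are recorded automatically when the block exits.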

Parent-Child Relationships

Spans form hierarchical relationships that mirror your application's call structure:

  • Root span: The top-level operation representing the entire request
  • Child spans: Operations called by parent operations
  • Sibling spans: Operations at the same level of the hierarchy

Each span records its parent via the parent_id property, which establishes the hierarchy and makes the order of operations explicit.
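
As an illustration, nesting traced functions produces exactly this structure; the function names below are hypothetical:

import mlflow

@mlflow.trace
def answer_question(question: str) -> str:
    docs = retrieve(question)        # child span of answer_question
    return generate(question, docs)  # sibling of the retrieve span

@mlflow.trace
def retrieve(question: str) -> list:
    return ["doc-1", "doc-2"]

@mlflow.trace
def generate(question: str, docs: list) -> str:
    return f"Answer based on {len(docs)} documents."

answer_question("What are MLflow's key features?")
# Root span: answer_question. Children: retrieve and generate, which are siblings.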

Span Types

MLflow categorizes spans by their purpose to make traces easier to understand and analyze. Each span type has semantic meaning and may have specialized schemas for enhanced functionality:

SpanType.LLM and SpanType.CHAT_MODEL: Calls to language models and chat completion APIs

  • Examples: OpenAI chat completion, Anthropic Claude call, local model inference
  • Typically captures: model name, parameters, prompt, response, token usage
  • Special attributes: mlflow.chat.messages and mlflow.chat.tools for rich UI display

Other built-in types appear throughout this guide: RETRIEVER (document retrieval), EMBEDDING (embedding generation), TOOL (tool or function invocation), CHAIN (multi-step orchestration), and UNKNOWN (the default when no type is specified).
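
A short sketch of assigning a span type, using the SpanType constants from mlflow.entities (both the decorator and the context manager accept a span_type argument; the function here is illustrative):

import mlflow
from mlflow.entities import SpanType

@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_documents(query: str) -> list:
    return []  # retrieval logic goes here; the span is typed as RETRIEVER

with mlflow.start_span(name="LLM Call", span_type=SpanType.CHAT_MODEL) as span:
    span.set_inputs({"messages": [{"role": "user", "content": "Hi"}]})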

Trace Structure Example

Let's examine how these concepts work together in a typical RAG (Retrieval-Augmented Generation) application:

📋 Trace: "Answer User Question" (Root)
├── 🔍 Span: "Query Processing" (UNKNOWN)
│   ├── Input: "What are MLflow's key features?"
│   └── Output: "Processed query: 'mlflow features'"
├── 📚 Span: "Document Retrieval" (RETRIEVER)
│   ├── 🔗 Span: "Embedding Generation" (EMBEDDING)
│   │   ├── Input: "mlflow features"
│   │   └── Output: [0.1, 0.3, -0.2, ...] (vector)
│   └── 🗄️ Span: "Vector Search" (TOOL)
│       ├── Input: {query_vector, top_k: 5}
│       └── Output: [Document(...), Document(...)] (5 docs)
├── 🧠 Span: "Response Generation" (CHAIN)
│   ├── 📝 Span: "Prompt Building" (UNKNOWN)
│   │   ├── Input: {documents, user_query}
│   │   └── Output: "Based on these docs: ... Answer: ..."
│   └── 🤖 Span: "LLM Call" (CHAT_MODEL)
│       ├── Input: {messages, model: "gpt-4", temperature: 0.7}
│       └── Output: "MLflow's key features include..."
└── ✅ Span: "Response Formatting" (UNKNOWN)
    ├── Input: "MLflow's key features include..."
    └── Output: {formatted_response, metadata}

Each span captures specific information relevant to its operation type, and the hierarchical structure shows the logical flow of the application.
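
To make the mapping concrete, a skeletal version of this pipeline might look like the following (function bodies are placeholders and names are hypothetical; only the decorators and call nesting matter):

import mlflow
from mlflow.entities import SpanType

@mlflow.trace  # root span for the whole request
def answer_user_question(question: str) -> dict:
    query = process_query(question)
    docs = retrieve_documents(query)
    answer = generate_response(question, docs)
    return {"formatted_response": answer, "metadata": {}}

@mlflow.trace
def process_query(question: str) -> str:
    return question.lower()

@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_documents(query: str) -> list:
    return []  # embedding generation and vector search would be nested spans here

@mlflow.trace(span_type=SpanType.CHAIN)
def generate_response(question: str, docs: list) -> str:
    return "MLflow's key features include..."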

Observability Benefits

Understanding these concepts enables several powerful observability capabilities:

Debugging

Root cause analysis: Trace the exact path that led to an error or unexpected result

Performance bottlenecks: Identify which operations consume the most time using precise nanosecond timing

Data flow validation: Verify that data is transformed correctly at each step by examining inputs and outputs

Optimization

Cost tracking: Monitor token usage, API calls, and resource consumption across operations using span attributes

Latency analysis: Understand where delays occur in your application with detailed timing data

Quality correlation: Connect input quality (e.g., retrieval relevance scores) to output quality

Monitoring

System health: Track success rates and error patterns across different components using span status

Usage patterns: Understand how users interact with your application through trace metadata

Trend analysis: Monitor performance and quality changes over time using trace history
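
For example, logged traces can be pulled into a DataFrame for this kind of analysis; mlflow.search_traces is a real API, though the experiment ID and column names below are illustrative and may vary by MLflow version:

import mlflow

# Fetch recent traces and summarize status and latency.
traces = mlflow.search_traces(experiment_ids=["1"], max_results=100)
print(traces[["status", "execution_time_ms"]].describe(include="all"))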

Specialized Schemas

Some span types have specialized schemas that enable enhanced functionality:

Retriever Spans

For RETRIEVER spans, the output should conform to a List[Document] structure:

  • page_content: The text content of the document
  • metadata: Additional context including doc_uri for links and chunk_id for evaluation

This enables rich document display in the UI and proper evaluation metric calculation.
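
A sketch of a retriever whose output follows this schema, assuming the Document entity available in recent MLflow versions (a plain list of dicts with the same keys also serializes correctly):

import mlflow
from mlflow.entities import Document, SpanType

@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve(query: str) -> list:
    # Each result carries page_content plus metadata used for UI links
    # (doc_uri) and evaluation (chunk_id). The values here are made up.
    return [
        Document(
            page_content="MLflow Tracing captures the execution flow...",
            metadata={"doc_uri": "https://example.com/docs/tracing", "chunk_id": "42"},
        )
    ]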

Chat Model Spans

For CHAT_MODEL and LLM spans, special attributes provide enhanced conversation display:

  • mlflow.chat.messages: Structured conversation data for rich UI rendering
  • mlflow.chat.tools: Available tools for function calling scenarios

These attributes can be set using helper functions like mlflow.tracing.set_span_chat_messages().
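
For instance, inside a manually created span these attributes might be set as follows (the messages use the OpenAI-style chat format; the content is illustrative):

import mlflow
from mlflow.entities import SpanType
from mlflow.tracing import set_span_chat_messages

with mlflow.start_span(name="LLM Call", span_type=SpanType.CHAT_MODEL) as span:
    messages = [
        {"role": "user", "content": "What are MLflow's key features?"},
        {"role": "assistant", "content": "MLflow's key features include..."},
    ]
    # Attaches mlflow.chat.messages to the span for rich conversation rendering.
    set_span_chat_messages(span, messages)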

Use Cases

GenAI ChatCompletions Use Case

Chat completions are a canonical Generative AI (GenAI) workload: the application generates human-like text from input prompts, often across multiple conversational turns. Tracing gives developers visibility into the entire interaction context, which is essential for debugging and improving these systems.

What Tracing Captures

Enabling tracing on chat interfaces allows you to evaluate:

  • Full Contextual History: Complete conversation context
  • Prompt Engineering: How prompts are constructed and modified
  • Input Processing: User input validation and preprocessing
  • Configuration Parameters: Model settings and their effects
  • Output Generation: Response quality and characteristics

Key Metadata for ChatCompletions

Metadata captured around each inference call supports billing, performance analysis, and debugging:

  • Token Counts: Number of tokens processed (affects billing and performance)
  • Model Name: Specific model used for inference
  • Provider Type: Service or platform providing the model (OpenAI, Anthropic, etc.)
  • Query Parameters: Settings like temperature, top-k, max_tokens
  • Query Input: The request input (user question)
  • Query Response: System-generated response
  • Latency: Time taken for each operation
  • Cost: API costs associated with the request

Example: Enhanced Chat Application

import mlflow
from openai import OpenAI
import time


@mlflow.trace
def enhanced_chat_completion(user_message, conversation_history=None):
    start_time = time.time()

    # Add context to the trace
    mlflow.update_current_trace(
        tags={
            "application": "customer_support_chat",
            "user_type": "premium",
            "conversation_length": str(len(conversation_history or [])),
        }
    )

    # Prepare messages with history (copy so the caller's list is not mutated)
    messages = list(conversation_history or [])
    messages.append({"role": "user", "content": user_message})

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, temperature=0.7, max_tokens=500
    )

    # Add performance metrics (tag values are stored as strings)
    mlflow.update_current_trace(
        tags={
            "response_time_seconds": f"{time.time() - start_time:.3f}",
            "token_count": str(response.usage.total_tokens),
            "model_used": response.model,
        }
    )

    return response.choices[0].message.content
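
Calling enhanced_chat_completion("How do I reset my password?") produces a single trace whose root span records the function's inputs and return value, with the tags above available for filtering in the UI. If OpenAI autologging is also enabled (see below), the chat.completions.create call appears as a nested CHAT_MODEL child span.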

Getting Started with Concepts

Now that you understand these fundamental concepts:

Instrument Your App: Learn how to add tracing to your applications

Trace Data Model: Explore the detailed schema and API reference

Automatic Tracing: Enable one-line tracing for supported libraries

Manual Tracing: Create custom spans for your application logic
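
As a taste of automatic tracing, enabling it for a supported library is a single call (mlflow.openai.autolog is a real API; other supported libraries have equivalent autolog functions):

import mlflow

mlflow.openai.autolog()  # subsequent OpenAI calls are traced automatically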


These concepts form the foundation for understanding how MLflow Tracing provides observability into your GenAI applications. The hierarchical structure of traces and spans, combined with rich metadata capture and specialized schemas, enables deep insights into your application's behavior and performance.