Low-level Client APIs (Advanced)

The MLflow Client APIs provide direct, fine-grained control over trace lifecycle management. While the high-level APIs handle most use cases elegantly, client APIs are essential for advanced scenarios requiring explicit control over trace creation, custom trace IDs, or integration with existing observability systems.

Use With Caution

Before You Begin: We recommend using client APIs only when the high-level APIs don't meet your requirements. They come with the following trade-offs:

  • No automatic parent-child relationship detection
  • Manual exception handling required
  • Incompatible with auto-tracing integrations
  • Full control over trace lifecycle
  • Custom trace ID management

Core Concepts

Trace Lifecycle

Every trace follows a strict lifecycle that must be managed explicitly:

  1. Start Trace - Create the root span
  2. Start Span(s) - Add child spans as needed
  3. End Span(s) - Close spans in reverse order (LIFO)
  4. End Trace - Complete the root span

Important

Golden Rule: Every start_trace or start_span call must have a corresponding end_trace or end_span call. Failing to close spans will result in incomplete traces.
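
One way to catch a missed end call during development is to keep a small ledger of the spans that are still open. The sketch below is illustrative only: `OpenSpanLedger` is a hypothetical helper, not part of MLflow, and the string IDs stand in for the `span_id` values returned by the client.

```python
class OpenSpanLedger:
    """Hypothetical helper: tracks span IDs that were started but not yet ended."""

    def __init__(self):
        self._open = []  # span IDs in the order they were started

    def started(self, span_id):
        self._open.append(span_id)

    def ended(self, span_id):
        # Raises ValueError if the span was never started or was already ended.
        self._open.remove(span_id)

    def unclosed(self):
        return list(self._open)


ledger = OpenSpanLedger()
ledger.started("root")
ledger.started("fetch_user_data")
ledger.ended("fetch_user_data")
leaked = ledger.unclosed()  # "root" was started but never ended
```

Calling `ledger.started(...)` next to every start call and `ledger.ended(...)` next to every end call makes an unbalanced pair visible before the trace is inspected.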

Key Identifiers

Understanding these identifiers is crucial for client API usage:

| Identifier | Description | Usage |
|---|---|---|
| request_id | Unique trace identifier | Links all spans in a trace |
| span_id | Unique span identifier | Identifies specific span to end |
| parent_id | Parent span's ID | Creates span hierarchy |
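
To make the relationship between the three identifiers concrete, here is a minimal sketch that models them with plain data. `SpanRecord` and `children_of` are hypothetical names used for illustration, not MLflow classes, and the ID strings are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical stand-in holding only the three identifiers from the table."""

    request_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span


def children_of(spans, parent_id):
    """Return the span_ids whose parent_id matches, in input order."""
    return [s.span_id for s in spans if s.parent_id == parent_id]


spans = [
    SpanRecord("tr-1", "root", None),
    SpanRecord("tr-1", "fetch", "root"),
    SpanRecord("tr-1", "process", "root"),
]
root_children = children_of(spans, "root")
same_trace = len({s.request_id for s in spans}) == 1
```

All three records share one `request_id`, which is what ties them to a single trace, while `parent_id` alone determines the shape of the hierarchy.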

Getting Started

Initialize the Client

from mlflow import MlflowClient

# Initialize client with default tracking URI
client = MlflowClient()

# Or specify a custom tracking URI
client = MlflowClient(tracking_uri="http://localhost:5000")

Starting a Trace

Unlike high-level APIs, you must explicitly start a trace before adding spans:

# Start a new trace - this creates the root span
root_span = client.start_trace(
    name="my_application_flow",
    inputs={"user_id": "123", "action": "generate_report"},
    attributes={"environment": "production", "version": "1.0.0"},
)

# Extract the request_id for subsequent operations
request_id = root_span.request_id
print(f"Started trace with ID: {request_id}")

Adding Child Spans

Create a hierarchy of spans to represent your application's workflow:

# Create a child span for data retrieval
data_span = client.start_span(
    name="fetch_user_data",
    request_id=request_id,  # Links to the trace
    parent_id=root_span.span_id,  # Creates parent-child relationship
    inputs={"user_id": "123"},
    attributes={"database": "users_db", "query_type": "select"},
)

# Create a sibling span for processing
process_span = client.start_span(
    name="process_data",
    request_id=request_id,
    parent_id=root_span.span_id,  # Same parent as data_span
    inputs={"data_size": "1024KB"},
    attributes={"processor": "gpu", "batch_size": 32},
)

Ending Spans

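End each span once its work is done, and always end child spans before their parent. Nested spans therefore close in reverse order of creation (LIFO - Last In, First Out). The two spans below are siblings under the same parent, so either may be ended first: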

# End the data retrieval span
client.end_span(
    request_id=data_span.request_id,
    span_id=data_span.span_id,
    outputs={"record_count": 42, "cache_hit": True},
    attributes={"duration_ms": 150},
)

# End the processing span
client.end_span(
    request_id=process_span.request_id,
    span_id=process_span.span_id,
    outputs={"processed_records": 42, "errors": 0},
    status="OK",
)
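
For nested spans, the reverse-order rule can be sketched with a plain list used as a stack. This is an illustration under assumptions: `end_all` is a hypothetical helper, and `RecordingClient` is a stand-in that mimics only the `end_span(request_id=..., span_id=...)` call shape used above so the snippet runs without a tracking server.

```python
class RecordingClient:
    """Hypothetical stand-in for MlflowClient: records end_span calls only."""

    def __init__(self):
        self.ended = []

    def end_span(self, request_id, span_id, **kwargs):
        self.ended.append(span_id)


def end_all(client, request_id, open_span_ids):
    """End spans in reverse order of creation (last started, first ended)."""
    while open_span_ids:
        client.end_span(request_id=request_id, span_id=open_span_ids.pop())


demo_client = RecordingClient()
open_span_ids = ["outer", "middle", "inner"]  # creation order
end_all(demo_client, "tr-1", open_span_ids)
```

Appending each `span_id` to the list when the span starts and popping from the end when closing guarantees that a child is always ended before its parent.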

Ending a Trace

Complete the trace by ending the root span:

# End the root span (completes the trace)
client.end_trace(
    request_id=request_id,
    outputs={"report_url": "https://example.com/report/123"},
    attributes={"total_duration_ms": 1250, "status": "success"},
)

Practical Examples

Proper error handling ensures traces are completed even when exceptions occur:

def traced_operation():
    client = MlflowClient()
    root_span = None

    try:
        # Start trace
        root_span = client.start_trace("risky_operation")

        # Start child span
        child_span = client.start_span(
            name="database_query",
            request_id=root_span.request_id,
            parent_id=root_span.span_id,
        )

        try:
            # Risky operation (perform_database_query is a placeholder for your own code)
            result = perform_database_query()

            # End child span on success
            client.end_span(
                request_id=child_span.request_id,
                span_id=child_span.span_id,
                outputs={"result": result},
                status="OK",
            )
        except Exception as e:
            # End child span on error, then let the outer handler end the trace
            client.end_span(
                request_id=child_span.request_id,
                span_id=child_span.span_id,
                status="ERROR",
                attributes={"error": str(e)},
            )
            raise

    except Exception as e:
        # Log error to trace (root_span is still None if start_trace itself failed)
        if root_span is not None:
            client.end_trace(
                request_id=root_span.request_id,
                status="ERROR",
                attributes={"error_type": type(e).__name__, "error_message": str(e)},
            )
        raise
    else:
        # End trace on success
        client.end_trace(
            request_id=root_span.request_id,
            outputs={"status": "completed"},
            status="OK",
        )

Best Practices

Create custom context managers to ensure spans are always closed:

from contextlib import contextmanager


@contextmanager
def traced_span(client, name, request_id, parent_id=None, **kwargs):
    """Context manager for safe span management"""
    span = client.start_span(
        name=name, request_id=request_id, parent_id=parent_id, **kwargs
    )
    try:
        yield span
    except Exception as e:
        client.end_span(
            request_id=span.request_id,
            span_id=span.span_id,
            status="ERROR",
            attributes={"error": str(e)},
        )
        raise
    else:
        client.end_span(request_id=span.request_id, span_id=span.span_id, status="OK")


# Usage
with traced_span(client, "my_operation", request_id, parent_id) as span:
    # Your code here
    result = perform_operation()
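
The same pattern can be extended to the root span. The following is a minimal sketch, assuming `start_trace` and `end_trace` accept the arguments used earlier on this page. `traced_trace` and `_StubClient` are hypothetical names; the stub exists only so the snippet runs without a tracking server.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def traced_trace(client, name, **kwargs):
    """Hypothetical helper: start a trace and always end it, even on error."""
    root = client.start_trace(name, **kwargs)
    try:
        yield root
    except Exception as e:
        client.end_trace(
            request_id=root.request_id,
            status="ERROR",
            attributes={"error": str(e)},
        )
        raise
    else:
        client.end_trace(request_id=root.request_id, status="OK")


class _StubClient:
    """Stand-in for MlflowClient that records end_trace statuses in memory."""

    def __init__(self):
        self.statuses = []

    def start_trace(self, name, **kwargs):
        return SimpleNamespace(request_id="tr-1", span_id="root")

    def end_trace(self, request_id, status="OK", **kwargs):
        self.statuses.append(status)


stub = _StubClient()
with traced_trace(stub, "ok_flow") as root:
    pass  # work that succeeds
try:
    with traced_trace(stub, "failing_flow"):
        raise RuntimeError("boom")
except RuntimeError:
    pass  # the trace was already ended with status ERROR
```

With a real client, the yielded root span supplies the `request_id` and `span_id` that child spans need, so a trace-level and a span-level context manager can be nested.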

Common Pitfalls

Important Considerations

Avoid these common mistakes:

  • Forgetting to end spans - Always use try/finally or context managers
  • Incorrect parent-child relationships - Double-check span IDs
  • Mixing high-level and low-level APIs - They don't interoperate
  • Hardcoding trace IDs - Always generate unique IDs
  • Ignoring thread safety - Client APIs are not thread-safe by default
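
As an illustration of the try/finally option from the first item, the sketch below ends the trace in a `finally` block so that no code path can skip it. `run_traced` and `StubClient` are hypothetical; the stub mimics only the `start_trace` and `end_trace` call shapes used on this page.

```python
from types import SimpleNamespace


class StubClient:
    """Hypothetical stand-in for MlflowClient; records calls instead of logging."""

    def __init__(self):
        self.calls = []

    def start_trace(self, name):
        self.calls.append(("start_trace", name))
        return SimpleNamespace(request_id="tr-1", span_id="root")

    def end_trace(self, request_id, status="OK", **kwargs):
        self.calls.append(("end_trace", status))


def run_traced(client, name, fn):
    """Run fn() inside a trace; the finally block always ends the trace."""
    root = client.start_trace(name)
    status = "ERROR"  # assume failure until fn() returns normally
    try:
        result = fn()
        status = "OK"
        return result
    finally:
        client.end_trace(request_id=root.request_id, status=status)


stub_client = StubClient()
ok_result = run_traced(stub_client, "safe", lambda: 42)
try:
    run_traced(stub_client, "risky", lambda: 1 / 0)
except ZeroDivisionError:
    pass  # the exception still propagates after the trace is ended
```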

Performance Considerations

  • Batch Operations: When creating many spans, consider batching operations to reduce overhead.
  • Memory Management: Be mindful of keeping references to span objects - clean them up when done.
  • Network Calls: Each start/end operation may result in network calls to the tracking server.
  • Thread Safety: Use locks or thread-local storage when using client APIs in multi-threaded environments.
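
The lock-based approach from the last item could look roughly like this. It is a sketch under assumptions, not an MLflow feature: `LockedClient` is a hypothetical wrapper that serializes every method call on whatever client it wraps, and `CountingClient` is a stand-in used so the snippet runs on its own.

```python
import threading


class LockedClient:
    """Hypothetical wrapper: serializes calls to the wrapped client with one lock."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only reached for names not defined on the wrapper itself.
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr

        def locked(*args, **kwargs):
            with self._lock:
                return attr(*args, **kwargs)

        return locked


class CountingClient:
    """Stand-in client whose end_span just counts calls."""

    def __init__(self):
        self.ended = 0

    def end_span(self, request_id, span_id, **kwargs):
        self.ended += 1


shared = LockedClient(CountingClient())


def worker():
    for i in range(100):
        shared.end_span(request_id="tr-1", span_id=f"s{i}")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A single lock is the simplest option but serializes all tracing calls; thread-local storage for per-thread span stacks avoids that contention at the cost of more bookkeeping.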

Next Steps

High-Level APIs - Simpler alternative for most use cases

Automatic Tracing - One-line tracing for supported frameworks

Trace Data Model - Understanding trace structure and components

Querying Traces - Programmatically search and analyze your traces