
MLflow Tracking APIs

MLflow Tracking provides comprehensive APIs across multiple programming languages to capture your machine learning experiments. Whether you prefer automatic instrumentation or granular control, MLflow adapts to your workflow.

Choose Your Approach

MLflow offers two primary methods for experiment tracking, each optimized for different use cases:

🤖 Automatic Logging - Zero Setup, Maximum Coverage

Perfect for getting started quickly or when using supported ML libraries. Just add one line and MLflow captures everything automatically.

import mlflow

mlflow.autolog() # That's it!

# Your existing training code works unchanged
model.fit(X_train, y_train)

What gets logged automatically:

  • Model parameters and hyperparameters
  • Training and validation metrics
  • Model artifacts and checkpoints
  • Training plots and visualizations
  • Framework-specific metadata

Supported libraries: Scikit-learn, XGBoost, LightGBM, PyTorch, Keras/TensorFlow, Spark, and more.
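Autologging can also be scoped to a single framework. A minimal sketch, assuming a scikit-learn workflow (the model and data variables are placeholders); other supported libraries expose analogous functions such as mlflow.xgboost.autolog():

import mlflow

# Enable autologging only for scikit-learn models
mlflow.sklearn.autolog()

# Parameters, metrics, and the fitted model are captured during fit()
model.fit(X_train, y_train)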

→ Explore Auto Logging

🛠️ Manual Logging - Complete Control, Custom Workflows

Ideal for custom training loops, advanced experimentation, or when you need precise control over what gets tracked.

import mlflow

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    # Your training logic here
    for epoch in range(num_epochs):
        train_loss = train_model()
        val_loss = validate_model()

        # Log metrics with step tracking
        mlflow.log_metrics({"train_loss": train_loss, "val_loss": val_loss}, step=epoch)

    # Log final model
    mlflow.sklearn.log_model(model, name="model")

Core Logging Functions

Setup & Configuration

| Function | Purpose | Example |
| --- | --- | --- |
| mlflow.set_tracking_uri() | Connect to tracking server or database | mlflow.set_tracking_uri("http://localhost:5000") |
| mlflow.get_tracking_uri() | Get current tracking URI | uri = mlflow.get_tracking_uri() |
| mlflow.create_experiment() | Create new experiment | exp_id = mlflow.create_experiment("my-experiment") |
| mlflow.set_experiment() | Set active experiment | mlflow.set_experiment("fraud-detection") |
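As a quick sketch of how these setup calls combine (the URI and experiment name are placeholders):

import mlflow

# Point the client at a tracking server, local directory, or database
mlflow.set_tracking_uri("http://localhost:5000")
print(mlflow.get_tracking_uri())

# Create an experiment once, then make it the active one for new runs
exp_id = mlflow.create_experiment("my-experiment")
mlflow.set_experiment("my-experiment")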

Run Management

| Function | Purpose | Example |
| --- | --- | --- |
| mlflow.start_run() | Start new run (with context manager) | with mlflow.start_run(): ... |
| mlflow.end_run() | End current run | mlflow.end_run(status="FINISHED") |
| mlflow.active_run() | Get currently active run | run = mlflow.active_run() |
| mlflow.last_active_run() | Get last completed run | last_run = mlflow.last_active_run() |
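A short sketch showing how these run-management calls typically work together (the run name and metric are placeholders):

import mlflow

with mlflow.start_run(run_name="baseline") as run:
    # Inside the context manager, active_run() returns the same run
    assert mlflow.active_run().info.run_id == run.info.run_id
    mlflow.log_metric("accuracy", 0.9)
# Exiting the context manager ends the run, so an explicit end_run() is not needed

# After completion, last_active_run() still gives access to the finished run
finished = mlflow.last_active_run()
print(finished.info.status)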

Data Logging

| Function | Purpose | Example |
| --- | --- | --- |
| mlflow.log_param() / mlflow.log_params() | Log hyperparameters | mlflow.log_param("lr", 0.01) |
| mlflow.log_metric() / mlflow.log_metrics() | Log performance metrics | mlflow.log_metric("accuracy", 0.95, step=10) |
| mlflow.log_input() | Log dataset information | mlflow.log_input(dataset) |
| mlflow.set_tag() / mlflow.set_tags() | Add metadata tags | mlflow.set_tag("model_type", "CNN") |
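For example, a minimal sketch combining these calls (the DataFrame and tag values are placeholders; mlflow.data.from_pandas is one way to build a dataset object for log_input()):

import mlflow
import pandas as pd

with mlflow.start_run():
    mlflow.log_params({"lr": 0.01, "batch_size": 32})
    mlflow.log_metric("accuracy", 0.95, step=10)
    mlflow.set_tags({"model_type": "CNN", "team": "vision"})

    # Log dataset information as a structured input
    df = pd.DataFrame({"x": [1, 2, 3], "y": [0, 1, 0]})
    dataset = mlflow.data.from_pandas(df, name="toy_data")
    mlflow.log_input(dataset, context="training")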

Artifact Management

| Function | Purpose | Example |
| --- | --- | --- |
| mlflow.log_artifact() | Log single file/directory | mlflow.log_artifact("model.pkl") |
| mlflow.log_artifacts() | Log entire directory | mlflow.log_artifacts("./plots/") |
| mlflow.get_artifact_uri() | Get artifact storage location | uri = mlflow.get_artifact_uri() |
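A brief sketch of artifact logging (file and directory names are placeholders):

import os

import mlflow

with mlflow.start_run():
    # Log a single file
    with open("notes.txt", "w") as f:
        f.write("experiment notes")
    mlflow.log_artifact("notes.txt")

    # Log everything under a local directory
    os.makedirs("plots", exist_ok=True)
    mlflow.log_artifacts("plots", artifact_path="plots")

    # Where artifacts for this run are stored
    print(mlflow.get_artifact_uri())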

Model Management (New in MLflow 3)

| Function | Purpose | Example |
| --- | --- | --- |
| mlflow.initialize_logged_model() | Initialize a logged model in PENDING state | model = mlflow.initialize_logged_model(name="my_model") |
| mlflow.create_external_model() | Create external model (artifacts stored outside MLflow) | model = mlflow.create_external_model(name="agent") |
| mlflow.finalize_logged_model() | Update model status to READY or FAILED | mlflow.finalize_logged_model(model_id, "READY") |
| mlflow.get_logged_model() | Retrieve logged model by ID | model = mlflow.get_logged_model(model_id) |
| mlflow.last_logged_model() | Get most recently logged model | model = mlflow.last_logged_model() |
| mlflow.search_logged_models() | Search for logged models | models = mlflow.search_logged_models(filter_string="name='my_model'") |
| mlflow.log_model_params() | Log parameters to a specific model | mlflow.log_model_params({"param": "value"}, model_id) |
| mlflow.set_logged_model_tags() | Set tags on a logged model | mlflow.set_logged_model_tags(model_id, {"key": "value"}) |
| mlflow.delete_logged_model_tag() | Delete tag from a logged model | mlflow.delete_logged_model_tag(model_id, "key") |

Active Model Management (New in MLflow 3)

| Function | Purpose | Example |
| --- | --- | --- |
| mlflow.set_active_model() | Set active model for trace linking | mlflow.set_active_model(name="my_model") |
| mlflow.get_active_model_id() | Get current active model ID | model_id = mlflow.get_active_model_id() |
| mlflow.clear_active_model() | Clear active model | mlflow.clear_active_model() |
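A minimal sketch of the active-model workflow using the functions above (the model name is a placeholder):

import mlflow

# Link traces generated from this point on to the named model
mlflow.set_active_model(name="my_model")
print(mlflow.get_active_model_id())

# ... run traced code here ...

# Remove the association when finished
mlflow.clear_active_model()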

Language-Specific API Coverage

| Capability | Python | Java | R | REST API |
| --- | --- | --- | --- | --- |
| Basic Logging | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Auto Logging | ✅ 15+ Libraries | ❌ Not Available | ✅ Limited | ❌ Not Available |
| Model Logging | ✅ 20+ Flavors | ✅ Basic Support | ✅ Basic Support | ✅ Via Artifacts |
| Logged Model Management | ✅ Full (MLflow 3) | ❌ Not Available | ❌ Not Available | ✅ Basic |
| Dataset Tracking | ✅ Full | ✅ Basic | ✅ Basic | ✅ Basic |
| Search & Query | ✅ Advanced | ✅ Basic | ✅ Basic | ✅ Full |

The Python API provides the most comprehensive feature set. Java and R APIs offer core functionality with ongoing feature additions in each release.

Advanced Tracking Patterns

Working with Logged Models (New in MLflow 3)

MLflow 3 introduces powerful logged model management capabilities for tracking models independently of runs:

Creating and Managing External Models

For models stored outside MLflow (like deployed agents or external model artifacts):

import mlflow

# Create an external model for tracking without storing artifacts in MLflow
model = mlflow.create_external_model(
    name="chatbot_agent",
    model_type="agent",
    tags={"version": "v1.0", "environment": "production"},
)

# Log parameters specific to this model
mlflow.log_model_params(
    {"temperature": "0.7", "max_tokens": "1000"}, model_id=model.model_id
)

# Set as active model for automatic trace linking
mlflow.set_active_model(model_id=model.model_id)


@mlflow.trace
def chat_with_agent(message):
    # This trace will be automatically linked to the active model
    return agent.chat(message)


# Traces are now linked to your external model
traces = mlflow.search_traces(model_id=model.model_id)

Advanced Model Lifecycle Management

For models that require custom preparation or validation:

import mlflow
from mlflow.entities import LoggedModelStatus

# Initialize model in PENDING state
model = mlflow.initialize_logged_model(
    name="custom_neural_network",
    model_type="neural_network",
    tags={"architecture": "transformer", "dataset": "custom"},
)

try:
    # Custom model preparation logic
    train_model()
    validate_model()

    # Save model artifacts using standard MLflow model logging
    mlflow.pytorch.log_model(
        pytorch_model=model_instance,
        name="model",
        model_id=model.model_id,  # Link to the logged model
    )

    # Finalize model as READY
    mlflow.finalize_logged_model(model.model_id, LoggedModelStatus.READY)

except Exception:
    # Mark model as FAILED if issues occur
    mlflow.finalize_logged_model(model.model_id, LoggedModelStatus.FAILED)
    raise

# Retrieve and work with the logged model
final_model = mlflow.get_logged_model(model.model_id)
print(f"Model {final_model.name} is {final_model.status}")

Searching and Querying Logged Models

# Find all production-ready transformer models
production_models = mlflow.search_logged_models(
    filter_string="tags.environment = 'production' AND model_type = 'transformer'",
    order_by=[{"field_name": "creation_time", "ascending": False}],
    output_format="pandas",
)

# Search for models with specific performance metrics
high_accuracy_models = mlflow.search_logged_models(
    filter_string="metrics.accuracy > 0.95",
    datasets=[{"dataset_name": "test_set"}],  # Only consider test set metrics
    max_results=10,
)

# Get the most recently logged model in current session
latest_model = mlflow.last_logged_model()
if latest_model:
    print(f"Latest model: {latest_model.name} (ID: {latest_model.model_id})")

Precise Metric Tracking

Control exactly when and how metrics are recorded with custom timestamps and steps:

import time

import mlflow

# Log with custom step (training iteration/epoch)
for epoch in range(100):
    loss = train_epoch()
    mlflow.log_metric("train_loss", loss, step=epoch)

# Log with custom timestamp
now = int(time.time() * 1000) # MLflow expects milliseconds
mlflow.log_metric("inference_latency", latency, timestamp=now)

# Log with both step and timestamp
mlflow.log_metric("gpu_utilization", gpu_usage, step=epoch, timestamp=now)

Step Requirements:

  • Must be a valid 64-bit integer
  • Can be negative or out of order
  • Supports gaps in sequences (e.g., 1, 5, 75, -20)
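To illustrate those rules, this hypothetical snippet logs a metric at negative, gapped, and out-of-order steps, all of which MLflow accepts (the metric name and values are placeholders):

# Steps only need to be 64-bit integers; order and spacing are unconstrained
for step in [1, 5, 75, -20]:
    mlflow.log_metric("replayed_loss", 0.1 * abs(step), step=step)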

Experiment Organization

Structure your experiments for easy comparison and analysis:

# Method 1: Environment variables
import os

import mlflow

os.environ["MLFLOW_EXPERIMENT_NAME"] = "fraud-detection-v2"

# Method 2: Explicit experiment setting
mlflow.set_experiment("hyperparameter-tuning")

# Method 3: Create with custom configuration
experiment_id = mlflow.create_experiment(
    "production-models",
    artifact_location="s3://my-bucket/experiments/",
    tags={"team": "data-science", "environment": "prod"},
)

Hierarchical Runs with Parent-Child Relationships

Organize complex experiments like hyperparameter sweeps or cross-validation:

# Parent run for the entire experiment
with mlflow.start_run(run_name="hyperparameter_sweep") as parent_run:
mlflow.log_param("search_strategy", "random")

best_score = 0
best_params = {}

# Child runs for each parameter combination
for lr in [0.001, 0.01, 0.1]:
for batch_size in [16, 32, 64]:
with mlflow.start_run(
nested=True, run_name=f"lr_{lr}_bs_{batch_size}"
) as child_run:
mlflow.log_params({"learning_rate": lr, "batch_size": batch_size})

# Train and evaluate
model = train_model(lr, batch_size)
score = evaluate_model(model)
mlflow.log_metric("accuracy", score)

# Track best configuration in parent
if score > best_score:
best_score = score
best_params = {"learning_rate": lr, "batch_size": batch_size}

# Log best results to parent run
mlflow.log_params(best_params)
mlflow.log_metric("best_accuracy", best_score)

# Query child runs
child_runs = mlflow.search_runs(
filter_string=f"tags.mlflow.parentRunId = '{parent_run.info.run_id}'"
)
print("Child run results:")
print(child_runs[["run_id", "params.learning_rate", "metrics.accuracy"]])

Parallel Execution Strategies

Handle multiple runs efficiently with different parallelization approaches. The simplest option is to loop over configurations sequentially, which is perfect for simple hyperparameter sweeps or A/B testing:

configs = [
{"model": "RandomForest", "n_estimators": 100},
{"model": "XGBoost", "max_depth": 6},
{"model": "LogisticRegression", "C": 1.0},
]

for config in configs:
    with mlflow.start_run(run_name=config["model"]):
        mlflow.log_params(config)
        model = train_model(config)
        score = evaluate_model(model)
        mlflow.log_metric("f1_score", score)

Smart Tagging for Organization

Use tags strategically to organize and filter experiments:

with mlflow.start_run():
    # Descriptive tags for filtering
    mlflow.set_tags(
        {
            "model_family": "transformer",
            "dataset_version": "v2.1",
            "environment": "production",
            "team": "nlp-research",
            "gpu_type": "V100",
            "experiment_phase": "hyperparameter_tuning",
        }
    )

    # Special notes tag for documentation
    mlflow.set_tag(
        "mlflow.note.content",
        "Baseline transformer model with attention dropout. "
        "Testing different learning rate schedules.",
    )

    # Training code here...

Search experiments by tags:

# Find all transformer experiments
transformer_runs = mlflow.search_runs(filter_string="tags.model_family = 'transformer'")

# Find production-ready models
prod_models = mlflow.search_runs(
    filter_string="tags.environment = 'production' AND metrics.accuracy > 0.95"
)

System Tags Reference

MLflow automatically sets several system tags to capture execution context:

| Tag | Description | When Set |
| --- | --- | --- |
| mlflow.source.name | Source file or notebook name | Always |
| mlflow.source.type | Source type (NOTEBOOK, JOB, LOCAL, etc.) | Always |
| mlflow.user | User who created the run | Always |
| mlflow.source.git.commit | Git commit hash | When run from git repo |
| mlflow.source.git.branch | Git branch name | MLflow Projects only |
| mlflow.parentRunId | Parent run ID for nested runs | Child runs only |
| mlflow.docker.image.name | Docker image used | Docker environments |
| mlflow.note.content | User-editable description | Manual only |
Tip: Use mlflow.note.content to document experiment insights, hypotheses, or results directly in the MLflow UI. This tag appears in a dedicated Notes section on the run page.
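Because system tags are ordinary run tags, they can also be read back programmatically. A small sketch, assuming run_id refers to an existing run:

import mlflow

run = mlflow.get_run(run_id)
print(run.data.tags.get("mlflow.source.name"))
print(run.data.tags.get("mlflow.user"))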

Integration with Auto Logging

Combine auto logging with manual tracking for the best of both worlds:

import mlflow
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Enable auto logging
mlflow.autolog()

with mlflow.start_run():
    # Auto logging captures model training automatically
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)

    # Add custom metrics and artifacts
    predictions = model.predict(X_test)

    # Log custom evaluation metrics
    report = classification_report(y_test, predictions, output_dict=True)
    mlflow.log_metrics(
        {
            "precision_macro": report["macro avg"]["precision"],
            "recall_macro": report["macro avg"]["recall"],
            "f1_macro": report["macro avg"]["f1-score"],
        }
    )

    # Log custom artifacts
    feature_importance = pd.DataFrame(
        {"feature": feature_names, "importance": model.feature_importances_}
    )
    feature_importance.to_csv("feature_importance.csv")
    mlflow.log_artifact("feature_importance.csv")

    # Access the auto-logged run for additional processing
    current_run = mlflow.active_run()
    print(f"Auto-logged run ID: {current_run.info.run_id}")

# Access the completed run
last_run = mlflow.last_active_run()
print(f"Final run status: {last_run.info.status}")

Language-Specific Guides


Next Steps: