MLflow Tracking APIs
MLflow Tracking provides comprehensive APIs across multiple programming languages to capture your machine learning experiments. Whether you prefer automatic instrumentation or granular control, MLflow adapts to your workflow.
Choose Your Approach
MLflow offers two primary methods for experiment tracking, each optimized for different use cases:
Automatic Logging - Zero Setup, Maximum Coverage
Perfect for getting started quickly or when using supported ML libraries. Just add one line and MLflow captures everything automatically.
import mlflow
mlflow.autolog() # That's it!
# Your existing training code works unchanged
model.fit(X_train, y_train)
What gets logged automatically:
- Model parameters and hyperparameters
- Training and validation metrics
- Model artifacts and checkpoints
- Training plots and visualizations
- Framework-specific metadata
Supported libraries: Scikit-learn, XGBoost, LightGBM, PyTorch, Keras/TensorFlow, Spark, and more.
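If you prefer to scope autologging to a single framework, each supported library also exposes its own autolog entry point. A minimal sketch, assuming scikit-learn is installed (the keyword arguments shown are illustrative and vary by library and release):

import mlflow

# Enable autologging for scikit-learn only, rather than every installed framework
mlflow.sklearn.autolog(
    log_models=True,           # also log the fitted model as an artifact
    log_input_examples=False,  # skip input examples to keep runs lightweight
)

# Other frameworks follow the same pattern, e.g. mlflow.xgboost.autolog()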
Manual Logging - Complete Control, Custom Workflows
Ideal for custom training loops, advanced experimentation, or when you need precise control over what gets tracked.
- Python
- Java
- R
import mlflow

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    # Your training logic here
    for epoch in range(num_epochs):
        train_loss = train_model()
        val_loss = validate_model()

        # Log metrics with step tracking
        mlflow.log_metrics({"train_loss": train_loss, "val_loss": val_loss}, step=epoch)

    # Log final model
    mlflow.sklearn.log_model(model, name="model")
MlflowClient client = new MlflowClient();
RunInfo run = client.createRun();

// Log parameters
client.logParam(run.getRunId(), "learning_rate", "0.01");
client.logParam(run.getRunId(), "batch_size", "32");

// Log metrics with timesteps
for (int epoch = 0; epoch < numEpochs; epoch++) {
    double trainLoss = trainModel();
    client.logMetric(run.getRunId(), "train_loss", trainLoss,
                     System.currentTimeMillis(), epoch);
}
library(mlflow)

with(mlflow_start_run(), {
  # Log parameters
  mlflow_log_param("learning_rate", 0.01)
  mlflow_log_param("batch_size", 32)

  # Training loop
  for (epoch in 1:num_epochs) {
    train_loss <- train_model()
    mlflow_log_metric("train_loss", train_loss, step = epoch)
  }
})
Core Logging Functions
Setup & Configuration
Function | Purpose | Example |
---|---|---|
mlflow.set_tracking_uri() | Connect to tracking server or database | mlflow.set_tracking_uri("http://localhost:5000") |
mlflow.get_tracking_uri() | Get current tracking URI | uri = mlflow.get_tracking_uri() |
mlflow.create_experiment() | Create new experiment | exp_id = mlflow.create_experiment("my-experiment") |
mlflow.set_experiment() | Set active experiment | mlflow.set_experiment("fraud-detection") |
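Taken together, these calls form the typical start-of-script setup. A minimal sketch (the tracking URI and experiment name are placeholders):

import mlflow

# Point the client at a tracking server (or a local path / database URI)
mlflow.set_tracking_uri("http://localhost:5000")

# Reuses the experiment if it exists, creates it otherwise
mlflow.set_experiment("fraud-detection")

print(f"Logging to: {mlflow.get_tracking_uri()}")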
Run Management
Function | Purpose | Example |
---|---|---|
mlflow.start_run() | Start new run (with context manager) | with mlflow.start_run(): ... |
mlflow.end_run() | End current run | mlflow.end_run(status="FINISHED") |
mlflow.active_run() | Get currently active run | run = mlflow.active_run() |
mlflow.last_active_run() | Get last completed run | last_run = mlflow.last_active_run() |
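The context-manager form shown in the table ends the run automatically; here is a sketch of the explicit equivalent, useful in notebooks where a run spans several cells (run name is a placeholder):

import mlflow

run = mlflow.start_run(run_name="manual-lifecycle")
print(f"Active run: {mlflow.active_run().info.run_id}")

# ... training and logging happen here ...

mlflow.end_run(status="FINISHED")

# After a run ends, last_active_run() still gives you access to it
print(mlflow.last_active_run().info.status)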
Data Logging
Function | Purpose | Example |
---|---|---|
mlflow.log_param() / mlflow.log_params() | Log hyperparameters | mlflow.log_param("lr", 0.01) |
mlflow.log_metric() / mlflow.log_metrics() | Log performance metrics | mlflow.log_metric("accuracy", 0.95, step=10) |
mlflow.log_input() | Log dataset information | mlflow.log_input(dataset) |
mlflow.set_tag() / mlflow.set_tags() | Add metadata tags | mlflow.set_tag("model_type", "CNN") |
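A short sketch combining these calls in one run; the DataFrame and its construction via mlflow.data.from_pandas are placeholders assuming pandas is available:

import mlflow
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [0, 1, 0]})  # placeholder data

with mlflow.start_run():
    # Batch variants accept dictionaries
    mlflow.log_params({"lr": 0.01, "batch_size": 32})
    mlflow.log_metrics({"accuracy": 0.95, "loss": 0.12}, step=10)

    # Tags for later filtering
    mlflow.set_tags({"model_type": "CNN", "stage": "baseline"})

    # Dataset lineage
    dataset = mlflow.data.from_pandas(df, name="training_data")
    mlflow.log_input(dataset, context="training")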
Artifact Management
Function | Purpose | Example |
---|---|---|
mlflow.log_artifact() | Log single file/directory | mlflow.log_artifact("model.pkl") |
mlflow.log_artifacts() | Log entire directory | mlflow.log_artifacts("./plots/") |
mlflow.get_artifact_uri() | Get artifact storage location | uri = mlflow.get_artifact_uri() |
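For example, a sketch that logs a single file and a directory of plots (file and directory names are placeholders):

import os
import mlflow

with mlflow.start_run():
    # Single file
    with open("notes.txt", "w") as f:
        f.write("experiment notes")
    mlflow.log_artifact("notes.txt")

    # Whole directory, stored under an optional artifact sub-path
    os.makedirs("plots", exist_ok=True)
    mlflow.log_artifacts("plots", artifact_path="plots")

    print(f"Artifacts stored at: {mlflow.get_artifact_uri()}")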
Model Management (New in MLflow 3)
Function | Purpose | Example |
---|---|---|
mlflow.initialize_logged_model() | Initialize a logged model in PENDING state | model = mlflow.initialize_logged_model(name="my_model") |
mlflow.create_external_model() | Create external model (artifacts stored outside MLflow) | model = mlflow.create_external_model(name="agent") |
mlflow.finalize_logged_model() | Update model status to READY or FAILED | mlflow.finalize_logged_model(model_id, "READY") |
mlflow.get_logged_model() | Retrieve logged model by ID | model = mlflow.get_logged_model(model_id) |
mlflow.last_logged_model() | Get most recently logged model | model = mlflow.last_logged_model() |
mlflow.search_logged_models() | Search for logged models | models = mlflow.search_logged_models(filter_string="name='my_model'") |
mlflow.log_model_params() | Log parameters to a specific model | mlflow.log_model_params({"param": "value"}, model_id) |
mlflow.set_logged_model_tags() | Set tags on a logged model | mlflow.set_logged_model_tags(model_id, {"key": "value"}) |
mlflow.delete_logged_model_tag() | Delete tag from a logged model | mlflow.delete_logged_model_tag(model_id, "key") |
Active Model Management (New in MLflow 3)
Function | Purpose | Example |
---|---|---|
mlflow.set_active_model() | Set active model for trace linking | mlflow.set_active_model(name="my_model") |
mlflow.get_active_model_id() | Get current active model ID | model_id = mlflow.get_active_model_id() |
mlflow.clear_active_model() | Clear active model | mlflow.clear_active_model() |
Language-Specific API Coverage
Capability | Python | Java | R | REST API |
---|---|---|---|---|
Basic Logging | Full | Full | Full | Full |
Auto Logging | 15+ Libraries | Not Available | Limited | Not Available |
Model Logging | 20+ Flavors | Basic Support | Basic Support | Via Artifacts |
Logged Model Management | Full (MLflow 3) | Not Available | Not Available | Basic |
Dataset Tracking | Full | Basic | Basic | Basic |
Search & Query | Advanced | Basic | Basic | Full |
The Python API provides the most comprehensive feature set. Java and R APIs offer core functionality with ongoing feature additions in each release.
Advanced Tracking Patterns
Working with Logged Models (New in MLflow 3)
MLflow 3 introduces powerful logged model management capabilities for tracking models independently of runs:
Creating and Managing External Models
For models stored outside MLflow (like deployed agents or external model artifacts):
import mlflow

# Create an external model for tracking without storing artifacts in MLflow
model = mlflow.create_external_model(
    name="chatbot_agent",
    model_type="agent",
    tags={"version": "v1.0", "environment": "production"},
)

# Log parameters specific to this model
mlflow.log_model_params(
    {"temperature": "0.7", "max_tokens": "1000"}, model_id=model.model_id
)

# Set as active model for automatic trace linking
mlflow.set_active_model(model_id=model.model_id)


@mlflow.trace
def chat_with_agent(message):
    # This trace will be automatically linked to the active model
    return agent.chat(message)


# Traces are now linked to your external model
traces = mlflow.search_traces(model_id=model.model_id)
Advanced Model Lifecycle Management
For models that require custom preparation or validation:
import mlflow
from mlflow.entities import LoggedModelStatus

# Initialize model in PENDING state
model = mlflow.initialize_logged_model(
    name="custom_neural_network",
    model_type="neural_network",
    tags={"architecture": "transformer", "dataset": "custom"},
)

try:
    # Custom model preparation logic
    train_model()
    validate_model()

    # Save model artifacts using standard MLflow model logging
    mlflow.pytorch.log_model(
        pytorch_model=model_instance,
        name="model",
        model_id=model.model_id,  # Link to the logged model
    )

    # Finalize model as READY
    mlflow.finalize_logged_model(model.model_id, LoggedModelStatus.READY)
except Exception as e:
    # Mark model as FAILED if issues occur
    mlflow.finalize_logged_model(model.model_id, LoggedModelStatus.FAILED)
    raise

# Retrieve and work with the logged model
final_model = mlflow.get_logged_model(model.model_id)
print(f"Model {final_model.name} is {final_model.status}")
Searching and Querying Logged Models
# Find all production-ready transformer models
production_models = mlflow.search_logged_models(
    filter_string="tags.environment = 'production' AND model_type = 'transformer'",
    order_by=[{"field_name": "creation_time", "ascending": False}],
    output_format="pandas",
)

# Search for models with specific performance metrics
high_accuracy_models = mlflow.search_logged_models(
    filter_string="metrics.accuracy > 0.95",
    datasets=[{"dataset_name": "test_set"}],  # Only consider test set metrics
    max_results=10,
)

# Get the most recently logged model in current session
latest_model = mlflow.last_logged_model()
if latest_model:
    print(f"Latest model: {latest_model.name} (ID: {latest_model.model_id})")
Precise Metric Tracking
Control exactly when and how metrics are recorded with custom timestamps and steps:
import time

# Log with custom step (training iteration/epoch)
for epoch in range(100):
    loss = train_epoch()
    mlflow.log_metric("train_loss", loss, step=epoch)

# Log with custom timestamp
now = int(time.time() * 1000)  # MLflow expects milliseconds
mlflow.log_metric("inference_latency", latency, timestamp=now)

# Log with both step and timestamp
mlflow.log_metric("gpu_utilization", gpu_usage, step=epoch, timestamp=now)
Step Requirements:
- Must be a valid 64-bit integer
- Can be negative or out of order
- Supports gaps in sequences (e.g., 1, 5, 75, -20)
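A small sketch illustrating that steps need not be sequential (metric values are placeholders); the UI and search APIs order points by step:

import mlflow

with mlflow.start_run():
    # Out-of-order and negative steps are accepted
    for step, value in [(-20, 0.9), (1, 0.7), (75, 0.3), (5, 0.5)]:
        mlflow.log_metric("loss", value, step=step)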
Experiment Organization
Structure your experiments for easy comparison and analysis:
# Method 1: Environment variables
import os

os.environ["MLFLOW_EXPERIMENT_NAME"] = "fraud-detection-v2"

# Method 2: Explicit experiment setting
mlflow.set_experiment("hyperparameter-tuning")

# Method 3: Create with custom configuration
experiment_id = mlflow.create_experiment(
    "production-models",
    artifact_location="s3://my-bucket/experiments/",
    tags={"team": "data-science", "environment": "prod"},
)
Hierarchical Runs with Parent-Child Relationships
Organize complex experiments like hyperparameter sweeps or cross-validation:
# Parent run for the entire experiment
with mlflow.start_run(run_name="hyperparameter_sweep") as parent_run:
    mlflow.log_param("search_strategy", "random")

    best_score = 0
    best_params = {}

    # Child runs for each parameter combination
    for lr in [0.001, 0.01, 0.1]:
        for batch_size in [16, 32, 64]:
            with mlflow.start_run(
                nested=True, run_name=f"lr_{lr}_bs_{batch_size}"
            ) as child_run:
                mlflow.log_params({"learning_rate": lr, "batch_size": batch_size})

                # Train and evaluate
                model = train_model(lr, batch_size)
                score = evaluate_model(model)
                mlflow.log_metric("accuracy", score)

                # Track best configuration in parent
                if score > best_score:
                    best_score = score
                    best_params = {"learning_rate": lr, "batch_size": batch_size}

    # Log best results to parent run
    mlflow.log_params(best_params)
    mlflow.log_metric("best_accuracy", best_score)

# Query child runs
child_runs = mlflow.search_runs(
    filter_string=f"tags.mlflow.parentRunId = '{parent_run.info.run_id}'"
)
print("Child run results:")
print(child_runs[["run_id", "params.learning_rate", "metrics.accuracy"]])
Parallel Execution Strategies
Handle multiple runs efficiently with different parallelization approaches:
- Sequential Runs
- Multiprocessing
- Multithreading
Perfect for simple hyperparameter sweeps or A/B testing:
configs = [
    {"model": "RandomForest", "n_estimators": 100},
    {"model": "XGBoost", "max_depth": 6},
    {"model": "LogisticRegression", "C": 1.0},
]

for config in configs:
    with mlflow.start_run(run_name=config["model"]):
        mlflow.log_params(config)
        model = train_model(config)
        score = evaluate_model(model)
        mlflow.log_metric("f1_score", score)
Scale training across multiple CPU cores:
import multiprocessing as mp


def train_with_config(config):
    # Set tracking URI in each process (required for spawn method)
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("parallel-training")

    with mlflow.start_run():
        mlflow.log_params(config)
        model = train_model(config)
        score = evaluate_model(model)
        mlflow.log_metric("accuracy", score)
        return score


if __name__ == "__main__":
    configs = [{"lr": lr, "bs": bs} for lr in [0.01, 0.1] for bs in [16, 32]]

    with mp.Pool(processes=4) as pool:
        results = pool.map(train_with_config, configs)

    print(f"Completed {len(results)} experiments")
Use child runs for thread-safe parallel execution:
from concurrent.futures import ThreadPoolExecutor


def train_worker(config):
    with mlflow.start_run(nested=True):
        mlflow.log_params(config)
        model = train_model(config)
        score = evaluate_model(model)
        mlflow.log_metric("accuracy", score)
        return score


# Start parent run
with mlflow.start_run(run_name="threaded_experiment"):
    configs = [{"lr": 0.01, "epochs": e} for e in range(10, 101, 10)]

    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(train_worker, config) for config in configs]
        results = [future.result() for future in futures]

    # Log summary to parent run
    mlflow.log_metric("avg_accuracy", sum(results) / len(results))
    mlflow.log_metric("max_accuracy", max(results))
Smart Tagging for Organization
Use tags strategically to organize and filter experiments:
with mlflow.start_run():
    # Descriptive tags for filtering
    mlflow.set_tags(
        {
            "model_family": "transformer",
            "dataset_version": "v2.1",
            "environment": "production",
            "team": "nlp-research",
            "gpu_type": "V100",
            "experiment_phase": "hyperparameter_tuning",
        }
    )

    # Special notes tag for documentation
    mlflow.set_tag(
        "mlflow.note.content",
        "Baseline transformer model with attention dropout. "
        "Testing different learning rate schedules.",
    )

    # Training code here...
Search experiments by tags:
# Find all transformer experiments
transformer_runs = mlflow.search_runs(filter_string="tags.model_family = 'transformer'")

# Find production-ready models
prod_models = mlflow.search_runs(
    filter_string="tags.environment = 'production' AND metrics.accuracy > 0.95"
)
System Tags Reference
MLflow automatically sets several system tags to capture execution context:
Tag | Description | When Set |
---|---|---|
mlflow.source.name | Source file or notebook name | Always |
mlflow.source.type | Source type (NOTEBOOK, JOB, LOCAL, etc.) | Always |
mlflow.user | User who created the run | Always |
mlflow.source.git.commit | Git commit hash | When run from git repo |
mlflow.source.git.branch | Git branch name | MLflow Projects only |
mlflow.parentRunId | Parent run ID for nested runs | Child runs only |
mlflow.docker.image.name | Docker image used | Docker environments |
mlflow.note.content | User-editable description | Manual only |
Use mlflow.note.content to document experiment insights, hypotheses, or results directly in the MLflow UI. This tag appears in a dedicated Notes section on the run page.
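System tags can also be read back programmatically. A minimal sketch, assuming you already have a run_id on hand:

import mlflow

run = mlflow.get_run(run_id)  # run_id is a placeholder
tags = run.data.tags

print(tags.get("mlflow.source.name"))
print(tags.get("mlflow.user"))
print(tags.get("mlflow.source.git.commit", "not run from a git repo"))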
Integration with Auto Logging
Combine auto logging with manual tracking for the best of both worlds:
import mlflow
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Enable auto logging
mlflow.autolog()

with mlflow.start_run():
    # Auto logging captures model training automatically
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)

    # Add custom metrics and artifacts
    predictions = model.predict(X_test)

    # Log custom evaluation metrics
    report = classification_report(y_test, predictions, output_dict=True)
    mlflow.log_metrics(
        {
            "precision_macro": report["macro avg"]["precision"],
            "recall_macro": report["macro avg"]["recall"],
            "f1_macro": report["macro avg"]["f1-score"],
        }
    )

    # Log custom artifacts
    feature_importance = pd.DataFrame(
        {"feature": feature_names, "importance": model.feature_importances_}
    )
    feature_importance.to_csv("feature_importance.csv")
    mlflow.log_artifact("feature_importance.csv")

    # Access the auto-logged run for additional processing
    current_run = mlflow.active_run()
    print(f"Auto-logged run ID: {current_run.info.run_id}")

# Access the completed run
last_run = mlflow.last_active_run()
print(f"Final run status: {last_run.info.status}")
Language-Specific Guides
- Python: Complete Python API Reference
- Java: Java API Documentation
- R: R API Documentation
- REST: REST API Reference
Next Steps:
- Set up MLflow Tracking Server for team collaboration
- Explore Auto Logging for supported frameworks
- Learn advanced search patterns for experiment analysis