MLflow Tracking APIs
MLflow Tracking provides comprehensive APIs across multiple programming languages to capture your machine learning experiments. Whether you prefer automatic instrumentation or granular control, MLflow adapts to your workflow.
Choose Your Approach
MLflow offers two primary methods for experiment tracking, each optimized for different use cases:
Automatic Logging - Zero Setup, Maximum Coverage
Perfect for getting started quickly or when using supported ML libraries. Just add one line and MLflow captures everything automatically.
import mlflow
mlflow.autolog() # That's it!
# Your existing training code works unchanged
model.fit(X_train, y_train)
What gets logged automatically:
- Model parameters and hyperparameters
- Training and validation metrics
- Model artifacts and checkpoints
- Training plots and visualizations
- Framework-specific metadata
Supported libraries: Scikit-learn, XGBoost, LightGBM, PyTorch, Keras/TensorFlow, Spark, and more.
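If you prefer to scope autologging to a single framework, each supported library also exposes its own autolog entry point. A minimal sketch, assuming scikit-learn is installed (the keyword arguments shown are illustrative and vary by library and release):

import mlflow

# Enable autologging for scikit-learn only, rather than every installed framework
mlflow.sklearn.autolog(
    log_models=True,           # also log the fitted model as an artifact
    log_input_examples=False,  # skip input examples to keep runs lightweight
)

# Other frameworks follow the same pattern, e.g. mlflow.xgboost.autolog()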
Manual Logging - Complete Control, Custom Workflows
Ideal for custom training loops, advanced experimentation, or when you need precise control over what gets tracked.
- Python
- Java
- R
import mlflow

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    # Your training logic here
    for epoch in range(num_epochs):
        train_loss = train_model()
        val_loss = validate_model()

        # Log metrics with step tracking
        mlflow.log_metrics({"train_loss": train_loss, "val_loss": val_loss}, step=epoch)

    # Log final model
    mlflow.sklearn.log_model(model, name="model")
MlflowClient client = new MlflowClient();
RunInfo run = client.createRun();

// Log parameters
client.logParam(run.getRunId(), "learning_rate", "0.01");
client.logParam(run.getRunId(), "batch_size", "32");

// Log metrics with timesteps
for (int epoch = 0; epoch < numEpochs; epoch++) {
    double trainLoss = trainModel();
    client.logMetric(run.getRunId(), "train_loss", trainLoss,
                     System.currentTimeMillis(), epoch);
}
library(mlflow)

with(mlflow_start_run(), {
  # Log parameters
  mlflow_log_param("learning_rate", 0.01)
  mlflow_log_param("batch_size", 32)

  # Training loop
  for (epoch in 1:num_epochs) {
    train_loss <- train_model()
    mlflow_log_metric("train_loss", train_loss, step = epoch)
  }
})
Core Logging Functions
Setup & Configuration
Function | Purpose | Example |
---|---|---|
mlflow.set_tracking_uri() | Connect to tracking server or database | mlflow.set_tracking_uri("http://localhost:5000") |
mlflow.get_tracking_uri() | Get current tracking URI | uri = mlflow.get_tracking_uri() |
mlflow.create_experiment() | Create new experiment | exp_id = mlflow.create_experiment("my-experiment") |
mlflow.set_experiment() | Set active experiment | mlflow.set_experiment("fraud-detection") |
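Taken together, these calls form the typical start-of-script setup. A minimal sketch (the tracking URI and experiment name are placeholders):

import mlflow

# Point the client at a tracking server (or a local path / database URI)
mlflow.set_tracking_uri("http://localhost:5000")

# Reuses the experiment if it exists, creates it otherwise
mlflow.set_experiment("fraud-detection")

print(f"Logging to: {mlflow.get_tracking_uri()}")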
Run Management
Function | Purpose | Example |
---|---|---|
mlflow.start_run() | Start new run (with context manager) | with mlflow.start_run(): ... |
mlflow.end_run() | End current run | mlflow.end_run(status="FINISHED") |
mlflow.active_run() | Get currently active run | run = mlflow.active_run() |
mlflow.last_active_run() | Get last completed run | last_run = mlflow.last_active_run() |
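The context-manager form shown in the table ends the run automatically; here is a sketch of the explicit equivalent, useful in notebooks where a run spans several cells (run name is a placeholder):

import mlflow

run = mlflow.start_run(run_name="manual-lifecycle")
print(f"Active run: {mlflow.active_run().info.run_id}")

# ... training and logging happen here ...

mlflow.end_run(status="FINISHED")

# After a run ends, last_active_run() still gives you access to it
print(mlflow.last_active_run().info.status)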
Data Logging
Function | Purpose | Example |
---|---|---|
mlflow.log_param() / mlflow.log_params() | Log hyperparameters | mlflow.log_param("lr", 0.01) |
mlflow.log_metric() / mlflow.log_metrics() | Log performance metrics | mlflow.log_metric("accuracy", 0.95, step=10) |
mlflow.log_input() | Log dataset information | mlflow.log_input(dataset) |
mlflow.set_tag() / mlflow.set_tags() | Add metadata tags | mlflow.set_tag("model_type", "CNN") |
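A short sketch combining these calls in one run; the DataFrame and its construction via mlflow.data.from_pandas are placeholders assuming pandas is available:

import mlflow
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [0, 1, 0]})  # placeholder data

with mlflow.start_run():
    # Batch variants accept dictionaries
    mlflow.log_params({"lr": 0.01, "batch_size": 32})
    mlflow.log_metrics({"accuracy": 0.95, "loss": 0.12}, step=10)

    # Tags for later filtering
    mlflow.set_tags({"model_type": "CNN", "stage": "baseline"})

    # Dataset lineage
    dataset = mlflow.data.from_pandas(df, name="training_data")
    mlflow.log_input(dataset, context="training")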
Artifact Management
Function | Purpose | Example |
---|---|---|
mlflow.log_artifact() | Log single file/directory | mlflow.log_artifact("model.pkl") |
mlflow.log_artifacts() | Log entire directory | mlflow.log_artifacts("./plots/") |
mlflow.get_artifact_uri() | Get artifact storage location | uri = mlflow.get_artifact_uri() |
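For example, a sketch that logs a single file and a directory of plots (file and directory names are placeholders):

import os
import mlflow

with mlflow.start_run():
    # Single file
    with open("notes.txt", "w") as f:
        f.write("experiment notes")
    mlflow.log_artifact("notes.txt")

    # Whole directory, stored under an optional artifact sub-path
    os.makedirs("plots", exist_ok=True)
    mlflow.log_artifacts("plots", artifact_path="plots")

    print(f"Artifacts stored at: {mlflow.get_artifact_uri()}")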
Model Management (New in MLflow 3)
Function | Purpose | Example |
---|---|---|
mlflow.initialize_logged_model() | Initialize a logged model in PENDING state | model = mlflow.initialize_logged_model(name="my_model") |
mlflow.create_external_model() | Create external model (artifacts stored outside MLflow) | model = mlflow.create_external_model(name="agent") |
mlflow.finalize_logged_model() | Update model status to READY or FAILED | mlflow.finalize_logged_model(model_id, "READY") |
mlflow.get_logged_model() | Retrieve logged model by ID | model = mlflow.get_logged_model(model_id) |
mlflow.last_logged_model() | Get most recently logged model | model = mlflow.last_logged_model() |
mlflow.search_logged_models() | Search for logged models | models = mlflow.search_logged_models(filter_string="name='my_model'") |
mlflow.log_model_params() | Log parameters to a specific model | mlflow.log_model_params({"param": "value"}, model_id) |
mlflow.set_logged_model_tags() | Set tags on a logged model | mlflow.set_logged_model_tags(model_id, {"key": "value"}) |
mlflow.delete_logged_model_tag() | Delete tag from a logged model | mlflow.delete_logged_model_tag(model_id, "key") |
Active Model Management (New in MLflow 3)
Function | Purpose | Example |
---|---|---|
mlflow.set_active_model() | Set active model for trace linking | mlflow.set_active_model(name="my_model") |
mlflow.get_active_model_id() | Get current active model ID | model_id = mlflow.get_active_model_id() |
mlflow.clear_active_model() | Clear active model | mlflow.clear_active_model() |
Language-Specific API Coverage
Capability | Python | Java | R | REST API |
---|---|---|---|---|
Basic Logging | Full | Full | Full | Full |
Auto Logging | 15+ Libraries | Not Available | Limited | Not Available |
Model Logging | 20+ Flavors | Basic Support | Basic Support | Via Artifacts |
Logged Model Management | Full (MLflow 3) | Not Available | Not Available | Basic |
Dataset Tracking | Full | Basic | Basic | Basic |
Search & Query | Advanced | Basic | Basic | Full |
The Python API provides the most comprehensive feature set. Java and R APIs offer core functionality with ongoing feature additions in each release.
Advanced Tracking Patterns
Working with Logged Models (New in MLflow 3)
MLflow 3 introduces powerful logged model management capabilities for tracking models independently of runs:
Creating and Managing External Models
For models stored outside MLflow (like deployed agents or external model artifacts):
import mlflow

# Create an external model for tracking without storing artifacts in MLflow
model = mlflow.create_external_model(
    name="chatbot_agent",
    model_type="agent",
    tags={"version": "v1.0", "environment": "production"},
)

# Log parameters specific to this model
mlflow.log_model_params(
    {"temperature": "0.7", "max_tokens": "1000"}, model_id=model.model_id
)

# Set as active model for automatic trace linking
mlflow.set_active_model(model_id=model.model_id)


@mlflow.trace
def chat_with_agent(message):
    # This trace will be automatically linked to the active model
    return agent.chat(message)


# Traces are now linked to your external model
traces = mlflow.search_traces(model_id=model.model_id)
Advanced Model Lifecycle Management
For models that require custom preparation or validation:
import mlflow
from mlflow.entities import LoggedModelStatus

# Initialize model in PENDING state
model = mlflow.initialize_logged_model(
    name="custom_neural_network",
    model_type="neural_network",
    tags={"architecture": "transformer", "dataset": "custom"},
)

try:
    # Custom model preparation logic
    train_model()
    validate_model()

    # Save model artifacts using standard MLflow model logging
    mlflow.pytorch.log_model(
        pytorch_model=model_instance,
        name="model",
        model_id=model.model_id,  # Link to the logged model
    )

    # Finalize model as READY
    mlflow.finalize_logged_model(model.model_id, LoggedModelStatus.READY)
except Exception as e:
    # Mark model as FAILED if issues occur
    mlflow.finalize_logged_model(model.model_id, LoggedModelStatus.FAILED)
    raise

# Retrieve and work with the logged model
final_model = mlflow.get_logged_model(model.model_id)
print(f"Model {final_model.name} is {final_model.status}")
Searching and Querying Logged Models
# Find all production-ready transformer models
production_models = mlflow.search_logged_models(
    filter_string="tags.environment = 'production' AND model_type = 'transformer'",
    order_by=[{"field_name": "creation_time", "ascending": False}],
    output_format="pandas",
)

# Search for models with specific performance metrics
high_accuracy_models = mlflow.search_logged_models(
    filter_string="metrics.accuracy > 0.95",
    datasets=[{"dataset_name": "test_set"}],  # Only consider test set metrics
    max_results=10,
)

# Get the most recently logged model in current session
latest_model = mlflow.last_logged_model()
if latest_model:
    print(f"Latest model: {latest_model.name} (ID: {latest_model.model_id})")
Precise Metric Tracking
Control exactly when and how metrics are recorded with custom timestamps and steps:
import time

# Log with custom step (training iteration/epoch)
for epoch in range(100):
    loss = train_epoch()
    mlflow.log_metric("train_loss", loss, step=epoch)

# Log with custom timestamp
now = int(time.time() * 1000)  # MLflow expects milliseconds
mlflow.log_metric("inference_latency", latency, timestamp=now)

# Log with both step and timestamp
mlflow.log_metric("gpu_utilization", gpu_usage, step=epoch, timestamp=now)
Step Requirements:
- Must be a valid 64-bit integer
- Can be negative or out of order
- Supports gaps in sequences (e.g., 1, 5, 75, -20)
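A small sketch illustrating that steps need not be sequential (metric values are placeholders); the UI and search APIs order points by step:

import mlflow

with mlflow.start_run():
    # Out-of-order and negative steps are accepted
    for step, value in [(-20, 0.9), (1, 0.7), (75, 0.3), (5, 0.5)]:
        mlflow.log_metric("loss", value, step=step)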
Experiment Organization
Structure your experiments for easy comparison and analysis:
# Method 1: Environment variables
import os

os.environ["MLFLOW_EXPERIMENT_NAME"] = "fraud-detection-v2"

# Method 2: Explicit experiment setting
mlflow.set_experiment("hyperparameter-tuning")

# Method 3: Create with custom configuration
experiment_id = mlflow.create_experiment(
    "production-models",
    artifact_location="s3://my-bucket/experiments/",
    tags={"team": "data-science", "environment": "prod"},
)
Hierarchical Runs with Parent-Child Relationships
Organize complex experiments like hyperparameter sweeps or cross-validation:
# Parent run for the entire experiment
with mlflow.start_run(run_name="hyperparameter_sweep") as parent_run:
    mlflow.log_param("search_strategy", "random")

    best_score = 0
    best_params = {}

    # Child runs for each parameter combination
    for lr in [0.001, 0.01, 0.1]:
        for batch_size in [16, 32, 64]:
            with mlflow.start_run(
                nested=True, run_name=f"lr_{lr}_bs_{batch_size}"
            ) as child_run:
                mlflow.log_params({"learning_rate": lr, "batch_size": batch_size})

                # Train and evaluate
                model = train_model(lr, batch_size)
                score = evaluate_model(model)
                mlflow.log_metric("accuracy", score)

                # Track best configuration in parent
                if score > best_score:
                    best_score = score
                    best_params = {"learning_rate": lr, "batch_size": batch_size}

    # Log best results to parent run
    mlflow.log_params(best_params)
    mlflow.log_metric("best_accuracy", best_score)

# Query child runs
child_runs = mlflow.search_runs(
    filter_string=f"tags.mlflow.parentRunId = '{parent_run.info.run_id}'"
)
print("Child run results:")
print(child_runs[["run_id", "params.learning_rate", "metrics.accuracy"]])
Parallel Execution Strategies
Handle multiple runs efficiently with different parallelization approaches:
- Sequential Runs
- Multiprocessing
- Multithreading
Perfect for simple hyperparameter sweeps or A/B testing:
configs = [
    {"model": "RandomForest", "n_estimators": 100},
    {"model": "XGBoost", "max_depth": 6},
    {"model": "LogisticRegression", "C": 1.0},
]

for config in configs:
    with mlflow.start_run(run_name=config["model"]):
        mlflow.log_params(config)
        model = train_model(config)
        score = evaluate_model(model)
        mlflow.log_metric("f1_score", score)
Scale training across multiple CPU cores:
import multiprocessing as mp


def train_with_config(config):
    # Set tracking URI in each process (required for spawn method)
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("parallel-training")

    with mlflow.start_run():
        mlflow.log_params(config)
        model = train_model(config)
        score = evaluate_model(model)
        mlflow.log_metric("accuracy", score)
        return score


if __name__ == "__main__":
    configs = [{"lr": lr, "bs": bs} for lr in [0.01, 0.1] for bs in [16, 32]]

    with mp.Pool(processes=4) as pool:
        results = pool.map(train_with_config, configs)

    print(f"Completed {len(results)} experiments")
Use child runs for thread-safe parallel execution:
from concurrent.futures import ThreadPoolExecutor


def train_worker(config):
    with mlflow.start_run(nested=True):
        mlflow.log_params(config)
        model = train_model(config)
        score = evaluate_model(model)
        mlflow.log_metric("accuracy", score)
        return score


# Start parent run
with mlflow.start_run(run_name="threaded_experiment"):
    configs = [{"lr": 0.01, "epochs": e} for e in range(10, 101, 10)]

    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(train_worker, config) for config in configs]
        results = [future.result() for future in futures]

    # Log summary to parent run
    mlflow.log_metric("avg_accuracy", sum(results) / len(results))
    mlflow.log_metric("max_accuracy", max(results))
Smart Tagging for Organization
Use tags strategically to organize and filter experiments:
with mlflow.start_run():
    # Descriptive tags for filtering
    mlflow.set_tags(
        {
            "model_family": "transformer",
            "dataset_version": "v2.1",
            "environment": "production",
            "team": "nlp-research",
            "gpu_type": "V100",
            "experiment_phase": "hyperparameter_tuning",
        }
    )

    # Special notes tag for documentation
    mlflow.set_tag(
        "mlflow.note.content",
        "Baseline transformer model with attention dropout. "
        "Testing different learning rate schedules.",
    )

    # Training code here...
Search experiments by tags:
# Find all transformer experiments
transformer_runs = mlflow.search_runs(filter_string="tags.model_family = 'transformer'")

# Find production-ready models
prod_models = mlflow.search_runs(
    filter_string="tags.environment = 'production' AND metrics.accuracy > 0.95"
)
System Tags Reference
MLflow automatically sets several system tags to capture execution context:
Tag | Description | When Set |
---|---|---|
mlflow.source.name | Source file or notebook name | Always |
mlflow.source.type | Source type (NOTEBOOK, JOB, LOCAL, etc.) | Always |
mlflow.user | User who created the run | Always |
mlflow.source.git.commit | Git commit hash | When run from git repo |
mlflow.source.git.branch | Git branch name | MLflow Projects only |
mlflow.parentRunId | Parent run ID for nested runs | Child runs only |
mlflow.docker.image.name | Docker image used | Docker environments |
mlflow.note.content | User-editable description | Manual only |
Use mlflow.note.content to document experiment insights, hypotheses, or results directly in the MLflow UI. This tag appears in a dedicated Notes section on the run page.
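System tags can also be read back programmatically. A minimal sketch, assuming you already have a run_id on hand:

import mlflow

run = mlflow.get_run(run_id)  # run_id is a placeholder
tags = run.data.tags

print(tags.get("mlflow.source.name"))
print(tags.get("mlflow.user"))
print(tags.get("mlflow.source.git.commit", "not run from a git repo"))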
Integration with Auto Logging
Combine auto logging with manual tracking for the best of both worlds:
import mlflow
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Enable auto logging
mlflow.autolog()

with mlflow.start_run():
    # Auto logging captures model training automatically
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)

    # Add custom metrics and artifacts
    predictions = model.predict(X_test)

    # Log custom evaluation metrics
    report = classification_report(y_test, predictions, output_dict=True)
    mlflow.log_metrics(
        {
            "precision_macro": report["macro avg"]["precision"],
            "recall_macro": report["macro avg"]["recall"],
            "f1_macro": report["macro avg"]["f1-score"],
        }
    )

    # Log custom artifacts
    feature_importance = pd.DataFrame(
        {"feature": feature_names, "importance": model.feature_importances_}
    )
    feature_importance.to_csv("feature_importance.csv")
    mlflow.log_artifact("feature_importance.csv")

    # Access the auto-logged run for additional processing
    current_run = mlflow.active_run()
    print(f"Auto-logged run ID: {current_run.info.run_id}")

# Access the completed run
last_run = mlflow.last_active_run()
print(f"Final run status: {last_run.info.status}")
Language-Specific Guides
- Python: Complete Python API Reference
- Java: Java API Documentation
- R: R API Documentation
- REST: REST API Reference
Next Steps:
- Set up MLflow Tracking Server for team collaboration
- Explore Auto Logging for supported frameworks
- Learn advanced search patterns for experiment analysis