# MLflow Projects
MLflow Projects provide a standard format for packaging and sharing reproducible data science code. Based on simple conventions, Projects enable seamless collaboration and automated execution across different environments and platforms.
## Quick Start

### Running Your First Project
Execute any Git repository or local directory as an MLflow Project:
```bash
# Run a project from GitHub
mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=0.5

# Run a local project
mlflow run . -P data_file=data.csv -P regularization=0.1

# Run with a specific entry point
mlflow run . -e validate -P data_file=data.csv
```
```python
# Run projects programmatically
import mlflow

# Execute a remote project
result = mlflow.run(
    "https://github.com/mlflow/mlflow-example.git",
    parameters={"alpha": 0.5, "l1_ratio": 0.01},
    experiment_name="elasticnet_experiment",
)

# Execute a local project
result = mlflow.run(
    ".", entry_point="train", parameters={"epochs": 100}, synchronous=True
)
```
Any directory with an `MLproject` file or containing `.py`/`.sh` files can be run as an MLflow Project. No complex setup required!
## Core Concepts

### Project Components

Every MLflow Project consists of three key elements:

#### Project Name

A human-readable identifier for your project, typically defined in the `MLproject` file.
#### Entry Points
Commands that can be executed within the project. Entry points define:
- Parameters - Inputs with types and default values
- Commands - What gets executed when the entry point runs
- Environment - The execution context and dependencies
#### Environment
The software environment containing all dependencies needed to run the project. MLflow supports multiple environment types:
| Environment | Use Case | Dependencies |
|---|---|---|
| Virtualenv (Recommended) | Python packages from PyPI | `python_env.yaml` |
| Conda | Python + native libraries | `conda.yaml` |
| Docker | Complex dependencies, non-Python | `Dockerfile` |
| System | Use current environment | None |
## Project Structure & Configuration

### Convention-Based Projects

Projects without an `MLproject` file use these conventions:
```
my-project/
├── train.py           # Executable entry point
├── validate.sh        # Shell script entry point
├── conda.yaml         # Optional: Conda environment
├── python_env.yaml    # Optional: Python environment
└── data/              # Project data and assets
```
Default Behavior:

- Name: Directory name
- Entry Points: Any `.py` or `.sh` file
- Environment: Conda environment from `conda.yaml`, or a Python-only environment
- Parameters: Passed via the command line as `--key value` (see the sketch below)
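As a sketch of how these conventions play out, a convention-based project can also be launched from Python. This assumes a `train.py` script in the current directory; the script name and `data_file` parameter are illustrative, not MLflow defaults:

```python
import mlflow

# With no MLproject file, the script's file name is itself the entry point,
# and parameters are forwarded as `--data_file data.csv` on the command line.
submitted = mlflow.projects.run(
    ".",
    entry_point="train.py",
    parameters={"data_file": "data.csv"},
    env_manager="local",  # reuse the current environment for a fast test
)
print(submitted.get_status())
```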
### MLproject File Configuration

For advanced control, create an `MLproject` file:
```yaml
name: My ML Project

# Environment specification (choose one)
python_env: python_env.yaml
# conda_env: conda.yaml
# docker_env:
#   image: python:3.9

entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
      max_epochs: {type: int, default: 100}
    command: "python train.py --reg {regularization} --epochs {max_epochs} {data_file}"

  validate:
    parameters:
      model_path: path
      test_data: path
    command: "python validate.py {model_path} {test_data}"

  hyperparameter_search:
    parameters:
      search_space: uri
      n_trials: {type: int, default: 50}
    command: "python hyperparam_search.py --trials {n_trials} --config {search_space}"
```
### Parameter Types

MLflow supports the following parameter types, with automatic validation and transformation:
| Type | Description | Example | Special Handling |
|---|---|---|---|
| `string` | Text data | `"hello world"` | None |
| `float` | Decimal numbers | `0.1`, `3.14` | Validation |
| `int` | Whole numbers | `42`, `100` | Validation |
| `path` | Local file paths | `data.csv`, `s3://bucket/file` | Downloads remote URIs to local files |
| `uri` | Any URI | `s3://bucket/`, `./local/path` | Converts relative paths to absolute |
`path` parameters automatically download remote files (S3, GCS, etc.) to local storage before execution. Use `uri` for applications that can read directly from remote storage.
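To make the distinction concrete, here is a hedged sketch using the entry points from the `MLproject` example above; the bucket name is hypothetical:

```python
import mlflow

# `data_file` is declared as `path`: MLflow downloads the S3 object to a
# local file and substitutes that local path into the command.
mlflow.projects.run(".", parameters={"data_file": "s3://my-bucket/train.csv"})

# `search_space` is declared as `uri`: the URI is passed through unchanged,
# so hyperparam_search.py must read from remote storage itself.
mlflow.projects.run(
    ".",
    entry_point="hyperparameter_search",
    parameters={"search_space": "s3://my-bucket/search.yaml"},
)
```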
## Environment Management

### Python Virtual Environments (Recommended)

Create a `python_env.yaml` file for pure Python dependencies:
```yaml
# python_env.yaml
python: "3.9.16"

# Optional: build dependencies
build_dependencies:
  - pip
  - setuptools
  - wheel==0.37.1

# Runtime dependencies
dependencies:
  - mlflow>=2.0.0
  - scikit-learn==1.2.0
  - pandas>=1.5.0
  - numpy>=1.21.0
```

```yaml
# MLproject
name: Python Project
python_env: python_env.yaml

entry_points:
  main:
    command: "python train.py"
```
### Conda Environments
For projects requiring native libraries or complex dependencies:
```yaml
# conda.yaml
name: ml-project
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - cudnn=8.2.1 # CUDA libraries
  - scikit-learn
  - pip
  - pip:
      - mlflow>=2.0.0
      - tensorflow==2.10.0
```

```yaml
# MLproject
name: Deep Learning Project
conda_env: conda.yaml

entry_points:
  train:
    parameters:
      gpu_count: {type: int, default: 1}
    command: "python train_model.py --gpus {gpu_count}"
```
By using Conda, you agree to Anaconda's Terms of Service.
### Docker Environments
For maximum reproducibility and complex system dependencies:
```dockerfile
# Dockerfile
FROM python:3.9-slim

RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install -r requirements.txt

WORKDIR /mlflow/projects/code
```

```yaml
# MLproject
name: Containerized Project
docker_env:
  image: my-ml-image:latest
  volumes: ["/host/data:/container/data"]
  environment:
    - ["CUDA_VISIBLE_DEVICES", "0,1"]
    - "AWS_PROFILE" # Copy from host

entry_points:
  train:
    command: "python distributed_training.py"
```
Advanced Docker Options:

```yaml
docker_env:
  image: 012345678910.dkr.ecr.us-west-2.amazonaws.com/ml-training:v1.0
  volumes:
    - "/local/data:/data"
    - "/tmp:/tmp"
  environment:
    - ["MODEL_REGISTRY", "s3://my-bucket/models"]
    - ["EXPERIMENT_NAME", "production-training"]
    - "MLFLOW_TRACKING_URI" # Copy from host
```
### Environment Manager Selection
Control which environment manager to use:
```bash
# Force virtualenv (ignores conda.yaml)
mlflow run . --env-manager virtualenv

# Use the local environment (no isolation)
mlflow run . --env-manager local

# Use conda (default if conda.yaml is present)
mlflow run . --env-manager conda
```
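The same choice is available programmatically via the `env_manager` argument; a minimal sketch:

```python
import mlflow

# Equivalent to `mlflow run . --env-manager virtualenv`
mlflow.projects.run(".", env_manager="virtualenv")
```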
Execution & Deploymentβ
Local Executionβ
```bash
# Basic execution
mlflow run .

# With parameters
mlflow run . -P lr=0.01 -P batch_size=32

# Specific entry point
mlflow run . -e hyperparameter_search -P n_trials=100

# Custom environment
mlflow run . --env-manager virtualenv
```
### Remote Execution

#### Databricks Platform
```bash
# Run on a Databricks cluster
mlflow run . --backend databricks --backend-config cluster-config.json
```
`cluster-config.json`:

```json
{
  "cluster_spec": {
    "new_cluster": {
      "node_type_id": "i3.xlarge",
      "num_workers": 2,
      "spark_version": "11.3.x-scala2.12"
    }
  },
  "run_name": "distributed-training"
}
```
#### Kubernetes Clusters
```bash
# Run on Kubernetes
mlflow run . --backend kubernetes --backend-config k8s-config.json
```
`k8s-config.json`:

```json
{
  "kube-context": "my-cluster",
  "repository-uri": "gcr.io/my-project/ml-training",
  "kube-job-template-path": "k8s-job-template.yaml"
}
```
```yaml
# k8s-job-template.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{replaced-with-project-name}"
  namespace: mlflow
spec:
  ttlSecondsAfterFinished: 3600
  backoffLimit: 2
  template:
    spec:
      containers:
        - name: "{replaced-with-project-name}"
          image: "{replaced-with-image-uri}"
          command: ["{replaced-with-entry-point-command}"]
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          env:
            - name: MLFLOW_TRACKING_URI
              value: "https://my-mlflow-server.com"
      restartPolicy: Never
```
### Python API
```python
import mlflow
from mlflow.projects import run

# Synchronous execution
result = run(
    uri="https://github.com/mlflow/mlflow-example.git",
    entry_point="main",
    parameters={"alpha": 0.5},
    backend="local",
    synchronous=True,
)

# Asynchronous execution
submitted_run = run(
    uri=".",
    entry_point="train",
    parameters={"epochs": 100},
    backend="databricks",
    backend_config="cluster-config.json",
    synchronous=False,
)

# Monitor progress
if submitted_run.wait():
    print("Training completed successfully!")
    run_data = mlflow.get_run(submitted_run.run_id)
    print(f"Final accuracy: {run_data.data.metrics['accuracy']}")
```
## Building Workflows

### Multi-Step Pipelines
Combine multiple projects into sophisticated ML workflows:
```python
import mlflow
from mlflow.tracking import MlflowClient


def ml_pipeline():
    client = MlflowClient()

    # Step 1: Data preprocessing
    prep_run = mlflow.run(
        "./preprocessing", parameters={"input_path": "s3://bucket/raw-data"}
    )

    # Wait for completion and get the output
    if prep_run.wait():
        prep_run_data = client.get_run(prep_run.run_id)
        processed_data_path = prep_run_data.data.params["output_path"]

        # Step 2: Feature engineering
        feature_run = mlflow.run(
            "./feature_engineering", parameters={"data_path": processed_data_path}
        )

        if feature_run.wait():
            feature_data = client.get_run(feature_run.run_id)
            features_path = feature_data.data.params["features_output"]

            # Step 3: Parallel model training
            model_runs = []
            algorithms = ["random_forest", "xgboost", "neural_network"]

            for algo in algorithms:
                run = mlflow.run(
                    "./training",
                    entry_point=algo,
                    parameters={"features_path": features_path, "algorithm": algo},
                    synchronous=False,  # Run in parallel
                )
                model_runs.append(run)

            # Wait for all models and select the best
            best_model = None
            best_metric = 0

            for run in model_runs:
                if run.wait():
                    run_data = client.get_run(run.run_id)
                    accuracy = run_data.data.metrics.get("accuracy", 0)
                    if accuracy > best_metric:
                        best_metric = accuracy
                        best_model = run.run_id

            # Step 4: Deploy the best model
            if best_model:
                mlflow.run(
                    "./deployment",
                    parameters={"model_run_id": best_model, "stage": "production"},
                )


# Execute the pipeline
ml_pipeline()
```
### Hyperparameter Optimization
```python
import itertools
from concurrent.futures import ThreadPoolExecutor

import mlflow


def hyperparameter_search():
    # Define the parameter grid
    param_grid = {
        "learning_rate": [0.01, 0.1, 0.2],
        "n_estimators": [100, 200, 500],
        "max_depth": [3, 6, 10],
    }

    # Generate all combinations
    param_combinations = [
        dict(zip(param_grid.keys(), values))
        for values in itertools.product(*param_grid.values())
    ]

    def train_model(params):
        return mlflow.run("./training", parameters=params, synchronous=False)

    # Launch parallel training jobs
    with ThreadPoolExecutor(max_workers=5) as executor:
        submitted_runs = list(executor.map(train_model, param_combinations))

    # Collect results
    results = []
    for run in submitted_runs:
        if run.wait():
            run_data = mlflow.get_run(run.run_id)
            results.append(
                {
                    "run_id": run.run_id,
                    "params": run_data.data.params,
                    "metrics": run_data.data.metrics,
                }
            )

    # Find the best model
    best_run = max(results, key=lambda x: x["metrics"].get("f1_score", 0))
    print(f"Best model: {best_run['run_id']}")
    print(f"Best F1 score: {best_run['metrics']['f1_score']}")

    return best_run


# Execute the hyperparameter search
best_model = hyperparameter_search()
```
## Advanced Features

### Docker Image Building
Build custom images during execution:
```bash
# Build a new image based on the project's base image
mlflow run . --backend kubernetes --build-image

# Use a pre-built image
mlflow run . --backend kubernetes
```

```python
# Programmatic image building
mlflow.run(
    ".",
    backend="kubernetes",
    backend_config="k8s-config.json",
    build_image=True,  # Creates a new image with the project code
    docker_auth={  # Registry authentication
        "username": "myuser",
        "password": "mytoken",
    },
)
```
### Git Integration
MLflow automatically tracks Git information:
```bash
# Run a specific commit
mlflow run https://github.com/mlflow/mlflow-example.git --version <commit hash>

# Run a branch
mlflow run https://github.com/mlflow/mlflow-example.git --version feature-branch

# Run from a subdirectory
mlflow run https://github.com/my-repo.git#subdirectory/my-project
```
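The `--version` flag maps to the `version` argument of the Python API; a minimal sketch, where the branch name is illustrative:

```python
import mlflow

# Run a specific branch of a remote project.
mlflow.projects.run(
    "https://github.com/mlflow/mlflow-example.git",
    version="feature-branch",
    parameters={"alpha": 0.5},
)
```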
### Environment Variable Propagation
Critical environment variables are automatically passed to execution environments:
```bash
export MLFLOW_TRACKING_URI="https://my-tracking-server.com"
export AWS_PROFILE="ml-experiments"
export CUDA_VISIBLE_DEVICES="0,1"

# These variables are available in the project execution environment
mlflow run .
```
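Inside the project's own code, propagated variables are read like any other environment variable; a minimal sketch, where the fallback values are illustrative defaults rather than MLflow behavior:

```python
import os

# Read variables propagated into the execution environment.
tracking_uri = os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
gpus = os.environ.get("CUDA_VISIBLE_DEVICES", "")
print(f"Tracking to {tracking_uri}; visible GPUs: {gpus or 'none'}")
```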
### Custom Backend Development
Create custom execution backends:
```python
# custom_backend.py
from mlflow.projects.backend import AbstractBackend


class MyCustomBackend(AbstractBackend):
    def run(
        self,
        project_uri,
        entry_point,
        parameters,
        version,
        backend_config,
        tracking_uri,
        experiment_id,
    ):
        # Custom execution logic goes here.
        # Return a SubmittedRun object.
        pass
```
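The backend's `run` method is expected to return a `SubmittedRun`. A hedged sketch of what an implementation might look like, where `job_handle` and its methods stand in for your own job system rather than any MLflow API:

```python
from mlflow.projects.submitted_run import SubmittedRun


class MyCustomSubmittedRun(SubmittedRun):
    """Illustrative handle for a job launched by a custom backend."""

    def __init__(self, mlflow_run_id, job_handle):
        self._run_id = mlflow_run_id
        self._job = job_handle  # hypothetical handle into your job system

    @property
    def run_id(self):
        return self._run_id

    def wait(self):
        # Block until the job finishes; return True on success.
        return self._job.wait_for_completion()

    def get_status(self):
        return self._job.status()

    def cancel(self):
        self._job.cancel()
```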
Register it as a plugin in your package's `setup.py`:

```python
# setup.py
from setuptools import setup

setup(
    entry_points={
        "mlflow.project_backend": [
            "my-backend=my_package.custom_backend:MyCustomBackend"
        ]
    }
)
```
## Best Practices

### Project Organization
```
ml-project/
├── MLproject            # Project configuration
├── python_env.yaml      # Environment dependencies
├── src/                 # Source code
│   ├── train.py
│   ├── evaluate.py
│   └── utils/
├── data/                # Sample/test data
├── configs/             # Configuration files
│   ├── model_config.yaml
│   └── hyperparams.json
├── tests/               # Unit tests
└── README.md            # Project documentation
```
### Environment Management
Development Tips:
- Use virtualenv for pure Python projects
- Use conda when you need system libraries (CUDA, Intel MKL)
- Use Docker for complex dependencies or production deployment
- Pin exact versions in production environments
Performance Optimization:
```yaml
# Fast iteration during development
python_env: python_env.yaml

entry_points:
  develop:
    command: "python train.py"

  production:
    parameters:
      full_dataset: {type: path}
      epochs: {type: int, default: 100}
    command: "python train.py --data {full_dataset} --epochs {epochs}"
```
### Parameter Management
```yaml
# Good: typed parameters with defaults
entry_points:
  train:
    parameters:
      learning_rate: {type: float, default: 0.01}
      batch_size: {type: int, default: 32}
      data_path: path
      output_dir: {type: string, default: "./outputs"}
    command: "python train.py --lr {learning_rate} --batch {batch_size} --data {data_path} --output {output_dir}"
```
### Reproducibility
```python
# Include environment info in tracking
import platform
import sys

import mlflow

with mlflow.start_run():
    # Log environment info
    mlflow.log_param("python_version", sys.version)
    mlflow.log_param("platform", platform.platform())

    # Log the Git commit if available
    try:
        import git

        repo = git.Repo(".")
        mlflow.log_param("git_commit", repo.head.commit.hexsha)
    except Exception:
        pass
```
## Troubleshooting

### Common Issues
**Docker Permission Denied**

```bash
# Solution: add your user to the docker group (or use sudo)
sudo usermod -aG docker $USER
# Then restart your shell/session
```
**Conda Environment Creation Fails**

```bash
# Solution: clean the conda cache and retry
conda clean --all
mlflow run . --env-manager conda
```
**Git Authentication for Private Repos**

```bash
# Solution: use SSH with key authentication
mlflow run git@github.com:private/repo.git

# Or HTTPS with a token
mlflow run https://token:x-oauth-basic@github.com/private/repo.git
```
**Kubernetes Job Fails**

```bash
# Debug: check job status
kubectl get jobs -n mlflow
kubectl describe job <job-name> -n mlflow
kubectl logs -n mlflow job/<job-name>
```
### Debugging Tips
Enable Verbose Logging:
```bash
export MLFLOW_LOGGING_LEVEL=DEBUG
mlflow run . -v
```
Test Locally First:
```bash
# Test with the local environment before remote deployment
mlflow run . --env-manager local

# Then test with environment isolation
mlflow run . --env-manager virtualenv
```
Validate Project Structure:
```python
from mlflow.projects import load_project

# Load and inspect the project
project = load_project(".")
print(f"Project name: {project.name}")
print(f"Entry points: {list(project._entry_points.keys())}")  # note: internal attribute
print(f"Environment type: {project.env_type}")
```
Ready to get started? Check out our MLflow Projects Examples for hands-on tutorials and real-world use cases.