Models From Code

availability

Models from Code is available in MLflow 2.12.2 and above. For earlier versions, use the legacy serialization methods outlined in the Custom Python Model documentation.

target use cases

Models from Code is designed for models without optimized weights (GenAI Agents, applications, custom logic). For traditional ML/DL models with trained weights, use the built-in log_model() APIs or custom PythonModel with mlflow.pyfunc.log_model().

Models from Code transforms how you define, store, and load custom models and applications. Instead of relying on complex serialization, it saves your model as readable Python scripts, making development more transparent and debugging significantly easier.

Why Models From Code?​

The key difference lies in how models are represented during serialization:

Legacy Approach - Serializes model objects using cloudpickle or custom serializers, creating binary files that can be difficult to debug and have compatibility limitations.

Models From Code - Saves simple Python scripts with your model definition, making them readable, debuggable, and portable across environments.

[Figure: Models from Code compared with legacy serialization]

Key Advantages​

Transparency and Readability - Your model code is stored as plain Python scripts, making it easy to understand and debug directly in the MLflow UI.

Reduced Debugging Complexity - No more trial-and-error with serialization issues. What you write is exactly what gets executed.

Better Compatibility - Eliminates pickle/cloudpickle limitations like Python version dependencies, complex object serialization issues, and performance bottlenecks.

Enhanced Security - Human-readable code makes it easier to audit and verify model behavior before deployment.

Core Requirements​

Understanding these key concepts will help you use Models from Code effectively:

Script Execution​

Your model script is executed during logging to validate correctness. Ensure any external dependencies or authentication are properly configured in your logging environment.

Import Management​

Only include imports you actually use. MLflow infers requirements from all top-level imports, so unused imports will unnecessarily bloat your model's dependencies.

External Dependencies​

Non-pip installable packages must be specified via code_paths. The system doesn't automatically capture external references beyond standard package imports.

development workflow

Use a linter to identify unused imports while developing. This keeps your model's requirements clean and deployment lightweight.

security consideration

Model code is stored in plain text. Never include sensitive information like API keys or passwords in your scripts. Use environment variables or secure configuration management instead.
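A minimal sketch of this pattern (the variable name MY_SERVICE_API_KEY is hypothetical): the logged script contains only the environment lookup, never the secret itself, so the serving environment supplies the actual value at runtime.

```python
import os


def get_api_key(var_name: str = "MY_SERVICE_API_KEY") -> str:
    """Read a secret from the environment at runtime.

    The model script stored by MLflow only ever contains this lookup;
    the actual value lives in the logging or serving environment.
    """
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(f"{var_name} is not set in the serving environment")
    return key
```

Failing fast with a clear error when the variable is missing makes misconfigured deployments easy to diagnose.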

Development in Jupyter Notebooks​

Jupyter notebooks are excellent for AI development, but Models from Code requires Python scripts (.py files). Fortunately, IPython's %%writefile magic command bridges this gap perfectly.

Using %%writefile​

The %%writefile magic command captures cell contents and writes them to a file:

%%writefile "./hello.py"

print("hello!")

This creates a hello.py file containing:

print("hello!")

Best Practices for Jupyter​

Overwrite, Don't Append - Use the default %%writefile behavior rather than the -a append option to avoid duplicate code and debugging confusion.

Cell-by-Cell Development - Each %%writefile cell creates one script file. This keeps your model definition clean and focused.

Immediate Testing - You can run your generated script immediately after writing it to verify it works correctly.
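For example, the hello.py script from above can be smoke-tested right after it is written. This sketch recreates the file programmatically so it is self-contained; in a notebook you would simply run the generated file in the next cell or from a terminal.

```python
import pathlib
import subprocess
import sys

# Recreate the hello.py produced by the %%writefile example
pathlib.Path("hello.py").write_text('print("hello!")\n')

# Run the generated script as a subprocess - the same quick check you
# would do before logging the model
completed = subprocess.run(
    [sys.executable, "hello.py"], capture_output=True, text=True, check=True
)
print(completed.stdout.strip())  # hello!
```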

Examples and Patterns​

This example demonstrates the basics of Models from Code with a simple mathematical model.

Creating the Model Script​

# In a Jupyter notebook, make %%writefile the first line of the cell:
# %%writefile "./basic.py"

import pandas as pd
from typing import Dict
from mlflow.pyfunc import PythonModel
from mlflow.models import set_model


class BasicModel(PythonModel):
    def exponential(self, numbers):
        return {f"{x}": 2**x for x in numbers}

    def predict(self, context, model_input) -> Dict[str, float]:
        if isinstance(model_input, pd.DataFrame):
            model_input = model_input.iloc[0].tolist()
        return self.exponential(model_input)


# This tells MLflow which object to use for inference
set_model(BasicModel())

Logging the Model​

import mlflow

mlflow.set_experiment("Basic Model From Code")

model_info = mlflow.pyfunc.log_model(
    python_model="basic.py",  # Path to your script
    name="arithmetic_model",
    input_example=[42.0, 24.0],
)

Using the Model​

# Load and use the model
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

# Make predictions
result = loaded_model.predict([2.2, 3.1, 4.7])
print(result)  # approximately {'2.2': 4.59, '3.1': 8.57, '4.7': 25.99}

[Screenshot: the MLflow UI showing the stored model code as a plain Python script]
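The predict method above also accepts tabular input. A standalone sketch (plain pandas, no MLflow required) of how a single-row DataFrame is flattened before the exponential computation:

```python
import pandas as pd

# A single-row DataFrame, as a pyfunc caller might send it
df = pd.DataFrame([[2.0, 3.0]])

# Flatten the first row to a plain list of Python floats, then apply the
# same dictionary comprehension used by BasicModel.exponential
numbers = df.iloc[0].tolist()
result = {f"{x}": 2**x for x in numbers}
print(result)  # {'2.0': 4.0, '3.0': 8.0}
```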

Troubleshooting Common Issues​

NameError When Loading Models​

Problem: Getting NameError when loading your saved model.

Solution: Ensure all required imports are defined within your model script:

# ❌ Bad - imports missing in script
def predict(self, context, model_input):
    return pd.DataFrame(model_input)  # NameError: pd not defined


# βœ… Good - imports included
import pandas as pd


def predict(self, context, model_input):
    return pd.DataFrame(model_input)

ImportError with External Dependencies​

Problem: ImportError when loading models with external dependencies.

Solution: Use code_paths for non-PyPI dependencies:

mlflow.pyfunc.log_model(
    python_model="my_model.py",
    name="model",
    code_paths=["utils.py", "helpers/"],  # Include external files
    extra_pip_requirements=["custom-package==1.0.0"],  # Manual requirements
)

Bloated Requirements File​

Problem: requirements.txt contains unnecessary packages.

Solution: Clean up your imports to only include what you use:

# ❌ Bad - unused imports
import pandas as pd
import numpy as np
import tensorflow as tf
import torch
from sklearn.ensemble import RandomForestClassifier


def predict(self, context, model_input):
    return {"result": model_input * 2}  # Only uses basic operations


# βœ… Good - minimal imports
def predict(self, context, model_input):
    return {"result": model_input * 2}

Migration from Legacy Serialization​

If you're currently using legacy model serialization, here's how to migrate:

Before (Legacy)​

class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return model_input * 2


# Log object instance
model_instance = MyModel()
mlflow.pyfunc.log_model(python_model=model_instance, name="model")

After (Models from Code)​

# Save as script: my_model.py
# %%writefile "./my_model.py"
from mlflow.pyfunc import PythonModel
from mlflow.models import set_model


class MyModel(PythonModel):
    def predict(self, context, model_input):
        return model_input * 2


set_model(MyModel())

# Log the script path (run this outside the script)
import mlflow

mlflow.pyfunc.log_model(python_model="my_model.py", name="model")

Best Practices Summary​

Code Organization

  • Keep model scripts focused and minimal
  • Use descriptive names for model files and functions
  • Organize related functionality into separate modules using code_paths

Security

  • Never hardcode sensitive information in model scripts
  • Use environment variables for configuration
  • Review code before logging to ensure no secrets are included

Performance

  • Import only what you need to minimize dependencies
  • Use lazy loading for expensive resources
  • Consider memory management for long-running models

Development Workflow

  • Use %%writefile in Jupyter for rapid prototyping
  • Test your scripts independently before logging
  • Use linters to catch unused imports and other issues

Additional Resources​

For more information on related topics: