
Model Signatures and Input Examples

Model signatures and input examples are foundational components that define how your models should be used, ensuring consistent and reliable interactions across MLflow's ecosystem.

What Are Model Signatures and Input Examples?

Model Signature - Defines the expected format for model inputs, outputs, and parameters. Think of it as a contract that specifies exactly what data your model expects and what it will return.

Model Input Example - Provides a concrete example of valid model input. This helps developers understand the required data format and validates that your model works correctly.

[Figure: Model signatures comparison]

Why They Matter

Model signatures and input examples provide crucial benefits:

  • Consistency: Ensure all model interactions follow the same data format
  • Validation: Catch data format errors before they reach your model
  • Documentation: Serve as living documentation for model usage
  • Deployment Safety: Enable MLflow deployment tools to validate requests automatically
  • UI Integration: Allow MLflow UI to display clear model requirements

Databricks Unity Catalog Requirement

Model signatures are REQUIRED for registering models in Databricks Unity Catalog. Unity Catalog enforces concrete type definitions for all registered models and will reject models without proper signatures. Always include a signature when logging models that you plan to register in Databricks environments.

# ✅ Required for Databricks registration
mlflow.sklearn.log_model(
    model,
    name="my_model",
    input_example=X_sample,  # Generates required signature
    signature=signature,  # Or provide explicit signature
)

# ❌ Will fail in Databricks Unity Catalog
mlflow.sklearn.log_model(model, name="my_model")  # No signature

Quick Start: Adding Signatures to Your Models

The easiest way to add a signature is to provide an input example when logging your model:

import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Load data and train model
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
model = RandomForestClassifier().fit(X, y)

with mlflow.start_run():
    # The input example automatically generates a signature
    mlflow.sklearn.log_model(
        model, name="iris_model", input_example=X.iloc[[0]]  # First row as example
    )

MLflow automatically:

  1. Infers the signature from your input example
  2. Validates the model works with the example
  3. Stores both signature and example with your model

Automatic Signature Inference

MLflow automatically generates model signatures when you provide an input_example during model logging. This works for all model flavors and is the recommended approach for most use cases.

Understanding Model Signatures

Model signatures consist of three components: an input schema, an output schema, and a parameters schema.

The input schema defines the structure and types of data your model expects:

import numpy as np
from mlflow.types.schema import ColSpec, Schema, TensorSpec

# Column-based signature (DataFrames)
input_schema = Schema(
    [
        ColSpec("double", "sepal_length"),
        ColSpec("double", "sepal_width"),
        ColSpec("string", "species", required=False),  # Optional field
    ]
)

# Tensor-based signature (NumPy arrays)
input_schema = Schema(
    [TensorSpec(np.dtype(np.float32), (-1, 28, 28, 1))]  # Batch of 28x28 images
)

Key Features: Support for both tabular (DataFrame) and tensor (NumPy) data, optional fields using required=False, and rich data type support including arrays and objects.

Signature Types Overview

MLflow supports two primary signature types:

Column-Based Signatures - For tabular data (DataFrames, dictionaries):

# Perfect for traditional ML models
{"feature_1": 1.5, "feature_2": "category_a", "feature_3": [1, 2, 3]}

Tensor-Based Signatures - For array data (images, audio, embeddings):

# Perfect for deep learning models
np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [1, 2, 3]]]) # Shape: (2, 2, 3)

Type Hints for Model Signatures

Version Compatibility

Type hint support was introduced in MLflow 2.20.0. If you are using an earlier version of MLflow, see the Working with Signatures section.

You can use Python type hints to automatically define model signatures and enable data validation. This provides a more Pythonic way to specify your model's interface while getting automatic validation and schema inference.

Quick Start with Type Hints

import mlflow
from typing import List, Dict, Optional
import pydantic


class Message(pydantic.BaseModel):
    role: str
    content: str
    metadata: Optional[Dict[str, str]] = None


class CustomModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: List[Message]) -> List[str]:
        # Signature automatically inferred from type hints!
        return [msg.content for msg in model_input]


# Log model - signature is auto-generated from type hints
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="chat_model",
        python_model=CustomModel(),
        input_example=[
            {"role": "user", "content": "Hello"}
        ],  # Validates against type hints
    )

Key Benefits

  • Automatic Validation: Input data validated against type hints at runtime
  • Schema Inference: Model signature automatically generated from type annotations
  • Type Safety: Catch type mismatches before they reach your model
  • IDE Support: Better autocomplete and error detection during development
  • Documentation: Type hints serve as self-documenting code
  • Consistency: Same validation for PythonModel instances and loaded PyFunc models
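
To make the schema-inference idea concrete, here is a minimal standard-library sketch that reads the annotations of a `predict` method and maps the element types to MLflow type names. The `StringifyModel` class, the `PY_TO_MLFLOW` mapping, and the `describe_predict` helper are hypothetical names used for illustration. This is not how MLflow implements inference, and it handles only the `List[<primitive>]` pattern.

```python
from typing import List, get_args, get_origin, get_type_hints

# Hypothetical mapping that mirrors a few rows of the primitive type table.
PY_TO_MLFLOW = {str: "string", int: "long", float: "double", bool: "boolean", bytes: "binary"}


class StringifyModel:
    """Stand-in for a PythonModel subclass; it does not import MLflow."""

    def predict(self, model_input: List[int]) -> List[str]:
        return [str(value) for value in model_input]


def describe_predict(model_cls):
    """Return (input type name, output type name) read from predict's annotations."""
    hints = get_type_hints(model_cls.predict)

    def element_name(annotation):
        # Only List[<primitive>] is handled in this sketch.
        if get_origin(annotation) is not list:
            raise TypeError(f"expected List[...], got {annotation!r}")
        (element,) = get_args(annotation)
        return PY_TO_MLFLOW[element]

    return element_name(hints["model_input"]), element_name(hints["return"])


print(describe_predict(StringifyModel))  # ('long', 'string')
```

The same annotations that drive IDE autocompletion are available at runtime, which is what makes annotation-based schema generation possible.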

When to Use Type Hints

✅ Recommended for: complex data structures (chat messages, tool definitions, nested objects), models requiring strict input validation, teams using modern Python development practices, and GenAI and LLM applications with structured inputs.

⚠️ Consider alternatives for: simple tabular data (DataFrames work fine with input examples), legacy codebases without type hint adoption, and models with highly dynamic input structures.

Type Hints Best Practices

Development Workflow:

# ✅ Recommended pattern
class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: List[MyPydanticModel]) -> List[str]:
        # Clear type annotations
        # Automatic validation
        # Good IDE support
        return [process(item) for item in model_input]

Key Guidelines:

  • Use Pydantic models for complex data structures
  • Set default values for optional fields in Pydantic models
  • Don't pass an explicit signature parameter when using type hints
  • Always provide input examples that match your type hints
  • Use TypeFromExample when you want flexibility without explicit typing
  • Test validation locally before deployment

Important Notes

  • Never pass an explicit signature parameter when using type hints. MLflow uses the signature inferred from the type hints and warns if the two don't match
  • Union types become AnyType; use Pydantic discriminated unions for proper validation
  • Input examples are required for TypeFromExample and legacy type hints

Data Types and Examples

Primitive Types

Python to MLflow type mappings:

Type Restrictions

These types support only scalar values or one-dimensional arrays. Mixed types are not permitted.

Python Type   MLflow Type   Example                       Notes
-----------   -----------   ---------------------------   ---------------
str           string        "hello world"
int           long          42                            64-bit integers
np.int32      integer       np.int32(42)                  32-bit integers
float         double        3.14159                       64-bit floats
np.float32    float         np.float32(3.14)              32-bit floats
bool          boolean       True
np.bool_      boolean       np.bool_(True)                NumPy boolean
datetime      datetime      pd.Timestamp("2023-01-01")
bytes         binary        b"binary data"
bytearray     binary        bytearray(b"data")
np.bytes_     binary        np.bytes_(b"data")            NumPy bytes
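
As a rough illustration of the mapping in this table, the following standard-library sketch assigns an MLflow type name to each value in a flat record. The `mlflow_type_name` helper is hypothetical and covers only the built-in Python rows. The NumPy rows are omitted, and MLflow's own inference is more thorough.

```python
import datetime


def mlflow_type_name(value):
    """Map a built-in Python scalar to the MLflow type name from the table."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (bytes, bytearray)):
        return "binary"
    if isinstance(value, datetime.datetime):
        return "datetime"
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


record = {"name": "Alice", "age": 30, "active": True}
print({key: mlflow_type_name(value) for key, value in record.items()})
# {'name': 'string', 'age': 'long', 'active': 'boolean'}
```

The ordering of the checks matters: testing `int` before `bool` would label `True` as `long`.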

Composite Types

Arrays (Lists/NumPy arrays):

{
    "simple_list": ["a", "b", "c"],
    "nested_array": [[1, 2], [3, 4], [5, 6]],
    "numpy_array": np.array([1.1, 2.2, 3.3]),
}

Objects (Dictionaries):

{"user_profile": {"name": "Alice", "age": 30, "preferences": ["sports", "music"]}}

Optional Fields:

# Include None values to make fields optional
pd.DataFrame(
    {
        "required_field": [1, 2, 3],
        "optional_field": [1.0, None, 3.0],  # This becomes optional
    }
)
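
The rule that a `None` value marks a field as optional can be sketched in plain Python. The `optional_fields` helper below is hypothetical. It also treats an absent key like `None`, which is an assumption of this sketch and not a statement about MLflow's inference.

```python
def optional_fields(records):
    """Return the field names that are None (or absent) in at least one record."""
    all_keys = set()
    for record in records:
        all_keys.update(record)
    return {
        key
        for key in all_keys
        if any(record.get(key) is None for record in records)
    }


records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": None},
]
print(optional_fields(records))  # {'email'}
```

Here `name` is populated in every record and stays required, while `email` is `None` once and is reported as optional.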

Compatibility Notes

Version Requirements:

  • Array and Object types: Require MLflow ≥ 2.10.0
  • Spark ML vectors: Require MLflow ≥ 2.15.0
  • AnyType: Requires MLflow ≥ 2.19.0

Signature Enforcement and Validation

[Figure: Signature enforcement process]

MLflow automatically validates inputs against your model signature when:

  • Loading models as PyFunc (mlflow.pyfunc.load_model)
  • Using MLflow deployment tools
  • Serving models via MLflow's REST API

Validation Rules

Input Validation:

  • Required fields: Must be present or validation fails
  • Optional fields: Can be missing without errors
  • Extra fields: Ignored (not passed to model)
  • Type conversion: Safe conversions applied when possible
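
The required, optional, and extra-field rules above can be sketched in plain Python. The `SCHEMA` layout and the `validate_record` helper are hypothetical and only imitate the behavior described here. Type conversion is not modeled, and this is not MLflow's enforcement code.

```python
# Hypothetical schema: field name -> (expected Python type, required flag).
SCHEMA = {
    "name": (str, True),
    "age": (int, True),
    "email": (str, False),
}


def validate_record(record, schema):
    """Return the fields the model would receive, or raise on a violation."""
    cleaned = {}
    for field, (expected_type, required) in schema.items():
        if record.get(field) is None:
            if required:
                raise ValueError(f"required input field missing: {field}")
            continue  # optional fields may be absent
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field}: expected {expected_type.__name__}")
        cleaned[field] = record[field]
    # Fields not named in the schema are never copied, so extras are dropped.
    return cleaned


print(validate_record({"name": "Alice", "age": 30, "debug": True}, SCHEMA))
# {'name': 'Alice', 'age': 30}
```

Iterating over the schema instead of the record is what makes extra fields disappear without an error.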

Parameter Validation:

  • Type checking: Parameters must match specified types
  • Shape validation: List parameters validated for correct shape
  • Default values: Applied when parameters not provided
  • Unknown parameters: Generate warnings but don't fail
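
A similar plain-Python sketch can illustrate the parameter rules: defaults are applied, types are checked, and unknown parameters produce a warning instead of an error. `PARAM_SPECS` and `resolve_params` are hypothetical names. Shape validation is left out, and dropping the unknown parameter after warning is an assumption of this sketch.

```python
import warnings

# Hypothetical parameter specs: name -> (expected Python type, default value).
PARAM_SPECS = {"temperature": (float, 0.7), "max_tokens": (int, 100)}


def resolve_params(params, specs):
    """Merge caller-supplied params with defaults, checking types along the way."""
    resolved = {name: default for name, (_, default) in specs.items()}
    for name, value in (params or {}).items():
        if name not in specs:
            # Unknown parameters warn but do not fail.
            warnings.warn(f"ignoring unknown parameter: {name}")
            continue
        expected_type, _ = specs[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"{name}: expected {expected_type.__name__}")
        resolved[name] = value
    return resolved


print(resolve_params({"temperature": 0.2}, PARAM_SPECS))
# {'temperature': 0.2, 'max_tokens': 100}
```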

Handling Common Issues

Integer Columns with Missing Values:

# ❌ Problem: Integer column with NaN becomes float, causing type mismatch
df = pd.DataFrame({"int_col": [1, 2, None]})  # Becomes float64

# ✅ Solution: Define as double from the start
df = pd.DataFrame({"int_col": [1.0, 2.0, None]})  # Stays float64

Type Conversion Examples:

# ✅ Safe conversions (allowed)
int → long  # 32-bit to 64-bit integer
int → double  # Integer to float
float → double  # 32-bit to 64-bit float

# ❌ Unsafe conversions (rejected)
long → double  # Potential precision loss
string → int  # No automatic parsing
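
One way to picture these rules is as an allow-list of (source, target) pairs. The sketch below uses MLflow type names, with `integer` standing for the 32-bit `int` in the listing above. The `SAFE_CONVERSIONS` set and the `can_convert` helper are hypothetical and encode only the conversions listed here.

```python
# Allowed widening conversions, written as (source type, target type).
SAFE_CONVERSIONS = {
    ("integer", "long"),    # 32-bit to 64-bit integer
    ("integer", "double"),  # integer to float
    ("float", "double"),    # 32-bit to 64-bit float
}


def can_convert(source, target):
    """Return True when a value of type source may be passed where target is expected."""
    return source == target or (source, target) in SAFE_CONVERSIONS


print(can_convert("integer", "long"))    # True
print(can_convert("long", "double"))     # False
print(can_convert("string", "integer"))  # False
```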

Working with Signatures

Automatic Signature Inference

The easiest approach - provide an input example:

import mlflow
from sklearn.ensemble import RandomForestClassifier

# Train your model
model = RandomForestClassifier().fit(X_train, y_train)

with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        name="my_model",
        input_example=X_train.iloc[[0]],  # Signature inferred automatically
    )

Manual Signature Creation

For more control, create signatures explicitly:

from mlflow.models import ModelSignature
from mlflow.types.schema import Schema, ColSpec

# Define input schema
input_schema = Schema(
    [
        ColSpec("double", "feature_1"),
        ColSpec("string", "feature_2"),
        ColSpec("long", "feature_3", required=False),  # Optional
    ]
)

# Define output schema
output_schema = Schema([ColSpec("double", "prediction")])

# Create signature
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Log with explicit signature
with mlflow.start_run():
    mlflow.sklearn.log_model(model, name="my_model", signature=signature)

Signature Inference Helper

Use infer_signature for custom workflows:

from mlflow.models import infer_signature

# Generate predictions for signature inference
predictions = model.predict(X_test)

# Infer signature from data
signature = infer_signature(X_test, predictions)

# Log with inferred signature
with mlflow.start_run():
    mlflow.sklearn.log_model(model, name="my_model", signature=signature)

Input Examples in Detail

Input examples serve multiple important purposes beyond signature inference:

Benefits of Input Examples

  • Signature Inference: Automatically generate model signatures
  • Model Validation: Verify model works during logging
  • Dependency Detection: Help identify required packages
  • Documentation: Show developers proper input format
  • Deployment Testing: Validate REST endpoint payload format

Input Example Formats

import pandas as pd

# Single record example
single_record = pd.DataFrame(
    [{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}]
)

# Multiple records example
batch_example = pd.DataFrame(
    [
        {"feature_1": 1.0, "feature_2": "A"},
        {"feature_1": 2.0, "feature_2": "B"},
        {"feature_1": 3.0, "feature_2": "C"},
    ]
)

# Log model with DataFrame example
mlflow.sklearn.log_model(model, name="model", input_example=single_record)

Model Serving and Deployment

Serving Input Examples

MLflow automatically generates serving-compatible examples:

# When you log a model with input_example
input_example = {"question": "What is MLflow?"}

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        python_model=MyModel(), name="model", input_example=input_example
    )

# MLflow creates two files:
# 1. input_example.json - Original format
# 2. serving_input_example.json - REST API format

Generated Files:

File                         Content                                       Purpose
--------------------------   -------------------------------------------   ---------------------
input_example.json           {"question": "What is MLflow?"}               Original input format
serving_input_example.json   {"inputs": {"question": "What is MLflow?"}}   REST endpoint format
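
The relationship between the two files can be sketched with the standard library for the dictionary case shown in the table. The `to_serving_payload` helper is hypothetical. MLflow writes `serving_input_example.json` itself, and other input kinds such as DataFrames are serialized under different keys, so treat this as an illustration of the wrapping only.

```python
import json


def to_serving_payload(input_example):
    """Wrap a dict input example under the "inputs" key, as in the table."""
    return json.dumps({"inputs": input_example})


payload = to_serving_payload({"question": "What is MLflow?"})
print(payload)  # {"inputs": {"question": "What is MLflow?"}}
```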

Validating Serving Examples

Test your model before deployment:

from mlflow.models.utils import load_serving_example
from mlflow.models import validate_serving_input

# Load serving example
serving_example = load_serving_example(model_info.model_uri)

# Validate it works
result = validate_serving_input(model_info.model_uri, serving_example)
print(f"Validation result: {result}")

# Test with local server
# mlflow models serve --model-uri <model_uri>
# curl -X POST -H "Content-Type: application/json" \
# -d '<serving_example>' http://localhost:5000/invocations
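
For readers who prefer Python to curl, the request from the comments above can be built with the standard library. This is a sketch that assumes a model server is listening on `http://localhost:5000`, as in the curl command. The `build_invocation_request` helper is a hypothetical name, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_invocation_request(payload, host="http://localhost:5000"):
    """Build a POST request for the /invocations endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request({"inputs": {"question": "What is MLflow?"}})
print(request.get_method(), request.full_url)
# To send it against a running server: urllib.request.urlopen(request)
```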

Signature Playground and Examples

Explore signature behavior with our interactive examples:

Download Signature Examples Notebook

Or view examples directly: Signature Examples Notebook

Quick Reference Examples

from mlflow.models import infer_signature

# Simple dictionary
simple_dict = {"name": "Alice", "age": 30, "active": True}
print(infer_signature(simple_dict))
# → Schema: [name: string, age: long, active: boolean]

# With optional fields
optional_fields = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": None},  # email becomes optional
]
print(infer_signature(optional_fields))
# → Schema: [name: string, email: string (optional)]

# Arrays and nested objects
complex_data = {
    "user": {"id": 123, "tags": ["premium", "beta"]},
    "scores": [0.8, 0.9, 0.7],
}
print(infer_signature(complex_data))
# → Nested schema with arrays and objects

Best Practices and Tips

Development Workflow

Always Include Input Examples

# ✅ Good: Always provide examples
mlflow.sklearn.log_model(model, name="model", input_example=X_sample)

# ❌ Avoid: Logging without examples
mlflow.sklearn.log_model(model, name="model")  # No signature or validation

Test Your Signatures

# Validate signature works as expected
signature = infer_signature(X_test, y_pred)
loaded_model = mlflow.pyfunc.load_model(model_uri)

# Test with your signature
try:
    result = loaded_model.predict(X_test)
    print("✅ Signature validation passed")
except Exception as e:
    print(f"❌ Signature issue: {e}")

Performance Considerations

For Large DataFrames:

# Use a representative sample for input_example
large_df = pd.DataFrame(...) # 1M+ rows
sample_df = large_df.sample(n=100, random_state=42) # Representative sample

mlflow.sklearn.log_model(model, name="model", input_example=sample_df)

For Complex Objects:

# Provide minimal but representative examples
minimal_example = {
    "required_field": "example_value",
    "optional_field": None,  # Shows field is optional
    "array_field": ["sample"],  # Shows it's an array
}

Common Pitfalls

Integer Handling:

# ❌ Problem: Integers with NaN become floats
df = pd.DataFrame({"int_col": [1, 2, None]})  # Type becomes float64

# ✅ Solution: Use consistent types
df = pd.DataFrame({"int_col": [1.0, 2.0, None]})  # Explicit float64

Nested Structure Consistency:

# ❌ Problem: Inconsistent nesting
inconsistent = [
    {"level1": {"level2": "value"}},
    {"level1": "direct_value"},  # Different structure
]

# ✅ Solution: Consistent structure
consistent = [
    {"level1": {"level2": "value1"}},
    {"level1": {"level2": "value2"}},  # Same structure
]
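
A quick way to catch inconsistent nesting before logging is to compare a structural skeleton of each record. The `structure_of` and `is_consistent` helpers below are hypothetical, standard-library-only utilities and not part of MLflow. Lists are described by their first element, which is a simplification.

```python
def structure_of(value):
    """Return a nested skeleton: dict keys are kept, scalars become type names."""
    if isinstance(value, dict):
        return {key: structure_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Simplification: describe a list by its first element only.
        return ["list", structure_of(value[0])] if value else ["list"]
    return type(value).__name__


def is_consistent(records):
    """Return True when every record shares the skeleton of the first one."""
    skeletons = [structure_of(record) for record in records]
    return all(skeleton == skeletons[0] for skeleton in skeletons)


inconsistent = [{"level1": {"level2": "value"}}, {"level1": "direct_value"}]
consistent = [{"level1": {"level2": "value1"}}, {"level1": {"level2": "value2"}}]
print(is_consistent(inconsistent))  # False
print(is_consistent(consistent))    # True
```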

Type Hints for PythonModel (MLflow 2.20.0+):

from typing import Dict, List

import mlflow


class TypedModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input: List[Dict[str, str]]) -> List[str]:
        # Signature automatically inferred from type hints!
        return [item["text"].upper() for item in model_input]

Troubleshooting

Common Error Messages

"Required input field missing":

This error occurs when your model expects a required field that's not present in the input data.

# Example: Model expects field "age" but input only has "name"
input_data = {"name": "Alice"} # Missing required "age" field

Solution: Include all required fields in your input data, or mark the field as optional in your signature by including None values in your input example.

"Cannot convert type X to type Y":

This happens when you try to pass data of one type where the signature expects another type.

# Example: Trying to pass string where integer expected
input_data = {"score": "85"} # String value
# But signature expects: {"score": 85} # Integer value

Solution: Fix your input data types to match the signature, or update the signature if the type change is intentional.

"Tensor shape mismatch":

This error occurs when tensor inputs don't match the expected shape defined in the signature.

import numpy as np

# Example: signature expects shape (-1, 784) but the input has shape (10, 28, 28)
input_tensor = np.random.random((10, 28, 28))  # Wrong shape
# The signature expects a flattened shape such as (10, 784)

Solution: Reshape your input data to match the expected dimensions, or update the signature if the shape requirements have changed.
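
The shape check can be sketched without NumPy, using -1 as the variable-size marker in the same way as the TensorSpec example earlier on this page. The `shape_matches` helper is a hypothetical illustration, not MLflow's enforcement code.

```python
def shape_matches(actual, expected):
    """Return True if actual fits expected, where -1 in expected means any size."""
    if len(actual) != len(expected):
        return False
    return all(e == -1 or a == e for a, e in zip(actual, expected))


print(shape_matches((10, 784), (-1, 784)))     # True
print(shape_matches((10, 28, 28), (-1, 784)))  # False: wrong number of dimensions
print(28 * 28)  # 784, so flattening each 28x28 input yields the expected width
```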

Debugging Signatures

Use these techniques to diagnose signature-related issues:

# Inspect existing model signature
from mlflow.models import infer_signature
from mlflow.models.model import get_model_info

model_info = get_model_info(model_uri)
print("Current signature:")
print(model_info.signature)

# Infer a signature from the inputs alone (no outputs are supplied)
inferred = infer_signature(your_input_data)
print("Inferred signature:")
print(inferred)

# Compare input schemas only, since the inferred signature has no outputs
if model_info.signature.inputs != inferred.inputs:
    print("⚠️ Input schemas don't match - consider updating")

Additional Resources