
spaCy within MLflow

spaCy is an open-source, industrial-strength natural language processing library, designed from the ground up for production use. Created by Explosion AI, spaCy combines cutting-edge research with practical engineering to deliver fast, accurate, and scalable NLP solutions that power everything from chatbots and content analysis to document processing and knowledge extraction systems.

spaCy's production-first philosophy sets it apart from academic NLP libraries. With its streamlined API, extensive pre-trained models, and robust pipeline architecture, spaCy enables developers to build sophisticated NLP applications without sacrificing speed or maintainability.

Logging spaCy Models to MLflow

Basic Model Logging

MLflow provides native support for spaCy models through the mlflow.spacy.log_model() function:

```python
import mlflow
import spacy

# Load or train your spaCy model
nlp = spacy.load("en_core_web_sm")

# Log the model to MLflow
with mlflow.start_run():
    mlflow.spacy.log_model(nlp, name="spacy_model")
```
What Gets Automatically Captured

Model Components & Architecture

  • 🧠 Pipeline Components: All pipeline components (tokenizer, tagger, parser, NER, text categorizer)
  • 📝 Model Configuration: Architecture details, hyperparameters, and component settings
  • 🎯 Component Metadata: Individual component configurations and performance metrics
  • 🔧 Custom Components: User-defined pipeline components and extensions

Dependencies & Environment

  • 📦 spaCy Version: Exact spaCy version for reproducibility
  • 🐍 Python Environment: Complete environment specification with all dependencies
  • 📋 Requirements: Automatic generation of pip requirements and conda environment
  • 🔗 Model Dependencies: Language models and custom extensions

Deployment Artifacts

  • 🤖 Complete Model: Full model serialization with vocabularies and weights
  • 📊 Model Metadata: Model size, components, and performance characteristics
  • 🏷️ Model Signatures: Input/output schemas for validation (when applicable)

Automatic PyFunc Flavor for Text Classification

When your spaCy model includes a TextCategorizer component, MLflow automatically adds the PyFunc flavor for easy deployment:

```python
import mlflow
import pandas as pd
import spacy

# Create a text classification pipeline
nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")

# Add labels to the text categorizer
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

# Initialize the model weights. Training code is omitted for brevity,
# so the predictions below come from an untrained model.
nlp.initialize()

with mlflow.start_run():
    # Log model - PyFunc flavor added automatically
    model_info = mlflow.spacy.log_model(nlp, name="text_classifier")

# Load and use for inference
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

# Prepare input data as a DataFrame with a single text column
test_data = pd.DataFrame({"text": ["This is great!", "This is terrible!"]})
predictions = loaded_model.predict(test_data)
print(predictions)
```
Text Classification Integration Details

Automatic PyFunc Generation

  • 🎯 Smart Detection: MLflow automatically detects TextCategorizer components
  • 📊 DataFrame Input: PyFunc wrapper accepts pandas DataFrame with text column
  • 🔄 Batch Processing: Efficient inference on multiple texts simultaneously
  • 📈 Probability Scores: Returns prediction probabilities for all categories

Input/Output Format

  • Input: pandas DataFrame with exactly one column containing text data
  • Output: pandas DataFrame with "predictions" column containing category probabilities
  • Format: Each prediction is a dictionary with category names as keys and probabilities as values
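To make this contract concrete, here is a stdlib-only sketch that mimics the shape of the PyFunc wrapper's input and output. It assumes plain lists of records in place of pandas DataFrames and uses a hypothetical `toy_classify` function in place of a trained spaCy pipeline, so it illustrates the data format rather than MLflow's implementation.

```python
from typing import Callable, Dict, List


def toy_classify(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained TextCategorizer (like doc.cats)."""
    positive = 1.0 if "great" in text.lower() else 0.0
    return {"POSITIVE": positive, "NEGATIVE": 1.0 - positive}


def predict_records(
    records: List[Dict[str, str]], classify: Callable[[str], Dict[str, float]]
) -> List[Dict[str, Dict[str, float]]]:
    """Mirror the PyFunc contract: one text column in, one "predictions" column out."""
    columns = {name for record in records for name in record}
    if len(columns) != 1:
        raise ValueError(f"Expected exactly one text column, got {sorted(columns)}")
    (column,) = columns
    # One dict of category scores per input row
    return [{"predictions": classify(record[column])} for record in records]


rows = [{"text": "This is great!"}, {"text": "This is terrible!"}]
for row in predict_records(rows, toy_classify):
    print(row)
```

Rejecting multi-column input up front matches the "exactly one column" rule above, so a malformed request fails with a clear message instead of a confusing error deeper in the pipeline.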

Deployment Benefits

  • 🚀 Universal Interface: Use standard MLflow serving infrastructure
  • 📦 Easy Integration: Compatible with MLflow's deployment tools and APIs
  • 🔍 Model Validation: Input validation against the model signature, when one is logged
  • 📊 Monitoring: Integration with MLflow's model monitoring capabilities

Advanced spaCy Training with MLflow Integration

Custom Training Logger

spaCy's training system can be integrated with MLflow through custom loggers registered in spaCy's component registry:

```python
import sys
from typing import IO, Any, Callable, Dict, Optional, Tuple

import mlflow
import spacy
from spacy.language import Language


@spacy.registry.loggers("mlflow_logger.v1")
def mlflow_logger():
    """Custom MLflow logger for spaCy training integration."""

    def setup_logger(
        nlp: Language,
        stdout: IO = sys.stdout,
        stderr: IO = sys.stderr,
    ) -> Tuple[Callable, Callable]:
        def log_step(info: Optional[Dict[str, Any]]):
            """Called by spaCy for every evaluation step."""
            if info is None:
                return
            metrics = {}

            # Log component-specific losses
            for pipe_name in nlp.pipe_names:
                if pipe_name in info["losses"]:
                    metrics[f"{pipe_name}_loss"] = info["losses"][pipe_name]

            # info["score"] is the overall weighted score, not a per-component score
            metrics["overall_score"] = info["score"]
            mlflow.log_metrics(metrics, step=info["step"])

        def finalize():
            """Called by spaCy after training completion."""
            # Log the final trained model. The run is ended by the caller
            # (or by MLflow when the process exits), not by the logger.
            mlflow.spacy.log_model(nlp, name="trained_model")

        return log_step, finalize

    return setup_logger
```
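Because `log_step` only reshapes spaCy's `info` dict before calling `mlflow.log_metrics`, that reshaping can be pulled into a pure function and checked without running training. A minimal sketch that mirrors the loss and overall-score part of `log_step`, assuming the `info` keys used above (`step`, `score`, `losses`); `build_step_metrics` is a hypothetical helper name and the sample values are illustrative.

```python
from typing import Any, Dict, List, Tuple


def build_step_metrics(
    info: Dict[str, Any], pipe_names: List[str]
) -> Tuple[int, Dict[str, float]]:
    """Turn one spaCy evaluation-step info dict into (step, metrics) for mlflow.log_metrics."""
    metrics = {"overall_score": float(info["score"])}
    for pipe_name in pipe_names:
        # Only components that are being trained report a loss
        if pipe_name in info["losses"]:
            metrics[f"{pipe_name}_loss"] = float(info["losses"][pipe_name])
    return info["step"], metrics


# Illustrative info dict shaped like the one spaCy passes to log_step
sample_info = {"step": 200, "score": 0.75, "losses": {"textcat": 1.5}}
step, metrics = build_step_metrics(sample_info, ["textcat"])
print(step, metrics)  # 200 {'overall_score': 0.75, 'textcat_loss': 1.5}
```

Inside the logger, `log_step` would then reduce to `mlflow.log_metrics(metrics, step=step)` with the values returned here.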
Training Configuration Setup

Configuration File Integration

  1. Generate Base Configuration:
```bash
python -m spacy init config config.cfg --lang en --pipeline textcat
```
  2. Update the Logger and Training Settings in config.cfg:
```ini
[training]
max_steps = 1000
eval_frequency = 100

[training.logger]
@loggers = "mlflow_logger.v1"
```
  3. Configure Data Paths:
```ini
[paths]
train = "./train.spacy"
dev = "./dev.spacy"
```
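When training from the command line, spaCy has to import the module that registers `mlflow_logger.v1` before it resolves the config; the `spacy train` CLI's `--code` option does this, and dotted options such as `--paths.train` override config values. A small sketch that assembles that command as a `subprocess`-style argument list; the file names `functions.py` and `./output` and the helper `build_train_command` are assumptions for illustration.

```python
import shlex
import sys
from typing import Dict, List


def build_train_command(
    config_path: str, code_path: str, output_dir: str, overrides: Dict[str, str]
) -> List[str]:
    """Assemble a `python -m spacy train` invocation with config overrides."""
    command = [
        sys.executable, "-m", "spacy", "train", config_path,
        "--output", output_dir,
        "--code", code_path,  # module that registers the custom logger
    ]
    for key, value in overrides.items():
        command += [f"--{key}", str(value)]
    return command


cmd = build_train_command(
    "config.cfg",
    "functions.py",  # assumed file containing the @spacy.registry.loggers code
    "./output",
    {"paths.train": "./train.spacy", "paths.dev": "./dev.spacy"},
)
print(shlex.join(cmd))
# With spaCy and the data files in place: subprocess.run(cmd, check=True)
```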

Advanced Logger Features

  • 📊 Component-Level Tracking: Monitor individual pipeline component performance
  • 🎯 Custom Metrics: Log domain-specific evaluation metrics
  • 📈 Training Dynamics: Track learning curves and convergence patterns
  • 🔄 Model Saving: Log the trained pipeline to MLflow when training finishes
  • 📝 Experiment Metadata: Log training configuration and hyperparameters
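For component-level and custom metrics, spaCy's `info` dict also carries an `other_scores` entry with evaluation scores, some of them nested (for example per-label results). MLflow metrics are flat name/number pairs, so a sketch like the following flattens nested scores into dotted names and drops non-numeric values. The helper `flatten_scores` is hypothetical and the sample score names and values are illustrative, not a guaranteed list of what a given pipeline reports.

```python
from typing import Any, Dict


def flatten_scores(scores: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Flatten nested score dicts into dotted metric names, keeping numbers only."""
    flat: Dict[str, float] = {}
    for key, value in scores.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested results such as per-label scores
            flat.update(flatten_scores(value, prefix=f"{name}."))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = float(value)
        # None and other non-numeric values are skipped


    return flat


other_scores = {
    "cats_score": 0.8,
    "cats_f_per_type": {"POSITIVE": {"p": 0.9, "r": 0.7, "f": 0.79}},
    "cats_macro_auc": None,  # e.g. a score that could not be computed
}
print(flatten_scores(other_scores))
```

Inside `log_step`, the result could be passed as `mlflow.log_metrics(flatten_scores(info["other_scores"]), step=info["step"])`.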

Complete Training Integration Example

Here's a comprehensive example showing spaCy training with MLflow integration:

```python
import mlflow
import spacy
from spacy.cli.train import train as spacy_train
from spacy.tokens import DocBin

# The "mlflow_logger.v1" logger from the Custom Training Logger section must be
# registered in this process (defined in or imported into this script).


def prepare_training_data():
    """Prepare sample training data for text classification."""
    # Sample data preparation
    train_data = [
        ("This movie is excellent!", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
        ("Terrible film, waste of time", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
        ("Amazing storyline and acting", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
        ("Boring and predictable", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ]

    # Convert to spaCy's binary training format
    nlp = spacy.blank("en")
    doc_bin = DocBin()

    for text, annotations in train_data:
        doc = nlp.make_doc(text)
        doc.cats = annotations["cats"]
        doc_bin.add(doc)

    return doc_bin


# Prepare training data
train_docs = prepare_training_data()
dev_docs = prepare_training_data()  # Use same data for simplicity

# Save training data
train_docs.to_disk("./train.spacy")
dev_docs.to_disk("./dev.spacy")

# Minimal configuration content; for a fully specified config,
# generate one with `python -m spacy init config`
config_content = """
[nlp]
lang = "en"
pipeline = ["textcat"]

[components]

[components.textcat]
factory = "textcat"

[training]
max_steps = 100
eval_frequency = 20

[training.logger]
@loggers = "mlflow_logger.v1"

[paths]
train = "./train.spacy"
dev = "./dev.spacy"
"""

# Write configuration file
with open("config.cfg", "w") as f:
    f.write(config_content)

# Start MLflow experiment
with mlflow.start_run(run_name="spacy_text_classification"):
    # Log training configuration
    mlflow.log_params(
        {
            "model_type": "text_classification",
            "pipeline": "textcat",
            "language": "en",
            "max_steps": 100,
            "eval_frequency": 20,
        }
    )

    # Train the model (this will use our custom logger)
    spacy_train("config.cfg")

print("Training completed and logged to MLflow!")
```
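The example above reuses the training examples as dev data, which makes the logged scores optimistic. A stdlib sketch of a reproducible shuffled split whose two halves could be written to separate `DocBin` files instead; `train_dev_split` and its `dev_fraction` and `seed` parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_dev_split(
    examples: Sequence[T], dev_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed and hold out a fraction of examples for evaluation."""
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


data = [
    ("This movie is excellent!", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("Terrible film, waste of time", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("Amazing storyline and acting", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("Boring and predictable", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]
train, dev = train_dev_split(data)
print(len(train), len(dev))  # 3 1
```

Logging `seed` and `dev_fraction` with `mlflow.log_params` keeps the split reproducible from the run record.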

Saving and Loading spaCy Models

Basic Model Operations

MLflow provides multiple ways to save and load spaCy models:

```python
import mlflow
import spacy

# Load a pre-trained model
nlp = spacy.load("en_core_web_sm")

# Save with MLflow
model_info = mlflow.spacy.log_model(nlp, name="spacy_model")

# Load back in native spaCy format
loaded_nlp = mlflow.spacy.load_model(model_info.model_uri)

# Use the loaded model
doc = loaded_nlp("This is a test sentence.")
for token in doc:
    print(f"{token.text}: {token.pos_}, {token.dep_}")
```
Loading Options and Use Cases

Native spaCy Loading

```python
# Full spaCy functionality - all pipeline components
nlp = mlflow.spacy.load_model(model_info.model_uri)

# Access all spaCy features
doc = nlp("Analyze this text completely.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
dependencies = [(token.text, token.dep_, token.head.text) for token in doc]
```
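The `(text, label)` pairs collected above are easy to summarize before logging. A sketch, assuming you want one MLflow metric per entity label; `entity_label_counts` and the `entities_` metric prefix are illustrative choices, and the sample pairs stand in for real `doc.ents` output.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def entity_label_counts(entities: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count entities per label, e.g. to pass to mlflow.log_metrics."""
    counts = Counter(label for _, label in entities)
    return {f"entities_{label}": count for label, count in sorted(counts.items())}


sample_entities = [("Apple Inc.", "ORG"), ("California", "GPE"), ("Google", "ORG")]
print(entity_label_counts(sample_entities))  # {'entities_GPE': 1, 'entities_ORG': 2}
```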

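PyFunc Loading (Text Classification Only)

```python
import pandas as pd

# Simplified interface for text classification. model_info must refer to a
# model whose pipeline includes a TextCategorizer; otherwise no PyFunc flavor exists.
classifier = mlflow.pyfunc.load_model(model_info.model_uri)

# DataFrame input with a single text column is required
test_data = pd.DataFrame({"text": ["Sample text to classify"]})
predictions = classifier.predict(test_data)
```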
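Each row of the `predictions` column is a dict mapping category names to scores. A sketch for turning those into a single label per text, assuming the highest score wins and using a hypothetical `min_score` threshold below which no label is returned:

```python
from typing import Dict, List, Optional


def top_labels(
    predictions: List[Dict[str, float]], min_score: float = 0.5
) -> List[Optional[str]]:
    """Pick the highest-scoring category per row, or None if it falls below min_score."""
    labels: List[Optional[str]] = []
    for scores in predictions:
        label, score = max(scores.items(), key=lambda item: item[1])
        labels.append(label if score >= min_score else None)
    return labels


rows = [{"POSITIVE": 0.92, "NEGATIVE": 0.08}, {"POSITIVE": 0.45, "NEGATIVE": 0.40}]
print(top_labels(rows))  # ['POSITIVE', None]
```

With pandas, `top_labels(predictions["predictions"].tolist())` applies it to the PyFunc output.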

When to Use Each Approach

  • 🧠 Native spaCy: Full NLP pipeline access, custom components, advanced features
  • 📊 PyFunc: Text classification deployment, simple inference, production serving
  • 🔄 Mixed Approach: Development with native, deployment with PyFunc

Model Signatures for spaCy Models

Adding signatures to spaCy models improves documentation and enables validation:

```python
import mlflow
import pandas as pd
import spacy
from mlflow.models import infer_signature


class SpacyWrapper:
    """Mimics the PyFunc output: one dict of category scores per input row."""

    def __init__(self, nlp):
        self.nlp = nlp

    def predict(self, df):
        results = []
        for text in df.iloc[:, 0]:
            doc = self.nlp(text)
            results.append({"predictions": doc.cats})
        return pd.DataFrame(results)


# Load and prepare model
nlp = spacy.load("en_core_web_sm")

# Sample input in the single-text-column format used by the PyFunc flavor
sample_input = pd.DataFrame({"text": ["This is a sample sentence for classification."]})

# This example infers a signature only when the pipeline has a TextCategorizer
if nlp.has_pipe("textcat"):
    sample_output = SpacyWrapper(nlp).predict(sample_input)
    signature = infer_signature(sample_input, sample_output)
else:
    signature = None

# Log model with signature
mlflow.spacy.log_model(
    nlp, name="spacy_model", signature=signature, input_example=sample_input
)
```
Manual Signature Definition

For complete control over your model signature:

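```python
import mlflow
from mlflow.models import ModelSignature
from mlflow.types.schema import ColSpec, Object, Property, Schema

# Define input schema for text classification
input_schema = Schema([ColSpec("string", "text")])

# Define output schema: "predictions" holds one probability per category.
# "object" is not a valid column type string, so an Object type is used
# (available in recent MLflow versions). The category names are examples;
# use your model's labels.
output_schema = Schema(
    [
        ColSpec(
            Object([Property("POSITIVE", "double"), Property("NEGATIVE", "double")]),
            "predictions",
        )
    ]
)

# Create signature
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Log model with manual signature (nlp is a pipeline with a textcat component)
mlflow.spacy.log_model(nlp, name="model", signature=signature)
```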

Manual signatures are useful when:

  • You need precise control over input/output specifications
  • Working with custom output formats
  • The automatic inference doesn't capture your intended schema
  • You want to document expected data types explicitly

Advanced spaCy Tracking Patterns

Custom Component Tracking

Track custom spaCy components and their performance:

```python
import mlflow
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Register custom extensions (force=True allows re-running the script)
Doc.set_extension("sentiment", default=None, force=True)
Doc.set_extension("sentiment_score", default=0.0, force=True)


@Language.component("sentiment_analyzer")
def sentiment_analyzer(doc):
    """Custom component for sentiment analysis."""
    # Simple rule-based sentiment (replace with actual ML model)
    positive_words = {"good", "great", "excellent", "amazing", "wonderful"}
    negative_words = {"bad", "terrible", "awful", "horrible", "worst"}

    pos_count = sum(1 for token in doc if token.lower_ in positive_words)
    neg_count = sum(1 for token in doc if token.lower_ in negative_words)

    if pos_count > neg_count:
        sentiment, score = "positive", 0.8
    elif neg_count > pos_count:
        sentiment, score = "negative", 0.8
    else:
        sentiment, score = "neutral", 0.5

    # Store sentiment as custom attributes
    doc._.sentiment = sentiment
    doc._.sentiment_score = score
    return doc


# Create pipeline with custom component
nlp = spacy.blank("en")
nlp.add_pipe("sentiment_analyzer")

# Test texts with expected labels for evaluating the custom component
test_data = [
    ("This is a great product!", "positive"),
    ("Terrible service, very bad.", "negative"),
    ("It's okay, nothing special.", "neutral"),
]

with mlflow.start_run():
    # Log component information
    mlflow.log_params(
        {
            "custom_components": ["sentiment_analyzer"],
            "pipeline": nlp.pipe_names,
            "model_version": "1.0",
        }
    )

    # Evaluate custom component against the expected labels
    correct_predictions = 0
    results = []
    for text, expected in test_data:
        doc = nlp(text)
        correct_predictions += int(doc._.sentiment == expected)
        results.append(
            {
                "text": text,
                "expected": expected,
                "sentiment": doc._.sentiment,
                "score": doc._.sentiment_score,
            }
        )

    # Log evaluation metrics
    mlflow.log_metric("component_accuracy", correct_predictions / len(test_data))

    # Log model with custom component. The component and extensions must be
    # registered again (e.g. by importing this module) before loading the model.
    mlflow.spacy.log_model(nlp, name="custom_sentiment_model")

    # Log evaluation results as a JSON artifact
    mlflow.log_dict({"results": results}, "evaluation_results.json")
```
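Keeping the scoring rule in a plain function makes it testable without building a pipeline and gives the component evaluation real expected labels to compare against. A sketch that reuses the word lists from the component above; splitting on whitespace and stripping surrounding punctuation is a simplification of spaCy's tokenizer, and `rule_sentiment`, `accuracy`, and the hand-assigned expected labels are illustrative.

```python
import string
from typing import List, Tuple

POSITIVE_WORDS = {"good", "great", "excellent", "amazing", "wonderful"}
NEGATIVE_WORDS = {"bad", "terrible", "awful", "horrible", "worst"}


def rule_sentiment(text: str) -> Tuple[str, float]:
    """Same counting rule as the sentiment_analyzer component, on whitespace tokens."""
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    if pos > neg:
        return "positive", 0.8
    if neg > pos:
        return "negative", 0.8
    return "neutral", 0.5


def accuracy(examples: List[Tuple[str, str]]) -> float:
    """Fraction of texts whose predicted label matches the expected one."""
    correct = sum(rule_sentiment(text)[0] == expected for text, expected in examples)
    return correct / len(examples)


labeled = [
    ("This is a great product!", "positive"),
    ("Terrible service, very bad.", "negative"),
    ("It's okay, nothing special.", "neutral"),
]
print(accuracy(labeled))  # 1.0
```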

Multi-Language Model Tracking

Track experiments across different languages and models:

Multilingual Experiment Tracking
```python
import time

import mlflow
import spacy


def evaluate_multilingual_models():
    """Evaluate performance across multiple language models."""

    # Define language models to test
    models = {
        "en": "en_core_web_sm",
        "de": "de_core_news_sm",
        "fr": "fr_core_news_sm",
        "es": "es_core_news_sm",
    }

    # Sample texts for each language
    test_texts = {
        "en": "Apple Inc. is a technology company based in California.",
        "de": "Apple Inc. ist ein Technologieunternehmen in Kalifornien.",
        "fr": "Apple Inc. est une entreprise technologique basée en Californie.",
        "es": "Apple Inc. es una empresa de tecnología con sede en California.",
    }

    with mlflow.start_run(run_name="multilingual_comparison"):
        results = {}

        for lang, model_name in models.items():
            try:
                # Load language-specific model
                nlp = spacy.load(model_name)
            except OSError:
                # The model package is not installed
                print(f"Model {model_name} not available, skipping {lang}")
                continue

            with mlflow.start_run(run_name=f"{lang}_model", nested=True):
                # Log model information
                mlflow.log_params(
                    {
                        "language": lang,
                        "model_name": model_name,
                        "pipeline_components": nlp.pipe_names,
                        "vocab_size": len(nlp.vocab),
                    }
                )

                # Process text, time the call, and extract entities
                start_time = time.perf_counter()
                doc = nlp(test_texts[lang])
                processing_time = time.perf_counter() - start_time
                entities = [(ent.text, ent.label_) for ent in doc.ents]

                # Log results
                mlflow.log_metrics(
                    {
                        "num_entities": len(entities),
                        "num_tokens": len(doc),
                        "processing_time": processing_time,
                    }
                )

                # Log the model
                mlflow.spacy.log_model(nlp, name=f"{lang}_model")

                results[lang] = {"entities": entities, "tokens": len(doc)}

        # Log summary results (the average is skipped if no model was available)
        mlflow.log_param("total_languages", len(results))
        if results:
            mlflow.log_metric(
                "avg_entities_per_lang",
                sum(len(r["entities"]) for r in results.values()) / len(results),
            )

    return results


# Run multilingual evaluation
results = evaluate_multilingual_models()
```
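The summary step averages over the languages whose models loaded, which is undefined when none of them are installed. A stdlib sketch of that aggregation with an explicit guard, assuming the `results` layout built above (`{lang: {"entities": [...], "tokens": int}}`); `summarize_results` is a hypothetical helper and the sample entries are illustrative.

```python
from typing import Dict


def summarize_results(results: Dict[str, Dict]) -> Dict[str, float]:
    """Compute cross-language summary metrics, skipping the average when nothing ran."""
    summary: Dict[str, float] = {"total_languages": float(len(results))}
    if results:
        entity_counts = [len(r["entities"]) for r in results.values()]
        summary["avg_entities_per_lang"] = sum(entity_counts) / len(entity_counts)
    return summary


sample = {
    "en": {"entities": [("Apple Inc.", "ORG"), ("California", "GPE")], "tokens": 10},
    "de": {"entities": [("Apple Inc.", "ORG")], "tokens": 9},
}
print(summarize_results(sample))  # {'total_languages': 2.0, 'avg_entities_per_lang': 1.5}
print(summarize_results({}))  # {'total_languages': 0.0}
```

The returned dict can be passed directly to `mlflow.log_metrics` in the parent run.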

Benefits of Multilingual Tracking

  • 🌐 Cross-Language Comparison: Compare model performance across languages
  • 📊 Unified Metrics: Track consistent metrics across different language models
  • 🔄 Model Selection: Choose best models for multilingual applications
  • 📈 Performance Analysis: Identify language-specific strengths and weaknesses

Pipeline Optimization Tracking

Track different pipeline configurations and optimizations:

```python
import time
from itertools import combinations

import mlflow
import spacy


def optimize_pipeline_configuration(model_name="en_core_web_sm"):
    """Test different pipeline configurations for optimal performance."""

    # Components of a trained pipeline that can be switched off. Components
    # added to a blank pipeline must be initialized (and should be trained)
    # before they can process text, so this example disables components instead.
    optional_components = ["tagger", "parser", "ner", "lemmatizer"]

    # Every subset of optional components to disable, including none
    configurations = [
        list(combo)
        for r in range(len(optional_components) + 1)
        for combo in combinations(optional_components, r)
    ]

    test_texts = ["This is a test sentence for pipeline evaluation."] * 100

    with mlflow.start_run(run_name="pipeline_optimization"):
        best_config = None
        best_score = float("-inf")

        for i, disabled in enumerate(configurations):
            with mlflow.start_run(run_name=f"config_{i}", nested=True):
                # Load the trained pipeline with the selected components disabled
                nlp = spacy.load(model_name, disable=disabled)
                pipeline_components = list(nlp.pipe_names)  # enabled components only

                # Log configuration
                mlflow.log_params(
                    {
                        "components": pipeline_components,
                        "disabled": disabled,
                        "num_components": len(pipeline_components),
                        "config_id": i,
                    }
                )

                # Time processing of a small batch
                start_time = time.perf_counter()
                list(nlp.pipe(test_texts))
                processing_time = time.perf_counter() - start_time

                # Synthetic score: rewards more components, penalizes slow processing
                performance_score = len(pipeline_components) * 10 - processing_time * 100

                # Log metrics
                mlflow.log_metrics(
                    {
                        "processing_time": processing_time,
                        "performance_score": performance_score,
                        "vocab_size": len(nlp.vocab),  # simplified size proxy
                    }
                )

                # Log model
                mlflow.spacy.log_model(nlp, name="pipeline_model")

                # Track best configuration
                if performance_score > best_score:
                    best_score = performance_score
                    best_config = pipeline_components

        # Log best configuration summary
        mlflow.log_params(
            {"best_config": best_config, "total_configs_tested": len(configurations)}
        )
        mlflow.log_metric("best_score", best_score)

    return best_config, best_score


# Run pipeline optimization
best_config, score = optimize_pipeline_configuration()
print(f"Best configuration: {best_config} with score: {score}")
```
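Testing every subset of n optional components means 2^n configurations, so the search grows quickly as components are added. A short sketch that enumerates subsets with `itertools.combinations`, as the loop above does, and shows the count; `component_subsets` is a hypothetical helper and the component names are only examples.

```python
from itertools import combinations
from typing import List, Sequence


def component_subsets(optional: Sequence[str]) -> List[List[str]]:
    """All subsets of the optional components, smallest first (2**n in total)."""
    return [
        list(combo)
        for r in range(len(optional) + 1)
        for combo in combinations(optional, r)
    ]


subsets = component_subsets(["tagger", "parser", "ner", "lemmatizer"])
print(len(subsets))  # 16
print(subsets[:3])  # [[], ['tagger'], ['parser']]
```

Because each configuration also logs a full model, pruning the candidate list before the loop keeps both runtime and artifact storage manageable.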

Production Deployment

Local Model Serving

Deploy your spaCy models locally using MLflow's serving infrastructure:

```python
# First, log your model with proper configuration
import mlflow
import pandas as pd
import spacy

# Serving needs the PyFunc flavor, which MLflow adds only when the pipeline
# contains a TextCategorizer. "./my_textcat_model" is a placeholder path for
# your own trained text classification pipeline.
nlp = spacy.load("./my_textcat_model")

with mlflow.start_run():
    # Create example input for signature
    sample_input = pd.DataFrame({"text": ["Sample text for classification"]})

    # Log model with dependencies
    model_info = mlflow.spacy.log_model(
        nlp,
        name="spacy_model",
        input_example=sample_input,
        pip_requirements=["spacy>=3.0.0"],
    )

# The format of this attribute is 'models:/<model_id>'
model_uri = model_info.model_uri
```

Then deploy the model using the MLflow CLI:

```bash
# Serve the model locally (for text classification models with PyFunc flavor)
mlflow models serve -m models:/<model_id> -p 5000

# Test the deployment
curl http://localhost:5000/invocations \
  -H "Content-Type: application/json" \
  -d '{"dataframe_records": [{"text": "This is a great product!"}]}'
```
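The same request can be sent from Python. A stdlib sketch that builds a `dataframe_records` payload, one of the JSON input formats accepted by MLflow's scoring server, and posts it with `urllib`; the URL assumes the local server started above on port 5000, and `build_payload` and `score_texts` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, List


def build_payload(texts: List[str], column: str = "text") -> bytes:
    """Encode texts as MLflow dataframe_records JSON: one record per row."""
    return json.dumps({"dataframe_records": [{column: t} for t in texts]}).encode("utf-8")


def score_texts(texts: List[str], url: str = "http://localhost:5000/invocations") -> Any:
    """POST the payload to a running `mlflow models serve` endpoint and decode the reply."""
    request = urllib.request.Request(
        url, data=build_payload(texts), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_payload(["This is a great product!"]).decode("utf-8"))
# {"dataframe_records": [{"text": "This is a great product!"}]}
```

Call `score_texts([...])` only while the server is running; building the payload separately keeps the request format easy to test.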
Advanced Deployment Options

The mlflow models serve command supports several options for spaCy models:

```bash
# Specify environment manager
mlflow models serve -m models:/<model_id> -p 5000 --env-manager conda

# Enable MLServer for enhanced performance
mlflow models serve -m models:/<model_id> -p 5000 --enable-mlserver

# Set custom host for network access
mlflow models serve -m models:/<model_id> -p 5000 --host 0.0.0.0
```

For production deployments, consider:

  • Using MLServer (--enable-mlserver) for better performance and scalability
  • Building Docker images with mlflow models build-docker
  • Deploying to cloud platforms like Azure ML or Amazon SageMaker
  • Setting up proper environment management and dependency isolation
  • Implementing model monitoring and health checks

Real-World Applications

The MLflow-spaCy integration excels across diverse NLP domains:

  • 📰 Content Analysis: Track sentiment analysis, topic modeling, and content classification systems for media and publishing
  • 🏥 Healthcare NLP: Monitor clinical text processing, medical entity extraction, and diagnostic support systems
  • 💼 Enterprise Search: Log document processing, information extraction, and knowledge management pipelines
  • 🛒 E-commerce Intelligence: Track product categorization, review analysis, and customer intent recognition
  • 📧 Communications Processing: Monitor email classification, chatbot training, and customer service automation
  • 🏛️ Legal Tech: Log contract analysis, document review, and legal entity recognition systems
  • 🌐 Multilingual Applications: Track translation quality, cross-lingual transfer, and international content processing
  • 📊 Business Intelligence: Monitor text analytics, report generation, and automated insights extraction

Conclusion

The MLflow-spaCy integration provides a comprehensive solution for tracking, managing, and deploying production-grade NLP systems. By combining spaCy's industrial-strength capabilities with MLflow's experiment tracking, you create a workflow that is:

  • 🔍 Transparent: Every aspect of NLP model development is documented and trackable
  • 🔄 Reproducible: Experiments can be recreated exactly with proper environment management
  • 📊 Comparable: Different approaches can be evaluated side-by-side with consistent metrics
  • 📈 Scalable: From simple prototypes to enterprise-scale NLP systems
  • 👥 Collaborative: Team members can share and build upon each other's NLP research and development

Whether you're building chatbots, analyzing customer feedback, or extracting insights from unstructured text, the MLflow-spaCy integration provides a foundation for organized, reproducible NLP development, from the first prototype to production deployment.