spaCy within MLflow

spaCy is the leading industrial-strength natural language processing library, designed from the ground up for production use. Created by Explosion AI, spaCy combines cutting-edge research with practical engineering to deliver fast, accurate, and scalable NLP solutions that power everything from chatbots and content analysis to document processing and knowledge extraction systems.

spaCy's production-first philosophy sets it apart from academic NLP libraries. With its streamlined API, extensive pre-trained models, and robust pipeline architecture, spaCy enables developers to build sophisticated NLP applications without sacrificing speed or maintainability.

Logging spaCy Models to MLflow

Basic Model Logging

MLflow provides native support for spaCy models through the mlflow.spacy.log_model() function:

import mlflow
import spacy

# Load or train your spaCy model
nlp = spacy.load("en_core_web_sm")

# Log the model to MLflow
with mlflow.start_run():
    mlflow.spacy.log_model(nlp, name="spacy_model")

What Gets Automatically Captured

Model Components & Architecture

🧠 Pipeline Components: All pipeline components (tokenizer, tagger, parser, NER, text categorizer)
📐 Model Configuration: Architecture details, hyperparameters, and component settings
🎯 Component Metadata: Individual component configurations and performance metrics
🔧 Custom Components: User-defined pipeline components and extensions

Dependencies & Environment

📦 spaCy Version: Exact spaCy version for reproducibility
🐍 Python Environment: Complete environment specification with all dependencies
📋 Requirements: Automatic generation of pip requirements and conda environment
🔗 Model Dependencies: Language models and custom extensions

Deployment Artifacts

🤖 Complete Model: Full model serialization with vocabularies and weights
📊 Model Metadata: Model size, components, and performance characteristics
🏷️ Model Signatures: Input/output schemas for validation (when applicable)

Automatic PyFunc Flavor for Text Classification

When your spaCy model includes a TextCategorizer component, MLflow automatically adds the PyFunc flavor for easy deployment:

import mlflow
import spacy
from spacy import Language
import pandas as pd


# Create a text classification pipeline
@Language.component("custom_textcat")
def create_textcat(nlp, name="textcat"):
    return nlp.add_pipe("textcat", name=name)


nlp = spacy.blank("en")
nlp.add_pipe("textcat")

# Add labels to the text categorizer
nlp.get_pipe("textcat").add_label("POSITIVE")
nlp.get_pipe("textcat").add_label("NEGATIVE")

# Train your model (training code omitted for brevity)

with mlflow.start_run():
    # Log model - PyFunc flavor added automatically
    model_info = mlflow.spacy.log_model(nlp, name="text_classifier")

# Load and use for inference
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

# Prepare input data as DataFrame
test_data = pd.DataFrame({"text": ["This is great!", "This is terrible!"]})
predictions = loaded_model.predict(test_data)
print(predictions)

Text Classification Integration Details

Automatic PyFunc Generation

🎯 Smart Detection: MLflow automatically detects TextCategorizer components
📊 DataFrame Input: PyFunc wrapper accepts pandas DataFrame with text column
🔄 Batch Processing: Efficient inference on multiple texts simultaneously
📈 Probability Scores: Returns prediction probabilities for all categories

Input/Output Format

Input: pandas DataFrame with exactly one column containing text data
Output: pandas DataFrame with "predictions" column containing category probabilities
Format: Each prediction is a dictionary with category names as keys and probabilities as values

Deployment Benefits

🚀 Universal Interface: Use standard MLflow serving infrastructure
📦 Easy Integration: Compatible with MLflow's deployment tools and APIs
🔍 Model Validation: Automatic input validation and error handling
📊 Monitoring: Integration with MLflow's model monitoring capabilities

Advanced spaCy Training with MLflow Integration

Custom Training Logger

spaCy's training system can be integrated with MLflow through custom loggers registered in spaCy's component registry:

import sys
import spacy
from spacy import Language
from typing import IO, Callable, Tuple, Dict, Any, Optional
import mlflow


@spacy.registry.loggers("mlflow_logger.v1")
def mlflow_logger():
    """Custom MLflow logger for spaCy training integration."""

    def setup_logger(
        nlp: Language,
        stdout: IO = sys.stdout,
        stderr: IO = sys.stderr,
    ) -> Tuple[Callable, Callable]:
        def log_step(info: Optional[Dict[str, Any]]):
            """Called by spaCy for every evaluation step."""
            if info:
                step = info["step"]
                score = info["score"]
                metrics = {}

                # Log component-specific losses and scores
                for pipe_name in nlp.pipe_names:
                    if pipe_name in info["losses"]:
                        loss = info["losses"][pipe_name]
                        metrics[f"{pipe_name}_loss"] = loss
                        metrics[f"{pipe_name}_score"] = score

                # Log overall metrics
                metrics["overall_score"] = score
                mlflow.log_metrics(metrics, step=step)

        def finalize():
            """Called by spaCy after training completion."""
            # Log the final trained model
            mlflow.spacy.log_model(nlp, name="trained_model")
            mlflow.end_run()

        return log_step, finalize

    return setup_logger

Training Configuration Setup

Configuration File Integration

Generate Base Configuration:

python -m spacy init config --pipeline textcat --lang en config.cfg

Update Logger Configuration:

[training.logger]
@loggers = "mlflow_logger.v1"

[training]
max_steps = 1000
eval_frequency = 100

Configure Data Paths:

[paths]
train = "./train.spacy"
dev = "./dev.spacy"

Advanced Logger Features

📊 Component-Level Tracking: Monitor individual pipeline component performance
🎯 Custom Metrics: Log domain-specific evaluation metrics
📈 Training Dynamics: Track learning curves and convergence patterns
🔄 Automatic Model Saving: Save best models based on validation performance
📝 Experiment Metadata: Log training configuration and hyperparameters

Complete Training Integration Example

Here's a comprehensive example showing spaCy training with MLflow integration:

import mlflow
import spacy
import pandas as pd
from spacy.tokens import DocBin
from spacy.cli.train import train as spacy_train
import tempfile
import os


def prepare_training_data():
    """Prepare sample training data for text classification."""
    # Sample data preparation
    train_data = [
        ("This movie is excellent!", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
        ("Terrible film, waste of time", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
        ("Amazing storyline and acting", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
        ("Boring and predictable", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ]

    # Convert to spaCy format
    nlp = spacy.blank("en")
    doc_bin = DocBin()

    for text, annotations in train_data:
        doc = nlp.make_doc(text)
        doc.cats = annotations["cats"]
        doc_bin.add(doc)

    return doc_bin


# Prepare training data
train_docs = prepare_training_data()
dev_docs = prepare_training_data()  # Use same data for simplicity

# Save training data
train_docs.to_disk("./train.spacy")
dev_docs.to_disk("./dev.spacy")

# Configuration content
config_content = """
[nlp]
lang = "en"
pipeline = ["textcat"]

[components]

[components.textcat]
factory = "textcat"

[training]
max_steps = 100
eval_frequency = 20

[training.logger]
@loggers = "mlflow_logger.v1"

[paths]
train = "./train.spacy"
dev = "./dev.spacy"
"""

# Write configuration file
with open("config.cfg", "w") as f:
    f.write(config_content)

# Start MLflow experiment
with mlflow.start_run(run_name="spacy_text_classification"):
    # Log training configuration
    mlflow.log_params(
        {
            "model_type": "text_classification",
            "pipeline": "textcat",
            "language": "en",
            "max_steps": 100,
            "eval_frequency": 20,
        }
    )

    # Train the model (this will use our custom logger)
    spacy_train("config.cfg")

print("Training completed and logged to MLflow!")

Saving and Loading spaCy Models

Basic Model Operations

MLflow provides multiple ways to save and load spaCy models:

import mlflow
import spacy

# Load a pre-trained model
nlp = spacy.load("en_core_web_sm")

# Save with MLflow
model_info = mlflow.spacy.log_model(nlp, name="spacy_model")

# Load back in native spaCy format
loaded_nlp = mlflow.spacy.load_model(model_info.model_uri)

# Use the loaded model
doc = loaded_nlp("This is a test sentence.")
for token in doc:
    print(f"{token.text}: {token.pos_}, {token.dep_}")

Loading Options and Use Cases

Native spaCy Loading

# Full spaCy functionality - all pipeline components
nlp = mlflow.spacy.load_model(model_info.model_uri)

# Access all spaCy features
doc = nlp("Analyze this text completely.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
dependencies = [(token.text, token.dep_, token.head.text) for token in doc]

PyFunc Loading (Text Classification Only)

# Simplified interface for text classification
classifier = mlflow.pyfunc.load_model(model_info.model_uri)

# DataFrame input required
import pandas as pd

test_data = pd.DataFrame({"text": ["Sample text to classify"]})
predictions = classifier.predict(test_data)

When to Use Each Approach

🧠 Native spaCy: Full NLP pipeline access, custom components, advanced features
📊 PyFunc: Text classification deployment, simple inference, production serving
🔄 Mixed Approach: Development with native, deployment with PyFunc

Model Signatures for spaCy Models

Adding signatures to spaCy models improves documentation and enables validation:

import mlflow
from mlflow.models import infer_signature
import pandas as pd
import spacy

# Load and prepare model
nlp = spacy.load("en_core_web_sm")

# For text classification models, create sample data
sample_input = pd.DataFrame({"text": ["This is a sample sentence for classification."]})

# If model has TextCategorizer, get predictions for signature
if nlp.has_pipe("textcat"):
    # Create wrapper for prediction
    class SpacyWrapper:
        def __init__(self, nlp):
            self.nlp = nlp

        def predict(self, df):
            results = []
            for text in df.iloc[:, 0]:
                doc = self.nlp(text)
                results.append({"predictions": doc.cats})
            return pd.DataFrame(results)

    wrapper = SpacyWrapper(nlp)
    sample_output = wrapper.predict(sample_input)
    signature = infer_signature(sample_input, sample_output)
else:
    signature = None

# Log model with signature
mlflow.spacy.log_model(
    nlp, name="spacy_model", signature=signature, input_example=sample_input
)

Manual Signature Definition

For complete control over your model signature:

import mlflow
from mlflow.types import Schema, ColSpec
from mlflow.models import ModelSignature

# Define input schema for text classification
input_schema = Schema([ColSpec("string", "text")])

# Define output schema
output_schema = Schema(
    [ColSpec("object", "predictions")]  # Dictionary with category probabilities
)

# Create signature
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Log model with manual signature
mlflow.spacy.log_model(nlp, name="model", signature=signature)

Manual signatures are useful when:

You need precise control over input/output specifications
Working with custom output formats
The automatic inference doesn't capture your intended schema
You want to document expected data types explicitly

Advanced spaCy Tracking Patterns

Custom Component Tracking

Track custom spaCy components and their performance:

import mlflow
import spacy
from spacy import Language
from spacy.tokens import Doc, Span


@Language.component("sentiment_analyzer")
def sentiment_analyzer(doc):
    """Custom component for sentiment analysis."""
    # Simple rule-based sentiment (replace with actual ML model)
    positive_words = {"good", "great", "excellent", "amazing", "wonderful"}
    negative_words = {"bad", "terrible", "awful", "horrible", "worst"}

    pos_count = sum(1 for token in doc if token.lower_ in positive_words)
    neg_count = sum(1 for token in doc if token.lower_ in negative_words)

    if pos_count > neg_count:
        sentiment = "positive"
        score = 0.8
    elif neg_count > pos_count:
        sentiment = "negative"
        score = 0.8
    else:
        sentiment = "neutral"
        score = 0.5

    # Add sentiment as custom attribute
    doc._.sentiment = sentiment
    doc._.sentiment_score = score
    return doc


# Register custom extensions
Doc.set_extension("sentiment", default=None)
Doc.set_extension("sentiment_score", default=0.0)

# Create pipeline with custom component
nlp = spacy.blank("en")
nlp.add_pipe("sentiment_analyzer")

# Test and evaluate custom component
test_texts = [
    "This is a great product!",
    "Terrible service, very bad.",
    "It's okay, nothing special.",
]

with mlflow.start_run():
    # Log component information
    mlflow.log_params(
        {
            "custom_components": ["sentiment_analyzer"],
            "pipeline": nlp.pipe_names,
            "model_version": "1.0",
        }
    )

    # Evaluate custom component
    correct_predictions = 0
    total_predictions = len(test_texts)

    results = []
    for text in test_texts:
        doc = nlp(text)
        results.append(
            {"text": text, "sentiment": doc._.sentiment, "score": doc._.sentiment_score}
        )

    # Log evaluation metrics
    mlflow.log_metric("component_accuracy", correct_predictions / total_predictions)

    # Log model with custom component
    mlflow.spacy.log_model(nlp, name="custom_sentiment_model")

    # Log evaluation results as artifact
    import json

    with open("evaluation_results.json", "w") as f:
        json.dump(results, f, indent=2)
    mlflow.log_artifact("evaluation_results.json")

Multi-Language Model Tracking

Track experiments across different languages and models:

Multilingual Experiment Tracking

import mlflow
import spacy
from collections import defaultdict


def evaluate_multilingual_models():
    """Evaluate performance across multiple language models."""

    # Define language models to test
    models = {
        "en": "en_core_web_sm",
        "de": "de_core_news_sm",
        "fr": "fr_core_news_sm",
        "es": "es_core_news_sm",
    }

    # Sample texts for each language
    test_texts = {
        "en": "Apple Inc. is a technology company based in California.",
        "de": "Apple Inc. ist ein Technologieunternehmen in Kalifornien.",
        "fr": "Apple Inc. est une entreprise technologique basée en Californie.",
        "es": "Apple Inc. es una empresa de tecnología con sede en California.",
    }

    with mlflow.start_run(run_name="multilingual_comparison"):
        results = {}

        for lang, model_name in models.items():
            try:
                with mlflow.start_run(run_name=f"{lang}_model", nested=True):
                    # Load language-specific model
                    nlp = spacy.load(model_name)

                    # Log model information
                    mlflow.log_params(
                        {
                            "language": lang,
                            "model_name": model_name,
                            "pipeline_components": nlp.pipe_names,
                            "model_size": len(nlp.vocab),
                        }
                    )

                    # Process text and extract entities
                    doc = nlp(test_texts[lang])
                    entities = [(ent.text, ent.label_) for ent in doc.ents]

                    # Log results
                    mlflow.log_metrics(
                        {
                            "num_entities": len(entities),
                            "num_tokens": len(doc),
                            "processing_time": 0.1,  # Placeholder
                        }
                    )

                    # Log the model
                    mlflow.spacy.log_model(nlp, name=f"{lang}_model")

                    results[lang] = {"entities": entities, "tokens": len(doc)}

            except OSError:
                print(f"Model {model_name} not available, skipping {lang}")

        # Log summary results
        mlflow.log_param("total_languages", len(results))
        mlflow.log_metric(
            "avg_entities_per_lang",
            sum(r["entities"].__len__() for r in results.values()) / len(results),
        )

        return results


# Run multilingual evaluation
results = evaluate_multilingual_models()

Benefits of Multilingual Tracking

🌐 Cross-Language Comparison: Compare model performance across languages
📊 Unified Metrics: Track consistent metrics across different language models
🔄 Model Selection: Choose best models for multilingual applications
📈 Performance Analysis: Identify language-specific strengths and weaknesses

Pipeline Optimization Tracking

Track different pipeline configurations and optimizations:

import mlflow
import spacy
import time
from itertools import combinations, product


def optimize_pipeline_configuration():
    """Test different pipeline configurations for optimal performance."""

    # Define pipeline variations to test
    base_components = ["tok2vec", "tagger", "parser", "ner"]
    optional_components = ["lemmatizer", "textcat"]

    # Test different combinations
    configurations = []
    for r in range(len(optional_components) + 1):
        for combo in combinations(optional_components, r):
            config = base_components + list(combo)
            configurations.append(config)

    with mlflow.start_run(run_name="pipeline_optimization"):
        best_config = None
        best_score = 0

        for i, components in enumerate(configurations):
            with mlflow.start_run(run_name=f"config_{i}", nested=True):
                # Create model with specific components
                nlp = spacy.blank("en")

                # Add components (simplified for example)
                available_components = {
                    "tok2vec": "tok2vec",
                    "tagger": "tagger",
                    "parser": "parser",
                    "ner": "ner",
                    "lemmatizer": "lemmatizer",
                }

                pipeline_components = []
                for comp in components:
                    if comp in available_components:
                        try:
                            nlp.add_pipe(comp)
                            pipeline_components.append(comp)
                        except:
                            continue

                # Log configuration
                mlflow.log_params(
                    {
                        "components": pipeline_components,
                        "num_components": len(pipeline_components),
                        "config_id": i,
                    }
                )

                # Simulate performance testing
                test_text = "This is a test sentence for pipeline evaluation."

                start_time = time.time()
                doc = nlp(test_text)
                processing_time = time.time() - start_time

                # Calculate synthetic performance score
                performance_score = (
                    len(pipeline_components) * 10 - processing_time * 100
                )

                # Log metrics
                mlflow.log_metrics(
                    {
                        "processing_time": processing_time,
                        "performance_score": performance_score,
                        "memory_usage": len(nlp.vocab),  # Simplified metric
                    }
                )

                # Log model
                mlflow.spacy.log_model(nlp, name="pipeline_model")

                # Track best configuration
                if performance_score > best_score:
                    best_score = performance_score
                    best_config = pipeline_components

        # Log best configuration summary
        mlflow.log_params(
            {
                "best_config": best_config,
                "best_score": best_score,
                "total_configs_tested": len(configurations),
            }
        )

        return best_config, best_score


# Run pipeline optimization
best_config, score = optimize_pipeline_configuration()
print(f"Best configuration: {best_config} with score: {score}")

Production Deployment

Local Model Serving

Deploy your spaCy models locally using MLflow's serving infrastructure:

# First, log your model with proper configuration
import mlflow
import spacy
import pandas as pd

nlp = spacy.load("en_core_web_sm")

with mlflow.start_run() as run:
    # Create example input for signature
    sample_input = pd.DataFrame({"text": ["Sample text for classification"]})

    # Log model with dependencies
    model_info = mlflow.spacy.log_model(
        nlp,
        name="spacy_model",
        input_example=sample_input,
        pip_requirements=["spacy>=3.0.0"],
    )

    model_uri = (
        model_info.model_uri
    )  # The format of this attribute is 'models:/<model_id>'

Then deploy the model using the MLflow CLI:

# Serve the model locally (for text classification models with PyFunc flavor)
mlflow models serve -m models:/<model_id> -p 5000

# Test the deployment
curl http://localhost:5000/invocations \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"text": "This is a great product!"}]}'

Advanced Deployment Options

The mlflow models serve command supports several options for spaCy models:

# Specify environment manager
mlflow models serve -m models:/<model_id> -p 5000 --env-manager conda

# Enable MLServer for enhanced performance
mlflow models serve -m models:/<model_id> -p 5000 --enable-mlserver

# Set custom host for network access
mlflow models serve -m models:/<model_id> -p 5000 --host 0.0.0.0

For production deployments, consider:

Using MLServer (--enable-mlserver) for better performance and scalability
Building Docker images with mlflow models build-docker
Deploying to cloud platforms like Azure ML or Amazon SageMaker
Setting up proper environment management and dependency isolation
Implementing model monitoring and health checks

Real-World Applications

The MLflow-spaCy integration excels across diverse NLP domains:

📰 Content Analysis: Track sentiment analysis, topic modeling, and content classification systems for media and publishing
🏥 Healthcare NLP: Monitor clinical text processing, medical entity extraction, and diagnostic support systems
💼 Enterprise Search: Log document processing, information extraction, and knowledge management pipelines
🛒 E-commerce Intelligence: Track product categorization, review analysis, and customer intent recognition
📧 Communications Processing: Monitor email classification, chatbot training, and customer service automation
🏛️ Legal Tech: Log contract analysis, document review, and legal entity recognition systems
🌐 Multilingual Applications: Track translation quality, cross-lingual transfer, and international content processing
📊 Business Intelligence: Monitor text analytics, report generation, and automated insights extraction

Conclusion

The MLflow-spaCy integration provides a comprehensive solution for tracking, managing, and deploying production-grade NLP systems. By combining spaCy's industrial-strength capabilities with MLflow's experiment tracking, you create a workflow that is:

🔍 Transparent: Every aspect of NLP model development is documented and trackable
🔄 Reproducible: Experiments can be recreated exactly with proper environment management
📊 Comparable: Different approaches can be evaluated side-by-side with consistent metrics
📈 Scalable: From simple prototypes to enterprise-scale NLP systems
👥 Collaborative: Team members can share and build upon each other's NLP research and development

Whether you're building intelligent chatbots, analyzing customer feedback, or extracting insights from unstructured text, the MLflow-spaCy integration provides the foundation for organized, reproducible, and scalable NLP development that grows with your ambitions from prototype to production-scale deployment.

Logging spaCy Models to MLflow​

Basic Model Logging​

Model Components & Architecture​

Dependencies & Environment​

Deployment Artifacts​

Automatic PyFunc Flavor for Text Classification​

Automatic PyFunc Generation​

Input/Output Format​

Deployment Benefits​

Advanced spaCy Training with MLflow Integration​

Custom Training Logger​

Configuration File Integration​

Advanced Logger Features​

Complete Training Integration Example​

Saving and Loading spaCy Models​

Basic Model Operations​

Native spaCy Loading​

PyFunc Loading (Text Classification Only)​

When to Use Each Approach​

Model Signatures for spaCy Models​

Advanced spaCy Tracking Patterns​

Custom Component Tracking​

Multi-Language Model Tracking​

Benefits of Multilingual Tracking​

Pipeline Optimization Tracking​

Production Deployment​

Local Model Serving​

Real-World Applications​

Conclusion​