Plugin Evaluators

MLflow's evaluation framework is extensible: specialized evaluation plugins integrate with the core evaluation workflow. These plugins add domain-specific validation, vulnerability scanning, and specialized testing frameworks developed by the broader ML community.

Available Plugins

MLflow currently supports two evaluation plugins that add specialized validation to your model evaluation workflows:

Giskard Plugin - Advanced Vulnerability Scanning

The Giskard plugin extends MLflow's validation capabilities to help anticipate issues before they reach production. It scans models for hidden vulnerabilities that traditional metrics might miss.

Key Capabilities

Vulnerability Detection: Giskard scans models to identify critical issues including:

Performance bias - Unequal performance across different groups
Unrobustness - Sensitivity to small input changes
Overconfidence - Excessive confidence in predictions
Underconfidence - Insufficient confidence in accurate predictions
Ethical bias - Discriminatory behavior patterns
Data leakage - Information bleeding from target to features
Stochasticity - Different outputs for the same input across runs
Spurious correlation - Features associated with the target without a genuine causal relationship

Analysis Features:

🔍 Sample Exploration: Examine specific data samples that highlight discovered vulnerabilities
📊 Quantified Metrics: Log vulnerabilities as well-defined, measurable metrics within MLflow
🔄 Model Comparison: Compare vulnerability metrics across different model versions and architectures

Getting Started with Giskard

Explore these example implementations to see Giskard in action:

Tabular ML Models - Traditional supervised learning vulnerability assessment
Text ML Models (LLMs) - Language model specific vulnerability scanning

For comprehensive documentation and setup instructions, visit the Giskard-MLflow integration docs.
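The call pattern for running a plugin evaluator through mlflow.evaluate() can be sketched as follows. This is a minimal sketch, assuming the Giskard plugin is installed and registers under the evaluator name "giskard"; confirm the exact name and any configuration keys in the integration docs. The helpers build_evaluate_kwargs and run_plugin_evaluation are hypothetical names used for illustration and are not part of MLflow or Giskard.

```python
from typing import Any, Dict, Optional


def build_evaluate_kwargs(
    model_uri: str,
    data: Any,
    targets: Optional[str] = None,
    plugin_name: str = "giskard",  # assumption: the plugin's registered evaluator name
    plugin_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for mlflow.evaluate() with one plugin evaluator."""
    kwargs: Dict[str, Any] = {
        "model": model_uri,
        "data": data,
        "model_type": "classifier",
        # Passing a single evaluator name selects that evaluator only.
        "evaluators": plugin_name,
    }
    if targets is not None:
        kwargs["targets"] = targets
    if plugin_config is not None:
        # With a single evaluator name, evaluator_config is that evaluator's
        # own option dictionary. Valid keys are plugin-specific.
        kwargs["evaluator_config"] = plugin_config
    return kwargs


def run_plugin_evaluation(**kwargs: Any) -> Any:
    """Run the evaluation. Requires mlflow and the plugin to be installed."""
    import mlflow  # imported lazily so the helper above works without mlflow

    return mlflow.evaluate(**kwargs)
```

A typical call would pass a logged model URI and an evaluation DataFrame, for example run_plugin_evaluation(**build_evaluate_kwargs(model_uri, eval_df, targets="label")), inside an active MLflow run so the plugin's metrics are logged alongside the run.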

Trubrics Plugin - Flexible Validation Framework

The Trubrics plugin provides a flexible validation framework that extends MLflow's evaluation capabilities with custom validation logic and comprehensive result reporting.

Key Capabilities

Validation Features:

📋 Out-of-the-box Validations: Large library of pre-built validation checks for common ML scenarios
🔧 Custom Python Functions: Validate runs using any custom Python function or business logic
📊 Comprehensive Reporting: View all validation results in structured JSON format for easy diagnosis

Workflow Integration:

⚡ Flexible Validation Logic: Define validation criteria that match your specific use case requirements
🔍 Detailed Diagnostics: Understand exactly why an MLflow run might have failed validation
📈 Result Tracking: Maintain complete validation history alongside your model experiments

Getting Started with Trubrics

See the plugin in action with the official example notebook, which demonstrates common validation patterns and integration workflows.

For complete documentation and setup instructions, visit the Trubrics-MLflow integration docs.
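To make the idea of validating a run with a custom Python function and reading the outcome as structured JSON more concrete, here is a plain-Python sketch. It deliberately does not use the Trubrics API: min_accuracy_check, run_validations, and the report fields are hypothetical and only illustrate the pattern. Refer to the Trubrics docs for the real decorators and result schema.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical type: a check takes run metrics and returns a result dictionary.
Check = Callable[[Dict[str, float]], Dict[str, Any]]


def min_accuracy_check(metrics: Dict[str, float], threshold: float = 0.8) -> Dict[str, Any]:
    """Custom business rule: accuracy must reach a minimum threshold."""
    accuracy = metrics["accuracy"]
    return {
        "validation": "min_accuracy_check",
        "passed": accuracy >= threshold,
        "result": {"accuracy": accuracy, "threshold": threshold},
    }


def run_validations(metrics: Dict[str, float], checks: List[Check]) -> str:
    """Run every check and return a JSON report with an overall pass/fail flag."""
    results = [check(metrics) for check in checks]
    report = {
        "passed": all(r["passed"] for r in results),
        "validations": results,
    }
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    print(run_validations({"accuracy": 0.75}, [min_accuracy_check]))
```

Because each check records both the observed value and the threshold it was compared against, the report shows why a run failed rather than only that it failed.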

Integration Benefits

Plugin evaluators integrate with MLflow's existing evaluation framework, providing:

🔄 Unified Workflow: Use plugins alongside standard MLflow evaluators in the same evaluation run
📊 Consistent Reporting: Plugin results appear in MLflow's tracking interface with other evaluation metrics
🏗️ Extensible Architecture: Easy integration path for custom evaluation tools and frameworks
📈 Scalable Validation: Run plugin evaluations as part of automated model validation pipelines
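The extensible architecture rests on Python entry points: evaluator plugins register themselves under the "mlflow.model_evaluator" entry point group, and MLflow discovers them at runtime. The sketch below lists the plugin evaluator names installed in the current environment using only the standard library. The helper name installed_evaluator_plugins is hypothetical.

```python
from importlib import metadata
from typing import List

# Entry point group used by MLflow to discover evaluator plugins.
EVALUATOR_GROUP = "mlflow.model_evaluator"


def installed_evaluator_plugins(group: str = EVALUATOR_GROUP) -> List[str]:
    """Return the sorted names of entry points registered under the given group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select().
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    return sorted({ep.name for ep in selected})


if __name__ == "__main__":
    # Prints an empty list when no evaluator plugin is installed.
    print(installed_evaluator_plugins())
```

With MLflow installed, mlflow.models.list_evaluators() reports the evaluator names MLflow has registered, which is a quick way to confirm that a plugin was picked up before passing its name to the evaluators argument.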

Next Steps

Ready to enhance your model evaluation with specialized plugins?

Choose Your Plugin: Select Giskard for vulnerability scanning or Trubrics for flexible validation
Review Examples: Explore the provided example notebooks to understand integration patterns
Install and Configure: Follow the plugin-specific documentation for setup instructions
Integrate with MLflow: Add plugin evaluators to your existing mlflow.evaluate() workflows

These powerful plugins demonstrate the extensibility of MLflow's evaluation framework and provide immediate access to specialized validation capabilities developed by domain experts in the ML community.
