Plugin Evaluators
MLflow's evaluation framework is designed for extensibility: specialized evaluation plugins can integrate directly with the core evaluation workflow. These plugins extend MLflow with domain-specific validation, vulnerability scanning, and specialized testing frameworks developed by the broader ML community.
Available Plugins
MLflow currently supports two evaluation plugins that add specialized validation capabilities to your model evaluation workflows:
Giskard Plugin - Advanced Vulnerability Scanning
The Giskard plugin extends MLflow's validation capabilities to help you catch issues before a model reaches production. Its scanner detects hidden vulnerabilities that traditional aggregate metrics can miss.
Key Capabilities
Vulnerability Detection: Giskard scans models to identify critical issues including:
- Performance bias - Unequal performance across different groups
- Unrobustness - Sensitivity to small input changes
- Overconfidence - High confidence in incorrect predictions
- Underconfidence - Insufficient confidence in accurate predictions
- Ethical bias - Discriminatory behavior patterns
- Data leakage - Information from the target leaking into the input features
- Stochasticity - Different outputs for the same input across repeated runs
- Spurious correlation - Predictions driven by features that correlate with the target without a genuine relationship
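To make the first item concrete: performance bias means a model scores noticeably worse on some data slice than on the data as a whole. The sketch below is a minimal, stdlib-only illustration of that idea, not Giskard's actual detector. It computes accuracy per group and flags groups whose accuracy trails overall accuracy by more than a fixed gap. The function names and the `max_gap=0.1` threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy of the predictions within each group (for example, a demographic slice)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def flag_performance_bias(y_true, y_pred, groups, max_gap=0.1):
    """Hypothetical check: flag groups whose accuracy is more than max_gap below overall accuracy."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    by_group = per_group_accuracy(y_true, y_pred, groups)
    return {g: acc for g, acc in by_group.items() if overall - acc > max_gap}


# Toy data: the model is perfect on group "a" but mostly wrong on group "b".
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(flag_performance_bias(y_true, y_pred, groups))  # {'b': 0.25}
```

Here group "b" is flagged with an accuracy of 0.25 against an overall accuracy of 0.625. Giskard automates this kind of slice search across many features and metrics; its documentation describes the detectors it actually runs.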
Analysis Features:
- Sample Exploration: Examine specific data samples that expose the discovered vulnerabilities
- Quantified Metrics: Log vulnerabilities as well-defined, measurable metrics within MLflow
- Model Comparison: Compare vulnerability metrics across different model versions and architectures
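Once vulnerabilities are logged as numeric metrics, comparing two model versions reduces to comparing two metric dictionaries. The sketch below assumes hypothetical metric names (such as `performance_bias`) and treats every metric as lower-is-better. In practice the dictionaries could be read from tracked runs with `mlflow.get_run(run_id).data.metrics`, but the actual metric names depend on what the plugin logs.

```python
def compare_vulnerability_metrics(baseline, candidate):
    """Return per-metric deltas (candidate minus baseline) and the metrics that got worse.

    Assumes every metric is a count or rate where lower is better; a metric
    missing from one side is treated as 0.
    """
    deltas = {}
    regressions = []
    for name in sorted(set(baseline) | set(candidate)):
        delta = candidate.get(name, 0) - baseline.get(name, 0)
        deltas[name] = delta
        if delta > 0:
            regressions.append(name)
    return deltas, regressions


# Hypothetical vulnerability metrics for two versions of the same model.
v1 = {"performance_bias": 3, "unrobustness": 1, "overconfidence": 2}
v2 = {"performance_bias": 1, "unrobustness": 2, "overconfidence": 2}
deltas, regressions = compare_vulnerability_metrics(v1, v2)
print(deltas)       # {'overconfidence': 0, 'performance_bias': -2, 'unrobustness': 1}
print(regressions)  # ['unrobustness']
```

Reporting both the deltas and an explicit list of regressions lets an automated check fail on any worsening metric while still showing improvements for review.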
Getting Started with Giskard
Explore these example implementations to see Giskard in action:
- Tabular ML Models - Traditional supervised learning vulnerability assessment
- Text ML Models (LLMs) - Language model specific vulnerability scanning
For comprehensive documentation and setup instructions, visit the Giskard-MLflow integration docs.
Trubrics Plugin - Flexible Validation Framework
The Trubrics plugin provides a flexible validation framework that extends MLflow's evaluation capabilities with custom validation logic and comprehensive result reporting.
Key Capabilities
Validation Features:
- Out-of-the-box Validations: A large library of pre-built validation checks for common ML scenarios
- Custom Python Functions: Validate runs using any custom Python function or business logic
- Comprehensive Reporting: View all validation results in a structured JSON format for easy diagnosis
Workflow Integration:
- Flexible Validation Logic: Define validation criteria that match your specific use case requirements
- Detailed Diagnostics: Understand exactly why an MLflow run failed validation
- Result Tracking: Maintain a complete validation history alongside your model experiments
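The idea of validating a run with arbitrary Python functions and collecting the outcome as JSON can be sketched without the plugin itself. The example below is a hypothetical, stdlib-only illustration and does not use the Trubrics API: each check is a plain function that receives a run's metrics and returns a pass flag plus a human-readable detail, and `validate_run` gathers the results into a JSON-serializable report. The helper names, metric names, and thresholds are all made up for the example.

```python
import json


def validate_run(metrics, checks):
    """Run each named check against a run's metrics and build a JSON-serializable report."""
    results = []
    for name, check in checks.items():
        passed, detail = check(metrics)
        results.append({"validation": name, "passed": passed, "detail": detail})
    return {"all_passed": all(r["passed"] for r in results), "results": results}


def min_metric(metric, threshold):
    """Build a check that fails when `metric` is missing or below `threshold`."""
    def check(metrics):
        value = metrics.get(metric)
        if value is None:
            return False, f"{metric} was not logged"
        return value >= threshold, f"{metric}={value} (threshold {threshold})"
    return check


def max_latency_ms(limit):
    """A business-rule check: logged inference latency must not exceed `limit` milliseconds."""
    def check(metrics):
        value = metrics.get("latency_ms", float("inf"))
        return value <= limit, f"latency_ms={value} (limit {limit})"
    return check


# Hypothetical metrics from a single run.
run_metrics = {"accuracy": 0.91, "f1_score": 0.78, "latency_ms": 120.0}
report = validate_run(run_metrics, {
    "accuracy_at_least_0.9": min_metric("accuracy", 0.9),
    "f1_at_least_0.8": min_metric("f1_score", 0.8),
    "latency_under_200ms": max_latency_ms(200),
})
print(json.dumps(report, indent=2))
```

With these values the F1 check fails, so `all_passed` is `False` and the report's `detail` field shows why. In a real workflow the metrics dictionary could be read from a tracked run with `mlflow.get_run(run_id).data.metrics`.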
Getting Started with Trubrics
See the plugin in action with the official example notebook, which demonstrates common validation patterns and integration workflows.
For complete documentation and setup instructions, visit the Trubrics-MLflow integration docs.
Integration Benefits
Plugin evaluators seamlessly integrate with MLflow's existing evaluation framework, providing:
- Unified Workflow: Use plugins alongside standard MLflow evaluators in the same evaluation run
- Consistent Reporting: Plugin results appear in MLflow's tracking interface with other evaluation metrics
- Extensible Architecture: A straightforward integration path for custom evaluation tools and frameworks
- Scalable Validation: Run plugin evaluations as part of automated model validation pipelines
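Evaluator plugins are ordinary Python packages. To our understanding, MLflow discovers them through package entry points registered in the `mlflow.model_evaluator` group, where each entry-point name is the string passed to the `evaluators` argument of `mlflow.evaluate()`. The sketch below treats that group name as an assumption and uses only the standard library's `importlib.metadata` to list the evaluator plugins installed in the current environment; the function name is hypothetical.

```python
from importlib.metadata import entry_points

# Assumed entry-point group that MLflow scans for evaluator plugins.
EVALUATOR_GROUP = "mlflow.model_evaluator"


def installed_evaluator_plugins(group=EVALUATOR_GROUP):
    """Return {evaluator_name: "module:Class"} for every entry point registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports selecting entry points by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: the result is a dict mapping group names to entry points.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


print(installed_evaluator_plugins())
```

An empty dictionary means no evaluator plugins are installed. Otherwise, the printed names are the ones you could combine with MLflow's built-in evaluator, for example `evaluators=["default", "<plugin-name>"]`.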
Next Steps
Ready to enhance your model evaluation with specialized plugins?
- Choose Your Plugin: Select Giskard for vulnerability scanning or Trubrics for flexible validation
- Review Examples: Explore the provided example notebooks to understand integration patterns
- Install and Configure: Follow the plugin-specific documentation for setup instructions
- Integrate with MLflow: Add plugin evaluators to your existing mlflow.evaluate() workflows
These powerful plugins demonstrate the extensibility of MLflow's evaluation framework and provide immediate access to specialized validation capabilities developed by domain experts in the ML community.