Evaluate & Monitor

MLflow's evaluation and monitoring capabilities help you systematically measure, improve, and maintain the quality of your GenAI applications throughout their lifecycle, from development through production.

LLM Evaluation (Legacy)

MLflow provides LLM evaluation capabilities built on top of the classic mlflow.evaluate API. These are the evaluation options available if you are using self-hosted or local MLflow.

New Evaluation Suite for LLMs/GenAI (Managed-Only)

MLflow 3 introduces a new evaluation suite for LLMs/GenAI. This suite is currently available only in Managed MLflow on Databricks, but it is coming soon to OSS MLflow. If you are interested in trying it out, you can do so with a free Databricks trial.