MLflow LangChain Autologging

The MLflow LangChain flavor supports autologging, a powerful feature that logs crucial details about the LangChain model and its execution without the need for explicit logging statements. MLflow LangChain autologging covers various aspects of the model, including traces, models, signatures, and more.

Attention

MLflow’s LangChain autologging feature was overhauled in the MLflow 2.14.0 release. If you are using an earlier version of MLflow, please refer to the legacy documentation in the Documentation for Old Versions section below.

Note

MLflow LangChain autologging is verified to be compatible with LangChain versions between 0.1.0 and 0.2.3. Outside of this range, the feature may not work as expected. To install a compatible version of LangChain, please run the following command:

pip install mlflow[langchain] --upgrade

Quickstart

To enable autologging for LangChain models, call mlflow.langchain.autolog() at the beginning of your script or notebook. This automatically logs traces by default, as well as other artifacts such as models, input examples, and model signatures if you explicitly enable them. For more information about the configuration, please refer to the Configure Autologging section.

import mlflow

mlflow.langchain.autolog()

# Enable other optional logging
# mlflow.langchain.autolog(log_models=True, log_input_examples=True)

# Your LangChain model code here
...

Once you have invoked the chain, you can view the logged traces and artifacts in the MLflow UI.
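
For example, a minimal sketch like the following (assuming the OPENAI_API_KEY environment variable is set and a compatible LangChain version is installed; the chain definition is illustrative) produces a trace each time the chain is invoked:

import mlflow
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema.output_parser import StrOutputParser

mlflow.langchain.autolog()

# A minimal LCEL chain; invoking it generates a trace automatically.
chain = (
    PromptTemplate.from_template("Tell me a short fact about {topic}")
    | OpenAI(temperature=0.7)
    | StrOutputParser()
)

chain.invoke({"topic": "MLflow"})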

LangChain Tracing via autolog

Configure Autologging

MLflow LangChain autologging can log various information about the model and its inference. By default, only trace logging is enabled, but you can enable autologging of other information by setting the corresponding parameters when calling mlflow.langchain.autolog(). For other configurations, please refer to the API documentation.

  • Traces (log_traces, default: True): Whether to generate and log traces for the model. See MLflow Tracing for more details about the tracing feature.

  • Model Artifacts (log_models, default: False): If set to True, the LangChain model is logged when it is invoked. Supported models are Chain, AgentExecutor, BaseRetriever, SimpleChatModel, ChatPromptTemplate, and a subset of Runnable types. Please refer to the MLflow repository for the full list of supported models.

  • Model Signatures (log_model_signatures, default: False): If set to True, ModelSignatures describing model inputs and outputs are collected and logged along with LangChain model artifacts during inference. This option is only available when log_models is enabled.

  • Input Examples (log_input_examples, default: False): If set to True, input examples from inference data are collected and logged along with LangChain model artifacts during inference. This option is only available when log_models is enabled.

  • Inputs and Outputs (log_inputs_outputs, default: False, deprecated): If set to True, inputs and outputs are logged when the model is invoked. This option is deprecated and will be removed in the future; please use log_traces instead.

For example, to disable trace logging and instead enable model logging, run the following code:

import mlflow

mlflow.langchain.autolog(
    log_traces=False,
    log_models=True,
)

Note

MLflow does not support automatic model logging for chains that contain retrievers. Saving retrievers requires additional loader_fn and persist_dir information for loading the model. If you want to log the model with retrievers, please log the model manually as shown in the retriever_chain example.
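
For reference, a minimal sketch of manual logging for a chain containing a retriever might look like the following. The retriever_chain object, the load_retriever helper, and the persist_dir path are illustrative placeholders, and the sketch assumes a FAISS vector store; adapt them to your own setup.

import mlflow


def load_retriever(persist_directory):
    # Hypothetical loader_fn that rebuilds the retriever from the persisted vector store.
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    embeddings = OpenAIEmbeddings()
    # Depending on your LangChain version, FAISS.load_local may also require
    # allow_dangerous_deserialization=True.
    vectorstore = FAISS.load_local(persist_directory, embeddings)
    return vectorstore.as_retriever()


with mlflow.start_run():
    mlflow.langchain.log_model(
        retriever_chain,  # your chain containing the retriever
        artifact_path="retriever_chain",
        loader_fn=load_retriever,
        persist_dir=persist_dir,  # directory where the vector store was saved
    )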

Example Code of LangChain Autologging

import os
from operator import itemgetter

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda

import mlflow

# Uncomment the following to use the full abilities of langchain autologging
# %pip install "langchain_community>=0.0.16"
# These two libraries enable autologging to log text analysis related artifacts
# %pip install textstat spacy

assert "OPENAI_API_KEY" in os.environ, "Please set the OPENAI_API_KEY environment variable."

# Enable mlflow langchain autologging
# Note: We only support auto-logging models that do not contain retrievers
mlflow.langchain.autolog(
    log_input_examples=True,
    log_model_signatures=True,
    log_models=True,
    log_inputs_outputs=True,
    registered_model_name="lc_model",
)

prompt_with_history_str = """
Here is a history between you and a human: {chat_history}

Now, please answer this question: {question}
"""
prompt_with_history = PromptTemplate(
    input_variables=["chat_history", "question"], template=prompt_with_history_str
)


def extract_question(input):
    return input[-1]["content"]


def extract_history(input):
    return input[:-1]


llm = OpenAI(temperature=0.9)

# Build a chain with LCEL
chain_with_history = (
    {
        "question": itemgetter("messages") | RunnableLambda(extract_question),
        "chat_history": itemgetter("messages") | RunnableLambda(extract_history),
    }
    | prompt_with_history
    | llm
    | StrOutputParser()
)

inputs = {"messages": [{"role": "user", "content": "Who owns MLflow?"}]}

print(chain_with_history.invoke(inputs))
# sample output:
# "1. Databricks\n2. Microsoft\n3. Google\n4. Amazon\n\nEnter your answer: 1\n\n
# Correct! MLflow is an open source project developed by Databricks. ...

# We automatically log the model and trace related artifacts
# A model named `lc_model` is registered; we can load it back as a PyFunc model
model_name = "lc_model"
model_version = 1
loaded_model = mlflow.pyfunc.load_model(f"models:/{model_name}/{model_version}")
print(loaded_model.predict(inputs))

How It Works

MLflow LangChain autologging uses two mechanisms to log traces and other artifacts. Tracing is made possible via LangChain's Callbacks framework, while other artifacts are recorded by patching the invocation functions of the supported models. In typical scenarios you don't need to worry about the internal implementation details, but this section provides a brief overview of how it works under the hood.

MLflow Tracing Callbacks

MlflowLangchainTracer is a callback handler that is injected into the LangChain model inference process to log traces automatically. It starts a new span upon chain actions such as on_chain_start and on_llm_start, and concludes the span when the action finishes. Various metadata, such as span type, action name, input, output, and latency, are automatically recorded on the span.

Customize Callback

Sometimes you may want to customize what information is logged in the traces. You can achieve this by creating a custom callback handler that inherits from MlflowLangchainTracer. The following example demonstrates how to record an additional attribute to the span when a chat model starts running.

from typing import Any, Dict, List, Optional
from uuid import UUID

from langchain_core.messages import BaseMessage

from mlflow.entities import SpanType
from mlflow.langchain.langchain_tracer import MlflowLangchainTracer


class CustomLangchainTracer(MlflowLangchainTracer):
    # Override the handler functions to customize the behavior.
    # The method signature is defined by the LangChain Callbacks API.
    def on_chat_model_start(
        self,
        serialized: Dict[str, Any],
        messages: List[List[BaseMessage]],
        *,
        run_id: UUID,
        tags: Optional[List[str]] = None,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        name: Optional[str] = None,
        **kwargs: Any,
    ):
        """Run when a chat model starts running."""
        attributes = {
            **kwargs,
            **(metadata or {}),
            # Add an additional attribute to the span
            "version": "1.0.0",
        }

        # Call the _start_span method at the end of the handler function to start a new span.
        self._start_span(
            span_name=name or self._assign_span_name(serialized, "chat model"),
            parent_run_id=parent_run_id,
            span_type=SpanType.CHAT_MODEL,
            run_id=run_id,
            inputs=messages,
            attributes=attributes,
        )
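
To use the customized tracer, one option (a sketch, assuming the chain is a standard LangChain Runnable such as the one in the example above) is to pass it explicitly as a callback at invocation time:

# Pass the custom tracer as a LangChain callback when invoking the chain.
chain_with_history.invoke(
    inputs,
    config={"callbacks": [CustomLangchainTracer()]},
)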

Patch Functions for Logging Artifacts

Other artifacts such as models are logged by patching the invocation functions of the supported models to insert the logging call. MLflow patches the following functions:

  • invoke

  • batch

  • stream

  • get_relevant_documents (for retrievers)

  • __call__ (for Chains and AgentExecutors)

  • ainvoke

  • abatch

  • astream

Warning

MLflow supports autologging for async functions (e.g., ainvoke, abatch, astream); however, the logging operation is not asynchronous and may block the main thread. The invocation function itself remains non-blocking and returns a coroutine object, but the logging overhead may slow down the model inference process. Please be aware of this side effect when using async functions with autologging.
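
As a sketch, an async invocation under autologging might look like the following, reusing chain_with_history and inputs from the example above; the trace is recorded once the awaited call completes:

import asyncio


async def main():
    # `ainvoke` returns a coroutine; awaiting it runs the chain, and the
    # autologging hooks record the trace synchronously when it finishes.
    result = await chain_with_history.ainvoke(inputs)
    print(result)


asyncio.run(main())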

Troubleshooting

If you encounter any issues with the MLflow LangChain flavor, please also refer to the FAQ. If you still have questions, please feel free to open an issue in the MLflow GitHub repository.

How to suppress the warning messages during autologging?

MLflow LangChain autologging calls various logging functions and LangChain utilities under the hood. Some of them may generate warning messages that are not critical to the autologging process. If you want to suppress these warning messages, pass silent=True to the mlflow.langchain.autolog() function.

import mlflow

mlflow.langchain.autolog(silent=True)

# No warning messages will be emitted from autologging

I can’t load the model logged by MLflow LangChain autologging

There are a few types of models for which MLflow LangChain autologging does not support native saving or loading.

  • The model contains LangChain retrievers

    LangChain retrievers are not supported by MLflow autologging. If your model contains a retriever, you will need to log the model manually using the mlflow.langchain.log_model API. Because loading those models requires specifying the loader_fn and persist_dir parameters, please check the examples in retriever_chain.

  • Can’t pickle certain objects

    For certain models that LangChain does not support saving or loading natively, MLflow pickles the object when saving it. Because of this, your cloudpickle version must be consistent between the saving and loading environments so that object references resolve properly. For further guarantees of correct object representation, you should ensure that your environment has pydantic installed with at least version 2.

Documentation for Old Versions

The MLflow LangChain autologging feature was largely renewed in MLflow 2.14.0. If you are using an earlier version of MLflow, please refer to the following documentation.

Note

To use MLflow LangChain autologging, please upgrade langchain to version 0.1.0 or higher. Depending on your existing environment, you may need to manually install langchain_community>=0.0.16 in order to enable the automatic logging of artifacts and metrics (this behavior will be modified in the future to be an optional import). If autologging doesn’t log artifacts as expected, please check the warning messages in the stdout logs. For langchain_community==0.0.16, you will need to install the textstat and spacy libraries manually and restart any active interactive environment (i.e., a notebook environment). On Databricks, you can do this by executing dbutils.library.restartPython() to force the Python REPL to restart, allowing the newly installed libraries to become available.

MLflow LangChain autologging injects MlflowCallbackHandler into the LangChain model inference process to log metrics and artifacts automatically. The model is only logged if log_models is set to True when calling mlflow.langchain.autolog() and the objects being invoked are among the supported model types: Chain, AgentExecutor, BaseRetriever, RunnableSequence, RunnableParallel, RunnableBranch, SimpleChatModel, ChatPromptTemplate, RunnableLambda, and RunnablePassthrough. Additional model types will be supported in the future.

Note

We patch the invoke function for all supported LangChain models, the __call__ function for Chain and AgentExecutor models, and the get_relevant_documents function for BaseRetrievers, so MLflow automatically logs metrics and artifacts only when those functions are called. If the model contains retrievers, autologging of the model is not supported, because saving it requires loader_fn and persist_dir in order to load the model back. Please log the model manually if you want to log a model with retrievers.

The following metrics and artifacts are logged by default (depending on the models involved):

Artifacts:

  • table_action_records.html: Each action’s details, including chains, tools, LLMs, agents, and retrievers.

  • table_session_analysis.html: Details about the prompt and output for each prompt step, token usage, and text analysis metrics.

  • chat_html.html: LLM input and output details.

  • llm_start_x_prompt_y.json: The prompt and kwargs passed during the LLM generate call.

  • llm_end_x_generation_y.json: The llm_output of the LLM result.

  • ent-<hash string of generation.text>.html: Visualization of the generation text using the spaCy “en_core_web_sm” model with style ent (if spaCy is installed and the model is downloaded).

  • dep-<hash string of generation.text>.html: Visualization of the generation text using the spaCy “en_core_web_sm” model with style dep (if spaCy is installed and the model is downloaded).

  • llm_new_tokens_x.json: Records new tokens added to the LLM output during inference.

  • chain_start_x.json: Records the inputs and chain-related information during inference.

  • chain_end_x.json: Records the chain outputs.

  • tool_start_x.json: Records the tool’s name and description during inference.

  • tool_end_x.json: Records the tool’s observation.

  • retriever_start_x.json: Records the retriever’s information during inference.

  • retriever_end_x.json: Records the retriever’s result documents.

  • agent_finish_x.json: Records the final return value of the ActionAgent, including output and log.

  • agent_action_x.json: Records the ActionAgent’s action details.

  • on_text_x.json: Records the text during inference.

  • inference_inputs_outputs.json: Input and output details for each inference call (logged by default; can be turned off by setting log_inputs_outputs=False when enabling autolog).

Metrics:

  • Basic Metrics: step, starts, ends, errors, text_ctr, chain_starts, chain_ends, llm_starts, llm_ends, llm_streams, tool_starts, tool_ends, agent_ends, retriever_starts, retriever_ends (the number of invocations of each component).

  • Text Analysis Metrics: flesch_reading_ease, flesch_kincaid_grade, smog_index, coleman_liau_index, automated_readability_index, dale_chall_readability_score, difficult_words, linsear_write_formula, gunning_fog, fernandez_huerta, szigriszt_pazos, gutierrez_polini, crawford, gulpease_index, osman (text analysis metrics of the generation text, logged if the textstat library is installed).

Note

Each inference call logs these artifacts into a separate directory named artifacts-<session_id>-<idx>, where session_id is a randomly generated UUID and idx is the index of the inference call. The session_id is also preserved in the inference_inputs_outputs.json file, so you can easily find the corresponding artifacts for each inference call.