MLflow AI Gateway (Experimental)
Warning
MLflow AI Gateway is deprecated and has been replaced by the deployments API for generative AI. See the MLflow AI Gateway Migration Guide for migration instructions.
The MLflow AI Gateway service is a powerful tool designed to streamline the usage and management of various large language model (LLM) providers, such as OpenAI and Anthropic, within an organization. It offers a high-level interface that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM related requests.
A major advantage of using the MLflow AI Gateway service is its centralized management of API keys. By storing these keys in one secure location, organizations can significantly enhance their security posture by minimizing the exposure of sensitive API keys throughout the system. It also helps to prevent exposing these keys within code or requiring end-users to manage keys safely.
The gateway is designed to be flexible and adaptable, capable of easily defining and managing routes by updating the configuration file. This enables the easy incorporation of new LLM providers or provider LLM types into the system without necessitating changes to applications that interface with the gateway. This level of adaptability makes the MLflow AI Gateway Service an invaluable tool in environments that require agility and quick response to changes.
This simplification and centralization of language model interactions, coupled with the added layer of security for API key management, make the MLflow AI Gateway service an ideal choice for organizations that use LLMs on a regular basis.
Tutorials and Guides
If you’re interested in diving right into a step-by-step guide that will get you up and running with the MLflow AI Gateway as fast as possible, the guides below will be your best first stop.
View the AI Gateway Getting Started Guide
Quickstart
The following guide will get you up and running with a three-route configuration that connects to OpenAI services for chat, completions, and embeddings.
Step 1: Install the MLflow AI Gateway service
First, you need to install the MLflow AI Gateway service on your machine. You can do this using pip from PyPI or from the MLflow repository.
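For example, installing from PyPI typically looks like the following (the gateway extra shown here is an assumption; consult the installation notes for your MLflow version):
pip install 'mlflow[gateway]'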
Step 2: Set the OpenAI API Key(s) for each provider
The Gateway service needs to communicate with the OpenAI API. To do this, it requires an API key. You can create an API key from the OpenAI dashboard.
For this example, we’re only connecting with OpenAI. If there are additional providers within the configuration, these keys will need to be set as well.
Once you have the key, you can set it as an environment variable in your terminal:
export OPENAI_API_KEY=your_api_key_here
This sets a temporary, session-based environment variable. For production use cases, it is advisable to store this key in the .bashrc or .zshrc file so that the key doesn’t have to be re-entered upon system restart.
Step 3: Create a Gateway Configuration File
Next, you need to create a Gateway configuration file. This is a YAML file where you specify the routes that the Gateway service should expose. Let’s create a file with three routes using OpenAI as a provider: completions, chat, and embeddings.
For details about the configuration file’s parameters (including parameters for other providers besides OpenAI), see the AI Gateway Configuration Details section below.
routes:
  - name: completions
    route_type: llm/v1/completions
    model:
      provider: openai
      name: gpt-4o-mini
      config:
        openai_api_key: $OPENAI_API_KEY
  - name: chat
    route_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4o-mini
      config:
        openai_api_key: $OPENAI_API_KEY
  - name: embeddings
    route_type: llm/v1/embeddings
    model:
      provider: openai
      name: text-embedding-ada-002
      config:
        openai_api_key: $OPENAI_API_KEY
Save this file to a location on the system that is going to be running the MLflow AI Gateway server.
Step 4: Start the Gateway Service
You’re now ready to start the Gateway service!
Use the MLflow AI Gateway start command and specify the path to your configuration file:
mlflow gateway start --config-path config.yaml --port {port} --host {host} --workers {worker count}
The configuration file can also be set using the MLFLOW_GATEWAY_CONFIG_PATH
environment variable:
export MLFLOW_GATEWAY_CONFIG_PATH=/path/to/config.yaml
If you do not specify the host, a localhost address will be used.
If you do not specify the port, port 5000 will be used.
The worker count for gunicorn defaults to 2 workers.
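As a concrete illustration (the values shown are placeholders for a local test), a fully specified start command might look like:
mlflow gateway start --config-path config.yaml --port 5000 --host 127.0.0.1 --workers 4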
Step 5: Access the Interactive API Documentation
The MLflow AI Gateway service provides an interactive API documentation endpoint that you can use to explore and test the exposed routes. Navigate to http://{host}:{port}/ (or http://{host}:{port}/docs) in your browser to access it.
The docs endpoint allows for direct interaction with the routes and permits submitting actual requests to the provider services by clicking the “try it now” option within each endpoint definition entry.
Step 6: Send Requests Using the Fluent API
For information on formatting requirements and how to pass parameters, see Querying the AI Gateway.
Here’s an example of how to send a chat request using the Fluent API:
from mlflow.gateway import query, set_gateway_uri

set_gateway_uri(gateway_uri="http://localhost:5000")

response = query(
    "chat",
    {"messages": [{"role": "user", "content": "What is the best day of the week?"}]},
)

print(response)
Note: Remember to change the URI definition to the actual URI of your Gateway server.
The returned response will be in this data structure (the actual content and token values will likely be different):
{
    "candidates": [
        {
            "message": {
                "role": "assistant",
                "content": "\n\nIt's hard to say what the best day of the week is.",
            },
            "metadata": {"finish_reason": "stop"},
        }
    ],
    "metadata": {
        "input_tokens": 13,
        "output_tokens": 15,
        "total_tokens": 28,
        "model": "gpt-3.5-turbo-0301",
        "route_type": "llm/v1/chat",
    },
}
Step 7: Send Requests Using the Client API
See the Client API section for further information.
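As a quick illustration, the same chat request from Step 6 could be sent with the client API roughly as follows (the gateway URL is assumed to be the local server started above):
from mlflow.gateway import MlflowGatewayClient

# Assumes the gateway from Step 4 is running locally on port 5000.
gateway_client = MlflowGatewayClient("http://localhost:5000")

response = gateway_client.query(
    "chat",
    {"messages": [{"role": "user", "content": "What is the best day of the week?"}]},
)
print(response)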
Step 8: Send Requests to Routes via REST API
You can now send requests to the exposed routes. See the REST examples for guidance on request formatting.
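For example, a raw HTTP request against the invocations endpoint described in the REST API section might look like this sketch using the requests library (local gateway URL assumed):
import requests

# Route invocation path per the REST API section: POST /gateway/{route}/invocations
url = "http://localhost:5000/gateway/chat/invocations"
payload = {"messages": [{"role": "user", "content": "What is the best day of the week?"}]}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.json())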
Step 9: Compare Provider Models
Here’s an example of adding a new model from a provider to determine which model instance is better for a given use case.
Firstly, update the MLflow AI Gateway config YAML file with the additional route definition to test:
routes:
  - name: completions
    route_type: llm/v1/completions
    model:
      provider: openai
      name: gpt-4o-mini
      config:
        openai_api_key: $OPENAI_API_KEY
  - name: completions-gpt4
    route_type: llm/v1/completions
    model:
      provider: openai
      name: gpt-4
      config:
        openai_api_key: $OPENAI_API_KEY
This updated configuration adds a new completions route, completions-gpt4, while preserving the original completions route that was configured with the gpt-4o-mini model.
Once the configuration file is updated, simply save your changes. The Gateway will automatically create the new route with zero downtime.
At this point, you may use the Fluent API to query both routes with similar prompts to decide which model performs best for your use case.
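For instance, a simple side-by-side comparison (using the route names from the configuration above and a local gateway URL) might look like:
from mlflow.gateway import query, set_gateway_uri

set_gateway_uri(gateway_uri="http://localhost:5000")

prompt = {"prompt": "Explain the difference between a list and a tuple in Python."}

# Send the same prompt to both completions routes and inspect the responses.
for route_name in ["completions", "completions-gpt4"]:
    print(f"--- {route_name} ---")
    print(query(route_name, prompt))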
If you no longer need a route, you can delete it from the configuration YAML and save your changes. The AI Gateway will automatically remove the route.
Step 10: Use AI Gateway routes for model development
Now that you have created several AI Gateway routes, you can create MLflow Models that query these routes to build application-specific logic using techniques like prompt engineering. For more information, see AI Gateway and MLflow Models.
Concepts
There are several concepts that are referred to within the MLflow AI Gateway APIs, the configuration definitions, examples, and documentation. Becoming familiar with these terms will help in configuring new endpoints (routes) and ease the use of the interface APIs for the AI Gateway.
Providers
The MLflow AI Gateway is designed to support a variety of model providers. A provider represents the source of the machine learning models, such as OpenAI, Anthropic, and so on. Each provider has its specific characteristics and configurations that are encapsulated within the model part of a route in the MLflow AI Gateway.
Supported Provider Models
The table below presents a non-exhaustive list of providers and the route types they support within the MLflow AI Gateway. With the rapid development of LLMs, there is no guarantee that this list will be up to date at all times. However, the associations listed below can be used as a helpful guide when configuring a given route for any newly released model types as they become available with a given provider. N/A means that the provider or the AI Gateway implementation currently doesn’t support the route type.
Provider | llm/v1/completions | llm/v1/chat | llm/v1/embeddings
---|---|---|---
OpenAI | ✓ | ✓ | ✓
MosaicML | ✓ | ✓ | ✓
Anthropic | ✓ | N/A | N/A
Cohere | ✓ | N/A | ✓
Azure OpenAI | ✓ | ✓ | ✓
PaLM | ✓ | ✓ | ✓
MLflow | ✓* | ✓* | ✓**
HuggingFace TGI | N/A | ✓ | N/A
AI21 Labs | ✓ | N/A | N/A
Amazon Bedrock | ✓ | N/A | N/A
Mistral | ✓ | N/A | ✓
TogetherAI | ✓ | ✓ | ✓
† Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
Within each model block in the configuration file, the provider field is used to specify the name of the provider for that model. This is a string value that needs to correspond to a provider the MLflow AI Gateway supports.
Note
* MLflow Model Serving will only work for chat or completions if the output return is in a route-compatible format. The response must conform to either {"predictions": str} or {"predictions": {"candidates": str}}. Any complex return type from a model that does not conform to these structures will raise an exception at query time.
** Embeddings support is only available for models whose response signatures conform to the structured format of {"predictions": List[float]} or {"predictions": List[List[float]]}. Any other return type will raise an exception at query time. FeatureExtractionPipeline in transformers and models using the sentence_transformers flavor will return the correct data structures for the embeddings route.
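For illustration only, here is a minimal, hypothetical sketch of a custom pyfunc model that returns a plain string completion per prompt, keeping the served output in a simple, route-compatible shape (the class name and behavior are assumptions, not a prescribed implementation):
import mlflow.pyfunc


class EchoCompletionsModel(mlflow.pyfunc.PythonModel):
    """Hypothetical model that returns a plain string completion per prompt."""

    def predict(self, context, model_input):
        # Returning simple strings keeps the served response in a shape that a
        # completions-type route can consume.
        return [f"Echo: {prompt}" for prompt in model_input["prompt"]]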
Here’s an example of a provider configuration within a route:
routes:
  - name: chat
    route_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4
      config:
        openai_api_key: $OPENAI_API_KEY
In the above configuration, openai is the provider for the model.
As of now, the MLflow AI Gateway supports the following providers:
mosaicml: This is used for models offered by MosaicML.
openai: This is used for models offered by OpenAI and the Azure integrations for Azure OpenAI and Azure OpenAI with AAD.
anthropic: This is used for models offered by Anthropic.
cohere: This is used for models offered by Cohere.
palm: This is used for models offered by PaLM.
huggingface text generation inference: This is used for models deployed using Huggingface Text Generation Inference.
ai21labs: This is used for models offered by AI21 Labs.
bedrock: This is used for models offered by Amazon Bedrock.
mistral: This is used for models offered by Mistral.
togetherai: This is used for models offered by TogetherAI.
More providers are being added continually. Check the latest version of the MLflow AI Gateway Docs for the most up-to-date list of supported providers.
Remember, the provider you specify must be one that the MLflow AI Gateway supports. If the provider is not supported, the Gateway will return an error when trying to route requests to that provider.
Routes
Routes are central to how the MLflow AI Gateway functions. Each route acts as a proxy endpoint for the user, forwarding requests to the underlying Models and Providers specified in the configuration file.
A route in the MLflow AI Gateway consists of the following fields:
name: This is the unique identifier for the route. This will be part of the URL when making API calls via the MLflow AI Gateway.
route_type: The type of the route corresponds to the type of language model interaction you desire. For instance, llm/v1/completions for text completion operations, llm/v1/embeddings for text embeddings, and llm/v1/chat for chat operations.
model: Defines the model to which this route will forward requests. The model contains the following details:
  provider: Specifies the name of the provider for this model. For example, openai for OpenAI’s GPT-4o models.
  name: The name of the model to use. For example, gpt-4o-mini for OpenAI’s GPT-4o-Mini model.
  config: Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key.
Here’s an example of a route configuration:
routes:
  - name: completions
    route_type: llm/v1/completions
    model:
      provider: openai
      name: gpt-4o-mini
      config:
        openai_api_key: $OPENAI_API_KEY
In the example above, a request sent to the completions route would be forwarded to the gpt-4o-mini model provided by openai.
The routes in the configuration file can be updated at any time, and the MLflow AI Gateway will automatically update its available routes without requiring a restart. This feature provides you with the flexibility to add, remove, or modify routes as your needs change. It enables ‘hot-swapping’ of routes, providing a seamless experience for any applications or services that interact with the MLflow AI Gateway.
When defining routes in the configuration file, ensure that each name is unique to prevent conflicts. Duplicate route names will raise an MlflowException.
Models
The model section within a route specifies which model to use for generating responses. This configuration block needs to contain a name field that specifies the exact model instance to be used. Additionally, a provider needs to be specified, one for which you have a valid API key.
Different endpoint types are often associated with specific models. For instance, the llm/v1/chat and llm/v1/completions endpoints are generally associated with conversational models, while llm/v1/embeddings endpoints would typically be associated with embedding or transformer models. The model you choose should be appropriate for the type of endpoint specified.
Here’s an example of a model name configuration within a route:
routes:
  - name: embeddings
    route_type: llm/v1/embeddings
    model:
      provider: openai
      name: text-embedding-ada-002
      config:
        openai_api_key: $OPENAI_API_KEY
In the above configuration, text-embedding-ada-002 is the model used for the embeddings endpoint.
When specifying a model, it is critical that the provider supports the model you are requesting. For instance, openai as a provider supports models like text-embedding-ada-002, but other providers may not. If the model is not supported by the provider, the MLflow AI Gateway will return an HTTP 4xx error when trying to route requests to that model.
Important
Always check the latest documentation of the specified provider to ensure that the model you want to use is supported for the type of endpoint you’re configuring.
Remember, the model you choose directly affects the results of the responses you’ll get from the API calls. Therefore, choose a model that fits your use-case requirements. For instance, for generating conversational responses, you would typically choose a chat model. Conversely, for generating embeddings of text, you would choose an embedding model.
Configuring the AI Gateway
The MLflow AI Gateway service relies on a user-provided configuration file, written in YAML, that defines the routes and providers available to the service. The configuration file dictates how the gateway interacts with various language model providers and determines the end-points that users can access.
AI Gateway Configuration
The configuration file includes a series of sections, each representing a unique route. Each route section has a name, a type, and a model specification, which includes the model provider, name, and configuration details. The configuration section typically contains the base URL for the API and an environment variable for the API key.
Here is an example of a single-route configuration:
routes:
  - name: chat
    route_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4o-mini
      config:
        openai_api_key: $OPENAI_API_KEY
In this example, we define a route named chat that corresponds to the llm/v1/chat type, which will use the gpt-4o-mini model from OpenAI to return query responses from the OpenAI service.
The Gateway configuration is easy to update: simply edit the configuration file and save your changes, and the MLflow AI Gateway service will update the routes automatically with zero disruption or downtime. This allows you to try out new providers or model types while keeping your applications steady and reliable.
In order to define an API key for a given provider, there are three primary options:
Directly include it in the YAML configuration file.
Use an environment variable to store the API key and reference it in the YAML configuration file.
Define your API key in a file and reference the location of that key-bearing file within the YAML configuration file.
If you choose to include the API key directly, replace $OPENAI_API_KEY in the YAML file with your actual API key.
Warning
The MLflow AI Gateway service provides direct access to billed external LLM services. It is strongly recommended to restrict access to this server. See the section on security for guidance.
If you prefer to use an environment variable (recommended), you can define it in your shell environment. For example:
export OPENAI_API_KEY="your_openai_api_key"
Note: Replace “your_openai_api_key” with your actual OpenAI API key.
AI Gateway Configuration Details
The MLflow AI Gateway service relies on a user-provided configuration file. It defines how the gateway interacts with various language model providers and dictates the routes that users can access.
The configuration file is written in YAML and includes a series of sections, each representing a unique route. Each route section has a name, a type, and a model specification, which includes the provider, model name, and provider-specific configuration details.
Here are the details of each configuration parameter:
General Configuration Parameters
routes: This is a list of route configurations. Each route represents a unique endpoint that maps to a particular language model service.
Each route has the following configuration parameters:
name: This is the name of the route. It needs to be a unique name without spaces or any non-alphanumeric characters other than hyphen and underscore.
route_type: This specifies the type of service offered by this route. This determines the interface for inputs to a route and the returned outputs. Current supported route types are:
“llm/v1/completions”
“llm/v1/chat”
“llm/v1/embeddings”
model: This defines the provider-specific details of the language model. It contains the following fields:
provider: This indicates the provider of the AI model. It accepts the following values:
“openai”
“mosaicml”
“anthropic”
“cohere”
“palm”
“azure” / “azuread”
“mlflow-model-serving”
“huggingface-text-generation-inference”
“ai21labs”
“bedrock”
“mistral”
“togetherai”
name: This is an optional field to specify the name of the model.
config: This contains provider-specific configuration details.
Provider-Specific Configuration Parameters
OpenAI
Configuration Parameter | Required | Default | Description
---|---|---|---
openai_api_key | Yes | | This is the API key for the OpenAI service.
openai_api_type | No | | This is an optional field to specify the type of OpenAI API to use.
openai_api_base | No | https://api.openai.com/v1 | This is the base URL for the OpenAI API.
openai_api_version | No | | This is an optional field to specify the OpenAI API version.
openai_organization | No | | This is an optional field to specify the organization in OpenAI.
MosaicML
Configuration Parameter | Required | Default | Description
---|---|---|---
mosaicml_api_key | Yes | N/A | This is the API key for the MosaicML service.
Cohere
Configuration Parameter | Required | Default | Description
---|---|---|---
cohere_api_key | Yes | N/A | This is the API key for the Cohere service.
HuggingFace Text Generation Inference
Configuration Parameter | Required | Default | Description
---|---|---|---
hf_server_url | Yes | N/A | This is the URL of the Huggingface TGI Server.
PaLM
Configuration Parameter | Required | Default | Description
---|---|---|---
palm_api_key | Yes | N/A | This is the API key for the PaLM service.
AI21 Labs
Configuration Parameter | Required | Default | Description
---|---|---|---
ai21labs_api_key | Yes | N/A | This is the API key for the AI21 Labs service.
Anthropic
Configuration Parameter | Required | Default | Description
---|---|---|---
anthropic_api_key | Yes | N/A | This is the API key for the Anthropic service.
Amazon Bedrock
Top-level model configuration for Amazon Bedrock routes must be one of the following two supported authentication modes: key-based or role-based.
Configuration Parameter | Required | Default | Description
---|---|---|---
aws_config | No | | An object with either the key-based or role-based schema below.
Mistral
Configuration Parameter | Required | Default | Description
---|---|---|---
mistral_api_key | Yes | N/A | This is the API key for the Mistral service.
TogetherAI
Configuration Parameter | Required | Default | Description
---|---|---|---
togetherai_api_key | Yes | N/A | This is the API key for the TogetherAI service.
To use key-based authentication with Amazon Bedrock, define a route with the required fields below.
Note
If a configured route is used purely for development or testing, an IAM user or a temporary, short-lived standard IAM role is recommended. For production deployments, a standard long-expiry IAM role is recommended to ensure that the route can handle authentication for a long period. If the authentication expires and a new set of keys needs to be supplied, the route must be recreated in order to persist the new keys.
Configuration Parameter | Required | Default | Description
---|---|---|---
aws_region | No | AWS_REGION/AWS_DEFAULT_REGION | The AWS region to use for Bedrock access.
aws_secret_access_key | Yes | | AWS secret access key for the IAM user/role authorized to use Bedrock.
aws_access_key_id | Yes | | AWS access key ID for the IAM user/role authorized to use Bedrock.
aws_session_token | No | None | Optional session token, if required.
Alternatively, for role-based authentication, an Amazon Bedrock route can be defined and initialized with an IAM role ARN that is authorized to access Bedrock. The MLflow AI Gateway will attempt to assume this role using the standard credential provider chain and will renew the role credentials if they have expired.
Configuration Parameter | Required | Default | Description
---|---|---|---
aws_region | No | AWS_REGION/AWS_DEFAULT_REGION | The AWS region to use for Bedrock access.
aws_role_arn | Yes | | An AWS role authorized to use Bedrock. The standard credential provider chain must be able to find credentials authorized to assume this role.
session_length_seconds | No | 900 | The length of the session to request.
MLflow Model Serving
Configuration Parameter | Required | Default | Description
---|---|---|---
model_server_url | Yes | N/A | This is the URL of the MLflow Model Server.
Note that with MLflow model serving, the name parameter for the model definition is not used for validation and is only present for reference purposes. This alias can be useful for tracing a particular version or route definition back to a deployed model. You may choose any name you wish, provided that it is JSON serializable.
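As a hedged illustration (the route and model names here are placeholders), a route backed by an MLflow Model Server could be configured along these lines:
routes:
  - name: my-served-completions
    route_type: llm/v1/completions
    model:
      provider: mlflow-model-serving
      name: my-served-model        # reference alias only; not validated
      config:
        model_server_url: http://127.0.0.1:5001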
Azure OpenAI
Azure provides two different mechanisms for integrating with OpenAI, each corresponding to a different type of security validation. One relies on an access token for validation, referred to as azure, while the other uses Azure Active Directory (Azure AD) integration for authentication, termed azuread.
To match your users’ interaction and security access requirements, set the openai_api_type parameter to the preferred security validation model. This ensures seamless interaction and reliable security for your Azure OpenAI integration.
Configuration Parameter | Required | Default | Description
---|---|---|---
openai_api_key | Yes | | This is the API key for the Azure OpenAI service.
openai_api_type | Yes | | This field must be either azure or azuread.
openai_api_base | Yes | | This is the base URL for the Azure OpenAI API service provided by Azure.
openai_api_version | Yes | | The version of the Azure OpenAI service to utilize, specified by a date.
openai_deployment_name | Yes | | This is the name of the deployment resource for the Azure OpenAI service.
openai_organization | No | | This is an optional field to specify the organization in OpenAI.
An example configuration for Azure OpenAI is:
routes:
  - name: completions
    route_type: llm/v1/completions
    model:
      provider: openai
      name: gpt-35-turbo
      config:
        openai_api_type: "azuread"
        openai_api_key: $AZURE_AAD_TOKEN
        openai_deployment_name: "{your_deployment_name}"
        openai_api_base: "https://{your_resource_name}-azureopenai.openai.azure.com/"
        openai_api_version: "2023-05-15"
Note
Azure OpenAI has distinct features as compared with the direct OpenAI service. For an overview, please see the comparison documentation.
For specifying an API key, there are three options:
(Preferred) Use an environment variable to store the API key and reference it in the YAML configuration file. This is denoted by a $ symbol before the name of the environment variable.
(Preferred) Define the API key in a file and reference the location of that key-bearing file within the YAML configuration file.
Directly include it in the YAML configuration file.
Important
The use of environment variables or file-based keys is recommended for better security practices. If the API key is directly included in the configuration file, ensure that the file is stored in a secure location and is appropriately access controlled, as it contains sensitive API keys.
Querying the AI Gateway
Once the MLflow AI Gateway server has been configured and started, it is ready to receive traffic from users.
Standard Query Parameters
The MLflow AI Gateway defines standard parameters for chat, completions, and embeddings that can be used when querying any route regardless of its provider. Each parameter has a standard range and default value. When querying a route with a particular provider, the MLflow AI Gateway automatically scales parameter values according to the provider’s value ranges for that parameter.
Completions
The standard parameters for completions routes with type llm/v1/completions are:
Query Parameter | Type | Required | Default | Description
---|---|---|---|---
prompt | string | Yes | N/A | The prompt for which to generate completions.
n | integer | No | 1 | The number of completions to generate for the specified prompt, between 1 and 5.
temperature | float | No | 0.0 | The sampling temperature to use, between 0 and 1. Higher values will make the output more random, and lower values will make the output more deterministic.
max_tokens | integer | No | None | The maximum completion length, between 1 and infinity (unlimited).
stop | array[string] | No | None | Sequences where the model should stop generating tokens and return the completion.
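For example, a completions query that exercises these standard parameters via the fluent API might look like the following (the route name and gateway URL are assumptions based on the earlier quickstart configuration):
from mlflow.gateway import query, set_gateway_uri

set_gateway_uri(gateway_uri="http://localhost:5000")

response = query(
    "completions",
    {
        "prompt": "Summarize the benefits of centralized API key management.",
        "n": 1,
        "temperature": 0.2,
        "max_tokens": 200,
        "stop": ["\n\n"],
    },
)
print(response)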
Chat
The standard parameters for chat routes with type llm/v1/chat are:
Query Parameter | Type | Required | Default | Description
---|---|---|---|---
messages | array[message] | Yes | N/A | A list of messages in a conversation from which to generate a new message (chat completion). For information about the message structure, see Messages.
n | integer | No | 1 | The number of chat completions to generate for the specified prompt, between 1 and 5.
temperature | float | No | 0.0 | The sampling temperature to use, between 0 and 1. Higher values will make the output more random, and lower values will make the output more deterministic.
max_tokens | integer | No | None | The maximum completion length, between 1 and infinity (unlimited).
stop | array[string] | No | None | Sequences where the model should stop generating tokens and return the chat completion.
Additional Query Parameters
In addition to the Standard Query Parameters, you can pass any additional parameters supported by the route’s provider as part of your query. For example:
logit_bias (supported by OpenAI, Cohere)
top_k (supported by MosaicML, Anthropic, PaLM, Cohere)
frequency_penalty (supported by OpenAI, Cohere, AI21 Labs)
presence_penalty (supported by OpenAI, Cohere, AI21 Labs)
The following parameters are not allowed:
stream is not supported. Setting this parameter on any provider will not work currently.
Below is an example of submitting a query request to an MLflow AI Gateway route using additional parameters:
from mlflow.gateway import query

data = {
    "prompt": (
        "What would happen if an asteroid the size of "
        "a basketball encountered the Earth traveling at 0.5c? "
        "Please provide your answer in .rst format for the purposes of documentation."
    ),
    "temperature": 0.5,
    "max_tokens": 1000,
    "n": 1,
    "frequency_penalty": 0.2,
    "presence_penalty": 0.2,
}

query(route="completions-gpt4", data=data)
The results of the query are:
{
    "id": "chatcmpl-8Pr33fsCAtD2L4oZHlyfOkiYHLapc",
    "object": "text_completion",
    "created": 1701172809,
    "model": "gpt-4-0613",
    "choices": [
        {
            "index": 0,
            "text": "If an asteroid the size of a basketball ...",
        }
    ],
    "usage": {
        "prompt_tokens": 43,
        "completion_tokens": 592,
        "total_tokens": 635,
    },
}
FastAPI Documentation (“/docs”)
FastAPI, the framework used for building the MLflow AI Gateway, provides an automatic interactive API documentation interface, which is accessible at the “/docs” endpoint (e.g., http://my.gateway:9000/docs). This interactive interface is very handy for exploring and testing the available API endpoints.
As a convenience, accessing the root URL (e.g., http://my.gateway:9000) redirects to this “/docs” endpoint.
MLflow Python Client APIs
MlflowGatewayClient is the user-facing client API that is used to interact with the MLflow AI Gateway. It abstracts the HTTP requests to the Gateway via a simple, easy-to-use Python API.
The fluent API is a higher-level interface that supports setting the Gateway URI once and using simple functions to interact with the AI Gateway server.
Fluent API
For the fluent API, here are some examples:
Set the Gateway URI:
Before using the Fluent API, the gateway URI must be set via set_gateway_uri(). Alternatively to calling the set_gateway_uri function directly, the environment variable MLFLOW_GATEWAY_URI can be set, achieving the same session-level persistence for all fluent API usages.

from mlflow.gateway import set_gateway_uri

set_gateway_uri(gateway_uri="http://my.gateway:7000")

Query a route:
The query() function queries the specified route and returns the response from the provider in a standardized format. The data structure you send in the query depends on the route.

from mlflow.gateway import query

response = query(
    "embeddings",
    {"input": ["It was the best of times", "It was the worst of times"]},
)

print(response)
Client API
To use the MlflowGatewayClient API, see the examples below for the available API methods:
Create an MlflowGatewayClient:

from mlflow.gateway import MlflowGatewayClient

gateway_client = MlflowGatewayClient("http://my.gateway:8888")

List all routes:
The search_routes() method returns a list of all routes.

routes = gateway_client.search_routes()
for route in routes:
    print(route)

Query a route:
The query() method submits a query to a configured provider route. The data structure you send in the query depends on the route.

response = gateway_client.query(
    "chat",
    {"messages": [{"role": "user", "content": "Tell me a joke about rabbits"}]},
)
print(response)
LangChain Integration
LangChain supports an integration for the MLflow AI Gateway. This integration enables users to apply prompt engineering, retrieval-augmented generation, and other techniques with LLMs through the gateway.
import mlflow
from langchain import LLMChain, PromptTemplate
from langchain.llms import MlflowAIGateway

gateway = MlflowAIGateway(
    gateway_uri="http://127.0.0.1:5000",
    route="completions",
    params={
        "temperature": 0.0,
        "top_p": 0.1,
    },
)

llm_chain = LLMChain(
    llm=gateway,
    prompt=PromptTemplate(
        input_variables=["adjective"],
        template="Tell me a {adjective} joke",
    ),
)

result = llm_chain.run(adjective="funny")
print(result)

with mlflow.start_run():
    model_info = mlflow.langchain.log_model(llm_chain, "model")

model = mlflow.pyfunc.load_model(model_info.model_uri)
print(model.predict([{"adjective": "funny"}]))
MLflow Models
Interfacing with MLflow Models can be done in two ways. With the use of a custom PyFunc Model, a query can be issued directly to an AI Gateway endpoint and used in a broader context within a model. Data may be augmented, manipulated, or used in a mixture of experts paradigm. The other means of utilizing the AI Gateway along with MLflow Models is to define a served MLflow model directly as a route within the AI Gateway.
Using the AI Gateway to Query a served MLflow Model
For a full walkthrough and example of using the MLflow serving integration to query a model directly through the MLflow AI Gateway, please see the full example. Within the guide, you will see the entire end-to-end process of serving multiple models from different servers and configuring an MLflow AI Gateway server instance to provide a single unified point to handle queries from.
Using an MLflow Model to Query the AI Gateway
You can also build and deploy MLflow Models that call the MLflow AI Gateway.
The example below demonstrates how to use an AI Gateway server from within a custom pyfunc model.
Note
The custom Model shown in the example below uses environment variables for the AI Gateway server’s URI. These values can also be set manually within the definition or applied via mlflow.gateway.get_gateway_uri() after the URI has been set. For the example below, the value for MLFLOW_GATEWAY_URI is http://127.0.0.1:5000/. For an actual deployment use case, this value would be set to the configured production deployment server.
import os

import pandas as pd

import mlflow


def predict(data):
    from mlflow.gateway import MlflowGatewayClient

    client = MlflowGatewayClient(os.environ["MLFLOW_GATEWAY_URI"])
    payload = data.to_dict(orient="records")
    return [
        client.query(route="completions-claude", data=query)["choices"][0]["text"]
        for query in payload
    ]


input_example = pd.DataFrame.from_dict(
    {"prompt": ["Where is the moon?", "What is a comet made of?"]}
)

signature = mlflow.models.infer_signature(
    input_example, ["Above our heads.", "It's mostly ice and rocks."]
)

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        python_model=predict,
        registered_model_name="anthropic_completions",
        artifact_path="anthropic_completions",
        input_example=input_example,
        signature=signature,
    )

df = pd.DataFrame.from_dict(
    {
        "prompt": ["Tell me about Jupiter", "Tell me about Saturn"],
        "temperature": 0.6,
        "max_records": 500,
    }
)

loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)
print(loaded_model.predict(df))
This custom MLflow model can be used in the same way as any other MLflow model. It can be used within a spark_udf, used with mlflow.evaluate(), or deployed like any other model.
REST API
The REST API allows you to send HTTP requests directly to the MLflow AI Gateway server. This is useful if you’re not using Python or if you prefer to interact with the Gateway using HTTP directly.
Here are some examples for how you might use curl to interact with the Gateway:
Get information about a particular route:
GET /api/2.0/gateway/routes/{name}
This endpoint returns a serialized representation of the Route data structure, providing information about the name and type, as well as the model details for the requested route endpoint.

curl -X GET http://my.gateway:8888/api/2.0/gateway/routes/embeddings

List all routes:
GET /api/2.0/gateway/routes/
This endpoint returns a list of all routes.

curl -X GET http://my.gateway:8888/api/2.0/gateway/routes/

Query a route:
POST /gateway/{route}/invocations
This endpoint allows you to submit a query to a configured provider route. The data structure you send in the query depends on the route. Here are examples for the “completions”, “chat”, and “embeddings” routes:

Completions

curl -X POST http://my.gateway:8888/gateway/completions/invocations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Describe the probability distribution of the decay chain of U-235"}'

Chat

curl -X POST http://my.gateway:8888/gateway/chat/invocations \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Can you write a limerick about orange flavored popsicles?"}]}'

Embeddings

curl -X POST http://my.gateway:8888/gateway/embeddings/invocations \
  -H "Content-Type: application/json" \
  -d '{"input": ["I would like to return my shipment of beanie babies, please", "Can I please speak to a human now?"]}'
Note: Remember to replace http://my.gateway:8888 with the URL of your actual MLflow AI Gateway server.
AI Gateway Security Considerations
Remember to ensure secure access to the system on which the MLflow AI Gateway service is running in order to protect access to the configured API keys.
An effective way to secure your MLflow AI Gateway service is by placing it behind a reverse proxy. This will allow the reverse proxy to handle incoming requests and forward them to the MLflow AI Gateway. The reverse proxy effectively shields your application from direct exposure to Internet traffic.
A popular choice for a reverse proxy is Nginx. In addition to handling the traffic to your application, Nginx can also serve static files and load balance the traffic if you have multiple instances of your application running.
Furthermore, to ensure the integrity and confidentiality of data between the client and the server, it’s highly recommended to enable HTTPS on your reverse proxy.
In addition to the reverse proxy, it’s also recommended to add an authentication layer before the requests reach the MLflow AI Gateway. This could be HTTP Basic Authentication, OAuth, or any other method that suits your needs.
For example, here’s a simple configuration for Nginx with Basic Authentication:
http {
    server {
        listen 80;

        location / {
            auth_basic "Restricted Content";
            auth_basic_user_file /etc/nginx/.htpasswd;
            proxy_pass http://localhost:5000;  # Replace with the MLflow AI Gateway service port
        }
    }
}
In this example, /etc/nginx/.htpasswd is a file that contains the username and password for authentication.
These measures, together with a proper network setup, can significantly improve the security of your system and ensure that only authorized users have access to submit requests to your LLM services.
LangChain Integration
LangChain supports an integration for MLflow AI Gateway. See https://python.langchain.com/docs/ecosystem/integrations/mlflow_ai_gateway for more information.