mlflow.types

The mlflow.types module defines data types and utilities used by other MLflow components to describe model interfaces independently of any particular framework or language.

class mlflow.types.ColSpec(type: Union[mlflow.types.schema.DataType, mlflow.types.schema.Array, mlflow.types.schema.Object, str], name: Optional[str] = None, optional: Optional[bool] = None, required: Optional[bool] = None)[source]

Bases: object

Specification of name and type of a single column in a dataset.

classmethod from_json_dict(**kwargs)[source]

Deserialize from a json loaded dictionary. The dictionary is expected to contain a type key, plus optional name and required keys.

property name

The column name or None if the column is unnamed.

property optional

Whether this column is optional.

Warning

Deprecated. optional is deprecated in favor of required.

Note

Experimental: This property may change or be removed in a future release without warning.

property required

Whether this column is required.

Note

Experimental: This property may change or be removed in a future release without warning.

property type

The column data type.

class mlflow.types.DataType(value)[source]

Bases: enum.Enum

MLflow data types.

binary = 7

Sequence of raw bytes.

boolean = 1

Logical data (True, False).

datetime = 8

64b datetime data.

double = 5

64b floating point numbers.

float = 4

32b floating point numbers.

integer = 2

32b signed integer numbers.

long = 3

64b signed integer numbers.

string = 6

Text data.

to_numpy() → numpy.dtype[source]

Get equivalent numpy data type.

to_pandas() → numpy.dtype[source]

Get equivalent pandas data type.

to_python()[source]

Get equivalent python data type.

class mlflow.types.ParamSchema(params: List[mlflow.types.schema.ParamSpec])[source]

Bases: object

Note

Experimental: This class may change or be removed in a future release without warning.

Specification of parameters applicable to the model. ParamSchema is represented as a list of ParamSpec.

classmethod from_json(json_str: str)[source]

Deserialize from a json string.

property params

Representation of ParamSchema as a list of ParamSpec.

to_dict() → List[Dict[str, Any]][source]

Serialize into a jsonable dictionary.

to_json() → str[source]

Serialize into json string.

class mlflow.types.ParamSpec(name: str, dtype: Union[mlflow.types.schema.DataType, str], default: Optional[Union[mlflow.types.schema.DataType, List[mlflow.types.schema.DataType]]], shape: Optional[Tuple[int, ...]] = None)[source]

Bases: object

Note

Experimental: This class may change or be removed in a future release without warning.

Specification used to represent parameters for the model.

class ParamSpecTypedDict(*args, **kwargs)[source]

Bases: dict

property default

Default value of the parameter.

property dtype

The parameter data type.

classmethod enforce_param_datatype(name, value, dtype: mlflow.types.schema.DataType)[source]

Enforce the value matches the data type.

The following type conversions are allowed:

  1. int -> long, float, double

  2. long -> float, double

  3. float -> double

  4. any -> datetime (try conversion)

Any other type mismatch will raise error.

Parameters
  • name – parameter name

  • value – parameter value

  • dtype – expected data type

classmethod from_json_dict(**kwargs)[source]

Deserialize from a json loaded dictionary. The dictionary is expected to contain name, type and default keys.

property name

The name of the parameter.

property shape

The parameter shape. If shape is None, the parameter is a scalar.

classmethod validate_type_and_shape(spec: str, value: Optional[Union[mlflow.types.schema.DataType, List[mlflow.types.schema.DataType]]], value_type: mlflow.types.schema.DataType, shape: Optional[Tuple[int, ...]])[source]

Validate that the value has the expected type and shape.

class mlflow.types.Schema(inputs: List[Union[mlflow.types.schema.ColSpec, mlflow.types.schema.TensorSpec]])[source]

Bases: object

Specification of a dataset.

Schema is represented as a list of ColSpec or TensorSpec. A combination of ColSpec and TensorSpec is not allowed.

The dataset represented by a schema can be named, with unique non-empty names for every input. In the case of ColSpec, the dataset columns can be unnamed with an implicit integer index defined by their list indices. Combinations of named and unnamed data inputs are not allowed.

as_spark_schema()[source]

Convert to Spark schema. If this schema is a single unnamed column, it is converted directly to the corresponding Spark data type; otherwise it is returned as a struct (missing column names are filled with an integer sequence). Unsupported by TensorSpec.

classmethod from_json(json_str: str)[source]

Deserialize from a json string.

has_input_names() → bool[source]

Return true iff this schema declares names, false otherwise.

input_dict() → Dict[str, Union[mlflow.types.schema.ColSpec, mlflow.types.schema.TensorSpec]][source]

Maps column names to inputs, iff this schema declares names.

input_names() → List[Union[str, int]][source]

Get list of data names or range of indices if the schema has no names.

input_types() → List[Union[mlflow.types.schema.DataType, numpy.dtype, mlflow.types.schema.Array, mlflow.types.schema.Object]][source]

Get types for each column in the schema.

input_types_dict() → Dict[str, Union[mlflow.types.schema.DataType, numpy.dtype, mlflow.types.schema.Array, mlflow.types.schema.Object]][source]

Maps column names to types, iff this schema declares names.

property inputs

Representation of a dataset that defines this schema.

is_tensor_spec() → bool[source]

Return true iff this schema is specified using TensorSpec.

numpy_types() → List[numpy.dtype][source]

Convenience shortcut to get the datatypes as numpy types.

optional_input_names() → List[Union[str, int]][source]

Note

Experimental: This function may change or be removed in a future release without warning.

Get list of optional data names or range of indices if schema has no names.

pandas_types() → List[numpy.dtype][source]

Convenience shortcut to get the datatypes as pandas types. Unsupported by TensorSpec.

required_input_names() → List[Union[str, int]][source]

Get list of required data names or range of indices if schema has no names.

to_dict() → List[Dict[str, Any]][source]

Serialize into a jsonable dictionary.

to_json() → str[source]

Serialize into json string.

class mlflow.types.TensorSpec(type: numpy.dtype, shape: Union[tuple, list], name: Optional[str] = None)[source]

Bases: object

Specification used to represent a dataset stored as a Tensor.

classmethod from_json_dict(**kwargs)[source]

Deserialize from a json loaded dictionary. The dictionary is expected to contain type and tensor-spec keys.

property name

The tensor name or None if the tensor is unnamed.

property required

Whether this tensor is required.

Note

Experimental: This property may change or be removed in a future release without warning.

property shape

The tensor shape.

property type

A unique character code for each of the 21 different numpy built-in types. See https://numpy.org/devdocs/reference/generated/numpy.dtype.html#numpy.dtype for details.

class mlflow.types.llm.ChatChoice(index: int, message: mlflow.types.llm.ChatMessage, finish_reason: str = 'stop', logprobs: Optional[mlflow.types.llm.ChatChoiceLogProbs] = None)[source]

A single chat response generated by the model. ref: https://platform.openai.com/docs/api-reference/chat/object

Parameters
  • index (int) – The index of the response in the list of responses.

  • message (ChatMessage) – The message that was generated.

  • finish_reason (str) – The reason why generation stopped. Optional, defaults to "stop"

  • logprobs (ChatChoiceLogProbs) – Log probability information for the choice.

class mlflow.types.llm.ChatChoiceLogProbs(content: Optional[List[mlflow.types.llm.TokenLogProb]] = None)[source]

Log probability information for the choice.

Parameters

content – A list of message content tokens with log probability information.

class mlflow.types.llm.ChatMessage(role: str, content: str, name: Optional[str] = None)[source]

A message in a chat request or response.

Parameters
  • role (str) – The role of the entity that sent the message (e.g. "user", "system").

  • content (str) – The content of the message.

  • name (str) – The name of the entity that sent the message. Optional.

class mlflow.types.llm.ChatParams(temperature: float = 1.0, max_tokens: Optional[int] = None, stop: Optional[List[str]] = None, n: int = 1, stream: bool = False, top_p: Optional[float] = None, top_k: Optional[int] = None, frequency_penalty: Optional[float] = None, presence_penalty: Optional[float] = None)[source]

Common parameters used for chat inference

Parameters
  • temperature (float) – A param used to control randomness and creativity during inference. Optional, defaults to 1.0

  • max_tokens (int) – The maximum number of new tokens to generate. Optional, defaults to None (unlimited)

  • stop (List[str]) – A list of tokens at which to stop generation. Optional, defaults to None

  • n (int) – The number of responses to generate. Optional, defaults to 1

  • stream (bool) – Whether to stream back responses as they are generated. Optional, defaults to False

  • top_p (float) – An optional param to control sampling with temperature; the model considers the results of the tokens with top_p probability mass. E.g., 0.1 means only the tokens comprising the top 10% probability mass are considered.

  • top_k (int) – An optional param for reducing the vocabulary size to the top k tokens (sorted in descending order by their probabilities).

  • frequency_penalty (float) – An optional param of positive or negative value; positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

  • presence_penalty (float) – An optional param of positive or negative value; positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

class mlflow.types.llm.ChatRequest(temperature: float = 1.0, max_tokens: Optional[int] = None, stop: Optional[List[str]] = None, n: int = 1, stream: bool = False, top_p: Optional[float] = None, top_k: Optional[int] = None, frequency_penalty: Optional[float] = None, presence_penalty: Optional[float] = None, messages: List[mlflow.types.llm.ChatMessage] = <factory>)[source]

Format of the request object expected by the chat endpoint.

Parameters
  • messages (List[ChatMessage]) – A list of ChatMessage that will be passed to the model. Optional, defaults to empty list ([])

  • temperature (float) – A param used to control randomness and creativity during inference. Optional, defaults to 1.0

  • max_tokens (int) – The maximum number of new tokens to generate. Optional, defaults to None (unlimited)

  • stop (List[str]) – A list of tokens at which to stop generation. Optional, defaults to None

  • n (int) – The number of responses to generate. Optional, defaults to 1

  • stream (bool) – Whether to stream back responses as they are generated. Optional, defaults to False

class mlflow.types.llm.ChatResponse(choices: List[mlflow.types.llm.ChatChoice], usage: mlflow.types.llm.TokenUsageStats, id: Optional[str] = None, model: Optional[str] = None, object: Literal[chat.completion] = 'chat.completion', created: int = <factory>)[source]

The full response object returned by the chat endpoint.

Parameters
  • choices (List[ChatChoice]) – A list of ChatChoice objects containing the generated responses

  • usage (TokenUsageStats) – An object describing the tokens used by the request.

  • id (str) – The ID of the response. Optional, defaults to None

  • model (str) – The name of the model used. Optional, defaults to None

  • object (str) – The object type. The value should always be ‘chat.completion’

  • created (int) – The time the response was created. Optional, defaults to the current time.

class mlflow.types.llm.TokenLogProb(token: str, logprob: float, top_logprobs: List[mlflow.types.llm.TopTokenLogProb], bytes: Optional[List[int]] = None)[source]

Message content token with log probability information.

Parameters
  • token – The token.

  • logprob – The log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely.

  • bytes – A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be null if there is no bytes representation for the token.

  • top_logprobs – List of the most likely tokens and their log probability, at this token position. In rare cases, there may be fewer than the number of requested top_logprobs returned.

class mlflow.types.llm.TokenUsageStats(prompt_tokens: Optional[int] = None, completion_tokens: Optional[int] = None, total_tokens: Optional[int] = None)[source]

Stats about the number of tokens used during inference.

Parameters
  • prompt_tokens (int) – The number of tokens in the prompt. Optional, defaults to None

  • completion_tokens (int) – The number of tokens in the generated completion. Optional, defaults to None

  • total_tokens (int) – The total number of tokens used. Optional, defaults to None

class mlflow.types.llm.TopTokenLogProb(token: str, logprob: float, bytes: Optional[List[int]] = None)[source]

Token and its log probability.

Parameters
  • token – The token.

  • logprob – The log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely.

  • bytes – A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be null if there is no bytes representation for the token.