Introduction to Sentence Transformers and MLflow
Welcome to our tutorial on leveraging Sentence Transformers with MLflow for advanced natural language processing and model management.
Learning Objectives
- Set up a pipeline for sentence embeddings with `sentence-transformers`.
- Log models and configurations using MLflow.
- Understand and apply model signatures in MLflow to `sentence-transformers`.
- Deploy and use models for inference with MLflow's features.
What are Sentence Transformers?
Sentence Transformers (the `sentence-transformers` library), built on top of the Hugging Face Transformers library, are designed for generating semantically rich sentence embeddings. They use transformer models such as BERT and RoBERTa, fine-tuned so that semantically similar sentences map to nearby vectors. The resulting high-quality sentence-level embeddings are well suited to tasks such as semantic search and text clustering.
Benefits of Integrating MLflow with Sentence Transformers
Combining MLflow with Sentence Transformers enhances NLP projects by:
- Streamlining experiment management and logging.
- Offering better control over model versions and configurations.
- Ensuring reproducibility of results and model predictions.
- Simplifying the deployment process in production environments.
This integration empowers efficient tracking, management, and deployment of NLP applications.
```python
# Disable tokenizers warnings when constructing pipelines
%env TOKENIZERS_PARALLELISM=false

import warnings

# Suppress less-than-useful UserWarnings (e.g., from setuptools and pydantic).
# Note that this filter ignores all UserWarnings, not only those.
warnings.filterwarnings("ignore", category=UserWarning)
```

```
env: TOKENIZERS_PARALLELISM=false
```
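`%env` is an IPython/Jupyter magic, so it does not work in a plain Python script. A minimal equivalent, assuming you run the tutorial as a `.py` file instead of a notebook, is to set the variable through `os.environ`, ideally before `sentence_transformers` (and with it the Hugging Face `tokenizers` library) is imported:

```python
import os

# Plain-Python equivalent of `%env TOKENIZERS_PARALLELISM=false`.
# Set it early, before sentence_transformers / tokenizers are imported.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(f"TOKENIZERS_PARALLELISM={os.environ['TOKENIZERS_PARALLELISM']}")
```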
Setting Up the Environment for Sentence Embedding
Start by establishing the core working environment for Sentence Transformers and MLflow.
Key Steps for Initialization
- Import the necessary libraries: `SentenceTransformer` and `mlflow`.
- Initialize the `"all-MiniLM-L6-v2"` Sentence Transformer model.
Model Initialization
The compact and efficient `"all-MiniLM-L6-v2"` model is chosen for its effectiveness in generating meaningful sentence embeddings; it maps each sentence to a 384-dimensional vector. Explore more models on the Hugging Face Hub.
Purpose of the Model
This model transforms sentences into semantically rich embeddings that can be applied to NLP tasks such as semantic search and clustering.
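To make "semantic search" concrete: once sentences are embedded, relevance is commonly scored with cosine similarity between vectors. The sketch below uses tiny hand-made 3-dimensional vectors as stand-ins for real 384-dimensional model output; the corpus, the vectors, and the `cosine_similarity` / `rank_by_similarity` helpers are illustrative assumptions, not part of `sentence-transformers` or MLflow.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus):
    """Return (text, score) pairs sorted from most to least similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for real model output.
corpus = {
    "I enjoy apple pie.": [0.9, 0.1, 0.0],
    "Cherry tarts are great.": [0.8, 0.3, 0.1],
    "The train was late.": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]

for text, score in rank_by_similarity(query, corpus):
    print(f"{score:.3f}  {text}")
```

With a real model, the vectors would come from `model.encode(...)`; the ranking logic itself would stay the same.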
```python
from sentence_transformers import SentenceTransformer

import mlflow

model = SentenceTransformer("all-MiniLM-L6-v2")
```
Defining the Model Signature with MLflow
Defining the model signature is a crucial step in setting up our Sentence Transformer model for consistent and expected behavior during inference.
Steps for Signature Definition
- Prepare Example Sentences: Define example sentences to demonstrate the model's input and output formats.
- Generate Model Signature: Pass the example input and the model's output for it to the `mlflow.models.infer_signature` function, which infers the signature automatically.
Importance of the Model Signature
- Clarity in Data Formats: Documents the data types and structures the model expects and produces.
- Model Deployment and Usage: Ensures that a deployed model receives inputs in the correct format and produces the expected outputs.
- Error Prevention: Helps prevent errors during model inference by enforcing consistent data formats.
NOTE: At inference time, a `List[str]` input is equivalent to a `str` input. The MLflow flavor uses a `ColSpec[str]` definition for the input type.
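The note above can be pictured as a normalization step in which a bare string is treated as a one-element batch. The following is a hypothetical helper, `normalize_sentence_input`, written only to illustrate that idea; it is not MLflow's actual schema-enforcement code.

```python
def normalize_sentence_input(data):
    """Hypothetical: coerce a str or a list of str into a list of str."""
    if isinstance(data, str):
        # A single sentence becomes a batch of one.
        return [data]
    if isinstance(data, list) and all(isinstance(item, str) for item in data):
        return data
    raise TypeError("Expected a str or a list of str")


print(normalize_sentence_input("A sentence to encode."))
print(normalize_sentence_input(["A sentence to encode.", "Another sentence to encode."]))
```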
```python
example_sentences = ["A sentence to encode.", "Another sentence to encode."]

# Infer the signature of the model by providing an input example and the resulting prediction output.
# We're not including any custom inference parameters in this example, but you can include them as a third argument
# to infer_signature(), as you will see in the advanced tutorials for Sentence Transformers.
signature = mlflow.models.infer_signature(
    model_input=example_sentences,
    model_output=model.encode(example_sentences),
)

# Display the signature
signature
```

```
inputs: [string] outputs: [Tensor('float32', (-1, 384))] params: None
```
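In the printed signature, `Tensor('float32', (-1, 384))` says each prediction is a float32 array with 384 columns, while `-1` marks a variable-size dimension: here, the number of input sentences. A minimal sketch of matching a shape against such a spec, treating `-1` as a wildcard; the `shape_matches` helper is hypothetical and not part of the MLflow API.

```python
def shape_matches(actual_shape, spec_shape):
    """Return True if actual_shape fits spec_shape, treating -1 as 'any size'."""
    if len(actual_shape) != len(spec_shape):
        return False
    return all(s == -1 or a == s for a, s in zip(actual_shape, spec_shape))


spec = (-1, 384)
print(shape_matches((2, 384), spec))   # True: two sentences
print(shape_matches((10, 384), spec))  # True: ten sentences
print(shape_matches((2, 768), spec))   # False: wrong embedding width
```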
Creating an Experiment
We create a dedicated MLflow Experiment so that the run we log our model to gets its own, contextually relevant entry instead of landing in the default experiment.
```python
# If you are running this tutorial in local mode, leave the next line commented out.
# Otherwise, uncomment the following line and set your tracking URI to your local or remote tracking server.

# mlflow.set_tracking_uri("http://127.0.0.1:8080")

mlflow.set_experiment("Introduction to Sentence Transformers")
```

```
<Experiment: artifact_location='file:///Users/benjamin.wilson/repos/mlflow-fork/mlflow/docs/source/llms/sentence-transformers/tutorials/quickstart/mlruns/469990615226680434', creation_time=1701280211449, experiment_id='469990615226680434', last_update_time=1701280211449, lifecycle_stage='active', name='Introduction to Sentence Transformers', tags={}>
```
Logging the Sentence Transformer Model with MLflow
With the model initialized and its signature defined, the next step is to log it in MLflow, which is essential for tracking, version control, and deployment.
Steps for Logging the Model
- Start an MLflow Run: Initiate a new run with `mlflow.start_run()`, grouping all logging operations.
- Log the Model: Use `mlflow.sentence_transformers.log_model` to log the model, providing the model object, a name for the logged model, the signature, and an input example.
Importance of Model Logging
- Model Management: Facilitates the model's lifecycle management from training to deployment.
- Reproducibility and Tracking: Enables tracking of model versions and ensures reproducibility.
- Ease of Deployment: Simplifies deployment by allowing models to be easily deployed for inference.
```python
with mlflow.start_run():
    logged_model = mlflow.sentence_transformers.log_model(
        model=model,
        name="sbert_model",
        signature=signature,
        input_example=example_sentences,
    )
```
Loading the Model and Testing Inference
After logging the Sentence Transformer model in MLflow, we load it back and test it for real-time inference.
Loading the Model as a PyFunc
- Why PyFunc: Loading the logged model with `mlflow.pyfunc.load_model` provides a generic Python function interface that integrates easily into Python-based services or applications.
- Model URI: Use `logged_model.model_uri` to locate and load the model from MLflow.
Conducting Inference Tests
- Test Sentences: Define sentences to test the model's embedding generation capabilities.
- Performing Predictions: Call the model's `predict` method on the test sentences to obtain embeddings.
- Printing Embedding Lengths: Verify embedding generation by checking the length of each embedding array, which corresponds to the dimensionality of each sentence representation.
Importance of Inference Testing
- Model Validation: Confirm that the loaded model behaves as expected and processes input data correctly.
- Deployment Readiness: Validate that the model is ready for real-time integration into application services.
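The length checks described above can be turned into a helper that fails loudly instead of relying on reading printed output. This is a sketch that assumes embeddings arrive as a sequence of equal-length numeric sequences (a 2-D NumPy array satisfies this); `validate_embeddings` is hypothetical, and its default `dim=384` is taken from the dimensionality printed in this tutorial.

```python
def validate_embeddings(embeddings, n_inputs, dim=384):
    """Hypothetical check: exactly one embedding per input, each of length `dim`."""
    if len(embeddings) != n_inputs:
        raise ValueError(f"Expected {n_inputs} embeddings, got {len(embeddings)}")
    for i, embedding in enumerate(embeddings):
        if len(embedding) != dim:
            raise ValueError(f"Embedding {i + 1} has size {len(embedding)}, expected {dim}")
    return True


# Toy stand-in for the output of loaded_model_pyfunc.predict(...) on two sentences.
fake_output = [[0.0] * 384, [0.1] * 384]
print(validate_embeddings(fake_output, n_inputs=2))
```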
```python
inference_test = ["I enjoy pies of both apple and cherry.", "I prefer cookies."]

# Load the logged model as a PyFunc by providing the URI where it was logged.
loaded_model_pyfunc = mlflow.pyfunc.load_model(logged_model.model_uri)

# Perform a quick test to ensure that the loaded model generates the expected output
embeddings_test = loaded_model_pyfunc.predict(inference_test)

# Check the output structure: one embedding per input sentence, each with the model's dimensionality
print(f"The return structure length is: {len(embeddings_test)}")

for i, embedding in enumerate(embeddings_test):
    print(f"The size of embedding {i + 1} is: {len(embedding)}")
```

```
The return structure length is: 2
The size of embedding 1 is: 384
The size of embedding 2 is: 384
```
Displaying Samples of Generated Embeddings
Examine the content of the embeddings to verify their quality and understand the model's output.
Inspecting the Embedding Samples
- Purpose of Sampling: Inspect a sample of the entries in each embedding to understand the vector representations generated by the model.
- Printing Embedding Samples: Print the first 10 entries of each embedding vector using `embedding[:10]` to get a glimpse of the model's output.
Why Sampling is Important
- Quality Check: Sampling provides a quick sanity check that the embeddings look reasonable and non-degenerate (for example, not all zeros and not identical across different sentences).
- Understanding Model Output: Seeing parts of the embedding vectors offers an intuitive understanding of the model's output, which is helpful for debugging and development.
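Eyeballing samples can be complemented by an automated sanity check. The sketch below flags vectors containing NaN/infinite values or having (near-)zero norm, and pairs of embeddings that are exactly identical. The helpers `is_degenerate` and `check_embedding_batch` are hypothetical, and the sample data is only the first 10 values of each embedding as printed by this tutorial's sampling cell, so it checks a slice, not full 384-dimensional vectors.

```python
import math


def is_degenerate(embedding, tol=1e-12):
    """Flag vectors with NaN/inf entries or an L2 norm below `tol`."""
    if any(not math.isfinite(x) for x in embedding):
        return True
    return math.sqrt(sum(x * x for x in embedding)) < tol


def check_embedding_batch(embeddings):
    """Return a list of human-readable problems found in a batch of embeddings."""
    problems = []
    for i, emb in enumerate(embeddings):
        if is_degenerate(emb):
            problems.append(f"embedding {i + 1} is degenerate")
    # Distinct input sentences should not collapse to exactly the same vector.
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if list(embeddings[i]) == list(embeddings[j]):
                problems.append(f"embeddings {i + 1} and {j + 1} are identical")
    return problems


# First 10 entries of each embedding, as printed in this tutorial.
samples = [
    [0.04866192, -0.03687946, 0.02408808, 0.03534171, -0.12739632,
     0.00999414, 0.07135344, -0.01433522, 0.04296691, -0.00654414],
    [-0.03879027, -0.02373698, 0.01314073, 0.03589077, -0.01641303,
     -0.0857707, 0.08282158, -0.03173266, 0.04507608, 0.02777079],
]
print(check_embedding_batch(samples))  # []
```

The identical-vector check assumes the inputs are distinct sentences; duplicated inputs would legitimately produce identical embeddings.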
```python
for i, embedding in enumerate(embeddings_test):
    print(f"The sample of the first 10 entries in embedding {i + 1} is: {embedding[:10]}")
```

```
The sample of the first 10 entries in embedding 1 is: [ 0.04866192 -0.03687946 0.02408808 0.03534171 -0.12739632 0.00999414 0.07135344 -0.01433522 0.04296691 -0.00654414]
The sample of the first 10 entries in embedding 2 is: [-0.03879027 -0.02373698 0.01314073 0.03589077 -0.01641303 -0.0857707 0.08282158 -0.03173266 0.04507608 0.02777079]
```
Native Model Loading in MLflow for Extended Functionality
MLflow also supports loading the model natively, which gives access to the full range of Sentence Transformers functionality.
Why Support Native Loading?
- Access to Native Functionalities: Native loading exposes all the features of the Sentence Transformer model, which advanced NLP tasks may require.
- Loading the Model Natively: Use `mlflow.sentence_transformers.load_model` to load the model with its full capabilities, for greater flexibility.
Generating Embeddings Using Native Model
- Model Encoding: Use the model's native `encode` method to generate embeddings.
- Importance of Native Encoding: Calling `encode` directly, rather than going through the PyFunc's single `predict` entrypoint, gives access to the model's full embedding generation capabilities, which suits large-scale or complex NLP applications.
```python
# Load the saved model as a native Sentence Transformers model (unlike above, where we loaded it as a generic Python function)
loaded_model_native = mlflow.sentence_transformers.load_model(logged_model.model_uri)

# Use the native model to generate embeddings by calling encode() (unlike the generic Python function, which uses the single entrypoint of `predict`)
native_embeddings = loaded_model_native.encode(inference_test)

for i, embedding in enumerate(native_embeddings):
    print(
        f"The sample of the native library encoding call for embedding {i + 1} is: {embedding[:10]}"
    )
```

```
2023/11/30 15:50:24 INFO mlflow.sentence_transformers: 'runs:/eeab3c1b13594fdea13e07585b1c0596/sbert_model' resolved as 'file:///Users/benjamin.wilson/repos/mlflow-fork/mlflow/docs/source/llms/sentence-transformers/tutorials/quickstart/mlruns/469990615226680434/eeab3c1b13594fdea13e07585b1c0596/artifacts/sbert_model'
The sample of the native library encoding call for embedding 1 is: [ 0.04866192 -0.03687946 0.02408808 0.03534171 -0.12739632 0.00999414 0.07135344 -0.01433522 0.04296691 -0.00654414]
The sample of the native library encoding call for embedding 2 is: [-0.03879027 -0.02373698 0.01314073 0.03589077 -0.01641303 -0.0857707 0.08282158 -0.03173266 0.04507608 0.02777079]
```
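The native `encode` samples match the PyFunc `predict` samples printed earlier entry for entry. A small sketch for confirming such parity programmatically, with a tolerance, rather than by eye; `embeddings_match` is a hypothetical helper, the tolerance values are illustrative assumptions, and the sample values are the first three entries of each printed embedding.

```python
import math


def embeddings_match(a, b, rel_tol=1e-6, abs_tol=1e-6):
    """True if two batches of embeddings agree element-wise within tolerance."""
    if len(a) != len(b):
        return False
    for vec_a, vec_b in zip(a, b):
        if len(vec_a) != len(vec_b):
            return False
        if not all(
            math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(vec_a, vec_b)
        ):
            return False
    return True


# First three sampled values of each embedding, copied from the printed outputs.
pyfunc_sample = [[0.04866192, -0.03687946, 0.02408808], [-0.03879027, -0.02373698, 0.01314073]]
native_sample = [[0.04866192, -0.03687946, 0.02408808], [-0.03879027, -0.02373698, 0.01314073]]
print(embeddings_match(pyfunc_sample, native_sample))  # True

# A perturbed first value is detected as a mismatch.
perturbed = [[0.05, -0.03687946, 0.02408808], native_sample[1]]
print(embeddings_match(pyfunc_sample, perturbed))  # False
```

With the full outputs, `embeddings_match(embeddings_test, native_embeddings)` should compare all 384 entries per sentence, since the helper only relies on `len()`, iteration, and float conversion.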