BentoML: A Comprehensive Guide to Deploying Machine Learning Models

This guide explores BentoML, its benefits, and how it compares to other options. It’s our second deep dive into BentoML because deployment remains a major challenge for most data science teams.

Deploying machine learning models into production is a critical step in transforming prototypes into scalable applications. Several frameworks facilitate this process, each offering unique features tailored to different deployment needs. This article provides an in-depth guide to using BentoML for model deployment, highlighting its benefits and comparing it to other frameworks.

Overview of Model Deployment Frameworks

Model deployment frameworks streamline the integration of machine learning models into production systems, offering functionalities such as model serving, scaling, and monitoring to ensure efficient and reliable operation in real-world applications.

Common Frameworks:
  • FastAPI: A modern, high-performance web framework for building APIs with Python, based on standard type hints (see the serving sketch after this list).
  • MLflow: An open-source platform designed to manage the ML lifecycle, including experimentation, reproducibility, and deployment.
  • Kubeflow: A platform for deploying, orchestrating, and running scalable and portable ML workloads on Kubernetes.
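
To make the contrast with BentoML concrete, here is a minimal sketch of hand-rolled model serving with FastAPI, where loading, versioning, and batching are all left to the developer. The model.pkl file is a hypothetical pickled scikit-learn model, not an artifact from this guide.

import joblib
from fastapi import FastAPI

app = FastAPI()
model = joblib.load("model.pkl")  # manual model loading, no versioning

@app.post("/predict")
def predict(features: list[float]):
    # No adaptive batching or scaling: one request, one predict call
    return {"prediction": int(model.predict([features])[0])}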

Introduction to BentoML

BentoML is an open-source platform that focuses on simplifying the deployment of machine learning models. It provides a unified interface to package, serve, and deploy models across various environments, including cloud services and on-premises infrastructures.

Key Benefits:
  • Ease of Use: Offers a simple API for packaging and deploying models.
  • Flexibility: Supports multiple ML frameworks and deployment platforms.
  • Scalability: Facilitates scaling of model serving with minimal configuration.
  • Performance Optimization: Includes features like adaptive batching and asynchronous request handling to enhance performance, as sketched below.
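
As a sketch of how adaptive batching is enabled, BentoML lets you mark a model signature as batchable when saving it; the runner can then fuse concurrent requests into a single predict call. The signatures argument below follows BentoML 1.x's documented save_model API, using the same Iris model this guide trains later.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import bentoml

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier().fit(X, y)

# Mark predict as batchable so the runner can batch concurrent requests
# along dimension 0 (the rows of the feature matrix)
bentoml.sklearn.save_model(
    "iris_model",
    clf,
    signatures={"predict": {"batchable": True, "batch_dim": 0}},
)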

Getting Started with BentoML

1. Installation and Setup:
  • Prerequisites:
    • Python 3.8 or higher.
    • pip package manager.
  • Installation:
pip install bentoml

After installation, verify by running:

bentoml --help
  • This command should display the help information for BentoML, confirming that the installation was successful.
2. First Steps:
  • Initialize a BentoML Service: Create a Python script named service.py:
import bentoml
from bentoml.io import JSON

# Load the trained model and wrap it in a runner
model_ref = bentoml.sklearn.get("iris_model:latest")
model_runner = model_ref.to_runner()

# Define a BentoML service with the runner attached at construction
svc = bentoml.Service("iris_classifier", runners=[model_runner])

# Define an asynchronous API endpoint
@svc.api(input=JSON(), output=JSON())
async def predict(input_data):
    # Build a 2-D feature array from the JSON payload
    features = [[
        input_data["sepal_length"],
        input_data["sepal_width"],
        input_data["petal_length"],
        input_data["petal_width"],
    ]]
    prediction = await model_runner.predict.async_run(features)
    return {"prediction": int(prediction[0])}
  • This script sets up a BentoML service named iris_classifier, loads a pre-trained scikit-learn model as a runner, and defines an asynchronous API endpoint that extracts the four feature values and returns the predicted class.
3. First Run:
  • Start the Service: Launch the BentoML service locally to test its functionality.
bentoml serve service.py:svc --reload

This command starts the service with hot-reloading enabled, allowing for real-time updates during development.

  • Test the API Endpoint: Use curl or a similar tool to send a test request to the running service.
curl -X POST -H "Content-Type: application/json" \
-d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}' \
http://127.0.0.1:3000/predict

Ensure that the service returns the expected prediction response.
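
Equivalently, you can test the endpoint from Python with the requests library (assuming the service is running locally on port 3000):

import requests

payload = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}
response = requests.post("http://127.0.0.1:3000/predict", json=payload)
print(response.json())  # e.g. {"prediction": 0}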

Step-by-Step Example: Deploying a Simple Iris Classifier

1. Train and Save the Model:
  • Train the Model: Use scikit-learn to train a simple classifier on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import bentoml

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train model
clf = RandomForestClassifier()
clf.fit(X, y)

# Save model with BentoML
bentoml.sklearn.save_model("iris_model", clf)

This script trains a RandomForestClassifier and saves it using BentoML's model management system.
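
You can confirm the model landed in BentoML's local model store by running bentoml models list, or by loading it back in Python:

import bentoml

# Load the saved model from the local store and sanity-check a prediction
model = bentoml.sklearn.load_model("iris_model:latest")
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))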

2. Define the BentoML Service:
  • Create service.py: Define the service that will handle prediction requests.
import bentoml
from bentoml.io import JSON

# Load the saved model
model_ref = bentoml.sklearn.get("iris_model:latest")
model_runner = model_ref.to_runner()

# Define a BentoML service
svc = bentoml.Service("iris_classifier", runners=[model_runner])

# Define input and output schemas
input_spec = JSON()
output_spec = JSON()

@svc.api(input=input_spec, output=output_spec)
async def predict(input_data):
    # Extract features from input data
    features = [
        input_data["sepal_length"],
        input_data["sepal_width"],
        input_data["petal_length"],
        input_data["petal_width"],
    ]
    # Run prediction
    prediction = await model_runner.predict.async_run([features])
    # Return the predicted class
    return {"prediction": int(prediction[0])}

This service loads the previously saved model, sets up an API endpoint, and defines how to process incoming requests and return predictions.
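
As an optional hardening step, the JSON descriptor can validate incoming payloads against a pydantic model, rejecting malformed requests before they reach your handler. IrisFeatures below is an illustrative schema name; the pydantic_model argument is part of BentoML 1.x's IO descriptor API.

from pydantic import BaseModel
from bentoml.io import JSON

# Illustrative schema: requests missing a field or carrying wrong types
# are rejected with a validation error before reaching the handler
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

input_spec = JSON(pydantic_model=IrisFeatures)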

3. Build and Containerize the Bento:
  • Build the Bento: Create a bentofile.yaml configuration file to define your Bento's build options:
service: "service:svc"
include:
  - "*.py"
python:
  packages:
    - scikit-learn
    - pandas

Then, build the Bento:

bentoml build

This command packages your service and its dependencies into a Bento bundle.
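
You can verify the build by listing the Bentos in your local store:

bentoml list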

  • Containerize the Bento: To containerize the Bento with Docker:
bentoml containerize iris_classifier:latest

This command builds a Docker image for your Bento, enabling deployment in any Docker-compatible environment. The image is tagged with the Bento's version string, which the containerize command prints on completion.

4. Deploy the Service:
  • Local Deployment: Run the Docker container locally (substitute the image tag printed by bentoml containerize):
docker run -it --rm -p 3000:3000 iris_classifier:latest serve

This command starts the service, making it accessible at http://localhost:3000.

  • Cloud Deployment: For cloud deployment, you can push the Docker image to a container registry and deploy it using your preferred cloud platform's orchestration services, such as Kubernetes or AWS ECS.
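
For example, tagging and pushing the image to a registry (registry.example.com is a placeholder for your registry, and the image tag should match the one produced by bentoml containerize):

docker tag iris_classifier:latest registry.example.com/iris_classifier:latest
docker push registry.example.com/iris_classifier:latest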

Final Thoughts

BentoML offers a streamlined approach to deploying machine learning models, providing flexibility and scalability across various environments. Its integrations with multiple ML frameworks and support for containerization make it a compelling choice for production deployments. With the steps in this guide, you can efficiently package, serve, and deploy your models, ensuring they are ready for real-world applications.

For more detailed information and advanced configurations, refer to the official BentoML documentation.

Cohorte Team,

February 27, 2025