
BentoML: A Comprehensive Guide to Deploying Machine Learning Models

This guide explores BentoML, its benefits, and how it compares to other options. It’s our second deep dive into BentoML because deployment remains a major challenge for most data science teams.

Tega Adeyemi

Deploying machine learning models into production is a critical step in transforming prototypes into scalable applications. Several frameworks facilitate this process, each offering unique features tailored to different deployment needs. This article provides an in-depth guide to using BentoML for model deployment, highlighting its benefits and comparing it to other frameworks.

Overview of Model Deployment Frameworks

Model deployment frameworks handle the work of integrating machine learning models into production systems. They typically provide model serving, scaling, and monitoring so that models run efficiently and reliably in real-world applications.

Common Frameworks:

Introduction to BentoML

BentoML is an open-source platform that focuses on simplifying the deployment of machine learning models. It provides a unified interface to package, serve, and deploy models across various environments, including cloud services and on-premises infrastructures.

Key Benefits:

Getting Started with BentoML

1. Installation and Setup:
pip install bentoml

After installation, verify by running:

bentoml --help
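2. First Steps:

The examples in this guide use the runner-based Service API from BentoML 1.0 and 1.1. Releases from 1.2 onward favor a class-based API built on the @bentoml.service decorator, so pin an earlier version if these snippets raise deprecation warnings or errors. The service also assumes that a model named iris_model is already in the local BentoML model store; the step-by-step example later in this guide shows how to save one. Save the following as service.py:

import bentoml
from bentoml.io import JSON

# Load your trained model and wrap it in a runner
model_ref = bentoml.sklearn.get("iris_model:latest")
runner = model_ref.to_runner()

# Define a BentoML service and register the runner with it
svc = bentoml.Service("iris_classifier", runners=[runner])

# Define an API endpoint
@svc.api(input=JSON(), output=JSON())
async def predict(input_data):
    # Arrange the JSON fields in the feature order used at training time
    features = [
        input_data["sepal_length"],
        input_data["sepal_width"],
        input_data["petal_length"],
        input_data["petal_width"],
    ]
    # The model expects a 2D array: one row per sample
    prediction = await runner.predict.async_run([features])
    return {"prediction": int(prediction[0])}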
3. First Run:
bentoml serve service.py:svc --reload

This command starts the service with hot-reloading enabled, so code changes are picked up during development without restarting the server.

Once the server is up, send a test request from another terminal:

curl -X POST -H "Content-Type: application/json" \
-d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}' \
http://127.0.0.1:3000/predict

Ensure that the service returns the expected prediction response.
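
The same check can be scripted instead of typed by hand. The following is a minimal sketch of a Python client that sends the request from the curl command above, using only the standard library. The names build_request, predict, and PREDICT_URL are illustrative and not part of BentoML, and the URL assumes the service is listening on the default port shown in this guide.

```python
import json
import urllib.request

# Assumption: the service from this guide is running locally on port 3000.
PREDICT_URL = "http://127.0.0.1:3000/predict"


def build_request(features: dict, url: str = PREDICT_URL) -> urllib.request.Request:
    """Build the same JSON POST request that the curl command sends."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, url: str = PREDICT_URL, timeout: float = 5.0):
    """Send the request and decode the JSON response body."""
    request = build_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the service to be running):
#   sample = {"sepal_length": 5.1, "sepal_width": 3.5,
#             "petal_length": 1.4, "petal_width": 0.2}
#   print(predict(sample))
```

Separating request construction from the network call keeps the payload logic easy to test without a running server.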

Step-by-Step Example: Deploying a Simple Iris Classifier

1. Train and Save the Model:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import bentoml

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train model
clf = RandomForestClassifier()
clf.fit(X, y)

# Save model with BentoML
bentoml.sklearn.save_model("iris_model", clf)

This script trains a RandomForestClassifier and saves it using BentoML's model management system.

2. Define the BentoML Service:
import bentoml
from bentoml.io import JSON

# Load the saved model
model_ref = bentoml.sklearn.get("iris_model:latest")
model_runner = model_ref.to_runner()

# Define a BentoML service
svc = bentoml.Service("iris_classifier", runners=[model_runner])

# Define input and output schemas
input_spec = JSON()
output_spec = JSON()

@svc.api(input=input_spec, output=output_spec)
async def predict(input_data):
    # Extract features from input data
    features = [
        input_data["sepal_length"],
        input_data["sepal_width"],
        input_data["petal_length"],
        input_data["petal_width"],
    ]
    # Run prediction
    prediction = await model_runner.predict.async_run([features])
    # Return the predicted class
    return {"prediction": int(prediction[0])}

This service loads the previously saved model, sets up an API endpoint, and defines how to process incoming requests and return predictions.
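
The handler above indexes the payload directly, so a request with a missing or non-numeric field fails inside the endpoint with an unhelpful error. One possible way to make request processing more explicit is a pair of small helpers, sketched below. The names FEATURE_NAMES, CLASS_NAMES, features_from_payload, and label_for are hypothetical and not part of BentoML; the class names follow the order of scikit-learn's iris dataset.

```python
# Feature order must match the column order used at training time
# (scikit-learn's iris data: sepal length, sepal width, petal length, petal width).
FEATURE_NAMES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# Class index order of scikit-learn's iris dataset (iris.target_names).
CLASS_NAMES = ("setosa", "versicolor", "virginica")


def features_from_payload(payload) -> list:
    """Return the four features as floats, in training order."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = [name for name in FEATURE_NAMES if name not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    try:
        return [float(payload[name]) for name in FEATURE_NAMES]
    except (TypeError, ValueError) as exc:
        raise ValueError(f"non-numeric feature value: {exc}") from exc


def label_for(class_index: int) -> str:
    """Translate a predicted class index into a species name."""
    if not 0 <= class_index < len(CLASS_NAMES):
        raise ValueError(f"unknown class index: {class_index}")
    return CLASS_NAMES[class_index]
```

Inside the endpoint, features_from_payload(input_data) could replace the hand-written feature list, and label_for(int(prediction[0])) could be added to the response alongside the numeric class.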

3. Build and Containerize the Bento:

Create a bentofile.yaml in the same directory as service.py:

service: "service:svc"
include:
  - "*.py"
python:
  packages:
    - scikit-learn
    - pandas

Then, build the Bento:

bentoml build

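This command packages your service and its dependencies into a Bento bundle.

Next, containerize the Bento:

bentoml containerize iris_classifier:latest -t iris_classifier:latest

This command builds a Docker image for your Bento, enabling deployment in any Docker-compatible environment. The -t flag tags the image as iris_classifier:latest; without it, the image is tagged with the Bento's generated version string rather than latest.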

4. Deploy the Service:

Run the container and publish port 3000:

docker run -it --rm -p 3000:3000 iris_classifier:latest serve

This command starts the service, making it accessible at http://localhost:3000.
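
A container can take a few seconds to start accepting requests, so deployment scripts often wait for the service before sending traffic. The sketch below polls a readiness URL until it answers with HTTP 200 or a timeout expires. The function name wait_until_ready and its parameters are illustrative, and the /readyz path is an assumption about the health endpoints exposed by the BentoML HTTP server; check the documentation for your version.

```python
import time
import urllib.request


def wait_until_ready(
    url: str = "http://localhost:3000/readyz",  # assumed readiness endpoint
    timeout: float = 30.0,
    interval: float = 1.0,
    opener=urllib.request.urlopen,  # injectable so the loop can be tested offline
    sleep=time.sleep,
) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, reset, or an HTTP error status:
            # the service is not ready yet, so retry.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling wait_until_ready() right after docker run returns True once the service responds, or False if it never becomes ready within the timeout.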

Final Thoughts

BentoML offers a streamlined approach to deploying machine learning models, providing flexibility and scalability across various environments. Its integration with multiple ML frameworks and its support for containerization make it a compelling choice for production deployments. This guide walked through saving a model, defining a service, building a Bento, and running it in a container, which covers the core path from a trained model to a deployable API.

For more detailed information and advanced configurations, refer to the official BentoML documentation.

Tega Adeyemi, February 27, 2025