Use These Two Approaches To Deploy ML Models on AWS Lambda

Step-by-step guide to leveraging AWS Lambda layers for machine learning applications.

Dec 18th, 2024 11:04am by Gaurav Mittal


In the rapidly evolving landscape of machine learning deployment, organizations continuously seek cost-effective ways to eliminate expensive third-party tools. Recently, I developed an ML model to replace the licensed AI platform tools my organization was using to build models and generate predictions for new inputs. The primary goal was to bring the ML model in-house to reduce operational costs, but the deployment process presented significant challenges due to expensive infrastructure requirements.

Serverless computing, with platforms like AWS Lambda, now offers a compelling solution for lightweight, on-demand ML inference. This approach is particularly relevant given the rise of edge computing and the democratization of machine learning (ML), and it reduces the excessive costs traditionally associated with ML deployment. In this article, I’ll explain two ways to deploy an ML model on AWS Lambda. AWS Lambda is preferred because it is inexpensive, scales automatically, and charges only for the requests it serves.

Why AWS Lambda for ML Deployment?

AWS Lambda provides a compelling solution with a true pay-as-you-go service model. Key advantages include:

Cost efficiency: For organizations processing between 1,000 and 10,000 predictions daily, this approach can reduce infrastructure costs by up to 60% compared to maintaining dedicated prediction servers. By eliminating the need for pre-provisioned server capacity, organizations can optimize resource utilization and significantly reduce infrastructure overhead.
Scalability: Lambda automatically scales computational resources based on incoming prediction requests, without requiring manual intervention.

While AWS Lambda excels in many scenarios, evaluating its limitations, including cold starts and resource constraints, is crucial to determining whether it meets your specific ML deployment needs.

Approach #1 – Deploying Models Stored on S3

In this approach, the ML model is stored as a pickle file in an S3 bucket and served through a Lambda API, which keeps the process simple, scalable, and cost-effective. First, the model is trained, saved as a pickle file, and uploaded to an S3 bucket for secure and centralized storage. AWS Lambda is set up to load this model from S3 when needed, enabling quick predictions without requiring a dedicated server. When someone calls the API connected to the Lambda function, the model is fetched and run, and predictions are returned based on the input data. This serverless setup ensures high availability, scales automatically, and saves costs since you only pay when the API is used. To make the ML model work, you can create a Lambda layer with the required libraries, such as scikit-learn and pandas, making it easy to load and use the model.
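
The "save as a pickle file" step can be sketched with the standard library alone. This is a minimal illustration, not the exact script used for the article: `save_model` and `load_model` are hypothetical helper names, and the model is assumed to be any picklable object, such as a fitted scikit-learn estimator.

```python
import pickle


def save_model(model, path):
    """Serialize a trained model object to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model back from a pickle file (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Example usage (assumes scikit-learn is installed and X_train, y_train exist):
#   from sklearn.linear_model import LogisticRegression
#   model = LogisticRegression().fit(X_train, y_train)
#   save_model(model, "ml_model.pkl")
#   restored = load_model("ml_model.pkl")
```

Pickled scikit-learn models are generally expected to be loaded with the same library version that saved them, so it helps to train with the versions that will later go into the Lambda layer.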

Technical Implementation

Step 1 – Create Zipped Layer

A layer is a ZIP archive that contains libraries, a custom runtime, or other dependencies. I demonstrate this approach using the scikit-learn (sklearn) and pandas libraries, which are commonly used in ML models. You will start by creating a Lambda layer containing both libraries using Docker. Create a file, name it “createlayer.sh”, and copy in the code below, which creates the zipped layer.

#!/bin/bash
# Build a Lambda layer zip for the Python version passed as the first argument.
if [ -n "$1" ]; then
    echo "Creating layer compatible with python version $1"
    # Install the requirements inside a Lambda-like build container.
    docker run -v "$PWD":/var/task "lambci/lambda:build-python$1" /bin/sh -c "pip install -r requirements.txt -t python/lib/python$1/site-packages/; exit"
    # Zip the python/ folder, then remove the unzipped copy.
    zip -r sklearn_pandas_layer.zip python > /dev/null
    rm -r python
    echo "Done creating layer!"
    ls -lah sklearn_pandas_layer.zip
else
    echo "Enter python version as argument - ./createlayer.sh 3.6"
fi

Now, in the same directory, create another file, “requirements.txt”, to store the names and versions of the libraries you want in the layer. In this case, you will make a layer for the pandas and scikit-learn libraries with the versions listed below.

pandas==0.23.4
scikit-learn==0.20.3

Next, open a terminal in the directory where you placed the above two files and run the command below to generate a zipped folder for the Lambda layer. If the script is not yet executable, run chmod +x createlayer.sh first. Command To Run:

./createlayer.sh 3.6

The generated layer is in zip format and can be uploaded to S3. As shown in the snapshot below, this zip file contains the folders for the respective Python libraries.
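
If you prefer to check the archive from a script rather than by eye, a short sketch like the one below lists which packages landed under the layer's site-packages folder. The function name `packages_in_layer` is hypothetical, and the default version simply mirrors the 3.6 build used above.

```python
import zipfile


def packages_in_layer(zip_path, python_version="3.6"):
    """Return the top-level names found under the layer's site-packages folder."""
    prefix = f"python/lib/python{python_version}/site-packages/"
    packages = set()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.startswith(prefix):
                rest = name[len(prefix):]
                if rest:
                    # Keep only the first path component, e.g. "pandas"
                    packages.add(rest.split("/", 1)[0])
    return sorted(packages)


# Example usage:
#   print(packages_in_layer("sklearn_pandas_layer.zip"))
```

An empty result would suggest the libraries were installed under a different path than the one the script targets.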

Step 2 – Place ML Model and Lambda Layer in S3

Copy the pkl file for your ML model and the layer generated in Step 1 to a new folder in the S3 bucket. After you have copied the files, the S3 bucket in AWS should show the folder and its contents, as presented in the below image.
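
The copy can also be scripted. The sketch below is one possible way to do it with boto3, under a few assumptions: `s3_client` is a client created with `boto3.client("s3")`, the helper names are hypothetical, and the folder name in the usage comment is a placeholder rather than the one used in the article.

```python
import os


def upload_artifacts(s3_client, bucket, prefix, local_paths):
    """Upload each local file under the given S3 folder (prefix) and return the keys."""
    keys = []
    for path in local_paths:
        key = f"{prefix.rstrip('/')}/{os.path.basename(path)}"
        # boto3: upload_file(Filename, Bucket, Key)
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    return keys


def list_keys(s3_client, bucket, prefix):
    """List object keys under a prefix (a single call returns at most 1,000 keys)."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   s3 = boto3.client("s3")
#   upload_artifacts(s3, "deployingmlmodel", "ml-deployment",
#                    ["ml_model.pkl", "sklearn_pandas_layer.zip"])
#   print(list_keys(s3, "deployingmlmodel", "ml-deployment/"))
```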

Step 3 – Configure Lambda Function and Lambda Layer

We now have our model and Lambda layer ready. Let us start configuring Lambda by creating a new Lambda function. Then, we will add the Lambda layer from Step 1, stored in the S3 bucket, to the function. To add a layer to the Lambda function, we first create a layer by clicking on Layers → Create Layer. See the image below for the button’s placement on the AWS screen.

Define the name, description, S3 URL, and other properties of the new layer as shown below, and click Save.

Once the new lambda layer is created, you should receive the below success message on the screen stating the name of the layer you created.
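
The same layer can be created without the console. The following is a hedged sketch using boto3's `publish_layer_version`; `lambda_client` is assumed to come from `boto3.client("lambda")`, `publish_layer` is a hypothetical wrapper, and the runtime identifier mirrors the 3.6 build above, so it would need to match whatever runtime your function actually uses.

```python
def publish_layer(lambda_client, layer_name, bucket, key,
                  runtime="python3.6", description=""):
    """Publish a new layer version from a zip stored in S3 and return its ARN."""
    response = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Description=description,
        Content={"S3Bucket": bucket, "S3Key": key},
        CompatibleRuntimes=[runtime],
    )
    return response["LayerVersionArn"]


# Example usage (layer name and key are placeholders):
#   import boto3
#   arn = publish_layer(boto3.client("lambda"), "sklearn-pandas-layer",
#                       "deployingmlmodel", "sklearn_pandas_layer.zip")
```

The returned ARN includes the version number, which is what you select when attaching the layer to a function.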

Some key points about the lambda layer:

Lambda layers must be uploaded as zip files.
You can attach up to five Lambda layers to a given Lambda function.
The function code and all of its layers together cannot be bigger than 250MB (unzipped).
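
These limits are easy to check before uploading anything. The sketch below sums the uncompressed size of one or more archives and compares it with the limits listed above; the function names are hypothetical, and the check assumes every archive passed in (function package and layers alike) counts toward the same 250MB total.

```python
import zipfile

MAX_UNZIPPED_BYTES = 250 * 1024 * 1024  # 250MB unzipped limit cited above
MAX_LAYERS = 5                          # maximum layers per function


def unzipped_size(zip_path):
    """Return the total uncompressed size of all entries in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def layers_fit_limits(layer_zip_paths, function_zip_path=None):
    """Check the layer count and the combined unzipped size against the limits."""
    if len(layer_zip_paths) > MAX_LAYERS:
        return False
    paths = list(layer_zip_paths)
    if function_zip_path is not None:
        paths.append(function_zip_path)
    return sum(unzipped_size(p) for p in paths) <= MAX_UNZIPPED_BYTES


# Example usage:
#   print(layers_fit_limits(["sklearn_pandas_layer.zip"]))
```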

To add this layer to your lambda function, go to the Lambda function you created to deploy your ML model and click on “Layers” to add a layer. See the below image for reference.

Choose the “Custom layers” option. In the custom layers dropdown, select the name of the Lambda layer you created in the previous step, then select its version in the next dropdown. Click on “Add” to add it to the Lambda function.
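
For scripted deployments, the console steps above roughly correspond to one boto3 call. The sketch below assumes `lambda_client` is `boto3.client("lambda")`; `attach_layer` is a hypothetical helper. Because `update_function_configuration` replaces the whole layer list, the helper reads the existing layers first and appends to them.

```python
def attach_layer(lambda_client, function_name, layer_version_arn):
    """Add a layer version to a function while keeping its existing layers."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    layers = [layer["Arn"] for layer in config.get("Layers", [])]
    if layer_version_arn in layers:
        return layers  # already attached, nothing to do
    layers.append(layer_version_arn)
    if len(layers) > 5:
        raise ValueError("A Lambda function can use at most five layers")
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=layers,
    )
    return layers


# Example usage (function name and ARN are placeholders):
#   import boto3
#   attach_layer(boto3.client("lambda"), "my-ml-function", layer_version_arn)
```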

Also, write the below code in the lambda handler function:

import json
import pathlib
import pickle

import boto3
import sklearn  # provided by the Lambda layer; needed to unpickle the model

s3 = boto3.resource('s3')
filename = 'ml_model.pkl'
file = pathlib.Path('/tmp/' + filename)

# Download the model only if it is not already cached in /tmp.
if file.exists():
    print("File exists")
else:
    s3.Bucket('deployingmlmodel').download_file(filename, '/tmp/' + filename)

def lambda_handler(event, context):
    # Load the pickled model from the local copy in /tmp.
    with open('/tmp/' + filename, 'rb') as f:
        model = pickle.load(f)
    print("provide input here")
    # pred = model.predict("provide input here")

Your lambda function should now show one layer and the code mentioned above. Here is a snapshot for reference.

Hurray!!! You have successfully added the required dependencies (sklearn and pandas) and deployed your ML model on AWS Lambda. It is now ready to be tested and to view the ML model predictions.
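
To actually return predictions, the handler needs to read inputs from the event and serialize the result. The sketch below shows one possible shape, assuming an API Gateway-style event whose JSON body carries a `features` list; that key name and the `predict_response` helper are assumptions for illustration, not part of the original handler.

```python
import json


def predict_response(model, event):
    """Build an HTTP-style response from a model and an incoming event."""
    body = event.get("body") or "{}"
    payload = json.loads(body) if isinstance(body, str) else body
    features = payload.get("features")
    if features is None:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing 'features'"})}
    # scikit-learn estimators expect a 2D input: one row per sample
    prediction = model.predict([features])
    # NumPy arrays are not JSON serializable, so convert to a plain list
    result = prediction.tolist() if hasattr(prediction, "tolist") else list(prediction)
    return {"statusCode": 200, "body": json.dumps({"prediction": result})}


# Inside lambda_handler, after loading the model:
#   return predict_response(model, event)
```

Keeping this logic in a plain function makes it easy to test locally with a stand-in model before invoking the deployed function.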

Approach #2 – Packaging Models as Part of the Lambda Deployment

This approach involves zipping the ML model together with the Lambda function code and uploading the package directly to AWS Lambda. Below is a diagram of the architecture for this approach.

Start by adding lambda handler code in Predict.py and then zipping it with the pkl file to create a zipped file for your model. (See the image below for the contents of the zipped file Archived.zip.)
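
The archive can be built with any zip tool; as one option, here is a small sketch using Python's `zipfile` module. `build_package` is a hypothetical helper, and the file names follow the ones mentioned above.

```python
import zipfile
from pathlib import Path


def build_package(handler_path, model_path, zip_path="Archived.zip"):
    """Zip the handler and the pickled model so both sit at the archive root."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in (handler_path, model_path):
            # arcname drops parent folders so Lambda finds the files at the top level
            zf.write(path, arcname=Path(path).name)
    return zip_path


# Example usage:
#   build_package("Predict.py", "ml_model.pkl")
```

With this layout, the handler in Predict.py can open the model relative to its own location, for example `Path(__file__).parent / "ml_model.pkl"`, instead of downloading it from S3.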

Now, upload this zipped file using Lambda’s “Upload a .zip file” option, highlighted in the below snapshot.

If your zip file is less than 10MB, you can directly upload it here. Otherwise, keep it in S3 and refer to the zip file from there. This information is displayed in small fonts in the picture below.
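
The same size-based choice can be scripted. The sketch below uses boto3's `update_function_code`, which accepts either the zip bytes directly or an S3 location; the clients are assumed to be `boto3.client("lambda")` and `boto3.client("s3")`, `deploy_package` is a hypothetical helper, and the threshold simply reuses the 10MB figure mentioned above.

```python
import os

DIRECT_UPLOAD_LIMIT = 10 * 1024 * 1024  # 10MB threshold mentioned above


def deploy_package(lambda_client, s3_client, function_name, zip_path,
                   bucket, key, limit=DIRECT_UPLOAD_LIMIT):
    """Upload the zip directly if it is small; otherwise stage it in S3 first."""
    if os.path.getsize(zip_path) < limit:
        with open(zip_path, "rb") as f:
            lambda_client.update_function_code(
                FunctionName=function_name, ZipFile=f.read())
        return "direct"
    s3_client.upload_file(zip_path, bucket, key)
    lambda_client.update_function_code(
        FunctionName=function_name, S3Bucket=bucket, S3Key=key)
    return "s3"


# Example usage (function name is a placeholder):
#   import boto3
#   deploy_package(boto3.client("lambda"), boto3.client("s3"),
#                  "my-ml-function", "Archived.zip",
#                  "deployingmlmodel", "Archived.zip")
```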

Click Save. You can view your Lambda function once your file has been uploaded successfully. The screenshot below shows the pkl and py files in the Archived folder.

Woo Hoo!!!! You have successfully deployed the ML model in zip format along with the LambdaHandler code.

Real-World Applications

Serverless ML deployment is particularly suited for the following:

Low-Volume, On-Demand Use Cases: For example, customer support chatbots or image recognition APIs.
Edge Computing: Lightweight inference at the edge, reducing reliance on central data centers.

One business organization that used serverless deployment on AWS Lambda to run lightweight ML models for real-time customer data analysis reduced its infrastructure costs by 25-30%, enabling it to reinvest those savings in expanding its service offerings.

Limitations

AWS Lambda’s primary deployment limitation is the 250MB package size restriction (unzipped, including layers), which can make it challenging to deploy complex ML models.

To address this constraint, developers can employ mitigation techniques such as model compression, selective feature engineering, and Lambda layers for efficient dependency management. Modularizing model components and implementing hybrid architectures that blend serverless and traditional infrastructure can also help overcome size limitations without sacrificing model performance.

Another challenge with Lambda is the cold start, where initial function invocations can experience latency spikes of 10-12 seconds. This contrasts sharply with the near-instantaneous responses of dedicated server environments. On the first invocation, Lambda must download the function code (or container image) into a new runtime environment and initialize it, which adds to the response time. This latency is particularly noticeable in scenarios requiring low-latency responses. To mitigate this, a CloudWatch-triggered event can be configured to invoke the function periodically, keeping it “warm.” This ensures the Lambda runtime environment is pre-warmed and ready for execution, reducing delays. Additionally, the schedule can be limited to specific time windows, such as business hours, to balance performance and cost. This keeps the ML model available without introducing unnecessary runtime costs.
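
On the function side, warming works best when the handler recognizes the scheduled ping and returns early instead of running inference. The sketch below is one way to do that, assuming the warm-up comes from a scheduled CloudWatch Events (EventBridge) rule, whose events carry `"source": "aws.events"`. `run_inference` is a hypothetical placeholder for the real prediction logic.

```python
def is_warmup_event(event):
    """Scheduled CloudWatch Events / EventBridge rules set source to 'aws.events'."""
    return isinstance(event, dict) and event.get("source") == "aws.events"


def run_inference(event):
    """Hypothetical placeholder for the real model prediction logic."""
    return {"statusCode": 200, "body": "prediction goes here"}


def lambda_handler(event, context):
    if is_warmup_event(event):
        # Skip inference: the goal is only to keep the environment initialized
        return {"warmed": True}
    return run_inference(event)
```

To limit warming to business hours, the rule's schedule could be a cron expression along the lines of `cron(0/5 9-17 ? * MON-FRI *)`, which fires every five minutes between 09:00 and 17:55 UTC on weekdays; the exact window is an assumption you would adapt to your own traffic.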

Conclusion: Cost-Efficiency Meets Scalability

In conclusion, deploying an ML model using AWS Lambda provides a scalable, cost-effective solution that eliminates the need for expensive licensing and deployment tools. The two approaches discussed — reading the ML model from an S3 bucket and zipping the model with Lambda code — demonstrate flexibility in addressing different deployment scenarios. While the Lambda architecture is efficient, addressing the cold start latency with techniques like warming up the Lambda function ensures consistent performance, even for the first API call. By combining cost efficiency and performance optimization, this deployment method is a practical choice for organizations aiming to maximize value and reduce expenses.

Gaurav Mittal is an accomplished author and international speaker, recognized for his published articles, including Implementing Email Attachment Security in Informs ORMS and Time-Cost Effective ML Model Deployment Using AWS Lambda in Informs Analytics magazine. He has spoken at global...