Use These Two Approaches To Deploy ML Models on AWS Lambda
Step-by-step guide to leveraging AWS Lambda layers for machine learning applications.
Dec 18th, 2024 11:04am by
Photo by Jason Goodman on Unsplash
Why AWS Lambda for ML Deployment?
AWS Lambda offers a compelling solution built on a true pay-as-you-go service model. Key advantages include:
- Cost efficiency: Eliminating pre-provisioned server capacity lets organizations optimize resource utilization and significantly reduce infrastructure overhead. For organizations processing between 1,000 and 10,000 predictions daily, this approach can reduce infrastructure costs by up to 60% compared with maintaining dedicated prediction servers.
- Scalability: Lambda automatically scales computational resources based on incoming prediction requests, without manual intervention.
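To make the pay-per-use model concrete, here is a rough back-of-the-envelope estimate. The per-request and per-GB-second prices, the 1 GB memory size, and the one-second average duration are assumptions for illustration only. Check current AWS pricing for your region; the sketch also ignores the free tier and any API Gateway charges.

```python
def estimate_monthly_lambda_cost(
    predictions_per_day,
    avg_duration_s=1.0,                  # assumed average run time per prediction
    memory_gb=1.0,                       # assumed memory configured for the function
    price_per_gb_second=0.0000166667,    # assumed price; verify against current pricing
    price_per_million_requests=0.20,     # assumed price; verify against current pricing
    days_per_month=30,
):
    """Return an approximate monthly Lambda bill in dollars."""
    requests = predictions_per_day * days_per_month
    compute_cost = requests * avg_duration_s * memory_gb * price_per_gb_second
    request_cost = requests / 1_000_000 * price_per_million_requests
    return round(compute_cost + request_cost, 2)


# Compare the two ends of the daily volume range mentioned above.
for daily in (1_000, 10_000):
    print(daily, estimate_monthly_lambda_cost(daily))
```

Because every input is a parameter, you can substitute your own measured duration and memory setting to compare the result with the cost of an always-on server.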
Approach #1: Deploying Models Stored on S3
In this approach, the ML model is stored as a pickle file in an S3 bucket and served through a Lambda-backed API, which keeps the process simple, scalable, and cost-effective. First, the model is trained, saved as a pickle file, and uploaded to an S3 bucket for secure, centralized storage. AWS Lambda is then set up to load the model from S3 when needed, enabling predictions without a dedicated server. When a client calls the API connected to the Lambda function, the function fetches the model, runs it on the input data, and returns the predictions. This serverless setup provides high availability, scales automatically, and saves costs because you pay only when the API is used. To give the function the libraries the model depends on, such as scikit-learn and pandas, you create a Lambda layer that contains them.
Technical Implementation
Step 1 – Create Zipped Layer
A layer is a ZIP archive that contains libraries, a custom runtime, or other dependencies. This walkthrough uses scikit-learn (sklearn) and pandas, two libraries commonly used in ML models. Start by creating a Lambda layer that contains both libraries, using Docker. Create a file named “createlayer.sh” and copy the code below into it.
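#!/bin/bash
# Build a Lambda layer ZIP from requirements.txt inside a Lambda-like Docker image.
if [ "$1" != "" ] || [ $# -gt 1 ]; then
    echo "Creating layer compatible with python version $1"
    # Install the packages into the directory layout Lambda expects for Python layers.
    docker run -v "$PWD":/var/task "lambci/lambda:build-python$1" /bin/sh -c "pip install -r requirements.txt -t python/lib/python$1/site-packages/; exit"
    zip -r sklearn_pandas_layer.zip python > /dev/null
    rm -r python
    echo "Done creating layer!"
    ls -lah sklearn_pandas_layer.zip
else
    echo "Enter python version as argument - ./createlayer.sh 3.6"
fi

Next, create a requirements.txt file in the same directory that lists the libraries to include in the layer:

pandas==0.23.4
scikit-learn==0.20.3

Then make the script executable and run it, passing the Python version that matches your Lambda runtime (3.6 in this example):

chmod +x createlayer.sh
./createlayer.sh 3.6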
Step 2 – Place ML Model and Lambda Layer in S3
Copy the pkl file for your ML model and the layer generated in Step 1 to a new folder in the S3 bucket. After you have copied the files, the S3 bucket in AWS should show the folder and its contents, as presented in the image below.
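If you prefer to script this step, the sketch below shows one way to pickle a trained model and copy it to S3 with boto3. The helper names `save_model` and `upload_to_s3` are hypothetical, and the optional `s3_client` parameter is an assumption added so the upload can be exercised without AWS credentials.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize a trained model to a pickle file and return its path."""
    path = Path(path)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def upload_to_s3(path, bucket, key, s3_client=None):
    """Upload a local file to s3://bucket/key."""
    if s3_client is None:
        # Imported lazily so save_model() works on machines without boto3.
        import boto3
        s3_client = boto3.client("s3")
    s3_client.upload_file(str(path), bucket, key)
```

For example, `upload_to_s3(save_model(model, "ml_model.pkl"), "deployingmlmodel", "ml_model.pkl")` would store the model under the bucket and key that the handler code in this article reads from; the same helper can upload sklearn_pandas_layer.zip.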
Step 3 – Configure Lambda Function and Lambda Layer
The model and the Lambda layer are now ready. Start by creating a new Lambda function. Then add the layer built in Step 1, now stored in the S3 bucket, to that function. To do so, first create the layer by clicking Layers → Create Layer. See the image below for the button’s placement on the AWS screen.
Define the name, description, S3 URL, and other properties of the new layer as shown below, and click Save.
Once the new Lambda layer is created, you should see the success message below, stating the name of the layer you created.
Some key points about Lambda layers:
- Lambda layers must be ZIP files.
- A Lambda function can use up to five layers.
- The function code and all of its layers together cannot exceed 250MB (unzipped).
To attach the layer, open the Lambda function, go to its Layers section, and click “Add a layer.” Choose the “Custom layers” option. In the custom layers dropdown, select the name of the layer created in the previous step, then select its version in the next dropdown. Click “Add” to add it to the Lambda function.
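The same console steps can be scripted with boto3. The sketch below is a minimal version, assuming the layer ZIP is already in S3; the function name `publish_and_attach_layer` and its parameters are hypothetical, while `publish_layer_version` and `update_function_configuration` are the boto3 Lambda client calls it relies on.

```python
def publish_and_attach_layer(lambda_client, function_name, layer_name,
                             bucket, key, runtime):
    """Publish a layer version from a ZIP in S3 and attach it to a function."""
    response = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Description="scikit-learn and pandas for ML inference",
        Content={"S3Bucket": bucket, "S3Key": key},
        CompatibleRuntimes=[runtime],
    )
    layer_arn = response["LayerVersionArn"]
    # Note: Layers replaces the function's current layer list rather than
    # appending to it, so include any existing layer ARNs you want to keep.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer_arn],
    )
    return layer_arn
```

You would call it with a `boto3.client("lambda")` instance, the bucket and key from Step 2, and the runtime identifier that matches the Python version used to build the layer.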
Finally, write the code below in the Lambda handler function:

import json
import pathlib
import pickle

import boto3
import sklearn  # must be importable so pickle can rebuild the scikit-learn model

s3 = boto3.resource('s3')
filename = 'ml_model.pkl'
file = pathlib.Path('/tmp/' + filename)

# Download the model only if this execution environment does not have it yet.
if file.exists():
    print("File exists")
else:
    s3.Bucket('deployingmlmodel').download_file(filename, '/tmp/ml_model.pkl')


def lambda_handler(event, context):
    # Load the pickled model from local storage.
    with open('/tmp/' + filename, 'rb') as f:
        model = pickle.load(f)
    print("provide input here")
    # pred = model.predict("provide input here")
Hurray!!!
You have successfully added the required dependencies (sklearn and pandas) and deployed your ML model on AWS Lambda. You can now test the function and view the model’s predictions.
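The handler above leaves the prediction call as a placeholder. The sketch below shows one way it could be completed. The request shape (a JSON body with a `features` list), the response format, and the `get_model` helper are assumptions for illustration, not part of the original code. It also assumes the model file has already been downloaded to /tmp.

```python
import json
import pickle

MODEL_PATH = "/tmp/ml_model.pkl"
_model = None  # cached so warm invocations skip the unpickling step


def get_model(path=MODEL_PATH):
    """Load the pickled model once per execution environment."""
    global _model
    if _model is None:
        with open(path, "rb") as f:
            _model = pickle.load(f)
    return _model


def lambda_handler(event, context):
    # Assumed request shape: {"body": "{\"features\": [[...], ...]}"}
    body = json.loads(event.get("body") or "{}")
    features = body.get("features")
    if not features:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing 'features'"})}
    predictions = get_model().predict(features)
    # NumPy arrays are not JSON serializable; convert them to plain lists.
    if hasattr(predictions, "tolist"):
        predictions = predictions.tolist()
    return {"statusCode": 200,
            "body": json.dumps({"predictions": list(predictions)})}
```

Caching the model in a module-level variable means only the first request in each execution environment pays the cost of loading the pickle file.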
Approach #2: Packaging Models as Part of the Lambda Deployment
In this approach, the ML model is zipped together with the Lambda function code and uploaded directly to AWS Lambda. Below is a diagram of the architecture for this approach.
Start by adding the Lambda handler code to Predict.py, then zip it together with the pkl file to create the deployment package for your model. (See the image below for the contents of the zipped file Archived.zip.)
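The archive can also be built with Python’s standard `zipfile` module instead of a desktop tool. This is a minimal sketch; `build_deployment_zip` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def build_deployment_zip(zip_path, files):
    """Write each file at the root of the archive and return the archive path."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            file = Path(file)
            # arcname drops any parent directories so the files sit at the ZIP root.
            archive.write(file, arcname=file.name)
    return Path(zip_path)
```

Calling `build_deployment_zip("Archived.zip", ["Predict.py", "ml_model.pkl"])` produces the package described above. Keeping Predict.py at the root of the archive lets the function’s handler setting refer to it by module name.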
Now, upload this zipped file using Lambda’s “Upload a .zip file” option, highlighted in the snapshot below.
If your zip file is smaller than 10MB, you can upload it directly here. Otherwise, keep it in S3 and point Lambda to the zip file there. This guidance appears in small print in the picture below.
Click Save. You can view your Lambda function once your file has been uploaded successfully. The screenshot below shows the pkl and py files in the Archived folder.
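The upload step can be scripted as well. The sketch below applies the 10MB rule of thumb mentioned above: small archives are sent directly, larger ones go through S3. The helper name `deploy_zip` and its parameters are hypothetical; `update_function_code` and `upload_file` are the boto3 calls it assumes.

```python
from pathlib import Path

DIRECT_UPLOAD_LIMIT_BYTES = 10 * 1024 * 1024  # the 10MB threshold from the text


def deploy_zip(lambda_client, s3_client, function_name, zip_path, bucket,
               limit_bytes=DIRECT_UPLOAD_LIMIT_BYTES):
    """Update the function code directly, or via S3 for larger archives."""
    zip_path = Path(zip_path)
    if zip_path.stat().st_size < limit_bytes:
        # Small package: send the bytes in the API call itself.
        return lambda_client.update_function_code(
            FunctionName=function_name,
            ZipFile=zip_path.read_bytes(),
        )
    # Larger package: stage it in S3 and point Lambda at the object.
    s3_client.upload_file(str(zip_path), bucket, zip_path.name)
    return lambda_client.update_function_code(
        FunctionName=function_name,
        S3Bucket=bucket,
        S3Key=zip_path.name,
    )
```

Both clients are passed in, so the same function works with `boto3.client("lambda")` and `boto3.client("s3")` in production and with stand-ins during local testing.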
Woo Hoo!!!!
You have successfully deployed the ML model in zip format along with the Lambda handler code.
Real-World Applications
Serverless ML deployment is particularly suited for the following:
- Low-volume, on-demand use cases: For example, customer support chatbots or image recognition APIs.
- Edge Computing: Lightweight inference at the edge, reducing reliance on central data centers.
Limitations
AWS Lambda’s primary deployment limitation is the 250MB limit on the unzipped deployment package (function code plus layers), which can make it challenging to deploy complex ML models.
To address this constraint, developers can employ mitigation techniques such as model compression, selective feature engineering, and Lambda layers for efficient dependency management. Modularizing model components and adopting hybrid architectures that blend serverless and traditional infrastructure can also help overcome the size limit without sacrificing model performance.
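Before deploying, it can help to check how close a package is to the limit. The sketch below sums the uncompressed sizes recorded in one or more ZIP archives and compares the total with 250MB; the helper names are hypothetical.

```python
import zipfile

LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024  # 250MB, unzipped


def unzipped_size(zip_path):
    """Return the total uncompressed size of all entries in a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def fits_in_lambda(zip_paths, limit_bytes=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Check the function package plus its layer archives against the limit."""
    return sum(unzipped_size(path) for path in zip_paths) <= limit_bytes
```

Passing both the function archive and the layer archive, for example `fits_in_lambda(["Archived.zip", "sklearn_pandas_layer.zip"])`, reflects that the limit applies to the combined unzipped size.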
Another challenge with Lambda is the cold start, where initial function invocations can experience latency spikes of 10-12 seconds. This contrasts sharply with the near-instantaneous responses of dedicated server environments. On the first invocation, Lambda must download the deployment package into a fresh execution environment and initialize it, which adds to the response time. This latency is particularly noticeable in scenarios requiring low-latency responses.
To mitigate this, a CloudWatch-triggered scheduled event can be configured to invoke the function periodically, keeping it “warm.” This ensures the Lambda runtime environment is already initialized and ready for execution, reducing delays. The schedule can also be restricted to specific time windows, such as business hours, to balance performance and cost. This approach keeps the ML model available without introducing unnecessary runtime costs.
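On the function side, the handler can recognize the scheduled warm-up invocation and return early so that no inference is run. In this minimal sketch, the check relies on the `source` field that scheduled events carry; `run_prediction` is a hypothetical placeholder for the real inference path, and the cron expression is an assumed example of a business-hours schedule.

```python
# Assumed example schedule: every 5 minutes, 08:00-18:55 UTC, Monday to Friday.
WARMUP_SCHEDULE = "cron(0/5 8-18 ? * MON-FRI *)"


def is_warmup_event(event):
    """Scheduled invocations arrive with "source" set to "aws.events"."""
    return isinstance(event, dict) and event.get("source") == "aws.events"


def run_prediction(event):
    # Hypothetical placeholder for loading the model and predicting.
    return {"statusCode": 200, "body": "prediction goes here"}


def lambda_handler(event, context):
    if is_warmup_event(event):
        # Skip inference: the invocation alone keeps the environment alive.
        return {"statusCode": 200, "body": "warm"}
    return run_prediction(event)
```

Returning early keeps each warm-up invocation short, which limits the billed duration of the pings.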
Conclusion: Cost-Efficiency Meets Scalability
In conclusion, deploying an ML model using AWS Lambda provides a scalable, cost-effective solution that eliminates the need for expensive licensing and deployment tools. The two approaches discussed — reading the ML model from an S3 bucket and zipping the model with Lambda code — demonstrate flexibility in addressing different deployment scenarios. While the Lambda architecture is efficient, addressing the cold start latency with techniques like warming up the Lambda function ensures consistent performance, even for the first API call. By combining cost efficiency and performance optimization, this deployment method is a practical choice for organizations aiming to maximize value and reduce expenses.