January 3, 2025

AWS Lambda SnapStart for Python

This post covers what AWS Lambda is, how it works, and how cold starts can affect performance. It then covers Lambda SnapStart: how to enable it, and how to measure its impact on cold starts using different AWS services.

AWS Lambda, now just over 10 years old, is a serverless compute service that runs your code without you provisioning or managing servers. When using Lambda, you are responsible only for your code. Lambda manages the compute fleet that provides a balance of memory, CPU, network, and other resources to run it. A Lambda function is a piece of code that runs in response to events, such as a user clicking a button on a website or a file being uploaded to an Amazon S3 bucket. When a function runs in response to an event, Lambda calls the function’s handler. Lambda invokes your function in an execution environment, which provides a secure and isolated runtime environment. When a function is first invoked, or when Lambda needs additional environments to handle more concurrent requests, Lambda creates a new execution environment for the function to run in. This is called a “cold start”. After the function has finished running, Lambda doesn't stop the execution environment right away. If the function is invoked again, Lambda can reuse the existing execution environment. This is called a “warm start”.

To summarise: a cold start is an invocation that must first initialize a new execution environment, which adds a noticeable delay. A warm start is an invocation that lands on an execution environment that is already initialized, so it skips that work and responds much faster. A cold start is like starting a car from completely off; a warm start is like pulling away in a car whose engine is already running.
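A common way to observe this from inside a Python function is a module-level flag. Module-level code runs once per execution environment during initialization, so the flag is True only for the first invocation in each environment. This is a minimal sketch. The handler name `lambda_handler` and the `cold_start` field are illustrative and not taken from the bot's code.

```python
# Module-level code runs once per execution environment, during the init phase.
# Expensive setup placed here (imports, SDK clients, config loading) is paid on a cold start only.
_is_cold_start = True


def lambda_handler(event, context):
    """Hypothetical handler that reports whether this invocation was a cold start."""
    global _is_cold_start
    cold = _is_cold_start
    # Any later invocation that reuses this execution environment is a warm start.
    _is_cold_start = False
    return {"cold_start": cold}
```

Calling the handler twice in the same process mimics two invocations landing on one execution environment: the first reports `cold_start: True`, the second `False`. With SnapStart, the snapshot is taken after module-level code has run but before any invocation. As a result, the first invocation in each restored environment also sees True.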

To see this in practice, let's start with my Telegram bot, a serverless application deployed with AWS SAM as the infrastructure as code (IaC) tool. Before enabling SnapStart, this is what a cold start looked like in CloudWatch Logs:

Using this CloudWatch Logs Insights query, we can confirm that the cold start took just over 2 seconds:
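If the function logs in JSON format, as the SnapStart records later in this post do, the same check can be sketched in Python against exported log lines. This assumes each `platform.report` record carries `record.metrics.initDurationMs` only when the invocation paid an init phase. Treat that field name as an assumption to verify against your own logs. The helper name and the sample lines are hypothetical.

```python
import json


def cold_start_init_durations(log_lines):
    """Return initDurationMs for every platform.report record that has one.

    Assumption: in JSON log format, platform.report records include
    record.metrics.initDurationMs only for invocations that ran an init phase.
    """
    durations = []
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text application log lines
        if not isinstance(event, dict) or event.get("type") != "platform.report":
            continue
        metrics = event.get("record", {}).get("metrics", {})
        if "initDurationMs" in metrics:
            durations.append(metrics["initDurationMs"])
    return durations


# Hypothetical sample lines: one cold invocation, one warm invocation, one app log line.
sample = [
    '{"type": "platform.report", "record": {"metrics": {"durationMs": 850.1, "initDurationMs": 2051.3}}}',
    '{"type": "platform.report", "record": {"metrics": {"durationMs": 40.2}}}',
    "plain text log line",
]
print(cold_start_init_durations(sample))  # → [2051.3]
```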

SnapStart

There are various ways to reduce or minimise the impact of cold starts. One of those is Lambda SnapStart, which initializes your function when you publish a function version. Lambda takes a Firecracker microVM snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and intelligently caches it to optimize retrieval latency.
When you invoke the function version for the first time, and as the invocations scale up, Lambda resumes new execution environments from the cached snapshot instead of initializing them from scratch, improving startup latency. Lambda SnapStart is designed to address the latency variability introduced by one-time initialization code, such as loading module dependencies or frameworks. These operations can sometimes take several seconds to complete during the initial invocation. In optimal scenarios, SnapStart can reduce this latency from several seconds to under one second.

Lambda execution environment lifecycle

SnapStart was introduced in Nov 2022 for Java only. Two years later, in Nov 2024, SnapStart for Python (and .NET) was released.

To enable SnapStart, I add the following to the function's properties in the SAM template:

AutoPublishAlias: SnapStart
SnapStart:    
    ApplyOn: PublishedVersions
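Outside SAM, the same setup can be sketched with boto3's Lambda client. The steps are: set the function's `SnapStart` configuration, wait for the update, publish a version (which is when the snapshot is taken), and point an alias at that version. This only approximates what `AutoPublishAlias` plus `SnapStart` arranges through CloudFormation and is not SAM's implementation. The helper name `enable_snapstart` is hypothetical, the alias is assumed to already exist, and the client is passed in so the sketch can be exercised without AWS credentials.

```python
def enable_snapstart(client, function_name, alias_name="SnapStart"):
    """Hypothetical helper: enable SnapStart, publish a version, and move an alias to it.

    `client` is expected to be a boto3 Lambda client, e.g. boto3.client("lambda").
    Assumes the alias already exists (SAM creates it via AutoPublishAlias).
    """
    client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # Configuration updates are asynchronous; publishing too early can be rejected.
    client.get_waiter("function_updated").wait(FunctionName=function_name)

    # SnapStart initializes the function and takes the snapshot when a version is published.
    version = client.publish_version(FunctionName=function_name)["Version"]

    # Invocations must target the version or alias, not $LATEST, to use the snapshot.
    client.update_alias(
        FunctionName=function_name,
        Name=alias_name,
        FunctionVersion=version,
    )
    return version
```

Usage would look like `enable_snapstart(boto3.client("lambda"), "telegramtasweerbot")`.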

To measure and monitor the impact, I also enable CloudWatch Lambda Insights, Application Signals, and X-Ray tracing:

Layers:      
    - !Sub "arn:aws:lambda:${AWS::Region}:580247275435:layer:LambdaInsightsExtension-Arm64:20" #Lambda Insights Layer       
    - !Sub "arn:aws:lambda:${AWS::Region}:615299751070:layer:AWSOpenTelemetryDistroPython:5" #Application Signals Layer 
Tracing: Active
Policies:       
    - CloudWatchLambdaInsightsExecutionRolePolicy

Once deployed, I check that SnapStart is enabled using this AWS CLI command:
aws lambda get-function-configuration --function-name telegramtasweerbot
and this is the response I am looking for:

"State": "Active",
...
"SnapStart": {
    "ApplyOn": "PublishedVersions",
    "OptimizationStatus": "On"
},
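The same check can be scripted with boto3. AWS documents `OptimizationStatus` as reporting on a specific function version when the request is qualified. For that reason, this sketch passes the alias name as `Qualifier`; the CLI command above accepts the same `--qualifier` option. Both helper names are hypothetical.

```python
def snapstart_is_on(function_config):
    """Return True if a get_function_configuration response shows SnapStart active."""
    snap = function_config.get("SnapStart", {})
    return (
        snap.get("ApplyOn") == "PublishedVersions"
        and snap.get("OptimizationStatus") == "On"
    )


def check_alias(client, function_name, alias_name="SnapStart"):
    """Query the version behind an alias; `client` is expected to be boto3.client("lambda")."""
    # Qualifier targets the published version behind the alias rather than $LATEST.
    config = client.get_function_configuration(
        FunctionName=function_name, Qualifier=alias_name
    )
    return snapstart_is_on(config)
```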

When the function is invoked, CloudWatch Logs now shows a SnapStart restore instead of a full initialization. The restore takes 276 ms, versus just over 2 seconds for the earlier cold start. A massive improvement:

CloudWatch Logs
2025-01-02T12:31:44.431Z
{
    "time": "2025-01-02T12:31:44.431Z",
    "type": "platform.restoreStart",
    "record": {
        "runtimeVersion": "python:3.13.v13",
        "runtimeVersionArn": "arn:aws:lambda:eu-west-1::runtime:....."
    }
}
2025-01-02T12:31:44.708Z
{
    "time": "2025-01-02T12:31:44.708Z",
    "type": "platform.restoreReport",
    "record": {
        "status": "success",
        "metrics": {
            "durationMs": 276.512
        }
    }
}
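Extracting the restore duration from these records and comparing it with the earlier cold start takes only a few lines of Python. The `platform.restoreReport` shape follows the record above. The 2,000 ms baseline is an approximation of the "just over 2 seconds" cold start, not a measured value, and the helper names are hypothetical.

```python
import json


def restore_durations(log_lines):
    """Collect record.metrics.durationMs from platform.restoreReport records."""
    durations = []
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if isinstance(event, dict) and event.get("type") == "platform.restoreReport":
            durations.append(event["record"]["metrics"]["durationMs"])
    return durations


def percent_reduction(baseline_ms, new_ms):
    """Percentage by which new_ms is lower than baseline_ms."""
    return 100.0 * (baseline_ms - new_ms) / baseline_ms


line = (
    '{"time": "2025-01-02T12:31:44.708Z", "type": "platform.restoreReport", '
    '"record": {"status": "success", "metrics": {"durationMs": 276.512}}}'
)
restore_ms = restore_durations([line])[0]
print(round(percent_reduction(2000.0, restore_ms), 1))  # → 86.2
```

This compares the restore phase with the earlier init phase. The handler's own run time comes on top of both.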

In X-Ray, we can see the traces and segments like this:

AWS X-Ray


To monitor SnapStart restore metrics in CloudWatch Logs Insights, we use this query over a whole month:

CloudWatch Logs Insights
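A comparable month-long summary can also be sketched programmatically with boto3's CloudWatch Logs client, using `start_query` and `get_query_results`. The query string assumes Logs Insights discovers the JSON fields `type` and `record.metrics.durationMs` from the records shown earlier. The helper name is hypothetical, and the usage example assumes the default `/aws/lambda/<function-name>` log group.

```python
import time

# Assumption: Logs Insights flattens the JSON platform records into dotted field names.
RESTORE_QUERY = """
filter type = "platform.restoreReport"
| stats count(*) as restores,
        avg(record.metrics.durationMs) as avgRestoreMs,
        max(record.metrics.durationMs) as maxRestoreMs
"""


def run_month_query(client, log_group, query=RESTORE_QUERY, days=30, poll_seconds=1.0):
    """Run a Logs Insights query over the last `days` days and return its result rows.

    `client` is expected to be boto3.client("logs").
    """
    end = int(time.time())  # start_query takes epoch seconds
    start = end - days * 24 * 60 * 60
    query_id = client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=query,
    )["queryId"]
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"Query ended with status {response['status']}")
    # Each row is a list of {"field": ..., "value": ...} cells; turn it into a dict.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

Usage would look like `run_month_query(boto3.client("logs"), "/aws/lambda/telegramtasweerbot")`.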

CloudWatch Lambda Insights provides us with a summary per function:

CloudWatch Lambda Insights