Mike Danilov, a senior principal engineer at AWS, delivered an insightful presentation on the inner workings of AWS Lambda. This blog post will summarize the key points and strategies discussed in his talk, including key infrastructure and architecture diagrams.

Table of Contents

  1. Introduction
  2. AWS Lambda Overview
  3. Invoke Request Routing
  4. Compute Fabric
  5. Snapshot Distribution
  6. Cold Start Optimization
  7. Conclusion

Introduction

Mike Danilov is a Senior Principal Engineer with AWS Lambda.

AWS Lambda is a serverless compute service that allows users to execute code in response to events without provisioning or managing servers. It supports multiple programming languages and scales automatically with incoming request volume. Lambda is designed with key principles of availability, efficiency, scalability, security, and performance.

AWS Lambda Overview

Lambda enables users to execute code on demand with minimal overhead. It supports synchronous and asynchronous invocation models. The system’s design ensures high availability, rapid scaling, and efficient resource usage.

Invoke Request Routing

Invoke Request Routing is a crucial part of Lambda’s architecture. It connects various microservices, ensuring availability and scalability. The process involves:

  • Frontend: Receiving the invoke request and asking for a sandbox to run it in.
  • Sandbox lookup: Returning an existing sandbox for the function when one is available.
  • Placement: Creating a new sandbox when none exists.

To minimize overhead and improve the customer experience, a component called the worker manager was introduced to avoid cold starts wherever possible. It operates in two modes:

  1. If a sandbox already exists, it hands that sandbox to the frontend on request, leading to smooth “warm invokes.”
  2. If no sandbox exists, it initiates a slower path that involves placement to create a new one.
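
The two modes above can be illustrated with a small sketch. This is not Lambda's implementation: the class `SandboxRouter`, its methods, and the sandbox naming scheme are hypothetical, chosen only to show how a lookup either reuses an idle sandbox (warm) or falls back to creating one (cold).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SandboxRouter:
    """Toy stand-in for the sandbox-tracking component (all names hypothetical)."""
    idle: dict = field(default_factory=dict)      # function name -> idle sandbox ids
    _ids: count = field(default_factory=count)    # source of unique sandbox numbers

    def _place(self, function_name: str) -> str:
        # Slow path: "placement" creates a brand-new sandbox.
        return f"{function_name}-sandbox-{next(self._ids)}"

    def acquire(self, function_name: str):
        """Return (sandbox_id, path), where path is 'warm' or 'cold'."""
        pool = self.idle.get(function_name, [])
        if pool:
            return pool.pop(), "warm"
        return self._place(function_name), "cold"

    def release(self, function_name: str, sandbox_id: str) -> None:
        # After the invoke finishes, the sandbox becomes reusable.
        self.idle.setdefault(function_name, []).append(sandbox_id)


router = SandboxRouter()
first_id, first_path = router.acquire("resize-image")    # no sandbox yet
router.release("resize-image", first_id)
second_id, second_path = router.acquire("resize-image")  # reuses the sandbox
```

In this sketch the first call takes the cold path and the second call takes the warm path with the same sandbox, which is the behaviour the two modes describe.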

However, the worker manager, which was responsible for tracking sandboxes, kept its state in memory, so a host failure could lose that data. A replacement named the assignment service was introduced a year ago to address this.

The assignment service is split into partitions, each with one leader and two followers. If a leader fails, a follower takes over, so sandbox state survives the loss of a host. Making the service fault-tolerant with this leader-follower model also improved efficiency and reduced latency.
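
As a rough illustration of the leader-follower idea, the sketch below keeps sandbox state on three replicas and promotes a follower when the leader is lost. The names `Replica` and `Partition` are hypothetical, and copying every write synchronously to all live replicas is a simplifying assumption; the post does not describe how the real service replicates state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    alive: bool = True
    state: dict = field(default_factory=dict)  # sandbox id -> status


class Partition:
    """One partition: a leader plus two followers (three replicas in total)."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.leader = self.replicas[0]

    def fail_over(self):
        # Promote the first live replica; it already holds the replicated state.
        survivors = [r for r in self.replicas if r.alive]
        if not survivors:
            raise RuntimeError("no live replica left in partition")
        self.leader = survivors[0]

    def record(self, sandbox_id, status):
        # Simplification: every write is copied to all live replicas at once.
        if not self.leader.alive:
            self.fail_over()
        for replica in self.replicas:
            if replica.alive:
                replica.state[sandbox_id] = status

    def lookup(self, sandbox_id):
        if not self.leader.alive:
            self.fail_over()
        return self.leader.state.get(sandbox_id)


partition = Partition(["replica-a", "replica-b", "replica-c"])
partition.record("sandbox-1", "busy")
partition.replicas[0].alive = False        # simulate losing the leader's host
status_after_failure = partition.lookup("sandbox-1")
new_leader = partition.leader.name
```

Because the followers already hold a copy of the state, the lookup after the simulated failure still finds the sandbox, which is the property the in-memory worker manager lacked.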

[Figure: sync invoke]

Compute Fabric

The compute fabric is responsible for executing code in Lambda. It consists of a worker fleet composed of EC2 instances. The capacity manager ensures optimal fleet size adjustments based on demand.

Running multiple users’ code on the same worker is a challenge. AWS Lambda addresses it with Firecracker, a virtualization technology that provides strong isolation (including data isolation) and efficient resource utilization.

Snapshot Distribution

Snapshot distribution is a critical part of AWS Lambda. Snapshots can be large (30GB+), yet VM resumption has to stay fast and resource use efficient. To make that possible, snapshots are split into smaller chunks (typically 512 kilobytes), which allows progressive loading and on-demand chunk retrieval.
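
A minimal sketch of on-demand chunk retrieval follows. It assumes a hypothetical `LazySnapshot` class and a `fetch_from_store` callback standing in for remote storage; only the 512-kilobyte chunk size comes from the text above. The point is that a read touches only the chunks that cover the requested byte range.

```python
CHUNK_SIZE = 512 * 1024  # 512 KiB, the typical chunk size quoted above


def chunk_index(offset: int) -> int:
    """Map a byte offset in the snapshot to the chunk that contains it."""
    return offset // CHUNK_SIZE


class LazySnapshot:
    """Fetches a chunk only when a read first touches it (hypothetical API)."""

    def __init__(self, fetch_chunk, total_size: int):
        self._fetch_chunk = fetch_chunk   # callable: chunk index -> bytes
        self._total_size = total_size
        self._cache = {}
        self.fetch_count = 0

    def _chunk(self, index: int) -> bytes:
        if index not in self._cache:
            self._cache[index] = self._fetch_chunk(index)
            self.fetch_count += 1
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self._total_size)
        out = bytearray()
        pos = offset
        while pos < end:
            index = chunk_index(pos)
            start = pos - index * CHUNK_SIZE
            take = min(CHUNK_SIZE - start, end - pos)
            out += self._chunk(index)[start:start + take]
            pos += take
        return bytes(out)


# A fake 2 MiB "snapshot" held in memory stands in for remote storage.
snapshot_bytes = bytes(range(256)) * (2 * 1024 * 1024 // 256)


def fetch_from_store(index: int) -> bytes:
    return snapshot_bytes[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]


snap = LazySnapshot(fetch_from_store, len(snapshot_bytes))
data = snap.read(CHUNK_SIZE - 10, 20)  # straddles the boundary of chunks 0 and 1
```

Here the 20-byte read crosses a chunk boundary, so two of the four chunks are fetched and the other two are never transferred.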

Cold Start Optimization

[Figure: cold start stats]

To address cold start issues, AWS Lambda employs VM snapshots and on-demand chunk loading. These techniques reduce latency by keeping overhead during initialization to a minimum. Using Firecracker further improves efficiency and security.

  • Optimization: read-ahead, where multiple pages are read when a single page is accessed.
  • However, this method proves inefficient for mapped memory (as opposed to a regular file).
    • To address this, memory access was analyzed across 100 VMs, which revealed a consistent access pattern (the pattern itself is not described).
    • That pattern is then applied to every snapshot, so that during snapshot resumption the system already knows which pages are required and in what order.
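
The difference between plain read-ahead and a known access pattern can be shown with a toy page-fault counter. Everything here is illustrative: the page numbers in `recorded_trace` are invented, and loading every known page on the first fault is an assumed simplification, since the actual pattern and loading strategy are not described.

```python
def sequential_read_ahead(page: int, window: int) -> list:
    """Classic read-ahead: on a miss, also load the next `window - 1` pages."""
    return list(range(page, page + window))


def count_faults(accesses, prefetch) -> int:
    """Count accesses that miss the set of already-loaded pages."""
    loaded = set()
    faults = 0
    for page in accesses:
        if page not in loaded:
            faults += 1
            loaded.update(prefetch(page))
    return faults


# Hypothetical trace: pages touched during resume, scattered across memory.
recorded_trace = [40, 3, 97, 12, 58, 3, 71, 40]

# Strategy A: sequential read-ahead of 4 pages per fault.
faults_sequential = count_faults(
    recorded_trace, lambda p: sequential_read_ahead(p, 4)
)

# Strategy B: the trace was recorded earlier, so the first fault loads
# every page that is known to be needed.
known_pages = list(dict.fromkeys(recorded_trace))  # de-duplicated, order kept
faults_with_trace = count_faults(recorded_trace, lambda p: known_pages)
```

With scattered accesses, sequential read-ahead mostly loads neighbours that are never used, so nearly every new page still faults. Knowing the pages in advance collapses that to a single fault in this toy model.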

Conclusion

By leveraging advanced strategies and technologies like Firecracker, AWS Lambda ensures robust performance, security, and scalability. The architecture optimizes resource utilization and minimizes latency, providing an efficient serverless computing environment.

For more detailed insights, you can read the full article on InfoQ: AWS Lambda Under the Hood.