Adaptable Scaling Using API Gateway Direct Integration (Part 1)

Hyperscaling has been discussed in the serverless community for a while, and community members and AWS staff have presented some elegant workarounds.

To discover more about these optimizations, have a look at:

These are all optimizations you can bring to your design and implementation to improve it and move toward hyperscale.

This is Part 1 of a series of five articles that explore, design, and improve toward hyperscale.

  • Part 1: Explore Lambda / SQS / SNS / DynamoDB

  • Part 2: Explore Step Functions / EventBridge

  • Part 3: Design Patterns

  • Part 4: Choosing the Write Pattern per Use Case

  • Part 5: Achieve Hyperscale

In this article, we explore how reliable each type of design is when using API Gateway.

In casual terms, scale is the ability to handle a high volume of demands over a period of time.

Throughput: the number of requests handled per unit of time (usually per second, RPS). It varies depending on the services interacting in your design at the different architectural layers.

Latency: the time a single demand takes to be processed. It matters in synchronous service integrations, where the caller waits for the response, and is usually measured in milliseconds.

Concurrency: the number of demands that are processed simultaneously (in parallel at a given moment). It is a count that depends on throughput and latency; by Little's law, the average degree of concurrency is approximately

Concurrency Degree = Throughput (RPS) × Latency (seconds)

For a service with a throughput of 25 RPS and a latency of 1 s, the degree of concurrent demands will be

C = 25 × 1 = 25

For the same service, with the same throughput of 25 RPS and a latency of 200 ms, it will be

C = 25 × (1/5) = 5

In other words, at the same throughput, lower latency means fewer demands in flight at once, which leaves capacity for more traffic.
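The same arithmetic can be sketched in TypeScript. This is a minimal illustration assuming steady traffic: it applies Little's law (average concurrency = throughput × average latency, the relation the AWS Lambda documentation also uses to estimate function concurrency) together with its inverse. The function names `concurrency` and `maxThroughput` are illustrative, not from the article's repository.

```typescript
// Little's law: average requests in flight = arrival rate (req/s) × average latency (s).
function concurrency(throughputRps: number, latencyMs: number): number {
  if (throughputRps < 0 || latencyMs < 0) {
    throw new RangeError("throughput and latency must be non-negative");
  }
  return (throughputRps * latencyMs) / 1000;
}

// Inverse: the throughput a fixed concurrency budget can sustain at a given latency.
function maxThroughput(concurrencyLimit: number, latencyMs: number): number {
  if (concurrencyLimit < 0 || latencyMs <= 0) {
    throw new RangeError("limit must be non-negative and latency positive");
  }
  return (concurrencyLimit * 1000) / latencyMs;
}

console.log(concurrency(25, 1000));    // 25
console.log(concurrency(25, 200));     // 5
console.log(maxThroughput(1000, 200)); // 5000
```

The inverse form is the useful one when a service caps concurrency: with a cap of 1,000 concurrent executions (the default Lambda limit discussed later) and 200 ms per request, roughly 5,000 RPS is the ceiling.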

Concurrency itself has no time unit, but to better show the relation between the three metrics, I present them below over a fixed window:

At a given time = 1 second

Scaling is an SLO (service level objective) that relies on several SLIs (service level indicators, i.e. metrics) and is measured as a combination of those metrics.

API Gateway:

API Gateway is a fully managed AWS service, mostly used to implement the front-door pattern. It lets us define usage plans based on our software's needs, for example throttling. By default, API Gateway applies an account-level throttling quota of 10,000 requests per second per Region, which can be raised to match your needs.
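To make the throttling options concrete, here is a minimal CDK v2 sketch, assuming a standalone stack, that sets stage-level throttling and a usage plan with per-API-key throttling and a daily quota. These settings operate beneath the account-level quota. The construct IDs, the limits, and the mock `GET` method are illustrative assumptions, not values from the article's repository.

```typescript
import { App, Stack } from "aws-cdk-lib";
import { MockIntegration, Period, RestApi } from "aws-cdk-lib/aws-apigateway";
import { Template } from "aws-cdk-lib/assertions";

const app = new App();
const stack = new Stack(app, "ThrottlingSketch");

// Stage-level throttling applies to every caller of the "live" stage.
const api = new RestApi(stack, "Api", {
  deployOptions: {
    stageName: "live",
    throttlingRateLimit: 1000, // steady-state requests per second (illustrative)
    throttlingBurstLimit: 500, // burst capacity (illustrative)
  },
});

// Placeholder method so the API has something to deploy; callers must send an API key.
api.root.addMethod(
  "GET",
  new MockIntegration({
    requestTemplates: { "application/json": '{"statusCode": 200}' },
    integrationResponses: [{ statusCode: "200" }],
  }),
  { apiKeyRequired: true, methodResponses: [{ statusCode: "200" }] },
);

// Usage plan: per-API-key throttling plus a daily quota on the same stage.
const plan = api.addUsagePlan("BasicPlan", {
  name: "basic",
  throttle: { rateLimit: 100, burstLimit: 50 },
  quota: { limit: 100_000, period: Period.DAY },
});
plan.addApiStage({ api, stage: api.deploymentStage });
plan.addApiKey(api.addApiKey("ClientKey"));

// Synthesize to CloudFormation to inspect the result.
const template = Template.fromStack(stack);
console.log(Object.keys(template.findResources("AWS::ApiGateway::UsagePlan")).length); // 1
```

Stage throttling protects the whole stage, while the usage plan lets different clients (API keys) get different shares of it.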

API Gateway integrates smoothly with Lambda and also supports direct service integrations, which send requests straight to the underlying AWS services without a single line of custom code. This helps reduce maintenance and regression costs in the long term.

It also removes the latency overhead that comes with custom code and runtime specifics.

In this part, we explore how several services behave behind API Gateway.

Let’s Integrate:

API Gateway: Here we use a REST API and build the infrastructure with AWS CDK v2.

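import { Stack, StackProps } from 'aws-cdk-lib';
import { EndpointType, LambdaIntegration, RestApi } from 'aws-cdk-lib/aws-apigateway';
import { Role, ServicePrincipal } from 'aws-cdk-lib/aws-iam';
import { IFunction } from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';
export interface ApiStackProps extends StackProps {
    stackContext: string;
    validateRequet: IFunction; // Lambda function behind POST /lambda
}

export class ApiStack extends Stack {
    public readonly Api: RestApi;
    constructor(scope: Construct, id: string, props: ApiStackProps) {
        super(scope, id, props);
        // Role assumed by API Gateway for the direct service integrations (omitted here)
        const integrationRole = new Role(this, 'integration-role', { assumedBy: new ServicePrincipal('apigateway.amazonaws.com') });

        this.Api = new RestApi(this, 'rest-api', {
            restApiName: `${props.stackContext}-rest-api`,
            endpointTypes: [EndpointType.REGIONAL],
            cloudWatchRole: true,
            deployOptions: { stageName: 'live' },
        });

        // Lambda proxy integration on POST /lambda
        const lambdaProxyIntegration = new LambdaIntegration(props.validateRequet);
        const lambdaApiResource = this.Api.root.addResource('lambda');
        lambdaApiResource.addMethod('POST', lambdaProxyIntegration);
    }
}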

Nothing fancy: just a log group, a role, the API, and an example of a Lambda proxy integration (simplified for readability; for the full code, refer to the GitHub link below).

Integrations: We cover the following cases to compare each solution and its reliability at hyperscale:

  • Lambda Functions

  • SQS

  • SNS

  • DynamoDB
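For the direct integrations there is no handler to write: API Gateway calls the service API itself. As an illustration, a minimal CDK v2 sketch of a `POST /sqs` route that forwards the request body to SQS `SendMessage` could look like the following. It assumes a standalone stack and hypothetical construct IDs (`LettersQueue`, `IntegrationRole`); the article's repository may wire it differently.

```typescript
import { App, Aws, Stack } from "aws-cdk-lib";
import { AwsIntegration, RestApi } from "aws-cdk-lib/aws-apigateway";
import { Template } from "aws-cdk-lib/assertions";
import { Role, ServicePrincipal } from "aws-cdk-lib/aws-iam";
import { Queue } from "aws-cdk-lib/aws-sqs";

const app = new App();
const stack = new Stack(app, "SqsDirectSketch");

const queue = new Queue(stack, "LettersQueue");

// API Gateway assumes this role to call SQS; it may only send messages.
const integrationRole = new Role(stack, "IntegrationRole", {
  assumedBy: new ServicePrincipal("apigateway.amazonaws.com"),
});
queue.grantSendMessages(integrationRole);

// POST /sqs -> SQS SendMessage, with no Lambda in between.
const sqsIntegration = new AwsIntegration({
  service: "sqs",
  path: `${Aws.ACCOUNT_ID}/${queue.queueName}`, // SQS query API path: account-id/queue-name
  integrationHttpMethod: "POST",
  options: {
    credentialsRole: integrationRole,
    requestParameters: {
      // The SQS query API expects a form-encoded body
      "integration.request.header.Content-Type": "'application/x-www-form-urlencoded'",
    },
    requestTemplates: {
      // Forward the raw JSON request body as the message body
      "application/json": "Action=SendMessage&MessageBody=$util.urlEncode($input.body)",
    },
    integrationResponses: [{ statusCode: "200" }],
  },
});

const api = new RestApi(stack, "Api", { deployOptions: { stageName: "live" } });
api.root.addResource("sqs").addMethod("POST", sqsIntegration, {
  methodResponses: [{ statusCode: "200" }],
});

const template = Template.fromStack(stack);
```

The only moving parts are an IAM role and a mapping template, which is why this path avoids both custom-code maintenance and runtime latency.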

For the Lambda function only, we write a sample handler:
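import { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

// No business logic on purpose: parse the payload and acknowledge it
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
    const { letter } = JSON.parse(event.body ?? "{}"); // event.body is null when no body is sent

    return {
        statusCode: 200,
        headers: {
            "Content-Type": "application/json"
        },
        body: JSON.stringify({ done: true })
    };
};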

The source code is available here.

Load Testing:

To test the different types of integration, we need to put the solution under load and observe its behavior.

We use Artillery for load testing and CloudWatch for observability.

Artillery is configured with YAML as follows (to find out more about load phases, see here):

config:
  target: "{{$processEnvironment.API_ENDPOINT}}"
  phases:
    - duration: 30
      arrivalRate: 5
      name: Warm up
    - duration: 60
      arrivalRate: 10
      rampTo: 500
      name: Ramp up load
    - duration: 120
      arrivalRate: 5000
      name: Sustained load
scenarios:
  - flow:
      - post:
          url: "/sqs"
          json:
            letter: "A"
      - post:
          url: "/lambda"
          json:
            letter: "A"
      - post:
          url: "/sns"
          json:
            letter: "A"
      - post:
          url: "/sfn"
          json:
            letter: "A"
      - post:
          url: "/eb"
          json:
            letter: "A"
      - post:
          url: "/ddb"
          json:
            letter: "A"

For simplicity, the payload is just the letter A.
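In Artillery, `arrivalRate` is the number of new virtual users started per second, each running the whole flow once, and `rampTo` raises that rate over the phase duration. The sketch below estimates the load this configuration generates. It assumes a linear ramp and that every virtual user completes all six POSTs, so it is a back-of-the-envelope approximation, not Artillery's exact scheduling; `estimatedArrivals` and `POSTS_PER_SCENARIO` are hypothetical names.

```typescript
// Phase shape mirroring the YAML above.
interface Phase {
  name: string;
  duration: number;    // seconds
  arrivalRate: number; // new virtual users per second at the start of the phase
  rampTo?: number;     // optional rate reached at the end of the phase
}

// Approximate virtual users started in a phase, assuming a linear ramp
// (average rate = midpoint of the start and end rates).
function estimatedArrivals(phase: Phase): number {
  const endRate = phase.rampTo ?? phase.arrivalRate;
  return ((phase.arrivalRate + endRate) / 2) * phase.duration;
}

const phases: Phase[] = [
  { name: "Warm up", duration: 30, arrivalRate: 5 },
  { name: "Ramp up load", duration: 60, arrivalRate: 10, rampTo: 500 },
  { name: "Sustained load", duration: 120, arrivalRate: 5000 },
];

const POSTS_PER_SCENARIO = 6; // /sqs, /lambda, /sns, /sfn, /eb, /ddb

for (const p of phases) {
  console.log(`${p.name}: ~${estimatedArrivals(p)} virtual users`);
}
const totalUsers = phases.reduce((sum, p) => sum + estimatedArrivals(p), 0);
console.log(`Total: ~${totalUsers} users, ~${totalUsers * POSTS_PER_SCENARIO} POST requests`);
```

Almost all of the load comes from the sustained phase (about 600,000 virtual users), so it dominates both the request count and the bill.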


We use GitHub Actions to deploy the solution, and the load test runs at the end of the CD pipeline:

    - name: Load Test
      run: |
        cd tests/load && npm i \
        && npm run perf

To deploy locally, simply run the following command:

cd aws/cdk/app && npm i && npm run deploy


Lambda Proxy Integration: Lambda scaling was managed automatically, but with some inconvenient details.

Concurrent executions rose to nearly 1,000 in the first load test and to about 850 in the second.

The main inconvenience was throttling.

Running the load test 10 times produced the following bill:

SQS Direct Integration: According to the AWS documentation on SQS message quotas, standard queues support a nearly unlimited number of messages per second, so we don't face quota problems such as throttling. The load tests confirmed this for around 600K messages in 10 minutes.

All messages sent to SQS succeeded.

This resulted in a total of 800K visible messages.

Billing was also attractive with SQS: for the same number of test runs as the Lambda proxy integration, the cost was less than 50 cents.

SNS Direct Integration: SNS also handled the load, and all messages were published to the topic.

The cost stayed within the free tier, but according to the pricing documentation it would be close to SQS anyway:

  • First 1 million Amazon SNS requests per month are free, $0.50 per 1 million requests thereafter
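The request pricing quoted above can be turned into a quick estimate. This is a minimal sketch that covers only the per-request charge stated in the bullet; it assumes each publish is at most 64 KB (so one publish is billed as one request) and ignores data transfer and delivery charges. The constant and function names are hypothetical.

```typescript
// SNS request pricing as quoted above: the first 1 million requests per month
// are free, then $0.50 per 1 million requests.
// Assumes each publish is <= 64 KB, so one publish = one billed request.
// Data transfer and per-delivery charges are deliberately ignored.
const FREE_REQUESTS_PER_MONTH = 1_000_000;
const USD_PER_MILLION_REQUESTS = 0.5;

function snsRequestCostUsd(requestsPerMonth: number): number {
  const billable = Math.max(0, requestsPerMonth - FREE_REQUESTS_PER_MONTH);
  return (billable / 1_000_000) * USD_PER_MILLION_REQUESTS;
}

console.log(snsRequestCostUsd(800_000));   // 0
console.log(snsRequestCostUsd(3_000_000)); // 1
```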

DynamoDB Direct Integration: For the DynamoDB load test, writing to a new partition per request, consumed write capacity rose to around 40K WCU, the allowed maximum per table.
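A "new partition per request" write can be expressed as a DynamoDB `PutItem` direct integration. The following minimal CDK v2 sketch assumes a hypothetical on-demand table with a string partition key `pk` and uses the API Gateway request ID as the key, so every call lands on a new partition key; the table layout, construct IDs, and response body are illustrative, not taken from the article's repository.

```typescript
import { App, Stack } from "aws-cdk-lib";
import { AwsIntegration, RestApi } from "aws-cdk-lib/aws-apigateway";
import { Template } from "aws-cdk-lib/assertions";
import { AttributeType, BillingMode, Table } from "aws-cdk-lib/aws-dynamodb";
import { Role, ServicePrincipal } from "aws-cdk-lib/aws-iam";

const app = new App();
const stack = new Stack(app, "DdbDirectSketch");

// Hypothetical table: string partition key "pk", on-demand capacity.
const table = new Table(stack, "LettersTable", {
  partitionKey: { name: "pk", type: AttributeType.STRING },
  billingMode: BillingMode.PAY_PER_REQUEST,
});

// Role API Gateway assumes to write to the table.
const integrationRole = new Role(stack, "IntegrationRole", {
  assumedBy: new ServicePrincipal("apigateway.amazonaws.com"),
});
table.grantWriteData(integrationRole);

// POST /ddb -> DynamoDB PutItem. $context.requestId is unique per request,
// so every call writes a new partition key.
const putItem = new AwsIntegration({
  service: "dynamodb",
  action: "PutItem",
  options: {
    credentialsRole: integrationRole,
    requestTemplates: {
      "application/json": JSON.stringify({
        TableName: table.tableName,
        Item: {
          pk: { S: "$context.requestId" },
          letter: { S: "$input.path('$.letter')" },
        },
      }),
    },
    integrationResponses: [
      { statusCode: "200", responseTemplates: { "application/json": '{"done": true}' } },
    ],
  },
});

const api = new RestApi(stack, "Api", { deployOptions: { stageName: "live" } });
api.root.addResource("ddb").addMethod("POST", putItem, {
  methodResponses: [{ statusCode: "200" }],
});

const template = Template.fromStack(stack);
```

Spreading writes over fresh partition keys avoids hot partitions, so the per-table throughput quota becomes the limiting factor.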

The single-table WCU test results were as follows:

System errors stayed at 0, and no errors occurred.

API Gateway Observations:
These details were not enough on their own, so we looked at the API Gateway IntegrationLatency and Latency metrics, which show that latency varies by service.

DynamoDB had the best integration latency, thanks to its low (single-digit millisecond) response times.

SQS came next, with an integration latency under 100 ms.

Lambda was slightly higher than SQS (even though the handler implements no business rules and calls no dependencies such as SQS, DynamoDB, or SNS).

Surprisingly, SNS had the highest latency of all the services.

SNS also returned occasional 5XX errors according to the API Gateway metrics (we did not investigate further, as the number of errors was small).


Improving scalability in a serverless design is not the hardest part of the game, but it must be treated as a crucial part of the design and its decisions.

When designing for scale, the basic elements I think about are:

  • Releasing resources as soon as possible

  • Keeping latency to a minimum

  • Treating heavy processes asynchronously

  • Taking care of throttling

DynamoDB: As we saw, DynamoDB has the best integration latency, which means resources are released as fast as possible, but there is a default limit of 40K WCU/RCU per table.
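The capacity math behind that 40K ceiling is simple enough to sketch. This minimal illustration assumes standard (non-transactional) writes, where one WCU covers one write per second of an item up to 1 KB and larger items consume one more WCU per additional started KB; the function names and the example rates are hypothetical.

```typescript
// Standard (non-transactional) writes: 1 WCU = one write per second of an item
// up to 1 KB; larger items consume one more WCU per additional started KB.
function requiredWcu(writesPerSecond: number, itemSizeBytes: number): number {
  const wcuPerWrite = Math.max(1, Math.ceil(itemSizeBytes / 1024));
  return writesPerSecond * wcuPerWrite;
}

// Default per-table write quota cited above (adjustable on request).
const DEFAULT_TABLE_WCU_QUOTA = 40_000;

function fitsTableQuota(writesPerSecond: number, itemSizeBytes: number): boolean {
  return requiredWcu(writesPerSecond, itemSizeBytes) <= DEFAULT_TABLE_WCU_QUOTA;
}

console.log(requiredWcu(5_000, 100));     // 5000  (a small item costs 1 WCU)
console.log(requiredWcu(5_000, 2_500));   // 15000 (2.5 KB rounds up to 3 WCU)
console.log(fitsTableQuota(50_000, 100)); // false
```

Item size matters as much as request rate: the same traffic with items just over 1 KB doubles the WCU consumed and halves the headroom under the quota.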

Discover more about DynamoDB limits in Alex DeBrie's post.

SQS: The second choice is SQS, whose low integration latency, nearly unlimited quota, and low cost make it a good option to consider.

SNS: SNS was close to SQS in terms of billing, but its integration latency was less attractive than the other services'. In addition, the publish limit of 9K messages per second (in Ireland; this is a soft limit) makes things harder.

Lambda: Lambda scaled well, but its bill was higher than the other services', and it is bound by the default limit of 1,000 concurrent executions (with an initial burst of 3,000 in Ireland, then 500 more per minute); this limit is adjustable.

Discover more on the AWS blogs.

For an asynchronous design, the first service I choose is SQS, as it seems the most reliable. But that is not all we need; we will discover more in Part 3 (Design Patterns) and Part 4 (Choosing the Right Pattern).