Real-Time Serverless Applications on AWS

WebSocket technology has been around for nearly a decade and has become the standard for providing real-time web interactions without a need for special client software (beyond a browser) or polling operations. We can deliver amazing end-user experiences using this technology, but we also incur additional complexity and most likely cost. The operational overhead of needing to maintain persistent client connections can be considerable and becomes even more complex if our infrastructure needs to scale to handle shifts in user behavior.

AWS API Gateway WebSocket API is a fully managed service that can solve some of these problems for us. In this article, we'll compare WebSocket API to other solutions as we build out a real-time game and explore all the features of WebSocket API.

Real-Time Serverless Applications on AWS
Practical Use Cases for WebSockets
- When Should You Use WebSockets?
Project Summary
WebSocket API
Building the Real-Time App
- Architecture
- Database Design
WebSocket API Resources
Stages
Routing
Sending Messages
Notifying Clients
- Scaling Up
Request Models
Authorization
Cost
- Estimate
- Throttling
Logging
Other Integrations
Conclusion

Practical Use Cases for WebSockets

Here are some ideas on how you can apply WebSockets to your applications:

Build a real-time chat messaging system
Build a multiplayer video game over TCP in the web browser
Build a real-time interface for a conversational AI chatbot
Push real-time notifications to a notification bell
Create a shared interactive virtual environment
Create a collaborative whiteboard

When Should You Use WebSockets?

When you want real-time interactions within your website or web-application
When you are building a game that does not require twitch-like or frame-by-frame gameplay
When you are restricted to the limitations of the web browser for real-time interactions
When you want to implement a pub/sub integration with 3rd parties

Project Summary

To give an overview of the capabilities of WebSocket API, we are going to build a real-time game using:

API Gateway V2 WebSocket API
- Custom Routing
- Authorizer
- Request Models
- Logging
React
- Static site in S3
- Exposed to the Internet via CloudFront
Cloud Development Kit (CDK)
- Fullstack TypeScript
- L2 constructs when possible
- Expanded functionality via L1 constructs and Aspects
DynamoDB
- Single Table Design
- K-Sortable Unique IDentifier (KSUID)

WebSocket API

In 2018 AWS released WebSocket API as a new offering for API Gateway. WebSocket API is a 100% serverless solution to managing client connections and communicating over WebSockets. We don't have to worry about provisioning servers to manage client connections using WebSocket API, nor will we incur expenses for unused, overprovisioned servers. This is a great tool to have, but there are still some limitations and implications to discuss.

API Gateway V2

It's worth mentioning that WebSocket API is considered part of the API Gateway V2 specification. HTTP API, the more widely known member of that specification, joined it in 2019. The original (V1) API Gateway is also called REST API. This is relevant when it comes to CloudFormation templates ("AWS::ApiGatewayV2::Api") and feature completeness. Although it uses a very different protocol from HTTP API, WebSocket API has all the same model, authorization, logging, and traceability constructs as HTTP API. These are a bit different from the V1 spec used in REST API.

Feature Completeness

We want to know whether we're in the V1 or V2 spec because V2 is very much a work in progress and isn't as feature complete as V1. This is most relevant when attempting to choose between REST API or HTTP API, but we should be aware of the limitations for WebSocket API. For example, at the time of this writing, WebSocket API doesn't support integration with AWS X-Ray, it doesn't support resource policies and API Gateway caching is not available. Unfortunately, there isn't a great resource for understanding the differences between the specifications, but the HTTP vs REST API comparison is a good starting place.

WebSocket API vs AppSync Subscriptions

AppSync, AWS's managed GraphQL service, announced support for GraphQL subscriptions in 2019. GraphQL subscriptions use WebSockets to manage persistent connections with clients. This means you've got a choice of which serverless implementation you want to use for managing connections. How to choose?

This article will not do a deep dive into AppSync, but just to scratch the surface, the simplest decision point is whether or not we want to use GraphQL. If we do, then we use AppSync. If not, then we use WebSocket API. That's pretty easy!

But maybe we care more about cost than specific technology choices. In that case, things get a lot more complicated: we need to understand our application and do some math. WebSocket API charges $0.25 per million connection minutes in us-east-1, while AppSync charges just $0.08 for the same. However, WebSocket API charges $1 per million messages, and AppSync charges twice that. Do we want to optimize for clients sitting idle or optimize for chattiness? It depends on the app.
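To make that math concrete, here is a small sketch that compares the two services using only the two prices quoted above. The workloads are hypothetical numbers chosen for illustration, and every other charge (data transfer, Lambda, AppSync query and mutation operations, and so on) is ignored.

```typescript
// Prices quoted in this article for us-east-1, in USD per million units.
interface Pricing {
  perMillionConnectionMinutes: number;
  perMillionMessages: number;
}

const webSocketApiPricing: Pricing = { perMillionConnectionMinutes: 0.25, perMillionMessages: 1 };
const appSyncPricing: Pricing = { perMillionConnectionMinutes: 0.08, perMillionMessages: 2 };

// Monthly cost for a given number of connection minutes and messages.
const monthlyCost = (pricing: Pricing, connectionMinutes: number, messages: number): number =>
  (connectionMinutes / 1e6) * pricing.perMillionConnectionMinutes +
  (messages / 1e6) * pricing.perMillionMessages;

// Hypothetical workloads, purely for illustration.
const idle = { connectionMinutes: 100e6, messages: 1e6 }; // long-lived, quiet clients
const chatty = { connectionMinutes: 10e6, messages: 100e6 }; // short-lived, busy clients

for (const [name, w] of Object.entries({ idle, chatty })) {
  const ws = monthlyCost(webSocketApiPricing, w.connectionMinutes, w.messages);
  const as = monthlyCost(appSyncPricing, w.connectionMinutes, w.messages);
  console.log(`${name}: WebSocket API $${ws.toFixed(2)} vs AppSync $${as.toFixed(2)}`);
}
```

With these inputs, the idle workload comes to about $26 on WebSocket API versus about $10 on AppSync, while the chatty workload comes to about $102.50 versus about $200.80. On these two line items alone, AppSync is cheaper whenever clients average fewer than roughly 0.17 messages per connection minute.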

It's also worth mentioning that AppSync doesn't require us to store connection information in a database and may be more suitable for operating at a very high scale. We'll discuss WebSocket API connection management a bit later.

WebSocket API vs Third-Party WebSocket Services

There are several third-party libraries and services that we can use to build real-time applications. How do they compare to WebSocket API?

Pusher

Pusher is a SaaS product that provides real-time functionality in the form of "channels" or "beams", basically chat rooms or broadcast messages. Pusher exposes an API and client libraries, so it's not a drop-in replacement for other WebSocket technologies. Pusher has a generous free tier allowing 100 concurrent connections and 200k messages per day. If you go beyond that, the pricing model is fixed, unlike the pay-per-use model of API Gateway.

Socket.IO

This library is quite literally the reason I got into Node.js. It is a fantastic library, but it is a library. You will still need to deploy and manage your infrastructure. If that isn't a deal-breaker, then Socket.IO is open-source software at its best.

Ably

Similar to Pusher, Ably provides real-time communication as a service. They have an API and SDK, so again you are writing against the provider. Pricing looks almost identical to Pusher.

ActionCable

Another library, ActionCable, is built into the Ruby on Rails framework. If you already have a Rails app, this might be a no-brainer, but it's probably not a compelling enough reason to create one on its own if you don't. You'll still need to manage your infrastructure.

Firebase

Firebase is a serverless backend from Google. Among many other features, it includes real-time updates. If Firebase is a good overall solution for your application, this would be worth checking out. If you have no other reason to be using Firebase, this probably isn't enough to make it interesting.

AWS API Gateway WebSocket API

WebSocket API can be managed with your favorite Infrastructure-as-Code tool, such as AWS CDK, CloudFormation, Terraform, Pulumi, Architect, Serverless, etc. You can integrate with other AWS services like Cognito, Lambda, S3, or DynamoDB. WebSocket API has a low price point for getting started and the cost scales with use. You're able to use compliance and auditing tools like AWS Config and CloudTrail to manage, remediate and monitor access. WebSocket API can send access and execution logs to CloudWatch and stream to other logging sources.

In short, WebSocket API is part of the AWS ecosystem and comes with all the benefits a managed AWS service has, including support contracts and a vibrant community.

WebSocket API vs "serverful"

In a traditional server model for WebSockets, the server will hold the connection open, requiring a small amount of memory. The Node.js single-threaded event loop architecture is ideal for this, but even so, we will likely wish to have pooled servers for redundancy and eventual scale. Doing this means that for users connected to server A to receive messages sent by users on server B, we need to maintain our list of connected clients in some external database or cache. Redis is often used for this. We also face the problem that a scaling event or removal of an operational node can require end-users to reconnect and miss messages. There are plenty of ways to mitigate this risk, but it requires additional coding or other layers of redundancy which can increase cost or complexity.

As we'll see, working with WebSocket API doesn't get us out of having to track connected clients, but it does mean we don't need to be concerned with scaling events or reliability. When it comes to spiky or unpredictable traffic patterns, the serverless approach can lead to significant savings in both cost and operational overhead.

Another advantage of the serverless approach we'll see is that we don't need to implement any kind of server-side WebSocket protocol. This is handled for us by WebSocket API. If we wish to run our own WebSocket server in Node.js, we may use a library like Socket.IO, which is a great library, but we have to accept the ongoing work of managing the dependency: considering updates, configuring it, etc. When we use WebSocket API, that is all done for us.

Your Favorite Infrastructure-as-Code Solution

I've written my app using AWS CDK, which is my favorite way to manage infrastructure on AWS. I'll include some CloudFormation examples below as well as some console screenshots. It should be easy enough to bootstrap these examples into another framework like SAM, Serverless, or Architect.

As always, the AWS Console can be useful for learning more about AWS services or exploring the options, but you really should use some kind of Infrastructure-as-Code to manage your application.

The working application is available on GitHub and ready to be deployed into your AWS account. The app is written in TypeScript.

Building the Real-Time App

The standard WebSocket example is a chat application. AWS has a pretty good one that I'm happy to recommend if that's what you're looking for. I wanted to do something a bit more elaborate. I thought it might be fun to build a little drag-and-drop SciFi base building game.

To gain inspiration, I purchased Planet Surface Backgrounds and Space Modular Buildings from itch.io. With those assets, I wrote a game client in React. I used React because it's the modern framework I have the most experience with and I find Hooks to be good for state management in a small app. I wanted to use a simple WebSocket client and I wanted drag-and-drop functionality. I went with React useWebSocket v2 (based on Hooks) and React-Draggable.

I decided I wanted my application to be able to change the background to one of the Planet Surface Backgrounds, to be able to add new building assets (from Space Modular Buildings), to be able to drag them around, and to be able to delete assets. I thought about adding the ability to rotate, resize or layer assets, but given I'm already updating position, this functionality would just involve adding new attributes to the building assets and so all the complexity would be in the CSS and I'm not trying to write an article on CSS. I'm poorly qualified to do so in fact!

Architecture

The fundamental paradigm here is that communication is one-way. A client sends a message such as "set the background to image #4" or "move asset #34 to x745y316". The server processes that message and then pushes the updated game state to all connected clients. The sender gets no direct reply: when my client makes a change, my game state is updated by the same broadcast that updates everyone else.

Now that we have an idea of what we want the app to do and what the architecture looks like, we can start thinking about an API. We know we'll want the following capabilities:

Change Background
Add Icon
Move Icon
Delete Icon

We'll also need a way to pull the initial game state when a new client connects.

Get Game State

And finally, we'll need a way to connect and disconnect clients.

Connect
Disconnect

This diagram demonstrates the AWS resources (aside from IAM) needed to build this application.

Our web assets are stored in S3 and served via CloudFront. Clients will establish WebSocket connections via WebSocket API and invoke Lambda functions to connect, disconnect and access the game functions.

A great advantage of this kind of architecture is that we can have single-purpose functions. This is great for keeping our codebase simple and clean. The functions I've written range from 10-30 lines of code with a couple of shared resources, the biggest of which weighs in at a hefty 44 lines of code (omitting comments). Being able to create sophisticated interactive web applications with so few lines is why we build on the cloud and is a superpower of serverless architectures.

Database Design

We'll look at each of our seven functions, but first, let's consider the database. Those who know me won't be surprised to learn I've chosen DynamoDB. DynamoDB is a serverless NoSQL database. Because there is so little to provision and manage, it is ideal for a sample project like this one. At the same time, a well-designed table can achieve a massive scale.

For this app, we have to track the game state and connected clients. These entities, though unrelated, can be stored in the same DynamoDB table, a pattern widely known as single-table design. The User entity I created is extremely minimal. Since the app doesn't track anything about the user, we only need to store the connection id that will be assigned by WebSocket API. We'll also set an expiration attribute to ensure old items are cleaned up.
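As a rough sketch of what such a connection item could look like: the key names (pk, sk), the "CONNECTION" prefix, the expiresAt attribute name and the default lifetime below are all assumptions made for illustration, not the schema used in the sample app. The one firm requirement is that DynamoDB's time-to-live feature expects the expiration as a Unix epoch timestamp in seconds.

```typescript
// Hypothetical item shape for a connected client in a single-table design.
interface ConnectionItem {
  pk: string; // partition key, e.g. "CONNECTION#<connectionId>" (assumed key scheme)
  sk: string; // sort key (assumed)
  expiresAt: number; // epoch seconds, the unit DynamoDB TTL expects
}

// Build the item to store when a client connects.
// nowMs is passed in (rather than read from Date.now()) to keep the function easy to test.
const connectionItem = (connectionId: string, nowMs: number, ttlSeconds = 3600): ConnectionItem => ({
  pk: `CONNECTION#${connectionId}`,
  sk: 'CONNECTION',
  expiresAt: Math.floor(nowMs / 1000) + ttlSeconds,
});

console.log(connectionItem('abc123', Date.now()));
```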

For the game state, I thought about storing each icon in a separate table item but instead opted for a simpler design where the entire game state is stored as a single item. This item has an attribute that corresponds to the background we've chosen for the game and a map type that contains all the icons and their coordinates. Icons are identified by a KSUID - not that we necessarily need the IDs to be sortable, but it's something I usually have at hand when working with DynamoDB. Now the game state payload looks something like this:

{
  "bg": 4,
  "icons": {
    "icon1tl9wKMnwJIkc7wSTqPVFCy4zQe": {
      "img": 0,
      "x": 474,
      "y": 450
    },
    "icon1tl9xC1KWfkfXnuANJLFK1MYBoM": {
      "img": 1,
      "x": 277,
      "y": 448
    }
  }
}

To save space, I'm only storing the index of the images chosen. We could save more space by using shorter unique identifiers or by using a list type and identifying icons only by position, but that could cause race conditions with multiple clients sending updates. This design is good enough. Never let perfect be the enemy of the good.
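The payload above maps naturally onto a pair of TypeScript types. The sketch below is an assumption about how one might model it, with a hypothetical moveIcon helper showing why keying icons by id is convenient: an update addresses one icon by its key and leaves the rest of the map alone.

```typescript
// Types mirroring the game state payload shown above.
interface Icon {
  img: number; // index of the chosen image
  x: number;
  y: number;
}

interface GameState {
  bg: number; // index of the chosen background
  icons: Record<string, Icon>; // keyed by icon id
}

// Hypothetical helper: return a new state with one icon moved.
// Unknown ids leave the state untouched.
const moveIcon = (state: GameState, id: string, x: number, y: number): GameState => {
  const icon = state.icons[id];
  if (!icon) return state;
  return { ...state, icons: { ...state.icons, [id]: { ...icon, x, y } } };
};

const state: GameState = {
  bg: 4,
  icons: { icon1tl9wKMnwJIkc7wSTqPVFCy4zQe: { img: 0, x: 474, y: 450 } },
};
console.log(moveIcon(state, 'icon1tl9wKMnwJIkc7wSTqPVFCy4zQe', 500, 300));
```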

WebSocket API Resources

Creating a WebSocket API is a lot like creating HTTP or REST API resources. We provide some basic configuration options and can add routing, models, and authorization. With AWS CDK, we can create one using the WebSocketApi construct.

const webSocketApi = new WebSocketApi(scope, 'WebSocketApi', {});

When we synth this to CloudFormation and clean up the names a little, we can see some of the defaults that were set.

  WebSocketApi:
    Type: AWS::ApiGatewayV2::Api
    Properties:
      Name: WebSocketApi
      ProtocolType: WEBSOCKET
      RouteSelectionExpression: $request.body.action

Stages

The intent of API Gateway stages appears to be that we should map /dev, /test, /prod, and so on all to the same API Gateway resource. This has never really made sense to me: even though we can select Lambda versions using stage variables, it would still mean that my test stage has an IAM role that lets it execute production compute. Instead, deploy your dev, test, and prod environments into separate accounts. I don't know of anybody who uses multiple stages per API Gateway.

Nevertheless, we need at least one stage. It can be defined very easily in CDK.

  const stage = new WebSocketStage(scope, 'DevStage', {
    autoDeploy: true,
    stageName: 'dev',
    webSocketApi: webSocketApi,
  });

Or CloudFormation.

  DevStage:
    Type: AWS::ApiGatewayV2::Stage
    Properties:
      ApiId:
        Ref: WebSocketApi
      StageName: dev
      AutoDeploy: true

AutoDeploy is another odd one here. Presumably, if we set that to false (which is the default in AWS CDK), then we'll need to trigger a deployment manually from the AWS console instead of expecting it to happen as part of our stack deployment. I'm not sure why we'd want that as a default. To make it even stranger, CloudFormation defaults this value to true.

Routing

If we're using AWS CDK, our RouteSelectionExpression will default to $request.body.action. This will perform routing based on the action attribute of the JSON payload for each message we send to WebSocket API. Other IaC solutions may require this attribute to be set explicitly to perform custom routing.

We can also configure our connect and disconnect routes as well as a default route when creating the WebSocket API. In this application, I chose not to use a default route and instead focus on single-purpose functions. If I'd wanted to, I could've created a "lambdalith" that handles routing internally with all messages coming via the default route. Note that the choice to use custom routing means that we must use JSON as the payload for our WebSocket messages. If we wanted to send plain text or something else, we could still do that via WebSocket API, but we'd be limited to the default route for routing.

  const webSocketApi = new WebSocketApi(scope, 'WebSocketApi', {
    connectRouteOptions: { integration: new LambdaWebSocketIntegration({ handler: fns.onConnect }) },
    disconnectRouteOptions: { integration: new LambdaWebSocketIntegration({ handler: fns.onDisconnect }) },
  });

This CDK code binds our onConnect and onDisconnect functions to the $connect and $disconnect routes of WebSocket API. A WebSocket client will use these routes automatically.

Here's the generated CloudFormation (with simplified names) for the onConnect route.

  ConnectRouteIntegration:
    Type: AWS::ApiGatewayV2::Integration
    Properties:
      ApiId:
        Ref: WebSocketApi
      IntegrationType: AWS_PROXY
      IntegrationUri:
        Fn::Join:
          - ""
          - - "arn:"
            - Ref: AWS::Partition
            - :apigateway:us-east-1:lambda:path/2015-03-31/functions/
            - Fn::GetAtt:
                - onConnectFunction
                - Arn
            - /invocations
  ConnectRoute:
    Type: AWS::ApiGatewayV2::Route
    Properties:
      ApiId:
        Ref: WebSocketApi
      RouteKey: $connect
      Target:
        Fn::Join:
          - ""
          - - integrations/
            - Ref: ConnectRouteIntegration

The YAML starts to get a bit verbose, but those experienced with other API Gateway implementations will recognize how functions are bound to routes. It works just the same in WebSocket API.

We can use AWS CDK to add custom routes.

  webSocketApi.addRoute('addIcon', {
    integration: new LambdaWebSocketIntegration({ handler: fns.addIcon }),
  });

This produces similar CloudFormation (again simplified names).

  AddIconRouteIntegration:
    Type: AWS::ApiGatewayV2::Integration
    Properties:
      ApiId:
        Ref: WebSocketApi
      IntegrationType: AWS_PROXY
      IntegrationUri:
        Fn::Join:
          - ""
          - - "arn:"
            - Ref: AWS::Partition
            - :apigateway:us-east-1:lambda:path/2015-03-31/functions/
            - Fn::GetAtt:
                - AddIconFunction
                - Arn
            - /invocations
  AddIconRoute:
    Type: AWS::ApiGatewayV2::Route
    Properties:
      ApiId:
        Ref: WebSocketApi
      RouteKey: addIcon
      Target:
        Fn::Join:
          - ""
          - - integrations/
            - Ref: AddIconRouteIntegration

When we send a message, WebSocket API will examine the "action" attribute of the payload. If that matches "addIcon" (the RouteKey), then this route will be selected and AddIconFunction will be invoked with the payload.
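To build intuition for how a route selection expression of $request.body.action behaves, here is a small local emulation. It is only an illustration of the matching rule described above, not API Gateway's implementation, and the selectRoute function is a hypothetical name.

```typescript
// Emulates route selection for the expression $request.body.action.
// Returns the matching route key, "$default" if one is configured, or undefined.
const selectRoute = (rawBody: string, routeKeys: string[]): string | undefined => {
  let action: unknown;
  try {
    const parsed: unknown = JSON.parse(rawBody);
    if (typeof parsed === 'object' && parsed !== null) {
      action = (parsed as Record<string, unknown>).action;
    }
  } catch {
    // Not JSON: no action can be extracted, so only $default can match.
  }
  if (typeof action === 'string' && routeKeys.includes(action)) return action;
  return routeKeys.includes('$default') ? '$default' : undefined;
};

const routes = ['$connect', '$disconnect', 'addIcon'];
console.log(selectRoute(JSON.stringify({ action: 'addIcon', icon: { img: 3, x: 400, y: 400 } }), routes));
console.log(selectRoute('plain text', routes));
```

The second call finds no route because this application configures no default route, which is why plain-text messages would need $default to be handled at all.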

Sending Messages

WebSocket messages can be sent as text or binary frames. To use JSON, we'll need the built-in JavaScript JSON object to serialize our objects. WebSocket API will be able to deserialize the payload and check the "action" property, but we'll also need to deserialize again in our Lambda handler using JSON.parse().
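On the Lambda side, the body arrives as a string on the event. The following is a minimal sketch of that second deserialization step; the WebSocketEvent interface is a deliberately simplified stand-in for the real event type, and parseBody is a hypothetical helper name.

```typescript
// Simplified stand-in for the event a Lambda proxy integration receives from WebSocket API.
interface WebSocketEvent {
  body?: string | null;
  requestContext: { connectionId: string; routeKey: string };
}

// Parse the JSON body, returning undefined when it is missing or malformed.
// The cast to T is unchecked; a request model (discussed later) can enforce the shape upstream.
const parseBody = <T>(event: WebSocketEvent): T | undefined => {
  if (!event.body) return undefined;
  try {
    return JSON.parse(event.body) as T;
  } catch {
    return undefined;
  }
};

const event: WebSocketEvent = {
  body: JSON.stringify({ action: 'addIcon', icon: { img: 3, x: 400, y: 400 } }),
  requestContext: { connectionId: 'abc123', routeKey: 'addIcon' },
};
console.log(parseBody<{ action: string }>(event)?.action);
```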

The WebSocket client we've chosen gives us an easy way to serialize JSON by exposing sendJsonMessage, which wraps its own sendMessage function with a JSON.stringify call. We can simply prepare a payload and call the function.

sendJsonMessage({ action: 'addIcon', icon: { img: 3, x: 400, y: 400 } });

This instructs the backend to add icon #3 to the x400 y400 coordinates of the screen. Because communication is one-way, we don't need to await a response or resolve a promise. If WebSocket API responds with an error, the error should be handled by the onError handler of our client library.

I like type safety, so I added some types and a simple wrapper with a TypeScript generic and here is the resulting invocation.

sendMessage<AddIconModel>({ action: MessageAction.ADD_ICON, icon: { img, x: 400, y: 400 } });
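The wrapper itself isn't shown here, so the following is only a guess at its shape. MessageAction.ADD_ICON and AddIconModel are names from this article; the Message base type, the SendJson type and the makeSendMessage factory are assumptions made for this sketch.

```typescript
enum MessageAction {
  ADD_ICON = 'addIcon',
}

// Every message carries the action used for routing (assumed base type).
interface Message {
  action: MessageAction;
}

interface AddIconModel extends Message {
  action: MessageAction.ADD_ICON;
  icon: { img: number; x: number; y: number };
}

// Shape of the client library's JSON sender, as this sketch assumes it.
type SendJson = (message: unknown) => void;

// Wrap the untyped sender so that callers must name the message type they are sending.
const makeSendMessage = (sendJsonMessage: SendJson) => {
  return function sendMessage<T extends Message>(message: T): void {
    sendJsonMessage(message);
  };
};

// Usage with a stand-in sender that records what would go over the socket.
const sent: string[] = [];
const sendMessage = makeSendMessage((message) => sent.push(JSON.stringify(message)));
sendMessage<AddIconModel>({ action: MessageAction.ADD_ICON, icon: { img: 3, x: 400, y: 400 } });
console.log(sent);
```

The benefit is purely at compile time: a payload with a missing coordinate or a mistyped action fails type checking instead of failing validation on the server.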

Now that we've notified the backend that a change has occurred, we need to notify connected clients and send out the new game state.

Notifying Clients

Sending a JSON payload to a backend is pretty standard behavior for a web app. If that's all we were trying to do here, HTTP API would do the job just fine. To unlock the real benefits of WebSockets, we need our backend to update all connected clients. To do that we'll need to use ApiGatewayManagementApi. You've got to love a service that both starts and ends with "Api".

Unfortunately, ApiGatewayManagementApi (let's call it AGMA) doesn't provide any mechanism for broadcasting to connected clients. We can use it to fetch a connection by client id, delete a connection, or post a message to a connection. AGMA has a few other utility methods, but nothing that will help us. This means that we need to manage a list of connected clients and further means that the logic for "notify all clients" is that we fetch a list of the clients and then loop over the list, notifying them one-by-one.

This has real implications for scale with WebSocket API. If you're thinking about building an application that manages millions of connected clients, WebSocket API probably isn't suitable for the task at this time. Hopefully, it is on the roadmap to add broadcast and channel capabilities to WebSocket API.

I wrote some helper functions for working with AGMA. Whenever we update the game state, we can call notifyClients to broadcast the new game state. This function will pull all the client connections from the database, iterate over the connectionIds and post the message to each of them. If it turns out that one of the clients has disconnected, AGMA will respond with an HTTP 410 (Gone) code and we can clean the stale connection from our database.
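The loop described above can be sketched as follows. This is not the helper from the sample app: to keep the example self-contained, the call to AGMA and the database cleanup are injected as functions, and the sketch assumes the injected post function rejects with an error carrying a statusCode of 410 when the connection is gone.

```typescript
// Injected dependencies (assumptions for this sketch):
// post wraps AGMA's "post to connection" call; removeConnection deletes the stored connection.
type PostToConnection = (connectionId: string, data: string) => Promise<void>;
type RemoveConnection = (connectionId: string) => Promise<void>;

// True when the error looks like an HTTP 410 (Gone) response.
const isGone = (err: unknown): boolean =>
  typeof err === 'object' && err !== null && (err as { statusCode?: number }).statusCode === 410;

// Post the payload to every connection, one by one, cleaning up stale connections.
const notifyClients = async (
  connectionIds: string[],
  payload: unknown,
  post: PostToConnection,
  removeConnection: RemoveConnection,
): Promise<void> => {
  const data = JSON.stringify(payload);
  for (const connectionId of connectionIds) {
    try {
      await post(connectionId, data);
    } catch (err) {
      if (!isGone(err)) throw err; // anything other than "gone" is a real failure
      await removeConnection(connectionId);
    }
  }
};
```

Injecting the two functions also makes the loop easy to unit test with fakes, without touching AWS.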

Scaling Up

So what's the realistic scale for something like this? I tried some light benchmarking using a tool called thor (an older tool that needs a workaround to install - would be glad for suggestions on something better). Managing a thousand connected clients pushed my Lambda invocation times out to 40-50 seconds. I'm sure that could be optimized, but we start to hit practical limits for the one-at-a-time approach WebSocket API limits us to. With this in mind, I think WebSocket API will still work great for real-time interactions shared by tens of users. Of course, we can introduce our own channel abstractions to limit the number of connected clients we have to manage at one time, but this will require more custom code.

Request Models

The ability to validate payloads against a JSON schema is one of the best and most overlooked features of API Gateway. All versions of API Gateway support this, including WebSocket API. Payload validation can be messy to implement with imperative logic in application code. For this reason, it can be the source of bugs or outright omitted, leading to unexpected application behavior or invalid data. Implementing request validation in API Gateway gives us a good separation of concerns, protects our application code from invalid data, and best of all bears no additional expense. API Gateway is billed per request. The additional compute required to perform validation is free if we let API Gateway handle it, and if the validation fails, our application code is never invoked, thus never billed.

Models are expressed in JSON Schema v4. Unfortunately, AWS CDK does not provide a higher level construct to add models to API Gateway V2 as it does for V1. This means we will need to use L1 Constructs to define models in CDK. This can be done using the CfnModel construct.

  new CfnModel(scope, `AddModel`, {
    apiId: webSocketApi.apiId,
    contentType: 'application/json',
    name: `AddModel`,
    schema: {
      $schema: 'http://json-schema.org/draft-04/schema#',
      properties: {
        action: { enum: [MessageAction.ADD_ICON] },
        icon: {
          properties: {
            id: { type: JsonSchemaType.STRING },
            img: { type: JsonSchemaType.NUMBER },
            x: { type: JsonSchemaType.NUMBER },
            y: { type: JsonSchemaType.NUMBER },
          },
          required: ['img', 'x', 'y'],
          type: JsonSchemaType.OBJECT,
        },
      },
      required: ['action', 'icon'],
      title: `AddSchema`,
      type: 'object',
    },
  });

This synths to CloudFormation.

  AddModel:
    Type: AWS::ApiGatewayV2::Model
    Properties:
      ApiId:
        Ref: WebSocketApi
      Name: AddModel
      Schema:
        $schema: http://json-schema.org/draft-04/schema#
        properties:
          action:
            enum:
              - addIcon
          icon:
            properties:
              id:
                type: string
              img:
                type: number
              x:
                type: number
              "y":
                type: number
            required:
              - img
              - x
              - "y"
            type: object
        required:
          - action
          - icon
        title: AddSchema
        type: object
      ContentType: application/json

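In order to attach a model to a route, we need to reach the underlying CfnRoute through the WebSocketRoute's default child, as models are missing from the L2 WebSocketApi construct.

  const route = webSocketApi.addRoute(MessageAction.ADD_ICON, {
    integration: new LambdaWebSocketIntegration({ handler: fns.addIcon }),
  });
  // The L2 route wraps an L1 CfnRoute, which exposes the model properties.
  const rt = route.node.defaultChild as CfnRoute;

  // `model` is the CfnModel instance created above.
  rt.modelSelectionExpression = '$request.body.action';
  rt.requestModels = { [MessageAction.ADD_ICON]: model.name };

  // Make sure the model exists before the route that references it.
  rt.addDependsOn(model);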

The same in CloudFormation.

  WebSocketApiaddIconRoute:
    Type: AWS::ApiGatewayV2::Route
    Properties:
      ApiId:
        Ref: WebSocketApi
      RouteKey: addIcon
      ModelSelectionExpression: $request.body.action
      RequestModels:
        addIcon: AddModel
      Target:
        Fn::Join:
          - ""
          - - integrations/
            - Ref: WebSocketApiaddIconRouteWebSocketIntegration
    DependsOn:
      - AddModel

To avoid repetitive imperative code, I wrote a few helper functions to make it easier to generate routes and attach models.

const getModel = (
  scope: Stack,
  modelName: string,
  properties: {
    [name: string]: JsonSchema;
  },
  required: string[],
  webSocketApi: WebSocketApi,
) => {
  return new CfnModel(scope, `${modelName}Model`, {
    apiId: webSocketApi.apiId,
    contentType: 'application/json',
    name: `${modelName}Model`,
    schema: {
      $schema: 'http://json-schema.org/draft-04/schema#',
      properties,
      required,
      title: `${modelName}Schema`,
      type: 'object',
    },
  });
};

This will produce a model ready to use in the given webSocketApi.

const addRoute = (handler: LambdaFunction, action: MessageAction, model: CfnModel, webSocketApi: WebSocketApi) => {
  const route = webSocketApi.addRoute(action, {
    integration: new LambdaWebSocketIntegration({ handler }),
  });
  const rt = route.node.defaultChild as CfnRoute;

  rt.modelSelectionExpression = '$request.body.action';
  rt.requestModels = { [action]: model.name };

  rt.addDependsOn(model);
};

And here we can pass the model and Lambda integration to extend addRoute capabilities to include the model and selection expression. Including addDependsOn is necessary to create stack dependencies for updates and deletions. This is something an L2 construct will do implicitly but since we are using L1, we need to manage these dependencies or CloudFormation may return the dreaded "UPDATE_ROLLBACK_FAILED" state.

One issue I ran into when working with models: I wanted to set additionalProperties: false so that validation fails if the payload contains properties not described by the model. When I tried this, my deployment failed with "Invalid model schema specified". I can't find any official documentation on this property, but I suspect this is a bug in API Gateway V2, as I'm able to use this property in V1 models and it's valid JSON Schema. The only clue I've found is an old Stack Overflow response that suggests that, at one time, additionalProperties wasn't supported by API Gateway at all.

A final limitation concerns $ref. A good design pattern in JSON Schema is to have one model reference another using $ref. This is outlined in the API Gateway docs, but under the REST API heading, and the V2 docs simply point to the V1 docs. I have been able to get $ref to work with REST API but wasn't able to get it to work with WebSocket API. This might be a limitation in the V2 API.

It's not clear why these limitations appear in the V2 service. What's interesting is that when working in the AWS console, I see requests to both the apigateway path as well as apigatewayv2. It seems some features are reused from V1, so maybe they aren't interacting perfectly with the newer V2 APIs.

Authorization

All versions of API Gateway support custom Lambda authorizers, but as with models, WebSocket API is lacking an L2 construct for CDK. Unlike with models, WebSocket API is alone in lacking an L2 Lambda authorizer construct. HTTP API recently got one, so we can hold out hope that WebSocket API will get one too.

The official docs do a decent job of going over the anatomy of an authorizer function. Unlike the docs for the RESTful API implementations, the WebSocket API docs make no mention of a Cognito integration, but there is a pretty good example app using Cognito on GitHub.

So can we add an authorizer to our application? Yes! But it will be for demonstration purposes only, logging out the request and then passing the user through. The most common way to authorize users is to access a JSON Web Token in the user request and validate it. To do that in our game, we'd need to add a login page and some kind of token issuer (possibly Cognito). This is beyond the scope of this article.
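A pass-through authorizer of that kind could look roughly like the sketch below. The event and result interfaces are trimmed-down local stand-ins rather than the official type definitions, and the 'anonymous' principal is a placeholder; a real authorizer would validate a token and return Deny (or throw) on failure.

```typescript
// Trimmed-down stand-ins for the authorizer event and response shapes.
interface AuthorizerEvent {
  methodArn: string;
  headers?: Record<string, string>;
  queryStringParameters?: Record<string, string>;
}

interface AuthorizerResult {
  principalId: string;
  policyDocument: {
    Version: '2012-10-17';
    Statement: { Action: 'execute-api:Invoke'; Effect: 'Allow' | 'Deny'; Resource: string }[];
  };
}

// Build the IAM policy document that API Gateway expects back from an authorizer.
const buildPolicy = (principalId: string, effect: 'Allow' | 'Deny', resource: string): AuthorizerResult => ({
  principalId,
  policyDocument: {
    Version: '2012-10-17',
    Statement: [{ Action: 'execute-api:Invoke', Effect: effect, Resource: resource }],
  },
});

// Demonstration only: log the request and let everyone through.
export const handler = async (event: AuthorizerEvent): Promise<AuthorizerResult> => {
  console.log(JSON.stringify(event));
  return buildPolicy('anonymous', 'Allow', event.methodArn);
};
```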

Since there isn't an L2 Authorizer construct that works with WebSocket API, we'll need to use L1 constructs if we want to define our application in CDK.

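  const authorizer = new CfnAuthorizer(scope, 'Authorizer', {
    apiId: webSocketApi.apiId,
    authorizerType: 'REQUEST',
    // Builds arn:<partition>:apigateway:<region>:lambda:path/2015-03-31/functions/<fnArn>/invocations.
    // The literal 'lambda' goes in the slot that formatArn normally fills with the account id.
    authorizerUri: Stack.of(scope).formatArn({
      account: 'lambda',
      resource: 'path',
      resourceName: `2015-03-31/functions/${authFn.functionArn}/invocations`,
      service: 'apigateway',
    }),
    name: 'WebSocketApiAuthorizer',
  });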

  authFn.addPermission('AuthZPermission', {
    principal: new ServicePrincipal('apigateway.amazonaws.com'),
    sourceArn: Stack.of(webSocketApi).formatArn({
      service: 'execute-api',
      resource: (webSocketApi.node.defaultChild as CfnApi).ref,
      resourceName: `authorizers/${authorizer.ref}`,
    }),
  });

One cool thing about using Authorizers with WebSocket API is that, unlike most RESTful APIs, we only have to authorize on the $connect route. This means our Authorizer function will only be invoked once per user session and we only have to attach it to the $connect route. We could attempt to use a cast to find the route, but for this use case, since we don't actually instantiate a WebSocketRoute, it's easier to do this using an Aspect. Aspects are special CDK constructs that can inspect and modify constructs. They can be used for compliance, but in this case, we'll use one to set properties that aren't available on the L2 construct.

class AuthZAspect implements IAspect {
  constructor(private authorizer: CfnAuthorizer) {}
  public visit(node: IConstruct) {
    if (node instanceof CfnRoute && node.routeKey === '$connect') {
      node.authorizerId = this.authorizer.ref;
      node.authorizationType = 'CUSTOM';
    }
  }
}

Aspects.of(webSocketApi).add(new AuthZAspect(authorizer));

Of course this all synths to CloudFormation, or it could just be written that way.

  Authorizer:
    Type: AWS::ApiGatewayV2::Authorizer
    Properties:
      ApiId:
        Ref: WebSocketApi
      AuthorizerType: REQUEST
      Name: WebSocketApiAuthorizer
      AuthorizerUri:
        Fn::Join:
          - ""
          - - "arn:"
            - Ref: AWS::Partition
            - :apigateway:us-east-1:lambda:path/2015-03-31/functions/
            - Fn::GetAtt:
                - authorizerFunction
                - Arn
            - /invocations

  WebSocketApiconnectRoute:
    Type: AWS::ApiGatewayV2::Route
    Properties:
      ApiId:
        Ref: WebSocketApi
      RouteKey: $connect
      AuthorizationType: CUSTOM
      AuthorizerId:
        Ref: Authorizer
      Target:
        Fn::Join:
          - ""
          - - integrations/
            - Ref: WebSocketApiconnectRouteWebSocketIntegration

Cost

To get any kind of accurate cost estimate, we'll need deep knowledge of our application and how it will be used. The AWS free tier is pretty good on serverless projects, so I don't expect to have a bill for the game. For production-grade applications, we need to think about how many clients will connect, how long they'll stay connected, and whether they'll mostly listen for messages or be chatty, broadcasting messages to other connected clients. The answers to these questions will have a monumental impact on the cost of running your application.

Estimate

When it comes to estimating costs, the two things to think about are connection minutes and messages sent. If we had a thousand clients who stayed connected for eight hours per day, that would be 480 (minutes) × 30 (days) × 1,000 (clients) = 14.4 million connection minutes per month, billed at $0.25 per million connection minutes (in us-east-1), for a cost of $3.60 per month.

We also need to look at the total number of messages sent. If we sent a message every minute to every one of those clients, that would give us 14.4 million messages for $14.40 (in us-east-1). However, if the app were extremely chatty, sending a message every second, the cost would shoot up to $864 per month, not including the cost for compute, storage, data transfer, and so on.
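The arithmetic above can be wrapped in a small helper for trying out other usage profiles. This is a sketch only. The two prices are the us-east-1 figures used in this estimate and should be checked against the current pricing page, and it ignores volume tiers and the way larger messages are metered in increments.

```typescript
// Assumed us-east-1 prices, taken from the figures used in this estimate.
const PRICE_PER_MILLION_CONNECTION_MINUTES = 0.25;
const PRICE_PER_MILLION_MESSAGES = 1.0;

interface UsageProfile {
  clients: number;
  connectedMinutesPerDay: number;
  messagesPerClientPerMinute: number;
  daysPerMonth: number;
}

interface MonthlyEstimate {
  connectionMinutes: number;
  messages: number;
  connectionCost: number;
  messageCost: number;
  total: number;
}

// Estimate connection-minute and message charges for one month of usage.
function estimateMonthlyCost(usage: UsageProfile): MonthlyEstimate {
  const connectionMinutes = usage.clients * usage.connectedMinutesPerDay * usage.daysPerMonth;
  const messages = connectionMinutes * usage.messagesPerClientPerMinute;
  const connectionCost = (connectionMinutes / 1e6) * PRICE_PER_MILLION_CONNECTION_MINUTES;
  const messageCost = (messages / 1e6) * PRICE_PER_MILLION_MESSAGES;
  return { connectionMinutes, messages, connectionCost, messageCost, total: connectionCost + messageCost };
}

// 1,000 clients, eight hours a day, one message per minute.
const quiet = estimateMonthlyCost({
  clients: 1000,
  connectedMinutesPerDay: 480,
  messagesPerClientPerMinute: 1,
  daysPerMonth: 30,
});

// Same clients, but one message per second.
const chatty = estimateMonthlyCost({
  clients: 1000,
  connectedMinutesPerDay: 480,
  messagesPerClientPerMinute: 60,
  daysPerMonth: 30,
});

console.log(quiet.connectionCost.toFixed(2)); // 3.60
console.log(quiet.messageCost.toFixed(2)); // 14.40
console.log(chatty.messageCost.toFixed(2)); // 864.00
```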

Throttling

One way to control costs or at least help prevent cost overruns is to implement throttling in an API. This can be done in any version of API Gateway and is a best practice. API Gateway, like most AWS services, has quotas and rate limits out of the box. We can further protect our wallets by implementing throttling at the account or route level.

If we think our API won't get a lot of traffic, it is a very good idea to implement throttling as it will prevent a runaway client (perhaps a result of a bug causing an endless loop) from generating a large bill. The throttling limit is per second. If we set our limit to 10, then the maximum bill we could generate for requests in a month is about $26.28 at $1.00 per million messages, and that's only if our API was hit nonstop and served up all 26,280,000 requests!

To manage throttling with higher expected throughput, combine it with a billing alarm. Billing alarms do not trigger immediately. Imagine we've deployed an API Gateway and some errant process started running up the bill. If it ran for 24 hours at the default account-level throttling rate of 10,000 requests per second, we'd have a bill of $864, having served 864 million requests in that period! If we neglected to set up a billing alarm, we'd be looking at a serious expense for this API after 30 days. It's a best practice to enable throttling unless we think we're going to need to serve 10,000 requests per second.
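To see how a throttling limit caps the worst case, the request counts above can be reproduced with a couple of lines. This sketch assumes a 730-hour month and $1.00 per million messages (the us-east-1 figure used in the estimate earlier), and covers request charges only.

```typescript
// Assumed price: $1.00 per million messages. Volume tiers and all other
// charges (connection minutes, compute, data transfer) are ignored.
const ASSUMED_PRICE_PER_MILLION = 1.0;

// Upper bound on requests served if a throttled API is hit nonstop.
function maxRequests(rateLimitPerSecond: number, hours: number): number {
  return rateLimitPerSecond * hours * 3600;
}

// Upper bound on the request charges for that many requests.
function maxRequestCost(rateLimitPerSecond: number, hours: number): number {
  return (maxRequests(rateLimitPerSecond, hours) / 1e6) * ASSUMED_PRICE_PER_MILLION;
}

// A limit of 10 requests per second over a 730-hour month.
console.log(maxRequests(10, 730)); // 26280000

// The default 10,000 requests per second over a single day.
console.log(maxRequests(10000, 24), maxRequestCost(10000, 24)); // 864000000 864
```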

Throttling is configured at the API Gateway stage. Unfortunately, it is not implemented in the CDK WebSocketStage construct, so to implement throttling, we need to cast the construct to CfnStage. This will allow us to set defaultRouteSettings.

  const stage = new WebSocketStage(scope, 'DevStage', {
    autoDeploy: true,
    stageName,
    webSocketApi,
  });

  const cfnStage = stage.node.defaultChild as CfnStage;
  cfnStage.defaultRouteSettings = {
    throttlingBurstLimit: 500,
    throttlingRateLimit: 1000,
  };

The CDK code starts to resemble CloudFormation when we use L1 constructs.

  DevStage:
    Type: AWS::ApiGatewayV2::Stage
    Properties:
      ApiId:
        Ref: WebSocketApi
      StageName: dev
      AutoDeploy: true
      DefaultRouteSettings:
        ThrottlingBurstLimit: 500
        ThrottlingRateLimit: 1000

Logging

Lambda functions automatically create CloudWatch logs when they are executed and so if we are working with Lambda integrations, we can get some level of logging with no extra effort. However, this is only going to be as good as our integration. If we have misconfigured something in the WebSocket API and the function never gets called, then we're going to want the WebSocket API to be logging as well to avoid the frustrating situation of our app being broken and having no idea why.

CloudWatch Role

Getting logging right for WebSocket API is a bit of a bear and unfortunately, the official docs aren't much help. The first and most important thing we're going to have to do is create a role that allows API Gateway to send logs to CloudWatch. If you're an AWS CDK user with some REST API experience, like I am, you may be used to the more mature RestApi construct managing all this for you automatically. The apigatewayv2 constructs don't seem to have the same feature, so we must create the role manually. This could be done in the console, but I wouldn't recommend it. We'll just have to write a little more code.

I haven't found a lot of information about managing this role with CloudFormation, but from what I'm able to tell, we need to create the role, then specify the role ARN to the AWS::ApiGateway::Account resource. Oddly there's no AWS::ApiGatewayV2::Account resource so we need to use AWS::ApiGateway::Account. This means in CDK we'll have to use the CfnAccount L1 construct from @aws-cdk/aws-apigateway. This feels like a big hack and I'd be delighted to know of another way to manage this (outside of the console - a bigger hack).

  const cwRole = new Role(scope, 'CWRole', {
    assumedBy: new ServicePrincipal('apigateway.amazonaws.com'),
    managedPolicies: [ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonAPIGatewayPushToCloudWatchLogs')],
  });

  new CfnAccount(scope, 'Account', {
    cloudWatchRoleArn: cwRole.roleArn,
  });

Access Logs

Now that we have a role, we can enable logging. API Gateway can provide access logs and execution logs. We can format the access log and choose from a list of parameters. The access log will produce a single line in the log for each request made to our WebSocket API. This kind of logging is similar to Apache or Nginx access logging.

To enable access logging in CDK, we again will need to cast WebSocketStage to CfnStage. We'll also need to create a log group to target.

  const stage = new WebSocketStage(scope, 'DevStage', {
    autoDeploy: true,
    stageName,
    webSocketApi,
  });
  // Create the execution log group ourselves so we control its retention and cleanup
  new LogGroup(scope, 'ExecutionLogs', {
    logGroupName: `/aws/apigateway/${webSocketApi.apiId}/${stageName}`,
    removalPolicy: RemovalPolicy.DESTROY,
    retention: RetentionDays.ONE_WEEK,
  });
  const log = new LogGroup(scope, 'AccessLogs', {
    removalPolicy: RemovalPolicy.DESTROY,
    retention: RetentionDays.ONE_WEEK,
  });
  const cfnStage = stage.node.defaultChild as CfnStage;
  cfnStage.accessLogSettings = {
    destinationArn: log.logGroupArn,
    format: `$context.identity.sourceIp - - [$context.requestTime] "$context.httpMethod $context.routeKey $context.protocol" $context.status $context.responseLength $context.requestId`,
  };
  cfnStage.defaultRouteSettings = {
    throttlingBurstLimit: 500,
    throttlingRateLimit: 1000,
  };
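If the logs are going to be queried rather than read by eye, a JSON format string can be easier to work with than the Apache-style line above. The sketch below builds one from a field map. The field names on the left are arbitrary choices for this example, and the selection of `$context` variables is just one reasonable subset.

```typescript
// Map of output field name -> API Gateway $context variable.
// The keys are illustrative; only the $context variables are meaningful to API Gateway.
const accessLogFields: Record<string, string> = {
  requestId: '$context.requestId',
  ip: '$context.identity.sourceIp',
  requestTime: '$context.requestTime',
  eventType: '$context.eventType',
  routeKey: '$context.routeKey',
  connectionId: '$context.connectionId',
  status: '$context.status',
};

// JSON.stringify with no indentation yields a single-line format string.
const jsonAccessLogFormat = JSON.stringify(accessLogFields);

console.log(jsonAccessLogFormat);

// Usage with the stage defined above (not executed in this sketch):
// cfnStage.accessLogSettings = {
//   destinationArn: log.logGroupArn,
//   format: jsonAccessLogFormat,
// };
```

Building the string from an object avoids hand-escaping quotes and keeps the list of logged fields easy to review.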

Execution Logs

API Gateway execution logs are far more verbose than access logs and contain detailed information on the internals of API Gateway. If we encounter a stubborn validation bug, we'll want to enable execution logs to learn why API Gateway is rejecting our request. However, these logs are so verbose that we could run up a high CloudWatch bill if we leave them enabled all the time. The best practice is usually to disable execution logs and only re-enable them when we think we need them.

Execution logs are configured in the same place as throttling, the defaultRouteSettings. We can assign values to dataTraceEnabled, detailedMetricsEnabled, and loggingLevel. The last is the most important. The values allowed are 'ERROR', 'INFO', and 'OFF'. The other two properties are a little harder to understand. When I set dataTraceEnabled to true, I see "Log full message data" checked in the AWS console.

This isn't what I'd expect a property called "dataTraceEnabled" to do. I would expect that to add trace IDs to our requests or something similar. If I set the property to false, the option is unchecked in the console.

And now the logs are considerably less verbose. The docs aren't helping me much here, but apparently "dataTraceEnabled" is what we want if we want to see full request bodies.

The other property, "detailedMetricsEnabled", suggests we might see additional CloudWatch metrics if we enable it. However, I don't see anything in the console that corresponds to this property, setting it didn't have any apparent effect, and I couldn't find any information about it in the documentation.

After combining my logging properties with the throttling properties, I now have my defaultRouteSettings ready to go.

  const stage = new WebSocketStage(scope, 'DevStage', {
    autoDeploy: true,
    stageName,
    webSocketApi,
  });
  // Create the execution log group ourselves so we control its retention and cleanup
  new LogGroup(scope, 'ExecutionLogs', {
    logGroupName: `/aws/apigateway/${webSocketApi.apiId}/${stageName}`,
    removalPolicy: RemovalPolicy.DESTROY,
    retention: RetentionDays.ONE_WEEK,
  });
  const log = new LogGroup(scope, 'AccessLogs', {
    removalPolicy: RemovalPolicy.DESTROY,
    retention: RetentionDays.ONE_WEEK,
  });
  const cfnStage = stage.node.defaultChild as CfnStage;
  cfnStage.accessLogSettings = {
    destinationArn: log.logGroupArn,
    format: `$context.identity.sourceIp - - [$context.requestTime] "$context.httpMethod $context.routeKey $context.protocol" $context.status $context.responseLength $context.requestId`,
  };
  cfnStage.defaultRouteSettings = {
    dataTraceEnabled: true,
    loggingLevel: 'INFO',
    throttlingBurstLimit: 500,
    throttlingRateLimit: 1000,
  };
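Since execution logs are best left off until they're needed, one option is to derive the route settings from a flag instead of hard-coding them. A minimal sketch follows. The `buildRouteSettings` helper, the `RouteSettings` shape, and the `DEBUG_API` environment variable in the usage comment are all hypothetical names for this example.

```typescript
type LoggingLevel = 'ERROR' | 'INFO' | 'OFF';

// Only the properties used in this article; CfnStage accepts others too.
interface RouteSettings {
  dataTraceEnabled: boolean;
  loggingLevel: LoggingLevel;
  throttlingBurstLimit: number;
  throttlingRateLimit: number;
}

// Verbose execution logs only when debugging; throttling is always applied.
function buildRouteSettings(debug: boolean): RouteSettings {
  return {
    dataTraceEnabled: debug,
    loggingLevel: debug ? 'INFO' : 'OFF',
    throttlingBurstLimit: 500,
    throttlingRateLimit: 1000,
  };
}

console.log(buildRouteSettings(false).loggingLevel); // OFF
console.log(buildRouteSettings(true).loggingLevel); // INFO

// Usage with the stage defined above (not executed in this sketch):
// cfnStage.defaultRouteSettings = buildRouteSettings(process.env.DEBUG_API === 'true');
```

Using 'ERROR' instead of 'OFF' for the non-debug case would be a reasonable middle ground if some execution logging should always be on.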

CloudWatch Metrics

API Gateway will automatically send some metrics to CloudWatch. We can use them to create CloudWatch dashboards or feed them into any of the popular 3rd party monitoring tools. CDK users might benefit from an open-source tool like CDK Watchful.

Other Integrations

Our base-building game uses Lambda integrations, but it bears mentioning that, like REST and HTTP API, WebSocket API has several different integration patterns. This is a great feature of API Gateway that lets us integrate directly with AWS services like DynamoDB or S3, make REST calls to arbitrary endpoints, connect to VPC Links, or even return mock responses without having to supply our own compute layer.

These kinds of integrations usually involve writing mapping templates which can be challenging at first but can be well worth it as you still get all the other great benefits of API Gateway, like throttling, authorization, request models, and so on.

Conclusion

I spent a lot of time in this article on some of the less well-defined features of WebSocket API, but that shouldn't distract from the core which is a fully managed service that is highly performant, highly available, capable of serving 10,000 requests per second without increasing latency, and pay-for-use. WebSocket API is probably not suitable for broadcasting messages to thousands or millions of consumers - AppSync might fit that use case better - but there are many use cases for smaller real-time applications.

As a CDK user, I've called out the places where CDK support isn't complete, requiring us to drop down to L1 constructs to make full use of API Gateway features. Although I was able to make this work, developers interested in using CDK to develop WebSocket APIs should make sure the feature requests are there. L1 constructs usually take more time to work with as we don't get the great default settings and typing support out of CDK and so this is a drawback, even if we have workarounds.

WebSocket API is great fun to build on. As someone who has built other WebSocket applications, I find WebSocket API to be a great development accelerator that abstracts away a lot of the hard parts of working with WebSockets at scale. However, the need to manage user connections is a drawback and can be a dealbreaker depending on the number of users we need to send messages to.

First on my wishlist for WebSocket API is that it will gain support for channels and broadcasts, similar to Socket.IO's broadcast and room features. I would also like to see better CDK support and L2 constructs to make working with WebSocket API easier. As for the latter, there are some GitHub issues already addressing CDK support.

COVER: Sköll and Hati chase Sól and Mani. PUBLIC DOMAIN
