April 28, 2023

Use Events internally and APIs externally

How to integrate your AWS event-based system with external/legacy systems using RESTful APIs

Overview

I recently helped an AWS customer design a new system that had very specific non-functional requirements regarding how it integrates with external or 3rd party systems. The internal part of the system was developed as an event-driven system, but for these ‘legacy’ 3rd party systems they needed a simpler, more traditional method of integration. This is the story of how we used RESTful APIs to get events into and out of the event-driven system, using the Serverless Event Gateway Pattern on AWS, with some deployable code on GitHub.

Requirements

The customer required a system that receives requests, and then sends those requests to 3rd parties for fulfilment. The 3rd parties process those requests, and then respond back with their fulfilment status.

Now this is not unique - we've all come across these types of systems before. For example, when an e-commerce system receives an order from a user/customer, it might send out fulfilment requests to many 3rd parties, each of which will need to respond timeously with stock details, etc. Status updates will then need to be sent back to the users.

And if we think about the number of requests and responses flowing through the system: the customer expects there to be thousands of users who will send requests, and there will be hundreds of 3rd parties, so we estimate that the system will process many hundreds of thousands of requests and responses on a regular (perhaps daily) basis. So naturally, we would want to design the integration and communication between the system and the 3rd parties to be highly available and performant.

Now with that background of the system in mind, we can talk in more detail about the strict requirements that my customer had:

We need a standard way to integrate with all 3rd parties

This is quite obvious, and a pretty standard requirement. We want a uniform, standard way to integrate with all 3rd parties that does not require custom development for different 3rd parties, even though they could be using different tech stacks (architectures, languages, versions, etc.). The integration should be based on a well-known and potentially open-source standard/protocol/system, like a RESTful API defined with OAS/Swagger, or an Event Broker/Router like MQ, Kafka or JMS.

It should take no development effort and very little or no operational effort to onboard new 3rd parties.

This makes sense, considering that we will have many hundreds of 3rd parties, each of which will need to be registered, integrated with, and then managed operationally on an ongoing basis. Also consider that each 3rd party will have multiple environments (prod, dev, test), each with its own parameters (IP/DNS, credentials, SSL certs, webhooks/callback URLs). We therefore don’t want to have to hire a full-time team just to onboard and manage 3rd parties.

Retries and failures must be automatically managed

3rd parties are highly distributed all over the internet in different locations (on-prem, hosting, cloud), and due to the nature of a highly distributed system, each 3rd party's status and availability will change over time: not all 3rd parties will be up and available when we want to send them a request, and some may be down or unresponsive when we send them a message. The system needs to ensure that every request reaches the intended 3rd party, but we don’t want to have to build custom retry logic. We also needed to consider some still-undefined variables: how long should the system promise to keep undelivered requests, and how long should it retry? So we need to find a way to manage this without custom code.

Only the intended 3rd parties should receive requests

Not all types of requests should be sent to all 3rd parties. So when we onboard a 3rd party, we define the types of requests it can fulfil, and the system should have a way to filter requests from users so that only specific 3rd parties receive those requests. These filters should be easy to maintain, and updating them should not require custom code or even a code change.

Solution design

These four requirements were key to shaping the design of the solution. Internally, the system would be composed of many different modules, each with its own functionality, so we decided to have each developed and deployed as an individual microservice. This meant that not only could we scale them individually based on capacity requirements, but it would also increase our development velocity, as we could introduce new features and deploy them independently.

I’ve written before about how Architecture predicts success. This also led us to think about the impact it would have on the team structure (which I’ve written about before as well) - we know from Team Topologies that:

One key approach to achieving the software architecture (and associated benefits like speed of delivery or time to recover from failure) is to apply the reverse Conway maneuver: designing teams to match the desired architecture... In short, by considering the impact of Conway’s law when designing software architectures and/or reorganizing team structures, you will be able to take advantage of the isomorphic force at play, which converges the software architecture and the team design.

We wanted to be able to handle requests asynchronously, in order to increase performance and responsiveness, and to better tolerate failure and slowness from internal modules and 3rd parties. All this led us to an event-driven architecture, where the internal microservices are decoupled from each other and from the 3rd parties using an event broker/router. An additional benefit was that the system had a previously developed monolithic component that we wanted to re-use without much development - using the event broker would allow us to integrate this legacy component with the newer microservices.

We then looked to design the communication and interaction with the 3rd parties. Since we were using an event-driven architecture internally, we looked to extend this to the 3rd parties as well. Initially, we thought to leverage a well-known Event/Message Broker (see my rant on microservices and ESBs from a few years ago) like MQ, Kafka or JMS, on the basis that because they are standard and well-known, it would be easy for 3rd parties and their developers to integrate with them. AWS has fully managed services for each of these: Amazon MQ, which supports Apache ActiveMQ and RabbitMQ; Amazon MSK for Kafka; and Amazon SQS, which supports JMS - with AWS managing all the infrastructure and operations, freeing us to work on the functionality we need.

We took some time to consider the impact this choice of event/message broker (Kafka/JMS/MQ) would have on the 3rd parties and their developers, and how easily they would be able to integrate with it. We considered the physical location/region that the customer operates in, and the experience that typical developers in that location would have with Kafka/JMS/MQ, and eventually came to the conclusion that the learning curve would be too steep, which would quickly lead to longer integration and development lead times, and even bigger issues with debugging. We decided that imposing an architecture like this would go against some of the customer's key requirements - it would simply be too complex. In addition, even though this system would be running on AWS, most of the 3rd parties probably wouldn’t be, which means we couldn’t assume that the 3rd parties would have AWS credentials to access SQS or other native AWS services that require them.

But we knew that most corporate developers were familiar with HTTP APIs, and would have enough experience and tooling to quickly integrate with a RESTful API. So the challenge was: how can we maintain the internal event-driven architecture, and yet expose easy-to-understand RESTful APIs to 3rd parties? That led us to Amazon EventBridge - a serverless event bus that allows the internal microservices to send, filter and route events with rules, is fully managed and serverless, and yet supports API destinations - which can be any external 3rd party API. With EventBridge, we could use the Serverless Event Gateway Pattern:

There are many times when developing our serverless solutions when we need to interact with existing legacy systems or 3rd party services within our wider domain; and they are unable to publish and consume events from your central Amazon EventBridge bus directly due to technology or framework limitations...using the ‘Event Gateway Pattern’ alongside Amazon EventBridge and Amazon EventBridge API Destinations to allow this flow between systems both ways i.e. consuming and publishing based on domain events.


It's common, when talking about microservices and events, to think about the Bounded Context: the boundary within a domain where a particular domain model applies. So within the bounded context of the internal system architecture we use private or internal events for communication, and between bounded contexts we use RESTful APIs as public or external events. We also think about how microservices will interact with each other, using Choreography and Orchestration, and the resulting rule-of-thumb:

use orchestration within the bounded context of a microservice, but use choreography between bounded-contexts.

So in our case, we are suggesting that when you are faced with an event-driven system that needs to interact with external/legacy systems, and for considerations of trust, security, latency and/or experience/expertise, the rule-of-thumb is:

Use events internally, and APIs externally
(me, right now)

(you might recognise that last part from Linus Torvalds in 1992 during the Tanenbaum-Torvalds debate)

This would allow us - in theory - to meet all of our key requirements without any custom code on our side: RESTful APIs allow 3rd parties to integrate with the system with little effort, EventBridge handles retries natively, and EventBridge rules filter which requests go to which 3rd parties.
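To make the filtering concrete, here is a minimal sketch (in Python with boto3; the rule name, ARNs and IAM role are hypothetical placeholders rather than values from the actual deployment) of an EventBridge rule on the marketplace bus that matches only events destined for 3rdparty1 and routes them to that party's API destination:

    import json
    import boto3

    events = boto3.client("events")

    # Hypothetical rule: match only marketplace request events destined for 3rdparty1.
    # The pattern matches against fields inside the event's "detail" payload.
    events.put_rule(
        Name="route-to-3rdparty1",
        EventBusName="marketplace",
        EventPattern=json.dumps({
            "source": ["com.marketplace.market"],
            "detail-type": ["marketplace requests"],
            "detail": {"3rdparty": ["3rdparty1"]},
        }),
        State="ENABLED",
    )

    # Attach the 3rd party's API destination as the rule target.
    # The ARNs below are placeholders for resources created elsewhere.
    events.put_targets(
        Rule="route-to-3rdparty1",
        EventBusName="marketplace",
        Targets=[{
            "Id": "3rdparty1-api",
            "Arn": "arn:aws:events:eu-west-1:123456789012:api-destination/3rdparty1/abcd1234",
            "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeInvokeApiDestination",
        }],
    )

Onboarding a new 3rd party then becomes a matter of adding a rule and an API destination - configuration, not code.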

Now to put the theory to the test.

PoC

We decided to run a quick Proof of Concept (PoC) on AWS to validate the architecture. We used AWS Application Composer to design, build and deploy the architecture using a drag-and-drop visual interface, and in a very short time we built a working environment.

The architecture consists of the following key components and AWS Services:

  • An EventBridge custom event bus called marketplace
  • EventBridge rules that route events to 3rd parties, or from 3rd parties to other microservices
  • API Gateway to host an API for 3rd parties to call to provide fulfilment request status updates
  • Lambda functions to run microservices that are invoked via API Gateway, with data stored in DynamoDB
  • SNS to send email/SMS updates to users when a fulfilment request status has changed

We’ve defined a basic event, which could represent a new request from a specific user, that is being sent to 3rdparty1 for fulfilment:

{
  "Source": "com.marketplace.market",
  "EventBusName": "marketplace",
  "Detail": "{ \"user_id\": \"123456789\", \"request_status\": \"new\", \"3rdparty\": \"3rdparty1\", \"requests1\": \"books\", \"requests2\": \"pens\"}",
  "DetailType": "marketplace requests"
}

While Application Composer was used to design this, we used AWS SAM to deploy it. SAM provides shorthand syntax to express functions, APIs, databases, and event source mappings. With just a few lines per resource, you can define the application you want and model it using YAML. During deployment, SAM transforms and expands the SAM syntax into AWS CloudFormation syntax, enabling you to build serverless applications faster. Check out this GitHub repo for the full working code you can deploy using SAM.

Let’s simulate a user creating a new request by injecting that event with the command aws events put-events --entries file://event.json (in reality, the User Portal will inject this event when a user creates a new request).
EventBridge routes that event to the ThirdPartyAPI Lambda function microservice, which saves the data to DynamoDB. We can use https://webhook.site/ to simulate a 3rd party API that is hosted somewhere on the internet: we modify one of the rules to route to https://webhook.site/ using API Destinations. We can then use curl to send GET and POST requests to the Marketplace API, which simulates how a 3rd party will inject events back into the system.
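For the PoC, wiring webhook.site up as an API destination is itself just a couple of API calls. A rough sketch (Python with boto3; the connection name, dummy API key and webhook.site UUID are placeholders you would replace with your own values):

    import boto3

    events = boto3.client("events")

    # A connection holds the auth details EventBridge uses when calling the external API.
    # webhook.site ignores authentication, so a dummy API key is enough for the PoC.
    conn = events.create_connection(
        Name="webhook-site-connection",
        AuthorizationType="API_KEY",
        AuthParameters={"ApiKeyAuthParameters": {"ApiKeyName": "x-api-key", "ApiKeyValue": "dummy"}},
    )

    # The API destination wraps the external HTTPS endpoint and a rate limit.
    dest = events.create_api_destination(
        Name="webhook-site-3rdparty",
        ConnectionArn=conn["ConnectionArn"],
        InvocationEndpoint="https://webhook.site/REPLACE-WITH-YOUR-UUID",
        HttpMethod="POST",
        InvocationRateLimitPerSecond=10,
    )

    # Use this ARN as the target of the routing rule shown earlier.
    print(dest["ApiDestinationArn"])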

Supported Workflows

Now that we had proved that we can get events into our internal architecture using APIs, we documented the complete workflows that the solution should support.

This architecture will support the following three user flows: how a user registers on the marketplace and places new fulfilment requests, how 3rd parties receive those requests and post their status back to the marketplace and users, and how 3rd parties can manage their details in a self-service manner.

Flow A: User Request to 3rd Parties

  1. Users utilise the User Portal to register. During the registration process, documents like ID and banking proof could be uploaded to Amazon S3, which stores the documents redundantly across 3 Availability Zones (AZs) and removes the burden of managing a shared file system. S3 can also serve as the starting point for analysing user documents with a serverless workflow, extracting data and verifying it against the user's profile, e.g. automatically verifying that the ID number the user entered during registration matches the ID document. Ideally, the user portal is developed using AWS Amplify, which hosts the portal in S3, removes the burden of managing servers, and provides out-of-the-box CI/CD pipelines.
  2. When the user creates a request, the portal emits an event to Amazon EventBridge. EventBridge is a serverless event bus that is highly available and scalable, and it will serve as the central point to receive all events: new requests, request status changes, new 3rd parties. Each 3rd party will have a rule configured on EventBridge, with its corresponding API details and authentication; the EventBridge rules will match each request event against the intended 3rd party.
  3. EventBridge will send an HTTP POST with the request event to the corresponding 3rd party API, and will manage retries until the event is delivered (see the sketch after this list). Each 3rd party will be required to host a standardised RESTful API. Those 3rd parties that don't/can't host the standardised API can instead poll the Marketplace API, hosted on Amazon API Gateway, which is a serverless, managed, and scalable way to host APIs on AWS; a Lambda function will query the Marketplace DB and respond with the users' requests. Each event can have multiple destinations, so in addition to being sent to the 3rd party, the event can also be delivered to an S3 bucket that will serve as a data lake to store and analyse marketplace data.
  4. The event from EventBridge can contain a link to the S3 bucket that hosts the users' documents, so that 3rd parties can safely retrieve those documents if required.
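The retry behaviour mentioned in step 3 is configuration rather than code. As a rough sketch (Python with boto3; the ARNs and limits are hypothetical), the rule target can carry a retry policy and a dead-letter queue, which also answers the earlier open question of how long undelivered requests should be kept:

    import boto3

    events = boto3.client("events")

    # Hypothetical target configuration: EventBridge keeps retrying delivery to the
    # 3rd party API for up to 24 hours, and events that still cannot be delivered
    # are parked in an SQS dead-letter queue for later inspection or replay.
    events.put_targets(
        Rule="route-to-3rdparty1",
        EventBusName="marketplace",
        Targets=[{
            "Id": "3rdparty1-api",
            "Arn": "arn:aws:events:eu-west-1:123456789012:api-destination/3rdparty1/abcd1234",
            "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeInvokeApiDestination",
            "RetryPolicy": {
                "MaximumRetryAttempts": 185,        # EventBridge's maximum
                "MaximumEventAgeInSeconds": 86400,  # keep trying for 24 hours
            },
            "DeadLetterConfig": {
                "Arn": "arn:aws:sqs:eu-west-1:123456789012:undelivered-3rdparty-events",
            },
        }],
    )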

Flow B: 3rd Party Request Status shared back to the marketplace

  1. When 3rd parties receive user fulfilment requests, they process them. Once the status of a request changes (in-progress, accept, decline), the status needs to be shared back to the marketplace and users.
  2. 3rd parties will use a standardised RESTful API on the marketplace to POST status updates on requests. Amazon API Gateway will be used to host the API.
  3. API Gateway will emit the status as an event to EventBridge, and EventBridge rules will match against the event (see the sketch after this list).
  4. One rule will send the event to the User Portal.
  5. Another rule will send the event to SNS, to be sent as an email/SMS to users.
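Step 3 above is the "events internally, APIs externally" boundary in the reverse direction. Here is a minimal sketch (Python with boto3; the source, detail-type and field names are illustrative, not taken from the repo) of a handler behind the Marketplace API that turns a 3rd party's POSTed status update into an event on the marketplace bus, where the rules above fan it out to the User Portal and to SNS:

    import json
    import boto3

    events = boto3.client("events")

    # Hypothetical handler invoked via API Gateway when a 3rd party POSTs a status update.
    def handle_status_update(body: dict) -> None:
        events.put_events(Entries=[{
            "EventBusName": "marketplace",
            "Source": "com.marketplace.thirdparty",
            "DetailType": "request status update",
            "Detail": json.dumps({
                "user_id": body["user_id"],
                "3rdparty": body["3rdparty"],
                "request_status": body["request_status"],  # e.g. in-progress, accept, decline
            }),
        }])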

Summary

Your functional and non-functional requirements will determine which communication pattern or API style you choose in your architecture. And as I have shown, you can use an event-driven architecture internally between your microservices, and RESTful APIs externally with 3rd party and legacy systems.