December 21, 2020

Microservices: Distributed Data Patterns

tldr: It shouldn’t be synchronous

I’ve written about microservices before, based on the work I did with the MAD-API program. One of the particular limitations and lessons learnt was that all the communication between our services was synchronous. Sync comms lead to runtime coupling (and loose coupling is one of the main reasons to adopt microservices in the first place). Back then, I didn’t understand why and how to move to async comms. Also, MAD-API only relied on API Composition, but back then I didn’t know the different ways to make it efficient (which this course has taught me), nor about the CQRS alternative.

I’ve been a fan of Chris Richardson’s writing on https://microservices.io/ and https://chrisrichardson.net/ for a while, but recently these two posts really got me excited: Microservices - an architecture that enables DevOps and The microservice architecture: enabling rapid, reliable, frequent and sustainable development.
And when I read this:

You will learn how a successful microservice architecture consists of loosely coupled services with stable APIs that communicate asynchronously. I will cover strategies for effectively testing microservices.


I immediately signed up for Chris’ Virtual bootcamp: Distributed data patterns in a Microservice architecture. It’s an on-demand 12-hour course with videos and labs. It heavily references Chris’ Microservices Patterns book - I used the live version.

Breakdown and summary of the course

Labs and code setup

Microservice Essentials

  • based on Chris’ presentation at Jfokus 2020, which starts off with a brilliant discussion of the downsides of monoliths, and how the magic triangle of:
  1. Process: DevOps, Lean
  2. Org Structure: small teams
  3. Architecture: microservices

can lead to better outcomes. But microservices come with additional complexities, forcing you to learn new things, and new ways to do old things. Avoid design-time coupling (sharing a DB, or large complex APIs). Avoid runtime coupling: sync RESTful API comms between services lead to reduced availability and a distributed monolith. MSA enables DevOps, and DevOps requires automated testing. Doing MSA without automated testing is self-defeating.

Transactions and queries in a microservice architecture

  • for commands (create new customer), to avoid runtime coupling (sync APIs between services), use sagas.
  • For queries (get orders), to avoid design-time coupling (shared DB), use API composition or CQRS

Transactions: Sagas

  • Check out this 4 part series and associated talk, and https://microservices.io/patterns/data/saga.html
  • Need to roll back - but it’s complex: you need to write a compensating transaction for each transaction (see the sketch after this list). Saga transactions come in 3 kinds: compensatable, pivot, or retryable.
  • Sagas are ACD (Atomicity, Consistency, Durability) - they lack the I (Isolation)
  • Countermeasures make sagas ACID-like
  • Sagas can use distributed (choreography) or centralised (orchestration) decision making
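
To make the compensating-transaction idea concrete, here is a minimal, framework-free sketch (my own illustration, not Eventuate code) of a saga as an ordered list of steps: when a later step fails, the compensations of the already-completed steps run in reverse order. All names (SagaStep, CreateOrderSaga, the step bodies) are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a saga as an ordered list of steps, each pairing an
// action with a compensating transaction that undoes it if a later step fails.
public class CreateOrderSaga {

    record SagaStep(String name, Runnable action, Runnable compensation) {}

    private final List<SagaStep> steps = List.of(
        new SagaStep("createOrder",   () -> System.out.println("create order (PENDING)"),
                                      () -> System.out.println("reject order")),
        new SagaStep("reserveCredit", () -> System.out.println("reserve customer credit"),
                                      () -> System.out.println("release customer credit")),
        new SagaStep("approveOrder",  () -> System.out.println("approve order"),
                                      () -> {})); // retryable final step: no compensation needed

    public void execute() {
        List<SagaStep> completed = new ArrayList<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.add(step);
            } catch (RuntimeException e) {
                // A step failed: run the compensations of completed steps in reverse order
                for (int i = completed.size() - 1; i >= 0; i--) {
                    System.out.println("compensating: " + completed.get(i).name());
                    completed.get(i).compensation().run();
                }
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        new CreateOrderSaga().execute();
    }
}
```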

Saga communication mechanisms

  • using sync RESTful APIs to communicate between services will result in runtime coupling
  • Rather use async broker-based messaging that guarantees at-least-once delivery, to ensure the saga completes
  • In each saga step, a service updates its own DB and sends a message to the broker. But these two actions must be atomic (what if sending the message fails after the DB update, or vice versa?). Two options:
  • Event sourcing: use an event store for persistence. The event store is a hybrid of a DB and a message broker. The service persists its events in the event store
  • Transaction outbox: 2 updates to the DB (the business table and an outbox table), wrapped in a single DB transaction (see the sketch after this list)
  • Sagas complicate API design: when should the original (create customer/order) API response be sent?
  • Sync - wait for the saga to complete - bad: runtime coupling, all services need to be available
  • Async - the client polls for, or is notified of, the outcome - preferred approach, improves availability
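
To make the outbox option concrete, here is a rough sketch (assumed table and column names, plain JDBC; not the Eventuate Tram implementation): the business row and the outbox message are committed in one local DB transaction, and a separate relay/CDC process later publishes the outbox rows to the broker.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical sketch of the transactional outbox pattern: the business update and
// the outgoing message are committed in ONE local DB transaction. A separate message
// relay (polling or CDC) reads the outbox table and publishes the rows to the broker.
public class OrderService {

    private final DataSource dataSource;

    public OrderService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void createOrder(String orderId, String customerId, long totalCents) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try {
                // 1. Business update
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO orders (id, customer_id, total_cents, state) VALUES (?, ?, ?, 'PENDING')")) {
                    ps.setString(1, orderId);
                    ps.setString(2, customerId);
                    ps.setLong(3, totalCents);
                    ps.executeUpdate();
                }
                // 2. Outbox insert in the SAME transaction - atomic with the business update
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, 'OrderCreated', ?)")) {
                    ps.setString(1, orderId);
                    ps.setString(2, "{\"orderId\":\"" + orderId + "\",\"customerId\":\"" + customerId + "\"}");
                    ps.executeUpdate();
                }
                conn.commit(); // both rows become visible together, or neither does
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```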

Choreography-based sagas

  • event-driven sagas with decentralised decision making (see the sketch after this list)
  • Benefits: loose runtime coupling
  • Drawbacks: saga logic is split amongst all the participants, which makes it difficult to understand
  • risk of design-time coupling
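
To illustrate the decentralised decision making, a rough sketch of one participant (hypothetical interfaces and event types, not Eventuate code): the Customer Service only knows "when I see OrderCreated, try to reserve credit and publish the outcome"; no single component knows the whole saga.

```java
// Hypothetical sketch of a choreography participant: the Customer Service subscribes
// to OrderCreated events, applies its own local logic, and publishes the outcome as
// a new event. There is no central coordinator that knows the full saga.
public class CustomerEventHandlers {

    // Assumed minimal broker abstraction, for illustration only
    interface EventPublisher { void publish(String channel, Object event); }

    // Assumed domain service that reserves credit against the customer's limit
    interface CreditChecker { boolean tryReserve(String customerId, long amountCents); }

    record OrderCreated(String orderId, String customerId, long totalCents) {}
    record CreditReserved(String orderId, String customerId) {}
    record CreditLimitExceeded(String orderId, String customerId) {}

    private final EventPublisher publisher;
    private final CreditChecker creditChecker;

    public CustomerEventHandlers(EventPublisher publisher, CreditChecker creditChecker) {
        this.publisher = publisher;
        this.creditChecker = creditChecker;
    }

    // Invoked by the messaging infrastructure when an OrderCreated event arrives
    public void onOrderCreated(OrderCreated event) {
        if (creditChecker.tryReserve(event.customerId(), event.totalCents())) {
            publisher.publish("customer-events", new CreditReserved(event.orderId(), event.customerId()));
        } else {
            publisher.publish("customer-events", new CreditLimitExceeded(event.orderId(), event.customerId()));
        }
    }
}
```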

Orchestration-based sagas

  • centralised orchestrator invokes participants using async request/response
  • benefits: reduced coupling - e.g. the Customer Service does not need to know about the Order; centralised logic is easy to find and understand (see the sketch after this list)
  • Saga state is stored in the DB, so it is queryable
  • Drawbacks: requires a framework. Risk of smart sagas directing dumb services
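
A hand-rolled sketch of an orchestrator (hypothetical names, not the Eventuate Tram Sagas DSL): it holds the saga state, sends a command to each participant in turn, and advances when the async reply arrives. In a real implementation the state would be persisted to the orchestrator's DB, which is what makes it queryable.

```java
// Hypothetical sketch of an orchestration-based Create Order saga: a central
// orchestrator tracks the saga state and invokes each participant by sending an
// async command message, advancing the state machine when the reply comes back.
public class CreateOrderSagaOrchestrator {

    enum State { RESERVING_CREDIT, APPROVING_ORDER, REJECTING_ORDER, ORDER_APPROVED, ORDER_REJECTED }

    // Assumed minimal messaging abstraction, for illustration only
    interface CommandSender { void send(String channel, Object command); }

    record ReserveCredit(String sagaId, String customerId, long amountCents) {}
    record ApproveOrder(String sagaId, String orderId) {}
    record RejectOrder(String sagaId, String orderId) {}

    private final CommandSender commandSender;
    private final String sagaId;
    private final String orderId;
    private final String customerId;
    private final long totalCents;
    private State state; // persisted to the orchestrator's DB in a real implementation

    public CreateOrderSagaOrchestrator(CommandSender commandSender, String sagaId,
                                       String orderId, String customerId, long totalCents) {
        this.commandSender = commandSender;
        this.sagaId = sagaId;
        this.orderId = orderId;
        this.customerId = customerId;
        this.totalCents = totalCents;
    }

    public void start() {
        state = State.RESERVING_CREDIT;
        commandSender.send("customer-commands", new ReserveCredit(sagaId, customerId, totalCents));
    }

    // Invoked by the messaging infrastructure when the Customer Service replies
    public void onCreditReply(boolean creditReserved) {
        if (state != State.RESERVING_CREDIT) return; // ignore unexpected/duplicate replies
        if (creditReserved) {
            state = State.APPROVING_ORDER;
            commandSender.send("order-commands", new ApproveOrder(sagaId, orderId));
        } else {
            state = State.REJECTING_ORDER;
            commandSender.send("order-commands", new RejectOrder(sagaId, orderId));
        }
    }

    // Invoked when the Order Service confirms the approval/rejection
    public void onOrderReply() {
        state = (state == State.APPROVING_ORDER) ? State.ORDER_APPROVED : State.ORDER_REJECTED;
    }
}
```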

Countermeasures

  • Sagas are ACD.
  • countermeasures make sagas ACID-like, because sagas lack Isolation (the guarantee that concurrently executing transactions behave as if they ran one after the other)
  • Semantic Lock: set a flag on the domain object, e.g. a PENDING state (see the sketch after this list)
  • Commutative: 2 commands that can be done in any order, e.g. debit and credit
  • Pessimistic view: change the order of the saga's steps to minimise business risk
  • Re-read value: verify the data is unchanged before updating it (an offline optimistic lock)
  • By value: choose the countermeasure based on the value/risk of the transaction
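
A small sketch of the semantic lock countermeasure (hypothetical Order class): the saga's first, compensatable step creates the Order in a PENDING state, and the final step (or the compensating transaction) releases the lock by moving it to a terminal state.

```java
// Hypothetical sketch of the semantic lock countermeasure: a saga sets a *_PENDING
// flag on the domain object in its first (compensatable) step and clears it in the
// final step. Other sagas and queries can see that an in-flight saga "owns" the
// order and can wait, fail fast, or treat the pending order specially.
public class Order {

    enum State { APPROVAL_PENDING, APPROVED, REJECTED }

    private State state;

    // First saga step: create the order in a PENDING state (acquires the semantic lock)
    public static Order createPendingOrder() {
        Order order = new Order();
        order.state = State.APPROVAL_PENDING;
        return order;
    }

    // Final saga step: releases the lock by moving to a terminal state
    public void approve() {
        requirePending();
        this.state = State.APPROVED;
    }

    // Compensating transaction: also releases the lock
    public void reject() {
        requirePending();
        this.state = State.REJECTED;
    }

    private void requirePending() {
        if (state != State.APPROVAL_PENDING) {
            throw new IllegalStateException("Order is not awaiting approval: " + state);
        }
    }
}
```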

Challenges of queries

  • can’t do join()s across multiple services’ databases. Two options:
  • API Composition: simple, but not always efficient: query each service and join() the results in memory
  • CQRS: complex. Use events to update a replica, and query the replica

API Composition

  • works well, but can be too inefficient: high response times
  • requires too many roundtrips and in memory joins
  • runtime coupling, lower availability, high response times
  • must use resiliency techniques: timeouts, retries, circuit breakers
  • use a fallback mechanism for each provider: empty, default or cached
  • the composer calls multiple provider services for each query
  • can be done by API GW, or by a client itself
  • if done serially: total response time = SUM(each service response time)
  • if done in parallel : total response time = MAX (each service response time)
  • but the calls can’t always be done in parallel, as one service’s response may be needed to query another, so there is a dependency graph
  • use reactive programming: map() and flatMap(), Java CompletableFutures, Spring Mono and Flux (see the sketch after this list)
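
A sketch of parallel composition using Java CompletableFutures (the service clients and types are assumptions; Spring's Mono/Flux would be the more idiomatic reactive choice): both provider calls start immediately, a fallback covers one provider being unavailable, and the in-memory join happens once both responses arrive, so the response time is roughly the MAX rather than the SUM.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of an API composer that queries two provider services in
// parallel and joins the results in memory.
public class OrderDetailsComposer {

    record Customer(String id, String name) {}
    record Order(String id, String customerId, long totalCents) {}
    record OrderDetails(Order order, Customer customer) {}

    // Assumed async clients for the provider services (e.g. wrapping HTTP calls)
    interface OrderServiceClient { CompletableFuture<List<Order>> findOrders(String customerId); }
    interface CustomerServiceClient { CompletableFuture<Customer> findCustomer(String customerId); }

    private final OrderServiceClient orderService;
    private final CustomerServiceClient customerService;

    public OrderDetailsComposer(OrderServiceClient orderService, CustomerServiceClient customerService) {
        this.orderService = orderService;
        this.customerService = customerService;
    }

    public CompletableFuture<List<OrderDetails>> getOrderDetails(String customerId) {
        // Both calls start immediately and run in parallel
        CompletableFuture<List<Order>> ordersFuture = orderService.findOrders(customerId);
        CompletableFuture<Customer> customerFuture = customerService.findCustomer(customerId)
                // Fallback: use a default value if the Customer Service is unavailable
                .exceptionally(ex -> new Customer(customerId, "unknown"));

        // In-memory join once both responses have arrived:
        // total response time ~= MAX(order latency, customer latency)
        return ordersFuture.thenCombine(customerFuture, (orders, customer) ->
                orders.stream()
                      .map(order -> new OrderDetails(order, customer))
                      .toList());
    }
}
```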

CQRS

  • complex, but more powerful than API Composition
  • Defines a view database - a replica kept up to date by subscribing to events
  • Benefits: supports multiple denormalised views, improves scalability and performance
  • Drawback: complex, cost of replication, eventually consistent
  • Event handlers must be idempotent, and can’t assume the customer already exists when updating the credit limit (the credit limit event may arrive before the create-customer event) - see the sketch after this list
  • Can have multiple View DB: document, SQL, ElasticSearch, etc
  • eventually consistent: there is a lag between the command updating the source data and the view being updated
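
A rough sketch of a CQRS view-side event handler (in-memory stand-ins for the view DB; not Eventuate code): it is idempotent by recording handled event IDs, and it upserts so that a credit limit event arriving before the create-customer event still works.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a CQRS view-side event handler. Two properties from the
// course are illustrated: (1) idempotence - duplicates from at-least-once delivery
// are detected and ignored; (2) no ordering assumptions - a credit limit event
// arriving before the create-customer event still produces a usable view record,
// which is completed when the earlier event finally arrives.
public class CustomerViewEventHandler {

    record CustomerView(String customerId, String name, Long creditLimitCents) {}

    // In-memory stand-ins for the view database and the processed-event log;
    // a real implementation would use a document store, SQL table, ElasticSearch, etc.
    private final Map<String, CustomerView> viewDb = new ConcurrentHashMap<>();
    private final Set<String> handledEventIds = ConcurrentHashMap.newKeySet();

    public void onCustomerCreated(String eventId, String customerId, String name) {
        if (!handledEventIds.add(eventId)) return; // duplicate delivery - ignore
        viewDb.merge(customerId,
                new CustomerView(customerId, name, null),
                (existing, incoming) -> new CustomerView(customerId, name, existing.creditLimitCents()));
    }

    public void onCreditLimitUpdated(String eventId, String customerId, long creditLimitCents) {
        if (!handledEventIds.add(eventId)) return; // duplicate delivery - ignore
        // Upsert: do NOT assume the customer row already exists
        viewDb.merge(customerId,
                new CustomerView(customerId, null, creditLimitCents),
                (existing, incoming) -> new CustomerView(customerId, existing.name(), creditLimitCents));
    }

    public CustomerView query(String customerId) {
        return viewDb.get(customerId);
    }
}
```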

Chris also gave very detailed talks at Saturn 2018 and GOT 2019.

Distributed Data Patterns, microservices and events on AWS

In the course, Chris uses his Eventuate platform to solve the distributed data issues (contrast this with my post on microservices and ESBs). Eventuate Tram uses a CDC service to poll/tail the DB (the transactional outbox pattern) and publish those messages to the message broker. I wanted to see what native options AWS has to solve this in the cloud:

Implementing Microservices on AWS whitepaper talks about using AWS Step Functions to achieve Sagas:

In a distributed system, business transactions can span multiple microservices. Because they cannot leverage a single ACID transaction, you can end up with partial executions. In this case, we would need some control logic to redo the already processed transactions. For this purpose, the distributed Saga pattern is commonly used. In the case of a failed business transaction, Saga orchestrates a series of compensating transactions that undo the changes that were made by the preceding transactions. AWS Step Functions make it easy to implement a Saga execution coordinator as shown in the next figure.

re:Invent 2019 talk Implement microservice architectures with Amazon DynamoDB and AWS Lambda discusses:

  • orchestration-based sagas using Step Functions
  • and choreography-based sagas using DynamoDB Streams and Lambda. On AWS, DynamoDB Streams together with Lambda satisfies the requirements to atomically update the DB and publish the event, with events delivered at least once.
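
As a rough sketch of that combination (assuming the aws-lambda-java-events library; the table's key attribute name and the downstream action are my assumptions): a Lambda function subscribed to the order table's stream reacts to newly inserted items and triggers the next saga step.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent.DynamodbStreamRecord;

// Hypothetical sketch of a choreography step on AWS: the Order table's DynamoDB
// Stream invokes this Lambda for every change, so the DB update and the "event"
// are effectively atomic, and Lambda's stream polling gives at-least-once delivery.
public class OrderStreamHandler implements RequestHandler<DynamodbEvent, Void> {

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbStreamRecord streamRecord : event.getRecords()) {
            // Only newly created items represent an "OrderCreated" event
            if (!"INSERT".equals(streamRecord.getEventName())) {
                continue;
            }
            // "pk" is an assumed attribute name for the order id
            String orderId = streamRecord.getDynamodb().getNewImage().get("pk").getS();
            // The next saga step would go here, e.g. reserve credit or publish onwards
            context.getLogger().log("Handling OrderCreated for order " + orderId);
        }
        return null;
    }
}
```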

Modern Application Development on AWS whitepaper:

  • Because the event sourcing pattern involves storing and later replaying event messages, it requires some mechanism for storing and retrieving messages. If you plan to use this pattern in the AWS Cloud, depending on your use case, you can use Amazon Kinesis, Amazon Simple Queue Service (SQS), Amazon MQ, or Amazon Managed Streaming for Apache Kafka (Amazon MSK). In the event sourcing pattern, each event that changes the system is stored first to a message queue, and then updates to the application state are made based on that event. For example, an event can be written as a record in an Amazon Kinesis stream, and then a service built on AWS Lambda can retrieve the record and perform updates in its own data store.

Check out Event driven architectures with Amazon EventBridge and other re:Invent 2019 talks like Building microservices with AWS Lambda (SVS343-R1)