tl;dr: It shouldn’t be synchronous
I’ve written about microservices before, based on the work I did with the MAD-API program. One of the particular limitations and lessons learnt was that all the communication between our services was synchronous. Sync comms lead to runtime coupling (and loose coupling is one of the main reasons to adopt microservices in the first place). Back then, I didn’t understand why, or how to move to async comms. MAD-API also relied solely on API Composition, and at the time I didn’t know the different ways to make it efficient (which this course has taught me), or about the CQRS alternative.
I’ve been a fan of Chris Richardson’s writing on https://microservices.io/ and https://chrisrichardson.net/ for a while, but recently these two posts really got me excited: Microservices - an architecture that enables DevOps and The microservice architecture: enabling rapid, reliable, frequent and sustainable development.
And when I read this:
You will learn how a successful microservice architecture consists of loosely coupled services with stable APIs that communicate asynchronously. I will cover strategies for effectively testing microservices.
I immediately signed up for Chris’ Virtual bootcamp: Distributed data patterns in a Microservice architecture. It’s an on-demand, 12-hour course with videos and labs. It heavily references Chris’ Microservices Patterns book - I used the live version.
Breakdown and summary of the course
Labs and code setup
Microservice Essentials
- based on Chris’ presentation at Jfokus 2020, which starts off with a brilliant discussion of the downsides of monoliths, and how the magic triangle of:
- Process: DevOps, Lean
- Org Structure: small teams
- Architecture: microservices
can lead to better outcomes. But microservices come with additional complexities, forcing you to learn new things, and new ways to do old things. Avoid design-time coupling (sharing a DB, or large complex APIs). Avoid runtime coupling: sync RESTful API comms between services, which lead to reduced availability and a distributed monolith. MSA enables DevOps, and DevOps requires automated testing. Doing MSA without automated testing is self-defeating
Transactions and queries in a microservice architecture
- For commands (e.g. create a new customer), to avoid runtime coupling (sync APIs between services), use sagas.
- For queries (e.g. get orders), to avoid design-time coupling (a shared DB), use API Composition or CQRS
Transactions: Saga:
- Check out this 4 part series and associated talk, and https://microservices.io/patterns/data/saga.html
- Need to roll back - but it's complex: you need to write a compensating transaction for each transaction. Saga transactions fall into three types: compensatable, pivot or retryable.
- Sagas are ACD (Atomic, Consistent, Durable) - they lack the I (Isolation)
- Countermeasures make sagas ACID-like
- Sagas can use distributed (choreography) or centralised (orchestration) decision making
Saga communication mechanisms
- using sync RESTful APIs to communicate between services will result in runtime coupling
- Rather use async, broker-based messaging that guarantees at-least-once delivery, to ensure the saga completes
- So in each saga step a service updates its own DB and sends a message to the broker. But the two actions must be atomic (what if sending the message fails after the DB update, or vice versa?). Two options (see the outbox sketch after this list):
- Event sourcing: uses an event store for persistence. The event store is a hybrid of a DB and a message broker; the service persists its events in the event store
- Transactional outbox: two updates to the same DB (the business table and an outbox table), wrapped in a single DB transaction; a separate relay publishes the outbox rows to the broker
- Sagas complicate API design: when should the original (create customer/order) API response be sent?
- Sync - wait for the saga to complete - bad: runtime coupling, all services need to be alive
- Async - respond immediately and let the client poll (or be notified) for the outcome - the preferred approach, improves availability
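To make the outbox option concrete, here's a minimal sketch in plain JDBC. The orders/outbox table and column names are my own assumptions, not the course's lab code; in the course, Eventuate Tram's CDC service plays the role of the relay that reads the outbox rows and publishes them to the broker.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

// Minimal sketch of the transactional outbox pattern: the business update and the
// "message" are written in the same local DB transaction, so they commit or roll
// back together. A separate relay process publishes outbox rows to the broker.
public class CreateOrderHandler {

    public void createOrder(Connection conn, long customerId, long amount) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement insertOrder = conn.prepareStatement(
                 "INSERT INTO orders (customer_id, amount, state) VALUES (?, ?, 'PENDING')");
             PreparedStatement insertOutbox = conn.prepareStatement(
                 "INSERT INTO outbox (id, topic, payload) VALUES (?, ?, ?)")) {

            // 1. The business state change
            insertOrder.setLong(1, customerId);
            insertOrder.setLong(2, amount);
            insertOrder.executeUpdate();

            // 2. The event, recorded in the same transaction (hypothetical table/payload)
            insertOutbox.setString(1, UUID.randomUUID().toString());
            insertOutbox.setString(2, "OrderCreated");
            insertOutbox.setString(3, "{\"customerId\":" + customerId + ",\"amount\":" + amount + "}");
            insertOutbox.executeUpdate();

            conn.commit(); // atomic: DB update and event either both happen or neither does
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }
}
```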
Choreography-based sagas
- event-driven sagas, with decentralised decision making (see the sketch after this list)
- Benefits: loose runtime coupling
- Drawbacks: saga logic is split amongst all the participants, so it's difficult to understand
- risk of design-time coupling
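To picture what a choreography participant looks like, here's a rough sketch of a Customer Service that reacts to an OrderCreated event and publishes the outcome as another event. The event records, publisher interface and in-memory credit map are stand-ins I made up for illustration; in the course this plumbing is provided by Eventuate Tram, and the events themselves would flow via the transactional outbox above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Decentralised decision making: this service only knows its own rule
// (does the customer have enough credit?) and announces the result as an event.
public class CustomerEventHandlers {

    // Hypothetical stand-ins for the real domain types
    public record OrderCreatedEvent(long orderId, long customerId, long orderTotal) {}
    public record CreditReservedEvent(long orderId) {}
    public record CreditLimitExceededEvent(long orderId) {}
    public interface DomainEventPublisher { void publish(Object event); }

    private final Map<Long, Long> availableCredit = new ConcurrentHashMap<>(); // customerId -> credit
    private final DomainEventPublisher publisher;

    public CustomerEventHandlers(DomainEventPublisher publisher) {
        this.publisher = publisher;
    }

    // Invoked by the messaging infrastructure when the Order Service publishes OrderCreated
    public void onOrderCreated(OrderCreatedEvent event) {
        long credit = availableCredit.getOrDefault(event.customerId(), 0L);
        if (credit >= event.orderTotal()) {
            availableCredit.put(event.customerId(), credit - event.orderTotal());
            publisher.publish(new CreditReservedEvent(event.orderId()));      // Order Service will approve
        } else {
            publisher.publish(new CreditLimitExceededEvent(event.orderId())); // Order Service will reject
        }
    }
}
```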
Orchestration-based sagas
- a centralised orchestrator invokes the participants using async request/response (see the sketch after this list)
- benefits: reduced coupling, e.g. the Customer Service does not need to know about the Order; centralised logic is easy to find and understand
- Saga state is stored in the DB, so it is queryable
- Drawbacks: requires a framework. Risk of smart sagas directing dumb services
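For contrast, here's a rough sketch of centralised decision making (this is not the Eventuate Tram Sagas DSL): the orchestrator keeps the saga state, sends a command message to each participant in turn via asynchronous request/response, and picks the next step based on the reply. The CommandChannel interface and the message names are assumptions for illustration.

```java
// Orchestration-based saga: the decision logic lives in one place, and the saga's
// current state can be persisted to a DB table, which is what makes it queryable.
public class CreateOrderSagaOrchestrator {

    public enum State { CREATING_ORDER, RESERVING_CREDIT, APPROVED, REJECTED }

    // Hypothetical abstraction over the message broker
    public interface CommandChannel { void send(String destination, String commandJson); }

    private final CommandChannel commands;
    private State state = State.CREATING_ORDER; // persisted to the saga's DB table in practice

    public CreateOrderSagaOrchestrator(CommandChannel commands) {
        this.commands = commands;
    }

    public void start(long orderId, long customerId, long total) {
        state = State.RESERVING_CREDIT;
        // Ask the Customer Service to reserve credit; the reply arrives asynchronously
        commands.send("customerService",
            "{\"type\":\"ReserveCredit\",\"orderId\":" + orderId
                + ",\"customerId\":" + customerId + ",\"total\":" + total + "}");
    }

    // Called by the messaging infrastructure when the Customer Service replies
    public void onCreditReply(long orderId, boolean reserved) {
        if (reserved) {
            state = State.APPROVED;
            commands.send("orderService", "{\"type\":\"ApproveOrder\",\"orderId\":" + orderId + "}");
        } else {
            state = State.REJECTED; // compensating path
            commands.send("orderService", "{\"type\":\"RejectOrder\",\"orderId\":" + orderId + "}");
        }
    }
}
```

Note how the Customer Service only needs to implement a ReserveCredit command handler; the saga logic stays in one place.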
Countermeasures
- Sagas are ACD.
- countermeasures make sagas ACID-like, because sagas don't have Isolation (the guarantee that concurrently executing transactions appear to execute sequentially)
- Semantic lock: a PENDING flag on the domain object (see the sketch after this list)
- Commutative: 2 commands that can be done in any order: debit and credit
- Pessimistic view: reorder the saga's steps to minimise business risk
- Reread value: verify the data is unchanged before updating it (an offline optimistic lock)
- By value: choose the countermeasure based on the business value of the transaction
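The semantic lock is the easiest countermeasure to picture in code: the first saga step marks the Order as PENDING, and other transactions that see the flag either fail or wait rather than interleaving with the in-flight saga. A minimal sketch, with illustrative names:

```java
// Semantic lock countermeasure: the PENDING state acts as an application-level lock
// that the saga sets in its first step and clears in its last step.
public class Order {

    public enum State { PENDING, APPROVED, REJECTED, CANCELLED }

    private State state = State.PENDING; // set by the saga's first (compensatable) step

    // A different saga (e.g. Cancel Order) checks the lock before acting
    public void cancel() {
        if (state == State.PENDING) {
            // Another saga still owns this order: fail (or retry later) instead of interleaving
            throw new IllegalStateException("Order is PENDING - owned by an in-flight saga");
        }
        state = State.CANCELLED;
    }

    // The final step of the Create Order saga releases the lock
    public void approve() {
        state = State.APPROVED;
    }
}
```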
Challenges of queries
- can't do join()s across services' databases. Two options:
- API Composition: simple, but not always efficient: multiple queries, one per service, with the join()s done in memory
- CQRS: complex. Use events to update a replica, and query the replica
API Composition
- works well, but can be inefficient, with high response times
- requires many roundtrips and in-memory joins
- runtime coupling, lower availability, high response times
- must use resiliency techniques: timeouts, retries, circuit breakers
- use a fallback mechanism for each provider: empty, default or cached
- the composer calls multiple provider services for each query
- can be done by the API Gateway, or by the client itself
- if done serially: total response time = SUM(each service response time)
- if done in parallel : total response time = MAX (each service response time)
- but it can't always be done in parallel, as one service's response may be the input to another query, so there is a dependency graph
- use reactive programming: map() and flatMap(), Java CompletableFutures, Spring Mono and Flux (see the example after this list)
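Here's a small example of parallel composition with Spring WebFlux's WebClient and Project Reactor. The service URLs, the OrderSummary record and the empty-JSON fallback are my own assumptions; the point is that Mono.zip() subscribes to both calls at once, so the composed response time is roughly MAX of the providers rather than the SUM.

```java
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

// API composition done reactively: two independent provider calls run in parallel,
// each with a fallback so one slow/broken provider doesn't fail the whole query.
public class OrderDetailsComposer {

    public record OrderSummary(String order, String customer) {}

    private final WebClient client = WebClient.create();

    public Mono<OrderSummary> getOrderDetails(long orderId, long customerId) {
        Mono<String> order = client.get()
                .uri("http://order-service/orders/{id}", orderId)
                .retrieve()
                .bodyToMono(String.class)
                .onErrorReturn("{}"); // fallback: empty result if the provider is down

        Mono<String> customer = client.get()
                .uri("http://customer-service/customers/{id}", customerId)
                .retrieve()
                .bodyToMono(String.class)
                .onErrorReturn("{}");

        // zip() subscribes to both Monos, so the two calls execute concurrently
        return Mono.zip(order, customer)
                .map(t -> new OrderSummary(t.getT1(), t.getT2()));
    }
}
```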
CQRS
- complex, but more powerful than API Composition
- Defines a view database - a replica kept up to date by subscribing to events
- Benefits: supports multiple denormalised views, improves scalability and performance
- Drawback: complex, cost of replication, eventually consistent
- Event handlers must be idempotent, and can't assume the customer exists when updating the credit limit (the credit-limit event may arrive before the create-customer event) - see the handler sketch after this list
- Can have multiple view DBs: document, SQL, Elasticsearch, etc
- eventually consistent: there is a lag between the update/command and when the view is updated
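A minimal sketch of a view-side event handler that respects both rules - idempotent, and tolerant of out-of-order events - using an in-memory map as a stand-in for the real view store (Mongo, DynamoDB, Elasticsearch, ...). The event and record names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// CQRS query side: the view is a denormalised replica kept up to date by events.
// Every handler upserts, so it doesn't matter which event arrives first, and
// re-applying the same event leaves the view unchanged (idempotent).
public class CustomerViewEventHandlers {

    public record CustomerView(long customerId, String name, Long creditLimit) {}

    private final Map<Long, CustomerView> view = new ConcurrentHashMap<>();

    public void onCustomerCreated(long customerId, String name) {
        // Upsert: keep any credit limit that arrived earlier, just fill in the name
        view.merge(customerId,
                new CustomerView(customerId, name, null),
                (existing, ignored) -> new CustomerView(customerId, name, existing.creditLimit()));
    }

    public void onCreditLimitUpdated(long customerId, long creditLimit) {
        // Upsert: create a stub record if CustomerCreated hasn't been seen yet
        view.merge(customerId,
                new CustomerView(customerId, null, creditLimit),
                (existing, ignored) -> new CustomerView(customerId, existing.name(), creditLimit));
    }

    // Queries read the denormalised view directly - no cross-service join needed
    public CustomerView findCustomer(long customerId) {
        return view.get(customerId);
    }
}
```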
Chris also gave very detailed talks at Saturn 2018 and GOT 2019.
Distributed Data Patterns, microservices and events on AWS
In the course, Chris uses his Eventuate platform to solve the distributed data issue (contrast this to my post on microservices and ESBs). Eventuate Tram uses its CDC service to poll/tail the DB (the outbox pattern) and publish those messages to the message broker. I wanted to see what native options AWS has to solve this in the cloud:
The Implementing Microservices on AWS whitepaper talks about using AWS Step Functions to implement sagas:
In a distributed system, business transactions can span multiple microservices. Because they cannot leverage a single ACID transaction, you can end up with partial executions. In this case, we would need some control logic to redo the already processed transactions. For this purpose, the distributed Saga pattern is commonly used. In the case of a failed business transaction, Saga orchestrates a series of compensating transactions that undo the changes that were made by the preceding transactions. AWS Step Functions make it easy to implement a Saga execution coordinator as shown in the next figure.
re:Invent 2019 talk Implement microservice architectures with Amazon DynamoDB and AWS Lambda discusses:
- orchestration-based sagas using Step Functions
- and choreography-based sagas using DynamoDB Streams and Lambda. On AWS, DynamoDB Streams together with Lambda satisfies the requirement to atomically update the DB and publish the event, with events delivered at least once (see the sketch after this list).
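A minimal sketch of that building block, assuming an orders table with orderId and state attributes (my assumption, not from the talk): DynamoDB Streams captures every committed write and invokes the Lambda at least once, so the handler is where the next saga step (or the event publish) gets triggered, and it has to be idempotent.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;

// Choreography on AWS: the DB update is the source of truth, and the stream record
// is the "event" - no dual write, and delivery is at least once.
public class OrderStreamHandler implements RequestHandler<DynamodbEvent, Void> {

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbEvent.DynamodbStreamRecord streamRecord : event.getRecords()) {
            if (!"INSERT".equals(streamRecord.getEventName())) {
                continue; // this sketch only reacts to newly created orders
            }
            var newImage = streamRecord.getDynamodb().getNewImage();
            String orderId = newImage.get("orderId").getS();

            // At-least-once delivery means duplicates are possible, so this must be idempotent.
            // In a real saga this is where you'd invoke the next participant, e.g. by publishing
            // to EventBridge/SNS or writing a command to that service's own table.
            context.getLogger().log("OrderCreated event for order " + orderId);
        }
        return null;
    }
}
```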
Modern Application Development on AWS whitepaper:
- Because the event sourcing pattern involves storing and later replaying event messages, it requires some mechanism for storing and retrieving messages. If you plan to use this pattern in the AWS Cloud, depending on your use case, you can use Amazon Kinesis, Amazon Simple Queue Service (SQS), Amazon MQ, or Amazon Managed Streaming for Apache Kafka (Amazon MSK). In the event sourcing pattern, each event that changes the system is stored first to a message queue, and then updates to the application state are made based on that event. For example, an event can be written as a record in an Amazon Kinesis stream, and then a service built on AWS Lambda can retrieve the record and perform updates in its own data store.
Check out Event driven architectures with Amazon EventBridge and other re:Invent 2019 talks like Building microservices with AWS Lambda (SVS343-R1)