Flask on AWS Serverless: A learning journey - Part 2

About 3 years ago I learnt some basic Python, which I've used almost exclusively to build back-end APIs on AWS Serverless, mostly Lambda. This includes a monitoring solution on AWS, an event-driven API integration solution, a load shedding telegram bot, a Slack bot that posts AWS News, a CDK project, and other telegram bots. None of them are front-end web apps, and that's something that has always been a gap for me. Some years back I did some Ruby on Rails, but didn't build anything meaningful, and I've since forgotten most of it. So I've decided to learn Flask as a tool to build some web apps: primarily because it's still Python, and a micro-framework with a minimal learning curve. And I wanted to re-use what I've learned building Python back-end apps and APIs on AWS Serverless, and see how I can build front-end apps and APIs that run on AWS Serverless. Admittedly, Flask is still server-side, which means I'm still avoiding client-side web apps (React, etc.), but baby steps for now.

I'm specifically focusing on Flask web apps that return HTML, CSS and JS. For Flask APIs on AWS Serverless, there are already awesome packages like Chalice, Zappa and Powertools for AWS Lambda (Python).

There are other AWS services that can be used to run Flask on AWS, like EC2, Elastic Beanstalk, ECS or Lightsail. But I am specifically looking to use serverless because I don't want to manage servers or containers, I only want to pay for what I actually use without having resources on all the time (and with the generous free tier for serverless on AWS you won't pay anything to run this tutorial), I want to fully automate the deployment process, and if I eventually have to scale, I don't want to have to re-architect anything. Serverless has a much better Developer Experience, and allows you to quickly build things with less hassle.

So in this series of posts, we will learn to build some Flask apps on AWS, and figure things out along the way. I'll probably get some stuff wrong, so errors and omissions excepted. Onwards!

Previously in this series

In part 1, we took the app from the How to Make a Web Application Using Flask in Python 3 tutorial, and got it running on AWS Serverless: API Gateway, Lambda and DynamoDB. In part 2, we're going to do almost the same thing, except instead of DynamoDB as the database, we're going to use Amazon Aurora Serverless for MySQL. Part 2 assumes you didn't go through part 1, so you can start here if you want.

What is Aurora Serverless?

Aurora Serverless is an on-demand autoscaling DB cluster that scales compute capacity up and down based on your application's needs. It uses familiar SQL, so if the NoSQL DynamoDB was not your thing, then Aurora MySQL will be much closer to the tutorial.

Aurora Serverless (v1) for MySQL was announced in preview in 2017, and went GA in 2018. It scales to zero (pausing), which is really awesome. You connect to it using standard SQL. It lives in a VPC, which means connecting to it from Lambda is going to be a challenge. However, the Data API, announced in 2019, changed that. Now you can connect to it from a Lambda function that does not need to be associated with your VPC, and you don't need to worry about setting up and tearing down connections. Which is really awesome, but there are a few issues, the key one for me being that Aurora Serverless v1 was (and still is) not available in many regions. But overall, it's really good.

Aurora Serverless v2 was announced in preview in 2020, and went GA in 2022. Scaling improved dramatically, and it's available in all regions. But there are two major issues: it does not scale to zero, and it doesn't support the Data API, which means the Lambda function needs to be associated with your VPC. However, as I write this on 21 Dec 2023, AWS just announced the Data API for Aurora Serverless v2 PostgreSQL (not MySQL).

So based on these limitations, specifically that we don't have the Data API available for Aurora Serverless v2 for MySQL, I think that for a new serverless app that we are building and deploying via AWS SAM, it's better to use Aurora Serverless v1 for MySQL with the Data API, even though it's limited to specific regions.

But Lambda can only be used for APIs...I hear you say!

AWS Lambda is usually used for APIs that return JSON. For RESTful APIs you usually serve Lambda functions behind Amazon API Gateway or a Lambda Function URL, or behind AppSync for GraphQL APIs. Yes, you can have Lambda functions return HTML with some customisation, but how would we run Flask on Lambda without changing anything in Flask? The answer: by using the Lambda Web Adapter, which serves as a universal adapter between the Lambda Runtime API and HTTP. It allows developers to package familiar HTTP 1.1/1.0 web applications, such as Express.js, Next.js, Flask, SpringBoot, or Laravel, and deploy them on AWS Lambda. This removes the need to modify the web application to accommodate Lambda's input and output formats, reducing the complexity of adapting code to meet Lambda's requirements.

I should also call out the really good awsgi package (and tutorial), which can also be used to run Flask on AWS Serverless, with just a small handler in the Flask app.
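For a flavour of that alternative approach (not the one used in this series), a handler with awsgi is typically just a few lines. A minimal sketch, assuming the Hello World app from the tutorial and the aws-wsgi package installed:

import awsgi
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello, World!'

# Lambda entry point: awsgi translates the API Gateway event into a WSGI
# request for Flask, and turns Flask's response back into the dict that
# API Gateway expects from Lambda.
def lambda_handler(event, context):
    return awsgi.response(app, event, context)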

In order to demonstrate how to run Flask on AWS Serverless using the Lambda Web Adapter, I'm going to take an existing Flask app, and show you how to run it on AWS. For this, we will start with a very well-written tutorial on DigitalOcean: How to Make a Web Application Using Flask in Python 3. Using this tutorial as a vehicle, I will show you how to get this Flask app running on AWS, using AWS Lambda, Amazon API Gateway and Aurora Serverless for MySQL, all deployed using AWS SAM. To follow along, you may want to keep that tutorial open, as well as this blog post; I refer to the instructions in the tutorial, and advise what needs to be changed. In addition to, or instead of, following along, you can use the resources in this project's GitHub part-2 folder:

  • starter: use this simply as a base to start with, as you follow along and make the required changes
  • completed: use this as a complete working project, that you can simply deploy without any further changes

Prerequisites

Besides a working Python 3 environment (in my case python 3.12), you will also need:

  • An active AWS account
  • AWS Command Line Interface (AWS CLI) installed and configured
  • AWS Serverless Application Model Command Line Interface (AWS SAM CLI) installed
  • Optionally, you should be using an IDE like AWS Cloud9 or VS Code, with the AWS Toolkit installed

Step 1 — Installing Flask

Follow the tutorial and install Flask. In my case, the version of Flask I have locally installed is:

3.0.0

Step 2 — Creating a Base Application

Follow the tutorial, and get the Hello World Flask app running locally. You can set the variables as the tutorial does, or alternatively specify them in the flask run command:

flask --app app run --debug               
 * Serving Flask app 'hello'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 139-148-368

(I realise the tutorial uses hello.py for this initial step, but to make things simpler later on, I've named the file app.py from now on.)

Now let's see how we can get this Hello World Flask app running on AWS. We need to create a SAM app, then build and deploy it to AWS.

We first initialise a new SAM app, using the SAM CLI, based off this project's part-2 repo on GitHub:

sam init --name flask-aws-serverless --location https://github.com/jojo786/flask-aws-serverless

then change to the part-2 folder, and specifically the starter sub-folder:

cd flask-aws-serverless/flask-aws-serverless-part-2/flask-aws-serverless-part-2-starter/ 

which contains these files and folders:

.
├── __init__.py
├── flask
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   └── run.sh
└── template.yaml

The flask folder contains the Python code that will run as Lambda functions - the app.py file contains the same base application from the tutorial. The template.yaml file describes the serverless application resources and properties for AWS SAM deployments.

We can now build the SAM app using sam build:

sam build

Starting Build use cache                                                                                 
Manifest is not changed for (HelloWorldFunction), running incremental build                      
Building codeuri:                                                                                
.../flask-aws-serverless-part-1/flask runtime:    
python3.12 metadata: {} architecture: arm64 functions: HelloWorldFunction                        
 Running PythonPipBuilder:CopySource                                                             
 Running PythonPipBuilder:CopySource                                                             

Build Succeeded

and deploy it to AWS using sam deploy. The first time we run it, we use the interactive guided workflow to set up the various parameters: sam deploy --guided

sam deploy --guided

Configuring SAM deploy
======================

        Looking for config file [samconfig.toml] :  Not found

        Setting default arguments for 'sam deploy'
        =========================================
        Stack Name [sam-app]: flask-aws-serverless-part-2-starter
        AWS Region [af-south-1]: eu-west-1
        Parameter DBClusterName [aurora-flask-cluster]: aurora-flask-cluster
        Parameter DatabaseName [aurora_flask_db]: aurora_flask_db
        Parameter DBAdminUserName [admin_user]: 
        #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
        Confirm changes before deploy [y/N]: N
        #SAM needs permission to be able to create roles to connect to the resources in your template
        Allow SAM CLI IAM role creation [Y/n]: 
        #Preserves the state of previously provisioned resources when an operation fails
        Disable rollback [y/N]:                                
        HelloWorldFunction has no authentication. Is this okay? [y/N]: y                               
        Save arguments to configuration file [Y/n]: 
        SAM configuration file [samconfig.toml]: 
        SAM configuration environment [default]: 

        Looking for resources needed for deployment:

You can choose what to use for each argument. Please note, we haven't configured any authentication on Amazon API Gateway, so you will need to reply with y in order for the deployment to proceed.

In my case, I chose to deploy this to eu-west-1, which is the Europe (Ireland) Region, which has the Aurora Serverless v1 service. You may choose any other region, based on availability.

Once the deployment has been successful, you will find the output will list the URL of the Hello World Lambda function:

CloudFormation outputs from deployed stack
------------------------------------------------------------------------------------------------------------------
Outputs                                                                                                          
------------------------------------------------------------------------------------------------------------------
Key                 HelloWorldApi                                                                                
Description         API Gateway endpoint URL for the Hello World function                             
Value               https://helloabc123.execute-api.eu-west-1.amazonaws.com/                                     
------------------------------------------------------------------------------------------------------------------


Successfully created/updated stack - flask-aws-serverless-part-2-starter in eu-west-1

Using your API Gateway URL, you can paste that into a browser, or call it from the command line using curl, and verify that the Flask app is working on AWS:

curl https://helloabc123.execute-api.eu-west-1.amazonaws.com/
Hello, World!%                                                         

You can view the logs from Amazon CloudWatch, using sam logs:

sam logs --stack-name flask-aws-serverless-part-2-starter --region eu-west-1

Access logging is disabled for HTTP API ID (gqi5xjq39i)
2023/12/22/[$LATEST]b1522e565fea4016ae7f687b7ece5947 2023-12-22T09:16:30.156000 {"time": "2023-12-22T09:16:30.156Z","type": "platform.initStart","record": {"initializationType": "on-demand","phase": "init","runtimeVersion": "python:3.12.v16","runtimeVersionArn": "arn:aws:lambda:eu-west-1::runtime:5eaca0ecada617668d4d59f66bf32f963e95d17ca326aad52b85465d04c429f5","functionName": "part-2-starter-temp-HelloWorldFunction-OlVXkpFFUM5D","functionVersion": "$LATEST"}}
2023/12/22/[$LATEST]b1522e565fea4016ae7f687b7ece5947 2023-12-22T09:16:30.466000 [2023-12-22 09:16:30 +0000] [12] [INFO] Starting gunicorn 21.2.0
2023/12/22/[$LATEST]b1522e565fea4016ae7f687b7ece5947 2023-12-22T09:16:30.466000 [2023-12-22 09:16:30 +0000] [12] [INFO] Listening at: http://0.0.0.0:8000 (12)
2023/12/22/[$LATEST]b1522e565fea4016ae7f687b7ece5947 2023-12-22T09:16:30.466000 [2023-12-22 09:16:30 +0000] [12] [INFO] Using worker: sync
2023/12/22/[$LATEST]b1522e565fea4016ae7f687b7ece5947 2023-12-22T09:16:30.471000 [2023-12-22 09:16:30 +0000] [13] [INFO] Booting worker with pid: 13
2023/12/22/[$LATEST]b1522e565fea4016ae7f687b7ece5947 2023-12-22T09:16:30.990000 {"time": "2023-12-22T09:16:30.990Z","type": "platform.extension","record": {"name": "lambda-adapter","state": "Ready","events": []}}
2023/12/22/[$LATEST]b1522e565fea4016ae7f687b7ece5947 2023-12-22T09:16:30.992000 {"time": "2023-12-22T09:16:30.992Z","type": "platform.start","record": {"requestId": "604b817a-284e-4d6a-8508-4640e6a2a209","version": "$LATEST"}}
2023/12/22/[$LATEST]b1522e565fea4016ae7f687b7ece5947 2023-12-22T09:16:31.085000 {"time": "2023-12-22T09:16:31.085Z","type": "platform.report","record": {"requestId": "604b817a-284e-4d6a-8508-4640e6a2a209","metrics": {"durationMs": 92.846,"billedDurationMs": 93,"memorySizeMB": 128,"maxMemoryUsedMB": 76,"initDurationMs": 834.174},"status": "success"}}

Your Flask app is now live on AWS! Let's review what we have accomplished thus far. We first initialised an AWS SAM app, built it, then deployed it to AWS. What SAM actually did for us in the background was to provision the following resources on AWS:

  • An AWS Lambda function to run the Flask base Hello World app. This includes a Layer for the Lambda Web Adapter
  • An Amazon API Gateway HTTP API in front of the Lambda function to receive requests, which will invoke the Lambda function
  • An Amazon CloudWatch log group to store logs from the Lambda function
  • And a few other things like IAM Roles and policies, and API Gateway stages

Step 3 — Using HTML templates

Everything in this step will be exactly the same as it is in the tutorial. After you've created all the templates in the flask folder, the file structure will now look like:

.
├── README.md
├── __init__.py
├── flask
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   ├── run.sh
│   ├── static
│   │   └── css
│   │       └── style.css
│   └── templates
│       ├── base.html
│       └── index.html
├── samconfig.toml
├── template.yaml

To test it locally, change to the flask directory, and use flask run:

cd flask/
flask --app app run --debug

And to deploy these changes to AWS, simply run:

sam build && sam deploy

And once the deploy is done, you can test using the same API Gateway URL on AWS as before in your browser.

Step 4 — Setting up the Database

AWS Lambda functions and their storage are ephemeral, meaning their execution environments only exist for a short time when the function is invoked. This means that we will eventually lose data if we set up a SQLite database as part of the Lambda function, because the contents are deleted when the Lambda service eventually terminates the execution environment. There are multiple options for managed serverless databases on AWS, including Amazon Aurora Serverless, which supports SQL (MySQL in our case), just like the SQLite used in the tutorial.

We will need to make a few changes to the tutorial to use Aurora instead of SQLite. We will use SAM to deploy an Aurora Serverless v1 for MySQL DB (based off this serverlessland pattern). Add (or uncomment) the following config in template.yaml:

          AWS_REGION: !Ref AWS::Region
          DBClusterArn: !Sub 'arn:aws:rds:${AWS::Region}:${AWS::AccountId}:cluster:${DBClusterName}'
          DBName: !Ref DatabaseName
          SecretArn: !Ref DBSecret
      Policies: # Creates an IAM Role that defines the services the function can access and which actions the function can perform
        - AWSSecretsManagerGetSecretValuePolicy:
            SecretArn: !Ref DBSecret
        - Statement:
          - Effect: Allow
            Action: 'rds-data:ExecuteStatement'
            Resource: !Sub 'arn:aws:rds:${AWS::Region}:${AWS::AccountId}:cluster:${DBClusterName}'

  
  DBSecret: # Secrets Manager secret
    Type: 'AWS::SecretsManager::Secret'
    Properties:
      Name: !Sub '${DBClusterName}-AuroraUserSecret'
      Description: RDS database auto-generated user password
      GenerateSecretString:
        SecretStringTemplate: !Sub '{"username": "${DBAdminUserName}"}'
        GenerateStringKey: password
        PasswordLength: 30
        ExcludeCharacters: '"@/\'

  
  AuroraCluster: # Aurora Serverless DB Cluster with Data API
    Type: 'AWS::RDS::DBCluster'
    Properties:
      DBClusterIdentifier: !Ref DBClusterName
      MasterUsername: !Sub '{{resolve:secretsmanager:${DBSecret}:SecretString:username}}'
      MasterUserPassword: !Sub '{{resolve:secretsmanager:${DBSecret}:SecretString:password}}'
      DatabaseName: !Ref DatabaseName
      Engine: aurora-mysql
      EngineMode: serverless
      EnableHttpEndpoint: true # Enable the Data API for Aurora Serverless
      ScalingConfiguration:
        AutoPause: true
        MinCapacity: 1
        MaxCapacity: 2
        SecondsUntilAutoPause: 3600
        
Outputs:
  DBClusterArn:
    Description: Aurora DB Cluster Resource ARN
    Value: !Sub 'arn:aws:rds:${AWS::Region}:${AWS::AccountId}:cluster:${DBClusterName}'
  DBName:
    Description: Aurora Database Name
    Value: !Ref DatabaseName
  SecretArn:
    Description: Secrets Manager Secret ARN
    Value: !Ref DBSecret

That contains a number of resources:

  • some variables to export to the Lambda function, so it knows how to connect to Aurora
  • a Secrets Manager secret to store the DB credentials
  • an Aurora Serverless v1 MySQL cluster, set to pause after 1 hour of inactivity

And to deploy these changes to AWS, simply run:

sam build && sam deploy

The Output section of the sam deploy will contain the details of Aurora that we will need:

CloudFormation outputs from deployed stack
-----------------------------------------------------------------------------------------------------------
Outputs                                                                                                   
-----------------------------------------------------------------------------------------------------------
Key                 SecretArn                                                                             
Description         Secrets Manager Secret ARN                                                            
Value               arn:aws:secretsmanager:eu-west-1:1111111111:secret:cluster-temp-AuroraUserSecret-   
1111111111                                                                                                    

Key                 DBClusterArn                                                                          
Description         Aurora DB Cluster Resource ARN                                                        
Value               arn:aws:rds:eu-west-1:1111111111:cluster:aurora-flask-cluster                               

Key                 DBName                                                                                
Description         Aurora Database Name                                                                  
Value               flask_db                                                                               

Key                 HelloWorldApi                                                                         
Description         API Gateway endpoint URL for Hello World function                                     
Value               https://1111111111.execute-api.eu-west-1.amazonaws.com/                               
-----------------------------------------------------------------------------------------------------------


Successfully created/updated stack - flask-aws-serverless-part-2-starter in eu-west-1

These need to be set/exported as variables. On macOS, like this:

export DBClusterArn=arn:aws:rds:eu-west-1:1111111:cluster:aurora-flask-cluster
export SecretArn=arn:aws:secretsmanager:eu-west-1:11111:secret:aurora-flask-cluster-AuroraUserSecret-1111
export DBName=aurora_flask_db

Now to set up the schema in Aurora, we will use this schema.sql. The only difference from the tutorial is that MySQL uses AUTO_INCREMENT instead of SQLite's AUTOINCREMENT:

CREATE TABLE posts (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    title TEXT NOT NULL,
    content TEXT NOT NULL
);

Our init_db.py script will be as follows:

import os
import boto3
from botocore.config import Config

DBClusterArn = os.environ['DBClusterArn']
DBName = os.environ['DBName']
SecretArn = os.environ['SecretArn']
my_config = Config(
        region_name = os.environ['AWS_REGION'])
client = boto3.client('rds-data', config=my_config)


with open('schema.sql') as file:
    schema = file.read()
    response = client.execute_statement(
                resourceArn=DBClusterArn,
                secretArn=SecretArn,
                database=DBName,
                sql=schema
            ) 

response = client.execute_statement(
                resourceArn=DBClusterArn,
                secretArn=SecretArn,
                database=DBName,
                sql="""
                INSERT INTO posts (title, content) 
                VALUES (:title, :content)
                """,
                parameters=[
                        {
                        'name':'title', 
                        'value':{'stringValue':"First Post"}
                        },
                        {
                        'name':'content', 
                        'value':{'stringValue':"Content for the first post"}
                        }
                    ] 
            ) 

response = client.execute_statement(
                resourceArn=DBClusterArn,
                secretArn=SecretArn,
                database=DBName,
                sql="""
                INSERT INTO posts (title, content) 
                VALUES (:title, :content)
                """,
                parameters=[
                        {
                        'name':'title', 
                        'value':{'stringValue':"Second Post"}
                        },
                        {
                        'name':'content', 
                        'value':{'stringValue':"Content for the second post"}
                        }
                    ] 
            ) 

Both files should be in the same directory, e.g. in the flask directory. You can now execute it with:

python3 init_db.py

Instead of using raw boto3, you can look at libraries that make it easier to work with the Aurora Data API in Python and Flask, like aurora-data-api or sqlalchemy-aurora-data-api. Alternatively, instead of using the schema and init_db scripts to create the table and test posts, you can use the built-in visual RDS Query Editor in the AWS Console, or the AWS CLI:

aws rds-data execute-statement --region eu-west-1  --resource-arn arn:aws:rds:eu-west-1:1111111:cluster:aurora-flask-cluster  --secret-arn arn:aws:secretsmanager:eu-west-1:111111:secret:aurora-flask-cluster-AuroraUserSecret-11111 --database aurora_flask_db --sql "CREATE TABLE posts (id INTEGER PRIMARY KEY AUTO_INCREMENT, created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, title TEXT NOT NULL, content TEXT NOT NULL );" 
aws rds-data execute-statement --region eu-west-1  --resource-arn arn:aws:rds:eu-west-1:1111111:cluster:aurora-flask-cluster  --secret-arn arn:aws:secretsmanager:eu-west-1:111111:secret:aurora-flask-cluster-AuroraUserSecret-11111 --database aurora_flask_db --sql "INSERT INTO posts (title, content) VALUES ('First Post', 'Content for first post');" 
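As an aside, here is a rough sketch of what the aurora-data-api library mentioned above looks like in use (my illustration based on its DB-API style interface, not code from this project), reusing the same exported environment variables:

import os
import aurora_data_api

# A DB-API style connection over the Data API: no VPC networking,
# connection pooling or password handling to manage ourselves.
with aurora_data_api.connect(
        aurora_cluster_arn=os.environ['DBClusterArn'],
        secret_arn=os.environ['SecretArn'],
        database=os.environ['DBName']) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT id, title FROM posts")
        for post_id, title in cursor.fetchall():
            print(post_id, title)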

Step 5 — Displaying All Posts

Here we will make some changes to the Flask app, to read data from Aurora. We will import the boto3 package - the Python SDK for AWS. We will look up the Aurora DB cluster and secret that were created by SAM. The boto3 execute_statement command, using the secret and database cluster ARNs, will safely retrieve the database password before executing the SQL query.

Our app.py will now look as follows:

from flask import Flask, render_template, request, url_for, flash, redirect
import os
from werkzeug.exceptions import abort
import boto3
from botocore.config import Config

DBClusterArn = os.environ['DBClusterArn']
DBName = os.environ['DBName']
SecretArn = os.environ['SecretArn']
my_config = Config(
        region_name = os.environ['AWS_REGION'])
client = boto3.client('rds-data', config=my_config)

app = Flask(__name__)

@app.route('/')
def index():
    posts = []

    response = client.execute_statement(
        resourceArn=DBClusterArn,
        secretArn=SecretArn,
        database=DBName,
        sql="""SELECT * FROM posts"""
    )

    print(response)
    for record in response['records']:
        posts.append({
            'id': record[0]['longValue'],
            'created': record[1]['stringValue'],
            'title': record[2]['stringValue'],
            'content': record[3]['stringValue']
        })

    return render_template('index.html', posts=posts)

You can now see the posts in the Flask app. You can use the flask run command (remember to change to the flask directory) to run the app locally. However, you will need to provide it with the Aurora cluster ARN, secret ARN and database name, as exported above.

Step 6 — Displaying a Single Post

The only change required here is to the get_post method, which will retrieve a particular item from Aurora:

def get_post(post_id):
    post = {}
    
    response = client.execute_statement(
        resourceArn=DBClusterArn,
        secretArn=SecretArn,
        database=DBName,
        sql="""SELECT * FROM posts WHERE id = :id""",
        parameters=[
                {
                'name':'id', 
                'value':{'longValue':post_id}
                }
            ] 
    )
    
    for record in response['records']:
        post['id'] = record[0]['longValue']
        post['created'] = record[1]['stringValue']
        post['title'] = record[2]['stringValue']
        post['content'] = record[3]['stringValue']
    
    if len(post) == 0:
        abort(404)
    
    return post

As usual, run sam build && sam deploy to run it on AWS, and/or flask run to test locally.

Step 7 — Modifying Posts

Creating a New Post

Our create function will create a new post in Aurora:

@app.route('/create', methods=('GET', 'POST'))
def create():
    if request.method == 'POST':

        title = request.form['title']
        content = request.form['content']
        
        if not title:
            flash('Title is required!')
        else:
            response = client.execute_statement(
                resourceArn=DBClusterArn,
                secretArn=SecretArn,
                database=DBName,
                sql="""
                INSERT INTO posts (title, content) 
                VALUES (:title, :content)
                """,
                parameters=[
                        {
                        'name':'title', 
                        'value':{'stringValue':title}
                        },
                        {
                        'name':'content', 
                        'value':{'stringValue':content}
                        }
                    ] 
            ) 
                  
            return redirect(url_for('index'))
    return render_template('create.html')

Editing a Post

Our edit function works very similarly: we look up a particular post id, and then update that item:

@app.route('/<int:id>/edit', methods=('GET', 'POST'))
def edit(id):
    post = get_post(id)

    if request.method == 'POST':
        title = request.form['title']
        content = request.form['content']

        if not title:
            flash('Title is required!')
        else:
            response = client.execute_statement(
                resourceArn=DBClusterArn,
                secretArn=SecretArn,
                database=DBName,
                sql="""
                UPDATE posts SET title = :title, content = :content
                WHERE id = :id 
                """,
                parameters=[
                    {
                        'name':'title', 
                        'value':{'stringValue':title}
                        },
                        {
                        'name':'content', 
                        'value':{'stringValue':content}
                        },
                        {
                        'name':'id', 
                        'value':{'longValue':id}
                        }
                    ] 
            ) 

            return redirect(url_for('index'))

    return render_template('edit.html', post=post)

Deleting a Post

The delete function is quite similar again: we look up a particular post id, then delete it:

@app.route('/<int:id>/delete', methods=('POST',))
def delete(id):
    post = get_post(id)

    response = client.execute_statement(
        resourceArn=DBClusterArn,
        secretArn=SecretArn,
        database=DBName,
        sql="""DELETE FROM posts WHERE id = :id""",
        parameters=[
                {
                'name':'id', 
                'value':{'longValue':id}
                }
            ] 
    ) 
        
    return redirect(url_for('index'))

You can get all the final code from the completed folder on GitHub.

As usual, you simply run sam build && sam deploy to deploy to AWS.

Conclusion

We've taken the excellent How To Make a Web Application Using Flask in Python 3 tutorial and, using AWS SAM, demonstrated how you can run a Flask app on AWS Serverless. With serverless, we don't need to think about or manage servers, or worry about other mundane tasks like installing or patching the OS, database or any software packages. The beauty of SAM is that it deploys directly to AWS for us, with very little effort. We chose Aurora Serverless as the serverless database, because it supports MySQL.

Flask on AWS Serverless: A learning journey - Part 1

About 3 years ago I learnt some basic Python, which I've used almost exclusively to build back-end APIs on AWS Serverless, mostly Lambda. This includes a monitoring solution on AWS, an event-driven API integration solution, a load shedding telegram bot, a Slack bot that posts AWS News, a CDK project, and other telegram bots. None of them are front-end web apps, and that's something that has always been a gap for me. Some years back I did some Ruby on Rails, but didn't build anything meaningful, and I've since forgotten most of it. So I've decided to learn Flask as a tool to build some web apps: primarily because it's still Python, and a micro-framework with a minimal learning curve. And I wanted to re-use what I've learned building Python back-end apps and APIs on AWS Serverless, and see how I can build front-end apps and APIs that run on AWS Serverless. Admittedly, Flask is still server-side, which means I'm still avoiding client-side web apps (React, etc.), but baby steps for now.

I'm specifically focusing on Flask web apps that return HTML, CSS and JS. For Flask APIs on AWS Serverless, there are already awesome packages like Chalice, Zappa and Powertools for AWS Lambda (Python).

There are other AWS services that can be used to run Flask on AWS, like EC2, Elastic Beanstalk, ECS or Lightsail. But I am specifically looking to use serverless because I don't want to manage servers or containers, I only want to pay for what I actually use without having resources on all the time (and with the generous free tier for serverless on AWS you won't pay anything to run this tutorial), I want to fully automate the deployment process, and if I eventually have to scale, I don't want to have to re-architect anything. Serverless has a much better Developer Experience, and allows you to quickly build things with less hassle.

So in this series of posts, we will learn to build some Flask apps on AWS, and figure things out along the way. I'll probably get some stuff wrong, so errors and omissions excepted. Onwards!

But Lambda can only be used for APIs...I hear you say!

AWS Lambda is usually used for APIs that return JSON. For RESTful APIs you usually serve Lambda functions behind Amazon API Gateway or a Lambda Function URL, or behind AppSync for GraphQL APIs. Yes, you can have Lambda functions return HTML with some customisation, but how would we run Flask on Lambda without changing anything in Flask? The answer: by using the Lambda Web Adapter, which serves as a universal adapter between the Lambda Runtime API and HTTP. It allows developers to package familiar HTTP 1.1/1.0 web applications, such as Express.js, Next.js, Flask, SpringBoot, or Laravel, and deploy them on AWS Lambda. This removes the need to modify the web application to accommodate Lambda's input and output formats, reducing the complexity of adapting code to meet Lambda's requirements.

I should also call out the really good awsgi package (and tutorial), which can also be used to run Flask on AWS Serverless, with just a small handler in the Flask app.

In order to demonstrate how to run Flask on AWS Serverless using the Lambda Web Adapter, I'm going to take an existing Flask app, and show you how to run it on AWS. For this, we will start with a very well-written tutorial on DigitalOcean: How to Make a Web Application Using Flask in Python 3. Using this tutorial as a vehicle, I will show you how to get this Flask app running on AWS, using AWS Lambda, Amazon API Gateway and Amazon DynamoDB, all deployed using AWS SAM. To follow along, you may want to keep that tutorial open, as well as this blog post; I refer to the instructions in the tutorial, and advise what needs to be changed. In addition to, or instead of, following along, you can use the resources in this project's GitHub repo:

  • starter: use this simply as a base to start with, as you follow along and make the required changes
  • completed: use this as a complete working project, that you can simply deploy without any further changes

Prerequisites

Besides a working Python 3 environment (in my case Python 3.12), you will also need:

  • An active AWS account
  • AWS Command Line Interface (AWS CLI) installed and configured
  • AWS Serverless Application Model Command Line Interface (AWS SAM CLI) installed
  • Optionally, you should be using an IDE like AWS Cloud9 or VS Code, with the AWS Toolkit installed

Step 1 — Installing Flask

Follow the tutorial and install Flask. In my case, the version of Flask I have locally installed is:

3.0.0

Step 2 — Creating a Base Application

Follow the tutorial, and get the Hello World Flask app running locally. You can set the variables as the tutorial does, or alternatively specify them in the flask run command:

flask --app app run --debug               
 * Serving Flask app 'hello'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 139-148-368

(I realise the tutorial uses hello.py for this initial step, but to make things simpler later on, I've named the file app.py from now on.)

Now let's see how we can get this Hello World Flask app running on AWS. We need to create a SAM app, then build and deploy it to AWS.

We first initialise a new SAM app, using the SAM CLI, based off this project's repo on GitHub:

sam init --name flask-aws-serverless --location https://github.com/jojo786/flask-aws-serverless

then change to the part-1 folder, and specifically the starter sub-folder:

cd flask-aws-serverless/flask-aws-serverless-part-1/flask-aws-serverless-part-1-starter/ 

which contains these files and folders:

.
├── README.md
├── __init__.py
├── flask
│   ├── __init__.py
│   ├── app.py
│   └── requirements.txt
├── template.yaml

The flask folder contains the Python code that will run as Lambda functions - the app.py file contains the same base application from the tutorial. The template.yaml file describes the serverless application resources and properties for AWS SAM deployments.

We can now build the SAM app using sam build:

sam build

Starting Build use cache                                                                                 
Manifest is not changed for (HelloWorldFunction), running incremental build                      
Building codeuri:                                                                                
.../flask-aws-serverless-part-1/flask runtime:    
python3.12 metadata: {} architecture: arm64 functions: HelloWorldFunction                        
 Running PythonPipBuilder:CopySource                                                             
 Running PythonPipBuilder:CopySource                                                             

Build Succeeded

and deploy it to AWS using sam deploy. The first time we run it, we use the interactive guided workflow to set up the various parameters: sam deploy --guided

sam deploy --guided

Configuring SAM deploy
======================


        Setting default arguments for 'sam deploy'
        =========================================
        Stack Name [flask-aws-serverless-part-1]: 
        AWS Region [af-south-1]: 
        #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
        Confirm changes before deploy [y/N]: N
        #SAM needs permission to be able to create roles to connect to the resources in your template
        Allow SAM CLI IAM role creation [Y/n]: 
        #Preserves the state of previously provisioned resources when an operation fails
        Disable rollback [y/N]:                               
        HelloWorldFunction has no authentication. Is this okay? [y/N]: y
                             
        Save arguments to configuration file [Y/n]: 
        SAM configuration file [samconfig.toml]: 
        SAM configuration environment [default]: 

        Looking for resources needed for deployment:

You can choose what to use for each argument. Please note, we haven't configured any authentication on Amazon API Gateway, so you will need to reply with y in order for the deployment to proceed.

Once the deployment has been successful, you will find the output will list the URL of the Hello World Lambda function:

CloudFormation outputs from deployed stack
------------------------------------------------------------------------------------------------------------------
Outputs                                                                                                          
------------------------------------------------------------------------------------------------------------------
Key                 HelloWorldApi                                                                                
Description         API Gateway endpoint URL for the Hello World function                             
Value               https://helloabc123.execute-api.af-south-1.amazonaws.com/                                     
------------------------------------------------------------------------------------------------------------------


Successfully created/updated stack - flask-aws-serverless-part-1 in af-south-1

Using your API Gateway URL, you can paste that into a browser, or call it from the command line using curl, and verify that the Flask app is working on AWS:

curl https://helloabc123.execute-api.af-south-1.amazonaws.com/
Hello, World!%                                                         

You can view the logs from Amazon CloudWatch, using sam logs:

sam logs --stack-name flask-aws-serverless-part-1       
                                                             
2023/12/18/[$LATEST]faca0569b6084bbdac895af5611c311f 2023-12-18T12:59:47.543000 {
  "time": "2023-12-18T12:59:47.543Z",
  "type": "platform.initStart",
  "record": {
    "initializationType": "on-demand",
    "phase": "init",
    "runtimeVersion": "python:3.12.v16",
    "runtimeVersionArn": "arn:aws:lambda:af-south-1::runtime:5eaca0ecada617668d4d59f66bf32f963e95d17ca326aad52b85465d04c429f5",
    "functionName": "flask-aws-serverless-part-1-HelloWorldFunction-ovMO2mWwZDtR",
    "functionVersion": "$LATEST"
  }
}
2023/12/18/[$LATEST]faca0569b6084bbdac895af5611c311f 2023-12-18T12:59:47.819000 [2023-12-18 12:59:47 +0000] [12] [INFO] Starting gunicorn 21.2.0
2023/12/18/[$LATEST]faca0569b6084bbdac895af5611c311f 2023-12-18T12:59:47.820000 [2023-12-18 12:59:47 +0000] [12] [INFO] Listening at: http://0.0.0.0:8000 (12)
2023/12/18/[$LATEST]faca0569b6084bbdac895af5611c311f 2023-12-18T12:59:47.820000 [2023-12-18 12:59:47 +0000] [12] [INFO] Using worker: sync
2023/12/18/[$LATEST]faca0569b6084bbdac895af5611c311f 2023-12-18T12:59:47.822000 [2023-12-18 12:59:47 +0000] [13] [INFO] Booting worker with pid: 13

2023/12/18/[$LATEST]faca0569b6084bbdac895af5611c311f 2023-12-18T12:59:48.278000 {
  "time": "2023-12-18T12:59:48.278Z",
  "type": "platform.report",
  "record": {
    "requestId": "4921460f-40dc-4452-81bf-75f608101f12",
    "metrics": {
      "durationMs": 19.307,
      "billedDurationMs": 20,
      "memorySizeMB": 128,
      "maxMemoryUsedMB": 76,
      "initDurationMs": 713.405
    },
    "status": "success"
  }
}

Your Flask app is now live on AWS! Let's review what we have accomplished thus far. We first initialised an AWS SAM app, built it, then deployed it to AWS. What SAM actually did for us in the background was to provision the following resources on AWS:

  • An AWS Lambda function to run the Flask base Hello World app. This includes a Layer for the Lambda Web Adapter
  • An Amazon API Gateway HTTP API in front of the Lambda function to receive requests, which will invoke the Lambda function
  • An Amazon CloudWatch log group to store logs from the Lambda function
  • And a few other things like IAM Roles and policies, and API Gateway stages

Step 3 — Using HTML templates

Everything in this step will be exactly the same as it is in the tutorial. After you've created all the templates in the flask folder, the file structure will now look like:

.
├── README.md
├── __init__.py
├── flask
│   ├── __init__.py
│   ├── app.py
│   ├── requirements.txt
│   ├── run.sh
│   ├── static
│   │   └── css
│   │       └── style.css
│   └── templates
│       ├── base.html
│       └── index.html
├── samconfig.toml
├── template.yaml

To test it locally, change to the flask directory, and use flask run:

cd flask/
flask --app app run --debug

And to deploy these changes to AWS, simply run:

sam build && sam deploy

And once the deploy is done, you can use the same API Gateway URL on AWS as before (e.g. https://helloabc123.execute-api.af-south-1.amazonaws.com/) in your browser.

Step 4 — Setting up the Database

AWS Lambda functions and their storage are ephemeral, meaning their execution environments only exist for a short time when the function is invoked. This means that we will eventually lose data if we set up a SQLite database as part of the Lambda function, because the contents are deleted when the Lambda service eventually terminates the execution environment. There are multiple options for managed serverless databases on AWS, including Amazon Aurora Serverless, which supports SQL just like the SQLite used in the tutorial. However, I prefer using Amazon DynamoDB: a fully managed, serverless, key-value NoSQL database on AWS.

So we will need to make a few changes to the tutorial to use DynamoDB instead of SQLite. We won't need the schema.sql or init_db.py files. Rather, we will use SAM to deploy a DynamoDB table that we reference as PostsTable (the actual name of the table will only be known after it has been created, in the Output of sam deploy).

Add (or uncomment) the following config in template.yaml:

  # PostsTable:
  #   Type: AWS::DynamoDB::Table
  #   Properties:
  #     AttributeDefinitions:
  #       - AttributeName: id
  #         AttributeType: N
  #     KeySchema:
  #       - AttributeName: id
  #         KeyType: HASH
  #     BillingMode: PAY_PER_REQUEST
  #     Tags:
  #      - Value: "flask-aws-serverless"
  #        Key: "project"
  
Outputs:
  # DynamoDBTable:
  #   Description: DynamoDB Table
  #   Value: !Ref PostsTable  

DynamoDB does not have a traditional row and column data model like a relational database. Instead, DynamoDB uses a key-value format to store data.

  • In DynamoDB, a table contains items rather than rows. Each item is a collection of attribute-value pairs. You can think of the attributes as being similar to columns in a relational database, but there is no predefined schema that requires all items to have the same attributes.
  • DynamoDB items also do not have a fixed structure like rows in a relational database. Items can vary in content and size up to 400KB. This flexible data model allows DynamoDB tables to accommodate a wide variety of data types and workloads.
  • Primary keys in DynamoDB serve a similar purpose to primary keys in a relational database by uniquely identifying each item. But a DynamoDB table's primary key is either a single partition key or a partition key plus a sort key, rather than any column or combination of columns you choose, as in a relational DB.

So in summary, while DynamoDB shares some similarities to relational databases, it does not have the traditional row-column structure. Its flexible, non-schema model is better suited for many NoSQL and distributed applications.
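To make the flexible-schema point concrete, here is a small illustration (my own example, not code from this project; the table name is a placeholder) of two items in the same table carrying different attributes:

from boto3 import resource

dynamodb = resource('dynamodb')
# Placeholder table name - use the name from your own SAM output
table = dynamodb.Table('flask-aws-serverless-part-1-PostsTable-abc123')

# Only the primary key (id, a Number) is defined by the table schema
table.put_item(Item={'id': 1, 'title': 'My first post', 'content': 'Hello World'})

# A second item in the same table can carry a different set of attributes
table.put_item(Item={'id': 2, 'title': 'Draft post', 'tags': ['flask', 'aws'], 'published': False})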

Let's talk a little about the columns that are used in the tutorial, and compare them to what we're using with DynamoDB:

id INTEGER PRIMARY KEY AUTOINCREMENT,
created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
title TEXT NOT NULL,
content TEXT NOT NULL

But with DynamoDB, we are specifying:

  • the primary key is id, which is of type Number. DynamoDB does not natively have an auto-incrementing column type, so instead of an auto-incrementing number, we will simply generate a timestamp as the id value.
  • DynamoDB doesn't have a dedicated datetime data type. In the Flask app, we will use the Python datetime class to create a timestamp string for the created attribute.
  • We don't need to specify the other columns (title, content) yet.

And to give the Lambda function the required IAM permissions to read and write to that table, we will add a connector. Add this config to the Lambda function resource in template.yaml:

          PostsTable: !Ref PostsTable
    Connectors:
      PostsTableConnector:
        Properties:
          Destination: 
            Id: PostsTable
          Permissions: 
            - Read
            - Write 

The complete template.yaml is now as follows, as per GitHub:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: flask-aws-serverless-part-1

Globals:
  Function:
    Tags:
      project: "flask-aws-serverless"
    Timeout: 3
    MemorySize: 128
    Runtime: python3.12
    Layers:
        - !Sub arn:aws:lambda:${AWS::Region}:753240598075:layer:LambdaAdapterLayerArm64:17
    LoggingConfig:
      LogFormat: JSON
      #LogGroup: !Sub /aws/lambda/${AWS::StackName}
    Architectures:
      - arm64 #Graviton: cheaper and faster
    

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: flask/
      Handler: run.sh #required for the Lambda Web Adapter
      Events:
        HelloWorld:
          Type: HttpApi
      Environment:
        Variables:
          AWS_LAMBDA_EXEC_WRAPPER: /opt/bootstrap
          PORT: 8000
          PostsTable: !Ref PostsTable
    Connectors:
      PostsTableConnector:
        Properties:
          Destination: 
            Id: PostsTable
          Permissions: 
            - Read
            - Write 

  PostsTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: N
      KeySchema:
        - AttributeName: id
          KeyType: HASH
      BillingMode: PAY_PER_REQUEST

Outputs:
  HelloWorldApi:
    Description: API Gateway endpoint URL for Hello World function
    Value: !Sub "https://${ServerlessHttpApi}.execute-api.${AWS::Region}.${AWS::URLSuffix}/"
  DynamoDBTable:
    Description: DynamoDB Table
    Value: !Ref PostsTable  

As usual, to deploy to AWS, run sam build && sam deploy. The Output will include the name of the DynamoDB table that SAM created.

To insert some test posts into DynamoDB, you can log into the DynamoDB console in your AWS account, or use the AWS CLI with the table name you got above:

aws dynamodb put-item \                                                 
  --table-name flask-aws-serverless-part-1-PostsTable-abc123 \
  --item '{"id": {"N": "1"}, "title": {"S": "My first post"}, "content": {"S": "Hello World"}, "created": {"S": "2023-12-18 18:05:00"}}'

Step 5 — Displaying All Posts

Here we will make some changes to the Flask app, to read data from DynamoDB. We will import the boto3 package - the Python SDK for AWS. We will look up the name of the DynamoDB table that was created by SAM, then use a scan operation to return all items.

Our app.py will now look as follows:

from flask import Flask, render_template, request, url_for, flash, redirect
import os
from datetime import datetime  # used to generate ids and timestamps in the later steps
from boto3.dynamodb.conditions import Key
from boto3 import resource
from werkzeug.exceptions import abort

dynamodb = resource('dynamodb')
posts_table = dynamodb.Table(os.environ["PostsTable"])
app = Flask(__name__)

@app.route('/')
def index():
    posts = []
    
    try: 
        response = posts_table.scan()
        posts = response['Items']
    except Exception as error:
        print("dynamo scan failed:", error, flush=True) 
              
    return render_template('index.html', posts=posts) 

You can now see the posts in the Flask app. You can use the flask run command (remember to change to the flask directory) to run the app locally. However, you will need to provide it with the name of the DynamoDB table. On macOS, you can export the variable, based on the output from the SAM deploy, before running flask run:

export PostsTable=flask-aws-serverless-part-1-PostsTable-abc123

Instead of using raw boto3 to interface with DynamoDB, you can look at persistence libraries that make it easier to work with DynamoDB in Python and Flask.
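For example, PynamoDB is one such library (not used in this project); a rough sketch of what the posts table could look like as a PynamoDB model, assuming the placeholder table name and region from the earlier SAM deploy:

from pynamodb.models import Model
from pynamodb.attributes import NumberAttribute, UnicodeAttribute

class Post(Model):
    class Meta:
        # Placeholder values - taken from your own SAM deploy output and region
        table_name = 'flask-aws-serverless-part-1-PostsTable-abc123'
        region = 'af-south-1'

    id = NumberAttribute(hash_key=True)
    title = UnicodeAttribute()
    content = UnicodeAttribute()
    created = UnicodeAttribute(null=True)

# Reads and writes then become model calls instead of raw boto3:
posts = list(Post.scan())
first = Post.get(1)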

Step 6 — Displaying a Single Post

The only change required here is to the get_post method, which will get a particular item from DynamoDB:

def get_post(post_id):
    try:
        response = posts_table.get_item(Key={'id': post_id})
        post = response['Item']
    except Exception as error:
        print("dynamo get post failed:", error, flush=True) 
        abort(404)

    return post

As usual, run sam build && sam deploy to run it on AWS, and/or flask run to test locally.

Step 7 — Modifying Posts

Creating a New Post

Our create function will work as follows, using the DynamoDB put_item operation:

@app.route('/create', methods=('GET', 'POST'))
def create():
    if request.method == 'POST':

        title = request.form['title']
        content = request.form['content']
        created = str(datetime.now())
        id = int(datetime.now().timestamp())
        
        if not title:
            flash('Title is required!')
        else:
            try: 
                #insert new post into dynamodb
                posts_table.put_item(
                    Item={
                        'id': id,
                        'title': title,
                        'content': content,
                        'created': created
                        }
                )
            except Exception as error:
                print("dynamo PUT failed:", error, flush=True) 
                  
            return redirect(url_for('index'))
    return render_template('create.html')

Editing a Post

Our edit function works very similarly: we look up a particular post id, and then update that item:

@app.route('/<int:id>/edit', methods=('GET', 'POST'))
def edit(id):
    post = get_post(id)

    if request.method == 'POST':
        title = request.form['title']
        content = request.form['content']

        if not title:
            flash('Title is required!')
        else:
            try:
                posts_table.update_item(
                    Key={
                        'id': id
                    },
                    UpdateExpression="set title = :title, content = :content",
                    ExpressionAttributeValues={
                        ':title': title,
                        ':content': content
                    }
                )
            except Exception as error:
                print("dynamo update failed:", error, flush=True) 
                       
            return redirect(url_for('index'))

    return render_template('edit.html', post=post)

Deleting a Post

The delete function is quite similar again: we look up a particular post id, then delete it:

@app.route('/<int:id>/delete', methods=('POST',))
def delete(id):
    post = get_post(id)

    try:
        posts_table.delete_item(
            Key={
                'id': id
                }
        )
        flash('"{}" was successfully deleted!'.format(post['title']))
    except Exception as error:
        print("dynamo delete failed:", error, flush=True)  
        
    return redirect(url_for('index'))

You can get all the final code from the completed folder on GitHub.

As usual, you simply run sam build && sam deploy to deploy to AWS.

Conclusion

We've taken the excellent How To Make a Web Application Using Flask in Python 3 tutorial and, using AWS SAM, demonstrated how you can run a Flask app on AWS Serverless. With serverless, we don't need to think about or manage servers, or worry about other mundane tasks like installing or patching the OS, database or any software packages. The beauty of SAM is that it deploys directly to AWS for us, with very little effort. We chose to use DynamoDB as the serverless database. In part 2 we use Aurora Serverless.

Use Events internally and APIs externally

Overview

I recently helped an AWS customer design a new system that had very specific non-functional requirements regarding how it integrates with external or 3rd party systems. The internal part of the system was developed as an event-driven system, but for these 'legacy' 3rd party systems they needed a simpler, more traditional method of integration. This is the story of how we used RESTful APIs to inject events to and from the event-driven system by using the Serverless Event Gateway Pattern on AWS, with some deployable code on GitHub.
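To give a flavour of what "injecting events via an API" can look like in code, here is a hedged sketch using Amazon EventBridge and boto3 (the source, detail type and bus name are placeholders of my own, and this is not necessarily the exact implementation in the linked repo): an API-fronted Lambda function simply publishes the incoming request onto an internal event bus.

import json
import boto3

events = boto3.client('events')

def lambda_handler(event, context):
    # API Gateway delivers the external request as an HTTP body;
    # we publish it onto an internal event bus and let rules route it.
    request = json.loads(event.get('body') or '{}')

    events.put_events(
        Entries=[{
            'Source': 'external.api',            # placeholder source
            'DetailType': 'ThirdPartyResponse',  # placeholder detail type
            'Detail': json.dumps(request),
            'EventBusName': 'third-party-bus',   # placeholder bus name
        }]
    )
    return {'statusCode': 202, 'body': json.dumps({'status': 'accepted'})}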

Requirements

The customer required a system that receives requests, and then sends those requests to 3rd parties for fulfilment. The 3rd parties process those requests, and then respond back with their fulfilment status.

Now this is not unique - we've all come across these types of systems before, e.g. an e-commerce system, when it receives an order from a user/customer, might send out fulfilment requests to many 3rd parties, who will each need to respond timeously with stock details, etc. Status updates will then need to be sent to the users.

And if we think about the volume of requests and responses flowing through the system: the customer expects there to be thousands of users who will send requests, and there will be hundreds of 3rd parties, so we estimate that the system will process many hundreds of thousands of requests and responses on a regular (perhaps daily) basis. So naturally, we would want to design the integration and communication between the system and the 3rd parties to be highly available and performant.

Now with that background of the system in mind, we can talk in more detail about the strict requirements that my customer had:

We need a standard way to integrate with all 3rd parties

This is quite obvious, and a pretty standard requirement. We want a uniform, standard way to integrate with all 3rd parties, that does not require custom development for different 3rd parties, even though 3rd parties could be using different tech stacks (architectures, languages, versions, etc). The integration should be based on a well-known and potentially open source standard/protocol/system like a RESTful API defined with OAS/Swagger, or an Event Broker/Router like MQ, Kafka or JMS.

It should take no development effort and very little or no operational effort to onboard new 3rd parties.

This makes sense, considering that we will have many hundreds of 3rd parties, each of which will need to be registered, integrated with, and then given some ongoing operational management. Also consider that each 3rd party will have multiple environments (prod, dev, test), each with its own parameters (IP/DNS, credentials, SSL certs, webhooks/callback URLs). We therefore don’t want to have to hire a full-time team just to onboard and manage 3rd parties.

Retries and failures must be automatically managed

3rd parties are highly distributed all over the internet in different locations (on-prem, hosting, cloud), and due to the nature of a highly distributed system, each 3rd party's status and availability will change over time, i.e. not all 3rd parties will be up and available when we want to send them a request, and some may be down or unresponsive when we send them a message. The system needs to ensure that every request reaches the intended 3rd party, but we don’t want to have to build custom retry logic. And we needed to consider some still-undefined variables: how long should the system promise to keep undelivered requests, and how long should it retry for? So we need to find a way to manage this without custom code.

Only the intended 3rd parties should receive requests

Not all types of requests should be sent to all 3rd parties. So when we onboard a 3rd party, we define the type of requests that that 3rd party can fulfil, and the system should have a way to filter requests from users, so that only specific 3rd parties receive those requests. These filters should be easy to maintain, and should not require custom code or even a code change to update.

Solution design

These four requirements above were key to shaping the design of the solution. Internally, the system would be composed of many different modules, each with their own functionality, so we decided to have each developed and deployed as an individual microservice. This meant that not only could we scale them individually based on capacity requirements, but it would increase our development velocity, as we could introduce new features and have them deployed independently.

I’ve written before about how Architecture predicts success. This also led us to think about the impact it will have on the team structure (which I’ve written about before as well) - we know from Team Topologies that:

One key approach to achieving the software architecture (and associated benefits like speed of delivery or time to recover from failure) is to apply the reverse Conway maneuver: designing teams to match the desired architecture......In short, by considering the impact of Conway’s law when designing software architectures and/or reorganizing team structures, you will be able to take advantage of the isomorphic force at play, which converges the software architecture and the team design.

We wanted to be able to handle requests asynchronously, in order to increase performance and responsiveness, and to better tolerate failure and slowness from internal modules and 3rd parties. All this led us to an event-driven architecture: the internal microservices are decoupled from each other and from the 3rd parties, using an event broker/router. An additional benefit was that the system had a previously developed monolithic component that we wanted to re-use without much development - using the event broker would allow us to integrate the legacy component with the newer microservices.

We then looked to design the communication and interaction with the 3rd parties. Since we were using an event-driven architecture internally, we looked to also extend this to the 3rd parties. Initially, we thought to leverage a well-known Event/Message Broker (see my rant on microservices and ESBs a few years ago) like MQ, Kafka or JMS, with the thought that because they are standard and well-known, it would make it easy for 3rd parties and their developers to integrate with. AWS has a few fully managed services for this: Amazon MQ, which supports Apache ActiveMQ and RabbitMQ, Amazon MSK for Kafka, and Amazon SQS, which supports JMS - where AWS manages all the infrastructure and operations, freeing us to work on the functionality we need.

We took some time to consider the impact this would have - specifically the event/message broker (Kafka/JMS/MQ) - on the 3rd parties and their developers, and how easily they would be able to integrate with it. We considered the physical location/region that the customer operates in, and the experience that typical developers in that location would have with Kafka/JMS/MQ, and eventually we came to the conclusion that the learning curve would be too steep, which would quickly lead to longer integration and development lead times, and even bigger issues with debugging. We decided that imposing an architecture like this would go against some of the key requirements the customer had - it would simply be too complex. In addition, even though this system will be running on AWS, most of the 3rd parties probably wouldn’t be, which means we couldn’t assume that the 3rd parties will have AWS credentials to be able to access SQS or other native AWS services that require AWS credentials.

But we knew that most corporate developers were familiar with HTTP APIs, and would have enough experience and tooling to be able to quickly integrate with a RESTful API. So the challenge was: how can we maintain the internal event-driven architecture, and yet expose easy-to-understand RESTful APIs to 3rd parties? And that led us to Amazon EventBridge - a serverless event bus that would allow the internal microservices to send, filter and route events with rules, is fully managed and serverless, and yet supports API destinations - which could be any external 3rd party API. With EventBridge, we could use the Serverless Event Gateway Pattern:

There are many times when developing our serverless solutions when we need to interact with existing legacy systems or 3rd party services within our wider domain; and they are unable to publish and consume events from your central Amazon EventBridge bus directly due to technology or framework limitations...using the ‘Event Gateway Pattern’ alongside Amazon EventBridge and Amazon EventBridge API Destinations to allow this flow between systems both ways i.e. consuming and publishing based on domain events.


It's common, when talking about microservices and events, to think about the Bounded Context: the boundary within a domain where a particular domain model applies. So within the bounded context of the internal system architecture we use private or internal events for communication, and between bounded contexts we use RESTful APIs as public or external events. We also think about how microservices will interact with each other, using Choreography and Orchestration, and the resulting rule-of-thumb:

use orchestration within the bounded context of a microservice, but use choreography between bounded-contexts.

So in our case, we are suggesting that when you are faced with an event-driven system that needs to interact with external/legacy systems, and for considerations of trust, security, latency and/or experience/expertise, the rule-of-thumb is:

Use events internally, and APIs externally
(me, right now)

(you might recognise that last part from Linus Torvalds in 1992 during the Tanenbaum-Torvalds debate)

This would allow us - in theory - to meet all of our key requirements without any custom code on our side: using RESTful APIs allows 3rd parties to easily integrate with the system without much effort, EventBridge handles retries natively, and EventBridge filters define which requests go to which 3rd parties.

Now to put the theory to the test.

PoC

We decided to run a quick Proof of Concept (PoC) on AWS to validate the architecture. We used AWS Application Composer to design, build and deploy the architecture using a drag-and-drop visual interface, and in a very short time we built a working environment.

The architecture consists of the following key components and AWS Services:

  • An EventBridge custom event bus called marketplace
  • EventBridge rules that route events to 3rd parties, or from 3rd parties to other microservices
  • API Gateway to host an API for 3rd parties to call to provide fulfilment request status updates
  • Lambda functions to run microservices that are invoked via API Gateway, with data stored on DynamoDB
  • SNS to send email/SMS updates to users when a fulfilment request status has changed

We’ve defined a basic event, which could represent a new request from a specific user, that is being sent to 3rdparty1 for fulfilment:

{
      "Source": "com.marketplace.market",
      "EventBusName": "marketplace",
      "Detail": "{ \"user_id\": \"123456789\", \"request_status\": \"new\", \"3rdparty\": \"3rdparty1\", \"requests1\": \"books\", \"requests2\": \"pens\"}",
      "DetailType": "marketplace requests"
    }
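On the marketplace bus, an EventBridge rule then filters on the 3rdparty field so that only the intended 3rd party receives the request. As a rough illustration (using boto3, with an assumed rule name; the repo does the equivalent declaratively in the SAM template):

import json
import boto3

events = boto3.client("events")

# Only events whose detail.3rdparty is "3rdparty1" match this rule, so only
# that 3rd party's target (e.g. its API destination) receives them.
events.put_rule(
    Name="route-to-3rdparty1",          # assumed rule name
    EventBusName="marketplace",
    EventPattern=json.dumps({
        "source": ["com.marketplace.market"],
        "detail-type": ["marketplace requests"],
        "detail": {"3rdparty": ["3rdparty1"]},
    }),
    State="ENABLED",
)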

Together with Application Composer to design this, we used AWS SAM to deploy it. SAM provides shorthand syntax to express functions, APIs, databases, and event source mappings. With just a few lines per resource, you can define the application you want and model it using YAML. During deployment, SAM transforms and expands the SAM syntax into AWS CloudFormation syntax, enabling you to build serverless applications faster. Check out this Github repo for the full working code you can deploy using SAM.

Let’s simulate a user creating a new request, by injecting that event, using the command aws events put-events --entries file://event.json (in reality, the User Portal will inject this event when a user creates a new request)
EventBridge routes that event to the ThirdPartyAPI Lambda function microservice, which saves that data to DynamoDB. We can use https://webhook.site/ as a way to simulate a 3rd party API that is hosted somewhere on the internet. We modify one of the rules to route to https://webhook.site/ using API Destinations. We can then use curl to send GET and POST requests to the Marketplace API, which simulates how a 3rd party will inject events back into the system.
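As a sketch of what the ThirdPartyAPI microservice mentioned above might look like (table and attribute names are assumptions; the working version is in the repo), the handler simply reads the event detail that EventBridge passes in and persists it:

import os
import boto3

dynamodb = boto3.resource("dynamodb")
# Table name assumed to come from an environment variable set in the SAM template
table = dynamodb.Table(os.environ.get("TABLE_NAME", "marketplace"))

def lambda_handler(event, context):
    # EventBridge delivers the original payload under the "detail" key
    detail = event["detail"]
    table.put_item(
        Item={
            "user_id": detail["user_id"],
            "request_status": detail["request_status"],
            "3rdparty": detail["3rdparty"],
        }
    )
    return {"statusCode": 200}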

Supported Workflows

Now that we had proved that we can get events into our internal architecture using APIs, we documented the complete workflows that the solution should support.

This architecture will support the following three user flows: How a User registers on marketplace and places new fulfilment requests, how 3rd parties receive those requests and post their status back to the marketplace and users, and how 3rd parties can manage their details in a self-service manner.

Flow A: User Request to 3rd Parties

Flow A: User Request Event to 3rdParties
  1. Users utilise the User Portal to register. During the registration process, documents like ID and banking proof could be uploaded to Amazon S3, which stores the documents redundantly across 3 Availability Zones (AZs) and removes the burden of managing a shared file system to store the documents. S3 can also serve as a way to analyse user documents using a serverless workflow, extracting data and verifying it against the user's profile, e.g. automatically verifying that the ID number the user entered during registration matches the ID document. Ideally, the user portal is developed using AWS Amplify, which hosts the portal in S3, removes the burden of managing servers, and provides out-of-the-box CI/CD pipelines.
  2. When the user creates a request, the portal emits an event to Amazon EventBridge. EventBridge is a serverless event bus that is highly available and scalable. EventBridge will serve as a central point to receive all events: new request, request status changes, new 3rd party. Each 3rd party will have a rule configured on EventBridge, with their corresponding API details and authentication; EventBridge rules will match each request event against the intended 3rd party.
  3. EventBridge will send an HTTP POST with the request event to the corresponding 3rd party API, and will manage retries until the event is delivered (see the sketch after this list for how an API destination is wired up). Each 3rd party will be required to host a standardised RESTful API. For those 3rd parties that don't/can't host the standardised API, they can poll the Marketplace API, hosted on Amazon API Gateway, which is a serverless, managed, and scalable way to host APIs on AWS. A Lambda function will query the Marketplace DB and respond back with user requests. Each event can have multiple destinations, so in addition to sending to the 3rd party, the event can also be delivered to an S3 bucket that will serve as a data lake to store and analyse marketplace data.
  4. The event from EventBridge can contain a link to the S3 bucket that hosts the users documents, so that 3rd parties can safely retrieve users documents, if required.
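To make the onboarding requirement concrete, the sketch below shows roughly what registering a 3rd party involves (boto3, with placeholder names, credentials and URLs): a connection holding their auth details, an API destination pointing at their endpoint, and a target attaching that destination to the routing rule. There is no code to write per 3rd party - it is purely configuration:

import boto3

events = boto3.client("events")

# 1. A connection stores the 3rd party's credentials (API key used as an example)
conn = events.create_connection(
    Name="3rdparty1-connection",                        # assumed name
    AuthorizationType="API_KEY",
    AuthParameters={
        "ApiKeyAuthParameters": {
            "ApiKeyName": "x-api-key",
            "ApiKeyValue": "REPLACE_ME",                # placeholder credential
        }
    },
)

# 2. The API destination points at the 3rd party's RESTful endpoint
dest = events.create_api_destination(
    Name="3rdparty1-api",                               # assumed name
    ConnectionArn=conn["ConnectionArn"],
    InvocationEndpoint="https://webhook.site/your-id",  # placeholder endpoint
    HttpMethod="POST",
)

# 3. Attach the destination as a target of the rule that filters for this 3rd party
events.put_targets(
    Rule="route-to-3rdparty1",                          # assumed rule name (see the earlier sketch)
    EventBusName="marketplace",
    Targets=[{
        "Id": "3rdparty1",
        "Arn": dest["ApiDestinationArn"],
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-invoke-api-destination",  # placeholder role
    }],
)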

Flow B: 3rd Party Request Status shared back to the marketplace

Flow B: 3rd Party Request Status shared back to marketplace
  1. When 3rd parties receive user fulfilment requests, they process them. Once the status of the request changes (in-progress, accept, decline), the status needs to be shared back to marketplace and users.
  2. 3rd parties will use a standardised RESTful API on marketplace to POST updates on requests. Amazon API Gateway will be used to host the API.
  3. API Gateway will emit the status as an event to EventBridge (see the sketch after this list). EventBridge rules will match against the event.
  4. One rule will send the event to the User Portal.
  5. Another rule will send the event to SNS, to be sent as an email/SMS to users.
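A minimal sketch of step 3 above - the Lambda behind the marketplace status API taking a 3rd party's POST and publishing it onto the event bus (source and detail-type names are assumptions):

import json
import boto3

events = boto3.client("events")

def lambda_handler(event, context):
    # With an API Gateway proxy integration, the 3rd party's POST body arrives as a string
    status_update = json.loads(event["body"])

    events.put_events(
        Entries=[{
            "Source": "com.marketplace.3rdparty",   # assumed source for external status updates
            "EventBusName": "marketplace",
            "DetailType": "request status update",  # assumed detail-type
            "Detail": json.dumps(status_update),
        }]
    )
    return {"statusCode": 202, "body": json.dumps({"result": "accepted"})}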

Summary

Your functional and non-functional requirements will determine which communication pattern or API style you choose to use in your architecture. And as I have shown, you can use an event-driven architecture internally between your microservices, and APIs externally with external systems.

]]>
<![CDATA[Choreography and Orchestration using AWS Serverless]]>http://hacksaw.co.za/blog/choreography-and-orchestration-using-aws-serverless/643d3fccfb960d0001005fb8Mon, 17 Apr 2023 12:54:45 GMTI was recently working on a little serverless app - a telegram bot that checks our electricity supply company for the load shedding status, and sends out notifications. It's actually quite simple:

  • an AWS Lambda function routinely polls an API for the loadshedding stage
  • another Lambda function pulls the schedule for that stage
  • all the data is stored in DynamoDB
  • another Lambda function then routinely sends out telegram notifications based on the stage and the schedule.
  • EventBridge rules that schedule the functions to run at specific times

Altogether, it looks like this:

Telegram Bot Architecture


With my other bots (based on this sample), I’ve shown that AWS Serverless is the best place to run a bot, due to low cost and simplicity. So when building this loadshedding bot, I decided to avoid the Lambda monolith - a fat function containing all the above logic in a single function - and chose to split the functionality over 3 separate functions, essentially each its own microservice, albeit with a shared DynamoDB table. This way, I could have specific EventBridge rules to schedule the functions to run at the intervals I needed them to, and logging and debugging was easier with simpler functions. Deployment was really easy using AWS SAM - all it takes is a sam build && sam deploy each time I need to make a change.

Now to the point I really want to discuss: how would I get these different Lambda functions to co-ordinate and work together? How would the schedule function know when the loadshedding stage had changed (which could change a few times a day), and how would the notification function know when either or both the stage and schedule had changed, in order to send out a new telegram notification? I also didn’t want the different functions to be constantly polling the DynamoDB table, as that would just increase costs for both Lambda and DynamoDB. There were two options available: Choreography and Orchestration. For my initial use-case, I was going after simplicity, so choreography made more sense. Later on, for a different use-case, I needed all the functions to run in a specific order, so I used Orchestration. Let's see how choreography and orchestration can be achieved on AWS.

Choreography

In choreography, every service works independently. There are no hard dependencies between them, and they are loosely coupled only through shared events. Each service listens for events that it’s interested in and does its own thing. This follows the event-driven paradigm.
And since Lambda itself is inherently event-driven, the choreography approach has become very popular in the serverless community.


When the loadshedding stage changed, the schedule function needed to be aware, and then the notification function needed to be run. I was after simplicity after all, and since each function was updating DynamoDB, I used DynamoDB Streams to invoke the other Lambda functions, i.e. an update to DynamoDB emits an event to Lambda. With this, the Lambda functions don't need to poll DynamoDB - only when an update is made will a function be invoked. So any of the functions can update the loadshedding stage and schedule, and the other functions will then be notified of this change and process it. And with AWS SAM, integrating DynamoDB Streams with Lambda is really easy.

On the DynamoDB resource:

StreamSpecification:
        StreamViewType: NEW_AND_OLD_IMAGES

And on the Lambda function resource:

Events:
        Stream: 
          Type: DynamoDB

I could have used SQS as a queue to capture all of these events, or EventBridge as an event bus, but using DynamoDB Streams was just the easiest in this case. Either way, I managed to send events between the different functions, and they acted on them when required. And for the most part, it worked really well. There were a few times that the upstream loadshedding API was down, which would have required me to write some custom retry logic; however, I simply relied on the EventBridge schedule to call it again later.

Then in the last few days, I realised I needed a new capability: the electricity supply company was providing loadshedding updates on Twitter due to emergency failures, and I wanted the ability to invoke all the existing functions on an ad-hoc basis, but in a specific order: get the latest loadshedding stage, then get the schedule for that stage, and then post a notification. If I simply re-used the existing architecture, I would be invoking one Lambda function from another, which is generally frowned upon. I would also need to make sure I build that custom retry logic to cater for any failures. This made me realise that for this specific use-case, I needed to orchestrate and coordinate the different functions to run in the order required.

Orchestration

In orchestration, there is a controller (the ‘orchestrator’) that controls the interaction between services. It dictates the control flow of the business logic and is responsible for making sure that everything happens on cue. This follows the request-response paradigm.

From telegram itself, using a telegram command, I wanted users to be able to instruct the bot to pull the latest loadshedding info. It would then call the stage API (retrying if required), then call the schedule API (retrying if required), then send the telegram notification. To orchestrate all of this, I used AWS Step Functions, which allowed me to build a serverless workflow that takes care of retries without custom code. I used the Workflow Studio to visually design this workflow using drag and drop:

Step Functions State Machine

and then exported the JSON definition into the AWS SAM template for deployment to AWS. Now I can make sure each function runs in order, with retries, with no custom code or no changes to the existing Lambda functions.
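To give a feel for what that exported definition contains, here is a heavily trimmed sketch of the state machine in Amazon States Language, built as a Python dict so it can be dumped straight to JSON (state names and ARNs are placeholders). The Retry blocks are what replace the custom retry logic:

import json

state_machine = {
    "StartAt": "GetStage",
    "States": {
        "GetStage": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:get-stage",  # placeholder
            "Retry": [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 30,
                       "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Next": "GetSchedule",
        },
        "GetSchedule": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:get-schedule",  # placeholder
            "Retry": [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 30,
                       "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Next": "SendNotification",
        },
        "SendNotification": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:send-notification",  # placeholder
            "End": True,
        },
    },
}

# The JSON printed here is the kind of definition that gets embedded in the SAM template
print(json.dumps(state_machine, indent=2))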

Hopefully this easy but practical example showed when to use choreography and orchestration as modes of interaction in a microservices architecture.

]]>
<![CDATA[Deploying Ghost to Amazon ECS using a CI/CD Pipeline]]>http://hacksaw.co.za/blog/aws-ci-cd-pipeline-to-ecs-with-dashboards/5f33c45fb1f79100017785d2Sat, 25 Mar 2023 12:39:00 GMTIntroduction

Ghost is one of the most popular CMS and blog platforms, rated 2nd on GitHub, and the biggest open source headless CMS project. Ghost is an open source Node.js JAMstack implementation, and can be used headless, or with the built-in editor. Ghost can be self-hosted in different forms: installed using the ghost CLI, or using a docker container.

This post is about deploying a Highly Available Ghost CMS using DevOps practices on AWS - it's both a deep-dive into the different AWS DevOps services, and a tutorial to follow to get a working, highly available Ghost blog running on AWS ECS, deployed through a CI/CD pipeline. It will include details on each service and how to integrate the different services together, using different methods: the CLI, and the CDK. The code can be found on my GitHub repo.

This post does not include any screenshots of the AWS Console, to avoid potentially out-dated content as the console updates over time.

Why use AWS

In my previous role, I wrote about how we built an API ecosystem that included a CI/CD pipeline. That pipeline was made up of various DevOps tools:

  • Git: Bitbucket
  • Pipeline: initially Bitbucket Pipelines, which we used for the first 6 months. But because we were paying for it with a credit card, and that's not how Corporates roll, we needed to switch over to a more long-term solution. We then wanted to move over fully to GitHub for git and pipelines (because GitHub was included on the Microsoft EA), but at that point (early 2019), GitHub Actions was not yet available. We tried an on-prem Jenkins as well, but we didn't like it. So we moved the pipelines over to Azure Pipelines, part of the Azure DevOps Services. This resulted in a different experience each time, as each service had support for a different set of features.
  • Container Image Repo: We started with Docker Hub, then hosted it internally in OpenShift Online. I recall us moving over to Azure Container Registry later on.
  • Managed Kubernetes: we started off with Openshift Online, then moved over to Pivotal PKS, hosted in Azure.
  • Others: Jira, Confluence, Postman and others were all SaaS Cloud based.

This resulted in a mixed bag of different tooling, each with different mechanisms for hosting, payment models, authentication, and logging. It was not very easy to orchestrate into a single cohesive unit. But that's exactly what AWS offers - a single place to consume multiple distinct services, with the ability to combine them together to solve a business problem. So in this tutorial we are going to leverage only AWS native, fully managed services, so you don’t need to provision, manage, and scale your own servers.

Architecture

We are going to build an AWS environment, using various AWS DevOps tools, that will allow developers to push code or new Ghost content to a git repo (CodeCommit), which will kick off a pipeline (CodePipeline) to build (CodeBuild) the code in a container, store the image in a container repo (ECR), and do a blue/green deploy (CodeDeploy) of the new image to a container orchestration system (ECS), fronted behind a load balancer (ALB). All logs, events and metrics for each service and event is stored centrally in a monitoring and logging tool (CloudWatch). A notification system (SNS) will send out email and IM alerts, and a BI service (QuickSight) allows us to build custom dashboards to measure developer productivity.

Using the best practices learnt from the AWS blog post Why Deployment Requirements are Important When Making Architectural Choices | Amazon Web Services we will segregate the architecture into the three lenses:

  1. Build lens: the focus of this part of the architecture is on achieving deployability, with the objective to give the developers an easy-to-use, automated platform that builds, tests, and pushes their code into the different environments, in a repeatable way. Developers can push code changes more reliably and frequently, and the operations team can see greater stability because environments have standard configurations and rollback procedures are automated
  2. Runtime lens: the focus here is on the users of the application and on maximizing their experience by making the application responsive and highly available.
  3. Operate lens: the focus here is on achieving observability for the DevOps teams, allowing them to have complete visibility into each part of the architecture.

Therefore this tutorial has dedicated sections for each of the three lenses of the architecture. Because the DevOps toolset in the Build architecture requires the existence of the ALB and ECS services in the Run architecture, we will start first with the Run architecture, then move onto Build, then the Operate architecture.

This results in the following architecture:

Ghost architecture on Amazon ECS with CI/CD pipelines

The SDLC that the developers would follow would be as follows:

  1. Developers do development locally, and run ghost locally in a docker container. They could even use Cloud9 as a cloud-hosted IDE. When they are ready, they push changes to git CodeCommit
  2. CodeCommit stores the code in a managed private Git repo
  3. CodePipeline detects a code commit (via a CloudWatch rule), and begins to orchestrate the build, test, and deploy stages of the pipeline
  4. When a pipeline starts, SNS sends out notifications (emails, IM) for approval, and notifications
  5. CodeBuild compiles the code, builds the container image and pushes it to ECR, and runs tests
  6. ECR stores the container image
  7. CodeDeploy deploys the image to ECS
  8. Logs, metrics and events for each event in each stage is continuously sent to CloudWatch, where we can then measure performance and do error tracing.

We aim to make the architecture highly available, by including redundancy in the design, with an active-active configuration over multiple Availability Zones (where each AZ is one or more Data Centers) in a single AWS Region (each Region contains multiple AZs). In the architecture diagram above I included 2 AZs for simplicity, but we will actually take advantage of 3 AZs, because according to best practice ECS stretches itself across 3 AZs.
As far as the rest of the services are concerned, all the Build architecture services (CodePipeline, CodeBuild, CodeDeploy, CodeCommit), as well as the Operate services (CloudWatch, Lambda, S3), operate at the Regional level and are not specific to an AZ, which makes all of them natively highly available and redundant. So really the only services that are AZ-specific, and which we need to design for High Availability across multiple AZs, are the Load Balancer and the EFS file system. This design ensures that our application can survive the failure of an entire AZ.

In addition, you can easily include the CloudFront CDN to cache static content in S3 and bring content closer to the users to reduce latency. You can also include AWS WAF to protect against attacks like SQL injection, and Shield Advanced for further DDoS protection (Shield Basic is included for free on all accounts).

Pre-requisites

  • You need to have an AWS account. You can start on the free tier if you don’t already have one, or just want to create a new separate account just for this tutorial, which makes it easy to simply delete and dispose of after we are complete so that you don’t accrue any surprise charges
  • Install and configure the AWS CLI. Or you may already have it set up, and perhaps already have an existing account. If you have multiple profiles configured in the AWS CLI, you can simply use the --profile profilename flag to specify which account you are using per CLI command.
  • Create an IAM user, with Admin rights, as you shouldn't use your root account. This is part of the Well Architected Framework's Security Pillar, of controlling access by giving users/resources just enough permissions as is needed for their role. Identity and permissions are really important to understand with AWS, from two perspectives:
  • IAM users: your IAM user will need the correct permissions to create resources and services. That’s why we are giving your IAM user Admin rights. In normal usage, not all users will need Admin rights, just sufficient rights/permissions for their role.
  • roles: each service will need specific permissions to talk to other AWS services on your behalf. For this you will need to create specific roles for certain services that assigns permissions to that role, for the service to assume, to do certain things (e.g. CodeDeploy will need ALB permissions to update the ALB target groups, and ECS will need permissions to pull the image from ECR)
  • Once an IAM user is created (or even the root user if you insist), generate/add your SSH public key to the IAM user. I am using git over SSH, but you could use git over HTTPS, in which case you won't need to add your SSH keys, but just generate CodeCommit credentials. Or you could use Cloud9, which includes a local shell to run all your git commands. The important part here is that you should be able to use git from your local machine. Your ssh config should look something like this:
> cat ~/.ssh/config
  • For the sake of simplicity, we will be creating all services in a single, specific region. In reality, this won’t always be the case, as services can be created in different regions. But for the purposes of this tutorial, choose one of the regions, based on the availability of the services we will be using, and stick with it. That means in the console you will make sure the top right-hand corner is always set to the chosen region, and the CLI is always using that region (either set as default in the profile, or by setting --region for each CLI command). So for this tutorial, I have chosen Europe (London) eu-west-2.
  • On your local machine, you will need to install git, in order to clone the github repo, which contains all the files used in this tutorial. Lets do that now, by creating a directory, and cloning the repo:
mkdir ghost-aws
cd ghost-aws
git clone https://github.com/jojo786/ghost-aws-ecs-pipeline.git
  • And lastly, but not mandatory: on your local machine, you will need to install docker, in order to run Ghost locally to enable you to create/update posts and content, or make other changes to Ghost, and then push to git. The pipeline then picks up these changes, builds a new image, and pushes it to ECS in production. So once docker is running locally, you can run ghost like this:
> docker run --name ghost -p 80:2368 ghost:latest
Unable to find image 'ghost:latest' locally
latest: Pulling from library/ghost
bf5952930446: Already exists
3afbf19eb36c: Pull complete
96ad7b697be4: Pull complete
.....
[2020-08-13 13:12:19] INFO "GET /assets/built/casper.js?v=bc44039cde" 200 3ms
[2020-08-13 13:12:20] INFO "GET /favicon.ico" 200 4ms

Going to http://localhost/ in your browser will access that container, with the last few lines above showing the GET requests.
You could choose to not do the port mapping from port 80 on your local machine to 2368 in the container, in which case you can access ghost at http://localhost:2368/
The assumption I am making here is that ghost is storing all content (posts and images) locally in the file structure in the container, using SQLite, and not storing it in a DB. In order to preserve any content across docker images, you can mount a local folder to the image with a volume:

docker run -d --name ghost -p 4431:2368 \
--restart always \
-v /home/user/gitwork/ghost-data-v1:/var/lib/ghost/content \
ghost:latest

Run

In this part of the architecture we will focus on creating the infrastructure to run Ghost in a highly available manner. In this section, we are going to create a VPC with subnets and security groups. We will then reference the VPC and subnet IDs when we create an ALB Load Balancer and an ECS cluster.
We will also create an EFS file system to store Ghost data.

VPC

Let's start off by creating the VPC to house the networking components, e.g. the subnets, which will be used when we deploy ECS and EFS.

Create a VPC with a 10.0.0.0/16 CIDR block.

aws ec2 create-vpc --cidr-block 10.0.0.0/16 --region eu-west-2

In the output that's returned, take note of the VPC ID.

{"Vpc": {"VpcId": "vpc-2f09a348",         ...    }}

Using the VPC ID from the previous step, create two subnets with 10.0.1.0/24 and 10.0.2.0/24 CIDR blocks in each availability zone. We will use these for the public subnets.

aws ec2 create-subnet --vpc-id vpc-2f09a348 \
--cidr-block 10.0.1.0/24 \
--availability-zone eu-west-2a \
--region eu-west-2

aws ec2 create-subnet --vpc-id vpc-2f09a348 --cidr-block 10.0.2.0/24 --availability-zone eu-west-2b

Create two additional subnets, that we will use for the private subnets with a 10.0.3.0/24 and 10.0.4.0/24 CIDR blocks in each availability zone.

aws ec2 create-subnet --vpc-id vpc-2f09a348 --cidr-block 10.0.3.0/24 --availability-zone eu-west-2a
aws ec2 create-subnet --vpc-id vpc-2f09a348 --cidr-block 10.0.4.0/24 --availability-zone eu-west-2b

After you've created the VPC and subnets, you can make the first two of the subnets public by attaching an Internet gateway to your VPC, creating a custom route table, and configuring routing for the subnet to the Internet gateway.

Create an Internet gateway.

aws ec2 create-internet-gateway --region eu-west-2

In the output that's returned, take note of the Internet gateway ID.

{"InternetGateway": {        ...        "InternetGatewayId": "igw-1ff7a07b",         ...    }}

Using the ID from the previous step, attach the Internet gateway to your VPC.

aws ec2 attach-internet-gateway --vpc-id vpc-2f09a348 --internet-gateway-id igw-1ff7a07b

Create a custom route table for your VPC.

aws ec2 create-route-table --vpc-id vpc-2f09a348

In the output that's returned, take note of the route table ID.

{"RouteTable": {        ...         "RouteTableId": "rtb-c1c8faa6",         ...    }}

Create a route in the route table that points all traffic (0.0.0.0/0) to the Internet gateway.

aws ec2 create-route --route-table-id rtb-c1c8faa6 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-1ff7a07b

The route table is currently not associated with any subnet. You need to associate it with a subnet in your VPC so that traffic from that subnet is routed to the Internet gateway. First, use the describe-subnets command to get your subnet IDs. You can use the --filter option to return the subnets for your new VPC only, and the --query option to return only the subnet IDs and their CIDR blocks.

aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-2f09a348" --query 'Subnets[*].{ID:SubnetId,CIDR:CidrBlock}'

Look for the subnet-id associated with  10.0.1.0/24 and 10.0.2.0/24 CIDR block, and associate both with the route-table

aws ec2 associate-route-table  --subnet-id subnet-b46032ec --route-table-id rtb-c1c8faa6
aws ec2 associate-route-table  --subnet-id subnet-a46032fc --route-table-id rtb-c1c8faa6

Here we create and configure a security group (ghost-SG) to allow access to port 80

aws ec2 create-security-group --group-name ghost-SG \
--description "Ghost SG" \
 --vpc-id vpc-2f09a348 --region eu-west-2

Using the group-id of the newly-created security group, we allow port 80 traffic; this security group will be attached to the ALB later on.

aws ec2 authorize-security-group-ingress --group-id sg-04a1a9b583455f819 --protocol tcp --port 80 --cidr 0.0.0.0/0 --region eu-west-2 


Using the group-id of the same security group, we also allow port 2368 traffic (the port the Ghost container listens on).

aws ec2 authorize-security-group-ingress --group-id sg-04a1a9b583455f819 --protocol tcp --port 2368 --cidr 0.0.0.0/0 --region eu-west-2 

ELB

Elastic Load Balancing supports 3 types of LBs - we will be using an ALB. The ALB will be created across multiple AZs, using one public subnet from each AZ, creating a HA LB configuration.
We will first create the ALB, then create two target groups, then a listener that will bind the ALB to the target groups.

Let's create an internet-facing ALB, using the 2 "SubnetId"s of the public subnets, with a name of ecs-ghost.
Please note down the Arn of the created load balancer, as we will use it later

aws elbv2 create-load-balancer --name ecs-ghost \
--subnets subnet-65abf80c subnet-72f65b3e \
--security-group sg-04a1a9b583455f819 \
--region eu-west-2

Now we will create two target-groups that the ALB will send traffic to, using protocol HTTP and targets of type IP, referencing the "VpcId" from earlier.

aws elbv2 create-target-group --name ghostecstarget1 \
--protocol HTTP --port 2368 \
--target-type ip \
--vpc-id vpc-9d98d1f5 --region eu-west-2

aws elbv2 create-target-group --name ghostecstarget2 \
--protocol HTTP --port 2368 \
--target-type ip --vpc-id vpc-9d98d1f5 \
--region eu-west-2

It is these target groups that ECS services and tasks will bind to when new containers are launched. Please note down the Arn of the target groups, as we will use it later

And lastly, we will create an HTTP listener on port 80 that references the ARN of the ALB and the ARN of the target group:

aws elbv2 create-listener \
--load-balancer-arn arn:aws:elasticloadbalancing:eu-west-2:723215012169:loadbalancer/app/ecs-ghost/7d9f0d07eab1bbec \
--protocol HTTP --port 80 --region eu-west-2 \
--default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:eu-west-2:723215012169:targetgroup/ghostecstarget1/4ff4d80c95688591

We now have an ALB created, that will be forwarding traffic to the (not as yet) targets registered in the target groups. These targets will be ECS tasks that we will create next. For now, you can get the DNSName of the ALB, to which you can test out in your browser.

aws elbv2 describe-load-balancers --region eu-west-2

{    "LoadBalancers": [        {            "LoadBalancerArn": "arn:aws:elasticloadbalancing:eu-west-2:132131232312:loadbalancer/app/ecs-ghost/7d9f0d07eab1bbec",            "DNSName": "ecs-ghost-862450218.eu-west-2.elb.amazonaws.com",            "CanonicalHostedZoneId": "ZHURV8PSTC4K8",            "CreatedTime": "2020-07-31T08:44:58.940000+00:00",            "LoadBalancerName": "ecs-ghost",            "Scheme": "internet-facing",            "VpcId": "vpc-9d98d1f5",            "State": {                "Code": "active"            },            "Type": "application",            "AvailabilityZones": [                {                    "ZoneName": "eu-west-2b",                    "SubnetId": "subnet-72f65b3e",                    "LoadBalancerAddresses": []                },                {                    "ZoneName": "eu-west-2a",                    "SubnetId": "subnet-cd50cfb7",                    "LoadBalancerAddresses": []                }            ],            "SecurityGroups": [                "sg-7dda4d19"            ],            "IpAddressType": "ipv4"        }    ]}

Above we can see that our ALB is stretched across 2 AZs, and protected by a security group, which will control traffic into the ALB.

EFS

We will create an EFS file system and make it accessible to the ECS tasks.

aws efs create-file-system \
--throughput-mode bursting \
--performance-mode generalPurpose \
--region eu-west-2

Using the file-system-id of the EFS file system above, we will apply a default policy. This policy contains a single rule, which denies any traffic that isn’t secure. The policy does not explicitly grant anyone the ability to mount the file system:

aws efs put-file-system-policy --file-system-id fs-b6702d47 \
--region eu-west-2 --policy '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": {
                "AWS": "*"
            },
            "Action": "*",
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "false"
                }
            }
        }
    ]
}'

We now configure a security group (efs-SG) that allows in-bound access on port 2049 (the NFS protocol) from the ECS tasks. This will allow ECS to mount the EFS file system

aws ec2 create-security-group --group-name efs-SG \
--description "EFS SG" \
--vpc-id vpc-2f09a348 --region eu-west-2

Using the group-id of the newly-created security group, we allow port 2049 (NFS) traffic from the ghost-SG security group (created previously), referenced by --source-group.

aws ec2 authorize-security-group-ingress \
--group-id sg-5345a435435435 \
--protocol tcp --port 2049 \
--source-group sg-04a1a9b583455f819 \
--region eu-west-2 
Create two mount targets, one in each of the two private subnets. For this you will need to refer to the subnet-ids of the private subnets 10.0.3.0/24 and 10.0.4.0/24, and the security group id of efs-SG (repeat the command below for the second private subnet):

aws efs create-mount-target --file-system-id fs-b6702d47 \
--subnet-id subnet-03957fb450fd018e6 \
--security-groups sg-0cdf4abb0136e9ac4 \
--region eu-west-2

Create an EFS Access Point that maps to the directory /ghost-data:

aws efs create-access-point --file-system-id fs-b6702d47 \
--posix-user "Uid=1000,Gid=1000" \
--root-directory "Path=/ghost-data,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=755}" \
--region eu-west-2

Refer to Part 2, where we talk about the advantage of using access points.

ECS - Fargate

For the rest of the sections, when running the cli commands, please make sure you are in the folder ghost-aws-ecs-pipeline that we cloned, as it contains the files that the cli commands will use.

We will run Ghost as a docker container, so we will need a container platform that will take care of starting, running, monitoring, and moving the containers around. It will also need to make sure there are sufficient containers to cater for the usage - so as usage fluctuates, the container platform will adjust the number of running containers accordingly.
We also want the Fargate deployment model (rather than the EC2 deployment model), which means we don't have to manage any EC2 servers. This is essentially serverless containers. This tutorial will also work almost exactly the same for the EC2 deployment model if you so choose.
And perhaps most importantly, the container platform will make it easy for us to push new content/code to the containers, making it easy to deploy new changes while ensuring zero downtime. For these reasons, I chose Amazon Elastic Container Service (ECS), as it is very well integrated into the rest of the AWS ecosystem.

I believe, at least for the reasons above, that ECS makes a better choice than Kubernetes with EKS for our purposes. However, in a follow-up tutorial, I aim to build a CI/CD pipeline for Ghost to EKS.

So at this point, we need to create a service role, with the correct permissions, that ECS can assume to talk to other AWS services on our behalf. We will need to create the ecsTaskExecutionRole role - you can read this and this for additional information.

Use the create-role command to create a role that ECS can assume:

aws iam create-role --role-name ecsTaskExecutionRole \
--assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}'

Now attach the AmazonECSTaskExecutionRolePolicy permissions policy, that gives the role permission to use ECR

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy \
--role-name ecsTaskExecutionRole

Now that we have our EFS file system properly configured, we need to make our application aware of it. To do so, we are going to create an IAM role (ecsTaskRole) that grants permissions to map the EFS Access Point

aws iam create-role --role-name ecsTaskRole \
--assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}'

Create a policy, in a file called ecsTaskRolePolicy.json - making sure you properly configure your EFS file system ARN and your EFS Access Point ARN. This policy grants access to that specific access point we have created.

aws iam put-role-policy --role-name ecsTaskRole \
 --policy-name efs-ap-rw \
 --policy-document file://ecsTaskRolePolicy.json

Let's now go ahead and create an ECS cluster, named ghost, in our region (no need to specify AZs, as the cluster will stretch itself across 3 AZs):

aws ecs create-cluster --cluster-name ghost --region eu-west-2

With this next command, we’re adding the Fargate capacity providers to our ECS Cluster. Let’s break down each parameter:

  • --capacity-providers: this is where we pass in our capacity providers that we want enabled on the cluster. Since we do not use EC2 backed ECS tasks, we don’t need to create a cluster capacity provider prior to this. With that said, there are only the two options when using Fargate.
  • --default-capacity-provider-strategy: this is setting a default strategy on the cluster; meaning, if a task or service gets deployed to the cluster without a strategy and launch type set, it will default to this.

You could choose to use Fargate Spot, and save up to 70% on costs.

aws ecs put-cluster-capacity-providers --cluster ghost \
--capacity-providers FARGATE \
--default-capacity-provider-strategy \
capacityProvider=FARGATE,weight=1,base=1

In order to run a container in ECS, we need to have 3 things:

  1. we start with a task definition, which specifies the docker image, CPU, RAM, and a few other parameters, defined in a JSON file.
  2. A task will then be a running instance of a task definition.
  3. A service, which enables you to run and maintain a specified number of instances of a task definition simultaneously. The service is the interesting part here, which links to the ALB we created earlier, by registering itself as a target

This is the task definition we will use for Ghost, saved locally as taskdef.json which specifies:

  • the ARN of the ecsTaskExecutionRole that we created earlier
  • the docker image of ghost,
  • 2368 as the port ghost listens on,
  • that we are using the FARGATE launch mode,
  • and CPU and RAM
  • role that grants permissions to map the EFS Access Point
  • directives to connect to the EFS Access Point we created above
"executionRoleArn": "arn:aws:iam::1111111111:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::1111111111:role/ecsTaskRole",
    "containerDefinitions": [
        {
            "name": "ghost",
            "image": "<IMAGE1_NAME>", 
            "essential": true,
            "portMappings": [
                {
                    "hostPort": 2368,
                    "protocol": "tcp",
                    "containerPort": 2368
                }
            ],
            "mountPoints": [
                {"containerPath": "/var/lib/ghost/content",
                 "sourceVolume": "efs-server-AP"
                }
            ]
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "volumes": [
        {"name": "efs-server-AP",
         "efsVolumeConfiguration": 
            {"fileSystemId": "fs-b6702d47",
                "transitEncryption": "ENABLED",
                "authorizationConfig": {
                    "accessPointId": "fsap-0e0254e906640402e",
                    "iam": "ENABLED"
             }
            }
        }
    ],
    "networkMode": "awsvpc",
    "cpu": "256",
    "memory": "512",

Now that we have created the taskdef.json file locally, we need to register it with ECS:

aws ecs register-task-definition \
--cli-input-json file://taskdef.json \
--region eu-west-2

We now create a service, defined in a JSON file as well, called create-service.json, which references:

  • the ghost-blog:1 task definition created above, with the version
  • the ghost ECS cluster
  • the ALB target group ARN (not to be confused with the ALB ARN)
  • the subnet IDs of the two public subnets
  • security group ID of the Ghost SG
  • and that CodeDeploy will be used to deploy it
{
    "taskDefinition": "ghost-blog:1",
    "cluster": "ghost",
    "loadBalancers": [
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:eu-west-2:1312321321:targetgroup/ghostecstarget1/4ff4d80c95688591",
            "containerName": "ghost",
            "containerPort": 2368
        }
    ],
    "desiredCount": 2,
    "launchType": "FARGATE",
    "schedulingStrategy": "REPLICA",
    "deploymentController": {
        "type": "CODE_DEPLOY"
    },
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [
                "subnet-65abf80c ",
                "subnet-72f65b3e",
                "subnet-cd50cfb7"
            ],
            "securityGroups": [
                "sg-7dda4d19"
            ],
            "assignPublicIp": "ENABLED"
        }
    }
}

And similarly to the task definition, we now need to create the service with ECS, which we will name ghost-blog, using the create-service.json file above. Please make sure you customise the configs in the file with your own values (account ID, target group ARN, subnets, and security groups).

aws ecs create-service --service-name ghost-blog \
--cli-input-json file://create-service.json \
--region eu-west-2

We are now going to set up Application Auto Scaling on the ECS service. In the autoScaling.json file, we specify a target tracking scaling policy with a customized metric specification for our ghost-blog service in the ghost cluster. The policy keeps the average utilization of the service at 75 percent, with scale-out and scale-in cooldown periods of 60 seconds.

aws application-autoscaling put-scaling-policy --service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/ghost/ghost-blog \
--policy-name cpu75-target-tracking-scaling-policy --policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration file://autoScaling.json \
--region eu-west-2
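The contents of autoScaling.json aren't shown above; as a sketch of what the CLI expects (expressed here with boto3 purely for illustration, and with assumed min/max capacity), the same policy could be applied like this - note that the service also has to be registered as a scalable target:

import boto3

autoscaling = boto3.client("application-autoscaling", region_name="eu-west-2")

# Register the ECS service as a scalable target (min/max values are assumptions)
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ScalableDimension="ecs:service:DesiredCount",
    ResourceId="service/ghost/ghost-blog",
    MinCapacity=2,
    MaxCapacity=6,
)

# Equivalent of the put-scaling-policy command above; the dict below mirrors
# the keys expected in autoScaling.json
autoscaling.put_scaling_policy(
    ServiceNamespace="ecs",
    ScalableDimension="ecs:service:DesiredCount",
    ResourceId="service/ghost/ghost-blog",
    PolicyName="cpu75-target-tracking-scaling-policy",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 75.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 60,
    },
)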

As per Tutorial: Create a pipeline with an Amazon ECR source and ECS-to-CodeDeploy deployment, we will later update this line from

image": "ghost", 

"to

"image": "<IMAGE1_NAME>"",

because the pipeiline will use that placeholder as an output.

We have now linked up most of the different services thus far:

  • The VPC, subnets and security groups created first were referenced when creating the ALB
  • The ALB listener referenced the ALB target groups
  • The ECS service referenced both the task definition and the ALB, and the vpc, subnets and security groups, and also references CodeDeploy, which we will configure in the next section

And now we create the ECR docker image repo, which will be used to store the images we build and push to ECS.

aws ecr create-repository --repository-name ghost --region eu-west-2

Build

The Build lens focusses on achieving deployability for the development team. In this section, we will discuss and deploy the AWS CI/CD tooling, using these AWS services:

  • CodeCommit
  • CodePipeline
  • CodeBuild
  • CodeDeploy

There are multiple ways of designing a pipeline. Usually the pipeline will promote code through the different environments: dev → test → QA → prod. There are also different types of deployments: rolling, canary, blue/green. In this post, we are deploying to production only, using a blue/green deployment method. This means that before the pipeline runs, blue is the current production environment. The pipeline will create a new replacement production environment, green, and switch all traffic over. Depending on the timeouts in the pipeline, it will wait a few minutes, and then delete the blue/original environment. So if you pick up a problem early on, during the waiting period, you can simply roll back to the still-existing blue environment, which should be very quick. If you pick up a problem later, you redeploy from scratch with a new push to git.

You will notice that AWS has dedicated and separate services for each function. Other SaaS providers (GitHub, BitBucket, etc) provide a single pipeline service, which is usually just a docker container in which you run commands for each phase of the SDLC: build, test, deploy, etc. But because AWS builds each function/service separately, you have the flexibility to mix and match services as you require, and not be tied in to a specific service. You could use any git repo, like GitHub, and still use CodePipeline; integrate GitHub with CodeBuild; or alternatively use CodeCommit together with other pipeline tools like Jenkins.

This section is based off the official Tutorial: Creating a service using a blue/green deployment. CodeBuild, CodeDeploy and CodePipeline will follow a similar structure of defining their parameters in a JSON file, then using the AWS CLI to create the config by referring to the file. This is unlike the previous CLI commands, like creating the ALB, where all the parameters were specified on the command line as flags.

CodeCommit

Let’s start off by creating a CodeCommit git repo. Using the CLI, it's done with:

aws codecommit create-repository --repository-name ghost-blog --region eu-west-2

where ghost-blog is the name of your git repo.
On your local machine, you should be able to clone that newly created repo:

git clone ssh://git-codecommit.eu-west-2.amazonaws.com/v1/repos/ghost-blog

CodeBuild

We start off by creating a role that CodeBuild can assume:

aws iam create-role --role-name CodeBuildServiceRole \
--assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "codebuild.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}'

We now attach a policy that provides CodeBuild with the appropriate permissions, which is stored in the CodeBuildServiceRolePolicyDoc.json file:

aws iam put-role-policy --role-name CodeBuildServiceRole \
--policy-name CodeBuildServiceRolePolicy \
--policy-document file://CodeBuildServiceRolePolicyDoc.json

We then create the build project, defined in a JSON file, called codebuild.json, which references:

  • the CodeCommit repo
  • the role create above
  • the buildspec.yml file will be used to specify the build commands
{
            "name": "ghost",
            "source": {
                "type": "CODECOMMIT",
                "location": "git-codecommit.eu-west-2.amazonaws.com/v1/repos/ghost-blog",
                "gitCloneDepth": 1,
                "gitSubmodulesConfig": {
                    "fetchSubmodules": false
                },
                "insecureSsl": false
            },
            "secondarySources": [],
            "sourceVersion": "refs/heads/master",
            "secondarySourceVersions": [],
            "artifacts": {
                "type": "NO_ARTIFACTS"
            },
            "secondaryArtifacts": [],
            "cache": {
                "type": "NO_CACHE"
            },
            "environment": {
                "type": "LINUX_CONTAINER",
                "image": "aws/codebuild/standard:4.0",
                "computeType": "BUILD_GENERAL1_SMALL",
                "environmentVariables": [],
                "privilegedMode": true,
                "imagePullCredentialsType": "CODEBUILD"
            },
            "serviceRole": "arn:aws:iam::123213213213:role/CodeBuildServiceRole",
            "timeoutInMinutes": 60,
            "queuedTimeoutInMinutes": 480,
            "encryptionKey": "arn:aws:kms:eu-west-2:123213213213:alias/aws/s3",
            "tags": [],
            "badgeEnabled": false,
            "logsConfig": {
                "cloudWatchLogs": {
                    "status": "ENABLED"
                },
                "s3Logs": {
                    "status": "DISABLED",
                    "encryptionDisabled": false
                }
            }
        }

Using that file, we create a build project with CodeBuild:

aws codebuild create-project \
--cli-input-json file://codebuild.json --region eu-west-2
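
The buildspec.yml referenced in the build project lives in the root of the CodeCommit repo. A minimal sketch, assuming the image is pushed to an ECR repo named ghost-blog and that ACCOUNT_ID is set as an environment variable on the build project (AWS_DEFAULT_REGION is provided by CodeBuild itself):

# Hypothetical buildspec.yml: log in to ECR, build the Ghost image, push it,
# and emit imageDetail.json for the deploy stage to consume later.
cat > buildspec.yml <<'EOF'
version: 0.2
phases:
  pre_build:
    commands:
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
  build:
    commands:
      - docker build -t ghost-blog:latest .
      - docker tag ghost-blog:latest $ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/ghost-blog:latest
  post_build:
    commands:
      - docker push $ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/ghost-blog:latest
      - printf '{"ImageURI":"%s"}' "$ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/ghost-blog:latest" > imageDetail.json
artifacts:
  files:
    - imageDetail.json
EOF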

CodeDeploy

We start off by creating a role with the required permissions for CodeDeploy to assume

aws iam create-role --role-name CodeDeployECSRole \
--assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "codedeploy.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}'

Now attach the AWSCodeDeployRoleForECS managed policy, which provides CodeDeploy with service-wide access to perform an ECS blue/green deployment on your behalf:

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS \
--role-name CodeDeployECSRole

We then need to create a CodeDeploy application, and then a deployment group that points to ECS.
The application, named ghost-ecs, is defined in a JSON file called codedeploy.json.
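
A minimal sketch of that file (for CodeDeploy, the application just needs a name and the ECS compute platform):

cat > codedeploy.json <<'EOF'
{
    "applicationName": "ghost-ecs",
    "computePlatform": "ECS"
}
EOF

We then create the application from it: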

aws deploy create-application --cli-input-json \
file://codedeploy.json --region eu-west-2

We now define the deployment group in a JSON file, which refers to:

  • the application ghost-ecs defined above
  • the service role ARN created above
  • the ALB target groups, created in the previous section - this will allow CodeDeploy to update the target groups with the tasks it deploys
  • the ALB listener ARN
  • the ECS cluster and service name created in the previous section
  • the appspec.yaml file, which is used by CodeDeploy to determine the ECS task definition (a sketch of this file follows below)
{
        "applicationName": "ghost-ecs",
        "deploymentGroupName": "ghost-ecs",
        "deploymentConfigName": "CodeDeployDefault.ECSAllAtOnce",
        "serviceRoleArn": "arn:aws:iam::132112111:role/CodeDeployECSRole",
        "triggerConfigurations": [],
        "alarmConfiguration": {
            "enabled": false,
            "ignorePollAlarmFailure": false,
            "alarms": []
        },
        "deploymentStyle": {
            "deploymentType": "BLUE_GREEN",
            "deploymentOption": "WITH_TRAFFIC_CONTROL"
        },
        "blueGreenDeploymentConfiguration": {
            "terminateBlueInstancesOnDeploymentSuccess": {
                "action": "TERMINATE",
                "terminationWaitTimeInMinutes": 5
            },
            "deploymentReadyOption": {
                "actionOnTimeout": "CONTINUE_DEPLOYMENT",
                "waitTimeInMinutes": 0
            }
        },
        "loadBalancerInfo": {
            "targetGroupPairInfoList": [
                {
                    "targetGroups": [
                        {
                            "name": "ghostecstarget1"
                        },
                        {
                            "name": "ghostecstarget2"
                        }
                    ],
                    "prodTrafficRoute": {
                        "listenerArns": [
                            "arn:aws:elasticloadbalancing:eu-west-2:723215012169:listener/app/ecs-ghost/7d9f0d07eab1bbec/722f9969e2b206fc"
                        ]
                    }
                }
            ]
        },
        "ecsServices": [
            {
                "serviceName": "ghost-blog",
                "clusterName": "ghost"
            }
        ]
    }

Using this definition in the file, we create the deployment group:

aws deploy create-deployment-group --cli-input-json \
file://codedeploymentgroup.json --region eu-west-2
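
The appspec.yaml referenced in the list above is committed to the repo, next to the task definition template. A minimal sketch for an ECS deployment; the container name and Ghost's default port 2368 are assumptions based on my task definition, and CodeDeploy replaces the <TASK_DEFINITION> placeholder at deployment time:

cat > appspec.yaml <<'EOF'
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "ghost"
          ContainerPort: 2368
EOF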

CodePipeline

There are many ways to design and define how the pipeline works. I have chosen to split it into two separate pipelines:

  1. The first is a 2-stage pipeline that:
       • Source: gets the code from CodeCommit
       • Build: builds the image, and pushes it to ECR
  2. The second is also a 2-stage pipeline that:
       • Source: gets the image from ECR, and includes another source in CodeCommit for the appspec and buildspec files
       • Deploy: deploys to ECS

There are many ways to optimise and extend the pipeline further.


Similar to what we did above, we start first with the IAM role that provides CodePipeline with the appropriate access to other AWS resources:

aws iam create-role --role-name CodePipelineServiceRole \
--assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
            "Service": "codepipline.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}'

We now attach a policy, stored in the codePipelineServiceRolePolicyDoc.json file, that provides CodePipeline with the appropriate permissions:

aws iam put-role-policy --role-name CodePipelineServiceRole \
--policy-name CodePipelineServiceRolePolicy \
--policy-document file://codePipelineServiceRolePolicyDoc.json

We again define the parameters for our pipeline in a JSON file, then use the CLI to create it. Let's start with the first pipeline, which is focussed on building the Ghost image and pushing it to ECR. The main variables defined in codepipeline-1-commit-build-ecr.json are:

  • The ARN of the CodePipelineServiceRole
  • The CodeCommit repo name
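
A trimmed sketch of what codepipeline-1-commit-build-ecr.json might look like (the pipeline name and artifact bucket are assumptions, and the account ID is the same placeholder used earlier):

cat > codepipeline-1-commit-build-ecr.json <<'EOF'
{
    "pipeline": {
        "name": "ghost-1-commit-build-ecr",
        "roleArn": "arn:aws:iam::123213213213:role/CodePipelineServiceRole",
        "artifactStore": {"type": "S3", "location": "ghost-pipeline-artifacts"},
        "stages": [
            {
                "name": "Source",
                "actions": [{
                    "name": "Source",
                    "actionTypeId": {"category": "Source", "owner": "AWS", "provider": "CodeCommit", "version": "1"},
                    "configuration": {"RepositoryName": "ghost-blog", "BranchName": "master"},
                    "outputArtifacts": [{"name": "SourceOutput"}]
                }]
            },
            {
                "name": "Build",
                "actions": [{
                    "name": "Build",
                    "actionTypeId": {"category": "Build", "owner": "AWS", "provider": "CodeBuild", "version": "1"},
                    "configuration": {"ProjectName": "ghost"},
                    "inputArtifacts": [{"name": "SourceOutput"}],
                    "outputArtifacts": [{"name": "BuildOutput"}]
                }]
            }
        ]
    }
}
EOF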

So using this command, we can create the first pipeline

aws codepipeline create-pipeline --cli-input-json \
file://codepipeline-1-commit-build-ecr.json --region eu-west-2

And similarly, we will create the second pipeline using the parameters specified in codepipeline-2-deploy-ecs.json

  • The ARN of the CodePipelineServiceRole
  • The CodeCommit repo name
  • The ECR repo name
  • various CodeDeploy settings
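
A trimmed sketch of codepipeline-2-deploy-ecs.json (the pipeline name, artifact names and bucket are assumptions; the Deploy stage uses the CodeDeployToECS action, which stitches together the appspec, the task definition template and the image pushed by the first pipeline):

cat > codepipeline-2-deploy-ecs.json <<'EOF'
{
    "pipeline": {
        "name": "ghost-2-deploy-ecs",
        "roleArn": "arn:aws:iam::123213213213:role/CodePipelineServiceRole",
        "artifactStore": {"type": "S3", "location": "ghost-pipeline-artifacts"},
        "stages": [
            {
                "name": "Source",
                "actions": [
                    {
                        "name": "Image",
                        "actionTypeId": {"category": "Source", "owner": "AWS", "provider": "ECR", "version": "1"},
                        "configuration": {"RepositoryName": "ghost-blog", "ImageTag": "latest"},
                        "outputArtifacts": [{"name": "ImageOutput"}]
                    },
                    {
                        "name": "DeployFiles",
                        "actionTypeId": {"category": "Source", "owner": "AWS", "provider": "CodeCommit", "version": "1"},
                        "configuration": {"RepositoryName": "ghost-blog", "BranchName": "master"},
                        "outputArtifacts": [{"name": "SourceOutput"}]
                    }
                ]
            },
            {
                "name": "Deploy",
                "actions": [{
                    "name": "Deploy",
                    "actionTypeId": {"category": "Deploy", "owner": "AWS", "provider": "CodeDeployToECS", "version": "1"},
                    "configuration": {
                        "ApplicationName": "ghost-ecs",
                        "DeploymentGroupName": "ghost-ecs",
                        "TaskDefinitionTemplateArtifact": "SourceOutput",
                        "AppSpecTemplateArtifact": "SourceOutput",
                        "Image1ArtifactName": "ImageOutput",
                        "Image1ContainerName": "IMAGE1_NAME"
                    },
                    "inputArtifacts": [{"name": "SourceOutput"}, {"name": "ImageOutput"}]
                }]
            }
        ]
    }
}
EOF

Then create the second pipeline: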
aws codepipeline create-pipeline --cli-input-json \
file://codepipeline-2-deploy-ecs.json --region eu-west-2

Operate

The Operate lens focusses on achieving Observability, which is about receiving feedback on the state of the application, as well as stats on developer productivity.

CloudWatch

CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices. The metrics include utilization for resources such as CPU, memory, disk, and network. The metrics are available in CloudWatch automatic dashboards. To enable Container Insights on our cluster, run this command

aws ecs update-cluster-settings --cluster ghost \
--settings name=containerInsights,value=enabled --region eu-west-2

You can now head over to CloudWatch in the AWS console to view Container Insights, as well as to create dashboards that contain our resources, including ALB and VPC Flow Logs.

QuickSight

You can follow this post on how to use Amazon QuickSight as an analytics tool to measure CI/CD metrics.

Further Learning

As a base for this post, I used a combination of the following key tutorials in order to build the complete solution of a pipeline for Ghost to ECS:


What's also worth mentioning, and highlights the rapid pace of innovation at AWS, is the newly launched AWS Copilot, which is a new CLI to deploy to ECS, including creation of the required pipelines.

]]>
<![CDATA[Book Review: multiple - Mark Schwartz]]>Mark Schwartz is the author of four books, which are primarily about Digital Transformation. I've listed them in order of publication - most recent at the bottom:

  1. The Art of Business Value
  2. A Seat at the Table
  3. War and Peace and IT
  4. The (Delicate) Art of Bureaucracy


He

]]>
http://hacksaw.co.za/blog/book-review-multiple-mark-schwartz/62694eca70616a000130f1e3Wed, 27 Apr 2022 14:11:42 GMTMark Schwartz is the author of four books, which are primarily about Digital Transformation. I've listed them in order of publication - most recent at the bottom:

  1. The Art of Business Value
  2. A Seat at the Table
  3. War and Peace and IT
  4. The (Delicate) Art of Bureaucracy


He shares his lessons from transforming a part of the US government. He talks about how he has used Agile, DevOps and Cloud to transform. He writes from a practitioner's point of view, not a theorist's. His style of writing is playful and snarky – you'll love it!  

As an Enterprise Strategist at AWS, he also blogs about helping customers transform in the cloud.  

You don't need to read all of his books, or even read them in the order he wrote them, but it will help. He has some common threads spread across his books that will help you understand the topic under discussion. If you're working in a legacy org, struggling to overcome bureaucracy, these are required reading.

]]>
<![CDATA[Hiring in Big Tech - and what legacy orgs need to improve]]>http://hacksaw.co.za/blog/hiring-in-big-tech-and-what-legacy-orgs-need-to-improve/6269451d6c20470001e192caWed, 27 Apr 2022 13:36:29 GMT[This is my personal opinion and does not reflect the views of my employer. All information mentioned here is publicly available]

I've recently (circa early 2022) interviewed at AWS, GCP and Azure. This post serves as a review of sorts, containing the things I enjoyed about the different hiring processes. At times I will compare and contrast some differences of each. Specifically, I will call out how this differs to the hiring process in traditional legacy organisations. My main focus here is to discuss and comment on the Candidate Experience, which is the perception that the job seeker has of the employer, and will highlight just how much effort Big Tech companies have invested in their hiring processes. I think the targeted audience of this post are those, like me, who have spent most of their career with traditional legacy organisations, and will find this Big Tech (FAANG/MANGA/whatever) hiring process to be very different and refreshing. This post may be of interest to those thinking about interviewing for a Solutions Architect role at AWS, GCP and/or Azure. Please note: because these Big Tech companies have (mostly) standardised interview processes across all their roles, I suppose this post could very well apply to any role (Software Development, Product Managers, Program Managers, etc) across Amazon, Google or Microsoft.

Role Names

Ok, let's get the role names out of the way. In AWS, this role is called Solutions Architect. GCP calls it Customer Engineer, while Azure calls it a Solution Area Specialist. In summary, these are customer-facing roles, part of the Cloud Sales function, where the candidate is responsible for the technical engagement with the customer - helping the customer understand how the Cloud can help them transform their business. These roles require technical ability to understand and propose solutions for customers. Coding is not necessarily required, but it is a hands-on role.

Starting the hiring process

Each of these three interviews started out with a recruiter reaching out to me on LinkedIn. They had looked at my profile, and thought I was a good fit for a particular role. They asked me to read over the role, and to apply if I was interested. So the first call-out I can make is that I had to apply for the role myself - I was not auto-applied to it. So whether you found the role and applied yourself, or got contacted by a recruiter, it probably makes no material difference to the interview process. You won't get any benefit or bonus points, as you will be evaluated equally, just like any other candidate.

Recruiter roles

It’s worth calling out the different recruitment people you will interact with during the interview process, besides the actual interviewers themselves. I counted three distinct roles:

Talent Finder (Talent Acquisition / Sourcer)

This person's role is to understand the job and role they're scouting for, and look for candidates. They reach out to you, sell the role to you and convince you enough so that you apply, and then move on. You won't hear from them again after you apply. But if you applied yourself, you won't meet this person at all.

Recruiter

Once you've applied, the Recruiter will be your guide through the entire process. They will be the one that progresses you along through the process, gives you regular status updates, provides you with the intermediate and final decisions, and works on the compensation package with you. They share with you details of the interviews: who the interviewers are, what type of interview (technical, sales, etc), and even the principles or competencies that will be tested in each interview. My recruiter set up a call the day before my on-site interview to brief me on all of the above. Their role in the process is to make you successful.

Scheduler

In order to book the main interview loop, the Recruiter will bring in a Scheduler, whose role is to book all the interviews with you and all the interviewers in their calendars. You won't hear from them again after the calendars are booked. In Amazon, the recruiter that reached out to me was both my Recruiter and Talent Finder, while with Microsoft and Google, each of the three roles were distinct people. In some cases, the Talent Finders may be from external 3rd party recruitment agencies, while the other two roles are always internal staff.

In my experience, the Talent Finder is an important and key differentiator in improving the Candidate Experience: they realise that this is a competitive market, and that they are all facing a talent shortage. Amazon's vacancies page lists 54 000 jobs, and Facebook, Alphabet, Microsoft, Apple and Amazon have put more than 1 million people to work between 2000 and 2018. The market clearly belongs to job seekers, and candidates have plenty of options. And that's why the role of the Talent Finder is key - it's to connect with passive, happily-employed candidates, sell them the role and convince them to apply. In my experience, the Talent Finders really understood the roles, and sold them well enough to get me to apply. It really increased my Candidate Experience. Contrast this to legacy orgs, whose hiring practices belie the fact that they are struggling to find talent. Their job boards and application process are a mess, and HR Partners and recruitment agencies barely understand the role they're looking for. Case in point: an agency cold-called me a few years ago, and in trying to understand if I am a fit for the role, rattled off a list of technologies listed on the JD, and one of the questions they posed was "do I have the Oracle?..." In my experience, using a recruitment agency always results in a lower Candidate Experience, because as a candidate, you don't have direct contact with the Recruiter. You have to rely on the agency person to pass messages between you and the actual recruiter. This broken telephone results in delays and missed messages, and gives a very impersonal feel to the whole process.

Hiring System

Let's talk about the job boards and systems used throughout the process. In legacy orgs, they just use the module that comes with their ERP/HR system, which, if it is Oracle E-Business Suite, is pretty bad. Their job boards are hard to navigate, and the JD specs are too long and just copy/pasted without considering formatting. It takes too long to register and apply, as they mostly ask for too much information. It's hard for candidates to set up alerts/reminders for roles they're interested in. In general, it's poorly designed software that makes it hard to find the job you're looking for. And legacy orgs only use that system for the candidate application process - it's hardly updated again during interviews or even after. I recall a few times where I've been successful and already started a job, but the software does not reflect that.
On the other hand, Big Tech has really slick job boards that make it easy to find the role you're looking for. And as you will find throughout the recruitment process, their internal hiring system is used to schedule the interviews and send out calendar invites, assign different roles for each interviewer, track the overall hiring process, and capture interview feedback. Interviewers will tell you they are capturing your answers during an interview, and after each phase and when you get your final feedback, the recruiter will give you feedback based on what was written in the system. By documenting and collecting the interview feedback, it allows them to make objective and data-driven hiring decisions.

Hiring principles - how they hire

Microsoft, Google and Amazon now ask mostly behavioural-based questions, based on the concept that past behaviour predicts future performance. Google has noted how they have stopped asking brain teasers, and so has Microsoft.
The entire hiring process is designed to determine if the candidate and company are a good fit for each other. Amazon, Google and Microsoft each have distinct principles that they use to measure this ‘fit’. Amazon uses their Leadership Principles, Google looks for “Googleyness”, based on their vision statement and philosophy, while Microsoft uses their Core Competencies and Cultural Attributes. These values, principles and competencies are core to the interview - they take them very seriously. And even though there are differences between these companies, and different words and terms are used across the three, once you've understood them, you will find that they are quite similar.
So that's the ‘what’, i.e. that's what candidates are measured against. ‘How’ they are measured is done mostly using behavioural questions. E.g. if Amazon wants to ask you a question about a particular Leadership Principle, they may ask “Tell me about a time when you were faced with a problem that had a number of possible solutions.”

These behavioural questions require candidates to prepare beforehand - I've included some invaluable links in the Resources section below. Amazon looks for responses using the STAR method, while Microsoft has a small addition with the STARL method. For Google's hypothetical-based questions, Jeff H Sipe suggests using this framework:

Framework for answering hypothetical questions

The Interview Construct

At each of these companies, the interviews are very structured. It has two major parts to it:
1) phone screening calls - shorter in duration, about 30 mins or less
2) multiple on-site interviews - longer in duration, between 45 and 60 mins each

What's key is that the process is transparent, and is not a surprise. Each of these companies shares details about their hiring process. Candidates know exactly what they're in for, how long it will take, and therefore will know when it's completed so they can expect feedback.

Phone Screens

You start off with the phone screens, and if you pass/progress those, you get invited for the on-site interviews.
The phone screens can consist of a few calls. There is the first one with HR (perhaps with the Talent Finder or the Recruiter), then maybe one with the hiring manager, and perhaps a technical screening call.
Amazon usually has a recruiter phone screen, and then a technical phone screen. Google has a recruiter phone screen, and then a call with the hiring manager, which they name Champion Calls. Here the Hiring Manager sells the role to you, tells you about the role and the team, and answers any questions you may have. Microsoft also has a recruiter phone screen and a hiring manager call.

On-site interviews

If you pass those screening calls, you are invited to the actual on-site interviews - which are now remote/online/virtual due to COVID19. These consist of 3 to 5 interviews, between 45 mins and an hour each. The focus is on structure, which allows them to collect sufficient data. This is a huge time investment - in most cases, candidates will take a day's leave to attend the on-site interviews. With the online option, it can be scheduled over a few days.

Amazon has 5 on-site interviews, each 1 hour long. They are mostly behavioural-based questions. One or two of the interviews will also contain a technical portion. The recruiter would have specified this beforehand.
Google schedules it over 3 interviews, either 45 mins or an hour long. Their interview types are a bit more specific, with dedicated names: GCA (General Cognitive Ability), G/L (Googleyness & Leadership), and RRK (Role-Related Knowledge). These are mostly behavioural-based, technical, as well as hypothetical questions. Google is unique amongst the three in using hypothetical questions. Microsoft has 3 on-site interviews, also mostly behavioural and technical questions.

Each interviewer will be assigned different principles, competencies and/or technical/functional areas. These are assigned prior to the interviews, based on the requirements of the role.

What you will realise, across all roles: Account Managers, Product Managers, Technical Program Managers, Solutions Architects and Software Development, is that the majority of the interviews are assessed using behavioural questions. So even though you will be assessed for your functional / technical skills, the majority of the interviews are trying to assess your culture fit.

Now compare this to more traditional companies, where the interviews are less structured, more arbitrary, and do not have a structured way to collect and use the data. In most legacy orgs that I interviewed at, it was mostly a single interview, or at maximum two interviews, that defined the hiring process.
Legacy orgs, even if they have a defined interview structure, hardly share this info beforehand. And they almost never give candidates feedback. Which means at some point the recruiter / agency ghosts you, without letting you know whether they have decided to proceed or not.

The hiring decision-making process

Once each interviewer has inputted their feedback into the hiring system, they now have all the data from all the interviews on which to base the hiring decision.  Google talks about how this is done:

Hiring committees are built into the Google hiring process. Research shows that teams with divergent opinions can make better, less-biased decisions, something that's key to selecting a great hire. At Google a hiring manager can say "no" to any candidate, but if they find someone they want to hire, they alone cannot give a final “yes” — they must pass the candidate onto a hiring committee for review. The idea is that a single hiring manager isn’t necessarily motivated to wait or search for the very best candidate. Especially as a search drags on, the hiring manager is eager to fill the position. But making a quick hire to satisfy a short-term need is not a long-term solution for an organization. Hiring committees help select candidates who will be good for Google, who will grow with the company, and perhaps take on future roles that don’t exist today.


Amazon talks about the Bar Raiser's role:

A Bar Raiser is an interviewer at Amazon who is brought into the hiring process to be an objective third party. By bringing in somebody who’s not associated with the team, the best long-term hiring decisions are made


Basically, it's not just the hiring manager's decision to hire. A group of people get together to look at the data collected, and they make an objective decision to hire or not. Contrast this to legacy orgs, where it's mostly the hiring manager, with a little feedback from HR, that makes the hiring decision.

Feedback

From a candidate's perspective, perhaps one of the most memorable aspects of the candidate experience is if, and how, feedback was delivered - and this is where Big Tech differs from legacy orgs. My Big Tech recruiters kept me in the loop throughout the process, and even when they didn't have feedback, they would send a mail to just say "we're still busy, hang in there". And once a decision was made, recruiters called me immediately to let me know the outcome. And even when the outcome was not positive, they still called, and let me know where I was strong, and where they felt I could improve.

Legacy orgs, on the other hand, almost never give feedback of any sort. Even once another candidate has been selected, they will not even inform you of the outcome. And this really leads to a poor candidate experience.

Compensation - not only cash

Big Tech companies offer equity - stocks, options, and others - as part of compensation. Amazon likes to think "big picture". Employees are considered part-owners of the company and they want you to think about what your total compensation is projected to be at the end of 4 years. Amazon, in its wisdom, seeks to, among other things, align the interests of the company with its employees. To do this, every employee's compensation is partially in RSUs. This form of compensation incentivises employees to think like owners and do what is best for the company in the short and long term.
Compared to legacy orgs, I've found that these stocks/shares really form an important part of your compensation, and as the stock price increases, so does your compensation.

Hiring has evolved over time

I've previously written a few posts about hiring - which, to be fair, were mostly all rants. From when I started off early in my career, to when I was a hiring manager and shared my learnings from hiring. My conclusion is that hiring across the board is flawed, but I think Big Tech is perhaps better at it than legacy orgs. I also think that there are a lot more resources to assist candidates now than there were a few years ago. My first ever interview, while still at university, was with Microsoft, for a graduate-level role. I first interviewed with Google in 2011, when they still asked brain teaser questions. My first interview with Amazon was in 2017. I was unsuccessful at all of those initially, but a few years later, with the help of the different resources available and some experience, I've managed to have some success.

Candidate Resources

The biggest learnings I had were from the frequent content that guys like Jeff H Sipe and Dan Croitor produce. I've listed below some other very useful interview resources.

Amazon

Google

Microsoft

Summary

From a nomenclature point of view: Interviews are one part of the Recruitment or hiring process. I have used the words Hiring and Recruiting interchangeably, but it seems there is a difference:

Hiring happens when you need to fill a role. Recruiting is the process of attracting top talent to your organization.

I think Big Tech understands Recruitment, while legacy orgs only do hiring! Legacy orgs don't invest in attracting talent, especially passive candidates. They don't have structured interviews, with a focus on collecting data. They don't make objective, group-based hiring decisions. Their compensation packages, lacking stock options, are not competitive. And they don't give necessary feedback to candidates. Overall, these legacy orgs' hiring practices result in a pretty poor Candidate Experience.

]]>
<![CDATA[WFH Setup]]>This is an update to my desktop setup, especially now that I am working from home.

WFH Desktop setup

The biggest learning for me was to

]]>
http://hacksaw.co.za/blog/wfh-setup/61a22afdd632790001607631Sat, 27 Nov 2021 13:30:34 GMTThis is an update to my desktop setup, especially now that I am working from home.

WFH Desktop setup

The biggest learning for me was to be comfortable, as I've suffered from lower back and shoulder pain for the last few years. And now WFH, with the low level of mobility caused by not walking to meetings, etc, has made it worse. So I've made a few changes, based on these guidelines. I had an ergonomist come out to assess me, and he made a few suggestions which have really helped.

I work exclusively on the desktop computer (Mac Mini), and not on the laptop at all. The mouse and keyboard are wireless, so I can bring them forward, and let my arms rest on the chair arms, while my wrists are on the desk, close to my stomach. I had something put under my feet to raise my knees, which has reduced lower back pain. And most importantly, I stretch and exercise often, to prevent my back from locking up. Push-ups, sit-ups, and planks are my daily routine, as well as 10 minutes on the stationary bike.

Another thing that has made it really comfortable to work is not wearing headphones. While they do give the best hearing and speaking experience, they hurt your ears if you're wearing them for hours on end. Even though I have a very comfortable Jabra Evolve, with leather padding over the ear covering, it meant I got stuffy, and was stuck/tethered to my desk. With a good external mic, and speakers, I can move around the room, even while on meetings.

]]>
<![CDATA[Running a Telegram bot on AWS Lambda]]>Telegram is a common messaging application, with over 500 million users. I've been using it daily since 2015, and I prefer it over WhatsApp due to many reasons: native PC/Mac applications (and not just a mobile app), cloud-based storage, the ability to run it with the same

]]>
http://hacksaw.co.za/blog/running-a-telegram-bot-on-aws-lambda/61827a36df8c970001cc6d9cWed, 03 Nov 2021 13:30:09 GMTTelegram is a common messaging application, with over 500 million users. I've been using it daily since 2015, and I prefer it over WhatsApp due to many reasons: native PC/Mac applications (and not just a mobile app), cloud-based storage, the ability to run it with the same identity concurrently on multiple devices, an open API, and open-sourced clients. It has supported bots since 2015, which are third-party applications that run inside Telegram. Users can interact with bots by sending them messages, commands and inline requests. You control your bots using HTTPS requests to the Bot API.

A few months ago, I wrote TelegramTasweerBot to control images and videos sent on groups and channels. It can be used to control Personally identifiable information (PII), specifically images of people. The bot deletes any image that contains a face, and now deletes videos and emojis of faces as well.

Instead of using the Telegram Bot API myself, I used python-telegram-bot (PTB), which is a wrapper/framework that makes it easier to write Telegram bots in Python.

Hosting and running bots

Telegram bots can be run in two ways:

  • Polling: periodically connects to Telegram's servers to check for new updates
  • Webhook: Whenever a new update for your bot arrives, Telegram sends that update to your specified URL.

For the first few months, I ran the bot using polling (see the code). It ran as a python app, running on an AWS EC2 instance. The challenge here is making sure the bot is always running (I used monit), making sure the instance is always running, and deploying new changes to the code. You have to think about these things, and solve them. And the fact that the instance is running 24x7 means you're paying for it all the time, even if the bot is not used. That's why I wanted to move it to AWS Lambda - serverless computing, where your code only runs when it's needed, and you only pay when it runs. And because you don't have to worry about the instance where the code runs, or its uptime and maintenance, you don't have to think or worry about that Ops stuff. And the most important thing for me: deployment of the code is super easy. To make it even easier, I've used AWS SAM to define and deploy the bot - SAM takes care of deploying all the AWS services required: API Gateway, Lambda, DynamoDB, and all logs in CloudWatch.

So since last week, TelegramTasweerBot can now be run on AWS Lambda as well: this is the Lambda handler, and this is the SAM template that defines it all. The architecture looks like this:

TelegramTasweerBot - Architecture

All it takes to build and deploy this in your own AWS account, with your own bot token, is to clone/download this locally and run sam build && sam deploy - and that's it. Check out the README for more details.
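
One step that may still be needed, depending on how the bot registers itself, is to point Telegram at the API Gateway endpoint that SAM outputs. A quick sketch, assuming the bot token is in BOT_TOKEN and the endpoint URL is in API_URL:

# Register the API Gateway endpoint as the bot's webhook
curl "https://api.telegram.org/bot${BOT_TOKEN}/setWebhook?url=${API_URL}"

# Confirm that the webhook is set
curl "https://api.telegram.org/bot${BOT_TOKEN}/getWebhookInfo"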

I can now get stats on how many times the bot was invoked, how long it ran, errors, etc, all from CloudWatch:

Monitoring

Optimising Cost and Performance

Lambda allows you to allocate a specific amount of memory to a Lambda function, which dictates how it performs, and thus the cost as well. What's interesting is that there is a balance between performance and cost: if you allocate less memory, the price per millisecond is cheaper, but the function runs slower, so it can actually cost more; on the flip-side, if you allocate more memory, it might be cheaper to run because it runs faster, even though the increased memory costs more. So there is actually a sweet spot you can target: the right amount of memory that makes your Lambda function run faster and cheaper. To help figure out what that sweet spot is, I used AWS Lambda Power Tuning to test different configurations, measure the running times, and calculate the cost of each run.

AWS Lambda Power Tuning Results

It takes just a few minutes to set up and run, and the results show that my bot runs fastest and cheapest with 1024MB of memory.
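
Applying that sweet spot is then a one-liner (the function name here is hypothetical):

# Set the function's memory to the value Power Tuning recommended
aws lambda update-function-configuration \
  --function-name TelegramTasweerBot \
  --memory-size 1024

Since the bot is deployed with SAM, the cleaner option is to set the MemorySize property on the function in the SAM template, so the setting survives redeploys.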

]]>
<![CDATA[Working with different git branches]]>This is more of a note to my future self.

I was working on a wiki, hosted on MkDocs, that you contribute to by making changes and pushing to GitLab. The process to submit a change is:

  • fork the repo on gitlab
  • clone the repo locally: git clone ...
  • Create a branch:
]]>
http://hacksaw.co.za/blog/working-with-different-git-branches/617bbfcfdf8c970001cc6c34Tue, 02 Nov 2021 15:57:38 GMTThis is more of a note to my future self.

I was working on a wiki, hosted on MkDocs, that you contribute to by making changes and pushing to GitLab. The process to submit a change is:

  • fork the repo on gitlab
  • clone the repo locally: git clone ...
  • Create a branch: git checkout -b improv/new-demo
  • Make your changes
  • Add and commit local changes
  • git add docs/
  • git commit -m "improv: Add new demo"
  • Push the changes to GitLab: git push -u origin improv/add-demo-X
  • Create a new merge request from GitLab

That's fairly straightforward. However, I was making two separate changes at the same time, each on different branches, and kinda messed up switching between the different branches. I also had to pull a fresh copy from the origin before making the second branch, to ensure I was working on the latest. This is how I set myself straight.

Some concepts first

There is a git server, perhaps GitHub, GitLab, AWS CodeCommit, etc, hosting a project you want to contribute to. This is called the remote repository. If you were the owner, you could simply make the change directly. E.g. in GitHub, you could click the 'Edit this file' button, but more likely you would:

  • clone the repo locally (on your laptop, or a cloud IDE), using the SSH or HTTPS links: git clone git@github.com:jojo786/TelegramTasweerBot.git. In this case, you cloned the default branch (main or master) branch.
  • make the change, and commit: git commit -am "My Change".  Usually you would make a new branch, which is a pointer to your changes.
  • push it back: git push. The git push command is used to upload local repository content to a remote repository. Pushing is how you transfer commits from your local repository to a remote repo. You didn't create another branch first, so you have pushed straight to master. The full command is actually git push origin main, where origin is the name of the remote repo, and main is the branch.

However, if you want to make a change to another repo, owned by someone else, you can't just push back to it. Rather, you fork the repo, then go through a process called a Pull Request or Merge Request (usually shortened to PR), in which you submit your changes, and they can merge them in. They are the same thing: a means of pulling changes from another branch or fork into your branch and merging the changes with your existing code. So the process would be:

  • Fork the repo
  • clone your fork  locally
  • create a branch: git branch crazy-experiment or git switch -c crazy-experiment
  • Make changes, then git commit -am "message..."
  • push to the new branch on the remote: git push -u origin crazy-experiment. This creates crazy-experiment branch on the remote
  • Then make a Pull Request to ask the owners of the repo to merge your code in.
  • Once the PR is approved, the crazy-experiment branch is deleted both locally and on the remote, and just the main branch is left.

But there are now three different copies of the data:

  • the upstream (GitLab, GitHub, etc)
  • my fork on the remote - which is now my remote origin
  • and my local clone on my laptop

But at any time, the upstream may include other changes that my fork and local clone are not aware of.

So back to my situation. I have cloned the remote, made a new branch, pushed my changes, and submitted a PR. I then needed to submit another separate/independent change, so I needed to create a new branch. However, my clone of the repo locally, which is from my fork, might have been stale, as changes could have happened to the upstream since I cloned it. I could have simply deleted my local folder, forked and cloned again, to get the freshest copy. But I wanted to see what it takes to get my local repo, fork and upstream in sync. There are different ways to sync a fork, but I preferred this one:

  • After you fork and clone, add the original origin as another repo called upstream: git remote add upstream https://github.com/aws-samples/serverless-patterns
  • Make your changes, push, and create a PR
  • After your pull request has been accepted into the upstream repo: Switch to your local main branch: git checkout main
  • Pull changes that occurred in the upstream repo: git fetch upstream
  • Merge the upstream main branch into your local main branch: git merge upstream/main
  • Push changes from you local repo to the remote origin repo: git push origin main

Hope you enjoyed it!

]]>
<![CDATA[How South Africa’s need for energy efficiency can lead us to cloud computing]]>Cloud computing is not a new concept. Amazon Web Services (AWS) was the first to offer cloud services when it launched its Amazon Simple Storage Service (S3) storage service in 2006. Gartner estimates that the Cloud computing market could be worth over $300 billion globally in 2021. Cloud has spawned

]]>
http://hacksaw.co.za/blog/how-south-africas-need-for-energy-efficiency-can-lead-us-to-cloud-computing/61703a16a6488e000137b6caWed, 20 Oct 2021 16:01:04 GMT

Cloud computing is not a new concept. Amazon Web Services (AWS) was the first to offer cloud services when it launched its Amazon Simple Storage Service (S3) storage service in 2006. Gartner estimates that the Cloud computing market could be worth over $300 billion globally in 2021. Cloud has spawned a new generation of born-in-the-cloud companies, like Netflix, that have been built primarily in the cloud. Large enterprises like Time are going all in and migrating five of their global datacenters to AWS. But there are many other companies that are still hesitant about moving to the cloud, because they're not sure of the value that they can gain from it.

In this post, I will attempt to use an analogy to highlight the power of AWS, that will hopefully allow customers to see how AWS can be leveraged to transform their business.

The Power of Analogies

Analogies are persuasive and powerful. They use our familiarity with objects and models we know and understand, to force us to make a mental leap towards something new. They are used in many places including physics, and business, where the power of analogy is used when defining strategy. There are existing cloud analogies that perhaps some of us are familiar with, and the most common one being about the power grid:

"Cloud computing is like plugging into a central power grid instead of generating your own power."

That's easy to understand - as a company, you need electricity to run your business, so instead of generating your own power, you simply get power from the power grid. Generating your own power would entail a host of things, e.g. buying expensive power equipment, hiring and training power experts - all of which are most likely not related to your actual business. But getting power from the power grid would allow you to focus on your business, and lets the power utility company worry about those other things. So to close that analogy then - cloud computing allows you to simply focus on your business because you can rely on the IT resources and expertise provided by your cloud provider. These resources include compute, storage, database, AI/ML and many others - which the cloud provider focusses on building and supporting, and you just use them as and when required. Sounds simple, but somehow it does not resonate with many businesses. For a few reasons, I think. Firstly, many businesses don't recall a time when you had to generate your own power. Roughly from about the 1870s, we have relied on the power grid to supply us with electricity. Secondly, there aren't really any other choices of getting power, except from the power grid. In South Africa, Eskom is the only option, so businesses haven't had to think about where to get power from, they can only use Eskom. And thirdly, and perhaps almost ironically, Cloud providers are now in fact generating their own power: Amazon is going to make its own electricity in SA, then run it across Eskom's grid. So businesses in South Africa don't identify with that analogy. I therefore wanted to use an analogy that we in South Africa will hopefully identify with, and use it as a vehicle to talk about how AWS can transform your business.

Gas Geysers as an analogy for cloud computing consumption

In our domestic lives, most of us rely on some type of appliance to supply us with hot water. In South Africa, most of us use geysers - a tank type of water heater: a cylindrical container that keeps water continuously hot and ready to use, and which is usually powered by electricity. Traditional electric geysers have always been the default choice for most South Africans. Until recently, electricity was cheap, but considering South Africa's energy crisis, I recently embarked on a journey to switch away from electric storage geysers towards something more energy-efficient. In the end, I chose an instant gas water geyser to replace our electric geyser at home, for two main benefits:

  • This is an instant geyser - which means water is heated only when the tap is opened, using less energy. Unlike storage geysers which use electric elements to constantly heat the water in the storage tank.
  • It uses gas, specifically LPG gas (which has a calorific value 2.5 times higher than mains gas, so more heat is produced), instead of electricity.

And that brings me to my main point of this post: I propose that cloud computing is like using an instant gas geyser, that saves you cost and makes you energy efficient - two things which are important for South Africans. I want to expound on a few specific points to make it more clear.

For the most part, this analogy will also apply to other water-heating appliances like solar geysers and heat pumps.

Right sizing

Companies really only have two choices when sizing and purchasing infrastructure: either under-size (sized for normal usage) or over-size (sized for peak usage). If you care about making sure your service is always available, you will over-size, causing your infrastructure to sit idle for most of the time. With our geyser analogy, this is exactly what we have been doing. Electric storage geysers have been heating water constantly, because we don't know when we will need the hot water. This means that hot water was constantly being heated, and only used a few times a day, obviously resulting in wasted energy. Additionally, the size of the geyser is an important consideration for how much water the electric storage geyser can store. Your decision will be based on the size of the family. Getting it wrong will mean a cold shower on certain days when everyone is showering at the same time.

With the instant gas geyser, you only heat the water when you need it, and save on wasteful and unnecessary heating. In addition, since it heats water on-demand, you won’t ‘run out’ of hot water - the amount of hot water you have access to is not limited to the size of the geyser.

With AWS, you don't ever need to oversize. Firstly, you don't need to purchase upfront, rather you can use a service only when you need it, with no long term commitments. This allows you to right size, and choose just the capacity you need. Secondly, AWS allows you to achieve elasticity and scale, so you only use capacity when you need it.  

Save Costs

Electric storage geysers are wasteful, which means your electricity bill is higher than it needs to be, because it is constantly heating the water, even when you don’t need it. On the other hand, instant gas geysers only use gas when hot water is needed, resulting in you only paying for gas based on actual usage.

Similarly, by purchasing servers up front, and keeping them on all the time, to only use it when you service peaks, is wasteful. With AWS, you only pay for what you use, thereby reducing cost. You don't pay to keep servers idle.

Resistant to failure

Power outages, which are frequent in South Africa due to load-shedding, renders your electric geyser unable to heat water. Instant gas geysers just need a small battery to ignite the flame, and are thus resistant to power outages.

Similarly, on-premise infrastructure is susceptible to outages caused by power failures, even with batteries and generators. AWS, however, can afford to dedicate huge amounts of resources and controls to making AWS regions resistant to power failures. At re:Invent 2020, AWS described how they have designed custom power supplies to protect their infrastructure. It's very difficult, even for large corporations, to have the expertise and resources to build such highly resilient infrastructure that is impervious to power failures.

Operational change

Typically, electric storage geysers are set to heat the water to about 65 degrees Celsius, or even higher. However, people only need hot water at about 45 to 48 degrees - anything hotter will scald. This means that when you're having a shower, you have to mix the water, by opening both the hot and cold water taps, to get the temperature to somewhere between 45 and 48. So why waste so much energy to heat water to a temperature higher than is actually needed? This is done for a few reasons: as you open the hot tap and hot water flows out of the geyser, cold water starts flowing into the geyser, dropping its actual temperature. This won't have a real effect until you have a lot more cold water in the geyser in a short space of time, e.g. a shower longer than about 10 minutes, before the geyser has time to heat it back up to 65. So the higher temperature offsets the cold water flowing in. Clearly electric storage geysers are very inefficient, as they need to over-heat the water, because water can only be heated in its storage compartment.

Instant gas geysers are very different, because they don't store water, so they can heat water only when it's required, in a just-in-time manner. Therefore, there is no need to over-heat the water above the actual required temperature of 45 to 48. So when you open the hot water tap, the water comes out at the desired temperature, and there is no need to mix in cold water.

So the way you use and operate an electric storage geyser is very different to an instant gas geyser. Users of electric geysers need to know that the water coming out of the hot tap is going to be a scalding 65 degrees, requiring them to mix in cold water to get it to a comfortable 45. On the other hand, users of instant gas geysers just open the hot tap, and don't need to mix in cold water. They also don't have to worry about running out of hot water, whereas users of electric storage geysers will run out of hot water after a long shower, and will need to wait a while for the electric geyser to heat up the water again.

So besides the obvious cost savings, they require different ways of usage. The same operational differences can be called out between on-premise infrastructure and Cloud. With on-premise, you need to over-size your purchase (the same as heating water higher than the actual required temperature), while with cloud, it will scale and give you the capacity where and when required. So you don't need to plan up-front for 3 to 5 years, rather just use the cloud for what you need. So in our analogy, the cloud gives you the water at the perfect temperature you need, while on-premise infrastructure requires you to mix in cold water, and worry about running out of hot water. These different operational models require a significant change in mindset to take maximum benefit.

Understand your drivers for change

When I migrated from the electric storage geyser to the instant gas geyser, I knew exactly why I was doing it: to save costs, and be more energy efficient. In the same way, when you move to the cloud, you need to be sure what is driving you there. Gas geysers cost more than electric geysers, so I knew up front that even though the initial outlay would cost more, I would save over time.
Most customers moving to AWS initially start talking about cost savings, but the number one reason customers choose to move to the cloud is for the agility they gain. Knowing this, customers moving to AWS can manage budget expectations in discussions between the CIO and CFO.

Summary

Analogies aren't perfect, and can't cover every use-case. Our analogy can't cover the full ambit of cloud offerings from AWS: over 200 services, which include over 15 types of purpose-built databases, and the global footprint of 25 regions, one of which was the Africa (Cape Town) region, launched in April 2020.

However, it's still a useful analogy, especially for someone trying to understand the benefit cloud computing could bring to their company. And, to complete our analogy, we can say that cloud computing is like switching from an electric storage geyser to an instant gas geyser – it will help you be more energy-efficient, save costs, and be resistant to power failures!




]]>
<![CDATA[IaC or: How I Learned to Stop Worrying and Love AWS (sic)]]>Let's say you're building something in AWS. A typical architecture - perhaps a 3-tier web-app: Load-balancer, app, and database. Maybe you want to use a container platform to run the app, so you will need ECS or EKS. Besides that, you will need VPCs, and subnets, and CloudWatch...

]]>
http://hacksaw.co.za/blog/iac/604638e9af39540001db9ddeMon, 08 Mar 2021 14:49:43 GMTLet's say you're building something in AWS. A typical architecture - perhaps a 3-tier web-app: Load-balancer, app, and database. Maybe you want to use a container platform to run the app, so you will need ECS or EKS. Besides that, you will need VPCs, and subnets, and CloudWatch... This is how the build typically goes...let's go!

You used the console to create your container stack

  1. You created a VPC
  2. and an ALB, EFS
  3. Then the pipeline: CodeCommit, Build, Deploy
  4. then a cluster in ECS
  5. then had to figure out IAM policies

...and now you worried. Will you remember what you did, in what order? Will you be able to replicate it? Did you just adopt a new pet?

So you got tired of ClickOps, and tried the CLI

aws ec2 create-subnet --vpc-id vpc-2f09a348 \
--cidr-block 10.0.1.0/24 \
--availability-zone eu-west-2a \
--region eu-west-2
aws elbv2 create-load-balancer --name ecs-ghost \
--subnets subnet-65abf80c subnet-72f65b3e \
--security-groups sg-04a1a9b583455f819 \
--region eu-west-2

aws ecs create-cluster --cluster-name ghost --region eu-west-2

aws codepipeline create-pipeline --cli-input-json \
file://codepipeline-2-deploy-ecs.json —region eu-west-2

...and you're still worried. There were many parameters that needed to be customised. It's gonna take time to rebuild this...still looks like a pet? The Ops team love it...but the Devs don't even know where to begin.

So you got smarter and tried the SDK...

import boto3

ec2 = boto3.resource('ec2')
# create VPC
vpc = ec2.create_vpc(CidrBlock='172.16.0.0/16')

import boto3
# Create ECS client
try:
  ecs_client = boto3.client('ecs')
  
  response = ecs_client.create_cluster(
    clusterName='CLUSTER_NAME'
  )
  print(response)

except BaseException as exe:
    print(exe)


So your developers are happy, because it looks like code...but they don't know what this VPC thing is, or how to link it to the cluster....so you're still worried

Then you heard about CloudFormation!



After you got over the YAML shock, it looked awesome. You could model all your infrastructure, put it in stacks, and modify the stacks. So you started writing some CFN templates, and now have lots of cattle!
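
A taste of what that YAML looks like: a minimal sketch covering just a VPC and an ECS cluster (the file and stack names are assumptions):

# Hypothetical minimal template - just a VPC and an ECS cluster
cat > ghost-stack.yaml <<'EOF'
AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal sketch of a VPC and ECS cluster for the Ghost blog
Resources:
  GhostVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
  GhostCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: ghost
EOF

# Create (and later update) the stack from the template
aws cloudformation deploy --template-file ghost-stack.yaml --stack-name ghost-blog --region eu-west-2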

To make it easier to write CFN for new resources, you can use ConsoleRecorder
But what about the existing resources you’ve built by hand? Use  Former2

But....your devs hated this YAML stuff. They needed to know about VPCs, and Internet Gateways....and all they wanted was a cluster to run their code.

Then you discovered CDK, and learned to stop worrying

This single CDK construct will build a VPC, ALB, ECS Cluster, IAM roles, pulls the container, and run it:

lbecs = aws_ecs_patterns.ApplicationLoadBalancedFargateService(self, 'ECS_Fargate', 
            memory_limit_mib = 1024,
            cpu = 512,
            desired_count = 1,
            task_image_options = ghost_task_image,
        )


Your devs love it, because it looks like code. They don't need to know about VPCs, and all that stuff. They just get what they want.
How does it do that? This is how:

AWS CloudFormation was used to reliably and consistently provision the resources they needed, but the team discovered an unmet need. Although AWS CloudFormation was the right tool for provisioning resources, the team felt that using YAML/JSON was not the right approach for describing their system. AWS CloudFormation templates are basically a flat list of resources and their configuration. They don’t include tools for expressing abstract ideas such as “the injection pipeline” or the “storage layer” or a “dynamodb scanner.”

So why CDK?

  • Looks like code
  • Lives with your code, so its treated like code: git, IDE, linting, unit-tests, etc
  • Higher-level Constructs means it creates full services with sensible defaults, correct IAM privileges, and best practices
  • CDK == Well Architected (or close enough)
  • CDK Patterns takes it even further, building full use cases

Alternate Ending: AWS Copilot

https://aws.amazon.com/blogs/containers/introducing-aws-copilot/

]]>
<![CDATA[The Most Under-Appreciated AWS Service]]>

I knew this was gonna be a tough ask: to identify the most under-appreciated AWS service, and to expound why that is. Tough for a few reasons I think, mostly because we can't even agree on how many services AWS has. Officially AWS sticks to their usual "over

]]>
http://hacksaw.co.za/blog/the-most-under-appreciated-aws-service/5ff5beb4de6ef300010bfbb0Wed, 06 Jan 2021 13:50:37 GMT

I knew this was gonna be a tough ask: to identify the most under-appreciated AWS service, and to expound why that is. Tough for a few reasons I think, mostly because we can't even agree on how many services AWS has. Officially AWS sticks to their usual "over 175 services", probably because that would mean admitting to the exact number of services with horrible names. Cloud Pegboard has it pegged (see what I did there?) at 283 services, 219 total IAM prefixes, 191 product listing counts, 182 namespaces and 159 URLs. Pick whichever you want, that's a lot. But those are just the services you can touch and feel, and not the underlying stuff. E.g. all those services are the stuff that companies and developers want (read need), like the multiple types of compute or 15 choices of database services, and even things that developers say they need (but not really) like kubernetes.

But like I said, those are all the services that are listed in the catalogue, and not the hidden, below-the-line muck that supports all those hundreds of services. And that's where I think the under-appreciated ones are. But to truly appreciate it, we need to step back and see how large corporates (which I enjoy picking on, like the fat kid in the park) have been building products and the underlying infrastructure prior to the cloud (and the head-in-the-sand denialists - I'm looking mostly at you, telcos). I could easily summarise any project I worked on until a few years ago, which typically went like this: Business launches a project to build some fancy product, writes a bunch of requirements docs for a few months that nobody reads (or reads the wrong version, oops!), which leads to a specification document from IT that asks for budget to purchase the required infrastructure. That Bill of Materials (BOM) usually includes racks, network switches, cabling, servers, load-balancers, storage, cards, software licences and a bunch of consultants, all of which will take months to procure, install and set up. Somewhere near the end (after the project is already a month late because they needed to re-install due to the driver not being compatible with the OS), the consultants declare that the application is installed and working, ready for UAT. When you look at the connection details for the URL, you realise it's something like http://10.1.34.58 - WTF! Which usually means the project manager forgot to include the request to the network team for a public IP and DNS name, which in turn needed the security team to first approve the firewall request to open the ports to the Internet. And that's the point of my long-winded rambling: on-prem networking is hard! It's hard because many different teams besides the developers (and the consultants, who are now quoting for more because the scope has increased), like the networking and security teams, have to be involved to make that application available on the network. And that's just the tip of the iceberg: I recall multiple times where it took weeks just to request new IP address space, then route those IPs out on the MPLS network, then try to convince the security team to open a bunch of ports so the front-end can speak to the back-end. Or where return-path traffic is routed completely differently, causing latency and dropped packets. And that's even before you wanted to install kubernetes on-prem!

Yet in the cloud with AWS, you can spin up instances that come with public IPs and DNS names that are immediately routable on the Internet. Or make your RDS DB instance open to the internet with a single config (don't do it, even though you can). You don't even need to think about it - it just works. And that's why I think networking is the most under-appreciated AWS service.

Sure, networking is a broad term, and AWS does have networking services, but they don't typically look like the ones you would have bought on-prem. With AWS, you don't get to choose the top-of-rack and end-of-row switches (ha, I knew that CCNA 10 years ago would come in handy), and decide which SFP you need (and inevitably choose the wrong one). The networking team don't get to log onto the router and break MPLS. And talking of security, where is the all-mighty firewall - the most important device stopping you from getting hacked? Until 2 months ago (November 2020, when they released AWS Network Firewall) AWS did not offer a service with the word “Firewall” in the name. For that, you use security groups as a virtual firewall, and that's why old-school security companies were banging on about the lack of a dedicated firewall service on AWS. So how does AWS do all of this networking magic? It uses custom-made routers with software-defined networking:

“We run our own custom-made routers, made to our specifications, and we have our own protocol-development team,” Hamilton said. “It was cost that caused us to head down our own path, and though there’s a big cost (improvement) . . . the biggest gain is in reliability.” This custom-made gear “has one requirement, from us, and we show judgment and keep it simple. As fun as it would be to have a lot of tricky features, we just don’t do it, because we want it to be reliable.”

and that's what allows you to create VPCs, subnets and security groups, attach Elastic IPs, and still log all traffic - literally with just a few commands or clicks. There's no way this could be done on-prem, even with a lot of effort.
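
To make "a few commands" concrete, here is a rough sketch using boto3, the Python SDK I use elsewhere on this blog. The CIDR ranges, resource names and the flow-logs bucket below are made-up placeholders, not a recommended design:

# A minimal sketch of building a small network on AWS from code, using boto3.
# All names, CIDR ranges and the S3 bucket ARN are illustrative placeholders.
import boto3

ec2 = boto3.client("ec2")

# A VPC and a subnet inside it
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]
subnet = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24")["Subnet"]

# A security group acting as the "virtual firewall", allowing HTTPS in
sg = ec2.create_security_group(GroupName="web-sg", Description="Allow HTTPS", VpcId=vpc["VpcId"])
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)

# An Elastic IP (to be attached to an instance or NAT gateway later),
# plus VPC Flow Logs so all traffic is still captured
eip = ec2.allocate_address(Domain="vpc")
ec2.create_flow_logs(
    ResourceIds=[vpc["VpcId"]],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::my-flow-logs-bucket",  # placeholder bucket
)

print(f"VPC {vpc['VpcId']}, subnet {subnet['SubnetId']}, EIP {eip['PublicIp']}")

That short script is the entire "network build": no switches, no cabling, no change requests bouncing between three different teams.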

And that's the muck that AWS builds, so you don't have to. And until now, you didn't even know it.

]]>
<![CDATA[Keeping active]]>2023
Strava summary for 2023

2021

Total Active Days: 98 (up by 11 days)

Total Time: 30 hours (up 1 hour)

Total distance: 420 km (up 120 km)

2020

These are my Strava stats for 2020:

Summary of 2020:

  • 87 total active days
  • Most active day: Wednesday
  • Total time: 29 hours
]]>
http://hacksaw.co.za/blog/keeping-active/5ff4922cde6ef300010bfb5eTue, 05 Jan 2021 16:30:40 GMT2023
Strava summary for 2023

2021

Total Active Days: 98 (up by 11 days)

Total Time: 30 hours (up 1 hour)

Total distance: 420 km (up 120 km)

2020

These are my Strava stats for 2020:

Summary of 2020:

  • 87 total active days
  • Most active day: Wednesday
  • Total time: 29 hours
  • Distance: 293 km
  • Longest activity: 13 km

Let's compare 2020 vs 2019:

2019 was 20 hours in total, with Sunday being my most active day. Of the 144 km, most came from the bike.

So I improved in 2020 by:

  • 9 hours more
  • 149km more


2019

Check out 2019

Total Time: 20 hours (up 1 hour)

Total distance: 144 km (up 120 km)


]]>
<![CDATA[Reading List - Updated]]>

I used to be a voracious reader through primary school and high school, but it kinda stopped somewhere during campus days. I have rekindled it recently, especially with the ability to read on-line (my preference is Google Play Books on Android), although I also still read soft-cover books,

]]>
http://hacksaw.co.za/blog/reading-list-updated/5ff48218de6ef300010bfb1bTue, 05 Jan 2021 16:00:25 GMT

I used to be a voracious reader through primary school and high school, but it kinda stopped somewhere during campus days. I have rekindled it recently, especially with the ability to read on-line (my preference is Google Play Books on Android), although I also still read soft-cover books that I keep on my desk at work or in the lounge at home.

It seems that my preference is to read the sample on Google Play, which is about 4 chapters or so of the book, then purchase it if I like it. I sometimes read samples of a few different books, then purchase the one that caught my attention the most, and come back to the others later.

This is a brilliant article that talks about the way you read books and what it says about your intelligence.

My tastes

This includes tech, motivation (but mostly around tech), fiction, and wildlife. I really enjoy Scott Berkun, and I have a few of his books below. I discovered a few of the others when I read Berkun's Myths of Innovation, where he referred to other books on innovation, so I have a few of those in the list as well.
I enjoyed Gene Kim's The Phoenix Project, so I have his DevOps Handbook in the list.

I've got most of IT Revolution's books.


Up-coming

This is the list of books I have in my Google Play bookstore, and have read most of the samples for, that I will choose from next to purchase. My two favorite authors are Gene Kim and Scott Berkun, so I'm checking out their next books.

  • The Year Without Pants: WordPress.com and the Future of Work
  • Confessions of a Public Speaker – Scott Berkun
  • Lean Enterprise, How High Performance Organizations Innovate at Scale – Jez Humble
  • The Lean Startup, How Constant Innovation Creates Radically Successful Businesses – Eric Ries
  • The Startup Way, How Entrepreneurial Management Transforms Culture and Drives Growth – Eric Ries
  • Extreme Ownership, How U.S. Navy SEALs Lead and Win – Jocko Willink, Leif Babin

I will be looking to add these books that Gene Kim recommends, as well as these recommended by Chris Richardson, and Gregor Hohpe's Architect's Path.

I'm looking at these DDD-related ones as well:

Completed

This is the list of books I've read (most recent at the top):

2021

2020

2019

  • Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations: Coming off the work that Gene Kim did on the State of DevOps Reports each year, this details all the research they did, which shows that DevOps practices will improve your bottom line.
  • Build APIs You Won't Hate - Phil Sturgeon: It's not a normal book to read, but has code samples to follow along with, and parts that you really need to think about and apply to your API development. I wrote a review based on the main points that really helped me with my API development journey.
  • My Family and Other Animals - Gerald Durrell. How a young boy, living on a Greek island, picked up his love for animals and became a world-famous zoologist. I've also read the other books in this series.
  • Zoo Quest: The Adventures of a Young Naturalist - David Attenborough. The Zoo Quest series is about how a young David went on animal collecting expeditions to different countries.
  • The Trouble with Africa – Vic Guhrs
  • Cry of the Kalahari - Mark Owens
  • The Unicorn Project

2018

  • Huawei: Leadership, Culture, Connectivity. I have worked with Huawei so many times during my career and was always fascinated by their ethos. This provides a glimpse into what makes them tick.
  • DevOps for Digital Leaders, Reignite Business with a Modern DevOps-Enabled Software Factory

2017

  • Fumbling the Future: How Xerox Invented, then Ignored, the First Personal Computer: about real innovation, leadership, the structures and politics of companies, and how they influenced it. I wrote a post that can act as a book review.
  • Myths of Innovation - Scott Berkun

2016

  • The Phoenix Project - Gene Kim: This is the IT version of The Goal. It accurately describes what IT Operations people do. This also answers the question "what do you do" when asked by your grandmother.
  • Crucial Conversations

2015

  • Making Things Happen - Scott Berkun
  • Critical Chain - Eliyahu M. Goldratt
  • It's Not Luck - Eliyahu M. Goldratt
]]>