18 min readPart 2/4 7 February 2020

Before any trip, better check your equipment

Tools, AWS services and third-party frameworks that speed up different parts of the process.

AWS CodePipeline
AWS Amplify
Serverless Framework
Terraform

Karol Junde CTO & Co-Founder

After reading the first part of this series, you should have a general idea about the business and technical goals that have a direct impact on the choice of event-based architecture. This time, I am going to cover tools, AWS services and third-party frameworks that speed up different parts of the process. I will also show you some values we got and problems we came across.

Equipment

Basically,we’ve collected tools that are used during event-based scenarios (serverless for some people) in one bag called “BaseGear”:

Serverless Framework
Terraform/Terragrunt
CI/CD pipeline for serverless/terragrunt deployments — designed and implemented by our teammate, Magda
Amplify
Vue.js

Serverless Framework

We have already featured the core functionality of the Serverless Framework in one of our previous articles, so we’ll skip the introductions here. However, let me give you some advice that will, hopefully, be valuable to you:

We always build separate folders per microservice, per needed lambda layer, and create a new “serverless project” within that. This particular project had the following structure:

~/common/bango

.├── customer-bango-backend├── customer-bango-algorithm-equipment-layer├── customer-bango-aws-custom-logging-layer├── customer-bango-aws-ssm-cache-layer├── customer-bango-frontend└── customer-bango-image-terragrunt

~/common/bango/customer-bango-backend

.├── README.md├── algorithms-service├── analysis-service├── system-health-service└── terraform

~/common/bango/customer-bango-backend/algorithms-service

.├── files├── functions├── messages├── node_modules├── package-lock.json├── package.json├── requirements.txt├── resources├── serverless.yml└── tests

We try to separate databases, s3 buckets definitions from serverless stack especially on “prod” environment. We try to put such services together with the rest. We call them static resources and define them via Terraform code. Serverless Framework has got CloudFormation behind the scenes, so it manages your resources completely. Keep in mind that one simple mistake can accidentally delete serverless resources, like DynamoDB tables. So, either set the DeletionPolicy to Retain or move the definition outside the stack, like we do. Better safe than sorry…
Use exclude (exclude and include allow you to define globs that will be excluded from the resulting artifact) to keep control over the packaging process and pack only the code you need.

exclude:   - node_modules/**   - .requirements/**   - env/**   - README.md   - package.json   - package-lock.json   - requirements.txt   - ssm_cache/**

DynamoDB Streams definition and its serverless.yml problem

layers:   - arn:aws:lambda:${self:provider.region}:#{AWS::AccountId}:layer:bingo-aws-custom-logging-${self:custom.stage}:${self:custom.logging-layer-version}events:  - stream: arn:aws:dynamodb:#{AWS::Region}:#{AWS::AccountId}:table/${self:custom.algorithms-tablename}/stream/2019-11-20T16:15:11.647  - stream:    type: dynamodb    arn:      Fn::GetAtt:        - AlgorithmsMetadataTable        - StreamArn    batchWindow: 3    startingPosition: LATEST    enabled: True

Aws documentation:

Lambda polls shards in your DynamoDB Stream for records at a base rate of 4 times per second. When records are available, Lambda invokes your function and waits for the result. If processing succeeds, Lambda resumes polling until it receives more records.

To avoid less efficient synchronous Lambda function invocations set batchSize or batchWindow parameter. The batch size for Lambda configures the limit parameter in the GetRecords API. DynamoDB Streams will return up to that many records if they are available in the buffer, whereas the batchWindow property specifies a maximum amount of time to wait before triggering a Lambda invocation with a batch of records.

The issue with the configuration in serverless.yml is that when you use the schema presented below, it doesn’t respect the Lambda function trigger configuration seen in the AWS Console.

events:  - stream:    type: dynamodb    arn:      Fn::GetAtt: [AlgorithmsMetadataTable, StreamArn]  - stream:    type: dynamodb    arn:      Fn::GetAtt:        - AlgorithmsMetadataTable        - StreamArn    batchWindow: 3    maximumRetryAttempts: 10    startingPosition: LATEST    enabled: True

We suggest moving the configuration of the DynamoDB Stream and the DynamoDB Table to a separate file in another directory. It simply helps keep the structure tidy and organized:

resources: - ${file(resources/new_algorithms_tablename.yml)}

Lambda function trigger configuration in AWS console:

Lambda layers definition - each layer we use for Lambda functions is also defined inside the serverless file regulated with proper version variable:

layers:  - arn:aws:lambda:${self:provider.region}:#{AWS::AccountId}:layer:bingo-aws-custom-logging-${self:custom.stage}:${self:custom.logging-layer-version}  - arn:aws:lambda:${self:provider.region}:#{AWS::AccountId}:layer:bingo-aws-ssm-cache-${self:custom.stage}:${self:custom.ssm-cache-version}

Generally, I haven’t come across any problems with layers defined this way. However, I did notice one thing about CloudFormation containing layers’ parameter values. Let’s use real case as an example:

We published changes for selected group of Lambda functions via CloudFormation stack. Below, I pasted only the necessary part of parameters section:

The point is that we’ve automated the whole process. We coded a function for getting the latest, available Lambda version, and then publishing a newer one accordingly. However, there is no information in AWS documentation that you can only collect last 50 versions (we had over 60 that time) via a single API call.

his led to a problem, because we thought we were publishing new function version (via def publish_new_version(self, uploadId) presented below) and re-pointing the prod alias to it. So, as you can see below, we had a situation where AWS Console version greater than 270 was available, but get_latest_published was returning only 50 items with the maximum value of 185, and that was a value the alias has been pointed to. Moreover, it has generated additional problems, because each time we’ve updated the layer version, it wasn’t seen by the Lambda function.

To sum it up, if you’ve exceeded 50 function versions, combined with aliasing, and your Lambda layers/functions updates workflows are being done via CloudFormation/Boto3, use the NextMarker.

versionsPublished = ['$LATEST', '9', '13', '17', '21', '25', '29', '33', '37', '41','45', '49', '53', '57', '61', '65','69', '73', '77', '81', '85','89', '93', '97', '101', '102', '105', '109', '113', '117', '121','125', '129', '130','131', '132', '133', '137', '141', '145', '149','153', '157', '161', '165', '169', '173', '177', '181', '185']

Getting the latest version (with NewMarker key) of Lambda function, will allow you to collect more than 50 versions (especially the most recently published one) via Lambda API:

Publish new version of Lambda function:

def publish_new_version(self, uploadId):    try:        response = self.client.publish_version(            FunctionName=self.functionName,            Description=uploadId        )        if response['ResponseMetadata']['HTTPStatusCode'] == 201:            logging.info("----Successfully published new algorithm version: {0}".format(response['Version']))            return {                "statusCode": 201,                "body": json.dumps(response),                "version": response['Version'],                "published": True            }        else:            logging.critical("----Failed to publish new algorithm version")            return {                "statusCode": response['ResponseMetadata']['HTTPStatusCode'],                "body": json.dumps(response),                "version": None,                "published": False            }    except ClientError as err:        logging.critical("----Client error: {0}".format(err))        logging.critical("----HTTP code: {0}".format(err.response['ResponseMetadata']['HTTPStatusCode']))        return {            "statusCode": 400,            "body": json.dumps(response),            "version": None,            "published": False        }

Definitely use serverless plugins. You can add them by sls plugin install -n serverless-PLUGIN_NAME or for some, like step functions: npm install serverless-step-functions

They pretty much save you development time. Just keep in mind to follow their repositories’ issues. You can see some we’ve been using below:

plugins: - serverless-python-requirements - serverless-plugin-aws-alerts - serverless-pseudo-parameters - serverless-plugin-lambda-dead-letter - serverless-step-functions

serverless-python-requirements — a Serverless v1.x plugin to automatically bundle dependencies from requirements.txt and make them available in your PYTHON PATH.
serverless-plugin-aws-alerts — adds CloudWatch alarms to functions. Below, a simple example, showing how to use a default alarm functionErrors:

objects-processor:  name: ${self:custom.app}-${self:custom.service_acronym}-objects-processor  runtime: python3.6  memorySize: 256  reservedConcurrency: 20  alarms: # Merged with function alarms    - functionErrors

There are more default alarms:

alerts:  alarms:    - functionErrors    - functionThrottles    - functionInvocations    - functionDuration

With following default configurations:

definitions:  functionInvocations:    namespace: 'AWS/Lambda'    metric: Invocations    threshold: 100    statistic: Sum    period: 60    evaluationPeriods: 1    datapointsToAlarm: 1    comparisonOperator: GreaterThanOrEqualToThreshold    treatMissingData: missing  functionErrors:    namespace: 'AWS/Lambda'    metric: Errors    threshold: 1    statistic: Sum    period: 60    evaluationPeriods: 1    datapointsToAlarm: 1    comparisonOperator: GreaterThanOrEqualToThreshold    treatMissingData: missing  functionDuration:    namespace: 'AWS/Lambda'    metric: Duration    threshold: 500    statistic: Average    period: 60    evaluationPeriods: 1    comparisonOperator: GreaterThanOrEqualToThreshold    treatMissingData: missing  functionThrottles:    namespace: 'AWS/Lambda'    metric: Throttles    threshold: 1    statistic: Sum    period: 60    evaluationPeriods: 1    datapointsToAlarm: 1    comparisonOperator: GreaterThanOrEqualToThreshold    treatMissingData: missing

If you want, you can create your own alarm or override default alarm’s parameters. We used that in another microservice. Here’s the example:

  definitions:  # these defaults are merged with your definitions    functionErrors:      period: 300 # override period    customAlarm:      description: 'My custom alarm'      namespace: 'AWS/Lambda'      nameTemplate: $[functionName]-Duration-IMPORTANT-Alarm # Optionally - naming template for the alarms, overwrites globally defined one      metric: duration      threshold: 200      statistic: Average      period: 300      evaluationPeriods: 1      datapointsToAlarm: 1      comparisonOperator: GreaterThanOrEqualToThreshold

serverless-pseudo-parameters — you can use #{AWS::AccountId}, #{AWS::Region} etc., in any of your config strings. This plugin replaces values with the proper pseudo parameter Fn::SubCloudFormation function. Example from our code:

layers:   arn:aws:lambda:${self:provider.region}:#{AWS::AccountId}:layer:bingo-aws-custom-logging-${self:custom.stage}:${self:custom.logging-layer-version}

serverless-plugin-lambda-dead-letter — can assign a DeadLetterConfig to a Lambda function and optionally create a new SQS queue or SNS Topic with a simple syntax. Keeping in mind the principle that everything fails, we implement “backup forces” for our Lambdas in order to handle unprocessed events. Below, one of our functions with DLQ configured:

objects-processor:   name: ${self:custom.app}-${self:custom.service_acronym}-objects-processor   description: Updates status in Dynamodb for uploaded algorithms packages   handler: functions.new_algorithms_objects_processor.lambda_handler   role: NewAlgorithmsProcessorRole   runtime: python3.6   memorySize: 256   reservedConcurrency: 20   alarms: # merged with function alarms     - functionErrors   deadLetter:     sqs: # New Queue with these properties       queueName: ${self:custom.algorithms-objects-processor-dlq}       delaySeconds: 10       maximumMessageSize: 2048       messageRetentionPeriod: 86400       receiveMessageWaitTimeSeconds: 5       visibilityTimeout: 300

serverless-step-functions — basically, it simplifies the Step Functions state machines definition in a serverless project.

stepFunctions: stateMachines:   PackageDelete:     name: ${self:custom.app}-${self:custom.service_acronym}-package-delete-flow-${self:custom.stage}     alarms:       topics:         alarm: arn:aws:sns:#{AWS::Region}:#{AWS::AccountId}:${self:custom.app}-${self:custom.service_acronym}-package-delete-flow-alarm       metrics:         - executionsTimeOut         - executionsFailed         - executionsAborted         - executionThrottled     events:       - http:         path: ${self:custom.api_ver}/algorithm         method: delete         private: true         cors:           origin: "*"           headers: ${self:custom.allowed-headers}         origins:           - "*"         response:           statusCodes:             400:               pattern: '.*"statusCode":400,.*' # JSON response               template:                 application/json: $input.path("$.errorMessage")             200:               pattern: "" # Default response method               template:                 application/json: |                   {                    "request_id": '"$input.json('$.executionArn').split(':')[7].replace('"', "")"',                    "output": "$input.json('$.output').replace('"', "")",                    "status": "$input.json('$.status').replace('"', "")"                   }         request:           template:             application/json: |               {                #set($algorithmId = $input.params().querystring.get('algorithmId'))                #set($uploadVersion = $input.params().querystring.get('uploadVersion'))                #set($sub = $context.authorizer.claims.sub)                #set($x-correlation-id = $util.escapeJavaScript($input.params().header.get('x-correlation-id')))                #set($x-session-id = $util.escapeJavaScript($input.params().header.get('x-session-id')))                #set($x-user-agent = $util.escapeJavaScript($input.params().header.get('User-Agent')))                #set($x-host = $util.escapeJavaScript($input.params().header.get('Host')))                #set($x-user-country = $util.escapeJavaScript($input.params().header.get('CloudFront-Viewer-Country')))                #set($x-is-desktop = $util.escapeJavaScript($input.params().header.get('CloudFront-Is-Desktop-Viewer')))                #set($x-is-mobile = $util.escapeJavaScript($input.params().header.get('CloudFront-Is-Mobile-Viewer')))                #set($x-is-smart-tv = $util.escapeJavaScript($input.params().header.get('CloudFront-Is-SmartTV-Viewer')))                #set($x-is-tablet = $util.escapeJavaScript($input.params().header.get('CloudFront-Is-Tablet-Viewer')))                "input" : "{ \"id\": \"$algorithmId\", \"uploadVersion\": \"$uploadVersion\", \"contextid\": \"$context.requestId\", \"contextTime\": \"$context.requestTime\", \"sub\": \"$sub\",\"x-correlation-id\": \"$x-correlation-id\",\"x-session-id\": \"$x-session-id\", \"x-user-agent\": \"$x-user-agent\",\"x-host\": \"$x-host\", \"x-user-country\": \"$x-user-country\", \"x-is-desktop\": \"$x-is-desktop\", \"x-is-mobile\": \"$x-is-mobile\", \"x-is-smart-tv\": \"$x-is-smart-tv\", \"x-is-tablet\": \"$x-is-tablet\"}",                "stateMachineArn": "arn:aws:states:#{AWS::Region}:#{AWS::AccountId}:stateMachine:${self:custom.app}-${self:custom.service_acronym}-package-delete-flow-${self:custom.stage}"               }     definition: ${file(resources/new_algorithms_package_delete_stepfunctions.yml)}

definition: part contains YAML file with particular states configurations. I do prefer to configure it this way instead of putting the whole code in the serverless.yml file.

├── files├── functions├── messages├── node_modules├── package-lock.json├── package.json├── requirements.txt├── resources├── new_algorithms_package_delete_stepfunctions.yml├── serverless.yml└── tests

After the deployment,we got the diagram shown below. As you can see, the loop has been used in order to wait for the result returned from the CloudFormation stack. Basically, it allows you to get rid of sync calls and react, depending on the returned status from another service.

Terraform/ Terragrunt

I am pretty sure each of you knows these tools, so I’ll add only a short annotation. Generally, there are some talks about Terragrunt necessity. According to the description from their repo, “Terragrunt is a thin wrapper for Terraform that provides extra tools for working with multiple Terraform modules”. Therefore, do not expect Terragrunt to wrap up all the issues you have with terraform. Personally, I like it for the way it organizes the repo:

├── aws│  └── eu-west-1│     ├── deployments-terragrunt│     │  └── terraform.tfvars│     ├── docker-images-terragrunt│     │  └── terraform.tfvars│     ├── frontend-build│     │  └── terraform.tfvars│     ├── terraform.tfvars│     └── website│        └── terraform.tfvars├── deployment│  └── buildspec.yml└── modules   ├── aws-tf-codebuild   ├── aws-tf-codebuild-multisource   ├── aws-tf-codepipeline-base   ├── aws-tf-codepipeline-github   ├── aws-tf-codepipeline-multisource   ├── aws-tf-ecr   ├── aws-tf-lambda   ├── aws-tf-s3-host-website   ├── aws-tf-tags   ├── image-builder   ├── multisource-pipeline   └── terragrunt-deployments

Then, with a simple terraform.tfvars, you can set variables used in particular terraform modules located in source="../../../modules/", and with a specified module/product you want to use. Our example depicts a CI/CD pipeline for terraform infrastructural changes built on top the CodePipeline + CodeBuild + Lambda Notification (terragrunt-deployments).

terragrunt {  # Include all the settings from the root .tfvars file  include {    path = "${find_in_parent_folders()}"  }  terraform {    source = "../../..//modules/terragrunt-deployments"  }}aws_region = "eu-west-2"ecr_repository_name = "terragrunt-images-prod"ecr_repository_arn = "arn:aws:ecr:eu-west-2:xxxxxx:repository/terragrunt-imgs-prod"owner = "chaosgears"# ---------------------------------------------------------------------------------------------------------------------# CODEPIPELINE module parameters# ---------------------------------------------------------------------------------------------------------------------stage = "prod"app = "bingo"info = "terragrunt-deployments"service = "algorithms"ssm_key = "alias/aws/ssm"stage_1_name = "Source"stage_1_action = "GitHub"stage_2_name = "Terragrunt"stage_2_action = "Terragrunt-Deploy"stage_3_name = "Notify"stage_3_action = "Slack"repository_owner = "chaosgears"repository = "bingo-backend"branch = "prod"artifact = "terragrunt"# ---------------------------------------------------------------------------------------------------------------------# CODEBUILD module parameters# ---------------------------------------------------------------------------------------------------------------------versioning = "true"force_destroy = "true"artifact_type = "CODEPIPELINE"cache_type = "S3"codebuild_description = "Pipeline for Terragrunt deployment"compute_type = "BUILD_GENERAL1_SMALL"codebuild_image = "xxxxx.dkr.ecr.eu-west-2.amazonaws.com/bingo-terra-imgs-prod:xxx"codebuild_type = "LINUX_CONTAINER"source_type = "CODEPIPELINE"buildspec_path = "./terraform/deployment/buildspec.yml"environment_variables = [{  "name"  = "TERRAGRUNT_PIPELINE"  "value" = "frontend-build"}, {  "name"  = "TERRAGRUNT_REGION"  "value" = "eu-west-2"}, {  "name"  = "TERRAGRUNT_ENVIRONMENT"  "value" = "prod"}, {  "name"  = "TERRAGRUNT_COMMAND"  "value" = "terragrunt apply -auto-approve"}]

An attentive reader might have noticed the buildspec_path = "./terraform/deployment/buildspec.yml" statement. In this particular case, we used it as a source buildspec file for CodeBuild that is invoking Terragrunt commands and making changes in the environment.

version: 0.2env: parameter-store:   CODEBUILD_KEY: "/algorithms/prod/deploy-key"phases: install:   commands:     - echo "Establishing SSH connection..."     - mkdir -p ~/.ssh     - echo "$CODEBUILD_KEY" > ~/.ssh/id_rsa     - chmod 600 ~/.ssh/id_rsa     - ssh-keygen -F github.com || ssh-keyscan github.com >>~/.ssh/known_hosts     - git config --global url."git@github.com:".insteadOf "https://github.com/"     - pip3 --version     - pip --version pre_build:   commands:     - echo "Looking for working directory..."     - cd terraform/aws/$TERRAGRUNT_REGION/$TERRAGRUNT_PIPELINE     - rm -rf .terragrunt-cache/ build:   commands:     - echo "Show Terraform plan output"     - terragrunt plan     - echo "Deploying Terragrunt functions..."     - echo "Running $TERRAGRUNT_COMMAND for $TERRAGRUNT_PIPELINE pipeline."     - $TERRAGRUNT_COMMAND post_build:   commands:     - echo "Terragrunt deployment completed on `date`"

Is it packed already? Next stop “serverless architecture”

So far, I’ve covered tools we use to make things easier and those that save us time. Nonetheless, I would deceive you and blur the reality if I was to say that they work out-of-the-box. For me and my team, it’s all about the estimation; how much time we need to start using new tool effectively, and how much time we save by using a particular tool. Business doesn’t care about tools, it cares about time. My advice is, don’t bind yourself to tools but rather to the question: “what/how much will I achieve if I use it”. Roll up your sleeves, more chapters are on the way…