
“Before starting the trip better check whether you’re properly equipped” (Chapter 2)

02/07/2020

TL;DR

After reading the first part of this series, you should have a general idea of the business and technical goals that have a direct impact on the choice of event-based architecture. This time, I am going to cover the tools, AWS services, and third-party frameworks that speed up different parts of the process. I will also show you some of the value we got and the problems we came across.

Equipment

Basically, we've collected the tools we use in event-based scenarios (serverless, for some people) in one bag called "BaseGear":

We’ve already posted articles about Amplify, Vue (https://chaosgears.com/lets-start-developing-using-vue-and-amplify/) and CI/CD (https://chaosgears.com/serverless-pipeline-for-ec2-configuration-management-3-solution-architecture-and-implementation-of-serverless-pipeline/) on our blog, so I’ll skip those topics in this article.

Serverless Framework

This framework has been described in one of the previous articles, so I won’t focus on its features. However, let me give you some advice that, hopefully, will be valuable to your projects:

  • We always build separate folders per microservice, per needed lambda layer, and create a new “serverless project” within that. This particular project had the following structure:

 ~/common/bango/

├── customer-bango-backend

├── customer-bango-algorithm-equipment-layer

├── customer-bango-aws-custom-logging-layer

├── customer-bango-aws-ssm-cache-layer

├── customer-bango-frontend

└── customer-bango-image-terragrunt

~/common/bango/customer-bango-backend/

├── README.md

├── algorithms-service

├── analysis-service

├── system-health-service

└── terraform

 ~/common/bango/customer-bango-backend/algorithms-service

├── files

├── functions

├── messages

├── node_modules

├── package-lock.json

├── package.json

├── requirements.txt

├── resources

├── serverless.yml

└── tests

  • We try to separate database and S3 bucket definitions from the serverless stack, especially in the "prod" environment, rather than putting such services together with the rest. We call them static resources and define them in Terraform code. The Serverless Framework runs CloudFormation behind the scenes, so it manages your resources completely. Keep in mind that one simple mistake can accidentally delete stateful resources, like DynamoDB tables. So either set DeletionPolicy: Retain or move the definition outside the stack, like we do. Better safe than sorry…
  • Use "exclude" (exclude and include allow you to define globs that will be excluded from the resulting artifact) to keep control over the packaging process and package only the code you need.
exclude:
  - node_modules/**
  - .requirements/**
  - env/**
  - README.md
  - package.json
  - package-lock.json
  - requirements.txt
  - ssm_cache/**
  • DynamoDB Streams definition and its "serverless.yml" problem
layers:
  - arn:aws:lambda:${self:provider.region}:#{AWS::AccountId}:layer:bingo-aws-custom-logging-${self:custom.stage}:${self:custom.logging-layer-version}
events:
  - stream: arn:aws:dynamodb:#{AWS::Region}:#{AWS::AccountId}:table/${self:custom.algorithms-tablename}/stream/2019-11-20T16:15:11.647
  - stream:
      type: dynamodb
      arn:
        Fn::GetAtt:
          - AlgorithmsMetadataTable
          - StreamArn
      batchWindow: 3
      startingPosition: LATEST
      enabled: True

NOTE (from AWS documentation): “Lambda polls shards in your DynamoDB stream for records at a base rate of 4 times per second. When records are available, Lambda invokes your function and waits for the result. If processing succeeds, Lambda resumes polling until it receives more records.”

To avoid inefficient synchronous Lambda function invocations, set the batchSize or batchWindow parameter. The batch size configures the limit parameter of the GetRecords API call: DynamoDB Streams will return up to that many records if they are available in the buffer. The batchWindow property, in turn, specifies the maximum amount of time to wait before triggering a Lambda invocation with a batch of records.
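As a rough mental model (this is an illustration, not the real Lambda polling algorithm), you can think of batchSize and batchWindow as two flush conditions on a buffer: an invocation fires when either the record count or the wait time limit is reached first. A minimal Python sketch:

```python
def count_invocations(record_arrivals, batch_size, batch_window):
    """Rough model of stream batching: flush the buffer when it holds
    batch_size records, or when batch_window seconds have passed since
    the first buffered record. record_arrivals is a sorted list of
    arrival timestamps in seconds. Illustrative only -- the real Lambda
    poller is more involved."""
    invocations = 0
    buffer_start = None
    buffered = 0
    for t in record_arrivals:
        if buffered and t - buffer_start >= batch_window:
            invocations += 1          # window expired -> flush
            buffered = 0
        if buffered == 0:
            buffer_start = t
        buffered += 1
        if buffered == batch_size:
            invocations += 1          # buffer full -> flush
            buffered = 0
    if buffered:
        invocations += 1              # flush the remaining tail
    return invocations

# Ten records arriving one per second: without batching (size 1) that is
# ten invocations; with a 3-second window it collapses to four.
print(count_invocations(list(range(10)), 1, 0))    # -> 10
print(count_invocations(list(range(10)), 100, 3))  # -> 4
```

The takeaway matches the note above: larger batches mean fewer, cheaper invocations at the price of a small added latency.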

The issue with the configuration in "serverless.yml" is that, when you use the schema presented below, it doesn't match the Lambda function trigger configuration you see in the AWS Console.

 events:
  - stream:
      type: dynamodb
      arn:
        Fn::GetAtt: [AlgorithmsMetadataTable, StreamArn]
  - stream:
      type: dynamodb
      arn:
        Fn::GetAtt:
          - AlgorithmsMetadataTable
          - StreamArn
      batchWindow: 3
      maximumRetryAttempts: 10
      startingPosition: LATEST
      enabled: True

Note: We suggest moving the configuration of the DynamoDB stream and the DynamoDB table to a separate file in another directory. It simply makes the structure tidier and better organized:

resources:
 - ${file(resources/new_algorithms_tablename.yml)}

Lambda function trigger configuration in AWS console:

  • Lambda layers definition – each layer we use for Lambda functions is also defined inside the serverless file, pinned with a proper version variable:
	   layers:
	     - arn:aws:lambda:${self:provider.region}:#{AWS::AccountId}:layer:bingo-aws-custom-logging-${self:custom.stage}:${self:custom.logging-layer-version}
	     - arn:aws:lambda:${self:provider.region}:#{AWS::AccountId}:layer:bingo-aws-ssm-cache-${self:custom.stage}:${self:custom.ssm-cache-version}

Generally, I haven't come across any problems with layers defined this way. However, I did notice one thing about CloudFormation stacks containing layer parameter values. Let's use a real case as an example:

We published changes for a selected group of Lambda functions via a CloudFormation stack. Below, I pasted only the necessary part of the parameters section:

The point is that we've automated the whole process. We coded a function for getting the latest available Lambda version and then publishing a newer one accordingly. However, the AWS documentation (https://boto3.amazonaws.com/v1/documentation/api/1.9.42/reference/services/lambda.html#Lambda.Client.list_versions_by_function) doesn't mention that a single API call returns at most 50 versions (we had over 60 at the time). This led to a problem, because we thought we were publishing a new function version (via def publish_new_version(self, uploadId), presented below) and re-pointing the "prod" alias to it. As you can see below, versions greater than 270 were visible in the AWS Console, but get_latest_published was returning only 50 items with a maximum value of 185, and that was the value the alias was pointed to. Moreover, it generated additional problems, because each time we updated the layer version, the change wasn't picked up by the Lambda function.

To sum up: if you've exceeded 50 function versions, you use aliasing, and your Lambda layer/function update workflows run via CloudFormation/Boto3, use the "NextMarker".

versionsPublished = ['$LATEST', '9', '13', '17', '21', '25', '29', '33', '37', '41', '45', '49', '53', '57', '61', '65', '69', '73', '77', '81', '85', '89', '93', '97', '101', '102', '105', '109', '113', '117', '121', '125', '129', '130', '131', '132', '133', '137', '141', '145', '149', '153', '157', '161', '165', '169', '173', '177', '181', '185']
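Note a related trap in that list: the API returns version numbers as strings. If you ever sort or take a maximum over them lexicographically, '97' sorts above '185', so the "latest" version you compute can be wildly wrong. A small hedged helper (the function name is ours, not part of any AWS API) that compares versions numerically:

```python
def latest_numeric_version(versions):
    """Return the highest published version from a Lambda version list,
    skipping the '$LATEST' pseudo-version. Versions arrive as strings,
    so they must be compared as integers: lexicographically '97' would
    sort above '185'."""
    numeric = [int(v) for v in versions if v != '$LATEST']
    if not numeric:
        raise ValueError("no published versions")
    return str(max(numeric))

print(latest_numeric_version(['$LATEST', '9', '97', '185']))  # -> 185
print(max(['9', '97', '185']))  # -> 97  (the lexicographic pitfall)
```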

Getting the versions with the "NextMarker" key will allow you to collect more than 50 versions of a Lambda function (including the most recently published one) via the Lambda API:

def get_latest_published(self):
    versionsPublished = []
    try:
        logging.info("----Searching for algorithm versions")
        versions = self.client.list_versions_by_function(
            FunctionName=self.functionName)
        for version in versions['Versions']:
            versionsPublished.append(version['Version'])
        # Keep the whole response dict between iterations, so the
        # 'NextMarker' key survives; otherwise pagination stops after
        # the first extra page.
        while 'NextMarker' in versions:
            versions = self.client.list_versions_by_function(
                FunctionName=self.functionName, Marker=versions['NextMarker'])
            for version in versions['Versions']:
                versionsPublished.append(version['Version'])
        logging.info("----Found versions {0}: {1}".format(
            str(len(versionsPublished)), versionsPublished))
        return {"statusCode": 200,
                "version": versionsPublished[-1]
                }
    except ClientError as err:
        logging.critical("----Client error: {0}".format(err))
        logging.critical(
            "----HTTP code: {0}".format(err.response['ResponseMetadata']['HTTPStatusCode']))
        return {"statusCode": 400
                }

Publishing a new version of the Lambda function:

def publish_new_version(self, uploadId):
    try:
        response = self.client.publish_version(
            FunctionName=self.functionName,
            Description=uploadId
        )
        if response['ResponseMetadata']['HTTPStatusCode'] == 201:
            logging.info(
                "----Successfully published new algorithm version: {0}".format(response['Version']))
            return {"statusCode": 201,
                    "body": json.dumps(response),
                    "version": response['Version'],
                    "published": True
                    }
        else:
            logging.critical("----Failed to publish new algorithm version")
            return {"statusCode": response['ResponseMetadata']['HTTPStatusCode'],
                    "body": json.dumps(response),
                    "version": None,
                    "published": False
                    }
    except ClientError as err:
        logging.critical("----Client error: {0}".format(err))
        logging.critical(
            "----HTTP code: {0}".format(err.response['ResponseMetadata']['HTTPStatusCode']))
        # 'response' may be unbound here if publish_version itself
        # raised, so don't reference it in the error payload
        return {"statusCode": 400,
                "version": None,
                "published": False
                }
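Once the new version exists, the remaining step mentioned above is re-pointing the "prod" alias at it. A minimal sketch using the Lambda update_alias API (the helper name and the RecordingClient stub are ours; in production you would pass a real boto3 Lambda client):

```python
def repoint_alias(client, function_name, alias, version):
    """Point an existing alias (e.g. 'prod') at a freshly published
    function version. `client` is a boto3 Lambda client, or any object
    exposing the same update_alias signature."""
    response = client.update_alias(
        FunctionName=function_name,
        Name=alias,
        FunctionVersion=version,
    )
    return response["FunctionVersion"]

class RecordingClient:
    """Hypothetical stub standing in for a real AWS call."""
    def update_alias(self, FunctionName, Name, FunctionVersion):
        self.last = (FunctionName, Name, FunctionVersion)
        return {"FunctionVersion": FunctionVersion}

client = RecordingClient()
print(repoint_alias(client, "algorithms-fn", "prod", "271"))  # -> 271
```

Combined with a correctly paginated get_latest_published, this is what prevents the alias from silently sticking at version 185.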
  • Definitely use Serverless plugins. You can add them with sls plugin install -n serverless-PLUGIN_NAME or, for some like Step Functions, with npm install serverless-step-functions

They save you a lot of development time. Just remember to follow the issues in their repositories. Below are some we've been using:

plugins:
 - serverless-python-requirements
 - serverless-plugin-aws-alerts
 - serverless-pseudo-parameters
 - serverless-plugin-lambda-dead-letter
 - serverless-step-functions

serverless-python-requirements – A Serverless v1.x plugin to automatically bundle dependencies from requirements.txt and make them available in your PYTHONPATH.

serverless-plugin-aws-alerts – adds CloudWatch alarms to functions. Below is a simple example showing how to use the default "functionErrors" alarm:

objects-processor:
   name: ${self:custom.app}-${self:custom.service_acronym}-objects-processor
   runtime: python3.6
   memorySize: 256
   reservedConcurrency: 20
   alarms: # merged with function alarms
     - functionErrors

There are more default alarms:

alerts:
  alarms:
    - functionErrors
    - functionThrottles
    - functionInvocations
    - functionDuration

With the following default configurations:

definitions:
  functionInvocations:
    namespace: 'AWS/Lambda'
    metric: Invocations
    threshold: 100
    statistic: Sum
    period: 60
    evaluationPeriods: 1
    datapointsToAlarm: 1
    comparisonOperator: GreaterThanOrEqualToThreshold
    treatMissingData: missing
  functionErrors:
    namespace: 'AWS/Lambda'
    metric: Errors
    threshold: 1
    statistic: Sum
    period: 60
    evaluationPeriods: 1
    datapointsToAlarm: 1
    comparisonOperator: GreaterThanOrEqualToThreshold
    treatMissingData: missing
  functionDuration:
    namespace: 'AWS/Lambda'
    metric: Duration
    threshold: 500
    statistic: Average
    period: 60
    evaluationPeriods: 1
    comparisonOperator: GreaterThanOrEqualToThreshold
    treatMissingData: missing
  functionThrottles:
    namespace: 'AWS/Lambda'
    metric: Throttles
    threshold: 1
    statistic: Sum
    period: 60
    evaluationPeriods: 1
    datapointsToAlarm: 1
    comparisonOperator: GreaterThanOrEqualToThreshold
    treatMissingData: missing
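To get an intuition for what these fields mean, here is a deliberately simplified sketch of how a single definition is evaluated. This is our toy model, not CloudWatch's actual algorithm (it ignores treatMissingData and period alignment, among other things):

```python
# Comparison operators as they appear in the alarm definitions above.
OPERATORS = {
    "GreaterThanOrEqualToThreshold": lambda v, t: v >= t,
    "GreaterThanThreshold": lambda v, t: v > t,
    "LessThanOrEqualToThreshold": lambda v, t: v <= t,
    "LessThanThreshold": lambda v, t: v < t,
}

def evaluate_alarm(definition, datapoints):
    """Toy evaluation of one alarm definition against per-period
    datapoints (already aggregated with the alarm's statistic). Fires
    when at least datapointsToAlarm of the last evaluationPeriods
    datapoints breach the threshold."""
    op = OPERATORS[definition["comparisonOperator"]]
    window = datapoints[-definition["evaluationPeriods"]:]
    breaching = sum(1 for v in window if op(v, definition["threshold"]))
    return breaching >= definition.get("datapointsToAlarm", 1)

function_errors = {
    "threshold": 1,
    "evaluationPeriods": 1,
    "datapointsToAlarm": 1,
    "comparisonOperator": "GreaterThanOrEqualToThreshold",
}
print(evaluate_alarm(function_errors, [0, 0, 2]))  # -> True  (2 errors in last period)
print(evaluate_alarm(function_errors, [0, 0, 0]))  # -> False
```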

If you want, you can create your own alarm or override a default alarm's parameters. We used that in another microservice. Here's an example:

definitions:  # these defaults are merged with your definitions
      functionErrors:
        period: 300 # override period
      customAlarm:
        description: 'My custom alarm'
        namespace: 'AWS/Lambda'
        nameTemplate: $[functionName]-Duration-IMPORTANT-Alarm # Optionally - naming template for the alarms, overwrites globally defined one
        metric: duration
        threshold: 200
        statistic: Average
        period: 300
        evaluationPeriods: 1
        datapointsToAlarm: 1
        comparisonOperator: GreaterThanOrEqualToThreshold

serverless-pseudo-parameters – you can use #{AWS::AccountId}, #{AWS::Region}, etc., in any of your config strings. The plugin replaces these references with the proper pseudo parameters via the CloudFormation Fn::Sub function. An example from our code:

layers:
  - arn:aws:lambda:${self:provider.region}:#{AWS::AccountId}:layer:bingo-aws-custom-logging-${self:custom.stage}:${self:custom.logging-layer-version}

serverless-plugin-lambda-dead-letter – assigns a DeadLetterConfig to a Lambda function and can optionally create a new SQS queue or SNS topic with a simple syntax. Keeping in mind the principle that everything fails, we implement "backup forces" for our Lambdas in order to handle unprocessed events. Below is one of our functions with a DLQ configured:

objects-processor:
   name: ${self:custom.app}-${self:custom.service_acronym}-objects-processor
   description: Updates status in Dynamodb for uploaded algorithms packages
   handler: functions.new_algorithms_objects_processor.lambda_handler
   role: NewAlgorithmsProcessorRole
   runtime: python3.6
   memorySize: 256
   reservedConcurrency: 20
   alarms: # merged with function alarms
     - functionErrors
   deadLetter:
     sqs: # New Queue with these properties
       queueName: ${self:custom.algorithms-objects-processor-dlq}
       delaySeconds: 10
       maximumMessageSize: 2048
       messageRetentionPeriod: 86400
       receiveMessageWaitTimeSeconds: 5
       visibilityTimeout: 300
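Each of those queue parameters has a hard range enforced by SQS, and a value outside it only fails at deploy time. A quick pre-deploy sanity check we could run looks like this (the limits are taken from the SQS documentation; the helper itself is our own sketch):

```python
# (min, max) ranges per the Amazon SQS queue attribute documentation.
SQS_LIMITS = {
    "delaySeconds": (0, 900),
    "maximumMessageSize": (1024, 262144),
    "messageRetentionPeriod": (60, 1209600),
    "receiveMessageWaitTimeSeconds": (0, 20),
    "visibilityTimeout": (0, 43200),
}

def validate_dlq_config(config):
    """Return the list of out-of-range parameters in an SQS queue
    definition like the deadLetter block above; empty means OK."""
    return [key for key, (lo, hi) in SQS_LIMITS.items()
            if key in config and not lo <= config[key] <= hi]

dlq = {
    "delaySeconds": 10,
    "maximumMessageSize": 2048,
    "messageRetentionPeriod": 86400,
    "receiveMessageWaitTimeSeconds": 5,
    "visibilityTimeout": 300,
}
print(validate_dlq_config(dlq))                       # -> []
print(validate_dlq_config({"delaySeconds": 1000}))    # -> ['delaySeconds']
```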

serverless-step-functions – basically, it simplifies the definition of Step Functions state machines in a serverless project.

stepFunctions:
 stateMachines:
   PackageDelete:
     name: ${self:custom.app}-${self:custom.service_acronym}-package-delete-flow-${self:custom.stage}
     alarms:
       topics:
         alarm: arn:aws:sns:#{AWS::Region}:#{AWS::AccountId}:${self:custom.app}-${self:custom.service_acronym}-package-delete-flow-alarm
       metrics:
         - executionsTimeOut
         - executionsFailed
         - executionsAborted
         - executionThrottled
     events:
       - http:
           path: ${self:custom.api_ver}/algorithm
           method: delete
           private: true
           cors:
             origin: "*"
             headers: ${self:custom.allowed-headers}
           origins:
             - "*"
           response:
             statusCodes:
               400:
                 pattern: '.*"statusCode":400,.*' # JSON response
                 template:
                   application/json: $input.path("$.errorMessage")
               200:
                 pattern: "" # Default response method
                 template:
                   application/json: |
                     {
                     "request_id": '"$input.json('$.executionArn').split(':')[7].replace('"', "")"',
                     "output": "$input.json('$.output').replace('"', "")",
                     "status": "$input.json('$.status').replace('"', "")"
                     }
           request:
             template:
               application/json: |
                 {
                  #set($algorithmId = $input.params().querystring.get('algorithmId'))
                 #set($uploadVersion = $input.params().querystring.get('uploadVersion'))
                 #set($sub = $context.authorizer.claims.sub)
                 #set($x-correlation-id = $util.escapeJavaScript($input.params().header.get('x-correlation-id')))
                 #set($x-session-id = $util.escapeJavaScript($input.params().header.get('x-session-id')))
                 #set($x-user-agent = $util.escapeJavaScript($input.params().header.get('User-Agent')))
                 #set($x-host = $util.escapeJavaScript($input.params().header.get('Host')))
                 #set($x-user-country = $util.escapeJavaScript($input.params().header.get('CloudFront-Viewer-Country')))
                 #set($x-is-desktop = $util.escapeJavaScript($input.params().header.get('CloudFront-Is-Desktop-Viewer')))
                 #set($x-is-mobile = $util.escapeJavaScript($input.params().header.get('CloudFront-Is-Mobile-Viewer')))
                 #set($x-is-smart-tv = $util.escapeJavaScript($input.params().header.get('CloudFront-Is-SmartTV-Viewer')))
                 #set($x-is-tablet = $util.escapeJavaScript($input.params().header.get('CloudFront-Is-Tablet-Viewer')))
                 "input" : "{ \"id\": \"$algorithmId\", \"uploadVersion\": \"$uploadVersion\", \"contextid\": \"$context.requestId\", \"contextTime\": \"$context.requestTime\", \"sub\": \"$sub\",\"x-correlation-id\": \"$x-correlation-id\",\"x-session-id\": \"$x-session-id\", \"x-user-agent\": \"$x-user-agent\",\"x-host\": \"$x-host\", \"x-user-country\": \"$x-user-country\", \"x-is-desktop\": \"$x-is-desktop\", \"x-is-mobile\": \"$x-is-mobile\", \"x-is-smart-tv\": \"$x-is-smart-tv\", \"x-is-tablet\": \"$x-is-tablet\"}", 
                 "stateMachineArn": "arn:aws:states:#{AWS::Region}:#{AWS::AccountId}:stateMachine:${self:custom.app}-${self:custom.service_acronym}-package-delete-flow-${self:custom.stage}"
                 }
     definition: ${file(resources/new_algorithms_package_delete_stepfunctions.yml)}

The definition: part points to a YAML file with the particular states' configuration. I prefer to configure it this way instead of putting all the code in the "serverless.yml" file.

├── files

├── functions

├── messages

├── node_modules

├── package-lock.json

├── package.json

├── requirements.txt

├── resources

├── new_algorithms_package_delete_stepfunctions.yml

├── serverless.yml

└── tests

After the deployment, we got the diagram shown below. As you can see, a loop has been used in order to wait for the result returned by the CloudFormation stack. Basically, it lets you get rid of sync calls and react depending on the status returned by another service.
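The waiting loop boils down to a poll-until-terminal pattern: check the stack status, and either finish or wait and check again (in the state machine this is a Wait state plus a Choice state). A hedged Python sketch of the same idea, where get_status stands in for a describe_stacks call:

```python
import time

# A few terminal CloudFormation stack statuses; extend as needed.
TERMINAL = {"CREATE_COMPLETE", "UPDATE_COMPLETE", "CREATE_FAILED",
            "ROLLBACK_COMPLETE", "UPDATE_ROLLBACK_COMPLETE"}

def wait_for_stack(get_status, interval=5, max_checks=60):
    """Poll get_status (a zero-argument callable returning a
    CloudFormation stack status string) until a terminal status shows
    up, sleeping `interval` seconds between checks."""
    for _ in range(max_checks):
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError("stack did not reach a terminal status")

# Usage with a canned sequence instead of a real AWS call:
statuses = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])
print(wait_for_stack(lambda: next(statuses), interval=0))  # -> CREATE_COMPLETE
```

Step Functions just moves this loop out of Lambda code, so you don't pay for a function that sits there sleeping.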

Terraform/Terragrunt

 I am pretty sure each of you knows these tools, so I'll add only a short annotation. Generally, there is some debate about whether Terragrunt is necessary at all. According to the description in their repo, "Terragrunt is a thin wrapper for Terraform that provides extra tools for working with multiple Terraform modules". So do not expect Terragrunt to solve all the issues you have with Terraform. Personally, I like it for the way it organizes the repo:

 

├── aws

│   └── eu-west-1

│       ├── deployments-terragrunt

│       │   └── terraform.tfvars

│       ├── docker-images-terragrunt

│       │   └── terraform.tfvars

│       ├── frontend-build

│       │   └── terraform.tfvars

│       ├── terraform.tfvars

│       └── website

│           └── terraform.tfvars

├── deployment

│   └── buildspec.yml

└── modules

├── aws-tf-codebuild

├── aws-tf-codebuild-multisource

├── aws-tf-codepipeline-base

├── aws-tf-codepipeline-github

├── aws-tf-codepipeline-multisource

├── aws-tf-ecr

├── aws-tf-lambda

├── aws-tf-s3-host-website

├── aws-tf-tags

├── image-builder

├── multisource-pipeline

└── terragrunt-deployments

Then, with a simple "terraform.tfvars", you can set the variables used by a particular Terraform module, referenced via "source = "../../..//modules/"" plus the specific module/product you want to use. Our example depicts a CI/CD pipeline for Terraform infrastructure changes built on top of CodePipeline + CodeBuild + a Lambda notification (terragrunt-deployments).

terragrunt {
 # Include all the settings from the root .tfvars file
 include {
   path = "${find_in_parent_folders()}"
 }

 terraform {
   source = "../../..//modules/terragrunt-deployments"
 }
}

aws_region = "eu-west-2"

ecr_repository_name = "terragrunt-images-prod"

ecr_repository_arn = "arn:aws:ecr:eu-west-2:xxxxxx:repository/terragrunt-imgs-prod"

owner = "chaosgears"

# ---------------------------------------------------------------------------------------------------------------------
# CODEPIPELINE module parameters
# ---------------------------------------------------------------------------------------------------------------------

stage = "prod"

app = "bingo"

info = "terragrunt-deployments"

service = "algorithms"

ssm_key = "alias/aws/ssm"

stage_1_name = "Source"

stage_1_action = "GitHub"

stage_2_name = "Terragrunt"

stage_2_action = "Terragrunt-Deploy"

stage_3_name = "Notify"

stage_3_action = "Slack"

repository_owner = "chaosgears"

repository = "bingo-backend"

branch = "prod"

artifact = "terragrunt"

# ---------------------------------------------------------------------------------------------------------------------
# CODEBUILD module parameters
# ---------------------------------------------------------------------------------------------------------------------

versioning = "true"

force_destroy = "true"

artifact_type = "CODEPIPELINE"

cache_type = "S3"

codebuild_description = "Pipeline for Terragrunt deployment"

compute_type = "BUILD_GENERAL1_SMALL"

codebuild_image = "xxxxx.dkr.ecr.eu-west-2.amazonaws.com/bingo-terra-imgs-prod:xxx"

codebuild_type = "LINUX_CONTAINER"

source_type = "CODEPIPELINE"

buildspec_path = "./terraform/deployment/buildspec.yml"

environment_variables = [{
 "name"  = "TERRAGRUNT_PIPELINE"
 "value" = "frontend-build"
},
 {
   "name"  = "TERRAGRUNT_REGION"
   "value" = "eu-west-2"
 },
 {
   "name"  = "TERRAGRUNT_ENVIRONMENT"
   "value" = "prod"
 },
 {
   "name"  = "TERRAGRUNT_COMMAND"
   "value" = "terragrunt apply -auto-approve"
 },
]

An attentive reader might have noticed the buildspec_path = "./terraform/deployment/buildspec.yml" statement. In this particular case, it points at the source buildspec file for the CodeBuild project that invokes Terragrunt commands and applies changes to the environment.

version: 0.2

env:
 parameter-store:
   CODEBUILD_KEY: "/algorithms/prod/deploy-key"

phases:
 install:
   commands:
     - echo "Establishing SSH connection..."
     - mkdir -p ~/.ssh
     - echo "$CODEBUILD_KEY" > ~/.ssh/id_rsa
     - chmod 600 ~/.ssh/id_rsa
     - ssh-keygen -F github.com || ssh-keyscan github.com >>~/.ssh/known_hosts
     - git config --global url."[email protected]:".insteadOf "https://github.com/"
     - pip3 --version
     - pip --version

 pre_build:
   commands:
     - echo "Looking for working directory..."
     - cd terraform/aws/$TERRAGRUNT_REGION/$TERRAGRUNT_PIPELINE
     - rm -rf .terragrunt-cache/

 build:
   commands:
     - echo "Show Terraform plan output"
     - terragrunt plan
     - echo "Deploying Terragrunt functions..."
     - echo "Running $TERRAGRUNT_COMMAND for $TERRAGRUNT_PIPELINE pipeline."
     - $TERRAGRUNT_COMMAND

 post_build:
   commands:
     - echo "Terragrunt deployment completed on `date`"

Is it packed already? Next stop: “serverless architecture”

So far, I've covered the tools we use to make things easier and save us time. Nonetheless, I would deceive you and blur the reality if I said they work out of the box. For me and my team, it's all about estimation: how much time we need to start using a new tool effectively versus how much time a particular tool saves us. Business doesn't care about tools; it cares about time. My advice: don't bind yourself to tools, but rather to the question "what/how much will I achieve if I use it?". Roll up your sleeves, more chapters are on the way…