More about client requirements, current situation and limitations
Hello, welcome to the next part of the article about the serverless pipelines we implemented for one of our clients to automate configuration management of EC2 instances. In the first chapter we quickly went through the AWS native services, their specifications and the options that allow us to extend them. In this second one, called “Client requirements and how we overcame all challenges with CI/CD in AWS”, we would like to demonstrate a specific use case and talk more about the client's requirements, the current situation and its limitations. Here, we will go through every challenge, explain which solutions we used to resolve the problems and why we made those choices.
First, let’s take a look at the current situation in the project and our requirements.
Our client took his first steps as a startup a few years ago. He grew over time, and so did his application.
Currently, the application receives 10 million requests per day, and all that traffic is distributed across 5 AWS regions, 10 EC2 instances and 2 AWS accounts (DEV and PROD).
Although almost all of the infrastructure was defined as code, provisioning was still managed from local computers. What else was missing here? We had to resolve a few problems.
The budget was set at a maximum of 30 hours per month for maintenance and improvement during the current development phase. Optimistically, we started looking for simple solutions that could be delivered as quickly as possible, and AWS services that largely maintain themselves seemed like a good fit here.
Let’s have a closer look at the requirements we had to meet and how we solved our challenges.
We wanted to simplify our work as much as possible, but also to be able to replicate our solution easily. That’s why we decided to define infrastructure and application configuration as code.
What contributed to the choice of the final solution?
We were guided by the principle of simplicity. We also wanted to use something with a low entry barrier (not requiring learning from scratch), in other words, something easy to implement and manage, so we wouldn’t have to look for new people with very specific skills. The budget was limited, but any commercial or open-source project was allowed. More importantly, we wanted it to be actively developed.
Out of many possibilities, we decided to use Terraform to store the infrastructure configuration and Ansible to manage our EC2 instances.
Why Ansible?
As we described in the previous chapter, with Ansible we are able to define exactly which steps need to be executed on an EC2 instance and in what order. That proves quite useful when we have to configure operating-system and application-level components at the same time and don’t want to mix up the steps.
Why Terraform?
Simply put, with Terraform we can define what the final state of our infrastructure should look like. It also keeps track of that state and how it changes over time.
To resolve the first problem, we defined Ansible playbooks, roles and tasks as code for managing the EC2 instances, e.g. package installation, operating system configuration and changes made to the application. Every step that needed to be executed on an instance was defined idempotently, which means a task can be applied multiple times without changing the result beyond the initial application. You can achieve this in Ansible with conditional statements, check and debug modules, and other tools that verify the state of a specific operation.
You can find one example below:
Code: Ansible task for managing network configuration on an EC2 instance.
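A minimal sketch of what such a task could look like, assuming a RHEL-style instance where a secondary interface is configured through an ifcfg file; the file path, the ifcfg-eth1.j2 template and the handler name are illustrative placeholders rather than the client's actual configuration:

```yaml
# Sketch only: configure a secondary network interface idempotently.
# Paths, the template name and the handler are hypothetical examples.
- name: Check whether eth1 is already configured
  ansible.builtin.stat:
    path: /etc/sysconfig/network-scripts/ifcfg-eth1
  register: eth1_config

- name: Render network configuration for eth1 only when it is missing
  ansible.builtin.template:
    src: ifcfg-eth1.j2
    dest: /etc/sysconfig/network-scripts/ifcfg-eth1
    owner: root
    group: root
    mode: '0644'
  when: not eth1_config.stat.exists
  notify: restart network   # assumes a matching handler exists in the role
```

The template module is already idempotent on its own (it compares checksums), so the stat check mainly illustrates the “verify the state first” pattern described above.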
AWS CodeBuild is responsible for provisioning those changes. Originally, it was designed for building images, running tests, compiling source code, etc., but it turns out to be much more versatile. AWS CodeBuild behaves like an EC2 instance that is created when a job starts and disappears when the job completes. This means you can use it as a deployment instance and pay only for the minutes the job is running.
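As a rough sketch of how that can work (the file below is illustrative; site.yml and inventory/production are placeholder names, not the real project layout), a CodeBuild job can run the playbooks from a simple buildspec:

```yaml
# buildspec.yml (sketch): runs Ansible from an ephemeral CodeBuild container.
# site.yml and inventory/production are placeholder names.
version: 0.2

phases:
  install:
    commands:
      - pip install ansible            # assumes the build image provides Python/pip but not Ansible
  build:
    commands:
      - ansible-playbook -i inventory/production site.yml
```

In practice you also have to decide how CodeBuild reaches the instances (for example SSH keys or AWS Systems Manager), which is beyond the scope of this sketch.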
What does this give us?
AWS services simply integrate very well with each other. Using IAM roles with IAM policies, you can grant very granular permissions that allow AWS CodeBuild to interact with ECR, CloudWatch Logs, S3 buckets and so on.
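To give a flavour of that granularity, here is a hedged sketch of a narrowly scoped CodeBuild service role, written as CloudFormation YAML purely for illustration (our setup defines such resources in Terraform); the account ID, log group and bucket names are placeholders:

```yaml
# Sketch of a narrowly scoped CodeBuild service role (names and ARNs are placeholders).
Resources:
  CodeBuildServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: codebuild.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: provisioning-minimal
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow                      # write build logs
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: arn:aws:logs:*:123456789012:log-group:/aws/codebuild/ec2-provisioning*
              - Effect: Allow                      # read playbooks/artifacts from a bucket
                Action:
                  - s3:GetObject
                Resource: arn:aws:s3:::example-provisioning-bucket/*
```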
It also means that by running AWS CodeBuild inside the AWS network, you reduce the distance between the deployment server and the target hosts and, as a consequence, you reduce deployment delays.
In other words, deployments simply complete faster. To wrap up this part of the setup, we decided to use webhooks to trigger those jobs automatically.
Webhooks allow you to integrate version control systems, e.g. GitHub or Bitbucket, with other tools. They react to events and trigger actions: webhooks can be used to automatically update an external issue tracker, trigger CI builds, update a backup mirror, or even deploy a change to your servers.
With that in place, you can configure AWS CodeBuild webhooks for the supported source providers and control exactly when they trigger a build (a short configuration sketch follows the list below):
Available events: PUSH, PULL_REQUEST_CREATED, PULL_REQUEST_UPDATED, PULL_REQUEST_MERGED and PULL_REQUEST_REOPENED (GitHub only).
Available filters: ACTOR_ACCOUNT_ID, HEAD_REF, BASE_REF, FILE_PATH (GitHub only).
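For example, a webhook that only triggers a build for merged pull requests touching the ansible/ directory could be declared roughly like this (again shown as CloudFormation YAML for illustration; the project name, repository URL, image and role ARN are placeholders):

```yaml
# Sketch: a CodeBuild project whose GitHub webhook fires only for merged pull
# requests that touch the ansible/ directory. All names and ARNs are placeholders.
Resources:
  ProvisioningJob:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: ec2-provisioning
      ServiceRole: arn:aws:iam::123456789012:role/codebuild-provisioning
      Source:
        Type: GITHUB
        Location: https://github.com/example-org/example-repo.git
        BuildSpec: buildspec.yml
      Artifacts:
        Type: NO_ARTIFACTS
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: aws/codebuild/amazonlinux2-x86_64-standard:4.0
      Triggers:
        Webhook: true
        FilterGroups:
          - - Type: EVENT
              Pattern: PULL_REQUEST_MERGED
            - Type: FILE_PATH
              Pattern: ^ansible/.*
```

Note that filters inside one group are combined with AND, while separate filter groups are combined with OR.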
Unlike CodeBuild, CodePipeline allows only two change-detection options for starting a pipeline automatically.
Another benefit of using AWS CodeBuild is that you can track every job. You can find the logs directly in the AWS console, inside the CodeBuild project or in the CloudWatch Logs log group named after your project.
CloudWatch stores all the console output in log streams and keeps it even when the CodeBuild project no longer exists. Based on that data, you can run more advanced analytics. You can also move your logs to an S3 bucket and save some money by changing the storage class.
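As a small, hypothetical illustration of the storage-class idea (CloudFormation YAML again; the bucket name and the 30/90/365-day thresholds are arbitrary), a lifecycle rule on such a log bucket could look like this:

```yaml
# Sketch: keep exported build logs cheap by moving them to colder storage classes.
# Bucket name and the 30/90/365-day thresholds are arbitrary examples.
Resources:
  BuildLogArchiveBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: example-codebuild-log-archive
      LifecycleConfiguration:
        Rules:
          - Id: tier-down-old-logs
            Status: Enabled
            Transitions:
              - TransitionInDays: 30
                StorageClass: STANDARD_IA
              - TransitionInDays: 90
                StorageClass: GLACIER
            ExpirationInDays: 365
```

Getting the logs into that bucket is a separate step, for example a CloudWatch Logs export task, which we leave out here.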
Although CodeBuild is quite comprehensive and will definitely suffice in some cases, you can always create a full, more complex pipeline, instead of just a single job, using AWS CodePipeline.
CodePipeline lets you create multiple stages, sources and actions, and integrate them with various tools, as in our example, where we used a three-stage pipeline.
It was really important for us to keep control of our environment, and that meant tracking all the changes and versions that were deployed. I can’t imagine working in a team with many contributors without using a version control system.
It’s worth mentioning that even though CodeBuild integrates easily with both GitHub and Bitbucket, CodePipeline can only use CodeCommit and GitHub as a version control source, at least for the moment (keep an eye on AWS announcements).
The idea was to automate and speed up all the manual work. We went through all the problems and found a compromise between the requirements, the money and the time spent on building our automation solution. Of course, someone can always say that this isn’t going to work for their use case. And that’s fine. Focus on what you and your project really need, not on what you would like to have.
In the third and last part, called “Solution Architecture and implementation of serverless pipelines in AWS”, we would like to go a bit deeper and show how we built our solution. To illustrate the final solution, we will also prepare a DEMO. We will wrap things up with our conclusions based on a few months of use and answer some questions: what was good, what could be better and what we plan for the future.
Interested? See you in the next chapter.
We'd love to answer your questions and help you thrive in the cloud.