AWS RDS integrity testing tool based on a serverless approach.
For one of our customers, a digital healthcare company and provider of a solution for patient engagement, we’ve developed an AWS RDS integrity testing tool based on a serverless approach. We have used a combination of AWS Step Functions, AWS Lambda, AWS SSM and AWS KMS. In short, this is not solely a story about a technological concept but also about a way of dealing with such problems in a startup that finds itself under the pressure of other priorities.
We all live in a world where data has become a currency. Any potential loss or database downtime means loss of income. This brings us to the key question: how will you, as an organization, ensure that your data is still complete and of good quality once a failure occurs?
Essentially, we are all aware of failures, but how many of us have ever tested a restoration from previously created AWS RDS snapshots? Not all of us, certainly. Personally, I’ve seen many examples of a “the cloud has magic powers” attitude and heard many people saying: “we do not need to worry about data, we’ve got regular system snapshots done by AWS”.
Once the failure creeps in (developer deletes a tiny part of the data “accidentally”), it’s always too late for guessing. I won’t get into details and dwell on how you feel when you have to act fast, especially when dealing with a problem you have never tested before. Let’s just say that your heart races like a speeding train.
An old Russian proverb says: “Trust, but verify”. In our client’s words: “For our internal and external ISO 27001 audits, we needed a clear report of the integrity of our encrypted backups. Chaos Gears implemented a method to make the integrity of our backups easy to demonstrate to internal and external auditors.”
Up to this point, our client had RDS databases launched in 8 different AWS regions. Keeping the data consistent and being able to restore it from a particular backup in case of an outage is always at the top of our priorities. Unfortunately, in this case, there was no habit of regular and internal testing. So, the right time came with an ISO audit.
The balance between task automation and the time spent on building automated flows is a question that always sparks long debates. Our team decided to avoid developing something that would introduce additional maintenance overhead and, more importantly, would inflate our monthly bill just because we wanted to save some time. That’s why we leveraged AWS Step Functions to keep the whole workflow in a single place.
NOTE: For those who have never worked with AWS Step Functions, it is a serverless function orchestrator that makes it easy to sequence AWS Lambda functions and multiple AWS services into business-critical applications.
If costs are fine, then you’re on the right track
When you use AWS Step Functions, you are charged based on the number of state transitions required to execute your application. After the free tier, which includes 4,000 state transitions per month, you pay $0.025 per 1,000 state transitions. Our client would accept an automated approach if a short development period could overlap with low end costs.
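To make the pricing above concrete, here is a small back-of-the-envelope sketch in Python. It uses only the two figures quoted above (4,000 free state transitions per month, then $0.025 per 1,000). The number of transitions per run and the run frequency are hypothetical values chosen for illustration, not measurements from the client's workflow, and the estimate ignores Lambda invocations and the temporary RDS instances.

```python
# Figures quoted in the article for AWS Step Functions.
FREE_TRANSITIONS_PER_MONTH = 4_000
PRICE_PER_1000_TRANSITIONS_USD = 0.025


def monthly_transition_cost(transitions_per_run: int, runs_per_month: int) -> float:
    """Estimate the monthly state-transition charge in USD.

    Only transitions above the free tier are billed.
    """
    total = transitions_per_run * runs_per_month
    billable = max(0, total - FREE_TRANSITIONS_PER_MONTH)
    return billable / 1000 * PRICE_PER_1000_TRANSITIONS_USD


if __name__ == "__main__":
    # Hypothetical workload: 20 transitions per run, one run per day
    # in each of 8 regions, over a 30-day month.
    runs = 8 * 30
    cost = monthly_transition_cost(transitions_per_run=20, runs_per_month=runs)
    print(f"{20 * runs} transitions -> ${cost:.2f} per month")
```

Under these assumed numbers, the orchestration cost is a matter of cents, which is why the state-transition pricing model suits a workflow that runs rarely and has few steps.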
AWS Step Functions is a great way to build and step through a series of AWS services in a matter of minutes. In this case, it also addressed a few concerns that our client specified as highly important from their perspective:
Below, you’ll find what we’ve developed as an initial version of the flow which, right now, is on a roadmap for future development.
IMPORTANT: All of the steps mentioned above work as jobs with a try-catch approach. In a nutshell, if any of the steps from 1 to 9 fails, the execution is immediately directed to the “Destroying RDS Instance/ Slack Notification” step.
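The try-catch behaviour can be sketched with the `Catch` field of the Amazon States Language. The Python snippet below generates a state machine definition in which every task routes any error to a single cleanup state. The step names, the ARNs and the assumption that the cleanup state also closes the successful path are all hypothetical; the actual workflow is not reproduced here.

```python
import json

# Hypothetical name for the "Destroying RDS Instance/ Slack Notification" step.
CLEANUP_STATE = "DestroyRdsInstanceAndNotifySlack"


def build_definition(step_names, lambda_arns, cleanup_arn):
    """Build a state machine definition where each step catches all errors.

    Each Task state runs one Lambda function. On any error ("States.ALL")
    the execution jumps to CLEANUP_STATE, and the error details are stored
    under "$.error" so the cleanup function can report them.
    """
    states = {}
    for index, name in enumerate(step_names):
        is_last = index + 1 == len(step_names)
        states[name] = {
            "Type": "Task",
            "Resource": lambda_arns[name],
            "Catch": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "ResultPath": "$.error",
                    "Next": CLEANUP_STATE,
                }
            ],
            # Assumption: the successful path also ends with the cleanup step.
            "Next": CLEANUP_STATE if is_last else step_names[index + 1],
        }
    states[CLEANUP_STATE] = {"Type": "Task", "Resource": cleanup_arn, "End": True}
    return {"StartAt": step_names[0], "States": states}


if __name__ == "__main__":
    # Placeholder step names and ARNs, for illustration only.
    steps = ["RestoreFromSnapshot", "WaitForInstance", "RunIntegrityChecks"]
    arn_prefix = "arn:aws:lambda:eu-west-1:123456789012:function:"
    arns = {step: arn_prefix + step for step in steps}
    definition = build_definition(steps, arns, arn_prefix + "Cleanup")
    print(json.dumps(definition, indent=2))
```

Routing every failure to one state keeps the temporary RDS instance from outliving a broken run, and the same state can post the outcome to Slack whether the test passed or failed.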
Basically, our company is full of enthusiasts, who like to take small steps with tangible results. So, here’s what our next priorities look like:
Problem solving is all about seeking the simplest solutions, provided they exist. Honestly, I wouldn't treat serverless AWS services as a remedy for every problem. However, speaking from my own experience, they definitely allow us to quickly test an idea against a working solution. With the example depicted above, I was able to provide an already working version for tests after just 2 days, and I didn’t have to think about irrelevant setup issues.