AppSync – first jumps in at the deep end – part 3

03/19/2021

In the previous part, we circled around the “data sources” topic in connection with AWS AppSync. This time, I won’t spend much time on integration, but rather focus on single- vs multi-table data modelling in projects based on AWS AppSync.

TL;DR

I’ll use a serverless IoT project we’re currently working on as an example to share my thoughts on choosing a multi-table design and on using DynamoDB with AWS AppSync instead of API Gateway.

Intro

Treat this part of my article simply as lessons that I’ve learned, lessons which, in some ways, definitely changed my prior point of view. It’s not about convincing anyone that DynamoDB can replace all your SQL-like scenarios, because despite being a powerful database service it might drive you to technical madness if planned improperly. That is especially true when SQL-friendly databases, like Amazon Aurora, give you plenty of fully managed options with a much gentler learning curve.

Note: Try not to forget that while DynamoDB does a great job with serverless projects (or rather event-driven architectures), it was not built strictly for serverless use. When customers ask me whether to use DynamoDB, I usually say: “It may be either a blessing or a nerve-wracking rabbit hole.”

Multi-table in the past and now

When starting this project I was already aware of two distinct design approaches I could choose from: a multi-table and a single-table one. To be honest, knowing both is something completely different than implementing each of them. In all our previous (non-GraphQL) projects where DynamoDB was chosen as a data source, I followed the multi-table approach. For those who don’t know what it means, let me explain it in a human-friendly way:

multi-table means having a table per business entity, where each item (with its unique primary key) and its attributes represent one type of data. Simply put, imagine Customers and Events stored in two separate tables. In order to fetch data like “Give me a customer and all his events”, I was forced to query both tables. Neither AppSync nor API Gateway (which I also used) would’ve changed a thing, because two separate database queries had to be invoked anyway. Essentially, this is less efficient than the well-known JOINs from SQL databases, which return related data in one query, and it is the key difference from a single-table approach, where related items live in one table so that a single request can fetch them together.
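To make the “two separate queries” point concrete, here is a minimal sketch. It does not call AWS at all: the two dictionaries are hypothetical in-memory stand-ins for a Customers table and an Events table, and the helper names are my own invention for illustration.

```python
from typing import Any, Optional

# Hypothetical in-memory stand-ins for two DynamoDB tables (no AWS calls).
CUSTOMERS: dict[str, dict[str, Any]] = {
    "c-1": {"id": "c-1", "name": "Ada", "surname": "Lovelace"},
}
# Events grouped by customer id, which plays the role of the partition key.
EVENTS: dict[str, list[dict[str, Any]]] = {
    "c-1": [
        {"id": "e-1", "name": "Kickoff"},
        {"id": "e-2", "name": "Review"},
    ],
}

# Records which "table" was hit, in order, so the round trips are visible.
requests_made: list[str] = []


def get_customer(customer_id: str) -> Optional[dict[str, Any]]:
    """First round trip: key lookup in the Customers table."""
    requests_made.append("Customers")
    return CUSTOMERS.get(customer_id)


def query_events(customer_id: str) -> list[dict[str, Any]]:
    """Second round trip: fetch all events of one customer."""
    requests_made.append("Events")
    return EVENTS.get(customer_id, [])


def get_customer_with_events(customer_id: str) -> Optional[dict[str, Any]]:
    """'Give me a customer and all his events' in a multi-table design."""
    customer = get_customer(customer_id)
    if customer is None:
        return None
    return {**customer, "events": query_events(customer_id)}


result = get_customer_with_events("c-1")
print(result["name"], len(result["events"]), requests_made)
```

Whatever sits in front of the tables, the access pattern costs two requests, one per table.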

Before analysing the single-table design in the GraphQL ecosystem, I’d like to present some points that I walked through before deciding not to squeeze the data into a single DynamoDB table. Let me start with the following points:

  • config management
  • costs

I hope everything will be clear soon.

Configuration management

 

DynamoDB has well-defined CloudFormation support that makes the whole process of provisioning easy as pie. With a few lines of YAML you can quickly define DynamoDB tables and indexes along with the necessary IAM policies. Some time ago we decided to carry out our projects relying on the Serverless Framework, but if you’re not a fan of it, AWS also offers SAM (Serverless Application Model), which lets you model your serverless ideas in YAML by transforming and expanding the SAM syntax into AWS CloudFormation syntax.
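To give a feeling for how little configuration a table needs, below is a minimal sketch that builds a CloudFormation template as a Python dictionary and prints it as JSON (CloudFormation accepts JSON as well as YAML; I use JSON here only to stay within the standard library). The table names, the key attribute names and the helper function are assumptions made up for this illustration, not the configuration of our project.

```python
import json
from typing import Optional


def table_resource(table_name: str, partition_key: str,
                   sort_key: Optional[str] = None) -> dict:
    """Build an AWS::DynamoDB::Table resource with string keys (hypothetical helper)."""
    attributes = [{"AttributeName": partition_key, "AttributeType": "S"}]
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        attributes.append({"AttributeName": sort_key, "AttributeType": "S"})
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return {
        "Type": "AWS::DynamoDB::Table",
        "Properties": {
            "TableName": table_name,
            # On-demand billing, so no capacity units have to be provisioned.
            "BillingMode": "PAY_PER_REQUEST",
            "AttributeDefinitions": attributes,
            "KeySchema": key_schema,
        },
    }


# One table per business entity; names are illustrative assumptions.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "CustomersTable": table_resource("Customers", "id"),
        "EventsTable": table_resource("Events", "customerId", "id"),
    },
}

print(json.dumps(template, indent=2))
```

Adding a third or a tenth table is one more line in Resources, which is why I don’t consider the number of tables a maintenance problem in itself.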

Just pick the one you like and start. I wouldn’t say that there is more to do when managing multiple DynamoDB tables in comparison to a single one. Some of you might say that having one table with all items in it, rather than ten separate tables, reduces the number of alarms and metrics to watch. But hey, is it really such a huge deal? I’ve worked with many customers who had multiple DynamoDB tables and the config maintenance was never an issue.

Money, money, money

Basically, DynamoDB works on partitions. Each partition can handle up to 10 GB of data, 3000 read capacity units (RCU) and 1000 write capacity units (WCU), so both the amount of data stored in a table and its throughput requirements determine how the table is partitioned. Whenever you create a new DynamoDB table you have to either manually provision read and write capacity units for the table and its global secondary indexes (provided you know the access volume to your database while it grows) or use DynamoDB on-demand pricing.
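The per-partition limits above can be turned into a rough back-of-the-envelope estimate. The sketch below uses a commonly cited rule of thumb: take the larger of the partition count required by size and the count required by throughput. Treat it as an assumption for illustration only, since DynamoDB manages partitions internally and does not expose the real number; the function name is hypothetical.

```python
import math

# Per-partition limits quoted in the text above.
MAX_PARTITION_SIZE_GB = 10
MAX_PARTITION_RCU = 3000
MAX_PARTITION_WCU = 1000


def estimate_partitions(size_gb: float, rcu: int, wcu: int) -> int:
    """Rough estimate of how many partitions a table needs (rule of thumb)."""
    by_size = math.ceil(size_gb / MAX_PARTITION_SIZE_GB)
    by_throughput = math.ceil(rcu / MAX_PARTITION_RCU + wcu / MAX_PARTITION_WCU)
    # A table always has at least one partition.
    return max(by_size, by_throughput, 1)


# 25 GB with modest traffic: size dominates.
print(estimate_partitions(size_gb=25, rcu=3000, wcu=1000))
# 5 GB with heavy traffic: throughput dominates.
print(estimate_partitions(size_gb=5, rcu=6000, wcu=2000))
```

In the first call the data size asks for three partitions while the throughput asks for two, so the estimate is three; in the second call the throughput asks for four.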

Either way, you save real money if your keys and indexes actually match your access patterns. DynamoDB shines if you restrict yourself to “gets”, which are key/value lookups on single items, or “queries”, which are conditional lookups on items that share the same partition key but have different range/sort keys.
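The difference between a “get” and a “query” is easier to see on a toy model. The sketch below is not the DynamoDB API: it is a hypothetical in-memory table keyed by partition key (customer id) and sort key (start date), with a prefix match standing in for a begins_with condition on the sort key.

```python
from typing import Any, Optional

# Toy Events table: partition key (customer id) -> sort key (start date) -> item.
EVENTS_TABLE: dict[str, dict[str, dict[str, Any]]] = {
    "c-1": {
        "2021-01-10": {"name": "Kickoff"},
        "2021-02-15": {"name": "Review"},
        "2021-02-28": {"name": "Demo"},
    },
    "c-2": {
        "2021-02-01": {"name": "Onboarding"},
    },
}


def get_item(partition_key: str, sort_key: str) -> Optional[dict[str, Any]]:
    """A 'get': the full primary key identifies exactly one item (or none)."""
    return EVENTS_TABLE.get(partition_key, {}).get(sort_key)


def query(partition_key: str, sort_key_prefix: str = "") -> list[dict[str, Any]]:
    """A 'query': one partition, optionally narrowed by a sort key condition.

    Items come back ordered by sort key, as they would within a partition.
    """
    partition = EVENTS_TABLE.get(partition_key, {})
    return [
        partition[sort_key]
        for sort_key in sorted(partition)
        if sort_key.startswith(sort_key_prefix)
    ]


print(get_item("c-1", "2021-01-10"))
print([event["name"] for event in query("c-1", "2021-02")])
```

Both operations touch a single partition, which is what keeps them cheap. Anything that has to look across partitions without a suitable index turns into a scan, and that is where the bill grows.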

Having that in mind, I realised that if we followed a single-table design, we wouldn’t make any significant savings, especially with the on-demand approach. This project didn’t require the scaling capabilities of DynamoDB to get going. And just like in many startup stories, we didn’t exactly know how this application would evolve over time. Anyway, in my scenario it was still acceptable to keep multiple tables. At this point, it is you who has to assess whether a single table will give your project the benefits you need.

Don’t get me wrong, everyone loves saving money, but sometimes the amount of time needed to properly design a single table (the steep learning curve I mentioned before) is not worth the annual spending decrease you get when choosing single-table over multiple ones.

IMPORTANT: I’m a huge fan of taking new paths and I’m not afraid of steep learning curves, but at the same time I stay humble and offer my customers valuable services they expect from me. If you have time and ambition, keep learning. Technology hates sloth.

Alright, this time, neither money nor config maintenance aspects drove me to the point where I started thinking: “Hey, maybe this GraphQL-based project is a good fit for a single DynamoDB table. Let’s use it for the entire project”. This is where performance comes in.

Performance

The contemporary world forces us to constantly measure our performance. It doesn’t matter if you’re a human or some fancy application, sooner or later your success will be placed somewhere along the performance axis. If we focus on DynamoDB performance, we have to constantly tune the table so that each access pattern we’ve modelled can be handled, ideally, with one request. This helped me to understand that a scenario with AWS AppSync, working as an execution engine between the frontend and our DynamoDB tables, isn’t a good fit for a single-table design.

Why?

To answer that, I have to take a step back. Remember the example I mentioned earlier about Customers and their Events, “Give me a customer and all his events”? In the case of GraphQL the web browser makes a single request to the backend. The content of that request may look like the one below:

query getCustomerEvents {
  getCustomers(id: "b46c7c13-b3b3-4f93-9b47") {
    id
    name
    surname
    events(id: "b46c7c13-b3b3-4f93-9b47") {
      items {
        createdAt
        id
        name
        startDate
        endDate
        description
      }
    }
  }
}

Seemingly, all looks good. Our client is making only a single request to the backend. Unfortunately, the devil is in the details. Even if I decide to follow a single-table design, the root resolver will be executed first to find the Customer with id b46c7c13-b3b3-4f93-9b47. And that is our first query to the database.

Once that data is available, the next resolver (the one attached to the nested events field) will make a subsequent request to our table, which is exactly what we’re trying to avoid with single-table design. We don’t want to make multiple, serial requests to DynamoDB in a single-table scenario in order to fulfill an access pattern.
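The serial behaviour described above can be imitated with a tiny simulation. This is a sketch under stated assumptions, not AppSync itself: the single table is an in-memory dictionary with made-up key prefixes, the two functions play the role of the root and nested field resolvers, and the call log stands in for requests sent to DynamoDB.

```python
from typing import Any

# Hypothetical single table: one partition holds the customer and its events.
SINGLE_TABLE: dict[str, dict[str, dict[str, Any]]] = {
    "CUSTOMER#c-1": {
        "PROFILE": {"id": "c-1", "name": "Ada", "surname": "Lovelace"},
        "EVENT#e-1": {"id": "e-1", "name": "Kickoff"},
        "EVENT#e-2": {"id": "e-2", "name": "Review"},
    },
}

# Every simulated database request is appended here in execution order.
call_log: list[str] = []


def resolve_get_customers(args: dict[str, str]) -> dict[str, Any]:
    """Root resolver: one GetItem-style lookup for the customer item."""
    call_log.append("GetItem")
    return SINGLE_TABLE[f"CUSTOMER#{args['id']}"]["PROFILE"]


def resolve_events(source: dict[str, Any]) -> list[dict[str, Any]]:
    """Nested field resolver: runs only once the parent object is resolved."""
    call_log.append("Query")
    partition = SINGLE_TABLE[f"CUSTOMER#{source['id']}"]
    return [item for key, item in sorted(partition.items())
            if key.startswith("EVENT#")]


def execute(customer_id: str) -> dict[str, Any]:
    """Field-by-field execution, the way a GraphQL engine walks the query."""
    customer = resolve_get_customers({"id": customer_id})  # request 1
    events = resolve_events(customer)                      # request 2, after 1
    return {**customer, "events": {"items": events}}


response = execute("c-1")
print(call_log, len(response["events"]["items"]))
```

Even though all the data sits in one partition of one table, resolving the query field by field still produces two requests, one after the other, so the main selling point of the single-table design is lost.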

GraphQL is a powerful tool for a comprehensive interaction with our backend, but because the entities are resolved separately, it’s better, at least in my opinion, to model each entity in a separate table. It doesn’t mean you can’t do otherwise. I simply don’t see any value in doing so. However, in any upcoming project where API Gateway (REST API) could be used instead of AWS AppSync, a single table would definitely be the right choice.

Outro

It doesn’t matter what approach you favour and which design looks better on paper. Remember to always check and test the chosen design against your project’s actual problem. What looks easy might turn out to be a real pain in the a**.