A way to get full transparency of your cloud storage bills.
There must be a way to get full transparency over your cloud storage bills, which is something I never got from the commercial solutions out there with their fixed subscription packages. Some of the files I store are better off in cold storage: my family will drop something in, forget about it, and only remember it 6 months later. Psychologically, we are inclined to compensate for our ordered lives in some unordered way, and I chose to go full-on hoarder-style with S3 object storage and a serverless architecture for this little PoC.
I am going to describe the infrastructure behind what could be called a fully serverless object uploader, with some validation (in this case I have implemented a per-user upload cap, but it could be anything else) and, less obvious but just as important, object listing. I should point out that this is a proof of concept (PoC), and it proved that it is possible in practice to push the boundaries of IAM and Lambda to control how much an authenticated user can upload, and where.
My first cognitive checkpoint was to inspect what my browser sends as part of the multipart upload request, to find something I could use.
And bingo: we can use the Content-Length header, given in number of octets, to check that our request is not shooting over the upload cap.
“The Content-Length entity-header field indicates the size of the entity-body, in decimal number of OCTETs, sent to the recipient or, in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET.”
https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
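A minimal sketch of that check in Python, assuming the Lambda receives the request headers as a plain dictionary and already knows the user's usedStorage and uploadCap in bytes (the function names are hypothetical). For a multipart upload, Content-Length covers the whole form body, boundaries and other fields included, so it slightly over-estimates the file size and the check errs on the safe side.

```python
from typing import Mapping


def remaining_quota(used_storage: int, upload_cap: int) -> int:
    """Bytes the user may still upload; never negative (hypothetical helper)."""
    return max(upload_cap - used_storage, 0)


def fits_under_cap(headers: Mapping[str, str], used_storage: int, upload_cap: int) -> bool:
    """True if the body size declared in Content-Length fits the remaining quota."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    try:
        declared = int(normalised.get("content-length", ""))
    except ValueError:
        # Missing or malformed header: refuse rather than guess.
        return False
    return 0 <= declared <= remaining_quota(used_storage, upload_cap)


cap = 5 * 1024 ** 3  # hypothetical 5 GiB cap
print(fits_under_cap({"Content-Length": "1048576"}, used_storage=cap - 2_000_000, upload_cap=cap))  # True
```

The header alone is only a claim made by the client, which is why it is worth letting S3 enforce the limit as well.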
Now back to IAM to check if that can handle this validation for us.
IAM can be really granular. In this case we use POST policy conditions, in particular content-length-range, which let us set the minimum and maximum allowed size of the uploaded content. This is exactly the kind of logic offloading I was looking for to make this solution cheaper: it means we can generate these policies on the fly whenever the user's usedStorage is less than their uploadCap.
An example POST policy condition restricting exactly where the request may upload and how much. It is attached to the formData your browser uses to upload the object directly to S3.
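As a sketch of generating such a policy on the fly, here is a Python version using only the standard library. The bucket name, the `uploads/<user>/` key prefix and the 5-minute expiry are assumptions for illustration. The code only builds and Base64-encodes the policy document; it does not sign it.

```python
import base64
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def build_post_policy(bucket: str, user_id: str, used_storage: int, upload_cap: int,
                      expires_in: int = 300,
                      now: Optional[datetime] = None) -> Optional[Dict[str, Any]]:
    """Return an S3 POST policy document, or None when the user is at their cap."""
    remaining = upload_cap - used_storage
    if remaining <= 0:
        return None  # nothing left to upload: reject without issuing a policy
    if now is None:
        now = datetime.now(timezone.utc)
    expiration = (now + timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            # Where: only keys under this user's (hypothetical) prefix.
            ["starts-with", "$key", f"uploads/{user_id}/"],
            # How much: S3 rejects the upload unless 1 <= file size <= remaining.
            ["content-length-range", 1, remaining],
        ],
    }


def encode_policy(policy: Dict[str, Any]) -> str:
    """Base64-encode the policy JSON; this string is what gets signed and sent in the form."""
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


policy = build_post_policy("my-family-bucket", "alice", used_storage=40, upload_cap=100)
print(policy["conditions"][2])  # ['content-length-range', 1, 60]
```

In a real Lambda you would more likely let boto3 handle the signing: its `generate_presigned_post(Bucket=..., Key=..., Conditions=[...], ExpiresIn=...)` accepts a conditions list like the one above, adds the `x-amz-*` fields Signature Version 4 requires, and returns the URL and form fields for the browser.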
So far so good; now it is just a question of how fast we can glue this together so that we can start working on a client application. Get it down on paper and you will see it more clearly.
Clearly, you don’t want to give just anyone access to your serverless app. AWS Cognito is the way to go, and you can read more about it on our blog.
The API is surely better off serverless as well. Use API Gateway to define your GET ‘/session’ and PUT ‘/session’ endpoints; it will invoke your Lambdas as shown in Diagrams 1 and 2.
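As a rough illustration, here are two Python Lambda handlers, one per route, assuming a REST API with Lambda proxy integration and a Cognito user pool authorizer, which exposes the verified token claims under `requestContext.authorizer.claims`. The handler names and what each one returns are hypothetical placeholders; the real responses depend on the flows in Diagrams 1 and 2.

```python
import json
from typing import Any, Dict


def user_id_from_event(event: Dict[str, Any]) -> str:
    """Cognito user pool authorizer: verified claims live under requestContext.authorizer."""
    return event["requestContext"]["authorizer"]["claims"]["sub"]


def response(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Response shape expected by API Gateway's Lambda proxy integration."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def get_session_handler(event, context):
    """GET /session (hypothetical body): tell the user where their objects live."""
    user_id = user_id_from_event(event)
    return response(200, {"user": user_id, "prefix": f"uploads/{user_id}/"})


def put_session_handler(event, context):
    """PUT /session (hypothetical body): ask permission to upload `size` bytes."""
    user_id = user_id_from_event(event)
    body = json.loads(event.get("body") or "{}")
    size = body.get("size")
    if type(size) is not int or size <= 0:
        return response(400, {"error": "size must be a positive integer"})
    # Here the quota check and POST policy generation would take place.
    return response(200, {"user": user_id, "size": size})
```

Because the user id comes from the authorizer rather than the request body, a client cannot claim to be someone else to get a bigger quota or another user's prefix.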
While Diagram 1 is a straightforward wrapper around IAM roles, the flow in Diagram 2 makes more sense if you look at it from the perspective of DynamoDB, which holds the state of each upload (INITIATED, REJECTED, COMPLETE). These flags made debugging easier along the way, can be used in the future to backtrack issues with uploads, and cost next to nothing to keep here anyway.
The first two (INITIATED, REJECTED) are easy to settle on at the first upload request, while the last one (COMPLETE) can be written as a result of the S3 trigger.
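A small in-memory sketch of that state handling in Python. A plain dict stands in for the DynamoDB table and the item schema is hypothetical; `on_s3_object_created` reads the standard S3 event notification shape (`Records[].s3.object.key` and `size`), in which object keys arrive URL-encoded.

```python
from enum import Enum
from typing import Any, Dict
from urllib.parse import unquote_plus


class UploadState(str, Enum):
    INITIATED = "INITIATED"
    REJECTED = "REJECTED"
    COMPLETE = "COMPLETE"


# Stand-in for the DynamoDB table: object key -> item (hypothetical schema).
uploads: Dict[str, Dict[str, Any]] = {}


def initiate_upload(key: str, size: int, used_storage: int, upload_cap: int) -> UploadState:
    """First upload request: settle on INITIATED or REJECTED straight away."""
    if used_storage + size <= upload_cap:
        state = UploadState.INITIATED
    else:
        state = UploadState.REJECTED
    uploads[key] = {"key": key, "size": size, "state": state.value}
    return state


def on_s3_object_created(event: Dict[str, Any]) -> None:
    """S3 trigger: mark INITIATED uploads as COMPLETE once the object exists."""
    for record in event["Records"]:
        s3_object = record["s3"]["object"]
        # Object keys in S3 event notifications are URL-encoded ("a b" -> "a+b").
        key = unquote_plus(s3_object["key"])
        item = uploads.get(key)
        if item is not None and item["state"] == UploadState.INITIATED.value:
            item["state"] = UploadState.COMPLETE.value
            item["size"] = s3_object["size"]  # actual size as stored by S3
```

Against the real table, the INITIATED to COMPLETE transition maps naturally onto a conditional write (DynamoDB `UpdateItem` with a `ConditionExpression`), so a duplicated or late S3 event cannot overwrite a REJECTED item.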
So, in order of occurrence, Diagram 2 shows the happy path:
You are going to love hearing that this part was not easy.
Since this is designed purely for home use, I estimated negligible Lambda and API Gateway request volumes, so that we stay within the Free Tier.
2 requests per upload + 1 per list + 1 per download = 4 requests per object, at most.
Give or take, 1 TB of pictures at roughly 3.2 MB each is 312,500 objects, or up to 1,250,000 requests.
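The same back-of-the-envelope estimate as a tiny Python function. The 3.2 MB average picture size is an assumption (it is simply what 1 TB divided by 312,500 objects works out to), and decimal units are used throughout.

```python
def request_budget(total_bytes: int, avg_object_bytes: int, requests_per_object: int = 4):
    """Return (number of objects, worst-case API requests) for a storage target."""
    objects = total_bytes // avg_object_bytes
    return objects, objects * requests_per_object


# 1 TB (decimal) of ~3.2 MB pictures; 4 = 2 (upload) + 1 (list) + 1 (download).
objects, requests = request_budget(10 ** 12, 3_200_000)
print(objects, requests)  # 312500 1250000
```

Whether that fits the Free Tier depends on how those requests are spread over time, since the Lambda and API Gateway free-tier allowances are counted per month.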
This is not the end...
While this is somewhat more expensive than some off-the-shelf commercial solutions, I know there will be other S3 API-compatible solutions that will take this PoC to the next level. This is by no means a finished solution: it will need more thought around upload state handling and object namespacing (I would like to make use of object tags to avoid clashes), and what about object lifecycle? Glacier could give us great savings. Still, this is a good start, and at least we learned what the serverless building blocks are.
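For the lifecycle question, one possible direction is an S3 lifecycle rule that moves objects to Glacier after a while. The sketch below builds the rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the `uploads/` prefix and the 180-day threshold (a nod to the "remember it 6 months later" habit) are assumptions.

```python
from typing import Any, Dict


def glacier_rule(prefix: str, days: int = 180, storage_class: str = "GLACIER") -> Dict[str, Any]:
    """One S3 lifecycle rule: move objects under `prefix` to cold storage after `days`."""
    if days < 1:
        raise ValueError("days must be >= 1")
    rule_id = prefix.strip("/").replace("/", "-") or "all"
    return {
        "ID": f"cold-storage-{rule_id}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


# Passed as LifecycleConfiguration=... to put_bucket_lifecycle_configuration.
lifecycle_configuration = {"Rules": [glacier_rule("uploads/")]}
```

Objects in the GLACIER storage class have to be restored before they can be downloaded, so the download flow would need to account for that.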