
New challenges with Boto3


For starters

Let’s recall some basic facts about Boto3. To be clear, I won’t give you the exact definition because it’s as dull as ditchwater. Instead, I’ll try to describe it in a more intelligible way. The easiest one is to imagine Boto3 as a bridge connecting two sides. We will use it to connect our developer to the huge number of available AWS services. Without going into details, Boto3 is crucial for every application intended to use AWS resources.

Fine, but why am I writing about this SDK again, you may ask. I’ve described its components in the previous article – the Boto3 client, credentials, S3, DynamoDB. So, why focus on it once more? Because I’m confident that the information contained in the last part covers only a small piece of all the possibilities it offers. But to make this article more interesting to you, dear reader, I won’t introduce any new services. Instead, I’ll focus on some issues that appeared during the year I’ve spent using it. If you read my previous article and thought that Boto3 is free of any drawbacks or shortcomings, I’m really sorry for misleading you.

Allow me to introduce…

John – a young and inexperienced programmer who wants to be somebody. He spends his free time at home, thinking about Amazon Web Services. One day he comes up with an idea.

– I will create an S3 bucket – he says.

Great idea, John. Let’s do that. As you know from the previous article, John will create a Boto3 S3 client in order to connect to the right service (bearing in mind the credentials). Ok, John, show us a piece of your code:

import boto3

# Use the named profile from your AWS credentials file.
boto3.setup_default_session(profile_name=profile_name)
client = boto3.client('s3')

bucket_name = 'name'
client.create_bucket(Bucket=bucket_name)

He couldn’t wait for us and decided to run it immediately. What’s interesting is that the code executed without a single error… Great job, John. He was so happy that he used the AWS CLI to get info about his newly created bucket. Now, he will show us the location of his bucket.
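Here’s a minimal sketch of the same check done from Python (the CLI equivalent would be aws s3api get-bucket-location), assuming the bucket name from the snippet above:

import boto3

client = boto3.client('s3')

# For buckets created in us-east-1 the LocationConstraint
# comes back as None (shown as null in the CLI output).
location = client.get_bucket_location(Bucket='name')
print(location['LocationConstraint'])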

– Ohhhhhh no, it’s null – John stammers.

Yeah, no wonder. You’ve created a bucket without a specific region, so it was created in the default location, which is us-east-1.

– But my config file contains a default region, which is eu-central-1. That’s strange.

Unfortunately, that’s something you need to learn by heart. Without a specific region, all your newly created buckets will have a LocationConstraint equal to null, and that means us-east-1. I can tell you that you need to pass an extra LocationConstraint setting inside CreateBucketConfiguration.

– Ok, I will try that.

import boto3

boto3.setup_default_session(profile_name=profile_name)
client = boto3.client('s3')

bucket_name = 'name'
client.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={
    'LocationConstraint': 'eu-central-1'
})

– Yes, now my location constraint contains the right value – eu-central-1 – but I want to try something else.

import boto3

boto3.setup_default_session(profile_name=profile_name)
client = boto3.client('s3')

bucket_name = 'name'
client.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={
    'LocationConstraint': 'eu-west-2'
})

– Something’s wrong. I get an IllegalLocationConstraintException, but I’ve only changed the region to eu-west-2.

John, you have to remember about the connection between the S3 client and the buckets you want to create. Look at your code. You created your S3 client in the default region, and that is…?

– The region of my S3 client is the default one, so eu-central-1?

Exactly!!! So, you tried to use a region that differs from the one used by your client. Create your client the proper way.

import boto3

boto3.setup_default_session(profile_name=profile_name)
client = boto3.client('s3', region_name='eu-west-2')

bucket_name = 'name'
client.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={
    'LocationConstraint': 'eu-west-2'
})

– Yeah, I checked. It works perfectly now.

I’m proud of you, John. BUT WAIT, where are you going? We have to run another test.

– [Sigh…]

Lambda, is everything clear?

Ok, John. You had your break, but now it’s time for more work. Please show us the lambda handler that I’ve sent you.

– Of course, here it is.

import json

def lambda_handler(event, context):

    # Make sure the event carries a body before going any further.
    try:
        data = event['body']
    except KeyError:
        raise Exception("Event body not found")

    body = json.loads(data)
    bucket_name = body.get('bucket')
    key = body.get('key')

    response = {
        "statusCode": 200,
        "body": "Yeah, that works"
    }

    return response

As you can see, this handler is not very complicated, but to make everything clear, here’s a little tip. The most important thing is to check whether our event contains a body – in other words, whether the event carries any data that can be used later. Most of you have probably guessed that we will work with S3 buckets again, because two parameters are connected with this service:

  • bucket_name
  • key

If you haven’t noticed this, don’t worry. Neither has John.
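For reference, here’s a minimal sketch of an event this handler would accept – the bucket and key values are hypothetical placeholders:

import json

# A hypothetical test event; adjust the bucket and key to your own setup.
event = {
    "body": json.dumps({"bucket": "my-bucket", "key": "test_file.json"})
}
print(lambda_handler(event, None))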

Since we’ve mentioned him… John, can you add the least complex example of getting an object from S3, please?

import json

import boto3

def lambda_handler(event, context):

    try:
        data = event['body']
    except KeyError:
        raise Exception("Event body not found")

    body = json.loads(data)
    bucket_name = body.get('bucket')
    key = body.get('key')

    client = boto3.client('s3')

    # get_object is a client method, not a resource method.
    s3_response_object = client.get_object(Bucket=bucket_name, Key=key)

    response = {
        "statusCode": 200,
        "body": "Yeah, that works"
    }

    return response

– Ok, that works fine. I got the item using the “test_file.json” key. So, you’ve already attached the right policy to the lambda?

You’re right, I’ve prepared it earlier. Next, create a directory inside your bucket and put your file there.

– Now my object key is ‘test_folder/test_file.json’. That’s strange. It’s not working anymore. I get an ‘Access Denied’ response, even though the policy is already included.

Ok, John. Log into your AWS management console and show us this policy.

Effect: Allow
Action:
  - s3:*
Resource:
  - arn:aws:s3:::${article_test_bucket}

At first glance, we might say that our lambda should have access to every action performed on our bucket. But that’s not true. It will only work on items put directly inside the bucket (without any directories). If you know how to improve it, John, do so.

Effect: Allow
Action:
  - s3:*
Resource:
  - arn:aws:s3:::${article_test_bucket}
  - arn:aws:s3:::${article_test_bucket}/*

Exactly! As you can see, John added an extra line which gives access to objects under any path. Is it working?

– Of course, no errors.

Great! Now, let’s reverse the situation. Create a new file and put it into our bucket.

import json

import boto3

def lambda_handler(event, context):

    try:
        data = event['body']
    except KeyError:
        raise Exception("Event body not found")

    body = json.loads(data)
    bucket_name = body.get('bucket')
    key = body.get('key')

    client = boto3.client('s3')

    # Create an empty file, then upload it to the bucket.
    f = open("test_file.txt", "w+")
    f.close()

    s3_response = client.upload_file('test_file.txt', bucket_name,
                                     'test_file.txt')

    response = {
        "statusCode": 200,
        "body": "Yeah, that works"
    }

    return response

– Hmm, it doesn’t work. I get ‘[Errno 30] Read-only file system’. Does it mean I can’t create any files with lambda? That’s silly.

No, no, you’re wrong, John. Allow me to explain. Every directory inside lambda is read-only. You’ve said the same, more or less – you can’t create any files there. Fortunately, there is one exception to this rule. Inside lambda you can find one writable directory called /tmp. That is your solution.

– So, my code should look like this?

import json

import boto3

def lambda_handler(event, context):

    try:
        data = event['body']
    except KeyError:
        raise Exception("Event body not found")

    body = json.loads(data)
    bucket_name = body.get('bucket')
    key = body.get('key')

    client = boto3.client('s3')

    # /tmp is the only writable path inside lambda.
    f = open("/tmp/test_file.txt", "w+")
    f.close()

    s3_response = client.upload_file('/tmp/test_file.txt', bucket_name,
                                     'test_file.txt')

    response = {
        "statusCode": 200,
        "body": "Yeah, that works"
    }

    return response

Precisely! Now you can write your file inside the /tmp directory and it should work without any problems. It’s yet another rule you need to learn by heart.

– Yes, it works, but it is a lot to remember.

Hey, nobody said Boto3 was easy 😉

I can give you one more solution to this problem. You can skip the /tmp step by using a single put_object call, instead of creating a file and then uploading it to S3.

– Ok, let’s try it.          

import json

import boto3

def lambda_handler(event, context):

    try:
        data = event['body']
    except KeyError:
        raise Exception("Event body not found")

    body = json.loads(data)
    bucket_name = body.get('bucket')
    key = body.get('key')

    client = boto3.client('s3')

    # Write the data straight to S3 -- no local file needed.
    s3_response = client.put_object(
        Bucket=bucket_name,
        Body='some_string_data',
        Key='test.txt'
    )

    response = {
        "statusCode": 200,
        "body": "Yeah, that works"
    }

    return response

– I have already tested it and… it works. So, what’s the difference?

In the first example you divided the algorithm into two steps.

  • Create file
  • Upload it to s3

As I said before, no directory except /tmp will let you write anything inside it. You need to use that special folder.

In the second case, you wrote the file directly to S3. Your file is created ‘in the flow’, so there is no need to store it anywhere other than its final destination.

– So, why should I focus on the first scenario at all? It’s even more complicated than the second one.

Great question. Everything depends on the situation. You’re right, the second approach is faster (and probably easier) than the first one. But imagine that you create an empty zip file and want to put it to S3 with some other files already inside. You need to store all those files somewhere, including the zip itself. In this situation put_object won’t work, but the /tmp scenario will work perfectly.
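Here’s a minimal sketch of that zip scenario – the bucket and file names are hypothetical, and we assume the source file was created earlier in /tmp:

import zipfile

import boto3

client = boto3.client('s3')

# Build the archive in /tmp, the only writable path inside lambda.
zip_path = '/tmp/archive.zip'
with zipfile.ZipFile(zip_path, 'w') as archive:
    # 'data.json' is a hypothetical file created earlier in /tmp.
    archive.write('/tmp/data.json', arcname='data.json')

# Upload the finished archive to its final destination on S3.
client.upload_file(zip_path, 'my-bucket', 'archive.zip')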

Let’s put it all together

When you create a new S3 bucket using only the bucket name, it will be created in the us-east-1 region by default.

Adding the LocationConstraint setting allows you to choose a specific region.

Creating a bucket in a region that differs from your client’s region will cause an error (IllegalLocationConstraintException).
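For illustration, a small sketch of what that failure looks like in practice – the regions simply mirror John’s experiment:

import boto3
from botocore.exceptions import ClientError

client = boto3.client('s3', region_name='eu-central-1')

try:
    client.create_bucket(Bucket='name', CreateBucketConfiguration={
        'LocationConstraint': 'eu-west-2'
    })
except ClientError as error:
    # Prints IllegalLocationConstraintException.
    print(error.response['Error']['Code'])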

 

Creating the client with the region_name flag fixes this:

client = boto3.client('s3', region_name='eu-north-1')

 

Using the policy resource arn:aws:s3:::${article_test_bucket} with an S3 bucket gives you access to resources that are put directly inside the bucket (without folders and subfolders).

Adding the extra line arn:aws:s3:::${article_test_bucket}/* gives access to the previously locked items.

You can’t create files at arbitrary paths using AWS Lambda. You need to store all your newly created files inside one directory called /tmp.

 

If you want to put an object on S3, you can create a file inside the /tmp directory and then upload it to S3. But in some cases you can skip this step and simply use the put_object function to place it on S3 directly.

Next Scenario?

Ok, let’s move to the next example… John, JOHN, WHERE ARE YOU GOING!?!?! I’m terribly sorry, but I think John is really fed up with our exercises. Nevertheless, I do hope that this article will help you with your next applications. Together with John, we look for solutions in advance so you won’t have to. So, try to apply your newly acquired knowledge to create something great. And wait for the next part of our article – it’s coming soon. That is, if I manage to catch John first…