
Boto3, SDK for your first AWS app

Brief introduction

Have you ever thought about how frustrating it can get when you have to use the AWS Management Console to create a bunch of S3 buckets? Or when you need to prepare some services depending on given input values? Your solution for these and many more problems is Boto3, the AWS SDK for Python. Nearly every operation that can be performed in the AWS Management Console can be replaced with code using the Boto3 library.

Why is it so useful? During our student years, we encountered many situations where writing code was tedious because of the limited computing power we were given (I know that the dean has a limited budget, but IT students expect something more). This was especially true of multi-threading lectures, where machines running a maximum of 6 threads were all there was. With boto3 we can write the same apps, but using resources provided by AWS. Need more computing power? EC2 solves this problem with ease. With more computing power we can use more threads, with more threads our app finishes much faster, and so on and so on… For now, we are going to put EC2 and threads aside so we can focus on boto3.

However, why should I use it? After all, AWS provides an excellent interface which is quite simple and gives access to all services. Simple, yes, but that doesn't necessarily mean better. Imagine a situation in which you need to create one hundred S3 buckets:

Bucket1

Bucket2

Bucket3

...

Bucket100

So you log into your management console. From all the possible services in the list, you choose the S3 service. The next step is to create a new bucket, so you type your bucket's name and hit create… Great, you now have your first, newly created bucket. Don't worry, you only need to create 99 more…

With the Boto3 library, you could write a Python script in which your function takes only one parameter: bucket_name. The script will contain a looped function call that creates a new bucket whose name is the bucket name plus the loop number. So by invoking your script, you will get exactly one hundred new buckets.

Easy, don't you think? What's more, if we add a second parameter to our script describing the number of loops, we get a script that can create 1, 100, 1000 or more buckets, as sketched below. Management of services with boto3 is very flexible. In our case we are using it to solve an elementary problem, but it's commonly used in huge applications where we stumble upon more complex issues and use more than one service.
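A minimal sketch of that parameterized idea (the helper name create_buckets and its parameters are just illustrative, not anything defined by boto3):

import boto3

def create_buckets(base_name, count):
    # creates buckets named base_name1 .. base_name<count>
    client = boto3.client('s3')
    for i in range(count):
        client.create_bucket(Bucket=base_name + str(i + 1))

create_buckets("bucket", 100)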

Boto3 supports nearly every service available on AWS, but it's not this article's goal to give you examples of where to use it, rather how to use it in general, so let's get started.

Prepare your Boto3 script

As stated before, boto3 is a Python library, so the first thing you need is a Python interpreter. If that step is behind you, you need to download the boto3 library. It's not part of the standard Python distribution, so open your terminal and just type:

pip install boto3
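If the installation fails because pip itself is outdated, upgrading pip first usually helps:

pip install --upgrade pip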

In most situations, though, the installation should end successfully right away. The next step is to import the boto3 library into your Python script; we can do this by typing:

import boto3

In this case, if we want to call a boto3 function, we must type boto3.function_name(). Another way to import the boto3 library is by typing:

from boto3 import *

In that case, when using boto3 functions, we can omit the first part and call function_name().
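For instance (note how the star import makes the module-level client function available as a bare name, which is easy to shadow or clash with):

from boto3 import *

client = client('s3')  # works, but bare names like this can easily collide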

In the next part of the article, we will use the import boto3 approach, which is the safer option. If we want to start working with a particular service, we need to create a client for it. Let's see the most straightforward S3 client:

import boto3

client = boto3.client('s3')

Now we can put theory into practice and create our one hundred buckets. Let’s code our loop:

import boto3

client = boto3.client('s3')

for i in range(100):
    bucket_name = "bucket" + str(i + 1)
    client.create_bucket(Bucket=bucket_name)

With one loop, one variable containing the bucket name, and boto3's create_bucket function, we can replace the whole time-consuming process of creating those buckets manually in the AWS Management Console.

Credentials

In most cases, unfortunately, these operations would not yield the expected results. How would our client know which account these buckets should be created on? Usually it requires credentials: access keys stored in the .aws folder. We just need a way to point boto3 at them.

import boto3

boto3.setup_default_session(profile_name=profile_name)

client = boto3.client('s3')

Now we use the profile_name credentials, which give us access to particular services. We established that the profile_name credentials have access to S3. In the end, our whole code should look like this:

import boto3

boto3.setup_default_session(profile_name=profile_name)
client = boto3.client('s3')

for i in range(100):
    bucket_name = "bucket" + str(i + 1)
    client.create_bucket(Bucket=bucket_name)

But in what situations can we omit the credentials? One example is an AWS Lambda function holding our boto3 script, with a properly configured policy granting access to our S3 buckets. Boto3 S3 clients created during the Lambda's execution will have the same access rights as the Lambda's policy grants. Of course, there are many more situations where we can omit extracting credentials from .aws. The most important thing is to be aware of it.
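A minimal sketch of that Lambda case (the handler name and bucket name are placeholders, and the execution role is assumed to allow s3:CreateBucket):

import boto3

def lambda_handler(event, context):
    # inside Lambda, boto3 picks up credentials from the execution role,
    # so no setup_default_session call is needed
    client = boto3.client('s3')
    client.create_bucket(Bucket='bucket-from-lambda')
    return {'statusCode': 200}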

Exceptions

When things go sideways, exceptions come to the rescue. Aside from using regular Python try / except statements, we can make use of botocore.exceptions.

Remember that an exception serves its purpose when unwanted stuff happens truly as an exception, meaning it is excluded from a general statement or does not follow a rule. There shouldn't be too many exceptions flying around your code; it would make your code rather hard and unpleasant to read and refactor.

Having that in mind, let's jump right into the action with the code from the previous examples. We want to create buckets, but we also want to make sure that the bucket we want to create doesn't already exist somewhere on a different account (or even on the same account). Of course, boto3 throws its own exceptions when bad stuff happens. What we are doing is essentially creating a wrapper that lets us control the flow. We decide what to do in the except block: whether to use the pass statement, raise the exception as it is, or maybe use it only partially and raise it in a particular way – for example raising an exception with a particular error code, but more on that later. The simplest version would look something like the code below:

import boto3
import botocore.exceptions

boto3.setup_default_session(profile_name=profile_name)
client = boto3.client('s3')
for i in range(100):
    bucket_name = "bucket" + str(i + 1)
    try:
        client.create_bucket(Bucket=bucket_name)
    except botocore.exceptions.ClientError as e:
        raise Exception('Your exception statement ' + str(e))

We are just placing the create_bucket method call inside a try / except block. When an exception is caught, it's re-raised with your statement of choice joined with the message from boto3's native exception. It doesn't necessarily have to be that way. You could ignore the native exception and just display your own statement, or vice versa. You could even use the pass statement so that nothing happens.

This last option doesn't seem useful, but it might be. If you were in the middle of creating a hundred buckets, and it just so happened that the 51st bucket's name was already in use, you probably wouldn't want to raise an exception and stop the program mid-execution. That would mean you could no longer simply run your script again to create the remaining buckets, because the first 50 already exist; you would have to modify the script. To avoid that, you use the pass statement in the except block. Yes, this way you miss the 51st bucket, but the process goes on and gets completed, and what you are left with is a collection of 99 buckets ready to use, which in some cases might be the better solution, especially when that one bucket doesn't matter. It's just an example, maybe a dull one, but it serves its purpose.
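The pass variant of the loop would then look like this (same session setup as before; only the except block changes):

for i in range(100):
    bucket_name = "bucket" + str(i + 1)
    try:
        client.create_bucket(Bucket=bucket_name)
    except botocore.exceptions.ClientError:
        pass  # skip the taken name and keep creating the remaining buckets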

Here we have a slightly more specific example. If we want to make use of error codes, we first catch botocore.exceptions.ClientError. Then we obtain the error code using error_code = e.response['Error']['Code'] and voilà. We can now proceed with processing our error code in order to, for example, raise an exception only when the error has code 404 (Not Found). Example code is listed below:

import boto3
import botocore.exceptions

boto3.setup_default_session(profile_name=profile_name)
client = boto3.client('s3')
for i in range(100):
    bucket_name = "bucket" + str(i + 1)
    try:
        client.create_bucket(Bucket=bucket_name)
    except botocore.exceptions.ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == '403':
            raise Exception("Forbidden / Access Denied statement")
        elif error_code == '404':
            raise Exception("Not Found statement")
        else:
            raise Exception("Other statement " + str(e))

We can always use an else clause as a general rule for the rest of the exceptions that may be caught. Branching on error codes like this can serve many purposes and helps manage the flow of your app.

But we can do much more

Do not think that creating buckets is all we can do; there are many more functions we can call. For example, we can:

– list our buckets

import boto3

client = boto3.client('s3')

response = client.list_buckets()
buckets = [bucket['Name'] for bucket in response['Buckets']]
print("Bucket List: %s" % buckets)

– upload a file

import boto3

client = boto3.client('s3')

# upload_file(local_path, bucket_name, object_key); here the local
# filename doubles as the object key
client.upload_file(filename, bucket_name, filename)

– get bucket policy

import boto3

client = boto3.client('s3')

response = client.get_bucket_policy(
    Bucket='string')

– delete bucket

import boto3

client = boto3.client('s3')

response = client.delete_bucket(
    Bucket='string'
)

Your first app

The examples above are really simple; let's take a look at a slightly more advanced app. We want to be able to:

  • Create a bucket.
  • Copy an object from one bucket to the newly created one.
  • Update DynamoDB with proper information.

Create a bucket

Again? Yes, but now we will do it in a slightly different way. As you know, Python supports object-oriented programming, so let's try to create our own S3 client object.

import boto3
from botocore.exceptions import ClientError

class bucketS3(object):
    def __init__(self, service, event, context):
        self.service = service
        self.event = event
        self.context = context
        try:
            self.conn_client = boto3.client(self.service)
            self.resource = boto3.resource(self.service)
        except ClientError as e:
            print(e.response['Error']['Code'] + ':',
                  e.response['Error']['Message'])

s3 = bucketS3("s3", "event", "context")

Now our new s3 object is ready for the next steps. As you can see, it contains two important fields:

  • conn_client – our boto3 S3 client, the same one as in the previous examples,
  • resource – our high-level resource interface created with boto3.resource(); the short sketch below shows the difference.
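A quick illustrative comparison of the two (list_buckets() and buckets.all() are standard boto3 calls; the printing is just for demonstration):

# client: low-level calls that return plain dictionaries
response = s3.conn_client.list_buckets()
print([b['Name'] for b in response['Buckets']])

# resource: high-level, object-oriented access to the same data
for bucket in s3.resource.buckets.all():
    print(bucket.name)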

Great class, isn't it? But let's be honest, right now it does nothing, so let's write a method for creating a bucket, callable on the s3 object.

import boto3
from botocore.exceptions import ClientError

class bucketS3(object):
    def __init__(self, service, event, context):
        self.service = service
        self.event = event
        self.context = context
        try:
            self.conn_client = boto3.client(self.service)
            self.resource = boto3.resource(self.service)
        except ClientError as e:
            print(e.response['Error']['Code'] + ':',
                  e.response['Error']['Message'])

    def create_bucket(self, bucket_name):
        try:
            self.conn_client.create_bucket(Bucket=bucket_name)
        except ClientError as e:
            raise Exception('Your exception statement ' + str(e))

s3 = bucketS3("s3", "event", "context")
s3.create_bucket("bucket_name")

The result should be the same as in the other examples: we create our new S3 bucket.

Copy an object from one bucket to the newly created one

Copying an object is pretty much as simple as creating a new bucket, but there is one important difference: we work on objects inside a bucket, so we need to use self.resource instead of self.conn_client.

import boto3
from botocore.exceptions import ClientError

class bucketS3(object):
    def __init__(self, service, event, context):
        self.service = service
        self.event = event
        self.context = context
        try:
            self.conn_client = boto3.client(self.service)
            self.resource = boto3.resource(self.service)
        except ClientError as e:
            print(e.response['Error']['Code'] + ':',
                  e.response['Error']['Message'])

    def create_bucket(self, bucket_name):
        try:
            self.conn_client.create_bucket(Bucket=bucket_name)
        except ClientError as e:
            raise Exception('Your exception statement ' + str(e))

    def copy_object(self, bucket_source, object_key, bucket_dest, new_object_key):
        try:
            copy_source = {
                'Bucket': bucket_source,
                'Key': object_key
            }
            bucket = self.resource.Bucket(bucket_dest)
            bucket.copy(copy_source, new_object_key)
        except ClientError as e:
            raise Exception('Your exception statement ' + str(e))

s3 = bucketS3("s3", "event", "context")
s3.create_bucket("bucket_name")
s3.copy_object("bucket_source", "key", "bucket_name", "new_key")

Update DynamoDB with proper information

The preparation of our new bucket is done; we can add new services to our project. The rules for creating a DynamoDB client are very similar to those for the S3 client. We started with an object-oriented approach and we will do the same now:

import boto3
import datetime
from botocore.exceptions import ClientError

class dynamoDB(object):

    def __init__(self, tablename):
        self.tablename = tablename
        try:
            self.resource = boto3.resource('dynamodb')
            self.client = boto3.client('dynamodb')
            self.table = self.resource.Table(self.tablename)
        except ClientError as err:
            print("----ClientError: " + str(err))

    def update_dynamo_db(self, bucket_name):
        current = datetime.datetime.now()
        try:
            self.table.update_item(
                Key={'bucket_name': bucket_name},
                UpdateExpression="SET #dt = :var1",
                ExpressionAttributeValues={
                    ":var1": str(current)[:19]
                },
                ExpressionAttributeNames={
                    "#dt": "date"
                }
            )
        except ClientError as err:
            print("----ClientError: " + str(err))

dynamo = dynamoDB('tablename')
dynamo.update_dynamo_db('bucket_name')

As you can see, updating DynamoDB seemed a little harder to code than creating an S3 bucket, but in reality it is not. We need to specify Key, which identifies the record we want to update; in our example, the record is keyed by our newly created bucket name. The remaining parameters are:

  • UpdateExpression – the operation we want to perform. SET, because we want to set an attribute.
  • ExpressionAttributeValues – the values we want to put into the DynamoDB field. In our example, the variable :var1 holds the date of modification.
  • ExpressionAttributeNames – the fields we want to update. In our example, the date field; the #dt alias is needed because date is a reserved word in DynamoDB.

Saga pattern

Let's take a step backward for a minute and take one last look at our example of creating 100 buckets. Sometimes you want all of your transactions to complete successfully, and when they don't, you should roll back the changes. That's what the saga pattern is for.

What we want to do with our particular code is implement the exception handling in such a way that once an exception is caught, all the bucket creations are rolled back. That basically means deleting all the buckets that were created before the exception was raised. Let's get into it:

import boto3
import botocore.exceptions

boto3.setup_default_session(profile_name=profile_name)
client = boto3.client('s3')
for i in range(100):
    bucket_name = "bucket" + str(i + 1)
    try:
        client.create_bucket(Bucket=bucket_name)
    except botocore.exceptions.ClientError as e:
        # roll back: delete the buckets created so far, newest first
        for j in range(i - 1, -1, -1):
            bucket_name = "bucket" + str(j + 1)
            client.delete_bucket(Bucket=bucket_name)
        raise Exception('Your exception statement ' + str(e))

It may not look that pretty, but it's the simplest way to reverse the changes in our case of creating 100 buckets in a for loop. We need to adapt the rollback process to the process of making the changes. This pattern is also used with Lambda and Step Functions, where it might feel a bit more intuitive.

Last few words

Boto3 is such a massive library that it is impossible to describe all of its features in a single article. Despite that, these few examples of using the boto3 library should be enough for you to write your first script. If you want more, you can look through the AWS documentation, or demonstrate a little patience and wait for the next part…